Vexler, Albert; Tanajian, Hovig; Hutson, Alan D
In practice, parametric likelihood-ratio techniques are powerful statistical tools. In this article, we propose and examine novel and simple distribution-free test statistics that efficiently approximate parametric likelihood ratios to analyze and compare distributions of K groups of observations. Using the density-based empirical likelihood methodology, we develop a Stata package that applies to a test for symmetry of data distributions and compares K -sample distributions. Recognizing that recent statistical software packages do not sufficiently address K -sample nonparametric comparisons of data distributions, we propose a new Stata command, vxdbel, to execute exact density-based empirical likelihood-ratio tests using K samples. To calculate p -values of the proposed tests, we use the following methods: 1) a classical technique based on Monte Carlo p -value evaluations; 2) an interpolation technique based on tabulated critical values; and 3) a new hybrid technique that combines methods 1 and 2. The third, cutting-edge method is shown to be very efficient in the context of exact-test p -value computations. This Bayesian-type method considers tabulated critical values as prior information and Monte Carlo generations of test statistic values as data used to depict the likelihood function. In this case, a nonparametric Bayesian method is proposed to compute critical values of exact tests.
Empirical likelihood-based confidence intervals for mean medical cost with censored data.
Jeyarajah, Jenny; Qin, Gengsheng
2017-11-10
In this paper, we propose empirical likelihood methods based on influence function and jackknife techniques for constructing confidence intervals for mean medical cost with censored data. We conduct a simulation study to compare the coverage probabilities and interval lengths of our proposed confidence intervals with that of the existing normal approximation-based confidence intervals and bootstrap confidence intervals. The proposed methods have better finite-sample performances than existing methods. Finally, we illustrate our proposed methods with a relevant example. Copyright © 2017 John Wiley & Sons, Ltd.
Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.
Xie, Yanmei; Zhang, Biao
2017-04-20
Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
Grummer, Jared A; Bryson, Robert W; Reeder, Tod W
2014-03-01
Current molecular methods of species delimitation are limited by the types of species delimitation models and scenarios that can be tested. Bayes factors allow for more flexibility in testing non-nested species delimitation models and hypotheses of individual assignment to alternative lineages. Here, we examined the efficacy of Bayes factors in delimiting species through simulations and empirical data from the Sceloporus scalaris species group. Marginal-likelihood scores of competing species delimitation models, from which Bayes factor values were compared, were estimated with four different methods: harmonic mean estimation (HME), smoothed harmonic mean estimation (sHME), path-sampling/thermodynamic integration (PS), and stepping-stone (SS) analysis. We also performed model selection using a posterior simulation-based analog of the Akaike information criterion through Markov chain Monte Carlo analysis (AICM). Bayes factor species delimitation results from the empirical data were then compared with results from the reversible-jump MCMC (rjMCMC) coalescent-based species delimitation method Bayesian Phylogenetics and Phylogeography (BP&P). Simulation results show that HME and sHME perform poorly compared with PS and SS marginal-likelihood estimators when identifying the true species delimitation model. Furthermore, Bayes factor delimitation (BFD) of species showed improved performance when species limits are tested by reassigning individuals between species, as opposed to either lumping or splitting lineages. In the empirical data, BFD through PS and SS analyses, as well as the rjMCMC method, each provide support for the recognition of all scalaris group taxa as independent evolutionary lineages. Bayes factor species delimitation and BP&P also support the recognition of three previously undescribed lineages. In both simulated and empirical data sets, harmonic and smoothed harmonic mean marginal-likelihood estimators provided much higher marginal-likelihood estimates than PS and SS estimators. The AICM displayed poor repeatability in both simulated and empirical data sets, and produced inconsistent model rankings across replicate runs with the empirical data. Our results suggest that species delimitation through the use of Bayes factors with marginal-likelihood estimates via PS or SS analyses provide a useful and complementary alternative to existing species delimitation methods.
Empirical likelihood method for non-ignorable missing data problems.
Guan, Zhong; Qin, Jing
2017-01-01
Missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missing is the most difficult missing data problem where the missing of a response depends on its own value. In statistical literature, unlike the ignorable missing data problem, not many papers on non-ignorable missing data are available except for the full parametric model based approach. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen (1988)'s empirical likelihood method we can obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and the mean response which are shown to be asymptotically normal. Moreover the likelihood ratio statistic can be used to test whether the missing of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of a real AIDS trial data shows that the missing of CD4 counts around two years are non-ignorable and the sample mean based on observed data only is biased.
Gengsheng Qin; Davis, Angela E; Jing, Bing-Yi
2011-06-01
For a continuous-scale diagnostic test, it is often of interest to find the range of the sensitivity of the test at the cut-off that yields a desired specificity. In this article, we first define a profile empirical likelihood ratio for the sensitivity of a continuous-scale diagnostic test and show that its limiting distribution is a scaled chi-square distribution. We then propose two new empirical likelihood-based confidence intervals for the sensitivity of the test at a fixed level of specificity by using the scaled chi-square distribution. Simulation studies are conducted to compare the finite sample performance of the newly proposed intervals with the existing intervals for the sensitivity in terms of coverage probability. A real example is used to illustrate the application of the recommended methods.
Inferring the parameters of a Markov process from snapshots of the steady state
NASA Astrophysics Data System (ADS)
Dettmer, Simon L.; Berg, Johannes
2018-02-01
We seek to infer the parameters of an ergodic Markov process from samples taken independently from the steady state. Our focus is on non-equilibrium processes, where the steady state is not described by the Boltzmann measure, but is generally unknown and hard to compute, which prevents the application of established equilibrium inference methods. We propose a quantity we call propagator likelihood, which takes on the role of the likelihood in equilibrium processes. This propagator likelihood is based on fictitious transitions between those configurations of the system which occur in the samples. The propagator likelihood can be derived by minimising the relative entropy between the empirical distribution and a distribution generated by propagating the empirical distribution forward in time. Maximising the propagator likelihood leads to an efficient reconstruction of the parameters of the underlying model in different systems, both with discrete configurations and with continuous configurations. We apply the method to non-equilibrium models from statistical physics and theoretical biology, including the asymmetric simple exclusion process (ASEP), the kinetic Ising model, and replicator dynamics.
On Bayesian Testing of Additive Conjoint Measurement Axioms Using Synthetic Likelihood.
Karabatsos, George
2018-06-01
This article introduces a Bayesian method for testing the axioms of additive conjoint measurement. The method is based on an importance sampling algorithm that performs likelihood-free, approximate Bayesian inference using a synthetic likelihood to overcome the analytical intractability of this testing problem. This new method improves upon previous methods because it provides an omnibus test of the entire hierarchy of cancellation axioms, beyond double cancellation. It does so while accounting for the posterior uncertainty that is inherent in the empirical orderings that are implied by these axioms, together. The new method is illustrated through a test of the cancellation axioms on a classic survey data set, and through the analysis of simulated data.
NASA Astrophysics Data System (ADS)
Aminah, Agustin Siti; Pawitan, Gandhi; Tantular, Bertho
2017-03-01
So far, most of the data published by Statistics Indonesia (BPS) as data providers for national statistics are still limited to the district level. Less sufficient sample size for smaller area levels to make the measurement of poverty indicators with direct estimation produced high standard error. Therefore, the analysis based on it is unreliable. To solve this problem, the estimation method which can provide a better accuracy by combining survey data and other auxiliary data is required. One method often used for the estimation is the Small Area Estimation (SAE). There are many methods used in SAE, one of them is Empirical Best Linear Unbiased Prediction (EBLUP). EBLUP method of maximum likelihood (ML) procedures does not consider the loss of degrees of freedom due to estimating β with β ^. This drawback motivates the use of the restricted maximum likelihood (REML) procedure. This paper proposed EBLUP with REML procedure for estimating poverty indicators by modeling the average of household expenditures per capita and implemented bootstrap procedure to calculate MSE (Mean Square Error) to compare the accuracy EBLUP method with the direct estimation method. Results show that EBLUP method reduced MSE in small area estimation.
Xu, Maoqi; Chen, Liang
2018-01-01
The individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice the analysis power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex disease. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Maximum Likelihood Estimations and EM Algorithms with Length-biased Data
Qin, Jing; Ning, Jing; Liu, Hao; Shen, Yu
2012-01-01
SUMMARY Length-biased sampling has been well recognized in economics, industrial reliability, etiology applications, epidemiological, genetic and cancer screening studies. Length-biased right-censored data have a unique data structure different from traditional survival data. The nonparametric and semiparametric estimations and inference methods for traditional survival data are not directly applicable for length-biased right-censored data. We propose new expectation-maximization algorithms for estimations based on full likelihoods involving infinite dimensional parameters under three settings for length-biased data: estimating nonparametric distribution function, estimating nonparametric hazard function under an increasing failure rate constraint, and jointly estimating baseline hazards function and the covariate coefficients under the Cox proportional hazards model. Extensive empirical simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and lead to more efficient estimators compared to the estimating equation approaches. The proposed estimates are also more robust to various right-censoring mechanisms. We prove the strong consistency properties of the estimators, and establish the asymptotic normality of the semi-parametric maximum likelihood estimators under the Cox model using modern empirical processes theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online. PMID:22323840
Empirical likelihood inference in randomized clinical trials.
Zhang, Biao
2017-01-01
In individually randomized controlled trials, in addition to the primary outcome, information is often available on a number of covariates prior to randomization. This information is frequently utilized to undertake adjustment for baseline characteristics in order to increase precision of the estimation of average treatment effects; such adjustment is usually performed via covariate adjustment in outcome regression models. Although the use of covariate adjustment is widely seen as desirable for making treatment effect estimates more precise and the corresponding hypothesis tests more powerful, there are considerable concerns that objective inference in randomized clinical trials can potentially be compromised. In this paper, we study an empirical likelihood approach to covariate adjustment and propose two unbiased estimating functions that automatically decouple evaluation of average treatment effects from regression modeling of covariate-outcome relationships. The resulting empirical likelihood estimator of the average treatment effect is as efficient as the existing efficient adjusted estimators 1 when separate treatment-specific working regression models are correctly specified, yet are at least as efficient as the existing efficient adjusted estimators 1 for any given treatment-specific working regression models whether or not they coincide with the true treatment-specific covariate-outcome relationships. We present a simulation study to compare the finite sample performance of various methods along with some results on analysis of a data set from an HIV clinical trial. The simulation results indicate that the proposed empirical likelihood approach is more efficient and powerful than its competitors when the working covariate-outcome relationships by treatment status are misspecified.
Ling, Cheng; Hamada, Tsuyoshi; Gao, Jingyang; Zhao, Guoguang; Sun, Donghong; Shi, Weifeng
2016-01-01
MrBayes is a widespread phylogenetic inference tool harnessing empirical evolutionary models and Bayesian statistics. However, the computational cost on the likelihood estimation is very expensive, resulting in undesirably long execution time. Although a number of multi-threaded optimizations have been proposed to speed up MrBayes, there are bottlenecks that severely limit the GPU thread-level parallelism of likelihood estimations. This study proposes a high performance and resource-efficient method for GPU-oriented parallelization of likelihood estimations. Instead of having to rely on empirical programming, the proposed novel decomposition storage model implements high performance data transfers implicitly. In terms of performance improvement, a speedup factor of up to 178 can be achieved on the analysis of simulated datasets by four Tesla K40 cards. In comparison to the other publicly available GPU-oriented MrBayes, the tgMC 3 ++ method (proposed herein) outperforms the tgMC 3 (v1.0), nMC 3 (v2.1.1) and oMC 3 (v1.00) methods by speedup factors of up to 1.6, 1.9 and 2.9, respectively. Moreover, tgMC 3 ++ supports more evolutionary models and gamma categories, which previous GPU-oriented methods fail to take into analysis.
Nonparametric spirometry reference values for Hispanic Americans.
Glenn, Nancy L; Brown, Vanessa M
2011-02-01
Recent literature sites ethnic origin as a major factor in developing pulmonary function reference values. Extensive studies established reference values for European and African Americans, but not for Hispanic Americans. The Third National Health and Nutrition Examination Survey defines Hispanic as individuals of Spanish speaking cultures. While no group was excluded from the target population, sample size requirements only allowed inclusion of individuals who identified themselves as Mexican Americans. This research constructs nonparametric reference value confidence intervals for Hispanic American pulmonary function. The method is applicable to all ethnicities. We use empirical likelihood confidence intervals to establish normal ranges for reference values. Its major advantage: it is model free, but shares asymptotic properties of model based methods. Statistical comparisons indicate that empirical likelihood interval lengths are comparable to normal theory intervals. Power and efficiency studies agree with previously published theoretical results.
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2016-06-30
Wildfire can significantly alter the hydrologic response of a watershed to the extent that even modest rainstorms can generate dangerous flash floods and debris flows. To reduce public exposure to hazard, the U.S. Geological Survey produces post-fire debris-flow hazard assessments for select fires in the western United States. We use publicly available geospatial data describing basin morphology, burn severity, soil properties, and rainfall characteristics to estimate the statistical likelihood that debris flows will occur in response to a storm of a given rainfall intensity. Using an empirical database and refined geospatial analysis methods, we defined new equations for the prediction of debris-flow likelihood using logistic regression methods. We showed that the new logistic regression model outperformed previous models used to predict debris-flow likelihood.
Empirical likelihood-based tests for stochastic ordering
BARMI, HAMMOU EL; MCKEAGUE, IAN W.
2013-01-01
This paper develops an empirical likelihood approach to testing for the presence of stochastic ordering among univariate distributions based on independent random samples from each distribution. The proposed test statistic is formed by integrating a localized empirical likelihood statistic with respect to the empirical distribution of the pooled sample. The asymptotic null distribution of this test statistic is found to have a simple distribution-free representation in terms of standard Brownian bridge processes. The approach is used to compare the lengths of rule of Roman Emperors over various historical periods, including the “decline and fall” phase of the empire. In a simulation study, the power of the proposed test is found to improve substantially upon that of a competing test due to El Barmi and Mukerjee. PMID:23874142
WEIGHTED LIKELIHOOD ESTIMATION UNDER TWO-PHASE SAMPLING
Saegusa, Takumi; Wellner, Jon A.
2013-01-01
We develop asymptotic theory for weighted likelihood estimators (WLE) under two-phase stratified sampling without replacement. We also consider several variants of WLEs involving estimated weights and calibration. A set of empirical process tools are developed including a Glivenko–Cantelli theorem, a theorem for rates of convergence of M-estimators, and a Donsker theorem for the inverse probability weighted empirical processes under two-phase sampling and sampling without replacement at the second phase. Using these general results, we derive asymptotic distributions of the WLE of a finite-dimensional parameter in a general semiparametric model where an estimator of a nuisance parameter is estimable either at regular or nonregular rates. We illustrate these results and methods in the Cox model with right censoring and interval censoring. We compare the methods via their asymptotic variances under both sampling without replacement and the more usual (and easier to analyze) assumption of Bernoulli sampling at the second phase. PMID:24563559
Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood
Bondell, Howard D.; Stefanski, Leonard A.
2013-01-01
Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805
Li, Xiang; Kuk, Anthony Y C; Xu, Jinfeng
2014-12-10
Human biomonitoring of exposure to environmental chemicals is important. Individual monitoring is not viable because of low individual exposure level or insufficient volume of materials and the prohibitive cost of taking measurements from many subjects. Pooling of samples is an efficient and cost-effective way to collect data. Estimation is, however, complicated as individual values within each pool are not observed but are only known up to their average or weighted average. The distribution of such averages is intractable when the individual measurements are lognormally distributed, which is a common assumption. We propose to replace the intractable distribution of the pool averages by a Gaussian likelihood to obtain parameter estimates. If the pool size is large, this method produces statistically efficient estimates, but regardless of pool size, the method yields consistent estimates as the number of pools increases. An empirical Bayes (EB) Gaussian likelihood approach, as well as its Bayesian analog, is developed to pool information from various demographic groups by using a mixed-effect formulation. We also discuss methods to estimate the underlying mean-variance relationship and to select a good model for the means, which can be incorporated into the proposed EB or Bayes framework. By borrowing strength across groups, the EB estimator is more efficient than the individual group-specific estimator. Simulation results show that the EB Gaussian likelihood estimates outperform a previous method proposed for the National Health and Nutrition Examination Surveys with much smaller bias and better coverage in interval estimation, especially after correction of bias. Copyright © 2014 John Wiley & Sons, Ltd.
An alternative empirical likelihood method in missing response problems and causal inference.
Ren, Kaili; Drummond, Christopher A; Brewster, Pamela S; Haller, Steven T; Tian, Jiang; Cooper, Christopher J; Zhang, Biao
2016-11-30
Missing responses are common problems in medical, social, and economic studies. When responses are missing at random, a complete case data analysis may result in biases. A popular debias method is inverse probability weighting proposed by Horvitz and Thompson. To improve efficiency, Robins et al. proposed an augmented inverse probability weighting method. The augmented inverse probability weighting estimator has a double-robustness property and achieves the semiparametric efficiency lower bound when the regression model and propensity score model are both correctly specified. In this paper, we introduce an empirical likelihood-based estimator as an alternative to Qin and Zhang (2007). Our proposed estimator is also doubly robust and locally efficient. Simulation results show that the proposed estimator has better performance when the propensity score is correctly modeled. Moreover, the proposed method can be applied in the estimation of average treatment effect in observational causal inferences. Finally, we apply our method to an observational study of smoking, using data from the Cardiovascular Outcomes in Renal Atherosclerotic Lesions clinical trial. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
ERIC Educational Resources Information Center
Suh, Youngsuk; Talley, Anna E.
2015-01-01
This study compared and illustrated four differential distractor functioning (DDF) detection methods for analyzing multiple-choice items. The log-linear approach, two item response theory-model-based approaches with likelihood ratio tests, and the odds ratio approach were compared to examine the congruence among the four DDF detection methods.…
An Empirical Comparison of Heterogeneity Variance Estimators in 12,894 Meta-Analyses
ERIC Educational Resources Information Center
Langan, Dean; Higgins, Julian P. T.; Simmonds, Mark
2015-01-01
Heterogeneity in meta-analysis is most commonly estimated using a moment-based approach described by DerSimonian and Laird. However, this method has been shown to produce biased estimates. Alternative methods to estimate heterogeneity include the restricted maximum likelihood approach and those proposed by Paule and Mandel, Sidik and Jonkman, and…
Survivorship analysis when cure is a possibility: a Monte Carlo study.
Goldman, A I
1984-01-01
Parametric survivorship analyses of clinical trials commonly involves the assumption of a hazard function constant with time. When the empirical curve obviously levels off, one can modify the hazard function model by use of a Gompertz or Weibull distribution with hazard decreasing over time. Some cancer treatments are thought to cure some patients within a short time of initiation. Then, instead of all patients having the same hazard, decreasing over time, a biologically more appropriate model assumes that an unknown proportion (1 - pi) have constant high risk whereas the remaining proportion (pi) have essentially no risk. This paper discusses the maximum likelihood estimation of pi and the power curves of the likelihood ratio test. Monte Carlo studies provide results for a variety of simulated trials; empirical data illustrate the methods.
Statistical methods for the beta-binomial model in teratology.
Yamamoto, E; Yanagimoto, T
1994-01-01
The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces an likelihood inference. This leads to reducing biases of estimators and also to improving accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716
ERIC Educational Resources Information Center
Egberink, Iris J. L.; Meijer, Rob R.; Tendeiro, Jorge N.
2015-01-01
A popular method to assess measurement invariance of a particular item is based on likelihood ratio tests with all other items as anchor items. The results of this method are often only reported in terms of statistical significance, and researchers proposed different methods to empirically select anchor items. It is unclear, however, how many…
Tests for detecting overdispersion in models with measurement error in covariates.
Yang, Yingsi; Wong, Man Yu
2015-11-30
Measurement error in covariates can affect the accuracy in count data modeling and analysis. In overdispersion identification, the true mean-variance relationship can be obscured under the influence of measurement error in covariates. In this paper, we propose three tests for detecting overdispersion when covariates are measured with error: a modified score test and two score tests based on the proposed approximate likelihood and quasi-likelihood, respectively. The proposed approximate likelihood is derived under the classical measurement error model, and the resulting approximate maximum likelihood estimator is shown to have superior efficiency. Simulation results also show that the score test based on approximate likelihood outperforms the test based on quasi-likelihood and other alternatives in terms of empirical power. By analyzing a real dataset containing the health-related quality-of-life measurements of a particular group of patients, we demonstrate the importance of the proposed methods by showing that the analyses with and without measurement error correction yield significantly different results. Copyright © 2015 John Wiley & Sons, Ltd.
Postmodeling Sensitivity Analysis to Detect the Effect of Missing Data Mechanisms
ERIC Educational Resources Information Center
Jamshidian, Mortaza; Mata, Matthew
2008-01-01
Incomplete or missing data is a common problem in almost all areas of empirical research. It is well known that simple and ad hoc methods such as complete case analysis or mean imputation can lead to biased and/or inefficient estimates. The method of maximum likelihood works well; however, when the missing data mechanism is not one of missing…
Empirical Likelihood-Based Estimation of the Treatment Effect in a Pretest-Posttest Study.
Huang, Chiung-Yu; Qin, Jing; Follmann, Dean A
2008-09-01
The pretest-posttest study design is commonly used in medical and social science research to assess the effect of a treatment or an intervention. Recently, interest has been rising in developing inference procedures that improve efficiency while relaxing assumptions used in the pretest-posttest data analysis, especially when the posttest measurement might be missing. In this article we propose a semiparametric estimation procedure based on empirical likelihood (EL) that incorporates the common baseline covariate information to improve efficiency. The proposed method also yields an asymptotically unbiased estimate of the response distribution. Thus functions of the response distribution, such as the median, can be estimated straightforwardly, and the EL method can provide a more appealing estimate of the treatment effect for skewed data. We show that, compared with existing methods, the proposed EL estimator has appealing theoretical properties, especially when the working model for the underlying relationship between the pretest and posttest measurements is misspecified. A series of simulation studies demonstrates that the EL-based estimator outperforms its competitors when the working model is misspecified and the data are missing at random. We illustrate the methods by analyzing data from an AIDS clinical trial (ACTG 175).
Empirical Likelihood-Based Estimation of the Treatment Effect in a Pretest–Posttest Study
Huang, Chiung-Yu; Qin, Jing; Follmann, Dean A.
2013-01-01
The pretest–posttest study design is commonly used in medical and social science research to assess the effect of a treatment or an intervention. Recently, interest has been rising in developing inference procedures that improve efficiency while relaxing assumptions used in the pretest–posttest data analysis, especially when the posttest measurement might be missing. In this article we propose a semiparametric estimation procedure based on empirical likelihood (EL) that incorporates the common baseline covariate information to improve efficiency. The proposed method also yields an asymptotically unbiased estimate of the response distribution. Thus functions of the response distribution, such as the median, can be estimated straightforwardly, and the EL method can provide a more appealing estimate of the treatment effect for skewed data. We show that, compared with existing methods, the proposed EL estimator has appealing theoretical properties, especially when the working model for the underlying relationship between the pretest and posttest measurements is misspecified. A series of simulation studies demonstrates that the EL-based estimator outperforms its competitors when the working model is misspecified and the data are missing at random. We illustrate the methods by analyzing data from an AIDS clinical trial (ACTG 175). PMID:23729942
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in finding the relationship between a response variable and covariates. When the response depends on the covariate, then it may also depend on the function of this covariate. If one has no knowledge of this functional form but expect for monotonic increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), where the monotonicity constraints are built in. With missing data, people often employ the augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations, the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
Robustness of fit indices to outliers and leverage observations in structural equation modeling.
Yuan, Ke-Hai; Zhong, Xiaoling
2013-06-01
Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).
A Non-parametric Cutout Index for Robust Evaluation of Identified Proteins*
Serang, Oliver; Paulo, Joao; Steen, Hanno; Steen, Judith A.
2013-01-01
This paper proposes a novel, automated method for evaluating sets of proteins identified using mass spectrometry. The remaining peptide-spectrum match score distributions of protein sets are compared to an empirical absent peptide-spectrum match score distribution, and a Bayesian non-parametric method reminiscent of the Dirichlet process is presented to accurately perform this comparison. Thus, for a given protein set, the process computes the likelihood that the proteins identified are correctly identified. First, the method is used to evaluate protein sets chosen using different protein-level false discovery rate (FDR) thresholds, assigning each protein set a likelihood. The protein set assigned the highest likelihood is used to choose a non-arbitrary protein-level FDR threshold. Because the method can be used to evaluate any protein identification strategy (and is not limited to mere comparisons of different FDR thresholds), we subsequently use the method to compare and evaluate multiple simple methods for merging peptide evidence over replicate experiments. The general statistical approach can be applied to other types of data (e.g. RNA sequencing) and generalizes to multivariate problems. PMID:23292186
Williamson, Ross S.; Sahani, Maneesh; Pillow, Jonathan W.
2015-01-01
Stimulus dimensionality-reduction methods in neuroscience seek to identify a low-dimensional space of stimulus features that affect a neuron’s probability of spiking. One popular method, known as maximally informative dimensions (MID), uses an information-theoretic quantity known as “single-spike information” to identify this space. Here we examine MID from a model-based perspective. We show that MID is a maximum-likelihood estimator for the parameters of a linear-nonlinear-Poisson (LNP) model, and that the empirical single-spike information corresponds to the normalized log-likelihood under a Poisson model. This equivalence implies that MID does not necessarily find maximally informative stimulus dimensions when spiking is not well described as Poisson. We provide several examples to illustrate this shortcoming, and derive a lower bound on the information lost when spiking is Bernoulli in discrete time bins. To overcome this limitation, we introduce model-based dimensionality reduction methods for neurons with non-Poisson firing statistics, and show that they can be framed equivalently in likelihood-based or information-theoretic terms. Finally, we show how to overcome practical limitations on the number of stimulus dimensions that MID can estimate by constraining the form of the non-parametric nonlinearity in an LNP model. We illustrate these methods with simulations and data from primate visual cortex. PMID:25831448
Wang, Bo; Lin, Yin; Pan, Fu-shun; Yao, Chen; Zheng, Zi-Yu; Cai, Dan; Xu, Xiang-dong
2013-01-01
Wells score has been validated for estimation of pretest probability in patients with suspected deep vein thrombosis (DVT). In clinical practice, many clinicians prefer to use empirical estimation rather than Wells score. However, which method is better to increase the accuracy of clinical evaluation is not well understood. Our present study compared empirical estimation of pretest probability with the Wells score to investigate the efficiency of empirical estimation in the diagnostic process of DVT. Five hundred and fifty-five patients were enrolled in this study. One hundred and fifty patients were assigned to examine the interobserver agreement for Wells score between emergency and vascular clinicians. The other 405 patients were assigned to evaluate the pretest probability of DVT on the basis of the empirical estimation and Wells score, respectively, and plasma D-dimer levels were then determined in the low-risk patients. All patients underwent venous duplex scans and had a 45-day follow up. Weighted Cohen's κ value for interobserver agreement between emergency and vascular clinicians of the Wells score was 0.836. Compared with Wells score evaluation, empirical assessment increased the sensitivity, specificity, Youden's index, positive likelihood ratio, and positive and negative predictive values, but decreased negative likelihood ratio. In addition, the appropriate D-dimer cutoff value based on Wells score was 175 μg/l and 108 patients were excluded. Empirical assessment increased the appropriate D-dimer cutoff point to 225 μg/l and 162 patients were ruled out. Our findings indicated that empirical estimation not only improves D-dimer assay efficiency for exclusion of DVT but also increases clinical judgement accuracy in the diagnosis of DVT.
Confirmatory Factor Analysis of Ordinal Variables with Misspecified Models
ERIC Educational Resources Information Center
Yang-Wallentin, Fan; Joreskog, Karl G.; Luo, Hao
2010-01-01
Ordinal variables are common in many empirical investigations in the social and behavioral sciences. Researchers often apply the maximum likelihood method to fit structural equation models to ordinal data. This assumes that the observed measures have normal distributions, which is not the case when the variables are ordinal. A better approach is…
Two models for evaluating landslide hazards
Davis, J.C.; Chung, C.-J.; Ohlmacher, G.C.
2006-01-01
Two alternative procedures for estimating landslide hazards were evaluated using data on topographic digital elevation models (DEMs) and bedrock lithologies in an area adjacent to the Missouri River in Atchison County, Kansas, USA. The two procedures are based on the likelihood ratio model but utilize different assumptions. The empirical likelihood ratio model is based on non-parametric empirical univariate frequency distribution functions under an assumption of conditional independence while the multivariate logistic discriminant model assumes that likelihood ratios can be expressed in terms of logistic functions. The relative hazards of occurrence of landslides were estimated by an empirical likelihood ratio model and by multivariate logistic discriminant analysis. Predictor variables consisted of grids containing topographic elevations, slope angles, and slope aspects calculated from a 30-m DEM. An integer grid of coded bedrock lithologies taken from digitized geologic maps was also used as a predictor variable. Both statistical models yield relative estimates in the form of the proportion of total map area predicted to already contain or to be the site of future landslides. The stabilities of estimates were checked by cross-validation of results from random subsamples, using each of the two procedures. Cell-by-cell comparisons of hazard maps made by the two models show that the two sets of estimates are virtually identical. This suggests that the empirical likelihood ratio and the logistic discriminant analysis models are robust with respect to the conditional independent assumption and the logistic function assumption, respectively, and that either model can be used successfully to evaluate landslide hazards. ?? 2006.
NASA Technical Reports Server (NTRS)
Hoffbeck, Joseph P.; Landgrebe, David A.
1994-01-01
Many analysis algorithms for high-dimensional remote sensing data require that the remotely sensed radiance spectra be transformed to approximate reflectance to allow comparison with a library of laboratory reflectance spectra. In maximum likelihood classification, however, the remotely sensed spectra are compared to training samples, thus a transformation to reflectance may or may not be helpful. The effect of several radiance-to-reflectance transformations on maximum likelihood classification accuracy is investigated in this paper. We show that the empirical line approach, LOWTRAN7, flat-field correction, single spectrum method, and internal average reflectance are all non-singular affine transformations, and that non-singular affine transformations have no effect on discriminant analysis feature extraction and maximum likelihood classification accuracy. (An affine transformation is a linear transformation with an optional offset.) Since the Atmosphere Removal Program (ATREM) and the log residue method are not affine transformations, experiments with Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) data were conducted to determine the effect of these transformations on maximum likelihood classification accuracy. The average classification accuracy of the data transformed by ATREM and the log residue method was slightly less than the accuracy of the original radiance data. Since the radiance-to-reflectance transformations allow direct comparison of remotely sensed spectra with laboratory reflectance spectra, they can be quite useful in labeling the training samples required by maximum likelihood classification, but these transformations have only a slight effect or no effect at all on discriminant analysis and maximum likelihood classification accuracy.
Probabilistic analysis of tsunami hazards
Geist, E.L.; Parsons, T.
2006-01-01
Determining the likelihood of a disaster is a key component of any comprehensive hazard assessment. This is particularly true for tsunamis, even though most tsunami hazard assessments have in the past relied on scenario or deterministic type models. We discuss probabilistic tsunami hazard analysis (PTHA) from the standpoint of integrating computational methods with empirical analysis of past tsunami runup. PTHA is derived from probabilistic seismic hazard analysis (PSHA), with the main difference being that PTHA must account for far-field sources. The computational methods rely on numerical tsunami propagation models rather than empirical attenuation relationships as in PSHA in determining ground motions. Because a number of source parameters affect local tsunami runup height, PTHA can become complex and computationally intensive. Empirical analysis can function in one of two ways, depending on the length and completeness of the tsunami catalog. For site-specific studies where there is sufficient tsunami runup data available, hazard curves can primarily be derived from empirical analysis, with computational methods used to highlight deficiencies in the tsunami catalog. For region-wide analyses and sites where there are little to no tsunami data, a computationally based method such as Monte Carlo simulation is the primary method to establish tsunami hazards. Two case studies that describe how computational and empirical methods can be integrated are presented for Acapulco, Mexico (site-specific) and the U.S. Pacific Northwest coastline (region-wide analysis).
Order-restricted inference for means with missing values.
Wang, Heng; Zhong, Ping-Shou
2017-09-01
Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Staley, Dennis; Negri, Jacquelyn; Kean, Jason
2016-04-01
Population expansion into fire-prone steeplands has resulted in an increase in post-fire debris-flow risk in the western United States. Logistic regression methods for determining debris-flow likelihood and the calculation of empirical rainfall intensity-duration thresholds for debris-flow initiation represent two common approaches for characterizing hazard and reducing risk. Logistic regression models are currently being used to rapidly assess debris-flow hazard in response to design storms of known intensities (e.g. a 10-year recurrence interval rainstorm). Empirical rainfall intensity-duration thresholds comprise a major component of the United States Geological Survey (USGS) and the National Weather Service (NWS) debris-flow early warning system at a regional scale in southern California. However, these two modeling approaches remain independent, with each approach having limitations that do not allow for synergistic local-scale (e.g. drainage-basin scale) characterization of debris-flow hazard during intense rainfall. The current logistic regression equations consider rainfall a unique independent variable, which prevents the direct calculation of the relation between rainfall intensity and debris-flow likelihood. Regional (e.g. mountain range or physiographic province scale) rainfall intensity-duration thresholds fail to provide insight into the basin-scale variability of post-fire debris-flow hazard and require an extensive database of historical debris-flow occurrence and rainfall characteristics. Here, we present a new approach that combines traditional logistic regression and intensity-duration threshold methodologies. This method allows for local characterization of both the likelihood that a debris-flow will occur at a given rainfall intensity, the direct calculation of the rainfall rates that will result in a given likelihood, and the ability to calculate spatially explicit rainfall intensity-duration thresholds for debris-flow generation in recently burned areas. Our approach synthesizes the two methods by incorporating measured rainfall intensity into each model variable (based on measures of topographic steepness, burn severity and surface properties) within the logistic regression equation. This approach provides a more realistic representation of the relation between rainfall intensity and debris-flow likelihood, as likelihood values asymptotically approach zero when rainfall intensity approaches 0 mm/h, and increase with more intense rainfall. Model performance was evaluated by comparing predictions to several existing regional thresholds. The model, based upon training data collected in southern California, USA, has proven to accurately predict rainfall intensity-duration thresholds for other areas in the western United States not included in the original training dataset. In addition, the improved logistic regression model shows promise for emergency planning purposes and real-time, site-specific early warning. With further validation, this model may permit the prediction of spatially-explicit intensity-duration thresholds for debris-flow generation in areas where empirically derived regional thresholds do not exist. This improvement would permit the expansion of the early-warning system into other regions susceptible to post-fire debris flow.
Using phrases and document metadata to improve topic modeling of clinical reports.
Speier, William; Ong, Michael K; Arnold, Corey W
2016-06-01
Probabilistic topic models provide an unsupervised method for analyzing unstructured text, which have the potential to be integrated into clinical automatic summarization systems. Clinical documents are accompanied by metadata in a patient's medical history and frequently contains multiword concepts that can be valuable for accurately interpreting the included text. While existing methods have attempted to address these problems individually, we present a unified model for free-text clinical documents that integrates contextual patient- and document-level data, and discovers multi-word concepts. In the proposed model, phrases are represented by chained n-grams and a Dirichlet hyper-parameter is weighted by both document-level and patient-level context. This method and three other Latent Dirichlet allocation models were fit to a large collection of clinical reports. Examples of resulting topics demonstrate the results of the new model and the quality of the representations are evaluated using empirical log likelihood. The proposed model was able to create informative prior probabilities based on patient and document information, and captured phrases that represented various clinical concepts. The representation using the proposed model had a significantly higher empirical log likelihood than the compared methods. Integrating document metadata and capturing phrases in clinical text greatly improves the topic representation of clinical documents. The resulting clinically informative topics may effectively serve as the basis for an automatic summarization system for clinical reports. Copyright © 2016 Elsevier Inc. All rights reserved.
Empirical projection-based basis-component decomposition method
NASA Astrophysics Data System (ADS)
Brendel, Bernhard; Roessl, Ewald; Schlomka, Jens-Peter; Proksa, Roland
2009-02-01
Advances in the development of semiconductor based, photon-counting x-ray detectors stimulate research in the domain of energy-resolving pre-clinical and clinical computed tomography (CT). For counting detectors acquiring x-ray attenuation in at least three different energy windows, an extended basis component decomposition can be performed in which in addition to the conventional approach of Alvarez and Macovski a third basis component is introduced, e.g., a gadolinium based CT contrast material. After the decomposition of the measured projection data into the basis component projections, conventional filtered-backprojection reconstruction is performed to obtain the basis-component images. In recent work, this basis component decomposition was obtained by maximizing the likelihood-function of the measurements. This procedure is time consuming and often unstable for excessively noisy data or low intrinsic energy resolution of the detector. Therefore, alternative procedures are of interest. Here, we introduce a generalization of the idea of empirical dual-energy processing published by Stenner et al. to multi-energy, photon-counting CT raw data. Instead of working in the image-domain, we use prior spectral knowledge about the acquisition system (tube spectra, bin sensitivities) to parameterize the line-integrals of the basis component decomposition directly in the projection domain. We compare this empirical approach with the maximum-likelihood (ML) approach considering image noise and image bias (artifacts) and see that only moderate noise increase is to be expected for small bias in the empirical approach. Given the drastic reduction of pre-processing time, the empirical approach is considered a viable alternative to the ML approach.
Estimation of submarine mass failure probability from a sequence of deposits with age dates
Geist, Eric L.; Chaytor, Jason D.; Parsons, Thomas E.; ten Brink, Uri S.
2013-01-01
The empirical probability of submarine mass failure is quantified from a sequence of dated mass-transport deposits. Several different techniques are described to estimate the parameters for a suite of candidate probability models. The techniques, previously developed for analyzing paleoseismic data, include maximum likelihood and Type II (Bayesian) maximum likelihood methods derived from renewal process theory and Monte Carlo methods. The estimated mean return time from these methods, unlike estimates from a simple arithmetic mean of the center age dates and standard likelihood methods, includes the effects of age-dating uncertainty and of open time intervals before the first and after the last event. The likelihood techniques are evaluated using Akaike’s Information Criterion (AIC) and Akaike’s Bayesian Information Criterion (ABIC) to select the optimal model. The techniques are applied to mass transport deposits recorded in two Integrated Ocean Drilling Program (IODP) drill sites located in the Ursa Basin, northern Gulf of Mexico. Dates of the deposits were constrained by regional bio- and magnetostratigraphy from a previous study. Results of the analysis indicate that submarine mass failures in this location occur primarily according to a Poisson process in which failures are independent and return times follow an exponential distribution. However, some of the model results suggest that submarine mass failures may occur quasiperiodically at one of the sites (U1324). The suite of techniques described in this study provides quantitative probability estimates of submarine mass failure occurrence, for any number of deposits and age uncertainty distributions.
Cross-validation to select Bayesian hierarchical models in phylogenetics.
Duchêne, Sebastián; Duchêne, David A; Di Giallonardo, Francesca; Eden, John-Sebastian; Geoghegan, Jemma L; Holt, Kathryn E; Ho, Simon Y W; Holmes, Edward C
2016-05-26
Recent developments in Bayesian phylogenetic models have increased the range of inferences that can be drawn from molecular sequence data. Accordingly, model selection has become an important component of phylogenetic analysis. Methods of model selection generally consider the likelihood of the data under the model in question. In the context of Bayesian phylogenetics, the most common approach involves estimating the marginal likelihood, which is typically done by integrating the likelihood across model parameters, weighted by the prior. Although this method is accurate, it is sensitive to the presence of improper priors. We explored an alternative approach based on cross-validation that is widely used in evolutionary analysis. This involves comparing models according to their predictive performance. We analysed simulated data and a range of viral and bacterial data sets using a cross-validation approach to compare a variety of molecular clock and demographic models. Our results show that cross-validation can be effective in distinguishing between strict- and relaxed-clock models and in identifying demographic models that allow growth in population size over time. In most of our empirical data analyses, the model selected using cross-validation was able to match that selected using marginal-likelihood estimation. The accuracy of cross-validation appears to improve with longer sequence data, particularly when distinguishing between relaxed-clock models. Cross-validation is a useful method for Bayesian phylogenetic model selection. This method can be readily implemented even when considering complex models where selecting an appropriate prior for all parameters may be difficult.
Combining evidence using likelihood ratios in writer verification
NASA Astrophysics Data System (ADS)
Srihari, Sargur; Kovalenko, Dimitry; Tang, Yi; Ball, Gregory
2013-01-01
Forensic identification is the task of determining whether or not observed evidence arose from a known source. It involves determining a likelihood ratio (LR) - the ratio of the joint probability of the evidence and source under the identification hypothesis (that the evidence came from the source) and under the exclusion hypothesis (that the evidence did not arise from the source). In LR- based decision methods, particularly handwriting comparison, a variable number of input evidences is used. A decision based on many pieces of evidence can result in nearly the same LR as one based on few pieces of evidence. We consider methods for distinguishing between such situations. One of these is to provide confidence intervals together with the decisions and another is to combine the inputs using weights. We propose a new method that generalizes the Bayesian approach and uses an explicitly defined discount function. Empirical evaluation with several data sets including synthetically generated ones and handwriting comparison shows greater flexibility of the proposed method.
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and Linear Pseudo model for nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedatistic regression model. Lemcoff [3] discussed a procedure to get linear pseudo model for nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et.al used empirical process techniques to study the asymptotic of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
Optimal thresholds for the estimation of area rain-rate moments by the threshold method
NASA Technical Reports Server (NTRS)
Short, David A.; Shimizu, Kunio; Kedem, Benjamin
1993-01-01
Optimization of the threshold method, achieved by determination of the threshold that maximizes the correlation between an area-average rain-rate moment and the area coverage of rain rates exceeding the threshold, is demonstrated empirically and theoretically. Empirical results for a sequence of GATE radar snapshots show optimal thresholds of 5 and 27 mm/h for the first and second moments, respectively. Theoretical optimization of the threshold method by the maximum-likelihood approach of Kedem and Pavlopoulos (1991) predicts optimal thresholds near 5 and 26 mm/h for lognormally distributed rain rates with GATE-like parameters. The agreement between theory and observations suggests that the optimal threshold can be understood as arising due to sampling variations, from snapshot to snapshot, of a parent rain-rate distribution. Optimal thresholds for gamma and inverse Gaussian distributions are also derived and compared.
Empirical Bayes Approaches to Multivariate Fuzzy Partitions.
ERIC Educational Resources Information Center
Woodbury, Max A.; Manton, Kenneth G.
1991-01-01
An empirical Bayes-maximum likelihood estimation procedure is presented for the application of fuzzy partition models in describing high dimensional discrete response data. The model describes individuals in terms of partial membership in multiple latent categories that represent bounded discrete spaces. (SLD)
Benford's law and the FSD distribution of economic behavioral micro data
NASA Astrophysics Data System (ADS)
Villas-Boas, Sofia B.; Fu, Qiuzi; Judge, George
2017-11-01
In this paper, we focus on the first significant digit (FSD) distribution of European micro income data and use information theoretic-entropy based methods to investigate the degree to which Benford's FSD law is consistent with the nature of these economic behavioral systems. We demonstrate that Benford's law is not an empirical phenomenon that occurs only in important distributions in physical statistics, but that it also arises in self-organizing dynamic economic behavioral systems. The empirical likelihood member of the minimum divergence-entropy family, is used to recover country based income FSD probability density functions and to demonstrate the implications of using a Benford prior reference distribution in economic behavioral system information recovery.
Development of advanced acreage estimation methods
NASA Technical Reports Server (NTRS)
Guseman, L. F., Jr. (Principal Investigator)
1980-01-01
The use of the AMOEBA clustering/classification algorithm was investigated as a basis for both a color display generation technique and maximum likelihood proportion estimation procedure. An approach to analyzing large data reduction systems was formulated and an exploratory empirical study of spatial correlation in LANDSAT data was also carried out. Topics addressed include: (1) development of multiimage color images; (2) spectral spatial classification algorithm development; (3) spatial correlation studies; and (4) evaluation of data systems.
Mendoza, Maria C.B.; Burns, Trudy L.; Jones, Michael P.
2009-01-01
Objectives Case-deletion diagnostic methods are tools that allow identification of influential observations that may affect parameter estimates and model fitting conclusions. The goal of this paper was to develop two case-deletion diagnostics, the exact case deletion (ECD) and the empirical influence function (EIF), for detecting outliers that can affect results of sib-pair maximum likelihood quantitative trait locus (QTL) linkage analysis. Methods Subroutines to compute the ECD and EIF were incorporated into the maximum likelihood QTL variance estimation components of the linkage analysis program MAPMAKER/SIBS. Performance of the diagnostics was compared in simulation studies that evaluated the proportion of outliers correctly identified (sensitivity), and the proportion of non-outliers correctly identified (specificity). Results Simulations involving nuclear family data sets with one outlier showed EIF sensitivities approximated ECD sensitivities well for outlier-affected parameters. Sensitivities were high, indicating the outlier was identified a high proportion of the time. Simulations also showed the enormous computational time advantage of the EIF. Diagnostics applied to body mass index in nuclear families detected observations influential on the lod score and model parameter estimates. Conclusions The EIF is a practical diagnostic tool that has the advantages of high sensitivity and quick computation. PMID:19172086
Zhou, Xiaofan; Shen, Xing-Xing; Hittinger, Chris Todd
2018-01-01
Abstract The sizes of the data matrices assembled to resolve branches of the tree of life have increased dramatically, motivating the development of programs for fast, yet accurate, inference. For example, several different fast programs have been developed in the very popular maximum likelihood framework, including RAxML/ExaML, PhyML, IQ-TREE, and FastTree. Although these programs are widely used, a systematic evaluation and comparison of their performance using empirical genome-scale data matrices has so far been lacking. To address this question, we evaluated these four programs on 19 empirical phylogenomic data sets with hundreds to thousands of genes and up to 200 taxa with respect to likelihood maximization, tree topology, and computational speed. For single-gene tree inference, we found that the more exhaustive and slower strategies (ten searches per alignment) outperformed faster strategies (one tree search per alignment) using RAxML, PhyML, or IQ-TREE. Interestingly, single-gene trees inferred by the three programs yielded comparable coalescent-based species tree estimations. For concatenation-based species tree inference, IQ-TREE consistently achieved the best-observed likelihoods for all data sets, and RAxML/ExaML was a close second. In contrast, PhyML often failed to complete concatenation-based analyses, whereas FastTree was the fastest but generated lower likelihood values and more dissimilar tree topologies in both types of analyses. Finally, data matrix properties, such as the number of taxa and the strength of phylogenetic signal, sometimes substantially influenced the programs’ relative performance. Our results provide real-world gene and species tree phylogenetic inference benchmarks to inform the design and execution of large-scale phylogenomic data analyses. PMID:29177474
Outcome-Dependent Sampling with Interval-Censored Failure Time Data
Zhou, Qingning; Cai, Jianwen; Zhou, Haibo
2017-01-01
Summary Epidemiologic studies and disease prevention trials often seek to relate an exposure variable to a failure time that suffers from interval-censoring. When the failure rate is low and the time intervals are wide, a large cohort is often required so as to yield reliable precision on the exposure-failure-time relationship. However, large cohort studies with simple random sampling could be prohibitive for investigators with a limited budget, especially when the exposure variables are expensive to obtain. Alternative cost-effective sampling designs and inference procedures are therefore desirable. We propose an outcome-dependent sampling (ODS) design with interval-censored failure time data, where we enrich the observed sample by selectively including certain more informative failure subjects. We develop a novel sieve semiparametric maximum empirical likelihood approach for fitting the proportional hazards model to data from the proposed interval-censoring ODS design. This approach employs the empirical likelihood and sieve methods to deal with the infinite-dimensional nuisance parameters, which greatly reduces the dimensionality of the estimation problem and eases the computation difficulty. The consistency and asymptotic normality of the resulting regression parameter estimator are established. The results from our extensive simulation study show that the proposed design and method works well for practical situations and is more efficient than the alternative designs and competing approaches. An example from the Atherosclerosis Risk in Communities (ARIC) study is provided for illustration. PMID:28771664
Adaptive power priors with empirical Bayes for clinical trials.
Gravestock, Isaac; Held, Leonhard
2017-09-01
Incorporating historical information into the design and analysis of a new clinical trial has been the subject of much discussion as a way to increase the feasibility of trials in situations where patients are difficult to recruit. The best method to include this data is not yet clear, especially in the case when few historical studies are available. This paper looks at the power prior technique afresh in a binomial setting and examines some previously unexamined properties, such as Box P values, bias, and coverage. Additionally, it proposes an empirical Bayes-type approach to estimating the prior weight parameter by marginal likelihood. This estimate has advantages over previously criticised methods in that it varies commensurably with differences in the historical and current data and can choose weights near 1 when the data are similar enough. Fully Bayesian approaches are also considered. An analysis of the operating characteristics shows that the adaptive methods work well and that the various approaches have different strengths and weaknesses. Copyright © 2017 John Wiley & Sons, Ltd.
An empirical Bayes approach for the Poisson life distribution.
NASA Technical Reports Server (NTRS)
Canavos, G. C.
1973-01-01
A smooth empirical Bayes estimator is derived for the intensity parameter (hazard rate) in the Poisson distribution as used in life testing. The reliability function is also estimated either by using the empirical Bayes estimate of the parameter, or by obtaining the expectation of the reliability function. The behavior of the empirical Bayes procedure is studied through Monte Carlo simulation in which estimates of mean-squared errors of the empirical Bayes estimators are compared with those of conventional estimators such as minimum variance unbiased or maximum likelihood. Results indicate a significant reduction in mean-squared error of the empirical Bayes estimators over the conventional variety.
Detection of Person Misfit in Computerized Adaptive Tests with Polytomous Items.
ERIC Educational Resources Information Center
van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.
2002-01-01
Compared the nominal and empirical null distributions of the standardized log-likelihood statistic for polytomous items for paper-and-pencil (P&P) and computerized adaptive tests (CATs). Results show that the empirical distribution of the statistic differed from the assumed standard normal distribution for both P&P tests and CATs. Also…
Maximum likelihood: Extracting unbiased information from complex networks
NASA Astrophysics Data System (ADS)
Garlaschelli, Diego; Loffredo, Maria I.
2008-07-01
The choice of free parameters in network models is subjective, since it depends on what topological properties are being monitored. However, we show that the maximum likelihood (ML) principle indicates a unique, statistically rigorous parameter choice, associated with a well-defined topological feature. We then find that, if the ML condition is incompatible with the built-in parameter choice, network models turn out to be intrinsically ill defined or biased. To overcome this problem, we construct a class of safely unbiased models. We also propose an extension of these results that leads to the fascinating possibility to extract, only from topological data, the “hidden variables” underlying network organization, making them “no longer hidden.” We test our method on World Trade Web data, where we recover the empirical gross domestic product using only topological information.
Eickhoff, Simon B; Nichols, Thomas E; Laird, Angela R; Hoffstaedter, Felix; Amunts, Katrin; Fox, Peter T; Bzdok, Danilo; Eickhoff, Claudia R
2016-08-15
Given the increasing number of neuroimaging publications, the automated knowledge extraction on brain-behavior associations by quantitative meta-analyses has become a highly important and rapidly growing field of research. Among several methods to perform coordinate-based neuroimaging meta-analyses, Activation Likelihood Estimation (ALE) has been widely adopted. In this paper, we addressed two pressing questions related to ALE meta-analysis: i) Which thresholding method is most appropriate to perform statistical inference? ii) Which sample size, i.e., number of experiments, is needed to perform robust meta-analyses? We provided quantitative answers to these questions by simulating more than 120,000 meta-analysis datasets using empirical parameters (i.e., number of subjects, number of reported foci, distribution of activation foci) derived from the BrainMap database. This allowed to characterize the behavior of ALE analyses, to derive first power estimates for neuroimaging meta-analyses, and to thus formulate recommendations for future ALE studies. We could show as a first consequence that cluster-level family-wise error (FWE) correction represents the most appropriate method for statistical inference, while voxel-level FWE correction is valid but more conservative. In contrast, uncorrected inference and false-discovery rate correction should be avoided. As a second consequence, researchers should aim to include at least 20 experiments into an ALE meta-analysis to achieve sufficient power for moderate effects. We would like to note, though, that these calculations and recommendations are specific to ALE and may not be extrapolated to other approaches for (neuroimaging) meta-analysis. Copyright © 2016 Elsevier Inc. All rights reserved.
Eickhoff, Simon B.; Nichols, Thomas E.; Laird, Angela R.; Hoffstaedter, Felix; Amunts, Katrin; Fox, Peter T.
2016-01-01
Given the increasing number of neuroimaging publications, the automated knowledge extraction on brain-behavior associations by quantitative meta-analyses has become a highly important and rapidly growing field of research. Among several methods to perform coordinate-based neuroimaging meta-analyses, Activation Likelihood Estimation (ALE) has been widely adopted. In this paper, we addressed two pressing questions related to ALE meta-analysis: i) Which thresholding method is most appropriate to perform statistical inference? ii) Which sample size, i.e., number of experiments, is needed to perform robust meta-analyses? We provided quantitative answers to these questions by simulating more than 120,000 meta-analysis datasets using empirical parameters (i.e., number of subjects, number of reported foci, distribution of activation foci) derived from the BrainMap database. This allowed to characterize the behavior of ALE analyses, to derive first power estimates for neuroimaging meta-analyses, and to thus formulate recommendations for future ALE studies. We could show as a first consequence that cluster-level family-wise error (FWE) correction represents the most appropriate method for statistical inference, while voxel-level FWE correction is valid but more conservative. In contrast, uncorrected inference and false-discovery rate correction should be avoided. As a second consequence, researchers should aim to include at least 20 experiments into an ALE meta-analysis to achieve sufficient power for moderate effects. We would like to note, though, that these calculations and recommendations are specific to ALE and may not be extrapolated to other approaches for (neuroimaging) meta-analysis. PMID:27179606
Inferring relationships between pairs of individuals from locus heterozygosities
Presciuttini, Silvano; Toni, Chiara; Tempestini, Elena; Verdiani, Simonetta; Casarino, Lucia; Spinetti, Isabella; Stefano, Francesco De; Domenici, Ranieri; Bailey-Wilson, Joan E
2002-01-01
Background The traditional exact method for inferring relationships between individuals from genetic data is not easily applicable in all situations that may be encountered in several fields of applied genetics. This study describes an approach that gives affordable results and is easily applicable; it is based on the probabilities that two individuals share 0, 1 or both alleles at a locus identical by state. Results We show that these probabilities (zi) depend on locus heterozygosity (H), and are scarcely affected by variation of the distribution of allele frequencies. This allows us to obtain empirical curves relating zi's to H for a series of common relationships, so that the likelihood ratio of a pair of relationships between any two individuals, given their genotypes at a locus, is a function of a single parameter, H. Application to large samples of mother-child and full-sib pairs shows that the statistical power of this method to infer the correct relationship is not much lower than the exact method. Analysis of a large database of STR data proves that locus heterozygosity does not vary significantly among Caucasian populations, apart from special cases, so that the likelihood ratio of the more common relationships between pairs of individuals may be obtained by looking at tabulated zi values. Conclusions A simple method is provided, which may be used by any scientist with the help of a calculator or a spreadsheet to compute the likelihood ratios of common alternative relationships between pairs of individuals. PMID:12441003
A New Monte Carlo Method for Estimating Marginal Likelihoods.
Wang, Yu-Bo; Chen, Ming-Hui; Kuo, Lynn; Lewis, Paul O
2018-06-01
Evaluating the marginal likelihood in Bayesian analysis is essential for model selection. Estimators based on a single Markov chain Monte Carlo sample from the posterior distribution include the harmonic mean estimator and the inflated density ratio estimator. We propose a new class of Monte Carlo estimators based on this single Markov chain Monte Carlo sample. This class can be thought of as a generalization of the harmonic mean and inflated density ratio estimators using a partition weighted kernel (likelihood times prior). We show that our estimator is consistent and has better theoretical properties than the harmonic mean and inflated density ratio estimators. In addition, we provide guidelines on choosing optimal weights. Simulation studies were conducted to examine the empirical performance of the proposed estimator. We further demonstrate the desirable features of the proposed estimator with two real data sets: one is from a prostate cancer study using an ordinal probit regression model with latent variables; the other is for the power prior construction from two Eastern Cooperative Oncology Group phase III clinical trials using the cure rate survival model with similar objectives.
A generalised significance test for individual communities in networks.
Kojaku, Sadamori; Masuda, Naoki
2018-05-09
Many empirical networks have community structure, in which nodes are densely interconnected within each community (i.e., a group of nodes) and sparsely across different communities. Like other local and meso-scale structure of networks, communities are generally heterogeneous in various aspects such as the size, density of edges, connectivity to other communities and significance. In the present study, we propose a method to statistically test the significance of individual communities in a given network. Compared to the previous methods, the present algorithm is unique in that it accepts different community-detection algorithms and the corresponding quality function for single communities. The present method requires that a quality of each community can be quantified and that community detection is performed as optimisation of such a quality function summed over the communities. Various community detection algorithms including modularity maximisation and graph partitioning meet this criterion. Our method estimates a distribution of the quality function for randomised networks to calculate a likelihood of each community in the given network. We illustrate our algorithm by synthetic and empirical networks.
Empiric Auto-Titrating CPAP in People with Suspected Obstructive Sleep Apnea
Drummond, Fitzgerald; Doelken, Peter; Ahmed, Qanta A.; Gilbert, Gregory E.; Strange, Charlie; Herpel, Laura; Frye, Michael D.
2010-01-01
Objective: Efficient diagnosis and treatment of obstructive sleep apnea (OSA) can be difficult because of time delays imposed by clinic visits and serial overnight polysomnography. In some cases, it may be desirable to initiate treatment for suspected OSA prior to polysomnography. Our objective was to compare the improvement of daytime sleepiness and sleep-related quality of life of patients with high clinical likelihood of having OSA who were randomly assigned to receive empiric auto-titrating continuous positive airway pressure (CPAP) while awaiting polysomnogram versus current usual care. Methods: Serial patients referred for overnight polysomnography who had high clinical likelihood of having OSA were randomly assigned to usual care or immediate initiation of auto-titrating CPAP. Epworth Sleepiness Scale (ESS) scores and the Functional Outcomes of Sleep Questionnaire (FOSQ) scores were obtained at baseline, 1 month after randomization, and again after initiation of fixed CPAP in control subjects and after the sleep study in auto-CPAP patients. Results: One hundred nine patients were randomized. Baseline demographics, daytime sleepiness, and sleep-related quality of life scores were similar between groups. One-month ESS and FOSQ scores were improved in the group empirically treated with auto-titrating CPAP. ESS scores improved in the first month by a mean of −3.2 (confidence interval −1.6 to −4.8, p < 0.001) and FOSQ scores improved by a mean of 1.5, (confidence interval 0.5 to 2.7, p = 0.02), whereas scores in the usual-care group did not change (p = NS). Following therapy directed by overnight polysomnography in the control group, there were no differences in ESS or FOSQ between the groups. No adverse events were observed. Conclusion: Empiric auto-CPAP resulted in symptomatic improvement of daytime sleepiness and sleep-related quality of life in a cohort of patients awaiting polysomnography who had a high pretest probability of having OSA. Additional studies are needed to evaluate the applicability of empiric treatment to other populations. Citation: Drummond F; Doelken P; Ahmed QA; Gilbert GE; Strange C; Herpel L; Frye MD. Empiric auto-titrating CPAP in people with suspected obstructive sleep apnea. J Clin Sleep Med 2010;6(2):140-145. PMID:20411690
Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty
Baele, Guy; Lemey, Philippe; Suchard, Marc A.
2016-01-01
Marginal likelihood estimates to compare models using Bayes factors frequently accompany Bayesian phylogenetic inference. Approaches to estimate marginal likelihoods have garnered increased attention over the past decade. In particular, the introduction of path sampling (PS) and stepping-stone sampling (SS) into Bayesian phylogenetics has tremendously improved the accuracy of model selection. These sampling techniques are now used to evaluate complex evolutionary and population genetic models on empirical data sets, but considerable computational demands hamper their widespread adoption. Further, when very diffuse, but proper priors are specified for model parameters, numerical issues complicate the exploration of the priors, a necessary step in marginal likelihood estimation using PS or SS. To avoid such instabilities, generalized SS (GSS) has recently been proposed, introducing the concept of “working distributions” to facilitate—or shorten—the integration process that underlies marginal likelihood estimation. However, the need to fix the tree topology currently limits GSS in a coalescent-based framework. Here, we extend GSS by relaxing the fixed underlying tree topology assumption. To this purpose, we introduce a “working” distribution on the space of genealogies, which enables estimating marginal likelihoods while accommodating phylogenetic uncertainty. We propose two different “working” distributions that help GSS to outperform PS and SS in terms of accuracy when comparing demographic and evolutionary models applied to synthetic data and real-world examples. Further, we show that the use of very diffuse priors can lead to a considerable overestimation in marginal likelihood when using PS and SS, while still retrieving the correct marginal likelihood using both GSS approaches. The methods used in this article are available in BEAST, a powerful user-friendly software package to perform Bayesian evolutionary analyses. PMID:26526428
Development and system identification of a light unmanned aircraft for flying qualities research
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peters, M.E.; Andrisani, D. II
This paper describes the design, construction, flight testing and system identification of a light weight remotely piloted aircraft and its use in studying flying qualities in the longitudinal axis. The short period approximation to the longitudinal dynamics of the aircraft was used. Parameters in this model were determined a priori using various empirical estimators. These parameters were then estimated from flight data using a maximum likelihood parameter identification method. A comparison of the parameter values revealed that the stability derivatives obtained from the empirical estimators were reasonably close to the flight test results. However, the control derivatives determined by themore » empirical estimators were too large by a factor of two. The aircraft was also flown to determine how the longitudinal flying qualities of light weight remotely piloted aircraft compared to full size manned aircraft. It was shown that light weight remotely piloted aircraft require much faster short period dynamics to achieve level I flying qualities in an up-and-away flight task.« less
Models and analysis for multivariate failure time data
NASA Astrophysics Data System (ADS)
Shih, Joanna Huang
The goal of this research is to develop and investigate models and analytic methods for multivariate failure time data. We compare models in terms of direct modeling of the margins, flexibility of dependency structure, local vs. global measures of association, and ease of implementation. In particular, we study copula models, and models produced by right neutral cumulative hazard functions and right neutral hazard functions. We examine the changes of association over time for families of bivariate distributions induced from these models by displaying their density contour plots, conditional density plots, correlation curves of Doksum et al, and local cross ratios of Oakes. We know that bivariate distributions with same margins might exhibit quite different dependency structures. In addition to modeling, we study estimation procedures. For copula models, we investigate three estimation procedures. the first procedure is full maximum likelihood. The second procedure is two-stage maximum likelihood. At stage 1, we estimate the parameters in the margins by maximizing the marginal likelihood. At stage 2, we estimate the dependency structure by fixing the margins at the estimated ones. The third procedure is two-stage partially parametric maximum likelihood. It is similar to the second procedure, but we estimate the margins by the Kaplan-Meier estimate. We derive asymptotic properties for these three estimation procedures and compare their efficiency by Monte-Carlo simulations and direct computations. For models produced by right neutral cumulative hazards and right neutral hazards, we derive the likelihood and investigate the properties of the maximum likelihood estimates. Finally, we develop goodness of fit tests for the dependency structure in the copula models. We derive a test statistic and its asymptotic properties based on the test of homogeneity of Zelterman and Chen (1988), and a graphical diagnostic procedure based on the empirical Bayes approach. We study the performance of these two methods using actual and computer generated data.
A General Model for Estimating Macroevolutionary Landscapes.
Boucher, Florian C; Démery, Vincent; Conti, Elena; Harmon, Luke J; Uyeda, Josef
2018-03-01
The evolution of quantitative characters over long timescales is often studied using stochastic diffusion models. The current toolbox available to students of macroevolution is however limited to two main models: Brownian motion and the Ornstein-Uhlenbeck process, plus some of their extensions. Here, we present a very general model for inferring the dynamics of quantitative characters evolving under both random diffusion and deterministic forces of any possible shape and strength, which can accommodate interesting evolutionary scenarios like directional trends, disruptive selection, or macroevolutionary landscapes with multiple peaks. This model is based on a general partial differential equation widely used in statistical mechanics: the Fokker-Planck equation, also known in population genetics as the Kolmogorov forward equation. We thus call the model FPK, for Fokker-Planck-Kolmogorov. We first explain how this model can be used to describe macroevolutionary landscapes over which quantitative traits evolve and, more importantly, we detail how it can be fitted to empirical data. Using simulations, we show that the model has good behavior both in terms of discrimination from alternative models and in terms of parameter inference. We provide R code to fit the model to empirical data using either maximum-likelihood or Bayesian estimation, and illustrate the use of this code with two empirical examples of body mass evolution in mammals. FPK should greatly expand the set of macroevolutionary scenarios that can be studied since it opens the way to estimating macroevolutionary landscapes of any conceivable shape. [Adaptation; bounds; diffusion; FPK model; macroevolution; maximum-likelihood estimation; MCMC methods; phylogenetic comparative data; selection.].
A systematic review of the association between family meals and adolescent risk outcomes.
Goldfarb, Samantha S; Tarver, Will L; Locher, Julie L; Preskitt, Julie; Sen, Bisakha
2015-10-01
To conduct a systematic review of the literature examining the relationship between family meals and adolescent health risk outcomes. We performed a systematic search of original empirical studies published between January 1990 and September 2013. Based on data from selected studies, we conducted logistic regression models to examine the correlates of reporting a protective association between frequent family meals and adolescent outcomes. Of the 254 analyses from 26 selected studies, most reported a significant association between family meals and the adolescent risk outcome-of-interest. However, model analyses which controlled for family connectedness variables, or used advanced empirical methods to account for family-level confounders, were less likely than unadjusted models to report significant relationships. The type of analysis conducted was significantly associated with the likelihood of finding a protective relationship between family meals and the adolescent outcome-of-interest, yet very few studies are using such methods in the literature. Copyright © 2015 The Foundation for Professionals in Services for Adolescents. Published by Elsevier Ltd. All rights reserved.
Chen, Ying-Ju; Ning, Wei; Gupta, Arjun K
2016-05-01
The mean residual life (MRL) function is one of the basic parameters of interest in survival analysis that describes the expected remaining time of an individual after a certain age. The study of changes in the MRL function is practical and interesting because it may help us to identify some factors such as age and gender that may influence the remaining lifetimes of patients after receiving a certain surgery. In this paper, we propose a detection procedure based on the empirical likelihood for the changes in MRL functions with right censored data. Two real examples are also given: Veterans' administration lung cancer study and Stanford heart transplant to illustrate the detecting procedure. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Nie, Hui; Cheng, Jing; Small, Dylan S
2011-12-01
In many clinical studies with a survival outcome, administrative censoring occurs when follow-up ends at a prespecified date and many subjects are still alive. An additional complication in some trials is that there is noncompliance with the assigned treatment. For this setting, we study the estimation of the causal effect of treatment on survival probability up to a given time point among those subjects who would comply with the assignment to both treatment and control. We first discuss the standard instrumental variable (IV) method for survival outcomes and parametric maximum likelihood methods, and then develop an efficient plug-in nonparametric empirical maximum likelihood estimation (PNEMLE) approach. The PNEMLE method does not make any assumptions on outcome distributions, and makes use of the mixture structure in the data to gain efficiency over the standard IV method. Theoretical results of the PNEMLE are derived and the method is illustrated by an analysis of data from a breast cancer screening trial. From our limited mortality analysis with administrative censoring times 10 years into the follow-up, we find a significant benefit of screening is present after 4 years (at the 5% level) and this persists at 10 years follow-up. © 2011, The International Biometric Society.
Yuan, Ke-Hai; Tian, Yubin; Yanagihara, Hirokazu
2015-06-01
Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is T ML, a slight modification to the likelihood ratio statistic. Under normality assumption, T ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T ML rejects the correct model too often when p is not too small. Various corrections to T ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct T ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T ML, and they control type I errors reasonably well whenever N ≥ max(50,2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T ML as reported in the literature, and they perform well.
What do potential jurors know about police interrogation techniques and false confessions?
Leo, Richard A; Liu, Brittany
2009-01-01
Psychological police interrogation methods in America inevitably involve some level of pressure and persuasion to achieve their goal of eliciting confessions of guilt from custodial suspects. In this article, we surveyed potential jurors about their perceptions of a range of psychological interrogation techniques, the likelihood that such techniques would elicit a true confession from guilty suspects, and the likelihood that such techniques could elicit a false confession from innocent suspects. Participants recognized that these interrogation techniques may be psychologically coercive and may elicit true confessions, but believed that psychologically coercive interrogation techniques are not likely to elicit false confessions. The findings from this survey study indicate that potential jurors believe that false confessions are both counter- intuitive and unlikely, even in response to psychologically coercive interrogation techniques that have been shown to lead to false confessions from the innocent. This study provides empirical support for the idea that expert witnesses may helpfully inform jurors about the social science research on psychologically coercive interrogation methods and how and why such interrogation techniques can lead to false confessions. (c) 2009 John Wiley & Sons, Ltd.
Predicting the evolution of complex networks via similarity dynamics
NASA Astrophysics Data System (ADS)
Wu, Tao; Chen, Leiting; Zhong, Linfeng; Xian, Xingping
2017-01-01
Almost all real-world networks are subject to constant evolution, and plenty of them have been investigated empirically to uncover the underlying evolution mechanism. However, the evolution prediction of dynamic networks still remains a challenging problem. The crux of this matter is to estimate the future network links of dynamic networks. This paper studies the evolution prediction of dynamic networks with link prediction paradigm. To estimate the likelihood of the existence of links more accurate, an effective and robust similarity index is presented by exploiting network structure adaptively. Moreover, most of the existing link prediction methods do not make a clear distinction between future links and missing links. In order to predict the future links, the networks are regarded as dynamic systems in this paper, and a similarity updating method, spatial-temporal position drift model, is developed to simulate the evolutionary dynamics of node similarity. Then the updated similarities are used as input information for the future links' likelihood estimation. Extensive experiments on real-world networks suggest that the proposed similarity index performs better than baseline methods and the position drift model performs well for evolution prediction in real-world evolving networks.
A Walk on the Wild Side: The Impact of Music on Risk-Taking Likelihood.
Enström, Rickard; Schmaltz, Rodney
2017-01-01
From a marketing perspective, there has been substantial interest in on the role of risk-perception on consumer behavior. Specific 'problem music' like rap and heavy metal has long been associated with delinquent behavior, including violence, drug use, and promiscuous sex. Although individuals' risk preferences have been investigated across a range of decision-making situations, there has been little empirical work demonstrating the direct role music may have on the likelihood of engaging in risky activities. In the exploratory study reported here, we assessed the impact of listening to different styles of music while assessing risk-taking likelihood through a psychometric scale. Risk-taking likelihood was measured across ethical, financial, health and safety, recreational and social domains. Through the means of a canonical correlation analysis, the multivariate relationship between different music styles and individual risk-taking likelihood across the different domains is discussed. Our results indicate that listening to different types of music does influence risk-taking likelihood, though not in areas of health and safety.
A Walk on the Wild Side: The Impact of Music on Risk-Taking Likelihood
Enström, Rickard; Schmaltz, Rodney
2017-01-01
From a marketing perspective, there has been substantial interest in on the role of risk-perception on consumer behavior. Specific ‘problem music’ like rap and heavy metal has long been associated with delinquent behavior, including violence, drug use, and promiscuous sex. Although individuals’ risk preferences have been investigated across a range of decision-making situations, there has been little empirical work demonstrating the direct role music may have on the likelihood of engaging in risky activities. In the exploratory study reported here, we assessed the impact of listening to different styles of music while assessing risk-taking likelihood through a psychometric scale. Risk-taking likelihood was measured across ethical, financial, health and safety, recreational and social domains. Through the means of a canonical correlation analysis, the multivariate relationship between different music styles and individual risk-taking likelihood across the different domains is discussed. Our results indicate that listening to different types of music does influence risk-taking likelihood, though not in areas of health and safety. PMID:28539908
Zhang, J; Feng, J-Y; Ni, Y-L; Wen, Y-J; Niu, Y; Tamba, C L; Yue, C; Song, Q; Zhang, Y-M
2017-06-01
Multilocus genome-wide association studies (GWAS) have become the state-of-the-art procedure to identify quantitative trait nucleotides (QTNs) associated with complex traits. However, implementation of multilocus model in GWAS is still difficult. In this study, we integrated least angle regression with empirical Bayes to perform multilocus GWAS under polygenic background control. We used an algorithm of model transformation that whitened the covariance matrix of the polygenic matrix K and environmental noise. Markers on one chromosome were included simultaneously in a multilocus model and least angle regression was used to select the most potentially associated single-nucleotide polymorphisms (SNPs), whereas the markers on the other chromosomes were used to calculate kinship matrix as polygenic background control. The selected SNPs in multilocus model were further detected for their association with the trait by empirical Bayes and likelihood ratio test. We herein refer to this method as the pLARmEB (polygenic-background-control-based least angle regression plus empirical Bayes). Results from simulation studies showed that pLARmEB was more powerful in QTN detection and more accurate in QTN effect estimation, had less false positive rate and required less computing time than Bayesian hierarchical generalized linear model, efficient mixed model association (EMMA) and least angle regression plus empirical Bayes. pLARmEB, multilocus random-SNP-effect mixed linear model and fast multilocus random-SNP-effect EMMA methods had almost equal power of QTN detection in simulation experiments. However, only pLARmEB identified 48 previously reported genes for 7 flowering time-related traits in Arabidopsis thaliana.
NASA Astrophysics Data System (ADS)
Liu, Shuxin; Ji, Xinsheng; Liu, Caixia; Bai, Yi
2017-01-01
Many link prediction methods have been proposed for predicting the likelihood that a link exists between two nodes in complex networks. Among these methods, similarity indices are receiving close attention. Most similarity-based methods assume that the contribution of links with different topological structures is the same in the similarity calculations. This paper proposes a local weighted method, which weights the strength of connection between each pair of nodes. Based on the local weighted method, six local weighted similarity indices extended from unweighted similarity indices (including Common Neighbor (CN), Adamic-Adar (AA), Resource Allocation (RA), Salton, Jaccard and Local Path (LP) index) are proposed. Empirical study has shown that the local weighted method can significantly improve the prediction accuracy of these unweighted similarity indices and that in sparse and weakly clustered networks, the indices perform even better.
2012-01-01
Background An important question in the analysis of biochemical data is that of identifying subsets of molecular variables that may jointly influence a biological response. Statistical variable selection methods have been widely used for this purpose. In many settings, it may be important to incorporate ancillary biological information concerning the variables of interest. Pathway and network maps are one example of a source of such information. However, although ancillary information is increasingly available, it is not always clear how it should be used nor how it should be weighted in relation to primary data. Results We put forward an approach in which biological knowledge is incorporated using informative prior distributions over variable subsets, with prior information selected and weighted in an automated, objective manner using an empirical Bayes formulation. We employ continuous, linear models with interaction terms and exploit biochemically-motivated sparsity constraints to permit exact inference. We show an example of priors for pathway- and network-based information and illustrate our proposed method on both synthetic response data and by an application to cancer drug response data. Comparisons are also made to alternative Bayesian and frequentist penalised-likelihood methods for incorporating network-based information. Conclusions The empirical Bayes method proposed here can aid prior elicitation for Bayesian variable selection studies and help to guard against mis-specification of priors. Empirical Bayes, together with the proposed pathway-based priors, results in an approach with a competitive variable selection performance. In addition, the overall procedure is fast, deterministic, and has very few user-set parameters, yet is capable of capturing interplay between molecular players. The approach presented is general and readily applicable in any setting with multiple sources of biological prior knowledge. PMID:22578440
Branching-ratio approximation for the self-exciting Hawkes process
NASA Astrophysics Data System (ADS)
Hardiman, Stephen J.; Bouchaud, Jean-Philippe
2014-12-01
We introduce a model-independent approximation for the branching ratio of Hawkes self-exciting point processes. Our estimator requires knowing only the mean and variance of the event count in a sufficiently large time window, statistics that are readily obtained from empirical data. The method we propose greatly simplifies the estimation of the Hawkes branching ratio, recently proposed as a proxy for market endogeneity and formerly estimated using numerical likelihood maximization. We employ our method to support recent theoretical and experimental results indicating that the best fitting Hawkes model to describe S&P futures price changes is in fact critical (now and in the recent past) in light of the long memory of financial market activity.
Random-effects meta-analysis: the number of studies matters.
Guolo, Annamaria; Varin, Cristiano
2017-06-01
This paper investigates the impact of the number of studies on meta-analysis and meta-regression within the random-effects model framework. It is frequently neglected that inference in random-effects models requires a substantial number of studies included in meta-analysis to guarantee reliable conclusions. Several authors warn about the risk of inaccurate results of the traditional DerSimonian and Laird approach especially in the common case of meta-analysis involving a limited number of studies. This paper presents a selection of likelihood and non-likelihood methods for inference in meta-analysis proposed to overcome the limitations of the DerSimonian and Laird procedure, with a focus on the effect of the number of studies. The applicability and the performance of the methods are investigated in terms of Type I error rates and empirical power to detect effects, according to scenarios of practical interest. Simulation studies and applications to real meta-analyses highlight that it is not possible to identify an approach uniformly superior to alternatives. The overall recommendation is to avoid the DerSimonian and Laird method when the number of meta-analysis studies is modest and prefer a more comprehensive procedure that compares alternative inferential approaches. R code for meta-analysis according to all of the inferential methods examined in the paper is provided.
New robust statistical procedures for the polytomous logistic regression models.
Castilla, Elena; Ghosh, Abhik; Martin, Nirian; Pardo, Leandro
2018-05-17
This article derives a new family of estimators, namely the minimum density power divergence estimators, as a robust generalization of the maximum likelihood estimator for the polytomous logistic regression model. Based on these estimators, a family of Wald-type test statistics for linear hypotheses is introduced. Robustness properties of both the proposed estimators and the test statistics are theoretically studied through the classical influence function analysis. Appropriate real life examples are presented to justify the requirement of suitable robust statistical procedures in place of the likelihood based inference for the polytomous logistic regression model. The validity of the theoretical results established in the article are further confirmed empirically through suitable simulation studies. Finally, an approach for the data-driven selection of the robustness tuning parameter is proposed with empirical justifications. © 2018, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Chen, Dar-Hsin; Chou, Heng-Chih; Wang, David; Zaabar, Rim
2011-06-01
Most empirical research of the path-dependent, exotic-option credit risk model focuses on developed markets. Taking Taiwan as an example, this study investigates the bankruptcy prediction performance of the path-dependent, barrier option model in the emerging market. We adopt Duan's (1994) [11], (2000) [12] transformed-data maximum likelihood estimation (MLE) method to directly estimate the unobserved model parameters, and compare the predictive ability of the barrier option model to the commonly adopted credit risk model, Merton's model. Our empirical findings show that the barrier option model is more powerful than Merton's model in predicting bankruptcy in the emerging market. Moreover, we find that the barrier option model predicts bankruptcy much better for highly-leveraged firms. Finally, our findings indicate that the prediction accuracy of the credit risk model can be improved by higher asset liquidity and greater financial transparency.
Tyrer, Jonathan P; Guo, Qi; Easton, Douglas F; Pharoah, Paul D P
2013-06-06
The development of genotyping arrays containing hundreds of thousands of rare variants across the genome and advances in high-throughput sequencing technologies have made feasible empirical genetic association studies to search for rare disease susceptibility alleles. As single variant testing is underpowered to detect associations, the development of statistical methods to combine analysis across variants - so-called "burden tests" - is an area of active research interest. We previously developed a method, the admixture maximum likelihood test, to test multiple, common variants for association with a trait of interest. We have extended this method, called the rare admixture maximum likelihood test (RAML), for the analysis of rare variants. In this paper we compare the performance of RAML with six other burden tests designed to test for association of rare variants. We used simulation testing over a range of scenarios to test the power of RAML compared to the other rare variant association testing methods. These scenarios modelled differences in effect variability, the average direction of effect and the proportion of associated variants. We evaluated the power for all the different scenarios. RAML tended to have the greatest power for most scenarios where the proportion of associated variants was small, whereas SKAT-O performed a little better for the scenarios with a higher proportion of associated variants. The RAML method makes no assumptions about the proportion of variants that are associated with the phenotype of interest or the magnitude and direction of their effect. The method is flexible and can be applied to both dichotomous and quantitative traits and allows for the inclusion of covariates in the underlying regression model. The RAML method performed well compared to the other methods over a wide range of scenarios. Generally power was moderate in most of the scenarios, underlying the need for large sample sizes in any form of association testing.
Ghosh, Sujit K
2010-01-01
Bayesian methods are rapidly becoming popular tools for making statistical inference in various fields of science including biology, engineering, finance, and genetics. One of the key aspects of Bayesian inferential method is its logical foundation that provides a coherent framework to utilize not only empirical but also scientific information available to a researcher. Prior knowledge arising from scientific background, expert judgment, or previously collected data is used to build a prior distribution which is then combined with current data via the likelihood function to characterize the current state of knowledge using the so-called posterior distribution. Bayesian methods allow the use of models of complex physical phenomena that were previously too difficult to estimate (e.g., using asymptotic approximations). Bayesian methods offer a means of more fully understanding issues that are central to many practical problems by allowing researchers to build integrated models based on hierarchical conditional distributions that can be estimated even with limited amounts of data. Furthermore, advances in numerical integration methods, particularly those based on Monte Carlo methods, have made it possible to compute the optimal Bayes estimators. However, there is a reasonably wide gap between the background of the empirically trained scientists and the full weight of Bayesian statistical inference. Hence, one of the goals of this chapter is to bridge the gap by offering elementary to advanced concepts that emphasize linkages between standard approaches and full probability modeling via Bayesian methods.
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Walker, H. F.
1978-01-01
This paper addresses the problem of obtaining numerically maximum-likelihood estimates of the parameters for a mixture of normal distributions. In recent literature, a certain successive-approximations procedure, based on the likelihood equations, was shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, we introduce a general iterative procedure, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step-size is taken to be 1. We show that, with probability 1 as the sample size grows large, this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step-size lies between 0 and 2. We also show that the step-size which yields optimal local convergence rates for large samples is determined in a sense by the 'separation' of the component normal densities and is bounded below by a number between 1 and 2.
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Walker, H. F.
1976-01-01
The problem of obtaining numerically maximum likelihood estimates of the parameters for a mixture of normal distributions is addressed. In recent literature, a certain successive approximations procedure, based on the likelihood equations, is shown empirically to be effective in numerically approximating such maximum-likelihood estimates; however, the reliability of this procedure was not established theoretically. Here, a general iterative procedure is introduced, of the generalized steepest-ascent (deflected-gradient) type, which is just the procedure known in the literature when the step-size is taken to be 1. With probability 1 as the sample size grows large, it is shown that this procedure converges locally to the strongly consistent maximum-likelihood estimate whenever the step-size lies between 0 and 2. The step-size which yields optimal local convergence rates for large samples is determined in a sense by the separation of the component normal densities and is bounded below by a number between 1 and 2.
A Solution to Separation and Multicollinearity in Multiple Logistic Regression
Shen, Jianzhao; Gao, Sujuan
2010-01-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27–38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth’s penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study. PMID:20376286
A Solution to Separation and Multicollinearity in Multiple Logistic Regression.
Shen, Jianzhao; Gao, Sujuan
2008-10-01
In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often experience serious bias or even non-existence because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models and it was shown to reduce bias and the non-existence problems. The ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither solves the problems for each other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using a current screening data from a community-based dementia study.
Fitting power-laws in empirical data with estimators that work for all exponents
Hanel, Rudolf; Corominas-Murtra, Bernat; Liu, Bo; Thurner, Stefan
2017-01-01
Most standard methods based on maximum likelihood (ML) estimates of power-law exponents can only be reliably used to identify exponents smaller than minus one. The argument that power laws are otherwise not normalizable, depends on the underlying sample space the data is drawn from, and is true only for sample spaces that are unbounded from above. Power-laws obtained from bounded sample spaces (as is the case for practically all data related problems) are always free of such limitations and maximum likelihood estimates can be obtained for arbitrary powers without restrictions. Here we first derive the appropriate ML estimator for arbitrary exponents of power-law distributions on bounded discrete sample spaces. We then show that an almost identical estimator also works perfectly for continuous data. We implemented this ML estimator and discuss its performance with previous attempts. We present a general recipe of how to use these estimators and present the associated computer codes. PMID:28245249
Development of probabilistic emission inventories of air toxics for Jacksonville, Florida, USA.
Zhao, Yuchao; Frey, H Christopher
2004-11-01
Probabilistic emission inventories were developed for 1,3-butadiene, mercury (Hg), arsenic (As), benzene, formaldehyde, and lead for Jacksonville, FL. To quantify inter-unit variability in empirical emission factor data, the Maximum Likelihood Estimation (MLE) method or the Method of Matching Moments was used to fit parametric distributions. For data sets that contain nondetected measurements, a method based upon MLE was used for parameter estimation. To quantify the uncertainty in urban air toxic emission factors, parametric bootstrap simulation and empirical bootstrap simulation were applied to uncensored and censored data, respectively. The probabilistic emission inventories were developed based on the product of the uncertainties in the emission factors and in the activity factors. The uncertainties in the urban air toxics emission inventories range from as small as -25 to +30% for Hg to as large as -83 to +243% for As. The key sources of uncertainty in the emission inventory for each toxic are identified based upon sensitivity analysis. Typically, uncertainty in the inventory of a given pollutant can be attributed primarily to a small number of source categories. Priorities for improving the inventories and for refining the probabilistic analysis are discussed.
van Es, Andrew; Wiarda, Wim; Hordijk, Maarten; Alberink, Ivo; Vergeer, Peter
2017-05-01
For the comparative analysis of glass fragments, a method using Laser Ablation Inductively Coupled Plasma Mass Spectrometry (LA-ICP-MS) is in use at the NFI, giving measurements of the concentration of 18 elements. An important question is how to evaluate the results as evidence that a glass sample originates from a known glass source or from an arbitrary different glass source. One approach is the use of matching criteria e.g. based on a t-test or overlap of confidence intervals. An important drawback of this method is the fact that the rarity of the glass composition is not taken into account. A similar match can have widely different evidential values. In addition the use of fixed matching criteria can give rise to a "fall off the cliff" effect. Small differences may result in a match or a non-match. In this work a likelihood ratio system is presented, largely based on the two-level model as proposed by Aitken and Lucy [1], and Aitken, Zadora and Lucy [2]. Results show that the output from the two-level model gives good discrimination between same and different source hypotheses, but a post-hoc calibration step is necessary to improve the accuracy of the likelihood ratios. Subsequently, the robustness and performance of the LR system are studied. Results indicate that the output of the LR system is robust to the sample properties of the dataset used for calibration. Furthermore, the empirical upper and lower bound method [3], designed to deal with extrapolation errors in the density models, results in minimum and maximum values of the LR outputted by the system of 3.1×10 -3 and 3.4×10 4 . Calibration of the system, as measured by empirical cross-entropy, shows good behavior over the complete prior range. Rates of misleading evidence are small: for same-source comparisons, 0.3% of LRs support a different-source hypothesis; for different-source comparisons, 0.2% supports a same-source hypothesis. The authors use the LR system in reporting of glass cases to support expert opinion in the interpretation of glass evidence for origin of source questions. Copyright © 2017 The Chartered Society of Forensic Sciences. Published by Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Chen, Siyue; Leung, Henry; Dondo, Maxwell
2014-05-01
As computer network security threats increase, many organizations implement multiple Network Intrusion Detection Systems (NIDS) to maximize the likelihood of intrusion detection and provide a comprehensive understanding of intrusion activities. However, NIDS trigger a massive number of alerts on a daily basis. This can be overwhelming for computer network security analysts since it is a slow and tedious process to manually analyse each alert produced. Thus, automated and intelligent clustering of alerts is important to reveal the structural correlation of events by grouping alerts with common features. As the nature of computer network attacks, and therefore alerts, is not known in advance, unsupervised alert clustering is a promising approach to achieve this goal. We propose a joint optimization technique for feature selection and clustering to aggregate similar alerts and to reduce the number of alerts that analysts have to handle individually. More precisely, each identified feature is assigned a binary value, which reflects the feature's saliency. This value is treated as a hidden variable and incorporated into a likelihood function for clustering. Since computing the optimal solution of the likelihood function directly is analytically intractable, we use the Expectation-Maximisation (EM) algorithm to iteratively update the hidden variable and use it to maximize the expected likelihood. Our empirical results, using a labelled Defense Advanced Research Projects Agency (DARPA) 2000 reference dataset, show that the proposed method gives better results than the EM clustering without feature selection in terms of the clustering accuracy.
Making an unknown unknown a known unknown: Missing data in longitudinal neuroimaging studies.
Matta, Tyler H; Flournoy, John C; Byrne, Michelle L
2017-10-28
The analysis of longitudinal neuroimaging data within the massively univariate framework provides the opportunity to study empirical questions about neurodevelopment. Missing outcome data are an all-to-common feature of any longitudinal study, a feature that, if handled improperly, can reduce statistical power and lead to biased parameter estimates. The goal of this paper is to provide conceptual clarity of the issues and non-issues that arise from analyzing incomplete data in longitudinal studies with particular focus on neuroimaging data. This paper begins with a review of the hierarchy of missing data mechanisms and their relationship to likelihood-based methods, a review that is necessary not just for likelihood-based methods, but also for multiple-imputation methods. Next, the paper provides a series of simulation studies with designs common in longitudinal neuroimaging studies to help illustrate missing data concepts regardless of interpretation. Finally, two applied examples are used to demonstrate the sensitivity of inferences under different missing data assumptions and how this may change the substantive interpretation. The paper concludes with a set of guidelines for analyzing incomplete longitudinal data that can improve the validity of research findings in developmental neuroimaging research. Copyright © 2017 The Authors. Published by Elsevier Ltd.. All rights reserved.
Slater, Graham J; Harmon, Luke J; Wegmann, Daniel; Joyce, Paul; Revell, Liam J; Alfaro, Michael E
2012-03-01
In recent years, a suite of methods has been developed to fit multiple rate models to phylogenetic comparative data. However, most methods have limited utility at broad phylogenetic scales because they typically require complete sampling of both the tree and the associated phenotypic data. Here, we develop and implement a new, tree-based method called MECCA (Modeling Evolution of Continuous Characters using ABC) that uses a hybrid likelihood/approximate Bayesian computation (ABC)-Markov-Chain Monte Carlo approach to simultaneously infer rates of diversification and trait evolution from incompletely sampled phylogenies and trait data. We demonstrate via simulation that MECCA has considerable power to choose among single versus multiple evolutionary rate models, and thus can be used to test hypotheses about changes in the rate of trait evolution across an incomplete tree of life. We finally apply MECCA to an empirical example of body size evolution in carnivores, and show that there is no evidence for an elevated rate of body size evolution in the pinnipeds relative to terrestrial carnivores. ABC approaches can provide a useful alternative set of tools for future macroevolutionary studies where likelihood-dependent approaches are lacking. © 2011 The Author(s). Evolution© 2011 The Society for the Study of Evolution.
Bayesian Population Genomic Inference of Crossing Over and Gene Conversion
Padhukasahasram, Badri; Rannala, Bruce
2011-01-01
Meiotic recombination is a fundamental cellular mechanism in sexually reproducing organisms and its different forms, crossing over and gene conversion both play an important role in shaping genetic variation in populations. Here, we describe a coalescent-based full-likelihood Markov chain Monte Carlo (MCMC) method for jointly estimating the crossing-over, gene-conversion, and mean tract length parameters from population genomic data under a Bayesian framework. Although computationally more expensive than methods that use approximate likelihoods, the relative efficiency of our method is expected to be optimal in theory. Furthermore, it is also possible to obtain a posterior sample of genealogies for the data using this method. We first check the performance of the new method on simulated data and verify its correctness. We also extend the method for inference under models with variable gene-conversion and crossing-over rates and demonstrate its ability to identify recombination hotspots. Then, we apply the method to two empirical data sets that were sequenced in the telomeric regions of the X chromosome of Drosophila melanogaster. Our results indicate that gene conversion occurs more frequently than crossing over in the su-w and su-s gene sequences while the local rates of crossing over as inferred by our program are not low. The mean tract lengths for gene-conversion events are estimated to be ∼70 bp and 430 bp, respectively, for these data sets. Finally, we discuss ideas and optimizations for reducing the execution time of our algorithm. PMID:21840857
Empirical testing of two models for staging antidepressant treatment resistance.
Petersen, Timothy; Papakostas, George I; Posternak, Michael A; Kant, Alexis; Guyker, Wendy M; Iosifescu, Dan V; Yeung, Albert S; Nierenberg, Andrew A; Fava, Maurizio
2005-08-01
An increasing amount of attention has been paid to treatment resistant depression. Although it is quite common to observe nonremission to not just one but consecutive antidepressant treatments during a major depressive episode, a relationship between the likelihood of achieving remission and one's degree of resistance is not clearly known at this time. This study was undertaken to empirically test 2 recent models for staging treatment resistance. Psychiatrists from 2 academic sites reviewed charts of patients on their caseloads. Clinical Global Impressions-Severity (CGI-S) and Clinical Global Impressions-Improvement (CGI-I) scales were used to measure severity of depression and response to treatment, and 2 treatment-resistant staging scores were classified for each patient using the Massachusetts General Hospital staging method (MGH-S) and the Thase and Rush staging method (TR-S). Out of the 115 patient records reviewed, 58 (49.6%) patients remitted at some point during treatment. There was a significant positive correlation between the 2 staging scores, and logistic regression results indicated that greater MGH-S scores, but not TR-S scores, predicted nonremission. This study suggests that the hierarchical manner in which the field has typically gauged levels of treatment resistance may not be strongly supported by empirical evidence. This study suggests that the MGH staging model may offer some advantages over the staging method by Thase and Rush, as it generates a continuous score that considers both number of trials and intensity/optimization of each trial.
Ning, Jing; Chen, Yong; Piao, Jin
2017-07-01
Publication bias occurs when the published research results are systematically unrepresentative of the population of studies that have been conducted, and is a potential threat to meaningful meta-analysis. The Copas selection model provides a flexible framework for correcting estimates and offers considerable insight into the publication bias. However, maximizing the observed likelihood under the Copas selection model is challenging because the observed data contain very little information on the latent variable. In this article, we study a Copas-like selection model and propose an expectation-maximization (EM) algorithm for estimation based on the full likelihood. Empirical simulation studies show that the EM algorithm and its associated inferential procedure performs well and avoids the non-convergence problem when maximizing the observed likelihood. © The Author 2017. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
ASSESSMENT OF LANDSCAPE CHARACTERISTICS ON THEMATIC IMAGE CLASSIFICATION ACCURACY
Landscape characteristics such as small patch size and land cover heterogeneity have been hypothesized to increase the likelihood of misclassifying pixels during thematic image classification. However, there has been a lack of empirical evidence, to support these hypotheses. This...
Use of collateral information to improve LANDSAT classification accuracies
NASA Technical Reports Server (NTRS)
Strahler, A. H. (Principal Investigator)
1981-01-01
Methods to improve LANDSAT classification accuracies were investigated including: (1) the use of prior probabilities in maximum likelihood classification as a methodology to integrate discrete collateral data with continuously measured image density variables; (2) the use of the logit classifier as an alternative to multivariate normal classification that permits mixing both continuous and categorical variables in a single model and fits empirical distributions of observations more closely than the multivariate normal density function; and (3) the use of collateral data in a geographic information system as exercised to model a desired output information layer as a function of input layers of raster format collateral and image data base layers.
Maximum Likelihood and Restricted Likelihood Solutions in Multiple-Method Studies
Rukhin, Andrew L.
2011-01-01
A formulation of the problem of combining data from several sources is discussed in terms of random effects models. The unknown measurement precision is assumed not to be the same for all methods. We investigate maximum likelihood solutions in this model. By representing the likelihood equations as simultaneous polynomial equations, the exact form of the Groebner basis for their stationary points is derived when there are two methods. A parametrization of these solutions which allows their comparison is suggested. A numerical method for solving likelihood equations is outlined, and an alternative to the maximum likelihood method, the restricted maximum likelihood, is studied. In the situation when methods variances are considered to be known an upper bound on the between-method variance is obtained. The relationship between likelihood equations and moment-type equations is also discussed. PMID:26989583
Maximum Likelihood and Restricted Likelihood Solutions in Multiple-Method Studies.
Rukhin, Andrew L
2011-01-01
A formulation of the problem of combining data from several sources is discussed in terms of random effects models. The unknown measurement precision is assumed not to be the same for all methods. We investigate maximum likelihood solutions in this model. By representing the likelihood equations as simultaneous polynomial equations, the exact form of the Groebner basis for their stationary points is derived when there are two methods. A parametrization of these solutions which allows their comparison is suggested. A numerical method for solving likelihood equations is outlined, and an alternative to the maximum likelihood method, the restricted maximum likelihood, is studied. In the situation when methods variances are considered to be known an upper bound on the between-method variance is obtained. The relationship between likelihood equations and moment-type equations is also discussed.
Weis, Daniel; Willems, Helmut
2017-06-01
The article deals with the question of how aggregated data which allow for generalizable insights can be generated from single-case based qualitative investigations. Thereby, two central challenges of qualitative social research are outlined: First, researchers must ensure that the single-case data can be aggregated and condensed so that new collective structures can be detected. Second, they must apply methods and practices to allow for the generalization of the results beyond the specific study. In the following, we demonstrate how and under what conditions these challenges can be addressed in research practice. To this end, the research process of the construction of an empirically based typology is described. A qualitative study, conducted within the framework of the Luxembourg Youth Report, is used to illustrate this process. Specifically, strategies are presented which increase the likelihood of generalizability or transferability of the results, while also highlighting their limitations.
Two new methods to fit models for network meta-analysis with random inconsistency effects.
Law, Martin; Jackson, Dan; Turner, Rebecca; Rhodes, Kirsty; Viechtbauer, Wolfgang
2016-07-28
Meta-analysis is a valuable tool for combining evidence from multiple studies. Network meta-analysis is becoming more widely used as a means to compare multiple treatments in the same analysis. However, a network meta-analysis may exhibit inconsistency, whereby the treatment effect estimates do not agree across all trial designs, even after taking between-study heterogeneity into account. We propose two new estimation methods for network meta-analysis models with random inconsistency effects. The model we consider is an extension of the conventional random-effects model for meta-analysis to the network meta-analysis setting and allows for potential inconsistency using random inconsistency effects. Our first new estimation method uses a Bayesian framework with empirically-based prior distributions for both the heterogeneity and the inconsistency variances. We fit the model using importance sampling and thereby avoid some of the difficulties that might be associated with using Markov Chain Monte Carlo (MCMC). However, we confirm the accuracy of our importance sampling method by comparing the results to those obtained using MCMC as the gold standard. The second new estimation method we describe uses a likelihood-based approach, implemented in the metafor package, which can be used to obtain (restricted) maximum-likelihood estimates of the model parameters and profile likelihood confidence intervals of the variance components. We illustrate the application of the methods using two contrasting examples. The first uses all-cause mortality as an outcome, and shows little evidence of between-study heterogeneity or inconsistency. The second uses "ear discharge" as an outcome, and exhibits substantial between-study heterogeneity and inconsistency. Both new estimation methods give results similar to those obtained using MCMC. The extent of heterogeneity and inconsistency should be assessed and reported in any network meta-analysis. Our two new methods can be used to fit models for network meta-analysis with random inconsistency effects. They are easily implemented using the accompanying R code in the Additional file 1. Using these estimation methods, the extent of inconsistency can be assessed and reported.
IMPACTS OF PATCH SIZE AND LAND COVER HETEROGENEITY ON THEMATIC IMAGE CLASSIFICATION ACCURACY
Landscape characteristics such as small patch size and land cover heterogeneity have been hypothesized to increase the likelihood of miss-classifying pixels during thematic image classification. However, there has been a lack of empirical evidence to support these hypotheses,...
Koenig, Lane; Soltoff, Samuel A; Demiralp, Berna; Demehin, Akinluwa A; Foster, Nancy E; Steinberg, Caroline Rossi; Vaz, Christopher; Wetzel, Scott; Xu, Susan
In 2016, Medicare's Hospital-Acquired Condition Reduction Program (HAC-RP) will reduce hospital payments by $364 million. Although observers have questioned the validity of certain HAC-RP measures, less attention has been paid to the determination of low-performing hospitals (bottom quartile) and the assignment of penalties. This study investigated possible bias in the HAC-RP by simulating hospitals' likelihood of being in the worst-performing quartile for 8 patient safety measures, assuming identical expected complication rates across hospitals. Simulated likelihood of being a poor performer varied with hospital size. This relationship depended on the measure's complication rate. For 3 of 8 measures examined, the equal-quality simulation identified poor performers similarly to empirical data (c-statistic approximately 0.7 or higher) and explained most of the variation in empirical performance by size (Efron's R 2 > 0.85). The Centers for Medicare & Medicaid Services could address potential bias in the HAC-RP by stratifying by hospital size or using a broader "all-harm" measure.
Complex Variation in Measures of General Intelligence and Cognitive Change
Rowe, Suzanne J.; Rowlatt, Amy; Davies, Gail; Harris, Sarah E.; Porteous, David J.; Liewald, David C.; McNeill, Geraldine; Starr, John M.
2013-01-01
Combining information from multiple SNPs may capture a greater amount of genetic variation than from the sum of individual SNP effects and help identifying missing heritability. Regions may capture variation from multiple common variants of small effect, multiple rare variants or a combination of both. We describe regional heritability mapping of human cognition. Measures of crystallised (gc) and fluid intelligence (gf) in late adulthood (64–79 years) were available for 1806 individuals genotyped for 549,692 autosomal single nucleotide polymorphisms (SNPs). The same individuals were tested at age 11, enabling us the rare opportunity to measure cognitive change across most of their lifespan. 547,750 SNPs ranked by position are divided into 10, 908 overlapping regions of 101 SNPs to estimate the genetic variance each region explains, an approach that resembles classical linkage methods. We also estimate the genetic variation explained by individual autosomes and by SNPs within genes. Empirical significance thresholds are estimated separately for each trait from whole genome scans of 500 permutated data sets. The 5% significance threshold for the likelihood ratio test of a single region ranged from 17–17.5 for the three traits. This is the equivalent to nominal significance under the expectation of a chi-squared distribution (between 1df and 0) of P<1.44×10−5. These thresholds indicate that the distribution of the likelihood ratio test from this type of variance component analysis should be estimated empirically. Furthermore, we show that estimates of variation explained by these regions can be grossly overestimated. After applying permutation thresholds, a region for gf on chromosome 5 spanning the PRRC1 gene is significant at a genome-wide 10% empirical threshold. Analysis of gene methylation on the temporal cortex provides support for the association of PRRC1 and fluid intelligence (P = 0.004), and provides a prime candidate gene for high throughput sequencing of these uniquely informative cohorts. PMID:24349040
Estimation of stochastic volatility with long memory for index prices of FTSE Bursa Malaysia KLCI
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Kho Chia; Kane, Ibrahim Lawal; Rahman, Haliza Abd
In recent years, modeling in long memory properties or fractionally integrated processes in stochastic volatility has been applied in the financial time series. A time series with structural breaks can generate a strong persistence in the autocorrelation function, which is an observed behaviour of a long memory process. This paper considers the structural break of data in order to determine true long memory time series data. Unlike usual short memory models for log volatility, the fractional Ornstein-Uhlenbeck process is neither a Markovian process nor can it be easily transformed into a Markovian process. This makes the likelihood evaluation and parametermore » estimation for the long memory stochastic volatility (LMSV) model challenging tasks. The drift and volatility parameters of the fractional Ornstein-Unlenbeck model are estimated separately using the least square estimator (lse) and quadratic generalized variations (qgv) method respectively. Finally, the empirical distribution of unobserved volatility is estimated using the particle filtering with sequential important sampling-resampling (SIR) method. The mean square error (MSE) between the estimated and empirical volatility indicates that the performance of the model towards the index prices of FTSE Bursa Malaysia KLCI is fairly well.« less
Estimation of stochastic volatility with long memory for index prices of FTSE Bursa Malaysia KLCI
NASA Astrophysics Data System (ADS)
Chen, Kho Chia; Bahar, Arifah; Kane, Ibrahim Lawal; Ting, Chee-Ming; Rahman, Haliza Abd
2015-02-01
In recent years, modeling in long memory properties or fractionally integrated processes in stochastic volatility has been applied in the financial time series. A time series with structural breaks can generate a strong persistence in the autocorrelation function, which is an observed behaviour of a long memory process. This paper considers the structural break of data in order to determine true long memory time series data. Unlike usual short memory models for log volatility, the fractional Ornstein-Uhlenbeck process is neither a Markovian process nor can it be easily transformed into a Markovian process. This makes the likelihood evaluation and parameter estimation for the long memory stochastic volatility (LMSV) model challenging tasks. The drift and volatility parameters of the fractional Ornstein-Unlenbeck model are estimated separately using the least square estimator (lse) and quadratic generalized variations (qgv) method respectively. Finally, the empirical distribution of unobserved volatility is estimated using the particle filtering with sequential important sampling-resampling (SIR) method. The mean square error (MSE) between the estimated and empirical volatility indicates that the performance of the model towards the index prices of FTSE Bursa Malaysia KLCI is fairly well.
Likelihood of Tree Topologies with Fossils and Diversification Rate Estimation.
Didier, Gilles; Fau, Marine; Laurin, Michel
2017-11-01
Since the diversification process cannot be directly observed at the human scale, it has to be studied from the information available, namely the extant taxa and the fossil record. In this sense, phylogenetic trees including both extant taxa and fossils are the most complete representations of the diversification process that one can get. Such phylogenetic trees can be reconstructed from molecular and morphological data, to some extent. Among the temporal information of such phylogenetic trees, fossil ages are by far the most precisely known (divergence times are inferences calibrated mostly with fossils). We propose here a method to compute the likelihood of a phylogenetic tree with fossils in which the only considered time information is the fossil ages, and apply it to the estimation of the diversification rates from such data. Since it is required in our computation, we provide a method for determining the probability of a tree topology under the standard diversification model. Testing our approach on simulated data shows that the maximum likelihood rate estimates from the phylogenetic tree topology and the fossil dates are almost as accurate as those obtained by taking into account all the data, including the divergence times. Moreover, they are substantially more accurate than the estimates obtained only from the exact divergence times (without taking into account the fossil record). We also provide an empirical example composed of 50 Permo-Carboniferous eupelycosaur (early synapsid) taxa ranging in age from about 315 Ma (Late Carboniferous) to 270 Ma (shortly after the end of the Early Permian). Our analyses suggest a speciation (cladogenesis, or birth) rate of about 0.1 per lineage and per myr, a marginally lower extinction rate, and a considerable hidden paleobiodiversity of early synapsids. [Extinction rate; fossil ages; maximum likelihood estimation; speciation rate.]. © The Author(s) 2017. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Testing the non-unity of rate ratio under inverse sampling.
Tang, Man-Lai; Liao, Yi Jie; Ng, Hong Keung Tony; Chan, Ping Shing
2007-08-01
Inverse sampling is considered to be a more appropriate sampling scheme than the usual binomial sampling scheme when subjects arrive sequentially, when the underlying response of interest is acute, and when maximum likelihood estimators of some epidemiologic indices are undefined. In this article, we study various statistics for testing non-unity rate ratios in case-control studies under inverse sampling. These include the Wald, unconditional score, likelihood ratio and conditional score statistics. Three methods (the asymptotic, conditional exact, and Mid-P methods) are adopted for P-value calculation. We evaluate the performance of different combinations of test statistics and P-value calculation methods in terms of their empirical sizes and powers via Monte Carlo simulation. In general, asymptotic score and conditional score tests are preferable for their actual type I error rates are well controlled around the pre-chosen nominal level, and their powers are comparatively the largest. The exact version of Wald test is recommended if one wants to control the actual type I error rate at or below the pre-chosen nominal level. If larger power is expected and fluctuation of sizes around the pre-chosen nominal level are allowed, then the Mid-P version of Wald test is a desirable alternative. We illustrate the methodologies with a real example from a heart disease study. (c) 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Intervals for posttest probabilities: a comparison of 5 methods.
Mossman, D; Berger, J O
2001-01-01
Several medical articles discuss methods of constructing confidence intervals for single proportions and the likelihood ratio, but scant attention has been given to the systematic study of intervals for the posterior odds, or the positive predictive value, of a test. The authors describe 5 methods of constructing confidence intervals for posttest probabilities when estimates of sensitivity, specificity, and the pretest probability of a disorder are derived from empirical data. They then evaluate each method to determine how well the intervals' coverage properties correspond to their nominal value. When the estimates of pretest probabilities, sensitivity, and specificity are derived from more than 80 subjects and are not close to 0 or 1, all methods generate intervals with appropriate coverage properties. When these conditions are not met, however, the best-performing method is an objective Bayesian approach implemented by a simple simulation using a spreadsheet. Physicians and investigators can generate accurate confidence intervals for posttest probabilities in small-sample situations using the objective Bayesian approach.
A method for modeling bias in a person's estimates of likelihoods of events
NASA Technical Reports Server (NTRS)
Nygren, Thomas E.; Morera, Osvaldo
1988-01-01
It is of practical importance in decision situations involving risk to train individuals to transform uncertainties into subjective probability estimates that are both accurate and unbiased. We have found that in decision situations involving risk, people often introduce subjective bias in their estimation of the likelihoods of events depending on whether the possible outcomes are perceived as being good or bad. Until now, however, the successful measurement of individual differences in the magnitude of such biases has not been attempted. In this paper we illustrate a modification of a procedure originally outlined by Davidson, Suppes, and Siegel (3) to allow for a quantitatively-based methodology for simultaneously estimating an individual's subjective utility and subjective probability functions. The procedure is now an interactive computer-based algorithm, DSS, that allows for the measurement of biases in probability estimation by obtaining independent measures of two subjective probability functions (S+ and S-) for winning (i.e., good outcomes) and for losing (i.e., bad outcomes) respectively for each individual, and for different experimental conditions within individuals. The algorithm and some recent empirical data are described.
Murray, Aja Louise; Booth, Tom; Eisner, Manuel; Obsuth, Ingrid; Ribeaud, Denis
2018-05-22
Whether or not importance should be placed on an all-encompassing general factor of psychopathology (or p factor) in classifying, researching, diagnosing, and treating psychiatric disorders depends (among other issues) on the extent to which comorbidity is symptom-general rather than staying largely within the confines of narrower transdiagnostic factors such as internalizing and externalizing. In this study, we compared three methods of estimating p factor strength. We compared omega hierarchical and explained common variance calculated from confirmatory factor analysis (CFA) bifactor models with maximum likelihood (ML) estimation, from exploratory structural equation modeling/exploratory factor analysis models with a bifactor rotation, and from Bayesian structural equation modeling (BSEM) bifactor models. Our simulation results suggested that BSEM with small variance priors on secondary loadings might be the preferred option. However, CFA with ML also performed well provided secondary loadings were modeled. We provide two empirical examples of applying the three methodologies using a normative sample of youth (z-proso, n = 1,286) and a university counseling sample (n = 359).
PACM: A Two-Stage Procedure for Analyzing Structural Models.
ERIC Educational Resources Information Center
Lehmann, Donald R.; Gupta, Sunil
1989-01-01
Path Analysis of Covariance Matrix (PACM) is described as a way to separately estimate measurement and structural models using standard least squares procedures. PACM was empirically compared to simultaneous maximum likelihood estimation and use of the LISREL computer program, and its advantages are identified. (SLD)
How users adopt healthcare information: An empirical study of an online Q&A community.
Jin, Jiahua; Yan, Xiangbin; Li, Yijun; Li, Yumei
2016-02-01
The emergence of social media technology has led to the creation of many online healthcare communities, where patients can easily share and look for healthcare-related information from peers who have experienced a similar problem. However, with increased user-generated content, there is a need to constantly analyse which content should be trusted as one sifts through enormous amounts of healthcare information. This study aims to explore patients' healthcare information seeking behavior in online communities. Based on dual-process theory and the knowledge adoption model, we proposed a healthcare information adoption model for online communities. This model highlights that information quality, emotional support, and source credibility are antecedent variables of adoption likelihood of healthcare information, and competition among repliers and involvement of recipients moderate the relationship between the antecedent variables and adoption likelihood. Empirical data were collected from the healthcare module of China's biggest Q&A community-Baidu Knows. Text mining techniques were adopted to calculate the information quality and emotional support contained in each reply text. A binary logistics regression model and hierarchical regression approach were employed to test the proposed conceptual model. Information quality, emotional support, and source credibility have significant and positive impact on healthcare information adoption likelihood, and among these factors, information quality has the biggest impact on a patient's adoption decision. In addition, competition among repliers and involvement of recipients were tested as moderating effects between these antecedent factors and the adoption likelihood. Results indicate competition among repliers positively moderates the relationship between source credibility and adoption likelihood, and recipients' involvement positively moderates the relationship between information quality, source credibility, and adoption decision. In addition to information quality and source credibility, emotional support has significant positive impact on individuals' healthcare information adoption decisions. Moreover, the relationships between information quality, source credibility, emotional support, and adoption decision are moderated by competition among repliers and involvement of recipients. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Essays on the economics and econometrics of human capital
NASA Astrophysics Data System (ADS)
Mosso, Stefano
This thesis is composed by three distinct chapters. They are related by their common theme: the economic analysis of the process of human capital formation. The first chapter distills and extends the recent research on the economics of human development and social mobility. It critically analyzes the literature on the role of early life conditions in shaping multiple life skills with emphasis on the importance of critical and sensitive investments periods in influencing skill development. It develops economic models that rationalize the empirical evidence on treatment effects of social programs and on family influence. It investigates the empirical support of recent claims, made by part of the literature, on the relevance of credit constraints in limiting skill development. It shows how credit constraints are not a major force explaining differences in the amount of parental and self-investments in skills and how untargeted income transfer policies to poor families do not significantly boost child outcomes. The second chapter compares the performance of maximum likelihood and simulated methods of moments in estimating dynamic discrete choice models. It presents a structural model of education and shows how it can be used to estimate heterogeneous returns from schooling choices which account for their continuation values. Continuation values have a large impact on returns, but are ignored in the measures commonly used to assess the value of schooling choices. The estimates from the model are used to compute a synthetic dataset. This is used to assess the ability of maximum likelihood and simulated methods of moments to recover the model parameters. It finally proposes a Monte Carlo exercise to gain confidence on the performance of a simulated method of moments algorithm. The last chapter proposes a method to assess long run impacts on earnings of early interventions even in absence of long-term data collection on earnings histories for program participants. It combines the methodological approaches of the literature on program evaluation, data combination and forecasting to develop estimators of the average treatment effects. This exercise allows a more complete cost-benefit evaluation of social programs accounting for benefits over the whole life cycle.
Joint Inference of Population Assignment and Demographic History
Choi, Sang Chul; Hey, Jody
2011-01-01
A new approach to assigning individuals to populations using genetic data is described. Most existing methods work by maximizing Hardy–Weinberg and linkage equilibrium within populations, neither of which will apply for many demographic histories. By including a demographic model, within a likelihood framework based on coalescent theory, we can jointly study demographic history and population assignment. Genealogies and population assignments are sampled from a posterior distribution using a general isolation-with-migration model for multiple populations. A measure of partition distance between assignments facilitates not only the summary of a posterior sample of assignments, but also the estimation of the posterior density for the demographic history. It is shown that joint estimates of assignment and demographic history are possible, including estimation of population phylogeny for samples from three populations. The new method is compared to results of a widely used assignment method, using simulated and published empirical data sets. PMID:21775468
Luque-Fernandez, Miguel Angel; Belot, Aurélien; Quaresma, Manuela; Maringe, Camille; Coleman, Michel P; Rachet, Bernard
2016-10-01
In population-based cancer research, piecewise exponential regression models are used to derive adjusted estimates of excess mortality due to cancer using the Poisson generalized linear modelling framework. However, the assumption that the conditional mean and variance of the rate parameter given the set of covariates x i are equal is strong and may fail to account for overdispersion given the variability of the rate parameter (the variance exceeds the mean). Using an empirical example, we aimed to describe simple methods to test and correct for overdispersion. We used a regression-based score test for overdispersion under the relative survival framework and proposed different approaches to correct for overdispersion including a quasi-likelihood, robust standard errors estimation, negative binomial regression and flexible piecewise modelling. All piecewise exponential regression models showed the presence of significant inherent overdispersion (p-value <0.001). However, the flexible piecewise exponential model showed the smallest overdispersion parameter (3.2 versus 21.3) for non-flexible piecewise exponential models. We showed that there were no major differences between methods. However, using a flexible piecewise regression modelling, with either a quasi-likelihood or robust standard errors, was the best approach as it deals with both, overdispersion due to model misspecification and true or inherent overdispersion.
Ren, Wen-Long; Wen, Yang-Jun; Dunwell, Jim M; Zhang, Yuan-Ming
2018-03-01
Although nonparametric methods in genome-wide association studies (GWAS) are robust in quantitative trait nucleotide (QTN) detection, the absence of polygenic background control in single-marker association in genome-wide scans results in a high false positive rate. To overcome this issue, we proposed an integrated nonparametric method for multi-locus GWAS. First, a new model transformation was used to whiten the covariance matrix of polygenic matrix K and environmental noise. Using the transferred model, Kruskal-Wallis test along with least angle regression was then used to select all the markers that were potentially associated with the trait. Finally, all the selected markers were placed into multi-locus model, these effects were estimated by empirical Bayes, and all the nonzero effects were further identified by a likelihood ratio test for true QTN detection. This method, named pKWmEB, was validated by a series of Monte Carlo simulation studies. As a result, pKWmEB effectively controlled false positive rate, although a less stringent significance criterion was adopted. More importantly, pKWmEB retained the high power of Kruskal-Wallis test, and provided QTN effect estimates. To further validate pKWmEB, we re-analyzed four flowering time related traits in Arabidopsis thaliana, and detected some previously reported genes that were not identified by the other methods.
Leaché, Adam D.; Banbury, Barbara L.; Felsenstein, Joseph; de Oca, Adrián nieto-Montes; Stamatakis, Alexandros
2015-01-01
Single nucleotide polymorphisms (SNPs) are useful markers for phylogenetic studies owing in part to their ubiquity throughout the genome and ease of collection. Restriction site associated DNA sequencing (RADseq) methods are becoming increasingly popular for SNP data collection, but an assessment of the best practises for using these data in phylogenetics is lacking. We use computer simulations, and new double digest RADseq (ddRADseq) data for the lizard family Phrynosomatidae, to investigate the accuracy of RAD loci for phylogenetic inference. We compare the two primary ways RAD loci are used during phylogenetic analysis, including the analysis of full sequences (i.e., SNPs together with invariant sites), or the analysis of SNPs on their own after excluding invariant sites. We find that using full sequences rather than just SNPs is preferable from the perspectives of branch length and topological accuracy, but not of computational time. We introduce two new acquisition bias corrections for dealing with alignments composed exclusively of SNPs, a conditional likelihood method and a reconstituted DNA approach. The conditional likelihood method conditions on the presence of variable characters only (the number of invariant sites that are unsampled but known to exist is not considered), while the reconstituted DNA approach requires the user to specify the exact number of unsampled invariant sites prior to the analysis. Under simulation, branch length biases increase with the amount of missing data for both acquisition bias correction methods, but branch length accuracy is much improved in the reconstituted DNA approach compared to the conditional likelihood approach. Phylogenetic analyses of the empirical data using concatenation or a coalescent-based species tree approach provide strong support for many of the accepted relationships among phrynosomatid lizards, suggesting that RAD loci contain useful phylogenetic signal across a range of divergence times despite the presence of missing data. Phylogenetic analysis of RAD loci requires careful attention to model assumptions, especially if downstream analyses depend on branch lengths. PMID:26227865
van de Schoot, Rens; Broere, Joris J.; Perryck, Koen H.; Zondervan-Zwijnenburg, Mariëlle; van Loey, Nancy E.
2015-01-01
Background The analysis of small data sets in longitudinal studies can lead to power issues and often suffers from biased parameter values. These issues can be solved by using Bayesian estimation in conjunction with informative prior distributions. By means of a simulation study and an empirical example concerning posttraumatic stress symptoms (PTSS) following mechanical ventilation in burn survivors, we demonstrate the advantages and potential pitfalls of using Bayesian estimation. Methods First, we show how to specify prior distributions and by means of a sensitivity analysis we demonstrate how to check the exact influence of the prior (mis-) specification. Thereafter, we show by means of a simulation the situations in which the Bayesian approach outperforms the default, maximum likelihood and approach. Finally, we re-analyze empirical data on burn survivors which provided preliminary evidence of an aversive influence of a period of mechanical ventilation on the course of PTSS following burns. Results Not suprisingly, maximum likelihood estimation showed insufficient coverage as well as power with very small samples. Only when Bayesian analysis, in conjunction with informative priors, was used power increased to acceptable levels. As expected, we showed that the smaller the sample size the more the results rely on the prior specification. Conclusion We show that two issues often encountered during analysis of small samples, power and biased parameters, can be solved by including prior information into Bayesian analysis. We argue that the use of informative priors should always be reported together with a sensitivity analysis. PMID:25765534
van de Schoot, Rens; Broere, Joris J; Perryck, Koen H; Zondervan-Zwijnenburg, Mariëlle; van Loey, Nancy E
2015-01-01
Background : The analysis of small data sets in longitudinal studies can lead to power issues and often suffers from biased parameter values. These issues can be solved by using Bayesian estimation in conjunction with informative prior distributions. By means of a simulation study and an empirical example concerning posttraumatic stress symptoms (PTSS) following mechanical ventilation in burn survivors, we demonstrate the advantages and potential pitfalls of using Bayesian estimation. Methods : First, we show how to specify prior distributions and by means of a sensitivity analysis we demonstrate how to check the exact influence of the prior (mis-) specification. Thereafter, we show by means of a simulation the situations in which the Bayesian approach outperforms the default, maximum likelihood and approach. Finally, we re-analyze empirical data on burn survivors which provided preliminary evidence of an aversive influence of a period of mechanical ventilation on the course of PTSS following burns. Results : Not suprisingly, maximum likelihood estimation showed insufficient coverage as well as power with very small samples. Only when Bayesian analysis, in conjunction with informative priors, was used power increased to acceptable levels. As expected, we showed that the smaller the sample size the more the results rely on the prior specification. Conclusion : We show that two issues often encountered during analysis of small samples, power and biased parameters, can be solved by including prior information into Bayesian analysis. We argue that the use of informative priors should always be reported together with a sensitivity analysis.
Model averaging in linkage analysis.
Matthysse, Steven
2006-06-05
Methods for genetic linkage analysis are traditionally divided into "model-dependent" and "model-independent," but there may be a useful place for an intermediate class, in which a broad range of possible models is considered as a parametric family. It is possible to average over model space with an empirical Bayes prior that weights models according to their goodness of fit to epidemiologic data, such as the frequency of the disease in the population and in first-degree relatives (and correlations with other traits in the pleiotropic case). For averaging over high-dimensional spaces, Markov chain Monte Carlo (MCMC) has great appeal, but it has a near-fatal flaw: it is not possible, in most cases, to provide rigorous sufficient conditions to permit the user safely to conclude that the chain has converged. A way of overcoming the convergence problem, if not of solving it, rests on a simple application of the principle of detailed balance. If the starting point of the chain has the equilibrium distribution, so will every subsequent point. The first point is chosen according to the target distribution by rejection sampling, and subsequent points by an MCMC process that has the target distribution as its equilibrium distribution. Model averaging with an empirical Bayes prior requires rapid estimation of likelihoods at many points in parameter space. Symbolic polynomials are constructed before the random walk over parameter space begins, to make the actual likelihood computations at each step of the random walk very fast. Power analysis in an illustrative case is described. (c) 2006 Wiley-Liss, Inc.
Smsynth: AN Imagery Synthesis System for Soil Moisture Retrieval
NASA Astrophysics Data System (ADS)
Cao, Y.; Xu, L.; Peng, J.
2018-04-01
Soil moisture (SM) is a important variable in various research areas, such as weather and climate forecasting, agriculture, drought and flood monitoring and prediction, and human health. An ongoing challenge in estimating SM via synthetic aperture radar (SAR) is the development of the retrieval SM methods, especially the empirical models needs as training samples a lot of measurements of SM and soil roughness parameters which are very difficult to acquire. As such, it is difficult to develop empirical models using realistic SAR imagery and it is necessary to develop methods to synthesis SAR imagery. To tackle this issue, a SAR imagery synthesis system based on the SM named SMSynth is presented, which can simulate radar signals that are realistic as far as possible to the real SAR imagery. In SMSynth, SAR backscatter coefficients for each soil type are simulated via the Oh model under the Bayesian framework, where the spatial correlation is modeled by the Markov random field (MRF) model. The backscattering coefficients simulated based on the designed soil parameters and sensor parameters are added into the Bayesian framework through the data likelihood where the soil parameters and sensor parameters are set as realistic as possible to the circumstances on the ground and in the validity range of the Oh model. In this way, a complete and coherent Bayesian probabilistic framework is established. Experimental results show that SMSynth is capable of generating realistic SAR images that suit the needs of a large amount of training samples of empirical models.
Determinants of Corporate Web Services Adoption: A Survey of Companies in Korea
ERIC Educational Resources Information Center
Kim, Daekil
2010-01-01
Despite the growing interest and attention from Information Technology researchers and practitioners, empirical research on factors that influence an organization's likelihood of adoption of Web Services has been limited. This study identified the factors influencing Web Services adoption from the perspective of 151 South Korean firms. The…
Duchesne, Thierry; Fortin, Daniel; Rivest, Louis-Paul
2015-01-01
Animal movement has a fundamental impact on population and community structure and dynamics. Biased correlated random walks (BCRW) and step selection functions (SSF) are commonly used to study movements. Because no studies have contrasted the parameters and the statistical properties of their estimators for models constructed under these two Lagrangian approaches, it remains unclear whether or not they allow for similar inference. First, we used the Weak Law of Large Numbers to demonstrate that the log-likelihood function for estimating the parameters of BCRW models can be approximated by the log-likelihood of SSFs. Second, we illustrated the link between the two approaches by fitting BCRW with maximum likelihood and with SSF to simulated movement data in virtual environments and to the trajectory of bison (Bison bison L.) trails in natural landscapes. Using simulated and empirical data, we found that the parameters of a BCRW estimated directly from maximum likelihood and by fitting an SSF were remarkably similar. Movement analysis is increasingly used as a tool for understanding the influence of landscape properties on animal distribution. In the rapidly developing field of movement ecology, management and conservation biologists must decide which method they should implement to accurately assess the determinants of animal movement. We showed that BCRW and SSF can provide similar insights into the environmental features influencing animal movements. Both techniques have advantages. BCRW has already been extended to allow for multi-state modeling. Unlike BCRW, however, SSF can be estimated using most statistical packages, it can simultaneously evaluate habitat selection and movement biases, and can easily integrate a large number of movement taxes at multiple scales. SSF thus offers a simple, yet effective, statistical technique to identify movement taxis.
Maximum likelihood solution for inclination-only data in paleomagnetism
NASA Astrophysics Data System (ADS)
Arason, P.; Levi, S.
2010-08-01
We have developed a new robust maximum likelihood method for estimating the unbiased mean inclination from inclination-only data. In paleomagnetic analysis, the arithmetic mean of inclination-only data is known to introduce a shallowing bias. Several methods have been introduced to estimate the unbiased mean inclination of inclination-only data together with measures of the dispersion. Some inclination-only methods were designed to maximize the likelihood function of the marginal Fisher distribution. However, the exact analytical form of the maximum likelihood function is fairly complicated, and all the methods require various assumptions and approximations that are often inappropriate. For some steep and dispersed data sets, these methods provide estimates that are significantly displaced from the peak of the likelihood function to systematically shallower inclination. The problem locating the maximum of the likelihood function is partly due to difficulties in accurately evaluating the function for all values of interest, because some elements of the likelihood function increase exponentially as precision parameters increase, leading to numerical instabilities. In this study, we succeeded in analytically cancelling exponential elements from the log-likelihood function, and we are now able to calculate its value anywhere in the parameter space and for any inclination-only data set. Furthermore, we can now calculate the partial derivatives of the log-likelihood function with desired accuracy, and locate the maximum likelihood without the assumptions required by previous methods. To assess the reliability and accuracy of our method, we generated large numbers of random Fisher-distributed data sets, for which we calculated mean inclinations and precision parameters. The comparisons show that our new robust Arason-Levi maximum likelihood method is the most reliable, and the mean inclination estimates are the least biased towards shallow values.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.
Kong, Shengchun; Nan, Bin
2014-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso
Kong, Shengchun; Nan, Bin
2013-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses. PMID:24516328
Two stochastic models useful in petroleum exploration
NASA Technical Reports Server (NTRS)
Kaufman, G. M.; Bradley, P. G.
1972-01-01
A model of the petroleum exploration process that tests empirically the hypothesis that at an early stage in the exploration of a basin, the process behaves like sampling without replacement is proposed along with a model of the spatial distribution of petroleum reserviors that conforms to observed facts. In developing the model of discovery, the following topics are discussed: probabilitistic proportionality, likelihood function, and maximum likelihood estimation. In addition, the spatial model is described, which is defined as a stochastic process generating values of a sequence or random variables in a way that simulates the frequency distribution of areal extent, the geographic location, and shape of oil deposits
Paninski, Liam; Haith, Adrian; Szirtes, Gabor
2008-02-01
We recently introduced likelihood-based methods for fitting stochastic integrate-and-fire models to spike train data. The key component of this method involves the likelihood that the model will emit a spike at a given time t. Computing this likelihood is equivalent to computing a Markov first passage time density (the probability that the model voltage crosses threshold for the first time at time t). Here we detail an improved method for computing this likelihood, based on solving a certain integral equation. This integral equation method has several advantages over the techniques discussed in our previous work: in particular, the new method has fewer free parameters and is easily differentiable (for gradient computations). The new method is also easily adaptable for the case in which the model conductance, not just the input current, is time-varying. Finally, we describe how to incorporate large deviations approximations to very small likelihoods.
Dissociating response conflict and error likelihood in anterior cingulate cortex.
Yeung, Nick; Nieuwenhuis, Sander
2009-11-18
Neuroimaging studies consistently report activity in anterior cingulate cortex (ACC) in conditions of high cognitive demand, leading to the view that ACC plays a crucial role in the control of cognitive processes. According to one prominent theory, the sensitivity of ACC to task difficulty reflects its role in monitoring for the occurrence of competition, or "conflict," between responses to signal the need for increased cognitive control. However, a contrasting theory proposes that ACC is the recipient rather than source of monitoring signals, and that ACC activity observed in relation to task demand reflects the role of this region in learning about the likelihood of errors. Response conflict and error likelihood are typically confounded, making the theories difficult to distinguish empirically. The present research therefore used detailed computational simulations to derive contrasting predictions regarding ACC activity and error rate as a function of response speed. The simulations demonstrated a clear dissociation between conflict and error likelihood: fast response trials are associated with low conflict but high error likelihood, whereas slow response trials show the opposite pattern. Using the N2 component as an index of ACC activity, an EEG study demonstrated that when conflict and error likelihood are dissociated in this way, ACC activity tracks conflict and is negatively correlated with error likelihood. These findings support the conflict-monitoring theory and suggest that, in speeded decision tasks, ACC activity reflects current task demands rather than the retrospective coding of past performance.
Efficient parameter estimation in longitudinal data analysis using a hybrid GEE method.
Leung, Denis H Y; Wang, You-Gan; Zhu, Min
2009-07-01
The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). However, the efficiency of a GEE estimate can be seriously affected by the choice of the working correlation model. This study addresses this problem by proposing a hybrid method that combines multiple GEEs based on different working correlation models, using the empirical likelihood method (Qin and Lawless, 1994). Analyses show that this hybrid method is more efficient than a GEE using a misspecified working correlation model. Furthermore, if one of the working correlation structures correctly models the within-subject correlations, then this hybrid method provides the most efficient parameter estimates. In simulations, the hybrid method's finite-sample performance is superior to a GEE under any of the commonly used working correlation models and is almost fully efficient in all scenarios studied. The hybrid method is illustrated using data from a longitudinal study of the respiratory infection rates in 275 Indonesian children.
An imbalance fault detection method based on data normalization and EMD for marine current turbines.
Zhang, Milu; Wang, Tianzhen; Tang, Tianhao; Benbouzid, Mohamed; Diallo, Demba
2017-05-01
This paper proposes an imbalance fault detection method based on data normalization and Empirical Mode Decomposition (EMD) for variable speed direct-drive Marine Current Turbine (MCT) system. The method is based on the MCT stator current under the condition of wave and turbulence. The goal of this method is to extract blade imbalance fault feature, which is concealed by the supply frequency and the environment noise. First, a Generalized Likelihood Ratio Test (GLRT) detector is developed and the monitoring variable is selected by analyzing the relationship between the variables. Then, the selected monitoring variable is converted into a time series through data normalization, which makes the imbalance fault characteristic frequency into a constant. At the end, the monitoring variable is filtered out by EMD method to eliminate the effect of turbulence. The experiments show that the proposed method is robust against turbulence through comparing the different fault severities and the different turbulence intensities. Comparison with other methods, the experimental results indicate the feasibility and efficacy of the proposed method. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Kim, Eunyoung; Alhaddab, Taghreed A.; Aquino, Katherine C.; Negi, Reema
2016-01-01
Existing body of research indicates that both cognitive and non-cognitive factors contribute to college students' tendency of academic procrastination. However, little attention has been paid to the likelihood of academic procrastination among Asian international college students. Given the need for empirical research on why Asian international…
Etzel, C J; Shete, S; Beasley, T M; Fernandez, J R; Allison, D B; Amos, C I
2003-01-01
Non-normality of the phenotypic distribution can affect power to detect quantitative trait loci in sib pair studies. Previously, we observed that Winsorizing the sib pair phenotypes increased the power of quantitative trait locus (QTL) detection for both Haseman-Elston (HE) least-squares tests [Hum Hered 2002;53:59-67] and maximum likelihood-based variance components (MLVC) analysis [Behav Genet (in press)]. Winsorizing the phenotypes led to a slight increase in type 1 error in H-E tests and a slight decrease in type I error for MLVC analysis. Herein, we considered transforming the sib pair phenotypes using the Box-Cox family of transformations. Data were simulated for normal and non-normal (skewed and kurtic) distributions. Phenotypic values were replaced by Box-Cox transformed values. Twenty thousand replications were performed for three H-E tests of linkage and the likelihood ratio test (LRT), the Wald test and other robust versions based on the MLVC method. We calculated the relative nominal inflation rate as the ratio of observed empirical type 1 error divided by the set alpha level (5, 1 and 0.1% alpha levels). MLVC tests applied to non-normal data had inflated type I errors (rate ratio greater than 1.0), which were controlled best by Box-Cox transformation and to a lesser degree by Winsorizing. For example, for non-transformed, skewed phenotypes (derived from a chi2 distribution with 2 degrees of freedom), the rates of empirical type 1 error with respect to set alpha level=0.01 were 0.80, 4.35 and 7.33 for the original H-E test, LRT and Wald test, respectively. For the same alpha level=0.01, these rates were 1.12, 3.095 and 4.088 after Winsorizing and 0.723, 1.195 and 1.905 after Box-Cox transformation. Winsorizing reduced inflated error rates for the leptokurtic distribution (derived from a Laplace distribution with mean 0 and variance 8). Further, power (adjusted for empirical type 1 error) at the 0.01 alpha level ranged from 4.7 to 17.3% across all tests using the non-transformed, skewed phenotypes, from 7.5 to 20.1% after Winsorizing and from 12.6 to 33.2% after Box-Cox transformation. Likewise, power (adjusted for empirical type 1 error) using leptokurtic phenotypes at the 0.01 alpha level ranged from 4.4 to 12.5% across all tests with no transformation, from 7 to 19.2% after Winsorizing and from 4.5 to 13.8% after Box-Cox transformation. Thus the Box-Cox transformation apparently provided the best type 1 error control and maximal power among the procedures we considered for analyzing a non-normal, skewed distribution (chi2) while Winzorizing worked best for the non-normal, kurtic distribution (Laplace). We repeated the same simulations using a larger sample size (200 sib pairs) and found similar results. Copyright 2003 S. Karger AG, Basel
Effects of Habitual Anger on Employees’ Behavior during Organizational Change
Bönigk, Mareike; Steffgen, Georges
2013-01-01
Organizational change is a particularly emotional event for those being confronted with it. Anger is a frequently experienced emotion under these conditions. This study analyses the influence of employees’ habitual anger reactions on their reported behavior during organizational change. It was explored whether anger reactions conducive to recovering or increasing individual well-being will enhance the likelihood of functional change behavior. Dysfunctional regulation strategies in terms of individual well-being are expected to decrease the likelihood of functional change behavior—mediated by the commitment to change. Four hundred and twelve employees of different organizations in Luxembourg undergoing organizational change participated in the study. Findings indicate that the anger regulation strategy venting, and humor increase the likelihood of deviant resistance to change. Downplaying the incident’s negative impact and feedback increase the likelihood of active support for change. The mediating effect of commitment to change has been found for humor and submission. The empirical findings suggest that a differentiated conceptualization of resistance to change is required. Specific implications for practical change management and for future research are discussed. PMID:24287849
Effects of habitual anger on employees' behavior during organizational change.
Bönigk, Mareike; Steffgen, Georges
2013-11-25
Organizational change is a particularly emotional event for those being confronted with it. Anger is a frequently experienced emotion under these conditions. This study analyses the influence of employees' habitual anger reactions on their reported behavior during organizational change. It was explored whether anger reactions conducive to recovering or increasing individual well-being will enhance the likelihood of functional change behavior. Dysfunctional regulation strategies in terms of individual well-being are expected to decrease the likelihood of functional change behavior-mediated by the commitment to change. Four hundred and twelve employees of different organizations in Luxembourg undergoing organizational change participated in the study. Findings indicate that the anger regulation strategy venting, and humor increase the likelihood of deviant resistance to change. Downplaying the incident's negative impact and feedback increase the likelihood of active support for change. The mediating effect of commitment to change has been found for humor and submission. The empirical findings suggest that a differentiated conceptualization of resistance to change is required. Specific implications for practical change management and for future research are discussed.
NASA Astrophysics Data System (ADS)
Elshall, A. S.; Ye, M.; Niu, G. Y.; Barron-Gafford, G.
2016-12-01
Bayesian multimodel inference is increasingly being used in hydrology. Estimating Bayesian model evidence (BME) is of central importance in many Bayesian multimodel analysis such as Bayesian model averaging and model selection. BME is the overall probability of the model in reproducing the data, accounting for the trade-off between the goodness-of-fit and the model complexity. Yet estimating BME is challenging, especially for high dimensional problems with complex sampling space. Estimating BME using the Monte Carlo numerical methods is preferred, as the methods yield higher accuracy than semi-analytical solutions (e.g. Laplace approximations, BIC, KIC, etc.). However, numerical methods are prone the numerical demons arising from underflow of round off errors. Although few studies alluded to this issue, to our knowledge this is the first study that illustrates these numerical demons. We show that the precision arithmetic can become a threshold on likelihood values and Metropolis acceptance ratio, which results in trimming parameter regions (when likelihood function is less than the smallest floating point number that a computer can represent) and corrupting of the empirical measures of the random states of the MCMC sampler (when using log-likelihood function). We consider two of the most powerful numerical estimators of BME that are the path sampling method of thermodynamic integration (TI) and the importance sampling method of steppingstone sampling (SS). We also consider the two most widely used numerical estimators, which are the prior sampling arithmetic mean (AS) and posterior sampling harmonic mean (HM). We investigate the vulnerability of these four estimators to the numerical demons. Interesting, the most biased estimator, namely the HM, turned out to be the least vulnerable. While it is generally assumed that AM is a bias-free estimator that will always approximate the true BME by investing in computational effort, we show that arithmetic underflow can hamper AM resulting in severe underestimation of BME. TI turned out to be the most vulnerable, resulting in BME overestimation. Finally, we show how SS can be largely invariant to rounding errors, yielding the most accurate and computational efficient results. These research results are useful for MC simulations to estimate Bayesian model evidence.
Ehn, S; Sellerer, T; Mechlem, K; Fehringer, A; Epple, M; Herzen, J; Pfeiffer, F; Noël, P B
2017-01-07
Following the development of energy-sensitive photon-counting detectors using high-Z sensor materials, application of spectral x-ray imaging methods to clinical practice comes into reach. However, these detectors require extensive calibration efforts in order to perform spectral imaging tasks like basis material decomposition. In this paper, we report a novel approach to basis material decomposition that utilizes a semi-empirical estimator for the number of photons registered in distinct energy bins in the presence of beam-hardening effects which can be termed as a polychromatic Beer-Lambert model. A maximum-likelihood estimator is applied to the model in order to obtain estimates of the underlying sample composition. Using a Monte-Carlo simulation of a typical clinical CT acquisition, the performance of the proposed estimator was evaluated. The estimator is shown to be unbiased and efficient according to the Cramér-Rao lower bound. In particular, the estimator is capable of operating with a minimum number of calibration measurements. Good results were obtained after calibration using less than 10 samples of known composition in a two-material attenuation basis. This opens up the possibility for fast re-calibration in the clinical routine which is considered an advantage of the proposed method over other implementations reported in the literature.
NASA Astrophysics Data System (ADS)
Ehn, S.; Sellerer, T.; Mechlem, K.; Fehringer, A.; Epple, M.; Herzen, J.; Pfeiffer, F.; Noël, P. B.
2017-01-01
Following the development of energy-sensitive photon-counting detectors using high-Z sensor materials, application of spectral x-ray imaging methods to clinical practice comes into reach. However, these detectors require extensive calibration efforts in order to perform spectral imaging tasks like basis material decomposition. In this paper, we report a novel approach to basis material decomposition that utilizes a semi-empirical estimator for the number of photons registered in distinct energy bins in the presence of beam-hardening effects which can be termed as a polychromatic Beer-Lambert model. A maximum-likelihood estimator is applied to the model in order to obtain estimates of the underlying sample composition. Using a Monte-Carlo simulation of a typical clinical CT acquisition, the performance of the proposed estimator was evaluated. The estimator is shown to be unbiased and efficient according to the Cramér-Rao lower bound. In particular, the estimator is capable of operating with a minimum number of calibration measurements. Good results were obtained after calibration using less than 10 samples of known composition in a two-material attenuation basis. This opens up the possibility for fast re-calibration in the clinical routine which is considered an advantage of the proposed method over other implementations reported in the literature.
Factors influencing unsafe behaviors and accidents on construction sites: a review.
Khosravi, Yahya; Asilian-Mahabadi, Hassan; Hajizadeh, Ebrahim; Hassanzadeh-Rangi, Narmin; Bastani, Hamid; Behzadan, Amir H
2014-01-01
Construction is a hazardous occupation due to the unique nature of activities involved and the repetitiveness of several field behaviors. The aim of this methodological and theoretical review is to explore the empirical factors influencing unsafe behaviors and accidents on construction sites. In this work, results and findings from 56 related previous studies were investigated. These studies were categorized based on their design, type, methods of data collection, analytical methods, variables, and key findings. A qualitative content analysis procedure was used to extract variables, themes, and factors. In addition, all studies were reviewed to determine the quality rating and to evaluate the strength of provided evidence. The content analysis identified 8 main categories: (a) society, (b) organization, (c) project management, (d) supervision, (e) contractor, (f) site condition, (g) work group, and (h) individual characteristics. The review highlighted the importance of more distal factors, e.g., society and organization, and project management, that may contribute to reducing the likelihood of unsafe behaviors and accidents through the promotion of site condition and individual features (as proximal factors). Further research is necessary to provide a better understanding of the links between unsafe behavior theories and empirical findings, challenge theoretical assumptions, develop new applied theories, and make stronger recommendations.
Family emergency preparedness plans in severe tornadoes.
Cong, Zhen; Liang, Daan; Luo, Jianjun
2014-01-01
Tornadoes, with warnings usually issued just minutes before their touchdowns, pose great threats to properties and people's physical and mental health. Few studies have empirically investigated the association of family emergency preparedness planning and observed protective behaviors in the context of tornadoes. The purpose of this study was to examine predictors for the action of taking shelter at the time of tornadoes. Specifically, this study investigated whether having a family emergency preparedness plan was associated with higher likelihood of taking shelter upon receiving tornado warnings. This study also examined the effects of socioeconomic status and functional limitations on taking such actions. A telephone survey based on random sampling was conducted in 2012 with residents in Tuscaloosa AL and Joplin MO. Each city experienced considerable damages, injuries, and casualties after severe tornadoes (EF-4 and EF-5) in 2011. The working sample included 892 respondents. Analysis was conducted in early 2013. Logistic regression identified emergency preparedness planning as the only shared factor that increased the likelihood of taking shelter in both cities and the only significant factor in Joplin. In Tuscaloosa, being female and white also increased the likelihood of taking shelter. Disability was not found to have an effect. This study provided empirical evidence on the importance of having a family emergency preparedness plan in mitigating the risk of tornadoes. The findings could be applied to other rapid-onset disasters. © 2013 American Journal of Preventive Medicine Published by American Journal of Preventive Medicine All rights reserved.
Information matrix estimation procedures for cognitive diagnostic models.
Liu, Yanlou; Xin, Tao; Andersson, Björn; Tian, Wei
2018-03-06
Two new methods to estimate the asymptotic covariance matrix for marginal maximum likelihood estimation of cognitive diagnosis models (CDMs), the inverse of the observed information matrix and the sandwich-type estimator, are introduced. Unlike several previous covariance matrix estimators, the new methods take into account both the item and structural parameters. The relationships between the observed information matrix, the empirical cross-product information matrix, the sandwich-type covariance matrix and the two approaches proposed by de la Torre (2009, J. Educ. Behav. Stat., 34, 115) are discussed. Simulation results show that, for a correctly specified CDM and Q-matrix or with a slightly misspecified probability model, the observed information matrix and the sandwich-type covariance matrix exhibit good performance with respect to providing consistent standard errors of item parameter estimates. However, with substantial model misspecification only the sandwich-type covariance matrix exhibits robust performance. © 2018 The British Psychological Society.
NASA Astrophysics Data System (ADS)
Li, M.; Jiang, Y. S.
2014-11-01
Micro-Doppler effect is induced by the micro-motion dynamics of the radar target itself or any structure on the target. In this paper, a simplified cone-shaped model for ballistic missile warhead with micro-nutation is established, followed by the theoretical formula of micro-nutation is derived. It is confirmed that the theoretical results are identical to simulation results by using short-time Fourier transform. Then we propose a new method for nutation period extraction via signature maximum energy fitting based on empirical mode decomposition and short-time Fourier transform. The maximum wobble angle is also extracted by distance approximate approach in a small range of wobble angle, which is combined with the maximum likelihood estimation. By the simulation studies, it is shown that these two feature extraction methods are both valid even with low signal-to-noise ratio.
NASA Astrophysics Data System (ADS)
Taylor, Gabriel James
The failure of electrical cables exposed to severe thermal fire conditions are a safety concern for operating commercial nuclear power plants (NPPs). The Nuclear Regulatory Commission (NRC) has promoted the use of risk-informed and performance-based methods for fire protection which resulted in a need to develop realistic methods to quantify the risk of fire to NPP safety. Recent electrical cable testing has been conducted to provide empirical data on the failure modes and likelihood of fire-induced damage. This thesis evaluated numerous aspects of the data. Circuit characteristics affecting fire-induced electrical cable failure modes have been evaluated. In addition, thermal failure temperatures corresponding to cable functional failures have been evaluated to develop realistic single point thermal failure thresholds and probability distributions for specific cable insulation types. Finally, the data was used to evaluate the prediction capabilities of a one-dimension conductive heat transfer model used to predict cable failure.
Empiric auto-titrating CPAP in people with suspected obstructive sleep apnea.
Drummond, Fitzgerald; Doelken, Peter; Ahmed, Qanta A; Gilbert, Gregory E; Strange, Charlie; Herpel, Laura; Frye, Michael D
2010-04-15
Efficient diagnosis and treatment of obstructive sleep apnea (OSA) can be difficult because of time delays imposed by clinic visits and serial overnight polysomnography. In some cases, it may be desirable to initiate treatment for suspected OSA prior to polysomnography. Our objective was to compare the improvement of daytime sleepiness and sleep-related quality of life of patients with high clinical likelihood of having OSA who were randomly assigned to receive empiric auto-titrating continuous positive airway pressure (CPAP) while awaiting polysomnogram versus current usual care. Serial patients referred for overnight polysomnography who had high clinical likelihood of having OSA were randomly assigned to usual care or immediate initiation of auto-titrating CPAP. Epworth Sleepiness Scale (ESS) scores and the Functional Outcomes of Sleep Questionnaire (FOSQ) scores were obtained at baseline, 1 month after randomization, and again after initiation of fixed CPAP in control subjects and after the sleep study in auto-CPAP patients. One hundred nine patients were randomized. Baseline demographics, daytime sleepiness, and sleep-related quality of life scores were similar between groups. One-month ESS and FOSQ scores were improved in the group empirically treated with auto-titrating CPAP. ESS scores improved in the first month by a mean of -3.2 (confidence interval -1.6 to -4.8, p < 0.001) and FOSQ scores improved by a mean of 1.5, (confidence interval 0.5 to 2.7, p = 0.02), whereas scores in the usual-care group did not change (p = NS). Following therapy directed by overnight polysomnography in the control group, there were no differences in ESS or FOSQ between the groups. No adverse events were observed. Empiric auto-CPAP resulted in symptomatic improvement of daytime sleepiness and sleep-related quality of life in a cohort of patients awaiting polysomnography who had a high pretest probability of having OSA. Additional studies are needed to evaluate the applicability of empiric treatment to other populations.
Inferring social ties from geographic coincidences.
Crandall, David J; Backstrom, Lars; Cosley, Dan; Suri, Siddharth; Huttenlocher, Daniel; Kleinberg, Jon
2010-12-28
We investigate the extent to which social ties between people can be inferred from co-occurrence in time and space: Given that two people have been in approximately the same geographic locale at approximately the same time, on multiple occasions, how likely are they to know each other? Furthermore, how does this likelihood depend on the spatial and temporal proximity of the co-occurrences? Such issues arise in data originating in both online and offline domains as well as settings that capture interfaces between online and offline behavior. Here we develop a framework for quantifying the answers to such questions, and we apply this framework to publicly available data from a social media site, finding that even a very small number of co-occurrences can result in a high empirical likelihood of a social tie. We then present probabilistic models showing how such large probabilities can arise from a natural model of proximity and co-occurrence in the presence of social ties. In addition to providing a method for establishing some of the first quantifiable estimates of these measures, our findings have potential privacy implications, particularly for the ways in which social structures can be inferred from public online records that capture individuals' physical locations over time.
Kim, Joanne Soo-Min; Coyte, Peter C.; Cotterchio, Michelle; Keogh, Louise A.; Flander, Louisa B.; Gaff, Clara; Laporte, Audrey
2016-01-01
Background This study investigated whether receiving the results of predictive genetic testing for Lynch syndrome—indicating the presence or absence of an inherited predisposition to various cancers, including colorectal cancer—was associated with change in individual colonoscopy and smoking behaviours, which could prevent colorectal cancer. Methods The study population included individuals with no previous diagnosis of colorectal cancer, whose families had already-identified deleterious mutations in the mismatch repair or EPCAM genes. Hypotheses were generated from a simple health economics model and tested against individual-level panel data from the Australasian Colorectal Cancer Family Registry. Results The empirical analysis revealed evidence consistent with some of the hypotheses, with a higher likelihood of undergoing colonoscopy in those who discovered their genetic predisposition to colorectal cancer and a lower likelihood of quitting smoking in those who discovered their lack thereof. Conclusion Predictive genetic information about Lynch syndrome was associated with change in individual colonoscopy and smoking behaviours but not necessarily in ways to improve population health. Impact The study findings suggest that the impact of personalized medicine on disease prevention is intricate, warranting further analyses to determine the net benefits and costs. PMID:27528600
An Extension of the Partial Credit Model with an Application to the Measurement of Change.
ERIC Educational Resources Information Center
Fischer, Gerhard H.; Ponocny, Ivo
1994-01-01
An extension to the partial credit model, the linear partial credit model, is considered under the assumption of a certain linear decomposition of the item x category parameters into basic parameters. A conditional maximum likelihood algorithm for estimating basic parameters is presented and illustrated with simulation and an empirical study. (SLD)
Factors Related to the Likelihood of Grade Inflation at Community Colleges
ERIC Educational Resources Information Center
Heulett, Steven Talmadge
2013-01-01
A number of studies have documented a trend of higher grades awarded by postsecondary institutions in both the United States and Canada over the last two decades. Grade inflation in higher education is a potentially costly problem for a variety of reasons, but little empirical research about the causes of grade inflation has been conducted. This…
IRT-LR-DIF with Estimation of the Focal-Group Density as an Empirical Histogram
ERIC Educational Resources Information Center
Woods, Carol M.
2008-01-01
Item response theory-likelihood ratio-differential item functioning (IRT-LR-DIF) is used to evaluate the degree to which items on a test or questionnaire have different measurement properties for one group of people versus another, irrespective of group-mean differences on the construct. Usually, the latent distribution is presumed normal for both…
ERIC Educational Resources Information Center
Shippen, Margaret E.; Patterson, DaShaunda; Green, Kemeche L.; Smitherman, Tracy
2012-01-01
Youth at risk for school failure need community and school supports to reduce the likelihood of developing delinquent behavior. This article provides an overview of community and school approaches aimed at intervening on the school-to-prison pipeline. Community and school efforts are emerging that take into account empirical evidence demonstrating…
ERIC Educational Resources Information Center
Guillet, Emma; Sarrazin, Philippe; Fontayne, Paul; Brustad, Robert J.
2006-01-01
An empirical research study based upon the expectancy-value model of Eccles and colleagues (1983) investigated the effect of gender-role orientations on psychological dimensions of female athletes' sport participation and the likelihood of their continued participation in a stereotypical masculine activity. The model (Eccles et al., 1983) posits…
ERIC Educational Resources Information Center
Smith, Emma; White, Patrick
2015-01-01
This paper contributes to the empirical evidence on participation and attainment in higher education by reviewing the patterns of entry and success of undergraduate students. It examines the characteristics of entrants to different subjects and considers the role that subject studied plays in determining the likelihood of graduating with a…
Regional Earthquake Likelihood Models: A realm on shaky grounds?
NASA Astrophysics Data System (ADS)
Kossobokov, V.
2005-12-01
Seismology is juvenile and its appropriate statistical tools to-date may have a "medievil flavor" for those who hurry up to apply a fuzzy language of a highly developed probability theory. To become "quantitatively probabilistic" earthquake forecasts/predictions must be defined with a scientific accuracy. Following the most popular objectivists' viewpoint on probability, we cannot claim "probabilities" adequate without a long series of "yes/no" forecast/prediction outcomes. Without "antiquated binary language" of "yes/no" certainty we cannot judge an outcome ("success/failure"), and, therefore, quantify objectively a forecast/prediction method performance. Likelihood scoring is one of the delicate tools of Statistics, which could be worthless or even misleading when inappropriate probability models are used. This is a basic loophole for a misuse of likelihood as well as other statistical methods on practice. The flaw could be avoided by an accurate verification of generic probability models on the empirical data. It is not an easy task in the frames of the Regional Earthquake Likelihood Models (RELM) methodology, which neither defines the forecast precision nor allows a means to judge the ultimate success or failure in specific cases. Hopefully, the RELM group realizes the problem and its members do their best to close the hole with an adequate, data supported choice. Regretfully, this is not the case with the erroneous choice of Gerstenberger et al., who started the public web site with forecasts of expected ground shaking for `tomorrow' (Nature 435, 19 May 2005). Gerstenberger et al. have inverted the critical evidence of their study, i.e., the 15 years of recent seismic record accumulated just in one figure, which suggests rejecting with confidence above 97% "the generic California clustering model" used in automatic calculations. As a result, since the date of publication in Nature the United States Geological Survey website delivers to the public, emergency planners and the media, a forecast product, which is based on wrong assumptions that violate the best-documented earthquake statistics in California, which accuracy was not investigated, and which forecasts were not tested in a rigorous way.
Nowak, Michael D.; Smith, Andrew B.; Simpson, Carl; Zwickl, Derrick J.
2013-01-01
Molecular divergence time analyses often rely on the age of fossil lineages to calibrate node age estimates. Most divergence time analyses are now performed in a Bayesian framework, where fossil calibrations are incorporated as parametric prior probabilities on node ages. It is widely accepted that an ideal parameterization of such node age prior probabilities should be based on a comprehensive analysis of the fossil record of the clade of interest, but there is currently no generally applicable approach for calculating such informative priors. We provide here a simple and easily implemented method that employs fossil data to estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade, which can be used to fit an informative parametric prior probability distribution on a node age. Specifically, our method uses the extant diversity and the stratigraphic distribution of fossil lineages confidently assigned to a clade to fit a branching model of lineage diversification. Conditioning this on a simple model of fossil preservation, we estimate the likely amount of missing history prior to the oldest fossil occurrence of a clade. The likelihood surface of missing history can then be translated into a parametric prior probability distribution on the age of the clade of interest. We show that the method performs well with simulated fossil distribution data, but that the likelihood surface of missing history can at times be too complex for the distribution-fitting algorithm employed by our software tool. An empirical example of the application of our method is performed to estimate echinoid node ages. A simulation-based sensitivity analysis using the echinoid data set shows that node age prior distributions estimated under poor preservation rates are significantly less informative than those estimated under high preservation rates. PMID:23755303
Haplotype-Based Association Analysis via Variance-Components Score Test
Tzeng, Jung-Ying ; Zhang, Daowen
2007-01-01
Haplotypes provide a more informative format of polymorphisms for genetic association analysis than do individual single-nucleotide polymorphisms. However, the practical efficacy of haplotype-based association analysis is challenged by a trade-off between the benefits of modeling abundant variation and the cost of the extra degrees of freedom. To reduce the degrees of freedom, several strategies have been considered in the literature. They include (1) clustering evolutionarily close haplotypes, (2) modeling the level of haplotype sharing, and (3) smoothing haplotype effects by introducing a correlation structure for haplotype effects and studying the variance components (VC) for association. Although the first two strategies enjoy a fair extent of power gain, empirical evidence showed that VC methods may exhibit only similar or less power than the standard haplotype regression method, even in cases of many haplotypes. In this study, we report possible reasons that cause the underpowered phenomenon and show how the power of the VC strategy can be improved. We construct a score test based on the restricted maximum likelihood or the marginal likelihood function of the VC and identify its nontypical limiting distribution. Through simulation, we demonstrate the validity of the test and investigate the power performance of the VC approach and that of the standard haplotype regression approach. With suitable choices for the correlation structure, the proposed method can be directly applied to unphased genotypic data. Our method is applicable to a wide-ranging class of models and is computationally efficient and easy to implement. The broad coverage and the fast and easy implementation of this method make the VC strategy an effective tool for haplotype analysis, even in modern genomewide association studies. PMID:17924336
The Scientific Method, Diagnostic Bayes, and How to Detect Epistemic Errors
NASA Astrophysics Data System (ADS)
Vrugt, J. A.
2015-12-01
In the past decades, Bayesian methods have found widespread application and use in environmental systems modeling. Bayes theorem states that the posterior probability, P(H|D) of a hypothesis, H is proportional to the product of the prior probability, P(H) of this hypothesis and the likelihood, L(H|hat{D}) of the same hypothesis given the new/incoming observations, \\hat {D}. In science and engineering, H often constitutes some numerical simulation model, D = F(x,.) which summarizes using algebraic, empirical, and differential equations, state variables and fluxes, all our theoretical and/or practical knowledge of the system of interest, and x are the d unknown parameters which are subject to inference using some data, \\hat {D} of the observed system response. The Bayesian approach is intimately related to the scientific method and uses an iterative cycle of hypothesis formulation (model), experimentation and data collection, and theory/hypothesis refinement to elucidate the rules that govern the natural world. Unfortunately, model refinement has proven to be very difficult in large part because of the poor diagnostic power of residual based likelihood functions tep{gupta2008}. This has inspired te{vrugt2013} to advocate the use of 'likelihood-free' inference using approximate Bayesian computation (ABC). This approach uses one or more summary statistics, S(\\hat {D}) of the original data, \\hat {D} designed ideally to be sensitive only to one particular process in the model. Any mismatch between the observed and simulated summary metrics is then easily linked to a specific model component. A recurrent issue with the application of ABC is self-sufficiency of the summary statistics. In theory, S(.) should contain as much information as the original data itself, yet complex systems rarely admit sufficient statistics. In this article, we propose to combine the ideas of ABC and regular Bayesian inference to guarantee that no information is lost in diagnostic model evaluation. This hybrid approach, coined diagnostic Bayes, uses the summary metrics as prior distribution and original data in the likelihood function, or P(x|\\hat {D}) ∝ P(x|S(\\hat {D})) L(x|\\hat {D}). A case study illustrates the ability of the proposed methodology to diagnose epistemic errors and provide guidance on model refinement.
Assessment of parametric uncertainty for groundwater reactive transport modeling,
Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun
2014-01-01
The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
The distribution of genetic variance across phenotypic space and the response to selection.
Blows, Mark W; McGuigan, Katrina
2015-05-01
The role of adaptation in biological invasions will depend on the availability of genetic variation for traits under selection in the new environment. Although genetic variation is present for most traits in most populations, selection is expected to act on combinations of traits, not individual traits in isolation. The distribution of genetic variance across trait combinations can be characterized by the empirical spectral distribution of the genetic variance-covariance (G) matrix. Empirical spectral distributions of G from a range of trait types and taxa all exhibit a characteristic shape; some trait combinations have large levels of genetic variance, while others have very little genetic variance. In this study, we review what is known about the empirical spectral distribution of G and show how it predicts the response to selection across phenotypic space. In particular, trait combinations that form a nearly null genetic subspace with little genetic variance respond only inconsistently to selection. We go on to set out a framework for understanding how the empirical spectral distribution of G may differ from the random expectations that have been developed under random matrix theory (RMT). Using a data set containing a large number of gene expression traits, we illustrate how hypotheses concerning the distribution of multivariate genetic variance can be tested using RMT methods. We suggest that the relative alignment between novel selection pressures during invasion and the nearly null genetic subspace is likely to be an important component of the success or failure of invasion, and for the likelihood of rapid adaptation in small populations in general. © 2014 John Wiley & Sons Ltd.
Dawson, Ian G J; Johnson, Johnnie E V
2014-08-01
In 2011, the global human population reached 7 billion and medium variant projections indicate that it will exceed 9 billion before 2045. Theoretical and empirical perspectives suggest that this growth could lead to an increase in the likelihood of adverse events (e.g., food shortages, climate change, etc.) and/or the severity of adverse events (e.g., famines, natural disasters, etc.). Several scholars have posited that the size to which the global population grows and the extent to which this growth increases the likelihood of adverse outcomes will largely be shaped by individuals' decisions (in households, organizations, governments, etc.). In light of the strong relationship between perceived risk and decision behaviors, it is surprising that there remains a dearth of empirical research that specifically examines the perceived risks of population growth and how these perceptions might influence related decisions. In an attempt to motivate this important strand of research, this article examines the major risks that may be exacerbated by global population growth and draws upon empirical work concerning the perception and communication of risk to identify potential directions for future research. The article also considers how individuals might perceive both the risks and benefits of population growth and be helped to better understand and address the related issues. The answers to these questions could help humanity better manage the emerging consequences of its continuing success in increasing infant survival and adult longevity. © 2014 Society for Risk Analysis.
On Bayesian Testing of Additive Conjoint Measurement Axioms Using Synthetic Likelihood
ERIC Educational Resources Information Center
Karabatsos, George
2017-01-01
This article introduces a Bayesian method for testing the axioms of additive conjoint measurement. The method is based on an importance sampling algorithm that performs likelihood-free, approximate Bayesian inference using a synthetic likelihood to overcome the analytical intractability of this testing problem. This new method improves upon…
Jeon, Jihyoun; Hsu, Li; Gorfine, Malka
2012-07-01
Frailty models are useful for measuring unobserved heterogeneity in risk of failures across clusters, providing cluster-specific risk prediction. In a frailty model, the latent frailties shared by members within a cluster are assumed to act multiplicatively on the hazard function. In order to obtain parameter and frailty variate estimates, we consider the hierarchical likelihood (H-likelihood) approach (Ha, Lee and Song, 2001. Hierarchical-likelihood approach for frailty models. Biometrika 88, 233-243) in which the latent frailties are treated as "parameters" and estimated jointly with other parameters of interest. We find that the H-likelihood estimators perform well when the censoring rate is low, however, they are substantially biased when the censoring rate is moderate to high. In this paper, we propose a simple and easy-to-implement bias correction method for the H-likelihood estimators under a shared frailty model. We also extend the method to a multivariate frailty model, which incorporates complex dependence structure within clusters. We conduct an extensive simulation study and show that the proposed approach performs very well for censoring rates as high as 80%. We also illustrate the method with a breast cancer data set. Since the H-likelihood is the same as the penalized likelihood function, the proposed bias correction method is also applicable to the penalized likelihood estimators.
ERIC Educational Resources Information Center
Mittal, Surabhi; Mehar, Mamta
2016-01-01
Purpose: The paper analyzes factors that affect the likelihood of adoption of different agriculture-related information sources by farmers. Design/Methodology/Approach: The paper links the theoretical understanding of the existing multiple sources of information that farmers use, with the empirical model to analyze the factors that affect the…
An Empirical Investigation of Organizational Effectiveness and Performance.
1983-07-29
on rerie side II necessaIr and Identify by block nuIb*’) Organizational Effectiveness Model of Effectiveness Corporate Strategy Organizational...complex and problematic task with little likelihood of reaching a final or even acceptable solution, social science research, never-the-less, should...regulations, interest rates, the weather, availability of supplies and raw materials, changing social values, and other external organizational events
Temsah, Gheda; Johnson, Kiersten; Evans, Thea; Adams, Diane K
2018-01-01
There is an urgent need for an improved empirical understanding of the relationship among biodiverse marine resources, human health and development outcomes. Coral reefs are often at this intersection for developing nations in the tropics-an ecosystem targeted for biodiversity conservation and one that provides sustenance and livelihoods for many coastal communities. To explore these relationships, we use the comparative development contexts of Haiti and the Dominican Republic on the island of Hispaniola. We combine child nutrition data from the Demographic Health Survey with coastal proximity and coral reef habitat diversity, and condition to empirically test human benefits of marine natural resources in differing development contexts. Our results indicate that coastal children have a reduced likelihood of severe stunting in Haiti but have increased likelihoods of stunting and reduced dietary diversity in the Dominican Republic. These contrasting results are likely due to the differential in developed infrastructure and market access. Our analyses did not demonstrate an association between more diverse and less degraded coral reefs and better childhood nutrition. The results highlight the complexities of modelling interactions between the health of humans and natural systems, and indicate the next steps needed to support integrated development programming.
Temsah, Gheda; Johnson, Kiersten; Evans, Thea
2018-01-01
There is an urgent need for an improved empirical understanding of the relationship among biodiverse marine resources, human health and development outcomes. Coral reefs are often at this intersection for developing nations in the tropics—an ecosystem targeted for biodiversity conservation and one that provides sustenance and livelihoods for many coastal communities. To explore these relationships, we use the comparative development contexts of Haiti and the Dominican Republic on the island of Hispaniola. We combine child nutrition data from the Demographic Health Survey with coastal proximity and coral reef habitat diversity, and condition to empirically test human benefits of marine natural resources in differing development contexts. Our results indicate that coastal children have a reduced likelihood of severe stunting in Haiti but have increased likelihoods of stunting and reduced dietary diversity in the Dominican Republic. These contrasting results are likely due to the differential in developed infrastructure and market access. Our analyses did not demonstrate an association between more diverse and less degraded coral reefs and better childhood nutrition. The results highlight the complexities of modelling interactions between the health of humans and natural systems, and indicate the next steps needed to support integrated development programming. PMID:29795591
[Usefulness of sputum Gram staining in community-acquired pneumonia].
Sato, Tadashi; Aoshima, Masahiro; Ohmagari, Norio; Tada, Hiroshi; Chohnabayashi, Naohiko
2002-07-01
To evaluate the usefulness of sputum gram staining in community-acquired pneumonia (CAP), we reviewed 144 cases requiring hospitalization in the last 4 years. The sensitivity was 75.5%, specificity 68.2%, positive predictive value 74.1%, negative predictive value 69.8%, positive likelihood ratio 2.37, negative likelihood ratio 0.36 and accuracy 72.2% in 97 cases. Both sputum gram staining and culture were performed. Concerning bacterial pneumonia (65 cases), we compared the Gram staining group (n = 33), which received initial antibiotic treatment, based on sputum gram staining with the Empiric group (n = 32) that received antibiotics empirically. The success rates of the initial antibiotic treatment were 87.9% vs. 78.1% (P = 0.473); mean hospitalization periods were 9.67 vs. 11.75 days (P = 0.053); and periods of intravenous therapy were 6.73 vs. 7.91 days (P = 0.044), respectively. As for initial treatment, penicillins were used in the Gram staining group more frequently (P < 0.01). We conclude that sputum gram staining is useful for the shortening of the treatment period and the appropriate selection of initial antibiotics in bacterial pneumonia. We believe, therefore, that sputum gram staining is indispensable as a diagnostic tool CAP.
Exoplanet Biosignatures: Future Directions
Bains, William; Cronin, Leroy; DasSarma, Shiladitya; Danielache, Sebastian; Domagal-Goldman, Shawn; Kacar, Betul; Kiang, Nancy Y.; Lenardic, Adrian; Reinhard, Christopher T.; Moore, William; Schwieterman, Edward W.; Shkolnik, Evgenya L.; Smith, Harrison B.
2018-01-01
Abstract We introduce a Bayesian method for guiding future directions for detection of life on exoplanets. We describe empirical and theoretical work necessary to place constraints on the relevant likelihoods, including those emerging from better understanding stellar environment, planetary climate and geophysics, geochemical cycling, the universalities of physics and chemistry, the contingencies of evolutionary history, the properties of life as an emergent complex system, and the mechanisms driving the emergence of life. We provide examples for how the Bayesian formalism could guide future search strategies, including determining observations to prioritize or deciding between targeted searches or larger lower resolution surveys to generate ensemble statistics and address how a Bayesian methodology could constrain the prior probability of life with or without a positive detection. Key Words: Exoplanets—Biosignatures—Life detection—Bayesian analysis. Astrobiology 18, 779–824. PMID:29938538
Exoplanet Biosignatures: Future Directions.
Walker, Sara I; Bains, William; Cronin, Leroy; DasSarma, Shiladitya; Danielache, Sebastian; Domagal-Goldman, Shawn; Kacar, Betul; Kiang, Nancy Y; Lenardic, Adrian; Reinhard, Christopher T; Moore, William; Schwieterman, Edward W; Shkolnik, Evgenya L; Smith, Harrison B
2018-06-01
We introduce a Bayesian method for guiding future directions for detection of life on exoplanets. We describe empirical and theoretical work necessary to place constraints on the relevant likelihoods, including those emerging from better understanding stellar environment, planetary climate and geophysics, geochemical cycling, the universalities of physics and chemistry, the contingencies of evolutionary history, the properties of life as an emergent complex system, and the mechanisms driving the emergence of life. We provide examples for how the Bayesian formalism could guide future search strategies, including determining observations to prioritize or deciding between targeted searches or larger lower resolution surveys to generate ensemble statistics and address how a Bayesian methodology could constrain the prior probability of life with or without a positive detection. Key Words: Exoplanets-Biosignatures-Life detection-Bayesian analysis. Astrobiology 18, 779-824.
Occupancy Modeling Species-Environment Relationships with Non-ignorable Survey Designs.
Irvine, Kathryn M; Rodhouse, Thomas J; Wright, Wilson J; Olsen, Anthony R
2018-05-26
Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses. The analyst assumes data are representative of the desired population and statistical modeling proceeds. However, if datasets from probability and non-probability surveys are combined or unequal selection probabilities are used, the design may be non ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by properly weighting the pseudo-likelihood equation. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations we assessed whether our approach is viable for modeling datasets composed of sites contributed outside of a probability design Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occu31 pancy estimates for some bat species. Using simulations we show the maximum likelihood estimator of species-environment relationships with non-ignorable sampling designs was biased, whereas the pseudo-likelihood estimator was design-unbiased. However, in our simulation study the designs composed of a large proportion of legacy or non-probability sites resulted in estimation issues for standard errors. These issues were likely a result of highly variable weights confounded by small sample sizes (5% or 10% sampling intensity and 4 revisits). Aggregating datasets from multiple sources logically supports larger sample sizes and potentially increases spatial extents for statistical inferences. Our results suggest that ignoring the mechanism for how locations were selected for data collection (e.g., the sampling design) could result in erroneous model-based conclusions. Therefore, in order to ensure robust and defensible recommendations for evidence-based conservation decision-making, the survey design information in addition to the data themselves must be available for analysts. Details for constructing the weights used in estimation and code for implementation are provided. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
ERIC Educational Resources Information Center
Mahmud, Jumailiyah; Sutikno, Muzayanah; Naga, Dali S.
2016-01-01
The aim of this study is to determine variance difference between maximum likelihood and expected A posteriori estimation methods viewed from number of test items of aptitude test. The variance presents an accuracy generated by both maximum likelihood and Bayes estimation methods. The test consists of three subtests, each with 40 multiple-choice…
New applications of maximum likelihood and Bayesian statistics in macromolecular crystallography.
McCoy, Airlie J
2002-10-01
Maximum likelihood methods are well known to macromolecular crystallographers as the methods of choice for isomorphous phasing and structure refinement. Recently, the use of maximum likelihood and Bayesian statistics has extended to the areas of molecular replacement and density modification, placing these methods on a stronger statistical foundation and making them more accurate and effective.
Jelicić, Helena; Phelps, Erin; Lerner, Richard M
2009-07-01
Developmental science rests on describing, explaining, and optimizing intraindividual changes and, hence, empirically requires longitudinal research. Problems of missing data arise in most longitudinal studies, thus creating challenges for interpreting the substance and structure of intraindividual change. Using a sample of reports of longitudinal studies obtained from three flagship developmental journals-Child Development, Developmental Psychology, and Journal of Research on Adolescence-we examined the number of longitudinal studies reporting missing data and the missing data techniques used. Of the 100 longitudinal studies sampled, 57 either reported having missing data or had discrepancies in sample sizes reported for different analyses. The majority of these studies (82%) used missing data techniques that are statistically problematic (either listwise deletion or pairwise deletion) and not among the methods recommended by statisticians (i.e., the direct maximum likelihood method and the multiple imputation method). Implications of these results for developmental theory and application, and the need for understanding the consequences of using statistically inappropriate missing data techniques with actual longitudinal data sets, are discussed.
Staley, Dennis M.; Negri, Jacquelyn; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2017-01-01
Early warning of post-fire debris-flow occurrence during intense rainfall has traditionally relied upon a library of regionally specific empirical rainfall intensity–duration thresholds. Development of this library and the calculation of rainfall intensity-duration thresholds often require several years of monitoring local rainfall and hydrologic response to rainstorms, a time-consuming approach where results are often only applicable to the specific region where data were collected. Here, we present a new, fully predictive approach that utilizes rainfall, hydrologic response, and readily available geospatial data to predict rainfall intensity–duration thresholds for debris-flow generation in recently burned locations in the western United States. Unlike the traditional approach to defining regional thresholds from historical data, the proposed methodology permits the direct calculation of rainfall intensity–duration thresholds for areas where no such data exist. The thresholds calculated by this method are demonstrated to provide predictions that are of similar accuracy, and in some cases outperform, previously published regional intensity–duration thresholds. The method also provides improved predictions of debris-flow likelihood, which can be incorporated into existing approaches for post-fire debris-flow hazard assessment. Our results also provide guidance for the operational expansion of post-fire debris-flow early warning systems in areas where empirically defined regional rainfall intensity–duration thresholds do not currently exist.
NASA Astrophysics Data System (ADS)
Staley, Dennis M.; Negri, Jacquelyn A.; Kean, Jason W.; Laber, Jayme L.; Tillery, Anne C.; Youberg, Ann M.
2017-02-01
Early warning of post-fire debris-flow occurrence during intense rainfall has traditionally relied upon a library of regionally specific empirical rainfall intensity-duration thresholds. Development of this library and the calculation of rainfall intensity-duration thresholds often require several years of monitoring local rainfall and hydrologic response to rainstorms, a time-consuming approach where results are often only applicable to the specific region where data were collected. Here, we present a new, fully predictive approach that utilizes rainfall, hydrologic response, and readily available geospatial data to predict rainfall intensity-duration thresholds for debris-flow generation in recently burned locations in the western United States. Unlike the traditional approach to defining regional thresholds from historical data, the proposed methodology permits the direct calculation of rainfall intensity-duration thresholds for areas where no such data exist. The thresholds calculated by this method are demonstrated to provide predictions that are of similar accuracy, and in some cases outperform, previously published regional intensity-duration thresholds. The method also provides improved predictions of debris-flow likelihood, which can be incorporated into existing approaches for post-fire debris-flow hazard assessment. Our results also provide guidance for the operational expansion of post-fire debris-flow early warning systems in areas where empirically defined regional rainfall intensity-duration thresholds do not currently exist.
ExGUtils: A Python Package for Statistical Analysis With the ex-Gaussian Probability Density.
Moret-Tatay, Carmen; Gamermann, Daniel; Navarro-Pardo, Esperanza; Fernández de Córdoba Castellá, Pedro
2018-01-01
The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are often modeled through the ex-Gaussian distribution, because it provides a good fit to multiple empirical data. The complexity of this distribution makes the use of computational tools an essential element. Therefore, there is a strong need for efficient and versatile computational tools for the research in this area. In this manuscript we discuss some mathematical details of the ex-Gaussian distribution and apply the ExGUtils package, a set of functions and numerical tools, programmed for python, developed for numerical analysis of data involving the ex-Gaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages and differences between the least squares and maximum likelihood methods and quantitatively evaluate the goodness of the obtained fits (which is usually an overlooked point in most literature in the area). The analysis done allows one to identify outliers in the empirical datasets and criteriously determine if there is a need for data trimming and at which points it should be done.
ExGUtils: A Python Package for Statistical Analysis With the ex-Gaussian Probability Density
Moret-Tatay, Carmen; Gamermann, Daniel; Navarro-Pardo, Esperanza; Fernández de Córdoba Castellá, Pedro
2018-01-01
The study of reaction times and their underlying cognitive processes is an important field in Psychology. Reaction times are often modeled through the ex-Gaussian distribution, because it provides a good fit to multiple empirical data. The complexity of this distribution makes the use of computational tools an essential element. Therefore, there is a strong need for efficient and versatile computational tools for the research in this area. In this manuscript we discuss some mathematical details of the ex-Gaussian distribution and apply the ExGUtils package, a set of functions and numerical tools, programmed for python, developed for numerical analysis of data involving the ex-Gaussian probability density. In order to validate the package, we present an extensive analysis of fits obtained with it, discuss advantages and differences between the least squares and maximum likelihood methods and quantitatively evaluate the goodness of the obtained fits (which is usually an overlooked point in most literature in the area). The analysis done allows one to identify outliers in the empirical datasets and criteriously determine if there is a need for data trimming and at which points it should be done. PMID:29765345
Measuring coherence of computer-assisted likelihood ratio methods.
Haraksim, Rudolf; Ramos, Daniel; Meuwly, Didier; Berger, Charles E H
2015-04-01
Measuring the performance of forensic evaluation methods that compute likelihood ratios (LRs) is relevant for both the development and the validation of such methods. A framework of performance characteristics categorized as primary and secondary is introduced in this study to help achieve such development and validation. Ground-truth labelled fingerprint data is used to assess the performance of an example likelihood ratio method in terms of those performance characteristics. Discrimination, calibration, and especially the coherence of this LR method are assessed as a function of the quantity and quality of the trace fingerprint specimen. Assessment of the coherence revealed a weakness of the comparison algorithm in the computer-assisted likelihood ratio method used. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
High throughput nonparametric probability density estimation.
Farmer, Jenny; Jacobs, Donald
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.
High throughput nonparametric probability density estimation
Farmer, Jenny
2018-01-01
In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803
Zero-inflated Poisson model based likelihood ratio test for drug safety signal detection.
Huang, Lan; Zheng, Dan; Zalkikar, Jyoti; Tiwari, Ram
2017-02-01
In recent decades, numerous methods have been developed for data mining of large drug safety databases, such as Food and Drug Administration's (FDA's) Adverse Event Reporting System, where data matrices are formed by drugs such as columns and adverse events as rows. Often, a large number of cells in these data matrices have zero cell counts and some of them are "true zeros" indicating that the drug-adverse event pairs cannot occur, and these zero counts are distinguished from the other zero counts that are modeled zero counts and simply indicate that the drug-adverse event pairs have not occurred yet or have not been reported yet. In this paper, a zero-inflated Poisson model based likelihood ratio test method is proposed to identify drug-adverse event pairs that have disproportionately high reporting rates, which are also called signals. The maximum likelihood estimates of the model parameters of zero-inflated Poisson model based likelihood ratio test are obtained using the expectation and maximization algorithm. The zero-inflated Poisson model based likelihood ratio test is also modified to handle the stratified analyses for binary and categorical covariates (e.g. gender and age) in the data. The proposed zero-inflated Poisson model based likelihood ratio test method is shown to asymptotically control the type I error and false discovery rate, and its finite sample performance for signal detection is evaluated through a simulation study. The simulation results show that the zero-inflated Poisson model based likelihood ratio test method performs similar to Poisson model based likelihood ratio test method when the estimated percentage of true zeros in the database is small. Both the zero-inflated Poisson model based likelihood ratio test and likelihood ratio test methods are applied to six selected drugs, from the 2006 to 2011 Adverse Event Reporting System database, with varying percentages of observed zero-count cells.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hill, J.R.; Heger, A.S.; Koen, B.V.
1984-04-01
This report is the result of a preliminary feasibility study of the applicability of Stein and related parametric empirical Bayes (PEB) estimators to the Nuclear Plant Reliability Data System (NPRDS). A new estimator is derived for the means of several independent Poisson distributions with different sampling times. This estimator is applied to data from NPRDS in an attempt to improve failure rate estimation. Theoretical and Monte Carlo results indicate that the new PEB estimator can perform significantly better than the standard maximum likelihood estimator if the estimation of the individual means can be combined through the loss function or throughmore » a parametric class of prior distributions.« less
Estimating parameter of Rayleigh distribution by using Maximum Likelihood method and Bayes method
NASA Astrophysics Data System (ADS)
Ardianti, Fitri; Sutarman
2018-01-01
In this paper, we use Maximum Likelihood estimation and Bayes method under some risk function to estimate parameter of Rayleigh distribution to know the best method. The prior knowledge which used in Bayes method is Jeffrey’s non-informative prior. Maximum likelihood estimation and Bayes method under precautionary loss function, entropy loss function, loss function-L 1 will be compared. We compare these methods by bias and MSE value using R program. After that, the result will be displayed in tables to facilitate the comparisons.
Davidov, Ori; Rosen, Sophia
2011-04-01
In medical studies, endpoints are often measured for each patient longitudinally. The mixed-effects model has been a useful tool for the analysis of such data. There are situations in which the parameters of the model are subject to some restrictions or constraints. For example, in hearing loss studies, we expect hearing to deteriorate with time. This means that hearing thresholds which reflect hearing acuity will, on average, increase over time. Therefore, the regression coefficients associated with the mean effect of time on hearing ability will be constrained. Such constraints should be accounted for in the analysis. We propose maximum likelihood estimation procedures, based on the expectation-conditional maximization either algorithm, to estimate the parameters of the model while accounting for the constraints on them. The proposed methods improve, in terms of mean square error, on the unconstrained estimators. In some settings, the improvement may be substantial. Hypotheses testing procedures that incorporate the constraints are developed. Specifically, likelihood ratio, Wald, and score tests are proposed and investigated. Their empirical significance levels and power are studied using simulations. It is shown that incorporating the constraints improves the mean squared error of the estimates and the power of the tests. These improvements may be substantial. The methodology is used to analyze a hearing loss study.
Comparison of estimators of standard deviation for hydrologic time series
Tasker, Gary D.; Gilroy, Edward J.
1982-01-01
Unbiasing factors as a function of serial correlation, ρ, and sample size, n for the sample standard deviation of a lag one autoregressive model were generated by random number simulation. Monte Carlo experiments were used to compare the performance of several alternative methods for estimating the standard deviation σ of a lag one autoregressive model in terms of bias, root mean square error, probability of underestimation, and expected opportunity design loss. Three methods provided estimates of σ which were much less biased but had greater mean square errors than the usual estimate of σ: s = (1/(n - 1) ∑ (xi −x¯)2)½. The three methods may be briefly characterized as (1) a method using a maximum likelihood estimate of the unbiasing factor, (2) a method using an empirical Bayes estimate of the unbiasing factor, and (3) a robust nonparametric estimate of σ suggested by Quenouille. Because s tends to underestimate σ, its use as an estimate of a model parameter results in a tendency to underdesign. If underdesign losses are considered more serious than overdesign losses, then the choice of one of the less biased methods may be wise.
May, Michael R; Moore, Brian R
2016-11-01
Evolutionary biologists have long been fascinated by the extreme differences in species numbers across branches of the Tree of Life. This has motivated the development of statistical methods for detecting shifts in the rate of lineage diversification across the branches of phylogenic trees. One of the most frequently used methods, MEDUSA, explores a set of diversification-rate models, where each model assigns branches of the phylogeny to a set of diversification-rate categories. Each model is first fit to the data, and the Akaike information criterion (AIC) is then used to identify the optimal diversification model. Surprisingly, the statistical behavior of this popular method is uncharacterized, which is a concern in light of: (1) the poor performance of the AIC as a means of choosing among models in other phylogenetic contexts; (2) the ad hoc algorithm used to visit diversification models, and; (3) errors that we reveal in the likelihood function used to fit diversification models to the phylogenetic data. Here, we perform an extensive simulation study demonstrating that MEDUSA (1) has a high false-discovery rate (on average, spurious diversification-rate shifts are identified [Formula: see text] of the time), and (2) provides biased estimates of diversification-rate parameters. Understanding the statistical behavior of MEDUSA is critical both to empirical researchers-in order to clarify whether these methods can make reliable inferences from empirical datasets-and to theoretical biologists-in order to clarify the specific problems that need to be solved in order to develop more reliable approaches for detecting shifts in the rate of lineage diversification. [Akaike information criterion; extinction; lineage-specific diversification rates; phylogenetic model selection; speciation.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
May, Michael R.; Moore, Brian R.
2016-01-01
Evolutionary biologists have long been fascinated by the extreme differences in species numbers across branches of the Tree of Life. This has motivated the development of statistical methods for detecting shifts in the rate of lineage diversification across the branches of phylogenic trees. One of the most frequently used methods, MEDUSA, explores a set of diversification-rate models, where each model assigns branches of the phylogeny to a set of diversification-rate categories. Each model is first fit to the data, and the Akaike information criterion (AIC) is then used to identify the optimal diversification model. Surprisingly, the statistical behavior of this popular method is uncharacterized, which is a concern in light of: (1) the poor performance of the AIC as a means of choosing among models in other phylogenetic contexts; (2) the ad hoc algorithm used to visit diversification models, and; (3) errors that we reveal in the likelihood function used to fit diversification models to the phylogenetic data. Here, we perform an extensive simulation study demonstrating that MEDUSA (1) has a high false-discovery rate (on average, spurious diversification-rate shifts are identified ≈30% of the time), and (2) provides biased estimates of diversification-rate parameters. Understanding the statistical behavior of MEDUSA is critical both to empirical researchers—in order to clarify whether these methods can make reliable inferences from empirical datasets—and to theoretical biologists—in order to clarify the specific problems that need to be solved in order to develop more reliable approaches for detecting shifts in the rate of lineage diversification. [Akaike information criterion; extinction; lineage-specific diversification rates; phylogenetic model selection; speciation.] PMID:27037081
Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha
2017-01-01
The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ 2 distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Zilberberg, Marya D; Kothari, Smita; Shorr, Andrew F
2009-01-01
Introduction Recent epidemiologic literature indicates that candidal species resistant to azoles are becoming more prevalent in the face of increasing incidence of hospitalizations with candidemia. Echinocandins, a new class of antifungal agents, are effective against resistant candidal species. As delaying appropriate antifungal coverage leads to increased mortality, we evaluated the cost-effectiveness of 100 mg daily empiric micafungin (MIC) vs. 400 mg daily fluconazole (FLU) for suspected intensive care unit-acquired candidemia (ICU-AC) among septic patients. Methods We designed a decision model with inputs from the literature in a hypothetical 1000-patient cohort with suspected ICU-AC treated empirically with either MIC or FLU or no treatment accompanied by a watchful waiting strategy. We examined the differences in the number of survivors, acquisition costs of antifungals, and lifetime costs among survivors in the cohort under each scenario, and calculated cost per quality adjusted life year (QALY). We conducted Monte Carlo simulations and sensitivity analyses to determine the stability of our estimates. Results In the base case analysis, assuming ICU-AC attributable mortality of 0.40 and a 52% relative risk reduction in mortality with appropriate timely therapy, compared with FLU (total deaths 31), treatment with MIC (total deaths 27) would result in four fewer deaths at an incremental cost/death averted of $61,446. Similarly, in reference case, incremental cost-effectiveness of MIC over FLU was $34,734 (95% confidence interval $26,312 to $49,209) per QALY. The estimates were most sensitive to the QALY adjustment factor and the risk of candidemia among septic patients. Conclusions Given the increasing likelihood of azole resistance among candidal isolates, empiric treatment of ICU-AC with 100 mg daily MIC is a cost-effective alternative to FLU. PMID:19545361
NASA Technical Reports Server (NTRS)
Peters, B. C., Jr.; Walker, H. F.
1975-01-01
A general iterative procedure is given for determining the consistent maximum likelihood estimates of normal distributions. In addition, a local maximum of the log-likelihood function, Newtons's method, a method of scoring, and modifications of these procedures are discussed.
The Maximum Likelihood Solution for Inclination-only Data
NASA Astrophysics Data System (ADS)
Arason, P.; Levi, S.
2006-12-01
The arithmetic means of inclination-only data are known to introduce a shallowing bias. Several methods have been proposed to estimate unbiased means of the inclination along with measures of the precision. Most of the inclination-only methods were designed to maximize the likelihood function of the marginal Fisher distribution. However, the exact analytical form of the maximum likelihood function is fairly complicated, and all these methods require various assumptions and approximations that are inappropriate for many data sets. For some steep and dispersed data sets, the estimates provided by these methods are significantly displaced from the peak of the likelihood function to systematically shallower inclinations. The problem in locating the maximum of the likelihood function is partly due to difficulties in accurately evaluating the function for all values of interest. This is because some elements of the log-likelihood function increase exponentially as precision parameters increase, leading to numerical instabilities. In this study we succeeded in analytically cancelling exponential elements from the likelihood function, and we are now able to calculate its value for any location in the parameter space and for any inclination-only data set, with full accuracy. Furtermore, we can now calculate the partial derivatives of the likelihood function with desired accuracy. Locating the maximum likelihood without the assumptions required by previous methods is now straight forward. The information to separate the mean inclination from the precision parameter will be lost for very steep and dispersed data sets. It is worth noting that the likelihood function always has a maximum value. However, for some dispersed and steep data sets with few samples, the likelihood function takes its highest value on the boundary of the parameter space, i.e. at inclinations of +/- 90 degrees, but with relatively well defined dispersion. Our simulations indicate that this occurs quite frequently for certain data sets, and relatively small perturbations in the data will drive the maxima to the boundary. We interpret this to indicate that, for such data sets, the information needed to separate the mean inclination and the precision parameter is permanently lost. To assess the reliability and accuracy of our method we generated large number of random Fisher-distributed data sets and used seven methods to estimate the mean inclination and precision paramenter. These comparisons are described by Levi and Arason at the 2006 AGU Fall meeting. The results of the various methods is very favourable to our new robust maximum likelihood method, which, on average, is the most reliable, and the mean inclination estimates are the least biased toward shallow values. Further information on our inclination-only analysis can be obtained from: http://www.vedur.is/~arason/paleomag
NASA Technical Reports Server (NTRS)
Sielken, R. L., Jr. (Principal Investigator)
1981-01-01
Several methods of estimating individual crop acreages using a mixture of completely identified and partially identified (generic) segments from a single growing year are derived and discussed. A small Monte Carlo study of eight estimators is presented. The relative empirical behavior of these estimators is discussed as are the effects of segment sample size and amount of partial identification. The principle recommendations are (1) to not exclude, but rather incorporate partially identified sample segments into the estimation procedure, (2) try to avoid having a large percentage (say 80%) of only partially identified segments, in the sample, and (3) use the maximum likelihood estimator although the weighted least squares estimator and least squares ratio estimator both perform almost as well. Sets of spring small grains (North Dakota) data were used.
An Optimization-based Framework to Learn Conditional Random Fields for Multi-label Classification
Naeini, Mahdi Pakdaman; Batal, Iyad; Liu, Zitao; Hong, CharmGil; Hauskrecht, Milos
2015-01-01
This paper studies multi-label classification problem in which data instances are associated with multiple, possibly high-dimensional, label vectors. This problem is especially challenging when labels are dependent and one cannot decompose the problem into a set of independent classification problems. To address the problem and properly represent label dependencies we propose and study a pairwise conditional random Field (CRF) model. We develop a new approach for learning the structure and parameters of the CRF from data. The approach maximizes the pseudo likelihood of observed labels and relies on the fast proximal gradient descend for learning the structure and limited memory BFGS for learning the parameters of the model. Empirical results on several datasets show that our approach outperforms several multi-label classification baselines, including recently published state-of-the-art methods. PMID:25927015
ERIC Educational Resources Information Center
Sletten, Mira Aaboen
2010-01-01
The article examines leisure time socializing and the subjective experience of social isolation among Norwegian 13-16-year-olds in poor families. The empirical analyses use data from a representative survey in Norway in 2002 and show the likelihood of participation in leisure time socializing with peers to be lower among 13-16-year-olds in poor…
ERIC Educational Resources Information Center
Coughlin, Kevin B.
2013-01-01
This study is intended to provide researchers with empirically derived guidelines for conducting factor analytic studies in research contexts that include dichotomous and continuous levels of measurement. This study is based on the hypotheses that ordinary least squares (OLS) factor analysis will yield more accurate parameter estimates than…
Estimating Function Approaches for Spatial Point Processes
NASA Astrophysics Data System (ADS)
Deng, Chong
Spatial point pattern data consist of locations of events that are often of interest in biological and ecological studies. Such data are commonly viewed as a realization from a stochastic process called spatial point process. To fit a parametric spatial point process model to such data, likelihood-based methods have been widely studied. However, while maximum likelihood estimation is often too computationally intensive for Cox and cluster processes, pairwise likelihood methods such as composite likelihood, Palm likelihood usually suffer from the loss of information due to the ignorance of correlation among pairs. For many types of correlated data other than spatial point processes, when likelihood-based approaches are not desirable, estimating functions have been widely used for model fitting. In this dissertation, we explore the estimating function approaches for fitting spatial point process models. These approaches, which are based on the asymptotic optimal estimating function theories, can be used to incorporate the correlation among data and yield more efficient estimators. We conducted a series of studies to demonstrate that these estmating function approaches are good alternatives to balance the trade-off between computation complexity and estimating efficiency. First, we propose a new estimating procedure that improves the efficiency of pairwise composite likelihood method in estimating clustering parameters. Our approach combines estimating functions derived from pairwise composite likeli-hood estimation and estimating functions that account for correlations among the pairwise contributions. Our method can be used to fit a variety of parametric spatial point process models and can yield more efficient estimators for the clustering parameters than pairwise composite likelihood estimation. We demonstrate its efficacy through a simulation study and an application to the longleaf pine data. Second, we further explore the quasi-likelihood approach on fitting second-order intensity function of spatial point processes. However, the original second-order quasi-likelihood is barely feasible due to the intense computation and high memory requirement needed to solve a large linear system. Motivated by the existence of geometric regular patterns in the stationary point processes, we find a lower dimension representation of the optimal weight function and propose a reduced second-order quasi-likelihood approach. Through a simulation study, we show that the proposed method not only demonstrates superior performance in fitting the clustering parameter but also merits in the relaxation of the constraint of the tuning parameter, H. Third, we studied the quasi-likelihood type estimating funciton that is optimal in a certain class of first-order estimating functions for estimating the regression parameter in spatial point process models. Then, by using a novel spectral representation, we construct an implementation that is computationally much more efficient and can be applied to more general setup than the original quasi-likelihood method.
New prior sampling methods for nested sampling - Development and testing
NASA Astrophysics Data System (ADS)
Stokes, Barrie; Tuyl, Frank; Hudson, Irene
2017-06-01
Nested Sampling is a powerful algorithm for fitting models to data in the Bayesian setting, introduced by Skilling [1]. The nested sampling algorithm proceeds by carrying out a series of compressive steps, involving successively nested iso-likelihood boundaries, starting with the full prior distribution of the problem parameters. The "central problem" of nested sampling is to draw at each step a sample from the prior distribution whose likelihood is greater than the current likelihood threshold, i.e., a sample falling inside the current likelihood-restricted region. For both flat and informative priors this ultimately requires uniform sampling restricted to the likelihood-restricted region. We present two new methods of carrying out this sampling step, and illustrate their use with the lighthouse problem [2], a bivariate likelihood used by Gregory [3] and a trivariate Gaussian mixture likelihood. All the algorithm development and testing reported here has been done with Mathematica® [4].
NASA Technical Reports Server (NTRS)
Murphy, Patrick Charles
1985-01-01
An algorithm for maximum likelihood (ML) estimation is developed with an efficient method for approximating the sensitivities. The algorithm was developed for airplane parameter estimation problems but is well suited for most nonlinear, multivariable, dynamic systems. The ML algorithm relies on a new optimization method referred to as a modified Newton-Raphson with estimated sensitivities (MNRES). MNRES determines sensitivities by using slope information from local surface approximations of each output variable in parameter space. The fitted surface allows sensitivity information to be updated at each iteration with a significant reduction in computational effort. MNRES determines the sensitivities with less computational effort than using either a finite-difference method or integrating the analytically determined sensitivity equations. MNRES eliminates the need to derive sensitivity equations for each new model, thus eliminating algorithm reformulation with each new model and providing flexibility to use model equations in any format that is convenient. A random search technique for determining the confidence limits of ML parameter estimates is applied to nonlinear estimation problems for airplanes. The confidence intervals obtained by the search are compared with Cramer-Rao (CR) bounds at the same confidence level. It is observed that the degree of nonlinearity in the estimation problem is an important factor in the relationship between CR bounds and the error bounds determined by the search technique. The CR bounds were found to be close to the bounds determined by the search when the degree of nonlinearity was small. Beale's measure of nonlinearity is developed in this study for airplane identification problems; it is used to empirically correct confidence levels for the parameter confidence limits. The primary utility of the measure, however, was found to be in predicting the degree of agreement between Cramer-Rao bounds and search estimates.
Synthesizing Regression Results: A Factored Likelihood Method
ERIC Educational Resources Information Center
Wu, Meng-Jia; Becker, Betsy Jane
2013-01-01
Regression methods are widely used by researchers in many fields, yet methods for synthesizing regression results are scarce. This study proposes using a factored likelihood method, originally developed to handle missing data, to appropriately synthesize regression models involving different predictors. This method uses the correlations reported…
Glavis-Bloom, Justin; Modi, Payal; Nasrin, Sabiha; Rege, Soham; Chu, Chieh; Schmid, Christopher H; Alam, Nur H
2015-01-01
Introduction: Diarrhea remains one of the most common and most deadly conditions affecting children worldwide. Accurately assessing dehydration status is critical to determining treatment course, yet no clinical diagnostic models for dehydration have been empirically derived and validated for use in resource-limited settings. Methods: In the Dehydration: Assessing Kids Accurately (DHAKA) prospective cohort study, a random sample of children under 5 with acute diarrhea was enrolled between February and June 2014 in Bangladesh. Local nurses assessed children for clinical signs of dehydration on arrival, and then serial weights were obtained as subjects were rehydrated. For each child, the percent weight change with rehydration was used to classify subjects with severe dehydration (>9% weight change), some dehydration (3–9%), or no dehydration (<3%). Clinical variables were then entered into logistic regression and recursive partitioning models to develop the DHAKA Dehydration Score and DHAKA Dehydration Tree, respectively. Models were assessed for their accuracy using the area under their receiver operating characteristic curve (AUC) and for their reliability through repeat clinical exams. Bootstrapping was used to internally validate the models. Results: A total of 850 children were enrolled, with 771 included in the final analysis. Of the 771 children included in the analysis, 11% were classified with severe dehydration, 45% with some dehydration, and 44% with no dehydration. Both the DHAKA Dehydration Score and DHAKA Dehydration Tree had significant AUCs of 0.79 (95% CI = 0.74, 0.84) and 0.76 (95% CI = 0.71, 0.80), respectively, for the diagnosis of severe dehydration. Additionally, the DHAKA Dehydration Score and DHAKA Dehydration Tree had significant positive likelihood ratios of 2.0 (95% CI = 1.8, 2.3) and 2.5 (95% CI = 2.1, 2.8), respectively, and significant negative likelihood ratios of 0.23 (95% CI = 0.13, 0.40) and 0.28 (95% CI = 0.18, 0.44), respectively, for the diagnosis of severe dehydration. Both models demonstrated 90% agreement between independent raters and good reproducibility using bootstrapping. Conclusion: This study is the first to empirically derive and internally validate accurate and reliable clinical diagnostic models for dehydration in a resource-limited setting. After external validation, frontline providers may use these new tools to better manage acute diarrhea in children. PMID:26374802
Efficient estimation of Pareto model: Some modified percentile estimators.
Bhatti, Sajjad Haider; Hussain, Shahzad; Ahmad, Tanvir; Aslam, Muhammad; Aftab, Muhammad; Raza, Muhammad Ali
2018-01-01
The article proposes three modified percentile estimators for parameter estimation of the Pareto distribution. These modifications are based on median, geometric mean and expectation of empirical cumulative distribution function of first-order statistic. The proposed modified estimators are compared with traditional percentile estimators through a Monte Carlo simulation for different parameter combinations with varying sample sizes. Performance of different estimators is assessed in terms of total mean square error and total relative deviation. It is determined that modified percentile estimator based on expectation of empirical cumulative distribution function of first-order statistic provides efficient and precise parameter estimates compared to other estimators considered. The simulation results were further confirmed using two real life examples where maximum likelihood and moment estimators were also considered.
Inferring the photometric and size evolution of galaxies from image simulations. I. Method
NASA Astrophysics Data System (ADS)
Carassou, Sébastien; de Lapparent, Valérie; Bertin, Emmanuel; Le Borgne, Damien
2017-09-01
Context. Current constraints on models of galaxy evolution rely on morphometric catalogs extracted from multi-band photometric surveys. However, these catalogs are altered by selection effects that are difficult to model, that correlate in non trivial ways, and that can lead to contradictory predictions if not taken into account carefully. Aims: To address this issue, we have developed a new approach combining parametric Bayesian indirect likelihood (pBIL) techniques and empirical modeling with realistic image simulations that reproduce a large fraction of these selection effects. This allows us to perform a direct comparison between observed and simulated images and to infer robust constraints on model parameters. Methods: We use a semi-empirical forward model to generate a distribution of mock galaxies from a set of physical parameters. These galaxies are passed through an image simulator reproducing the instrumental characteristics of any survey and are then extracted in the same way as the observed data. The discrepancy between the simulated and observed data is quantified, and minimized with a custom sampling process based on adaptive Markov chain Monte Carlo methods. Results: Using synthetic data matching most of the properties of a Canada-France-Hawaii Telescope Legacy Survey Deep field, we demonstrate the robustness and internal consistency of our approach by inferring the parameters governing the size and luminosity functions and their evolutions for different realistic populations of galaxies. We also compare the results of our approach with those obtained from the classical spectral energy distribution fitting and photometric redshift approach. Conclusions: Our pipeline infers efficiently the luminosity and size distribution and evolution parameters with a very limited number of observables (three photometric bands). When compared to SED fitting based on the same set of observables, our method yields results that are more accurate and free from systematic biases.
Ternès, Nils; Rotolo, Federico; Michiels, Stefan
2016-07-10
Correct selection of prognostic biomarkers among multiple candidates is becoming increasingly challenging as the dimensionality of biological data becomes higher. Therefore, minimizing the false discovery rate (FDR) is of primary importance, while a low false negative rate (FNR) is a complementary measure. The lasso is a popular selection method in Cox regression, but its results depend heavily on the penalty parameter λ. Usually, λ is chosen using maximum cross-validated log-likelihood (max-cvl). However, this method has often a very high FDR. We review methods for a more conservative choice of λ. We propose an empirical extension of the cvl by adding a penalization term, which trades off between the goodness-of-fit and the parsimony of the model, leading to the selection of fewer biomarkers and, as we show, to the reduction of the FDR without large increase in FNR. We conducted a simulation study considering null and moderately sparse alternative scenarios and compared our approach with the standard lasso and 10 other competitors: Akaike information criterion (AIC), corrected AIC, Bayesian information criterion (BIC), extended BIC, Hannan and Quinn information criterion (HQIC), risk information criterion (RIC), one-standard-error rule, adaptive lasso, stability selection, and percentile lasso. Our extension achieved the best compromise across all the scenarios between a reduction of the FDR and a limited raise of the FNR, followed by the AIC, the RIC, and the adaptive lasso, which performed well in some settings. We illustrate the methods using gene expression data of 523 breast cancer patients. In conclusion, we propose to apply our extension to the lasso whenever a stringent FDR with a limited FNR is targeted. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Bias Correction for the Maximum Likelihood Estimate of Ability. Research Report. ETS RR-05-15
ERIC Educational Resources Information Center
Zhang, Jinming
2005-01-01
Lord's bias function and the weighted likelihood estimation method are effective in reducing the bias of the maximum likelihood estimate of an examinee's ability under the assumption that the true item parameters are known. This paper presents simulation studies to determine the effectiveness of these two methods in reducing the bias when the item…
Han, Jubong; Lee, K B; Lee, Jong-Man; Park, Tae Soon; Oh, J S; Oh, Pil-Jei
2016-03-01
We discuss a new method to incorporate Type B uncertainty into least-squares procedures. The new method is based on an extension of the likelihood function from which a conventional least-squares function is derived. The extended likelihood function is the product of the original likelihood function with additional PDFs (Probability Density Functions) that characterize the Type B uncertainties. The PDFs are considered to describe one's incomplete knowledge on correction factors being called nuisance parameters. We use the extended likelihood function to make point and interval estimations of parameters in the basically same way as the least-squares function used in the conventional least-squares method is derived. Since the nuisance parameters are not of interest and should be prevented from appearing in the final result, we eliminate such nuisance parameters by using the profile likelihood. As an example, we present a case study for a linear regression analysis with a common component of Type B uncertainty. In this example we compare the analysis results obtained from using our procedure with those from conventional methods. Copyright © 2015. Published by Elsevier Ltd.
Ecological Forecasting in Chesapeake Bay: Using a Mechanistic-Empirical Modelling Approach
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, C. W.; Hood, Raleigh R.; Long, Wen
The Chesapeake Bay Ecological Prediction System (CBEPS) automatically generates daily nowcasts and three-day forecasts of several environmental variables, such as sea-surface temperature and salinity, the concentrations of chlorophyll, nitrate, and dissolved oxygen, and the likelihood of encountering several noxious species, including harmful algal blooms and water-borne pathogens, for the purpose of monitoring the Bay's ecosystem. While the physical and biogeochemical variables are forecast mechanistically using the Regional Ocean Modeling System configured for the Chesapeake Bay, the species predictions are generated using a novel mechanistic empirical approach, whereby real-time output from the coupled physical biogeochemical model drives multivariate empirical habitat modelsmore » of the target species. The predictions, in the form of digital images, are available via the World Wide Web to interested groups to guide recreational, management, and research activities. Though full validation of the integrated forecasts for all species is still a work in progress, we argue that the mechanistic–empirical approach can be used to generate a wide variety of short-term ecological forecasts, and that it can be applied in any marine system where sufficient data exist to develop empirical habitat models. This paper provides an overview of this system, its predictions, and the approach taken.« less
Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les
2008-01-01
To compare three predictive models based on logistic regression to estimate adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models; and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of tests errors. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.
Monaural room acoustic parameters from music and speech.
Kendrick, Paul; Cox, Trevor J; Li, Francis F; Zhang, Yonggang; Chambers, Jonathon A
2008-07-01
This paper compares two methods for extracting room acoustic parameters from reverberated speech and music. An approach which uses statistical machine learning, previously developed for speech, is extended to work with music. For speech, reverberation time estimations are within a perceptual difference limen of the true value. For music, virtually all early decay time estimations are within a difference limen of the true value. The estimation accuracy is not good enough in other cases due to differences between the simulated data set used to develop the empirical model and real rooms. The second method carries out a maximum likelihood estimation on decay phases at the end of notes or speech utterances. This paper extends the method to estimate parameters relating to the balance of early and late energies in the impulse response. For reverberation time and speech, the method provides estimations which are within the perceptual difference limen of the true value. For other parameters such as clarity, the estimations are not sufficiently accurate due to the natural reverberance of the excitation signals. Speech is a better test signal than music because of the greater periods of silence in the signal, although music is needed for low frequency measurement.
A polychromatic adaption of the Beer-Lambert model for spectral decomposition
NASA Astrophysics Data System (ADS)
Sellerer, Thorsten; Ehn, Sebastian; Mechlem, Korbinian; Pfeiffer, Franz; Herzen, Julia; Noël, Peter B.
2017-03-01
We present a semi-empirical forward-model for spectral photon-counting CT which is fully compatible with state-of-the-art maximum-likelihood estimators (MLE) for basis material line integrals. The model relies on a minimum calibration effort to make the method applicable in routine clinical set-ups with the need for periodic re-calibration. In this work we present an experimental verifcation of our proposed method. The proposed method uses an adapted Beer-Lambert model, describing the energy dependent attenuation of a polychromatic x-ray spectrum using additional exponential terms. In an experimental dual-energy photon-counting CT setup based on a CdTe detector, the model demonstrates an accurate prediction of the registered counts for an attenuated polychromatic spectrum. Thereby deviations between model and measurement data lie within the Poisson statistical limit of the performed acquisitions, providing an effectively unbiased forward-model. The experimental data also shows that the model is capable of handling possible spectral distortions introduced by the photon-counting detector and CdTe sensor. The simplicity and high accuracy of the proposed model provides a viable forward-model for MLE-based spectral decomposition methods without the need of costly and time-consuming characterization of the system response.
On the error probability of general tree and trellis codes with applications to sequential decoding
NASA Technical Reports Server (NTRS)
Johannesson, R.
1973-01-01
An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random binary tree codes is derived and shown to be independent of the length of the tree. An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random L-branch binary trellis codes of rate R = 1/n is derived which separates the effects of the tail length T and the memory length M of the code. It is shown that the bound is independent of the length L of the information sequence. This implication is investigated by computer simulations of sequential decoding utilizing the stack algorithm. These simulations confirm the implication and further suggest an empirical formula for the true undetected decoding error probability with sequential decoding.
The simple rules of social contagion.
Hodas, Nathan O; Lerman, Kristina
2014-03-11
It is commonly believed that information spreads between individuals like a pathogen, with each exposure by an informed friend potentially resulting in a naive individual becoming infected. However, empirical studies of social media suggest that individual response to repeated exposure to information is far more complex. As a proxy for intervention experiments, we compare user responses to multiple exposures on two different social media sites, Twitter and Digg. We show that the position of exposing messages on the user-interface strongly affects social contagion. Accounting for this visibility significantly simplifies the dynamics of social contagion. The likelihood an individual will spread information increases monotonically with exposure, while explicit feedback about how many friends have previously spread it increases the likelihood of a response. We provide a framework for unifying information visibility, divided attention, and explicit social feedback to predict the temporal dynamics of user behavior.
The Simple Rules of Social Contagion
NASA Astrophysics Data System (ADS)
Hodas, Nathan O.; Lerman, Kristina
2014-03-01
It is commonly believed that information spreads between individuals like a pathogen, with each exposure by an informed friend potentially resulting in a naive individual becoming infected. However, empirical studies of social media suggest that individual response to repeated exposure to information is far more complex. As a proxy for intervention experiments, we compare user responses to multiple exposures on two different social media sites, Twitter and Digg. We show that the position of exposing messages on the user-interface strongly affects social contagion. Accounting for this visibility significantly simplifies the dynamics of social contagion. The likelihood an individual will spread information increases monotonically with exposure, while explicit feedback about how many friends have previously spread it increases the likelihood of a response. We provide a framework for unifying information visibility, divided attention, and explicit social feedback to predict the temporal dynamics of user behavior.
Smoothing of the bivariate LOD score for non-normal quantitative traits.
Buil, Alfonso; Dyer, Thomas D; Almasy, Laura; Blangero, John
2005-12-30
Variance component analysis provides an efficient method for performing linkage analysis for quantitative traits. However, type I error of variance components-based likelihood ratio testing may be affected when phenotypic data are non-normally distributed (especially with high values of kurtosis). This results in inflated LOD scores when the normality assumption does not hold. Even though different solutions have been proposed to deal with this problem with univariate phenotypes, little work has been done in the multivariate case. We present an empirical approach to adjust the inflated LOD scores obtained from a bivariate phenotype that violates the assumption of normality. Using the Collaborative Study on the Genetics of Alcoholism data available for the Genetic Analysis Workshop 14, we show how bivariate linkage analysis with leptokurtotic traits gives an inflated type I error. We perform a novel correction that achieves acceptable levels of type I error.
Ding, Jieli; Zhou, Haibo; Liu, Yanyan; Cai, Jianwen; Longnecker, Matthew P.
2014-01-01
Motivated by the need from our on-going environmental study in the Norwegian Mother and Child Cohort (MoBa) study, we consider an outcome-dependent sampling (ODS) scheme for failure-time data with censoring. Like the case-cohort design, the ODS design enriches the observed sample by selectively including certain failure subjects. We present an estimated maximum semiparametric empirical likelihood estimation (EMSELE) under the proportional hazards model framework. The asymptotic properties of the proposed estimator were derived. Simulation studies were conducted to evaluate the small-sample performance of our proposed method. Our analyses show that the proposed estimator and design is more efficient than the current default approach and other competing approaches. Applying the proposed approach with the data set from the MoBa study, we found a significant effect of an environmental contaminant on fecundability. PMID:24812419
Joint Modeling Approach for Semicompeting Risks Data with Missing Nonterminal Event Status
Hu, Chen; Tsodikov, Alex
2014-01-01
Semicompeting risks data, where a subject may experience sequential non-terminal and terminal events, and the terminal event may censor the non-terminal event but not vice versa, are widely available in many biomedical studies. We consider the situation when a proportion of subjects’ non-terminal events is missing, such that the observed data become a mixture of “true” semicompeting risks data and partially observed terminal event only data. An illness-death multistate model with proportional hazards assumptions is proposed to study the relationship between non-terminal and terminal events, and provide covariate-specific global and local association measures. Maximum likelihood estimation based on semiparametric regression analysis is used for statistical inference, and asymptotic properties of proposed estimators are studied using empirical process and martingale arguments. We illustrate the proposed method with simulation studies and data analysis of a follicular cell lymphoma study. PMID:24430204
Modeling regional variation in riverine fish biodiversity in the Arkansas-White-Red River basin
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schweizer, Peter E; Jager, Yetta
The patterns of biodiversity in freshwater systems are shaped by biogeography, environmental gradients, and human-induced factors. In this study, we developed empirical models to explain fish species richness in subbasins of the Arkansas White Red River basin as a function of discharge, elevation, climate, land cover, water quality, dams, and longitudinal position. We used information-theoretic criteria to compare generalized linear mixed models and identified well-supported models. Subbasin attributes that were retained as predictors included discharge, elevation, number of downstream dams, percent forest, percent shrubland, nitrate, total phosphorus, and sediment. The random component of our models, which assumed a negative binomialmore » distribution, included spatial correlation within larger river basins and overdispersed residual variance. This study differs from previous biodiversity modeling efforts in several ways. First, obtaining likelihoods for negative binomial mixed models, and thereby avoiding reliance on quasi-likelihoods, has only recently become practical. We found the ranking of models based on these likelihood estimates to be more believable than that produced using quasi-likelihoods. Second, because we had access to a regional-scale watershed model for this river basin, we were able to include model-estimated water quality attributes as predictors. Thus, the resulting models have potential value as tools with which to evaluate the benefits of water quality improvements to fish.« less
Epidemiology of Suicide Attempts among Youth Transitioning to Adulthood.
Thompson, Martie P; Swartout, Kevin
2018-04-01
Suicide is the second leading cause of death for older adolescents and young adults. Although empirical literature has identified important risk factors of suicidal behavior, it is less understood if changes in risk factors correspond with changes in suicide risk. To address this knowledge gap, we assessed if there were different trajectories of suicidal behavior as youth transition into young adulthood and determined what time-varying risk factors predicted these trajectories. This study used four waves of data spanning approximately 13 years from the National Longitudinal Study of Adolescent Health. The sample included 9027 respondents who were 12-18 years old (M = 15.26; SD = 1.76) at Wave 1, 50% male, 17% Hispanic, and 58% White. The results indicated that 93.6% of the sample had a low likelihood for suicide attempts across time, 5.1% had an elevated likelihood of attempting suicide in adolescence but not young adulthood, and 1.3% had an elevated likelihood of attempting suicide during adolescence and adulthood. The likelihood of a suicide attempt corresponded with changes on depression, impulsivity, delinquency, alcohol problems, family and friend suicide history, and experience with partner violence. Determining how suicide risk changes as youth transition into young adulthood and what factors predict these changes can help prevent suicide. Interventions targeting these risk factors could lead to reductions in suicide attempts.
Botticello, Amanda L.; Rohrbach, Tanya; Cobbold, Nicolette
2014-01-01
Purpose There is a need for empirical support of the association between the built environment and disability-related outcomes. This study explores the associations between community and neighborhood land uses and community participation among adults with acquired physical disability. Methods Cross-sectional data from 508 community-living, chronically disabled adults in New Jersey were obtained from among participants in national Spinal Cord Injury Model Systems database. Participants’ residential addresses were geocoded to link individual survey data with Geographic Information Systems (GIS) data on land use and destinations. The influence of residential density, land use mix, destination counts, and open space on four domains of participation were modeled at two geographic scales—the neighborhood (i.e., half mile buffer) and community (i.e., five mile) using multivariate logistic regression. All analyses were adjusted for demographic and impairment-related differences. Results Living in communities with greater land use mix and more destinations was associated with a decreased likelihood of reporting optimum social and physical activity. Conversely, living in neighborhoods with large portions of open space was positively associated with the likelihood of reporting full physical, occupational, and social participation. Conclusions These findings suggest that the overall living conditions of the built environment may be relevant to social inclusion for persons with physical disabilities. PMID:24935467
Halder, Indrani; Yang, Bao-Zhu; Kranzler, Henry R.; Stein, Murray B.; Shriver, Mark D.; Gelernter, Joel
2010-01-01
Variation in individual admixture proportions leads to heterogeneity within populations. Though novel methods and marker panels have been developed to quantify individual admixture, empirical data describing individual admixture distributions are limited. We investigated variation in individual admixture in four US populations [European American (EA), African American (AA) and Hispanics from Connecticut (EC) and California (WC)] assuming three-way intermixture among Europeans, Africans and Indigenous Americans. Admixture estimates were inferred using a panel of 36 microsatellites and 1 SNP, which have significant allele frequency differences between ancestral populations, and by using both a maximum likelihood (ML) based method and a Bayesian method implemented in the program STRUCTURE. Simulation studies showed that estimates obtained with this marker panel are within 96% of expected values. EAs had the lowest non-European admixture with both methods, but showed greater homogeneity with STRUCTURE than with ML. All other samples showed a high degree of variation in admixture estimates with both methods, were highly concordant and showed evidence of admixture stratification. With both methods, AA subjects had 16% European and <10% Indigenous American admixture on average. EC Hispanics had higher mean African admixture and the WC Hispanics higher mean Indigenous American admixture, possibly reflecting their different continental origins. PMID:19572378
Shih, Weichung Joe; Li, Gang; Wang, Yining
2016-03-01
Sample size plays a crucial role in clinical trials. Flexible sample-size designs, as part of the more general category of adaptive designs that utilize interim data, have been a popular topic in recent years. In this paper, we give a comparative review of four related methods for such a design. The likelihood method uses the likelihood ratio test with an adjusted critical value. The weighted method adjusts the test statistic with given weights rather than the critical value. The dual test method requires both the likelihood ratio statistic and the weighted statistic to be greater than the unadjusted critical value. The promising zone approach uses the likelihood ratio statistic with the unadjusted value and other constraints. All four methods preserve the type-I error rate. In this paper we explore their properties and compare their relationships and merits. We show that the sample size rules for the dual test are in conflict with the rules of the promising zone approach. We delineate what is necessary to specify in the study protocol to ensure the validity of the statistical procedure and what can be kept implicit in the protocol so that more flexibility can be attained for confirmatory phase III trials in meeting regulatory requirements. We also prove that under mild conditions, the likelihood ratio test still preserves the type-I error rate when the actual sample size is larger than the re-calculated one. Copyright © 2015 Elsevier Inc. All rights reserved.
Liu, Peigui; Elshall, Ahmed S.; Ye, Ming; ...
2016-02-05
Evaluating marginal likelihood is the most critical and computationally expensive task, when conducting Bayesian model averaging to quantify parametric and model uncertainties. The evaluation is commonly done by using Laplace approximations to evaluate semianalytical expressions of the marginal likelihood or by using Monte Carlo (MC) methods to evaluate arithmetic or harmonic mean of a joint likelihood function. This study introduces a new MC method, i.e., thermodynamic integration, which has not been attempted in environmental modeling. Instead of using samples only from prior parameter space (as in arithmetic mean evaluation) or posterior parameter space (as in harmonic mean evaluation), the thermodynamicmore » integration method uses samples generated gradually from the prior to posterior parameter space. This is done through a path sampling that conducts Markov chain Monte Carlo simulation with different power coefficient values applied to the joint likelihood function. The thermodynamic integration method is evaluated using three analytical functions by comparing the method with two variants of the Laplace approximation method and three MC methods, including the nested sampling method that is recently introduced into environmental modeling. The thermodynamic integration method outperforms the other methods in terms of their accuracy, convergence, and consistency. The thermodynamic integration method is also applied to a synthetic case of groundwater modeling with four alternative models. The application shows that model probabilities obtained using the thermodynamic integration method improves predictive performance of Bayesian model averaging. As a result, the thermodynamic integration method is mathematically rigorous, and its MC implementation is computationally general for a wide range of environmental problems.« less
Model Checking with Multi-Threaded IC3 Portfolios
2015-01-15
different runs varies randomly depending on the thread interleaving. The use of a portfolio of solvers to maximize the likelihood of a quick solution is...empirically show (cf. Sec. 5.2) that the predictions based on this formula have high accuracy. Note that each solver in the portfolio potentially searches...speedup of over 300. We also show that widening the proof search of ic3 by randomizing its SAT solver is not as effective as paral- lelization
Approximated maximum likelihood estimation in multifractal random walks
NASA Astrophysics Data System (ADS)
Løvsletten, O.; Rypdal, M.
2012-04-01
We present an approximated maximum likelihood method for the multifractal random walk processes of [E. Bacry , Phys. Rev. EPLEEE81539-375510.1103/PhysRevE.64.026103 64, 026103 (2001)]. The likelihood is computed using a Laplace approximation and a truncation in the dependency structure for the latent volatility. The procedure is implemented as a package in the r computer language. Its performance is tested on synthetic data and compared to an inference approach based on the generalized method of moments. The method is applied to estimate parameters for various financial stock indices.
Draborg, Eva; Andersen, Christian Kronborg
2006-01-01
Health technology assessment (HTA) has been used as input in decision making worldwide for more than 25 years. However, no uniform definition of HTA or agreement on assessment methods exists, leaving open the question of what influences the choice of assessment methods in HTAs. The objective of this study is to analyze statistically a possible relationship between methods of assessment used in practical HTAs, type of assessed technology, type of assessors, and year of publication. A sample of 433 HTAs published by eleven leading institutions or agencies in nine countries was reviewed and analyzed by multiple logistic regression. The study shows that outsourcing of HTA reports to external partners is associated with a higher likelihood of using assessment methods, such as meta-analysis, surveys, economic evaluations, and randomized controlled trials; and with a lower likelihood of using assessment methods, such as literature reviews and "other methods". The year of publication was statistically related to the inclusion of economic evaluations and shows a decreasing likelihood during the year span. The type of assessed technology was related to economic evaluations with a decreasing likelihood, to surveys, and to "other methods" with a decreasing likelihood when pharmaceuticals were the assessed type of technology. During the period from 1989 to 2002, no major developments in assessment methods used in practical HTAs were shown statistically in a sample of 433 HTAs worldwide. Outsourcing to external assessors has a statistically significant influence on choice of assessment methods.
Goldfarb, Samantha; Tarver, Will L; Sen, Bisakha
2014-01-01
Previous literature has asserted that family meals are a key protective factor for certain adolescent risk behaviors. It is suggested that the frequency of eating with the family is associated with better psychological well-being and a lower risk of substance use and delinquency. However, it is unclear whether there is evidence of causal links between family meals and adolescent health-risk behaviors. The purpose of this article is to review the empirical literature on family meals and adolescent health behaviors and outcomes in the US. A SEARCH WAS CONDUCTED IN FOUR ACADEMIC DATABASES: Social Sciences Full Text, Sociological Abstracts, PsycINFO®, and PubMed/MEDLINE. We included studies that quantitatively estimated the relationship between family meals and health-risk behaviors. Data were extracted on study sample, study design, family meal measurement, outcomes, empirical methods, findings, and major issues. Fourteen studies met the inclusion criteria for the review that measured the relationship between frequent family meals and various risk-behavior outcomes. The outcomes considered by most studies were alcohol use (n=10), tobacco use (n=9), and marijuana use (n=6). Other outcomes included sexual activity (n=2); depression, suicidal ideation, and suicide attempts (n=4); violence and delinquency (n=4); school-related issues (n=2); and well-being (n=5). The associations between family meals and the outcomes of interest were most likely to be statistically significant in unadjusted models or models controlling for basic family characteristics. Associations were less likely to be statistically significant when other measures of family connectedness were included. Relatively few analyses used sophisticated empirical techniques available to control for confounders in secondary data. More research is required to establish whether or not the relationship between family dinners and risky adolescent behaviors is an artifact of underlying confounders. We recommend that researchers make more frequent use of sophisticated methods to reduce the problem of confounders in secondary data, and that the scope of adolescent problem behaviors also be further widened.
A computational approach to compare regression modelling strategies in prediction research.
Pajouheshnia, Romin; Pestman, Wiebe R; Teerenstra, Steven; Groenwold, Rolf H H
2016-08-25
It is often unclear which approach to fit, assess and adjust a model will yield the most accurate prediction model. We present an extension of an approach for comparing modelling strategies in linear regression to the setting of logistic regression and demonstrate its application in clinical prediction research. A framework for comparing logistic regression modelling strategies by their likelihoods was formulated using a wrapper approach. Five different strategies for modelling, including simple shrinkage methods, were compared in four empirical data sets to illustrate the concept of a priori strategy comparison. Simulations were performed in both randomly generated data and empirical data to investigate the influence of data characteristics on strategy performance. We applied the comparison framework in a case study setting. Optimal strategies were selected based on the results of a priori comparisons in a clinical data set and the performance of models built according to each strategy was assessed using the Brier score and calibration plots. The performance of modelling strategies was highly dependent on the characteristics of the development data in both linear and logistic regression settings. A priori comparisons in four empirical data sets found that no strategy consistently outperformed the others. The percentage of times that a model adjustment strategy outperformed a logistic model ranged from 3.9 to 94.9 %, depending on the strategy and data set. However, in our case study setting the a priori selection of optimal methods did not result in detectable improvement in model performance when assessed in an external data set. The performance of prediction modelling strategies is a data-dependent process and can be highly variable between data sets within the same clinical domain. A priori strategy comparison can be used to determine an optimal logistic regression modelling strategy for a given data set before selecting a final modelling approach.
Dahabreh, Issa J; Trikalinos, Thomas A; Lau, Joseph; Schmid, Christopher H
2017-03-01
To compare statistical methods for meta-analysis of sensitivity and specificity of medical tests (e.g., diagnostic or screening tests). We constructed a database of PubMed-indexed meta-analyses of test performance from which 2 × 2 tables for each included study could be extracted. We reanalyzed the data using univariate and bivariate random effects models fit with inverse variance and maximum likelihood methods. Analyses were performed using both normal and binomial likelihoods to describe within-study variability. The bivariate model using the binomial likelihood was also fit using a fully Bayesian approach. We use two worked examples-thoracic computerized tomography to detect aortic injury and rapid prescreening of Papanicolaou smears to detect cytological abnormalities-to highlight that different meta-analysis approaches can produce different results. We also present results from reanalysis of 308 meta-analyses of sensitivity and specificity. Models using the normal approximation produced sensitivity and specificity estimates closer to 50% and smaller standard errors compared to models using the binomial likelihood; absolute differences of 5% or greater were observed in 12% and 5% of meta-analyses for sensitivity and specificity, respectively. Results from univariate and bivariate random effects models were similar, regardless of estimation method. Maximum likelihood and Bayesian methods produced almost identical summary estimates under the bivariate model; however, Bayesian analyses indicated greater uncertainty around those estimates. Bivariate models produced imprecise estimates of the between-study correlation of sensitivity and specificity. Differences between methods were larger with increasing proportion of studies that were small or required a continuity correction. The binomial likelihood should be used to model within-study variability. Univariate and bivariate models give similar estimates of the marginal distributions for sensitivity and specificity. Bayesian methods fully quantify uncertainty and their ability to incorporate external evidence may be useful for imprecisely estimated parameters. Copyright © 2017 Elsevier Inc. All rights reserved.
Huang, Chiung-Yu; Qin, Jing
2013-01-01
The Canadian Study of Health and Aging (CSHA) employed a prevalent cohort design to study survival after onset of dementia, where patients with dementia were sampled and the onset time of dementia was determined retrospectively. The prevalent cohort sampling scheme favors individuals who survive longer. Thus, the observed survival times are subject to length bias. In recent years, there has been a rising interest in developing estimation procedures for prevalent cohort survival data that not only account for length bias but also actually exploit the incidence distribution of the disease to improve efficiency. This article considers semiparametric estimation of the Cox model for the time from dementia onset to death under a stationarity assumption with respect to the disease incidence. Under the stationarity condition, the semiparametric maximum likelihood estimation is expected to be fully efficient yet difficult to perform for statistical practitioners, as the likelihood depends on the baseline hazard function in a complicated way. Moreover, the asymptotic properties of the semiparametric maximum likelihood estimator are not well-studied. Motivated by the composite likelihood method (Besag 1974), we develop a composite partial likelihood method that retains the simplicity of the popular partial likelihood estimator and can be easily performed using standard statistical software. When applied to the CSHA data, the proposed method estimates a significant difference in survival between the vascular dementia group and the possible Alzheimer’s disease group, while the partial likelihood method for left-truncated and right-censored data yields a greater standard error and a 95% confidence interval covering 0, thus highlighting the practical value of employing a more efficient methodology. To check the assumption of stable disease for the CSHA data, we also present new graphical and numerical tests in the article. The R code used to obtain the maximum composite partial likelihood estimator for the CSHA data is available in the online Supplementary Material, posted on the journal web site. PMID:24000265
Gomez Penedo, Juan Martin; Constantino, Michael J; Coyne, Alice E; Bernecker, Samantha L; Smith-Hansen, Lotte
2018-01-19
We tested an aptitude by treatment interaction; namely, whether patients' baseline interpersonal problems moderated the comparative efficacy of cognitive-behavioral therapy (CBT) vs. interpersonal psychotherapy (IPT) for bulimia nervosa (BN). Data derived from a randomized-controlled trial. Patients reported on their interpersonal problems at baseline; purge frequency at baseline, midtreatment, and posttreatment; and global eating disorder severity at baseline and posttreatment. We estimated the rate of change in purge frequency across therapy, and the likelihood of attaining clinically meaningful improvement (recovery) in global eating disorder severity by posttreatment. We then tested the interpersonal problem by treatment interactions as predictors of both outcomes. Patients with more baseline overly communal/friendly problems showed steeper reduction in likelihood of purging when treated with CBT vs. IPT. Patients with more problems of being under communal/cold had similar reductions in likelihood of purging across both treatments. Patients with more baseline problems of being overly agentic were more likely to recover when treated with IPT vs. CBT, whereas patients with more problems of being under agentic were more likely to recover when treated with CBT vs. IPT. Interpersonal problems related to communion and agency may inform treatment fit among two empirically supported therapies for BN.
Maximum-likelihood estimation of parameterized wavefronts from multifocal data
Sakamoto, Julia A.; Barrett, Harrison H.
2012-01-01
A method for determining the pupil phase distribution of an optical system is demonstrated. Coefficients in a wavefront expansion were estimated using likelihood methods, where the data consisted of multiple irradiance patterns near focus. Proof-of-principle results were obtained in both simulation and experiment. Large-aberration wavefronts were handled in the numerical study. Experimentally, we discuss the handling of nuisance parameters. Fisher information matrices, Cramér-Rao bounds, and likelihood surfaces are examined. ML estimates were obtained by simulated annealing to deal with numerous local extrema in the likelihood function. Rapid processing techniques were employed to reduce the computational time. PMID:22772282
Implementation of jump-diffusion algorithms for understanding FLIR scenes
NASA Astrophysics Data System (ADS)
Lanterman, Aaron D.; Miller, Michael I.; Snyder, Donald L.
1995-07-01
Our pattern theoretic approach to the automated understanding of forward-looking infrared (FLIR) images brings the traditionally separate endeavors of detection, tracking, and recognition together into a unified jump-diffusion process. New objects are detected and object types are recognized through discrete jump moves. Between jumps, the location and orientation of objects are estimated via continuous diffusions. An hypothesized scene, simulated from the emissive characteristics of the hypothesized scene elements, is compared with the collected data by a likelihood function based on sensor statistics. This likelihood is combined with a prior distribution defined over the set of possible scenes to form a posterior distribution. The jump-diffusion process empirically generates the posterior distribution. Both the diffusion and jump operations involve the simulation of a scene produced by a hypothesized configuration. Scene simulation is most effectively accomplished by pipelined rendering engines such as silicon graphics. We demonstrate the execution of our algorithm on a silicon graphics onyx/reality engine.
The Simple Rules of Social Contagion
Hodas, Nathan O.; Lerman, Kristina
2014-01-01
It is commonly believed that information spreads between individuals like a pathogen, with each exposure by an informed friend potentially resulting in a naive individual becoming infected. However, empirical studies of social media suggest that individual response to repeated exposure to information is far more complex. As a proxy for intervention experiments, we compare user responses to multiple exposures on two different social media sites, Twitter and Digg. We show that the position of exposing messages on the user-interface strongly affects social contagion. Accounting for this visibility significantly simplifies the dynamics of social contagion. The likelihood an individual will spread information increases monotonically with exposure, while explicit feedback about how many friends have previously spread it increases the likelihood of a response. We provide a framework for unifying information visibility, divided attention, and explicit social feedback to predict the temporal dynamics of user behavior. PMID:24614301
Campos-Filho, N; Franco, E L
1989-02-01
A frequent procedure in matched case-control studies is to report results from the multivariate unmatched analyses if they do not differ substantially from the ones obtained after conditioning on the matching variables. Although conceptually simple, this rule requires that an extensive series of logistic regression models be evaluated by both the conditional and unconditional maximum likelihood methods. Most computer programs for logistic regression employ only one maximum likelihood method, which requires that the analyses be performed in separate steps. This paper describes a Pascal microcomputer (IBM PC) program that performs multiple logistic regression by both maximum likelihood estimation methods, which obviates the need for switching between programs to obtain relative risk estimates from both matched and unmatched analyses. The program calculates most standard statistics and allows factoring of categorical or continuous variables by two distinct methods of contrast. A built-in, descriptive statistics option allows the user to inspect the distribution of cases and controls across categories of any given variable.
Bayesian SEM for Specification Search Problems in Testing Factorial Invariance.
Shi, Dexin; Song, Hairong; Liao, Xiaolan; Terry, Robert; Snyder, Lori A
2017-01-01
Specification search problems refer to two important but under-addressed issues in testing for factorial invariance: how to select proper reference indicators and how to locate specific non-invariant parameters. In this study, we propose a two-step procedure to solve these issues. Step 1 is to identify a proper reference indicator using the Bayesian structural equation modeling approach. An item is selected if it is associated with the highest likelihood to be invariant across groups. Step 2 is to locate specific non-invariant parameters, given that a proper reference indicator has already been selected in Step 1. A series of simulation analyses show that the proposed method performs well under a variety of data conditions, and optimal performance is observed under conditions of large magnitude of non-invariance, low proportion of non-invariance, and large sample sizes. We also provide an empirical example to demonstrate the specific procedures to implement the proposed method in applied research. The importance and influences are discussed regarding the choices of informative priors with zero mean and small variances. Extensions and limitations are also pointed out.
Jongerling, Joran; Laurenceau, Jean-Philippe; Hamaker, Ellen L
2015-01-01
In this article we consider a multilevel first-order autoregressive [AR(1)] model with random intercepts, random autoregression, and random innovation variance (i.e., the level 1 residual variance). Including random innovation variance is an important extension of the multilevel AR(1) model for two reasons. First, between-person differences in innovation variance are important from a substantive point of view, in that they capture differences in sensitivity and/or exposure to unmeasured internal and external factors that influence the process. Second, using simulation methods we show that modeling the innovation variance as fixed across individuals, when it should be modeled as a random effect, leads to biased parameter estimates. Additionally, we use simulation methods to compare maximum likelihood estimation to Bayesian estimation of the multilevel AR(1) model and investigate the trade-off between the number of individuals and the number of time points. We provide an empirical illustration by applying the extended multilevel AR(1) model to daily positive affect ratings from 89 married women over the course of 42 consecutive days.
Fourment, Mathieu; Holmes, Edward C
2014-07-24
Early methods for estimating divergence times from gene sequence data relied on the assumption of a molecular clock. More sophisticated methods were created to model rate variation and used auto-correlation of rates, local clocks, or the so called "uncorrelated relaxed clock" where substitution rates are assumed to be drawn from a parametric distribution. In the case of Bayesian inference methods the impact of the prior on branching times is not clearly understood, and if the amount of data is limited the posterior could be strongly influenced by the prior. We develop a maximum likelihood method--Physher--that uses local or discrete clocks to estimate evolutionary rates and divergence times from heterochronous sequence data. Using two empirical data sets we show that our discrete clock estimates are similar to those obtained by other methods, and that Physher outperformed some methods in the estimation of the root age of an influenza virus data set. A simulation analysis suggests that Physher can outperform a Bayesian method when the real topology contains two long branches below the root node, even when evolution is strongly clock-like. These results suggest it is advisable to use a variety of methods to estimate evolutionary rates and divergence times from heterochronous sequence data. Physher and the associated data sets used here are available online at http://code.google.com/p/physher/.
Gasbarra, Dario; Arjas, Elja; Vehtari, Aki; Slama, Rémy; Keiding, Niels
2015-10-01
This paper was inspired by the studies of Niels Keiding and co-authors on estimating the waiting time-to-pregnancy (TTP) distribution, and in particular on using the current duration design in that context. In this design, a cross-sectional sample of women is collected from those who are currently attempting to become pregnant, and then by recording from each the time she has been attempting. Our aim here is to study the identifiability and the estimation of the waiting time distribution on the basis of current duration data. The main difficulty in this stems from the fact that very short waiting times are only rarely selected into the sample of current durations, and this renders their estimation unstable. We introduce here a Bayesian method for this estimation problem, prove its asymptotic consistency, and compare the method to some variants of the non-parametric maximum likelihood estimators, which have been used previously in this context. The properties of the Bayesian estimation method are studied also empirically, using both simulated data and TTP data on current durations collected by Slama et al. (Hum Reprod 27(5):1489-1498, 2012).
Evaluating healthcare priority setting at the meso level: A thematic review of empirical literature
Waithaka, Dennis; Tsofa, Benjamin; Barasa, Edwine
2018-01-01
Background: Decentralization of health systems has made sub-national/regional healthcare systems the backbone of healthcare delivery. These regions are tasked with the difficult responsibility of determining healthcare priorities and resource allocation amidst scarce resources. We aimed to review empirical literature that evaluated priority setting practice at the meso (sub-national) level of health systems. Methods: We systematically searched PubMed, ScienceDirect and Google scholar databases and supplemented these with manual searching for relevant studies, based on the reference list of selected papers. We only included empirical studies that described and evaluated, or those that only evaluated priority setting practice at the meso-level. A total of 16 papers were identified from LMICs and HICs. We analyzed data from the selected papers by thematic review. Results: Few studies used systematic priority setting processes, and all but one were from HICs. Both formal and informal criteria are used in priority-setting, however, informal criteria appear to be more perverse in LMICs compared to HICs. The priority setting process at the meso-level is a top-down approach with minimal involvement of the community. Accountability for reasonableness was the most common evaluative framework as it was used in 12 of the 16 studies. Efficiency, reallocation of resources and options for service delivery redesign were the most common outcome measures used to evaluate priority setting. Limitations: Our study was limited by the fact that there are very few empirical studies that have evaluated priority setting at the meso-level and there is likelihood that we did not capture all the studies. Conclusions: Improving priority setting practices at the meso level is crucial to strengthening health systems. This can be achieved through incorporating and adapting systematic priority setting processes and frameworks to the context where used, and making considerations of both process and outcome measures during priority setting and resource allocation. PMID:29511741
Age, Loss Minimization, and the Role of Probability for Decision-Making.
Best, Ryan; Freund, Alexandra M
2018-04-05
Older adults are stereotypically considered to be risk averse compared to younger age groups, although meta-analyses on age and the influence of gain/loss framing on risky choices have not found empirical evidence for age differences in risk-taking. The current study extends the investigation of age differences in risk preference by including analyses on the effect of the probability of a risky option on choices in gain versus loss situations. Participants (n = 130 adults aged 19-80 years) chose between a certain option and a risky option of varying probability in gain- and loss-framed gambles with actual monetary outcomes. Only younger adults displayed an overall framing effect. Younger and older adults responded differently to probability fluctuations depending on the framing condition. Older adults were more likely to choose the risky option as the likelihood of avoiding a larger loss increased and as the likelihood of a larger gain decreased. Younger adults responded with the opposite pattern: they were more likely to choose the risky option as the likelihood of a larger gain increased and as the likelihood of avoiding a (slightly) larger loss decreased. Results suggest that older adults are more willing to select a risky option when it increases the likelihood that larger losses be avoided, whereas younger adults are more willing to select a risky option when it allows for slightly larger gains. This finding supports expectations based on theoretical accounts of goal orientation shifting away from securing gains in younger adulthood towards maintenance and avoiding losses in older adulthood. Findings are also discussed in respect to the affective enhancement perspective and socioemotional selectivity theory. © 2018 S. Karger AG, Basel.
Midttun, Linda
2007-03-01
In the aftermath of the Norwegian hospital reform of 2002, the private supply of specialized healthcare has increased substantially. This article analyses the likelihood of medical specialists working in the private sector. Sector choice is operationalized in two ways: first, as the likelihood of medical specialists working in the private sector at all (at least 1% of the total work hours), and second, as the likelihood of working full-time (90-100%) privately. The theoretical framework is embedded in work values theory and the results suggest that work values are important predictors of sector choice. All analyses are based on a postal questionnaire survey of medical specialists working in private contract practices and for-profit hospitals and a control group of specialists selected from the Norwegian Medical Association's member register. The analyses revealed that while autonomy values impact positively on the propensity for allocating any time at all to the private sector, professional values have a negative effect. Given that the medical specialist already works in the private sector, a high valuation of professional values and payment and benefit values increases the likelihood of having a dual sector job rather than a full-time private position. However, due to the cross-sectional structure of the data and limitations in the dataset, causality questions cannot be fully settled on the basis of the analyses. The relationship between work values and sector choice should, therefore, be regarded as associations rather than causality links. Finally, the likelihood of working in the private sector varies significantly at the municipality level, suggesting that medical specialist's location is important for sector choice.
Modeling gene expression measurement error: a quasi-likelihood approach
Strimmer, Korbinian
2003-01-01
Background Using suitable error models for gene expression measurements is essential in the statistical analysis of microarray data. However, the true probabilistic model underlying gene expression intensity readings is generally not known. Instead, in currently used approaches some simple parametric model is assumed (usually a transformed normal distribution) or the empirical distribution is estimated. However, both these strategies may not be optimal for gene expression data, as the non-parametric approach ignores known structural information whereas the fully parametric models run the risk of misspecification. A further related problem is the choice of a suitable scale for the model (e.g. observed vs. log-scale). Results Here a simple semi-parametric model for gene expression measurement error is presented. In this approach inference is based an approximate likelihood function (the extended quasi-likelihood). Only partial knowledge about the unknown true distribution is required to construct this function. In case of gene expression this information is available in the form of the postulated (e.g. quadratic) variance structure of the data. As the quasi-likelihood behaves (almost) like a proper likelihood, it allows for the estimation of calibration and variance parameters, and it is also straightforward to obtain corresponding approximate confidence intervals. Unlike most other frameworks, it also allows analysis on any preferred scale, i.e. both on the original linear scale as well as on a transformed scale. It can also be employed in regression approaches to model systematic (e.g. array or dye) effects. Conclusions The quasi-likelihood framework provides a simple and versatile approach to analyze gene expression data that does not make any strong distributional assumptions about the underlying error model. For several simulated as well as real data sets it provides a better fit to the data than competing models. In an example it also improved the power of tests to identify differential expression. PMID:12659637
2003-04-01
34action orientetion ". T^ks concerned pre-flight safety assessments for military combat aircraft and were performed 1^ Army Cobra aviators. Dependent...evaluations are vital during future assessments of team performance and especially for modeling purposes, as the literature lacks empirical...a similar scale, and then assign probabilities to likelihood’s for these in the future . Once completed, one can multiply expected feature values of
Distribution of lod scores in oligogenic linkage analysis.
Williams, J T; North, K E; Martin, L J; Comuzzie, A G; Göring, H H; Blangero, J
2001-01-01
In variance component oligogenic linkage analysis it can happen that the residual additive genetic variance bounds to zero when estimating the effect of the ith quantitative trait locus. Using quantitative trait Q1 from the Genetic Analysis Workshop 12 simulated general population data, we compare the observed lod scores from oligogenic linkage analysis with the empirical lod score distribution under a null model of no linkage. We find that zero residual additive genetic variance in the null model alters the usual distribution of the likelihood-ratio statistic.
Potential fitting biases resulting from grouping data into variable width bins
NASA Astrophysics Data System (ADS)
Towers, S.
2014-07-01
When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good faith efforts to ensure that their analyses yield unbiased results. However, particle physics experiments are expensive and time consuming to carry out, thus if an analysis has inherent bias (even if unintentional), much money and effort can be wasted trying to replicate or understand the results, particularly if the analysis is fundamental to our understanding of the universe. In this note we discuss the significant biases that can result from data binning schemes. As we will show, if data are binned such that they provide the best comparison to a particular (but incorrect) model, the resulting model parameter estimates when fitting to the binned data can be significantly biased, leading us to too often accept the model hypothesis when it is not in fact true. When using binned likelihood or least squares methods there is of course no a priori requirement that data bin sizes need to be constant, but we show that fitting to data grouped into variable width bins is particularly prone to produce biased results if the bin boundaries are chosen to optimize the comparison of the binned data to a wrong model. The degree of bias that can be achieved simply with variable binning can be surprisingly large. Fitting the data with an unbinned likelihood method, when possible to do so, is the best way for researchers to show that their analyses are not biased by binning effects. Failing that, equal bin widths should be employed as a cross-check of the fitting analysis whenever possible.
A likelihood ratio model for the determination of the geographical origin of olive oil.
Własiuk, Patryk; Martyna, Agnieszka; Zadora, Grzegorz
2015-01-01
Food fraud or food adulteration may be of forensic interest for instance in the case of suspected deliberate mislabeling. On account of its potential health benefits and nutritional qualities, geographical origin determination of olive oil might be of special interest. The use of a likelihood ratio (LR) model has certain advantages in contrast to typical chemometric methods because the LR model takes into account the information about the sample rarity in a relevant population. Such properties are of particular interest to forensic scientists and therefore it has been the aim of this study to examine the issue of olive oil classification with the use of different LR models and their pertinence under selected data pre-processing methods (logarithm based data transformations) and feature selection technique. This was carried out on data describing 572 Italian olive oil samples characterised by the content of 8 fatty acids in the lipid fraction. Three classification problems related to three regions of Italy (South, North and Sardinia) have been considered with the use of LR models. The correct classification rate and empirical cross entropy were taken into account as a measure of performance of each model. The application of LR models in determining the geographical origin of olive oil has proven to be satisfactorily useful for the considered issues analysed in terms of many variants of data pre-processing since the rates of correct classifications were close to 100% and considerable reduction of information loss was observed. The work also presents a comparative study of the performance of the linear discriminant analysis in considered classification problems. An approach to the choice of the value of the smoothing parameter is highlighted for the kernel density estimation based LR models as well. Copyright © 2014 Elsevier B.V. All rights reserved.
Automated model selection in covariance estimation and spatial whitening of MEG and EEG signals.
Engemann, Denis A; Gramfort, Alexandre
2015-03-01
Magnetoencephalography and electroencephalography (M/EEG) measure non-invasively the weak electromagnetic fields induced by post-synaptic neural currents. The estimation of the spatial covariance of the signals recorded on M/EEG sensors is a building block of modern data analysis pipelines. Such covariance estimates are used in brain-computer interfaces (BCI) systems, in nearly all source localization methods for spatial whitening as well as for data covariance estimation in beamformers. The rationale for such models is that the signals can be modeled by a zero mean Gaussian distribution. While maximizing the Gaussian likelihood seems natural, it leads to a covariance estimate known as empirical covariance (EC). It turns out that the EC is a poor estimate of the true covariance when the number of samples is small. To address this issue the estimation needs to be regularized. The most common approach downweights off-diagonal coefficients, while more advanced regularization methods are based on shrinkage techniques or generative models with low rank assumptions: probabilistic PCA (PPCA) and factor analysis (FA). Using cross-validation all of these models can be tuned and compared based on Gaussian likelihood computed on unseen data. We investigated these models on simulations, one electroencephalography (EEG) dataset as well as magnetoencephalography (MEG) datasets from the most common MEG systems. First, our results demonstrate that different models can be the best, depending on the number of samples, heterogeneity of sensor types and noise properties. Second, we show that the models tuned by cross-validation are superior to models with hand-selected regularization. Hence, we propose an automated solution to the often overlooked problem of covariance estimation of M/EEG signals. The relevance of the procedure is demonstrated here for spatial whitening and source localization of MEG signals. Copyright © 2015 Elsevier Inc. All rights reserved.
The Equivalence of Two Methods of Parameter Estimation for the Rasch Model.
ERIC Educational Resources Information Center
Blackwood, Larry G.; Bradley, Edwin L.
1989-01-01
Two methods of estimating parameters in the Rasch model are compared. The equivalence of likelihood estimations from the model of G. J. Mellenbergh and P. Vijn (1981) and from usual unconditional maximum likelihood (UML) estimation is demonstrated. Mellenbergh and Vijn's model is a convenient method of calculating UML estimates. (SLD)
Condition-dependent sex: who does it, when and why?
2016-01-01
We review the phenomenon of condition-dependent sex—where individuals' condition affects the likelihood that they will reproduce sexually rather than asexually. In recent years, condition-dependent sex has been studied both theoretically and empirically. Empirical results in microbes, fungi and plants support the theoretical prediction that negative condition-dependent sex, in which individuals in poor condition are more likely to reproduce sexually, can be evolutionarily advantageous under a wide range of settings. Here, we review the evidence for condition-dependent sex and its potential implications for the long-term survival and adaptability of populations. We conclude by asking why condition-dependent sex is not more commonly observed, and by considering generalizations of condition-dependent sex that might apply even for obligate sexuals. This article is part of the themed issue ‘Weird sex: the underappreciated diversity of sexual reproduction’. PMID:27619702
Simultaneous Control of Error Rates in fMRI Data Analysis
Kang, Hakmook; Blume, Jeffrey; Ombao, Hernando; Badre, David
2015-01-01
The key idea of statistical hypothesis testing is to fix, and thereby control, the Type I error (false positive) rate across samples of any size. Multiple comparisons inflate the global (family-wise) Type I error rate and the traditional solution to maintaining control of the error rate is to increase the local (comparison-wise) Type II error (false negative) rates. However, in the analysis of human brain imaging data, the number of comparisons is so large that this solution breaks down: the local Type II error rate ends up being so large that scientifically meaningful analysis is precluded. Here we propose a novel solution to this problem: allow the Type I error rate to converge to zero along with the Type II error rate. It works because when the Type I error rate per comparison is very small, the accumulation (or global) Type I error rate is also small. This solution is achieved by employing the Likelihood paradigm, which uses likelihood ratios to measure the strength of evidence on a voxel-by-voxel basis. In this paper, we provide theoretical and empirical justification for a likelihood approach to the analysis of human brain imaging data. In addition, we present extensive simulations that show the likelihood approach is viable, leading to ‘cleaner’ looking brain maps and operationally superiority (lower average error rate). Finally, we include a case study on cognitive control related activation in the prefrontal cortex of the human brain. PMID:26272730
Stürmer, Til; Joshi, Manisha; Glynn, Robert J.; Avorn, Jerry; Rothman, Kenneth J.; Schneeweiss, Sebastian
2006-01-01
Objective Propensity score analyses attempt to control for confounding in non-experimental studies by adjusting for the likelihood that a given patient is exposed. Such analyses have been proposed to address confounding by indication, but there is little empirical evidence that they achieve better control than conventional multivariate outcome modeling. Study design and methods Using PubMed and Science Citation Index, we assessed the use of propensity scores over time and critically evaluated studies published through 2003. Results Use of propensity scores increased from a total of 8 papers before 1998 to 71 in 2003. Most of the 177 published studies abstracted assessed medications (N=60) or surgical interventions (N=51), mainly in cardiology and cardiac surgery (N=90). Whether PS methods or conventional outcome models were used to control for confounding had little effect on results in those studies in which such comparison was possible. Only 9 out of 69 studies (13%) had an effect estimate that differed by more than 20% from that obtained with a conventional outcome model in all PS analyses presented. Conclusions Publication of results based on propensity score methods has increased dramatically, but there is little evidence that these methods yield substantially different estimates compared with conventional multivariable methods. PMID:16632131
Methods to estimate the between‐study variance and its uncertainty in meta‐analysis†
Jackson, Dan; Viechtbauer, Wolfgang; Bender, Ralf; Bowden, Jack; Knapp, Guido; Kuss, Oliver; Higgins, Julian PT; Langan, Dean; Salanti, Georgia
2015-01-01
Meta‐analyses are typically used to estimate the overall/mean of an outcome of interest. However, inference about between‐study variability, which is typically modelled using a between‐study variance parameter, is usually an additional aim. The DerSimonian and Laird method, currently widely used by default to estimate the between‐study variance, has been long challenged. Our aim is to identify known methods for estimation of the between‐study variance and its corresponding uncertainty, and to summarise the simulation and empirical evidence that compares them. We identified 16 estimators for the between‐study variance, seven methods to calculate confidence intervals, and several comparative studies. Simulation studies suggest that for both dichotomous and continuous data the estimator proposed by Paule and Mandel and for continuous data the restricted maximum likelihood estimator are better alternatives to estimate the between‐study variance. Based on the scenarios and results presented in the published studies, we recommend the Q‐profile method and the alternative approach based on a ‘generalised Cochran between‐study variance statistic’ to compute corresponding confidence intervals around the resulting estimates. Our recommendations are based on a qualitative evaluation of the existing literature and expert consensus. Evidence‐based recommendations require an extensive simulation study where all methods would be compared under the same scenarios. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons Ltd. PMID:26332144
NASA Technical Reports Server (NTRS)
1979-01-01
The computer program Linear SCIDNT which evaluates rotorcraft stability and control coefficients from flight or wind tunnel test data is described. It implements the maximum likelihood method to maximize the likelihood function of the parameters based on measured input/output time histories. Linear SCIDNT may be applied to systems modeled by linear constant-coefficient differential equations. This restriction in scope allows the application of several analytical results which simplify the computation and improve its efficiency over the general nonlinear case.
Nosocomial pneumonia in the ICU--year 2000 and beyond.
Bowton, D L
1999-03-01
Diagnostic and treatment strategies in ICU patients with ventilator-associated pneumonia (VAP) remain controversial, largely because of the paucity of well-controlled comparison trials using clinically important end points. Recent studies indicating that early appropriate antibiotic therapy significantly lowers mortality underscore the urgent need for well-designed comparative trials. When quantitatively cultured, bronchial specimens obtained by noninvasive techniques may provide clinically useful information and avoid the higher costs and risks of invasive bronchoscopic diagnostic techniques. Previous antibiotic use before onset of nosocomial pneumonia raises the likelihood of infection with highly virulent organisms, such as Pseudomonas aeruginosa and Acinetobacter sp. Thus, the empiric antibiotic regimen should be active against these Gram-negative pathogens as well as other common Gram-negative and Gram-positive causative organisms. Promising preventive modalities for nosocomial VAP include use of a semirecumbent position, endotracheal tubes that allow continuous aspiration of secretions, and heat and moisture exchangers. Rotating their standard empiric antibiotic regimens and restricting the use of third-generation cephalosporins as empiric therapy may help hospitals reduce the incidence of nosocomial pneumonia caused by resistant Gram-negative pathogens.
Unified framework to evaluate panmixia and migration direction among multiple sampling locations.
Beerli, Peter; Palczewski, Michal
2010-05-01
For many biological investigations, groups of individuals are genetically sampled from several geographic locations. These sampling locations often do not reflect the genetic population structure. We describe a framework using marginal likelihoods to compare and order structured population models, such as testing whether the sampling locations belong to the same randomly mating population or comparing unidirectional and multidirectional gene flow models. In the context of inferences employing Markov chain Monte Carlo methods, the accuracy of the marginal likelihoods depends heavily on the approximation method used to calculate the marginal likelihood. Two methods, modified thermodynamic integration and a stabilized harmonic mean estimator, are compared. With finite Markov chain Monte Carlo run lengths, the harmonic mean estimator may not be consistent. Thermodynamic integration, in contrast, delivers considerably better estimates of the marginal likelihood. The choice of prior distributions does not influence the order and choice of the better models when the marginal likelihood is estimated using thermodynamic integration, whereas with the harmonic mean estimator the influence of the prior is pronounced and the order of the models changes. The approximation of marginal likelihood using thermodynamic integration in MIGRATE allows the evaluation of complex population genetic models, not only of whether sampling locations belong to a single panmictic population, but also of competing complex structured population models.
Likelihood-based methods for evaluating principal surrogacy in augmented vaccine trials.
Liu, Wei; Zhang, Bo; Zhang, Hui; Zhang, Zhiwei
2017-04-01
There is growing interest in assessing immune biomarkers, which are quick to measure and potentially predictive of long-term efficacy, as surrogate endpoints in randomized, placebo-controlled vaccine trials. This can be done under a principal stratification approach, with principal strata defined using a subject's potential immune responses to vaccine and placebo (the latter may be assumed to be zero). In this context, principal surrogacy refers to the extent to which vaccine efficacy varies across principal strata. Because a placebo recipient's potential immune response to vaccine is unobserved in a standard vaccine trial, augmented vaccine trials have been proposed to produce the information needed to evaluate principal surrogacy. This article reviews existing methods based on an estimated likelihood and a pseudo-score (PS) and proposes two new methods based on a semiparametric likelihood (SL) and a pseudo-likelihood (PL), for analyzing augmented vaccine trials. Unlike the PS method, the SL method does not require a model for missingness, which can be advantageous when immune response data are missing by happenstance. The SL method is shown to be asymptotically efficient, and it performs similarly to the PS and PL methods in simulation experiments. The PL method appears to have a computational advantage over the PS and SL methods.
Handwriting individualization using distance and rarity
NASA Astrophysics Data System (ADS)
Tang, Yi; Srihari, Sargur; Srinivasan, Harish
2012-01-01
Forensic individualization is the task of associating observed evidence with a specific source. The likelihood ratio (LR) is a quantitative measure that expresses the degree of uncertainty in individualization, where the numerator represents the likelihood that the evidence corresponds to the known and the denominator the likelihood that it does not correspond to the known. Since the number of parameters needed to compute the LR is exponential with the number of feature measurements, a commonly used simplification is the use of likelihoods based on distance (or similarity) given the two alternative hypotheses. This paper proposes an intermediate method which decomposes the LR as the product of two factors, one based on distance and the other on rarity. It was evaluated using a data set of handwriting samples, by determining whether two writing samples were written by the same/different writer(s). The accuracy of the distance and rarity method, as measured by error rates, is significantly better than the distance method.
Rasch, Elizabeth K.; Huynh, Minh; Ho, Pei-Shu; Heuser, Aaron; Houtenville, Andrew; Chan, Leighton
2014-01-01
Background: Given the complexity of the adjudication process and volume of applications to Social Security Administration’s (SSA) disability programs, many individuals with serious medical conditions die while awaiting an application decision. Limitations of traditional survival methods called for a new empirical approach to identify conditions resulting in rapid mortality. Objective: To identify health conditions associated with significantly higher mortality than a key reference group among applicants for SSA disability programs. Research design: We identified mortality patterns and generated a survival surface for a reference group using conditions already designated for expedited processing. We identified conditions associated with significantly higher mortality than the reference group and prioritized them by the expected likelihood of death during the adjudication process. Subjects: Administrative records of 29 million Social Security disability applicants, who applied for benefits from 1996 – 2007, were analyzed. Measures: We computed survival spells from time of onset of disability to death, and from date of application to death. Survival data were organized by entry cohort. Results: In our sample, we observed that approximately 42,000 applicants died before a decision was made on their disability claims. We identified 24 conditions with survival profiles comparable to the reference group. Applicants with these conditions were not likely to survive adjudication. Conclusions: Our approach facilitates ongoing revision of the conditions SSA designates for expedited awards and has applicability to other programs where survival profiles are a consideration. PMID:25310524
2009-01-01
Background In 1999 a four-level hierarchy of evidence was promoted by the National Health and Medical Research Council in Australia. The primary purpose of this hierarchy was to assist with clinical practice guideline development, although it was co-opted for use in systematic literature reviews and health technology assessments. In this hierarchy interventional study designs were ranked according to the likelihood that bias had been eliminated and thus it was not ideal to assess studies that addressed other types of clinical questions. This paper reports on the revision and extension of this evidence hierarchy to enable broader use within existing evidence assessment systems. Methods A working party identified and assessed empirical evidence, and used a commissioned review of existing evidence assessment schema, to support decision-making regarding revision of the hierarchy. The aim was to retain the existing evidence levels I-IV but increase their relevance for assessing the quality of individual diagnostic accuracy, prognostic, aetiologic and screening studies. Comprehensive public consultation was undertaken and the revised hierarchy was piloted by individual health technology assessment agencies and clinical practice guideline developers. After two and a half years, the hierarchy was again revised and commenced a further 18 month pilot period. Results A suitable framework was identified upon which to model the revision. Consistency was maintained in the hierarchy of "levels of evidence" across all types of clinical questions; empirical evidence was used to support the relationship between study design and ranking in the hierarchy wherever possible; and systematic reviews of lower level studies were themselves ascribed a ranking. The impact of ethics on the hierarchy of study designs was acknowledged in the framework, along with a consideration of how harms should be assessed. Conclusion The revised evidence hierarchy is now widely used and provides a common standard against which to initially judge the likelihood of bias in individual studies evaluating interventional, diagnostic accuracy, prognostic, aetiologic or screening topics. Detailed quality appraisal of these individual studies, as well as grading of the body of evidence to answer each clinical, research or policy question, can then be undertaken as required. PMID:19519887
Parameter Estimation for Thurstone Choice Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vojnovic, Milan; Yun, Seyoung
We consider the estimation accuracy of individual strength parameters of a Thurstone choice model when each input observation consists of a choice of one item from a set of two or more items (so called top-1 lists). This model accommodates the well-known choice models such as the Luce choice model for comparison sets of two or more items and the Bradley-Terry model for pair comparisons. We provide a tight characterization of the mean squared error of the maximum likelihood parameter estimator. We also provide similar characterizations for parameter estimators defined by a rank-breaking method, which amounts to deducing one ormore » more pair comparisons from a comparison of two or more items, assuming independence of these pair comparisons, and maximizing a likelihood function derived under these assumptions. We also consider a related binary classification problem where each individual parameter takes value from a set of two possible values and the goal is to correctly classify all items within a prescribed classification error. The results of this paper shed light on how the parameter estimation accuracy depends on given Thurstone choice model and the structure of comparison sets. In particular, we found that for unbiased input comparison sets of a given cardinality, when in expectation each comparison set of given cardinality occurs the same number of times, for a broad class of Thurstone choice models, the mean squared error decreases with the cardinality of comparison sets, but only marginally according to a diminishing returns relation. On the other hand, we found that there exist Thurstone choice models for which the mean squared error of the maximum likelihood parameter estimator can decrease much faster with the cardinality of comparison sets. We report empirical evaluation of some claims and key parameters revealed by theory using both synthetic and real-world input data from some popular sport competitions and online labor platforms.« less
Reconstruction of far-field tsunami amplitude distributions from earthquake sources
Geist, Eric L.; Parsons, Thomas E.
2016-01-01
The probability distribution of far-field tsunami amplitudes is explained in relation to the distribution of seismic moment at subduction zones. Tsunami amplitude distributions at tide gauge stations follow a similar functional form, well described by a tapered Pareto distribution that is parameterized by a power-law exponent and a corner amplitude. Distribution parameters are first established for eight tide gauge stations in the Pacific, using maximum likelihood estimation. A procedure is then developed to reconstruct the tsunami amplitude distribution that consists of four steps: (1) define the distribution of seismic moment at subduction zones; (2) establish a source-station scaling relation from regression analysis; (3) transform the seismic moment distribution to a tsunami amplitude distribution for each subduction zone; and (4) mix the transformed distribution for all subduction zones to an aggregate tsunami amplitude distribution specific to the tide gauge station. The tsunami amplitude distribution is adequately reconstructed for four tide gauge stations using globally constant seismic moment distribution parameters established in previous studies. In comparisons to empirical tsunami amplitude distributions from maximum likelihood estimation, the reconstructed distributions consistently exhibit higher corner amplitude values, implying that in most cases, the empirical catalogs are too short to include the largest amplitudes. Because the reconstructed distribution is based on a catalog of earthquakes that is much larger than the tsunami catalog, it is less susceptible to the effects of record-breaking events and more indicative of the actual distribution of tsunami amplitudes.
Mercer, C H; Jones, K G; Johnson, A M; Lewis, R; Mitchell, K R; Clifton, S; Tanton, C; Sonnenberg, P; Wellings, K; Cassell, J A; Estcourt, C S
2017-01-01
Background Partnership type is a determinant of STI risk; yet, it is poorly and inconsistently recorded in clinical practice and research. We identify a novel, empirical-based categorisation of partnership type, and examine whether reporting STI diagnoses varies by the resulting typologies. Methods Analyses of probability survey data collected from 15 162 people aged 16–74 who participated in Britain's third National Survey of Sexual Attitudes and Lifestyles were undertaken during 2010–2012. Computer-assisted self-interviews asked about participants' ≤3 most recent partners (N=14 322 partners/past year). Analysis of variance and regression tested for differences in partnership duration and perceived likelihood of sex again across 21 ‘partnership progression types’ (PPTs) derived from relationship status at first and most recent sex. Multivariable regression examined the association between reporting STI diagnoses and partnership type(s) net of age and reported partner numbers (all past year). Results The 21 PPTs were grouped into four summary types: ‘cohabiting’, ‘now steady’, ‘casual’ and ‘ex-steady’ according to the average duration and likelihood of sex again. 11 combinations of these summary types accounted for 94.5% of all men; 13 combinations accounted for 96.9% of all women. Reporting STI diagnoses varied by partnership-type combination, including after adjusting for age and partner numbers, for example, adjusted OR: 6.03 (95% CI 2.01 to 18.1) for men with two ‘casual’ and one ‘now steady’ partners versus men with one ‘cohabiting’ partner. Conclusions This typology provides an objective method for measuring partnership type and demonstrates its importance in understanding STI risk, net of partner numbers. Epidemiological research and clinical practice should use these methods and results to maximise individual and public health benefit. PMID:27535765
Maximum-likelihood methods in wavefront sensing: stochastic models and likelihood functions
Barrett, Harrison H.; Dainty, Christopher; Lara, David
2008-01-01
Maximum-likelihood (ML) estimation in wavefront sensing requires careful attention to all noise sources and all factors that influence the sensor data. We present detailed probability density functions for the output of the image detector in a wavefront sensor, conditional not only on wavefront parameters but also on various nuisance parameters. Practical ways of dealing with nuisance parameters are described, and final expressions for likelihoods and Fisher information matrices are derived. The theory is illustrated by discussing Shack–Hartmann sensors, and computational requirements are discussed. Simulation results show that ML estimation can significantly increase the dynamic range of a Shack–Hartmann sensor with four detectors and that it can reduce the residual wavefront error when compared with traditional methods. PMID:17206255
Assessing compatibility of direct detection data: halo-independent global likelihood analyses
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gelmini, Graciela B.; Huh, Ji-Haeng; Witte, Samuel J.
2016-10-18
We present two different halo-independent methods to assess the compatibility of several direct dark matter detection data sets for a given dark matter model using a global likelihood consisting of at least one extended likelihood and an arbitrary number of Gaussian or Poisson likelihoods. In the first method we find the global best fit halo function (we prove that it is a unique piecewise constant function with a number of down steps smaller than or equal to a maximum number that we compute) and construct a two-sided pointwise confidence band at any desired confidence level, which can then be comparedmore » with those derived from the extended likelihood alone to assess the joint compatibility of the data. In the second method we define a “constrained parameter goodness-of-fit” test statistic, whose p-value we then use to define a “plausibility region” (e.g. where p≥10%). For any halo function not entirely contained within the plausibility region, the level of compatibility of the data is very low (e.g. p<10%). We illustrate these methods by applying them to CDMS-II-Si and SuperCDMS data, assuming dark matter particles with elastic spin-independent isospin-conserving interactions or exothermic spin-independent isospin-violating interactions.« less
Consistency of Rasch Model Parameter Estimation: A Simulation Study.
ERIC Educational Resources Information Center
van den Wollenberg, Arnold L.; And Others
1988-01-01
The unconditional--simultaneous--maximum likelihood (UML) estimation procedure for the one-parameter logistic model produces biased estimators. The UML method is inconsistent and is not a good alternative to conditional maximum likelihood method, at least with small numbers of items. The minimum Chi-square estimation procedure produces unbiased…
Julien, Clavel; Leandro, Aristide; Hélène, Morlon
2018-06-19
Working with high-dimensional phylogenetic comparative datasets is challenging because likelihood-based multivariate methods suffer from low statistical performances as the number of traits p approaches the number of species n and because some computational complications occur when p exceeds n. Alternative phylogenetic comparative methods have recently been proposed to deal with the large p small n scenario but their use and performances are limited. Here we develop a penalized likelihood framework to deal with high-dimensional comparative datasets. We propose various penalizations and methods for selecting the intensity of the penalties. We apply this general framework to the estimation of parameters (the evolutionary trait covariance matrix and parameters of the evolutionary model) and model comparison for the high-dimensional multivariate Brownian (BM), Early-burst (EB), Ornstein-Uhlenbeck (OU) and Pagel's lambda models. We show using simulations that our penalized likelihood approach dramatically improves the estimation of evolutionary trait covariance matrices and model parameters when p approaches n, and allows for their accurate estimation when p equals or exceeds n. In addition, we show that penalized likelihood models can be efficiently compared using Generalized Information Criterion (GIC). We implement these methods, as well as the related estimation of ancestral states and the computation of phylogenetic PCA in the R package RPANDA and mvMORPH. Finally, we illustrate the utility of the new proposed framework by evaluating evolutionary models fit, analyzing integration patterns, and reconstructing evolutionary trajectories for a high-dimensional 3-D dataset of brain shape in the New World monkeys. We find a clear support for an Early-burst model suggesting an early diversification of brain morphology during the ecological radiation of the clade. Penalized likelihood offers an efficient way to deal with high-dimensional multivariate comparative data.
Estimating the variance for heterogeneity in arm-based network meta-analysis.
Piepho, Hans-Peter; Madden, Laurence V; Roger, James; Payne, Roger; Williams, Emlyn R
2018-04-19
Network meta-analysis can be implemented by using arm-based or contrast-based models. Here we focus on arm-based models and fit them using generalized linear mixed model procedures. Full maximum likelihood (ML) estimation leads to biased trial-by-treatment interaction variance estimates for heterogeneity. Thus, our objective is to investigate alternative approaches to variance estimation that reduce bias compared with full ML. Specifically, we use penalized quasi-likelihood/pseudo-likelihood and hierarchical (h) likelihood approaches. In addition, we consider a novel model modification that yields estimators akin to the residual maximum likelihood estimator for linear mixed models. The proposed methods are compared by simulation, and 2 real datasets are used for illustration. Simulations show that penalized quasi-likelihood/pseudo-likelihood and h-likelihood reduce bias and yield satisfactory coverage rates. Sum-to-zero restriction and baseline contrasts for random trial-by-treatment interaction effects, as well as a residual ML-like adjustment, also reduce bias compared with an unconstrained model when ML is used, but coverage rates are not quite as good. Penalized quasi-likelihood/pseudo-likelihood and h-likelihood are therefore recommended. Copyright © 2018 John Wiley & Sons, Ltd.
On the existence of maximum likelihood estimates for presence-only data
Hefley, Trevor J.; Hooten, Mevin B.
2015-01-01
It is important to identify conditions for which maximum likelihood estimates are unlikely to be identifiable from presence-only data. In data sets where the maximum likelihood estimates do not exist, penalized likelihood and Bayesian methods will produce coefficient estimates, but these are sensitive to the choice of estimation procedure and prior or penalty term. When sample size is small or it is thought that habitat preferences are strong, we propose a suite of estimation procedures researchers can consider using.
Likelihood-based modification of experimental crystal structure electron density maps
Terwilliger, Thomas C [Sante Fe, NM
2005-04-16
A maximum-likelihood method for improves an electron density map of an experimental crystal structure. A likelihood of a set of structure factors {F.sub.h } is formed for the experimental crystal structure as (1) the likelihood of having obtained an observed set of structure factors {F.sub.h.sup.OBS } if structure factor set {F.sub.h } was correct, and (2) the likelihood that an electron density map resulting from {F.sub.h } is consistent with selected prior knowledge about the experimental crystal structure. The set of structure factors {F.sub.h } is then adjusted to maximize the likelihood of {F.sub.h } for the experimental crystal structure. An improved electron density map is constructed with the maximized structure factors.
Combination Therapy for Treatment of Infections with Gram-Negative Bacteria
Cosgrove, Sara E.; Maragakis, Lisa L.
2012-01-01
Summary: Combination antibiotic therapy for invasive infections with Gram-negative bacteria is employed in many health care facilities, especially for certain subgroups of patients, including those with neutropenia, those with infections caused by Pseudomonas aeruginosa, those with ventilator-associated pneumonia, and the severely ill. An argument can be made for empiric combination therapy, as we are witnessing a rise in infections caused by multidrug-resistant Gram-negative organisms. The wisdom of continued combination therapy after an organism is isolated and antimicrobial susceptibility data are known, however, is more controversial. The available evidence suggests that the greatest benefit of combination antibiotic therapy stems from the increased likelihood of choosing an effective agent during empiric therapy, rather than exploitation of in vitro synergy or the prevention of resistance during definitive treatment. In this review, we summarize the available data comparing monotherapy versus combination antimicrobial therapy for the treatment of infections with Gram-negative bacteria. PMID:22763634
Condition-dependent sex: who does it, when and why?
Ram, Yoav; Hadany, Lilach
2016-10-19
We review the phenomenon of condition-dependent sex-where individuals' condition affects the likelihood that they will reproduce sexually rather than asexually. In recent years, condition-dependent sex has been studied both theoretically and empirically. Empirical results in microbes, fungi and plants support the theoretical prediction that negative condition-dependent sex, in which individuals in poor condition are more likely to reproduce sexually, can be evolutionarily advantageous under a wide range of settings. Here, we review the evidence for condition-dependent sex and its potential implications for the long-term survival and adaptability of populations. We conclude by asking why condition-dependent sex is not more commonly observed, and by considering generalizations of condition-dependent sex that might apply even for obligate sexuals.This article is part of the themed issue 'Weird sex: the underappreciated diversity of sexual reproduction'. © 2016 The Author(s).
Maternal cigarette smoking during pregnancy and criminal/deviant behavior: a meta-analysis.
Pratt, Travis C; McGloin, Jean Marie; Fearn, Noelle E
2006-12-01
A growing body of empirical literature has emerged examining the somewhat inconsistent relationship between maternal cigarette smoking (MCS) during pregnancy and children's subsequent antisocial behavior. To systematically assess what existing studies reveal regarding MCS as a criminogenic risk factor for offspring, the authors subjected this body of literature to a meta-analysis. The analysis reveals a statistically significant--yet rather small--overall mean "effect size" of the relationship between MCS and the likelihood children will engage in deviant/criminal behavior. In addition to being rather moderate in size, the MCS-crime/deviance relationship is sensitive to a number of methodological specifications across empirical studies--particularly those associated with sample characteristics. The implications of this modest, and somewhat unstable, relationship are discussed in terms of guidelines for future research on this subject and how existing theoretical perspectives may be integrated to explain the MCS-crime/deviance link.
Health risk perception and betel chewing behavior--the evidence from Taiwan.
Chen, Chiang-Ming; Chang, Kuo-Liang; Lin, Lin; Lee, Jwo-Leun
2013-11-01
In this study, we provided an empirical examination of the interaction between people's health risk perception and betel chewing. We hypothesized that a better knowledge of possible health risks would reduce both the number of individuals who currently chew betel and the likelihood of those who do not yet chew betel to begin the habit. We constructed a simultaneous equation model with Bayesian two-stage approach to control the endogeneity between betel chewing and risk perception. Using a national survey of 26,684 observations in Taiwan, our study results indicated that better health knowledge reduced the possibility that people would become betel chewers. We also found that, in general, betel chewers have a poorer health risk perception than other population. Overall, the empirical evidence suggested that health authorities could reduce the odds of people becoming betel chewers by improving their knowledge of betel-chewing's harmful effects. © 2013 Elsevier Ltd. All rights reserved.
Age of onset group characteristics in forensic patients with schizophrenia.
Vinokur, D; Levine, S Z; Roe, D; Krivoy, A; Fischel, T
2014-03-01
This study aims to empirically identify age of onset groups and their clinical and background characteristics in forensic patients with schizophrenia. Hospital charts were reviewed of all 138 forensic patients with schizophrenia admitted to Geha Psychiatric Hospital that serves a catchment area of approximately 500,000 people, from 2000 to 2009 inclusive. Admixture analysis empirically identified early- (M=19.99, SD=3.31) and late-onset groups (M=36.13, SD=9.25). Early-onset was associated with more suicide attempts, violence before the age of 15, and early conduct problems, whereas late-onset was associated with a greater likelihood of violence after the age of 18 and marriage (P<0.01). The current findings provide clinicians with a unique direction for risk assessment and indicate differences in violence between early- and late-onset schizophrenia, particularly co-occurrence of harmful behavioral phenotypes. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Population Synthesis of Radio and Gamma-ray Pulsars using the Maximum Likelihood Approach
NASA Astrophysics Data System (ADS)
Billman, Caleb; Gonthier, P. L.; Harding, A. K.
2012-01-01
We present the results of a pulsar population synthesis of normal pulsars from the Galactic disk using a maximum likelihood method. We seek to maximize the likelihood of a set of parameters in a Monte Carlo population statistics code to better understand their uncertainties and the confidence region of the model's parameter space. The maximum likelihood method allows for the use of more applicable Poisson statistics in the comparison of distributions of small numbers of detected gamma-ray and radio pulsars. Our code simulates pulsars at birth using Monte Carlo techniques and evolves them to the present assuming initial spatial, kick velocity, magnetic field, and period distributions. Pulsars are spun down to the present and given radio and gamma-ray emission characteristics. We select measured distributions of radio pulsars from the Parkes Multibeam survey and Fermi gamma-ray pulsars to perform a likelihood analysis of the assumed model parameters such as initial period and magnetic field, and radio luminosity. We present the results of a grid search of the parameter space as well as a search for the maximum likelihood using a Markov Chain Monte Carlo method. We express our gratitude for the generous support of the Michigan Space Grant Consortium, of the National Science Foundation (REU and RUI), the NASA Astrophysics Theory and Fundamental Program and the NASA Fermi Guest Investigator Program.
Wu, Yufeng
2012-03-01
Incomplete lineage sorting can cause incongruence between the phylogenetic history of genes (the gene tree) and that of the species (the species tree), which can complicate the inference of phylogenies. In this article, I present a new coalescent-based algorithm for species tree inference with maximum likelihood. I first describe an improved method for computing the probability of a gene tree topology given a species tree, which is much faster than an existing algorithm by Degnan and Salter (2005). Based on this method, I develop a practical algorithm that takes a set of gene tree topologies and infers species trees with maximum likelihood. This algorithm searches for the best species tree by starting from initial species trees and performing heuristic search to obtain better trees with higher likelihood. This algorithm, called STELLS (which stands for Species Tree InfErence with Likelihood for Lineage Sorting), has been implemented in a program that is downloadable from the author's web page. The simulation results show that the STELLS algorithm is more accurate than an existing maximum likelihood method for many datasets, especially when there is noise in gene trees. I also show that the STELLS algorithm is efficient and can be applied to real biological datasets. © 2011 The Author. Evolution© 2011 The Society for the Study of Evolution.
Hock, Sabrina; Hasenauer, Jan; Theis, Fabian J
2013-01-01
Diffusion is a key component of many biological processes such as chemotaxis, developmental differentiation and tissue morphogenesis. Since recently, the spatial gradients caused by diffusion can be assessed in-vitro and in-vivo using microscopy based imaging techniques. The resulting time-series of two dimensional, high-resolutions images in combination with mechanistic models enable the quantitative analysis of the underlying mechanisms. However, such a model-based analysis is still challenging due to measurement noise and sparse observations, which result in uncertainties of the model parameters. We introduce a likelihood function for image-based measurements with log-normal distributed noise. Based upon this likelihood function we formulate the maximum likelihood estimation problem, which is solved using PDE-constrained optimization methods. To assess the uncertainty and practical identifiability of the parameters we introduce profile likelihoods for diffusion processes. As proof of concept, we model certain aspects of the guidance of dendritic cells towards lymphatic vessels, an example for haptotaxis. Using a realistic set of artificial measurement data, we estimate the five kinetic parameters of this model and compute profile likelihoods. Our novel approach for the estimation of model parameters from image data as well as the proposed identifiability analysis approach is widely applicable to diffusion processes. The profile likelihood based method provides more rigorous uncertainty bounds in contrast to local approximation methods.
Profile-Likelihood Approach for Estimating Generalized Linear Mixed Models with Factor Structures
ERIC Educational Resources Information Center
Jeon, Minjeong; Rabe-Hesketh, Sophia
2012-01-01
In this article, the authors suggest a profile-likelihood approach for estimating complex models by maximum likelihood (ML) using standard software and minimal programming. The method works whenever setting some of the parameters of the model to known constants turns the model into a standard model. An important class of models that can be…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernard, Michael Lewis; Smith, Bruce W., 1959-
Understanding the role of emotional states is critical for predicting the kind of decisions people will make in risky situations. Currently, there is little understanding as to how emotion influences decision-making in situations such as terrorist attacks, natural disasters, pandemics, and combat. To help address this, we used behavioral and neuroimaging methods to examine how emotion states and traits influence decisions. Specifically, this study used a wheel of fortune behavioral task and functional magnetic resonance imaging (fMRI) to examine the effects of emotional states and traits on decision-making pertaining to the degree of risk people are willing to make inmore » specific situations. The behavioral results are reported here. The neural data requires additional time to analyze and will be reported at a future date. Biases caused by emotion states and traits were found regarding the likelihood of making risky decisions. The behavioral results will help provide a solid empirical foundation for modeling the effects of emotion on decision in risky situations.« less
Zeng, Chan; Newcomer, Sophia R; Glanz, Jason M; Shoup, Jo Ann; Daley, Matthew F; Hambidge, Simon J; Xu, Stanley
2013-12-15
The self-controlled case series (SCCS) method is often used to examine the temporal association between vaccination and adverse events using only data from patients who experienced such events. Conditional Poisson regression models are used to estimate incidence rate ratios, and these models perform well with large or medium-sized case samples. However, in some vaccine safety studies, the adverse events studied are rare and the maximum likelihood estimates may be biased. Several bias correction methods have been examined in case-control studies using conditional logistic regression, but none of these methods have been evaluated in studies using the SCCS design. In this study, we used simulations to evaluate 2 bias correction approaches-the Firth penalized maximum likelihood method and Cordeiro and McCullagh's bias reduction after maximum likelihood estimation-with small sample sizes in studies using the SCCS design. The simulations showed that the bias under the SCCS design with a small number of cases can be large and is also sensitive to a short risk period. The Firth correction method provides finite and less biased estimates than the maximum likelihood method and Cordeiro and McCullagh's method. However, limitations still exist when the risk period in the SCCS design is short relative to the entire observation period.
Kamneva, Olga K; Rosenberg, Noah A
2017-01-01
Hybridization events generate reticulate species relationships, giving rise to species networks rather than species trees. We report a comparative study of consensus, maximum parsimony, and maximum likelihood methods of species network reconstruction using gene trees simulated assuming a known species history. We evaluate the role of the divergence time between species involved in a hybridization event, the relative contributions of the hybridizing species, and the error in gene tree estimation. When gene tree discordance is mostly due to hybridization and not due to incomplete lineage sorting (ILS), most of the methods can detect even highly skewed hybridization events between highly divergent species. For recent divergences between hybridizing species, when the influence of ILS is sufficiently high, likelihood methods outperform parsimony and consensus methods, which erroneously identify extra hybridizations. The more sophisticated likelihood methods, however, are affected by gene tree errors to a greater extent than are consensus and parsimony. PMID:28469378
Approximate likelihood calculation on a phylogeny for Bayesian estimation of divergence times.
dos Reis, Mario; Yang, Ziheng
2011-07-01
The molecular clock provides a powerful way to estimate species divergence times. If information on some species divergence times is available from the fossil or geological record, it can be used to calibrate a phylogeny and estimate divergence times for all nodes in the tree. The Bayesian method provides a natural framework to incorporate different sources of information concerning divergence times, such as information in the fossil and molecular data. Current models of sequence evolution are intractable in a Bayesian setting, and Markov chain Monte Carlo (MCMC) is used to generate the posterior distribution of divergence times and evolutionary rates. This method is computationally expensive, as it involves the repeated calculation of the likelihood function. Here, we explore the use of Taylor expansion to approximate the likelihood during MCMC iteration. The approximation is much faster than conventional likelihood calculation. However, the approximation is expected to be poor when the proposed parameters are far from the likelihood peak. We explore the use of parameter transforms (square root, logarithm, and arcsine) to improve the approximation to the likelihood curve. We found that the new methods, particularly the arcsine-based transform, provided very good approximations under relaxed clock models and also under the global clock model when the global clock is not seriously violated. The approximation is poorer for analysis under the global clock when the global clock is seriously wrong and should thus not be used. The results suggest that the approximate method may be useful for Bayesian dating analysis using large data sets.
Mikulich-Gilbertson, Susan K; Wagner, Brandie D; Grunwald, Gary K; Riggs, Paula D; Zerbe, Gary O
2018-01-01
Medical research is often designed to investigate changes in a collection of response variables that are measured repeatedly on the same subjects. The multivariate generalized linear mixed model (MGLMM) can be used to evaluate random coefficient associations (e.g. simple correlations, partial regression coefficients) among outcomes that may be non-normal and differently distributed by specifying a multivariate normal distribution for their random effects and then evaluating the latent relationship between them. Empirical Bayes predictors are readily available for each subject from any mixed model and are observable and hence, plotable. Here, we evaluate whether second-stage association analyses of empirical Bayes predictors from a MGLMM, provide a good approximation and visual representation of these latent association analyses using medical examples and simulations. Additionally, we compare these results with association analyses of empirical Bayes predictors generated from separate mixed models for each outcome, a procedure that could circumvent computational problems that arise when the dimension of the joint covariance matrix of random effects is large and prohibits estimation of latent associations. As has been shown in other analytic contexts, the p-values for all second-stage coefficients that were determined by naively assuming normality of empirical Bayes predictors provide a good approximation to p-values determined via permutation analysis. Analyzing outcomes that are interrelated with separate models in the first stage and then associating the resulting empirical Bayes predictors in a second stage results in different mean and covariance parameter estimates from the maximum likelihood estimates generated by a MGLMM. The potential for erroneous inference from using results from these separate models increases as the magnitude of the association among the outcomes increases. Thus if computable, scatterplots of the conditionally independent empirical Bayes predictors from a MGLMM are always preferable to scatterplots of empirical Bayes predictors generated by separate models, unless the true association between outcomes is zero.
Computation of nonparametric convex hazard estimators via profile methods.
Jankowski, Hanna K; Wellner, Jon A
2009-05-01
This paper proposes a profile likelihood algorithm to compute the nonparametric maximum likelihood estimator of a convex hazard function. The maximisation is performed in two steps: First the support reduction algorithm is used to maximise the likelihood over all hazard functions with a given point of minimum (or antimode). Then it is shown that the profile (or partially maximised) likelihood is quasi-concave as a function of the antimode, so that a bisection algorithm can be applied to find the maximum of the profile likelihood, and hence also the global maximum. The new algorithm is illustrated using both artificial and real data, including lifetime data for Canadian males and females.
Transcranial magnetic stimulation: Improved coil design for deep brain investigation
NASA Astrophysics Data System (ADS)
Crowther, L. J.; Marketos, P.; Williams, P. I.; Melikhov, Y.; Jiles, D. C.; Starzewski, J. H.
2011-04-01
This paper reports on a design for a coil for transcranial magnetic stimulation. The design shows potential for improving the penetration depth of the magnetic field, allowing stimulation of subcortical structures within the brain. The magnetic and induced electric fields in the human head have been calculated with finite element electromagnetic modeling software and compared with empirical measurements. Results show that the coil design used gives improved penetration depth, but also indicates the likelihood of stimulation of additional tissue resulting from the spatial distribution of the magnetic field.
A Bayesian approach to parameter and reliability estimation in the Poisson distribution.
NASA Technical Reports Server (NTRS)
Canavos, G. C.
1972-01-01
For life testing procedures, a Bayesian analysis is developed with respect to a random intensity parameter in the Poisson distribution. Bayes estimators are derived for the Poisson parameter and the reliability function based on uniform and gamma prior distributions of that parameter. A Monte Carlo procedure is implemented to make possible an empirical mean-squared error comparison between Bayes and existing minimum variance unbiased, as well as maximum likelihood, estimators. As expected, the Bayes estimators have mean-squared errors that are appreciably smaller than those of the other two.
ERIC Educational Resources Information Center
Klein, Andreas G.; Muthen, Bengt O.
2007-01-01
In this article, a nonlinear structural equation model is introduced and a quasi-maximum likelihood method for simultaneous estimation and testing of multiple nonlinear effects is developed. The focus of the new methodology lies on efficiency, robustness, and computational practicability. Monte-Carlo studies indicate that the method is highly…
Bias and Efficiency in Structural Equation Modeling: Maximum Likelihood versus Robust Methods
ERIC Educational Resources Information Center
Zhong, Xiaoling; Yuan, Ke-Hai
2011-01-01
In the structural equation modeling literature, the normal-distribution-based maximum likelihood (ML) method is most widely used, partly because the resulting estimator is claimed to be asymptotically unbiased and most efficient. However, this may not hold when data deviate from normal distribution. Outlying cases or nonnormally distributed data,…
Five Methods for Estimating Angoff Cut Scores with IRT
ERIC Educational Resources Information Center
Wyse, Adam E.
2017-01-01
This article illustrates five different methods for estimating Angoff cut scores using item response theory (IRT) models. These include maximum likelihood (ML), expected a priori (EAP), modal a priori (MAP), and weighted maximum likelihood (WML) estimators, as well as the most commonly used approach based on translating ratings through the test…
Hudson, H M; Ma, J; Green, P
1994-01-01
Many algorithms for medical image reconstruction adopt versions of the expectation-maximization (EM) algorithm. In this approach, parameter estimates are obtained which maximize a complete data likelihood or penalized likelihood, in each iteration. Implicitly (and sometimes explicitly) penalized algorithms require smoothing of the current reconstruction in the image domain as part of their iteration scheme. In this paper, we discuss alternatives to EM which adapt Fisher's method of scoring (FS) and other methods for direct maximization of the incomplete data likelihood. Jacobi and Gauss-Seidel methods for non-linear optimization provide efficient algorithms applying FS in tomography. One approach uses smoothed projection data in its iterations. We investigate the convergence of Jacobi and Gauss-Seidel algorithms with clinical tomographic projection data.
Correlated Observations, the Law of Small Numbers and Bank Runs
2016-01-01
Empirical descriptions and studies suggest that generally depositors observe a sample of previous decisions before deciding if to keep their funds deposited or to withdraw them. These observed decisions may exhibit different degrees of correlation across depositors. In our model depositors decide sequentially and are assumed to follow the law of small numbers in the sense that they believe that a bank run is underway if the number of observed withdrawals in their sample is large. Theoretically, with highly correlated samples and infinite depositors runs occur with certainty, while with random samples it needs not be the case, as for many parameter settings the likelihood of bank runs is zero. We investigate the intermediate cases and find that i) decreasing the correlation and ii) increasing the sample size reduces the likelihood of bank runs, ceteris paribus. Interestingly, the multiplicity of equilibria, a feature of the canonical Diamond-Dybvig model that we use also, disappears almost completely in our setup. Our results have relevant policy implications. PMID:27035435
Avoiding overstating the strength of forensic evidence: Shrunk likelihood ratios/Bayes factors.
Morrison, Geoffrey Stewart; Poh, Norman
2018-05-01
When strength of forensic evidence is quantified using sample data and statistical models, a concern may be raised as to whether the output of a model overestimates the strength of evidence. This is particularly the case when the amount of sample data is small, and hence sampling variability is high. This concern is related to concern about precision. This paper describes, explores, and tests three procedures which shrink the value of the likelihood ratio or Bayes factor toward the neutral value of one. The procedures are: (1) a Bayesian procedure with uninformative priors, (2) use of empirical lower and upper bounds (ELUB), and (3) a novel form of regularized logistic regression. As a benchmark, they are compared with linear discriminant analysis, and in some instances with non-regularized logistic regression. The behaviours of the procedures are explored using Monte Carlo simulated data, and tested on real data from comparisons of voice recordings, face images, and glass fragments. Copyright © 2018 The Authors. Published by Elsevier B.V. All rights reserved.
Correlated Observations, the Law of Small Numbers and Bank Runs.
Horváth, Gergely; Kiss, Hubert János
2016-01-01
Empirical descriptions and studies suggest that generally depositors observe a sample of previous decisions before deciding if to keep their funds deposited or to withdraw them. These observed decisions may exhibit different degrees of correlation across depositors. In our model depositors decide sequentially and are assumed to follow the law of small numbers in the sense that they believe that a bank run is underway if the number of observed withdrawals in their sample is large. Theoretically, with highly correlated samples and infinite depositors runs occur with certainty, while with random samples it needs not be the case, as for many parameter settings the likelihood of bank runs is zero. We investigate the intermediate cases and find that i) decreasing the correlation and ii) increasing the sample size reduces the likelihood of bank runs, ceteris paribus. Interestingly, the multiplicity of equilibria, a feature of the canonical Diamond-Dybvig model that we use also, disappears almost completely in our setup. Our results have relevant policy implications.
NASA Astrophysics Data System (ADS)
Pan, Zhen; Anderes, Ethan; Knox, Lloyd
2018-05-01
One of the major targets for next-generation cosmic microwave background (CMB) experiments is the detection of the primordial B-mode signal. Planning is under way for Stage-IV experiments that are projected to have instrumental noise small enough to make lensing and foregrounds the dominant source of uncertainty for estimating the tensor-to-scalar ratio r from polarization maps. This makes delensing a crucial part of future CMB polarization science. In this paper we present a likelihood method for estimating the tensor-to-scalar ratio r from CMB polarization observations, which combines the benefits of a full-scale likelihood approach with the tractability of the quadratic delensing technique. This method is a pixel space, all order likelihood analysis of the quadratic delensed B modes, and it essentially builds upon the quadratic delenser by taking into account all order lensing and pixel space anomalies. Its tractability relies on a crucial factorization of the pixel space covariance matrix of the polarization observations which allows one to compute the full Gaussian approximate likelihood profile, as a function of r , at the same computational cost of a single likelihood evaluation.
Undersampling power-law size distributions: effect on the assessment of extreme natural hazards
Geist, Eric L.; Parsons, Thomas E.
2014-01-01
The effect of undersampling on estimating the size of extreme natural hazards from historical data is examined. Tests using synthetic catalogs indicate that the tail of an empirical size distribution sampled from a pure Pareto probability distribution can range from having one-to-several unusually large events to appearing depleted, relative to the parent distribution. Both of these effects are artifacts caused by limited catalog length. It is more difficult to diagnose the artificially depleted empirical distributions, since one expects that a pure Pareto distribution is physically limited in some way. Using maximum likelihood methods and the method of moments, we estimate the power-law exponent and the corner size parameter of tapered Pareto distributions for several natural hazard examples: tsunamis, floods, and earthquakes. Each of these examples has varying catalog lengths and measurement thresholds, relative to the largest event sizes. In many cases where there are only several orders of magnitude between the measurement threshold and the largest events, joint two-parameter estimation techniques are necessary to account for estimation dependence between the power-law scaling exponent and the corner size parameter. Results indicate that whereas the corner size parameter of a tapered Pareto distribution can be estimated, its upper confidence bound cannot be determined and the estimate itself is often unstable with time. Correspondingly, one cannot statistically reject a pure Pareto null hypothesis using natural hazard catalog data. Although physical limits to the hazard source size and by attenuation mechanisms from source to site constrain the maximum hazard size, historical data alone often cannot reliably determine the corner size parameter. Probabilistic assessments incorporating theoretical constraints on source size and propagation effects are preferred over deterministic assessments of extreme natural hazards based on historic data.
Padial, José M; Grant, Taran; Frost, Darrel R
2014-06-26
Brachycephaloidea is a monophyletic group of frogs with more than 1000 species distributed throughout the New World tropics, subtropics, and Andean regions. Recently, the group has been the target of multiple molecular phylogenetic analyses, resulting in extensive changes in its taxonomy. Here, we test previous hypotheses of phylogenetic relationships for the group by combining available molecular evidence (sequences of 22 genes representing 431 ingroup and 25 outgroup terminals) and performing a tree-alignment analysis under the parsimony optimality criterion using the program POY. To elucidate the effects of alignment and optimality criterion on phylogenetic inferences, we also used the program MAFFT to obtain a similarity-alignment for analysis under both parsimony and maximum likelihood using the programs TNT and GARLI, respectively. Although all three analytical approaches agreed on numerous points, there was also extensive disagreement. Tree-alignment under parsimony supported the monophyly of the ingroup and the sister group relationship of the monophyletic marsupial frogs (Hemiphractidae), while maximum likelihood and parsimony analyses of the MAFFT similarity-alignment did not. All three methods differed with respect to the position of Ceuthomantis smaragdinus (Ceuthomantidae), with tree-alignment using parsimony recovering this species as the sister of Pristimantis + Yunganastes. All analyses rejected the monophyly of Strabomantidae and Strabomantinae as originally defined, and the tree-alignment analysis under parsimony further rejected the recently redefined Craugastoridae and Pristimantinae. Despite the greater emphasis in the systematics literature placed on the choice of optimality criterion for evaluating trees than on the choice of method for aligning DNA sequences, we found that the topological differences attributable to the alignment method were as great as those caused by the optimality criterion. Further, the optimal tree-alignment indicates that insertions and deletions occurred in twice as many aligned positions as implied by the optimal similarity-alignment, confirming previous findings that sequence turnover through insertion and deletion events plays a greater role in molecular evolution than indicated by similarity-alignments. Our results also provide a clear empirical demonstration of the different effects of wildcard taxa produced by missing data in parsimony and maximum likelihood analyses. Specifically, maximum likelihood analyses consistently (81% bootstrap frequency) provided spurious resolution despite a lack of evidence, whereas parsimony correctly depicted the ambiguity due to missing data by collapsing unsupported nodes. We provide a new taxonomy for the group that retains previously recognized Linnaean taxa except for Ceuthomantidae, Strabomantidae, and Strabomantinae. A phenotypically diagnosable superfamily is recognized formally as Brachycephaloidea, with the informal, unranked name terrarana retained as the standard common name for these frogs. We recognize three families within Brachycephaloidea that are currently diagnosable solely on molecular grounds (Brachycephalidae, Craugastoridae, and Eleutherodactylidae), as well as five subfamilies (Craugastorinae, Eleutherodactylinae, Holoadeninae, Phyzelaphryninae, and Pristimantinae) corresponding in large part to previous families and subfamilies. Our analyses upheld the monophyly of all tested genera, but we found numerous subgeneric taxa to be non-monophyletic and modified the taxonomy accordingly.
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling.
Wang, Sheng; Sun, Siqi; Xu, Jinbo
2016-09-01
Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC.
AUC-Maximized Deep Convolutional Neural Fields for Protein Sequence Labeling
Wang, Sheng; Sun, Siqi
2017-01-01
Deep Convolutional Neural Networks (DCNN) has shown excellent performance in a variety of machine learning tasks. This paper presents Deep Convolutional Neural Fields (DeepCNF), an integration of DCNN with Conditional Random Field (CRF), for sequence labeling with an imbalanced label distribution. The widely-used training methods, such as maximum-likelihood and maximum labelwise accuracy, do not work well on imbalanced data. To handle this, we present a new training algorithm called maximum-AUC for DeepCNF. That is, we train DeepCNF by directly maximizing the empirical Area Under the ROC Curve (AUC), which is an unbiased measurement for imbalanced data. To fulfill this, we formulate AUC in a pairwise ranking framework, approximate it by a polynomial function and then apply a gradient-based procedure to optimize it. Our experimental results confirm that maximum-AUC greatly outperforms the other two training methods on 8-state secondary structure prediction and disorder prediction since their label distributions are highly imbalanced and also has similar performance as the other two training methods on solvent accessibility prediction, which has three equally-distributed labels. Furthermore, our experimental results show that our AUC-trained DeepCNF models greatly outperform existing popular predictors of these three tasks. The data and software related to this paper are available at https://github.com/realbigws/DeepCNF_AUC. PMID:28884168
Automatic Spike Sorting Using Tuning Information
Ventura, Valérie
2011-01-01
Current spike sorting methods focus on clustering neurons’ characteristic spike waveforms. The resulting spike-sorted data are typically used to estimate how covariates of interest modulate the firing rates of neurons. However, when these covariates do modulate the firing rates, they provide information about spikes’ identities, which thus far have been ignored for the purpose of spike sorting. This letter describes a novel approach to spike sorting, which incorporates both waveform information and tuning information obtained from the modulation of firing rates. Because it efficiently uses all the available information, this spike sorter yields lower spike misclassification rates than traditional automatic spike sorters. This theoretical result is verified empirically on several examples. The proposed method does not require additional assumptions; only its implementation is different. It essentially consists of performing spike sorting and tuning estimation simultaneously rather than sequentially, as is currently done. We used an expectation-maximization maximum likelihood algorithm to implement the new spike sorter. We present the general form of this algorithm and provide a detailed implementable version under the assumptions that neurons are independent and spike according to Poisson processes. Finally, we uncover a systematic flaw of spike sorting based on waveform information only. PMID:19548802
Automatic spike sorting using tuning information.
Ventura, Valérie
2009-09-01
Current spike sorting methods focus on clustering neurons' characteristic spike waveforms. The resulting spike-sorted data are typically used to estimate how covariates of interest modulate the firing rates of neurons. However, when these covariates do modulate the firing rates, they provide information about spikes' identities, which thus far have been ignored for the purpose of spike sorting. This letter describes a novel approach to spike sorting, which incorporates both waveform information and tuning information obtained from the modulation of firing rates. Because it efficiently uses all the available information, this spike sorter yields lower spike misclassification rates than traditional automatic spike sorters. This theoretical result is verified empirically on several examples. The proposed method does not require additional assumptions; only its implementation is different. It essentially consists of performing spike sorting and tuning estimation simultaneously rather than sequentially, as is currently done. We used an expectation-maximization maximum likelihood algorithm to implement the new spike sorter. We present the general form of this algorithm and provide a detailed implementable version under the assumptions that neurons are independent and spike according to Poisson processes. Finally, we uncover a systematic flaw of spike sorting based on waveform information only.
Simmons, Mark P; Goloboff, Pablo A
2013-10-01
Empirical and simulated examples are used to demonstrate an artifact caused by undersampling optimal trees in data matrices that consist mostly or entirely of locally sampled (as opposed to globally, for most or all terminals) characters. The artifact is that unsupported clades consisting entirely of terminals scored for the same locally sampled partition may be resolved and assigned high resampling support-despite their being properly unsupported (i.e., not resolved in the strict consensus of all optimal trees). This artifact occurs despite application of random-addition sequences for stepwise terminal addition. The artifact is not necessarily obviated with thorough conventional branch swapping methods (even tree-bisection-reconnection) when just a single tree is held, as is sometimes implemented in parsimony bootstrap pseudoreplicates, and in every GARLI, PhyML, and RAxML pseudoreplicate and search for the most likely tree for the matrix as a whole. Hence GARLI, RAxML, and PhyML-based likelihood results require extra scrutiny, particularly when they provide high resolution and support for clades that are entirely unsupported by methods that perform more thorough searches, as in most parsimony analyses. Copyright © 2013 Elsevier Inc. All rights reserved.
A new estimator method for GARCH models
NASA Astrophysics Data System (ADS)
Onody, R. N.; Favaro, G. M.; Cazaroto, E. R.
2007-06-01
The GARCH (p, q) model is a very interesting stochastic process with widespread applications and a central role in empirical finance. The Markovian GARCH (1, 1) model has only 3 control parameters and a much discussed question is how to estimate them when a series of some financial asset is given. Besides the maximum likelihood estimator technique, there is another method which uses the variance, the kurtosis and the autocorrelation time to determine them. We propose here to use the standardized 6th moment. The set of parameters obtained in this way produces a very good probability density function and a much better time autocorrelation function. This is true for both studied indexes: NYSE Composite and FTSE 100. The probability of return to the origin is investigated at different time horizons for both Gaussian and Laplacian GARCH models. In spite of the fact that these models show almost identical performances with respect to the final probability density function and to the time autocorrelation function, their scaling properties are, however, very different. The Laplacian GARCH model gives a better scaling exponent for the NYSE time series, whereas the Gaussian dynamics fits better the FTSE scaling exponent.
NASA Astrophysics Data System (ADS)
Mondal, Puskar; Korenaga, Jun
2018-03-01
The dispersion relation of the Rayleigh-Taylor instability, a gravitational instability associated with unstable density stratification, is of profound importance in various geophysical contexts. When more than two layers are involved, a semi-analytical technique based on the biharmonic formulation of Stokes flow has been extensively used to obtain such dispersion relation. However, this technique may become cumbersome when applied to lithospheric dynamics, where a number of layers are necessary to represent the continuous variation of viscosity over many orders of magnitude. Here, we present an alternative and more efficient method based on the propagator matrix formulation of Stokes flow. With this approach, the original instability problem is reduced to a compact eigenvalue equation whose size is solely determined by the number of primary density contrasts. We apply this new technique to the stability of the early crust, and combined with the Monte Carlo sensitivity analysis, we derive an empirical formula to compute the growth rate of the Rayleigh-Taylor instability for this particular geophysical setting. Our analysis indicates that the likelihood of crustal delamination hinges critically on the effective viscosity of eclogite.
Likelihood Methods for Adaptive Filtering and Smoothing. Technical Report #455.
ERIC Educational Resources Information Center
Butler, Ronald W.
The dynamic linear model or Kalman filtering model provides a useful methodology for predicting the past, present, and future states of a dynamic system, such as an object in motion or an economic or social indicator that is changing systematically with time. Recursive likelihood methods for adaptive Kalman filtering and smoothing are developed.…
ERIC Educational Resources Information Center
Han, Kyung T.; Guo, Fanmin
2014-01-01
The full-information maximum likelihood (FIML) method makes it possible to estimate and analyze structural equation models (SEM) even when data are partially missing, enabling incomplete data to contribute to model estimation. The cornerstone of FIML is the missing-at-random (MAR) assumption. In (unidimensional) computerized adaptive testing…
Hey, Jody; Nielsen, Rasmus
2007-01-01
In 1988, Felsenstein described a framework for assessing the likelihood of a genetic data set in which all of the possible genealogical histories of the data are considered, each in proportion to their probability. Although not analytically solvable, several approaches, including Markov chain Monte Carlo methods, have been developed to find approximate solutions. Here, we describe an approach in which Markov chain Monte Carlo simulations are used to integrate over the space of genealogies, whereas other parameters are integrated out analytically. The result is an approximation to the full joint posterior density of the model parameters. For many purposes, this function can be treated as a likelihood, thereby permitting likelihood-based analyses, including likelihood ratio tests of nested models. Several examples, including an application to the divergence of chimpanzee subspecies, are provided. PMID:17301231
Challenges in Species Tree Estimation Under the Multispecies Coalescent Model
Xu, Bo; Yang, Ziheng
2016-01-01
The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make an efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods but they are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data. We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods. PMID:27927902
Artifact removal from EEG data with empirical mode decomposition
NASA Astrophysics Data System (ADS)
Grubov, Vadim V.; Runnova, Anastasiya E.; Efremova, Tatyana Yu.; Hramov, Alexander E.
2017-03-01
In the paper we propose the novel method for dealing with the physiological artifacts caused by intensive activity of facial and neck muscles and other movements in experimental human EEG recordings. The method is based on analysis of EEG signals with empirical mode decomposition (Hilbert-Huang transform). We introduce the mathematical algorithm of the method with following steps: empirical mode decomposition of EEG signal, choosing of empirical modes with artifacts, removing empirical modes with artifacts, reconstruction of the initial EEG signal. We test the method on filtration of experimental human EEG signals from movement artifacts and show high efficiency of the method.
Goldfarb, Samantha; Tarver, Will L; Sen, Bisakha
2014-01-01
Background Previous literature has asserted that family meals are a key protective factor for certain adolescent risk behaviors. It is suggested that the frequency of eating with the family is associated with better psychological well-being and a lower risk of substance use and delinquency. However, it is unclear whether there is evidence of causal links between family meals and adolescent health-risk behaviors. Purpose The purpose of this article is to review the empirical literature on family meals and adolescent health behaviors and outcomes in the US. Data sources A search was conducted in four academic databases: Social Sciences Full Text, Sociological Abstracts, PsycINFO®, and PubMed/MEDLINE. Study selection We included studies that quantitatively estimated the relationship between family meals and health-risk behaviors. Data extraction Data were extracted on study sample, study design, family meal measurement, outcomes, empirical methods, findings, and major issues. Data synthesis Fourteen studies met the inclusion criteria for the review that measured the relationship between frequent family meals and various risk-behavior outcomes. The outcomes considered by most studies were alcohol use (n=10), tobacco use (n=9), and marijuana use (n=6). Other outcomes included sexual activity (n=2); depression, suicidal ideation, and suicide attempts (n=4); violence and delinquency (n=4); school-related issues (n=2); and well-being (n=5). The associations between family meals and the outcomes of interest were most likely to be statistically significant in unadjusted models or models controlling for basic family characteristics. Associations were less likely to be statistically significant when other measures of family connectedness were included. Relatively few analyses used sophisticated empirical techniques available to control for confounders in secondary data. Conclusion More research is required to establish whether or not the relationship between family dinners and risky adolescent behaviors is an artifact of underlying confounders. We recommend that researchers make more frequent use of sophisticated methods to reduce the problem of confounders in secondary data, and that the scope of adolescent problem behaviors also be further widened. PMID:24627645
Dong, Yi; Mihalas, Stefan; Russell, Alexander; Etienne-Cummings, Ralph; Niebur, Ernst
2012-01-01
When a neuronal spike train is observed, what can we say about the properties of the neuron that generated it? A natural way to answer this question is to make an assumption about the type of neuron, select an appropriate model for this type, and then to choose the model parameters as those that are most likely to generate the observed spike train. This is the maximum likelihood method. If the neuron obeys simple integrate and fire dynamics, Paninski, Pillow, and Simoncelli (2004) showed that its negative log-likelihood function is convex and that its unique global minimum can thus be found by gradient descent techniques. The global minimum property requires independence of spike time intervals. Lack of history dependence is, however, an important constraint that is not fulfilled in many biological neurons which are known to generate a rich repertoire of spiking behaviors that are incompatible with history independence. Therefore, we expanded the integrate and fire model by including one additional variable, a variable threshold (Mihalas & Niebur, 2009) allowing for history-dependent firing patterns. This neuronal model produces a large number of spiking behaviors while still being linear. Linearity is important as it maintains the distribution of the random variables and still allows for maximum likelihood methods to be used. In this study we show that, although convexity of the negative log-likelihood is not guaranteed for this model, the minimum of the negative log-likelihood function yields a good estimate for the model parameters, in particular if the noise level is treated as a free parameter. Furthermore, we show that a nonlinear function minimization method (r-algorithm with space dilation) frequently reaches the global minimum. PMID:21851282
A likelihood ratio test for evolutionary rate shifts and functional divergence among proteins
Knudsen, Bjarne; Miyamoto, Michael M.
2001-01-01
Changes in protein function can lead to changes in the selection acting on specific residues. This can often be detected as evolutionary rate changes at the sites in question. A maximum-likelihood method for detecting evolutionary rate shifts at specific protein positions is presented. The method determines significance values of the rate differences to give a sound statistical foundation for the conclusions drawn from the analyses. A statistical test for detecting slowly evolving sites is also described. The methods are applied to a set of Myc proteins for the identification of both conserved sites and those with changing evolutionary rates. Those positions with conserved and changing rates are related to the structures and functions of their proteins. The results are compared with an earlier Bayesian method, thereby highlighting the advantages of the new likelihood ratio tests. PMID:11734650
Empirical Profiles of Alcohol and Marijuana Use, Drugged Driving, and Risk Perceptions.
Arterberry, Brooke J; Treloar, Hayley; McCarthy, Denis M
2017-11-01
The present study sought to inform models of risk for drugged driving through empirically identifying patterns of marijuana use, alcohol use, and related driving behaviors. Perceived dangerousness and consequences of drugged driving were evaluated as putative influences on risk patterns. We used latent profile analysis of survey responses from 897 college students to identify patterns of substance use and drugged driving. We tested the hypotheses that low perceived danger and low perceived likelihood of negative consequences of drugged driving would identify individuals with higher-risk patterns. Findings from the latent profile analysis indicated that a four-profile model provided the best model fit. Low-level engagers had low rates of substance use and drugged driving. Alcohol-centric engagers had higher rates of alcohol use but low rates of marijuana/simultaneous use and low rates of driving after substance use. Concurrent engagers had higher rates of marijuana and alcohol use, simultaneous use, and related driving behaviors, but marijuana-centric/simultaneous engagers had the highest rates of marijuana use, co-use, and related driving behaviors. Those with higher perceived danger of driving while high were more likely to be in the low-level, alcohol-centric, or concurrent engagers' profiles; individuals with higher perceived likelihood of consequences of driving while high were more likely to be in the low-level engagers group. Findings suggested that college students' perceived dangerousness of driving after using marijuana had greater influence on drugged driving behaviors than alcohol-related driving risk perceptions. These results support targeting marijuana-impaired driving risk perceptions in young adult intervention programs.
Using a Novel Spatial Tool to Inform Invasive Species Early Detection and Rapid Response Efforts
NASA Astrophysics Data System (ADS)
Davidson, Alisha D.; Fusaro, Abigail J.; Kashian, Donna R.
2015-07-01
Management of invasive species has increasingly emphasized the importance of early detection and rapid response (EDRR) programs in limiting introductions, establishment, and impacts. These programs require an understanding of vector and species spatial dynamics to prioritize monitoring sites and efficiently allocate resources. Yet managers often lack the empirical data necessary to make these decisions. We developed an empirical mapping tool that can facilitate development of EDRR programs through identifying high-risk locations, particularly within the recreational boating vector. We demonstrated the utility of this tool in the Great Lakes watershed. We surveyed boaters to identify trips among water bodies and to quantify behaviors associated with high likelihood of species transfer (e.g., not removing organic materials from boat trailers) during that trip. We mapped water bodies with high-risk inbound and outbound boater movements using ArcGIS. We also tested for differences in high-risk behaviors based on demographic variables to understand risk differences among boater groups. Incorporation of boater behavior led to identification of additional high-risk water bodies compared to using the number of trips alone. Therefore, the number of trips itself may not fully reflect the likelihood of invasion. This tool can be broadly applied in other geographic contexts and with different taxa, and can be adjusted according to varying levels of information concerning the vector or species of interest. The methodology is straightforward and can be followed after a basic introduction to ArcGIS software. The visual nature of the mapping tool will facilitate site prioritization by managers and stakeholders from diverse backgrounds.
Xu, Xu Steven; Yuan, Min; Yang, Haitao; Feng, Yan; Xu, Jinfeng; Pinheiro, Jose
2017-01-01
Covariate analysis based on population pharmacokinetics (PPK) is used to identify clinically relevant factors. The likelihood ratio test (LRT) based on nonlinear mixed effect model fits is currently recommended for covariate identification, whereas individual empirical Bayesian estimates (EBEs) are considered unreliable due to the presence of shrinkage. The objectives of this research were to investigate the type I error for LRT and EBE approaches, to confirm the similarity of power between the LRT and EBE approaches from a previous report and to explore the influence of shrinkage on LRT and EBE inferences. Using an oral one-compartment PK model with a single covariate impacting on clearance, we conducted a wide range of simulations according to a two-way factorial design. The results revealed that the EBE-based regression not only provided almost identical power for detecting a covariate effect, but also controlled the false positive rate better than the LRT approach. Shrinkage of EBEs is likely not the root cause for decrease in power or inflated false positive rate although the size of the covariate effect tends to be underestimated at high shrinkage. In summary, contrary to the current recommendations, EBEs may be a better choice for statistical tests in PPK covariate analysis compared to LRT. We proposed a three-step covariate modeling approach for population PK analysis to utilize the advantages of EBEs while overcoming their shortcomings, which allows not only markedly reducing the run time for population PK analysis, but also providing more accurate covariate tests.
Estimating Model Probabilities using Thermodynamic Markov Chain Monte Carlo Methods
NASA Astrophysics Data System (ADS)
Ye, M.; Liu, P.; Beerli, P.; Lu, D.; Hill, M. C.
2014-12-01
Markov chain Monte Carlo (MCMC) methods are widely used to evaluate model probability for quantifying model uncertainty. In a general procedure, MCMC simulations are first conducted for each individual model, and MCMC parameter samples are then used to approximate marginal likelihood of the model by calculating the geometric mean of the joint likelihood of the model and its parameters. It has been found the method of evaluating geometric mean suffers from the numerical problem of low convergence rate. A simple test case shows that even millions of MCMC samples are insufficient to yield accurate estimation of the marginal likelihood. To resolve this problem, a thermodynamic method is used to have multiple MCMC runs with different values of a heating coefficient between zero and one. When the heating coefficient is zero, the MCMC run is equivalent to a random walk MC in the prior parameter space; when the heating coefficient is one, the MCMC run is the conventional one. For a simple case with analytical form of the marginal likelihood, the thermodynamic method yields more accurate estimate than the method of using geometric mean. This is also demonstrated for a case of groundwater modeling with consideration of four alternative models postulated based on different conceptualization of a confining layer. This groundwater example shows that model probabilities estimated using the thermodynamic method are more reasonable than those obtained using the geometric method. The thermodynamic method is general, and can be used for a wide range of environmental problem for model uncertainty quantification.
Renault, Nisa K E; Pritchett, Sonja M; Howell, Robin E; Greer, Wenda L; Sapienza, Carmen; Ørstavik, Karen Helene; Hamilton, David C
2013-01-01
In eutherian mammals, one X-chromosome in every XX somatic cell is transcriptionally silenced through the process of X-chromosome inactivation (XCI). Females are thus functional mosaics, where some cells express genes from the paternal X, and the others from the maternal X. The relative abundance of the two cell populations (X-inactivation pattern, XIP) can have significant medical implications for some females. In mice, the ‘choice' of which X to inactivate, maternal or paternal, in each cell of the early embryo is genetically influenced. In humans, the timing of XCI choice and whether choice occurs completely randomly or under a genetic influence is debated. Here, we explore these questions by analysing the distribution of XIPs in large populations of normal females. Models were generated to predict XIP distributions resulting from completely random or genetically influenced choice. Each model describes the discrete primary distribution at the onset of XCI, and the continuous secondary distribution accounting for changes to the XIP as a result of development and ageing. Statistical methods are used to compare models with empirical data from Danish and Utah populations. A rigorous data treatment strategy maximises information content and allows for unbiased use of unphased XIP data. The Anderson–Darling goodness-of-fit statistics and likelihood ratio tests indicate that a model of genetically influenced XCI choice better fits the empirical data than models of completely random choice. PMID:23652377
Bacterial clonal diagnostics as a tool for evidence-based empiric antibiotic selection
Tchesnokova, Veronika; Avagyan, Hovhannes; Rechkina, Elena; Chan, Diana; Muradova, Mariya; Haile, Helen Ghirmai; Radey, Matthew; Weissman, Scott; Riddell, Kim; Scholes, Delia; Johnson, James R.
2017-01-01
Despite the known clonal distribution of antibiotic resistance in many bacteria, empiric (pre-culture) antibiotic selection still relies heavily on species-level cumulative antibiograms, resulting in overuse of broad-spectrum agents and excessive antibiotic/pathogen mismatch. Urinary tract infections (UTIs), which account for a large share of antibiotic use, are caused predominantly by Escherichia coli, a highly clonal pathogen. In an observational clinical cohort study of urgent care patients with suspected UTI, we assessed the potential for E. coli clonal-level antibiograms to improve empiric antibiotic selection. A novel PCR-based clonotyping assay was applied to fresh urine samples to rapidly detect E. coli and the urine strain's clonotype. Based on a database of clonotype-specific antibiograms, the acceptability of various antibiotics for empiric therapy was inferred using a 20%, 10%, and 30% allowed resistance threshold. The test's performance characteristics and possible effects on prescribing were assessed. The rapid test identified E. coli clonotypes directly in patients’ urine within 25–35 minutes, with high specificity and sensitivity compared to culture. Antibiotic selection based on a clonotype-specific antibiogram could reduce the relative likelihood of antibiotic/pathogen mismatch by ≥ 60%. Compared to observed prescribing patterns, clonal diagnostics-guided antibiotic selection could safely double the use of trimethoprim/sulfamethoxazole and minimize fluoroquinolone use. In summary, a rapid clonotyping test showed promise for improving empiric antibiotic prescribing for E. coli UTI, including reversing preferential use of fluoroquinolones over trimethoprim/sulfamethoxazole. The clonal diagnostics approach merges epidemiologic surveillance, antimicrobial stewardship, and molecular diagnostics to bring evidence-based medicine directly to the point of care. PMID:28350870
Tsunami probability in the Caribbean Region
Parsons, T.; Geist, E.L.
2008-01-01
We calculated tsunami runup probability (in excess of 0.5 m) at coastal sites throughout the Caribbean region. We applied a Poissonian probability model because of the variety of uncorrelated tsunami sources in the region. Coastlines were discretized into 20 km by 20 km cells, and the mean tsunami runup rate was determined for each cell. The remarkable ???500-year empirical record compiled by O'Loughlin and Lander (2003) was used to calculate an empirical tsunami probability map, the first of three constructed for this study. However, it is unclear whether the 500-year record is complete, so we conducted a seismic moment-balance exercise using a finite-element model of the Caribbean-North American plate boundaries and the earthquake catalog, and found that moment could be balanced if the seismic coupling coefficient is c = 0.32. Modeled moment release was therefore used to generate synthetic earthquake sequences to calculate 50 tsunami runup scenarios for 500-year periods. We made a second probability map from numerically-calculated runup rates in each cell. Differences between the first two probability maps based on empirical and numerical-modeled rates suggest that each captured different aspects of tsunami generation; the empirical model may be deficient in primary plate-boundary events, whereas numerical model rates lack backarc fault and landslide sources. We thus prepared a third probability map using Bayesian likelihood functions derived from the empirical and numerical rate models and their attendant uncertainty to weight a range of rates at each 20 km by 20 km coastal cell. Our best-estimate map gives a range of 30-year runup probability from 0 - 30% regionally. ?? irkhaueser 2008.
Bacterial clonal diagnostics as a tool for evidence-based empiric antibiotic selection.
Tchesnokova, Veronika; Avagyan, Hovhannes; Rechkina, Elena; Chan, Diana; Muradova, Mariya; Haile, Helen Ghirmai; Radey, Matthew; Weissman, Scott; Riddell, Kim; Scholes, Delia; Johnson, James R; Sokurenko, Evgeni V
2017-01-01
Despite the known clonal distribution of antibiotic resistance in many bacteria, empiric (pre-culture) antibiotic selection still relies heavily on species-level cumulative antibiograms, resulting in overuse of broad-spectrum agents and excessive antibiotic/pathogen mismatch. Urinary tract infections (UTIs), which account for a large share of antibiotic use, are caused predominantly by Escherichia coli, a highly clonal pathogen. In an observational clinical cohort study of urgent care patients with suspected UTI, we assessed the potential for E. coli clonal-level antibiograms to improve empiric antibiotic selection. A novel PCR-based clonotyping assay was applied to fresh urine samples to rapidly detect E. coli and the urine strain's clonotype. Based on a database of clonotype-specific antibiograms, the acceptability of various antibiotics for empiric therapy was inferred using a 20%, 10%, and 30% allowed resistance threshold. The test's performance characteristics and possible effects on prescribing were assessed. The rapid test identified E. coli clonotypes directly in patients' urine within 25-35 minutes, with high specificity and sensitivity compared to culture. Antibiotic selection based on a clonotype-specific antibiogram could reduce the relative likelihood of antibiotic/pathogen mismatch by ≥ 60%. Compared to observed prescribing patterns, clonal diagnostics-guided antibiotic selection could safely double the use of trimethoprim/sulfamethoxazole and minimize fluoroquinolone use. In summary, a rapid clonotyping test showed promise for improving empiric antibiotic prescribing for E. coli UTI, including reversing preferential use of fluoroquinolones over trimethoprim/sulfamethoxazole. The clonal diagnostics approach merges epidemiologic surveillance, antimicrobial stewardship, and molecular diagnostics to bring evidence-based medicine directly to the point of care.
[Dependent relative: Effects on family health].
Estrada Fernández, M Eugenia; Gil Lacruz, Ana I; Gil Lacruz, Marta; Viñas López, Antonio
2018-01-01
The purpose of this work is to analyse the effects on informal caregiver's health and lifestyle when living with a dependent person at home. A comparison will be made between this situation and other situations involving commitment of time and energy, taking into account gender and age differences in each stage of the life cycle. Cross-sectional study analysing secondary data. The method used for collecting information is the computer assisted personal interview carried out in selected homes by the Ministry of Health, Social Services and Equality. The study included 19,351 participants aged over 25 years who completed the 2011-2012 Spanish National Health Survey. This research is based on demographic information obtained from a Spanish National Health Survey (2011/12). Using an empirical framework, the Logit model was select and the data reported as odds ratio. The estimations were repeated independently by sub-groups of age and gender. The study showed that the health of people who share their lives with a dependent person is worse than those who do not have any dependent person at home (they are 5 times at higher risk of developing health problems). The study found that being a woman, advance age, low educational level and does not work, also has an influence. Being a caregiver reduces the likelihood of maintaining a healthy lifestyle through physical exercise, relaxation, or eating a balanced diet. Living with a dependent person reduces the likelihood of maintaining healthy lifestyles and worsens the state of health of family members. Significant differences in gender and age were found. Copyright © 2017 Elsevier España, S.L.U. All rights reserved.
Martyna, Agnieszka; Michalska, Aleksandra; Zadora, Grzegorz
2015-05-01
The problem of interpretation of common provenance of the samples within the infrared spectra database of polypropylene samples from car body parts and plastic containers as well as Raman spectra databases of blue solid and metallic automotive paints was under investigation. The research involved statistical tools such as likelihood ratio (LR) approach for expressing the evidential value of observed similarities and differences in the recorded spectra. Since the LR models can be easily proposed for databases described by a few variables, research focused on the problem of spectra dimensionality reduction characterised by more than a thousand variables. The objective of the studies was to combine the chemometric tools easily dealing with multidimensionality with an LR approach. The final variables used for LR models' construction were derived from the discrete wavelet transform (DWT) as a data dimensionality reduction technique supported by methods for variance analysis and corresponded with chemical information, i.e. typical absorption bands for polypropylene and peaks associated with pigments present in the car paints. Univariate and multivariate LR models were proposed, aiming at obtaining more information about the chemical structure of the samples. Their performance was controlled by estimating the levels of false positive and false negative answers and using the empirical cross entropy approach. The results for most of the LR models were satisfactory and enabled solving the stated comparison problems. The results prove that the variables generated from DWT preserve signal characteristic, being a sparse representation of the original signal by keeping its shape and relevant chemical information.
Morales, Leonardo Fabio; Gordon-Larsen, Penny; Guilkey, David
2016-12-01
We estimate a structural dynamic model of the determinants of obesity. In addition to including many of the well-recognized endogenous factors mentioned in the literature as obesity determinants, we also model the individual's residential location as a choice variable, which is the main contribution of this paper to the literature. This allows us to control for an individual's self-selection into communities that possess the types of amenities in the built environment, which in turn affect their obesity-related behaviors such as physical activity (PA) and fast food consumption. We specify reduced form equations for a set of endogenous demand decisions, together with an obesity structural equation. The whole system of equations is jointly estimated by a semi-parametric full information log-likelihood method that allows for a general pattern of correlation in the errors across equations. Our model predicts a reduction in adult obesity of 7 percentage points as a result of a continued high level PA from adolescence into adulthood; a reduction of 0.7 (3) percentage points in adult obesity as a result of one standard deviation reduction in weekly fast food consumption for women (men); and a reduction of 0.02 (0.05) in adult obesity as a result of one standard deviation change in several neighborhood amenities for women (men). Another key finding is that controlling for residential self-selection has substantive implications. To our knowledge, this has not been yet documented within a full information maximum likelihood framework. Copyright © 2016 Elsevier B.V. All rights reserved.
Morales, Leonardo Fabio; Gordon-Larsen, Penny; Guilkey, David
2017-01-01
We estimate a structural dynamic model of the determinants of obesity. In addition to including many of the well-recognized endogenous factors mentioned in the literature as obesity determinants, we also model the individual’s residential location as a choice variable, which is the main contribution of this paper to the literature. This allows us to control for an individual's self-selection into communities that possess the types of amenities in the built environment, which in turn affect their obesity-related behaviors such as physical activity (PA) and fast food consumption. We specify reduced form equations for a set of endogenous demand decisions, together with an obesity structural equation. The whole system of equations is jointly estimated by a semi-parametric full information log-likelihood method that allows for a general pattern of correlation in the errors across equations. Our model predicts a reduction in adult obesity of 7 percentage points as a result of a continued high level PA from adolescence into adulthood; a reduction of 0.7 (3) percentage points in adult obesity as a result of one standard deviation reduction in weekly fast food consumption for women (men); and a reduction of 0.02 (0.05) in adult obesity as a result of one standard deviation change in several neighborhood amenities for women (men). Another key finding is that controlling for residential self-selection has substantive implications. To our knowledge, this has not been yet documented within a full information maximum likelihood framework. PMID:27459276
Dose-finding designs using a novel quasi-continuous endpoint for multiple toxicities
Ezzalfani, Monia; Zohar, Sarah; Qin, Rui; Mandrekar, Sumithra J; Deley, Marie-Cécile Le
2013-01-01
The aim of a phase I oncology trial is to identify a dose with an acceptable safety profile. Most phase I designs use the dose-limiting toxicity, a binary endpoint, to assess the unacceptable level of toxicity. The dose-limiting toxicity might be incomplete for investigating molecularly targeted therapies as much useful toxicity information is discarded. In this work, we propose a quasi-continuous toxicity score, the total toxicity profile (TTP), to measure quantitatively and comprehensively the overall severity of multiple toxicities. We define the TTP as the Euclidean norm of the weights of toxicities experienced by a patient, where the weights reflect the relative clinical importance of each grade and toxicity type. We propose a dose-finding design, the quasi-likelihood continual reassessment method (CRM), incorporating the TTP score into the CRM, with a logistic model for the dose–toxicity relationship in a frequentist framework. Using simulations, we compared our design with three existing designs for quasi-continuous toxicity score (the Bayesian quasi-CRM with an empiric model and two nonparametric designs), all using the TTP score, under eight different scenarios. All designs using the TTP score to identify the recommended dose had good performance characteristics for most scenarios, with good overdosing control. For a sample size of 36, the percentage of correct selection for the quasi-likelihood CRM ranged from 80% to 90%, with similar results for the quasi-CRM design. These designs with TTP score present an appealing alternative to the conventional dose-finding designs, especially in the context of molecularly targeted agents. PMID:23335156
ERIC Educational Resources Information Center
Le, Huy; Schmidt, Frank L.; Harter, James K.; Lauver, Kristy J.
2010-01-01
Construct empirical redundancy may be a major problem in organizational research today. In this paper, we explain and empirically illustrate a method for investigating this potential problem. We applied the method to examine the empirical redundancy of job satisfaction (JS) and organizational commitment (OC), two well-established organizational…
Likelihoods for fixed rank nomination networks
HOFF, PETER; FOSDICK, BAILEY; VOLFOVSKY, ALEX; STOVEL, KATHERINE
2014-01-01
Many studies that gather social network data use survey methods that lead to censored, missing, or otherwise incomplete information. For example, the popular fixed rank nomination (FRN) scheme, often used in studies of schools and businesses, asks study participants to nominate and rank at most a small number of contacts or friends, leaving the existence of other relations uncertain. However, most statistical models are formulated in terms of completely observed binary networks. Statistical analyses of FRN data with such models ignore the censored and ranked nature of the data and could potentially result in misleading statistical inference. To investigate this possibility, we compare Bayesian parameter estimates obtained from a likelihood for complete binary networks with those obtained from likelihoods that are derived from the FRN scheme, and therefore accommodate the ranked and censored nature of the data. We show analytically and via simulation that the binary likelihood can provide misleading inference, particularly for certain model parameters that relate network ties to characteristics of individuals and pairs of individuals. We also compare these different likelihoods in a data analysis of several adolescent social networks. For some of these networks, the parameter estimates from the binary and FRN likelihoods lead to different conclusions, indicating the importance of analyzing FRN data with a method that accounts for the FRN survey design. PMID:25110586
General Metropolis-Hastings jump diffusions for automatic target recognition in infrared scenes
NASA Astrophysics Data System (ADS)
Lanterman, Aaron D.; Miller, Michael I.; Snyder, Donald L.
1997-04-01
To locate and recognize ground-based targets in forward- looking IR (FLIR) images, 3D faceted models with associated pose parameters are formulated to accommodate the variability found in FLIR imagery. Taking a Bayesian approach, scenes are simulated from the emissive characteristics of the CAD models and compared with the collected data by a likelihood function based on sensor statistics. This likelihood is combined with a prior distribution defined over the set of possible scenes to form a posterior distribution. To accommodate scenes with variable numbers of targets, the posterior distribution is defined over parameter vectors of varying dimension. An inference algorithm based on Metropolis-Hastings jump- diffusion processes empirically samples from the posterior distribution, generating configurations of templates and transformations that match the collected sensor data with high probability. The jumps accommodate the addition and deletion of targets and the estimation of target identities; diffusions refine the hypotheses by drifting along the gradient of the posterior distribution with respect to the orientation and position parameters. Previous results on jumps strategies analogous to the Metropolis acceptance/rejection algorithm, with proposals drawn from the prior and accepted based on the likelihood, are extended to encompass general Metropolis-Hastings proposal densities. In particular, the algorithm proposes moves by drawing from the posterior distribution over computationally tractible subsets of the parameter space. The algorithm is illustrated by an implementation on a Silicon Graphics Onyx/Reality Engine.
Hearing loss and disability exit: Measurement issues and coping strategies.
Christensen, Vibeke Tornhøj; Datta Gupta, Nabanita
2017-02-01
Hearing loss is one of the most common conditions related to aging, and previous descriptive evidence links it to early exit from the labor market. These studies are usually based on self-reported hearing difficulties, which are potentially endogenous to labor supply. We use unique representative data collected in the spring of 2005 through in-home interviews. The data contains self-reported functional and clinically-measured hearing ability for a representative sample of the Danish population aged 50-64. We estimate the causal effect of hearing loss on early retirement via disability benefits, taking into account the endogeneity of functional hearing. Our identification strategy involves the simultaneous estimation of labor supply, functional hearing, and coping strategies (i.e. accessing assistive devices at work or informing one's employer about the problem). We use hearing aids as an instrument for functional hearing. Our main empirical findings are that endogeneity bias is more severe for men than women and that functional hearing problems significantly increase the likelihood of receiving disability benefits for both men and women. However, relative to the baseline the effect is larger for men (47% vs. 20%, respectively). Availability of assistive devices in the workplace decreases the likelihood of receiving disability benefits, whereas informing an employer about hearing problems increases this likelihood. Copyright © 2016 Elsevier B.V. All rights reserved.
Finite mixture model: A maximum likelihood estimation approach on time series data
NASA Astrophysics Data System (ADS)
Yen, Phoong Seuk; Ismail, Mohd Tahir; Hamzah, Firdaus Mohamad
2014-09-01
Recently, statistician emphasized on the fitting of finite mixture model by using maximum likelihood estimation as it provides asymptotic properties. In addition, it shows consistency properties as the sample sizes increases to infinity. This illustrated that maximum likelihood estimation is an unbiased estimator. Moreover, the estimate parameters obtained from the application of maximum likelihood estimation have smallest variance as compared to others statistical method as the sample sizes increases. Thus, maximum likelihood estimation is adopted in this paper to fit the two-component mixture model in order to explore the relationship between rubber price and exchange rate for Malaysia, Thailand, Philippines and Indonesia. Results described that there is a negative effect among rubber price and exchange rate for all selected countries.
Maximum-Likelihood Methods for Processing Signals From Gamma-Ray Detectors
Barrett, Harrison H.; Hunter, William C. J.; Miller, Brian William; Moore, Stephen K.; Chen, Yichun; Furenlid, Lars R.
2009-01-01
In any gamma-ray detector, each event produces electrical signals on one or more circuit elements. From these signals, we may wish to determine the presence of an interaction; whether multiple interactions occurred; the spatial coordinates in two or three dimensions of at least the primary interaction; or the total energy deposited in that interaction. We may also want to compute listmode probabilities for tomographic reconstruction. Maximum-likelihood methods provide a rigorous and in some senses optimal approach to extracting this information, and the associated Fisher information matrix provides a way of quantifying and optimizing the information conveyed by the detector. This paper will review the principles of likelihood methods as applied to gamma-ray detectors and illustrate their power with recent results from the Center for Gamma-ray Imaging. PMID:20107527
Cosmological parameter estimation using Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Prasad, J.; Souradeep, T.
2014-03-01
Constraining parameters of a theoretical model from observational data is an important exercise in cosmology. There are many theoretically motivated models, which demand greater number of cosmological parameters than the standard model of cosmology uses, and make the problem of parameter estimation challenging. It is a common practice to employ Bayesian formalism for parameter estimation for which, in general, likelihood surface is probed. For the standard cosmological model with six parameters, likelihood surface is quite smooth and does not have local maxima, and sampling based methods like Markov Chain Monte Carlo (MCMC) method are quite successful. However, when there are a large number of parameters or the likelihood surface is not smooth, other methods may be more effective. In this paper, we have demonstrated application of another method inspired from artificial intelligence, called Particle Swarm Optimization (PSO) for estimating cosmological parameters from Cosmic Microwave Background (CMB) data taken from the WMAP satellite.
Maximal likelihood correspondence estimation for face recognition across pose.
Li, Shaoxin; Liu, Xin; Chai, Xiujuan; Zhang, Haihong; Lao, Shihong; Shan, Shiguang
2014-10-01
Due to the misalignment of image features, the performance of many conventional face recognition methods degrades considerably in across pose scenario. To address this problem, many image matching-based methods are proposed to estimate semantic correspondence between faces in different poses. In this paper, we aim to solve two critical problems in previous image matching-based correspondence learning methods: 1) fail to fully exploit face specific structure information in correspondence estimation and 2) fail to learn personalized correspondence for each probe image. To this end, we first build a model, termed as morphable displacement field (MDF), to encode face specific structure information of semantic correspondence from a set of real samples of correspondences calculated from 3D face models. Then, we propose a maximal likelihood correspondence estimation (MLCE) method to learn personalized correspondence based on maximal likelihood frontal face assumption. After obtaining the semantic correspondence encoded in the learned displacement, we can synthesize virtual frontal images of the profile faces for subsequent recognition. Using linear discriminant analysis method with pixel-intensity features, state-of-the-art performance is achieved on three multipose benchmarks, i.e., CMU-PIE, FERET, and MultiPIE databases. Owe to the rational MDF regularization and the usage of novel maximal likelihood objective, the proposed MLCE method can reliably learn correspondence between faces in different poses even in complex wild environment, i.e., labeled face in the wild database.
Program for Weibull Analysis of Fatigue Data
NASA Technical Reports Server (NTRS)
Krantz, Timothy L.
2005-01-01
A Fortran computer program has been written for performing statistical analyses of fatigue-test data that are assumed to be adequately represented by a two-parameter Weibull distribution. This program calculates the following: (1) Maximum-likelihood estimates of the Weibull distribution; (2) Data for contour plots of relative likelihood for two parameters; (3) Data for contour plots of joint confidence regions; (4) Data for the profile likelihood of the Weibull-distribution parameters; (5) Data for the profile likelihood of any percentile of the distribution; and (6) Likelihood-based confidence intervals for parameters and/or percentiles of the distribution. The program can account for tests that are suspended without failure (the statistical term for such suspension of tests is "censoring"). The analytical approach followed in this program for the software is valid for type-I censoring, which is the removal of unfailed units at pre-specified times. Confidence regions and intervals are calculated by use of the likelihood-ratio method.
Less-Complex Method of Classifying MPSK
NASA Technical Reports Server (NTRS)
Hamkins, Jon
2006-01-01
An alternative to an optimal method of automated classification of signals modulated with M-ary phase-shift-keying (M-ary PSK or MPSK) has been derived. The alternative method is approximate, but it offers nearly optimal performance and entails much less complexity, which translates to much less computation time. Modulation classification is becoming increasingly important in radio-communication systems that utilize multiple data modulation schemes and include software-defined or software-controlled receivers. Such a receiver may "know" little a priori about an incoming signal but may be required to correctly classify its data rate, modulation type, and forward error-correction code before properly configuring itself to acquire and track the symbol timing, carrier frequency, and phase, and ultimately produce decoded bits. Modulation classification has long been an important component of military interception of initially unknown radio signals transmitted by adversaries. Modulation classification may also be useful for enabling cellular telephones to automatically recognize different signal types and configure themselves accordingly. The concept of modulation classification as outlined in the preceding paragraph is quite general. However, at the present early stage of development, and for the purpose of describing the present alternative method, the term "modulation classification" or simply "classification" signifies, more specifically, a distinction between M-ary and M'-ary PSK, where M and M' represent two different integer multiples of 2. Both the prior optimal method and the present alternative method require the acquisition of magnitude and phase values of a number (N) of consecutive baseband samples of the incoming signal + noise. The prior optimal method is based on a maximum- likelihood (ML) classification rule that requires a calculation of likelihood functions for the M and M' hypotheses: Each likelihood function is an integral, over a full cycle of carrier phase, of a complicated sum of functions of the baseband sample values, the carrier phase, the carrier-signal and noise magnitudes, and M or M'. Then the likelihood ratio, defined as the ratio between the likelihood functions, is computed, leading to the choice of whichever hypothesis - M or M'- is more likely. In the alternative method, the integral in each likelihood function is approximated by a sum over values of the integrand sampled at a number, 1, of equally spaced values of carrier phase. Used in this way, 1 is a parameter that can be adjusted to trade computational complexity against the probability of misclassification. In the limit as 1 approaches infinity, one obtains the integral form of the likelihood function and thus recovers the ML classification. The present approximate method has been tested in comparison with the ML method by means of computational simulations. The results of the simulations have shown that the performance (as quantified by probability of misclassification) of the approximate method is nearly indistinguishable from that of the ML method (see figure).
Technical Note: Approximate Bayesian parameterization of a process-based tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2014-02-01
Inverse parameter estimation of process-based models is a long-standing problem in many scientific disciplines. A key question for inverse parameter estimation is how to define the metric that quantifies how well model predictions fit to the data. This metric can be expressed by general cost or objective functions, but statistical inversion methods require a particular metric, the probability of observing the data given the model parameters, known as the likelihood. For technical and computational reasons, likelihoods for process-based stochastic models are usually based on general assumptions about variability in the observed data, and not on the stochasticity generated by the model. Only in recent years have new methods become available that allow the generation of likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional Markov chain Monte Carlo (MCMC) sampler, performs well in retrieving known parameter values from virtual inventory data generated by the forest model. We analyze the results of the parameter estimation, examine its sensitivity to the choice and aggregation of model outputs and observed data (summary statistics), and demonstrate the application of this method by fitting the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss how this approach differs from approximate Bayesian computation (ABC), another method commonly used to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can be successfully applied to process-based models of high complexity. The methodology is particularly suitable for heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models.
Assessing Consumer Responses to PREPs: A Review of Tobacco Industry and Independent Research Methods
Rees, Vaughan W.; Kreslake, Jennifer M.; Cummings, K. Michael; O'Connor, Richard J.; Hatsukami, Dorothy K.; Parascandola, Mark; Shields, Peter G.; Connolly, Gregory N.
2009-01-01
Objective Internal tobacco industry documents and the mainstream literature are reviewed to identify methods and measures for evaluating tobacco consumer response. The review aims to outline areas in which established methods exist, identify gaps in current methods for assessing CR, and consider how these methods might be applied to evaluate PREPs and new products. Methods Internal industry research reviewed included published papers, manuscript drafts, presentations, protocols, and instruments relating to consumer response measures were identified and analyzed. Peer-reviewed research was identified using PubMed and Scopus. Results Industry research on consumer response focuses on product development and marketing. To develop and refine new products, the tobacco industry has developed notable strategies for assessing consumers' sensory and subjective responses to product design characteristics. Independent research is often conducted to gauge the likelihood of future product adoption by measuring consumers' risk perceptions, responses to product, and product acceptability. Conclusions A model which conceptualizes consumer response as comprising the separate, but interacting domains of product perceptions and response to product is outlined. Industry and independent research supports the dual domain model, and provides a wide range of methods for assessment of the construct components of consumer response. Further research is needed to validate consumer response constructs, determine the relationship between consumer response and tobacco user behavior, and improve reliability of consumer response measures. Scientifically rigorous consumer response assessment methods will provide a needed empirical basis for future regulation of PREPs and new products, to counteract tobacco industry influence on consumers, and enhance the public health. PMID:19959675
DOE Office of Scientific and Technical Information (OSTI.GOV)
Washeleski, Robert L.; Meyer, Edmond J. IV; King, Lyon B.
2013-10-15
Laser Thomson scattering (LTS) is an established plasma diagnostic technique that has seen recent application to low density plasmas. It is difficult to perform LTS measurements when the scattered signal is weak as a result of low electron number density, poor optical access to the plasma, or both. Photon counting methods are often implemented in order to perform measurements in these low signal conditions. However, photon counting measurements performed with photo-multiplier tubes are time consuming and multi-photon arrivals are incorrectly recorded. In order to overcome these shortcomings a new data analysis method based on maximum likelihood estimation was developed. Themore » key feature of this new data processing method is the inclusion of non-arrival events in determining the scattered Thomson signal. Maximum likelihood estimation and its application to Thomson scattering at low signal levels is presented and application of the new processing method to LTS measurements performed in the plume of a 2-kW Hall-effect thruster is discussed.« less
Washeleski, Robert L; Meyer, Edmond J; King, Lyon B
2013-10-01
Laser Thomson scattering (LTS) is an established plasma diagnostic technique that has seen recent application to low density plasmas. It is difficult to perform LTS measurements when the scattered signal is weak as a result of low electron number density, poor optical access to the plasma, or both. Photon counting methods are often implemented in order to perform measurements in these low signal conditions. However, photon counting measurements performed with photo-multiplier tubes are time consuming and multi-photon arrivals are incorrectly recorded. In order to overcome these shortcomings a new data analysis method based on maximum likelihood estimation was developed. The key feature of this new data processing method is the inclusion of non-arrival events in determining the scattered Thomson signal. Maximum likelihood estimation and its application to Thomson scattering at low signal levels is presented and application of the new processing method to LTS measurements performed in the plume of a 2-kW Hall-effect thruster is discussed.
Optimal filtering and Bayesian detection for friction-based diagnostics in machines.
Ray, L R; Townsend, J R; Ramasubramanian, A
2001-01-01
Non-model-based diagnostic methods typically rely on measured signals that must be empirically related to process behavior or incipient faults. The difficulty in interpreting a signal that is indirectly related to the fundamental process behavior is significant. This paper presents an integrated non-model and model-based approach to detecting when process behavior varies from a proposed model. The method, which is based on nonlinear filtering combined with maximum likelihood hypothesis testing, is applicable to dynamic systems whose constitutive model is well known, and whose process inputs are poorly known. Here, the method is applied to friction estimation and diagnosis during motion control in a rotating machine. A nonlinear observer estimates friction torque in a machine from shaft angular position measurements and the known input voltage to the motor. The resulting friction torque estimate can be analyzed directly for statistical abnormalities, or it can be directly compared to friction torque outputs of an applicable friction process model in order to diagnose faults or model variations. Nonlinear estimation of friction torque provides a variable on which to apply diagnostic methods that is directly related to model variations or faults. The method is evaluated experimentally by its ability to detect normal load variations in a closed-loop controlled motor driven inertia with bearing friction and an artificially-induced external line contact. Results show an ability to detect statistically significant changes in friction characteristics induced by normal load variations over a wide range of underlying friction behaviors.
2015-08-01
McCullagh, P.; Nelder, J.A. Generalized Linear Model , 2nd ed.; Chapman and Hall: London, 1989. 7. Johnston, J. Econometric Methods, 3rd ed.; McGraw...FOR A DOSE-RESPONSE MODEL ECBC-TN-068 Kyong H. Park Steven J. Lagan RESEARCH AND TECHNOLOGY DIRECTORATE August 2015 Approved for public release...Likelihood Estimation Method for Completely Separated and Quasi-Completely Separated Data for a Dose-Response Model 5a. CONTRACT NUMBER 5b. GRANT
Schwartz, Rachel S; Mueller, Rachel L
2010-01-11
Estimates of divergence dates between species improve our understanding of processes ranging from nucleotide substitution to speciation. Such estimates are frequently based on molecular genetic differences between species; therefore, they rely on accurate estimates of the number of such differences (i.e. substitutions per site, measured as branch length on phylogenies). We used simulations to determine the effects of dataset size, branch length heterogeneity, branch depth, and analytical framework on branch length estimation across a range of branch lengths. We then reanalyzed an empirical dataset for plethodontid salamanders to determine how inaccurate branch length estimation can affect estimates of divergence dates. The accuracy of branch length estimation varied with branch length, dataset size (both number of taxa and sites), branch length heterogeneity, branch depth, dataset complexity, and analytical framework. For simple phylogenies analyzed in a Bayesian framework, branches were increasingly underestimated as branch length increased; in a maximum likelihood framework, longer branch lengths were somewhat overestimated. Longer datasets improved estimates in both frameworks; however, when the number of taxa was increased, estimation accuracy for deeper branches was less than for tip branches. Increasing the complexity of the dataset produced more misestimated branches in a Bayesian framework; however, in an ML framework, more branches were estimated more accurately. Using ML branch length estimates to re-estimate plethodontid salamander divergence dates generally resulted in an increase in the estimated age of older nodes and a decrease in the estimated age of younger nodes. Branch lengths are misestimated in both statistical frameworks for simulations of simple datasets. However, for complex datasets, length estimates are quite accurate in ML (even for short datasets), whereas few branches are estimated accurately in a Bayesian framework. Our reanalysis of empirical data demonstrates the magnitude of effects of Bayesian branch length misestimation on divergence date estimates. Because the length of branches for empirical datasets can be estimated most reliably in an ML framework when branches are <1 substitution/site and datasets are > or =1 kb, we suggest that divergence date estimates using datasets, branch lengths, and/or analytical techniques that fall outside of these parameters should be interpreted with caution.
Filtration of human EEG recordings from physiological artifacts with empirical mode method
NASA Astrophysics Data System (ADS)
Grubov, Vadim V.; Runnova, Anastasiya E.; Khramova, Marina V.
2017-03-01
In the paper we propose the new method for dealing with noise and physiological artifacts in experimental human EEG recordings. The method is based on analysis of EEG signals with empirical mode decomposition (Hilbert-Huang transform). We consider noises and physiological artifacts on EEG as specific oscillatory patterns that cause problems during EEG analysis and can be detected with additional signals recorded simultaneously with EEG (ECG, EMG, EOG, etc.) We introduce the algorithm of the method with following steps: empirical mode decomposition of EEG signal, choosing of empirical modes with artifacts, removing empirical modes with artifacts, reconstruction of the initial EEG signal. We test the method on filtration of experimental human EEG signals from eye-moving artifacts and show high efficiency of the method.
An empirical study of scanner system parameters
NASA Technical Reports Server (NTRS)
Landgrebe, D.; Biehl, L.; Simmons, W.
1976-01-01
The selection of the current combination of parametric values (instantaneous field of view, number and location of spectral bands, signal-to-noise ratio, etc.) of a multispectral scanner is a complex problem due to the strong interrelationship these parameters have with one another. The study was done with the proposed scanner known as Thematic Mapper in mind. Since an adequate theoretical procedure for this problem has apparently not yet been devised, an empirical simulation approach was used with candidate parameter values selected by the heuristic means. The results obtained using a conventional maximum likelihood pixel classifier suggest that although the classification accuracy declines slightly as the IFOV is decreased this is more than made up by an improved mensuration accuracy. Further, the use of a classifier involving both spatial and spectral features shows a very substantial tendency to resist degradation as the signal-to-noise ratio is decreased. And finally, further evidence is provided of the importance of having at least one spectral band in each of the major available portions of the optical spectrum.
ERIC Educational Resources Information Center
Molenaar, Peter C. M.; Nesselroade, John R.
1998-01-01
Pseudo-Maximum Likelihood (p-ML) and Asymptotically Distribution Free (ADF) estimation methods for estimating dynamic factor model parameters within a covariance structure framework were compared through a Monte Carlo simulation. Both methods appear to give consistent model parameter estimates, but only ADF gives standard errors and chi-square…
Estimation Methods for Non-Homogeneous Regression - Minimum CRPS vs Maximum Likelihood
NASA Astrophysics Data System (ADS)
Gebetsberger, Manuel; Messner, Jakob W.; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Non-homogeneous regression models are widely used to statistically post-process numerical weather prediction models. Such regression models correct for errors in mean and variance and are capable to forecast a full probability distribution. In order to estimate the corresponding regression coefficients, CRPS minimization is performed in many meteorological post-processing studies since the last decade. In contrast to maximum likelihood estimation, CRPS minimization is claimed to yield more calibrated forecasts. Theoretically, both scoring rules used as an optimization score should be able to locate a similar and unknown optimum. Discrepancies might result from a wrong distributional assumption of the observed quantity. To address this theoretical concept, this study compares maximum likelihood and minimum CRPS estimation for different distributional assumptions. First, a synthetic case study shows that, for an appropriate distributional assumption, both estimation methods yield to similar regression coefficients. The log-likelihood estimator is slightly more efficient. A real world case study for surface temperature forecasts at different sites in Europe confirms these results but shows that surface temperature does not always follow the classical assumption of a Gaussian distribution. KEYWORDS: ensemble post-processing, maximum likelihood estimation, CRPS minimization, probabilistic temperature forecasting, distributional regression models
el Galta, Rachid; Uitte de Willige, Shirley; de Visser, Marieke C H; Helmer, Quinta; Hsu, Li; Houwing-Duistermaat, Jeanine J
2007-09-24
In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known. By means of a simulation study, we compare the performance of the score statistic to Pearson's chi-square statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study. We conclude that the statistic follows a chi square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chi-square statistic, the score statistic has more power when the associated haplotype has frequency above 0.2 and the number of variants is above five.
Rees, Vaughan W; Kreslake, Jennifer M; Cummings, K Michael; O'Connor, Richard J; Hatsukami, Dorothy K; Parascandola, Mark; Shields, Peter G; Connolly, Gregory N
2009-12-01
Internal tobacco industry documents and the mainstream literature are reviewed to identify methods and measures for evaluating tobacco consumer response. The review aims to outline areas in which established methods exist, identify gaps in current methods for assessing consumer response, and consider how these methods might be applied to evaluate potentially reduced exposure tobacco products and new products. Internal industry research reviewed included published articles, manuscript drafts, presentations, protocols, and instruments relating to consumer response measures were identified and analyzed. Peer-reviewed research was identified using PubMed and Scopus. Industry research on consumer response focuses on product development and marketing. To develop and refine new products, the tobacco industry has developed notable strategies for assessing consumers' sensory and subjective responses to product design characteristics. Independent research is often conducted to gauge the likelihood of future product adoption by measuring consumers' risk perceptions, responses to product, and product acceptability. A model that conceptualizes consumer response as comprising the separate, but interacting, domains of product perceptions and response to product is outlined. Industry and independent research supports the dual domain model and provides a wide range of methods for assessment of the construct components of consumer response. Further research is needed to validate consumer response constructs, determine the relationship between consumer response and tobacco user behavior, and improve reliability of consumer response measures. Scientifically rigorous consumer response assessment methods will provide a needed empirical basis for future regulation of potentially reduced-exposure tobacco products and new products, to counteract tobacco industry influence on consumers, and enhance the public health.
Wright, Eric R; Wright, Dustin E; Kooreman, Harold E; Anderson, Jeffrey A
2006-05-01
While both theory and empirical research regarding work team performance suggests that conflict can play an important role in determining productivity and other outcomes, the impact of conflict on the effectiveness of service coordination teams is not well understood. In this study, the team records and charts of 189 young people maintained by service coordinators in a system of care initiative were analyzed to identify the number of intra-team conflicts, the participants involved in each conflict, the theme of each conflict and their relationship with the likelihood that young people were successful in meeting their treatment goals. Findings indicate that interpersonal concerns and concerns about team member follow-through were the most frequent types of conflict. More important, our analyses suggest that more frequent conflicts significantly increased the likelihood that a child and family team (CFT) was unsuccessful in helping the youth and family achieve the desired treatment goals. The results underline the need for further research on how structure and functioning of services coordination teams impact youth and family outcomes.
The role of landscape-dependent disturbance and dispersal in metapopulation persistence.
Elkin, Ché M; Possingham, Hugh
2008-10-01
The fundamental processes that influence metapopulation dynamics (extinction and recolonization) will often depend on landscape structure. Disturbances that increase patch extinction rates will frequently be landscape dependent such that they are spatially aggregated and have an increased likelihood of occurring in some areas. Similarly, landscape structure can influence organism movement, producing asymmetric dispersal between patches. Using a stochastic, spatially explicit model, we examine how landscape-dependent correlations between dispersal and disturbance rates influence metapopulation dynamics. Habitat patches that are situated in areas where the likelihood of disturbance is low will experience lower extinction rates and will function as partial refuges. We discovered that the presence of partial refuges increases metapopulation viability and that the value of partial refuges was contingent on whether dispersal was also landscape dependent. Somewhat counterintuitively, metapopulation viability was reduced when individuals had a preponderance to disperse away from refuges and was highest when there was biased dispersal toward refuges. Our work demonstrates that landscape structure needs to be incorporated into metapopulation models when there is either empirical data or ecological rationale for extinction and/or dispersal rates being landscape dependent.
Unwinding the hairball graph: Pruning algorithms for weighted complex networks
NASA Astrophysics Data System (ADS)
Dianati, Navid
2016-01-01
Empirical networks of weighted dyadic relations often contain "noisy" edges that alter the global characteristics of the network and obfuscate the most important structures therein. Graph pruning is the process of identifying the most significant edges according to a generative null model and extracting the subgraph consisting of those edges. Here, we focus on integer-weighted graphs commonly arising when weights count the occurrences of an "event" relating the nodes. We introduce a simple and intuitive null model related to the configuration model of network generation and derive two significance filters from it: the marginal likelihood filter (MLF) and the global likelihood filter (GLF). The former is a fast algorithm assigning a significance score to each edge based on the marginal distribution of edge weights, whereas the latter is an ensemble approach which takes into account the correlations among edges. We apply these filters to the network of air traffic volume between US airports and recover a geographically faithful representation of the graph. Furthermore, compared with thresholding based on edge weight, we show that our filters extract a larger and significantly sparser giant component.
Urquhart, Erin A; Jones, Stephen H; Yu, Jong W; Schuster, Brian M; Marcinkiewicz, Ashley L; Whistler, Cheryl A; Cooper, Vaughn S
2016-01-01
Reports from state health departments and the Centers for Disease Control and Prevention indicate that the annual number of reported human vibriosis cases in New England has increased in the past decade. Concurrently, there has been a shift in both the spatial distribution and seasonal detection of Vibrio spp. throughout the region based on limited monitoring data. To determine environmental factors that may underlie these emerging conditions, this study focuses on a long-term database of Vibrio parahaemolyticus concentrations in oyster samples generated from data collected from the Great Bay Estuary, New Hampshire over a period of seven consecutive years. Oyster samples from two distinct sites were analyzed for V. parahaemolyticus abundance, noting significant relationships with various biotic and abiotic factors measured during the same period of study. We developed a predictive modeling tool capable of estimating the likelihood of V. parahaemolyticus presence in coastal New Hampshire oysters. Results show that the inclusion of chlorophyll a concentration to an empirical model otherwise employing only temperature and salinity variables, offers improved predictive capability for modeling the likelihood of V. parahaemolyticus in the Great Bay Estuary.
Life shocks and crime: a test of the "turning point" hypothesis.
Corman, Hope; Noonan, Kelly; Reichman, Nancy E; Schwartz-Soicher, Ofira
2011-08-01
Other researchers have posited that important events in men's lives-such as employment, marriage, and parenthood-strengthen their social ties and lead them to refrain from crime. A challenge in empirically testing this hypothesis has been the issue of self-selection into life transitions. This study contributes to this literature by estimating the effects of an exogenous life shock on crime. We use data from the Fragile Families and Child Wellbeing Study, augmented with information from hospital medical records, to estimate the effects of the birth of a child with a severe health problem on the likelihood that the infant's father engages in illegal activities. We conduct a number of auxiliary analyses to examine exogeneity assumptions. We find that having an infant born with a severe health condition increases the likelihood that the father is convicted of a crime in the three-year period following the birth of the child, and at least part of the effect appears to operate through work and changes in parental relationships. These results provide evidence that life events can cause crime and, as such, support the "turning point" hypothesis.
Evolutionary genetic analyses of MEF2C gene: implications for learning and memory in Homo sapiens.
Kalmady, Sunil V; Venkatasubramanian, Ganesan; Arasappa, Rashmi; Rao, Naren P
2013-02-01
MEF2C facilitates context-dependent fear conditioning (CFC) which is a salient aspect of hippocampus-dependent learning and memory. CFC might have played a crucial role in human evolution because of its advantageous influence on survival of species. In this study, we analyzed 23 orthologous mammalian gene sequences of MEF2C gene to examine the evidence for positive selection on this gene in Homo sapiens using Phylogenetic Analysis by Maximum Likelihood (PAML) and HyPhy software. Both PAML Bayes Empirical Bayes (BEB) and HyPhy Fixed Effects Likelihood (FEL) analyses supported significant positive selection on 4 codon sites in H. sapiens. Also, haplotter analysis revealed significant ongoing positive selection on this gene in Central European population. The study findings suggest that adaptive selective pressure on this gene might have influenced human evolution. Further research on this gene might unravel the potential role of this gene in learning and memory as well as its pathogenetic effect in certain hippocampal disorders with evolutionary basis like schizophrenia. Copyright © 2012 Elsevier B.V. All rights reserved.
Yu, Peng; Shaw, Chad A
2014-06-01
The Dirichlet-multinomial (DMN) distribution is a fundamental model for multicategory count data with overdispersion. This distribution has many uses in bioinformatics including applications to metagenomics data, transctriptomics and alternative splicing. The DMN distribution reduces to the multinomial distribution when the overdispersion parameter ψ is 0. Unfortunately, numerical computation of the DMN log-likelihood function by conventional methods results in instability in the neighborhood of [Formula: see text]. An alternative formulation circumvents this instability, but it leads to long runtimes that make it impractical for large count data common in bioinformatics. We have developed a new method for computation of the DMN log-likelihood to solve the instability problem without incurring long runtimes. The new approach is composed of a novel formula and an algorithm to extend its applicability. Our numerical experiments show that this new method both improves the accuracy of log-likelihood evaluation and the runtime by several orders of magnitude, especially in high-count data situations that are common in deep sequencing data. Using real metagenomic data, our method achieves manyfold runtime improvement. Our method increases the feasibility of using the DMN distribution to model many high-throughput problems in bioinformatics. We have included in our work an R package giving access to this method and a vingette applying this approach to metagenomic data. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Harbert, Robert S; Nixon, Kevin C
2015-08-01
• Plant distributions have long been understood to be correlated with the environmental conditions to which species are adapted. Climate is one of the major components driving species distributions. Therefore, it is expected that the plants coexisting in a community are reflective of the local environment, particularly climate.• Presented here is a method for the estimation of climate from local plant species coexistence data. The method, Climate Reconstruction Analysis using Coexistence Likelihood Estimation (CRACLE), is a likelihood-based method that employs specimen collection data at a global scale for the inference of species climate tolerance. CRACLE calculates the maximum joint likelihood of coexistence given individual species climate tolerance characterization to estimate the expected climate.• Plant distribution data for more than 4000 species were used to show that this method accurately infers expected climate profiles for 165 sites with diverse climatic conditions. Estimates differ from the WorldClim global climate model by less than 1.5°C on average for mean annual temperature and less than ∼250 mm for mean annual precipitation. This is a significant improvement upon other plant-based climate-proxy methods.• CRACLE validates long hypothesized interactions between climate and local associations of plant species. Furthermore, CRACLE successfully estimates climate that is consistent with the widely used WorldClim model and therefore may be applied to the quantitative estimation of paleoclimate in future studies. © 2015 Botanical Society of America, Inc.
DOT National Transportation Integrated Search
1980-06-01
Volume 5 evaluates empirical methods in tunneling. Empirical methods that avoid the use of an explicit model by relating ground conditions to observed prototype behavior have played a major role in tunnel design. The main objective of this volume is ...
Cha, Kenny H.; Hadjiiski, Lubomir; Samala, Ravi K.; Chan, Heang-Ping; Caoili, Elaine M.; Cohan, Richard H.
2016-01-01
Purpose: The authors are developing a computerized system for bladder segmentation in CT urography (CTU) as a critical component for computer-aided detection of bladder cancer. Methods: A deep-learning convolutional neural network (DL-CNN) was trained to distinguish between the inside and the outside of the bladder using 160 000 regions of interest (ROI) from CTU images. The trained DL-CNN was used to estimate the likelihood of an ROI being inside the bladder for ROIs centered at each voxel in a CTU case, resulting in a likelihood map. Thresholding and hole-filling were applied to the map to generate the initial contour for the bladder, which was then refined by 3D and 2D level sets. The segmentation performance was evaluated using 173 cases: 81 cases in the training set (42 lesions, 21 wall thickenings, and 18 normal bladders) and 92 cases in the test set (43 lesions, 36 wall thickenings, and 13 normal bladders). The computerized segmentation accuracy using the DL likelihood map was compared to that using a likelihood map generated by Haar features and a random forest classifier, and that using our previous conjoint level set analysis and segmentation system (CLASS) without using a likelihood map. All methods were evaluated relative to the 3D hand-segmented reference contours. Results: With DL-CNN-based likelihood map and level sets, the average volume intersection ratio, average percent volume error, average absolute volume error, average minimum distance, and the Jaccard index for the test set were 81.9% ± 12.1%, 10.2% ± 16.2%, 14.0% ± 13.0%, 3.6 ± 2.0 mm, and 76.2% ± 11.8%, respectively. With the Haar-feature-based likelihood map and level sets, the corresponding values were 74.3% ± 12.7%, 13.0% ± 22.3%, 20.5% ± 15.7%, 5.7 ± 2.6 mm, and 66.7% ± 12.6%, respectively. With our previous CLASS with local contour refinement (LCR) method, the corresponding values were 78.0% ± 14.7%, 16.5% ± 16.8%, 18.2% ± 15.0%, 3.8 ± 2.3 mm, and 73.9% ± 13.5%, respectively. Conclusions: The authors demonstrated that the DL-CNN can overcome the strong boundary between two regions that have large difference in gray levels and provides a seamless mask to guide level set segmentation, which has been a problem for many gradient-based segmentation methods. Compared to our previous CLASS with LCR method, which required two user inputs to initialize the segmentation, DL-CNN with level sets achieved better segmentation performance while using a single user input. Compared to the Haar-feature-based likelihood map, the DL-CNN-based likelihood map could guide the level sets to achieve better segmentation. The results demonstrate the feasibility of our new approach of using DL-CNN in combination with level sets for segmentation of the bladder. PMID:27036584
Accounting for Sampling Error in Genetic Eigenvalues Using Random Matrix Theory.
Sztepanacz, Jacqueline L; Blows, Mark W
2017-07-01
The distribution of genetic variance in multivariate phenotypes is characterized by the empirical spectral distribution of the eigenvalues of the genetic covariance matrix. Empirical estimates of genetic eigenvalues from random effects linear models are known to be overdispersed by sampling error, where large eigenvalues are biased upward, and small eigenvalues are biased downward. The overdispersion of the leading eigenvalues of sample covariance matrices have been demonstrated to conform to the Tracy-Widom (TW) distribution. Here we show that genetic eigenvalues estimated using restricted maximum likelihood (REML) in a multivariate random effects model with an unconstrained genetic covariance structure will also conform to the TW distribution after empirical scaling and centering. However, where estimation procedures using either REML or MCMC impose boundary constraints, the resulting genetic eigenvalues tend not be TW distributed. We show how using confidence intervals from sampling distributions of genetic eigenvalues without reference to the TW distribution is insufficient protection against mistaking sampling error as genetic variance, particularly when eigenvalues are small. By scaling such sampling distributions to the appropriate TW distribution, the critical value of the TW statistic can be used to determine if the magnitude of a genetic eigenvalue exceeds the sampling error for each eigenvalue in the spectral distribution of a given genetic covariance matrix. Copyright © 2017 by the Genetics Society of America.
Dealing with noise and physiological artifacts in human EEG recordings: empirical mode methods
NASA Astrophysics Data System (ADS)
Runnova, Anastasiya E.; Grubov, Vadim V.; Khramova, Marina V.; Hramov, Alexander E.
2017-04-01
In the paper we propose the new method for removing noise and physiological artifacts in human EEG recordings based on empirical mode decomposition (Hilbert-Huang transform). As physiological artifacts we consider specific oscillatory patterns that cause problems during EEG analysis and can be detected with additional signals recorded simultaneously with EEG (ECG, EMG, EOG, etc.) We introduce the algorithm of the proposed method with steps including empirical mode decomposition of EEG signal, choosing of empirical modes with artifacts, removing these empirical modes and reconstructing of initial EEG signal. We show the efficiency of the method on the example of filtration of human EEG signal from eye-moving artifacts.
Jung, Sunyoung
2008-01-01
Objectives. We examined the association between county-level estimates of children's health status and school district performance in California. Methods. We used 3 data sources: the California Health Interview Survey, district archives from the California Department of Education, and census-based estimates of county demographic characteristics. We used logistic regression to estimate whether a school district's failure to meet adequate yearly progress goals in 2004 to 2005 was a function of child and adolescent's health status. Models included district- and county-level fixed effects and were adjusted for the clustering of districts within counties. Results. County-level changes in children's and adolescent's health status decreased the likelihood that a school district would fail to meet adequate yearly progress goals during the investigation period. Health status did not moderate the relatively poor performance of predominantly minority districts. Conclusions. We found empirical support that area variation in children's and adolescent's health status exerts a contextual effect on school district performance. Future research should explore the specific mechanisms through which area-level child health influences school and district achievement. PMID:18309137
The search for the number form area: A functional neuroimaging meta-analysis.
Yeo, Darren J; Wilkey, Eric D; Price, Gavin R
2017-07-01
Recent studies report a putative "number form area" (NFA) in the inferior temporal gyrus (ITG) suggested to be specialized for Arabic numeral processing. However, a number of earlier studies report no such NFA. The reasons for such discrepancies across studies are unclear. To examine evidence for a convergent NFA across studies, we conducted two activation likelihood estimation meta-analyses on 31 and a subset of 20 neuroimaging studies that have contrasted digits with other meaningful symbols. Results suggest the potential existence of an NFA in the right ITG, in addition to a 'symbolic number processing network' comprising bilateral parietal regions, and right-lateralized superior and inferior frontal regions. Critically, convergent localization for the NFA was only evident when contrasts were appropriately controlled for task demands, and does not appear to depend on employing methods designed to overcome fMRI signal dropout in the ITG. Importantly, only five studies had foci within the identified ITG NFA cluster boundary, indicating that more empirical evidence is necessary to determine the true functional specialization and regional specificity of the putative NFA. Copyright © 2017 Elsevier Ltd. All rights reserved.
Default risk modeling with position-dependent killing
NASA Astrophysics Data System (ADS)
Katz, Yuri A.
2013-04-01
Diffusion in a linear potential in the presence of position-dependent killing is used to mimic a default process. Different assumptions regarding transport coefficients, initial conditions, and elasticity of the killing measure lead to diverse models of bankruptcy. One “stylized fact” is fundamental for our consideration: empirically default is a rather rare event, especially in the investment grade categories of credit ratings. Hence, the action of killing may be considered as a small parameter. In a number of special cases we derive closed-form expressions for the entire term structure of the cumulative probability of default, its hazard rate, and intensity. Comparison with historical data on aggregate global corporate defaults confirms the validity of the perturbation method for estimations of long-term probability of default for companies with high credit quality. On a single company level, we implement the derived formulas to estimate the one-year likelihood of default of Enron on a daily basis from August 2000 to August 2001, three months before its default, and compare the obtained results with forecasts of traditional structural models.
Pei, Yanbo; Tian, Guo-Liang; Tang, Man-Lai
2014-11-10
Stratified data analysis is an important research topic in many biomedical studies and clinical trials. In this article, we develop five test statistics for testing the homogeneity of proportion ratios for stratified correlated bilateral binary data based on an equal correlation model assumption. Bootstrap procedures based on these test statistics are also considered. To evaluate the performance of these statistics and procedures, we conduct Monte Carlo simulations to study their empirical sizes and powers under various scenarios. Our results suggest that the procedure based on score statistic performs well generally and is highly recommended. When the sample size is large, procedures based on the commonly used weighted least square estimate and logarithmic transformation with Mantel-Haenszel estimate are recommended as they do not involve any computation of maximum likelihood estimates requiring iterative algorithms. We also derive approximate sample size formulas based on the recommended test procedures. Finally, we apply the proposed methods to analyze a multi-center randomized clinical trial for scleroderma patients. Copyright © 2014 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Murphy, P. C.
1986-01-01
An algorithm for maximum likelihood (ML) estimation is developed with an efficient method for approximating the sensitivities. The ML algorithm relies on a new optimization method referred to as a modified Newton-Raphson with estimated sensitivities (MNRES). MNRES determines sensitivities by using slope information from local surface approximations of each output variable in parameter space. With the fitted surface, sensitivity information can be updated at each iteration with less computational effort than that required by either a finite-difference method or integration of the analytically determined sensitivity equations. MNRES eliminates the need to derive sensitivity equations for each new model, and thus provides flexibility to use model equations in any convenient format. A random search technique for determining the confidence limits of ML parameter estimates is applied to nonlinear estimation problems for airplanes. The confidence intervals obtained by the search are compared with Cramer-Rao (CR) bounds at the same confidence level. The degree of nonlinearity in the estimation problem is an important factor in the relationship between CR bounds and the error bounds determined by the search technique. Beale's measure of nonlinearity is developed in this study for airplane identification problems; it is used to empirically correct confidence levels and to predict the degree of agreement between CR bounds and search estimates.
The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?
Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping
2013-01-01
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
Estimation of brood and nest survival: Comparative methods in the presence of heterogeneity
Manly, Bryan F.J.; Schmutz, Joel A.
2001-01-01
The Mayfield method has been widely used for estimating survival of nests and young animals, especially when data are collected at irregular observation intervals. However, this method assumes survival is constant throughout the study period, which often ignores biologically relevant variation and may lead to biased survival estimates. We examined the bias and accuracy of 1 modification to the Mayfield method that allows for temporal variation in survival, and we developed and similarly tested 2 additional methods. One of these 2 new methods is simply an iterative extension of Klett and Johnson's method, which we refer to as the Iterative Mayfield method and bears similarity to Kaplan-Meier methods. The other method uses maximum likelihood techniques for estimation and is best applied to survival of animals in groups or families, rather than as independent individuals. We also examined how robust these estimators are to heterogeneity in the data, which can arise from such sources as dependent survival probabilities among siblings, inherent differences among families, and adoption. Testing of estimator performance with respect to bias, accuracy, and heterogeneity was done using simulations that mimicked a study of survival of emperor goose (Chen canagica) goslings. Assuming constant survival for inappropriately long periods of time or use of Klett and Johnson's methods resulted in large bias or poor accuracy (often >5% bias or root mean square error) compared to our Iterative Mayfield or maximum likelihood methods. Overall, estimator performance was slightly better with our Iterative Mayfield than our maximum likelihood method, but the maximum likelihood method provides a more rigorous framework for testing covariates and explicity models a heterogeneity factor. We demonstrated use of all estimators with data from emperor goose goslings. We advocate that future studies use the new methods outlined here rather than the traditional Mayfield method or its previous modifications.
Technical Note: Approximate Bayesian parameterization of a complex tropical forest model
NASA Astrophysics Data System (ADS)
Hartig, F.; Dislich, C.; Wiegand, T.; Huth, A.
2013-08-01
Inverse parameter estimation of process-based models is a long-standing problem in ecology and evolution. A key problem of inverse parameter estimation is to define a metric that quantifies how well model predictions fit to the data. Such a metric can be expressed by general cost or objective functions, but statistical inversion approaches are based on a particular metric, the probability of observing the data given the model, known as the likelihood. Deriving likelihoods for dynamic models requires making assumptions about the probability for observations to deviate from mean model predictions. For technical reasons, these assumptions are usually derived without explicit consideration of the processes in the simulation. Only in recent years have new methods become available that allow generating likelihoods directly from stochastic simulations. Previous applications of these approximate Bayesian methods have concentrated on relatively simple models. Here, we report on the application of a simulation-based likelihood approximation for FORMIND, a parameter-rich individual-based model of tropical forest dynamics. We show that approximate Bayesian inference, based on a parametric likelihood approximation placed in a conventional MCMC, performs well in retrieving known parameter values from virtual field data generated by the forest model. We analyze the results of the parameter estimation, examine the sensitivity towards the choice and aggregation of model outputs and observed data (summary statistics), and show results from using this method to fit the FORMIND model to field data from an Ecuadorian tropical forest. Finally, we discuss differences of this approach to Approximate Bayesian Computing (ABC), another commonly used method to generate simulation-based likelihood approximations. Our results demonstrate that simulation-based inference, which offers considerable conceptual advantages over more traditional methods for inverse parameter estimation, can successfully be applied to process-based models of high complexity. The methodology is particularly suited to heterogeneous and complex data structures and can easily be adjusted to other model types, including most stochastic population and individual-based models. Our study therefore provides a blueprint for a fairly general approach to parameter estimation of stochastic process-based models in ecology and evolution.
Schwartzkopf, Wade C; Bovik, Alan C; Evans, Brian L
2005-12-01
Traditional chromosome imaging has been limited to grayscale images, but recently a 5-fluorophore combinatorial labeling technique (M-FISH) was developed wherein each class of chromosomes binds with a different combination of fluorophores. This results in a multispectral image, where each class of chromosomes has distinct spectral components. In this paper, we develop new methods for automatic chromosome identification by exploiting the multispectral information in M-FISH chromosome images and by jointly performing chromosome segmentation and classification. We (1) develop a maximum-likelihood hypothesis test that uses multispectral information, together with conventional criteria, to select the best segmentation possibility; (2) use this likelihood function to combine chromosome segmentation and classification into a robust chromosome identification system; and (3) show that the proposed likelihood function can also be used as a reliable indicator of errors in segmentation, errors in classification, and chromosome anomalies, which can be indicators of radiation damage, cancer, and a wide variety of inherited diseases. We show that the proposed multispectral joint segmentation-classification method outperforms past grayscale segmentation methods when decomposing touching chromosomes. We also show that it outperforms past M-FISH classification techniques that do not use segmentation information.
NASA Astrophysics Data System (ADS)
Sutawanir
2015-12-01
Mortality tables play important role in actuarial studies such as life annuities, premium determination, premium reserve, valuation pension plan, pension funding. Some known mortality tables are CSO mortality table, Indonesian Mortality Table, Bowers mortality table, Japan Mortality table. For actuary applications some tables are constructed with different environment such as single decrement, double decrement, and multiple decrement. There exist two approaches in mortality table construction : mathematics approach and statistical approach. Distribution model and estimation theory are the statistical concepts that are used in mortality table construction. This article aims to discuss the statistical approach in mortality table construction. The distributional assumptions are uniform death distribution (UDD) and constant force (exponential). Moment estimation and maximum likelihood are used to estimate the mortality parameter. Moment estimation methods are easier to manipulate compared to maximum likelihood estimation (mle). However, the complete mortality data are not used in moment estimation method. Maximum likelihood exploited all available information in mortality estimation. Some mle equations are complicated and solved using numerical methods. The article focus on single decrement estimation using moment and maximum likelihood estimation. Some extension to double decrement will introduced. Simple dataset will be used to illustrated the mortality estimation, and mortality table.
An alternative method to measure the likelihood of a financial crisis in an emerging market
NASA Astrophysics Data System (ADS)
Özlale, Ümit; Metin-Özcan, Kıvılcım
2007-07-01
This paper utilizes an early warning system in order to measure the likelihood of a financial crisis in an emerging market economy. We introduce a methodology, where we can both obtain a likelihood series and analyze the time-varying effects of several macroeconomic variables on this likelihood. Since the issue is analyzed in a non-linear state space framework, the extended Kalman filter emerges as the optimal estimation algorithm. Taking the Turkish economy as our laboratory, the results indicate that both the derived likelihood measure and the estimated time-varying parameters are meaningful and can successfully explain the path that the Turkish economy had followed between 2000 and 2006. The estimated parameters also suggest that overvalued domestic currency, current account deficit and the increase in the default risk increase the likelihood of having an economic crisis in the economy. Overall, the findings in this paper suggest that the estimation methodology introduced in this paper can also be applied to other emerging market economies as well.
Efficient simulation and likelihood methods for non-neutral multi-allele models.
Joyce, Paul; Genz, Alan; Buzbas, Erkan Ozge
2012-06-01
Throughout the 1980s, Simon Tavaré made numerous significant contributions to population genetics theory. As genetic data, in particular DNA sequence, became more readily available, a need to connect population-genetic models to data became the central issue. The seminal work of Griffiths and Tavaré (1994a , 1994b , 1994c) was among the first to develop a likelihood method to estimate the population-genetic parameters using full DNA sequences. Now, we are in the genomics era where methods need to scale-up to handle massive data sets, and Tavaré has led the way to new approaches. However, performing statistical inference under non-neutral models has proved elusive. In tribute to Simon Tavaré, we present an article in spirit of his work that provides a computationally tractable method for simulating and analyzing data under a class of non-neutral population-genetic models. Computational methods for approximating likelihood functions and generating samples under a class of allele-frequency based non-neutral parent-independent mutation models were proposed by Donnelly, Nordborg, and Joyce (DNJ) (Donnelly et al., 2001). DNJ (2001) simulated samples of allele frequencies from non-neutral models using neutral models as auxiliary distribution in a rejection algorithm. However, patterns of allele frequencies produced by neutral models are dissimilar to patterns of allele frequencies produced by non-neutral models, making the rejection method inefficient. For example, in some cases the methods in DNJ (2001) require 10(9) rejections before a sample from the non-neutral model is accepted. Our method simulates samples directly from the distribution of non-neutral models, making simulation methods a practical tool to study the behavior of the likelihood and to perform inference on the strength of selection.
A general methodology for maximum likelihood inference from band-recovery data
Conroy, M.J.; Williams, B.K.
1984-01-01
A numerical procedure is described for obtaining maximum likelihood estimates and associated maximum likelihood inference from band- recovery data. The method is used to illustrate previously developed one-age-class band-recovery models, and is extended to new models, including the analysis with a covariate for survival rates and variable-time-period recovery models. Extensions to R-age-class band- recovery, mark-recapture models, and twice-yearly marking are discussed. A FORTRAN program provides computations for these models.
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
Maximum Likelihood Compton Polarimetry with the Compton Spectrometer and Imager
NASA Astrophysics Data System (ADS)
Lowell, A. W.; Boggs, S. E.; Chiu, C. L.; Kierans, C. A.; Sleator, C.; Tomsick, J. A.; Zoglauer, A. C.; Chang, H.-K.; Tseng, C.-H.; Yang, C.-Y.; Jean, P.; von Ballmoos, P.; Lin, C.-H.; Amman, M.
2017-10-01
Astrophysical polarization measurements in the soft gamma-ray band are becoming more feasible as detectors with high position and energy resolution are deployed. Previous work has shown that the minimum detectable polarization (MDP) of an ideal Compton polarimeter can be improved by ˜21% when an unbinned, maximum likelihood method (MLM) is used instead of the standard approach of fitting a sinusoid to a histogram of azimuthal scattering angles. Here we outline a procedure for implementing this maximum likelihood approach for real, nonideal polarimeters. As an example, we use the recent observation of GRB 160530A with the Compton Spectrometer and Imager. We find that the MDP for this observation is reduced by 20% when the MLM is used instead of the standard method.
Patch-based image reconstruction for PET using prior-image derived dictionaries
NASA Astrophysics Data System (ADS)
Tahaei, Marzieh S.; Reader, Andrew J.
2016-09-01
In PET image reconstruction, regularization is often needed to reduce the noise in the resulting images. Patch-based image processing techniques have recently been successfully used for regularization in medical image reconstruction through a penalized likelihood framework. Re-parameterization within reconstruction is another powerful regularization technique in which the object in the scanner is re-parameterized using coefficients for spatially-extensive basis vectors. In this work, a method for extracting patch-based basis vectors from the subject’s MR image is proposed. The coefficients for these basis vectors are then estimated using the conventional MLEM algorithm. Furthermore, using the alternating direction method of multipliers, an algorithm for optimizing the Poisson log-likelihood while imposing sparsity on the parameters is also proposed. This novel method is then utilized to find sparse coefficients for the patch-based basis vectors extracted from the MR image. The results indicate the superiority of the proposed methods to patch-based regularization using the penalized likelihood framework.
Free energy reconstruction from steered dynamics without post-processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Athenes, Manuel, E-mail: Manuel.Athenes@cea.f; Condensed Matter and Materials Division, Physics and Life Sciences Directorate, LLNL, Livermore, CA 94551; Marinica, Mihai-Cosmin
2010-09-20
Various methods achieving importance sampling in ensembles of nonequilibrium trajectories enable one to estimate free energy differences and, by maximum-likelihood post-processing, to reconstruct free energy landscapes. Here, based on Bayes theorem, we propose a more direct method in which a posterior likelihood function is used both to construct the steered dynamics and to infer the contribution to equilibrium of all the sampled states. The method is implemented with two steering schedules. First, using non-autonomous steering, we calculate the migration barrier of the vacancy in Fe-{alpha}. Second, using an autonomous scheduling related to metadynamics and equivalent to temperature-accelerated molecular dynamics, wemore » accurately reconstruct the two-dimensional free energy landscape of the 38-atom Lennard-Jones cluster as a function of an orientational bond-order parameter and energy, down to the solid-solid structural transition temperature of the cluster and without maximum-likelihood post-processing.« less
Penalized spline estimation for functional coefficient regression models.
Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan
2010-04-01
The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.
Nishiura, Hiroshi; Inaba, Hisashi
2011-03-07
Empirical estimates of the incubation period of influenza A (H1N1-2009) have been limited. We estimated the incubation period among confirmed imported cases who traveled to Japan from Hawaii during the early phase of the 2009 pandemic (n=72). We addressed censoring and employed an infection-age structured argument to explicitly model the daily frequency of illness onset after departure. We assumed uniform and exponential distributions for the frequency of exposure in Hawaii, and the hazard rate of infection for the latter assumption was retrieved, in Hawaii, from local outbreak data. The maximum likelihood estimates of the median incubation period range from 1.43 to 1.64 days according to different modeling assumptions, consistent with a published estimate based on a New York school outbreak. The likelihood values of the different modeling assumptions do not differ greatly from each other, although models with the exponential assumption yield slightly shorter incubation periods than those with the uniform exposure assumption. Differences between our proposed approach and a published method for doubly interval-censored analysis highlight the importance of accounting for the dependence of the frequency of exposure on the survival function of incubating individuals among imported cases. A truncation of the density function of the incubation period due to an absence of illness onset during the exposure period also needs to be considered. When the data generating process is similar to that among imported cases, and when the incubation period is close to or shorter than the length of exposure, accounting for these aspects is critical for long exposure times. Copyright © 2010 Elsevier Ltd. All rights reserved.
Yu, Hao; Dick, Andrew W
2012-01-01
Background Given the rapid growth of health care costs, some experts were concerned with erosion of employment-based private insurance (EBPI). This empirical analysis aims to quantify the concern. Methods Using the National Health Account, we generated a cost index to represent state-level annual cost growth. We merged it with the 1996–2003 Medical Expenditure Panel Survey. The unit of analysis is the family. We conducted both bivariate and multivariate logistic analyses. Results The bivariate analysis found a significant inverse association between the cost index and the proportion of families receiving an offer of EBPI. The multivariate analysis showed that the cost index was significantly negatively associated with the likelihood of receiving an EBPI offer for the entire sample and for families in the first, second, and third quartiles of income distribution. The cost index was also significantly negatively associated with the proportion of families with EBPI for the entire year for each family member (EBPI-EYEM). The multivariate analysis confirmed significance of the relationship for the entire sample, and for families in the second and third quartiles of income distribution. Among the families with EBPI-EYEM, there was a positive relationship between the cost index and this group's likelihood of having out-of-pocket expenditures exceeding 10 percent of family income. The multivariate analysis confirmed significance of the relationship for the entire group and for families in the second and third quartiles of income distribution. Conclusions Rising health costs reduce EBPI availability and enrollment, and the financial protection provided by it, especially for middle-class families. PMID:22417314
Marchiondo, Lisa A; Gonzales, Ernest; Williams, Larry J
2017-07-12
This study addresses older employees' trajectories of perceived workplace age discrimination, and the long-term associations among perceived age discrimination and older workers' mental and self-rated health, job satisfaction, and likelihood of working past retirement age. We evaluate the strength and vulnerability integration (SAVI) model. Three waves of data from employed participants were drawn from the Health and Retirement Study (N = 3,957). Latent growth modeling was used to assess relationships between the slopes and the intercepts of the variables, thereby assessing longitudinal and cross-sectional associations. Perceived workplace age discrimination tends to increase with age, although notable variance exists. The initial status of perceived age discrimination relates to the baseline statuses of depression, self-rated health, job satisfaction, and likelihood of working past retirement age in the expected directions. Over time, perceived age discrimination predicts lower job satisfaction and self-rated health, as well as elevated depressive symptoms, but not likelihood of working past retirement age. This study provides empirical support for the SAVI model and uncovers the "wear and tear" effects of perceived workplace age discrimination on older workers' mental and overall health. We deliberate on social policies that may reduce age discrimination, thereby promoting older employees' health and ability to work longer. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Moderation analysis with missing data in the predictors.
Zhang, Qian; Wang, Lijuan
2017-12-01
The most widely used statistical model for conducting moderation analysis is the moderated multiple regression (MMR) model. In MMR modeling, missing data could pose a challenge, mainly because the interaction term is a product of two or more variables and thus is a nonlinear function of the involved variables. In this study, we consider a simple MMR model, where the effect of the focal predictor X on the outcome Y is moderated by a moderator U. The primary interest is to find ways of estimating and testing the moderation effect with the existence of missing data in X. We mainly focus on cases when X is missing completely at random (MCAR) and missing at random (MAR). Three methods are compared: (a) Normal-distribution-based maximum likelihood estimation (NML); (b) Normal-distribution-based multiple imputation (NMI); and (c) Bayesian estimation (BE). Via simulations, we found that NML and NMI could lead to biased estimates of moderation effects under MAR missingness mechanism. The BE method outperformed NMI and NML for MMR modeling with missing data in the focal predictor, missingness depending on the moderator and/or auxiliary variables, and correctly specified distributions for the focal predictor. In addition, more robust BE methods are needed in terms of the distribution mis-specification problem of the focal predictor. An empirical example was used to illustrate the applications of the methods with a simple sensitivity analysis. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Toussaint, Emmanuel F A; Morinière, Jérôme; Müller, Chris J; Kunte, Krushnamegh; Turlin, Bernard; Hausmann, Axel; Balke, Michael
2015-10-01
The charismatic tropical Polyura Nawab butterflies are distributed across twelve biodiversity hotspots in the Indomalayan/Australasian archipelago. In this study, we tested an array of species delimitation methods and compared the results to existing morphology-based taxonomy. We sequenced two mitochondrial and two nuclear gene fragments to reconstruct phylogenetic relationships within Polyura using both Bayesian inference and maximum likelihood. Based on this phylogenetic framework, we used the recently introduced bGMYC, BPP and PTP methods to investigate species boundaries. Based on our results, we describe two new species Polyura paulettae Toussaint sp. n. and Polyura smilesi Toussaint sp. n., propose one synonym, and five populations are raised to species status. Most of the newly recognized species are single-island endemics likely resulting from the recent highly complex geological history of the Indomalayan-Australasian archipelago. Surprisingly, we also find two newly recognized species in the Indomalayan region where additional biotic or abiotic factors have fostered speciation. Species delimitation methods were largely congruent and succeeded to cross-validate most extant morphological species. PTP and BPP seem to yield more consistent and robust estimations of species boundaries with respect to morphological characters while bGMYC delivered contrasting results depending on the different gene trees considered. Our findings demonstrate the efficiency of comparative approaches using molecular species delimitation methods on empirical data. They also pave the way for the investigation of less well-known groups to unveil patterns of species richness and catalogue Earth's concealed, therefore unappreciated diversity. Published by Elsevier Inc.
A Selective Overview of Variable Selection in High Dimensional Feature Space
Fan, Jianqing
2010-01-01
High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. What limits of the dimensionality such methods can handle, what the role of penalty functions is, and what the statistical properties are rapidly drive the advances of the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods. PMID:21572976
How rational should bioethics be? The value of empirical approaches.
Alvarez, A A
2001-10-01
Rational justification of claims with empirical content calls for empirical and not only normative philosophical investigation. Empirical approaches to bioethics are epistemically valuable, i.e., such methods may be necessary in providing and verifying basic knowledge about cultural values and norms. Our assumptions in moral reasoning can be verified or corrected using these methods. Moral arguments can be initiated or adjudicated by data drawn from empirical investigation. One may argue that individualistic informed consent, for example, is not compatible with the Asian communitarian orientation. But this normative claim uses an empirical assumption that may be contrary to the fact that some Asians do value and argue for informed consent. Is it necessary and factual to neatly characterize some cultures as individualistic and some as communitarian? Empirical investigation can provide a reasonable way to inform such generalizations. In a multi-cultural context, such as in the Philippines, there is a need to investigate the nature of the local ethos before making any appeal to authenticity. Otherwise we may succumb to the same ethical imperialism we are trying hard to resist. Normative claims that involve empirical premises cannot be reasonable verified or evaluated without utilizing empirical methods along with philosophical reflection. The integration of empirical methods to the standard normative approach to moral reasoning should be reasonably guided by the epistemic demands of claims arising from cross-cultural discourse in bioethics.
Ye, Xin; Garikapati, Venu M.; You, Daehyun; ...
2017-11-08
Most multinomial choice models (e.g., the multinomial logit model) adopted in practice assume an extreme-value Gumbel distribution for the random components (error terms) of utility functions. This distributional assumption offers a closed-form likelihood expression when the utility maximization principle is applied to model choice behaviors. As a result, model coefficients can be easily estimated using the standard maximum likelihood estimation method. However, maximum likelihood estimators are consistent and efficient only if distributional assumptions on the random error terms are valid. It is therefore critical to test the validity of underlying distributional assumptions on the error terms that form the basismore » of parameter estimation and policy evaluation. In this paper, a practical yet statistically rigorous method is proposed to test the validity of the distributional assumption on the random components of utility functions in both the multinomial logit (MNL) model and multiple discrete-continuous extreme value (MDCEV) model. Based on a semi-nonparametric approach, a closed-form likelihood function that nests the MNL or MDCEV model being tested is derived. The proposed method allows traditional likelihood ratio tests to be used to test violations of the standard Gumbel distribution assumption. Simulation experiments are conducted to demonstrate that the proposed test yields acceptable Type-I and Type-II error probabilities at commonly available sample sizes. The test is then applied to three real-world discrete and discrete-continuous choice models. For all three models, the proposed test rejects the validity of the standard Gumbel distribution in most utility functions, calling for the development of robust choice models that overcome adverse effects of violations of distributional assumptions on the error terms in random utility functions.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ye, Xin; Garikapati, Venu M.; You, Daehyun
Most multinomial choice models (e.g., the multinomial logit model) adopted in practice assume an extreme-value Gumbel distribution for the random components (error terms) of utility functions. This distributional assumption offers a closed-form likelihood expression when the utility maximization principle is applied to model choice behaviors. As a result, model coefficients can be easily estimated using the standard maximum likelihood estimation method. However, maximum likelihood estimators are consistent and efficient only if distributional assumptions on the random error terms are valid. It is therefore critical to test the validity of underlying distributional assumptions on the error terms that form the basismore » of parameter estimation and policy evaluation. In this paper, a practical yet statistically rigorous method is proposed to test the validity of the distributional assumption on the random components of utility functions in both the multinomial logit (MNL) model and multiple discrete-continuous extreme value (MDCEV) model. Based on a semi-nonparametric approach, a closed-form likelihood function that nests the MNL or MDCEV model being tested is derived. The proposed method allows traditional likelihood ratio tests to be used to test violations of the standard Gumbel distribution assumption. Simulation experiments are conducted to demonstrate that the proposed test yields acceptable Type-I and Type-II error probabilities at commonly available sample sizes. The test is then applied to three real-world discrete and discrete-continuous choice models. For all three models, the proposed test rejects the validity of the standard Gumbel distribution in most utility functions, calling for the development of robust choice models that overcome adverse effects of violations of distributional assumptions on the error terms in random utility functions.« less
Guindon, Stéphane; Dufayard, Jean-François; Lefort, Vincent; Anisimova, Maria; Hordijk, Wim; Gascuel, Olivier
2010-05-01
PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.
Ferreira, António Miguel; Marques, Hugo; Tralhão, António; Santos, Miguel Borges; Santos, Ana Rita; Cardoso, Gonçalo; Dores, Hélder; Carvalho, Maria Salomé; Madeira, Sérgio; Machado, Francisco Pereira; Cardim, Nuno; de Araújo Gonçalves, Pedro
2016-11-01
Current guidelines recommend the use of the Modified Diamond-Forrester (MDF) method to assess the pre-test likelihood of obstructive coronary artery disease (CAD). We aimed to compare the performance of the MDF method with two contemporary algorithms derived from multicenter trials that additionally incorporate cardiovascular risk factors: the calculator-based 'CAD Consortium 2' method, and the integer-based CONFIRM score. We assessed 1069 consecutive patients without known CAD undergoing coronary CT angiography (CCTA) for stable chest pain. Obstructive CAD was defined as the presence of coronary stenosis ≥50% on 64-slice dual-source CT. The three methods were assessed for calibration, discrimination, net reclassification, and changes in proposed downstream testing based upon calculated pre-test likelihoods. The observed prevalence of obstructive CAD was 13.8% (n=147). Overestimations of the likelihood of obstructive CAD were 140.1%, 9.8%, and 18.8%, respectively, for the MDF, CAD Consortium 2 and CONFIRM methods. The CAD Consortium 2 showed greater discriminative power than the MDF method, with a C-statistic of 0.73 vs. 0.70 (p<0.001), while the CONFIRM score did not (C-statistic 0.71, p=0.492). Reclassification of pre-test likelihood using the 'CAD Consortium 2' or CONFIRM scores resulted in a net reclassification improvement of 0.19 and 0.18, respectively, which would change the diagnostic strategy in approximately half of the patients. Newer risk factor-encompassing models allow for a more precise estimation of pre-test probabilities of obstructive CAD than the guideline-recommended MDF method. Adoption of these scores may improve disease prediction and change the diagnostic pathway in a significant proportion of patients. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Parameter Estimation in Epidemiology: from Simple to Complex Dynamics
NASA Astrophysics Data System (ADS)
Aguiar, Maíra; Ballesteros, Sebastién; Boto, João Pedro; Kooi, Bob W.; Mateus, Luís; Stollenwerk, Nico
2011-09-01
We revisit the parameter estimation framework for population biological dynamical systems, and apply it to calibrate various models in epidemiology with empirical time series, namely influenza and dengue fever. When it comes to more complex models like multi-strain dynamics to describe the virus-host interaction in dengue fever, even most recently developed parameter estimation techniques, like maximum likelihood iterated filtering, come to their computational limits. However, the first results of parameter estimation with data on dengue fever from Thailand indicate a subtle interplay between stochasticity and deterministic skeleton. The deterministic system on its own already displays complex dynamics up to deterministic chaos and coexistence of multiple attractors.
A business model analysis of telecardiology service.
Lin, Shu-Hsia; Liu, Jorn-Hon; Wei, Jen; Yin, Wei-Hsian; Chen, Hung-Hsin; Chiu, Wen-Ta
2010-12-01
Telecare has become an increasingly common medical service in recent years. However, new service must be close to the market and be market-driven to have a high likelihood of success. This article analyzes the business model of a telecardiology service managed by a general hospital. The methodology of the article is as follows: (1) initially it describes the elements of the service based on the ontology of the business model, (2) then it transfers these elements into the choices for business model dynamic loops and examines their validity, and (3) finally provides an empirical financial analysis of the service to assess the profit-making possibilities.
Suicidal and online: how do online behaviors inform us of this high-risk population?
Harris, Keith M; McLean, John P; Sheffield, Jeanie
2014-01-01
To assist suicide prevention we need a better understanding of how suicidal individuals act in their environment, and the online world offers an ideal opportunity to examine daily behaviors. This anonymous survey (N = 1,016) provides first-of-its-kind empirical evidence demonstrating suicide-risk people (n = 290) are unique in their online behaviors. Suicidal users reported more time online, greater likelihood of developing online personal relationships, and greater use of online forums. In addition, suicide-risk women reported more time browsing/surfing and social networking. The authors conclude that suicide prevention efforts should respond to suicide-risk users' greater demands for online interpersonal communications.
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2010-07-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root- n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2013-01-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286
Looks and linguistics: Impression formation in online exchange marketplaces.
Ciuchta, Michael P; O'Toole, Jay
2016-01-01
This study advances theories of impression formation by focusing on two factors that generate emotional responses: physical attractiveness and positive word use. Although considerable research on impression formation exists, most studies consider factors in isolation and neglect possible interactions. Our theory introduces competing mechanisms regarding possible interaction effects, and we empirically test them in an online marketplace. Results from the analysis of 729 loan requests from a leading online peer-to-peer lending market suggest that physical attractiveness and positive word use work together to influence the likelihood of acquiring resources and establish an important boundary condition to the general "beauty is good" effect.
Blankespoor, C L; Pappas, P W; Eisner, T
1997-07-01
The defensive glands of beetles, Tenebrio molitor, infected with metacestodes (cysticercoids) of Hymenolepis diminuta are everted less frequently upon stimulation, and contain less toluquinone (methylbenzoquinone) and m-cresol, than glands of uninfected controls. These differences, as shown in predation trials with wild rats, increase the likelihood that both cysticercoids and beetles will be ingested by the tapeworm's definitive host. This is the first documented case of a parasite inhibiting the chemical defence of an intermediate host, and one of only a few reports of parasite-induced manipulation of host biology supported by empirical evidence implicating facilitated parasite transmission between host species.
Extreme data compression for the CMB
NASA Astrophysics Data System (ADS)
Zablocki, Alan; Dodelson, Scott
2016-04-01
We apply the Karhunen-Loéve methods to cosmic microwave background (CMB) data sets, and show that we can recover the input cosmology and obtain the marginalized likelihoods in Λ cold dark matter cosmologies in under a minute, much faster than Markov chain Monte Carlo methods. This is achieved by forming a linear combination of the power spectra at each multipole l , and solving a system of simultaneous equations such that the Fisher matrix is locally unchanged. Instead of carrying out a full likelihood evaluation over the whole parameter space, we need evaluate the likelihood only for the parameter of interest, with the data compression effectively marginalizing over all other parameters. The weighting vectors contain insight about the physical effects of the parameters on the CMB anisotropy power spectrum Cl . The shape and amplitude of these vectors give an intuitive feel for the physics of the CMB, the sensitivity of the observed spectrum to cosmological parameters, and the relative sensitivity of different experiments to cosmological parameters. We test this method on exact theory Cl as well as on a Wilkinson Microwave Anisotropy Probe (WMAP)-like CMB data set generated from a random realization of a fiducial cosmology, comparing the compression results to those from a full likelihood analysis using CosmoMC. After showing that the method works, we apply it to the temperature power spectrum from the WMAP seven-year data release, and discuss the successes and limitations of our method as applied to a real data set.
Maximum Likelihood Shift Estimation Using High Resolution Polarimetric SAR Clutter Model
NASA Astrophysics Data System (ADS)
Harant, Olivier; Bombrun, Lionel; Vasile, Gabriel; Ferro-Famil, Laurent; Gay, Michel
2011-03-01
This paper deals with a Maximum Likelihood (ML) shift estimation method in the context of High Resolution (HR) Polarimetric SAR (PolSAR) clutter. Texture modeling is exposed and the generalized ML texture tracking method is extended to the merging of various sensors. Some results on displacement estimation on the Argentiere glacier in the Mont Blanc massif using dual-pol TerraSAR-X (TSX) and quad-pol RADARSAT-2 (RS2) sensors are finally discussed.
Landslide susceptibility estimations in the Gerecse hills (Hungary).
NASA Astrophysics Data System (ADS)
Gerzsenyi, Dávid; Gáspár, Albert
2017-04-01
Surface movement processes are constantly posing threat to property in populated and agricultural areas in the Gerecse hills (Hungary). The affected geological formations are mainly unconsolidated sediments. Pleistocene loess and alluvial terrace sediments are overwhelmingly present, but fluvio-lacustrine sediments of the latest Miocene, and consolidated Eocene and Mesozoic limestones and marls can also be found in the area. Landslides and other surface movement processes are being studied for a long time in the area, but a comprehensive GIS-based geostatistical analysis have not yet been made for the whole area. This was the reason for choosing the Gerecse as the focus area of the study. However, the base data of our study are freely accessible from online servers, so the used method can be applied to other regions in Hungary. Qualitative data was acquired from the landslide-inventory map of the Hungarian Surface Movement Survey and from the Geological Map of Hungary (1 : 100 000). Morphometric parameters derived from the SRMT-1 DEM were used as quantitative variables. Using these parameters the distribution of elevation, slope gradient, aspect and categorized geological features were computed, both for areas affected and not affected by slope movements. Then likelihood values were computed for each parameters by comparing their distribution in the two areas. With combining the likelihood values of the four parameters relative hazard values were computed for each cell. This method is known as the "empirical probability estimation" originally published by Chung (2005). The map created this way shows each cell's place in their ranking based on the relative hazard values as a percentage for the whole study area (787 km2). These values provide information about how similar is a certain area to the areas already affected by landslides based on the four predictor variables. This map can also serve as a base for more complex landslide vulnerability studies involving economic factors. The landslide-inventory database used in the research provides information regarding the state of activity of the past surface movements, however the activity of many sites are stated as unknown. A complementary field survey have been carried out aiming to categorize these areas - near to Dunaszentmiklós and Neszmély villages - in one of the most landslide-affected part of the Gerecse. Reference: Chung, C. (2005). Using likelihood ratio functions for modeling the conditional probability of occurrence of future landslides for risk assessment. Computers & Geosciences, 32., pp. 1052-1068.
Robust Multipoint Water-Fat Separation Using Fat Likelihood Analysis
Yu, Huanzhou; Reeder, Scott B.; Shimakawa, Ann; McKenzie, Charles A.; Brittain, Jean H.
2016-01-01
Fat suppression is an essential part of routine MRI scanning. Multiecho chemical-shift based water-fat separation methods estimate and correct for Bo field inhomogeneity. However, they must contend with the intrinsic challenge of water-fat ambiguity that can result in water-fat swapping. This problem arises because the signals from two chemical species, when both are modeled as a single discrete spectral peak, may appear indistinguishable in the presence of Bo off-resonance. In conventional methods, the water-fat ambiguity is typically removed by enforcing field map smoothness using region growing based algorithms. In reality, the fat spectrum has multiple spectral peaks. Using this spectral complexity, we introduce a novel concept that identifies water and fat for multiecho acquisitions by exploiting the spectral differences between water and fat. A fat likelihood map is produced to indicate if a pixel is likely to be water-dominant or fat-dominant by comparing the fitting residuals of two different signal models. The fat likelihood analysis and field map smoothness provide complementary information, and we designed an algorithm (Fat Likelihood Analysis for Multiecho Signals) to exploit both mechanisms. It is demonstrated in a wide variety of data that the Fat Likelihood Analysis for Multiecho Signals algorithm offers highly robust water-fat separation for 6-echo acquisitions, particularly in some previously challenging applications. PMID:21842498
NASA Astrophysics Data System (ADS)
Peng, Juan-juan; Wang, Jian-qiang; Yang, Wu-E.
2017-01-01
In this paper, multi-criteria decision-making (MCDM) problems based on the qualitative flexible multiple criteria method (QUALIFLEX), in which the criteria values are expressed by multi-valued neutrosophic information, are investigated. First, multi-valued neutrosophic sets (MVNSs), which allow the truth-membership function, indeterminacy-membership function and falsity-membership function to have a set of crisp values between zero and one, are introduced. Then the likelihood of multi-valued neutrosophic number (MVNN) preference relations is defined and the corresponding properties are also discussed. Finally, an extended QUALIFLEX approach based on likelihood is explored to solve MCDM problems where the assessments of alternatives are in the form of MVNNs; furthermore an example is provided to illustrate the application of the proposed method, together with a comparison analysis.
Maximum Likelihood Compton Polarimetry with the Compton Spectrometer and Imager
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lowell, A. W.; Boggs, S. E; Chiu, C. L.
2017-10-20
Astrophysical polarization measurements in the soft gamma-ray band are becoming more feasible as detectors with high position and energy resolution are deployed. Previous work has shown that the minimum detectable polarization (MDP) of an ideal Compton polarimeter can be improved by ∼21% when an unbinned, maximum likelihood method (MLM) is used instead of the standard approach of fitting a sinusoid to a histogram of azimuthal scattering angles. Here we outline a procedure for implementing this maximum likelihood approach for real, nonideal polarimeters. As an example, we use the recent observation of GRB 160530A with the Compton Spectrometer and Imager. Wemore » find that the MDP for this observation is reduced by 20% when the MLM is used instead of the standard method.« less
Estimation of parameters of dose volume models and their confidence limits
NASA Astrophysics Data System (ADS)
van Luijk, P.; Delvigne, T. C.; Schilstra, C.; Schippers, J. M.
2003-07-01
Predictions of the normal-tissue complication probability (NTCP) for the ranking of treatment plans are based on fits of dose-volume models to clinical and/or experimental data. In the literature several different fit methods are used. In this work frequently used methods and techniques to fit NTCP models to dose response data for establishing dose-volume effects, are discussed. The techniques are tested for their usability with dose-volume data and NTCP models. Different methods to estimate the confidence intervals of the model parameters are part of this study. From a critical-volume (CV) model with biologically realistic parameters a primary dataset was generated, serving as the reference for this study and describable by the NTCP model. The CV model was fitted to this dataset. From the resulting parameters and the CV model, 1000 secondary datasets were generated by Monte Carlo simulation. All secondary datasets were fitted to obtain 1000 parameter sets of the CV model. Thus the 'real' spread in fit results due to statistical spreading in the data is obtained and has been compared with estimates of the confidence intervals obtained by different methods applied to the primary dataset. The confidence limits of the parameters of one dataset were estimated using the methods, employing the covariance matrix, the jackknife method and directly from the likelihood landscape. These results were compared with the spread of the parameters, obtained from the secondary parameter sets. For the estimation of confidence intervals on NTCP predictions, three methods were tested. Firstly, propagation of errors using the covariance matrix was used. Secondly, the meaning of the width of a bundle of curves that resulted from parameters that were within the one standard deviation region in the likelihood space was investigated. Thirdly, many parameter sets and their likelihood were used to create a likelihood-weighted probability distribution of the NTCP. It is concluded that for the type of dose response data used here, only a full likelihood analysis will produce reliable results. The often-used approximations, such as the usage of the covariance matrix, produce inconsistent confidence limits on both the parameter sets and the resulting NTCP values.
Incorrect likelihood methods were used to infer scaling laws of marine predator search behaviour.
Edwards, Andrew M; Freeman, Mervyn P; Breed, Greg A; Jonsen, Ian D
2012-01-01
Ecologists are collecting extensive data concerning movements of animals in marine ecosystems. Such data need to be analysed with valid statistical methods to yield meaningful conclusions. We demonstrate methodological issues in two recent studies that reached similar conclusions concerning movements of marine animals (Nature 451:1098; Science 332:1551). The first study analysed vertical movement data to conclude that diverse marine predators (Atlantic cod, basking sharks, bigeye tuna, leatherback turtles and Magellanic penguins) exhibited "Lévy-walk-like behaviour", close to a hypothesised optimal foraging strategy. By reproducing the original results for the bigeye tuna data, we show that the likelihood of tested models was calculated from residuals of regression fits (an incorrect method), rather than from the likelihood equations of the actual probability distributions being tested. This resulted in erroneous Akaike Information Criteria, and the testing of models that do not correspond to valid probability distributions. We demonstrate how this led to overwhelming support for a model that has no biological justification and that is statistically spurious because its probability density function goes negative. Re-analysis of the bigeye tuna data, using standard likelihood methods, overturns the original result and conclusion for that data set. The second study observed Lévy walk movement patterns by mussels. We demonstrate several issues concerning the likelihood calculations (including the aforementioned residuals issue). Re-analysis of the data rejects the original Lévy walk conclusion. We consequently question the claimed existence of scaling laws of the search behaviour of marine predators and mussels, since such conclusions were reached using incorrect methods. We discourage the suggested potential use of "Lévy-like walks" when modelling consequences of fishing and climate change, and caution that any resulting advice to managers of marine ecosystems would be problematic. For reproducibility and future work we provide R source code for all calculations.
NASA Astrophysics Data System (ADS)
Goodman, Steven N.
1989-11-01
This dissertation explores the use of a mathematical measure of statistical evidence, the log likelihood ratio, in clinical trials. The methods and thinking behind the use of an evidential measure are contrasted with traditional methods of analyzing data, which depend primarily on a p-value as an estimate of the statistical strength of an observed data pattern. It is contended that neither the behavioral dictates of Neyman-Pearson hypothesis testing methods, nor the coherency dictates of Bayesian methods are realistic models on which to base inference. The use of the likelihood alone is applied to four aspects of trial design or conduct: the calculation of sample size, the monitoring of data, testing for the equivalence of two treatments, and meta-analysis--the combining of results from different trials. Finally, a more general model of statistical inference, using belief functions, is used to see if it is possible to separate the assessment of evidence from our background knowledge. It is shown that traditional and Bayesian methods can be modeled as two ends of a continuum of structured background knowledge, methods which summarize evidence at the point of maximum likelihood assuming no structure, and Bayesian methods assuming complete knowledge. Both schools are seen to be missing a concept of ignorance- -uncommitted belief. This concept provides the key to understanding the problem of sampling to a foregone conclusion and the role of frequency properties in statistical inference. The conclusion is that statistical evidence cannot be defined independently of background knowledge, and that frequency properties of an estimator are an indirect measure of uncommitted belief. Several likelihood summaries need to be used in clinical trials, with the quantitative disparity between summaries being an indirect measure of our ignorance. This conclusion is linked with parallel ideas in the philosophy of science and cognitive psychology.
SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.
Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman
2017-03-01
We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).
Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D
2016-06-15
The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, investigating an accuracy of a biomarker to delineate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rate) only, and it can be often clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on the inferences by either the AUC or pAUC, we can make a different decision on a prognostic ability of a same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Population Synthesis of Radio and Y-ray Normal, Isolated Pulsars Using Markov Chain Monte Carlo
NASA Astrophysics Data System (ADS)
Billman, Caleb; Gonthier, P. L.; Harding, A. K.
2013-04-01
We present preliminary results of a population statistics study of normal pulsars (NP) from the Galactic disk using Markov Chain Monte Carlo techniques optimized according to two different methods. The first method compares the detected and simulated cumulative distributions of series of pulsar characteristics, varying the model parameters to maximize the overall agreement. The advantage of this method is that the distributions do not have to be binned. The other method varies the model parameters to maximize the log of the maximum likelihood obtained from the comparisons of four-two dimensional distributions of radio and γ-ray pulsar characteristics. The advantage of this method is that it provides a confidence region of the model parameter space. The computer code simulates neutron stars at birth using Monte Carlo procedures and evolves them to the present assuming initial spatial, kick velocity, magnetic field, and period distributions. Pulsars are spun down to the present and given radio and γ-ray emission characteristics, implementing an empirical γ-ray luminosity model. A comparison group of radio NPs detected in ten-radio surveys is used to normalize the simulation, adjusting the model radio luminosity to match a birth rate. We include the Fermi pulsars in the forthcoming second pulsar catalog. We present preliminary results comparing the simulated and detected distributions of radio and γ-ray NPs along with a confidence region in the parameter space of the assumed models. We express our gratitude for the generous support of the National Science Foundation (REU and RUI), Fermi Guest Investigator Program and the NASA Astrophysics Theory and Fundamental Program.
The Impact of Missing Data on Species Tree Estimation.
Xi, Zhenxiang; Liu, Liang; Davis, Charles C
2016-03-01
Phylogeneticists are increasingly assembling genome-scale data sets that include hundreds of genes to resolve their focal clades. Although these data sets commonly include a moderate to high amount of missing data, there remains no consensus on their impact to species tree estimation. Here, using several simulated and empirical data sets, we assess the effects of missing data on species tree estimation under varying degrees of incomplete lineage sorting (ILS) and gene rate heterogeneity. We demonstrate that concatenation (RAxML), gene-tree-based coalescent (ASTRAL, MP-EST, and STAR), and supertree (matrix representation with parsimony [MRP]) methods perform reliably, so long as missing data are randomly distributed (by gene and/or by species) and that a sufficiently large number of genes are sampled. When data sets are indecisive sensu Sanderson et al. (2010. Phylogenomics with incomplete taxon coverage: the limits to inference. BMC Evol Biol. 10:155) and/or ILS is high, however, high amounts of missing data that are randomly distributed require exhaustive levels of gene sampling, likely exceeding most empirical studies to date. Moreover, missing data become especially problematic when they are nonrandomly distributed. We demonstrate that STAR produces inconsistent results when the amount of nonrandom missing data is high, regardless of the degree of ILS and gene rate heterogeneity. Similarly, concatenation methods using maximum likelihood can be misled by nonrandom missing data in the presence of gene rate heterogeneity, which becomes further exacerbated when combined with high ILS. In contrast, ASTRAL, MP-EST, and MRP are more robust under all of these scenarios. These results underscore the importance of understanding the influence of missing data in the phylogenomics era. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Li, Dongming; Sun, Changming; Yang, Jinhua; Liu, Huan; Peng, Jiaqi; Zhang, Lijuan
2017-04-06
An adaptive optics (AO) system provides real-time compensation for atmospheric turbulence. However, an AO image is usually of poor contrast because of the nature of the imaging process, meaning that the image contains information coming from both out-of-focus and in-focus planes of the object, which also brings about a loss in quality. In this paper, we present a robust multi-frame adaptive optics image restoration algorithm via maximum likelihood estimation. Our proposed algorithm uses a maximum likelihood method with image regularization as the basic principle, and constructs the joint log likelihood function for multi-frame AO images based on a Poisson distribution model. To begin with, a frame selection method based on image variance is applied to the observed multi-frame AO images to select images with better quality to improve the convergence of a blind deconvolution algorithm. Then, by combining the imaging conditions and the AO system properties, a point spread function estimation model is built. Finally, we develop our iterative solutions for AO image restoration addressing the joint deconvolution issue. We conduct a number of experiments to evaluate the performances of our proposed algorithm. Experimental results show that our algorithm produces accurate AO image restoration results and outperforms the current state-of-the-art blind deconvolution methods.
Li, Dongming; Sun, Changming; Yang, Jinhua; Liu, Huan; Peng, Jiaqi; Zhang, Lijuan
2017-01-01
An adaptive optics (AO) system provides real-time compensation for atmospheric turbulence. However, an AO image is usually of poor contrast because of the nature of the imaging process, meaning that the image contains information coming from both out-of-focus and in-focus planes of the object, which also brings about a loss in quality. In this paper, we present a robust multi-frame adaptive optics image restoration algorithm via maximum likelihood estimation. Our proposed algorithm uses a maximum likelihood method with image regularization as the basic principle, and constructs the joint log likelihood function for multi-frame AO images based on a Poisson distribution model. To begin with, a frame selection method based on image variance is applied to the observed multi-frame AO images to select images with better quality to improve the convergence of a blind deconvolution algorithm. Then, by combining the imaging conditions and the AO system properties, a point spread function estimation model is built. Finally, we develop our iterative solutions for AO image restoration addressing the joint deconvolution issue. We conduct a number of experiments to evaluate the performances of our proposed algorithm. Experimental results show that our algorithm produces accurate AO image restoration results and outperforms the current state-of-the-art blind deconvolution methods. PMID:28383503
NASA Astrophysics Data System (ADS)
Grubov, V. V.; Runnova, A. E.; Hramov, A. E.
2018-05-01
A new method for adaptive filtration of experimental EEG signals in humans and for removal of different physiological artifacts has been proposed. The algorithm of the method includes empirical mode decomposition of EEG, determination of the number of empirical modes that are considered, analysis of the empirical modes and search for modes that contains artifacts, removal of these modes, and reconstruction of the EEG signal. The method was tested on experimental human EEG signals and demonstrated high efficiency in the removal of different types of physiological EEG artifacts.
Maximum likelihood estimation for life distributions with competing failure modes
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1979-01-01
Systems which are placed on test at time zero, function for a period and die at some random time were studied. Failure may be due to one of several causes or modes. The parameters of the life distribution may depend upon the levels of various stress variables the item is subject to. Maximum likelihood estimation methods are discussed. Specific methods are reported for the smallest extreme-value distributions of life. Monte-Carlo results indicate the methods to be promising. Under appropriate conditions, the location parameters are nearly unbiased, the scale parameter is slight biased, and the asymptotic covariances are rapidly approached.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yorita, Kohei
2005-03-01
We have measured the top quark mass with the dynamical likelihood method (DLM) using the CDF II detector at the Fermilab Tevatron. The Tevatron produces top and anti-top pairs in pp collisions at a center of mass energy of 1.96 TeV. The data sample used in this paper was accumulated from March 2002 through August 2003 which corresponds to an integrated luminosity of 162 pb -1.
NASA Astrophysics Data System (ADS)
Piecuch, C. G.; Huybers, P. J.; Tingley, M.
2016-12-01
Sea level observations from coastal tide gauges are some of the longest instrumental records of the ocean. However, these data can be noisy, biased, and gappy, featuring missing values, and reflecting land motion and local effects. Coping with these issues in a formal manner is a challenging task. Some studies use Bayesian approaches to estimate sea level from tide gauge records, making inference probabilistically. Such methods are typically empirically Bayesian in nature: model parameters are treated as known and assigned point values. But, in reality, parameters are not perfectly known. Empirical Bayes methods thus neglect a potentially important source of uncertainty, and so may overestimate the precision (i.e., underestimate the uncertainty) of sea level estimates. We consider whether empirical Bayes methods underestimate uncertainty in sea level from tide gauge data, comparing to a full Bayes method that treats parameters as unknowns to be solved for along with the sea level field. We develop a hierarchical algorithm that we apply to tide gauge data on the North American northeast coast over 1893-2015. The algorithm is run in full Bayes mode, solving for the sea level process and parameters, and in empirical mode, solving only for the process using fixed parameter values. Error bars on sea level from the empirical method are smaller than from the full Bayes method, and the relative discrepancies increase with time; the 95% credible interval on sea level values from the empirical Bayes method in 1910 and 2010 is 23% and 56% narrower, respectively, than from the full Bayes approach. To evaluate the representativeness of the credible intervals, empirical Bayes and full Bayes methods are applied to corrupted data of a known surrogate field. Using rank histograms to evaluate the solutions, we find that the full Bayes method produces generally reliable error bars, whereas the empirical Bayes method gives too-narrow error bars, such that the 90% credible interval only encompasses 70% of true process values. Results demonstrate that parameter uncertainty is an important source of process uncertainty, and advocate for the fully Bayesian treatment of tide gauge records in ocean circulation and climate studies.
An evaluation of percentile and maximum likelihood estimators of weibull paremeters
Stanley J. Zarnoch; Tommy R. Dell
1985-01-01
Two methods of estimating the three-parameter Weibull distribution were evaluated by computer simulation and field data comparison. Maximum likelihood estimators (MLB) with bias correction were calculated with the computer routine FITTER (Bailey 1974); percentile estimators (PCT) were those proposed by Zanakis (1979). The MLB estimators had superior smaller bias and...
Power and Sample Size Calculations for Logistic Regression Tests for Differential Item Functioning
ERIC Educational Resources Information Center
Li, Zhushan
2014-01-01
Logistic regression is a popular method for detecting uniform and nonuniform differential item functioning (DIF) effects. Theoretical formulas for the power and sample size calculations are derived for likelihood ratio tests and Wald tests based on the asymptotic distribution of the maximum likelihood estimators for the logistic regression model.…
A Note on Three Statistical Tests in the Logistic Regression DIF Procedure
ERIC Educational Resources Information Center
Paek, Insu
2012-01-01
Although logistic regression became one of the well-known methods in detecting differential item functioning (DIF), its three statistical tests, the Wald, likelihood ratio (LR), and score tests, which are readily available under the maximum likelihood, do not seem to be consistently distinguished in DIF literature. This paper provides a clarifying…
Contributions to the Underlying Bivariate Normal Method for Factor Analyzing Ordinal Data
ERIC Educational Resources Information Center
Xi, Nuo; Browne, Michael W.
2014-01-01
A promising "underlying bivariate normal" approach was proposed by Jöreskog and Moustaki for use in the factor analysis of ordinal data. This was a limited information approach that involved the maximization of a composite likelihood function. Its advantage over full-information maximum likelihood was that very much less computation was…
Expected versus Observed Information in SEM with Incomplete Normal and Nonnormal Data
ERIC Educational Resources Information Center
Savalei, Victoria
2010-01-01
Maximum likelihood is the most common estimation method in structural equation modeling. Standard errors for maximum likelihood estimates are obtained from the associated information matrix, which can be estimated from the sample using either expected or observed information. It is known that, with complete data, estimates based on observed or…
Evaluation of Smoking Prevention Television Messages Based on the Elaboration Likelihood Model
ERIC Educational Resources Information Center
Flynn, Brian S.; Worden, John K.; Bunn, Janice Yanushka; Connolly, Scott W.; Dorwaldt, Anne L.
2011-01-01
Progress in reducing youth smoking may depend on developing improved methods to communicate with higher risk youth. This study explored the potential of smoking prevention messages based on the Elaboration Likelihood Model (ELM) to address these needs. Structured evaluations of 12 smoking prevention messages based on three strategies derived from…
Can, Seda; van de Schoot, Rens; Hox, Joop
2015-06-01
Because variables may be correlated in the social and behavioral sciences, multicollinearity might be problematic. This study investigates the effect of collinearity manipulated in within and between levels of a two-level confirmatory factor analysis by Monte Carlo simulation. Furthermore, the influence of the size of the intraclass correlation coefficient (ICC) and estimation method; maximum likelihood estimation with robust chi-squares and standard errors and Bayesian estimation, on the convergence rate are investigated. The other variables of interest were rate of inadmissible solutions and the relative parameter and standard error bias on the between level. The results showed that inadmissible solutions were obtained when there was between level collinearity and the estimation method was maximum likelihood. In the within level multicollinearity condition, all of the solutions were admissible but the bias values were higher compared with the between level collinearity condition. Bayesian estimation appeared to be robust in obtaining admissible parameters but the relative bias was higher than for maximum likelihood estimation. Finally, as expected, high ICC produced less biased results compared to medium ICC conditions.
NASA Astrophysics Data System (ADS)
Zeng, X.
2015-12-01
A large number of model executions are required to obtain alternative conceptual models' predictions and their posterior probabilities in Bayesian model averaging (BMA). The posterior model probability is estimated through models' marginal likelihood and prior probability. The heavy computation burden hinders the implementation of BMA prediction, especially for the elaborated marginal likelihood estimator. For overcoming the computation burden of BMA, an adaptive sparse grid (SG) stochastic collocation method is used to build surrogates for alternative conceptual models through the numerical experiment of a synthetical groundwater model. BMA predictions depend on model posterior weights (or marginal likelihoods), and this study also evaluated four marginal likelihood estimators, including arithmetic mean estimator (AME), harmonic mean estimator (HME), stabilized harmonic mean estimator (SHME), and thermodynamic integration estimator (TIE). The results demonstrate that TIE is accurate in estimating conceptual models' marginal likelihoods. The BMA-TIE has better predictive performance than other BMA predictions. TIE has high stability for estimating conceptual model's marginal likelihood. The repeated estimated conceptual model's marginal likelihoods by TIE have significant less variability than that estimated by other estimators. In addition, the SG surrogates are efficient to facilitate BMA predictions, especially for BMA-TIE. The number of model executions needed for building surrogates is 4.13%, 6.89%, 3.44%, and 0.43% of the required model executions of BMA-AME, BMA-HME, BMA-SHME, and BMA-TIE, respectively.
Christensen, Ole F
2012-12-03
Single-step methods provide a coherent and conceptually simple approach to incorporate genomic information into genetic evaluations. An issue with single-step methods is compatibility between the marker-based relationship matrix for genotyped animals and the pedigree-based relationship matrix. Therefore, it is necessary to adjust the marker-based relationship matrix to the pedigree-based relationship matrix. Moreover, with data from routine evaluations, this adjustment should in principle be based on both observed marker genotypes and observed phenotypes, but until now this has been overlooked. In this paper, I propose a new method to address this issue by 1) adjusting the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix instead of the reverse and 2) extending the single-step genetic evaluation using a joint likelihood of observed phenotypes and observed marker genotypes. The performance of this method is then evaluated using two simulated datasets. The method derived here is a single-step method in which the marker-based relationship matrix is constructed assuming all allele frequencies equal to 0.5 and the pedigree-based relationship matrix is constructed using the unusual assumption that animals in the base population are related and inbred with a relationship coefficient γ and an inbreeding coefficient γ / 2. Taken together, this γ parameter and a parameter that scales the marker-based relationship matrix can handle the issue of compatibility between marker-based and pedigree-based relationship matrices. The full log-likelihood function used for parameter inference contains two terms. The first term is the REML-log-likelihood for the phenotypes conditional on the observed marker genotypes, whereas the second term is the log-likelihood for the observed marker genotypes. Analyses of the two simulated datasets with this new method showed that 1) the parameters involved in adjusting marker-based and pedigree-based relationship matrices can depend on both observed phenotypes and observed marker genotypes and 2) a strong association between these two parameters exists. Finally, this method performed at least as well as a method based on adjusting the marker-based relationship matrix. Using the full log-likelihood and adjusting the pedigree-based relationship matrix to be compatible with the marker-based relationship matrix provides a new and interesting approach to handle the issue of compatibility between the two matrices in single-step genetic evaluation.
Sparsity guided empirical wavelet transform for fault diagnosis of rolling element bearings
NASA Astrophysics Data System (ADS)
Wang, Dong; Zhao, Yang; Yi, Cai; Tsui, Kwok-Leung; Lin, Jianhui
2018-02-01
Rolling element bearings are widely used in various industrial machines, such as electric motors, generators, pumps, gearboxes, railway axles, turbines, and helicopter transmissions. Fault diagnosis of rolling element bearings is beneficial to preventing any unexpected accident and reducing economic loss. In the past years, many bearing fault detection methods have been developed. Recently, a new adaptive signal processing method called empirical wavelet transform attracts much attention from readers and engineers and its applications to bearing fault diagnosis have been reported. The main problem of empirical wavelet transform is that Fourier segments required in empirical wavelet transform are strongly dependent on the local maxima of the amplitudes of the Fourier spectrum of a signal, which connotes that Fourier segments are not always reliable and effective if the Fourier spectrum of the signal is complicated and overwhelmed by heavy noises and other strong vibration components. In this paper, sparsity guided empirical wavelet transform is proposed to automatically establish Fourier segments required in empirical wavelet transform for fault diagnosis of rolling element bearings. Industrial bearing fault signals caused by single and multiple railway axle bearing defects are used to verify the effectiveness of the proposed sparsity guided empirical wavelet transform. Results show that the proposed method can automatically discover Fourier segments required in empirical wavelet transform and reveal single and multiple railway axle bearing defects. Besides, some comparisons with three popular signal processing methods including ensemble empirical mode decomposition, the fast kurtogram and the fast spectral correlation are conducted to highlight the superiority of the proposed method.
Maximum-likelihood fitting of data dominated by Poisson statistical uncertainties
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stoneking, M.R.; Den Hartog, D.J.
1996-06-01
The fitting of data by {chi}{sup 2}-minimization is valid only when the uncertainties in the data are normally distributed. When analyzing spectroscopic or particle counting data at very low signal level (e.g., a Thomson scattering diagnostic), the uncertainties are distributed with a Poisson distribution. The authors have developed a maximum-likelihood method for fitting data that correctly treats the Poisson statistical character of the uncertainties. This method maximizes the total probability that the observed data are drawn from the assumed fit function using the Poisson probability function to determine the probability for each data point. The algorithm also returns uncertainty estimatesmore » for the fit parameters. They compare this method with a {chi}{sup 2}-minimization routine applied to both simulated and real data. Differences in the returned fits are greater at low signal level (less than {approximately}20 counts per measurement). the maximum-likelihood method is found to be more accurate and robust, returning a narrower distribution of values for the fit parameters with fewer outliers.« less
Love, Jeffrey J.; Rigler, E. Joshua; Pulkkinen, Antti; Riley, Pete
2015-01-01
An examination is made of the hypothesis that the statistics of magnetic-storm-maximum intensities are the realization of a log-normal stochastic process. Weighted least-squares and maximum-likelihood methods are used to fit log-normal functions to −Dst storm-time maxima for years 1957-2012; bootstrap analysis is used to established confidence limits on forecasts. Both methods provide fits that are reasonably consistent with the data; both methods also provide fits that are superior to those that can be made with a power-law function. In general, the maximum-likelihood method provides forecasts having tighter confidence intervals than those provided by weighted least-squares. From extrapolation of maximum-likelihood fits: a magnetic storm with intensity exceeding that of the 1859 Carrington event, −Dst≥850 nT, occurs about 1.13 times per century and a wide 95% confidence interval of [0.42,2.41] times per century; a 100-yr magnetic storm is identified as having a −Dst≥880 nT (greater than Carrington) but a wide 95% confidence interval of [490,1187] nT.
Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong
2013-01-01
As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for the binary outcome is developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the Hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, the likelihood function under the null hypothesis is an alternative and indirect method to identify the potential cluster, and the test statistic is the extreme value of the likelihood function. Similar with Kulldorff's methods, we adopt Monte Carlo test for the test of significance. Both methods are applied for detecting spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters are identical. Through a simulation to independent benchmark data, it is indicated that the test statistic based on the Hypergeometric model outweighs Kulldorff's statistics for clusters of high population density or large size; otherwise Kulldorff's statistics are superior.
A Poisson Log-Normal Model for Constructing Gene Covariation Network Using RNA-seq Data.
Choi, Yoonha; Coram, Marc; Peng, Jie; Tang, Hua
2017-07-01
Constructing expression networks using transcriptomic data is an effective approach for studying gene regulation. A popular approach for constructing such a network is based on the Gaussian graphical model (GGM), in which an edge between a pair of genes indicates that the expression levels of these two genes are conditionally dependent, given the expression levels of all other genes. However, GGMs are not appropriate for non-Gaussian data, such as those generated in RNA-seq experiments. We propose a novel statistical framework that maximizes a penalized likelihood, in which the observed count data follow a Poisson log-normal distribution. To overcome the computational challenges, we use Laplace's method to approximate the likelihood and its gradients, and apply the alternating directions method of multipliers to find the penalized maximum likelihood estimates. The proposed method is evaluated and compared with GGMs using both simulated and real RNA-seq data. The proposed method shows improved performance in detecting edges that represent covarying pairs of genes, particularly for edges connecting low-abundant genes and edges around regulatory hubs.
Superfast maximum-likelihood reconstruction for quantum tomography
NASA Astrophysics Data System (ADS)
Shang, Jiangwei; Zhang, Zhengyun; Ng, Hui Khoon
2017-06-01
Conventional methods for computing maximum-likelihood estimators (MLE) often converge slowly in practical situations, leading to a search for simplifying methods that rely on additional assumptions for their validity. In this work, we provide a fast and reliable algorithm for maximum-likelihood reconstruction that avoids this slow convergence. Our method utilizes the state-of-the-art convex optimization scheme, an accelerated projected-gradient method, that allows one to accommodate the quantum nature of the problem in a different way than in the standard methods. We demonstrate the power of our approach by comparing its performance with other algorithms for n -qubit state tomography. In particular, an eight-qubit situation that purportedly took weeks of computation time in 2005 can now be completed in under a minute for a single set of data, with far higher accuracy than previously possible. This refutes the common claim that MLE reconstruction is slow and reduces the need for alternative methods that often come with difficult-to-verify assumptions. In fact, recent methods assuming Gaussian statistics or relying on compressed sensing ideas are demonstrably inapplicable for the situation under consideration here. Our algorithm can be applied to general optimization problems over the quantum state space; the philosophy of projected gradients can further be utilized for optimization contexts with general constraints.
Langholz, Bryan; Thomas, Duncan C.; Stovall, Marilyn; Smith, Susan A.; Boice, John D.; Shore, Roy E.; Bernstein, Leslie; Lynch, Charles F.; Zhang, Xinbo; Bernstein, Jonine L.
2009-01-01
Summary Methods for the analysis of individually matched case-control studies with location-specific radiation dose and tumor location information are described. These include likelihood methods for analyses that just use cases with precise location of tumor information and methods that also include cases with imprecise tumor location information. The theory establishes that each of these likelihood based methods estimates the same radiation rate ratio parameters, within the context of the appropriate model for location and subject level covariate effects. The underlying assumptions are characterized and the potential strengths and limitations of each method are described. The methods are illustrated and compared using the WECARE study of radiation and asynchronous contralateral breast cancer. PMID:18647297
Jackson, Dan; White, Ian R; Riley, Richard D
2013-01-01
Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. PMID:23401213
Reyes-Valdés, M H; Stelly, D M
1995-01-01
Frequencies of meiotic configurations in cytogenetic stocks are dependent on chiasma frequencies in segments defined by centromeres, breakpoints, and telomeres. The expectation maximization algorithm is proposed as a general method to perform maximum likelihood estimations of the chiasma frequencies in the intervals between such locations. The estimates can be translated via mapping functions into genetic maps of cytogenetic landmarks. One set of observational data was analyzed to exemplify application of these methods, results of which were largely concordant with other comparable data. The method was also tested by Monte Carlo simulation of frequencies of meiotic configurations from a monotelodisomic translocation heterozygote, assuming six different sample sizes. The estimate averages were always close to the values given initially to the parameters. The maximum likelihood estimation procedures can be extended readily to other kinds of cytogenetic stocks and allow the pooling of diverse cytogenetic data to collectively estimate lengths of segments, arms, and chromosomes. Images Fig. 1 PMID:7568226
DECONV-TOOL: An IDL based deconvolution software package
NASA Technical Reports Server (NTRS)
Varosi, F.; Landsman, W. B.
1992-01-01
There are a variety of algorithms for deconvolution of blurred images, each having its own criteria or statistic to be optimized in order to estimate the original image data. Using the Interactive Data Language (IDL), we have implemented the Maximum Likelihood, Maximum Entropy, Maximum Residual Likelihood, and sigma-CLEAN algorithms in a unified environment called DeConv_Tool. Most of the algorithms have as their goal the optimization of statistics such as standard deviation and mean of residuals. Shannon entropy, log-likelihood, and chi-square of the residual auto-correlation are computed by DeConv_Tool for the purpose of determining the performance and convergence of any particular method and comparisons between methods. DeConv_Tool allows interactive monitoring of the statistics and the deconvolved image during computation. The final results, and optionally, the intermediate results, are stored in a structure convenient for comparison between methods and review of the deconvolution computation. The routines comprising DeConv_Tool are available via anonymous FTP through the IDL Astronomy User's Library.
English, Sangeeta B.; Shih, Shou-Ching; Ramoni, Marco F.; Smith, Lois E.; Butte, Atul J.
2014-01-01
Though genome-wide technologies, such as microarrays, are widely used, data from these methods are considered noisy; there is still varied success in downstream biological validation. We report a method that increases the likelihood of successfully validating microarray findings using real time RT-PCR, including genes at low expression levels and with small differences. We use a Bayesian network to identify the most relevant sources of noise based on the successes and failures in validation for an initial set of selected genes, and then improve our subsequent selection of genes for validation based on eliminating these sources of noise. The network displays the significant sources of noise in an experiment, and scores the likelihood of validation for every gene. We show how the method can significantly increase validation success rates. In conclusion, in this study, we have successfully added a new automated step to determine the contributory sources of noise that determine successful or unsuccessful downstream biological validation. PMID:18790084
Extreme data compression for the CMB
Zablocki, Alan; Dodelson, Scott
2016-04-28
We apply the Karhunen-Loéve methods to cosmic microwave background (CMB) data sets, and show that we can recover the input cosmology and obtain the marginalized likelihoods in Λ cold dark matter cosmologies in under a minute, much faster than Markov chain Monte Carlo methods. This is achieved by forming a linear combination of the power spectra at each multipole l, and solving a system of simultaneous equations such that the Fisher matrix is locally unchanged. Instead of carrying out a full likelihood evaluation over the whole parameter space, we need evaluate the likelihood only for the parameter of interest, with themore » data compression effectively marginalizing over all other parameters. The weighting vectors contain insight about the physical effects of the parameters on the CMB anisotropy power spectrum C l. The shape and amplitude of these vectors give an intuitive feel for the physics of the CMB, the sensitivity of the observed spectrum to cosmological parameters, and the relative sensitivity of different experiments to cosmological parameters. We test this method on exact theory C l as well as on a Wilkinson Microwave Anisotropy Probe (WMAP)-like CMB data set generated from a random realization of a fiducial cosmology, comparing the compression results to those from a full likelihood analysis using CosmoMC. Furthermore, after showing that the method works, we apply it to the temperature power spectrum from the WMAP seven-year data release, and discuss the successes and limitations of our method as applied to a real data set.« less
Women's autonomy and reproductive health care utilisation: empirical evidence from Tajikistan.
Kamiya, Yusuke
2011-10-01
Women's autonomy is widely considered to be a key to improving maternal health in developing countries, whereas there is no consistent empirical evidence to support this claim. This paper examines whether or not and how women's autonomy within the household affects the use of reproductive health care, using a household survey data from Tajikistan. Estimation is performed by the bivariate probit model whereby woman's use of health services and the level of women's autonomy are recursively and simultaneously determined. The data is from a sample of women aged 15-49 from the Tajikistan Living Standard Measurement Survey 2007. Women's autonomy as measured by women's decision-making on household financial matters increase the likelihood that a woman receives antenatal and delivery care, whilst it has a negative effect on the probability of attending to four or more antenatal consultations. The hypothesis that women's autonomy and reproductive health care utilisation are independently determined is rejected for most of the estimation specifications, indicating the importance of taking into account the endogenous nature of women's autonomy when assessing its effect on health care use. The empirical results reconfirm the assertion that women's status within the household is closely linked to reproductive health care utilisation in developing countries. Policymakers therefore need not only to implement not only direct health interventions but also to focus on broader social policies which address women's empowerment. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
Frey, H Christopher; Zhao, Yuchao
2004-11-15
Probabilistic emission inventories were developed for urban air toxic emissions of benzene, formaldehyde, chromium, and arsenic for the example of Houston. Variability and uncertainty in emission factors were quantified for 71-97% of total emissions, depending upon the pollutant and data availability. Parametric distributions for interunit variability were fit using maximum likelihood estimation (MLE), and uncertainty in mean emission factors was estimated using parametric bootstrap simulation. For data sets containing one or more nondetected values, empirical bootstrap simulation was used to randomly sample detection limits for nondetected values and observations for sample values, and parametric distributions for variability were fit using MLE estimators for censored data. The goodness-of-fit for censored data was evaluated by comparison of cumulative distributions of bootstrap confidence intervals and empirical data. The emission inventory 95% uncertainty ranges are as small as -25% to +42% for chromium to as large as -75% to +224% for arsenic with correlated surrogates. Uncertainty was dominated by only a few source categories. Recommendations are made for future improvements to the analysis.
Blood donation as a public good: an empirical investigation of the free rider problem.
Abásolo, Ignacio; Tsuchiya, Aki
2014-04-01
A voluntary blood donation system can be seen as a public good. People can take advantage without contributing and have a free ride. We empirically analyse the extent of free riding and its determinants. Interviews of the general public in Spain (n = 1,211) were used to ask whether respondents were (or have been) regular blood donors and, if not, the reason. Free riders are defined as those who are medically capable to donate blood but do not. In addition, we distinguish four different types of free riding depending on the reason given for not donating. Binomial and multinomial logit models estimate the effect of individual characteristics on the propensity to free ride and the likelihood of the free rider types. Amongst those who are able to donate, there is a 67 % probability of being a free rider. The most likely free rider is female, single, with low/no education and abstained from voting in a recent national election. Gender, age, religious practice, political participation and regional income explain the type of free rider.
First-Passage-Time Distribution for Variable-Diffusion Processes
NASA Astrophysics Data System (ADS)
Barney, Liberty; Gunaratne, Gemunu H.
2017-05-01
First-passage-time distribution, which presents the likelihood of a stock reaching a pre-specified price at a given time, is useful in establishing the value of financial instruments and in designing trading strategies. First-passage-time distribution for Wiener processes has a single peak, while that for stocks exhibits a notable second peak within a trading day. This feature has only been discussed sporadically—often dismissed as due to insufficient/incorrect data or circumvented by conversion to tick time—and to the best of our knowledge has not been explained in terms of the underlying stochastic process. It was shown previously that intra-day variations in the market can be modeled by a stochastic process containing two variable-diffusion processes (Hua et al. in, Physica A 419:221-233, 2015). We show here that the first-passage-time distribution of this two-stage variable-diffusion model does exhibit a behavior similar to the empirical observation. In addition, we find that an extended model incorporating overnight price fluctuations exhibits intra- and inter-day behavior similar to those of empirical first-passage-time distributions.
Scaling in the distribution of intertrade durations of Chinese stocks
NASA Astrophysics Data System (ADS)
Jiang, Zhi-Qiang; Chen, Wei; Zhou, Wei-Xing
2008-10-01
The distribution of intertrade durations, defined as the waiting times between two consecutive transactions, is investigated based upon the limit order book data of 23 liquid Chinese stocks listed on the Shenzhen Stock Exchange in the whole year 2003. A scaling pattern is observed in the distributions of intertrade durations, where the empirical density functions of the normalized intertrade durations of all 23 stocks collapse onto a single curve. The scaling pattern is also observed in the intertrade duration distributions for filled and partially filled trades and in the conditional distributions. The ensemble distributions for all stocks are modeled by the Weibull and the Tsallis q-exponential distributions. Maximum likelihood estimation shows that the Weibull distribution outperforms the q-exponential for not-too-large intertrade durations which account for more than 98.5% of the data. Alternatively, nonlinear least-squares estimation selects the q-exponential as a better model, in which the optimization is conducted on the distance between empirical and theoretical values of the logarithmic probability densities. The distribution of intertrade durations is Weibull followed by a power-law tail with an asymptotic tail exponent close to 3.
A New Tool for Forecasting Solar Drivers of Severe Space Weather
NASA Technical Reports Server (NTRS)
Adams, J. H.; Falconer, D.; Barghouty, A. F.; Khazanov, I.; Moore, R.
2010-01-01
This poster describes a tool that is designed to forecast solar drivers for severe space weather. Since most severe space weather is driven by Solar flares and Coronal Mass Ejections (CMEs) - the strongest of these originate in active regions and are driven by the release of coronal free magnetic energy and There is a positive correlation between an active region's free magnetic energy and the likelihood of flare and CME production therefore we can use this positive correlation as the basis of our empirical space weather forecasting tool. The new tool takes a full disk Michelson Doppler Imager (MDI) magnetogram, identifies strong magnetic field areas, identifies these with NOAA active regions, and measures a free-magnetic-energy proxy. It uses an empirically derived forecasting function to convert the free-magnetic-energy proxy to an expected event rate. It adds up the expected event rates from all active regions on the disk to forecast the expected rate and probability of each class of events -- X-class flares, X&M class flares, CMEs, fast CMEs, and solar particle events (SPEs).
Mapping Quantitative Traits in Unselected Families: Algorithms and Examples
Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David
2009-01-01
Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016
Lee, E Henry; Wickham, Charlotte; Beedlow, Peter A; Waschmann, Ronald S; Tingey, David T
2017-10-01
A time series intervention analysis (TSIA) of dendrochronological data to infer the tree growth-climate-disturbance relations and forest disturbance history is described. Maximum likelihood is used to estimate the parameters of a structural time series model with components for climate and forest disturbances (i.e., pests, diseases, fire). The statistical method is illustrated with a tree-ring width time series for a mature closed-canopy Douglas-fir stand on the west slopes of the Cascade Mountains of Oregon, USA that is impacted by Swiss needle cast disease caused by the foliar fungus, Phaecryptopus gaeumannii (Rhode) Petrak. The likelihood-based TSIA method is proposed for the field of dendrochronology to understand the interaction of temperature, water, and forest disturbances that are important in forest ecology and climate change studies.
U.S. cannabis legalization and use of vaping and edible products among youth.
Borodovsky, Jacob T; Lee, Dustin C; Crosier, Benjamin S; Gabrielli, Joy L; Sargent, James D; Budney, Alan J
2017-08-01
Alternative methods for consuming cannabis (e.g., vaping and edibles) have become more popular in the wake of U.S. cannabis legalization. Specific provisions of legal cannabis laws (LCL) (e.g., dispensary regulations) may impact the likelihood that youth will use alternative methods and the age at which they first try the method - potentially magnifying or mitigating the developmental harms of cannabis use. This study examined associations between LCL provisions and how youth consume cannabis. An online cannabis use survey was distributed using Facebook advertising, and data were collected from 2630 cannabis-using youth (ages 14-18). U.S. states were coded for LCL status and various LCL provisions. Regression analyses tested associations among lifetime use and age of onset of cannabis vaping and edibles and LCL provisions. Longer LCL duration (OR vaping : 2.82, 95% CI: 2.24, 3.55; OR edibles : 3.82, 95% CI: 2.96, 4.94), and higher dispensary density (OR vaping : 2.68, 95% CI: 2.12, 3.38; OR edibles : 3.31, 95% CI: 2.56, 4.26), were related to higher likelihood of trying vaping and edibles. Permitting home cultivation was related to higher likelihood (OR: 1.93, 95% CI: 1.50, 2.48) and younger age of onset (β: -0.30, 95% CI: -0.45, -0.15) of edibles. Specific provisions of LCL appear to impact the likelihood, and age at which, youth use alternative methods to consume cannabis. These methods may carry differential risks for initiation and escalation of cannabis use. Understanding associations between LCL provisions and methods of administration can inform the design of effective cannabis regulatory strategies. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Xu, M., III; Liu, X.
2017-12-01
In the past 60 years, both the runoff and sediment load in the Yellow River Basin showed significant decreasing trends owing to the influences of human activities and climate change. Quantifying the impact of each factor (e.g. precipitation, sediment trapping dams, pasture, terrace, etc.) on the runoff and sediment load is among the key issues to guide the implement of water and soil conservation measures, and to predict the variation trends in the future. Hundreds of methods have been developed for studying the runoff and sediment load in the Yellow River Basin. Generally, these methods can be classified into empirical methods and physical-based models. The empirical methods, including hydrological method, soil and water conservation method, etc., are widely used in the Yellow River management engineering. These methods generally apply the statistical analyses like the regression analysis to build the empirical relationships between the main characteristic variables in a river basin. The elasticity method extensively used in the hydrological research can be classified into empirical method as it is mathematically deduced to be equivalent with the hydrological method. Physical-based models mainly include conceptual models and distributed models. The conceptual models are usually lumped models (e.g. SYMHD model, etc.) and can be regarded as transition of empirical models and distributed models. Seen from the publications that less studies have been conducted applying distributed models than empirical models as the simulation results of runoff and sediment load based on distributed models (e.g. the Digital Yellow Integrated Model, the Geomorphology-Based Hydrological Model, etc.) were usually not so satisfied owing to the intensive human activities in the Yellow River Basin. Therefore, this study primarily summarizes the empirical models applied in the Yellow River Basin and theoretically analyzes the main causes for the significantly different results using different empirical researching methods. Besides, we put forward an assessment frame for the researching methods of the runoff and sediment load variations in the Yellow River Basin from the point of view of inputting data, model structure and result output. And the assessment frame was then applied in the Huangfuchuan River.
Li, Dongmei; Le Pape, Marc A; Parikh, Nisha I; Chen, Will X; Dye, Timothy D
2013-01-01
Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the Significance Analysis of Microarrays method fold change criteria are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across the Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rates controls between each approach are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offers higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.
Method and system for diagnostics of apparatus
NASA Technical Reports Server (NTRS)
Gorinevsky, Dimitry (Inventor)
2012-01-01
Proposed is a method, implemented in software, for estimating fault state of an apparatus outfitted with sensors. At each execution period the method processes sensor data from the apparatus to obtain a set of parity parameters, which are further used for estimating fault state. The estimation method formulates a convex optimization problem for each fault hypothesis and employs a convex solver to compute fault parameter estimates and fault likelihoods for each fault hypothesis. The highest likelihoods and corresponding parameter estimates are transmitted to a display device or an automated decision and control system. The obtained accurate estimate of fault state can be used to improve safety, performance, or maintenance processes for the apparatus.
NASA Astrophysics Data System (ADS)
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-07-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ˜104 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.
The Frequency of Fitness Peak Shifts Is Increased at Expanding Range Margins Due to Mutation Surfing
Burton, Olivia J.; Travis, Justin M. J.
2008-01-01
Dynamic species' ranges, those that are either invasive or shifting in response to environmental change, are the focus of much recent interest in ecology, evolution, and genetics. Understanding how range expansions can shape evolutionary trajectories requires the consideration of nonneutral variability and genetic architecture, yet the majority of empirical and theoretical work to date has explored patterns of neutral variability. Here we use forward computer simulations of population growth, dispersal, and mutation to explore how range-shifting dynamics can influence evolution on rugged fitness landscapes. We employ a two-locus model, incorporating sign epistasis, and find that there is an increased likelihood of fitness peak shifts during a period of range expansion. Maladapted valley genotypes can accumulate at an expanding range front through a phenomenon called mutation surfing, which increases the likelihood that a mutation leading to a higher peak will occur. Our results indicate that most peak shifts occur close to the expanding front. We also demonstrate that periods of range shifting are especially important for peak shifting in species with narrow geographic distributions. Our results imply that trajectories on rugged fitness landscapes can be modified substantially when ranges are dynamic. PMID:18505864
Life Shocks and Crime: A Test of the “Turning Point” Hypothesis
Noonan, Kelly; Reichman, Nancy E.; Schwartz-Soicher, Ofira
2012-01-01
Other researchers have posited that important events in men’s lives—such as employment, marriage, and parenthood—strengthen their social ties and lead them to refrain from crime. A challenge in empirically testing this hypothesis has been the issue of self-selection into life transitions. This study contributes to this literature by estimating the effects of an exogenous life shock on crime. We use data from the Fragile Families and Child Wellbeing Study, augmented with information from hospital medical records, to estimate the effects of the birth of a child with a severe health problem on the likelihood that the infant’s father engages in illegal activities. We conduct a number of auxiliary analyses to examine exogeneity assumptions. We find that having an infant born with a severe health condition increases the likelihood that the father is convicted of a crime in the three-year period following the birth of the child, and at least part of the effect appears to operate through work and changes in parental relationships. These results provide evidence that life events can cause crime and, as such, support the “turning point” hypothesis. PMID:21660628
Urquhart, Erin A.; Jones, Stephen H.; Yu, Jong W.; Schuster, Brian M.; Marcinkiewicz, Ashley L.; Whistler, Cheryl A.; Cooper, Vaughn S.
2016-01-01
Reports from state health departments and the Centers for Disease Control and Prevention indicate that the annual number of reported human vibriosis cases in New England has increased in the past decade. Concurrently, there has been a shift in both the spatial distribution and seasonal detection of Vibrio spp. throughout the region based on limited monitoring data. To determine environmental factors that may underlie these emerging conditions, this study focuses on a long-term database of Vibrio parahaemolyticus concentrations in oyster samples generated from data collected from the Great Bay Estuary, New Hampshire over a period of seven consecutive years. Oyster samples from two distinct sites were analyzed for V. parahaemolyticus abundance, noting significant relationships with various biotic and abiotic factors measured during the same period of study. We developed a predictive modeling tool capable of estimating the likelihood of V. parahaemolyticus presence in coastal New Hampshire oysters. Results show that the inclusion of chlorophyll a concentration to an empirical model otherwise employing only temperature and salinity variables, offers improved predictive capability for modeling the likelihood of V. parahaemolyticus in the Great Bay Estuary. PMID:27144925
Improved gap size estimation for scaffolding algorithms.
Sahlin, Kristoffer; Street, Nathaniel; Lundeberg, Joakim; Arvestad, Lars
2012-09-01
One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subsequent analysis, it is important to provide unbiased estimation of contig distance. In this article, we show that state-of-the-art programs for scaffolding are using an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe what different cases of bias we are facing. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences both for scaffolding software, structural variation detection and for library insert-size estimation as is commonly performed by read aligners. A reference implementation is provided at https://github.com/SciLifeLab/gapest. Supplementary data are availible at Bioinformatics online.
Holland, Kathryn J; Cortina, Lilia M
2017-10-01
Approximately 1 in 4 women is sexually assaulted in college, a problem that federal law has attempted to address with recent changes. Under the evolving landscape of Title IX, and related law, universities nationwide have overhauled their sexual assault policies, procedures, and resources. Many of the new policies designate undergraduate resident assistants (RAs) as Responsible Employees-requiring them to provide assistance and report to the university if a fellow student discloses sexual assault. We investigated factors that predict the likelihood of RAs enacting their policy mandate, that is, reporting sexual assault disclosures to university authorities and referring survivors to sexual assault resources. Based on data from 305 Responsible Employee RAs, we found that likelihood to report and refer varied, depending on RAs' knowledge of reporting procedures and resources, trust in these supports, and perceptions of mandatory reporting policy. Understanding mandatory reporter behavior is crucial, because help-providers' responses can have serious implications for the recovery of sexual assault survivors. Our findings elucidate some effects of changes in the interpretation and implementation of Title IX, with potential to inform the development of more theoretically and empirically informed policies. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Free-Riding Behavior in Vaccination Decisions: An Experimental Study
Ibuka, Yoko; Li, Meng; Vietri, Jeffrey; Chapman, Gretchen B.; Galvani, Alison P.
2014-01-01
Individual decision-making regarding vaccination may be affected by the vaccination choices of others. As vaccination produces externalities reducing transmission of a disease, it can provide an incentive for individuals to be free-riders who benefit from the vaccination of others while avoiding the cost of vaccination. This study examined an individual's decision about vaccination in a group setting for a hypothetical disease that is called “influenza” using a computerized experimental game. In the game, interactions with others are allowed. We found that higher observed vaccination rate within the group during the previous round of the game decreased the likelihood of an individual's vaccination acceptance, indicating the existence of free-riding behavior. The free-riding behavior was observed regardless of parameter conditions on the characteristics of the influenza and vaccine. We also found that other predictors of vaccination uptake included an individual's own influenza exposure in previous rounds increasing the likelihood of vaccination acceptance, consistent with existing empirical studies. Influenza prevalence among other group members during the previous round did not have a statistically significant effect on vaccination acceptance in the current round once vaccination rate in the previous round was controlled for. PMID:24475246
Rape myth acceptance and rape acknowledgment: The mediating role of sexual refusal assertiveness.
Newins, Amie R; Wilson, Laura C; White, Susan W
2018-05-01
Unacknowledged rape, defined as when an individual experiences an event that meets a legal or empirical definition of rape but the individual does not label it as such, is prevalent. Research examining predictors of rape acknowledgment is needed. Sexual assertiveness may be an important variable to consider, as an individual's typical behavior during sexual situations may influence rape acknowledgment. To assess the indirect effect of rape myth acceptance on rape acknowledgment through sexual refusal assertiveness, an online survey of 181 female rape survivors was conducted. The indirect effects of two types of rape myths (He didn't mean to and Rape is a deviant event) were significant and positive. Specifically, acceptance of these two rape myths was negatively related to sexual refusal assertiveness, which was negatively associated with likelihood of rape acknowledgment. The results of this study indicate that sexual refusal assertiveness is associated with lower likelihood of rape acknowledgment among rape survivors. As a result, it appears that, under certain circumstances, women high in rape myth acceptance may be more likely to acknowledge rape when it results in decreased sexual refusal assertiveness. Copyright © 2018 Elsevier B.V. All rights reserved.
Mapping Diffuse Seismicity Using Empirical Matched Field Processing Techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, J; Templeton, D C; Harris, D B
The objective of this project is to detect and locate more microearthquakes using the empirical matched field processing (MFP) method than can be detected using only conventional earthquake detection techniques. We propose that empirical MFP can complement existing catalogs and techniques. We test our method on continuous seismic data collected at the Salton Sea Geothermal Field during November 2009 and January 2010. In the Southern California Earthquake Data Center (SCEDC) earthquake catalog, 619 events were identified in our study area during this time frame and our MFP technique identified 1094 events. Therefore, we believe that the empirical MFP method combinedmore » with conventional methods significantly improves the network detection ability in an efficient matter.« less
Anatomically-Aided PET Reconstruction Using the Kernel Method
Hutchcroft, Will; Wang, Guobao; Chen, Kevin T.; Catana, Ciprian; Qi, Jinyi
2016-01-01
This paper extends the kernel method that was proposed previously for dynamic PET reconstruction, to incorporate anatomical side information into the PET reconstruction model. In contrast to existing methods that incorporate anatomical information using a penalized likelihood framework, the proposed method incorporates this information in the simpler maximum likelihood (ML) formulation and is amenable to ordered subsets. The new method also does not require any segmentation of the anatomical image to obtain edge information. We compare the kernel method with the Bowsher method for anatomically-aided PET image reconstruction through a simulated data set. Computer simulations demonstrate that the kernel method offers advantages over the Bowsher method in region of interest (ROI) quantification. Additionally the kernel method is applied to a 3D patient data set. The kernel method results in reduced noise at a matched contrast level compared with the conventional ML expectation maximization (EM) algorithm. PMID:27541810
Anatomically-aided PET reconstruction using the kernel method.
Hutchcroft, Will; Wang, Guobao; Chen, Kevin T; Catana, Ciprian; Qi, Jinyi
2016-09-21
This paper extends the kernel method that was proposed previously for dynamic PET reconstruction, to incorporate anatomical side information into the PET reconstruction model. In contrast to existing methods that incorporate anatomical information using a penalized likelihood framework, the proposed method incorporates this information in the simpler maximum likelihood (ML) formulation and is amenable to ordered subsets. The new method also does not require any segmentation of the anatomical image to obtain edge information. We compare the kernel method with the Bowsher method for anatomically-aided PET image reconstruction through a simulated data set. Computer simulations demonstrate that the kernel method offers advantages over the Bowsher method in region of interest quantification. Additionally the kernel method is applied to a 3D patient data set. The kernel method results in reduced noise at a matched contrast level compared with the conventional ML expectation maximization algorithm.
Anatomically-aided PET reconstruction using the kernel method
NASA Astrophysics Data System (ADS)
Hutchcroft, Will; Wang, Guobao; Chen, Kevin T.; Catana, Ciprian; Qi, Jinyi
2016-09-01
This paper extends the kernel method that was proposed previously for dynamic PET reconstruction, to incorporate anatomical side information into the PET reconstruction model. In contrast to existing methods that incorporate anatomical information using a penalized likelihood framework, the proposed method incorporates this information in the simpler maximum likelihood (ML) formulation and is amenable to ordered subsets. The new method also does not require any segmentation of the anatomical image to obtain edge information. We compare the kernel method with the Bowsher method for anatomically-aided PET image reconstruction through a simulated data set. Computer simulations demonstrate that the kernel method offers advantages over the Bowsher method in region of interest quantification. Additionally the kernel method is applied to a 3D patient data set. The kernel method results in reduced noise at a matched contrast level compared with the conventional ML expectation maximization algorithm.
Optimal Methods for Classification of Digitally Modulated Signals
2013-03-01
of using a ratio of likelihood functions, the proposed approach uses the Kullback - Leibler (KL) divergence. KL...58 List of Acronyms ALRT Average LRT BPSK Binary Shift Keying BPSK-SS BPSK Spread Spectrum or CDMA DKL Kullback - Leibler Information Divergence...blind demodulation for develop classification algorithms for wider set of signals types. Two methodologies were used : Likelihood Ratio Test
ERIC Educational Resources Information Center
Kieftenbeld, Vincent; Natesan, Prathiba
2012-01-01
Markov chain Monte Carlo (MCMC) methods enable a fully Bayesian approach to parameter estimation of item response models. In this simulation study, the authors compared the recovery of graded response model parameters using marginal maximum likelihood (MML) and Gibbs sampling (MCMC) under various latent trait distributions, test lengths, and…
Maximum Likelihood Dynamic Factor Modeling for Arbitrary "N" and "T" Using SEM
ERIC Educational Resources Information Center
Voelkle, Manuel C.; Oud, Johan H. L.; von Oertzen, Timo; Lindenberger, Ulman
2012-01-01
This article has 3 objectives that build on each other. First, we demonstrate how to obtain maximum likelihood estimates for dynamic factor models (the direct autoregressive factor score model) with arbitrary "T" and "N" by means of structural equation modeling (SEM) and compare the approach to existing methods. Second, we go beyond standard time…
Rate of convergence of k-step Newton estimators to efficient likelihood estimators
Steve Verrill
2007-01-01
We make use of Cramer conditions together with the well-known local quadratic convergence of Newton?s method to establish the asymptotic closeness of k-step Newton estimators to efficient likelihood estimators. In Verrill and Johnson [2007. Confidence bounds and hypothesis tests for normal distribution coefficients of variation. USDA Forest Products Laboratory Research...
Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond.
Huisman, Jisca
2017-09-01
Data on hundreds or thousands of single nucleotide polymorphisms (SNPs) provide detailed information about the relationships between individuals, but currently few tools can turn this information into a multigenerational pedigree. I present the r package sequoia, which assigns parents, clusters half-siblings sharing an unsampled parent and assigns grandparents to half-sibships. Assignments are made after consideration of the likelihoods of all possible first-, second- and third-degree relationships between the focal individuals, as well as the traditional alternative of being unrelated. This careful exploration of the local likelihood surface is implemented in a fast, heuristic hill-climbing algorithm. Distinction between the various categories of second-degree relatives is possible when likelihoods are calculated conditional on at least one parent of each focal individual. Performance was tested on simulated data sets with realistic genotyping error rate and missingness, based on three different large pedigrees (N = 1000-2000). This included a complex pedigree with overlapping generations, occasional close inbreeding and some unknown birth years. Parentage assignment was highly accurate down to about 100 independent SNPs (error rate <0.1%) and fast (<1 min) as most pairs can be excluded from being parent-offspring based on opposite homozygosity. For full pedigree reconstruction, 40% of parents were assumed nongenotyped. Reconstruction resulted in low error rates (<0.3%), high assignment rates (>99%) in limited computation time (typically <1 h) when at least 200 independent SNPs were used. In three empirical data sets, relatedness estimated from the inferred pedigree was strongly correlated to genomic relatedness. © 2017 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.
Comparison between presepsin and procalcitonin in early diagnosis of neonatal sepsis.
Iskandar, Agustin; Arthamin, Maimun Z; Indriana, Kristin; Anshory, Muhammad; Hur, Mina; Di Somma, Salvatore
2018-05-09
Neonatal sepsis remains worldwide one of the leading causes of morbidity and mortality in both term and preterm infants. Lower mortality rates are related to timely diagnostic evaluation and prompt initiation of empiric antibiotic therapy. Blood culture, as gold standard examination for sepsis, has several limitations for early diagnosis, so that sepsis biomarkers could play an important role in this regard. This study was aimed to compare the value of the two biomarkers presepsin and procalcitonin in early diagnosis of neonatal sepsis. This was a prospective cross-sectional study performed, in Saiful Anwar General Hospital Malang, Indonesia, in 51 neonates that fulfill the criteria of systemic inflammatory response syndrome (SIRS) with blood culture as diagnostic gold standard for sepsis. At reviewer operating characteristic (ROC) curve analyses, using a presepsin cutoff of 706,5 pg/mL, the obtained area under the curve (AUCs) were: sensitivity = 85.7%, specificity = 68.8%, positive predictive value = 85.7%, negative predictive value = 68.8%, positive likelihood ratio = 2.75, negative likelihood ratio = 0.21, and accuracy = 80.4%. On the other hand, with a procalcitonin cutoff value of 161.33 pg/mL the obtained AUCs showed: sensitivity = 68.6%, specificity = 62.5%, positive predictive value = 80%, negative predictive value = 47.6%, positive likelihood ratio = 1.83, the odds ratio negative = 0.5, and accuracy = 66.7%. In early diagnosis of neonatal sepsis, compared with procalcitonin, presepsin seems to provide better early diagnostic value with consequent possible faster therapeutical decision making and possible positive impact on outcome of neonates.
Galaxy Zoo: Observing secular evolution through bars
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cheung, Edmond; Faber, S. M.; Koo, David C.
In this paper, we use the Galaxy Zoo 2 data set to study the behavior of bars in disk galaxies as a function of specific star formation rate (SSFR) and bulge prominence. Our sample consists of 13,295 disk galaxies, with an overall (strong) bar fraction of 23.6% ± 0.4%, of which 1154 barred galaxies also have bar length (BL) measurements. These samples are the largest ever used to study the role of bars in galaxy evolution. We find that the likelihood of a galaxy hosting a bar is anticorrelated with SSFR, regardless of stellar mass or bulge prominence. We findmore » that the trends of bar likelihood and BL with bulge prominence are bimodal with SSFR. We interpret these observations using state-of-the-art simulations of bar evolution that include live halos and the effects of gas and star formation. We suggest our observed trends of bar likelihood with SSFR are driven by the gas fraction of the disks, a factor demonstrated to significantly retard both bar formation and evolution in models. We interpret the bimodal relationship between bulge prominence and bar properties as being due to the complicated effects of classical bulges and central mass concentrations on bar evolution and also to the growth of disky pseudobulges by bar evolution. These results represent empirical evidence for secular evolution driven by bars in disk galaxies. This work suggests that bars are not stagnant structures within disk galaxies but are a critical evolutionary driver of their host galaxies in the local universe (z < 1).« less
Chen, Rui; Hyrien, Ollivier
2011-01-01
This article deals with quasi- and pseudo-likelihood estimation in a class of continuous-time multi-type Markov branching processes observed at discrete points in time. “Conventional” and conditional estimation are discussed for both approaches. We compare their properties and identify situations where they lead to asymptotically equivalent estimators. Both approaches possess robustness properties, and coincide with maximum likelihood estimation in some cases. Quasi-likelihood functions involving only linear combinations of the data may be unable to estimate all model parameters. Remedial measures exist, including the resort either to non-linear functions of the data or to conditioning the moments on appropriate sigma-algebras. The method of pseudo-likelihood may also resolve this issue. We investigate the properties of these approaches in three examples: the pure birth process, the linear birth-and-death process, and a two-type process that generalizes the previous two examples. Simulations studies are conducted to evaluate performance in finite samples. PMID:21552356
Maximum likelihood estimation of finite mixture model for economic data
NASA Astrophysics Data System (ADS)
Phoong, Seuk-Yen; Ismail, Mohd Tahir
2014-06-01
Finite mixture model is a mixture model with finite-dimension. This models are provides a natural representation of heterogeneity in a finite number of latent classes. In addition, finite mixture models also known as latent class models or unsupervised learning models. Recently, maximum likelihood estimation fitted finite mixture models has greatly drawn statistician's attention. The main reason is because maximum likelihood estimation is a powerful statistical method which provides consistent findings as the sample sizes increases to infinity. Thus, the application of maximum likelihood estimation is used to fit finite mixture model in the present paper in order to explore the relationship between nonlinear economic data. In this paper, a two-component normal mixture model is fitted by maximum likelihood estimation in order to investigate the relationship among stock market price and rubber price for sampled countries. Results described that there is a negative effect among rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia.
Algorithms of maximum likelihood data clustering with applications
NASA Astrophysics Data System (ADS)
Giada, Lorenzo; Marsili, Matteo
2002-12-01
We address the problem of data clustering by introducing an unsupervised, parameter-free approach based on maximum likelihood principle. Starting from the observation that data sets belonging to the same cluster share a common information, we construct an expression for the likelihood of any possible cluster structure. The likelihood in turn depends only on the Pearson's coefficient of the data. We discuss clustering algorithms that provide a fast and reliable approximation to maximum likelihood configurations. Compared to standard clustering methods, our approach has the advantages that (i) it is parameter free, (ii) the number of clusters need not be fixed in advance and (iii) the interpretation of the results is transparent. In order to test our approach and compare it with standard clustering algorithms, we analyze two very different data sets: time series of financial market returns and gene expression data. We find that different maximization algorithms produce similar cluster structures whereas the outcome of standard algorithms has a much wider variability.
Levine, Adam C; Glavis-Bloom, Justin; Modi, Payal; Nasrin, Sabiha; Rege, Soham; Chu, Chieh; Schmid, Christopher H; Alam, Nur H
2015-08-18
Diarrhea remains one of the most common and most deadly conditions affecting children worldwide. Accurately assessing dehydration status is critical to determining treatment course, yet no clinical diagnostic models for dehydration have been empirically derived and validated for use in resource-limited settings. In the Dehydration: Assessing Kids Accurately (DHAKA) prospective cohort study, a random sample of children under 5 with acute diarrhea was enrolled between February and June 2014 in Bangladesh. Local nurses assessed children for clinical signs of dehydration on arrival, and then serial weights were obtained as subjects were rehydrated. For each child, the percent weight change with rehydration was used to classify subjects with severe dehydration (>9% weight change), some dehydration (3-9%), or no dehydration (<3%). Clinical variables were then entered into logistic regression and recursive partitioning models to develop the DHAKA Dehydration Score and DHAKA Dehydration Tree, respectively. Models were assessed for their accuracy using the area under their receiver operating characteristic curve (AUC) and for their reliability through repeat clinical exams. Bootstrapping was used to internally validate the models. A total of 850 children were enrolled, with 771 included in the final analysis. Of the 771 children included in the analysis, 11% were classified with severe dehydration, 45% with some dehydration, and 44% with no dehydration. Both the DHAKA Dehydration Score and DHAKA Dehydration Tree had significant AUCs of 0.79 (95% CI = 0.74, 0.84) and 0.76 (95% CI = 0.71, 0.80), respectively, for the diagnosis of severe dehydration. Additionally, the DHAKA Dehydration Score and DHAKA Dehydration Tree had significant positive likelihood ratios of 2.0 (95% CI = 1.8, 2.3) and 2.5 (95% CI = 2.1, 2.8), respectively, and significant negative likelihood ratios of 0.23 (95% CI = 0.13, 0.40) and 0.28 (95% CI = 0.18, 0.44), respectively, for the diagnosis of severe dehydration. Both models demonstrated 90% agreement between independent raters and good reproducibility using bootstrapping. This study is the first to empirically derive and internally validate accurate and reliable clinical diagnostic models for dehydration in a resource-limited setting. After external validation, frontline providers may use these new tools to better manage acute diarrhea in children. © Levine et al.
Dudaniec, Rachael Y; Worthington Wilmer, Jessica; Hanson, Jeffrey O; Warren, Matthew; Bell, Sarah; Rhodes, Jonathan R
2016-01-01
Landscape genetics lacks explicit methods for dealing with the uncertainty in landscape resistance estimation, which is particularly problematic when sample sizes of individuals are small. Unless uncertainty can be quantified, valuable but small data sets may be rendered unusable for conservation purposes. We offer a method to quantify uncertainty in landscape resistance estimates using multimodel inference as an improvement over single model-based inference. We illustrate the approach empirically using co-occurring, woodland-preferring Australian marsupials within a common study area: two arboreal gliders (Petaurus breviceps, and Petaurus norfolcensis) and one ground-dwelling antechinus (Antechinus flavipes). First, we use maximum-likelihood and a bootstrap procedure to identify the best-supported isolation-by-resistance model out of 56 models defined by linear and non-linear resistance functions. We then quantify uncertainty in resistance estimates by examining parameter selection probabilities from the bootstrapped data. The selection probabilities provide estimates of uncertainty in the parameters that drive the relationships between landscape features and resistance. We then validate our method for quantifying uncertainty using simulated genetic and landscape data showing that for most parameter combinations it provides sensible estimates of uncertainty. We conclude that small data sets can be informative in landscape genetic analyses provided uncertainty can be explicitly quantified. Being explicit about uncertainty in landscape genetic models will make results more interpretable and useful for conservation decision-making, where dealing with uncertainty is critical. © 2015 John Wiley & Sons Ltd.
Searching mixed DNA profiles directly against profile databases.
Bright, Jo-Anne; Taylor, Duncan; Curran, James; Buckleton, John
2014-03-01
DNA databases have revolutionised forensic science. They are a powerful investigative tool as they have the potential to identify persons of interest in criminal investigations. Routinely, a DNA profile generated from a crime sample could only be searched for in a database of individuals if the stain was from single contributor (single source) or if a contributor could unambiguously be determined from a mixed DNA profile. This meant that a significant number of samples were unsuitable for database searching. The advent of continuous methods for the interpretation of DNA profiles offers an advanced way to draw inferential power from the considerable investment made in DNA databases. Using these methods, each profile on the database may be considered a possible contributor to a mixture and a likelihood ratio (LR) can be formed. Those profiles which produce a sufficiently large LR can serve as an investigative lead. In this paper empirical studies are described to determine what constitutes a large LR. We investigate the effect on a database search of complex mixed DNA profiles with contributors in equal proportions with dropout as a consideration, and also the effect of an incorrect assignment of the number of contributors to a profile. In addition, we give, as a demonstration of the method, the results using two crime samples that were previously unsuitable for database comparison. We show that effective management of the selection of samples for searching and the interpretation of the output can be highly informative. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.
Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao
2015-08-01
Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.
Estimating Divergence Parameters With Small Samples From a Large Number of Loci
Wang, Yong; Hey, Jody
2010-01-01
Most methods for studying divergence with gene flow rely upon data from many individuals at few loci. Such data can be useful for inferring recent population history but they are unlikely to contain sufficient information about older events. However, the growing availability of genome sequences suggests a different kind of sampling scheme, one that may be more suited to studying relatively ancient divergence. Data sets extracted from whole-genome alignments may represent very few individuals but contain a very large number of loci. To take advantage of such data we developed a new maximum-likelihood method for genomic data under the isolation-with-migration model. Unlike many coalescent-based likelihood methods, our method does not rely on Monte Carlo sampling of genealogies, but rather provides a precise calculation of the likelihood by numerical integration over all genealogies. We demonstrate that the method works well on simulated data sets. We also consider two models for accommodating mutation rate variation among loci and find that the model that treats mutation rates as random variables leads to better estimates. We applied the method to the divergence of Drosophila melanogaster and D. simulans and detected a low, but statistically significant, signal of gene flow from D. simulans to D. melanogaster. PMID:19917765
GPSit: An automated method for evolutionary analysis of nonculturable ciliated microeukaryotes.
Chen, Xiao; Wang, Yurui; Sheng, Yalan; Warren, Alan; Gao, Shan
2018-05-01
Microeukaryotes are among the most important components of the microbial food web in almost all aquatic and terrestrial ecosystems worldwide. In order to gain a better understanding their roles and functions in ecosystems, sequencing coupled with phylogenomic analyses of entire genomes or transcriptomes is increasingly used to reconstruct the evolutionary history and classification of these microeukaryotes and thus provide a more robust framework for determining their systematics and diversity. More importantly, phylogenomic research usually requires high levels of hands-on bioinformatics experience. Here, we propose an efficient automated method, "Guided Phylogenomic Search in trees" (GPSit), which starts from predicted protein sequences of newly sequenced species and a well-defined customized orthologous database. Compared with previous protocols, our method streamlines the entire workflow by integrating all essential and other optional operations. In so doing, the manual operation time for reconstructing phylogenetic relationships is reduced from days to several hours, compared to other methods. Furthermore, GPSit supports user-defined parameters in most steps and thus allows users to adapt it to their studies. The effectiveness of GPSit is demonstrated by incorporating available online data and new single-cell data of three nonculturable marine ciliates (Anteholosticha monilata, Deviata sp. and Diophrys scutum) under moderate sequencing coverage (~5×). Our results indicate that the former could reconstruct robust "deep" phylogenetic relationships while the latter reveals the presence of intermediate taxa in shallow relationships. Based on empirical phylogenomic data, we also used GPSit to evaluate the impact of different levels of missing data on two commonly used methods of phylogenetic analyses, maximum likelihood (ML) and Bayesian inference (BI) methods. We found that BI is less sensitive to missing data when fast-evolving sites are removed. © 2018 John Wiley & Sons Ltd.
Testing non-inferiority of a new treatment in three-arm clinical trials with binary endpoints.
Tang, Nian-Sheng; Yu, Bin; Tang, Man-Lai
2014-12-18
A two-arm non-inferiority trial without a placebo is usually adopted to demonstrate that an experimental treatment is not worse than a reference treatment by a small pre-specified non-inferiority margin due to ethical concerns. Selection of the non-inferiority margin and establishment of assay sensitivity are two major issues in the design, analysis and interpretation for two-arm non-inferiority trials. Alternatively, a three-arm non-inferiority clinical trial including a placebo is usually conducted to assess the assay sensitivity and internal validity of a trial. Recently, some large-sample approaches have been developed to assess the non-inferiority of a new treatment based on the three-arm trial design. However, these methods behave badly with small sample sizes in the three arms. This manuscript aims to develop some reliable small-sample methods to test three-arm non-inferiority. Saddlepoint approximation, exact and approximate unconditional, and bootstrap-resampling methods are developed to calculate p-values of the Wald-type, score and likelihood ratio tests. Simulation studies are conducted to evaluate their performance in terms of type I error rate and power. Our empirical results show that the saddlepoint approximation method generally behaves better than the asymptotic method based on the Wald-type test statistic. For small sample sizes, approximate unconditional and bootstrap-resampling methods based on the score test statistic perform better in the sense that their corresponding type I error rates are generally closer to the prespecified nominal level than those of other test procedures. Both approximate unconditional and bootstrap-resampling test procedures based on the score test statistic are generally recommended for three-arm non-inferiority trials with binary outcomes.
Quantitative PET Imaging in Drug Development: Estimation of Target Occupancy.
Naganawa, Mika; Gallezot, Jean-Dominique; Rossano, Samantha; Carson, Richard E
2017-12-11
Positron emission tomography, an imaging tool using radiolabeled tracers in humans and preclinical species, has been widely used in recent years in drug development, particularly in the central nervous system. One important goal of PET in drug development is assessing the occupancy of various molecular targets (e.g., receptors, transporters, enzymes) by exogenous drugs. The current linear mathematical approaches used to determine occupancy using PET imaging experiments are presented. These algorithms use results from multiple regions with different target content in two scans, a baseline (pre-drug) scan and a post-drug scan. New mathematical estimation approaches to determine target occupancy, using maximum likelihood, are presented. A major challenge in these methods is the proper definition of the covariance matrix of the regional binding measures, accounting for different variance of the individual regional measures and their nonzero covariance, factors that have been ignored by conventional methods. The novel methods are compared to standard methods using simulation and real human occupancy data. The simulation data showed the expected reduction in variance and bias using the proper maximum likelihood methods, when the assumptions of the estimation method matched those in simulation. Between-method differences for data from human occupancy studies were less obvious, in part due to small dataset sizes. These maximum likelihood methods form the basis for development of improved PET covariance models, in order to minimize bias and variance in PET occupancy studies.
Safe semi-supervised learning based on weighted likelihood.
Kawakita, Masanori; Takeuchi, Jun'ichi
2014-05-01
We are interested in developing a safe semi-supervised learning that works in any situation. Semi-supervised learning postulates that n(') unlabeled data are available in addition to n labeled data. However, almost all of the previous semi-supervised methods require additional assumptions (not only unlabeled data) to make improvements on supervised learning. If such assumptions are not met, then the methods possibly perform worse than supervised learning. Sokolovska, Cappé, and Yvon (2008) proposed a semi-supervised method based on a weighted likelihood approach. They proved that this method asymptotically never performs worse than supervised learning (i.e., it is safe) without any assumption. Their method is attractive because it is easy to implement and is potentially general. Moreover, it is deeply related to a certain statistical paradox. However, the method of Sokolovska et al. (2008) assumes a very limited situation, i.e., classification, discrete covariates, n(')→∞ and a maximum likelihood estimator. In this paper, we extend their method by modifying the weight. We prove that our proposal is safe in a significantly wide range of situations as long as n≤n('). Further, we give a geometrical interpretation of the proof of safety through the relationship with the above-mentioned statistical paradox. Finally, we show that the above proposal is asymptotically safe even when n(')
Wohl, Michael J A; Thompson, Andrea
2011-06-01
Contrary to conventional wisdom and extant empirical work, forgiving the self may have deleterious consequences, especially if self-forgiveness is granted for chronic unhealthy behaviours such as smoking. Among 181 smokers, it was predicted and found that increased self-forgiveness for smoking was associated with a decreased likelihood of advancing through the stages of behavioural change towards action. Moreover, forgiving the self, mediated the relationship between movements from the pre-contemplation to contemplation stage of change and perceived smoking cons as well as experiential processes. An expanded understanding of the benefits and costs of self-forgiveness is discussed. ©2011 The British Psychological Society.
Peterson, Matthew R; Young, Richard R; Gordon, Gary A
2016-01-01
Key elements of supply chain theory remain relevant to emergency management (EM) logistics activities. The Supply Chain Operations Reference model can also serve as a useful template for the planning, organizing, and execution of EM logistics. Through a series of case studies (developed through intensive survey of organizations and individuals responsible for EM), the authors identified the extent supply chain theory is being adopted and whether the theory was useful for emergency logistics managers. The authors found several drivers that influence the likelihood of an organization to implement elements of supply chain management: the frequency of events, organizational resources, population density, range of events, and severity of the disaster or emergency.
The limits to cumulative causation: international migration from Mexican urban areas.
Fussell, Elizabeth; Massey, Douglas S
2004-02-01
We present theoretical arguments and empirical research to suggest that the principal mechanisms of cumulative causation do not function in large urban settings. Using data from the Mexican Migration Project, we found evidence of cumulative causation in small cities, rural towns and villages, but not in large urban areas. With event-history models, we found little positive effect of community-level social capital and a strong deterrent effect of urban labor markets on the likelihood of first and later U.S. trips for residents of urban areas in Mexico, suggesting that the social process of migration from urban areas is distinct from that in the more widely studied rural migrant-sending communities of Mexico.
Mubayi, Anuj; Castillo-Chavez, Carlos
2018-01-01
Background When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. Methods In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. Conclusions When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle. PMID:29742115
A long-term earthquake rate model for the central and eastern United States from smoothed seismicity
Moschetti, Morgan P.
2015-01-01
I present a long-term earthquake rate model for the central and eastern United States from adaptive smoothed seismicity. By employing pseudoprospective likelihood testing (L-test), I examined the effects of fixed and adaptive smoothing methods and the effects of catalog duration and composition on the ability of the models to forecast the spatial distribution of recent earthquakes. To stabilize the adaptive smoothing method for regions of low seismicity, I introduced minor modifications to the way that the adaptive smoothing distances are calculated. Across all smoothed seismicity models, the use of adaptive smoothing and the use of earthquakes from the recent part of the catalog optimizes the likelihood for tests with M≥2.7 and M≥4.0 earthquake catalogs. The smoothed seismicity models optimized by likelihood testing with M≥2.7 catalogs also produce the highest likelihood values for M≥4.0 likelihood testing, thus substantiating the hypothesis that the locations of moderate-size earthquakes can be forecast by the locations of smaller earthquakes. The likelihood test does not, however, maximize the fraction of earthquakes that are better forecast than a seismicity rate model with uniform rates in all cells. In this regard, fixed smoothing models perform better than adaptive smoothing models. The preferred model of this study is the adaptive smoothed seismicity model, based on its ability to maximize the joint likelihood of predicting the locations of recent small-to-moderate-size earthquakes across eastern North America. The preferred rate model delineates 12 regions where the annual rate of M≥5 earthquakes exceeds 2×10−3. Although these seismic regions have been previously recognized, the preferred forecasts are more spatially concentrated than the rates from fixed smoothed seismicity models, with rate increases of up to a factor of 10 near clusters of high seismic activity.
Johnson, Mallory O; Catz, Sheryl L; Remien, Robert H; Rotheram-Borus, Mary Jane; Morin, Stephen F; Charlebois, Edwin; Gore-Felton, Cheryl; Goldsten, Rise B; Wolfe, Hannah; Lightfoot, Marguerita; Chesney, Margaret A
2003-12-01
Adherence to antiretroviral therapy (ART) remains a challenge in efforts to maximize HIV treatment benefits. Previous studies of antiretroviral adherence are limited by low statistical power, homogeneous samples, and biased assessment methods. Based on Social Action Theory and using a large, diverse sample of men and women living with HIV, the objectives of the current study are to clarify correlates of nonadherence to ART and to provide theory-guided, empirically supported direction for intervening on ART nonadherence. Cross-sectional interview study utilizing a computerized interview. Recruited from clinics, agencies, and via media ads in four U.S. cities from June 2000 to January 2002. Two thousand seven hundred and sixty-five HIV-positive adults taking ART. Computer-assessed self-reported antiretroviral adherence. Thirty-two percent reported less than 90% adherence to ART in the prior 3 days. A number of factors were related to nonadherence in univariate analysis. Multivariate analyses identified that being African American, being in a primary relationship, and a history of injection drug use or homelessness in the past year were associated with greater likelihood of nonadherence. Furthermore, adherence self-efficacy, and being able to manage side effects and fit medications into daily routines were protective against nonadherence. Being tired of taking medications was associated with poorer adherence whereas a belief that nonadherence can make the virus stronger was associated with better adherence. Results support the need for multifocused interventions to improve medication adherence that address logistical barriers, substance use, attitudes and expectancies, as well as skills building and self-efficacy enhancement. Further exploration of issues related to adherence for African Americans and men in primary relationships is warranted.
Reverse Transcription Errors and RNA-DNA Differences at Short Tandem Repeats.
Fungtammasan, Arkarachai; Tomaszkiewicz, Marta; Campos-Sánchez, Rebeca; Eckert, Kristin A; DeGiorgio, Michael; Makova, Kateryna D
2016-10-01
Transcript variation has important implications for organismal function in health and disease. Most transcriptome studies focus on assessing variation in gene expression levels and isoform representation. Variation at the level of transcript sequence is caused by RNA editing and transcription errors, and leads to nongenetically encoded transcript variants, or RNA-DNA differences (RDDs). Such variation has been understudied, in part because its detection is obscured by reverse transcription (RT) and sequencing errors. It has only been evaluated for intertranscript base substitution differences. Here, we investigated transcript sequence variation for short tandem repeats (STRs). We developed the first maximum-likelihood estimator (MLE) to infer RT error and RDD rates, taking next generation sequencing error rates into account. Using the MLE, we empirically evaluated RT error and RDD rates for STRs in a large-scale DNA and RNA replicated sequencing experiment conducted in a primate species. The RT error rates increased exponentially with STR length and were biased toward expansions. The RDD rates were approximately 1 order of magnitude lower than the RT error rates. The RT error rates estimated with the MLE from a primate data set were concordant with those estimated with an independent method, barcoded RNA sequencing, from a Caenorhabditis elegans data set. Our results have important implications for medical genomics, as STR allelic variation is associated with >40 diseases. STR nonallelic transcript variation can also contribute to disease phenotype. The MLE and empirical rates presented here can be used to evaluate the probability of disease-associated transcripts arising due to RDD. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Whiplash and the compensation hypothesis.
Spearing, Natalie M; Connelly, Luke B
2011-12-01
Review article. To explain why the evidence that compensation-related factors lead to worse health outcomes is not compelling, either in general, or in the specific case of whiplash. There is a common view that compensation-related factors lead to worse health outcomes ("the compensation hypothesis"), despite the presence of important, and unresolved sources of bias. The empirical evidence on this question has ramifications for the design of compensation schemes. Using studies on whiplash, this article outlines the methodological problems that impede attempts to confirm or refute the compensation hypothesis. Compensation studies are prone to measurement bias, reverse causation bias, and selection bias. Errors in measurement are largely due to the latent nature of whiplash injuries and health itself, a lack of clarity over the unit of measurement (specific factors, or "compensation"), and a lack of appreciation for the heterogeneous qualities of compensation-related factors and schemes. There has been a failure to acknowledge and empirically address reverse causation bias, or the likelihood that poor health influences the decision to pursue compensation: it is unclear if compensation is a cause or a consequence of poor health, or both. Finally, unresolved selection bias (and hence, confounding) is evident in longitudinal studies and natural experiments. In both cases, between-group differences have not been addressed convincingly. The nature of the relationship between compensation-related factors and health is unclear. Current approaches to testing the compensation hypothesis are prone to several important sources of bias, which compromise the validity of their results. Methods that explicitly test the hypothesis and establish whether or not a causal relationship exists between compensation factors and prolonged whiplash symptoms are needed in future studies.
Salje, Ekhard K H; Planes, Antoni; Vives, Eduard
2017-10-01
Crackling noise can be initiated by competing or coexisting mechanisms. These mechanisms can combine to generate an approximate scale invariant distribution that contains two or more contributions. The overall distribution function can be analyzed, to a good approximation, using maximum-likelihood methods and assuming that it follows a power law although with nonuniversal exponents depending on a varying lower cutoff. We propose that such distributions are rather common and originate from a simple superposition of crackling noise distributions or exponential damping.
A survey of kernel-type estimators for copula and their applications
NASA Astrophysics Data System (ADS)
Sumarjaya, I. W.
2017-10-01
Copulas have been widely used to model nonlinear dependence structure. Main applications of copulas include areas such as finance, insurance, hydrology, rainfall to name but a few. The flexibility of copula allows researchers to model dependence structure beyond Gaussian distribution. Basically, a copula is a function that couples multivariate distribution functions to their one-dimensional marginal distribution functions. In general, there are three methods to estimate copula. These are parametric, nonparametric, and semiparametric method. In this article we survey kernel-type estimators for copula such as mirror reflection kernel, beta kernel, transformation method and local likelihood transformation method. Then, we apply these kernel methods to three stock indexes in Asia. The results of our analysis suggest that, albeit variation in information criterion values, the local likelihood transformation method performs better than the other kernel methods.
Towers, Sherry; Mubayi, Anuj; Castillo-Chavez, Carlos
2018-01-01
When attempting to statistically distinguish between a null and an alternative hypothesis, many researchers in the life and social sciences turn to binned statistical analysis methods, or methods that are simply based on the moments of a distribution (such as the mean, and variance). These methods have the advantage of simplicity of implementation, and simplicity of explanation. However, when null and alternative hypotheses manifest themselves in subtle differences in patterns in the data, binned analysis methods may be insensitive to these differences, and researchers may erroneously fail to reject the null hypothesis when in fact more sensitive statistical analysis methods might produce a different result when the null hypothesis is actually false. Here, with a focus on two recent conflicting studies of contagion in mass killings as instructive examples, we discuss how the use of unbinned likelihood methods makes optimal use of the information in the data; a fact that has been long known in statistical theory, but perhaps is not as widely appreciated amongst general researchers in the life and social sciences. In 2015, Towers et al published a paper that quantified the long-suspected contagion effect in mass killings. However, in 2017, Lankford & Tomek subsequently published a paper, based upon the same data, that claimed to contradict the results of the earlier study. The former used unbinned likelihood methods, and the latter used binned methods, and comparison of distribution moments. Using these analyses, we also discuss how visualization of the data can aid in determination of the most appropriate statistical analysis methods to distinguish between a null and alternate hypothesis. We also discuss the importance of assessment of the robustness of analysis results to methodological assumptions made (for example, arbitrary choices of number of bins and bin widths when using binned methods); an issue that is widely overlooked in the literature, but is critical to analysis reproducibility and robustness. When an analysis cannot distinguish between a null and alternate hypothesis, care must be taken to ensure that the analysis methodology itself maximizes the use of information in the data that can distinguish between the two hypotheses. The use of binned methods by Lankford & Tomek (2017), that examined how many mass killings fell within a 14 day window from a previous mass killing, substantially reduced the sensitivity of their analysis to contagion effects. The unbinned likelihood methods used by Towers et al (2015) did not suffer from this problem. While a binned analysis might be favorable for simplicity and clarity of presentation, unbinned likelihood methods are preferable when effects might be somewhat subtle.
Khan, Mohammad Niyaz; Yusof, Nor Saadah Mohd; Razak, Norazizah Abdul
2013-01-01
The semi-empirical spectrophotometric (SESp) method, for the indirect determination of ion exchange constants (K(X)(Br)) of ion exchange processes occurring between counterions (X⁻ and Br⁻) at the cationic micellar surface, is described in this article. The method uses an anionic spectrophotometric probe molecule, N-(2-methoxyphenyl)phthalamate ion (1⁻), which measures the effects of varying concentrations of inert inorganic or organic salt (Na(v)X, v = 1, 2) on absorbance, (A(ob)) at 310 nm, of samples containing constant concentrations of 1⁻, NaOH and cationic micelles. The observed data fit satisfactorily to an empirical equation which gives the values of two empirical constants. These empirical constants lead to the determination of K(X)(Br) (= K(X)/K(Br) with K(X) and K(Br) representing cationic micellar binding constants of counterions X and Br⁻). This method gives values of K(X)(Br) for both moderately hydrophobic and hydrophilic X⁻. The values of K(X)(Br), obtained by using this method, are comparable with the corresponding values of K(X)(Br), obtained by the use of semi-empirical kinetic (SEK) method, for different moderately hydrophobic X. The values of K(X)(Br) for X = Cl⁻ and 2,6-Cl₂C6H₃CO₂⁻, obtained by the use of SESp and SEK methods, are similar to those obtained by the use of other different conventional methods.
Su, Jingjun; Du, Xinzhong; Li, Xuyong
2018-05-16
Uncertainty analysis is an important prerequisite for model application. However, the existing phosphorus (P) loss indexes or indicators were rarely evaluated. This study applied generalized likelihood uncertainty estimation (GLUE) method to assess the uncertainty of parameters and modeling outputs of a non-point source (NPS) P indicator constructed in R language. And the influences of subjective choices of likelihood formulation and acceptability threshold of GLUE on model outputs were also detected. The results indicated the following. (1) Parameters RegR 2 , RegSDR 2 , PlossDP fer , PlossDP man , DPDR, and DPR were highly sensitive to overall TP simulation and their value ranges could be reduced by GLUE. (2) Nash efficiency likelihood (L 1 ) seemed to present better ability in accentuating high likelihood value simulations than the exponential function (L 2 ) did. (3) The combined likelihood integrating the criteria of multiple outputs acted better than single likelihood in model uncertainty assessment in terms of reducing the uncertainty band widths and assuring the fitting goodness of whole model outputs. (4) A value of 0.55 appeared to be a modest choice of threshold value to balance the interests between high modeling efficiency and high bracketing efficiency. Results of this study could provide (1) an option to conduct NPS modeling under one single computer platform, (2) important references to the parameter setting for NPS model development in similar regions, (3) useful suggestions for the application of GLUE method in studies with different emphases according to research interests, and (4) important insights into the watershed P management in similar regions.
Regression estimators for generic health-related quality of life and quality-adjusted life years.
Basu, Anirban; Manca, Andrea
2012-01-01
To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and account for features typical of such data such as a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution. First, both a single equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One and 2-part Beta regression models provide flexible approaches to regress the outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcomes distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beer, M.
1980-12-01
The maximum likelihood method for the multivariate normal distribution is applied to the case of several individual eigenvalues. Correlated Monte Carlo estimates of the eigenvalue are assumed to follow this prescription and aspects of the assumption are examined. Monte Carlo cell calculations using the SAM-CE and VIM codes for the TRX-1 and TRX-2 benchmark reactors, and SAM-CE full core results are analyzed with this method. Variance reductions of a few percent to a factor of 2 are obtained from maximum likelihood estimation as compared with the simple average and the minimum variance individual eigenvalue. The numerical results verify that themore » use of sample variances and correlation coefficients in place of the corresponding population statistics still leads to nearly minimum variance estimation for a sufficient number of histories and aggregates.« less
Robbins, L G
2000-01-01
Graduate school programs in genetics have become so full that courses in statistics have often been eliminated. In addition, typical introductory statistics courses for the "statistics user" rather than the nascent statistician are laden with methods for analysis of measured variables while genetic data are most often discrete numbers. These courses are often seen by students and genetics professors alike as largely irrelevant cookbook courses. The powerful methods of likelihood analysis, although commonly employed in human genetics, are much less often used in other areas of genetics, even though current computational tools make this approach readily accessible. This article introduces the MLIKELY.PAS computer program and the logic of do-it-yourself maximum-likelihood statistics. The program itself, course materials, and expanded discussions of some examples that are only summarized here are available at http://www.unisi. it/ricerca/dip/bio_evol/sitomlikely/mlikely.h tml. PMID:10628965
Krishnamoorthy, K; Oral, Evrim
2017-12-01
Standardized likelihood ratio test (SLRT) for testing the equality of means of several log-normal distributions is proposed. The properties of the SLRT and an available modified likelihood ratio test (MLRT) and a generalized variable (GV) test are evaluated by Monte Carlo simulation and compared. Evaluation studies indicate that the SLRT is accurate even for small samples, whereas the MLRT could be quite liberal for some parameter values, and the GV test is in general conservative and less powerful than the SLRT. Furthermore, a closed-form approximate confidence interval for the common mean of several log-normal distributions is developed using the method of variance estimate recovery, and compared with the generalized confidence interval with respect to coverage probabilities and precision. Simulation studies indicate that the proposed confidence interval is accurate and better than the generalized confidence interval in terms of coverage probabilities. The methods are illustrated using two examples.
NASA Astrophysics Data System (ADS)
Jaber, Abobaker M.
2014-12-01
Two nonparametric methods for prediction and modeling of financial time series signals are proposed. The proposed techniques are designed to handle non-stationary and non-linearity behave and to extract meaningful signals for reliable prediction. Due to Fourier Transform (FT), the methods select significant decomposed signals that will be employed for signal prediction. The proposed techniques developed by coupling Holt-winter method with Empirical Mode Decomposition (EMD) and it is Extending the scope of empirical mode decomposition by smoothing (SEMD). To show performance of proposed techniques, we analyze daily closed price of Kuala Lumpur stock market index.
Palamara, Gian Marco; Childs, Dylan Z; Clements, Christopher F; Petchey, Owen L; Plebani, Marco; Smith, Matthew J
2014-01-01
Understanding and quantifying the temperature dependence of population parameters, such as intrinsic growth rate and carrying capacity, is critical for predicting the ecological responses to environmental change. Many studies provide empirical estimates of such temperature dependencies, but a thorough investigation of the methods used to infer them has not been performed yet. We created artificial population time series using a stochastic logistic model parameterized with the Arrhenius equation, so that activation energy drives the temperature dependence of population parameters. We simulated different experimental designs and used different inference methods, varying the likelihood functions and other aspects of the parameter estimation methods. Finally, we applied the best performing inference methods to real data for the species Paramecium caudatum. The relative error of the estimates of activation energy varied between 5% and 30%. The fraction of habitat sampled played the most important role in determining the relative error; sampling at least 1% of the habitat kept it below 50%. We found that methods that simultaneously use all time series data (direct methods) and methods that estimate population parameters separately for each temperature (indirect methods) are complementary. Indirect methods provide a clearer insight into the shape of the functional form describing the temperature dependence of population parameters; direct methods enable a more accurate estimation of the parameters of such functional forms. Using both methods, we found that growth rate and carrying capacity of Paramecium caudatum scale with temperature according to different activation energies. Our study shows how careful choice of experimental design and inference methods can increase the accuracy of the inferred relationships between temperature and population parameters. The comparison of estimation methods provided here can increase the accuracy of model predictions, with important implications in understanding and predicting the effects of temperature on the dynamics of populations. PMID:25558365
A statistically robust EEG re-referencing procedure to mitigate reference effect
Lepage, Kyle Q.; Kramer, Mark A.; Chu, Catherine J.
2014-01-01
Background The electroencephalogram (EEG) remains the primary tool for diagnosis of abnormal brain activity in clinical neurology and for in vivo recordings of human neurophysiology in neuroscience research. In EEG data acquisition, voltage is measured at positions on the scalp with respect to a reference electrode. When this reference electrode responds to electrical activity or artifact all electrodes are affected. Successful analysis of EEG data often involves re-referencing procedures that modify the recorded traces and seek to minimize the impact of reference electrode activity upon functions of the original EEG recordings. New method We provide a novel, statistically robust procedure that adapts a robust maximum-likelihood type estimator to the problem of reference estimation, reduces the influence of neural activity from the re-referencing operation, and maintains good performance in a wide variety of empirical scenarios. Results The performance of the proposed and existing re-referencing procedures are validated in simulation and with examples of EEG recordings. To facilitate this comparison, channel-to-channel correlations are investigated theoretically and in simulation. Comparison with existing methods The proposed procedure avoids using data contaminated by neural signal and remains unbiased in recording scenarios where physical references, the common average reference (CAR) and the reference estimation standardization technique (REST) are not optimal. Conclusion The proposed procedure is simple, fast, and avoids the potential for substantial bias when analyzing low-density EEG data. PMID:24975291
Conception intervals and the substitution of fertility over time.
Olsen, R J; Farkas, G
1985-04-01
This paper applies the waiting-time regression methods of Olsen and Wolpin (1983) to an analysis of fertility. A utility maximizing model is set up and used to provide some guidance for an empirical analysis. The data are from an experimental guaranteed job program, the Youth Incentive Entitlement Pilot Project, aimed at young women 16 to 20 years old, from poverty-level families, and not yet high school graduates. The waiting-time regression method of estimation permits the youth in question to be used as her own control revealing how eligibility for the jobs program changes the durations of periods between live-birth conceptions. 3890 women surveyed had 1 birth, 429 had 2, 112 had 3, 26 had 4, and 7 had 5. Without this person specific control described here, the most important factors affecting fertility are number of siblings (negative effect), labor market attachment by parents, especially the father, and the presence of the natural father. With the person specific control, the results predicted from economic theory do emerge: even adolescent and young women consider the economic consequences of fertility reflected in effects of fertility when wages are high in favor of fertility with lower wages. Post program effects (taking place after youths lose eligibility for the program) are a rather rapid making up for foregone fertility, reducing likelihood of net reductions of total fertility.