Sample records for nonparametric function estimation

  1. Nonparametric Transfer Function Models

    PubMed Central

    Liu, Jun M.; Chen, Rong; Yao, Qiwei

    2009-01-01

    In this paper a class of nonparametric transfer function models is proposed to model nonlinear relationships between ‘input’ and ‘output’ time series. The transfer function is smooth with unknown functional forms, and the noise is assumed to be a stationary autoregressive-moving average (ARMA) process. The nonparametric transfer function is estimated jointly with the ARMA parameters. By modeling the correlation in the noise, the transfer function can be estimated more efficiently. The parsimonious ARMA structure improves the estimation efficiency in finite samples. The asymptotic properties of the estimators are investigated. The finite-sample properties are illustrated through simulations and one empirical example. PMID:20628584

  2. Nonparametric estimation of the multivariate survivor function: the multivariate Kaplan-Meier estimator.

    PubMed

    Prentice, Ross L; Zhao, Shanshan

    2018-01-01

    The Dabrowska (Ann Stat 16:1475-1489, 1988) product integral representation of the multivariate survivor function is extended, leading to a nonparametric survivor function estimator for an arbitrary number of failure time variates that has a simple recursive formula for its calculation. Empirical process methods are used to sketch proofs for this estimator's strong consistency and weak convergence properties. Summary measures of pairwise and higher-order dependencies are also defined and nonparametrically estimated. Simulation evaluation is given for the special case of three failure time variates.

  3. Robust estimation for ordinary differential equation models.

    PubMed

    Cao, J; Wang, L; Xu, J

    2011-12-01

    Applied scientists often like to use ordinary differential equations (ODEs) to model complex dynamic processes that arise in biology, engineering, medicine, and many other areas. It is interesting but challenging to estimate ODE parameters from noisy data, especially when the data have some outliers. We propose a robust method to address this problem. The dynamic process is represented with a nonparametric function, which is a linear combination of basis functions. The nonparametric function is estimated by a robust penalized smoothing method. The penalty term is defined with the parametric ODE model, which controls the roughness of the nonparametric function and maintains the fidelity of the nonparametric function to the ODE model. The basis coefficients and ODE parameters are estimated in two nested levels of optimization. The coefficient estimates are treated as an implicit function of ODE parameters, which enables one to derive the analytic gradients for optimization using the implicit function theorem. Simulation studies show that the robust method gives satisfactory estimates for the ODE parameters from noisy data with outliers. The robust method is demonstrated by estimating a predator-prey ODE model from real ecological data. © 2011, The International Biometric Society.

  4. Nonparametric functional data estimation applied to ozone data: prediction and extreme value analysis.

    PubMed

    Quintela-del-Río, Alejandro; Francisco-Fernández, Mario

    2011-02-01

    The study of extreme values and prediction of ozone data is an important topic of research when dealing with environmental problems. Classical extreme value theory is usually used in air-pollution studies. It consists in fitting a parametric generalised extreme value (GEV) distribution to a data set of extreme values, and using the estimated distribution to compute return levels and other quantities of interest. Here, we propose to estimate these values using nonparametric functional data methods. Functional data analysis is a relatively new statistical methodology that generally deals with data consisting of curves or multi-dimensional variables. In this paper, we use this technique, jointly with nonparametric curve estimation, to provide alternatives to the usual parametric statistical tools. The nonparametric estimators are applied to real samples of maximum ozone values obtained from several monitoring stations belonging to the Automatic Urban and Rural Network (AURN) in the UK. The results show that nonparametric estimators work satisfactorily, outperforming the behaviour of classical parametric estimators. Functional data analysis is also used to predict stratospheric ozone concentrations. We show an application, using the data set of mean monthly ozone concentrations in Arosa, Switzerland, and the results are compared with those obtained by classical time series (ARIMA) analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.

  5. On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices

    PubMed Central

    Ye, Xin; Pendyala, Ram M.; Zou, Yajie

    2017-01-01

    A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Argau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences. PMID:29073152

  6. On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices.

    PubMed

    Wang, Ke; Ye, Xin; Pendyala, Ram M; Zou, Yajie

    2017-01-01

    A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Argau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences.

  7. Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.

    PubMed

    Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen

    2011-04-01

    Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.

  8. Surface Estimation, Variable Selection, and the Nonparametric Oracle Property

    PubMed Central

    Storlie, Curtis B.; Bondell, Howard D.; Reich, Brian J.; Zhang, Hao Helen

    2010-01-01

    Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting. PMID:21603586

  9. On non-parametric maximum likelihood estimation of the bivariate survivor function.

    PubMed

    Prentice, R L

    The likelihood function for the bivariate survivor function F, under independent censorship, is maximized to obtain a non-parametric maximum likelihood estimator &Fcirc;. &Fcirc; may or may not be unique depending on the configuration of singly- and doubly-censored pairs. The likelihood function can be maximized by placing all mass on the grid formed by the uncensored failure times, or half lines beyond the failure time grid, or in the upper right quadrant beyond the grid. By accumulating the mass along lines (or regions) where the likelihood is flat, one obtains a partially maximized likelihood as a function of parameters that can be uniquely estimated. The score equations corresponding to these point mass parameters are derived, using a Lagrange multiplier technique to ensure unit total mass, and a modified Newton procedure is used to calculate the parameter estimates in some limited simulation studies. Some considerations for the further development of non-parametric bivariate survivor function estimators are briefly described.

  10. Robust location and spread measures for nonparametric probability density function estimation.

    PubMed

    López-Rubio, Ezequiel

    2009-10-01

    Robustness against outliers is a desirable property of any unsupervised learning scheme. In particular, probability density estimators benefit from incorporating this feature. A possible strategy to achieve this goal is to substitute the sample mean and the sample covariance matrix by more robust location and spread estimators. Here we use the L1-median to develop a nonparametric probability density function (PDF) estimator. We prove its most relevant properties, and we show its performance in density estimation and classification applications.

  11. Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data

    PubMed Central

    Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

    2012-01-01

    Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions. PMID:23645976

  12. Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data.

    PubMed

    Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

    2013-01-01

    Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions.

  13. A Comparison of Methods for Nonparametric Estimation of Item Characteristic Curves for Binary Items

    ERIC Educational Resources Information Center

    Lee, Young-Sun

    2007-01-01

    This study compares the performance of three nonparametric item characteristic curve (ICC) estimation procedures: isotonic regression, smoothed isotonic regression, and kernel smoothing. Smoothed isotonic regression, employed along with an appropriate kernel function, provides better estimates and also satisfies the assumption of strict…

  14. Marginally specified priors for non-parametric Bayesian estimation

    PubMed Central

    Kessler, David C.; Hoff, Peter D.; Dunson, David B.

    2014-01-01

    Summary Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables. PMID:25663813

  15. Functional mapping of reaction norms to multiple environmental signals through nonparametric covariance estimation

    PubMed Central

    2011-01-01

    Background The identification of genes or quantitative trait loci that are expressed in response to different environmental factors such as temperature and light, through functional mapping, critically relies on precise modeling of the covariance structure. Previous work used separable parametric covariance structures, such as a Kronecker product of autoregressive one [AR(1)] matrices, that do not account for interaction effects of different environmental factors. Results We implement a more robust nonparametric covariance estimator to model these interactions within the framework of functional mapping of reaction norms to two signals. Our results from Monte Carlo simulations show that this estimator can be useful in modeling interactions that exist between two environmental signals. The interactions are simulated using nonseparable covariance models with spatio-temporal structural forms that mimic interaction effects. Conclusions The nonparametric covariance estimator has an advantage over separable parametric covariance estimators in the detection of QTL location, thus extending the breadth of use of functional mapping in practical settings. PMID:21269481

  16. The Breslow estimator of the nonparametric baseline survivor function in Cox's regression model: some heuristics.

    PubMed

    Hanley, James A

    2008-01-01

    Most survival analysis textbooks explain how the hazard ratio parameters in Cox's life table regression model are estimated. Fewer explain how the components of the nonparametric baseline survivor function are derived. Those that do often relegate the explanation to an "advanced" section and merely present the components as algebraic or iterative solutions to estimating equations. None comment on the structure of these estimators. This note brings out a heuristic representation that may help to de-mystify the structure.

  17. Nonparametric probability density estimation by optimization theoretic techniques

    NASA Technical Reports Server (NTRS)

    Scott, D. W.

    1976-01-01

    Two nonparametric probability density estimators are considered. The first is the kernel estimator. The problem of choosing the kernel scaling factor based solely on a random sample is addressed. An interactive mode is discussed and an algorithm proposed to choose the scaling factor automatically. The second nonparametric probability estimate uses penalty function techniques with the maximum likelihood criterion. A discrete maximum penalized likelihood estimator is proposed and is shown to be consistent in the mean square error. A numerical implementation technique for the discrete solution is discussed and examples displayed. An extensive simulation study compares the integrated mean square error of the discrete and kernel estimators. The robustness of the discrete estimator is demonstrated graphically.

  18. Mathematical models for nonparametric inferences from line transect data

    USGS Publications Warehouse

    Burnham, K.P.; Anderson, D.R.

    1976-01-01

    A general mathematical theory of line transects is develoepd which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(O) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y/r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires we know the transformation of r given by f(O/r).

  19. Estimation and model selection of semiparametric multivariate survival functions under general censorship.

    PubMed

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2010-07-01

    We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root- n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.

  20. Estimation and model selection of semiparametric multivariate survival functions under general censorship

    PubMed Central

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2013-01-01

    We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286

  1. Pointwise nonparametric maximum likelihood estimator of stochastically ordered survivor functions

    PubMed Central

    Park, Yongseok; Taylor, Jeremy M. G.; Kalbfleisch, John D.

    2012-01-01

    In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests better small and large sample properties than for alternative estimators. An example using prostate cancer data illustrates the method. PMID:23843661

  2. Maximum Likelihood Estimations and EM Algorithms with Length-biased Data

    PubMed Central

    Qin, Jing; Ning, Jing; Liu, Hao; Shen, Yu

    2012-01-01

    SUMMARY Length-biased sampling has been well recognized in economics, industrial reliability, etiology applications, epidemiological, genetic and cancer screening studies. Length-biased right-censored data have a unique data structure different from traditional survival data. The nonparametric and semiparametric estimations and inference methods for traditional survival data are not directly applicable for length-biased right-censored data. We propose new expectation-maximization algorithms for estimations based on full likelihoods involving infinite dimensional parameters under three settings for length-biased data: estimating nonparametric distribution function, estimating nonparametric hazard function under an increasing failure rate constraint, and jointly estimating baseline hazards function and the covariate coefficients under the Cox proportional hazards model. Extensive empirical simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and lead to more efficient estimators compared to the estimating equation approaches. The proposed estimates are also more robust to various right-censoring mechanisms. We prove the strong consistency properties of the estimators, and establish the asymptotic normality of the semi-parametric maximum likelihood estimators under the Cox model using modern empirical processes theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online. PMID:22323840

  3. Non-parametric directionality analysis - Extension for removal of a single common predictor and application to time series.

    PubMed

    Halliday, David M; Senik, Mohd Harizal; Stevenson, Carl W; Mason, Rob

    2016-08-01

    The ability to infer network structure from multivariate neuronal signals is central to computational neuroscience. Directed network analyses typically use parametric approaches based on auto-regressive (AR) models, where networks are constructed from estimates of AR model parameters. However, the validity of using low order AR models for neurophysiological signals has been questioned. A recent article introduced a non-parametric approach to estimate directionality in bivariate data, non-parametric approaches are free from concerns over model validity. We extend the non-parametric framework to include measures of directed conditional independence, using scalar measures that decompose the overall partial correlation coefficient summatively by direction, and a set of functions that decompose the partial coherence summatively by direction. A time domain partial correlation function allows both time and frequency views of the data to be constructed. The conditional independence estimates are conditioned on a single predictor. The framework is applied to simulated cortical neuron networks and mixtures of Gaussian time series data with known interactions. It is applied to experimental data consisting of local field potential recordings from bilateral hippocampus in anaesthetised rats. The framework offers a non-parametric approach to estimation of directed interactions in multivariate neuronal recordings, and increased flexibility in dealing with both spike train and time series data. The framework offers a novel alternative non-parametric approach to estimate directed interactions in multivariate neuronal recordings, and is applicable to spike train and time series data. Copyright © 2016 Elsevier B.V. All rights reserved.

  4. Model-free quantification of dynamic PET data using nonparametric deconvolution

    PubMed Central

    Zanderigo, Francesca; Parsey, Ramin V; Todd Ogden, R

    2015-01-01

    Dynamic positron emission tomography (PET) data are usually quantified using compartment models (CMs) or derived graphical approaches. Often, however, CMs either do not properly describe the tracer kinetics, or are not identifiable, leading to nonphysiologic estimates of the tracer binding. The PET data are modeled as the convolution of the metabolite-corrected input function and the tracer impulse response function (IRF) in the tissue. Using nonparametric deconvolution methods, it is possible to obtain model-free estimates of the IRF, from which functionals related to tracer volume of distribution and binding may be computed, but this approach has rarely been applied in PET. Here, we apply nonparametric deconvolution using singular value decomposition to simulated and test–retest clinical PET data with four reversible tracers well characterized by CMs ([11C]CUMI-101, [11C]DASB, [11C]PE2I, and [11C]WAY-100635), and systematically compare reproducibility, reliability, and identifiability of various IRF-derived functionals with that of traditional CMs outcomes. Results show that nonparametric deconvolution, completely free of any model assumptions, allows for estimates of tracer volume of distribution and binding that are very close to the estimates obtained with CMs and, in some cases, show better test–retest performance than CMs outcomes. PMID:25873427

  5. Nonparametric estimation of plant density by the distance method

    USGS Publications Warehouse

    Patil, S.A.; Burnham, K.P.; Kovner, J.L.

    1979-01-01

    A relation between the plant density and the probability density function of the nearest neighbor distance (squared) from a random point is established under fairly broad conditions. Based upon this relationship, a nonparametric estimator for the plant density is developed and presented in terms of order statistics. Consistency and asymptotic normality of the estimator are discussed. An interval estimator for the density is obtained. The modifications of this estimator and its variance are given when the distribution is truncated. Simulation results are presented for regular, random and aggregated populations to illustrate the nonparametric estimator and its variance. A numerical example from field data is given. Merits and deficiencies of the estimator are discussed with regard to its robustness and variance.

  6. Mathematical models for non-parametric inferences from line transect data

    USGS Publications Warehouse

    Burnham, K.P.; Anderson, D.R.

    1976-01-01

    A general mathematical theory of line transects is developed which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(0) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y I r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires we know the transformation of r given by f(0 I r).

  7. Nonparametric analysis of bivariate gap time with competing risks.

    PubMed

    Huang, Chiung-Yu; Wang, Chenguang; Wang, Mei-Cheng

    2016-09-01

    This article considers nonparametric methods for studying recurrent disease and death with competing risks. We first point out that comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events, and that comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrence disease. We then propose nonparametric estimators for the conditional cumulative incidence function as well as the conditional bivariate cumulative incidence function for the bivariate gap times, that is, the time to disease recurrence and the residual lifetime after recurrence. To quantify the association between the two gap times in the competing risks setting, a modified Kendall's tau statistic is proposed. The proposed estimators for the conditional bivariate cumulative incidence distribution and the association measure account for the induced dependent censoring for the second gap time. Uniform consistency and weak convergence of the proposed estimators are established. Hypothesis testing procedures for two-sample comparisons are discussed. Numerical simulation studies with practical sample sizes are conducted to evaluate the performance of the proposed nonparametric estimators and tests. An application to data from a pancreatic cancer study is presented to illustrate the methods developed in this article. © 2016, The International Biometric Society.

  8. Three Classes of Nonparametric Differential Step Functioning Effect Estimators

    ERIC Educational Resources Information Center

    Penfield, Randall D.

    2008-01-01

    The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…

  9. Nonparametric Discrete Survival Function Estimation with Uncertain Endpoints Using an Internal Validation Subsample

    PubMed Central

    Zee, Jarcy; Xie, Sharon X.

    2015-01-01

    Summary When a true survival endpoint cannot be assessed for some subjects, an alternative endpoint that measures the true endpoint with error may be collected, which often occurs when obtaining the true endpoint is too invasive or costly. We develop an estimated likelihood function for the situation where we have both uncertain endpoints for all participants and true endpoints for only a subset of participants. We propose a nonparametric maximum estimated likelihood estimator of the discrete survival function of time to the true endpoint. We show that the proposed estimator is consistent and asymptotically normal. We demonstrate through extensive simulations that the proposed estimator has little bias compared to the naïve Kaplan-Meier survival function estimator, which uses only uncertain endpoints, and more efficient with moderate missingness compared to the complete-case Kaplan-Meier survival function estimator, which uses only available true endpoints. Finally, we apply the proposed method to a dataset for estimating the risk of developing Alzheimer's disease from the Alzheimer's Disease Neuroimaging Initiative. PMID:25916510

  10. Short-term forecasting of meteorological time series using Nonparametric Functional Data Analysis (NPFDA)

    NASA Astrophysics Data System (ADS)

    Curceac, S.; Ternynck, C.; Ouarda, T.

    2015-12-01

    Over the past decades, a substantial amount of research has been conducted to model and forecast climatic variables. In this study, Nonparametric Functional Data Analysis (NPFDA) methods are applied to forecast air temperature and wind speed time series in Abu Dhabi, UAE. The dataset consists of hourly measurements recorded for a period of 29 years, 1982-2010. The novelty of the Functional Data Analysis approach is in expressing the data as curves. In the present work, the focus is on daily forecasting and the functional observations (curves) express the daily measurements of the above mentioned variables. We apply a non-linear regression model with a functional non-parametric kernel estimator. The computation of the estimator is performed using an asymmetrical quadratic kernel function for local weighting based on the bandwidth obtained by a cross validation procedure. The proximities between functional objects are calculated by families of semi-metrics based on derivatives and Functional Principal Component Analysis (FPCA). Additionally, functional conditional mode and functional conditional median estimators are applied and the advantages of combining their results are analysed. A different approach employs a SARIMA model selected according to the minimum Akaike (AIC) and Bayessian (BIC) Information Criteria and based on the residuals of the model. The performance of the models is assessed by calculating error indices such as the root mean square error (RMSE), relative RMSE, BIAS and relative BIAS. The results indicate that the NPFDA models provide more accurate forecasts than the SARIMA models. Key words: Nonparametric functional data analysis, SARIMA, time series forecast, air temperature, wind speed

  11. Multiple imputation methods for nonparametric inference on cumulative incidence with missing cause of failure

    PubMed Central

    Lee, Minjung; Dignam, James J.; Han, Junhee

    2014-01-01

    We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107

  12. Thirty Years of Nonparametric Item Response Theory.

    ERIC Educational Resources Information Center

    Molenaar, Ivo W.

    2001-01-01

    Discusses relationships between a mathematical measurement model and its real-world applications. Makes a distinction between large-scale data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Also evaluates nonparametric methods for estimating item response functions and…

  13. Optimum nonparametric estimation of population density based on ordered distances

    USGS Publications Warehouse

    Patil, S.A.; Kovner, J.L.; Burnham, Kenneth P.

    1982-01-01

    The asymptotic mean and error mean square are determined for the nonparametric estimator of plant density by distance sampling proposed by Patil, Burnham and Kovner (1979, Biometrics 35, 597-604. On the basis of these formulae, a bias-reduced version of this estimator is given, and its specific form is determined which gives minimum mean square error under varying assumptions about the true probability density function of the sampled data. Extension is given to line-transect sampling.

  14. A fully nonparametric estimator of the marginal survival function based on case–control clustered age-at-onset data

    PubMed Central

    Gorfine, Malka; Bordo, Nadia; Hsu, Li

    2017-01-01

    Summary Consider a popular case–control family study where individuals with a disease under study (case probands) and individuals who do not have the disease (control probands) are randomly sampled from a well-defined population. Possibly right-censored age at onset and disease status are observed for both probands and their relatives. For example, case probands are men diagnosed with prostate cancer, control probands are men free of prostate cancer, and the prostate cancer history of the fathers of the probands is also collected. Inherited genetic susceptibility, shared environment, and common behavior lead to correlation among the outcomes within a family. In this article, a novel nonparametric estimator of the marginal survival function is provided. The estimator is defined in the presence of intra-cluster dependence, and is based on consistent smoothed kernel estimators of conditional survival functions. By simulation, it is shown that the proposed estimator performs very well in terms of bias. The utility of the estimator is illustrated by the analysis of case–control family data of early onset prostate cancer. To our knowledge, this is the first article that provides a fully nonparametric marginal survival estimator based on case–control clustered age-at-onset data. PMID:27436674

  15. Nonparametric model validations for hidden Markov models with applications in financial econometrics.

    PubMed

    Zhao, Zhibiao

    2011-06-01

    We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise.

  16. Computation of nonparametric convex hazard estimators via profile methods.

    PubMed

    Jankowski, Hanna K; Wellner, Jon A

    2009-05-01

    This paper proposes a profile likelihood algorithm to compute the nonparametric maximum likelihood estimator of a convex hazard function. The maximisation is performed in two steps: First the support reduction algorithm is used to maximise the likelihood over all hazard functions with a given point of minimum (or antimode). Then it is shown that the profile (or partially maximised) likelihood is quasi-concave as a function of the antimode, so that a bisection algorithm can be applied to find the maximum of the profile likelihood, and hence also the global maximum. The new algorithm is illustrated using both artificial and real data, including lifetime data for Canadian males and females.

  17. Variable Selection for Nonparametric Quantile Regression via Smoothing Spline AN OVA

    PubMed Central

    Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui

    2014-01-01

    Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792

  18. Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting

    PubMed Central

    Chan, Kwun Chuen Gary; Yam, Sheung Chi Phillip; Zhang, Zheng

    2015-01-01

    Summary The estimation of average treatment effects based on observational data is extremely important in practice and has been studied by generations of statisticians under different frameworks. Existing globally efficient estimators require non-parametric estimation of a propensity score function, an outcome regression function or both, but their performance can be poor in practical sample sizes. Without explicitly estimating either functions, we consider a wide class calibration weights constructed to attain an exact three-way balance of the moments of observed covariates among the treated, the control, and the combined group. The wide class includes exponential tilting, empirical likelihood and generalized regression as important special cases, and extends survey calibration estimators to different statistical problems and with important distinctions. Global semiparametric efficiency for the estimation of average treatment effects is established for this general class of calibration estimators. The results show that efficiency can be achieved by solely balancing the covariate distributions without resorting to direct estimation of propensity score or outcome regression function. We also propose a consistent estimator for the efficient asymptotic variance, which does not involve additional functional estimation of either the propensity score or the outcome regression functions. The proposed variance estimator outperforms existing estimators that require a direct approximation of the efficient influence function. PMID:27346982

  19. Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting.

    PubMed

    Chan, Kwun Chuen Gary; Yam, Sheung Chi Phillip; Zhang, Zheng

    2016-06-01

    The estimation of average treatment effects based on observational data is extremely important in practice and has been studied by generations of statisticians under different frameworks. Existing globally efficient estimators require non-parametric estimation of a propensity score function, an outcome regression function or both, but their performance can be poor in practical sample sizes. Without explicitly estimating either functions, we consider a wide class calibration weights constructed to attain an exact three-way balance of the moments of observed covariates among the treated, the control, and the combined group. The wide class includes exponential tilting, empirical likelihood and generalized regression as important special cases, and extends survey calibration estimators to different statistical problems and with important distinctions. Global semiparametric efficiency for the estimation of average treatment effects is established for this general class of calibration estimators. The results show that efficiency can be achieved by solely balancing the covariate distributions without resorting to direct estimation of propensity score or outcome regression function. We also propose a consistent estimator for the efficient asymptotic variance, which does not involve additional functional estimation of either the propensity score or the outcome regression functions. The proposed variance estimator outperforms existing estimators that require a direct approximation of the efficient influence function.

  20. Nonparametric model validations for hidden Markov models with applications in financial econometrics

    PubMed Central

    Zhao, Zhibiao

    2011-01-01

    We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise. PMID:21750601

  1. Nonparametric autocovariance estimation from censored time series by Gaussian imputation.

    PubMed

    Park, Jung Wook; Genton, Marc G; Ghosh, Sujit K

    2009-02-01

    One of the most frequently used methods to model the autocovariance function of a second-order stationary time series is to use the parametric framework of autoregressive and moving average models developed by Box and Jenkins. However, such parametric models, though very flexible, may not always be adequate to model autocovariance functions with sharp changes. Furthermore, if the data do not follow the parametric model and are censored at a certain value, the estimation results may not be reliable. We develop a Gaussian imputation method to estimate an autocovariance structure via nonparametric estimation of the autocovariance function in order to address both censoring and incorrect model specification. We demonstrate the effectiveness of the technique in terms of bias and efficiency with simulations under various rates of censoring and underlying models. We describe its application to a time series of silicon concentrations in the Arctic.

  2. Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation.

    PubMed

    Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi

    2015-07-01

    Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.

  3. Sparsity-promoting and edge-preserving maximum a posteriori estimators in non-parametric Bayesian inverse problems

    NASA Astrophysics Data System (ADS)

    Agapiou, Sergios; Burger, Martin; Dashti, Masoumeh; Helin, Tapio

    2018-04-01

    We consider the inverse problem of recovering an unknown functional parameter u in a separable Banach space, from a noisy observation vector y of its image through a known possibly non-linear map {{\\mathcal G}} . We adopt a Bayesian approach to the problem and consider Besov space priors (see Lassas et al (2009 Inverse Problems Imaging 3 87-122)), which are well-known for their edge-preserving and sparsity-promoting properties and have recently attracted wide attention especially in the medical imaging community. Our key result is to show that in this non-parametric setup the maximum a posteriori (MAP) estimates are characterized by the minimizers of a generalized Onsager-Machlup functional of the posterior. This is done independently for the so-called weak and strong MAP estimates, which as we show coincide in our context. In addition, we prove a form of weak consistency for the MAP estimators in the infinitely informative data limit. Our results are remarkable for two reasons: first, the prior distribution is non-Gaussian and does not meet the smoothness conditions required in previous research on non-parametric MAP estimates. Second, the result analytically justifies existing uses of the MAP estimate in finite but high dimensional discretizations of Bayesian inverse problems with the considered Besov priors.

  4. Asymptotics of nonparametric L-1 regression models with dependent data

    PubMed Central

    ZHAO, ZHIBIAO; WEI, YING; LIN, DENNIS K.J.

    2013-01-01

    We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile estimates. The obtained Bahadur representations provide deep insights into the asymptotic behavior of the estimates. Our main theoretical development is based on studying the modulus of continuity of kernel weighted empirical process through a coupling argument. Progesterone data is used for an illustration. PMID:24955016

  5. Nonparametric instrumental regression with non-convex constraints

    NASA Astrophysics Data System (ADS)

    Grasmair, M.; Scherzer, O.; Vanhems, A.

    2013-03-01

    This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.

  6. Nonparametric estimation and testing of fixed effects panel data models

    PubMed Central

    Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi

    2009-01-01

    In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335

  7. Simplified Computation for Nonparametric Windows Method of Probability Density Function Estimation.

    PubMed

    Joshi, Niranjan; Kadir, Timor; Brady, Michael

    2011-08-01

    Recently, Kadir and Brady proposed a method for estimating probability density functions (PDFs) for digital signals which they call the Nonparametric (NP) Windows method. The method involves constructing a continuous space representation of the discrete space and sampled signal by using a suitable interpolation method. NP Windows requires only a small number of observed signal samples to estimate the PDF and is completely data driven. In this short paper, we first develop analytical formulae to obtain the NP Windows PDF estimates for 1D, 2D, and 3D signals, for different interpolation methods. We then show that the original procedure to calculate the PDF estimate can be significantly simplified and made computationally more efficient by a judicious choice of the frame of reference. We have also outlined specific algorithmic details of the procedures enabling quick implementation. Our reformulation of the original concept has directly demonstrated a close link between the NP Windows method and the Kernel Density Estimator.

  8. Comparison of methods for estimating the attributable risk in the context of survival analysis.

    PubMed

    Gassama, Malamine; Bénichou, Jacques; Dartois, Laureen; Thiébaut, Anne C M

    2017-01-23

    The attributable risk (AR) measures the proportion of disease cases that can be attributed to an exposure in the population. Several definitions and estimation methods have been proposed for survival data. Using simulations, we compared four methods for estimating AR defined in terms of survival functions: two nonparametric methods based on Kaplan-Meier's estimator, one semiparametric based on Cox's model, and one parametric based on the piecewise constant hazards model, as well as one simpler method based on estimated exposure prevalence at baseline and Cox's model hazard ratio. We considered a fixed binary exposure with varying exposure probabilities and strengths of association, and generated event times from a proportional hazards model with constant or monotonic (decreasing or increasing) Weibull baseline hazard, as well as from a nonproportional hazards model. We simulated 1,000 independent samples of size 1,000 or 10,000. The methods were compared in terms of mean bias, mean estimated standard error, empirical standard deviation and 95% confidence interval coverage probability at four equally spaced time points. Under proportional hazards, all five methods yielded unbiased results regardless of sample size. Nonparametric methods displayed greater variability than other approaches. All methods showed satisfactory coverage except for nonparametric methods at the end of follow-up for a sample size of 1,000 especially. With nonproportional hazards, nonparametric methods yielded similar results to those under proportional hazards, whereas semiparametric and parametric approaches that both relied on the proportional hazards assumption performed poorly. These methods were applied to estimate the AR of breast cancer due to menopausal hormone therapy in 38,359 women of the E3N cohort. In practice, our study suggests to use the semiparametric or parametric approaches to estimate AR as a function of time in cohort studies if the proportional hazards assumption appears appropriate.

  9. On the estimation of spread rate for a biological population

    Treesearch

    Jim Clark; Lajos Horváth; Mark Lewis

    2001-01-01

    We propose a nonparametric estimator for the rate of spread of an introduced population. We prove that the limit distribution of the estimator is normal or stable, depending on the behavior of the moment generating function. We show that resampling methods can also be used to approximate the distribution of the estimators.

  10. A framework for multivariate data-based at-site flood frequency analysis: Essentiality of the conjugal application of parametric and nonparametric approaches

    NASA Astrophysics Data System (ADS)

    Vittal, H.; Singh, Jitendra; Kumar, Pankaj; Karmakar, Subhankar

    2015-06-01

    In watershed management, flood frequency analysis (FFA) is performed to quantify the risk of flooding at different spatial locations and also to provide guidelines for determining the design periods of flood control structures. The traditional FFA was extensively performed by considering univariate scenario for both at-site and regional estimation of return periods. However, due to inherent mutual dependence of the flood variables or characteristics [i.e., peak flow (P), flood volume (V) and flood duration (D), which are random in nature], analysis has been further extended to multivariate scenario, with some restrictive assumptions. To overcome the assumption of same family of marginal density function for all flood variables, the concept of copula has been introduced. Although, the advancement from univariate to multivariate analyses drew formidable attention to the FFA research community, the basic limitation was that the analyses were performed with the implementation of only parametric family of distributions. The aim of the current study is to emphasize the importance of nonparametric approaches in the field of multivariate FFA; however, the nonparametric distribution may not always be a good-fit and capable of replacing well-implemented multivariate parametric and multivariate copula-based applications. Nevertheless, the potential of obtaining best-fit using nonparametric distributions might be improved because such distributions reproduce the sample's characteristics, resulting in more accurate estimations of the multivariate return period. Hence, the current study shows the importance of conjugating multivariate nonparametric approach with multivariate parametric and copula-based approaches, thereby results in a comprehensive framework for complete at-site FFA. Although the proposed framework is designed for at-site FFA, this approach can also be applied to regional FFA because regional estimations ideally include at-site estimations. The framework is based on the following steps: (i) comprehensive trend analysis to assess nonstationarity in the observed data; (ii) selection of the best-fit univariate marginal distribution with a comprehensive set of parametric and nonparametric distributions for the flood variables; (iii) multivariate frequency analyses with parametric, copula-based and nonparametric approaches; and (iv) estimation of joint and various conditional return periods. The proposed framework for frequency analysis is demonstrated using 110 years of observed data from Allegheny River at Salamanca, New York, USA. The results show that for both univariate and multivariate cases, the nonparametric Gaussian kernel provides the best estimate. Further, we perform FFA for twenty major rivers over continental USA, which shows for seven rivers, all the flood variables followed nonparametric Gaussian kernel; whereas for other rivers, parametric distributions provide the best-fit either for one or two flood variables. Thus the summary of results shows that the nonparametric method cannot substitute the parametric and copula-based approaches, but should be considered during any at-site FFA to provide the broadest choices for best estimation of the flood return periods.

  11. Interval Estimation of Seismic Hazard Parameters

    NASA Astrophysics Data System (ADS)

    Orlecka-Sikora, Beata; Lasocki, Stanislaw

    2017-03-01

    The paper considers Poisson temporal occurrence of earthquakes and presents a way to integrate uncertainties of the estimates of mean activity rate and magnitude cumulative distribution function in the interval estimation of the most widely used seismic hazard functions, such as the exceedance probability and the mean return period. The proposed algorithm can be used either when the Gutenberg-Richter model of magnitude distribution is accepted or when the nonparametric estimation is in use. When the Gutenberg-Richter model of magnitude distribution is used the interval estimation of its parameters is based on the asymptotic normality of the maximum likelihood estimator. When the nonparametric kernel estimation of magnitude distribution is used, we propose the iterated bias corrected and accelerated method for interval estimation based on the smoothed bootstrap and second-order bootstrap samples. The changes resulted from the integrated approach in the interval estimation of the seismic hazard functions with respect to the approach, which neglects the uncertainty of the mean activity rate estimates have been studied using Monte Carlo simulations and two real dataset examples. The results indicate that the uncertainty of mean activity rate affects significantly the interval estimates of hazard functions only when the product of activity rate and the time period, for which the hazard is estimated, is no more than 5.0. When this product becomes greater than 5.0, the impact of the uncertainty of cumulative distribution function of magnitude dominates the impact of the uncertainty of mean activity rate in the aggregated uncertainty of the hazard functions. Following, the interval estimates with and without inclusion of the uncertainty of mean activity rate converge. The presented algorithm is generic and can be applied also to capture the propagation of uncertainty of estimates, which are parameters of a multiparameter function, onto this function.

  12. Model-based mean square error estimators for k-nearest neighbour predictions and applications using remotely sensed data for forest inventories

    Treesearch

    Steen Magnussen; Ronald E. McRoberts; Erkki O. Tomppo

    2009-01-01

    New model-based estimators of the uncertainty of pixel-level and areal k-nearest neighbour (knn) predictions of attribute Y from remotely-sensed ancillary data X are presented. Non-parametric functions predict Y from scalar 'Single Index Model' transformations of X. Variance functions generated...

  13. A Bayesian nonparametric method for prediction in EST analysis

    PubMed Central

    Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor

    2007-01-01

    Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445

  14. Smoothing spline ANOVA frailty model for recurrent event data.

    PubMed

    Du, Pang; Jiang, Yihua; Wang, Yuedong

    2011-12-01

    Gap time hazard estimation is of particular interest in recurrent event data. This article proposes a fully nonparametric approach for estimating the gap time hazard. Smoothing spline analysis of variance (ANOVA) decompositions are used to model the log gap time hazard as a joint function of gap time and covariates, and general frailty is introduced to account for between-subject heterogeneity and within-subject correlation. We estimate the nonparametric gap time hazard function and parameters in the frailty distribution using a combination of the Newton-Raphson procedure, the stochastic approximation algorithm (SAA), and the Markov chain Monte Carlo (MCMC) method. The convergence of the algorithm is guaranteed by decreasing the step size of parameter update and/or increasing the MCMC sample size along iterations. Model selection procedure is also developed to identify negligible components in a functional ANOVA decomposition of the log gap time hazard. We evaluate the proposed methods with simulation studies and illustrate its use through the analysis of bladder tumor data. © 2011, The International Biometric Society.

  15. Kernel-Smoothing Estimation of Item Characteristic Functions for Continuous Personality Items: An Empirical Comparison with the Linear and the Continuous-Response Models

    ERIC Educational Resources Information Center

    Ferrando, Pere J.

    2004-01-01

    This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…

  16. Nonparametric Estimation of the Probability of Ruin.

    DTIC Science & Technology

    1985-02-01

    MATHEMATICS RESEARCH CENTER I E N FREES FEB 85 MRC/TSR...in NONPARAMETRIC ESTIMATION OF THE PROBABILITY OF RUIN Lf Edward W. Frees * Mathematics Research Center University of Wisconsin-Madison 610 Walnut...34 - .. --- - • ’. - -:- - - ..- . . .- -- .-.-. . -. . .- •. . - . . - . . .’ . ’- - .. -’vi . .-" "-- -" ,’- UNIVERSITY OF WISCONSIN-MADISON MATHEMATICS RESEARCH CENTER NONPARAMETRIC ESTIMATION OF THE PROBABILITY

  17. Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators: Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators

    DOE PAGES

    Staid, Andrea; Watson, Jean -Paul; Wets, Roger J. -B.; ...

    2017-07-11

    Forecasts of available wind power are critical in key electric power systems operations planning problems, including economic dispatch and unit commitment. Such forecasts are necessarily uncertain, limiting the reliability and cost effectiveness of operations planning models based on a single deterministic or “point” forecast. A common approach to address this limitation involves the use of a number of probabilistic scenarios, each specifying a possible trajectory of wind power production, with associated probability. We present and analyze a novel method for generating probabilistic wind power scenarios, leveraging available historical information in the form of forecasted and corresponding observed wind power timemore » series. We estimate non-parametric forecast error densities, specifically using epi-spline basis functions, allowing us to capture the skewed and non-parametric nature of error densities observed in real-world data. We then describe a method to generate probabilistic scenarios from these basis functions that allows users to control for the degree to which extreme errors are captured.We compare the performance of our approach to the current state-of-the-art considering publicly available data associated with the Bonneville Power Administration, analyzing aggregate production of a number of wind farms over a large geographic region. Finally, we discuss the advantages of our approach in the context of specific power systems operations planning problems: stochastic unit commitment and economic dispatch. Here, our methodology is embodied in the joint Sandia – University of California Davis Prescient software package for assessing and analyzing stochastic operations strategies.« less

  18. Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators: Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Staid, Andrea; Watson, Jean -Paul; Wets, Roger J. -B.

    Forecasts of available wind power are critical in key electric power systems operations planning problems, including economic dispatch and unit commitment. Such forecasts are necessarily uncertain, limiting the reliability and cost effectiveness of operations planning models based on a single deterministic or “point” forecast. A common approach to address this limitation involves the use of a number of probabilistic scenarios, each specifying a possible trajectory of wind power production, with associated probability. We present and analyze a novel method for generating probabilistic wind power scenarios, leveraging available historical information in the form of forecasted and corresponding observed wind power timemore » series. We estimate non-parametric forecast error densities, specifically using epi-spline basis functions, allowing us to capture the skewed and non-parametric nature of error densities observed in real-world data. We then describe a method to generate probabilistic scenarios from these basis functions that allows users to control for the degree to which extreme errors are captured.We compare the performance of our approach to the current state-of-the-art considering publicly available data associated with the Bonneville Power Administration, analyzing aggregate production of a number of wind farms over a large geographic region. Finally, we discuss the advantages of our approach in the context of specific power systems operations planning problems: stochastic unit commitment and economic dispatch. Here, our methodology is embodied in the joint Sandia – University of California Davis Prescient software package for assessing and analyzing stochastic operations strategies.« less

  19. Classification of longitudinal data through a semiparametric mixed-effects model based on lasso-type estimators.

    PubMed

    Arribas-Gil, Ana; De la Cruz, Rolando; Lebarbier, Emilie; Meza, Cristian

    2015-06-01

    We propose a classification method for longitudinal data. The Bayes classifier is classically used to determine a classification rule where the underlying density in each class needs to be well modeled and estimated. This work is motivated by a real dataset of hormone levels measured at the early stages of pregnancy that can be used to predict normal versus abnormal pregnancy outcomes. The proposed model, which is a semiparametric linear mixed-effects model (SLMM), is a particular case of the semiparametric nonlinear mixed-effects class of models (SNMM) in which finite dimensional (fixed effects and variance components) and infinite dimensional (an unknown function) parameters have to be estimated. In SNMM's maximum likelihood estimation is performed iteratively alternating parametric and nonparametric procedures. However, if one can make the assumption that the random effects and the unknown function interact in a linear way, more efficient estimation methods can be used. Our contribution is the proposal of a unified estimation procedure based on a penalized EM-type algorithm. The Expectation and Maximization steps are explicit. In this latter step, the unknown function is estimated in a nonparametric fashion using a lasso-type procedure. A simulation study and an application on real data are performed. © 2015, The International Biometric Society.

  20. Dynamic characteristics of oxygen consumption.

    PubMed

    Ye, Lin; Argha, Ahmadreza; Yu, Hairong; Celler, Branko G; Nguyen, Hung T; Su, Steven

    2018-04-23

    Previous studies have indicated that oxygen uptake ([Formula: see text]) is one of the most accurate indices for assessing the cardiorespiratory response to exercise. In most existing studies, the response of [Formula: see text] is often roughly modelled as a first-order system due to the inadequate stimulation and low signal to noise ratio. To overcome this difficulty, this paper proposes a novel nonparametric kernel-based method for the dynamic modelling of [Formula: see text] response to provide a more robust estimation. Twenty healthy non-athlete participants conducted treadmill exercises with monotonous stimulation (e.g., single step function as input). During the exercise, [Formula: see text] was measured and recorded by a popular portable gas analyser ([Formula: see text], COSMED). Based on the recorded data, a kernel-based estimation method was proposed to perform the nonparametric modelling of [Formula: see text]. For the proposed method, a properly selected kernel can represent the prior modelling information to reduce the dependence of comprehensive stimulations. Furthermore, due to the special elastic net formed by [Formula: see text] norm and kernelised [Formula: see text] norm, the estimations are smooth and concise. Additionally, the finite impulse response based nonparametric model which estimated by the proposed method can optimally select the order and fit better in terms of goodness-of-fit comparing to classical methods. Several kernels were introduced for the kernel-based [Formula: see text] modelling method. The results clearly indicated that the stable spline (SS) kernel has the best performance for [Formula: see text] modelling. Particularly, based on the experimental data from 20 participants, the estimated response from the proposed method with SS kernel was significantly better than the results from the benchmark method [i.e., prediction error method (PEM)] ([Formula: see text] vs [Formula: see text]). The proposed nonparametric modelling method is an effective method for the estimation of the impulse response of VO 2 -Speed system. Furthermore, the identified average nonparametric model method can dynamically predict [Formula: see text] response with acceptable accuracy during treadmill exercise.

  1. Rediscovery of Good-Turing estimators via Bayesian nonparametrics.

    PubMed

    Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye

    2016-03-01

    The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.

  2. Nonparametric Conditional Estimation

    DTIC Science & Technology

    1987-02-01

    the data because the statistician has complete control over the method. It is especially reasonable when there is a bone fide loss function to which...For example, the sample mean is m(Fn). Most calculations that statisticians perform on a set of data can be expressed as statistical functionals on...of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering

  3. Estimating survival of radio-tagged birds

    USGS Publications Warehouse

    Bunck, C.M.; Pollock, K.H.; Lebreton, J.-D.; North, P.M.

    1993-01-01

    Parametric and nonparametric methods for estimating survival of radio-tagged birds are described. The general assumptions of these methods are reviewed. An estimate based on the assumption of constant survival throughout the period is emphasized in the overview of parametric methods. Two nonparametric methods, the Kaplan-Meier estimate of the survival funcrion and the log rank test, are explained in detail The link between these nonparametric methods and traditional capture-recapture models is discussed aloag with considerations in designing studies that use telemetry techniques to estimate survival.

  4. Nonparametric tests for interaction and group differences in a two-way layout.

    PubMed

    Fisher, A C; Wallenstein, S

    1991-01-01

    Nonparametric tests of group differences and interaction across strata are developed in which the null hypotheses for these tests are expressed as functions of rho i = P(X > Y) + 1/2P(X = Y), where X refers to a random observation from one group and Y refers to a random observation from the other group within stratum i. The estimator r of the parameter rho is shown to be a useful way to summarize and examine data for ordinal and continuous data.

  5. Regionalizing nonparametric models of precipitation amounts on different temporal scales

    NASA Astrophysics Data System (ADS)

    Mosthaf, Tobias; Bárdossy, András

    2017-05-01

    Parametric distribution functions are commonly used to model precipitation amounts corresponding to different durations. The precipitation amounts themselves are crucial for stochastic rainfall generators and weather generators. Nonparametric kernel density estimates (KDEs) offer a more flexible way to model precipitation amounts. As already stated in their name, these models do not exhibit parameters that can be easily regionalized to run rainfall generators at ungauged locations as well as at gauged locations. To overcome this deficiency, we present a new interpolation scheme for nonparametric models and evaluate it for different temporal resolutions ranging from hourly to monthly. During the evaluation, the nonparametric methods are compared to commonly used parametric models like the two-parameter gamma and the mixed-exponential distribution. As water volume is considered to be an essential parameter for applications like flood modeling, a Lorenz-curve-based criterion is also introduced. To add value to the estimation of data at sub-daily resolutions, we incorporated the plentiful daily measurements in the interpolation scheme, and this idea was evaluated. The study region is the federal state of Baden-Württemberg in the southwest of Germany with more than 500 rain gauges. The validation results show that the newly proposed nonparametric interpolation scheme provides reasonable results and that the incorporation of daily values in the regionalization of sub-daily models is very beneficial.

  6. A hybrid pareto mixture for conditional asymmetric fat-tailed distributions.

    PubMed

    Carreau, Julie; Bengio, Yoshua

    2009-07-01

    In many cases, we observe some variables X that contain predictive information over a scalar variable of interest Y , with (X,Y) pairs observed in a training set. We can take advantage of this information to estimate the conditional density p(Y|X = x). In this paper, we propose a conditional mixture model with hybrid Pareto components to estimate p(Y|X = x). The hybrid Pareto is a Gaussian whose upper tail has been replaced by a generalized Pareto tail. A third parameter, in addition to the location and spread parameters of the Gaussian, controls the heaviness of the upper tail. Using the hybrid Pareto in a mixture model results in a nonparametric estimator that can adapt to multimodality, asymmetry, and heavy tails. A conditional density estimator is built by modeling the parameters of the mixture estimator as functions of X. We use a neural network to implement these functions. Such conditional density estimators have important applications in many domains such as finance and insurance. We show experimentally that this novel approach better models the conditional density in terms of likelihood, compared to competing algorithms: conditional mixture models with other types of components and a classical kernel-based nonparametric model.

  7. Smoothness of In vivo Spectral Baseline Determined by Mean Squared Error

    PubMed Central

    Zhang, Yan; Shen, Jun

    2013-01-01

    Purpose A nonparametric smooth line is usually added to spectral model to account for background signals in vivo magnetic resonance spectroscopy (MRS). The assumed smoothness of the baseline significantly influences quantitative spectral fitting. In this paper, a method is proposed to minimize baseline influences on estimated spectral parameters. Methods In this paper, the non-parametric baseline function with a given smoothness was treated as a function of spectral parameters. Its uncertainty was measured by root-mean-squared error (RMSE). The proposed method was demonstrated with a simulated spectrum and in vivo spectra of both short echo time (TE) and averaged echo times. The estimated in vivo baselines were compared with the metabolite-nulled spectra, and the LCModel-estimated baselines. The accuracies of estimated baseline and metabolite concentrations were further verified by cross-validation. Results An optimal smoothness condition was found that led to the minimal baseline RMSE. In this condition, the best fit was balanced against minimal baseline influences on metabolite concentration estimates. Conclusion Baseline RMSE can be used to indicate estimated baseline uncertainties and serve as the criterion for determining the baseline smoothness of in vivo MRS. PMID:24259436

  8. A bias-corrected estimator in multiple imputation for missing data.

    PubMed

    Tomita, Hiroaki; Fujisawa, Hironori; Henmi, Masayuki

    2018-05-29

    Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been rapidly increasing in medical studies. Although MI is rather appealing in practice since it is possible to use ordinary statistical methods for a complete data set once the missing values are fully imputed, the method of imputation is still problematic. If the missing values are imputed from some parametric model, the validity of imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have been also proposed for MI, but it is not so straightforward as to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI to obtain a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced and to make a proper correction in the likelihood function after the imputation by using the density ratio between the imputation model and the true conditional density function for the missing variable as a weight. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated by both theory and simulation studies. A real data analysis is also conducted to illustrate our method by using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset. Copyright © 2018 John Wiley & Sons, Ltd.

  9. The estimation of time-varying risks in asset pricing modelling using B-Spline method

    NASA Astrophysics Data System (ADS)

    Nurjannah; Solimun; Rinaldo, Adji

    2017-12-01

    Asset pricing modelling has been extensively studied in the past few decades to explore the risk-return relationship. The asset pricing literature typically assumed a static risk-return relationship. However, several studies found few anomalies in the asset pricing modelling which captured the presence of the risk instability. The dynamic model is proposed to offer a better model. The main problem highlighted in the dynamic model literature is that the set of conditioning information is unobservable and therefore some assumptions have to be made. Hence, the estimation requires additional assumptions about the dynamics of risk. To overcome this problem, the nonparametric estimators can also be used as an alternative for estimating risk. The flexibility of the nonparametric setting avoids the problem of misspecification derived from selecting a functional form. This paper investigates the estimation of time-varying asset pricing model using B-Spline, as one of nonparametric approach. The advantages of spline method is its computational speed and simplicity, as well as the clarity of controlling curvature directly. The three popular asset pricing models will be investigated namely CAPM (Capital Asset Pricing Model), Fama-French 3-factors model and Carhart 4-factors model. The results suggest that the estimated risks are time-varying and not stable overtime which confirms the risk instability anomaly. The results is more pronounced in Carhart’s 4-factors model.

  10. Testing and Estimating Shape-Constrained Nonparametric Density and Regression in the Presence of Measurement Error.

    PubMed

    Carroll, Raymond J; Delaigle, Aurore; Hall, Peter

    2011-03-01

    In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.

  11. Mapping the Structure-Function Relationship in Glaucoma and Healthy Patients Measured with Spectralis OCT and Humphrey Perimetry

    PubMed Central

    Muñoz–Negrete, Francisco J.; Oblanca, Noelia; Rebolleda, Gema

    2018-01-01

    Purpose To study the structure-function relationship in glaucoma and healthy patients assessed with Spectralis OCT and Humphrey perimetry using new statistical approaches. Materials and Methods Eighty-five eyes were prospectively selected and divided into 2 groups: glaucoma (44) and healthy patients (41). Three different statistical approaches were carried out: (1) factor analysis of the threshold sensitivities (dB) (automated perimetry) and the macular thickness (μm) (Spectralis OCT), subsequently applying Pearson's correlation to the obtained regions, (2) nonparametric regression analysis relating the values in each pair of regions that showed significant correlation, and (3) nonparametric spatial regressions using three models designed for the purpose of this study. Results In the glaucoma group, a map that relates structural and functional damage was drawn. The strongest correlation with visual fields was observed in the peripheral nasal region of both superior and inferior hemigrids (r = 0.602 and r = 0.458, resp.). The estimated functions obtained with the nonparametric regressions provided the mean sensitivity that corresponds to each given macular thickness. These functions allowed for accurate characterization of the structure-function relationship. Conclusions Both maps and point-to-point functions obtained linking structure and function damage contribute to a better understanding of this relationship and may help in the future to improve glaucoma diagnosis. PMID:29850196

  12. Exact nonparametric confidence bands for the survivor function.

    PubMed

    Matthews, David

    2013-10-12

    A method to produce exact simultaneous confidence bands for the empirical cumulative distribution function that was first described by Owen, and subsequently corrected by Jager and Wellner, is the starting point for deriving exact nonparametric confidence bands for the survivor function of any positive random variable. We invert a nonparametric likelihood test of uniformity, constructed from the Kaplan-Meier estimator of the survivor function, to obtain simultaneous lower and upper bands for the function of interest with specified global confidence level. The method involves calculating a null distribution and associated critical value for each observed sample configuration. However, Noe recursions and the Van Wijngaarden-Decker-Brent root-finding algorithm provide the necessary tools for efficient computation of these exact bounds. Various aspects of the effect of right censoring on these exact bands are investigated, using as illustrations two observational studies of survival experience among non-Hodgkin's lymphoma patients and a much larger group of subjects with advanced lung cancer enrolled in trials within the North Central Cancer Treatment Group. Monte Carlo simulations confirm the merits of the proposed method of deriving simultaneous interval estimates of the survivor function across the entire range of the observed sample. This research was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada. It was begun while the author was visiting the Department of Statistics, University of Auckland, and completed during a subsequent sojourn at the Medical Research Council Biostatistics Unit in Cambridge. The support of both institutions, in addition to that of NSERC and the University of Waterloo, is greatly appreciated.

  13. Estimation of two ordered mean residual lifetime functions.

    PubMed

    Ebrahimi, N

    1993-06-01

    In many statistical studies involving failure data, biometric mortality data, and actuarial data, mean residual lifetime (MRL) function is of prime importance. In this paper we introduce the problem of nonparametric estimation of a MRL function on an interval when this function is bounded from below by another such function (known or unknown) on that interval, and derive the corresponding two functional estimators. The first is to be used when there is a known bound, and the second when the bound is another MRL function to be estimated independently. Both estimators are obtained by truncating the empirical estimator discussed by Yang (1978, Annals of Statistics 6, 112-117). In the first case, it is truncated at a known bound; in the second, at a point somewhere between the two empirical estimates. Consistency of both estimators is proved, and a pointwise large-sample distribution theory of the first estimator is derived.

  14. A Non-Parametric Probability Density Estimator and Some Applications.

    DTIC Science & Technology

    1984-05-01

    distributions, which are assumed to be representa- tive of platykurtic , mesokurtic, and leptokurtic distribu- tions in general. The dissertation is... platykurtic distributions. Consider, for example, the uniform distribution shown in Figure 4. 34 o . 1., Figure 4 -Sensitivity to Support Estimation The...results of the density function comparisons indicate that the new estimator is clearly -Z superior for platykurtic distributions, equal to the best 59

  15. Nonparametric Residue Analysis of Dynamic PET Data With Application to Cerebral FDG Studies in Normals.

    PubMed

    O'Sullivan, Finbarr; Muzi, Mark; Spence, Alexander M; Mankoff, David M; O'Sullivan, Janet N; Fitzgerald, Niall; Newman, George C; Krohn, Kenneth A

    2009-06-01

    Kinetic analysis is used to extract metabolic information from dynamic positron emission tomography (PET) uptake data. The theory of indicator dilutions, developed in the seminal work of Meier and Zierler (1954), provides a probabilistic framework for representation of PET tracer uptake data in terms of a convolution between an arterial input function and a tissue residue. The residue is a scaled survival function associated with tracer residence in the tissue. Nonparametric inference for the residue, a deconvolution problem, provides a novel approach to kinetic analysis-critically one that is not reliant on specific compartmental modeling assumptions. A practical computational technique based on regularized cubic B-spline approximation of the residence time distribution is proposed. Nonparametric residue analysis allows formal statistical evaluation of specific parametric models to be considered. This analysis needs to properly account for the increased flexibility of the nonparametric estimator. The methodology is illustrated using data from a series of cerebral studies with PET and fluorodeoxyglucose (FDG) in normal subjects. Comparisons are made between key functionals of the residue, tracer flux, flow, etc., resulting from a parametric (the standard two-compartment of Phelps et al. 1979) and a nonparametric analysis. Strong statistical evidence against the compartment model is found. Primarily these differences relate to the representation of the early temporal structure of the tracer residence-largely a function of the vascular supply network. There are convincing physiological arguments against the representations implied by the compartmental approach but this is the first time that a rigorous statistical confirmation using PET data has been reported. The compartmental analysis produces suspect values for flow but, notably, the impact on the metabolic flux, though statistically significant, is limited to deviations on the order of 3%-4%. The general advantage of the nonparametric residue analysis is the ability to provide a valid kinetic quantitation in the context of studies where there may be heterogeneity or other uncertainty about the accuracy of a compartmental model approximation of the tissue residue.

  16. Emitter Number Estimation by the General Information Theoretic Criterion from Pulse Trains

    DTIC Science & Technology

    2002-12-01

    negative log likelihood function plus a penalty function. The general information criteria by Yin and Krishnaiah [11] are different from the regular...548-551, Victoria, BC, Canada, March 1999 DRDC Ottawa TR 2002-156 11 11. L. Zhao, P. P. Krishnaiah and Z. Bai, “On some nonparametric methods for

  17. Hazard Function Estimation with Cause-of-Death Data Missing at Random.

    PubMed

    Wang, Qihua; Dinse, Gregg E; Liu, Chunling

    2012-04-01

    Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.

  18. Gini estimation under infinite variance

    NASA Astrophysics Data System (ADS)

    Fontanari, Andrea; Taleb, Nassim Nicholas; Cirillo, Pasquale

    2018-07-01

    We study the problems related to the estimation of the Gini index in presence of a fat-tailed data generating process, i.e. one in the stable distribution class with finite mean but infinite variance (i.e. with tail index α ∈(1 , 2)). We show that, in such a case, the Gini coefficient cannot be reliably estimated using conventional nonparametric methods, because of a downward bias that emerges under fat tails. This has important implications for the ongoing discussion about economic inequality. We start by discussing how the nonparametric estimator of the Gini index undergoes a phase transition in the symmetry structure of its asymptotic distribution, as the data distribution shifts from the domain of attraction of a light-tailed distribution to that of a fat-tailed one, especially in the case of infinite variance. We also show how the nonparametric Gini bias increases with lower values of α. We then prove that maximum likelihood estimation outperforms nonparametric methods, requiring a much smaller sample size to reach efficiency. Finally, for fat-tailed data, we provide a simple correction mechanism to the small sample bias of the nonparametric estimator based on the distance between the mode and the mean of its asymptotic distribution.

  19. Identification of cloud fields by the nonparametric algorithm of pattern recognition from normalized video data recorded with the AVHRR instrument

    NASA Astrophysics Data System (ADS)

    Protasov, Konstantin T.; Pushkareva, Tatyana Y.; Artamonov, Evgeny S.

    2002-02-01

    The problem of cloud field recognition from the NOAA satellite data is urgent for solving not only meteorological problems but also for resource-ecological monitoring of the Earth's underlying surface associated with the detection of thunderstorm clouds, estimation of the liquid water content of clouds and the moisture of the soil, the degree of fire hazard, etc. To solve these problems, we used the AVHRR/NOAA video data that regularly displayed the situation in the territory. The complexity and extremely nonstationary character of problems to be solved call for the use of information of all spectral channels, mathematical apparatus of testing statistical hypotheses, and methods of pattern recognition and identification of the informative parameters. For a class of detection and pattern recognition problems, the average risk functional is a natural criterion for the quality and the information content of the synthesized decision rules. In this case, to solve efficiently the problem of identifying cloud field types, the informative parameters must be determined by minimization of this functional. Since the conditional probability density functions, representing mathematical models of stochastic patterns, are unknown, the problem of nonparametric reconstruction of distributions from the leaning samples arises. To this end, we used nonparametric estimates of distributions with the modified Epanechnikov kernel. The unknown parameters of these distributions were determined by minimization of the risk functional, which for the learning sample was substituted by the empirical risk. After the conditional probability density functions had been reconstructed for the examined hypotheses, a cloudiness type was identified using the Bayes decision rule.

  20. Nonparametric Density Estimation Based on Self-Organizing Incremental Neural Network for Large Noisy Data.

    PubMed

    Nakamura, Yoshihiro; Hasegawa, Osamu

    2017-01-01

    With the ongoing development and expansion of communication networks and sensors, massive amounts of data are continuously generated in real time from real environments. Beforehand, prediction of a distribution underlying such data is difficult; furthermore, the data include substantial amounts of noise. These factors make it difficult to estimate probability densities. To handle these issues and massive amounts of data, we propose a nonparametric density estimator that rapidly learns data online and has high robustness. Our approach is an extension of both kernel density estimation (KDE) and a self-organizing incremental neural network (SOINN); therefore, we call our approach KDESOINN. An SOINN provides a clustering method that learns about the given data as networks of prototype of data; more specifically, an SOINN can learn the distribution underlying the given data. Using this information, KDESOINN estimates the probability density function. The results of our experiments show that KDESOINN outperforms or achieves performance comparable to the current state-of-the-art approaches in terms of robustness, learning time, and accuracy.

  1. Cox regression analysis with missing covariates via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Yu, Mandi

    2018-01-01

    We consider the situation of estimating Cox regression in which some covariates are subject to missing, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from Surveillance, Epidemiology and End Results (SEER) Program.

  2. Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci.

    PubMed

    Yap, John Stephen; Fan, Jianqing; Wu, Rongling

    2009-12-01

    Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.'s (2006, Biometrika 93, 85-98) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L(2) penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision, and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology, and health sciences.

  3. Multiple robustness in factorized likelihood models.

    PubMed

    Molina, J; Rotnitzky, A; Sued, M; Robins, J M

    2017-09-01

    We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct.

  4. On the Use of Nonparametric Item Characteristic Curve Estimation Techniques for Checking Parametric Model Fit

    ERIC Educational Resources Information Center

    Lee, Young-Sun; Wollack, James A.; Douglas, Jeffrey

    2009-01-01

    The purpose of this study was to assess the model fit of a 2PL through comparison with the nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that three nonparametric procedures implemented produced ICCs that are similar to that of the 2PL for items simulated to fit the 2PL. However for misfitting items,…

  5. Nonparametric Item Response Curve Estimation with Correction for Measurement Error

    ERIC Educational Resources Information Center

    Guo, Hongwen; Sinharay, Sandip

    2011-01-01

    Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…

  6. Multivariate Density Estimation and Remote Sensing

    NASA Technical Reports Server (NTRS)

    Scott, D. W.

    1983-01-01

    Current efforts to develop methods and computer algorithms to effectively represent multivariate data commonly encountered in remote sensing applications are described. While this may involve scatter diagrams, multivariate representations of nonparametric probability density estimates are emphasized. The density function provides a useful graphical tool for looking at data and a useful theoretical tool for classification. This approach is called a thunderstorm data analysis.

  7. Benchmark dose analysis via nonparametric regression modeling

    PubMed Central

    Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

    2013-01-01

    Estimation of benchmark doses (BMDs) in quantitative risk assessment traditionally is based upon parametric dose-response modeling. It is a well-known concern, however, that if the chosen parametric model is uncertain and/or misspecified, inaccurate and possibly unsafe low-dose inferences can result. We describe a nonparametric approach for estimating BMDs with quantal-response data based on an isotonic regression method, and also study use of corresponding, nonparametric, bootstrap-based confidence limits for the BMD. We explore the confidence limits’ small-sample properties via a simulation study, and illustrate the calculations with an example from cancer risk assessment. It is seen that this nonparametric approach can provide a useful alternative for BMD estimation when faced with the problem of parametric model uncertainty. PMID:23683057

  8. Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.

    PubMed

    Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R

    2018-06-01

    Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the amount of diagnostic statistics, in comparison to their parametric counter-parts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the HSIC. If dependence exists, the model does not capture all the variability in the outcome associated with the covariates, otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.

  9. Hazard Function Estimation with Cause-of-Death Data Missing at Random

    PubMed Central

    Wang, Qihua; Dinse, Gregg E.; Liu, Chunling

    2010-01-01

    Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data. PMID:22267874

  10. Nonparametric analysis of Minnesota spruce and aspen tree data and LANDSAT data

    NASA Technical Reports Server (NTRS)

    Scott, D. W.; Jee, R.

    1984-01-01

    The application of nonparametric methods in data-intensive problems faced by NASA is described. The theoretical development of efficient multivariate density estimators and the novel use of color graphics workstations are reviewed. The use of nonparametric density estimates for data representation and for Bayesian classification are described and illustrated. Progress in building a data analysis system in a workstation environment is reviewed and preliminary runs presented.

  11. Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure

    PubMed Central

    Berisha, Visar; Wisler, Alan; Hero, Alfred O.; Spanias, Andreas

    2015-01-01

    Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks. PMID:26807014

  12. A comparison of Hong Kong and United Kingdom SF-6D health states valuations using a nonparametric Bayesian method.

    PubMed

    Kharroubi, Samer A; Brazier, John E; McGhee, Sarah

    2014-06-01

    There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained in different countries. The present study applies a nonparametric model to estimate and compare two HK and UK standard gamble values for six-dimensional health state short form (derived from short-form 36 health survey) (SF-6D) health states using Bayesian methods. The data set is the HK and UK SF-6D valuation studies in which two samples of 197 and 249 states defined by the SF-6D were valued by representative samples of the HK and UK general populations, respectively, both using the standard gamble technique. We estimated a function applicable across both countries that explicitly accounts for the differences between them, and is estimated using the data from both countries. The results suggest that differences in SF-6D health state valuations between the UK and HK general populations are potentially important. In particular, the valuations of Hong Kong were meaningfully higher than those of the United Kingdom for most of the selected SF-6D health states. The magnitude of these country-specific differences in health state valuation depended, however, in a complex way on the levels of individual dimensions. The new Bayesian nonparametric method is a powerful approach for analyzing data from multiple nationalities or ethnic groups to understand the differences between them and potentially to estimate the underlying utility functions more efficiently. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.

  13. [Nonparametric method of estimating survival functions containing right-censored and interval-censored data].

    PubMed

    Xu, Yonghong; Gao, Xiaohuan; Wang, Zhengxi

    2014-04-01

    Missing data represent a general problem in many scientific fields, especially in medical survival analysis. Dealing with censored data, interpolation method is one of important methods. However, most of the interpolation methods replace the censored data with the exact data, which will distort the real distribution of the censored data and reduce the probability of the real data falling into the interpolation data. In order to solve this problem, we in this paper propose a nonparametric method of estimating the survival function of right-censored and interval-censored data and compare its performance to SC (self-consistent) algorithm. Comparing to the average interpolation and the nearest neighbor interpolation method, the proposed method in this paper replaces the right-censored data with the interval-censored data, and greatly improves the probability of the real data falling into imputation interval. Then it bases on the empirical distribution theory to estimate the survival function of right-censored and interval-censored data. The results of numerical examples and a real breast cancer data set demonstrated that the proposed method had higher accuracy and better robustness for the different proportion of the censored data. This paper provides a good method to compare the clinical treatments performance with estimation of the survival data of the patients. This pro vides some help to the medical survival data analysis.

  14. Nonparametric Stochastic Model for Uncertainty Quantifi cation of Short-term Wind Speed Forecasts

    NASA Astrophysics Data System (ADS)

    AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.

    2014-12-01

    Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of wind turbine optimal operation, wind energy dispatch and scheduling, efficient energy harvesting etc. It is also considered during planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed carries a particular importance and plays significant roles in the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using it for future prediction. The methods mainly include statistical models such as ARMA, ARIMA model, physical models for instance numerical weather prediction and artificial Intelligence techniques for example support vector machine and neural networks. In this paper, we are interested in estimating hourly wind speed measures in United Arab Emirates (UAE). More precisely, we predict hourly wind speed using a nonparametric kernel estimation of the regression and volatility functions pertaining to nonlinear autoregressive model with ARCH model, which includes unknown nonlinear regression function and volatility function already discussed in the literature. The unknown nonlinear regression function describe the dependence between the value of the wind speed at time t and its historical data at time t -1, t - 2, … , t - d. This function plays a key role to predict hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated to this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using nonparametric kernel methods. In addition, to the pointwise hourly wind speed forecasts, a confidence interval is also provided which allows to quantify the uncertainty around the forecasts.

  15. Gaussian process-based Bayesian nonparametric inference of population size trajectories from gene genealogies.

    PubMed

    Palacios, Julia A; Minin, Vladimir N

    2013-03-01

    Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method. Copyright © 2013, The International Biometric Society.

  16. Local Composite Quantile Regression Smoothing for Harris Recurrent Markov Processes

    PubMed Central

    Li, Degui; Li, Runze

    2016-01-01

    In this paper, we study the local polynomial composite quantile regression (CQR) smoothing method for the nonlinear and nonparametric models under the Harris recurrent Markov chain framework. The local polynomial CQR regression method is a robust alternative to the widely-used local polynomial method, and has been well studied in stationary time series. In this paper, we relax the stationarity restriction on the model, and allow that the regressors are generated by a general Harris recurrent Markov process which includes both the stationary (positive recurrent) and nonstationary (null recurrent) cases. Under some mild conditions, we establish the asymptotic theory for the proposed local polynomial CQR estimator of the mean regression function, and show that the convergence rate for the estimator in nonstationary case is slower than that in stationary case. Furthermore, a weighted type local polynomial CQR estimator is provided to improve the estimation efficiency, and a data-driven bandwidth selection is introduced to choose the optimal bandwidth involved in the nonparametric estimators. Finally, we give some numerical studies to examine the finite sample performance of the developed methodology and theory. PMID:27667894

  17. Semiparametric mixed-effects analysis of PK/PD models using differential equations.

    PubMed

    Wang, Yi; Eskridge, Kent M; Zhang, Shunpu

    2008-08-01

    Motivated by the use of semiparametric nonlinear mixed-effects modeling on longitudinal data, we develop a new semiparametric modeling approach to address potential structural model misspecification for population pharmacokinetic/pharmacodynamic (PK/PD) analysis. Specifically, we use a set of ordinary differential equations (ODEs) with form dx/dt = A(t)x + B(t) where B(t) is a nonparametric function that is estimated using penalized splines. The inclusion of a nonparametric function in the ODEs makes identification of structural model misspecification feasible by quantifying the model uncertainty and provides flexibility for accommodating possible structural model deficiencies. The resulting model will be implemented in a nonlinear mixed-effects modeling setup for population analysis. We illustrate the method with an application to cefamandole data and evaluate its performance through simulations.

  18. Robust nonparametric quantification of clustering density of molecules in single-molecule localization microscopy

    PubMed Central

    Jiang, Shenghang; Park, Seongjin; Challapalli, Sai Divya; Fei, Jingyi; Wang, Yong

    2017-01-01

    We report a robust nonparametric descriptor, J′(r), for quantifying the density of clustering molecules in single-molecule localization microscopy. J′(r), based on nearest neighbor distribution functions, does not require any parameter as an input for analyzing point patterns. We show that J′(r) displays a valley shape in the presence of clusters of molecules, and the characteristics of the valley reliably report the clustering features in the data. Most importantly, the position of the J′(r) valley (rJm′) depends exclusively on the density of clustering molecules (ρc). Therefore, it is ideal for direct estimation of the clustering density of molecules in single-molecule localization microscopy. As an example, this descriptor was applied to estimate the clustering density of ptsG mRNA in E. coli bacteria. PMID:28636661

  19. Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28

    ERIC Educational Resources Information Center

    Guo, Hongwen; Sinharay, Sandip

    2011-01-01

    Nonparametric, or kernel, estimation of item response curve (IRC) is a concern theoretically and operationally. Accuracy of this estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. In this study, we investigate…

  20. Confidence Intervals for a Semiparametric Approach to Modeling Nonlinear Relations among Latent Variables

    ERIC Educational Resources Information Center

    Pek, Jolynn; Losardo, Diane; Bauer, Daniel J.

    2011-01-01

    Compared to parametric models, nonparametric and semiparametric approaches to modeling nonlinearity between latent variables have the advantage of recovering global relationships of unknown functional form. Bauer (2005) proposed an indirect application of finite mixtures of structural equation models where latent components are estimated in the…

  1. GEE-Smoothing Spline in Semiparametric Model with Correlated Nominal Data

    NASA Astrophysics Data System (ADS)

    Ibrahim, Noor Akma; Suliadi

    2010-11-01

    In this paper we propose GEE-Smoothing spline in the estimation of semiparametric models with correlated nominal data. The method can be seen as an extension of parametric generalized estimating equation to semiparametric models. The nonparametric component is estimated using smoothing spline specifically the natural cubic spline. We use profile algorithm in the estimation of both parametric and nonparametric components. The properties of the estimators are evaluated using simulation studies.

  2. A Comparison of Japan and U.K. SF-6D Health-State Valuations Using a Non-Parametric Bayesian Method.

    PubMed

    Kharroubi, Samer A

    2015-08-01

    There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained in different countries. We sought to estimate and compare two directly elicited valuations for SF-6D health states between the Japan and U.K. general adult populations using Bayesian methods. We analysed data from two SF-6D valuation studies where, using similar standard gamble protocols, values for 241 and 249 states were elicited from representative samples of the Japan and U.K. general adult populations, respectively. We estimate a function applicable across both countries that explicitly accounts for the differences between them, and is estimated using data from both countries. The results suggest that differences in SF-6D health-state valuations between the Japan and U.K. general populations are potentially important. The magnitude of these country-specific differences in health-state valuation depended, however, in a complex way on the levels of individual dimensions. The new Bayesian non-parametric method is a powerful approach for analysing data from multiple nationalities or ethnic groups, to understand the differences between them and potentially to estimate the underlying utility functions more efficiently.

  3. Nonparametric tests for equality of psychometric functions.

    PubMed

    García-Pérez, Miguel A; Núñez-Antón, Vicente

    2017-12-07

    Many empirical studies measure psychometric functions (curves describing how observers' performance varies with stimulus magnitude) because these functions capture the effects of experimental conditions. To assess these effects, parametric curves are often fitted to the data and comparisons are carried out by testing for equality of mean parameter estimates across conditions. This approach is parametric and, thus, vulnerable to violations of the implied assumptions. Furthermore, testing for equality of means of parameters may be misleading: Psychometric functions may vary meaningfully across conditions on an observer-by-observer basis with no effect on the mean values of the estimated parameters. Alternative approaches to assess equality of psychometric functions per se are thus needed. This paper compares three nonparametric tests that are applicable in all situations of interest: The existing generalized Mantel-Haenszel test, a generalization of the Berry-Mielke test that was developed here, and a split variant of the generalized Mantel-Haenszel test also developed here. Their statistical properties (accuracy and power) are studied via simulation and the results show that all tests are indistinguishable as to accuracy but they differ non-uniformly as to power. Empirical use of the tests is illustrated via analyses of published data sets and practical recommendations are given. The computer code in MATLAB and R to conduct these tests is available as Electronic Supplemental Material.

  4. Characterization and modelling of the spatially- and spectrally-varying point-spread function in hyperspectral imaging systems for computational correction of axial optical aberrations

    NASA Astrophysics Data System (ADS)

    Špiclin, Žiga; Bürmen, Miran; Pernuš, Franjo; Likar, Boštjan

    2012-03-01

    Spatial resolution of hyperspectral imaging systems can vary significantly due to axial optical aberrations that originate from wavelength-induced index-of-refraction variations of the imaging optics. For systems that have a broad spectral range, the spatial resolution will vary significantly both with respect to the acquisition wavelength and with respect to the spatial position within each spectral image. Variations of the spatial resolution can be effectively characterized as part of the calibration procedure by a local image-based estimation of the pointspread function (PSF) of the hyperspectral imaging system. The estimated PSF can then be used in the image deconvolution methods to improve the spatial resolution of the spectral images. We estimated the PSFs from the spectral images of a line grid geometric caliber. From individual line segments of the line grid, the PSF was obtained by a non-parametric estimation procedure that used an orthogonal series representation of the PSF. By using the non-parametric estimation procedure, the PSFs were estimated at different spatial positions and at different wavelengths. The variations of the spatial resolution were characterized by the radius and the fullwidth half-maximum of each PSF and by the modulation transfer function, computed from images of USAF1951 resolution target. The estimation and characterization of the PSFs and the image deconvolution based spatial resolution enhancement were tested on images obtained by a hyperspectral imaging system with an acousto-optic tunable filter in the visible spectral range. The results demonstrate that the spatial resolution of the acquired spectral images can be significantly improved using the estimated PSFs and image deconvolution methods.

  5. TRAN-STAT: statistics for environmental transuranic studies, July 1978, Number 5

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Not Available

    This issue is concerned with nonparametric procedures for (1) estimating the central tendency of a population, (2) describing data sets through estimating percentiles, (3) estimating confidence limits for the median and other percentiles, (4) estimating tolerance limits and associated numbers of samples, and (5) tests of significance and associated procedures for a variety of testing situations (counterparts to t-tests and analysis of variance). Some characteristics of several nonparametric tests are illustrated using the NAEG /sup 241/Am aliquot data presented and discussed in the April issue of TRAN-STAT. Some of the statistical terms used here are defined in a glossary. Themore » reference list also includes short descriptions of nonparametric books. 31 references, 3 figures, 1 table.« less

  6. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.

  7. A consistent NPMLE of the joint distribution function with competing risks data under the dependent masking and right-censoring model.

    PubMed

    Li, Jiahui; Yu, Qiqing

    2016-01-01

    Dinse (Biometrics, 38:417-431, 1982) provides a special type of right-censored and masked competing risks data and proposes a non-parametric maximum likelihood estimator (NPMLE) and a pseudo MLE of the joint distribution function [Formula: see text] with such data. However, their asymptotic properties have not been studied so far. Under the extention of either the conditional masking probability (CMP) model or the random partition masking (RPM) model (Yu and Li, J Nonparametr Stat 24:753-764, 2012), we show that (1) Dinse's estimators are consistent if [Formula: see text] takes on finitely many values and each point in the support set of [Formula: see text] can be observed; (2) if the failure time is continuous, the NPMLE is not uniquely determined, and the standard approach (which puts weights only on one element in each observed set) leads to an inconsistent NPMLE; (3) in general, Dinse's estimators are not consistent even under the discrete assumption; (4) we construct a consistent NPMLE. The consistency is given under a new model called dependent masking and right-censoring model. The CMP model and the RPM model are indeed special cases of the new model. We compare our estimator to Dinse's estimators through simulation and real data. Simulation study indicates that the consistent NPMLE is a good approximation to the underlying distribution for moderate sample sizes.

  8. Reliability of Test Scores in Nonparametric Item Response Theory.

    ERIC Educational Resources Information Center

    Sijtsma, Klaas; Molenaar, Ivo W.

    1987-01-01

    Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four "classical" lower bounds to reliability. (Author/JAZ)

  9. Estimation of spline function in nonparametric path analysis based on penalized weighted least square (PWLS)

    NASA Astrophysics Data System (ADS)

    Fernandes, Adji Achmad Rinaldo; Solimun, Arisoesilaningsih, Endang

    2017-12-01

    The aim of this research is to estimate the spline in Path Analysis-based on Nonparametric Regression using Penalized Weighted Least Square (PWLS) approach. Approach used is Reproducing Kernel Hilbert Space at sobolev space. Nonparametric path analysis model on the equation y1 i=f1.1(x1 i)+ε1 i; y2 i=f1.2(x1 i)+f2.2(y1 i)+ε2 i; i =1 ,2 ,…,n Nonparametric Path Analysis which meet the criteria of minimizing PWLS min fw .k∈W2m[aw .k,bw .k], k =1 ,2 { (2n ) -1(y˜-f ˜ ) TΣ-1(y ˜-f ˜ ) + ∑k =1 2 ∑w =1 2 λw .k ∫aw .k bw .k [fw.k (m )(xi) ] 2d xi } is f ˜^=Ay ˜ with A=T1(T1TU1-1∑-1T1)-1T1TU1-1∑-1+V1U1-1∑-1[I-T1(T1TU1-1∑-1T1)-1T1TU1-1∑-1] columnalign="left">+T2(T2TU2-1∑-1T2)-1T2TU2-1∑-1+V2U2-1∑-1[I1-T2(T2TU2-1∑-1T2) -1T2TU2-1∑-1

  10. Joint nonparametric correction estimator for excess relative risk regression in survival analysis with exposure measurement error

    PubMed Central

    Wang, Ching-Yun; Cullings, Harry; Song, Xiao; Kopecky, Kenneth J.

    2017-01-01

    SUMMARY Observational epidemiological studies often confront the problem of estimating exposure-disease relationships when the exposure is not measured exactly. In the paper, we investigate exposure measurement error in excess relative risk regression, which is a widely used model in radiation exposure effect research. In the study cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies a generalized version of the classical additive measurement error model, but it may or may not have repeated measurements. In addition, an instrumental variable is available for individuals in a subset of the whole cohort. We develop a nonparametric correction (NPC) estimator using data from the subcohort, and further propose a joint nonparametric correction (JNPC) estimator using all observed data to adjust for exposure measurement error. An optimal linear combination estimator of JNPC and NPC is further developed. The proposed estimators are nonparametric, which are consistent without imposing a covariate or error distribution, and are robust to heteroscedastic errors. Finite sample performance is examined via a simulation study. We apply the developed methods to data from the Radiation Effects Research Foundation, in which chromosome aberration is used to adjust for the effects of radiation dose measurement error on the estimation of radiation dose responses. PMID:29354018

  11. Using exogenous variables in testing for monotonic trends in hydrologic time series

    USGS Publications Warehouse

    Alley, William M.

    1988-01-01

    One approach that has been used in performing a nonparametric test for monotonic trend in a hydrologic time series consists of a two-stage analysis. First, a regression equation is estimated for the variable being tested as a function of an exogenous variable. A nonparametric trend test such as the Kendall test is then performed on the residuals from the equation. By analogy to stagewise regression and through Monte Carlo experiments, it is demonstrated that this approach will tend to underestimate the magnitude of the trend and to result in some loss in power as a result of ignoring the interaction between the exogenous variable and time. An alternative approach, referred to as the adjusted variable Kendall test, is demonstrated to generally have increased statistical power and to provide more reliable estimates of the trend slope. In addition, the utility of including an exogenous variable in a trend test is examined under selected conditions.

  12. Nonparametric estimation of stochastic differential equations with sparse Gaussian processes.

    PubMed

    García, Constantino A; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G

    2017-08-01

    The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that requires the use of Gaussian processes, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economy and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.

  13. A Monte Carlo Study of the Effect of Item Characteristic Curve Estimation on the Accuracy of Three Person-Fit Statistics

    ERIC Educational Resources Information Center

    St-Onge, Christina; Valois, Pierre; Abdous, Belkacem; Germain, Stephane

    2009-01-01

    To date, there have been no studies comparing parametric and nonparametric Item Characteristic Curve (ICC) estimation methods on the effectiveness of Person-Fit Statistics (PFS). The primary aim of this study was to determine if the use of ICCs estimated by nonparametric methods would increase the accuracy of item response theory-based PFS for…

  14. Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

    PubMed

    Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

    2012-01-01

    Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.

  15. Stochastic search, optimization and regression with energy applications

    NASA Astrophysics Data System (ADS)

    Hannah, Lauren A.

    Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage. The one stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DPGLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.

  16. About an adaptively weighted Kaplan-Meier estimate.

    PubMed

    Plante, Jean-François

    2009-09-01

    The minimum averaged mean squared error nonparametric adaptive weights use data from m possibly different populations to infer about one population of interest. The definition of these weights is based on the properties of the empirical distribution function. We use the Kaplan-Meier estimate to let the weights accommodate right-censored data and use them to define the weighted Kaplan-Meier estimate. The proposed estimate is smoother than the usual Kaplan-Meier estimate and converges uniformly in probability to the target distribution. Simulations show that the performances of the weighted Kaplan-Meier estimate on finite samples exceed that of the usual Kaplan-Meier estimate. A case study is also presented.

  17. Nonparametric estimates of drift and diffusion profiles via Fokker-Planck algebra.

    PubMed

    Lund, Steven P; Hubbard, Joseph B; Halter, Michael

    2014-11-06

    Diffusion processes superimposed upon deterministic motion play a key role in understanding and controlling the transport of matter, energy, momentum, and even information in physics, chemistry, material science, biology, and communications technology. Given functions defining these random and deterministic components, the Fokker-Planck (FP) equation is often used to model these diffusive systems. Many methods exist for estimating the drift and diffusion profiles from one or more identifiable diffusive trajectories; however, when many identical entities diffuse simultaneously, it may not be possible to identify individual trajectories. Here we present a method capable of simultaneously providing nonparametric estimates for both drift and diffusion profiles from evolving density profiles, requiring only the validity of Langevin/FP dynamics. This algebraic FP manipulation provides a flexible and robust framework for estimating stationary drift and diffusion coefficient profiles, is not based on fluctuation theory or solved diffusion equations, and may facilitate predictions for many experimental systems. We illustrate this approach on experimental data obtained from a model lipid bilayer system exhibiting free diffusion and electric field induced drift. The wide range over which this approach provides accurate estimates for drift and diffusion profiles is demonstrated through simulation.

  18. CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions

    EPA Pesticide Factsheets

    Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.

  19. The Highly Adaptive Lasso Estimator

    PubMed Central

    Benkeser, David; van der Laan, Mark

    2017-01-01

    Estimation of a regression functions is a common goal of statistical learning. We propose a novel nonparametric regression estimator that, in contrast to many existing methods, does not rely on local smoothness assumptions nor is it constructed using local smoothing techniques. Instead, our estimator respects global smoothness constraints by virtue of falling in a class of right-hand continuous functions with left-hand limits that have variation norm bounded by a constant. Using empirical process theory, we establish a fast minimal rate of convergence of our proposed estimator and illustrate how such an estimator can be constructed using standard software. In simulations, we show that the finite-sample performance of our estimator is competitive with other popular machine learning techniques across a variety of data generating mechanisms. We also illustrate competitive performance in real data examples using several publicly available data sets. PMID:29094111

  20. Nonparametric density estimation and optimal bandwidth selection for protein unfolding and unbinding data

    NASA Astrophysics Data System (ADS)

    Bura, E.; Zhmurov, A.; Barsegov, V.

    2009-01-01

    Dynamic force spectroscopy and steered molecular simulations have become powerful tools for analyzing the mechanical properties of proteins, and the strength of protein-protein complexes and aggregates. Probability density functions of the unfolding forces and unfolding times for proteins, and rupture forces and bond lifetimes for protein-protein complexes allow quantification of the forced unfolding and unbinding transitions, and mapping the biomolecular free energy landscape. The inference of the unknown probability distribution functions from the experimental and simulated forced unfolding and unbinding data, as well as the assessment of analytically tractable models of the protein unfolding and unbinding requires the use of a bandwidth. The choice of this quantity is typically subjective as it draws heavily on the investigator's intuition and past experience. We describe several approaches for selecting the "optimal bandwidth" for nonparametric density estimators, such as the traditionally used histogram and the more advanced kernel density estimators. The performance of these methods is tested on unimodal and multimodal skewed, long-tailed distributed data, as typically observed in force spectroscopy experiments and in molecular pulling simulations. The results of these studies can serve as a guideline for selecting the optimal bandwidth to resolve the underlying distributions from the forced unfolding and unbinding data for proteins.

  1. Constrained multiple indicator kriging using sequential quadratic programming

    NASA Astrophysics Data System (ADS)

    Soltani-Mohammadi, Saeed; Erhan Tercan, A.

    2012-11-01

    Multiple indicator kriging (MIK) is a nonparametric method used to estimate conditional cumulative distribution functions (CCDF). Indicator estimates produced by MIK may not satisfy the order relations of a valid CCDF which is ordered and bounded between 0 and 1. In this paper a new method has been presented that guarantees the order relations of the cumulative distribution functions estimated by multiple indicator kriging. The method is based on minimizing the sum of kriging variances for each cutoff under unbiasedness and order relations constraints and solving constrained indicator kriging system by sequential quadratic programming. A computer code is written in the Matlab environment to implement the developed algorithm and the method is applied to the thickness data.

  2. Nonparametric estimation of the heterogeneity of a random medium using compound Poisson process modeling of wave multiple scattering.

    PubMed

    Le Bihan, Nicolas; Margerin, Ludovic

    2009-07-01

    In this paper, we present a nonparametric method to estimate the heterogeneity of a random medium from the angular distribution of intensity of waves transmitted through a slab of random material. Our approach is based on the modeling of forward multiple scattering using compound Poisson processes on compact Lie groups. The estimation technique is validated through numerical simulations based on radiative transfer theory.

  3. Estimating and comparing microbial diversity in the presence of sequencing errors

    PubMed Central

    Chiu, Chun-Huo

    2016-01-01

    Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872

  4. Out-of-Sample Extensions for Non-Parametric Kernel Methods.

    PubMed

    Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang

    2017-02-01

    Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.

  5. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555

  6. Comparison of Parametric and Nonparametric Bootstrap Methods for Estimating Random Error in Equipercentile Equating

    ERIC Educational Resources Information Center

    Cui, Zhongmin; Kolen, Michael J.

    2008-01-01

    This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…

  7. A review of methods to estimate cause-specific mortality in presence of competing risks

    USGS Publications Warehouse

    Heisey, Dennis M.; Patterson, Brent R.

    2006-01-01

    Estimating cause-specific mortality is often of central importance for understanding the dynamics of wildlife populations. Despite such importance, methodology for estimating and analyzing cause-specific mortality has received little attention in wildlife ecology during the past 20 years. The issue of analyzing cause-specific, mutually exclusive events in time is not unique to wildlife. In fact, this general problem has received substantial attention in human biomedical applications within the context of biostatistical survival analysis. Here, we consider cause-specific mortality from a modern biostatistical perspective. This requires carefully defining what we mean by cause-specific mortality and then providing an appropriate hazard-based representation as a competing risks problem. This leads to the general solution of cause-specific mortality as the cumulative incidence function (CIF). We describe the appropriate generalization of the fully nonparametric staggered-entry Kaplan–Meier survival estimator to cause-specific mortality via the nonparametric CIF estimator (NPCIFE), which in many situations offers an attractive alternative to the Heisey–Fuller estimator. An advantage of the NPCIFE is that it lends itself readily to risk factors analysis with standard software for Cox proportional hazards model. The competing risks–based approach also clarifies issues regarding another intuitive but erroneous "cause-specific mortality" estimator based on the Kaplan–Meier survival estimator and commonly seen in the life sciences literature.

  8. A non-parametric consistency test of the ΛCDM model with Planck CMB data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aghamousa, Amir; Shafieloo, Arman; Hamann, Jan, E-mail: amir@aghamousa.com, E-mail: jan.hamann@unsw.edu.au, E-mail: shafieloo@kasi.re.kr

    Non-parametric reconstruction methods, such as Gaussian process (GP) regression, provide a model-independent way of estimating an underlying function and its uncertainty from noisy data. We demonstrate how GP-reconstruction can be used as a consistency test between a given data set and a specific model by looking for structures in the residuals of the data with respect to the model's best-fit. Applying this formalism to the Planck temperature and polarisation power spectrum measurements, we test their global consistency with the predictions of the base ΛCDM model. Our results do not show any serious inconsistencies, lending further support to the interpretation ofmore » the base ΛCDM model as cosmology's gold standard.« less

  9. Comparison of parametric and bootstrap method in bioequivalence test.

    PubMed

    Ahn, Byung-Jin; Yim, Dong-Seok

    2009-10-01

    The estimation of 90% parametric confidence intervals (CIs) of mean AUC and Cmax ratios in bioequivalence (BE) tests are based upon the assumption that formulation effects in log-transformed data are normally distributed. To compare the parametric CIs with those obtained from nonparametric methods we performed repeated estimation of bootstrap-resampled datasets. The AUC and Cmax values from 3 archived datasets were used. BE tests on 1,000 resampled datasets from each archived dataset were performed using SAS (Enterprise Guide Ver.3). Bootstrap nonparametric 90% CIs of formulation effects were then compared with the parametric 90% CIs of the original datasets. The 90% CIs of formulation effects estimated from the 3 archived datasets were slightly different from nonparametric 90% CIs obtained from BE tests on resampled datasets. Histograms and density curves of formulation effects obtained from resampled datasets were similar to those of normal distribution. However, in 2 of 3 resampled log (AUC) datasets, the estimates of formulation effects did not follow the Gaussian distribution. Bias-corrected and accelerated (BCa) CIs, one of the nonparametric CIs of formulation effects, shifted outside the parametric 90% CIs of the archived datasets in these 2 non-normally distributed resampled log (AUC) datasets. Currently, the 80~125% rule based upon the parametric 90% CIs is widely accepted under the assumption of normally distributed formulation effects in log-transformed data. However, nonparametric CIs may be a better choice when data do not follow this assumption.

  10. Comparison of Parametric and Bootstrap Method in Bioequivalence Test

    PubMed Central

    Ahn, Byung-Jin

    2009-01-01

    The estimation of 90% parametric confidence intervals (CIs) of mean AUC and Cmax ratios in bioequivalence (BE) tests are based upon the assumption that formulation effects in log-transformed data are normally distributed. To compare the parametric CIs with those obtained from nonparametric methods we performed repeated estimation of bootstrap-resampled datasets. The AUC and Cmax values from 3 archived datasets were used. BE tests on 1,000 resampled datasets from each archived dataset were performed using SAS (Enterprise Guide Ver.3). Bootstrap nonparametric 90% CIs of formulation effects were then compared with the parametric 90% CIs of the original datasets. The 90% CIs of formulation effects estimated from the 3 archived datasets were slightly different from nonparametric 90% CIs obtained from BE tests on resampled datasets. Histograms and density curves of formulation effects obtained from resampled datasets were similar to those of normal distribution. However, in 2 of 3 resampled log (AUC) datasets, the estimates of formulation effects did not follow the Gaussian distribution. Bias-corrected and accelerated (BCa) CIs, one of the nonparametric CIs of formulation effects, shifted outside the parametric 90% CIs of the archived datasets in these 2 non-normally distributed resampled log (AUC) datasets. Currently, the 80~125% rule based upon the parametric 90% CIs is widely accepted under the assumption of normally distributed formulation effects in log-transformed data. However, nonparametric CIs may be a better choice when data do not follow this assumption. PMID:19915699

  11. Estimation of option-implied risk-neutral into real-world density by using calibration function

    NASA Astrophysics Data System (ADS)

    Bahaludin, Hafizah; Abdullah, Mimi Hafizah

    2017-04-01

    Option prices contain crucial information that can be used as a reflection of future development of an underlying assets' price. The main objective of this study is to extract the risk-neutral density (RND) and the risk-world density (RWD) of option prices. A volatility function technique is applied by using a fourth order polynomial interpolation to obtain the RNDs. Then, a calibration function is used to convert the RNDs into RWDs. There are two types of calibration function which are parametric and non-parametric calibrations. The density is extracted from the Dow Jones Industrial Average (DJIA) index options with a one month constant maturity from January 2009 until December 2015. The performance of RNDs and RWDs extracted are evaluated by using a density forecasting test. This study found out that the RWDs obtain can provide an accurate information regarding the price of the underlying asset in future compared to that of the RNDs. In addition, empirical evidence suggests that RWDs from a non-parametric calibration has a better accuracy than other densities.

  12. Quadratic semiparametric Von Mises calculus

    PubMed Central

    Robins, James; Li, Lingling; Tchetgen, Eric

    2009-01-01

    We discuss a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on U-statistics constructed from quadratic influence functions. The latter extend ordinary linear influence functions of the parameter of interest as defined in semiparametric theory, and represent second order derivatives of this parameter. For parameters for which the matching cannot be perfect the method leads to a bias-variance trade-off, and results in estimators that converge at a slower than n–1/2-rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at n–1/2-rate. PMID:23087487

  13. Smooth conditional distribution function and quantiles under random censorship.

    PubMed

    Leconte, Eve; Poiraud-Casanova, Sandrine; Thomas-Agnan, Christine

    2002-09-01

    We consider a nonparametric random design regression model in which the response variable is possibly right censored. The aim of this paper is to estimate the conditional distribution function and the conditional alpha-quantile of the response variable. We restrict attention to the case where the response variable as well as the explanatory variable are unidimensional and continuous. We propose and discuss two classes of estimators which are smooth with respect to the response variable as well as to the covariate. Some simulations demonstrate that the new methods have better mean square error performances than the generalized Kaplan-Meier estimator introduced by Beran (1981) and considered in the literature by Dabrowska (1989, 1992) and Gonzalez-Manteiga and Cadarso-Suarez (1994).

  14. Nonparametric Bayesian models for a spatial covariance.

    PubMed

    Reich, Brian J; Fuentes, Montserrat

    2012-01-01

    A crucial step in the analysis of spatial data is to estimate the spatial correlation function that determines the relationship between a spatial process at two locations. The standard approach to selecting the appropriate correlation function is to use prior knowledge or exploratory analysis, such as a variogram analysis, to select the correct parametric correlation function. Rather that selecting a particular parametric correlation function, we treat the covariance function as an unknown function to be estimated from the data. We propose a flexible prior for the correlation function to provide robustness to the choice of correlation function. We specify the prior for the correlation function using spectral methods and the Dirichlet process prior, which is a common prior for an unknown distribution function. Our model does not require Gaussian data or spatial locations on a regular grid. The approach is demonstrated using a simulation study as well as an analysis of California air pollution data.

  15. Quantal Response: Nonparametric Modeling

    DTIC Science & Technology

    2017-01-01

    DATES COVERED (From ‐ To) 4. TITLE AND SUBTITLE    5a. CONTRACT NUMBER  5b. GRANT NUMBER  5c. PROGRAM  ELEMENT  NUMBER 6. AUTHOR(S)    5d...creasing function as P(x) = G ( f (x) ) , where G is a monotone function such as the standard logistic, normal, or Cauchy CDF. Finite -dimensional...examples with dimension k = 5 where various colors distinguish the basis elements . Figure 3 shows logistic response estimates for these 3 basis sets

  16. A survey of kernel-type estimators for copula and their applications

    NASA Astrophysics Data System (ADS)

    Sumarjaya, I. W.

    2017-10-01

    Copulas have been widely used to model nonlinear dependence structure. Main applications of copulas include areas such as finance, insurance, hydrology, rainfall to name but a few. The flexibility of copula allows researchers to model dependence structure beyond Gaussian distribution. Basically, a copula is a function that couples multivariate distribution functions to their one-dimensional marginal distribution functions. In general, there are three methods to estimate copula. These are parametric, nonparametric, and semiparametric method. In this article we survey kernel-type estimators for copula such as mirror reflection kernel, beta kernel, transformation method and local likelihood transformation method. Then, we apply these kernel methods to three stock indexes in Asia. The results of our analysis suggest that, albeit variation in information criterion values, the local likelihood transformation method performs better than the other kernel methods.

  17. Constructing a cosmological model-independent Hubble diagram of type Ia supernovae with cosmic chronometers

    NASA Astrophysics Data System (ADS)

    Li, Zhengxiang; Gonzalez, J. E.; Yu, Hongwei; Zhu, Zong-Hong; Alcaniz, J. S.

    2016-02-01

    We apply two methods, i.e., the Gaussian processes and the nonparametric smoothing procedure, to reconstruct the Hubble parameter H (z ) as a function of redshift from 15 measurements of the expansion rate obtained from age estimates of passively evolving galaxies. These reconstructions enable us to derive the luminosity distance to a certain redshift z , calibrate the light-curve fitting parameters accounting for the (unknown) intrinsic magnitude of type Ia supernova (SNe Ia), and construct cosmological model-independent Hubble diagrams of SNe Ia. In order to test the compatibility between the reconstructed functions of H (z ), we perform a statistical analysis considering the latest SNe Ia sample, the so-called joint light-curve compilation. We find that, for the Gaussian processes, the reconstructed functions of Hubble parameter versus redshift, and thus the following analysis on SNe Ia calibrations and cosmological implications, are sensitive to prior mean functions. However, for the nonparametric smoothing method, the reconstructed functions are not dependent on initial guess models, and consistently require high values of H0, which are in excellent agreement with recent measurements of this quantity from Cepheids and other local distance indicators.

  18. The non-parametric Parzen's window in stereo vision matching.

    PubMed

    Pajares, G; de la Cruz, J

    2002-01-01

    This paper presents an approach to the local stereovision matching problem using edge segments as features with four attributes. From these attributes we compute a matching probability between pairs of features of the stereo images. A correspondence is said true when such a probability is maximum. We introduce a nonparametric strategy based on Parzen's window (1962) to estimate a probability density function (PDF) which is used to obtain the matching probability. This is the main finding of the paper. A comparative analysis of other recent matching methods is included to show that this finding can be justified theoretically. A generalization of the proposed method is made in order to give guidelines about its use with the similarity constraint and also in different environments where other features and attributes are more suitable.

  19. Robust, Adaptive Functional Regression in Functional Mixed Model Framework.

    PubMed

    Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S

    2011-09-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.

  20. Robust, Adaptive Functional Regression in Functional Mixed Model Framework

    PubMed Central

    Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.

    2012-01-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets. PMID:22308015

  1. Conditional Covariance-Based Nonparametric Multidimensionality Assessment.

    ERIC Educational Resources Information Center

    Stout, William; And Others

    1996-01-01

    Three nonparametric procedures that use estimates of covariances of item-pair responses conditioned on examinee trait level for assessing dimensionality of a test are described. The HCA/CCPROX, DIMTEST, and DETECT are applied to a dimensionality study of the Law School Admission Test. (SLD)

  2. A simulation study of nonparametric total deviation index as a measure of agreement based on quantile regression.

    PubMed

    Lin, Lawrence; Pan, Yi; Hedayat, A S; Barnhart, Huiman X; Haber, Michael

    2016-01-01

    Total deviation index (TDI) captures a prespecified quantile of the absolute deviation of paired observations from raters, observers, methods, assays, instruments, etc. We compare the performance of TDI using nonparametric quantile regression to the TDI assuming normality (Lin, 2000). This simulation study considers three distributions: normal, Poisson, and uniform at quantile levels of 0.8 and 0.9 for cases with and without contamination. Study endpoints include the bias of TDI estimates (compared with their respective theoretical values), standard error of TDI estimates (compared with their true simulated standard errors), and test size (compared with 0.05), and power. Nonparametric TDI using quantile regression, although it slightly underestimates and delivers slightly less power for data without contamination, works satisfactorily under all simulated cases even for moderate (say, ≥40) sample sizes. The performance of the TDI based on a quantile of 0.8 is in general superior to that of 0.9. The performances of nonparametric and parametric TDI methods are compared with a real data example. Nonparametric TDI can be very useful when the underlying distribution on the difference is not normal, especially when it has a heavy tail.

  3. Model risk for European-style stock index options.

    PubMed

    Gençay, Ramazan; Gibson, Rajna

    2007-01-01

    In empirical modeling, there have been two strands for pricing in the options literature, namely the parametric and nonparametric models. Often, the support for the nonparametric methods is based on a benchmark such as the Black-Scholes (BS) model with constant volatility. In this paper, we study the stochastic volatility (SV) and stochastic volatility random jump (SVJ) models as parametric benchmarks against feedforward neural network (FNN) models, a class of neural network models. Our choice for FNN models is due to their well-studied universal approximation properties of an unknown function and its partial derivatives. Since the partial derivatives of an option pricing formula are risk pricing tools, an accurate estimation of the unknown option pricing function is essential for pricing and hedging. Our findings indicate that FNN models offer themselves as robust option pricing tools, over their sophisticated parametric counterparts in predictive settings. There are two routes to explain the superiority of FNN models over the parametric models in forecast settings. These are nonnormality of return distributions and adaptive learning.

  4. Maximum Marginal Likelihood Estimation of a Monotonic Polynomial Generalized Partial Credit Model with Applications to Multiple Group Analysis.

    PubMed

    Falk, Carl F; Cai, Li

    2016-06-01

    We present a semi-parametric approach to estimating item response functions (IRF) useful when the true IRF does not strictly follow commonly used functions. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial. The model includes the regular generalized partial credit model at the lowest order polynomial. Our approach extends Liang's (A semi-parametric approach to estimate IRFs, Unpublished doctoral dissertation, 2007) method for dichotomous item responses to the case of polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock-Aitkin EM algorithm, thereby facilitating multiple group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach to more standard IRF estimation approaches and other non-parametric and semi-parametric alternatives.

  5. A Hybrid Index for Characterizing Drought Based on a Nonparametric Kernel Estimator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Shengzhi; Huang, Qiang; Leng, Guoyong

    This study develops a nonparametric multivariate drought index, namely, the Nonparametric Multivariate Standardized Drought Index (NMSDI), by considering the variations of both precipitation and streamflow. Building upon previous efforts in constructing Nonparametric Multivariate Drought Index, we use the nonparametric kernel estimator to derive the joint distribution of precipitation and streamflow, thus providing additional insights in drought index development. The proposed NMSDI are applied in the Wei River Basin (WRB), based on which the drought evolution characteristics are investigated. Results indicate: (1) generally, NMSDI captures the drought onset similar to Standardized Precipitation Index (SPI) and drought termination and persistence similar tomore » Standardized Streamflow Index (SSFI). The drought events identified by NMSDI match well with historical drought records in the WRB. The performances are also consistent with that by an existing Multivariate Standardized Drought Index (MSDI) at various timescales, confirming the validity of the newly constructed NMSDI in drought detections (2) An increasing risk of drought has been detected for the past decades, and will be persistent to a certain extent in future in most areas of the WRB; (3) the identified change points of annual NMSDI are mainly concentrated in the early 1970s and middle 1990s, coincident with extensive water use and soil reservation practices. This study highlights the nonparametric multivariable drought index, which can be used for drought detections and predictions efficiently and comprehensively.« less

  6. Why preferring parametric forecasting to nonparametric methods?

    PubMed

    Jabot, Franck

    2015-05-07

    A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the fitted to data parametric model. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.

  7. Essays on parametric and nonparametric modeling and estimation with applications to energy economics

    NASA Astrophysics Data System (ADS)

    Gao, Weiyu

    My dissertation research is composed of two parts: a theoretical part on semiparametric efficient estimation and an applied part in energy economics under different dynamic settings. The essays are related in terms of their applications as well as the way in which models are constructed and estimated. In the first essay, efficient estimation of the partially linear model is studied. We work out the efficient score functions and efficiency bounds under four stochastic restrictions---independence, conditional symmetry, conditional zero mean, and partially conditional zero mean. A feasible efficient estimation method for the linear part of the model is developed based on the efficient score. A battery of specification test that allows for choosing between the alternative assumptions is provided. A Monte Carlo simulation is also conducted. The second essay presents a dynamic optimization model for a stylized oilfield resembling the largest developed light oil field in Saudi Arabia, Ghawar. We use data from different sources to estimate the oil production cost function and the revenue function. We pay particular attention to the dynamic aspect of the oil production by employing petroleum-engineering software to simulate the interaction between control variables and reservoir state variables. Optimal solutions are studied under different scenarios to account for the possible changes in the exogenous variables and the uncertainty about the forecasts. The third essay examines the effect of oil price volatility on the level of innovation displayed by the U.S. economy. A measure of innovation is calculated by decomposing an output-based Malmquist index. We also construct a nonparametric measure for oil price volatility. Technical change and oil price volatility are then placed in a VAR system with oil price and a variable indicative of monetary policy. The system is estimated and analyzed for significant relationships. We find that oil price volatility displays a significant negative effect on innovation. A key point of this analysis lies in the fact that we impose no functional forms for technologies and the methods employed keep technical assumptions to a minimum.

  8. Bayesian deconvolution of [corrected] fMRI data using bilinear dynamical systems.

    PubMed

    Makni, Salima; Beckmann, Christian; Smith, Steve; Woolrich, Mark

    2008-10-01

    In Penny et al. [Penny, W., Ghahramani, Z., Friston, K.J. 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360(1457) 983-993], a particular case of the Linear Dynamical Systems (LDSs) was used to model the dynamic behavior of the BOLD response in functional MRI. This state-space model, called bilinear dynamical system (BDS), is used to deconvolve the fMRI time series in order to estimate the neuronal response induced by the different stimuli of the experimental paradigm. The BDS model parameters are estimated using an expectation-maximization (EM) algorithm proposed by Ghahramani and Hinton [Ghahramani, Z., Hinton, G.E. 1996. Parameter Estimation for Linear Dynamical Systems. Technical Report, Department of Computer Science, University of Toronto]. In this paper we introduce modifications to the BDS model in order to explicitly model the spatial variations of the haemodynamic response function (HRF) in the brain using a non-parametric approach. While in Penny et al. [Penny, W., Ghahramani, Z., Friston, K.J. 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360(1457) 983-993] the relationship between neuronal activation and fMRI signals is formulated as a first-order convolution with a kernel expansion using basis functions (typically two or three), in this paper, we argue in favor of a spatially adaptive GLM in which a local non-parametric estimation of the HRF is performed. Furthermore, in order to overcome the overfitting problem typically associated with simple EM estimates, we propose a full Variational Bayes (VB) solution to infer the BDS model parameters. We demonstrate the usefulness of our model which is able to estimate both the neuronal activity and the haemodynamic response function in every voxel of the brain. We first examine the behavior of this approach when applied to simulated data with different temporal and noise features. As an example we will show how this method can be used to improve interpretability of estimates from an independent component analysis (ICA) analysis of fMRI data. We finally demonstrate its use on real fMRI data in one slice of the brain.

  9. The binned bispectrum estimator: template-based and non-parametric CMB non-Gaussianity searches

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bucher, Martin; Racine, Benjamin; Tent, Bartjan van, E-mail: bucher@apc.univ-paris7.fr, E-mail: benjar@uio.no, E-mail: vantent@th.u-psud.fr

    2016-05-01

    We describe the details of the binned bispectrum estimator as used for the official 2013 and 2015 analyses of the temperature and polarization CMB maps from the ESA Planck satellite. The defining aspect of this estimator is the determination of a map bispectrum (3-point correlation function) that has been binned in harmonic space. For a parametric determination of the non-Gaussianity in the map (the so-called f NL parameters), one takes the inner product of this binned bispectrum with theoretically motivated templates. However, as a complementary approach one can also smooth the binned bispectrum using a variable smoothing scale in ordermore » to suppress noise and make coherent features stand out above the noise. This allows one to look in a model-independent way for any statistically significant bispectral signal. This approach is useful for characterizing the bispectral shape of the galactic foreground emission, for which a theoretical prediction of the bispectral anisotropy is lacking, and for detecting a serendipitous primordial signal, for which a theoretical template has not yet been put forth. Both the template-based and the non-parametric approaches are described in this paper.« less

  10. Experimental determination of frequency response function estimates for flexible joint industrial manipulators with serial kinematics

    NASA Astrophysics Data System (ADS)

    Saupe, Florian; Knoblach, Andreas

    2015-02-01

    Two different approaches for the determination of frequency response functions (FRFs) are used for the non-parametric closed loop identification of a flexible joint industrial manipulator with serial kinematics. The two applied experiment designs are based on low power multisine and high power chirp excitations. The main challenge is to eliminate disturbances of the FRF estimates caused by the numerous nonlinearities of the robot. For the experiment design based on chirp excitations, a simple iterative procedure is proposed which allows exploiting the good crest factor of chirp signals in a closed loop setup. An interesting synergy of the two approaches, beyond validation purposes, is pointed out.

  11. Geostatistical radar-raingauge combination with nonparametric correlograms: methodological considerations and application in Switzerland

    NASA Astrophysics Data System (ADS)

    Schiemann, R.; Erdin, R.; Willi, M.; Frei, C.; Berenguer, M.; Sempere-Torres, D.

    2011-05-01

    Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semivariogram models are fit from available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data fields. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial field under consideration, they yield an estimate of the semivariogram that is unbiased for small lag distances. This justifies the use of this estimation technique in geostatistical applications. Various formulations of geostatistical combination (Kriging) methods are used here for the construction of hourly precipitation grids for Switzerland based on data from a sparse realtime network of raingauges and from a spatially complete radar composite. Two variants of Ordinary Kriging (OK) are used to interpolate the sparse gauge observations. In both OK variants, the radar data are only used to determine the semivariogram model. One variant relies on a traditional parametric semivariogram estimate, whereas the other variant uses the nonparametric correlogram. The variants are tested for three cases and the impact of the semivariogram model on the Kriging prediction is illustrated. For the three test cases, the method using nonparametric correlograms performs equally well or better than the traditional method, and at the same time offers great practical advantages. Furthermore, two variants of Kriging with external drift (KED) are tested, both of which use the radar data to estimate nonparametric correlograms, and as the external drift variable. The first KED variant has been used previously for geostatistical radar-raingauge merging in Catalonia (Spain). The second variant is newly proposed here and is an extension of the first. Both variants are evaluated for the three test cases as well as an extended evaluation period. It is found that both methods yield merged fields of better quality than the original radar field or fields obtained by OK of gauge data. The newly suggested KED formulation is shown to be beneficial, in particular in mountainous regions where the quality of the Swiss radar composite is comparatively low. An analysis of the Kriging variances shows that none of the methods tested here provides a satisfactory uncertainty estimate. A suitable variable transformation is expected to improve this.

  12. Geostatistical radar-raingauge combination with nonparametric correlograms: methodological considerations and application in Switzerland

    NASA Astrophysics Data System (ADS)

    Schiemann, R.; Erdin, R.; Willi, M.; Frei, C.; Berenguer, M.; Sempere-Torres, D.

    2010-09-01

    Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semivariogram models are fit from available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data fields. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial field under consideration, they yield an estimate of the semivariogram that is unbiased for small lag distances. This justifies the use of this estimation technique in geostatistical applications. Various formulations of geostatistical combination (Kriging) methods are used here for the construction of hourly precipitation grids for Switzerland based on data from a sparse realtime network of raingauges and from a spatially complete radar composite. Two variants of Ordinary Kriging (OK) are used to interpolate the sparse gauge observations. In both OK variants, the radar data are only used to determine the semivariogram model. One variant relies on a traditional parametric semivariogram estimate, whereas the other variant uses the nonparametric correlogram. The variants are tested for three cases and the impact of the semivariogram model on the Kriging prediction is illustrated. For the three test cases, the method using nonparametric correlograms performs equally well or better than the traditional method, and at the same time offers great practical advantages. Furthermore, two variants of Kriging with external drift (KED) are tested, both of which use the radar data to estimate nonparametric correlograms, and as the external drift variable. The first KED variant has been used previously for geostatistical radar-raingauge merging in Catalonia (Spain). The second variant is newly proposed here and is an extension of the first. Both variants are evaluated for the three test cases as well as an extended evaluation period. It is found that both methods yield merged fields of better quality than the original radar field or fields obtained by OK of gauge data. The newly suggested KED formulation is shown to be beneficial, in particular in mountainous regions where the quality of the Swiss radar composite is comparatively low. An analysis of the Kriging variances shows that none of the methods tested here provides a satisfactory uncertainty estimate. A suitable variable transformation is expected to improve this.

  13. A parametric interpretation of Bayesian Nonparametric Inference from Gene Genealogies: Linking ecological, population genetics and evolutionary processes.

    PubMed

    Ponciano, José Miguel

    2017-11-22

    Using a nonparametric Bayesian approach Palacios and Minin (2013) dramatically improved the accuracy, precision of Bayesian inference of population size trajectories from gene genealogies. These authors proposed an extension of a Gaussian Process (GP) nonparametric inferential method for the intensity function of non-homogeneous Poisson processes. They found that not only the statistical properties of the estimators were improved with their method, but also, that key aspects of the demographic histories were recovered. The authors' work represents the first Bayesian nonparametric solution to this inferential problem because they specify a convenient prior belief without a particular functional form on the population trajectory. Their approach works so well and provides such a profound understanding of the biological process, that the question arises as to how truly "biology-free" their approach really is. Using well-known concepts of stochastic population dynamics, here I demonstrate that in fact, Palacios and Minin's GP model can be cast as a parametric population growth model with density dependence and environmental stochasticity. Making this link between population genetics and stochastic population dynamics modeling provides novel insights into eliciting biologically meaningful priors for the trajectory of the effective population size. The results presented here also bring novel understanding of GP as models for the evolution of a trait. Thus, the ecological principles foundation of Palacios and Minin (2013)'s prior adds to the conceptual and scientific value of these authors' inferential approach. I conclude this note by listing a series of insights brought about by this connection with Ecology. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.

  14. Estimation from PET data of transient changes in dopamine concentration induced by alcohol: support for a non-parametric signal estimation method

    NASA Astrophysics Data System (ADS)

    Constantinescu, C. C.; Yoder, K. K.; Kareken, D. A.; Bouman, C. A.; O'Connor, S. J.; Normandin, M. D.; Morris, E. D.

    2008-03-01

    We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest & activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves (FDA(t)) and the change in binding potential (ΔBP). The veracity of the FDA(t) curves recovered by nonparametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) ΔBP should decline with increasing DA peak time, (2) ΔBP should increase as the strength of the temporal correlation between FDA(t) and the free raclopride (FRAC(t)) curve increases, (3) ΔBP should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [11C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, nonparametric ntPET was applied to recover FDA(t), and binding potential values were determined. Kendall rank-correlation analysis confirmed that the FDA(t) data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of FDA(t). Application of nonparametric ntPET may yield important insights into how alterations in timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.

  15. Unveiling acoustic physics of the CMB using nonparametric estimation of the temperature angular power spectrum for Planck

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aghamousa, Amir; Shafieloo, Arman; Arjunwadkar, Mihir

    2015-02-01

    Estimation of the angular power spectrum is one of the important steps in Cosmic Microwave Background (CMB) data analysis. Here, we present a nonparametric estimate of the temperature angular power spectrum for the Planck 2013 CMB data. The method implemented in this work is model-independent, and allows the data, rather than the model, to dictate the fit. Since one of the main targets of our analysis is to test the consistency of the ΛCDM model with Planck 2013 data, we use the nuisance parameters associated with the best-fit ΛCDM angular power spectrum to remove foreground contributions from the data atmore » multipoles ℓ ≥50. We thus obtain a combined angular power spectrum data set together with the full covariance matrix, appropriately weighted over frequency channels. Our subsequent nonparametric analysis resolves six peaks (and five dips) up to ℓ ∼1850 in the temperature angular power spectrum. We present uncertainties in the peak/dip locations and heights at the 95% confidence level. We further show how these reflect the harmonicity of acoustic peaks, and can be used for acoustic scale estimation. Based on this nonparametric formalism, we found the best-fit ΛCDM model to be at 36% confidence distance from the center of the nonparametric confidence set—this is considerably larger than the confidence distance (9%) derived earlier from a similar analysis of the WMAP 7-year data. Another interesting result of our analysis is that at low multipoles, the Planck data do not suggest any upturn, contrary to the expectation based on the integrated Sachs-Wolfe contribution in the best-fit ΛCDM cosmology.« less

  16. Statistical Models and Inference Procedures for Structural and Materials Reliability

    DTIC Science & Technology

    1990-12-01

    as an official Department of the Army positio~n, policy, or decision, unless sD designated by other documentazion. 12a. DISTRIBUTION /AVAILABILITY...Some general stress-strength models were also developed and applied to the failure of systems subject to cyclic loading. Involved in the failure of...process control ideas and sequential design and analysis methods. Finally, smooth nonparametric quantile .wJ function estimators were studied. All of

  17. Empirical estimation of a distribution function with truncated and doubly interval-censored data and its application to AIDS studies.

    PubMed

    Sun, J

    1995-09-01

    In this paper we discuss the non-parametric estimation of a distribution function based on incomplete data for which the measurement origin of a survival time or the date of enrollment in a study is known only to belong to an interval. Also the survival time of interest itself is observed from a truncated distribution and is known only to lie in an interval. To estimate the distribution function, a simple self-consistency algorithm, a generalization of Turnbull's (1976, Journal of the Royal Statistical Association, Series B 38, 290-295) self-consistency algorithm, is proposed. This method is then used to analyze two AIDS cohort studies, for which direct use of the EM algorithm (Dempster, Laird and Rubin, 1976, Journal of the Royal Statistical Association, Series B 39, 1-38), which is computationally complicated, has previously been the usual method of the analysis.

  18. A Frequency-Domain Multipath Parameter Estimation and Mitigation Method for BOC-Modulated GNSS Signals

    PubMed Central

    Sun, Chao; Feng, Wenquan; Du, Songlin

    2018-01-01

    As multipath is one of the dominating error sources for high accuracy Global Navigation Satellite System (GNSS) applications, multipath mitigation approaches are employed to minimize this hazardous error in receivers. Binary offset carrier modulation (BOC), as a modernized signal structure, is adopted to achieve significant enhancement. However, because of its multi-peak autocorrelation function, conventional multipath mitigation techniques for binary phase shift keying (BPSK) signal would not be optimal. Currently, non-parametric and parametric approaches have been studied specifically aiming at multipath mitigation for BOC signals. Non-parametric techniques, such as Code Correlation Reference Waveforms (CCRW), usually have good feasibility with simple structures, but suffer from low universal applicability for different BOC signals. Parametric approaches can thoroughly eliminate multipath error by estimating multipath parameters. The problems with this category are at the high computation complexity and vulnerability to the noise. To tackle the problem, we present a practical parametric multipath estimation method in the frequency domain for BOC signals. The received signal is transferred to the frequency domain to separate out the multipath channel transfer function for multipath parameter estimation. During this process, we take the operations of segmentation and averaging to reduce both noise effect and computational load. The performance of the proposed method is evaluated and compared with the previous work in three scenarios. Results indicate that the proposed averaging-Fast Fourier Transform (averaging-FFT) method achieves good robustness in severe multipath environments with lower computational load for both low-order and high-order BOC signals. PMID:29495589

  19. A comparative study between nonlinear regression and nonparametric approaches for modelling Phalaris paradoxa seedling emergence

    USDA-ARS?s Scientific Manuscript database

    Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...

  20. A robust nonparametric framework for reconstruction of stochastic differential equation models

    NASA Astrophysics Data System (ADS)

    Rajabzadeh, Yalda; Rezaie, Amir Hossein; Amindavar, Hamidreza

    2016-05-01

    In this paper, we employ a nonparametric framework to robustly estimate the functional forms of drift and diffusion terms from discrete stationary time series. The proposed method significantly improves the accuracy of the parameter estimation. In this framework, drift and diffusion coefficients are modeled through orthogonal Legendre polynomials. We employ the least squares regression approach along with the Euler-Maruyama approximation method to learn coefficients of stochastic model. Next, a numerical discrete construction of mean squared prediction error (MSPE) is established to calculate the order of Legendre polynomials in drift and diffusion terms. We show numerically that the new method is robust against the variation in sample size and sampling rate. The performance of our method in comparison with the kernel-based regression (KBR) method is demonstrated through simulation and real data. In case of real dataset, we test our method for discriminating healthy electroencephalogram (EEG) signals from epilepsy ones. We also demonstrate the efficiency of the method through prediction in the financial data. In both simulation and real data, our algorithm outperforms the KBR method.

  1. Bayesian dynamic mediation analysis.

    PubMed

    Huang, Jing; Yuan, Ying

    2017-12-01

    Most existing methods for mediation analysis assume that mediation is a stationary, time-invariant process, which overlooks the inherently dynamic nature of many human psychological processes and behavioral activities. In this article, we consider mediation as a dynamic process that continuously changes over time. We propose Bayesian multilevel time-varying coefficient models to describe and estimate such dynamic mediation effects. By taking the nonparametric penalized spline approach, the proposed method is flexible and able to accommodate any shape of the relationship between time and mediation effects. Simulation studies show that the proposed method works well and faithfully reflects the true nature of the mediation process. By modeling mediation effect nonparametrically as a continuous function of time, our method provides a valuable tool to help researchers obtain a more complete understanding of the dynamic nature of the mediation process underlying psychological and behavioral phenomena. We also briefly discuss an alternative approach of using dynamic autoregressive mediation model to estimate the dynamic mediation effect. The computer code is provided to implement the proposed Bayesian dynamic mediation analysis. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  2. SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES

    PubMed Central

    Zhu, Liping; Huang, Mian; Li, Runze

    2012-01-01

    This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536

  3. A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data.

    PubMed

    Ye, Xin; Wang, Ke; Zou, Yajie; Lord, Dominique

    2018-01-01

    This paper develops a semi-nonparametric Poisson regression model to analyze motor vehicle crash frequency data collected from rural multilane highway segments in California, US. Motor vehicle crash frequency on rural highway is a topic of interest in the area of transportation safety due to higher driving speeds and the resultant severity level. Unlike the traditional Negative Binomial (NB) model, the semi-nonparametric Poisson regression model can accommodate an unobserved heterogeneity following a highly flexible semi-nonparametric (SNP) distribution. Simulation experiments are conducted to demonstrate that the SNP distribution can well mimic a large family of distributions, including normal distributions, log-gamma distributions, bimodal and trimodal distributions. Empirical estimation results show that such flexibility offered by the SNP distribution can greatly improve model precision and the overall goodness-of-fit. The semi-nonparametric distribution can provide a better understanding of crash data structure through its ability to capture potential multimodality in the distribution of unobserved heterogeneity. When estimated coefficients in empirical models are compared, SNP and NB models are found to have a substantially different coefficient for the dummy variable indicating the lane width. The SNP model with better statistical performance suggests that the NB model overestimates the effect of lane width on crash frequency reduction by 83.1%.

  4. Genetic Algorithm Based Framework for Automation of Stochastic Modeling of Multi-Season Streamflows

    NASA Astrophysics Data System (ADS)

    Srivastav, R. K.; Srinivasan, K.; Sudheer, K.

    2009-05-01

    Synthetic streamflow data generation involves the synthesis of likely streamflow patterns that are statistically indistinguishable from the observed streamflow data. The various kinds of stochastic models adopted for multi-season streamflow generation in hydrology are: i) parametric models which hypothesize the form of the periodic dependence structure and the distributional form a priori (examples are PAR, PARMA); disaggregation models that aim to preserve the correlation structure at the periodic level and the aggregated annual level; ii) Nonparametric models (examples are bootstrap/kernel based methods), which characterize the laws of chance, describing the stream flow process, without recourse to prior assumptions as to the form or structure of these laws; (k-nearest neighbor (k-NN), matched block bootstrap (MABB)); non-parametric disaggregation model. iii) Hybrid models which blend both parametric and non-parametric models advantageously to model the streamflows effectively. Despite many of these developments that have taken place in the field of stochastic modeling of streamflows over the last four decades, accurate prediction of the storage and the critical drought characteristics has been posing a persistent challenge to the stochastic modeler. This is partly because, usually, the stochastic streamflow model parameters are estimated by minimizing a statistically based objective function (such as maximum likelihood (MLE) or least squares (LS) estimation) and subsequently the efficacy of the models is being validated based on the accuracy of prediction of the estimates of the water-use characteristics, which requires large number of trial simulations and inspection of many plots and tables. Still accurate prediction of the storage and the critical drought characteristics may not be ensured. In this study a multi-objective optimization framework is proposed to find the optimal hybrid model (blend of a simple parametric model, PAR(1) model and matched block bootstrap (MABB) ) based on the explicit objective functions of minimizing the relative bias and relative root mean square error in estimating the storage capacity of the reservoir. The optimal parameter set of the hybrid model is obtained based on the search over a multi- dimensional parameter space (involving simultaneous exploration of the parametric (PAR(1)) as well as the non-parametric (MABB) components). This is achieved using the efficient evolutionary search based optimization tool namely, non-dominated sorting genetic algorithm - II (NSGA-II). This approach helps in reducing the drudgery involved in the process of manual selection of the hybrid model, in addition to predicting the basic summary statistics dependence structure, marginal distribution and water-use characteristics accurately. The proposed optimization framework is used to model the multi-season streamflows of River Beaver and River Weber of USA. In case of both the rivers, the proposed GA-based hybrid model yields a much better prediction of the storage capacity (where simultaneous exploration of both parametric and non-parametric components is done) when compared with the MLE-based hybrid models (where the hybrid model selection is done in two stages, thus probably resulting in a sub-optimal model). This framework can be further extended to include different linear/non-linear hybrid stochastic models at other temporal and spatial scales as well.

  5. Probit vs. semi-nonparametric estimation: examining the role of disability on institutional entry for older adults.

    PubMed

    Sharma, Andy

    2017-06-01

    The purpose of this study was to showcase an advanced methodological approach to model disability and institutional entry. Both of these are important areas to investigate given the on-going aging of the United States population. By 2020, approximately 15% of the population will be 65 years and older. Many of these older adults will experience disability and require formal care. A probit analysis was employed to determine which disabilities were associated with admission into an institution (i.e. long-term care). Since this framework imposes strong distributional assumptions, misspecification leads to inconsistent estimators. To overcome such a short-coming, this analysis extended the probit framework by employing an advanced semi-nonparamertic maximum likelihood estimation utilizing Hermite polynomial expansions. Specification tests show semi-nonparametric estimation is preferred over probit. In terms of the estimates, semi-nonparametric ratios equal 42 for cognitive difficulty, 64 for independent living, and 111 for self-care disability while probit yields much smaller estimates of 19, 30, and 44, respectively. Public health professionals can use these results to better understand why certain interventions have not shown promise. Equally important, healthcare workers can use this research to evaluate which type of treatment plans may delay institutionalization and improve the quality of life for older adults. Implications for rehabilitation With on-going global aging, understanding the association between disability and institutional entry is important in devising successful rehabilitation interventions. Semi-nonparametric is preferred to probit and shows ambulatory and cognitive impairments present high risk for institutional entry (long-term care). Informal caregiving and home-based care require further examination as forms of rehabilitation/therapy for certain types of disabilities.

  6. Sieve estimation of Cox models with latent structures.

    PubMed

    Cao, Yongxiu; Huang, Jian; Liu, Yanyan; Zhao, Xingqiu

    2016-12-01

    This article considers sieve estimation in the Cox model with an unknown regression structure based on right-censored data. We propose a semiparametric pursuit method to simultaneously identify and estimate linear and nonparametric covariate effects based on B-spline expansions through a penalized group selection method with concave penalties. We show that the estimators of the linear effects and the nonparametric component are consistent. Furthermore, we establish the asymptotic normality of the estimator of the linear effects. To compute the proposed estimators, we develop a modified blockwise majorization descent algorithm that is efficient and easy to implement. Simulation studies demonstrate that the proposed method performs well in finite sample situations. We also use the primary biliary cirrhosis data to illustrate its application. © 2016, The International Biometric Society.

  7. Local linear regression for function learning: an analysis based on sample discrepancy.

    PubMed

    Cervellera, Cristiano; Macciò, Danilo

    2014-11-01

    Local linear regression models, a kind of nonparametric structures that locally perform a linear estimation of the target function, are analyzed in the context of empirical risk minimization (ERM) for function learning. The analysis is carried out with emphasis on geometric properties of the available data. In particular, the discrepancy of the observation points used both to build the local regression models and compute the empirical risk is considered. This allows to treat indifferently the case in which the samples come from a random external source and the one in which the input space can be freely explored. Both consistency of the ERM procedure and approximating capabilities of the estimator are analyzed, proving conditions to ensure convergence. Since the theoretical analysis shows that the estimation improves as the discrepancy of the observation points becomes smaller, low-discrepancy sequences, a family of sampling methods commonly employed for efficient numerical integration, are also analyzed. Simulation results involving two different examples of function learning are provided.

  8. Granger causality revisited

    PubMed Central

    Friston, Karl J.; Bastos, André M.; Oswal, Ashwini; van Wijk, Bernadette; Richter, Craig; Litvak, Vladimir

    2014-01-01

    This technical paper offers a critical re-evaluation of (spectral) Granger causality measures in the analysis of biological timeseries. Using realistic (neural mass) models of coupled neuronal dynamics, we evaluate the robustness of parametric and nonparametric Granger causality. Starting from a broad class of generative (state-space) models of neuronal dynamics, we show how their Volterra kernels prescribe the second-order statistics of their response to random fluctuations; characterised in terms of cross-spectral density, cross-covariance, autoregressive coefficients and directed transfer functions. These quantities in turn specify Granger causality — providing a direct (analytic) link between the parameters of a generative model and the expected Granger causality. We use this link to show that Granger causality measures based upon autoregressive models can become unreliable when the underlying dynamics is dominated by slow (unstable) modes — as quantified by the principal Lyapunov exponent. However, nonparametric measures based on causal spectral factors are robust to dynamical instability. We then demonstrate how both parametric and nonparametric spectral causality measures can become unreliable in the presence of measurement noise. Finally, we show that this problem can be finessed by deriving spectral causality measures from Volterra kernels, estimated using dynamic causal modelling. PMID:25003817

  9. Autonomous Frequency Domain Identification: Theory and Experiment

    DTIC Science & Technology

    1989-04-15

    4,4,3)). This approach is particularly well suited to provide accurate estimation using sampled-data -3 DO 2 ^ UJ H « > x 2 ^ ui M (d 5 -P m...criteria for resonance requires a unimodal search. Search strategies such as golden search, Fibonacci search etc. are well known and can be found for...identified nonparametrically and a frequency domain de - scription is available, a parametric representation of the transfer function can be found by

  10. A note on the IQ of monozygotic twins raised apart and the order of their birth.

    PubMed

    Pencavel, J H

    1976-10-01

    This note examines James Shields' sample of monozygotic twins raised apart to entertain the hypothesis that there is a significant association between the measured IQ of these twins and the order of their birth. A non-parametric test supports this hypothesis and then a linear probability function is estimated that discriminates the effects on IQ of birth order from the effects of birth weight.

  11. Estimation of Spatial Dynamic Nonparametric Durbin Models with Fixed Effects

    ERIC Educational Resources Information Center

    Qian, Minghui; Hu, Ridong; Chen, Jianwei

    2016-01-01

    Spatial panel data models have been widely studied and applied in both scientific and social science disciplines, especially in the analysis of spatial influence. In this paper, we consider the spatial dynamic nonparametric Durbin model (SDNDM) with fixed effects, which takes the nonlinear factors into account base on the spatial dynamic panel…

  12. Spatial hydrological drought characteristics in Karkheh River basin, southwest Iran using copulas

    NASA Astrophysics Data System (ADS)

    Dodangeh, Esmaeel; Shahedi, Kaka; Shiau, Jenq-Tzong; MirAkbari, Maryam

    2017-08-01

    Investigation on drought characteristics such as severity, duration, and frequency is crucial for water resources planning and management in a river basin. While the methodology for multivariate drought frequency analysis is well established by applying the copulas, the estimation on the associated parameters by various parameter estimation methods and the effects on the obtained results have not yet been investigated. This research aims at conducting a comparative analysis between the maximum likelihood parametric and non-parametric method of the Kendall τ estimation method for copulas parameter estimation. The methods were employed to study joint severity-duration probability and recurrence intervals in Karkheh River basin (southwest Iran) which is facing severe water-deficit problems. Daily streamflow data at three hydrological gauging stations (Tang Sazbon, Huleilan and Polchehr) near the Karkheh dam were used to draw flow duration curves (FDC) of these three stations. The Q_{75} index extracted from the FDC were set as threshold level to abstract drought characteristics such as drought duration and severity on the basis of the run theory. Drought duration and severity were separately modeled using the univariate probabilistic distributions and gamma-GEV, LN2-exponential, and LN2-gamma were selected as the best paired drought severity-duration inputs for copulas according to the Akaike Information Criteria (AIC), Kolmogorov-Smirnov and chi-square tests. Archimedean Clayton, Frank, and extreme value Gumbel copulas were employed to construct joint cumulative distribution functions (JCDF) of droughts for each station. Frank copula at Tang Sazbon and Gumbel at Huleilan and Polchehr stations were identified as the best copulas based on the performance evaluation criteria including AIC, BIC, log-likelihood and root mean square error (RMSE) values. Based on the RMSE values, nonparametric Kendall-τ is preferred to the parametric maximum likelihood estimation method. The results showed greater drought return periods by the parametric ML method in comparison to the nonparametric Kendall τ estimation method. The results also showed that stations located in tributaries (Huleilan and Polchehr) have close return periods, while the station along the main river (Tang Sazbon) has the smaller return periods for the drought events with identical drought duration and severity.

  13. Nonparametric EROC analysis for observer performance evaluation on joint detection and estimation tasks

    NASA Astrophysics Data System (ADS)

    Wunderlich, Adam; Goossens, Bart

    2014-03-01

    The majority of the literature on task-based image quality assessment has focused on lesion detection tasks, using the receiver operating characteristic (ROC) curve, or related variants, to measure performance. However, since many clinical image evaluation tasks involve both detection and estimation (e.g., estimation of kidney stone composition, estimation of tumor size), there is a growing interest in performance evaluation for joint detection and estimation tasks. To evaluate observer performance on such tasks, Clarkson introduced the estimation ROC (EROC) curve, and the area under the EROC curve as a summary figure of merit. In the present work, we propose nonparametric estimators for practical EROC analysis from experimental data, including estimators for the area under the EROC curve and its variance. The estimators are illustrated with a practical example comparing MRI images reconstructed from different k-space sampling trajectories.

  14. Testing in semiparametric models with interaction, with applications to gene-environment interactions.

    PubMed

    Maity, Arnab; Carroll, Raymond J; Mammen, Enno; Chatterjee, Nilanjan

    2009-01-01

    Motivated from the problem of testing for genetic effects on complex traits in the presence of gene-environment interaction, we develop score tests in general semiparametric regression problems that involves Tukey style 1 degree-of-freedom form of interaction between parametrically and non-parametrically modelled covariates. We find that the score test in this type of model, as recently developed by Chatterjee and co-workers in the fully parametric setting, is biased and requires undersmoothing to be valid in the presence of non-parametric components. Moreover, in the presence of repeated outcomes, the asymptotic distribution of the score test depends on the estimation of functions which are defined as solutions of integral equations, making implementation difficult and computationally taxing. We develop profiled score statistics which are unbiased and asymptotically efficient and can be performed by using standard bandwidth selection methods. In addition, to overcome the difficulty of solving functional equations, we give easy interpretations of the target functions, which in turn allow us to develop estimation procedures that can be easily implemented by using standard computational methods. We present simulation studies to evaluate type I error and power of the method proposed compared with a naive test that does not consider interaction. Finally, we illustrate our methodology by analysing data from a case-control study of colorectal adenoma that was designed to investigate the association between colorectal adenoma and the candidate gene NAT2 in relation to smoking history.

  15. A Semi-parametric Transformation Frailty Model for Semi-competing Risks Survival Data

    PubMed Central

    Jiang, Fei; Haneuse, Sebastien

    2016-01-01

    In the analysis of semi-competing risks data interest lies in estimation and inference with respect to a so-called non-terminal event, the observation of which is subject to a terminal event. Multi-state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non-terminal and terminal events specified, in part, by a unit-specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, σ2. When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi-competing risks analysis that permit the non-parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi-parametric efficient score under the complete data setting and propose a non-parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators is derived and small-sample operating characteristics evaluated via simulation. Although the proposed semi-parametric transformation model and non-parametric score imputation method are motivated by the analysis of semi-competing risks data, they are broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation. Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer. PMID:28439147

  16. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    PubMed

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.

  17. Bayesian nonparametric regression with varying residual density

    PubMed Central

    Pati, Debdeep; Dunson, David B.

    2013-01-01

    We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053

  18. Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information.

    PubMed

    Fan, Yue; Wang, Xiao; Peng, Qinke

    2017-01-01

    Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.

  19. Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample

    PubMed Central

    Wang, Ching-Yun; Song, Xiao

    2017-01-01

    SUMMARY Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women’s Health Initiative. PMID:27546625

  20. Network Reconstruction Using Nonparametric Additive ODE Models

    PubMed Central

    Henderson, James; Michailidis, George

    2014-01-01

    Network representations of biological systems are widespread and reconstructing unknown networks from data is a focal problem for computational biologists. For example, the series of biochemical reactions in a metabolic pathway can be represented as a network, with nodes corresponding to metabolites and edges linking reactants to products. In a different context, regulatory relationships among genes are commonly represented as directed networks with edges pointing from influential genes to their targets. Reconstructing such networks from data is a challenging problem receiving much attention in the literature. There is a particular need for approaches tailored to time-series data and not reliant on direct intervention experiments, as the former are often more readily available. In this paper, we introduce an approach to reconstructing directed networks based on dynamic systems models. Our approach generalizes commonly used ODE models based on linear or nonlinear dynamics by extending the functional class for the functions involved from parametric to nonparametric models. Concomitantly we limit the complexity by imposing an additive structure on the estimated slope functions. Thus the submodel associated with each node is a sum of univariate functions. These univariate component functions form the basis for a novel coupling metric that we define in order to quantify the strength of proposed relationships and hence rank potential edges. We show the utility of the method by reconstructing networks using simulated data from computational models for the glycolytic pathway of Lactocaccus Lactis and a gene network regulating the pluripotency of mouse embryonic stem cells. For purposes of comparison, we also assess reconstruction performance using gene networks from the DREAM challenges. We compare our method to those that similarly rely on dynamic systems models and use the results to attempt to disentangle the distinct roles of linearity, sparsity, and derivative estimation. PMID:24732037

  1. Profile local linear estimation of generalized semiparametric regression model for longitudinal data.

    PubMed

    Sun, Yanqing; Sun, Liuquan; Zhou, Jie

    2013-07-01

    This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically model such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criteria for selecting the link function is proposed to provide better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.

  2. A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)

    ERIC Educational Resources Information Center

    Arenson, Ethan A.; Karabatsos, George

    2017-01-01

    Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…

  3. Examining the Feasibility and Utility of Estimating Partial Expected Value of Perfect Information (via a Nonparametric Approach) as Part of the Reimbursement Decision-Making Process in Ireland: Application to Drugs for Cancer.

    PubMed

    McCullagh, Laura; Schmitz, Susanne; Barry, Michael; Walsh, Cathal

    2017-11-01

    In Ireland, all new drugs for which reimbursement by the healthcare payer is sought undergo a health technology assessment by the National Centre for Pharmacoeconomics. The National Centre for Pharmacoeconomics estimate expected value of perfect information but not partial expected value of perfect information (owing to computational expense associated with typical methodologies). The objective of this study was to examine the feasibility and utility of estimating partial expected value of perfect information via a computationally efficient, non-parametric regression approach. This was a retrospective analysis of evaluations on drugs for cancer that had been submitted to the National Centre for Pharmacoeconomics (January 2010 to December 2014 inclusive). Drugs were excluded if cost effective at the submitted price. Drugs were excluded if concerns existed regarding the validity of the applicants' submission or if cost-effectiveness model functionality did not allow required modifications to be made. For each included drug (n = 14), value of information was estimated at the final reimbursement price, at a threshold equivalent to the incremental cost-effectiveness ratio at that price. The expected value of perfect information was estimated from probabilistic analysis. Partial expected value of perfect information was estimated via a non-parametric approach. Input parameters with a population value at least €1 million were identified as potential targets for research. All partial estimates were determined within minutes. Thirty parameters (across nine models) each had a value of at least €1 million. These were categorised. Collectively, survival analysis parameters were valued at €19.32 million, health state utility parameters at €15.81 million and parameters associated with the cost of treating adverse effects at €6.64 million. Those associated with drug acquisition costs and with the cost of care were valued at €6.51 million and €5.71 million, respectively. This research demonstrates that the estimation of partial expected value of perfect information via this computationally inexpensive approach could be considered feasible as part of the health technology assessment process for reimbursement purposes within the Irish healthcare system. It might be a useful tool in prioritising future research to decrease decision uncertainty.

  4. Health Insurance: The Trade-Off Between Risk Pooling and Moral Hazard.

    DTIC Science & Technology

    1989-12-01

    bias comes about because we suppress the intercept term in estimating VFor the power, the test is against 1, - 1. With this transform, the risk...dealing with the same utility function. As one test of whether families behave in the way economic theory suggests, we have also fitted a probit model of...nonparametric alternative to test our results’ sensitivity to the assumption of a normal error in both the theoretical and empirical models of the

  5. Nonparametric estimation of median survival times with applications to multi-site or multi-center studies.

    PubMed

    Rahbar, Mohammad H; Choi, Sangbum; Hong, Chuan; Zhu, Liang; Jeon, Sangchoon; Gardiner, Joseph C

    2018-01-01

    We propose a nonparametric shrinkage estimator for the median survival times from several independent samples of right-censored data, which combines the samples and hypothesis information to improve the efficiency. We compare efficiency of the proposed shrinkage estimation procedure to unrestricted estimator and combined estimator through extensive simulation studies. Our results indicate that performance of these estimators depends on the strength of homogeneity of the medians. When homogeneity holds, the combined estimator is the most efficient estimator. However, it becomes inconsistent when homogeneity fails. On the other hand, the proposed shrinkage estimator remains efficient. Its efficiency decreases as the equality of the survival medians is deviated, but is expected to be as good as or equal to the unrestricted estimator. Our simulation studies also indicate that the proposed shrinkage estimator is robust to moderate levels of censoring. We demonstrate application of these methods to estimating median time for trauma patients to receive red blood cells in the Prospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study.

  6. Nonparametric estimation of median survival times with applications to multi-site or multi-center studies

    PubMed Central

    Choi, Sangbum; Hong, Chuan; Zhu, Liang; Jeon, Sangchoon; Gardiner, Joseph C.

    2018-01-01

    We propose a nonparametric shrinkage estimator for the median survival times from several independent samples of right-censored data, which combines the samples and hypothesis information to improve the efficiency. We compare efficiency of the proposed shrinkage estimation procedure to unrestricted estimator and combined estimator through extensive simulation studies. Our results indicate that performance of these estimators depends on the strength of homogeneity of the medians. When homogeneity holds, the combined estimator is the most efficient estimator. However, it becomes inconsistent when homogeneity fails. On the other hand, the proposed shrinkage estimator remains efficient. Its efficiency decreases as the equality of the survival medians is deviated, but is expected to be as good as or equal to the unrestricted estimator. Our simulation studies also indicate that the proposed shrinkage estimator is robust to moderate levels of censoring. We demonstrate application of these methods to estimating median time for trauma patients to receive red blood cells in the Prospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study. PMID:29772007

  7. Model-free estimation of the psychometric function

    PubMed Central

    Żychaluk, Kamila; Foster, David H.

    2009-01-01

    A subject's response to the strength of a stimulus is described by the psychometric function, from which summary measures, such as a threshold or slope, may be derived. Traditionally, this function is estimated by fitting a parametric model to the experimental data, usually the proportion of successful trials at each stimulus level. Common models include the Gaussian and Weibull cumulative distribution functions. This approach works well if the model is correct, but it can mislead if not. In practice, the correct model is rarely known. Here, a nonparametric approach based on local linear fitting is advocated. No assumption is made about the true model underlying the data, except that the function is smooth. The critical role of the bandwidth is identified, and its optimum value estimated by a cross-validation procedure. As a demonstration, seven vision and hearing data sets were fitted by the local linear method and by several parametric models. The local linear method frequently performed better and never worse than the parametric ones. Supplemental materials for this article can be downloaded from app.psychonomic-journals.org/content/supplemental. PMID:19633355

  8. Parametric and non-parametric modeling of short-term synaptic plasticity. Part I: computational study

    PubMed Central

    Marmarelis, Vasilis Z.; Berger, Theodore W.

    2009-01-01

    Parametric and non-parametric modeling methods are combined to study the short-term plasticity (STP) of synapses in the central nervous system (CNS). The nonlinear dynamics of STP are modeled by means: (1) previously proposed parametric models based on mechanistic hypotheses and/or specific dynamical processes, and (2) non-parametric models (in the form of Volterra kernels) that transforms the presynaptic signals into postsynaptic signals. In order to synergistically use the two approaches, we estimate the Volterra kernels of the parametric models of STP for four types of synapses using synthetic broadband input–output data. Results show that the non-parametric models accurately and efficiently replicate the input–output transformations of the parametric models. Volterra kernels provide a general and quantitative representation of the STP. PMID:18506609

  9. Economic decision making and the application of nonparametric prediction models

    USGS Publications Warehouse

    Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.

    2008-01-01

    Sustained increases in energy prices have focused attention on gas resources in low-permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are often large. Planning and development decisions for extraction of such resources must be areawide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm, the decision to enter such plays depends on reconnaissance-level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional-scale cost functions. The context of the worked example is the Devonian Antrim-shale gas play in the Michigan basin. One finding relates to selection of the resource prediction model to be used with economic models. Models chosen because they can best predict aggregate volume over larger areas (many hundreds of sites) smooth out granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined arbitrarily by extraneous factors. The analysis shows a 15-20% gain in gas volume when these simple models are applied to order drilling prospects strategically rather than to choose drilling locations randomly. Copyright ?? 2008 Society of Petroleum Engineers.

  10. The linear transformation model with frailties for the analysis of item response times.

    PubMed

    Wang, Chun; Chang, Hua-Hua; Douglas, Jeffrey A

    2013-02-01

    The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi-parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non-parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non-parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees' latent speeds; whereas the non-parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box-Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two-stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non-parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested. © 2012 The British Psychological Society.

  11. Testing the Hypothesis of a Homoscedastic Error Term in Simple, Nonparametric Regression

    ERIC Educational Resources Information Center

    Wilcox, Rand R.

    2006-01-01

    Consider the nonparametric regression model Y = m(X)+ [tau](X)[epsilon], where X and [epsilon] are independent random variables, [epsilon] has a median of zero and variance [sigma][squared], [tau] is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated…

  12. Using multiple decrement models to estimate risk and morbidity from specific AIDS illnesses. Multicenter AIDS Cohort Study (MACS).

    PubMed

    Hoover, D R; Peng, Y; Saah, A J; Detels, R R; Day, R S; Phair, J P

    A simple non-parametric approach is developed to simultaneously estimate net incidence and morbidity time from specific AIDS illnesses in populations at high risk for death from these illnesses and other causes. The disease-death process has four-stages that can be recast as two sandwiching three-state multiple decrement processes. Non-parametric estimation of net incidence and morbidity time with error bounds are achieved from these sandwiching models through modification of methods from Aalen and Greenwood, and bootstrapping. An application to immunosuppressed HIV-1 infected homosexual men reveals that cytomegalovirus disease, Kaposi's sarcoma and Pneumocystis pneumonia are likely to occur and cause significant morbidity time.

  13. A mixture model for robust registration in Kinect sensor

    NASA Astrophysics Data System (ADS)

    Peng, Li; Zhou, Huabing; Zhu, Shengguo

    2018-03-01

    The Microsoft Kinect sensor has been widely used in many applications, but it suffers from the drawback of low registration precision between color image and depth image. In this paper, we present a robust method to improve the registration precision by a mixture model that can handle multiply images with the nonparametric model. We impose non-parametric geometrical constraints on the correspondence, as a prior distribution, in a reproducing kernel Hilbert space (RKHS).The estimation is performed by the EM algorithm which by also estimating the variance of the prior model is able to obtain good estimates. We illustrate the proposed method on the public available dataset. The experimental results show that our approach outperforms the baseline methods.

  14. Nonparametric projections of forest and rangeland condition indicators: A technical document supporting the 2005 USDA Forest Service RPA Assessment Update

    Treesearch

    John Hof; Curtis Flather; Tony Baltic; Rudy King

    2006-01-01

    The 2005 Forest and Rangeland Condition Indicator Model is a set of classification trees for forest and rangeland condition indicators at the national scale. This report documents the development of the database and the nonparametric statistical estimation for this analytical structure, with emphasis on three special characteristics of condition indicator production...

  15. Complementary nonparametric analysis of covariance for logistic regression in a randomized clinical trial setting.

    PubMed

    Tangen, C M; Koch, G G

    1999-03-01

    In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.

  16. Transformation-invariant and nonparametric monotone smooth estimation of ROC curves.

    PubMed

    Du, Pang; Tang, Liansheng

    2009-01-30

    When a new diagnostic test is developed, it is of interest to evaluate its accuracy in distinguishing diseased subjects from non-diseased subjects. The accuracy of the test is often evaluated by receiver operating characteristic (ROC) curves. Smooth ROC estimates are often preferable for continuous test results when the underlying ROC curves are in fact continuous. Nonparametric and parametric methods have been proposed by various authors to obtain smooth ROC curve estimates. However, there are certain drawbacks with the existing methods. Parametric methods need specific model assumptions. Nonparametric methods do not always satisfy the inherent properties of the ROC curves, such as monotonicity and transformation invariance. In this paper we propose a monotone spline approach to obtain smooth monotone ROC curves. Our method ensures important inherent properties of the underlying ROC curves, which include monotonicity, transformation invariance, and boundary constraints. We compare the finite sample performance of the newly proposed ROC method with other ROC smoothing methods in large-scale simulation studies. We illustrate our method through a real life example. Copyright (c) 2008 John Wiley & Sons, Ltd.

  17. Boosted Multivariate Trees for Longitudinal Data

    PubMed Central

    Pande, Amol; Li, Liang; Rajeswaran, Jeevanantham; Ehrlinger, John; Kogalur, Udaya B.; Blackstone, Eugene H.; Ishwaran, Hemant

    2017-01-01

    Machine learning methods provide a powerful approach for analyzing longitudinal data in which repeated measurements are observed for a subject over time. We boost multivariate trees to fit a novel flexible semi-nonparametric marginal model for longitudinal data. In this model, features are assumed to be nonparametric, while feature-time interactions are modeled semi-nonparametrically utilizing P-splines with estimated smoothing parameter. In order to avoid overfitting, we describe a relatively simple in sample cross-validation method which can be used to estimate the optimal boosting iteration and which has the surprising added benefit of stabilizing certain parameter estimates. Our new multivariate tree boosting method is shown to be highly flexible, robust to covariance misspecification and unbalanced designs, and resistant to overfitting in high dimensions. Feature selection can be used to identify important features and feature-time interactions. An application to longitudinal data of forced 1-second lung expiratory volume (FEV1) for lung transplant patients identifies an important feature-time interaction and illustrates the ease with which our method can find complex relationships in longitudinal data. PMID:29249866

  18. Application of the LSQR algorithm in non-parametric estimation of aerosol size distribution

    NASA Astrophysics Data System (ADS)

    He, Zhenzong; Qi, Hong; Lew, Zhongyuan; Ruan, Liming; Tan, Heping; Luo, Kun

    2016-05-01

    Based on the Least Squares QR decomposition (LSQR) algorithm, the aerosol size distribution (ASD) is retrieved in non-parametric approach. The direct problem is solved by the Anomalous Diffraction Approximation (ADA) and the Lambert-Beer Law. An optimal wavelength selection method is developed to improve the retrieval accuracy of the ASD. The proposed optimal wavelength set is selected by the method which can make the measurement signals sensitive to wavelength and decrease the degree of the ill-condition of coefficient matrix of linear systems effectively to enhance the anti-interference ability of retrieval results. Two common kinds of monomodal and bimodal ASDs, log-normal (L-N) and Gamma distributions, are estimated, respectively. Numerical tests show that the LSQR algorithm can be successfully applied to retrieve the ASD with high stability in the presence of random noise and low susceptibility to the shape of distributions. Finally, the experimental measurement ASD over Harbin in China is recovered reasonably. All the results confirm that the LSQR algorithm combined with the optimal wavelength selection method is an effective and reliable technique in non-parametric estimation of ASD.

  19. Forecasting Epidemics Through Nonparametric Estimation of Time-Dependent Transmission Rates Using the SEIR Model.

    PubMed

    Smirnova, Alexandra; deCamp, Linda; Chowell, Gerardo

    2017-05-02

    Deterministic and stochastic methods relying on early case incidence data for forecasting epidemic outbreaks have received increasing attention during the last few years. In mathematical terms, epidemic forecasting is an ill-posed problem due to instability of parameter identification and limited available data. While previous studies have largely estimated the time-dependent transmission rate by assuming specific functional forms (e.g., exponential decay) that depend on a few parameters, here we introduce a novel approach for the reconstruction of nonparametric time-dependent transmission rates by projecting onto a finite subspace spanned by Legendre polynomials. This approach enables us to effectively forecast future incidence cases, the clear advantage over recovering the transmission rate at finitely many grid points within the interval where the data are currently available. In our approach, we compare three regularization algorithms: variational (Tikhonov's) regularization, truncated singular value decomposition (TSVD), and modified TSVD in order to determine the stabilizing strategy that is most effective in terms of reliability of forecasting from limited data. We illustrate our methodology using simulated data as well as case incidence data for various epidemics including the 1918 influenza pandemic in San Francisco and the 2014-2015 Ebola epidemic in West Africa.

  20. Modeling the human development index and the percentage of poor people using quantile smoothing splines

    NASA Astrophysics Data System (ADS)

    Mulyani, Sri; Andriyana, Yudhie; Sudartianto

    2017-03-01

    Mean regression is a statistical method to explain the relationship between the response variable and the predictor variable based on the central tendency of the data (mean) of the response variable. The parameter estimation in mean regression (with Ordinary Least Square or OLS) generates a problem if we apply it to the data with a symmetric, fat-tailed, or containing outlier. Hence, an alternative method is necessary to be used to that kind of data, for example quantile regression method. The quantile regression is a robust technique to the outlier. This model can explain the relationship between the response variable and the predictor variable, not only on the central tendency of the data (median) but also on various quantile, in order to obtain complete information about that relationship. In this study, a quantile regression is developed with a nonparametric approach such as smoothing spline. Nonparametric approach is used if the prespecification model is difficult to determine, the relation between two variables follow the unknown function. We will apply that proposed method to poverty data. Here, we want to estimate the Percentage of Poor People as the response variable involving the Human Development Index (HDI) as the predictor variable.

  1. Empirical validation of statistical parametric mapping for group imaging of fast neural activity using electrical impedance tomography.

    PubMed

    Packham, B; Barnes, G; Dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D

    2016-06-01

    Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have  >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity, such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p  <  0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity.

  2. Empirical validation of statistical parametric mapping for group imaging of fast neural activity using electrical impedance tomography

    PubMed Central

    Packham, B; Barnes, G; dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D

    2016-01-01

    Abstract Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have  >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity, such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p  <  0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity. PMID:27203477

  3. Population-based absolute risk estimation with survey data

    PubMed Central

    Kovalchik, Stephanie A.; Pfeiffer, Ruth M.

    2013-01-01

    Absolute risk is the probability that a cause-specific event occurs in a given time interval in the presence of competing events. We present methods to estimate population-based absolute risk from a complex survey cohort that can accommodate multiple exposure-specific competing risks. The hazard function for each event type consists of an individualized relative risk multiplied by a baseline hazard function, which is modeled nonparametrically or parametrically with a piecewise exponential model. An influence method is used to derive a Taylor-linearized variance estimate for the absolute risk estimates. We introduce novel measures of the cause-specific influences that can guide modeling choices for the competing event components of the model. To illustrate our methodology, we build and validate cause-specific absolute risk models for cardiovascular and cancer deaths using data from the National Health and Nutrition Examination Survey. Our applications demonstrate the usefulness of survey-based risk prediction models for predicting health outcomes and quantifying the potential impact of disease prevention programs at the population level. PMID:23686614

  4. Confidence intervals for the first crossing point of two hazard functions.

    PubMed

    Cheng, Ming-Yen; Qiu, Peihua; Tan, Xianming; Tu, Dongsheng

    2009-12-01

    The phenomenon of crossing hazard rates is common in clinical trials with time to event endpoints. Many methods have been proposed for testing equality of hazard functions against a crossing hazards alternative. However, there has been relatively few approaches available in the literature for point or interval estimation of the crossing time point. The problem of constructing confidence intervals for the first crossing time point of two hazard functions is considered in this paper. After reviewing a recent procedure based on Cox proportional hazard modeling with Box-Cox transformation of the time to event, a nonparametric procedure using the kernel smoothing estimate of the hazard ratio is proposed. The proposed procedure and the one based on Cox proportional hazard modeling with Box-Cox transformation of the time to event are both evaluated by Monte-Carlo simulations and applied to two clinical trial datasets.

  5. Nonparametric methods for drought severity estimation at ungauged sites

    NASA Astrophysics Data System (ADS)

    Sadri, S.; Burn, D. H.

    2012-12-01

    The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.

  6. Robust neural network with applications to credit portfolio data analysis.

    PubMed

    Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun

    2010-01-01

    In this article, we study nonparametric conditional quantile estimation via neural network structure. We proposed an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in real data application demonstrate the advantage of the newly proposed procedure.

  7. A linear programming approach to characterizing norm bounded uncertainty from experimental data

    NASA Technical Reports Server (NTRS)

    Scheid, R. E.; Bayard, D. S.; Yam, Y.

    1991-01-01

    The linear programming spectral overbounding and factorization (LPSOF) algorithm, an algorithm for finding a minimum phase transfer function of specified order whose magnitude tightly overbounds a specified nonparametric function of frequency, is introduced. This method has direct application to transforming nonparametric uncertainty bounds (available from system identification experiments) into parametric representations required for modern robust control design software (i.e., a minimum-phase transfer function multiplied by a norm-bounded perturbation).

  8. Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample.

    PubMed

    Wang, Ching-Yun; Song, Xiao

    2016-11-01

    Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Functional Linear Model with Zero-value Coefficient Function at Sub-regions.

    PubMed

    Zhou, Jianhui; Wang, Nae-Yuh; Wang, Naisyin

    2013-01-01

    We propose a shrinkage method to estimate the coefficient function in a functional linear regression model when the value of the coefficient function is zero within certain sub-regions. Besides identifying the null region in which the coefficient function is zero, we also aim to perform estimation and inferences for the nonparametrically estimated coefficient function without over-shrinking the values. Our proposal consists of two stages. In stage one, the Dantzig selector is employed to provide initial location of the null region. In stage two, we propose a group SCAD approach to refine the estimated location of the null region and to provide the estimation and inference procedures for the coefficient function. Our considerations have certain advantages in this functional setup. One goal is to reduce the number of parameters employed in the model. With a one-stage procedure, it is needed to use a large number of knots in order to precisely identify the zero-coefficient region; however, the variation and estimation difficulties increase with the number of parameters. Owing to the additional refinement stage, we avoid this necessity and our estimator achieves superior numerical performance in practice. We show that our estimator enjoys the Oracle property; it identifies the null region with probability tending to 1, and it achieves the same asymptotic normality for the estimated coefficient function on the non-null region as the functional linear model estimator when the non-null region is known. Numerically, our refined estimator overcomes the shortcomings of the initial Dantzig estimator which tends to under-estimate the absolute scale of non-zero coefficients. The performance of the proposed method is illustrated in simulation studies. We apply the method in an analysis of data collected by the Johns Hopkins Precursors Study, where the primary interests are in estimating the strength of association between body mass index in midlife and the quality of life in physical functioning at old age, and in identifying the effective age ranges where such associations exist.

  10. Modeling SF-6D Hong Kong standard gamble health state preference data using a nonparametric Bayesian method.

    PubMed

    Kharroubi, Samer A; Brazier, John E; McGhee, Sarah

    2013-01-01

    This article reports on the findings from applying a recently described approach to modeling health state valuation data and the impact of the respondent characteristics on health state valuations. The approach applies a nonparametric model to estimate a Bayesian six-dimensional health state short form (derived from short-form 36 health survey) health state valuation algorithm. A sample of 197 states defined by the six-dimensional health state short form (derived from short-form 36 health survey)has been valued by a representative sample of the Hong Kong general population by using standard gamble. The article reports the application of the nonparametric model and compares it to the original model estimated by using a conventional parametric random effects model. The two models are compared theoretically and in terms of empirical performance. Advantages of the nonparametric model are that it can be used to predict scores in populations with different distributions of characteristics than observed in the survey sample and that it allows for the impact of respondent characteristics to vary by health state (while ensuring that full health passes through unity). The results suggest an important age effect with sex, having some effect, but the remaining covariates having no discernible effect. The nonparametric Bayesian model is argued to be more theoretically appropriate than previously used parametric models. Furthermore, it is more flexible to take into account the impact of covariates. Copyright © 2013, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc.

  11. Using a Nonparametric Bootstrap to Obtain a Confidence Interval for Pearson's "r" with Cluster Randomized Data: A Case Study

    ERIC Educational Resources Information Center

    Wagstaff, David A.; Elek, Elvira; Kulis, Stephen; Marsiglia, Flavio

    2009-01-01

    A nonparametric bootstrap was used to obtain an interval estimate of Pearson's "r," and test the null hypothesis that there was no association between 5th grade students' positive substance use expectancies and their intentions to not use substances. The students were participating in a substance use prevention program in which the unit of…

  12. Learning Circulant Sensing Kernels

    DTIC Science & Technology

    2014-03-01

    Furthermore, we test learning the circulant sensing matrix/operator and the nonparametric dictionary altogether and obtain even better performance. We...scale. Furthermore, we test learning the circulant sensing matrix/operator and the nonparametric dictionary altogether and obtain even better performance...matrices, Tropp et al.[28] de - scribes a random filter for acquiring a signal x̄; Haupt et al.[12] describes a channel estimation problem to identify a

  13. Identification and estimation of survivor average causal effects.

    PubMed

    Tchetgen Tchetgen, Eric J

    2014-09-20

    In longitudinal studies, outcomes ascertained at follow-up are typically undefined for individuals who die prior to the follow-up visit. In such settings, outcomes are said to be truncated by death and inference about the effects of a point treatment or exposure, restricted to individuals alive at the follow-up visit, could be biased even if as in experimental studies, treatment assignment were randomized. To account for truncation by death, the survivor average causal effect (SACE) defines the effect of treatment on the outcome for the subset of individuals who would have survived regardless of exposure status. In this paper, the author nonparametrically identifies SACE by leveraging post-exposure longitudinal correlates of survival and outcome that may also mediate the exposure effects on survival and outcome. Nonparametric identification is achieved by supposing that the longitudinal data arise from a certain nonparametric structural equations model and by making the monotonicity assumption that the effect of exposure on survival agrees in its direction across individuals. A novel weighted analysis involving a consistent estimate of the survival process is shown to produce consistent estimates of SACE. A data illustration is given, and the methods are extended to the context of time-varying exposures. We discuss a sensitivity analysis framework that relaxes assumptions about independent errors in the nonparametric structural equations model and may be used to assess the extent to which inference may be altered by a violation of key identifying assumptions. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.

  14. Identification and estimation of survivor average causal effects

    PubMed Central

    Tchetgen, Eric J Tchetgen

    2014-01-01

    In longitudinal studies, outcomes ascertained at follow-up are typically undefined for individuals who die prior to the follow-up visit. In such settings, outcomes are said to be truncated by death and inference about the effects of a point treatment or exposure, restricted to individuals alive at the follow-up visit, could be biased even if as in experimental studies, treatment assignment were randomized. To account for truncation by death, the survivor average causal effect (SACE) defines the effect of treatment on the outcome for the subset of individuals who would have survived regardless of exposure status. In this paper, the author nonparametrically identifies SACE by leveraging post-exposure longitudinal correlates of survival and outcome that may also mediate the exposure effects on survival and outcome. Nonparametric identification is achieved by supposing that the longitudinal data arise from a certain nonparametric structural equations model and by making the monotonicity assumption that the effect of exposure on survival agrees in its direction across individuals. A novel weighted analysis involving a consistent estimate of the survival process is shown to produce consistent estimates of SACE. A data illustration is given, and the methods are extended to the context of time-varying exposures. We discuss a sensitivity analysis framework that relaxes assumptions about independent errors in the nonparametric structural equations model and may be used to assess the extent to which inference may be altered by a violation of key identifying assumptions. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24889022

  15. Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

    PubMed Central

    Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.

    2014-01-01

    Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289

  16. Parametric and non-parametric species delimitation methods result in the recognition of two new Neotropical woody bamboo species.

    PubMed

    Ruiz-Sanchez, Eduardo

    2015-12-01

    The Neotropical woody bamboo genus Otatea is one of five genera in the subtribe Guaduinae. Of the eight described Otatea species, seven are endemic to Mexico and one is also distributed in Central and South America. Otatea acuminata has the widest geographical distribution of the eight species, and two of its recently collected populations do not match the known species morphologically. Parametric and non-parametric methods were used to delimit the species in Otatea using five chloroplast markers, one nuclear marker, and morphological characters. The parametric coalescent method and the non-parametric analysis supported the recognition of two distinct evolutionary lineages. Molecular clock estimates were used to estimate divergence times in Otatea. The results for divergence time in Otatea estimated the origin of the speciation events from the Late Miocene to Late Pleistocene. The species delimitation analyses (parametric and non-parametric) identified that the two populations of O. acuminata from Chiapas and Hidalgo are from two separate evolutionary lineages and these new species have morphological characters that separate them from O. acuminata s.s. The geological activity of the Trans-Mexican Volcanic Belt and the Isthmus of Tehuantepec may have isolated populations and limited the gene flow between Otatea species, driving speciation. Based on the results found here, I describe Otatea rzedowskiorum and Otatea victoriae as two new species, morphologically different from O. acuminata. Copyright © 2015 Elsevier Inc. All rights reserved.

  17. Smooth centile curves for skew and kurtotic data modelled using the Box-Cox power exponential distribution.

    PubMed

    Rigby, Robert A; Stasinopoulos, D Mikis

    2004-10-15

    The Box-Cox power exponential (BCPE) distribution, developed in this paper, provides a model for a dependent variable Y exhibiting both skewness and kurtosis (leptokurtosis or platykurtosis). The distribution is defined by a power transformation Y(nu) having a shifted and scaled (truncated) standard power exponential distribution with parameter tau. The distribution has four parameters and is denoted BCPE (mu,sigma,nu,tau). The parameters, mu, sigma, nu and tau, may be interpreted as relating to location (median), scale (approximate coefficient of variation), skewness (transformation to symmetry) and kurtosis (power exponential parameter), respectively. Smooth centile curves are obtained by modelling each of the four parameters of the distribution as a smooth non-parametric function of an explanatory variable. A Fisher scoring algorithm is used to fit the non-parametric model by maximizing a penalized likelihood. The first and expected second and cross derivatives of the likelihood, with respect to mu, sigma, nu and tau, required for the algorithm, are provided. The centiles of the BCPE distribution are easy to calculate, so it is highly suited to centile estimation. This application of the BCPE distribution to smooth centile estimation provides a generalization of the LMS method of the centile estimation to data exhibiting kurtosis (as well as skewness) different from that of a normal distribution and is named here the LMSP method of centile estimation. The LMSP method of centile estimation is applied to modelling the body mass index of Dutch males against age. 2004 John Wiley & Sons, Ltd.

  18. [Detection of quadratic phase coupling between EEG signal components by nonparamatric and parametric methods of bispectral analysis].

    PubMed

    Schmidt, K; Witte, H

    1999-11-01

    Recently the assumption of the independence of individual frequency components in a signal has been rejected, for example, for the EEG during defined physiological states such as sleep or sedation [9, 10]. Thus, the use of higher-order spectral analysis capable of detecting interrelations between individual signal components has proved useful. The aim of the present study was to investigate the quality of various non-parametric and parametric estimation algorithms using simulated as well as true physiological data. We employed standard algorithms available for the MATLAB. The results clearly show that parametric bispectral estimation is superior to non-parametric estimation in terms of the quality of peak localisation and the discrimination from other peaks.

  19. Non-parametric model selection for subject-specific topological organization of resting-state functional connectivity.

    PubMed

    Ferrarini, Luca; Veer, Ilya M; van Lew, Baldur; Oei, Nicole Y L; van Buchem, Mark A; Reiber, Johan H C; Rombouts, Serge A R B; Milles, J

    2011-06-01

    In recent years, graph theory has been successfully applied to study functional and anatomical connectivity networks in the human brain. Most of these networks have shown small-world topological characteristics: high efficiency in long distance communication between nodes, combined with highly interconnected local clusters of nodes. Moreover, functional studies performed at high resolutions have presented convincing evidence that resting-state functional connectivity networks exhibits (exponentially truncated) scale-free behavior. Such evidence, however, was mostly presented qualitatively, in terms of linear regressions of the degree distributions on log-log plots. Even when quantitative measures were given, these were usually limited to the r(2) correlation coefficient. However, the r(2) statistic is not an optimal estimator of explained variance, when dealing with (truncated) power-law models. Recent developments in statistics have introduced new non-parametric approaches, based on the Kolmogorov-Smirnov test, for the problem of model selection. In this work, we have built on this idea to statistically tackle the issue of model selection for the degree distribution of functional connectivity at rest. The analysis, performed at voxel level and in a subject-specific fashion, confirmed the superiority of a truncated power-law model, showing high consistency across subjects. Moreover, the most highly connected voxels were found to be consistently part of the default mode network. Our results provide statistically sound support to the evidence previously presented in literature for a truncated power-law model of resting-state functional connectivity. Copyright © 2010 Elsevier Inc. All rights reserved.

  20. Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach

    PubMed Central

    Kang, Le; Chen, Weijie; Petrick, Nicholas A.; Gallas, Brandon D.

    2014-01-01

    The area under the receiver operating characteristic (ROC) curve (AUC) is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of AUC, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. PMID:25399736

  1. Non-parametric methods for cost-effectiveness analysis: the central limit theorem and the bootstrap compared.

    PubMed

    Nixon, Richard M; Wonderling, David; Grieve, Richard D

    2010-03-01

    Cost-effectiveness analyses (CEA) alongside randomised controlled trials commonly estimate incremental net benefits (INB), with 95% confidence intervals, and compute cost-effectiveness acceptability curves and confidence ellipses. Two alternative non-parametric methods for estimating INB are to apply the central limit theorem (CLT) or to use the non-parametric bootstrap method, although it is unclear which method is preferable. This paper describes the statistical rationale underlying each of these methods and illustrates their application with a trial-based CEA. It compares the sampling uncertainty from using either technique in a Monte Carlo simulation. The experiments are repeated varying the sample size and the skewness of costs in the population. The results showed that, even when data were highly skewed, both methods accurately estimated the true standard errors (SEs) when sample sizes were moderate to large (n>50), and also gave good estimates for small data sets with low skewness. However, when sample sizes were relatively small and the data highly skewed, using the CLT rather than the bootstrap led to slightly more accurate SEs. We conclude that while in general using either method is appropriate, the CLT is easier to implement, and provides SEs that are at least as accurate as the bootstrap. (c) 2009 John Wiley & Sons, Ltd.

  2. Nonparametric estimation of benchmark doses in environmental risk assessment

    PubMed Central

    Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

    2013-01-01

    Summary An important statistical objective in environmental risk analysis is estimation of minimum exposure levels, called benchmark doses (BMDs), that induce a pre-specified benchmark response in a dose-response experiment. In such settings, representations of the risk are traditionally based on a parametric dose-response model. It is a well-known concern, however, that if the chosen parametric form is misspecified, inaccurate and possibly unsafe low-dose inferences can result. We apply a nonparametric approach for calculating benchmark doses, based on an isotonic regression method for dose-response estimation with quantal-response data (Bhattacharya and Kong, 2007). We determine the large-sample properties of the estimator, develop bootstrap-based confidence limits on the BMDs, and explore the confidence limits’ small-sample properties via a short simulation study. An example from cancer risk assessment illustrates the calculations. PMID:23914133

  3. Design of a completely model free adaptive control in the presence of parametric, non-parametric uncertainties and random control signal delay.

    PubMed

    Tutsoy, Onder; Barkana, Duygun Erol; Tugal, Harun

    2018-05-01

    In this paper, an adaptive controller is developed for discrete time linear systems that takes into account parametric uncertainty, internal-external non-parametric random uncertainties, and time varying control signal delay. Additionally, the proposed adaptive control is designed in such a way that it is utterly model free. Even though these properties are studied separately in the literature, they are not taken into account all together in adaptive control literature. The Q-function is used to estimate long-term performance of the proposed adaptive controller. Control policy is generated based on the long-term predicted value, and this policy searches an optimal stabilizing control signal for uncertain and unstable systems. The derived control law does not require an initial stabilizing control assumption as in the ones in the recent literature. Learning error, control signal convergence, minimized Q-function, and instantaneous reward are analyzed to demonstrate the stability and effectiveness of the proposed adaptive controller in a simulation environment. Finally, key insights on parameters convergence of the learning and control signals are provided. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.

  4. Measuring agreement of multivariate discrete survival times using a modified weighted kappa coefficient.

    PubMed

    Guo, Ying; Manatunga, Amita K

    2009-03-01

    Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. We present a modified weighted kappa coefficient to measure agreement between bivariate discrete survival times. The proposed kappa coefficient accommodates censoring by redistributing the mass of censored observations within the grid where the unobserved events may potentially happen. A generalized modified weighted kappa is proposed for multivariate discrete survival times. We estimate the modified kappa coefficients nonparametrically through a multivariate survival function estimator. The asymptotic properties of the kappa estimators are established and the performance of the estimators are examined through simulation studies of bivariate and trivariate survival times. We illustrate the application of the modified kappa coefficient in the presence of censored observations with data from a prostate cancer study.

  5. STATISTICAL ESTIMATION AND VISUALIZATION OF GROUND-WATER CONTAMINATION DATA

    EPA Science Inventory

    This work presents methods of visualizing and animating statistical estimates of ground water and/or soil contamination over a region from observations of the contaminant for that region. The primary statistical methods used to produce the regional estimates are nonparametric re...

  6. Economic decision making and the application of nonparametric prediction models

    USGS Publications Warehouse

    Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.

    2007-01-01

    Sustained increases in energy prices have focused attention on gas resources in low permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are large. Planning and development decisions for extraction of such resources must be area-wide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm the decision to enter such plays depends on reconnaissance level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional scale cost functions. The context of the worked example is the Devonian Antrim shale gas play, Michigan Basin. One finding relates to selection of the resource prediction model to be used with economic models. Models which can best predict aggregate volume over larger areas (many hundreds of sites) may lose granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined by extraneous factors. The paper also shows that when these simple prediction models are used to strategically order drilling prospects, the gain in gas volume over volumes associated with simple random site selection amounts to 15 to 20 percent. It also discusses why the observed benefit of updating predictions from results of new drilling, as opposed to following static predictions, is somewhat smaller. Copyright 2007, Society of Petroleum Engineers.

  7. Nonparametric entropy estimation using kernel densities.

    PubMed

    Lake, Douglas E

    2009-01-01

    The entropy of experimental data from the biological and medical sciences provides additional information over summary statistics. Calculating entropy involves estimates of probability density functions, which can be effectively accomplished using kernel density methods. Kernel density estimation has been widely studied and a univariate implementation is readily available in MATLAB. The traditional definition of Shannon entropy is part of a larger family of statistics, called Renyi entropy, which are useful in applications that require a measure of the Gaussianity of data. Of particular note is the quadratic entropy which is related to the Friedman-Tukey (FT) index, a widely used measure in the statistical community. One application where quadratic entropy is very useful is the detection of abnormal cardiac rhythms, such as atrial fibrillation (AF). Asymptotic and exact small-sample results for optimal bandwidth and kernel selection to estimate the FT index are presented and lead to improved methods for entropy estimation.

  8. Updating estimates of low streamflow statistics to account for possible trends

    NASA Astrophysics Data System (ADS)

    Blum, A. G.; Archfield, S. A.; Hirsch, R. M.; Vogel, R. M.; Kiang, J. E.; Dudley, R. W.

    2017-12-01

    Given evidence of both increasing and decreasing trends in low flows in many streams, methods are needed to update estimators of low flow statistics used in water resources management. One such metric is the 10-year annual low-flow statistic (7Q10) calculated as the annual minimum seven-day streamflow which is exceeded in nine out of ten years on average. Historical streamflow records may not be representative of current conditions at a site if environmental conditions are changing. We present a new approach to frequency estimation under nonstationary conditions that applies a stationary nonparametric quantile estimator to a subset of the annual minimum flow record. Monte Carlo simulation experiments were used to evaluate this approach across a range of trend and no trend scenarios. Relative to the standard practice of using the entire available streamflow record, use of a nonparametric quantile estimator combined with selection of the most recent 30 or 50 years for 7Q10 estimation were found to improve accuracy and reduce bias. Benefits of data subset selection approaches were greater for higher magnitude trends annual minimum flow records with lower coefficients of variation. A nonparametric trend test approach for subset selection did not significantly improve upon always selecting the last 30 years of record. At 174 stream gages in the Chesapeake Bay region, 7Q10 estimators based on the most recent 30 years of flow record were compared to estimators based on the entire period of record. Given the availability of long records of low streamflow, using only a subset of the flow record ( 30 years) can be used to update 7Q10 estimators to better reflect current streamflow conditions.

  9. Historical HIV incidence modelling in regional subgroups: use of flexible discrete models with penalized splines based on prior curves.

    PubMed

    Greenland, S

    1996-03-15

    This paper presents an approach to back-projection (back-calculation) of human immunodeficiency virus (HIV) person-year infection rates in regional subgroups based on combining a log-linear model for subgroup differences with a penalized spline model for trends. The penalized spline approach allows flexible trend estimation but requires far fewer parameters than fully non-parametric smoothers, thus saving parameters that can be used in estimating subgroup effects. Use of reasonable prior curve to construct the penalty function minimizes the degree of smoothing needed beyond model specification. The approach is illustrated in application to acquired immunodeficiency syndrome (AIDS) surveillance data from Los Angeles County.

  10. Semivarying coefficient models for capture-recapture data: colony size estimation for the little penguin Eudyptula minor.

    PubMed

    Stoklosa, Jakub; Dann, Peter; Huggins, Richard

    2014-09-01

    To accommodate seasonal effects that change from year to year into models for the size of an open population we consider a time-varying coefficient model. We fit this model to a capture-recapture data set collected on the little penguin Eudyptula minor in south-eastern Australia over a 25 year period using Jolly-Seber type estimators and nonparametric P-spline techniques. The time-varying coefficient model identified strong changes in the seasonal pattern across the years which we further examined using functional data analysis techniques. To evaluate the methodology we also conducted several simulation studies that incorporate seasonal variation. Copyright © 2014 Elsevier Inc. All rights reserved.

  11. High throughput nonparametric probability density estimation.

    PubMed

    Farmer, Jenny; Jacobs, Donald

    2018-01-01

    In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.

  12. High throughput nonparametric probability density estimation

    PubMed Central

    Farmer, Jenny

    2018-01-01

    In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803

  13. Non-parametric correlative uncertainty quantification and sensitivity analysis: Application to a Langmuir bimolecular adsorption model

    NASA Astrophysics Data System (ADS)

    Feng, Jinchao; Lansford, Joshua; Mironenko, Alexander; Pourkargar, Davood Babaei; Vlachos, Dionisios G.; Katsoulakis, Markos A.

    2018-03-01

    We propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, are correlated and are limited in size (small data). The proposed mathematical methodology employs gradient-based methods to compute sensitivity indices. We observe that ranking influential parameters depends critically on whether or not correlations between parameters are taken into account. The impact of uncertainty in the correlation and the necessity of the proposed non-parametric perspective are demonstrated.

  14. Bayesian non-parametric inference for stochastic epidemic models using Gaussian Processes.

    PubMed

    Xu, Xiaoguang; Kypraios, Theodore; O'Neill, Philip D

    2016-10-01

    This paper considers novel Bayesian non-parametric methods for stochastic epidemic models. Many standard modeling and data analysis methods use underlying assumptions (e.g. concerning the rate at which new cases of disease will occur) which are rarely challenged or tested in practice. To relax these assumptions, we develop a Bayesian non-parametric approach using Gaussian Processes, specifically to estimate the infection process. The methods are illustrated with both simulated and real data sets, the former illustrating that the methods can recover the true infection process quite well in practice, and the latter illustrating that the methods can be successfully applied in different settings. © The Author 2016. Published by Oxford University Press.

  15. Robust estimation for partially linear models with large-dimensional covariates

    PubMed Central

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2014-01-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a noncon-cave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087

  16. Robust estimation for partially linear models with large-dimensional covariates.

    PubMed

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2013-10-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a noncon-cave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of [Formula: see text], where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.

  17. Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures

    NASA Astrophysics Data System (ADS)

    Li, Quanbao; Wei, Fajie; Zhou, Shenghan

    2017-05-01

    The linear discriminant analysis (LDA) is one of popular means for linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently-used approaches of feature extraction usually require linear, independence, or large sample condition. However, in real world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept in identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA have both parametric and nonparametric algorithm advantages and higher classification accuracy. Quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. At last, the application of LKNDA in the complex feature extraction of financial market activities is proposed.

  18. A non-parametric automatic blending methodology to estimate rainfall fields from rain gauge and radar data

    NASA Astrophysics Data System (ADS)

    Velasco-Forero, Carlos A.; Sempere-Torres, Daniel; Cassiraga, Eduardo F.; Jaime Gómez-Hernández, J.

    2009-07-01

    Quantitative estimation of rainfall fields has been a crucial objective from early studies of the hydrological applications of weather radar. Previous studies have suggested that flow estimations are improved when radar and rain gauge data are combined to estimate input rainfall fields. This paper reports new research carried out in this field. Classical approaches for the selection and fitting of a theoretical correlogram (or semivariogram) model (needed to apply geostatistical estimators) are avoided in this study. Instead, a non-parametric technique based on FFT is used to obtain two-dimensional positive-definite correlograms directly from radar observations, dealing with both the natural anisotropy and the temporal variation of the spatial structure of the rainfall in the estimated fields. Because these correlation maps can be automatically obtained at each time step of a given rainfall event, this technique might easily be used in operational (real-time) applications. This paper describes the development of the non-parametric estimator exploiting the advantages of FFT for the automatic computation of correlograms and provides examples of its application on a case study using six rainfall events. This methodology is applied to three different alternatives to incorporate the radar information (as a secondary variable), and a comparison of performances is provided. In particular, their ability to reproduce in estimated rainfall fields (i) the rain gauge observations (in a cross-validation analysis) and (ii) the spatial patterns of radar fields are analyzed. Results seem to indicate that the methodology of kriging with external drift [KED], in combination with the technique of automatically computing 2-D spatial correlograms, provides merged rainfall fields with good agreement with rain gauges and with the most accurate approach to the spatial tendencies observed in the radar rainfall fields, when compared with other alternatives analyzed.

  19. Sci—Fri PM: Topics — 06: The influence of regional dose sensitivity on salivary loss and recovery in the parotid gland

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clark, H; BC Cancer Agency, Surrey, B.C.; BC Cancer Agency, Vancouver, B.C.

    Purpose: The Quantitative Analyses of Normal Tissue Effects in the Clinic (QUANTEC 2010) survey of radiation dose-volume effects on salivary gland function has called for improved understanding of intragland dose sensitivity and the effectiveness of partial sparing in salivary glands. Regional dose susceptibility of sagittally- and coronally-sub-segmented parotid gland has been studied. Specifically, we examine whether individual consideration of sub-segments leads to improved prediction of xerostomia compared with whole parotid mean dose. Methods: Data from 102 patients treated for head-and-neck cancers at the BC Cancer Agency were used in this study. Whole mouth stimulated saliva was collected before (baseline), threemore » months, and one year after cessation of radiotherapy. Organ volumes were contoured using treatment planning CT images and sub-segmented into regional portions. Both non-parametric (local regression) and parametric (mean dose exponential fitting) methods were employed. A bootstrap technique was used for reliability estimation and cross-comparison. Results: Salivary loss is described well using non-parametric and mean dose models. Parametric fits suggest a significant distinction in dose response between medial-lateral and anterior-posterior aspects of the parotid (p<0.01). Least-squares and least-median squares estimates differ significantly (p<0.00001), indicating fits may be skewed by noise or outliers. Salivary recovery exhibits a weakly arched dose response: the highest recovery is seen at intermediate doses. Conclusions: Salivary function loss is strongly dose dependent. In contrast no useful dose dependence was observed for function recovery. Regional dose dependence was observed, but may have resulted from a bias in dose distributions.« less

  20. A program for the Bayesian Neural Network in the ROOT framework

    NASA Astrophysics Data System (ADS)

    Zhong, Jiahang; Huang, Run-Sheng; Lee, Shih-Chang

    2011-12-01

    We present a Bayesian Neural Network algorithm implemented in the TMVA package (Hoecker et al., 2007 [1]), within the ROOT framework (Brun and Rademakers, 1997 [2]). Comparing to the conventional utilization of Neural Network as discriminator, this new implementation has more advantages as a non-parametric regression tool, particularly for fitting probabilities. It provides functionalities including cost function selection, complexity control and uncertainty estimation. An example of such application in High Energy Physics is shown. The algorithm is available with ROOT release later than 5.29. Program summaryProgram title: TMVA-BNN Catalogue identifier: AEJX_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJX_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: BSD license No. of lines in distributed program, including test data, etc.: 5094 No. of bytes in distributed program, including test data, etc.: 1,320,987 Distribution format: tar.gz Programming language: C++ Computer: Any computer system or cluster with C++ compiler and UNIX-like operating system Operating system: Most UNIX/Linux systems. The application programs were thoroughly tested under Fedora and Scientific Linux CERN. Classification: 11.9 External routines: ROOT package version 5.29 or higher ( http://root.cern.ch) Nature of problem: Non-parametric fitting of multivariate distributions Solution method: An implementation of Neural Network following the Bayesian statistical interpretation. Uses Laplace approximation for the Bayesian marginalizations. Provides the functionalities of automatic complexity control and uncertainty estimation. Running time: Time consumption for the training depends substantially on the size of input sample, the NN topology, the number of training iterations, etc. For the example in this manuscript, about 7 min was used on a PC/Linux with 2.0 GHz processors.

  1. Pattern recognition for passive polarimetric data using nonparametric classifiers

    NASA Astrophysics Data System (ADS)

    Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.

    2005-08-01

    Passive polarization based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number classes and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class conditional probability density functions (likelihoods) and prior probabilities. A Probabilistic neural network (PNN), which is a nonparametric method that can compute Bayes optimal boundaries, and a -nearest neighbor (KNN) classifier, is used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.

  2. On an additive partial correlation operator and nonparametric estimation of graphical models.

    PubMed

    Lee, Kuang-Yao; Li, Bing; Zhao, Hongyu

    2016-09-01

    We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance.

  3. On an additive partial correlation operator and nonparametric estimation of graphical models

    PubMed Central

    Li, Bing; Zhao, Hongyu

    2016-01-01

    Abstract We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance. PMID:29422689

  4. Nonparametric Estimation of Distribution and Density Functions with Applications.

    DTIC Science & Technology

    1982-05-01

    8 > Z C.. o i, ,- 4 8 ...S C[ 17 : - j 46 0- IA ~ IL ar’ D rn B- - 4 -4A B- 8 ~a *~jai 188 the jackknife may be found in Gray, et al., and Cressie (Refs 15,28). Analogous to...convergence. 21 =7w; Ul I 0083 q C/) 22S , WJ 8 , J035 224 43 UC I a- 04 14 Q) ’I 4 -4a-Q 23z Let R1 be the real line, the borel field on R 1 and P,

  5. A flexible model for the mean and variance functions, with application to medical cost data.

    PubMed

    Chen, Jinsong; Liu, Lei; Zhang, Daowen; Shih, Ya-Chen T

    2013-10-30

    Medical cost data are often skewed to the right and heteroscedastic, having a nonlinear relation with covariates. To tackle these issues, we consider an extension to generalized linear models by assuming nonlinear associations of covariates in the mean function and allowing the variance to be an unknown but smooth function of the mean. We make no further assumption on the distributional form. The unknown functions are described by penalized splines, and the estimation is carried out using nonparametric quasi-likelihood. Simulation studies show the flexibility and advantages of our approach. We apply the model to the annual medical costs of heart failure patients in the clinical data repository at the University of Virginia Hospital System. Copyright © 2013 John Wiley & Sons, Ltd.

  6. Marginal regression approach for additive hazards models with clustered current status data.

    PubMed

    Su, Pei-Fang; Chi, Yunchan

    2014-01-15

    Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of survival function converges at a rate of n(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive the test statistics, whose distributions converge at a rate of n(1/2) , for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated. Copyright © 2013 John Wiley & Sons, Ltd.

  7. Robustness of survival estimates for radio-marked animals

    USGS Publications Warehouse

    Bunck, C.M.; Chen, C.-L.

    1992-01-01

    Telemetry techniques are often used to study the survival of birds and mammals; particularly whcn mark-recapture approaches are unsuitable. Both parametric and nonparametric methods to estimate survival have becn developed or modified from other applications. An implicit assumption in these approaches is that the probability of re-locating an animal with a functioning transmitter is one. A Monte Carlo study was conducted to determine the bias and variance of the Kaplan-Meier estimator and an estimator based also on the assumption of constant hazard and to eva!uate the performance of the two-sample tests associated with each. Modifications of each estimator which allow a re-Iocation probability of less than one are described and evaluated. Generallv the unmodified estimators were biased but had lower variance. At low sample sizes all estimators performed poorly. Under the null hypothesis, the distribution of all test statistics reasonably approximated the null distribution when survival was low but not when it was high. The power of the two-sample tests were similar.

  8. A comparison of United States and United Kingdom EQ-5D health states valuations using a nonparametric Bayesian method.

    PubMed

    Kharroubi, Samer A; O'Hagan, Anthony; Brazier, John E

    2010-07-10

    Cost-effectiveness analysis of alternative medical treatments relies on having a measure of effectiveness, and many regard the quality adjusted life year (QALY) to be the current 'gold standard.' In order to compute QALYs, we require a suitable system for describing a person's health state, and a utility measure to value the quality of life associated with each possible state. There are a number of different health state descriptive systems, and we focus here on one known as the EQ-5D. Data for estimating utilities for different health states have a number of features that mean care is necessary in statistical modelling.There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained from different countries. This article applies a nonparametric model to estimate and compare EQ-5D health state valuation data obtained from two countries using Bayesian methods. The data set is the US and UK EQ-5D valuation studies where a sample of 42 states defined by the EQ-5D was valued by representative samples of the general population from each country using the time trade-off technique. We estimate a utility function across both countries which explicitly accounts for the differences between them, and is estimated using the data from both countries. The article discusses the implications of these results for future applications of the EQ-5D and for further work in this field. Copyright 2010 John Wiley & Sons, Ltd.

  9. An improved nonparametric lower bound of species richness via a modified good-turing frequency formula.

    PubMed

    Chiu, Chun-Huo; Wang, Yi-Ting; Walther, Bruno A; Chao, Anne

    2014-09-01

    It is difficult to accurately estimate species richness if there are many almost undetectable species in a hyper-diverse community. Practically, an accurate lower bound for species richness is preferable to an inaccurate point estimator. The traditional nonparametric lower bound developed by Chao (1984, Scandinavian Journal of Statistics 11, 265-270) for individual-based abundance data uses only the information on the rarest species (the numbers of singletons and doubletons) to estimate the number of undetected species in samples. Applying a modified Good-Turing frequency formula, we derive an approximate formula for the first-order bias of this traditional lower bound. The approximate bias is estimated by using additional information (namely, the numbers of tripletons and quadrupletons). This approximate bias can be corrected, and an improved lower bound is thus obtained. The proposed lower bound is nonparametric in the sense that it is universally valid for any species abundance distribution. A similar type of improved lower bound can be derived for incidence data. We test our proposed lower bounds on simulated data sets generated from various species abundance models. Simulation results show that the proposed lower bounds always reduce bias over the traditional lower bounds and improve accuracy (as measured by mean squared error) when the heterogeneity of species abundances is relatively high. We also apply the proposed new lower bounds to real data for illustration and for comparisons with previously developed estimators. © 2014, The International Biometric Society.

  10. Nonparametric estimation of groundwater residence time distributions: What can environmental tracer data tell us about groundwater residence time?

    NASA Astrophysics Data System (ADS)

    McCallum, James L.; Engdahl, Nicholas B.; Ginn, Timothy R.; Cook, Peter. G.

    2014-03-01

    Residence time distributions (RTDs) have been used extensively for quantifying flow and transport in subsurface hydrology. In geochemical approaches, environmental tracer concentrations are used in conjunction with simple lumped parameter models (LPMs). Conversely, numerical simulation techniques require large amounts of parameterization and estimated RTDs are certainly limited by associated uncertainties. In this study, we apply a nonparametric deconvolution approach to estimate RTDs using environmental tracer concentrations. The model is based only on the assumption that flow is steady enough that the observed concentrations are well approximated by linear superposition of the input concentrations with the RTD; that is, the convolution integral holds. Even with large amounts of environmental tracer concentration data, the entire shape of an RTD remains highly nonunique. However, accurate estimates of mean ages and in some cases prediction of young portions of the RTD may be possible. The most useful type of data was found to be the use of a time series of tritium. This was due to the sharp variations in atmospheric concentrations and a short half-life. Conversely, the use of CFC compounds with smoothly varying atmospheric concentrations was more prone to nonuniqueness. This work highlights the benefits and limitations of using environmental tracer data to estimate whole RTDs with either LPMs or through numerical simulation. However, the ability of the nonparametric approach developed here to correct for mixing biases in mean ages appears promising.

  11. Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach.

    PubMed

    Kang, Le; Chen, Weijie; Petrick, Nicholas A; Gallas, Brandon D

    2015-02-20

    The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. Copyright © 2014 John Wiley & Sons, Ltd.

  12. A Powerful Test for Comparing Multiple Regression Functions.

    PubMed

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g, nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  13. A nonparametric mean-variance smoothing method to assess Arabidopsis cold stress transcriptional regulator CBF2 overexpression microarray data.

    PubMed

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request.

  14. A Nonparametric Mean-Variance Smoothing Method to Assess Arabidopsis Cold Stress Transcriptional Regulator CBF2 Overexpression Microarray Data

    PubMed Central

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181

  15. Leveraging Genomic Annotations and Pleiotropic Enrichment for Improved Replication Rates in Schizophrenia GWAS

    PubMed Central

    Wang, Yunpeng; Thompson, Wesley K.; Schork, Andrew J.; Holland, Dominic; Chen, Chi-Hua; Bettella, Francesco; Desikan, Rahul S.; Li, Wen; Witoelar, Aree; Zuber, Verena; Devor, Anna; Nöthen, Markus M.; Rietschel, Marcella; Chen, Qiang; Werge, Thomas; Cichon, Sven; Weinberger, Daniel R.; Djurovic, Srdjan; O’Donovan, Michael; Visscher, Peter M.; Andreassen, Ole A.; Dale, Anders M.

    2016-01-01

    Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotropy) for each single nucleotide polymorphism (SNP) to enable more accurate estimation of replication probabilities, conditional on the observed test statistic (“z-score”) of the SNP. We use a multiple logistic regression on z-scores to combine information from auxiliary information to derive a “relative enrichment score” for each SNP. For each stratum of these relative enrichment scores, we obtain nonparametric estimates of posterior expected test statistics and replication probabilities as a function of discovery z-scores, using a resampling-based approach that repeatedly and randomly partitions meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians model to each stratum, obtaining parameter estimates that minimize the sum of squared differences of the scale-mixture model with the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores even when both they are genome-wide significant (p < 5x10-8). There were 693 and 219 independent loci with model-based replication rates ≥80% and ≥90%, respectively. Compared to analyses not incorporating relative enrichment scores, CM3 increased out-of-sample yield for SNPs that replicate at a given rate. This demonstrates that replication probabilities can be more accurately estimated using prior enrichment information with CM3. PMID:26808560

  16. L-statistics for Repeated Measurements Data With Application to Trimmed Means, Quantiles and Tolerance Intervals.

    PubMed

    Assaad, Houssein I; Choudhary, Pankaj K

    2013-01-01

    The L -statistics form an important class of estimators in nonparametric statistics. Its members include trimmed means and sample quantiles and functions thereof. This article is devoted to theory and applications of L -statistics for repeated measurements data, wherein the measurements on the same subject are dependent and the measurements from different subjects are independent. This article has three main goals: (a) Show that the L -statistics are asymptotically normal for repeated measurements data. (b) Present three statistical applications of this result, namely, location estimation using trimmed means, quantile estimation and construction of tolerance intervals. (c) Obtain a Bahadur representation for sample quantiles. These results are generalizations of similar results for independently and identically distributed data. The practical usefulness of these results is illustrated by analyzing a real data set involving measurement of systolic blood pressure. The properties of the proposed point and interval estimators are examined via simulation.

  17. Nonparametric Estimation of Standard Errors in Covariance Analysis Using the Infinitesimal Jackknife

    ERIC Educational Resources Information Center

    Jennrich, Robert I.

    2008-01-01

    The infinitesimal jackknife provides a simple general method for estimating standard errors in covariance structure analysis. Beyond its simplicity and generality what makes the infinitesimal jackknife method attractive is that essentially no assumptions are required to produce consistent standard error estimates, not even the requirement that the…

  18. Geostatistics and petroleum geology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hohn, M.E.

    1988-01-01

    This book examines purpose and use of geostatistics in exploration and development of oil and gas with an emphasis on appropriate and pertinent case studies. It present an overview of geostatistics. Topics covered include: The semivariogram; Linear estimation; Multivariate geostatistics; Nonlinear estimation; From indicator variables to nonparametric estimation; and More detail, less certainty; conditional simulation.

  19. Characterization of a maximum-likelihood nonparametric density estimator of kernel type

    NASA Technical Reports Server (NTRS)

    Geman, S.; Mcclure, D. E.

    1982-01-01

    Kernel type density estimators calculated by the method of sieves. Proofs are presented for the characterization theorem: Let x(1), x(2),...x(n) be a random sample from a population with density f(0). Let sigma 0 and consider estimators f of f(0) defined by (1).

  20. Nonparametric Model of Smooth Muscle Force Production During Electrical Stimulation.

    PubMed

    Cole, Marc; Eikenberry, Steffen; Kato, Takahide; Sandler, Roman A; Yamashiro, Stanley M; Marmarelis, Vasilis Z

    2017-03-01

    A nonparametric model of smooth muscle tension response to electrical stimulation was estimated using the Laguerre expansion technique of nonlinear system kernel estimation. The experimental data consisted of force responses of smooth muscle to energy-matched alternating single pulse and burst current stimuli. The burst stimuli led to at least a 10-fold increase in peak force in smooth muscle from Mytilus edulis, despite the constant energy constraint. A linear model did not fit the data. However, a second-order model fit the data accurately, so the higher-order models were not required to fit the data. Results showed that smooth muscle force response is not linearly related to the stimulation power.

  1. Wavelet Filtering to Reduce Conservatism in Aeroservoelastic Robust Stability Margins

    NASA Technical Reports Server (NTRS)

    Brenner, Marty; Lind, Rick

    1998-01-01

    Wavelet analysis for filtering and system identification was used to improve the estimation of aeroservoelastic stability margins. The conservatism of the robust stability margins was reduced with parametric and nonparametric time-frequency analysis of flight data in the model validation process. Nonparametric wavelet processing of data was used to reduce the effects of external desirableness and unmodeled dynamics. Parametric estimates of modal stability were also extracted using the wavelet transform. Computation of robust stability margins for stability boundary prediction depends on uncertainty descriptions derived from the data for model validation. F-18 high Alpha Research Vehicle aeroservoelastic flight test data demonstrated improved robust stability prediction by extension of the stability boundary beyond the flight regime.

  2. A frequentist approach to computer model calibration

    DOE PAGES

    Wong, Raymond K. W.; Storlie, Curtis Byron; Lee, Thomas C. M.

    2016-05-05

    The paper considers the computer model calibration problem and provides a general frequentist solution. Under the framework proposed, the data model is semiparametric with a non-parametric discrepancy function which accounts for any discrepancy between physical reality and the computer model. In an attempt to solve a fundamentally important (but often ignored) identifiability issue between the computer model parameters and the discrepancy function, the paper proposes a new and identifiable parameterization of the calibration problem. It also develops a two-step procedure for estimating all the relevant quantities under the new parameterization. This estimation procedure is shown to enjoy excellent rates ofmore » convergence and can be straightforwardly implemented with existing software. For uncertainty quantification, bootstrapping is adopted to construct confidence regions for the quantities of interest. As a result, the practical performance of the methodology is illustrated through simulation examples and an application to a computational fluid dynamics model.« less

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wong, Raymond K. W.; Storlie, Curtis Byron; Lee, Thomas C. M.

    The paper considers the computer model calibration problem and provides a general frequentist solution. Under the framework proposed, the data model is semiparametric with a non-parametric discrepancy function which accounts for any discrepancy between physical reality and the computer model. In an attempt to solve a fundamentally important (but often ignored) identifiability issue between the computer model parameters and the discrepancy function, the paper proposes a new and identifiable parameterization of the calibration problem. It also develops a two-step procedure for estimating all the relevant quantities under the new parameterization. This estimation procedure is shown to enjoy excellent rates ofmore » convergence and can be straightforwardly implemented with existing software. For uncertainty quantification, bootstrapping is adopted to construct confidence regions for the quantities of interest. As a result, the practical performance of the methodology is illustrated through simulation examples and an application to a computational fluid dynamics model.« less

  4. A parametric method for determining the number of signals in narrow-band direction finding

    NASA Astrophysics Data System (ADS)

    Wu, Qiang; Fuhrmann, Daniel R.

    1991-08-01

    A novel and more accurate method to determine the number of signals in the multisource direction finding problem is developed. The information-theoretic criteria of Yin and Krishnaiah (1988) are applied to a set of quantities which are evaluated from the log-likelihood function. Based on proven asymptotic properties of the maximum likelihood estimation, these quantities have the properties required by the criteria. Since the information-theoretic criteria use these quantities instead of the eigenvalues of the estimated correlation matrix, this approach possesses the advantage of not requiring a subjective threshold, and also provides higher performance than when eigenvalues are used. Simulation results are presented and compared to those obtained from the nonparametric method given by Wax and Kailath (1985).

  5. Efficient Bayesian hierarchical functional data analysis with basis function approximations using Gaussian-Wishart processes.

    PubMed

    Yang, Jingjing; Cox, Dennis D; Lee, Jong Soo; Ren, Peng; Choi, Taeryon

    2017-12-01

    Functional data are defined as realizations of random functions (mostly smooth functions) varying over a continuum, which are usually collected on discretized grids with measurement errors. In order to accurately smooth noisy functional observations and deal with the issue of high-dimensional observation grids, we propose a novel Bayesian method based on the Bayesian hierarchical model with a Gaussian-Wishart process prior and basis function representations. We first derive an induced model for the basis-function coefficients of the functional data, and then use this model to conduct posterior inference through Markov chain Monte Carlo methods. Compared to the standard Bayesian inference that suffers serious computational burden and instability in analyzing high-dimensional functional data, our method greatly improves the computational scalability and stability, while inheriting the advantage of simultaneously smoothing raw observations and estimating the mean-covariance functions in a nonparametric way. In addition, our method can naturally handle functional data observed on random or uncommon grids. Simulation and real studies demonstrate that our method produces similar results to those obtainable by the standard Bayesian inference with low-dimensional common grids, while efficiently smoothing and estimating functional data with random and high-dimensional observation grids when the standard Bayesian inference fails. In conclusion, our method can efficiently smooth and estimate high-dimensional functional data, providing one way to resolve the curse of dimensionality for Bayesian functional data analysis with Gaussian-Wishart processes. © 2017, The International Biometric Society.

  6. Nonparametric and Semiparametric Regression Estimation for Length-biased Survival Data

    PubMed Central

    Shen, Yu; Ning, Jing; Qin, Jing

    2016-01-01

    For the past several decades, nonparametric and semiparametric modeling for conventional right-censored survival data has been investigated intensively under a noninformative censoring mechanism. However, these methods may not be applicable for analyzing right-censored survival data that arise from prevalent cohorts when the failure times are subject to length-biased sampling. This review article is intended to provide a summary of some newly developed methods as well as established methods for analyzing length-biased data. PMID:27086362

  7. Trial-dependent psychometric functions accounting for perceptual learning in 2-AFC discrimination tasks.

    PubMed

    Kattner, Florian; Cochrane, Aaron; Green, C Shawn

    2017-09-01

    The majority of theoretical models of learning consider learning to be a continuous function of experience. However, most perceptual learning studies use thresholds estimated by fitting psychometric functions to independent blocks, sometimes then fitting a parametric function to these block-wise estimated thresholds. Critically, such approaches tend to violate the basic principle that learning is continuous through time (e.g., by aggregating trials into large "blocks" for analysis that each assume stationarity, then fitting learning functions to these aggregated blocks). To address this discrepancy between base theory and analysis practice, here we instead propose fitting a parametric function to thresholds from each individual trial. In particular, we implemented a dynamic psychometric function whose parameters were allowed to change continuously with each trial, thus parameterizing nonstationarity. We fit the resulting continuous time parametric model to data from two different perceptual learning tasks. In nearly every case, the quality of the fits derived from the continuous time parametric model outperformed the fits derived from a nonparametric approach wherein separate psychometric functions were fit to blocks of trials. Because such a continuous trial-dependent model of perceptual learning also offers a number of additional advantages (e.g., the ability to extrapolate beyond the observed data; the ability to estimate performance on individual critical trials), we suggest that this technique would be a useful addition to each psychophysicist's analysis toolkit.

  8. A Nonparametric Approach to Estimate Classification Accuracy and Consistency

    ERIC Educational Resources Information Center

    Lathrop, Quinn N.; Cheng, Ying

    2014-01-01

    When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…

  9. Stochastic Residual-Error Analysis For Estimating Hydrologic Model Predictive Uncertainty

    EPA Science Inventory

    A hybrid time series-nonparametric sampling approach, referred to herein as semiparametric, is presented for the estimation of model predictive uncertainty. The methodology is a two-step procedure whereby a distributed hydrologic model is first calibrated, then followed by brute ...

  10. Penalized spline estimation for functional coefficient regression models.

    PubMed

    Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

    2010-04-01

    The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.

  11. Exponential series approaches for nonparametric graphical models

    NASA Astrophysics Data System (ADS)

    Janofsky, Eric

    Markov Random Fields (MRFs) or undirected graphical models are parsimonious representations of joint probability distributions. This thesis studies high-dimensional, continuous-valued pairwise Markov Random Fields. We are particularly interested in approximating pairwise densities whose logarithm belongs to a Sobolev space. For this problem we propose the method of exponential series which approximates the log density by a finite-dimensional exponential family with the number of sufficient statistics increasing with the sample size. We consider two approaches to estimating these models. The first is regularized maximum likelihood. This involves optimizing the sum of the log-likelihood of the data and a sparsity-inducing regularizer. We then propose a variational approximation to the likelihood based on tree-reweighted, nonparametric message passing. This approximation allows for upper bounds on risk estimates, leverages parallelization and is scalable to densities on hundreds of nodes. We show how the regularized variational MLE may be estimated using a proximal gradient algorithm. We then consider estimation using regularized score matching. This approach uses an alternative scoring rule to the log-likelihood, which obviates the need to compute the normalizing constant of the distribution. For general continuous-valued exponential families, we provide parameter and edge consistency results. As a special case we detail a new approach to sparse precision matrix estimation which has statistical performance competitive with the graphical lasso and computational performance competitive with the state-of-the-art glasso algorithm. We then describe results for model selection in the nonparametric pairwise model using exponential series. The regularized score matching problem is shown to be a convex program; we provide scalable algorithms based on consensus alternating direction method of multipliers (ADMM) and coordinate-wise descent. We use simulations to compare our method to others in the literature as well as the aforementioned TRW estimator.

  12. Rasch Model Parameter Estimation in the Presence of a Nonnormal Latent Trait Using a Nonparametric Bayesian Approach

    ERIC Educational Resources Information Center

    Finch, Holmes; Edwards, Julianne M.

    2016-01-01

    Standard approaches for estimating item response theory (IRT) model parameters generally work under the assumption that the latent trait being measured by a set of items follows the normal distribution. Estimation of IRT parameters in the presence of nonnormal latent traits has been shown to generate biased person and item parameter estimates. A…

  13. Estimating distributions with increasing failure rate in an imperfect repair model.

    PubMed

    Kvam, Paul H; Singh, Harshinder; Whitaker, Lyn R

    2002-03-01

    A failed system is repaired minimally if after failure, it is restored to the working condition of an identical system of the same age. We extend the nonparametric maximum likelihood estimator (MLE) of a system's lifetime distribution function to test units that are known to have an increasing failure rate. Such items comprise a significant portion of working components in industry. The order-restricted MLE is shown to be consistent. Similar results hold for the Brown-Proschan imperfect repair model, which dictates that a failed component is repaired perfectly with some unknown probability, and is otherwise repaired minimally. The estimators derived are motivated and illustrated by failure data in the nuclear industry. Failure times for groups of emergency diesel generators and motor-driven pumps are analyzed using the order-restricted methods. The order-restricted estimators are consistent and show distinct differences from the ordinary MLEs. Simulation results suggest significant improvement in reliability estimation is available in many cases when component failure data exhibit the IFR property.

  14. Doubly robust estimation of generalized partial linear models for longitudinal data with dropouts.

    PubMed

    Lin, Huiming; Fu, Bo; Qin, Guoyou; Zhu, Zhongyi

    2017-12-01

    We develop a doubly robust estimation of generalized partial linear models for longitudinal data with dropouts. Our method extends the highly efficient aggregate unbiased estimating function approach proposed in Qu et al. (2010) to a doubly robust one in the sense that under missing at random (MAR), our estimator is consistent when either the linear conditional mean condition is satisfied or a model for the dropout process is correctly specified. We begin with a generalized linear model for the marginal mean, and then move forward to a generalized partial linear model, allowing for nonparametric covariate effect by using the regression spline smoothing approximation. We establish the asymptotic theory for the proposed method and use simulation studies to compare its finite sample performance with that of Qu's method, the complete-case generalized estimating equation (GEE) and the inverse-probability weighted GEE. The proposed method is finally illustrated using data from a longitudinal cohort study. © 2017, The International Biometric Society.

  15. VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS

    PubMed Central

    Huang, Jian; Horowitz, Joel L.; Wei, Fengrong

    2010-01-01

    We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method. PMID:21127739

  16. A comparative study of nonparametric methods for pattern recognition

    NASA Technical Reports Server (NTRS)

    Hahn, S. F.; Nelson, G. D.

    1972-01-01

    The applied research discussed in this report determines and compares the correct classification percentage of the nonparametric sign test, Wilcoxon's signed rank test, and K-class classifier with the performance of the Bayes classifier. The performance is determined for data which have Gaussian, Laplacian and Rayleigh probability density functions. The correct classification percentage is shown graphically for differences in modes and/or means of the probability density functions for four, eight and sixteen samples. The K-class classifier performed very well with respect to the other classifiers used. Since the K-class classifier is a nonparametric technique, it usually performed better than the Bayes classifier which assumes the data to be Gaussian even though it may not be. The K-class classifier has the advantage over the Bayes in that it works well with non-Gaussian data without having to determine the probability density function of the data. It should be noted that the data in this experiment was always unimodal.

  17. Determining the multi-scale hedge ratios of stock index futures using the lower partial moments method

    NASA Astrophysics Data System (ADS)

    Dai, Jun; Zhou, Haigang; Zhao, Shaoquan

    2017-01-01

    This paper considers a multi-scale future hedge strategy that minimizes lower partial moments (LPM). To do this, wavelet analysis is adopted to decompose time series data into different components. Next, different parametric estimation methods with known distributions are applied to calculate the LPM of hedged portfolios, which is the key to determining multi-scale hedge ratios over different time scales. Then these parametric methods are compared with the prevailing nonparametric kernel metric method. Empirical results indicate that in the China Securities Index 300 (CSI 300) index futures and spot markets, hedge ratios and hedge efficiency estimated by the nonparametric kernel metric method are inferior to those estimated by parametric hedging model based on the features of sequence distributions. In addition, if minimum-LPM is selected as a hedge target, the hedging periods, degree of risk aversion, and target returns can affect the multi-scale hedge ratios and hedge efficiency, respectively.

  18. Variable-Domain Functional Regression for Modeling ICU Data.

    PubMed

    Gellar, Jonathan E; Colantuoni, Elizabeth; Needham, Dale M; Crainiceanu, Ciprian M

    2014-12-01

    We introduce a class of scalar-on-function regression models with subject-specific functional predictor domains. The fundamental idea is to consider a bivariate functional parameter that depends both on the functional argument and on the width of the functional predictor domain. Both parametric and nonparametric models are introduced to fit the functional coefficient. The nonparametric model is theoretically and practically invariant to functional support transformation, or support registration. Methods were motivated by and applied to a study of association between daily measures of the Intensive Care Unit (ICU) Sequential Organ Failure Assessment (SOFA) score and two outcomes: in-hospital mortality, and physical impairment at hospital discharge among survivors. Methods are generally applicable to a large number of new studies that record a continuous variables over unequal domains.

  19. Type I Error Rates and Power Estimates of Selected Parametric and Nonparametric Tests of Scale.

    ERIC Educational Resources Information Center

    Olejnik, Stephen F.; Algina, James

    1987-01-01

    Estimated Type I Error rates and power are reported for the Brown-Forsythe, O'Brien, Klotz, and Siegal-Tukey procedures. The effect of aligning the data using deviations from group means or group medians is investigated. (RB)

  20. Accurate Biomass Estimation via Bayesian Adaptive Sampling

    NASA Technical Reports Server (NTRS)

    Wheeler, Kevin R.; Knuth, Kevin H.; Castle, Joseph P.; Lvov, Nikolay

    2005-01-01

    The following concepts were introduced: a) Bayesian adaptive sampling for solving biomass estimation; b) Characterization of MISR Rahman model parameters conditioned upon MODIS landcover. c) Rigorous non-parametric Bayesian approach to analytic mixture model determination. d) Unique U.S. asset for science product validation and verification.

  1. Nonparametric predictive inference for combining diagnostic tests with parametric copula

    NASA Astrophysics Data System (ADS)

    Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.

    2017-09-01

    Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we interest in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results with considering dependence structure using parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. While copula is a well-known statistical concept for modelling dependence of random variables. A copula is a joint distribution function whose marginals are all uniformly distributed and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method which is maximum likelihood estimator (MLE). We investigate the performance of this proposed method via data sets from the literature and discuss results to show how our method performs for different family of copulas. Finally, we briefly outline related challenges and opportunities for future research.

  2. Extending the Distributed Lag Model framework to handle chemical mixtures.

    PubMed

    Bello, Ghalib A; Arora, Manish; Austin, Christine; Horton, Megan K; Wright, Robert O; Gennings, Chris

    2017-07-01

    Distributed Lag Models (DLMs) are used in environmental health studies to analyze the time-delayed effect of an exposure on an outcome of interest. Given the increasing need for analytical tools for evaluation of the effects of exposure to multi-pollutant mixtures, this study attempts to extend the classical DLM framework to accommodate and evaluate multiple longitudinally observed exposures. We introduce 2 techniques for quantifying the time-varying mixture effect of multiple exposures on an outcome of interest. Lagged WQS, the first technique, is based on Weighted Quantile Sum (WQS) regression, a penalized regression method that estimates mixture effects using a weighted index. We also introduce Tree-based DLMs, a nonparametric alternative for assessment of lagged mixture effects. This technique is based on the Random Forest (RF) algorithm, a nonparametric, tree-based estimation technique that has shown excellent performance in a wide variety of domains. In a simulation study, we tested the feasibility of these techniques and evaluated their performance in comparison to standard methodology. Both methods exhibited relatively robust performance, accurately capturing pre-defined non-linear functional relationships in different simulation settings. Further, we applied these techniques to data on perinatal exposure to environmental metal toxicants, with the goal of evaluating the effects of exposure on neurodevelopment. Our methods identified critical neurodevelopmental windows showing significant sensitivity to metal mixtures. Copyright © 2017 Elsevier Inc. All rights reserved.

  3. The current duration design for estimating the time to pregnancy distribution: a nonparametric Bayesian perspective.

    PubMed

    Gasbarra, Dario; Arjas, Elja; Vehtari, Aki; Slama, Rémy; Keiding, Niels

    2015-10-01

    This paper was inspired by the studies of Niels Keiding and co-authors on estimating the waiting time-to-pregnancy (TTP) distribution, and in particular on using the current duration design in that context. In this design, a cross-sectional sample of women is collected from those who are currently attempting to become pregnant, and then by recording from each the time she has been attempting. Our aim here is to study the identifiability and the estimation of the waiting time distribution on the basis of current duration data. The main difficulty in this stems from the fact that very short waiting times are only rarely selected into the sample of current durations, and this renders their estimation unstable. We introduce here a Bayesian method for this estimation problem, prove its asymptotic consistency, and compare the method to some variants of the non-parametric maximum likelihood estimators, which have been used previously in this context. The properties of the Bayesian estimation method are studied also empirically, using both simulated data and TTP data on current durations collected by Slama et al. (Hum Reprod 27(5):1489-1498, 2012).

  4. Modeling animal movements using stochastic differential equations

    Treesearch

    Haiganoush K. Preisler; Alan A. Ager; Bruce K. Johnson; John G. Kie

    2004-01-01

    We describe the use of bivariate stochastic differential equations (SDE) for modeling movements of 216 radiocollared female Rocky Mountain elk at the Starkey Experimental Forest and Range in northeastern Oregon. Spatially and temporally explicit vector fields were estimated using approximating difference equations and nonparametric regression techniques. Estimated...

  5. The Infinitesimal Jackknife with Exploratory Factor Analysis

    ERIC Educational Resources Information Center

    Zhang, Guangjian; Preacher, Kristopher J.; Jennrich, Robert I.

    2012-01-01

    The infinitesimal jackknife, a nonparametric method for estimating standard errors, has been used to obtain standard error estimates in covariance structure analysis. In this article, we adapt it for obtaining standard errors for rotated factor loadings and factor correlations in exploratory factor analysis with sample correlation matrices. Both…

  6. A new adaptive algorithm for automated feature extraction in exponentially damped signals for health monitoring of smart structures

    NASA Astrophysics Data System (ADS)

    Qarib, Hossein; Adeli, Hojjat

    2015-12-01

    In this paper authors introduce a new adaptive signal processing technique for feature extraction and parameter estimation in noisy exponentially damped signals. The iterative 3-stage method is based on the adroit integration of the strengths of parametric and nonparametric methods such as multiple signal categorization, matrix pencil, and empirical mode decomposition algorithms. The first stage is a new adaptive filtration or noise removal scheme. The second stage is a hybrid parametric-nonparametric signal parameter estimation technique based on an output-only system identification technique. The third stage is optimization of estimated parameters using a combination of the primal-dual path-following interior point algorithm and genetic algorithm. The methodology is evaluated using a synthetic signal and a signal obtained experimentally from transverse vibrations of a steel cantilever beam. The method is successful in estimating the frequencies accurately. Further, it estimates the damping exponents. The proposed adaptive filtration method does not include any frequency domain manipulation. Consequently, the time domain signal is not affected as a result of frequency domain and inverse transformations.

  7. Application of nonparametric regression methods to study the relationship between NO2 concentrations and local wind direction and speed at background sites.

    PubMed

    Donnelly, Aoife; Misstear, Bruce; Broderick, Brian

    2011-02-15

    Background concentrations of nitrogen dioxide (NO(2)) are not constant but vary temporally and spatially. The current paper presents a powerful tool for the quantification of the effects of wind direction and wind speed on background NO(2) concentrations, particularly in cases where monitoring data are limited. In contrast to previous studies which applied similar methods to sites directly affected by local pollution sources, the current study focuses on background sites with the aim of improving methods for predicting background concentrations adopted in air quality modelling studies. The relationship between measured NO(2) concentration in air at three such sites in Ireland and locally measured wind direction has been quantified using nonparametric regression methods. The major aim was to analyse a method for quantifying the effects of local wind direction on background levels of NO(2) in Ireland. The method was expanded to include wind speed as an added predictor variable. A Gaussian kernel function is used in the analysis and circular statistics employed for the wind direction variable. Wind direction and wind speed were both found to have a statistically significant effect on background levels of NO(2) at all three sites. Frequently environmental impact assessments are based on short term baseline monitoring producing a limited dataset. The presented non-parametric regression methods, in contrast to the frequently used methods such as binning of the data, allow concentrations for missing data pairs to be estimated and distinction between spurious and true peaks in concentrations to be made. The methods were found to provide a realistic estimation of long term concentration variation with wind direction and speed, even for cases where the data set is limited. Accurate identification of the actual variation at each location and causative factors could be made, thus supporting the improved definition of background concentrations for use in air quality modelling studies. Copyright © 2010 Elsevier B.V. All rights reserved.

  8. Generating Health Estimates by Zip Code: A Semiparametric Small Area Estimation Approach Using the California Health Interview Survey.

    PubMed

    Wang, Yueyan; Ponce, Ninez A; Wang, Pan; Opsomer, Jean D; Yu, Hongjian

    2015-12-01

    We propose a method to meet challenges in generating health estimates for granular geographic areas in which the survey sample size is extremely small. Our generalized linear mixed model predicts health outcomes using both individual-level and neighborhood-level predictors. The model's feature of nonparametric smoothing function on neighborhood-level variables better captures the association between neighborhood environment and the outcome. Using 2011 to 2012 data from the California Health Interview Survey, we demonstrate an empirical application of this method to estimate the fraction of residents without health insurance for Zip Code Tabulation Areas (ZCTAs). Our method generated stable estimates of uninsurance for 1519 of 1765 ZCTAs (86%) in California. For some areas with great socioeconomic diversity across adjacent neighborhoods, such as Los Angeles County, the modeled uninsured estimates revealed much heterogeneity among geographically adjacent ZCTAs. The proposed method can increase the value of health surveys by providing modeled estimates for health data at a granular geographic level. It can account for variations in health outcomes at the neighborhood level as a result of both socioeconomic characteristics and geographic locations.

  9. A comparison of selected parametric and non-parametric imputation methods for estimating forest biomass and basal area

    Treesearch

    Donald Gagliasso; Susan Hummel; Hailemariam Temesgen

    2014-01-01

    Various methods have been used to estimate the amount of above ground forest biomass across landscapes and to create biomass maps for specific stands or pixels across ownership or project areas. Without an accurate estimation method, land managers might end up with incorrect biomass estimate maps, which could lead them to make poorer decisions in their future...

  10. Estimating technical efficiency in the hospital sector with panel data: a comparison of parametric and non-parametric techniques.

    PubMed

    Siciliani, Luigi

    2006-01-01

    Policy makers are increasingly interested in developing performance indicators that measure hospital efficiency. These indicators may give the purchasers of health services an additional regulatory tool to contain health expenditure. Using panel data, this study compares different parametric (econometric) and non-parametric (linear programming) techniques for the measurement of a hospital's technical efficiency. This comparison was made using a sample of 17 Italian hospitals in the years 1996-9. Highest correlations are found in the efficiency scores between the non-parametric data envelopment analysis under the constant returns to scale assumption (DEA-CRS) and several parametric models. Correlation reduces markedly when using more flexible non-parametric specifications such as data envelopment analysis under the variable returns to scale assumption (DEA-VRS) and the free disposal hull (FDH) model. Correlation also generally reduces when moving from one output to two-output specifications. This analysis suggests that there is scope for developing performance indicators at hospital level using panel data, but it is important that extensive sensitivity analysis is carried out if purchasers wish to make use of these indicators in practice.

  11. Regional vertical total electron content (VTEC) modeling together with satellite and receiver differential code biases (DCBs) using semi-parametric multivariate adaptive regression B-splines (SP-BMARS)

    NASA Astrophysics Data System (ADS)

    Durmaz, Murat; Karslioglu, Mahmut Onur

    2015-04-01

    There are various global and regional methods that have been proposed for the modeling of ionospheric vertical total electron content (VTEC). Global distribution of VTEC is usually modeled by spherical harmonic expansions, while tensor products of compactly supported univariate B-splines can be used for regional modeling. In these empirical parametric models, the coefficients of the basis functions as well as differential code biases (DCBs) of satellites and receivers can be treated as unknown parameters which can be estimated from geometry-free linear combinations of global positioning system observables. In this work we propose a new semi-parametric multivariate adaptive regression B-splines (SP-BMARS) method for the regional modeling of VTEC together with satellite and receiver DCBs, where the parametric part of the model is related to the DCBs as fixed parameters and the non-parametric part adaptively models the spatio-temporal distribution of VTEC. The latter is based on multivariate adaptive regression B-splines which is a non-parametric modeling technique making use of compactly supported B-spline basis functions that are generated from the observations automatically. This algorithm takes advantage of an adaptive scale-by-scale model building strategy that searches for best-fitting B-splines to the data at each scale. The VTEC maps generated from the proposed method are compared numerically and visually with the global ionosphere maps (GIMs) which are provided by the Center for Orbit Determination in Europe (CODE). The VTEC values from SP-BMARS and CODE GIMs are also compared with VTEC values obtained through calibration using local ionospheric model. The estimated satellite and receiver DCBs from the SP-BMARS model are compared with the CODE distributed DCBs. The results show that the SP-BMARS algorithm can be used to estimate satellite and receiver DCBs while adaptively and flexibly modeling the daily regional VTEC.

  12. Bayesian Nonparametric Inference – Why and How

    PubMed Central

    Müller, Peter; Mitra, Riten

    2013-01-01

    We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932

  13. Nonparametric identification of nonlinear dynamic systems using a synchronisation-based method

    NASA Astrophysics Data System (ADS)

    Kenderi, Gábor; Fidlin, Alexander

    2014-12-01

    The present study proposes an identification method for highly nonlinear mechanical systems that does not require a priori knowledge of the underlying nonlinearities to reconstruct arbitrary restoring force surfaces between degrees of freedom. This approach is based on the master-slave synchronisation between a dynamic model of the system as the slave and the real system as the master using measurements of the latter. As the model synchronises to the measurements, it becomes an observer of the real system. The optimal observer algorithm in a least-squares sense is given by the Kalman filter. Using the well-known state augmentation technique, the Kalman filter can be turned into a dual state and parameter estimator to identify parameters of a priori characterised nonlinearities. The paper proposes an extension of this technique towards nonparametric identification. A general system model is introduced by describing the restoring forces as bilateral spring-dampers with time-variant coefficients, which are estimated as augmented states. The estimation procedure is followed by an a posteriori statistical analysis to reconstruct noise-free restoring force characteristics using the estimated states and their estimated variances. Observability is provided using only one measured mechanical quantity per degree of freedom, which makes this approach less demanding in the number of necessary measurement signals compared with truly nonparametric solutions, which typically require displacement, velocity and acceleration signals. Additionally, due to the statistical rigour of the procedure, it successfully addresses signals corrupted by significant measurement noise. In the present paper, the method is described in detail, which is followed by numerical examples of one degree of freedom (1DoF) and 2DoF mechanical systems with strong nonlinearities of vibro-impact type to demonstrate the effectiveness of the proposed technique.

  14. Mixture models for undiagnosed prevalent disease and interval-censored incident disease: applications to a cohort assembled from electronic health records.

    PubMed

    Cheung, Li C; Pan, Qing; Hyun, Noorie; Schiffman, Mark; Fetterman, Barbara; Castle, Philip E; Lorey, Thomas; Katki, Hormuzd A

    2017-09-30

    For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers who use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and there can be pre-existing (prevalent) disease. Because prevalent disease is not always immediately diagnosed, some disease diagnosed at later visits are actually undiagnosed prevalent disease. We consider prevalent disease as a point mass at time zero for clinical applications where there is no interest in time of prevalent disease onset. We demonstrate that the naive Kaplan-Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. We propose a general family of mixture models for undiagnosed prevalent disease and interval-censored incident disease that we call prevalence-incidence models. Parameters for parametric prevalence-incidence models, such as the logistic regression and Weibull survival (logistic-Weibull) model, are estimated by direct likelihood maximization or by EM algorithm. Non-parametric methods are proposed to calculate cumulative risks for cases without covariates. We compare naive Kaplan-Meier, logistic-Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan-Meier provided poor estimates while the logistic-Weibull model was a close fit to the non-parametric. Our findings support our use of logistic-Weibull models to develop the risk estimates that underlie current US risk-based cervical cancer screening guidelines. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA.

  15. Considerations in cross-validation type density smoothing with a look at some data

    NASA Technical Reports Server (NTRS)

    Schuster, E. F.

    1982-01-01

    Experience gained in applying nonparametric maximum likelihood techniques of density estimation to judge the comparative quality of various estimators is reported. Two invariate data sets of one hundered samples (one Cauchy, one natural normal) are considered as well as studies in the multivariate case.

  16. A nonparametric clustering technique which estimates the number of clusters

    NASA Technical Reports Server (NTRS)

    Ramey, D. B.

    1983-01-01

    In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.

  17. Nonparametric Signal Extraction and Measurement Error in the Analysis of Electroencephalographic Activity During Sleep

    PubMed Central

    Crainiceanu, Ciprian M.; Caffo, Brian S.; Di, Chong-Zhi; Punjabi, Naresh M.

    2009-01-01

    We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep. Using subject specific signals estimated with known variability in a second level regression becomes a nonstandard measurement error problem. We propose and implement methods that take into account cross-sectional and longitudinal measurement error. The research presented here forms the basis for EEG signal processing for the SHHS. PMID:20057925

  18. The two-sample problem with induced dependent censorship.

    PubMed

    Huang, Y

    1999-12-01

    Induced dependent censorship is a general phenomenon in health service evaluation studies in which a measure such as quality-adjusted survival time or lifetime medical cost is of interest. We investigate the two-sample problem and propose two classes of nonparametric tests. Based on consistent estimation of the survival function for each sample, the two classes of test statistics examine the cumulative weighted difference in hazard functions and in survival functions. We derive a unified asymptotic null distribution theory and inference procedure. The tests are applied to trial V of the International Breast Cancer Study Group and show that long duration chemotherapy significantly improves time without symptoms of disease and toxicity of treatment as compared with the short duration treatment. Simulation studies demonstrate that the proposed tests, with a wide range of weight choices, perform well under moderate sample sizes.

  19. Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

    PubMed

    Ishwaran, Hemant; Lu, Min

    2018-06-04

    Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.

  20. Efficient multidimensional regularization for Volterra series estimation

    NASA Astrophysics Data System (ADS)

    Birpoutsoukis, Georgios; Csurcsia, Péter Zoltán; Schoukens, Johan

    2018-05-01

    This paper presents an efficient nonparametric time domain nonlinear system identification method. It is shown how truncated Volterra series models can be efficiently estimated without the need of long, transient-free measurements. The method is a novel extension of the regularization methods that have been developed for impulse response estimates of linear time invariant systems. To avoid the excessive memory needs in case of long measurements or large number of estimated parameters, a practical gradient-based estimation method is also provided, leading to the same numerical results as the proposed Volterra estimation method. Moreover, the transient effects in the simulated output are removed by a special regularization method based on the novel ideas of transient removal for Linear Time-Varying (LTV) systems. Combining the proposed methodologies, the nonparametric Volterra models of the cascaded water tanks benchmark are presented in this paper. The results for different scenarios varying from a simple Finite Impulse Response (FIR) model to a 3rd degree Volterra series with and without transient removal are compared and studied. It is clear that the obtained models capture the system dynamics when tested on a validation dataset, and their performance is comparable with the white-box (physical) models.

  1. Estimating and Identifying Unspecified Correlation Structure for Longitudinal Data

    PubMed Central

    Hu, Jianhua; Wang, Peng; Qu, Annie

    2014-01-01

    Identifying correlation structure is important to achieving estimation efficiency in analyzing longitudinal data, and is also crucial for drawing valid statistical inference for large size clustered data. In this paper, we propose a nonparametric method to estimate the correlation structure, which is applicable for discrete longitudinal data. We utilize eigenvector-based basis matrices to approximate the inverse of the empirical correlation matrix and determine the number of basis matrices via model selection. A penalized objective function based on the difference between the empirical and model approximation of the correlation matrices is adopted to select an informative structure for the correlation matrix. The eigenvector representation of the correlation estimation is capable of reducing the risk of model misspecification, and also provides useful information on the specific within-cluster correlation pattern of the data. We show that the proposed method possesses the oracle property and selects the true correlation structure consistently. The proposed method is illustrated through simulations and two data examples on air pollution and sonar signal studies. PMID:26361433

  2. Posterior consistency in conditional distribution estimation

    PubMed Central

    Pati, Debdeep; Dunson, David B.; Tokdar, Surya T.

    2014-01-01

    A wide variety of priors have been proposed for nonparametric Bayesian estimation of conditional distributions, and there is a clear need for theorems providing conditions on the prior for large support, as well as posterior consistency. Estimation of an uncountable collection of conditional distributions across different regions of the predictor space is a challenging problem, which differs in some important ways from density and mean regression estimation problems. Defining various topologies on the space of conditional distributions, we provide sufficient conditions for posterior consistency focusing on a broad class of priors formulated as predictor-dependent mixtures of Gaussian kernels. This theory is illustrated by showing that the conditions are satisfied for a class of generalized stick-breaking process mixtures in which the stick-breaking lengths are monotone, differentiable functions of a continuous stochastic process. We also provide a set of sufficient conditions for the case where stick-breaking lengths are predictor independent, such as those arising from a fixed Dirichlet process prior. PMID:25067858

  3. Water Residence Time estimation by 1D deconvolution in the form of a l2 -regularized inverse problem with smoothness, positivity and causality constraints

    NASA Astrophysics Data System (ADS)

    Meresescu, Alina G.; Kowalski, Matthieu; Schmidt, Frédéric; Landais, François

    2018-06-01

    The Water Residence Time distribution is the equivalent of the impulse response of a linear system allowing the propagation of water through a medium, e.g. the propagation of rain water from the top of the mountain towards the aquifers. We consider the output aquifer levels as the convolution between the input rain levels and the Water Residence Time, starting with an initial aquifer base level. The estimation of Water Residence Time is important for a better understanding of hydro-bio-geochemical processes and mixing properties of wetlands used as filters in ecological applications, as well as protecting fresh water sources for wells from pollutants. Common methods of estimating the Water Residence Time focus on cross-correlation, parameter fitting and non-parametric deconvolution methods. Here we propose a 1D full-deconvolution, regularized, non-parametric inverse problem algorithm that enforces smoothness and uses constraints of causality and positivity to estimate the Water Residence Time curve. Compared to Bayesian non-parametric deconvolution approaches, it has a fast runtime per test case; compared to the popular and fast cross-correlation method, it produces a more precise Water Residence Time curve even in the case of noisy measurements. The algorithm needs only one regularization parameter to balance between smoothness of the Water Residence Time and accuracy of the reconstruction. We propose an approach on how to automatically find a suitable value of the regularization parameter from the input data only. Tests on real data illustrate the potential of this method to analyze hydrological datasets.

  4. Researches of fruit quality prediction model based on near infrared spectrum

    NASA Astrophysics Data System (ADS)

    Shen, Yulin; Li, Lian

    2018-04-01

    With the improvement in standards for food quality and safety, people pay more attention to the internal quality of fruits, therefore the measurement of fruit internal quality is increasingly imperative. In general, nondestructive soluble solid content (SSC) and total acid content (TAC) analysis of fruits is vital and effective for quality measurement in global fresh produce markets, so in this paper, we aim at establishing a novel fruit internal quality prediction model based on SSC and TAC for Near Infrared Spectrum. Firstly, the model of fruit quality prediction based on PCA + BP neural network, PCA + GRNN network, PCA + BP adaboost strong classifier, PCA + ELM and PCA + LS_SVM classifier are designed and implemented respectively; then, in the NSCT domain, the median filter and the SavitzkyGolay filter are used to preprocess the spectral signal, Kennard-Stone algorithm is used to automatically select the training samples and test samples; thirdly, we achieve the optimal models by comparing 15 kinds of prediction model based on the theory of multi-classifier competition mechanism, specifically, the non-parametric estimation is introduced to measure the effectiveness of proposed model, the reliability and variance of nonparametric estimation evaluation of each prediction model to evaluate the prediction result, while the estimated value and confidence interval regard as a reference, the experimental results demonstrate that this model can better achieve the optimal evaluation of the internal quality of fruit; finally, we employ cat swarm optimization to optimize two optimal models above obtained from nonparametric estimation, empirical testing indicates that the proposed method can provide more accurate and effective results than other forecasting methods.

  5. R package PRIMsrc: Bump Hunting by Patient Rule Induction Method for Survival, Regression and Classification

    PubMed Central

    Dazard, Jean-Eudes; Choe, Michael; LeBlanc, Michael; Rao, J. Sunil

    2015-01-01

    PRIMsrc is a novel implementation of a non-parametric bump hunting procedure, based on the Patient Rule Induction Method (PRIM), offering a unified treatment of outcome variables, including censored time-to-event (Survival), continuous (Regression) and discrete (Classification) responses. To fit the model, it uses a recursive peeling procedure with specific peeling criteria and stopping rules depending on the response. To validate the model, it provides an objective function based on prediction-error or other specific statistic, as well as two alternative cross-validation techniques, adapted to the task of decision-rule making and estimation in the three types of settings. PRIMsrc comes as an open source R package, including at this point: (i) a main function for fitting a Survival Bump Hunting model with various options allowing cross-validated model selection to control model size (#covariates) and model complexity (#peeling steps) and generation of cross-validated end-point estimates; (ii) parallel computing; (iii) various S3-generic and specific plotting functions for data visualization, diagnostic, prediction, summary and display of results. It is available on CRAN and GitHub. PMID:26798326

  6. A Local Agreement Pattern Measure Based on Hazard Functions for Survival Outcomes

    PubMed Central

    Dai, Tian; Guo, Ying; Peng, Limin; Manatunga, Amita K.

    2017-01-01

    Summary Assessing agreement is often of interest in biomedical and clinical research when measurements are obtained on the same subjects by different raters or methods. Most classical agreement methods have been focused on global summary statistics, which cannot be used to describe various local agreement patterns. The objective of this work is to study the local agreement pattern between two continuous measurements subject to censoring. In this paper, we propose a new agreement measure based on bivariate hazard functions to characterize the local agreement pattern between two correlated survival outcomes. The proposed measure naturally accommodates censored observations, fully captures the dependence structure between bivariate survival times and provides detailed information on how the strength of agreement evolves over time. We develop a nonparametric estimation method for the proposed local agreement pattern measure and study theoretical properties including strong consistency and asymptotical normality. We then evaluate the performance of the estimator through simulation studies and illustrate the method using a prostate cancer data example. PMID:28724196

  7. A local agreement pattern measure based on hazard functions for survival outcomes.

    PubMed

    Dai, Tian; Guo, Ying; Peng, Limin; Manatunga, Amita K

    2018-03-01

    Assessing agreement is often of interest in biomedical and clinical research when measurements are obtained on the same subjects by different raters or methods. Most classical agreement methods have been focused on global summary statistics, which cannot be used to describe various local agreement patterns. The objective of this work is to study the local agreement pattern between two continuous measurements subject to censoring. In this article, we propose a new agreement measure based on bivariate hazard functions to characterize the local agreement pattern between two correlated survival outcomes. The proposed measure naturally accommodates censored observations, fully captures the dependence structure between bivariate survival times and provides detailed information on how the strength of agreement evolves over time. We develop a nonparametric estimation method for the proposed local agreement pattern measure and study theoretical properties including strong consistency and asymptotical normality. We then evaluate the performance of the estimator through simulation studies and illustrate the method using a prostate cancer data example. © 2017, The International Biometric Society.

  8. Penalized nonparametric scalar-on-function regression via principal coordinates

    PubMed Central

    Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu

    2016-01-01

    A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963

  9. Estimators for Clustered Education RCTs Using the Neyman Model for Causal Inference

    ERIC Educational Resources Information Center

    Schochet, Peter Z.

    2013-01-01

    This article examines the estimation of two-stage clustered designs for education randomized control trials (RCTs) using the nonparametric Neyman causal inference framework that underlies experiments. The key distinction between the considered causal models is whether potential treatment and control group outcomes are considered to be fixed for…

  10. Forest Stand Canopy Structure Attribute Estimation from High Resolution Digital Airborne Imagery

    Treesearch

    Demetrios Gatziolis

    2006-01-01

    A study of forest stand canopy variable assessment using digital, airborne, multispectral imagery is presented. Variable estimation involves stem density, canopy closure, and mean crown diameter, and it is based on quantification of spatial autocorrelation among pixel digital numbers (DN) using variogram analysis and an alternative, non-parametric approach known as...

  11. Amazon plant diversity revealed by a taxonomically verified species list.

    PubMed

    Cardoso, Domingos; Särkinen, Tiina; Alexander, Sara; Amorim, André M; Bittrich, Volker; Celis, Marcela; Daly, Douglas C; Fiaschi, Pedro; Funk, Vicki A; Giacomin, Leandro L; Goldenberg, Renato; Heiden, Gustavo; Iganci, João; Kelloff, Carol L; Knapp, Sandra; Cavalcante de Lima, Haroldo; Machado, Anderson F P; Dos Santos, Rubens Manoel; Mello-Silva, Renato; Michelangeli, Fabián A; Mitchell, John; Moonlight, Peter; de Moraes, Pedro Luís Rodrigues; Mori, Scott A; Nunes, Teonildes Sacramento; Pennington, Terry D; Pirani, José Rubens; Prance, Ghillean T; de Queiroz, Luciano Paganucci; Rapini, Alessandro; Riina, Ricarda; Rincon, Carlos Alberto Vargas; Roque, Nádia; Shimizu, Gustavo; Sobral, Marcos; Stehmann, João Renato; Stevens, Warren D; Taylor, Charlotte M; Trovó, Marcelo; van den Berg, Cássio; van der Werff, Henk; Viana, Pedro Lage; Zartman, Charles E; Forzza, Rafaela Campostrini

    2017-10-03

    Recent debates on the number of plant species in the vast lowland rain forests of the Amazon have been based largely on model estimates, neglecting published checklists based on verified voucher data. Here we collate taxonomically verified checklists to present a list of seed plant species from lowland Amazon rain forests. Our list comprises 14,003 species, of which 6,727 are trees. These figures are similar to estimates derived from nonparametric ecological models, but they contrast strongly with predictions of much higher tree diversity derived from parametric models. Based on the known proportion of tree species in neotropical lowland rain forest communities as measured in complete plot censuses, and on overall estimates of seed plant diversity in Brazil and in the neotropics in general, it is more likely that tree diversity in the Amazon is closer to the lower estimates derived from nonparametric models. Much remains unknown about Amazonian plant diversity, but this taxonomically verified dataset provides a valid starting point for macroecological and evolutionary studies aimed at understanding the origin, evolution, and ecology of the exceptional biodiversity of Amazonian forests.

  12. Estimation of a partially linear additive model for data from an outcome-dependent sampling design with a continuous outcome

    PubMed Central

    Tan, Ziwen; Qin, Guoyou; Zhou, Haibo

    2016-01-01

    Outcome-dependent sampling (ODS) designs have been well recognized as a cost-effective way to enhance study efficiency in both statistical literature and biomedical and epidemiologic studies. A partially linear additive model (PLAM) is widely applied in real problems because it allows for a flexible specification of the dependence of the response on some covariates in a linear fashion and other covariates in a nonlinear non-parametric fashion. Motivated by an epidemiological study investigating the effect of prenatal polychlorinated biphenyls exposure on children's intelligence quotient (IQ) at age 7 years, we propose a PLAM in this article to investigate a more flexible non-parametric inference on the relationships among the response and covariates under the ODS scheme. We propose the estimation method and establish the asymptotic properties of the proposed estimator. Simulation studies are conducted to show the improved efficiency of the proposed ODS estimator for PLAM compared with that from a traditional simple random sampling design with the same sample size. The data of the above-mentioned study is analyzed to illustrate the proposed method. PMID:27006375

  13. Amazon plant diversity revealed by a taxonomically verified species list

    PubMed Central

    Cardoso, Domingos; Särkinen, Tiina; Alexander, Sara; Amorim, André M.; Bittrich, Volker; Celis, Marcela; Daly, Douglas C.; Fiaschi, Pedro; Funk, Vicki A.; Giacomin, Leandro L.; Heiden, Gustavo; Iganci, João; Kelloff, Carol L.; Knapp, Sandra; Cavalcante de Lima, Haroldo; Machado, Anderson F. P.; dos Santos, Rubens Manoel; Mello-Silva, Renato; Michelangeli, Fabián A.; Mitchell, John; Moonlight, Peter; de Moraes, Pedro Luís Rodrigues; Mori, Scott A.; Nunes, Teonildes Sacramento; Pennington, Terry D.; Pirani, José Rubens; Prance, Ghillean T.; de Queiroz, Luciano Paganucci; Rapini, Alessandro; Rincon, Carlos Alberto Vargas; Roque, Nádia; Shimizu, Gustavo; Sobral, Marcos; Stehmann, João Renato; Stevens, Warren D.; Taylor, Charlotte M.; Trovó, Marcelo; van den Berg, Cássio; van der Werff, Henk; Viana, Pedro Lage; Zartman, Charles E.; Forzza, Rafaela Campostrini

    2017-01-01

    Recent debates on the number of plant species in the vast lowland rain forests of the Amazon have been based largely on model estimates, neglecting published checklists based on verified voucher data. Here we collate taxonomically verified checklists to present a list of seed plant species from lowland Amazon rain forests. Our list comprises 14,003 species, of which 6,727 are trees. These figures are similar to estimates derived from nonparametric ecological models, but they contrast strongly with predictions of much higher tree diversity derived from parametric models. Based on the known proportion of tree species in neotropical lowland rain forest communities as measured in complete plot censuses, and on overall estimates of seed plant diversity in Brazil and in the neotropics in general, it is more likely that tree diversity in the Amazon is closer to the lower estimates derived from nonparametric models. Much remains unknown about Amazonian plant diversity, but this taxonomically verified dataset provides a valid starting point for macroecological and evolutionary studies aimed at understanding the origin, evolution, and ecology of the exceptional biodiversity of Amazonian forests. PMID:28923966

  14. Deciphering the enigma of undetected species, phylogenetic, and functional diversity based on Good-Turing theory.

    PubMed

    Chao, Anne; Chiu, Chun-Huo; Colwell, Robert K; Magnago, Luiz Fernando S; Chazdon, Robin L; Gotelli, Nicholas J

    2017-11-01

    Estimating the species, phylogenetic, and functional diversity of a community is challenging because rare species are often undetected, even with intensive sampling. The Good-Turing frequency formula, originally developed for cryptography, estimates in an ecological context the true frequencies of rare species in a single assemblage based on an incomplete sample of individuals. Until now, this formula has never been used to estimate undetected species, phylogenetic, and functional diversity. Here, we first generalize the Good-Turing formula to incomplete sampling of two assemblages. The original formula and its two-assemblage generalization provide a novel and unified approach to notation, terminology, and estimation of undetected biological diversity. For species richness, the Good-Turing framework offers an intuitive way to derive the non-parametric estimators of the undetected species richness in a single assemblage, and of the undetected species shared between two assemblages. For phylogenetic diversity, the unified approach leads to an estimator of the undetected Faith's phylogenetic diversity (PD, the total length of undetected branches of a phylogenetic tree connecting all species), as well as a new estimator of undetected PD shared between two phylogenetic trees. For functional diversity based on species traits, the unified approach yields a new estimator of undetected Walker et al.'s functional attribute diversity (FAD, the total species-pairwise functional distance) in a single assemblage, as well as a new estimator of undetected FAD shared between two assemblages. Although some of the resulting estimators have been previously published (but derived with traditional mathematical inequalities), all taxonomic, phylogenetic, and functional diversity estimators are now derived under the same framework. All the derived estimators are theoretically lower bounds of the corresponding undetected diversities; our approach reveals the sufficient conditions under which the estimators are nearly unbiased, thus offering new insights. Simulation results are reported to numerically verify the performance of the derived estimators. We illustrate all estimators and assess their sampling uncertainty with an empirical dataset for Brazilian rain forest trees. These estimators should be widely applicable to many current problems in ecology, such as the effects of climate change on spatial and temporal beta diversity and the contribution of trait diversity to ecosystem multi-functionality. © 2017 by the Ecological Society of America.

  15. How to Deal with Interval-Censored Data Practically while Assessing the Progression-Free Survival: A Step-by-Step Guide Using SAS and R Software.

    PubMed

    Dugué, Audrey Emmanuelle; Pulido, Marina; Chabaud, Sylvie; Belin, Lisa; Gal, Jocelyn

    2016-12-01

    We describe how to estimate progression-free survival while dealing with interval-censored data in the setting of clinical trials in oncology. Three procedures with SAS and R statistical software are described: one allowing for a nonparametric maximum likelihood estimation of the survival curve using the EM-ICM (Expectation and Maximization-Iterative Convex Minorant) algorithm as described by Wellner and Zhan in 1997; a sensitivity analysis procedure in which the progression time is assigned (i) at the midpoint, (ii) at the upper limit (reflecting the standard analysis when the progression time is assigned at the first radiologic exam showing progressive disease), or (iii) at the lower limit of the censoring interval; and finally, two multiple imputations are described considering a uniform or the nonparametric maximum likelihood estimation (NPMLE) distribution. Clin Cancer Res; 22(23); 5629-35. ©2016 AACR. ©2016 American Association for Cancer Research.

  16. Comparison of estimators of standard deviation for hydrologic time series

    USGS Publications Warehouse

    Tasker, Gary D.; Gilroy, Edward J.

    1982-01-01

    Unbiasing factors as a function of serial correlation, ρ, and sample size, n for the sample standard deviation of a lag one autoregressive model were generated by random number simulation. Monte Carlo experiments were used to compare the performance of several alternative methods for estimating the standard deviation σ of a lag one autoregressive model in terms of bias, root mean square error, probability of underestimation, and expected opportunity design loss. Three methods provided estimates of σ which were much less biased but had greater mean square errors than the usual estimate of σ: s = (1/(n - 1) ∑ (xi −x¯)2)½. The three methods may be briefly characterized as (1) a method using a maximum likelihood estimate of the unbiasing factor, (2) a method using an empirical Bayes estimate of the unbiasing factor, and (3) a robust nonparametric estimate of σ suggested by Quenouille. Because s tends to underestimate σ, its use as an estimate of a model parameter results in a tendency to underdesign. If underdesign losses are considered more serious than overdesign losses, then the choice of one of the less biased methods may be wise.

  17. Robust point matching via vector field consensus.

    PubMed

    Jiayi Ma; Ji Zhao; Jinwen Tian; Yuille, Alan L; Zhuowen Tu

    2014-04-01

    In this paper, we propose an efficient algorithm, called vector field consensus, for establishing robust point correspondences between two sets of points. Our algorithm starts by creating a set of putative correspondences which can contain a very large number of false correspondences, or outliers, in addition to a limited number of true correspondences (inliers). Next, we solve for correspondence by interpolating a vector field between the two point sets, which involves estimating a consensus of inlier points whose matching follows a nonparametric geometrical constraint. We formulate this a maximum a posteriori (MAP) estimation of a Bayesian model with hidden/latent variables indicating whether matches in the putative set are outliers or inliers. We impose nonparametric geometrical constraints on the correspondence, as a prior distribution, using Tikhonov regularizers in a reproducing kernel Hilbert space. MAP estimation is performed by the EM algorithm which by also estimating the variance of the prior model (initialized to a large value) is able to obtain good estimates very quickly (e.g., avoiding many of the local minima inherent in this formulation). We illustrate this method on data sets in 2D and 3D and demonstrate that it is robust to a very large number of outliers (even up to 90%). We also show that in the special case where there is an underlying parametric geometrical model (e.g., the epipolar line constraint) that we obtain better results than standard alternatives like RANSAC if a large number of outliers are present. This suggests a two-stage strategy, where we use our nonparametric model to reduce the size of the putative set and then apply a parametric variant of our approach to estimate the geometric parameters. Our algorithm is computationally efficient and we provide code for others to use it. In addition, our approach is general and can be applied to other problems, such as learning with a badly corrupted training data set.

  18. Advanced statistical methods for improved data analysis of NASA astrophysics missions

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.

    1992-01-01

    The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.

  19. RRAWFLOW: Rainfall-Response Aquifer and Watershed Flow Model (v1.15)

    USGS Publications Warehouse

    Long, Andrew J.

    2015-01-01

    The Rainfall-Response Aquifer and Watershed Flow Model (RRAWFLOW) is a lumped-parameter model that simulates streamflow, spring flow, groundwater level, or solute transport for a measurement point in response to a system input of precipitation, recharge, or solute injection. I introduce the first version of RRAWFLOW available for download and public use and describe additional options. The open-source code is written in the R language and is available at http://sd.water.usgs.gov/projects/RRAWFLOW/RRAWFLOW.html along with an example model of streamflow. RRAWFLOW includes a time-series process to estimate recharge from precipitation and simulates the response to recharge by convolution, i.e., the unit-hydrograph approach. Gamma functions are used for estimation of parametric impulse-response functions (IRFs); a combination of two gamma functions results in a double-peaked IRF. A spline fit to a set of control points is introduced as a new method for estimation of nonparametric IRFs. Several options are included to simulate time-variant systems. For many applications, lumped models simulate the system response with equal accuracy to that of distributed models, but moreover, the ease of model construction and calibration of lumped models makes them a good choice for many applications (e.g., estimating missing periods in a hydrologic record). RRAWFLOW provides professional hydrologists and students with an accessible and versatile tool for lumped-parameter modeling.

  20. Output-only modal parameter estimator of linear time-varying structural systems based on vector TAR model and least squares support vector machine

    NASA Astrophysics Data System (ADS)

    Zhou, Si-Da; Ma, Yuan-Chen; Liu, Li; Kang, Jie; Ma, Zhi-Sai; Yu, Lei

    2018-01-01

    Identification of time-varying modal parameters contributes to the structural health monitoring, fault detection, vibration control, etc. of the operational time-varying structural systems. However, it is a challenging task because there is not more information for the identification of the time-varying systems than that of the time-invariant systems. This paper presents a vector time-dependent autoregressive model and least squares support vector machine based modal parameter estimator for linear time-varying structural systems in case of output-only measurements. To reduce the computational cost, a Wendland's compactly supported radial basis function is used to achieve the sparsity of the Gram matrix. A Gamma-test-based non-parametric approach of selecting the regularization factor is adapted for the proposed estimator to replace the time-consuming n-fold cross validation. A series of numerical examples have illustrated the advantages of the proposed modal parameter estimator on the suppression of the overestimate and the short data. A laboratory experiment has further validated the proposed estimator.

  1. RRAWFLOW: Rainfall-Response Aquifer and Watershed Flow Model (v1.11)

    NASA Astrophysics Data System (ADS)

    Long, A. J.

    2014-09-01

    The Rainfall-Response Aquifer and Watershed Flow Model (RRAWFLOW) is a lumped-parameter model that simulates streamflow, springflow, groundwater level, solute transport, or cave drip for a measurement point in response to a system input of precipitation, recharge, or solute injection. The RRAWFLOW open-source code is written in the R language and is included in the Supplement to this article along with an example model of springflow. RRAWFLOW includes a time-series process to estimate recharge from precipitation and simulates the response to recharge by convolution; i.e., the unit hydrograph approach. Gamma functions are used for estimation of parametric impulse-response functions (IRFs); a combination of two gamma functions results in a double-peaked IRF. A spline fit to a set of control points is introduced as a new method for estimation of nonparametric IRFs. Other options include the use of user-defined IRFs and different methods to simulate time-variant systems. For many applications, lumped models simulate the system response with equal accuracy to that of distributed models, but moreover, the ease of model construction and calibration of lumped models makes them a good choice for many applications. RRAWFLOW provides professional hydrologists and students with an accessible and versatile tool for lumped-parameter modeling.

  2. Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support.

    PubMed

    Wilcox, Thomas P; Zwickl, Derrick J; Heath, Tracy A; Hillis, David M

    2002-11-01

    Four New World genera of dwarf boas (Exiliboa, Trachyboa, Tropidophis, and Ungaliophis) have been placed by many systematists in a single group (traditionally called Tropidophiidae). However, the monophyly of this group has been questioned in several studies. Moreover, the overall relationships among basal snake lineages, including the placement of the dwarf boas, are poorly understood. We obtained mtDNA sequence data for 12S, 16S, and intervening tRNA-val genes from 23 species of snakes representing most major snake lineages, including all four genera of New World dwarf boas. We then examined the phylogenetic position of these species by estimating the phylogeny of the basal snakes. Our phylogenetic analysis suggests that New World dwarf boas are not monophyletic. Instead, we find Exiliboa and Ungaliophis to be most closely related to sand boas (Erycinae), boas (Boinae), and advanced snakes (Caenophidea), whereas Tropidophis and Trachyboa form an independent clade that separated relatively early in snake radiation. Our estimate of snake phylogeny differs significantly in other ways from some previous estimates of snake phylogeny. For instance, pythons do not cluster with boas and sand boas, but instead show a strong relationship with Loxocemus and Xenopeltis. Additionally, uropeltids cluster strongly with Cylindrophis, and together are embedded in what has previously been considered the macrostomatan radiation. These relationships are supported by both bootstrapping (parametric and nonparametric approaches) and Bayesian analysis, although Bayesian support values are consistently higher than those obtained from nonparametric bootstrapping. Simulations show that Bayesian support values represent much better estimates of phylogenetic accuracy than do nonparametric bootstrap support values, at least under the conditions of our study. Copyright 2002 Elsevier Science (USA)

  3. Two models for evaluating landslide hazards

    USGS Publications Warehouse

    Davis, J.C.; Chung, C.-J.; Ohlmacher, G.C.

    2006-01-01

    Two alternative procedures for estimating landslide hazards were evaluated using data on topographic digital elevation models (DEMs) and bedrock lithologies in an area adjacent to the Missouri River in Atchison County, Kansas, USA. The two procedures are based on the likelihood ratio model but utilize different assumptions. The empirical likelihood ratio model is based on non-parametric empirical univariate frequency distribution functions under an assumption of conditional independence while the multivariate logistic discriminant model assumes that likelihood ratios can be expressed in terms of logistic functions. The relative hazards of occurrence of landslides were estimated by an empirical likelihood ratio model and by multivariate logistic discriminant analysis. Predictor variables consisted of grids containing topographic elevations, slope angles, and slope aspects calculated from a 30-m DEM. An integer grid of coded bedrock lithologies taken from digitized geologic maps was also used as a predictor variable. Both statistical models yield relative estimates in the form of the proportion of total map area predicted to already contain or to be the site of future landslides. The stabilities of estimates were checked by cross-validation of results from random subsamples, using each of the two procedures. Cell-by-cell comparisons of hazard maps made by the two models show that the two sets of estimates are virtually identical. This suggests that the empirical likelihood ratio and the logistic discriminant analysis models are robust with respect to the conditional independent assumption and the logistic function assumption, respectively, and that either model can be used successfully to evaluate landslide hazards. ?? 2006.

  4. Aeroservoelastic Uncertainty Model Identification from Flight Data

    NASA Technical Reports Server (NTRS)

    Brenner, Martin J.

    2001-01-01

    Uncertainty modeling is a critical element in the estimation of robust stability margins for stability boundary prediction and robust flight control system development. There has been a serious deficiency to date in aeroservoelastic data analysis with attention to uncertainty modeling. Uncertainty can be estimated from flight data using both parametric and nonparametric identification techniques. The model validation problem addressed in this paper is to identify aeroservoelastic models with associated uncertainty structures from a limited amount of controlled excitation inputs over an extensive flight envelope. The challenge to this problem is to update analytical models from flight data estimates while also deriving non-conservative uncertainty descriptions consistent with the flight data. Multisine control surface command inputs and control system feedbacks are used as signals in a wavelet-based modal parameter estimation procedure for model updates. Transfer function estimates are incorporated in a robust minimax estimation scheme to get input-output parameters and error bounds consistent with the data and model structure. Uncertainty estimates derived from the data in this manner provide an appropriate and relevant representation for model development and robust stability analysis. This model-plus-uncertainty identification procedure is applied to aeroservoelastic flight data from the NASA Dryden Flight Research Center F-18 Systems Research Aircraft.

  5. A Simple Effect Size Estimator for Single Case Designs Using WinBUGS

    ERIC Educational Resources Information Center

    Rindskopf, David; Shadish, William; Hedges, Larry

    2012-01-01

    Data from single case designs (SCDs) have traditionally been analyzed by visual inspection rather than statistical models. As a consequence, effect sizes have been of little interest. Lately, some effect-size estimators have been proposed, but most are either (i) nonparametric, and/or (ii) based on an analogy incompatible with effect sizes from…

  6. Does multi-functionality affect technical efficiency? A non-parametric analysis of the Scottish dairy industry.

    PubMed

    Barnes, A P

    2006-09-01

    Recent policy changes within the Common Agricultural Policy have led to a shift from a solely production-led agriculture towards the promotion of multi-functionality. Conversely, the removal of production-led supports would indicate that an increased concentration on production efficiencies would seem a critical strategy for a country's future competitiveness. This paper explores the relationship between the 'multi-functional' farming attitude desired by policy makers and its effect on technical efficiency within Scottish dairy farming. Technical efficiency scores are calculated by applying the non-parametric data envelopment analysis technique and then measured against causes of inefficiency. Amongst these explanatory factors is a constructed score of multi-functionality. This research finds that, amongst other factors, a multi-functional attitude has a significant positive effect on technical efficiency. Consequently, this seems to validate the promotion of a multi-functional approach to farming currently being championed by policy-makers.

  7. Applications of species accumulation curves in large-scale biological data analysis.

    PubMed

    Deng, Chao; Daley, Timothy; Smith, Andrew D

    2015-09-01

    The species accumulation curve, or collector's curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45-63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k -mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible.

  8. Applications of species accumulation curves in large-scale biological data analysis

    PubMed Central

    Deng, Chao; Daley, Timothy; Smith, Andrew D

    2016-01-01

    The species accumulation curve, or collector’s curve, of a population gives the expected number of observed species or distinct classes as a function of sampling effort. Species accumulation curves allow researchers to assess and compare diversity across populations or to evaluate the benefits of additional sampling. Traditional applications have focused on ecological populations but emerging large-scale applications, for example in DNA sequencing, are orders of magnitude larger and present new challenges. We developed a method to estimate accumulation curves for predicting the complexity of DNA sequencing libraries. This method uses rational function approximations to a classical non-parametric empirical Bayes estimator due to Good and Toulmin [Biometrika, 1956, 43, 45–63]. Here we demonstrate how the same approach can be highly effective in other large-scale applications involving biological data sets. These include estimating microbial species richness, immune repertoire size, and k-mer diversity for genome assembly applications. We show how the method can be modified to address populations containing an effectively infinite number of species where saturation cannot practically be attained. We also introduce a flexible suite of tools implemented as an R package that make these methods broadly accessible. PMID:27252899

  9. A variational Bayes spatiotemporal model for electromagnetic brain mapping.

    PubMed

    Nathoo, F S; Babul, A; Moiseev, A; Virji-Babul, N; Beg, M F

    2014-03-01

    In this article, we present a new variational Bayes approach for solving the neuroelectromagnetic inverse problem arising in studies involving electroencephalography (EEG) and magnetoencephalography (MEG). This high-dimensional spatiotemporal estimation problem involves the recovery of time-varying neural activity at a large number of locations within the brain, from electromagnetic signals recorded at a relatively small number of external locations on or near the scalp. Framing this problem within the context of spatial variable selection for an underdetermined functional linear model, we propose a spatial mixture formulation where the profile of electrical activity within the brain is represented through location-specific spike-and-slab priors based on a spatial logistic specification. The prior specification accommodates spatial clustering in brain activation, while also allowing for the inclusion of auxiliary information derived from alternative imaging modalities, such as functional magnetic resonance imaging (fMRI). We develop a variational Bayes approach for computing estimates of neural source activity, and incorporate a nonparametric bootstrap for interval estimation. The proposed methodology is compared with several alternative approaches through simulation studies, and is applied to the analysis of a multimodal neuroimaging study examining the neural response to face perception using EEG, MEG, and fMRI. © 2013, The International Biometric Society.

  10. Inferring the demographic history from DNA sequences: An importance sampling approach based on non-homogeneous processes.

    PubMed

    Ait Kaci Azzou, S; Larribe, F; Froda, S

    2016-10-01

    In Ait Kaci Azzou et al. (2015) we introduced an Importance Sampling (IS) approach for estimating the demographic history of a sample of DNA sequences, the skywis plot. More precisely, we proposed a new nonparametric estimate of a population size that changes over time. We showed on simulated data that the skywis plot can work well in typical situations where the effective population size does not undergo very steep changes. In this paper, we introduce an iterative procedure which extends the previous method and gives good estimates under such rapid variations. In the iterative calibrated skywis plot we approximate the effective population size by a piecewise constant function, whose values are re-estimated at each step. These piecewise constant functions are used to generate the waiting times of non homogeneous Poisson processes related to a coalescent process with mutation under a variable population size model. Moreover, the present IS procedure is based on a modified version of the Stephens and Donnelly (2000) proposal distribution. Finally, we apply the iterative calibrated skywis plot method to a simulated data set from a rapidly expanding exponential model, and we show that the method based on this new IS strategy correctly reconstructs the demographic history. Copyright © 2016. Published by Elsevier Inc.

  11. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    USGS Publications Warehouse

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.

  12. Change point detection of the Persian Gulf sea surface temperature

    NASA Astrophysics Data System (ADS)

    Shirvani, A.

    2017-01-01

    In this study, the Student's t parametric and Mann-Whitney nonparametric change point models (CPMs) were applied to detect change point in the annual Persian Gulf sea surface temperature anomalies (PGSSTA) time series for the period 1951-2013. The PGSSTA time series, which were serially correlated, were transformed to produce an uncorrelated pre-whitened time series. The pre-whitened PGSSTA time series were utilized as the input file of change point models. Both the applied parametric and nonparametric CPMs estimated the change point in the PGSSTA in 1992. The PGSSTA follow the normal distribution up to 1992 and thereafter, but with a different mean value after year 1992. The estimated slope of linear trend in PGSSTA time series for the period 1951-1992 was negative; however, that was positive after the detected change point. Unlike the PGSSTA, the applied CPMs suggested no change point in the Niño3.4SSTA time series.

  13. On-Line Robust Modal Stability Prediction using Wavelet Processing

    NASA Technical Reports Server (NTRS)

    Brenner, Martin J.; Lind, Rick

    1998-01-01

    Wavelet analysis for filtering and system identification has been used to improve the estimation of aeroservoelastic stability margins. The conservatism of the robust stability margins is reduced with parametric and nonparametric time- frequency analysis of flight data in the model validation process. Nonparametric wavelet processing of data is used to reduce the effects of external disturbances and unmodeled dynamics. Parametric estimates of modal stability are also extracted using the wavelet transform. Computation of robust stability margins for stability boundary prediction depends on uncertainty descriptions derived from the data for model validation. The F-18 High Alpha Research Vehicle aeroservoelastic flight test data demonstrates improved robust stability prediction by extension of the stability boundary beyond the flight regime. Guidelines and computation times are presented to show the efficiency and practical aspects of these procedures for on-line implementation. Feasibility of the method is shown for processing flight data from time- varying nonstationary test points.

  14. Radiated Seismic Energy of Earthquakes in the South-Central Region of the Gulf of California, Mexico

    NASA Astrophysics Data System (ADS)

    Castro, Raúl R.; Mendoza-Camberos, Antonio; Pérez-Vertti, Arturo

    2018-05-01

    We estimated the radiated seismic energy (ES) of 65 earthquakes located in the south-central region of the Gulf of California. Most of these events occurred along active transform faults that define the Pacific-North America plate boundary and have magnitudes between M3.3 and M5.9. We corrected the spectral records for attenuation using nonparametric S-wave attenuation functions determined with the whole data set. The path effects were isolated from the seismic source using a spectral inversion. We computed radiated seismic energy of the earthquakes by integrating the square velocity source spectrum and estimated their apparent stresses. We found that most events have apparent stress between 3 × 10-4 and 3 MPa. Model independent estimates of the ratio between seismic energy and moment (ES/M0) indicates that this ratio is independent of earthquake size. We conclude that in general the apparent stress is low (σa < 3 MPa) in the south-central and southern Gulf of California.

  15. Quantifying and estimating the predictive accuracy for censored time-to-event data with competing risks.

    PubMed

    Wu, Cai; Li, Liang

    2018-05-15

    This paper focuses on quantifying and estimating the predictive accuracy of prognostic models for time-to-event outcomes with competing events. We consider the time-dependent discrimination and calibration metrics, including the receiver operating characteristics curve and the Brier score, in the context of competing risks. To address censoring, we propose a unified nonparametric estimation framework for both discrimination and calibration measures, by weighting the censored subjects with the conditional probability of the event of interest given the observed data. The proposed method can be extended to time-dependent predictive accuracy metrics constructed from a general class of loss functions. We apply the methodology to a data set from the African American Study of Kidney Disease and Hypertension to evaluate the predictive accuracy of a prognostic risk score in predicting end-stage renal disease, accounting for the competing risk of pre-end-stage renal disease death, and evaluate its numerical performance in extensive simulation studies. Copyright © 2018 John Wiley & Sons, Ltd.

  16. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.

  17. Estimating the number of tree species in forest populations using current vegetation survey and forest inventory and analysis approximation plots and grid intensities

    Treesearch

    Hans T. Schreuder; Jin-Mann S. Lin; John Teply

    2000-01-01

    We estimate number of tree species in National Forest populations using the nonparametric estimator. Data from the Current Vegetation Survey (CVS) of Region 6 of the USDA Forest Service were used to estimate the number of tree species with a plot close in size to the Forest Inventory and Analysis (FIA) plot and the actual CVS plot for the 5.5 km FIA grid and the 2.7 km...

  18. Essays in applied macroeconomics: Asymmetric price adjustment, exchange rate and treatment effect

    NASA Astrophysics Data System (ADS)

    Gu, Jingping

    This dissertation consists of three essays. Chapter II examines the possible asymmetric response of gasoline prices to crude oil price changes using an error correction model with GARCH errors. Recent papers have looked at this issue. Some of these papers estimate a form of error correction model, but none of them accounts for autoregressive heteroskedasticity in estimation and testing for asymmetry and none of them takes the response of crude oil price into consideration. We find that time-varying volatility of gasoline price disturbances is an important feature of the data, and when we allow for asymmetric GARCH errors and investigate the system wide impulse response function, we find evidence of asymmetric adjustment to crude oil price changes in weekly retail gasoline prices. Chapter III discusses the relationship between fiscal deficit and exchange rate. Economic theory predicts that fiscal deficits can significantly affect real exchange rate movements, but existing empirical evidence reports only a weak impact of fiscal deficits on exchange rates. Based on US dollar-based real exchange rates in G5 countries and a flexible varying coefficient model, we show that the previously documented weak relationship between fiscal deficits and exchange rates may be the result of additive specifications, and that the relationship is stronger if we allow fiscal deficits to impact real exchange rates non-additively as well as nonlinearly. We find that the speed of exchange rate adjustment toward equilibrium depends on the state of the fiscal deficit; a fiscal contraction in the US can lead to less persistence in the deviation of exchange rates from fundamentals, and faster mean reversion to the equilibrium. Chapter IV proposes a kernel method to deal with the nonparametric regression model with only discrete covariates as regressors. This new approach is based on recently developed least squares cross-validation kernel smoothing method. It can not only automatically smooth the irrelevant variables out of the nonparametric regression model, but also avoid the problem of loss of efficiency related to the traditional nonparametric frequency-based method and the problem of misspecification based on parametric model.

  19. Estimating areal means and variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery

    Treesearch

    Ronald E. McRoberts; Erkki O. Tomppo; Andrew O. Finley; Heikkinen Juha

    2007-01-01

    The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest...

  20. Statistical Theory for the "RCT-YES" Software: Design-Based Causal Inference for RCTs. NCEE 2015-4011

    ERIC Educational Resources Information Center

    Schochet, Peter Z.

    2015-01-01

    This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…

  1. Fusion of Hard and Soft Information in Nonparametric Density Estimation

    DTIC Science & Technology

    2015-06-10

    and stochastic optimization models, in analysis of simulation output, and when instantiating probability models. We adopt a constrained maximum...particular, density estimation is needed for generation of input densities to simulation and stochastic optimization models, in analysis of simulation output...an essential step in simulation analysis and stochastic optimization is the generation of probability densities for input random variables; see for

  2. Bayesian nonparametric estimation of EQ-5D utilities for United States using the existing United Kingdom data.

    PubMed

    Kharroubi, Samer A

    2017-10-06

    Valuations of health state descriptors such as EQ-5D or SF6D have been conducted in different countries. There is a scope to make use of the results in one country as informative priors to help with the analysis of a study in another, for this to enable better estimation to be obtained in the new country than analyzing its data separately. Data from 2 EQ-5D valuation studies were analyzed using the time trade-off technique, where values for 42 health states were devised from representative samples of the UK and US populations. A Bayesian non-parametric approach has been applied to predict the health utilities of the US population, where the UK results were used as informative priors in the model to improve their estimation. The findings showed that employing additional information from the UK data helped in the production of US utility estimates much more precisely than would have been possible using the US study data alone. It is very plausible that this method would serve useful in countries where the conduction of large evaluation studies is not very feasible.

  3. Nonparametric test of consistency between cosmological models and multiband CMB measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aghamousa, Amir; Shafieloo, Arman, E-mail: amir@apctp.org, E-mail: shafieloo@kasi.re.kr

    2015-06-01

    We present a novel approach to test the consistency of the cosmological models with multiband CMB data using a nonparametric approach. In our analysis we calibrate the REACT (Risk Estimation and Adaptation after Coordinate Transformation) confidence levels associated with distances in function space (confidence distances) based on the Monte Carlo simulations in order to test the consistency of an assumed cosmological model with observation. To show the applicability of our algorithm, we confront Planck 2013 temperature data with concordance model of cosmology considering two different Planck spectra combination. In order to have an accurate quantitative statistical measure to compare betweenmore » the data and the theoretical expectations, we calibrate REACT confidence distances and perform a bias control using many realizations of the data. Our results in this work using Planck 2013 temperature data put the best fit ΛCDM model at 95% (∼ 2σ) confidence distance from the center of the nonparametric confidence set while repeating the analysis excluding the Planck 217 × 217 GHz spectrum data, the best fit ΛCDM model shifts to 70% (∼ 1σ) confidence distance. The most prominent features in the data deviating from the best fit ΛCDM model seems to be at low multipoles  18 < ℓ < 26 at greater than 2σ, ℓ ∼ 750 at ∼1 to 2σ and ℓ ∼ 1800 at greater than 2σ level. Excluding the 217×217 GHz spectrum the feature at ℓ ∼ 1800 becomes substantially less significance at ∼1 to 2σ confidence level. Results of our analysis based on the new approach we propose in this work are in agreement with other analysis done using alternative methods.« less

  4. Permissible Home Range Estimation (PHRE) in restricted habitats: A new algorithm and an evaluation for sea otters

    USGS Publications Warehouse

    Tarjan, Lily M; Tinker, M. Tim

    2016-01-01

    Parametric and nonparametric kernel methods dominate studies of animal home ranges and space use. Most existing methods are unable to incorporate information about the underlying physical environment, leading to poor performance in excluding areas that are not used. Using radio-telemetry data from sea otters, we developed and evaluated a new algorithm for estimating home ranges (hereafter Permissible Home Range Estimation, or “PHRE”) that reflects habitat suitability. We began by transforming sighting locations into relevant landscape features (for sea otters, coastal position and distance from shore). Then, we generated a bivariate kernel probability density function in landscape space and back-transformed this to geographic space in order to define a permissible home range. Compared to two commonly used home range estimation methods, kernel densities and local convex hulls, PHRE better excluded unused areas and required a smaller sample size. Our PHRE method is applicable to species whose ranges are restricted by complex physical boundaries or environmental gradients and will improve understanding of habitat-use requirements and, ultimately, aid in conservation efforts.

  5. Estimating the Technology of Cognitive and Noncognitive Skill Formation*

    PubMed Central

    Cunha, Flavio; Heckman, James; Schennach, Susanne

    2009-01-01

    This paper formulates and estimates multistage production functions for child cognitive and noncognitive skills. Output is determined by parental environments and investments at different stages of childhood. We estimate the elasticity of substitution between investments in one period and stocks of skills in that period to assess the benefits of early investment in children compared to later remediation. We establish nonparametric identification of a general class of nonlinear factor models. A by-product of our approach is a framework for evaluating childhood interventions that does not rely on arbitrarily scaled test scores as outputs and recognizes the differential effects of skills in different tasks. Using the estimated technology, we determine optimal targeting of interventions to children with different parental and personal birth endowments. Substitutability decreases in later stages of the life cycle for the production of cognitive skills. It increases in later stages of the life cycle for the production of noncognitive skills. This finding has important implications for the design of policies that target the disadvantaged. For some configurations of disadvantage and outcomes, it is optimal to invest relatively more in the later stages of childhood. PMID:20563300

  6. Inhomogeneous Poisson process rate function inference from dead-time limited observations.

    PubMed

    Verma, Gunjan; Drost, Robert J

    2017-05-01

    The estimation of an inhomogeneous Poisson process (IHPP) rate function from a set of process observations is an important problem arising in optical communications and a variety of other applications. However, because of practical limitations of detector technology, one is often only able to observe a corrupted version of the original process. In this paper, we consider how inference of the rate function is affected by dead time, a period of time after the detection of an event during which a sensor is insensitive to subsequent IHPP events. We propose a flexible nonparametric Bayesian approach to infer an IHPP rate function given dead-time limited process realizations. Simulation results illustrate the effectiveness of our inference approach and suggest its ability to extend the utility of existing sensor technology by permitting more accurate inference on signals whose observations are dead-time limited. We apply our inference algorithm to experimentally collected optical communications data, demonstrating the practical utility of our approach in the context of channel modeling and validation.

  7. Regression analysis of informative current status data with the additive hazards model.

    PubMed

    Zhao, Shishun; Hu, Tao; Ma, Ling; Wang, Peijie; Sun, Jianguo

    2015-04-01

    This paper discusses regression analysis of current status failure time data arising from the additive hazards model in the presence of informative censoring. Many methods have been developed for regression analysis of current status data under various regression models if the censoring is noninformative, and also there exists a large literature on parametric analysis of informative current status data in the context of tumorgenicity experiments. In this paper, a semiparametric maximum likelihood estimation procedure is presented and in the method, the copula model is employed to describe the relationship between the failure time of interest and the censoring time. Furthermore, I-splines are used to approximate the nonparametric functions involved and the asymptotic consistency and normality of the proposed estimators are established. A simulation study is conducted and indicates that the proposed approach works well for practical situations. An illustrative example is also provided.

  8. Inferring time derivatives including cell growth rates using Gaussian processes

    NASA Astrophysics Data System (ADS)

    Swain, Peter S.; Stevenson, Keiran; Leary, Allen; Montano-Gutierrez, Luis F.; Clark, Ivan B. N.; Vogel, Jackie; Pilizota, Teuta

    2016-12-01

    Often the time derivative of a measured variable is of as much interest as the variable itself. For a growing population of biological cells, for example, the population's growth rate is typically more important than its size. Here we introduce a non-parametric method to infer first and second time derivatives as a function of time from time-series data. Our approach is based on Gaussian processes and applies to a wide range of data. In tests, the method is at least as accurate as others, but has several advantages: it estimates errors both in the inference and in any summary statistics, such as lag times, and allows interpolation with the corresponding error estimation. As illustrations, we infer growth rates of microbial cells, the rate of assembly of an amyloid fibril and both the speed and acceleration of two separating spindle pole bodies. Our algorithm should thus be broadly applicable.

  9. Data-driven monitoring for stochastic systems and its application on batch process

    NASA Astrophysics Data System (ADS)

    Yin, Shen; Ding, Steven X.; Haghani Abandan Sari, Adel; Hao, Haiyang

    2013-07-01

    Batch processes are characterised by a prescribed processing of raw materials into final products for a finite duration and play an important role in many industrial sectors due to the low-volume and high-value products. Process dynamics and stochastic disturbances are inherent characteristics of batch processes, which cause monitoring of batch processes a challenging problem in practice. To solve this problem, a subspace-aided data-driven approach is presented in this article for batch process monitoring. The advantages of the proposed approach lie in its simple form and its abilities to deal with stochastic disturbances and process dynamics existing in the process. The kernel density estimation, which serves as a non-parametric way of estimating the probability density function, is utilised for threshold calculation. An industrial benchmark of fed-batch penicillin production is finally utilised to verify the effectiveness of the proposed approach.

  10. Non-Parametric Collision Probability for Low-Velocity Encounters

    NASA Technical Reports Server (NTRS)

    Carpenter, J. Russell

    2007-01-01

    An implicit, but not necessarily obvious, assumption in all of the current techniques for assessing satellite collision probability is that the relative position uncertainty is perfectly correlated in time. If there is any mis-modeling of the dynamics in the propagation of the relative position error covariance matrix, time-wise de-correlation of the uncertainty will increase the probability of collision over a given time interval. The paper gives some examples that illustrate this point. This paper argues that, for the present, Monte Carlo analysis is the best available tool for handling low-velocity encounters, and suggests some techniques for addressing the issues just described. One proposal is for the use of a non-parametric technique that is widely used in actuarial and medical studies. The other suggestion is that accurate process noise models be used in the Monte Carlo trials to which the non-parametric estimate is applied. A further contribution of this paper is a description of how the time-wise decorrelation of uncertainty increases the probability of collision.

  11. Nonparametric evaluation of birth cohort trends in disease rates.

    PubMed

    Tarone, R E; Chu, K C

    2000-01-01

    Although interpretation of age-period-cohort analyses is complicated by the non-identifiability of maximum likelihood estimates, changes in the slope of the birth-cohort effect curve are identifiable and have potential aetiologic significance. A nonparametric test for a change in the slope of the birth-cohort trend has been developed. The test is a generalisation of the sign test and is based on permutational distributions. A method for identifying interactions between age and calendar-period effects is also presented. The nonparametric method is shown to be powerful in detecting changes in the slope of the birth-cohort trend, although its power can be reduced considerably by calendar-period patterns of risk. The method identifies a previously unidentified decrease in the birth-cohort risk of lung-cancer mortality from 1912 to 1919, which appears to reflect a reduction in the initiation of smoking by young men at the beginning of the Great Depression (1930s). The method also detects an interaction between age and calendar period in leukemia mortality rates, reflecting the better response of children to chemotherapy. The proposed nonparametric method provides a data analytic approach, which is a useful adjunct to log-linear Poisson analysis of age-period-cohort models, either in the initial model building stage, or in the final interpretation stage.

  12. Multi-object segmentation using coupled nonparametric shape and relative pose priors

    NASA Astrophysics Data System (ADS)

    Uzunbas, Mustafa Gökhan; Soldea, Octavian; Çetin, Müjdat; Ünal, Gözde; Erçil, Aytül; Unay, Devrim; Ekin, Ahmet; Firat, Zeynep

    2009-02-01

    We present a new method for multi-object segmentation in a maximum a posteriori estimation framework. Our method is motivated by the observation that neighboring or coupling objects in images generate configurations and co-dependencies which could potentially aid in segmentation if properly exploited. Our approach employs coupled shape and inter-shape pose priors that are computed using training images in a nonparametric multi-variate kernel density estimation framework. The coupled shape prior is obtained by estimating the joint shape distribution of multiple objects and the inter-shape pose priors are modeled via standard moments. Based on such statistical models, we formulate an optimization problem for segmentation, which we solve by an algorithm based on active contours. Our technique provides significant improvements in the segmentation of weakly contrasted objects in a number of applications. In particular for medical image analysis, we use our method to extract brain Basal Ganglia structures, which are members of a complex multi-object system posing a challenging segmentation problem. We also apply our technique to the problem of handwritten character segmentation. Finally, we use our method to segment cars in urban scenes.

  13. A Semiparametric Change-Point Regression Model for Longitudinal Observations.

    PubMed

    Xing, Haipeng; Ying, Zhiliang

    2012-12-01

    Many longitudinal studies involve relating an outcome process to a set of possibly time-varying covariates, giving rise to the usual regression models for longitudinal data. When the purpose of the study is to investigate the covariate effects when experimental environment undergoes abrupt changes or to locate the periods with different levels of covariate effects, a simple and easy-to-interpret approach is to introduce change-points in regression coefficients. In this connection, we propose a semiparametric change-point regression model, in which the error process (stochastic component) is nonparametric and the baseline mean function (functional part) is completely unspecified, the observation times are allowed to be subject-specific, and the number, locations and magnitudes of change-points are unknown and need to be estimated. We further develop an estimation procedure which combines the recent advance in semiparametric analysis based on counting process argument and multiple change-points inference, and discuss its large sample properties, including consistency and asymptotic normality, under suitable regularity conditions. Simulation results show that the proposed methods work well under a variety of scenarios. An application to a real data set is also given.

  14. Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

    PubMed

    Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

    2017-01-01

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.

  15. RRAWFLOW: Rainfall-Response Aquifer and Watershed Flow Model (v1.15)

    NASA Astrophysics Data System (ADS)

    Long, A. J.

    2015-03-01

    The Rainfall-Response Aquifer and Watershed Flow Model (RRAWFLOW) is a lumped-parameter model that simulates streamflow, spring flow, groundwater level, or solute transport for a measurement point in response to a system input of precipitation, recharge, or solute injection. I introduce the first version of RRAWFLOW available for download and public use and describe additional options. The open-source code is written in the R language and is available at http://sd.water.usgs.gov/projects/RRAWFLOW/RRAWFLOW.html along with an example model of streamflow. RRAWFLOW includes a time-series process to estimate recharge from precipitation and simulates the response to recharge by convolution, i.e., the unit-hydrograph approach. Gamma functions are used for estimation of parametric impulse-response functions (IRFs); a combination of two gamma functions results in a double-peaked IRF. A spline fit to a set of control points is introduced as a new method for estimation of nonparametric IRFs. Several options are included to simulate time-variant systems. For many applications, lumped models simulate the system response with equal accuracy to that of distributed models, but moreover, the ease of model construction and calibration of lumped models makes them a good choice for many applications (e.g., estimating missing periods in a hydrologic record). RRAWFLOW provides professional hydrologists and students with an accessible and versatile tool for lumped-parameter modeling.

  16. A Nonparametric Approach to Automated S-Wave Picking

    NASA Astrophysics Data System (ADS)

    Rawles, C.; Thurber, C. H.

    2014-12-01

    Although a number of very effective P-wave automatic pickers have been developed over the years, automatic picking of S waves has remained more challenging. Most automatic pickers take a parametric approach, whereby some characteristic function (CF), e.g. polarization or kurtosis, is determined from the data and the pick is estimated from the CF. We have adopted a nonparametric approach, estimating the pick directly from the waveforms. For a particular waveform to be auto-picked, the method uses a combination of similarity to a set of seismograms with known S-wave arrivals and dissimilarity to a set of seismograms that do not contain S-wave arrivals. Significant effort has been made towards dealing with the problem of S-to-P conversions. We have evaluated the effectiveness of our method by testing it on multiple sets of microearthquake seismograms with well-determined S-wave arrivals for several areas around the world, including fault zones and volcanic regions. In general, we find that the results from our auto-picker are consistent with reviewed analyst picks 90% of the time at the 0.2 s level and 80% of the time at the 0.1 s level, or better. For most of the large datasets we have analyzed, our auto-picker also makes far more S-wave picks than were made previously by analysts. We are using these enlarged sets of high-quality S-wave picks to refine tomographic inversions for these areas, resulting in substantial improvement in the quality of the S-wave images. We will show examples from New Zealand, Hawaii, and California.

  17. EMG-Torque Dynamics Change With Contraction Bandwidth.

    PubMed

    Golkar, Mahsa A; Jalaleddini, Kian; Kearney, Robert E

    2018-04-01

    An accurate model for ElectroMyoGram (EMG)-torque dynamics has many uses. One of its applications which has gained high attention among researchers is its use, in estimating the muscle contraction level for the efficient control of prosthesis. In this paper, the dynamic relationship between the surface EMG and torque during isometric contractions at the human ankle was studied using system identification techniques. Subjects voluntarily modulated their ankle torque in dorsiflexion direction, by activating their tibialis anterior muscle, while tracking a pseudo-random binary sequence in a torque matching task. The effects of contraction bandwidth, described by torque spectrum, on EMG-torque dynamics were evaluated by varying the visual command switching time. Nonparametric impulse response functions (IRF) were estimated between the processed surface EMG and torque. It was demonstrated that: 1) at low contraction bandwidths, the identified IRFs had unphysiological anticipatory (i.e., non-causal) components, whose amplitude decreased as the contraction bandwidth increased. We hypothesized that this non-causal behavior arose, because the EMG input contained a component due to feedback from the output torque, i.e., it was recorded from within a closed-loop. Vision was not the feedback source since the non-causal behavior persisted when visual feedback was removed. Repeating the identification using a nonparametric closed-loop identification algorithm yielded causal IRFs at all bandwidths, supporting this hypothesis. 2) EMG-torque dynamics became faster and the bandwidth of system increased as contraction modulation rate increased. Thus, accurate prediction of torque from EMG signals must take into account the contraction bandwidth sensitivity of this system.

  18. Determination of the minimal clinically important difference for seven fatigue measures in rheumatoid arthritis

    PubMed Central

    Pouchot, Jacques; Kherani, Raheem B.; Brant, Rollin; Lacaille, Diane; Lehman, Allen J.; Ensworth, Stephanie; Kopec, Jacek; Esdaile, John M.; Liang, Matthew H.

    2008-01-01

    Objective To estimate the minimal clinically important difference (MCID) of seven measures of fatigue in rheumatoid arthritis. Study Design and Setting A cross-sectional study design based on inter-individual comparisons was used. Six to eight subjects participated in a single meeting and completed seven fatigue questionnaires (nine sessions were organized and 61 subjects participated). After completion of the questionnaires, the subjects had five one-on-one 10-minute conversations with different people in the group to discuss their fatigue. After each conversation, each patient compared their fatigue to their conversational partner’s on a global rating. Ratings were compared to the scores of the fatigue measures to estimate the MCID. Both non-parametric and linear regression analyses were used. Results Non-parametric estimates for the MCID relative to “little more fatigue” tended to be smaller than those for “little less fatigue”. The global MCIDs estimated by linear regression were: FSS 20.2, VT 14.8, MAF 18.7, MFI 16.6, FACIT–F 15.9, CFS 9.9, RS 19.7, for normalized scores (0 to 100). The standardized MCIDs for the seven measures were roughly similar (0.67 to 0.76). Conclusion These estimates of MCID will help to interpret changes observed in a fatigue score and will be critical in estimating sample size requirements. PMID:18359189

  19. Multiresponse semiparametric regression for modelling the effect of regional socio-economic variables on the use of information technology

    NASA Astrophysics Data System (ADS)

    Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania

    2017-03-01

    Multiresponse semiparametric regression is simultaneous equation regression model and fusion of parametric and nonparametric model. The regression model comprise several models and each model has two components, parametric and nonparametric. The used model has linear function as parametric and polynomial truncated spline as nonparametric component. The model can handle both linearity and nonlinearity relationship between response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling of effect of regional socio-economic on use of information technology. More specific, the response variables are percentage of households has access to internet and percentage of households has personal computer. Then, predictor variables are percentage of literacy people, percentage of electrification and percentage of economic growth. Based on identification of the relationship between response and predictor variable, economic growth is treated as nonparametric predictor and the others are parametric predictors. The result shows that the multiresponse semiparametric regression can be applied well as indicate by the high coefficient determination, 90 percent.

  20. Risk analysis for autonomous underwater vehicle operations in extreme environments.

    PubMed

    Brito, Mario Paulo; Griffiths, Gwyn; Challenor, Peter

    2010-12-01

    Autonomous underwater vehicles (AUVs) are used increasingly to explore hazardous marine environments. Risk assessment for such complex systems is based on subjective judgment and expert knowledge as much as on hard statistics. Here, we describe the use of a risk management process tailored to AUV operations, the implementation of which requires the elicitation of expert judgment. We conducted a formal judgment elicitation process where eight world experts in AUV design and operation were asked to assign a probability of AUV loss given the emergence of each fault or incident from the vehicle's life history of 63 faults and incidents. After discussing methods of aggregation and analysis, we show how the aggregated risk estimates obtained from the expert judgments were used to create a risk model. To estimate AUV survival with mission distance, we adopted a statistical survival function based on the nonparametric Kaplan-Meier estimator. We present theoretical formulations for the estimator, its variance, and confidence limits. We also present a numerical example where the approach is applied to estimate the probability that the Autosub3 AUV would survive a set of missions under Pine Island Glacier, Antarctica in January-March 2009. © 2010 Society for Risk Analysis.

  1. Nonparametric change point estimation for survival distributions with a partially constant hazard rate.

    PubMed

    Brazzale, Alessandra R; Küchenhoff, Helmut; Krügel, Stefanie; Schiergens, Tobias S; Trentzsch, Heiko; Hartl, Wolfgang

    2018-04-05

    We present a new method for estimating a change point in the hazard function of a survival distribution assuming a constant hazard rate after the change point and a decreasing hazard rate before the change point. Our method is based on fitting a stump regression to p values for testing hazard rates in small time intervals. We present three real data examples describing survival patterns of severely ill patients, whose excess mortality rates are known to persist far beyond hospital discharge. For designing survival studies in these patients and for the definition of hospital performance metrics (e.g. mortality), it is essential to define adequate and objective end points. The reliable estimation of a change point will help researchers to identify such end points. By precisely knowing this change point, clinicians can distinguish between the acute phase with high hazard (time elapsed after admission and before the change point was reached), and the chronic phase (time elapsed after the change point) in which hazard is fairly constant. We show in an extensive simulation study that maximum likelihood estimation is not robust in this setting, and we evaluate our new estimation strategy including bootstrap confidence intervals and finite sample bias correction.

  2. Comparison of Two Methods for Estimating the Sampling-Related Uncertainty of Satellite Rainfall Averages Based on a Large Radar Data Set

    NASA Technical Reports Server (NTRS)

    Lau, William K. M. (Technical Monitor); Bell, Thomas L.; Steiner, Matthias; Zhang, Yu; Wood, Eric F.

    2002-01-01

    The uncertainty of rainfall estimated from averages of discrete samples collected by a satellite is assessed using a multi-year radar data set covering a large portion of the United States. The sampling-related uncertainty of rainfall estimates is evaluated for all combinations of 100 km, 200 km, and 500 km space domains, 1 day, 5 day, and 30 day rainfall accumulations, and regular sampling time intervals of 1 h, 3 h, 6 h, 8 h, and 12 h. These extensive analyses are combined to characterize the sampling uncertainty as a function of space and time domain, sampling frequency, and rainfall characteristics by means of a simple scaling law. Moreover, it is shown that both parametric and non-parametric statistical techniques of estimating the sampling uncertainty produce comparable results. Sampling uncertainty estimates, however, do depend on the choice of technique for obtaining them. They can also vary considerably from case to case, reflecting the great variability of natural rainfall, and should therefore be expressed in probabilistic terms. Rainfall calibration errors are shown to affect comparison of results obtained by studies based on data from different climate regions and/or observation platforms.

  3. The quantile regression approach to efficiency measurement: insights from Monte Carlo simulations.

    PubMed

    Liu, Chunping; Laporte, Audrey; Ferguson, Brian S

    2008-09-01

    In the health economics literature there is an ongoing debate over approaches used to estimate the efficiency of health systems at various levels, from the level of the individual hospital - or nursing home - up to that of the health system as a whole. The two most widely used approaches to evaluating the efficiency with which various units deliver care are non-parametric data envelopment analysis (DEA) and parametric stochastic frontier analysis (SFA). Productivity researchers tend to have very strong preferences over which methodology to use for efficiency estimation. In this paper, we use Monte Carlo simulation to compare the performance of DEA and SFA in terms of their ability to accurately estimate efficiency. We also evaluate quantile regression as a potential alternative approach. A Cobb-Douglas production function, random error terms and a technical inefficiency term with different distributions are used to calculate the observed output. The results, based on these experiments, suggest that neither DEA nor SFA can be regarded as clearly dominant, and that, depending on the quantile estimated, the quantile regression approach may be a useful addition to the armamentarium of methods for estimating technical efficiency.

  4. A practical method to test the validity of the standard Gumbel distribution in logit-based multinomial choice models of travel behavior

    DOE PAGES

    Ye, Xin; Garikapati, Venu M.; You, Daehyun; ...

    2017-11-08

    Most multinomial choice models (e.g., the multinomial logit model) adopted in practice assume an extreme-value Gumbel distribution for the random components (error terms) of utility functions. This distributional assumption offers a closed-form likelihood expression when the utility maximization principle is applied to model choice behaviors. As a result, model coefficients can be easily estimated using the standard maximum likelihood estimation method. However, maximum likelihood estimators are consistent and efficient only if distributional assumptions on the random error terms are valid. It is therefore critical to test the validity of underlying distributional assumptions on the error terms that form the basismore » of parameter estimation and policy evaluation. In this paper, a practical yet statistically rigorous method is proposed to test the validity of the distributional assumption on the random components of utility functions in both the multinomial logit (MNL) model and multiple discrete-continuous extreme value (MDCEV) model. Based on a semi-nonparametric approach, a closed-form likelihood function that nests the MNL or MDCEV model being tested is derived. The proposed method allows traditional likelihood ratio tests to be used to test violations of the standard Gumbel distribution assumption. Simulation experiments are conducted to demonstrate that the proposed test yields acceptable Type-I and Type-II error probabilities at commonly available sample sizes. The test is then applied to three real-world discrete and discrete-continuous choice models. For all three models, the proposed test rejects the validity of the standard Gumbel distribution in most utility functions, calling for the development of robust choice models that overcome adverse effects of violations of distributional assumptions on the error terms in random utility functions.« less

  5. A practical method to test the validity of the standard Gumbel distribution in logit-based multinomial choice models of travel behavior

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ye, Xin; Garikapati, Venu M.; You, Daehyun

    Most multinomial choice models (e.g., the multinomial logit model) adopted in practice assume an extreme-value Gumbel distribution for the random components (error terms) of utility functions. This distributional assumption offers a closed-form likelihood expression when the utility maximization principle is applied to model choice behaviors. As a result, model coefficients can be easily estimated using the standard maximum likelihood estimation method. However, maximum likelihood estimators are consistent and efficient only if distributional assumptions on the random error terms are valid. It is therefore critical to test the validity of underlying distributional assumptions on the error terms that form the basismore » of parameter estimation and policy evaluation. In this paper, a practical yet statistically rigorous method is proposed to test the validity of the distributional assumption on the random components of utility functions in both the multinomial logit (MNL) model and multiple discrete-continuous extreme value (MDCEV) model. Based on a semi-nonparametric approach, a closed-form likelihood function that nests the MNL or MDCEV model being tested is derived. The proposed method allows traditional likelihood ratio tests to be used to test violations of the standard Gumbel distribution assumption. Simulation experiments are conducted to demonstrate that the proposed test yields acceptable Type-I and Type-II error probabilities at commonly available sample sizes. The test is then applied to three real-world discrete and discrete-continuous choice models. For all three models, the proposed test rejects the validity of the standard Gumbel distribution in most utility functions, calling for the development of robust choice models that overcome adverse effects of violations of distributional assumptions on the error terms in random utility functions.« less

  6. Survival potential of Phytophthora infestans sporangia in relation to meteorological factors

    USDA-ARS?s Scientific Manuscript database

    Assessment of meteorological factors coupled with sporangia survival curves may enhance effective management of potato late blight, caused by Phytophthora infestans. We utilized a non-parametric density estimation approach to evaluate the cumulative probability of occurrence of temperature and relat...

  7. Estimating numbers of females with cubs-of-the-year in the Yellowstone grizzly bear population

    USGS Publications Warehouse

    Keating, K.A.; Schwartz, C.C.; Haroldson, M.A.; Moody, D.

    2001-01-01

    For grizzly bears (Ursus arctos horribilis) in the Greater Yellowstone Ecosystem (GYE), minimum population size and allowable numbers of human-caused mortalities have been calculated as a function of the number of unique females with cubs-of-the-year (FCUB) seen during a 3- year period. This approach underestimates the total number of FCUB, thereby biasing estimates of population size and sustainable mortality. Also, it does not permit calculation of valid confidence bounds. Many statistical methods can resolve or mitigate these problems, but there is no universal best method. Instead, relative performances of different methods can vary with population size, sample size, and degree of heterogeneity among sighting probabilities for individual animals. We compared 7 nonparametric estimators, using Monte Carlo techniques to assess performances over the range of sampling conditions deemed plausible for the Yellowstone population. Our goal was to estimate the number of FCUB present in the population each year. Our evaluation differed from previous comparisons of such estimators by including sample coverage methods and by treating individual sightings, rather than sample periods, as the sample unit. Consequently, our conclusions also differ from earlier studies. Recommendations regarding estimators and necessary sample sizes are presented, together with estimates of annual numbers of FCUB in the Yellowstone population with bootstrap confidence bounds.

  8. Time-Varying Delay Estimation Applied to the Surface Electromyography Signals Using the Parametric Approach

    NASA Astrophysics Data System (ADS)

    Luu, Gia Thien; Boualem, Abdelbassit; Duy, Tran Trung; Ravier, Philippe; Butteli, Olivier

    Muscle Fiber Conduction Velocity (MFCV) can be calculated from the time delay between the surface electromyographic (sEMG) signals recorded by electrodes aligned with the fiber direction. In order to take into account the non-stationarity during the dynamic contraction (the most daily life situation) of the data, the developed methods have to consider that the MFCV changes over time, which induces time-varying delays and the data is non-stationary (change of Power Spectral Density (PSD)). In this paper, the problem of TVD estimation is considered using a parametric method. First, the polynomial model of TVD has been proposed. Then, the TVD model parameters are estimated by using a maximum likelihood estimation (MLE) strategy solved by a deterministic optimization technique (Newton) and stochastic optimization technique, called simulated annealing (SA). The performance of the two techniques is also compared. We also derive two appropriate Cramer-Rao Lower Bounds (CRLB) for the estimated TVD model parameters and for the TVD waveforms. Monte-Carlo simulation results show that the estimation of both the model parameters and the TVD function is unbiased and that the variance obtained is close to the derived CRBs. A comparison with non-parametric approaches of the TVD estimation is also presented and shows the superiority of the method proposed.

  9. Hidden Markov models and neural networks for fault detection in dynamic systems

    NASA Technical Reports Server (NTRS)

    Smyth, Padhraic

    1994-01-01

    Neural networks plus hidden Markov models (HMM) can provide excellent detection and false alarm rate performance in fault detection applications, as shown in this viewgraph presentation. Modified models allow for novelty detection. Key contributions of neural network models are: (1) excellent nonparametric discrimination capability; (2) a good estimator of posterior state probabilities, even in high dimensions, and thus can be embedded within overall probabilistic model (HMM); and (3) simple to implement compared to other nonparametric models. Neural network/HMM monitoring model is currently being integrated with the new Deep Space Network (DSN) antenna controller software and will be on-line monitoring a new DSN 34-m antenna (DSS-24) by July, 1994.

  10. Strong consistency of nonparametric Bayes density estimation on compact metric spaces with applications to specific manifolds

    PubMed Central

    Bhattacharya, Abhishek; Dunson, David B.

    2012-01-01

    This article considers a broad class of kernel mixture density models on compact metric spaces and manifolds. Following a Bayesian approach with a nonparametric prior on the location mixing distribution, sufficient conditions are obtained on the kernel, prior and the underlying space for strong posterior consistency at any continuous density. The prior is also allowed to depend on the sample size n and sufficient conditions are obtained for weak and strong consistency. These conditions are verified on compact Euclidean spaces using multivariate Gaussian kernels, on the hypersphere using a von Mises-Fisher kernel and on the planar shape space using complex Watson kernels. PMID:22984295

  11. Reference interval estimation: Methodological comparison using extensive simulations and empirical data.

    PubMed

    Daly, Caitlin H; Higgins, Victoria; Adeli, Khosrow; Grey, Vijay L; Hamid, Jemila S

    2017-12-01

    To statistically compare and evaluate commonly used methods of estimating reference intervals and to determine which method is best based on characteristics of the distribution of various data sets. Three approaches for estimating reference intervals, i.e. parametric, non-parametric, and robust, were compared with simulated Gaussian and non-Gaussian data. The hierarchy of the performances of each method was examined based on bias and measures of precision. The findings of the simulation study were illustrated through real data sets. In all Gaussian scenarios, the parametric approach provided the least biased and most precise estimates. In non-Gaussian scenarios, no single method provided the least biased and most precise estimates for both limits of a reference interval across all sample sizes, although the non-parametric approach performed the best for most scenarios. The hierarchy of the performances of the three methods was only impacted by sample size and skewness. Differences between reference interval estimates established by the three methods were inflated by variability. Whenever possible, laboratories should attempt to transform data to a Gaussian distribution and use the parametric approach to obtain the most optimal reference intervals. When this is not possible, laboratories should consider sample size and skewness as factors in their choice of reference interval estimation method. The consequences of false positives or false negatives may also serve as factors in this decision. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  12. Kernel PLS Estimation of Single-trial Event-related Potentials

    NASA Technical Reports Server (NTRS)

    Rosipal, Roman; Trejo, Leonard J.

    2004-01-01

    Nonlinear kernel partial least squaes (KPLS) regressior, is a novel smoothing approach to nonparametric regression curve fitting. We have developed a KPLS approach to the estimation of single-trial event related potentials (ERPs). For improved accuracy of estimation, we also developed a local KPLS method for situations in which there exists prior knowledge about the approximate latency of individual ERP components. To assess the utility of the KPLS approach, we compared non-local KPLS and local KPLS smoothing with other nonparametric signal processing and smoothing methods. In particular, we examined wavelet denoising, smoothing splines, and localized smoothing splines. We applied these methods to the estimation of simulated mixtures of human ERPs and ongoing electroencephalogram (EEG) activity using a dipole simulator (BESA). In this scenario we considered ongoing EEG to represent spatially and temporally correlated noise added to the ERPs. This simulation provided a reasonable but simplified model of real-world ERP measurements. For estimation of the simulated single-trial ERPs, local KPLS provided a level of accuracy that was comparable with or better than the other methods. We also applied the local KPLS method to the estimation of human ERPs recorded in an experiment on co,onitive fatigue. For these data, the local KPLS method provided a clear improvement in visualization of single-trial ERPs as well as their averages. The local KPLS method may serve as a new alternative to the estimation of single-trial ERPs and improvement of ERP averages.

  13. One-shot estimate of MRMC variance: AUC.

    PubMed

    Gallas, Brandon D

    2006-03-01

    One popular study design for estimating the area under the receiver operating characteristic curve (AUC) is the one in which a set of readers reads a set of cases: a fully crossed design in which every reader reads every case. The variability of the subsequent reader-averaged AUC has two sources: the multiple readers and the multiple cases (MRMC). In this article, we present a nonparametric estimate for the variance of the reader-averaged AUC that is unbiased and does not use resampling tools. The one-shot estimate is based on the MRMC variance derived by the mechanistic approach of Barrett et al. (2005), as well as the nonparametric variance of a single-reader AUC derived in the literature on U statistics. We investigate the bias and variance properties of the one-shot estimate through a set of Monte Carlo simulations with simulated model observers and images. The different simulation configurations vary numbers of readers and cases, amounts of image noise and internal noise, as well as how the readers are constructed. We compare the one-shot estimate to a method that uses the jackknife resampling technique with an analysis of variance model at its foundation (Dorfman et al. 1992). The name one-shot highlights that resampling is not used. The one-shot and jackknife estimators behave similarly, with the one-shot being marginally more efficient when the number of cases is small. We have derived a one-shot estimate of the MRMC variance of AUC that is based on a probabilistic foundation with limited assumptions, is unbiased, and compares favorably to an established estimate.

  14. Estimation of variance in Cox's regression model with shared gamma frailties.

    PubMed

    Andersen, P K; Klein, J P; Knudsen, K M; Tabanera y Palacios, R

    1997-12-01

    The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.

  15. A practical divergence measure for survival distributions that can be estimated from Kaplan-Meier curves.

    PubMed

    Cox, Trevor F; Czanner, Gabriela

    2016-06-30

    This paper introduces a new simple divergence measure between two survival distributions. For two groups of patients, the divergence measure between their associated survival distributions is based on the integral of the absolute difference in probabilities that a patient from one group dies at time t and a patient from the other group survives beyond time t and vice versa. In the case of non-crossing hazard functions, the divergence measure is closely linked to the Harrell concordance index, C, the Mann-Whitney test statistic and the area under a receiver operating characteristic curve. The measure can be used in a dynamic way where the divergence between two survival distributions from time zero up to time t is calculated enabling real-time monitoring of treatment differences. The divergence can be found for theoretical survival distributions or can be estimated non-parametrically from survival data using Kaplan-Meier estimates of the survivor functions. The estimator of the divergence is shown to be generally unbiased and approximately normally distributed. For the case of proportional hazards, the constituent parts of the divergence measure can be used to assess the proportional hazards assumption. The use of the divergence measure is illustrated on the survival of pancreatic cancer patients. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  16. Nonparametric Hierarchical Bayesian Model for Functional Brain Parcellation

    PubMed Central

    Lashkari, Danial; Sridharan, Ramesh; Vul, Edward; Hsieh, Po-Jang; Kanwisher, Nancy; Golland, Polina

    2011-01-01

    We develop a method for unsupervised analysis of functional brain images that learns group-level patterns of functional response. Our algorithm is based on a generative model that comprises two main layers. At the lower level, we express the functional brain response to each stimulus as a binary activation variable. At the next level, we define a prior over the sets of activation variables in all subjects. We use a Hierarchical Dirichlet Process as the prior in order to simultaneously learn the patterns of response that are shared across the group, and to estimate the number of these patterns supported by data. Inference based on this model enables automatic discovery and characterization of salient and consistent patterns in functional signals. We apply our method to data from a study that explores the response of the visual cortex to a collection of images. The discovered profiles of activation correspond to selectivity to a number of image categories such as faces, bodies, and scenes. More generally, our results appear superior to the results of alternative data-driven methods in capturing the category structure in the space of stimuli. PMID:21841977

  17. Independent Assessment of ITRF Site Velocities using GPS Imaging

    NASA Astrophysics Data System (ADS)

    Blewitt, G.; Hammond, W. C.; Kreemer, C.; Altamimi, Z.

    2015-12-01

    The long-term stability of ITRF is critical to the most challenging scientific applications such as the slow variation of sea level, and of ice sheet loading in Greenland and Antarctica. In 2010, the National Research Council recommended aiming for stability at the level of 1 mm/decade in the ITRF origin and scale. This requires that the ITRF include many globally-distributed sites with motions that are predictable to within a few mm/decade, with a significant number of sites having collocated stations of multiple techniques. Quantifying the stability of ITRF stations can be useful to understand stability of ITRF parameters, and to help the selection and weighting of ITRF stations. Here we apply a new suite of techniques for an independent assessment of ITRF site velocities. Our "GPS Imaging" suite is founded on the principle that, for the case of large numbers of data, the trend can be estimated objectively, automatically, robustly, and accurately by applying non-parametric techniques, which use quantile statistics (e.g., the median). At the foundation of GPS Imaging is the estimator "MIDAS" (Median Interannual Difference Adjusted for Skewness). MIDAS estimates the velocity with a realistic error bar based on sub-sampling the coordinate time series. MIDAS is robust to step discontinuities, outliers, seasonality, and heteroscedasticity. Common-mode noise filters enhance regional- to continental-scale precision in MIDAS estimates, just as they do for standard estimation techniques. Secondly, in regions where there is sufficient spatial sampling, GPS Imaging uses MIDAS velocity estimates to generate a regionally-representative velocity map. For this we apply a median spatial filter to despeckle the maps. We use GPS Imaging to address two questions: (1) How well do the ITRF site velocities derived by parametric estimation agree with non-parametric techniques? (2) Are ITRF site velocities regionally representative? These questions aim to get a handle on (1) the accuracy of ITRF site velocities as a function of characteristics of contributing station data, such as number of step parameters and total time span; and (2) evidence of local processes affecting site velocity, which may impact site stability. Such quantification can be used to rank stations in terms the risk that they may pose to the stability of ITRF.

  18. Survival curve estimation with dependent left truncated data using Cox's model.

    PubMed

    Mackenzie, Todd

    2012-10-19

    The Kaplan-Meier and closely related Lynden-Bell estimators are used to provide nonparametric estimation of the distribution of a left-truncated random variable. These estimators assume that the left-truncation variable is independent of the time-to-event. This paper proposes a semiparametric method for estimating the marginal distribution of the time-to-event that does not require independence. It models the conditional distribution of the time-to-event given the truncation variable using Cox's model for left truncated data, and uses inverse probability weighting. We report the results of simulations and illustrate the method using a survival study.

  19. Statistical methods for estimating normal blood chemistry ranges and variance in rainbow trout (Salmo gairdneri), Shasta Strain

    USGS Publications Warehouse

    Wedemeyer, Gary A.; Nelson, Nancy C.

    1975-01-01

    Gaussian and nonparametric (percentile estimate and tolerance interval) statistical methods were used to estimate normal ranges for blood chemistry (bicarbonate, bilirubin, calcium, hematocrit, hemoglobin, magnesium, mean cell hemoglobin concentration, osmolality, inorganic phosphorus, and pH for juvenile rainbow (Salmo gairdneri, Shasta strain) trout held under defined environmental conditions. The percentile estimate and Gaussian methods gave similar normal ranges, whereas the tolerance interval method gave consistently wider ranges for all blood variables except hemoglobin. If the underlying frequency distribution is unknown, the percentile estimate procedure would be the method of choice.

  20. Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.

    PubMed

    Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin

    2017-01-01

    Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

  1. Design of adaptive control systems by means of self-adjusting transversal filters

    NASA Technical Reports Server (NTRS)

    Merhav, S. J.

    1986-01-01

    The design of closed-loop adaptive control systems based on nonparametric identification was addressed. Implementation is by self-adjusting Least Mean Square (LMS) transversal filters. The design concept is Model Reference Adaptive Control (MRAC). Major issues are to preserve the linearity of the error equations of each LMS filter, and to prevent estimation bias that is due to process or measurement noise, thus providing necessary conditions for the convergence and stability of the control system. The controlled element is assumed to be asymptotically stable and minimum phase. Because of the nonparametric Finite Impulse Response (FIR) estimates provided by the LMS filters, a-priori information on the plant model is needed only in broad terms. Following a survey of control system configurations and filter design considerations, system implementation is shown here in Single Input Single Output (SISO) format which is readily extendable to multivariable forms. In extensive computer simulation studies the controlled element is represented by a second-order system with widely varying damping, natural frequency, and relative degree.

  2. Robust check loss-based variable selection of high-dimensional single-index varying-coefficient model

    NASA Astrophysics Data System (ADS)

    Song, Yunquan; Lin, Lu; Jian, Ling

    2016-07-01

    Single-index varying-coefficient model is an important mathematical modeling method to model nonlinear phenomena in science and engineering. In this paper, we develop a variable selection method for high-dimensional single-index varying-coefficient models using a shrinkage idea. The proposed procedure can simultaneously select significant nonparametric components and parametric components. Under defined regularity conditions, with appropriate selection of tuning parameters, the consistency of the variable selection procedure and the oracle property of the estimators are established. Moreover, due to the robustness of the check loss function to outliers in the finite samples, our proposed variable selection method is more robust than the ones based on the least squares criterion. Finally, the method is illustrated with numerical simulations.

  3. Conditional Anomaly Detection with Soft Harmonic Functions

    PubMed Central

    Valko, Michal; Kveton, Branislav; Valizadegan, Hamed; Cooper, Gregory F.; Hauskrecht, Milos

    2012-01-01

    In this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response or a class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method on several synthetic and UCI ML datasets in detecting unusual labels when compared to several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset where we seek to identify unusual patient-management decisions. PMID:25309142

  4. Conditional Anomaly Detection with Soft Harmonic Functions.

    PubMed

    Valko, Michal; Kveton, Branislav; Valizadegan, Hamed; Cooper, Gregory F; Hauskrecht, Milos

    2011-01-01

    In this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response or a class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method on several synthetic and UCI ML datasets in detecting unusual labels when compared to several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset where we seek to identify unusual patient-management decisions.

  5. A new approach to evaluate gamma-ray measurements

    NASA Technical Reports Server (NTRS)

    Dejager, O. C.; Swanepoel, J. W. H.; Raubenheimer, B. C.; Vandervalt, D. J.

    1985-01-01

    Misunderstandings about the term random samples its implications may easily arise. Conditions under which the phases, obtained from arrival times, do not form a random sample and the dangers involved are discussed. Watson's U sup 2 test for uniformity is recommended for light curves with duty cycles larger than 10%. Under certain conditions, non-parametric density estimation may be used to determine estimates of the true light curve and its parameters.

  6. Estimating the extreme low-temperature event using nonparametric methods

    NASA Astrophysics Data System (ADS)

    D'Silva, Anisha

    This thesis presents a new method of estimating the one-in-N low temperature threshold using a non-parametric statistical method called kernel density estimation applied to daily average wind-adjusted temperatures. We apply our One-in-N Algorithm to local gas distribution companies (LDCs), as they have to forecast the daily natural gas needs of their consumers. In winter, demand for natural gas is high. Extreme low temperature events are not directly related to an LDCs gas demand forecasting, but knowledge of extreme low temperatures is important to ensure that an LDC has enough capacity to meet customer demands when extreme low temperatures are experienced. We present a detailed explanation of our One-in-N Algorithm and compare it to the methods using the generalized extreme value distribution, the normal distribution, and the variance-weighted composite distribution. We show that our One-in-N Algorithm estimates the one-in- N low temperature threshold more accurately than the methods using the generalized extreme value distribution, the normal distribution, and the variance-weighted composite distribution according to root mean square error (RMSE) measure at a 5% level of significance. The One-in- N Algorithm is tested by counting the number of times the daily average wind-adjusted temperature is less than or equal to the one-in- N low temperature threshold.

  7. Nonparametric methods for analyzing recurrent gap time data with application to infections after hematopoietic cell transplant.

    PubMed

    Lee, Chi Hyun; Luo, Xianghua; Huang, Chiung-Yu; DeFor, Todd E; Brunstein, Claudio G; Weisdorf, Daniel J

    2016-06-01

    Infection is one of the most common complications after hematopoietic cell transplantation. Many patients experience infectious complications repeatedly after transplant. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of interest, and subsequently experience recurrent events of the same type; moreover, for one-sample estimation, the gap times between consecutive events are usually assumed to be identically distributed. Applying these methods to analyze the post-transplant infection data will inevitably lead to incorrect inferential results because the time from transplant to the first infection has a different biological meaning than the gap times between consecutive recurrent infections. Some unbiased yet inefficient methods include univariate survival analysis methods based on data from the first infection or bivariate serial event data methods based on the first and second infections. In this article, we propose a nonparametric estimator of the joint distribution of time from transplant to the first infection and the gap times between consecutive infections. The proposed estimator takes into account the potentially different distributions of the two types of gap times and better uses the recurrent infection data. Asymptotic properties of the proposed estimators are established. © 2015, The International Biometric Society.

  8. Nonparametric methods for analyzing recurrent gap time data with application to infections after hematopoietic cell transplant

    PubMed Central

    Lee, Chi Hyun; Huang, Chiung-Yu; DeFor, Todd E.; Brunstein, Claudio G.; Weisdorf, Daniel J.

    2015-01-01

    Summary Infection is one of the most common complications after hematopoietic cell transplantation. Many patients experience infectious complications repeatedly after transplant. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of interest, and subsequently experience recurrent events of the same type; moreover, for one-sample estimation, the gap times between consecutive events are usually assumed to be identically distributed. Applying these methods to analyze the post-transplant infection data will inevitably lead to incorrect inferential results because the time from transplant to the first infection has a different biological meaning than the gap times between consecutive recurrent infections. Some unbiased yet inefficient methods include univariate survival analysis methods based on data from the first infection or bivariate serial event data methods based on the first and second infections. In this paper, we propose a nonparametric estimator of the joint distribution of time from transplant to the first infection and the gap times between consecutive infections. The proposed estimator takes into account the potentially different distributions of the two types of gap times and better uses the recurrent infection data. Asymptotic properties of the proposed estimators are established. PMID:26575402

  9. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    ERIC Educational Resources Information Center

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

  10. Parametric Covariance Model for Horizon-Based Optical Navigation

    NASA Technical Reports Server (NTRS)

    Hikes, Jacob; Liounis, Andrew J.; Christian, John A.

    2016-01-01

    This Note presents an entirely parametric version of the covariance for horizon-based optical navigation measurements. The covariance can be written as a function of only the spacecraft position, two sensor design parameters, the illumination direction, the size of the observed planet, the size of the lit arc to be used, and the total number of observed horizon points. As a result, one may now more clearly understand the sensitivity of horizon-based optical navigation performance as a function of these key design parameters, which is insight that was obscured in previous (and nonparametric) versions of the covariance. Finally, the new parametric covariance is shown to agree with both the nonparametric analytic covariance and results from a Monte Carlo analysis.

  11. Nonparametric bootstrap analysis with applications to demographic effects in demand functions.

    PubMed

    Gozalo, P L

    1997-12-01

    "A new bootstrap proposal, labeled smooth conditional moment (SCM) bootstrap, is introduced for independent but not necessarily identically distributed data, where the classical bootstrap procedure fails.... A good example of the benefits of using nonparametric and bootstrap methods is the area of empirical demand analysis. In particular, we will be concerned with their application to the study of two important topics: what are the most relevant effects of household demographic variables on demand behavior, and to what extent present parametric specifications capture these effects." excerpt

  12. A Robust Step Detection Algorithm and Walking Distance Estimation Based on Daily Wrist Activity Recognition Using a Smart Band.

    PubMed

    Trong Bui, Duong; Nguyen, Nhan Duc; Jeong, Gu-Min

    2018-06-25

    Human activity recognition and pedestrian dead reckoning are an interesting field because of their importance utilities in daily life healthcare. Currently, these fields are facing many challenges, one of which is the lack of a robust algorithm with high performance. This paper proposes a new method to implement a robust step detection and adaptive distance estimation algorithm based on the classification of five daily wrist activities during walking at various speeds using a smart band. The key idea is that the non-parametric adaptive distance estimator is performed after two activity classifiers and a robust step detector. In this study, two classifiers perform two phases of recognizing five wrist activities during walking. Then, a robust step detection algorithm, which is integrated with an adaptive threshold, peak and valley correction algorithm, is applied to the classified activities to detect the walking steps. In addition, the misclassification activities are fed back to the previous layer. Finally, three adaptive distance estimators, which are based on a non-parametric model of the average walking speed, calculate the length of each strike. The experimental results show that the average classification accuracy is about 99%, and the accuracy of the step detection is 98.7%. The error of the estimated distance is 2.2⁻4.2% depending on the type of wrist activities.

  13. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

    PubMed

    Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.

  14. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification

    PubMed Central

    Feng, Yang; Jiang, Jiancheng; Tong, Xin

    2015-01-01

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970

  15. Easy and accurate variance estimation of the nonparametric estimator of the partial area under the ROC curve and its application.

    PubMed

    Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D

    2016-06-15

    The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, investigating an accuracy of a biomarker to delineate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rate) only, and it can be often clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on the inferences by either the AUC or pAUC, we can make a different decision on a prognostic ability of a same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  16. The influence of linear elements on plant species diversity of Mediterranean rural landscapes: assessment of different indices and statistical approaches.

    PubMed

    García del Barrio, J M; Ortega, M; Vázquez De la Cueva, A; Elena-Rosselló, R

    2006-08-01

    This paper mainly aims to study the linear element influence on the estimation of vascular plant species diversity in five Mediterranean landscapes modeled as land cover patch mosaics. These landscapes have several core habitats and a different set of linear elements--habitat edges or ecotones, roads or railways, rivers, streams and hedgerows on farm land--whose plant composition were examined. Secondly, it aims to check plant diversity estimation in Mediterranean landscapes using parametric and non-parametric procedures, with two indices: Species richness and Shannon index. Land cover types and landscape linear elements were identified from aerial photographs. Their spatial information was processed using GIS techniques. Field plots were selected using a stratified sampling design according to relieve and tree density of each habitat type. A 50x20 m2 multi-scale sampling plot was designed for the core habitats and across the main landscape linear elements. Richness and diversity of plant species were estimated by comparing the observed field data to ICE (Incidence-based Coverage Estimator) and ACE (Abundance-based Coverage Estimator) non-parametric estimators. The species density, percentage of unique species, and alpha diversity per plot were significantly higher (p < 0.05) in linear elements than in core habitats. ICE estimate of number of species was 32% higher than of ACE estimate, which did not differ significantly from the observed values. Accumulated species richness in core habitats together with linear elements, were significantly higher than those recorded only in the core habitats in all the landscapes. Conversely, Shannon diversity index did not show significant differences.

  17. Broadband Processing in a Noisy Shallow Ocean Environment: A Particle Filtering Approach

    DOE PAGES

    Candy, J. V.

    2016-04-14

    Here we report that when a broadband source propagates sound in a shallow ocean the received data can become quite complicated due to temperature-related sound-speed variations and therefore a highly dispersive environment. Noise and uncertainties disrupt this already chaotic environment even further because disturbances propagate through the same inherent acoustic channel. The broadband (signal) estimation/detection problem can be decomposed into a set of narrowband solutions that are processed separately and then combined to achieve more enhancement of signal levels than that available from a single frequency, thereby allowing more information to be extracted leading to a more reliable source detection.more » A Bayesian solution to the broadband modal function tracking, pressure-field enhancement, and source detection problem is developed that leads to nonparametric estimates of desired posterior distributions enabling the estimation of useful statistics and an improved processor/detector. In conclusion, to investigate the processor capabilities, we synthesize an ensemble of noisy, broadband, shallow-ocean measurements to evaluate its overall performance using an information theoretical metric for the preprocessor and the receiver operating characteristic curve for the detector.« less

  18. Sample Skewness as a Statistical Measurement of Neuronal Tuning Sharpness

    PubMed Central

    Samonds, Jason M.; Potetz, Brian R.; Lee, Tai Sing

    2014-01-01

    We propose using the statistical measurement of the sample skewness of the distribution of mean firing rates of a tuning curve to quantify sharpness of tuning. For some features, like binocular disparity, tuning curves are best described by relatively complex and sometimes diverse functions, making it difficult to quantify sharpness with a single function and parameter. Skewness provides a robust nonparametric measure of tuning curve sharpness that is invariant with respect to the mean and variance of the tuning curve and is straightforward to apply to a wide range of tuning, including simple orientation tuning curves and complex object tuning curves that often cannot even be described parametrically. Because skewness does not depend on a specific model or function of tuning, it is especially appealing to cases of sharpening where recurrent interactions among neurons produce sharper tuning curves that deviate in a complex manner from the feedforward function of tuning. Since tuning curves for all neurons are not typically well described by a single parametric function, this model independence additionally allows skewness to be applied to all recorded neurons, maximizing the statistical power of a set of data. We also compare skewness with other nonparametric measures of tuning curve sharpness and selectivity. Compared to these other nonparametric measures tested, skewness is best used for capturing the sharpness of multimodal tuning curves defined by narrow peaks (maximum) and broad valleys (minima). Finally, we provide a more formal definition of sharpness using a shape-based information gain measure and derive and show that skewness is correlated with this definition. PMID:24555451

  19. Multi-frequency Phase Unwrap from Noisy Data: Adaptive Least Squares Approach

    NASA Astrophysics Data System (ADS)

    Katkovnik, Vladimir; Bioucas-Dias, José

    2010-04-01

    Multiple frequency interferometry is, basically, a phase acquisition strategy aimed at reducing or eliminating the ambiguity of the wrapped phase observations or, equivalently, reducing or eliminating the fringe ambiguity order. In multiple frequency interferometry, the phase measurements are acquired at different frequencies (or wavelengths) and recorded using the corresponding sensors (measurement channels). Assuming that the absolute phase to be reconstructed is piece-wise smooth, we use a nonparametric regression technique for the phase reconstruction. The nonparametric estimates are derived from a local least squares criterion, which, when applied to the multifrequency data, yields denoised (filtered) phase estimates with extended ambiguity (periodized), compared with the phase ambiguities inherent to each measurement frequency. The filtering algorithm is based on local polynomial (LPA) approximation for design of nonlinear filters (estimators) and adaptation of these filters to unknown smoothness of the spatially varying absolute phase [9]. For phase unwrapping, from filtered periodized data, we apply the recently introduced robust (in the sense of discontinuity preserving) PUMA unwrapping algorithm [1]. Simulations give evidence that the proposed algorithm yields state-of-the-art performance for continuous as well as for discontinues phase surfaces, enabling phase unwrapping in extraordinary difficult situations when all other algorithms fail.

  20. PROPERTIES OF THE WEIGHTING CELL ESTIMATOR UNDER A NONPARAMETRIC RESPONSE MECHANISM. (R829095C002)

    EPA Science Inventory

    The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...

  1. Robust non-parametric one-sample tests for the analysis of recurrent events.

    PubMed

    Rebora, Paola; Galimberti, Stefania; Valsecchi, Maria Grazia

    2010-12-30

    One-sample non-parametric tests are proposed here for inference on recurring events. The focus is on the marginal mean function of events and the basis for inference is the standardized distance between the observed and the expected number of events under a specified reference rate. Different weights are considered in order to account for various types of alternative hypotheses on the mean function of the recurrent events process. A robust version and a stratified version of the test are also proposed. The performance of these tests was investigated through simulation studies under various underlying event generation processes, such as homogeneous and nonhomogeneous Poisson processes, autoregressive and renewal processes, with and without frailty effects. The robust versions of the test have been shown to be suitable in a wide variety of event generating processes. The motivating context is a study on gene therapy in a very rare immunodeficiency in children, where a major end-point is the recurrence of severe infections. Robust non-parametric one-sample tests for recurrent events can be useful to assess efficacy and especially safety in non-randomized studies or in epidemiological studies for comparison with a standard population. Copyright © 2010 John Wiley & Sons, Ltd.

  2. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

    PubMed

    Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

    2014-07-01

    Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  3. Cluster-level statistical inference in fMRI datasets: The unexpected behavior of random fields in high dimensions.

    PubMed

    Bansal, Ravi; Peterson, Bradley S

    2018-06-01

    Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.

  4. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0.

    PubMed

    Guindon, Stéphane; Dufayard, Jean-François; Lefort, Vincent; Anisimova, Maria; Hordijk, Wim; Gascuel, Olivier

    2010-05-01

    PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.

  5. Total recognition discriminability in Huntington's and Alzheimer's disease.

    PubMed

    Graves, Lisa V; Holden, Heather M; Delano-Wood, Lisa; Bondi, Mark W; Woods, Steven Paul; Corey-Bloom, Jody; Salmon, David P; Delis, Dean C; Gilbert, Paul E

    2017-03-01

    Both the original and second editions of the California Verbal Learning Test (CVLT) provide an index of total recognition discriminability (TRD) but respectively utilize nonparametric and parametric formulas to compute the index. However, the degree to which population differences in TRD may vary across applications of these nonparametric and parametric formulas has not been explored. We evaluated individuals with Huntington's disease (HD), individuals with Alzheimer's disease (AD), healthy middle-aged adults, and healthy older adults who were administered the CVLT-II. Yes/no recognition memory indices were generated, including raw nonparametric TRD scores (as used in CVLT-I) and raw and standardized parametric TRD scores (as used in CVLT-II), as well as false positive (FP) rates. Overall, the patient groups had significantly lower TRD scores than their comparison groups. The application of nonparametric and parametric formulas resulted in comparable effect sizes for all group comparisons on raw TRD scores. Relative to the HD group, the AD group showed comparable standardized parametric TRD scores (despite lower raw nonparametric and parametric TRD scores), whereas the previous CVLT literature has shown that standardized TRD scores are lower in AD than in HD. Possible explanations for the similarity in standardized parametric TRD scores in the HD and AD groups in the present study are discussed, with an emphasis on the importance of evaluating TRD scores in the context of other indices such as FP rates in an effort to fully capture recognition memory function using the CVLT-II.

  6. Generalized linear mixed models with varying coefficients for longitudinal data.

    PubMed

    Zhang, Daowen

    2004-03-01

    The routinely assumed parametric functional form in the linear predictor of a generalized linear mixed model for longitudinal data may be too restrictive to represent true underlying covariate effects. We relax this assumption by representing these covariate effects by smooth but otherwise arbitrary functions of time, with random effects used to model the correlation induced by among-subject and within-subject variation. Due to the usually intractable integration involved in evaluating the quasi-likelihood function, the double penalized quasi-likelihood (DPQL) approach of Lin and Zhang (1999, Journal of the Royal Statistical Society, Series B61, 381-400) is used to estimate the varying coefficients and the variance components simultaneously by representing a nonparametric function by a linear combination of fixed effects and random effects. A scaled chi-squared test based on the mixed model representation of the proposed model is developed to test whether an underlying varying coefficient is a polynomial of certain degree. We evaluate the performance of the procedures through simulation studies and illustrate their application with Indonesian children infectious disease data.

  7. Novel Application of Density Estimation Techniques in Muon Ionization Cooling Experiment

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mohayai, Tanaz Angelina; Snopok, Pavel; Neuffer, David

    The international Muon Ionization Cooling Experiment (MICE) aims to demonstrate muon beam ionization cooling for the first time and constitutes a key part of the R&D towards a future neutrino factory or muon collider. Beam cooling reduces the size of the phase space volume occupied by the beam. Non-parametric density estimation techniques allow very precise calculation of the muon beam phase-space density and its increase as a result of cooling. These density estimation techniques are investigated in this paper and applied in order to estimate the reduction in muon beam size in MICE under various conditions.

  8. On the Bias-Amplifying Effect of Near Instruments in Observational Studies

    ERIC Educational Resources Information Center

    Steiner, Peter M.; Kim, Yongnam

    2014-01-01

    In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…

  9. A Bayesian Nonparametric Meta-Analysis Model

    ERIC Educational Resources Information Center

    Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.

    2015-01-01

    In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…

  10. A Nonparametric Geostatistical Method For Estimating Species Importance

    Treesearch

    Andrew J. Lister; Rachel Riemann; Michael Hoppus

    2001-01-01

    Parametric statistical methods are not always appropriate for conducting spatial analyses of forest inventory data. Parametric geostatistical methods such as variography and kriging are essentially averaging procedures, and thus can be affected by extreme values. Furthermore, non normal distributions violate the assumptions of analyses in which test statistics are...

  11. Estimates of the Sampling Distribution of Scalability Coefficient H

    ERIC Educational Resources Information Center

    Van Onna, Marieke J. H.

    2004-01-01

    Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…

  12. Comparison of least squares and exponential sine sweep methods for Parallel Hammerstein Models estimation

    NASA Astrophysics Data System (ADS)

    Rebillat, Marc; Schoukens, Maarten

    2018-05-01

    Linearity is a common assumption for many real-life systems, but in many cases the nonlinear behavior of systems cannot be ignored and must be modeled and estimated. Among the various existing classes of nonlinear models, Parallel Hammerstein Models (PHM) are interesting as they are at the same time easy to interpret as well as to estimate. One way to estimate PHM relies on the fact that the estimation problem is linear in the parameters and thus that classical least squares (LS) estimation algorithms can be used. In that area, this article introduces a regularized LS estimation algorithm inspired on some of the recently developed regularized impulse response estimation techniques. Another mean to estimate PHM consists in using parametric or non-parametric exponential sine sweeps (ESS) based methods. These methods (LS and ESS) are founded on radically different mathematical backgrounds but are expected to tackle the same issue. A methodology is proposed here to compare them with respect to (i) their accuracy, (ii) their computational cost, and (iii) their robustness to noise. Tests are performed on simulated systems for several values of methods respective parameters and of signal to noise ratio. Results show that, for a given set of data points, the ESS method is less demanding in computational resources than the LS method but that it is also less accurate. Furthermore, the LS method needs parameters to be set in advance whereas the ESS method is not subject to conditioning issues and can be fully non-parametric. In summary, for a given set of data points, ESS method can provide a first, automatic, and quick overview of a nonlinear system than can guide more computationally demanding and precise methods, such as the regularized LS one proposed here.

  13. A PDF-based classification of gait cadence patterns in patients with amyotrophic lateral sclerosis.

    PubMed

    Wu, Yunfeng; Ng, Sin Chun

    2010-01-01

    Amyotrophic lateral sclerosis (ALS) is a type of neurological disease due to the degeneration of motor neurons. During the course of such a progressive disease, it would be difficult for ALS patients to regulate normal locomotion, so that the gait stability becomes perturbed. This paper presents a pilot statistical study on the gait cadence (or stride interval) in ALS, based on the statistical analysis method. The probability density functions (PDFs) of stride interval were first estimated with the nonparametric Parzen-window method. We computed the mean of the left-foot stride interval and the modified Kullback-Leibler divergence (MKLD) from the PDFs estimated. The analysis results suggested that both of these two statistical parameters were significantly altered in ALS, and the least-squares support vector machine (LS-SVM) may effectively distinguish the stride patterns between the ALS patients and healthy controls, with an accurate rate of 82.8% and an area of 0.87 under the receiver operating characteristic curve.

  14. Forecasting the impact of transport improvements on commuting and residential choice

    NASA Astrophysics Data System (ADS)

    Elhorst, J. Paul; Oosterhaven, Jan

    2006-03-01

    This paper develops a probabilistic, competing-destinations, assignment model that predicts changes in the spatial pattern of the working population as a result of transport improvements. The choice of residence is explained by a new non-parametric model, which represents an alternative to the popular multinominal logit model. Travel times between zones are approximated by a normal distribution function with different mean and variance for each pair of zones, whereas previous models only use average travel times. The model’s forecast error of the spatial distribution of the Dutch working population is 7% when tested on 1998 base-year data. To incorporate endogenous changes in its causal variables, an almost ideal demand system is estimated to explain the choice of transport mode, and a new economic geography inter-industry model (RAEM) is estimated to explain the spatial distribution of employment. In the application, the model is used to forecast the impact of six mutually exclusive Dutch core-periphery railway proposals in the projection year 2020.

  15. Coalescent methods for estimating phylogenetic trees.

    PubMed

    Liu, Liang; Yu, Lili; Kubatko, Laura; Pearl, Dennis K; Edwards, Scott V

    2009-10-01

    We review recent models to estimate phylogenetic trees under the multispecies coalescent. Although the distinction between gene trees and species trees has come to the fore of phylogenetics, only recently have methods been developed that explicitly estimate species trees. Of the several factors that can cause gene tree heterogeneity and discordance with the species tree, deep coalescence due to random genetic drift in branches of the species tree has been modeled most thoroughly. Bayesian approaches to estimating species trees utilizes two likelihood functions, one of which has been widely used in traditional phylogenetics and involves the model of nucleotide substitution, and the second of which is less familiar to phylogeneticists and involves the probability distribution of gene trees given a species tree. Other recent parametric and nonparametric methods for estimating species trees involve parsimony criteria, summary statistics, supertree and consensus methods. Species tree approaches are an appropriate goal for systematics, appear to work well in some cases where concatenation can be misleading, and suggest that sampling many independent loci will be paramount. Such methods can also be challenging to implement because of the complexity of the models and computational time. In addition, further elaboration of the simplest of coalescent models will be required to incorporate commonly known issues such as deviation from the molecular clock, gene flow and other genetic forces.

  16. Reconstruction of cosmological matter perturbations in modified gravity

    NASA Astrophysics Data System (ADS)

    Gonzalez, J. E.

    2017-12-01

    The analysis of perturbative quantities is a powerful tool to distinguish between different dark energy models and gravity theories degenerated at the background level. In this work, we generalize the integral solution of the matter density contrast for general relativity gravity [V. Sahni and A. Starobinsky, Int. J. Mod. Phys. D 15, 2105 (2006)., 10.1142/S0218271806009704, U. Alam, V. Sahni, and A. A. Starobinsky, Astrophys. J. 704, 1086 (2009)., 10.1088/0004-637X/704/2/1086] to a wide class of modified gravity (MG) theories. To calculate this solution, it is necessary to have prior knowledge of the Hubble rate, the density parameter at the present epoch (Ωm 0), and the functional form of the effective Newton's constant that characterizes the gravity theory. We estimate in a model-independent way the Hubble expansion rate by applying a nonparametric reconstruction method to model-independent cosmic chronometer data and high-z quasar data. In order to compare our generalized solution of the matter density contrast, using the nonparametric reconstruction of H (z ) from observational data, with a purely theoretical one, we choose a parametrization of the screened modified gravity and the Ωm 0 from WMAP-9 Collaborations. Finally, we calculate the growth index for the analyzed cases, finding very good agreement between theoretical values and the obtained ones using the approach presented in this work.

  17. Estimating the Societal Benefits of THA After Accounting for Work Status and Productivity: A Markov Model Approach.

    PubMed

    Koenig, Lane; Zhang, Qian; Austin, Matthew S; Demiralp, Berna; Fehring, Thomas K; Feng, Chaoling; Mather, Richard C; Nguyen, Jennifer T; Saavoss, Asha; Springer, Bryan D; Yates, Adolph J

    2016-12-01

    Demand for total hip arthroplasty (THA) is high and expected to continue to grow during the next decade. Although much of this growth includes working-aged patients, cost-effectiveness studies on THA have not fully incorporated the productivity effects from surgery. We asked: (1) What is the expected effect of THA on patients' employment and earnings? (2) How does accounting for these effects influence the cost-effectiveness of THA relative to nonsurgical treatment? Taking a societal perspective, we used a Markov model to assess the overall cost-effectiveness of THA compared with nonsurgical treatment. We estimated direct medical costs using Medicare claims data and indirect costs (employment status and worker earnings) using regression models and nonparametric simulations. For direct costs, we estimated average spending 1 year before and after surgery. Spending estimates included physician and related services, hospital inpatient and outpatient care, and postacute care. For indirect costs, we estimated the relationship between functional status and productivity, using data from the National Health Interview Survey and regression analysis. Using regression coefficients and patient survey data, we ran a nonparametric simulation to estimate productivity (probability of working multiplied by earnings if working minus the value of missed work days) before and after THA. We used the Australian Orthopaedic Association National Joint Replacement Registry to obtain revision rates because it contained osteoarthritis-specific THA revision rates by age and gender, which were unavailable in other registry reports. Other model assumptions were extracted from a previously published cost-effectiveness analysis that included a comprehensive literature review. We incorporated all parameter estimates into Markov models to assess THA effects on quality-adjusted life years and lifetime costs. We conducted threshold and sensitivity analyses on direct costs, indirect costs, and revision rates to assess the robustness of our Markov model results. Compared with nonsurgical treatments, THA increased average annual productivity of patients by USD 9503 (95% CI, USD 1446-USD 17,812). We found that THA increases average lifetime direct costs by USD 30,365, which were offset by USD 63,314 in lifetime savings from increased productivity. With net societal savings of USD 32,948 per patient, total lifetime societal savings were estimated at almost USD 10 billion from more than 300,000 THAs performed in the United States each year. Using a Markov model approach, we show that THA produces societal benefits that can offset the costs of THA. When comparing THA with other nonsurgical treatments, policymakers should consider the long-term benefits associated with increased productivity from surgery. Level III, economic and decision analysis.

  18. Applications of quantum entropy to statistics

    NASA Astrophysics Data System (ADS)

    Silver, R. N.; Martz, H. F.

    This paper develops two generalizations of the maximum entropy (ME) principle. First, Shannon classical entropy is replaced by von Neumann quantum entropy to yield a broader class of information divergences (or penalty functions) for statistics applications. Negative relative quantum entropy enforces convexity, positivity, non-local extensivity and prior correlations such as smoothness. This enables the extension of ME methods from their traditional domain of ill-posed in-verse problems to new applications such as non-parametric density estimation. Second, given a choice of information divergence, a combination of ME and Bayes rule is used to assign both prior and posterior probabilities. Hyperparameters are interpreted as Lagrange multipliers enforcing constraints. Conservation principles are proposed to act statistical regularization and other hyperparameters, such as conservation of information and smoothness. ME provides an alternative to hierarchical Bayes methods.

  19. Evaluating effects of developmental education for college students using a regression discontinuity design.

    PubMed

    Moss, Brian G; Yeaton, William H

    2013-10-01

    Annually, American colleges and universities provide developmental education (DE) to millions of underprepared students; however, evaluation estimates of DE benefits have been mixed. Using a prototypic exemplar of DE, our primary objective was to investigate the utility of a replicative evaluative framework for assessing program effectiveness. Within the context of the regression discontinuity (RD) design, this research examined the effectiveness of a DE program for five, sequential cohorts of first-time college students. Discontinuity estimates were generated for individual terms and cumulatively, across terms. Participants were 3,589 first-time community college students. DE program effects were measured by contrasting both college-level English grades and a dichotomous measure of pass/fail, for DE and non-DE students. Parametric and nonparametric estimates of overall effect were positive for continuous and dichotomous measures of achievement (grade and pass/fail). The variability of program effects over time was determined by tracking results within individual terms and cumulatively, across terms. Applying this replication strategy, DE's overall impact was modest (an effect size of approximately .20) but quite consistent, based on parametric and nonparametric estimation approaches. A meta-analysis of five RD results yielded virtually the same estimate as the overall, parametric findings. Subset analysis, though tentative, suggested that males benefited more than females, while academic gains were comparable for different ethnicities. The cumulative, within-study comparison, replication approach offers considerable potential for the evaluation of new and existing policies, particularly when effects are relatively small, as is often the case in applied settings.

  20. Using Cochran's Z Statistic to Test the Kernel-Smoothed Item Response Function Differences between Focal and Reference Groups

    ERIC Educational Resources Information Center

    Zheng, Yinggan; Gierl, Mark J.; Cui, Ying

    2010-01-01

    This study combined the kernel smoothing procedure and a nonparametric differential item functioning statistic--Cochran's Z--to statistically test the difference between the kernel-smoothed item response functions for reference and focal groups. Simulation studies were conducted to investigate the Type I error and power of the proposed…

  1. The influence of filtering and downsampling on the estimation of transfer entropy

    PubMed Central

    Florin, Esther; von Papen, Michael; Timmermann, Lars

    2017-01-01

    Transfer entropy (TE) provides a generalized and model-free framework to study Wiener-Granger causality between brain regions. Because of its nonparametric character, TE can infer directed information flow also from nonlinear systems. Despite its increasing number of applications in neuroscience, not much is known regarding the influence of common electrophysiological preprocessing on its estimation. We test the influence of filtering and downsampling on a recently proposed nearest neighborhood based TE estimator. Different filter settings and downsampling factors were tested in a simulation framework using a model with a linear coupling function and two nonlinear models with sigmoid and logistic coupling functions. For nonlinear coupling and progressively lower low-pass filter cut-off frequencies up to 72% false negative direct connections and up to 26% false positive connections were identified. In contrast, for the linear model, a monotonic increase was only observed for missed indirect connections (up to 86%). High-pass filtering (1 Hz, 2 Hz) had no impact on TE estimation. After low-pass filtering interaction delays were significantly underestimated. Downsampling the data by a factor greater than the assumed interaction delay erased most of the transmitted information and thus led to a very high percentage (67–100%) of false negative direct connections. Low-pass filtering increases the number of missed connections depending on the filters cut-off frequency. Downsampling should only be done if the sampling factor is smaller than the smallest assumed interaction delay of the analyzed network. PMID:29149201

  2. A nonparametric smoothing method for assessing GEE models with longitudinal binary data.

    PubMed

    Lin, Kuo-Chin; Chen, Yi-Ju; Shyr, Yu

    2008-09-30

    Studies involving longitudinal binary responses are widely applied in the health and biomedical sciences research and frequently analyzed by generalized estimating equations (GEE) method. This article proposes an alternative goodness-of-fit test based on the nonparametric smoothing approach for assessing the adequacy of GEE fitted models, which can be regarded as an extension of the goodness-of-fit test of le Cessie and van Houwelingen (Biometrics 1991; 47:1267-1282). The expectation and approximate variance of the proposed test statistic are derived. The asymptotic distribution of the proposed test statistic in terms of a scaled chi-squared distribution and the power performance of the proposed test are discussed by simulation studies. The testing procedure is demonstrated by two real data. Copyright (c) 2008 John Wiley & Sons, Ltd.

  3. Parametric, nonparametric and parametric modelling of a chaotic circuit time series

    NASA Astrophysics Data System (ADS)

    Timmer, J.; Rust, H.; Horbelt, W.; Voss, H. U.

    2000-09-01

    The determination of a differential equation underlying a measured time series is a frequently arising task in nonlinear time series analysis. In the validation of a proposed model one often faces the dilemma that it is hard to decide whether possible discrepancies between the time series and model output are caused by an inappropriate model or by bad estimates of parameters in a correct type of model, or both. We propose a combination of parametric modelling based on Bock's multiple shooting algorithm and nonparametric modelling based on optimal transformations as a strategy to test proposed models and if rejected suggest and test new ones. We exemplify this strategy on an experimental time series from a chaotic circuit where we obtain an extremely accurate reconstruction of the observed attractor.

  4. Risk analysis in cohort studies with heterogeneous strata. A global chi2-test for dose-response relationship, generalizing the Mantel-Haenszel procedure.

    PubMed

    Ahlborn, W; Tuz, H J; Uberla, K

    1990-03-01

    In cohort studies the Mantel-Haenszel estimator ORMH is computed from sample data and is used as a point estimator of relative risk. Test-based confidence intervals are estimated with the help of the asymptotic chi-squared distributed MH-statistic chi 2MHS. The Mantel-extension-chi-squared is used as a test statistic for a dose-response relationship. Both test statistics--the Mantel-Haenszel-chi as well as the Mantel-extension-chi--assume homogeneity of risk across strata, which is rarely present. Also an extended nonparametric statistic, proposed by Terpstra, which is based on the Mann-Whitney-statistics assumes homogeneity of risk across strata. We have earlier defined four risk measures RRkj (k = 1,2,...,4) in the population and considered their estimates and the corresponding asymptotic distributions. In order to overcome the homogeneity assumption we use the delta-method to get "test-based" confidence intervals. Because the four risk measures RRkj are presented as functions of four weights gik we give, consequently, the asymptotic variances of these risk estimators also as functions of the weights gik in a closed form. Approximations to these variances are given. For testing a dose-response relationship we propose a new class of chi 2(1)-distributed global measures Gk and the corresponding global chi 2-test. In contrast to the Mantel-extension-chi homogeneity of risk across strata must not be assumed. These global test statistics are of the Wald type for composite hypotheses.(ABSTRACT TRUNCATED AT 250 WORDS)

  5. Mean Recency Period for Estimation of HIV-1 Incidence with the BED-Capture EIA and Bio-Rad Avidity in Persons Diagnosed in the United States with Subtype B.

    PubMed

    Hanson, Debra L; Song, Ruiguang; Masciotra, Silvina; Hernandez, Angela; Dobbs, Trudy L; Parekh, Bharat S; Owen, S Michele; Green, Timothy A

    2016-01-01

    HIV incidence estimates are used to monitor HIV-1 infection in the United States. Use of laboratory biomarkers that distinguish recent from longstanding infection to quantify HIV incidence rely on having accurate knowledge of the average time that individuals spend in a transient state of recent infection between seroconversion and reaching a specified biomarker cutoff value. This paper describes five estimation procedures from two general statistical approaches, a survival time approach and an approach that fits binomial models of the probability of being classified as recently infected, as a function of time since seroconversion. We compare these procedures for estimating the mean duration of recent infection (MDRI) for two biomarkers used by the U.S. National HIV Surveillance System for determination of HIV incidence, the Aware BED EIA HIV-1 incidence test (BED) and the avidity-based, modified Bio-Rad HIV-1/HIV-2 plus O ELISA (BRAI) assay. Collectively, 953 specimens from 220 HIV-1 subtype B seroconverters, taken from 5 cohorts, were tested with a biomarker assay. Estimates of MDRI using the non-parametric survival approach were 198.4 days (SD 13.0) for BED and 239.6 days (SD 13.9) for BRAI using cutoff values of 0.8 normalized optical density and 30%, respectively. The probability of remaining in the recent state as a function of time since seroconversion, based upon this revised statistical approach, can be applied in the calculation of annual incidence in the United States.

  6. The Impact of Conditional Scores on the Performance of DETECT.

    ERIC Educational Resources Information Center

    Zhang, Yanwei Oliver; Yu, Feng; Nandakumar, Ratna

    DETECT is a nonparametric, conditional covariance-based procedure to identify dimensional structure and the degree of multidimensionality of test data. The ability composite or conditional score used to estimate conditional covariance plays a significant role in the performance of DETECT. The number correct score of all items in the test (T) and…

  7. Gender Wage Disparities among the Highly Educated

    ERIC Educational Resources Information Center

    Black, Dan A.; Haviland, Amelia M.; Sanders, Seth G.; Taylor, Lowell J.

    2008-01-01

    We examine gender wage disparities for four groups of college-educated women--black, Hispanic, Asian, and non-Hispanic white--using the National Survey of College Graduates. Raw log wage gaps, relative to non-Hispanic white male counterparts, generally exceed -0.30. Estimated gaps decline to between -0.08 and -0.19 in nonparametric analyses that…

  8. Nonparametric Estimation of the Probability of Discovering a New Species.

    DTIC Science & Technology

    1986-01-01

    see Good (1953, 1965), Good and Toulmin (1956), Goodman (1949), Harris (1959, 1968), Knott (1967) and Robbins (1968). We note that our model is not...and Toulmin , G. (1956). The number of new species and the increase of population coverage, when a sample is increased. Biometrika 43, 45-63. Goodman

  9. Estimating a Meaningful Point of Change: A Comparison of Exploratory Techniques Based on Nonparametric Regression

    ERIC Educational Resources Information Center

    Klotsche, Jens; Gloster, Andrew T.

    2012-01-01

    Longitudinal studies are increasingly common in psychological research. Characterized by repeated measurements, longitudinal designs aim to observe phenomena that change over time. One important question involves identification of the exact point in time when the observed phenomena begin to meaningfully change above and beyond baseline…

  10. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  11. Population pharmacokinetic modelling of tramadol using inverse Gaussian function for the assessment of drug absorption from prolonged and immediate release formulations.

    PubMed

    Brvar, Nina; Mateović-Rojnik, Tatjana; Grabnar, Iztok

    2014-10-01

    This study aimed to develop a population pharmacokinetic model for tramadol that combines different input rates with disposition characteristics. Data used for the analysis were pooled from two phase I bioavailability studies with immediate (IR) and prolonged release (PR) formulations in healthy volunteers. Tramadol plasma concentration-time data were described by an inverse Gaussian function to model the complete input process linked to a two-compartment disposition model with first-order elimination. Although polymorphic CYP2D6 appears to be a major enzyme involved in the metabolism of tramadol, application of a mixture model to test the assumption of two and three subpopulations did not reveal any improvement of the model. The final model estimated parameters with reasonable precision and was able to estimate the interindividual variability of all parameters except for the relative bioavailability of PR vs. IR formulation. Validity of the model was further tested using the nonparametric bootstrap approach. Finally, the model was applied to assess absorption kinetics of tramadol and predict steady-state pharmacokinetics following administration of both types of formulations. For both formulations, the final model yielded a stable estimate of the absorption time profiles. Steady-state simulation supports switching of patients from IR to PR formulation. Copyright © 2014 Elsevier B.V. All rights reserved.

  12. Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization

    NASA Astrophysics Data System (ADS)

    Lee, Kyungbook; Song, Seok Goo

    2017-09-01

    Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events ( M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.

  13. A Non-Parametric Approach for the Activation Detection of Block Design fMRI Simulated Data Using Self-Organizing Maps and Support Vector Machine.

    PubMed

    Bahrami, Sheyda; Shamsi, Mousa

    2017-01-01

    Functional magnetic resonance imaging (fMRI) is a popular method to probe the functional organization of the brain using hemodynamic responses. In this method, volume images of the entire brain are obtained with a very good spatial resolution and low temporal resolution. However, they always suffer from high dimensionality in the face of classification algorithms. In this work, we combine a support vector machine (SVM) with a self-organizing map (SOM) for having a feature-based classification by using SVM. Then, a linear kernel SVM is used for detecting the active areas. Here, we use SOM for feature extracting and labeling the datasets. SOM has two major advances: (i) it reduces dimension of data sets for having less computational complexity and (ii) it is useful for identifying brain regions with small onset differences in hemodynamic responses. Our non-parametric model is compared with parametric and non-parametric methods. We use simulated fMRI data sets and block design inputs in this paper and consider the contrast to noise ratio (CNR) value equal to 0.6 for simulated datasets. fMRI simulated dataset has contrast 1-4% in active areas. The accuracy of our proposed method is 93.63% and the error rate is 6.37%.

  14. Nonparametric spirometry reference values for Hispanic Americans.

    PubMed

    Glenn, Nancy L; Brown, Vanessa M

    2011-02-01

    Recent literature sites ethnic origin as a major factor in developing pulmonary function reference values. Extensive studies established reference values for European and African Americans, but not for Hispanic Americans. The Third National Health and Nutrition Examination Survey defines Hispanic as individuals of Spanish speaking cultures. While no group was excluded from the target population, sample size requirements only allowed inclusion of individuals who identified themselves as Mexican Americans. This research constructs nonparametric reference value confidence intervals for Hispanic American pulmonary function. The method is applicable to all ethnicities. We use empirical likelihood confidence intervals to establish normal ranges for reference values. Its major advantage: it is model free, but shares asymptotic properties of model based methods. Statistical comparisons indicate that empirical likelihood interval lengths are comparable to normal theory intervals. Power and efficiency studies agree with previously published theoretical results.

  15. [Age- and sex-specific reference intervals for 10 health examination items: mega-data from a Japanese Health Service Association].

    PubMed

    Suka, Machi; Yoshida, Katsumi; Kawai, Tadashi; Aoki, Yoshikazu; Yamane, Noriyuki; Yamauchi, Kuniaki

    2005-07-01

    To determine age- and sex-specific reference intervals for 10 health examination items in Japanese adults. Health examination data were accumulated from 24 different prefectural health service associations affiliated with the Japan Association of Health Service. Those who were non-smokers, drank less than 7 days/week, and had a body mass index of 18.5-24.9kg/m2 were sampled as a reference population (n = 737,538; 224,947 men and 512,591 women). After classified by age and sex, reference intervals for 10 health examination items (systolic blood pressure, diastolic blood pressure, total cholesterol, triglyceride, glucose, uric acid, AST, ALT, gamma-GT, and hemoglobin) were estimated using the parametric and nonparametric methods. In every item except for hemoglobin, men had higher reference intervals than women. Systolic blood pressure, total cholesterol, and glucose showed an upward trend in values with increasing age. Hemoglobin showed a downward trend in values with increasing age. Triglyceride, ALT, and gamma-GT reached a peak in middle age. Overall, parametric estimates showed narrower reference intervals than non-parametric estimates. Reference intervals vary with age and sex. Age- and sex-specific reference intervals may contribute to better assessment of health examination data.

  16. Estimating scaled treatment effects with multiple outcomes.

    PubMed

    Kennedy, Edward H; Kangovi, Shreya; Mitra, Nandita

    2017-01-01

    In classical study designs, the aim is often to learn about the effects of a treatment or intervention on a single outcome; in many modern studies, however, data on multiple outcomes are collected and it is of interest to explore effects on multiple outcomes simultaneously. Such designs can be particularly useful in patient-centered research, where different outcomes might be more or less important to different patients. In this paper, we propose scaled effect measures (via potential outcomes) that translate effects on multiple outcomes to a common scale, using mean-variance and median-interquartile range based standardizations. We present efficient, nonparametric, doubly robust methods for estimating these scaled effects (and weighted average summary measures), and for testing the null hypothesis that treatment affects all outcomes equally. We also discuss methods for exploring how treatment effects depend on covariates (i.e., effect modification). In addition to describing efficiency theory for our estimands and the asymptotic behavior of our estimators, we illustrate the methods in a simulation study and a data analysis. Importantly, and in contrast to much of the literature concerning effects on multiple outcomes, our methods are nonparametric and can be used not only in randomized trials to yield increased efficiency, but also in observational studies with high-dimensional covariates to reduce confounding bias.

  17. Dynamic experiment design regularization approach to adaptive imaging with array radar/SAR sensor systems.

    PubMed

    Shkvarko, Yuriy; Tuxpan, José; Santos, Stewart

    2011-01-01

    We consider a problem of high-resolution array radar/SAR imaging formalized in terms of a nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the random wavefield scattered from a remotely sensed scene observed through a kernel signal formation operator and contaminated with random Gaussian noise. First, the Sobolev-type solution space is constructed to specify the class of consistent kernel SSP estimators with the reproducing kernel structures adapted to the metrics in such the solution space. Next, the "model-free" variational analysis (VA)-based image enhancement approach and the "model-based" descriptive experiment design (DEED) regularization paradigm are unified into a new dynamic experiment design (DYED) regularization framework. Application of the proposed DYED framework to the adaptive array radar/SAR imaging problem leads to a class of two-level (DEED-VA) regularized SSP reconstruction techniques that aggregate the kernel adaptive anisotropic windowing with the projections onto convex sets to enforce the consistency and robustness of the overall iterative SSP estimators. We also show how the proposed DYED regularization method may be considered as a generalization of the MVDR, APES and other high-resolution nonparametric adaptive radar sensing techniques. A family of the DYED-related algorithms is constructed and their effectiveness is finally illustrated via numerical simulations.

  18. Estimation of Subpixel Snow-Covered Area by Nonparametric Regression Splines

    NASA Astrophysics Data System (ADS)

    Kuter, S.; Akyürek, Z.; Weber, G.-W.

    2016-10-01

    Measurement of the areal extent of snow cover with high accuracy plays an important role in hydrological and climate modeling. Remotely-sensed data acquired by earth-observing satellites offer great advantages for timely monitoring of snow cover. However, the main obstacle is the tradeoff between temporal and spatial resolution of satellite imageries. Soft or subpixel classification of low or moderate resolution satellite images is a preferred technique to overcome this problem. The most frequently employed snow cover fraction methods applied on Moderate Resolution Imaging Spectroradiometer (MODIS) data have evolved from spectral unmixing and empirical Normalized Difference Snow Index (NDSI) methods to latest machine learning-based artificial neural networks (ANNs). This study demonstrates the implementation of subpixel snow-covered area estimation based on the state-of-the-art nonparametric spline regression method, namely, Multivariate Adaptive Regression Splines (MARS). MARS models were trained by using MODIS top of atmospheric reflectance values of bands 1-7 as predictor variables. Reference percentage snow cover maps were generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also employed to estimate the percentage snow-covered area on the same data set. The results indicated that the developed MARS model performed better than th

  19. Regional trends in short-duration precipitation extremes: a flexible multivariate monotone quantile regression approach

    NASA Astrophysics Data System (ADS)

    Cannon, Alex

    2017-04-01

    Estimating historical trends in short-duration rainfall extremes at regional and local scales is challenging due to low signal-to-noise ratios and the limited availability of homogenized observational data. In addition to being of scientific interest, trends in rainfall extremes are of practical importance, as their presence calls into question the stationarity assumptions that underpin traditional engineering and infrastructure design practice. Even with these fundamental challenges, increasingly complex questions are being asked about time series of extremes. For instance, users may not only want to know whether or not rainfall extremes have changed over time, they may also want information on the modulation of trends by large-scale climate modes or on the nonstationarity of trends (e.g., identifying hiatus periods or periods of accelerating positive trends). Efforts have thus been devoted to the development and application of more robust and powerful statistical estimators for regional and local scale trends. While a standard nonparametric method like the regional Mann-Kendall test, which tests for the presence of monotonic trends (i.e., strictly non-decreasing or non-increasing changes), makes fewer assumptions than parametric methods and pools information from stations within a region, it is not designed to visualize detected trends, include information from covariates, or answer questions about the rate of change in trends. As a remedy, monotone quantile regression (MQR) has been developed as a nonparametric alternative that can be used to estimate a common monotonic trend in extremes at multiple stations. Quantile regression makes efficient use of data by directly estimating conditional quantiles based on information from all rainfall data in a region, i.e., without having to precompute the sample quantiles. The MQR method is also flexible and can be used to visualize and analyze the nonlinearity of the detected trend. However, it is fundamentally a univariate technique, and cannot incorporate information from additional covariates, for example ENSO state or physiographic controls on extreme rainfall within a region. Here, the univariate MQR model is extended to allow the use of multiple covariates. Multivariate monotone quantile regression (MMQR) is based on a single hidden-layer feedforward network with the quantile regression error function and partial monotonicity constraints. The MMQR model is demonstrated via Monte Carlo simulations and the estimation and visualization of regional trends in moderate rainfall extremes based on homogenized sub-daily precipitation data at stations in Canada.

  20. Adaptive channel estimation for soft decision decoding over non-Gaussian optical channel

    NASA Astrophysics Data System (ADS)

    Xiang, Jing-song; Miao, Tao-tao; Huang, Sheng; Liu, Huan-lin

    2016-10-01

    An adaptive priori likelihood ratio (LLR) estimation method is proposed over non-Gaussian channel in the intensity modulation/direct detection (IM/DD) optical communication systems. Using the nonparametric histogram and the weighted least square linear fitting in the tail regions, the LLR is estimated and used for the soft decision decoding of the low-density parity-check (LDPC) codes. This method can adapt well to the three main kinds of intensity modulation/direct detection (IM/DD) optical channel, i.e., the chi-square channel, the Webb-Gaussian channel and the additive white Gaussian noise (AWGN) channel. The performance penalty of channel estimation is neglected.

  1. Robust variable selection method for nonparametric differential equation models with application to nonlinear dynamic gene regulatory network analysis.

    PubMed

    Lu, Tao

    2016-01-01

    The gene regulation network (GRN) evaluates the interactions between genes and look for models to describe the gene expression behavior. These models have many applications; for instance, by characterizing the gene expression mechanisms that cause certain disorders, it would be possible to target those genes to block the progress of the disease. Many biological processes are driven by nonlinear dynamic GRN. In this article, we propose a nonparametric differential equation (ODE) to model the nonlinear dynamic GRN. Specially, we address following questions simultaneously: (i) extract information from noisy time course gene expression data; (ii) model the nonlinear ODE through a nonparametric smoothing function; (iii) identify the important regulatory gene(s) through a group smoothly clipped absolute deviation (SCAD) approach; (iv) test the robustness of the model against possible shortening of experimental duration. We illustrate the usefulness of the model and associated statistical methods through a simulation and a real application examples.

  2. On the Mean Squared Error of Nonparametric Quantile Estimators under Random Right-Censorship.

    DTIC Science & Technology

    1986-09-01

    SECURITY CI.ASSIFICATION lb. RESTRICTIVE MARKINGS UNCLASSIFIED 2a, SECURITY CLASSIFICATION AUTHORITY 3 . OISTRIBUTIONIAVAILASIL.ITY OF REPORT P16e 2b...UNCLASSIPIEO/UNLIMITEO 3 SAME AS RPT". 0 OTIC USERS 1 UNCLASSIFIED p." " 22. NAME OP RESPONSIBLE INOIVIOUAL 22b. TELEPHONE NUMBER 22c. OFFICE SYMBOL...in Section 3 , and the result for the kernel estimator Qn is derived in Section 4. It should be k. mentioned that the order statistic methods used by

  3. Mixed-effects Gaussian process functional regression models with application to dose-response curve prediction.

    PubMed

    Shi, J Q; Wang, B; Will, E J; West, R M

    2012-11-20

    We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.

  4. Nonparametric Fine Tuning of Mixtures: Application to Non-Life Insurance Claims Distribution Estimation

    NASA Astrophysics Data System (ADS)

    Sardet, Laure; Patilea, Valentin

    When pricing a specific insurance premium, actuary needs to evaluate the claims cost distribution for the warranty. Traditional actuarial methods use parametric specifications to model claims distribution, like lognormal, Weibull and Pareto laws. Mixtures of such distributions allow to improve the flexibility of the parametric approach and seem to be quite well-adapted to capture the skewness, the long tails as well as the unobserved heterogeneity among the claims. In this paper, instead of looking for a finely tuned mixture with many components, we choose a parsimonious mixture modeling, typically a two or three-component mixture. Next, we use the mixture cumulative distribution function (CDF) to transform data into the unit interval where we apply a beta-kernel smoothing procedure. A bandwidth rule adapted to our methodology is proposed. Finally, the beta-kernel density estimate is back-transformed to recover an estimate of the original claims density. The beta-kernel smoothing provides an automatic fine-tuning of the parsimonious mixture and thus avoids inference in more complex mixture models with many parameters. We investigate the empirical performance of the new method in the estimation of the quantiles with simulated nonnegative data and the quantiles of the individual claims distribution in a non-life insurance application.

  5. Simulated maximum likelihood method for estimating kinetic rates in gene expression.

    PubMed

    Tian, Tianhai; Xu, Songlin; Gao, Junbin; Burrage, Kevin

    2007-01-01

    Kinetic rate in gene expression is a key measurement of the stability of gene products and gives important information for the reconstruction of genetic regulatory networks. Recent developments in experimental technologies have made it possible to measure the numbers of transcripts and protein molecules in single cells. Although estimation methods based on deterministic models have been proposed aimed at evaluating kinetic rates from experimental observations, these methods cannot tackle noise in gene expression that may arise from discrete processes of gene expression, small numbers of mRNA transcript, fluctuations in the activity of transcriptional factors and variability in the experimental environment. In this paper, we develop effective methods for estimating kinetic rates in genetic regulatory networks. The simulated maximum likelihood method is used to evaluate parameters in stochastic models described by either stochastic differential equations or discrete biochemical reactions. Different types of non-parametric density functions are used to measure the transitional probability of experimental observations. For stochastic models described by biochemical reactions, we propose to use the simulated frequency distribution to evaluate the transitional density based on the discrete nature of stochastic simulations. The genetic optimization algorithm is used as an efficient tool to search for optimal reaction rates. Numerical results indicate that the proposed methods can give robust estimations of kinetic rates with good accuracy.

  6. A Multidimensional Item Response Model: Constrained Latent Class Analysis Using the Gibbs Sampler and Posterior Predictive Checks.

    ERIC Educational Resources Information Center

    Hoijtink, Herbert; Molenaar, Ivo W.

    1997-01-01

    This paper shows that a certain class of constrained latent class models may be interpreted as a special case of nonparametric multidimensional item response models. Parameters of this latent class model are estimated using an application of the Gibbs sampler, and model fit is investigated using posterior predictive checks. (SLD)

  7. Validity Study in Multidimensional Latent Space and Efficient Computerized Adaptive Testing. Final Report.

    ERIC Educational Resources Information Center

    Samejima, Fumiko

    This paper is the final report of a multi-year project sponsored by the Office of Naval Research (ONR) in 1987 through 1990. The main objectives of the research summarized were to: investigate the non-parametric approach to the estimation of the operating characteristics of discrete item responses; revise and strengthen the package computer…

  8. An application of quantile random forests for predictive mapping of forest attributes

    Treesearch

    E.A. Freeman; G.G. Moisen

    2015-01-01

    Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...

  9. Measuring firm size distribution with semi-nonparametric densities

    NASA Astrophysics Data System (ADS)

    Cortés, Lina M.; Mora-Valencia, Andrés; Perote, Javier

    2017-11-01

    In this article, we propose a new methodology based on a (log) semi-nonparametric (log-SNP) distribution that nests the lognormal and enables better fits in the upper tail of the distribution through the introduction of new parameters. We test the performance of the lognormal and log-SNP distributions capturing firm size, measured through a sample of US firms in 2004-2015. Taking different levels of aggregation by type of economic activity, our study shows that the log-SNP provides a better fit of the firm size distribution. We also formally introduce the multivariate log-SNP distribution, which encompasses the multivariate lognormal, to analyze the estimation of the joint distribution of the value of the firm's assets and sales. The results suggest that sales are a better firm size measure, as indicated by other studies in the literature.

  10. Transport on Riemannian manifold for functional connectivity-based classification.

    PubMed

    Ng, Bernard; Dressler, Martin; Varoquaux, Gaël; Poline, Jean Baptiste; Greicius, Michael; Thirion, Bertrand

    2014-01-01

    We present a Riemannian approach for classifying fMRI connectivity patterns before and after intervention in longitudinal studies. A fundamental difficulty with using connectivity as features is that covariance matrices live on the positive semi-definite cone, which renders their elements inter-related. The implicit independent feature assumption in most classifier learning algorithms is thus violated. In this paper, we propose a matrix whitening transport for projecting the covariance estimates onto a common tangent space to reduce the statistical dependencies between their elements. We show on real data that our approach provides significantly higher classification accuracy than directly using Pearson's correlation. We further propose a non-parametric scheme for identifying significantly discriminative connections from classifier weights. Using this scheme, a number of neuroanatomically meaningful connections are found, whereas no significant connections are detected with pure permutation testing.

  11. Goodness-of-Fit Tests and Nonparametric Adaptive Estimation for Spike Train Analysis

    PubMed Central

    2014-01-01

    When dealing with classical spike train analysis, the practitioner often performs goodness-of-fit tests to test whether the observed process is a Poisson process, for instance, or if it obeys another type of probabilistic model (Yana et al. in Biophys. J. 46(3):323–330, 1984; Brown et al. in Neural Comput. 14(2):325–346, 2002; Pouzat and Chaffiol in Technical report, http://arxiv.org/abs/arXiv:0909.2785, 2009). In doing so, there is a fundamental plug-in step, where the parameters of the supposed underlying model are estimated. The aim of this article is to show that plug-in has sometimes very undesirable effects. We propose a new method based on subsampling to deal with those plug-in issues in the case of the Kolmogorov–Smirnov test of uniformity. The method relies on the plug-in of good estimates of the underlying model that have to be consistent with a controlled rate of convergence. Some nonparametric estimates satisfying those constraints in the Poisson or in the Hawkes framework are highlighted. Moreover, they share adaptive properties that are useful from a practical point of view. We show the performance of those methods on simulated data. We also provide a complete analysis with these tools on single unit activity recorded on a monkey during a sensory-motor task. Electronic Supplementary Material The online version of this article (doi:10.1186/2190-8567-4-3) contains supplementary material. PMID:24742008

  12. Parametric vs. non-parametric statistics of low resolution electromagnetic tomography (LORETA).

    PubMed

    Thatcher, R W; North, D; Biver, C

    2005-01-01

    This study compared the relative statistical sensitivity of non-parametric and parametric statistics of 3-dimensional current sources as estimated by the EEG inverse solution Low Resolution Electromagnetic Tomography (LORETA). One would expect approximately 5% false positives (classification of a normal as abnormal) at the P < .025 level of probability (two tailed test) and approximately 1% false positives at the P < .005 level. EEG digital samples (2 second intervals sampled 128 Hz, 1 to 2 minutes eyes closed) from 43 normal adult subjects were imported into the Key Institute's LORETA program. We then used the Key Institute's cross-spectrum and the Key Institute's LORETA output files (*.lor) as the 2,394 gray matter pixel representation of 3-dimensional currents at different frequencies. The mean and standard deviation *.lor files were computed for each of the 2,394 gray matter pixels for each of the 43 subjects. Tests of Gaussianity and different transforms were computed in order to best approximate a normal distribution for each frequency and gray matter pixel. The relative sensitivity of parametric vs. non-parametric statistics were compared using a "leave-one-out" cross validation method in which individual normal subjects were withdrawn and then statistically classified as being either normal or abnormal based on the remaining subjects. Log10 transforms approximated Gaussian distribution in the range of 95% to 99% accuracy. Parametric Z score tests at P < .05 cross-validation demonstrated an average misclassification rate of approximately 4.25%, and range over the 2,394 gray matter pixels was 27.66% to 0.11%. At P < .01 parametric Z score cross-validation false positives were 0.26% and ranged from 6.65% to 0% false positives. The non-parametric Key Institute's t-max statistic at P < .05 had an average misclassification error rate of 7.64% and ranged from 43.37% to 0.04% false positives. The nonparametric t-max at P < .01 had an average misclassification rate of 6.67% and ranged from 41.34% to 0% false positives of the 2,394 gray matter pixels for any cross-validated normal subject. In conclusion, adequate approximation to Gaussian distribution and high cross-validation can be achieved by the Key Institute's LORETA programs by using a log10 transform and parametric statistics, and parametric normative comparisons had lower false positive rates than the non-parametric tests.

  13. Use of Brain MRI Atlases to Determine Boundaries of Age-Related Pathology: The Importance of Statistical Method

    PubMed Central

    Dickie, David Alexander; Job, Dominic E.; Gonzalez, David Rodriguez; Shenkin, Susan D.; Wardlaw, Joanna M.

    2015-01-01

    Introduction Neurodegenerative disease diagnoses may be supported by the comparison of an individual patient’s brain magnetic resonance image (MRI) with a voxel-based atlas of normal brain MRI. Most current brain MRI atlases are of young to middle-aged adults and parametric, e.g., mean ±standard deviation (SD); these atlases require data to be Gaussian. Brain MRI data, e.g., grey matter (GM) proportion images, from normal older subjects are apparently not Gaussian. We created a nonparametric and a parametric atlas of the normal limits of GM proportions in older subjects and compared their classifications of GM proportions in Alzheimer’s disease (AD) patients. Methods Using publicly available brain MRI from 138 normal subjects and 138 subjects diagnosed with AD (all 55–90 years), we created: a mean ±SD atlas to estimate parametrically the percentile ranks and limits of normal ageing GM; and, separately, a nonparametric, rank order-based GM atlas from the same normal ageing subjects. GM images from AD patients were then classified with respect to each atlas to determine the effect statistical distributions had on classifications of proportions of GM in AD patients. Results The parametric atlas often defined the lower normal limit of the proportion of GM to be negative (which does not make sense physiologically as the lowest possible proportion is zero). Because of this, for approximately half of the AD subjects, 25–45% of voxels were classified as normal when compared to the parametric atlas; but were classified as abnormal when compared to the nonparametric atlas. These voxels were mainly concentrated in the frontal and occipital lobes. Discussion To our knowledge, we have presented the first nonparametric brain MRI atlas. In conditions where there is increasing variability in brain structure, such as in old age, nonparametric brain MRI atlases may represent the limits of normal brain structure more accurately than parametric approaches. Therefore, we conclude that the statistical method used for construction of brain MRI atlases should be selected taking into account the population and aim under study. Parametric methods are generally robust for defining central tendencies, e.g., means, of brain structure. Nonparametric methods are advisable when studying the limits of brain structure in ageing and neurodegenerative disease. PMID:26023913

  14. Divergences and estimating tight bounds on Bayes error with applications to multivariate Gaussian copula and latent Gaussian copula

    NASA Astrophysics Data System (ADS)

    Thelen, Brian J.; Xique, Ismael J.; Burns, Joseph W.; Goley, G. Steven; Nolan, Adam R.; Benson, Jonathan W.

    2017-04-01

    In Bayesian decision theory, there has been a great amount of research into theoretical frameworks and information- theoretic quantities that can be used to provide lower and upper bounds for the Bayes error. These include well-known bounds such as Chernoff, Battacharrya, and J-divergence. Part of the challenge of utilizing these various metrics in practice is (i) whether they are "loose" or "tight" bounds, (ii) how they might be estimated via either parametric or non-parametric methods, and (iii) how accurate the estimates are for limited amounts of data. In general what is desired is a methodology for generating relatively tight lower and upper bounds, and then an approach to estimate these bounds efficiently from data. In this paper, we explore the so-called triangle divergence which has been around for a while, but was recently made more prominent in some recent research on non-parametric estimation of information metrics. Part of this work is motivated by applications for quantifying fundamental information content in SAR/LIDAR data, and to help in this, we have developed a flexible multivariate modeling framework based on multivariate Gaussian copula models which can be combined with the triangle divergence framework to quantify this information, and provide approximate bounds on Bayes error. In this paper we present an overview of the bounds, including those based on triangle divergence and verify that under a number of multivariate models, the upper and lower bounds derived from triangle divergence are significantly tighter than the other common bounds, and often times, dramatically so. We also propose some simple but effective means for computing the triangle divergence using Monte Carlo methods, and then discuss estimation of the triangle divergence from empirical data based on Gaussian Copula models.

  15. Reliable estimates of predictive uncertainty for an Alpine catchment using a non-parametric methodology

    NASA Astrophysics Data System (ADS)

    Matos, José P.; Schaefli, Bettina; Schleiss, Anton J.

    2017-04-01

    Uncertainty affects hydrological modelling efforts from the very measurements (or forecasts) that serve as inputs to the more or less inaccurate predictions that are produced. Uncertainty is truly inescapable in hydrology and yet, due to the theoretical and technical hurdles associated with its quantification, it is at times still neglected or estimated only qualitatively. In recent years the scientific community has made a significant effort towards quantifying this hydrologic prediction uncertainty. Despite this, most of the developed methodologies can be computationally demanding, are complex from a theoretical point of view, require substantial expertise to be employed, and are constrained by a number of assumptions about the model error distribution. These assumptions limit the reliability of many methods in case of errors that show particular cases of non-normality, heteroscedasticity, or autocorrelation. The present contribution builds on a non-parametric data-driven approach that was developed for uncertainty quantification in operational (real-time) forecasting settings. The approach is based on the concept of Pareto optimality and can be used as a standalone forecasting tool or as a postprocessor. By virtue of its non-parametric nature and a general operating principle, it can be applied directly and with ease to predictions of streamflow, water stage, or even accumulated runoff. Also, it is a methodology capable of coping with high heteroscedasticity and seasonal hydrological regimes (e.g. snowmelt and rainfall driven events in the same catchment). Finally, the training and operation of the model are very fast, making it a tool particularly adapted to operational use. To illustrate its practical use, the uncertainty quantification method is coupled with a process-based hydrological model to produce statistically reliable forecasts for an Alpine catchment located in Switzerland. Results are presented and discussed in terms of their reliability and resolution.

  16. Water quality analysis in rivers with non-parametric probability distributions and fuzzy inference systems: application to the Cauca River, Colombia.

    PubMed

    Ocampo-Duque, William; Osorio, Carolina; Piamba, Christian; Schuhmacher, Marta; Domingo, José L

    2013-02-01

    The integration of water quality monitoring variables is essential in environmental decision making. Nowadays, advanced techniques to manage subjectivity, imprecision, uncertainty, vagueness, and variability are required in such complex evaluation process. We here propose a probabilistic fuzzy hybrid model to assess river water quality. Fuzzy logic reasoning has been used to compute a water quality integrative index. By applying a Monte Carlo technique, based on non-parametric probability distributions, the randomness of model inputs was estimated. Annual histograms of nine water quality variables were built with monitoring data systematically collected in the Colombian Cauca River, and probability density estimations using the kernel smoothing method were applied to fit data. Several years were assessed, and river sectors upstream and downstream the city of Santiago de Cali, a big city with basic wastewater treatment and high industrial activity, were analyzed. The probabilistic fuzzy water quality index was able to explain the reduction in water quality, as the river receives a larger number of agriculture, domestic, and industrial effluents. The results of the hybrid model were compared to traditional water quality indexes. The main advantage of the proposed method is that it considers flexible boundaries between the linguistic qualifiers used to define the water status, being the belongingness of water quality to the diverse output fuzzy sets or classes provided with percentiles and histograms, which allows classify better the real water condition. The results of this study show that fuzzy inference systems integrated to stochastic non-parametric techniques may be used as complementary tools in water quality indexing methodologies. Copyright © 2012 Elsevier Ltd. All rights reserved.

  17. Support vector machines for nuclear reactor state estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zavaljevski, N.; Gross, K. C.

    2000-02-14

    Validation of nuclear power reactor signals is often performed by comparing signal prototypes with the actual reactor signals. The signal prototypes are often computed based on empirical data. The implementation of an estimation algorithm which can make predictions on limited data is an important issue. A new machine learning algorithm called support vector machines (SVMS) recently developed by Vladimir Vapnik and his coworkers enables a high level of generalization with finite high-dimensional data. The improved generalization in comparison with standard methods like neural networks is due mainly to the following characteristics of the method. The input data space is transformedmore » into a high-dimensional feature space using a kernel function, and the learning problem is formulated as a convex quadratic programming problem with a unique solution. In this paper the authors have applied the SVM method for data-based state estimation in nuclear power reactors. In particular, they implemented and tested kernels developed at Argonne National Laboratory for the Multivariate State Estimation Technique (MSET), a nonlinear, nonparametric estimation technique with a wide range of applications in nuclear reactors. The methodology has been applied to three data sets from experimental and commercial nuclear power reactor applications. The results are promising. The combination of MSET kernels with the SVM method has better noise reduction and generalization properties than the standard MSET algorithm.« less

  18. Joint-specific risk of impaired function in fibrodysplasia ossificans progressiva (FOP).

    PubMed

    Pignolo, Robert J; Durbin-Johnson, Blythe P; Rocke, David M; Kaplan, Frederick S

    2018-04-01

    Fibrodysplasia ossificans progressiva (FOP) causes progressive disability due to heterotopic ossification from episodic flare-ups. Using data from 500 FOP patients (representing 63% of all known patients world-wide), age- and joint-specific risks of new joint involvement were estimated using parametric and nonparametric statistical methods. Compared to data from a 1994 survey of 44 individuals with FOP, our current estimates of age- and joint-specific risks of new joint involvement are more accurate (narrower confidence limits), based on a wider range of ages, and have less bias due to its greater comprehensiveness (captures over three-fifths of the known FOP patients worldwide). For the neck, chest, abdomen, and upper back, the estimated hazard decreases over time. For the jaw, lower back, shoulder, elbow, wrist, fingers, hip, knee, ankle, and foot, the estimated hazard increases initially then either plateaus or decreases. At any given time and for any anatomic site, the data indicate which joints are at risk. This study of approximately 63% of the world's known population of FOP patients provides a refined estimate of risk for new involvement at any joint at any age, as well as the proportion of patients with uninvolved joints at any age. Importantly, these joint-specific survival curves can be used to facilitate clinical trial design and to determine if potential treatments can modify the predicted trajectory of progressive joint dysfunction. Copyright © 2017 Elsevier Inc. All rights reserved.

  19. A Non-parametric Approach to Constrain the Transfer Function in Reverberation Mapping

    NASA Astrophysics Data System (ADS)

    Li, Yan-Rong; Wang, Jian-Min; Bai, Jin-Ming

    2016-11-01

    Broad emission lines of active galactic nuclei stem from a spatially extended region (broad-line region, BLR) that is composed of discrete clouds and photoionized by the central ionizing continuum. The temporal behaviors of these emission lines are blurred echoes of continuum variations (I.e., reverberation mapping, RM) and directly reflect the structures and kinematic information of BLRs through the so-called transfer function (also known as the velocity-delay map). Based on the previous works of Rybicki and Press and Zu et al., we develop an extended, non-parametric approach to determine the transfer function for RM data, in which the transfer function is expressed as a sum of a family of relatively displaced Gaussian response functions. Therefore, arbitrary shapes of transfer functions associated with complicated BLR geometry can be seamlessly included, enabling us to relax the presumption of a specified transfer function frequently adopted in previous studies and to let it be determined by observation data. We formulate our approach in a previously well-established framework that incorporates the statistical modeling of continuum variations as a damped random walk process and takes into account long-term secular variations which are irrelevant to RM signals. The application to RM data shows the fidelity of our approach.

  20. Generalized t-statistic for two-group classification.

    PubMed

    Komori, Osamu; Eguchi, Shinto; Copas, John B

    2015-06-01

    In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.

  1. Long-range dismount activity classification: LODAC

    NASA Astrophysics Data System (ADS)

    Garagic, Denis; Peskoe, Jacob; Liu, Fang; Cuevas, Manuel; Freeman, Andrew M.; Rhodes, Bradley J.

    2014-06-01

    Continuous classification of dismount types (including gender, age, ethnicity) and their activities (such as walking, running) evolving over space and time is challenging. Limited sensor resolution (often exacerbated as a function of platform standoff distance) and clutter from shadows in dense target environments, unfavorable environmental conditions, and the normal properties of real data all contribute to the challenge. The unique and innovative aspect of our approach is a synthesis of multimodal signal processing with incremental non-parametric, hierarchical Bayesian machine learning methods to create a new kind of target classification architecture. This architecture is designed from the ground up to optimally exploit correlations among the multiple sensing modalities (multimodal data fusion) and rapidly and continuously learns (online self-tuning) patterns of distinct classes of dismounts given little a priori information. This increases classification performance in the presence of challenges posed by anti-access/area denial (A2/AD) sensing. To fuse multimodal features, Long-range Dismount Activity Classification (LODAC) develops a novel statistical information theoretic approach for multimodal data fusion that jointly models multimodal data (i.e., a probabilistic model for cross-modal signal generation) and discovers the critical cross-modal correlations by identifying components (features) with maximal mutual information (MI) which is efficiently estimated using non-parametric entropy models. LODAC develops a generic probabilistic pattern learning and classification framework based on a new class of hierarchical Bayesian learning algorithms for efficiently discovering recurring patterns (classes of dismounts) in multiple simultaneous time series (sensor modalities) at multiple levels of feature granularity.

  2. European regional efficiency and geographical externalities: a spatial nonparametric frontier analysis

    NASA Astrophysics Data System (ADS)

    Ramajo, Julián; Cordero, José Manuel; Márquez, Miguel Ángel

    2017-10-01

    This paper analyses region-level technical efficiency in nine European countries over the 1995-2007 period. We propose the application of a nonparametric conditional frontier approach to account for the presence of heterogeneous conditions in the form of geographical externalities. Such environmental factors are beyond the control of regional authorities, but may affect the production function. Therefore, they need to be considered in the frontier estimation. Specifically, a spatial autoregressive term is included as an external conditioning factor in a robust order- m model. Thus we can test the hypothesis of non-separability (the external factor impacts both the input-output space and the distribution of efficiencies), demonstrating the existence of significant global interregional spillovers into the production process. Our findings show that geographical externalities affect both the frontier level and the probability of being more or less efficient. Specifically, the results support the fact that the spatial lag variable has an inverted U-shaped non-linear impact on the performance of regions. This finding can be interpreted as a differential effect of interregional spillovers depending on the size of the neighboring economies: positive externalities for small values, possibly related to agglomeration economies, and negative externalities for high values, indicating the possibility of production congestion. Additionally, evidence of the existence of a strong geographic pattern of European regional efficiency is reported and the levels of technical efficiency are acknowledged to have converged during the period under analysis.

  3. Assessing statistical differences between parameters estimates in Partial Least Squares path modeling.

    PubMed

    Rodríguez-Entrena, Macario; Schuberth, Florian; Gelhard, Carsten

    2018-01-01

    Structural equation modeling using partial least squares (PLS-SEM) has become a main-stream modeling approach in various disciplines. Nevertheless, prior literature still lacks a practical guidance on how to properly test for differences between parameter estimates. Whereas existing techniques such as parametric and non-parametric approaches in PLS multi-group analysis solely allow to assess differences between parameters that are estimated for different subpopulations, the study at hand introduces a technique that allows to also assess whether two parameter estimates that are derived from the same sample are statistically different. To illustrate this advancement to PLS-SEM, we particularly refer to a reduced version of the well-established technology acceptance model.

  4. Network Reconstruction From High-Dimensional Ordinary Differential Equations.

    PubMed

    Chen, Shizhe; Shojaie, Ali; Witten, Daniela M

    2017-01-01

    We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system nonparametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches. Supplementary materials for this article are available online.

  5. Doubly Nonparametric Sparse Nonnegative Matrix Factorization Based on Dependent Indian Buffet Processes.

    PubMed

    Xuan, Junyu; Lu, Jie; Zhang, Guangquan; Xu, Richard Yi Da; Luo, Xiangfeng

    2018-05-01

    Sparse nonnegative matrix factorization (SNMF) aims to factorize a data matrix into two optimized nonnegative sparse factor matrices, which could benefit many tasks, such as document-word co-clustering. However, the traditional SNMF typically assumes the number of latent factors (i.e., dimensionality of the factor matrices) to be fixed. This assumption makes it inflexible in practice. In this paper, we propose a doubly sparse nonparametric NMF framework to mitigate this issue by using dependent Indian buffet processes (dIBP). We apply a correlation function for the generation of two stick weights associated with each column pair of factor matrices while still maintaining their respective marginal distribution specified by IBP. As a consequence, the generation of two factor matrices will be columnwise correlated. Under this framework, two classes of correlation function are proposed: 1) using bivariate Beta distribution and 2) using Copula function. Compared with the single IBP-based NMF, this paper jointly makes two factor matrices nonparametric and sparse, which could be applied to broader scenarios, such as co-clustering. This paper is seen to be much more flexible than Gaussian process-based and hierarchial Beta process-based dIBPs in terms of allowing the two corresponding binary matrix columns to have greater variations in their nonzero entries. Our experiments on synthetic data show the merits of this paper compared with the state-of-the-art models in respect of factorization efficiency, sparsity, and flexibility. Experiments on real-world data sets demonstrate the efficiency of this paper in document-word co-clustering tasks.

  6. Nonparametric Bayesian inference for mean residual life functions in survival analysis.

    PubMed

    Poynor, Valerie; Kottas, Athanasios

    2018-01-19

    Modeling and inference for survival analysis problems typically revolves around different functions related to the survival distribution. Here, we focus on the mean residual life (MRL) function, which provides the expected remaining lifetime given that a subject has survived (i.e. is event-free) up to a particular time. This function is of direct interest in reliability, medical, and actuarial fields. In addition to its practical interpretation, the MRL function characterizes the survival distribution. We develop general Bayesian nonparametric inference for MRL functions built from a Dirichlet process mixture model for the associated survival distribution. The resulting model for the MRL function admits a representation as a mixture of the kernel MRL functions with time-dependent mixture weights. This model structure allows for a wide range of shapes for the MRL function. Particular emphasis is placed on the selection of the mixture kernel, taken to be a gamma distribution, to obtain desirable properties for the MRL function arising from the mixture model. The inference method is illustrated with a data set of two experimental groups and a data set involving right censoring. The supplementary material available at Biostatistics online provides further results on empirical performance of the model, using simulated data examples. © The Author 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  7. The Initial Mass Function in the Nearest Strong Lenses from SNELLS: Assessing the Consistency of Lensing, Dynamical, and Spectroscopic Constraints

    NASA Astrophysics Data System (ADS)

    Newman, Andrew B.; Smith, Russell J.; Conroy, Charlie; Villaume, Alexa; van Dokkum, Pieter

    2017-08-01

    We present new observations of the three nearest early-type galaxy (ETG) strong lenses discovered in the SINFONI Nearby Elliptical Lens Locator Survey (SNELLS). Based on their lensing masses, these ETGs were inferred to have a stellar initial mass function (IMF) consistent with that of the Milky Way, not the bottom-heavy IMF that has been reported as typical for high-σ ETGs based on lensing, dynamical, and stellar population synthesis techniques. We use these unique systems to test the consistency of IMF estimates derived from different methods. We first estimate the stellar M */L using lensing and stellar dynamics. We then fit high-quality optical spectra of the lenses using an updated version of the stellar population synthesis models developed by Conroy & van Dokkum. When examined individually, we find good agreement among these methods for one galaxy. The other two galaxies show 2-3σ tension with lensing estimates, depending on the dark matter contribution, when considering IMFs that extend to 0.08 M ⊙. Allowing a variable low-mass cutoff or a nonparametric form of the IMF reduces the tension among the IMF estimates to <2σ. There is moderate evidence for a reduced number of low-mass stars in the SNELLS spectra, but no such evidence in a composite spectrum of matched-σ ETGs drawn from the SDSS. Such variation in the form of the IMF at low stellar masses (m ≲ 0.3 M ⊙), if present, could reconcile lensing/dynamical and spectroscopic IMF estimates for the SNELLS lenses and account for their lighter M */L relative to the mean matched-σ ETG. We provide the spectra used in this study to facilitate future comparisons.

  8. The Initial Mass Function in the Nearest Strong Lenses from SNELLS: Assessing the Consistency of Lensing, Dynamical, and Spectroscopic Constraints

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Newman, Andrew B.; Smith, Russell J.; Conroy, Charlie

    2017-08-20

    We present new observations of the three nearest early-type galaxy (ETG) strong lenses discovered in the SINFONI Nearby Elliptical Lens Locator Survey (SNELLS). Based on their lensing masses, these ETGs were inferred to have a stellar initial mass function (IMF) consistent with that of the Milky Way, not the bottom-heavy IMF that has been reported as typical for high- σ ETGs based on lensing, dynamical, and stellar population synthesis techniques. We use these unique systems to test the consistency of IMF estimates derived from different methods. We first estimate the stellar M {sub *}/ L using lensing and stellar dynamics.more » We then fit high-quality optical spectra of the lenses using an updated version of the stellar population synthesis models developed by Conroy and van Dokkum. When examined individually, we find good agreement among these methods for one galaxy. The other two galaxies show 2–3 σ tension with lensing estimates, depending on the dark matter contribution, when considering IMFs that extend to 0.08 M {sub ⊙}. Allowing a variable low-mass cutoff or a nonparametric form of the IMF reduces the tension among the IMF estimates to <2 σ . There is moderate evidence for a reduced number of low-mass stars in the SNELLS spectra, but no such evidence in a composite spectrum of matched- σ ETGs drawn from the SDSS. Such variation in the form of the IMF at low stellar masses ( m ≲ 0.3 M {sub ⊙}), if present, could reconcile lensing/dynamical and spectroscopic IMF estimates for the SNELLS lenses and account for their lighter M {sub *}/ L relative to the mean matched- σ ETG. We provide the spectra used in this study to facilitate future comparisons.« less

  9. Balancing Score Adjusted Targeted Minimum Loss-based Estimation

    PubMed Central

    Lendle, Samuel David; Fireman, Bruce; van der Laan, Mark J.

    2015-01-01

    Adjusting for a balancing score is sufficient for bias reduction when estimating causal effects including the average treatment effect and effect among the treated. Estimators that adjust for the propensity score in a nonparametric way, such as matching on an estimate of the propensity score, can be consistent when the estimated propensity score is not consistent for the true propensity score but converges to some other balancing score. We call this property the balancing score property, and discuss a class of estimators that have this property. We introduce a targeted minimum loss-based estimator (TMLE) for a treatment-specific mean with the balancing score property that is additionally locally efficient and doubly robust. We investigate the new estimator’s performance relative to other estimators, including another TMLE, a propensity score matching estimator, an inverse probability of treatment weighted estimator, and a regression-based estimator in simulation studies. PMID:26561539

  10. A Novel Nonparametric Approach for Neural Encoding and Decoding Models of Multimodal Receptive Fields.

    PubMed

    Agarwal, Rahul; Chen, Zhe; Kloosterman, Fabian; Wilson, Matthew A; Sarma, Sridevi V

    2016-07-01

    Pyramidal neurons recorded from the rat hippocampus and entorhinal cortex, such as place and grid cells, have diverse receptive fields, which are either unimodal or multimodal. Spiking activity from these cells encodes information about the spatial position of a freely foraging rat. At fine timescales, a neuron's spike activity also depends significantly on its own spike history. However, due to limitations of current parametric modeling approaches, it remains a challenge to estimate complex, multimodal neuronal receptive fields while incorporating spike history dependence. Furthermore, efforts to decode the rat's trajectory in one- or two-dimensional space from hippocampal ensemble spiking activity have mainly focused on spike history-independent neuronal encoding models. In this letter, we address these two important issues by extending a recently introduced nonparametric neural encoding framework that allows modeling both complex spatial receptive fields and spike history dependencies. Using this extended nonparametric approach, we develop novel algorithms for decoding a rat's trajectory based on recordings of hippocampal place cells and entorhinal grid cells. Results show that both encoding and decoding models derived from our new method performed significantly better than state-of-the-art encoding and decoding models on 6 minutes of test data. In addition, our model's performance remains invariant to the apparent modality of the neuron's receptive field.

  11. Experimental estimation of transmissibility matrices for industrial multi-axis vibration isolation systems

    NASA Astrophysics Data System (ADS)

    Beijen, Michiel A.; Voorhoeve, Robbert; Heertjes, Marcel F.; Oomen, Tom

    2018-07-01

    Vibration isolation is essential for industrial high-precision systems to suppress external disturbances. The aim of this paper is to develop a general identification approach to estimate the frequency response function (FRF) of the transmissibility matrix, which is a key performance indicator for vibration isolation systems. The major challenge lies in obtaining a good signal-to-noise ratio in view of a large system weight. A non-parametric system identification method is proposed that combines floor and shaker excitations. Furthermore, a method is presented to analyze the input power spectrum of the floor excitations, both in terms of magnitude and direction. In turn, the input design of the shaker excitation signals is investigated to obtain sufficient excitation power in all directions with minimum experiment cost. The proposed methods are shown to provide an accurate FRF of the transmissibility matrix in three relevant directions on an industrial active vibration isolation system over a large frequency range. This demonstrates that, despite their heavy weight, industrial vibration isolation systems can be accurately identified using this approach.

  12. Assessing the causal effect of policies: an example using stochastic interventions.

    PubMed

    Díaz, Iván; van der Laan, Mark J

    2013-11-19

    Assessing the causal effect of an exposure often involves the definition of counterfactual outcomes in a hypothetical world in which the stochastic nature of the exposure is modified. Although stochastic interventions are a powerful tool to measure the causal effect of a realistic intervention that intends to alter the population distribution of an exposure, their importance to answer questions about plausible policy interventions has been obscured by the generalized use of deterministic interventions. In this article, we follow the approach described in Díaz and van der Laan (2012) to define and estimate the effect of an intervention that is expected to cause a truncation in the population distribution of the exposure. The observed data parameter that identifies the causal parameter of interest is established, as well as its efficient influence function under the non-parametric model. Inverse probability of treatment weighted (IPTW), augmented IPTW and targeted minimum loss-based estimators (TMLE) are proposed, their consistency and efficiency properties are determined. An extension to longitudinal data structures is presented and its use is demonstrated with a real data example.

  13. A Review of Methods for Analysis of the Expected Value of Information.

    PubMed

    Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca

    2017-10-01

    In recent years, value-of-information analysis has become more widespread in health economic evaluations, specifically as a tool to guide further research and perform probabilistic sensitivity analysis. This is partly due to methodological advancements allowing for the fast computation of a typical summary known as the expected value of partial perfect information (EVPPI). A recent review discussed some approximation methods for calculating the EVPPI, but as the research has been active over the intervening years, that review does not discuss some key estimation methods. Therefore, this paper presents a comprehensive review of these new methods. We begin by providing the technical details of these computation methods. We then present two case studies in order to compare the estimation performance of these new methods. We conclude that a method based on nonparametric regression offers the best method for calculating the EVPPI in terms of accuracy, computational time, and ease of implementation. This means that the EVPPI can now be used practically in health economic evaluations, especially as all the methods are developed in parallel with R functions and a web app to aid practitioners.

  14. A new virtual instrument for estimating punch velocity in combat sports.

    PubMed

    Urbinati, K S; Scheeren, E; Nohama, P

    2013-01-01

    For improving the performance in combat sport, especially percussion, it is necessary achieving high velocity in punches and kicks. The aim of this study was to evaluate the applicability of 3D accelerometry in a Virtual Instrumentation System (VIS) designed for estimating punch velocity in combat sports. It was conducted in two phases: (1) integration of the 3D accelerometer with the communication interface and software for processing and visualization, and (2) applicability of the system. Fifteen karate athletes performed five gyaku zuki type punches (with reverse leg) using the accelerometer on the 3rd metacarpal on the back of the hand. It was performed nonparametric Mann-Whitney U-test to determine differences in the mean linear velocity among three punches performed sequentially (p <0.05). The maximum velocities measured varied in the range of 10 and 10.2 m/s and the mean velocities from 6 to 6.8 m/s. There was no difference on the mean velocity for the tested punches. The VIS demonstrated regularity and proper functionality for assessing punches in combat sport.

  15. Locally-Based Kernal PLS Smoothing to Non-Parametric Regression Curve Fitting

    NASA Technical Reports Server (NTRS)

    Rosipal, Roman; Trejo, Leonard J.; Wheeler, Kevin; Korsmeyer, David (Technical Monitor)

    2002-01-01

    We present a novel smoothing approach to non-parametric regression curve fitting. This is based on kernel partial least squares (PLS) regression in reproducing kernel Hilbert space. It is our concern to apply the methodology for smoothing experimental data where some level of knowledge about the approximate shape, local inhomogeneities or points where the desired function changes its curvature is known a priori or can be derived based on the observed noisy data. We propose locally-based kernel PLS regression that extends the previous kernel PLS methodology by incorporating this knowledge. We compare our approach with existing smoothing splines, hybrid adaptive splines and wavelet shrinkage techniques on two generated data sets.

  16. To Math or Not to Math: The Algebra-Calculus Pipeline and Postsecondary Mathematics Remediation

    ERIC Educational Resources Information Center

    Showalter, Daniel A.

    2017-01-01

    This article reports on a study designed to estimate the effect of high school coursetaking in the algebra-calculus pipeline on the likelihood of placing out of postsecondary remedial mathematics. A nonparametric variant of propensity score analysis was used on a nationally representative data set to remove selection bias and test for an effect…

  17. The Rasch Model and Missing Data, with an Emphasis on Tailoring Test Items.

    ERIC Educational Resources Information Center

    de Gruijter, Dato N. M.

    Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the…

  18. Standard Errors and Confidence Intervals from Bootstrapping for Ramsay-Curve Item Response Theory Model Item Parameters

    ERIC Educational Resources Information Center

    Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M.

    2011-01-01

    Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…

  19. Estimating Marginal Returns to Higher Education in the UK. NBER Working Paper No. 13534

    ERIC Educational Resources Information Center

    Moffitt, Robert

    2007-01-01

    A long-standing issue in the literature on education is whether marginal returns to education fall as education rises. If the population differs in its rate of return, a closely related question is whether marginal returns to higher education fall as a greater fraction of the population enrolls. This paper proposes a nonparametric method of…

  20. Alternating event processes during lifetimes: population dynamics and statistical inference.

    PubMed

    Shinohara, Russell T; Sun, Yifei; Wang, Mei-Cheng

    2018-01-01

    In the literature studying recurrent event data, a large amount of work has been focused on univariate recurrent event processes where the occurrence of each event is treated as a single point in time. There are many applications, however, in which univariate recurrent events are insufficient to characterize the feature of the process because patients experience nontrivial durations associated with each event. This results in an alternating event process where the disease status of a patient alternates between exacerbations and remissions. In this paper, we consider the dynamics of a chronic disease and its associated exacerbation-remission process over two time scales: calendar time and time-since-onset. In particular, over calendar time, we explore population dynamics and the relationship between incidence, prevalence and duration for such alternating event processes. We provide nonparametric estimation techniques for characteristic quantities of the process. In some settings, exacerbation processes are observed from an onset time until death; to account for the relationship between the survival and alternating event processes, nonparametric approaches are developed for estimating exacerbation process over lifetime. By understanding the population dynamics and within-process structure, the paper provide a new and general way to study alternating event processes.

  1. International comparisons of the technical efficiency of the hospital sector: panel data analysis of OECD countries using parametric and non-parametric approaches.

    PubMed

    Varabyova, Yauheniya; Schreyögg, Jonas

    2013-09-01

    There is a growing interest in the cross-country comparisons of the performance of national health care systems. The present work provides a comparison of the technical efficiency of the hospital sector using unbalanced panel data from OECD countries over the period 2000-2009. The estimation of the technical efficiency of the hospital sector is performed using nonparametric data envelopment analysis (DEA) and parametric stochastic frontier analysis (SFA). Internal and external validity of findings is assessed by estimating the Spearman rank correlations between the results obtained in different model specifications. The panel-data analyses using two-step DEA and one-stage SFA show that countries, which have higher health care expenditure per capita, tend to have a more technically efficient hospital sector. Whether the expenditure is financed through private or public sources is not related to the technical efficiency of the hospital sector. On the other hand, the hospital sector in countries with higher income inequality and longer average hospital length of stay is less technically efficient. Copyright © 2013 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.

  2. A Machine Learning Framework for Plan Payment Risk Adjustment.

    PubMed

    Rose, Sherri

    2016-12-01

    To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R 2 . Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.

  3. Monitoring waterbird abundance in wetlands: The importance of controlling results for variation in water depth

    USGS Publications Warehouse

    Bolduc, F.; Afton, A.D.

    2008-01-01

    Wetland use by waterbirds is highly dependent on water depth, and depth requirements generally vary among species. Furthermore, water depth within wetlands often varies greatly over time due to unpredictable hydrological events, making comparisons of waterbird abundance among wetlands difficult as effects of habitat variables and water depth are confounded. Species-specific relationships between bird abundance and water depth necessarily are non-linear; thus, we developed a methodology to correct waterbird abundance for variation in water depth, based on the non-parametric regression of these two variables. Accordingly, we used the difference between observed and predicted abundances from non-parametric regression (analogous to parametric residuals) as an estimate of bird abundance at equivalent water depths. We scaled this difference to levels of observed and predicted abundances using the formula: ((observed - predicted abundance)/(observed + predicted abundance)) ?? 100. This estimate also corresponds to the observed:predicted abundance ratio, which allows easy interpretation of results. We illustrated this methodology using two hypothetical species that differed in water depth and wetland preferences. Comparisons of wetlands, using both observed and relative corrected abundances, indicated that relative corrected abundance adequately separates the effect of water depth from the effect of wetlands. ?? 2008 Elsevier B.V.

  4. Dynamic Experiment Design Regularization Approach to Adaptive Imaging with Array Radar/SAR Sensor Systems

    PubMed Central

    Shkvarko, Yuriy; Tuxpan, José; Santos, Stewart

    2011-01-01

    We consider a problem of high-resolution array radar/SAR imaging formalized in terms of a nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the random wavefield scattered from a remotely sensed scene observed through a kernel signal formation operator and contaminated with random Gaussian noise. First, the Sobolev-type solution space is constructed to specify the class of consistent kernel SSP estimators with the reproducing kernel structures adapted to the metrics in such the solution space. Next, the “model-free” variational analysis (VA)-based image enhancement approach and the “model-based” descriptive experiment design (DEED) regularization paradigm are unified into a new dynamic experiment design (DYED) regularization framework. Application of the proposed DYED framework to the adaptive array radar/SAR imaging problem leads to a class of two-level (DEED-VA) regularized SSP reconstruction techniques that aggregate the kernel adaptive anisotropic windowing with the projections onto convex sets to enforce the consistency and robustness of the overall iterative SSP estimators. We also show how the proposed DYED regularization method may be considered as a generalization of the MVDR, APES and other high-resolution nonparametric adaptive radar sensing techniques. A family of the DYED-related algorithms is constructed and their effectiveness is finally illustrated via numerical simulations. PMID:22163859

  5. Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method.

    PubMed

    Dwivedi, Alok Kumar; Mallawaarachchi, Indika; Alvarado, Luis A

    2017-06-30

    Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has small sample size limitation. We used a pooled method in nonparametric bootstrap test that may overcome the problem related with small samples in hypothesis testing. The present study compared nonparametric bootstrap test with pooled resampling method corresponding to parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test while maintaining type I error probability for any conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. Nonparametric bootstrap paired t-test also provided better performance than other alternatives. Nonparametric bootstrap test provided benefit over exact Kruskal-Wallis test. We suggest using nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating the one way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  6. Length bias correction in one-day cross-sectional assessments - The nutritionDay study.

    PubMed

    Frantal, Sophie; Pernicka, Elisabeth; Hiesmayr, Michael; Schindler, Karin; Bauer, Peter

    2016-04-01

    A major problem occurring in cross-sectional studies is sampling bias. Length of hospital stay (LOS) differs strongly between patients and causes a length bias as patients with longer LOS are more likely to be included and are therefore overrepresented in this type of study. To adjust for the length bias higher weights are allocated to patients with shorter LOS. We determined the effect of length-bias adjustment in two independent populations. Length-bias correction is applied to the data of the nutritionDay project, a one-day multinational cross-sectional audit capturing data on disease and nutrition of patients admitted to hospital wards with right-censoring after 30 days follow-up. We applied the weighting method for estimating the distribution function of patient baseline variables based on the method of non-parametric maximum likelihood. Results are validated using data from all patients admitted to the General Hospital of Vienna between 2005 and 2009, where the distribution of LOS can be assumed to be known. Additionally, a simplified calculation scheme for estimating the adjusted distribution function of LOS is demonstrated on a small patient example. The crude median (lower quartile; upper quartile) LOS in the cross-sectional sample was 14 (8; 24) and decreased to 7 (4; 12) when adjusted. Hence, adjustment for length bias in cross-sectional studies is essential to get appropriate estimates. Copyright © 2015 Elsevier Ltd and European Society for Clinical Nutrition and Metabolism. All rights reserved.

  7. S-wave attenuation in northeastern Sonora, Mexico, near the faults that ruptured during the earthquake of 3 May 1887 Mw 7.5.

    PubMed

    Villalobos-Escobar, Gina P; Castro, Raúl R

    2014-01-01

    We used a new data set of relocated earthquakes recorded by the Seismic Network of Northeastern Sonora, Mexico (RESNES) to characterize the attenuation of S-waves in the fault zone of the 1887 Sonora earthquake (M w 7.5). We determined spectral attenuation functions for hypocentral distances (r) between 10 and 140 km using a nonparametric approach and found that in this fault zone the spectral amplitudes decay slower with distance at low frequencies (f < 4 Hz) compared to those reported in previous studies in the region using more distant recordings. The attenuation functions obtained for 23 frequencies (0.4 ≤ f ≤ 63.1 Hz) permit us estimating the average quality factor Q S  = (141 ± 1.1 )f ((0.74 ± 0.04)) and a geometrical spreading term G(r) = 1/r (0.21). The values of Q estimated for S-wave paths traveling along the fault system that rupture during the 1887 event, in the north-south direction, are considerably lower than the average Q estimated using source-station paths from multiple stations and directions. These results indicate that near the fault zone S waves attenuate considerably more than at regional scale, particularly at low frequencies. This may be the result of strong scattering near the faults due to the fractured upper crust and higher intrinsic attenuation due to stress concentration near the faults.

  8. Evaluation of species richness estimators based on quantitative performance measures and sensitivity to patchiness and sample grain size

    NASA Astrophysics Data System (ADS)

    Willie, Jacob; Petre, Charles-Albert; Tagg, Nikki; Lens, Luc

    2012-11-01

    Data from forest herbaceous plants in a site of known species richness in Cameroon were used to test the performance of rarefaction and eight species richness estimators (ACE, ICE, Chao1, Chao2, Jack1, Jack2, Bootstrap and MM). Bias, accuracy, precision and sensitivity to patchiness and sample grain size were the evaluation criteria. An evaluation of the effects of sampling effort and patchiness on diversity estimation is also provided. Stems were identified and counted in linear series of 1-m2 contiguous square plots distributed in six habitat types. Initially, 500 plots were sampled in each habitat type. The sampling process was monitored using rarefaction and a set of richness estimator curves. Curves from the first dataset suggested adequate sampling in riparian forest only. Additional plots ranging from 523 to 2143 were subsequently added in the undersampled habitats until most of the curves stabilized. Jack1 and ICE, the non-parametric richness estimators, performed better, being more accurate and less sensitive to patchiness and sample grain size, and significantly reducing biases that could not be detected by rarefaction and other estimators. This study confirms the usefulness of non-parametric incidence-based estimators, and recommends Jack1 or ICE alongside rarefaction while describing taxon richness and comparing results across areas sampled using similar or different grain sizes. As patchiness varied across habitat types, accurate estimations of diversity did not require the same number of plots. The number of samples needed to fully capture diversity is not necessarily the same across habitats, and can only be known when taxon sampling curves have indicated adequate sampling. Differences in observed species richness between habitats were generally due to differences in patchiness, except between two habitats where they resulted from differences in abundance. We suggest that communities should first be sampled thoroughly using appropriate taxon sampling curves before explaining differences in diversity.

  9. Examining the nonparametric effect of drivers' age in rear-end accidents through an additive logistic regression model.

    PubMed

    Ma, Lu; Yan, Xuedong

    2014-06-01

    This study seeks to inspect the nonparametric characteristics connecting the age of the driver to the relative risk of being an at-fault vehicle, in order to discover a more precise and smooth pattern of age impact, which has commonly been neglected in past studies. Records of drivers in two-vehicle rear-end collisions are selected from the general estimates system (GES) 2011 dataset. These extracted observations in fact constitute inherently matched driver pairs under certain matching variables including weather conditions, pavement conditions and road geometry design characteristics that are shared by pairs of drivers in rear-end accidents. The introduced data structure is able to guarantee that the variance of the response variable will not depend on the matching variables and hence provides a high power of statistical modeling. The estimation results exhibit a smooth cubic spline function for examining the nonlinear relationship between the age of the driver and the log odds of being at fault in a rear-end accident. The results are presented with respect to the main effect of age, the interaction effect between age and sex, and the effects of age under different scenarios of pre-crash actions by the leading vehicle. Compared to the conventional specification in which age is categorized into several predefined groups, the proposed method is more flexible and able to produce quantitatively explicit results. First, it confirms the U-shaped pattern of the age effect, and further shows that the risks of young and old drivers change rapidly with age. Second, the interaction effects between age and sex show that female and male drivers behave differently in rear-end accidents. Third, it is found that the pattern of age impact varies according to the type of pre-crash actions exhibited by the leading vehicle. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Understanding Past Population Dynamics: Bayesian Coalescent-Based Modeling with Covariates

    PubMed Central

    Gill, Mandev S.; Lemey, Philippe; Bennett, Shannon N.; Biek, Roman; Suchard, Marc A.

    2016-01-01

    Effective population size characterizes the genetic variability in a population and is a parameter of paramount importance in population genetics and evolutionary biology. Kingman’s coalescent process enables inference of past population dynamics directly from molecular sequence data, and researchers have developed a number of flexible coalescent-based models for Bayesian nonparametric estimation of the effective population size as a function of time. Major goals of demographic reconstruction include identifying driving factors of effective population size, and understanding the association between the effective population size and such factors. Building upon Bayesian nonparametric coalescent-based approaches, we introduce a flexible framework that incorporates time-varying covariates that exploit Gaussian Markov random fields to achieve temporal smoothing of effective population size trajectories. To approximate the posterior distribution, we adapt efficient Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. Incorporating covariates into the demographic inference framework enables the modeling of associations between the effective population size and covariates while accounting for uncertainty in population histories. Furthermore, it can lead to more precise estimates of population dynamics. We apply our model to four examples. We reconstruct the demographic history of raccoon rabies in North America and find a significant association with the spatiotemporal spread of the outbreak. Next, we examine the effective population size trajectory of the DENV-4 virus in Puerto Rico along with viral isolate count data and find similar cyclic patterns. We compare the population history of the HIV-1 CRF02_AG clade in Cameroon with HIV incidence and prevalence data and find that the effective population size is more reflective of incidence rate. Finally, we explore the hypothesis that the population dynamics of musk ox during the Late Quaternary period were related to climate change. [Coalescent; effective population size; Gaussian Markov random fields; phylodynamics; phylogenetics; population genetics. PMID:27368344

  11. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data.

    PubMed

    Dazard, Jean-Eudes; Rao, J Sunil

    2012-07-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput "omics" data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise tests statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that : (i) it employs a novel "similarity statistic"-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called 'MVR' ('Mean-Variance Regularization'), downloadable from the CRAN website.

  12. Joint Adaptive Mean-Variance Regularization and Variance Stabilization of High Dimensional Data

    PubMed Central

    Dazard, Jean-Eudes; Rao, J. Sunil

    2012-01-01

    The paper addresses a common problem in the analysis of high-dimensional high-throughput “omics” data, which is parameter estimation across multiple variables in a set of data where the number of variables is much larger than the sample size. Among the problems posed by this type of data are that variable-specific estimators of variances are not reliable and variable-wise tests statistics have low power, both due to a lack of degrees of freedom. In addition, it has been observed in this type of data that the variance increases as a function of the mean. We introduce a non-parametric adaptive regularization procedure that is innovative in that : (i) it employs a novel “similarity statistic”-based clustering technique to generate local-pooled or regularized shrinkage estimators of population parameters, (ii) the regularization is done jointly on population moments, benefiting from C. Stein's result on inadmissibility, which implies that usual sample variance estimator is improved by a shrinkage estimator using information contained in the sample mean. From these joint regularized shrinkage estimators, we derived regularized t-like statistics and show in simulation studies that they offer more statistical power in hypothesis testing than their standard sample counterparts, or regular common value-shrinkage estimators, or when the information contained in the sample mean is simply ignored. Finally, we show that these estimators feature interesting properties of variance stabilization and normalization that can be used for preprocessing high-dimensional multivariate data. The method is available as an R package, called ‘MVR’ (‘Mean-Variance Regularization’), downloadable from the CRAN website. PMID:22711950

  13. Uncertainty in determining extreme precipitation thresholds

    NASA Astrophysics Data System (ADS)

    Liu, Bingjun; Chen, Junfan; Chen, Xiaohong; Lian, Yanqing; Wu, Lili

    2013-10-01

    Extreme precipitation events are rare and occur mostly on a relatively small and local scale, which makes it difficult to set the thresholds for extreme precipitations in a large basin. Based on the long term daily precipitation data from 62 observation stations in the Pearl River Basin, this study has assessed the applicability of the non-parametric, parametric, and the detrended fluctuation analysis (DFA) methods in determining extreme precipitation threshold (EPT) and the certainty to EPTs from each method. Analyses from this study show the non-parametric absolute critical value method is easy to use, but unable to reflect the difference of spatial rainfall distribution. The non-parametric percentile method can account for the spatial distribution feature of precipitation, but the problem with this method is that the threshold value is sensitive to the size of rainfall data series and is subjected to the selection of a percentile thus make it difficult to determine reasonable threshold values for a large basin. The parametric method can provide the most apt description of extreme precipitations by fitting extreme precipitation distributions with probability distribution functions; however, selections of probability distribution functions, the goodness-of-fit tests, and the size of the rainfall data series can greatly affect the fitting accuracy. In contrast to the non-parametric and the parametric methods which are unable to provide information for EPTs with certainty, the DFA method although involving complicated computational processes has proven to be the most appropriate method that is able to provide a unique set of EPTs for a large basin with uneven spatio-temporal precipitation distribution. The consistency between the spatial distribution of DFA-based thresholds with the annual average precipitation, the coefficient of variation (CV), and the coefficient of skewness (CS) for the daily precipitation further proves that EPTs determined by the DFA method are more reasonable and applicable for the Pearl River Basin.

  14. Estimating the Distribution of the Incubation Periods of Human Avian Influenza A(H7N9) Virus Infections.

    PubMed

    Virlogeux, Victor; Li, Ming; Tsang, Tim K; Feng, Luzhao; Fang, Vicky J; Jiang, Hui; Wu, Peng; Zheng, Jiandong; Lau, Eric H Y; Cao, Yu; Qin, Ying; Liao, Qiaohong; Yu, Hongjie; Cowling, Benjamin J

    2015-10-15

    A novel avian influenza virus, influenza A(H7N9), emerged in China in early 2013 and caused severe disease in humans, with infections occurring most frequently after recent exposure to live poultry. The distribution of A(H7N9) incubation periods is of interest to epidemiologists and public health officials, but estimation of the distribution is complicated by interval censoring of exposures. Imputation of the midpoint of intervals was used in some early studies, resulting in estimated mean incubation times of approximately 5 days. In this study, we estimated the incubation period distribution of human influenza A(H7N9) infections using exposure data available for 229 patients with laboratory-confirmed A(H7N9) infection from mainland China. A nonparametric model (Turnbull) and several parametric models accounting for the interval censoring in some exposures were fitted to the data. For the best-fitting parametric model (Weibull), the mean incubation period was 3.4 days (95% confidence interval: 3.0, 3.7) and the variance was 2.9 days; results were very similar for the nonparametric Turnbull estimate. Under the Weibull model, the 95th percentile of the incubation period distribution was 6.5 days (95% confidence interval: 5.9, 7.1). The midpoint approximation for interval-censored exposures led to overestimation of the mean incubation period. Public health observation of potentially exposed persons for 7 days after exposure would be appropriate. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  15. When the Single Matters more than the Group (II): Addressing the Problem of High False Positive Rates in Single Case Voxel Based Morphometry Using Non-parametric Statistics.

    PubMed

    Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea

    2016-01-01

    In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that family-wise false positive error rate (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM are much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used to characterize neuroanatomical alterations in individual subjects as long as non-parametric statistics are employed.

  16. Multicategorical Spline Model for Item Response Theory.

    ERIC Educational Resources Information Center

    Abrahamowicz, Michal; Ramsay, James O.

    1992-01-01

    A nonparametric multicategorical model for multiple-choice data is proposed as an extension of the binary spline model of J. O. Ramsay and M. Abrahamowicz (1989). Results of two Monte Carlo studies illustrate the model, which approximates probability functions by rational splines. (SLD)

  17. CHRONOBIOLOGY OF HIGH BLOOD PRESSURE

    PubMed Central

    Cornélissen, G.; Halberg, F.; Bakken, E. E.; Wang, Z.; Tarquini, R.; Perfetto, F.; Laffi, G.; Maggioni, C.; Kumagai, Y.; Homolka, P.; Havelková, A.; Dušek, J.; Svačinová, H.; Siegelová, J.; Fišer, B.

    2008-01-01

    BIOCOS, the project aimed at studying BIOlogical systems in their COSmos, has obtained a great deal of expertise in the fields of blood pressure (BP) and heart rate (HR) monitoring and of marker rhythmometry for the purposes of screening, diagnosis, treatment, and prognosis. Prolonging the monitoring reduces the uncertainty in the estimation of circadian parameters; the current recommendation of BIOCOS requires monitoring for at least 7 days. The BIOCOS approach consists of a parametric and a non-parametric analysis of the data, in which the results from the individual subject are being compared with gender- and age-specified reference values in health. Chronobiological designs can offer important new information regarding the optimization of treatment by timing its administration as a function of circadian and other rhythms. New technological developments are needed to close the loop between the monitoring of blood pressure and the administration of antihypertensive drugs. PMID:19122770

  18. Flexible link functions in nonparametric binary regression with Gaussian process priors.

    PubMed

    Li, Dan; Wang, Xia; Lin, Lizhen; Dey, Dipak K

    2016-09-01

    In many scientific fields, it is a common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim at better prediction in the future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems; in particular to model the latent structure in a binary regression model allowing nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link functions. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to allow the data to determine the degree of the skewness. To address this limitation, we propose a flexible binary regression model which combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models, which only have either a Gaussian process prior on the latent regression function or a Dirichlet prior on the link function. © 2015, The International Biometric Society.

  19. Flexible Link Functions in Nonparametric Binary Regression with Gaussian Process Priors

    PubMed Central

    Li, Dan; Lin, Lizhen; Dey, Dipak K.

    2015-01-01

    Summary In many scientific fields, it is a common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim at better prediction in the future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems; in particular to model the latent structure in a binary regression model allowing nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link functions. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to allow the data to determine the degree of the skewness. To address this limitation, we propose a flexible binary regression model which combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models, which only have either a Gaussian process prior on the latent regression function or a Dirichlet prior on the link function. PMID:26686333

  20. Numerical trials of HISSE

    NASA Technical Reports Server (NTRS)

    Peters, C.; Kampe, F. (Principal Investigator)

    1980-01-01

    The mathematical description and implementation of the statistical estimation procedure known as the Houston integrated spatial/spectral estimator (HISSE) is discussed. HISSE is based on a normal mixture model and is designed to take advantage of spectral and spatial information of LANDSAT data pixels, utilizing the initial classification and clustering information provided by the AMOEBA algorithm. The HISSE calculates parametric estimates of class proportions which reduce the error inherent in estimates derived from typical classify and count procedures common to nonparametric clustering algorithms. It also singles out spatial groupings of pixels which are most suitable for labeling classes. These calculations are designed to aid the analyst/interpreter in labeling patches with a crop class label. Finally, HISSE's initial performance on an actual LANDSAT agricultural ground truth data set is reported.

Top