nonparametric density estimation: Topics by Science.gov

Sample records for nonparametric density estimation

Nonparametric estimation of plant density by the distance method

USGS Publications Warehouse

Patil, S.A.; Burnham, K.P.; Kovner, J.L.

1979-01-01

A relation between the plant density and the probability density function of the nearest neighbor distance (squared) from a random point is established under fairly broad conditions. Based upon this relationship, a nonparametric estimator for the plant density is developed and presented in terms of order statistics. Consistency and asymptotic normality of the estimator are discussed. An interval estimator for the density is obtained. The modifications of this estimator and its variance are given when the distribution is truncated. Simulation results are presented for regular, random and aggregated populations to illustrate the nonparametric estimator and its variance. A numerical example from field data is given. Merits and deficiencies of the estimator are discussed with regard to its robustness and variance.
Robust location and spread measures for nonparametric probability density function estimation.

PubMed

López-Rubio, Ezequiel

2009-10-01

Robustness against outliers is a desirable property of any unsupervised learning scheme. In particular, probability density estimators benefit from incorporating this feature. A possible strategy to achieve this goal is to substitute the sample mean and the sample covariance matrix by more robust location and spread estimators. Here we use the L1-median to develop a nonparametric probability density function (PDF) estimator. We prove its most relevant properties, and we show its performance in density estimation and classification applications.
Nonparametric analysis of Minnesota spruce and aspen tree data and LANDSAT data

NASA Technical Reports Server (NTRS)

Scott, D. W.; Jee, R.

1984-01-01

The application of nonparametric methods in data-intensive problems faced by NASA is described. The theoretical development of efficient multivariate density estimators and the novel use of color graphics workstations are reviewed. The use of nonparametric density estimates for data representation and for Bayesian classification are described and illustrated. Progress in building a data analysis system in a workstation environment is reviewed and preliminary runs presented.
Nonparametric probability density estimation by optimization theoretic techniques

NASA Technical Reports Server (NTRS)

Scott, D. W.

1976-01-01

Two nonparametric probability density estimators are considered. The first is the kernel estimator. The problem of choosing the kernel scaling factor based solely on a random sample is addressed. An interactive mode is discussed and an algorithm proposed to choose the scaling factor automatically. The second nonparametric probability estimate uses penalty function techniques with the maximum likelihood criterion. A discrete maximum penalized likelihood estimator is proposed and is shown to be consistent in the mean square error. A numerical implementation technique for the discrete solution is discussed and examples displayed. An extensive simulation study compares the integrated mean square error of the discrete and kernel estimators. The robustness of the discrete estimator is demonstrated graphically.
Mathematical models for nonparametric inferences from line transect data

USGS Publications Warehouse

Burnham, K.P.; Anderson, D.R.

1976-01-01

A general mathematical theory of line transects is develoepd which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(O) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y/r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires we know the transformation of r given by f(O/r).
Optimum nonparametric estimation of population density based on ordered distances

USGS Publications Warehouse

Patil, S.A.; Kovner, J.L.; Burnham, Kenneth P.

1982-01-01

The asymptotic mean and error mean square are determined for the nonparametric estimator of plant density by distance sampling proposed by Patil, Burnham and Kovner (1979, Biometrics 35, 597-604. On the basis of these formulae, a bias-reduced version of this estimator is given, and its specific form is determined which gives minimum mean square error under varying assumptions about the true probability density function of the sampled data. Extension is given to line-transect sampling.
Mathematical models for non-parametric inferences from line transect data

USGS Publications Warehouse

Burnham, K.P.; Anderson, D.R.

1976-01-01

A general mathematical theory of line transects is developed which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(0) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y I r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires we know the transformation of r given by f(0 I r).
Nonparametric model validations for hidden Markov models with applications in financial econometrics.

PubMed

Zhao, Zhibiao

2011-06-01

We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise.
Characterization of a maximum-likelihood nonparametric density estimator of kernel type

NASA Technical Reports Server (NTRS)

Geman, S.; Mcclure, D. E.

1982-01-01

Kernel type density estimators calculated by the method of sieves. Proofs are presented for the characterization theorem: Let x(1), x(2),...x(n) be a random sample from a population with density f(0). Let sigma 0 and consider estimators f of f(0) defined by (1).
Nonparametric model validations for hidden Markov models with applications in financial econometrics

PubMed Central

Zhao, Zhibiao

2011-01-01

We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise. PMID:21750601
Nonparametric Density Estimation Based on Self-Organizing Incremental Neural Network for Large Noisy Data.

PubMed

Nakamura, Yoshihiro; Hasegawa, Osamu

2017-01-01

With the ongoing development and expansion of communication networks and sensors, massive amounts of data are continuously generated in real time from real environments. Beforehand, prediction of a distribution underlying such data is difficult; furthermore, the data include substantial amounts of noise. These factors make it difficult to estimate probability densities. To handle these issues and massive amounts of data, we propose a nonparametric density estimator that rapidly learns data online and has high robustness. Our approach is an extension of both kernel density estimation (KDE) and a self-organizing incremental neural network (SOINN); therefore, we call our approach KDESOINN. An SOINN provides a clustering method that learns about the given data as networks of prototype of data; more specifically, an SOINN can learn the distribution underlying the given data. Using this information, KDESOINN estimates the probability density function. The results of our experiments show that KDESOINN outperforms or achieves performance comparable to the current state-of-the-art approaches in terms of robustness, learning time, and accuracy.
On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices

PubMed Central

Ye, Xin; Pendyala, Ram M.; Zou, Yajie

2017-01-01

A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Argau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences. PMID:29073152
On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices.

PubMed

Wang, Ke; Ye, Xin; Pendyala, Ram M; Zou, Yajie

2017-01-01

A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Argau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences.
Robust nonparametric quantification of clustering density of molecules in single-molecule localization microscopy

PubMed Central

Jiang, Shenghang; Park, Seongjin; Challapalli, Sai Divya; Fei, Jingyi; Wang, Yong

2017-01-01

We report a robust nonparametric descriptor, J′(r), for quantifying the density of clustering molecules in single-molecule localization microscopy. J′(r), based on nearest neighbor distribution functions, does not require any parameter as an input for analyzing point patterns. We show that J′(r) displays a valley shape in the presence of clusters of molecules, and the characteristics of the valley reliably report the clustering features in the data. Most importantly, the position of the J′(r) valley (rJm′) depends exclusively on the density of clustering molecules (ρc). Therefore, it is ideal for direct estimation of the clustering density of molecules in single-molecule localization microscopy. As an example, this descriptor was applied to estimate the clustering density of ptsG mRNA in E. coli bacteria. PMID:28636661
Fusion of Hard and Soft Information in Nonparametric Density Estimation

DTIC Science & Technology

2015-06-10

and stochastic optimization models, in analysis of simulation output, and when instantiating probability models. We adopt a constrained maximum...particular, density estimation is needed for generation of input densities to simulation and stochastic optimization models, in analysis of simulation output...an essential step in simulation analysis and stochastic optimization is the generation of probability densities for input random variables; see for
Simplified Computation for Nonparametric Windows Method of Probability Density Function Estimation.

PubMed

Joshi, Niranjan; Kadir, Timor; Brady, Michael

2011-08-01

Recently, Kadir and Brady proposed a method for estimating probability density functions (PDFs) for digital signals which they call the Nonparametric (NP) Windows method. The method involves constructing a continuous space representation of the discrete space and sampled signal by using a suitable interpolation method. NP Windows requires only a small number of observed signal samples to estimate the PDF and is completely data driven. In this short paper, we first develop analytical formulae to obtain the NP Windows PDF estimates for 1D, 2D, and 3D signals, for different interpolation methods. We then show that the original procedure to calculate the PDF estimate can be significantly simplified and made computationally more efficient by a judicious choice of the frame of reference. We have also outlined specific algorithmic details of the procedures enabling quick implementation. Our reformulation of the original concept has directly demonstrated a close link between the NP Windows method and the Kernel Density Estimator.
Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators: Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators

DOE PAGES

Staid, Andrea; Watson, Jean -Paul; Wets, Roger J. -B.; ...

2017-07-11

Forecasts of available wind power are critical in key electric power systems operations planning problems, including economic dispatch and unit commitment. Such forecasts are necessarily uncertain, limiting the reliability and cost effectiveness of operations planning models based on a single deterministic or “point” forecast. A common approach to address this limitation involves the use of a number of probabilistic scenarios, each specifying a possible trajectory of wind power production, with associated probability. We present and analyze a novel method for generating probabilistic wind power scenarios, leveraging available historical information in the form of forecasted and corresponding observed wind power timemore » series. We estimate non-parametric forecast error densities, specifically using epi-spline basis functions, allowing us to capture the skewed and non-parametric nature of error densities observed in real-world data. We then describe a method to generate probabilistic scenarios from these basis functions that allows users to control for the degree to which extreme errors are captured.We compare the performance of our approach to the current state-of-the-art considering publicly available data associated with the Bonneville Power Administration, analyzing aggregate production of a number of wind farms over a large geographic region. Finally, we discuss the advantages of our approach in the context of specific power systems operations planning problems: stochastic unit commitment and economic dispatch. Here, our methodology is embodied in the joint Sandia – University of California Davis Prescient software package for assessing and analyzing stochastic operations strategies.« less
Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators: Generating short-term probabilistic wind power scenarios via nonparametric forecast error density estimators

DOE Office of Scientific and Technical Information (OSTI.GOV)

Staid, Andrea; Watson, Jean -Paul; Wets, Roger J. -B.

Forecasts of available wind power are critical in key electric power systems operations planning problems, including economic dispatch and unit commitment. Such forecasts are necessarily uncertain, limiting the reliability and cost effectiveness of operations planning models based on a single deterministic or “point” forecast. A common approach to address this limitation involves the use of a number of probabilistic scenarios, each specifying a possible trajectory of wind power production, with associated probability. We present and analyze a novel method for generating probabilistic wind power scenarios, leveraging available historical information in the form of forecasted and corresponding observed wind power timemore » series. We estimate non-parametric forecast error densities, specifically using epi-spline basis functions, allowing us to capture the skewed and non-parametric nature of error densities observed in real-world data. We then describe a method to generate probabilistic scenarios from these basis functions that allows users to control for the degree to which extreme errors are captured.We compare the performance of our approach to the current state-of-the-art considering publicly available data associated with the Bonneville Power Administration, analyzing aggregate production of a number of wind farms over a large geographic region. Finally, we discuss the advantages of our approach in the context of specific power systems operations planning problems: stochastic unit commitment and economic dispatch. Here, our methodology is embodied in the joint Sandia – University of California Davis Prescient software package for assessing and analyzing stochastic operations strategies.« less
Considerations in cross-validation type density smoothing with a look at some data

NASA Technical Reports Server (NTRS)

Schuster, E. F.

1982-01-01

Experience gained in applying nonparametric maximum likelihood techniques of density estimation to judge the comparative quality of various estimators is reported. Two invariate data sets of one hundered samples (one Cauchy, one natural normal) are considered as well as studies in the multivariate case.
Multivariate Density Estimation and Remote Sensing

NASA Technical Reports Server (NTRS)

Scott, D. W.

1983-01-01

Current efforts to develop methods and computer algorithms to effectively represent multivariate data commonly encountered in remote sensing applications are described. While this may involve scatter diagrams, multivariate representations of nonparametric probability density estimates are emphasized. The density function provides a useful graphical tool for looking at data and a useful theoretical tool for classification. This approach is called a thunderstorm data analysis.

A hybrid pareto mixture for conditional asymmetric fat-tailed distributions.

PubMed

Carreau, Julie; Bengio, Yoshua

2009-07-01

In many cases, we observe some variables X that contain predictive information over a scalar variable of interest Y , with (X,Y) pairs observed in a training set. We can take advantage of this information to estimate the conditional density p(Y|X = x). In this paper, we propose a conditional mixture model with hybrid Pareto components to estimate p(Y|X = x). The hybrid Pareto is a Gaussian whose upper tail has been replaced by a generalized Pareto tail. A third parameter, in addition to the location and spread parameters of the Gaussian, controls the heaviness of the upper tail. Using the hybrid Pareto in a mixture model results in a nonparametric estimator that can adapt to multimodality, asymmetry, and heavy tails. A conditional density estimator is built by modeling the parameters of the mixture estimator as functions of X. We use a neural network to implement these functions. Such conditional density estimators have important applications in many domains such as finance and insurance. We show experimentally that this novel approach better models the conditional density in terms of likelihood, compared to competing algorithms: conditional mixture models with other types of components and a classical kernel-based nonparametric model.
Marginally specified priors for non-parametric Bayesian estimation

PubMed Central

Kessler, David C.; Hoff, Peter D.; Dunson, David B.

2014-01-01

Summary Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables. PMID:25663813
A Non-Parametric Probability Density Estimator and Some Applications.

DTIC Science & Technology

1984-05-01

distributions, which are assumed to be representa- tive of platykurtic , mesokurtic, and leptokurtic distribu- tions in general. The dissertation is... platykurtic distributions. Consider, for example, the uniform distribution shown in Figure 4. 34 o . 1., Figure 4 -Sensitivity to Support Estimation The...results of the density function comparisons indicate that the new estimator is clearly -Z superior for platykurtic distributions, equal to the best 59
Novel Application of Density Estimation Techniques in Muon Ionization Cooling Experiment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mohayai, Tanaz Angelina; Snopok, Pavel; Neuffer, David

The international Muon Ionization Cooling Experiment (MICE) aims to demonstrate muon beam ionization cooling for the first time and constitutes a key part of the R&D towards a future neutrino factory or muon collider. Beam cooling reduces the size of the phase space volume occupied by the beam. Non-parametric density estimation techniques allow very precise calculation of the muon beam phase-space density and its increase as a result of cooling. These density estimation techniques are investigated in this paper and applied in order to estimate the reduction in muon beam size in MICE under various conditions.
Strong consistency of nonparametric Bayes density estimation on compact metric spaces with applications to specific manifolds

PubMed Central

Bhattacharya, Abhishek; Dunson, David B.

2012-01-01

This article considers a broad class of kernel mixture density models on compact metric spaces and manifolds. Following a Bayesian approach with a nonparametric prior on the location mixing distribution, sufficient conditions are obtained on the kernel, prior and the underlying space for strong posterior consistency at any continuous density. The prior is also allowed to depend on the sample size n and sufficient conditions are obtained for weak and strong consistency. These conditions are verified on compact Euclidean spaces using multivariate Gaussian kernels, on the hypersphere using a von Mises-Fisher kernel and on the planar shape space using complex Watson kernels. PMID:22984295
A bias-corrected estimator in multiple imputation for missing data.

PubMed

Tomita, Hiroaki; Fujisawa, Hironori; Henmi, Masayuki

2018-05-29

Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been rapidly increasing in medical studies. Although MI is rather appealing in practice since it is possible to use ordinary statistical methods for a complete data set once the missing values are fully imputed, the method of imputation is still problematic. If the missing values are imputed from some parametric model, the validity of imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have been also proposed for MI, but it is not so straightforward as to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI to obtain a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced and to make a proper correction in the likelihood function after the imputation by using the density ratio between the imputation model and the true conditional density function for the missing variable as a weight. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated by both theory and simulation studies. A real data analysis is also conducted to illustrate our method by using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset. Copyright © 2018 John Wiley & Sons, Ltd.
Exponential series approaches for nonparametric graphical models

NASA Astrophysics Data System (ADS)

Janofsky, Eric

Markov Random Fields (MRFs) or undirected graphical models are parsimonious representations of joint probability distributions. This thesis studies high-dimensional, continuous-valued pairwise Markov Random Fields. We are particularly interested in approximating pairwise densities whose logarithm belongs to a Sobolev space. For this problem we propose the method of exponential series which approximates the log density by a finite-dimensional exponential family with the number of sufficient statistics increasing with the sample size. We consider two approaches to estimating these models. The first is regularized maximum likelihood. This involves optimizing the sum of the log-likelihood of the data and a sparsity-inducing regularizer. We then propose a variational approximation to the likelihood based on tree-reweighted, nonparametric message passing. This approximation allows for upper bounds on risk estimates, leverages parallelization and is scalable to densities on hundreds of nodes. We show how the regularized variational MLE may be estimated using a proximal gradient algorithm. We then consider estimation using regularized score matching. This approach uses an alternative scoring rule to the log-likelihood, which obviates the need to compute the normalizing constant of the distribution. For general continuous-valued exponential families, we provide parameter and edge consistency results. As a special case we detail a new approach to sparse precision matrix estimation which has statistical performance competitive with the graphical lasso and computational performance competitive with the state-of-the-art glasso algorithm. We then describe results for model selection in the nonparametric pairwise model using exponential series. The regularized score matching problem is shown to be a convex program; we provide scalable algorithms based on consensus alternating direction method of multipliers (ADMM) and coordinate-wise descent. We use simulations to compare our method to others in the literature as well as the aforementioned TRW estimator.
Testing and Estimating Shape-Constrained Nonparametric Density and Regression in the Presence of Measurement Error.

PubMed

Carroll, Raymond J; Delaigle, Aurore; Hall, Peter

2011-03-01

In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.
Comparison of parametric and bootstrap method in bioequivalence test.

PubMed

Ahn, Byung-Jin; Yim, Dong-Seok

2009-10-01

The estimation of 90% parametric confidence intervals (CIs) of mean AUC and Cmax ratios in bioequivalence (BE) tests are based upon the assumption that formulation effects in log-transformed data are normally distributed. To compare the parametric CIs with those obtained from nonparametric methods we performed repeated estimation of bootstrap-resampled datasets. The AUC and Cmax values from 3 archived datasets were used. BE tests on 1,000 resampled datasets from each archived dataset were performed using SAS (Enterprise Guide Ver.3). Bootstrap nonparametric 90% CIs of formulation effects were then compared with the parametric 90% CIs of the original datasets. The 90% CIs of formulation effects estimated from the 3 archived datasets were slightly different from nonparametric 90% CIs obtained from BE tests on resampled datasets. Histograms and density curves of formulation effects obtained from resampled datasets were similar to those of normal distribution. However, in 2 of 3 resampled log (AUC) datasets, the estimates of formulation effects did not follow the Gaussian distribution. Bias-corrected and accelerated (BCa) CIs, one of the nonparametric CIs of formulation effects, shifted outside the parametric 90% CIs of the archived datasets in these 2 non-normally distributed resampled log (AUC) datasets. Currently, the 80~125% rule based upon the parametric 90% CIs is widely accepted under the assumption of normally distributed formulation effects in log-transformed data. However, nonparametric CIs may be a better choice when data do not follow this assumption.
Comparison of Parametric and Bootstrap Method in Bioequivalence Test

PubMed Central

Ahn, Byung-Jin

2009-01-01

The estimation of 90% parametric confidence intervals (CIs) of mean AUC and Cmax ratios in bioequivalence (BE) tests are based upon the assumption that formulation effects in log-transformed data are normally distributed. To compare the parametric CIs with those obtained from nonparametric methods we performed repeated estimation of bootstrap-resampled datasets. The AUC and Cmax values from 3 archived datasets were used. BE tests on 1,000 resampled datasets from each archived dataset were performed using SAS (Enterprise Guide Ver.3). Bootstrap nonparametric 90% CIs of formulation effects were then compared with the parametric 90% CIs of the original datasets. The 90% CIs of formulation effects estimated from the 3 archived datasets were slightly different from nonparametric 90% CIs obtained from BE tests on resampled datasets. Histograms and density curves of formulation effects obtained from resampled datasets were similar to those of normal distribution. However, in 2 of 3 resampled log (AUC) datasets, the estimates of formulation effects did not follow the Gaussian distribution. Bias-corrected and accelerated (BCa) CIs, one of the nonparametric CIs of formulation effects, shifted outside the parametric 90% CIs of the archived datasets in these 2 non-normally distributed resampled log (AUC) datasets. Currently, the 80~125% rule based upon the parametric 90% CIs is widely accepted under the assumption of normally distributed formulation effects in log-transformed data. However, nonparametric CIs may be a better choice when data do not follow this assumption. PMID:19915699
A tool for the estimation of the distribution of landslide area in R

NASA Astrophysics Data System (ADS)

Rossi, M.; Cardinali, M.; Fiorucci, F.; Marchesini, I.; Mondini, A. C.; Santangelo, M.; Ghosh, S.; Riguer, D. E. L.; Lahousse, T.; Chang, K. T.; Guzzetti, F.

2012-04-01

We have developed a tool in R (the free software environment for statistical computing, http://www.r-project.org/) to estimate the probability density and the frequency density of landslide area. The tool implements parametric and non-parametric approaches to the estimation of the probability density and the frequency density of landslide area, including: (i) Histogram Density Estimation (HDE), (ii) Kernel Density Estimation (KDE), and (iii) Maximum Likelihood Estimation (MLE). The tool is available as a standard Open Geospatial Consortium (OGC) Web Processing Service (WPS), and is accessible through the web using different GIS software clients. We tested the tool to compare Double Pareto and Inverse Gamma models for the probability density of landslide area in different geological, morphological and climatological settings, and to compare landslides shown in inventory maps prepared using different mapping techniques, including (i) field mapping, (ii) visual interpretation of monoscopic and stereoscopic aerial photographs, (iii) visual interpretation of monoscopic and stereoscopic VHR satellite images and (iv) semi-automatic detection and mapping from VHR satellite images. Results show that both models are applicable in different geomorphological settings. In most cases the two models provided very similar results. Non-parametric estimation methods (i.e., HDE and KDE) provided reasonable results for all the tested landslide datasets. For some of the datasets, MLE failed to provide a result, for convergence problems. The two tested models (Double Pareto and Inverse Gamma) resulted in very similar results for large and very large datasets (> 150 samples). Differences in the modeling results were observed for small datasets affected by systematic biases. A distinct rollover was observed in all analyzed landslide datasets, except for a few datasets obtained from landslide inventories prepared through field mapping or by semi-automatic mapping from VHR satellite imagery. The tool can also be used to evaluate the probability density and the frequency density of landslide volume.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

PubMed

Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin

We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification

PubMed Central

Feng, Yang; Jiang, Jiancheng; Tong, Xin

2015-01-01

We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970
Forest Stand Canopy Structure Attribute Estimation from High Resolution Digital Airborne Imagery

Treesearch

Demetrios Gatziolis

2006-01-01

A study of forest stand canopy variable assessment using digital, airborne, multispectral imagery is presented. Variable estimation involves stem density, canopy closure, and mean crown diameter, and it is based on quantification of spatial autocorrelation among pixel digital numbers (DN) using variogram analysis and an alternative, non-parametric approach known as...
Topics in global convergence of density estimates

NASA Technical Reports Server (NTRS)

Devroye, L.

1982-01-01

The problem of estimating a density f on R sup d from a sample Xz(1),...,X(n) of independent identically distributed random vectors is critically examined, and some recent results in the field are reviewed. The following statements are qualified: (1) For any sequence of density estimates f(n), any arbitrary slow rate of convergence to 0 is possible for E(integral/f(n)-fl); (2) In theoretical comparisons of density estimates, integral/f(n)-f/ should be used and not integral/f(n)-f/sup p, p 1; and (3) For most reasonable nonparametric density estimates, either there is convergence of integral/f(n)-f/ (and then the convergence is in the strongest possible sense for all f), or there is no convergence (even in the weakest possible sense for a single f). There is no intermediate situation.
Nonparametric density estimation and optimal bandwidth selection for protein unfolding and unbinding data

NASA Astrophysics Data System (ADS)

Bura, E.; Zhmurov, A.; Barsegov, V.

2009-01-01

Dynamic force spectroscopy and steered molecular simulations have become powerful tools for analyzing the mechanical properties of proteins, and the strength of protein-protein complexes and aggregates. Probability density functions of the unfolding forces and unfolding times for proteins, and rupture forces and bond lifetimes for protein-protein complexes allow quantification of the forced unfolding and unbinding transitions, and mapping the biomolecular free energy landscape. The inference of the unknown probability distribution functions from the experimental and simulated forced unfolding and unbinding data, as well as the assessment of analytically tractable models of the protein unfolding and unbinding requires the use of a bandwidth. The choice of this quantity is typically subjective as it draws heavily on the investigator's intuition and past experience. We describe several approaches for selecting the "optimal bandwidth" for nonparametric density estimators, such as the traditionally used histogram and the more advanced kernel density estimators. The performance of these methods is tested on unimodal and multimodal skewed, long-tailed distributed data, as typically observed in force spectroscopy experiments and in molecular pulling simulations. The results of these studies can serve as a guideline for selecting the optimal bandwidth to resolve the underlying distributions from the forced unfolding and unbinding data for proteins.
Bayesian Nonparametric Inference – Why and How

PubMed Central

Müller, Peter; Mitra, Riten

2013-01-01

We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932
A framework for multivariate data-based at-site flood frequency analysis: Essentiality of the conjugal application of parametric and nonparametric approaches

NASA Astrophysics Data System (ADS)

Vittal, H.; Singh, Jitendra; Kumar, Pankaj; Karmakar, Subhankar

2015-06-01

In watershed management, flood frequency analysis (FFA) is performed to quantify the risk of flooding at different spatial locations and also to provide guidelines for determining the design periods of flood control structures. The traditional FFA was extensively performed by considering univariate scenario for both at-site and regional estimation of return periods. However, due to inherent mutual dependence of the flood variables or characteristics [i.e., peak flow (P), flood volume (V) and flood duration (D), which are random in nature], analysis has been further extended to multivariate scenario, with some restrictive assumptions. To overcome the assumption of same family of marginal density function for all flood variables, the concept of copula has been introduced. Although, the advancement from univariate to multivariate analyses drew formidable attention to the FFA research community, the basic limitation was that the analyses were performed with the implementation of only parametric family of distributions. The aim of the current study is to emphasize the importance of nonparametric approaches in the field of multivariate FFA; however, the nonparametric distribution may not always be a good-fit and capable of replacing well-implemented multivariate parametric and multivariate copula-based applications. Nevertheless, the potential of obtaining best-fit using nonparametric distributions might be improved because such distributions reproduce the sample's characteristics, resulting in more accurate estimations of the multivariate return period. Hence, the current study shows the importance of conjugating multivariate nonparametric approach with multivariate parametric and copula-based approaches, thereby results in a comprehensive framework for complete at-site FFA. Although the proposed framework is designed for at-site FFA, this approach can also be applied to regional FFA because regional estimations ideally include at-site estimations. The framework is based on the following steps: (i) comprehensive trend analysis to assess nonstationarity in the observed data; (ii) selection of the best-fit univariate marginal distribution with a comprehensive set of parametric and nonparametric distributions for the flood variables; (iii) multivariate frequency analyses with parametric, copula-based and nonparametric approaches; and (iv) estimation of joint and various conditional return periods. The proposed framework for frequency analysis is demonstrated using 110 years of observed data from Allegheny River at Salamanca, New York, USA. The results show that for both univariate and multivariate cases, the nonparametric Gaussian kernel provides the best estimate. Further, we perform FFA for twenty major rivers over continental USA, which shows for seven rivers, all the flood variables followed nonparametric Gaussian kernel; whereas for other rivers, parametric distributions provide the best-fit either for one or two flood variables. Thus the summary of results shows that the nonparametric method cannot substitute the parametric and copula-based approaches, but should be considered during any at-site FFA to provide the broadest choices for best estimation of the flood return periods.
Survival potential of Phytophthora infestans sporangia in relation to meteorological factors

USDA-ARS?s Scientific Manuscript database

Assessment of meteorological factors coupled with sporangia survival curves may enhance effective management of potato late blight, caused by Phytophthora infestans. We utilized a non-parametric density estimation approach to evaluate the cumulative probability of occurrence of temperature and relat...
Estimation of option-implied risk-neutral into real-world density by using calibration function

NASA Astrophysics Data System (ADS)

Bahaludin, Hafizah; Abdullah, Mimi Hafizah

2017-04-01

Option prices contain crucial information that can be used as a reflection of future development of an underlying assets' price. The main objective of this study is to extract the risk-neutral density (RND) and the risk-world density (RWD) of option prices. A volatility function technique is applied by using a fourth order polynomial interpolation to obtain the RNDs. Then, a calibration function is used to convert the RNDs into RWDs. There are two types of calibration function which are parametric and non-parametric calibrations. The density is extracted from the Dow Jones Industrial Average (DJIA) index options with a one month constant maturity from January 2009 until December 2015. The performance of RNDs and RWDs extracted are evaluated by using a density forecasting test. This study found out that the RWDs obtain can provide an accurate information regarding the price of the underlying asset in future compared to that of the RNDs. In addition, empirical evidence suggests that RWDs from a non-parametric calibration has a better accuracy than other densities.

Regionalizing nonparametric models of precipitation amounts on different temporal scales

NASA Astrophysics Data System (ADS)

Mosthaf, Tobias; Bárdossy, András

2017-05-01

Parametric distribution functions are commonly used to model precipitation amounts corresponding to different durations. The precipitation amounts themselves are crucial for stochastic rainfall generators and weather generators. Nonparametric kernel density estimates (KDEs) offer a more flexible way to model precipitation amounts. As already stated in their name, these models do not exhibit parameters that can be easily regionalized to run rainfall generators at ungauged locations as well as at gauged locations. To overcome this deficiency, we present a new interpolation scheme for nonparametric models and evaluate it for different temporal resolutions ranging from hourly to monthly. During the evaluation, the nonparametric methods are compared to commonly used parametric models like the two-parameter gamma and the mixed-exponential distribution. As water volume is considered to be an essential parameter for applications like flood modeling, a Lorenz-curve-based criterion is also introduced. To add value to the estimation of data at sub-daily resolutions, we incorporated the plentiful daily measurements in the interpolation scheme, and this idea was evaluated. The study region is the federal state of Baden-Württemberg in the southwest of Germany with more than 500 rain gauges. The validation results show that the newly proposed nonparametric interpolation scheme provides reasonable results and that the incorporation of daily values in the regionalization of sub-daily models is very beneficial.
Nonparametric entropy estimation using kernel densities.

PubMed

Lake, Douglas E

2009-01-01

The entropy of experimental data from the biological and medical sciences provides additional information over summary statistics. Calculating entropy involves estimates of probability density functions, which can be effectively accomplished using kernel density methods. Kernel density estimation has been widely studied and a univariate implementation is readily available in MATLAB. The traditional definition of Shannon entropy is part of a larger family of statistics, called Renyi entropy, which are useful in applications that require a measure of the Gaussianity of data. Of particular note is the quadratic entropy which is related to the Friedman-Tukey (FT) index, a widely used measure in the statistical community. One application where quadratic entropy is very useful is the detection of abnormal cardiac rhythms, such as atrial fibrillation (AF). Asymptotic and exact small-sample results for optimal bandwidth and kernel selection to estimate the FT index are presented and lead to improved methods for entropy estimation.
Bayesian nonparametric regression with varying residual density

PubMed Central

Pati, Debdeep; Dunson, David B.

2013-01-01

We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053
Methods for estimating population density in data-limited areas: evaluating regression and tree-based models in Peru.

PubMed

Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

2014-01-01

Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.
Methods for Estimating Population Density in Data-Limited Areas: Evaluating Regression and Tree-Based Models in Peru

PubMed Central

Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

2014-01-01

Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657
Nonparametric estimation and testing of fixed effects panel data models

PubMed Central

Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi

2009-01-01

In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335
A new approach to evaluate gamma-ray measurements

NASA Technical Reports Server (NTRS)

Dejager, O. C.; Swanepoel, J. W. H.; Raubenheimer, B. C.; Vandervalt, D. J.

1985-01-01

Misunderstandings about the term random samples its implications may easily arise. Conditions under which the phases, obtained from arrival times, do not form a random sample and the dangers involved are discussed. Watson's U sup 2 test for uniformity is recommended for light curves with duty cycles larger than 10%. Under certain conditions, non-parametric density estimation may be used to determine estimates of the true light curve and its parameters.
The influence of linear elements on plant species diversity of Mediterranean rural landscapes: assessment of different indices and statistical approaches.

PubMed

García del Barrio, J M; Ortega, M; Vázquez De la Cueva, A; Elena-Rosselló, R

2006-08-01

This paper mainly aims to study the linear element influence on the estimation of vascular plant species diversity in five Mediterranean landscapes modeled as land cover patch mosaics. These landscapes have several core habitats and a different set of linear elements--habitat edges or ecotones, roads or railways, rivers, streams and hedgerows on farm land--whose plant composition were examined. Secondly, it aims to check plant diversity estimation in Mediterranean landscapes using parametric and non-parametric procedures, with two indices: Species richness and Shannon index. Land cover types and landscape linear elements were identified from aerial photographs. Their spatial information was processed using GIS techniques. Field plots were selected using a stratified sampling design according to relieve and tree density of each habitat type. A 50x20 m2 multi-scale sampling plot was designed for the core habitats and across the main landscape linear elements. Richness and diversity of plant species were estimated by comparing the observed field data to ICE (Incidence-based Coverage Estimator) and ACE (Abundance-based Coverage Estimator) non-parametric estimators. The species density, percentage of unique species, and alpha diversity per plot were significantly higher (p < 0.05) in linear elements than in core habitats. ICE estimate of number of species was 32% higher than of ACE estimate, which did not differ significantly from the observed values. Accumulated species richness in core habitats together with linear elements, were significantly higher than those recorded only in the core habitats in all the landscapes. Conversely, Shannon diversity index did not show significant differences.
Nonparametric Estimation of the Probability of Ruin.

DTIC Science & Technology

1985-02-01

MATHEMATICS RESEARCH CENTER I E N FREES FEB 85 MRC/TSR...in NONPARAMETRIC ESTIMATION OF THE PROBABILITY OF RUIN Lf Edward W. Frees * Mathematics Research Center University of Wisconsin-Madison 610 Walnut...34 - .. --- - • ’. - -:- - - ..- . . .- -- .-.-. . -. . .- •. . - . . - . . .’ . ’- - .. -’vi . .-" "-- -" ,’- UNIVERSITY OF WISCONSIN-MADISON MATHEMATICS RESEARCH CENTER NONPARAMETRIC ESTIMATION OF THE PROBABILITY
Application of Semiparametric Spline Regression Model in Analyzing Factors that In uence Population Density in Central Java

NASA Astrophysics Data System (ADS)

Sumantari, Y. D.; Slamet, I.; Sugiyanto

2017-06-01

Semiparametric regression is a statistical analysis method that consists of parametric and nonparametric regression. There are various approach techniques in nonparametric regression. One of the approach techniques is spline. Central Java is one of the most densely populated province in Indonesia. Population density in this province can be modeled by semiparametric regression because it consists of parametric and nonparametric component. Therefore, the purpose of this paper is to determine the factors that in uence population density in Central Java using the semiparametric spline regression model. The result shows that the factors which in uence population density in Central Java is Family Planning (FP) active participants and district minimum wage.
GPU Acceleration of Mean Free Path Based Kernel Density Estimators for Monte Carlo Neutronics Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Burke, TImothy P.; Kiedrowski, Brian C.; Martin, William R.

Kernel Density Estimators (KDEs) are a non-parametric density estimation technique that has recently been applied to Monte Carlo radiation transport simulations. Kernel density estimators are an alternative to histogram tallies for obtaining global solutions in Monte Carlo tallies. With KDEs, a single event, either a collision or particle track, can contribute to the score at multiple tally points with the uncertainty at those points being independent of the desired resolution of the solution. Thus, KDEs show potential for obtaining estimates of a global solution with reduced variance when compared to a histogram. Previously, KDEs have been applied to neutronics formore » one-group reactor physics problems and fixed source shielding applications. However, little work was done to obtain reaction rates using KDEs. This paper introduces a new form of the MFP KDE that is capable of handling general geometries. Furthermore, extending the MFP KDE to 2-D problems in continuous energy introduces inaccuracies to the solution. An ad-hoc solution to these inaccuracies is introduced that produces errors smaller than 4% at material interfaces.« less
A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

PubMed

Karabatsos, George

2017-02-01

Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.

PubMed

Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye

2016-03-01

The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
Identification of cloud fields by the nonparametric algorithm of pattern recognition from normalized video data recorded with the AVHRR instrument

NASA Astrophysics Data System (ADS)

Protasov, Konstantin T.; Pushkareva, Tatyana Y.; Artamonov, Evgeny S.

2002-02-01

The problem of cloud field recognition from the NOAA satellite data is urgent for solving not only meteorological problems but also for resource-ecological monitoring of the Earth's underlying surface associated with the detection of thunderstorm clouds, estimation of the liquid water content of clouds and the moisture of the soil, the degree of fire hazard, etc. To solve these problems, we used the AVHRR/NOAA video data that regularly displayed the situation in the territory. The complexity and extremely nonstationary character of problems to be solved call for the use of information of all spectral channels, mathematical apparatus of testing statistical hypotheses, and methods of pattern recognition and identification of the informative parameters. For a class of detection and pattern recognition problems, the average risk functional is a natural criterion for the quality and the information content of the synthesized decision rules. In this case, to solve efficiently the problem of identifying cloud field types, the informative parameters must be determined by minimization of this functional. Since the conditional probability density functions, representing mathematical models of stochastic patterns, are unknown, the problem of nonparametric reconstruction of distributions from the leaning samples arises. To this end, we used nonparametric estimates of distributions with the modified Epanechnikov kernel. The unknown parameters of these distributions were determined by minimization of the risk functional, which for the learning sample was substituted by the empirical risk. After the conditional probability density functions had been reconstructed for the examined hypotheses, a cloudiness type was identified using the Bayes decision rule.
Estimating survival of radio-tagged birds

USGS Publications Warehouse

Bunck, C.M.; Pollock, K.H.; Lebreton, J.-D.; North, P.M.

1993-01-01

Parametric and nonparametric methods for estimating survival of radio-tagged birds are described. The general assumptions of these methods are reviewed. An estimate based on the assumption of constant survival throughout the period is emphasized in the overview of parametric methods. Two nonparametric methods, the Kaplan-Meier estimate of the survival funcrion and the log rank test, are explained in detail The link between these nonparametric methods and traditional capture-recapture models is discussed aloag with considerations in designing studies that use telemetry techniques to estimate survival.
Neighbor-Dependent Ramachandran Probability Distributions of Amino Acids Developed from a Hierarchical Dirichlet Process Model

PubMed Central

Mitra, Rajib; Jordan, Michael I.; Dunbrack, Roland L.

2010-01-01

Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp. PMID:20442867
Nonparametric estimates of drift and diffusion profiles via Fokker-Planck algebra.

PubMed

Lund, Steven P; Hubbard, Joseph B; Halter, Michael

2014-11-06

Diffusion processes superimposed upon deterministic motion play a key role in understanding and controlling the transport of matter, energy, momentum, and even information in physics, chemistry, material science, biology, and communications technology. Given functions defining these random and deterministic components, the Fokker-Planck (FP) equation is often used to model these diffusive systems. Many methods exist for estimating the drift and diffusion profiles from one or more identifiable diffusive trajectories; however, when many identical entities diffuse simultaneously, it may not be possible to identify individual trajectories. Here we present a method capable of simultaneously providing nonparametric estimates for both drift and diffusion profiles from evolving density profiles, requiring only the validity of Langevin/FP dynamics. This algebraic FP manipulation provides a flexible and robust framework for estimating stationary drift and diffusion coefficient profiles, is not based on fluctuation theory or solved diffusion equations, and may facilitate predictions for many experimental systems. We illustrate this approach on experimental data obtained from a model lipid bilayer system exhibiting free diffusion and electric field induced drift. The wide range over which this approach provides accurate estimates for drift and diffusion profiles is demonstrated through simulation.
Classification of longitudinal data through a semiparametric mixed-effects model based on lasso-type estimators.

PubMed

Arribas-Gil, Ana; De la Cruz, Rolando; Lebarbier, Emilie; Meza, Cristian

2015-06-01

We propose a classification method for longitudinal data. The Bayes classifier is classically used to determine a classification rule where the underlying density in each class needs to be well modeled and estimated. This work is motivated by a real dataset of hormone levels measured at the early stages of pregnancy that can be used to predict normal versus abnormal pregnancy outcomes. The proposed model, which is a semiparametric linear mixed-effects model (SLMM), is a particular case of the semiparametric nonlinear mixed-effects class of models (SNMM) in which finite dimensional (fixed effects and variance components) and infinite dimensional (an unknown function) parameters have to be estimated. In SNMM's maximum likelihood estimation is performed iteratively alternating parametric and nonparametric procedures. However, if one can make the assumption that the random effects and the unknown function interact in a linear way, more efficient estimation methods can be used. Our contribution is the proposal of a unified estimation procedure based on a penalized EM-type algorithm. The Expectation and Maximization steps are explicit. In this latter step, the unknown function is estimated in a nonparametric fashion using a lasso-type procedure. A simulation study and an application on real data are performed. © 2015, The International Biometric Society.
Nonparametric estimation of the multivariate survivor function: the multivariate Kaplan-Meier estimator.

PubMed

Prentice, Ross L; Zhao, Shanshan

2018-01-01

The Dabrowska (Ann Stat 16:1475-1489, 1988) product integral representation of the multivariate survivor function is extended, leading to a nonparametric survivor function estimator for an arbitrary number of failure time variates that has a simple recursive formula for its calculation. Empirical process methods are used to sketch proofs for this estimator's strong consistency and weak convergence properties. Summary measures of pairwise and higher-order dependencies are also defined and nonparametrically estimated. Simulation evaluation is given for the special case of three failure time variates.
Adaptive channel estimation for soft decision decoding over non-Gaussian optical channel

NASA Astrophysics Data System (ADS)

Xiang, Jing-song; Miao, Tao-tao; Huang, Sheng; Liu, Huan-lin

2016-10-01

An adaptive priori likelihood ratio (LLR) estimation method is proposed over non-Gaussian channel in the intensity modulation/direct detection (IM/DD) optical communication systems. Using the nonparametric histogram and the weighted least square linear fitting in the tail regions, the LLR is estimated and used for the soft decision decoding of the low-density parity-check (LDPC) codes. This method can adapt well to the three main kinds of intensity modulation/direct detection (IM/DD) optical channel, i.e., the chi-square channel, the Webb-Gaussian channel and the additive white Gaussian noise (AWGN) channel. The performance penalty of channel estimation is neglected.

Multi-object segmentation using coupled nonparametric shape and relative pose priors

NASA Astrophysics Data System (ADS)

Uzunbas, Mustafa Gökhan; Soldea, Octavian; Çetin, Müjdat; Ünal, Gözde; Erçil, Aytül; Unay, Devrim; Ekin, Ahmet; Firat, Zeynep

2009-02-01

We present a new method for multi-object segmentation in a maximum a posteriori estimation framework. Our method is motivated by the observation that neighboring or coupling objects in images generate configurations and co-dependencies which could potentially aid in segmentation if properly exploited. Our approach employs coupled shape and inter-shape pose priors that are computed using training images in a nonparametric multi-variate kernel density estimation framework. The coupled shape prior is obtained by estimating the joint shape distribution of multiple objects and the inter-shape pose priors are modeled via standard moments. Based on such statistical models, we formulate an optimization problem for segmentation, which we solve by an algorithm based on active contours. Our technique provides significant improvements in the segmentation of weakly contrasted objects in a number of applications. In particular for medical image analysis, we use our method to extract brain Basal Ganglia structures, which are members of a complex multi-object system posing a challenging segmentation problem. We also apply our technique to the problem of handwritten character segmentation. Finally, we use our method to segment cars in urban scenes.
Pattern recognition for passive polarimetric data using nonparametric classifiers

NASA Astrophysics Data System (ADS)

Thilak, Vimal; Saini, Jatinder; Voelz, David G.; Creusere, Charles D.

2005-08-01

Passive polarization based imaging is a useful tool in computer vision and pattern recognition. A passive polarization imaging system forms a polarimetric image from the reflection of ambient light that contains useful information for computer vision tasks such as object detection (classification) and recognition. Applications of polarization based pattern recognition include material classification and automatic shape recognition. In this paper, we present two target detection algorithms for images captured by a passive polarimetric imaging system. The proposed detection algorithms are based on Bayesian decision theory. In these approaches, an object can belong to one of any given number classes and classification involves making decisions that minimize the average probability of making incorrect decisions. This minimum is achieved by assigning an object to the class that maximizes the a posteriori probability. Computing a posteriori probabilities requires estimates of class conditional probability density functions (likelihoods) and prior probabilities. A Probabilistic neural network (PNN), which is a nonparametric method that can compute Bayes optimal boundaries, and a -nearest neighbor (KNN) classifier, is used for density estimation and classification. The proposed algorithms are applied to polarimetric image data gathered in the laboratory with a liquid crystal-based system. The experimental results validate the effectiveness of the above algorithms for target detection from polarimetric data.
The non-parametric Parzen's window in stereo vision matching.

PubMed

Pajares, G; de la Cruz, J

2002-01-01

This paper presents an approach to the local stereovision matching problem using edge segments as features with four attributes. From these attributes we compute a matching probability between pairs of features of the stereo images. A correspondence is said true when such a probability is maximum. We introduce a nonparametric strategy based on Parzen's window (1962) to estimate a probability density function (PDF) which is used to obtain the matching probability. This is the main finding of the paper. A comparative analysis of other recent matching methods is included to show that this finding can be justified theoretically. A generalization of the proposed method is made in order to give guidelines about its use with the similarity constraint and also in different environments where other features and attributes are more suitable.
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.

PubMed

Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen

2011-04-01

Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property

PubMed Central

Storlie, Curtis B.; Bondell, Howard D.; Reich, Brian J.; Zhang, Hao Helen

2010-01-01

Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting. PMID:21603586
Gini estimation under infinite variance

NASA Astrophysics Data System (ADS)

Fontanari, Andrea; Taleb, Nassim Nicholas; Cirillo, Pasquale

2018-07-01

We study the problems related to the estimation of the Gini index in presence of a fat-tailed data generating process, i.e. one in the stable distribution class with finite mean but infinite variance (i.e. with tail index α ∈(1 , 2)). We show that, in such a case, the Gini coefficient cannot be reliably estimated using conventional nonparametric methods, because of a downward bias that emerges under fat tails. This has important implications for the ongoing discussion about economic inequality. We start by discussing how the nonparametric estimator of the Gini index undergoes a phase transition in the symmetry structure of its asymptotic distribution, as the data distribution shifts from the domain of attraction of a light-tailed distribution to that of a fat-tailed one, especially in the case of infinite variance. We also show how the nonparametric Gini bias increases with lower values of α. We then prove that maximum likelihood estimation outperforms nonparametric methods, requiring a much smaller sample size to reach efficiency. Finally, for fat-tailed data, we provide a simple correction mechanism to the small sample bias of the nonparametric estimator based on the distance between the mode and the mean of its asymptotic distribution.
Twenty-five years of maximum-entropy principle

NASA Astrophysics Data System (ADS)

Kapur, J. N.

1983-04-01

The strengths and weaknesses of the maximum entropy principle (MEP) are examined and some challenging problems that remain outstanding at the end of the first quarter century of the principle are discussed. The original formalism of the MEP is presented and its relationship to statistical mechanics is set forth. The use of MEP for characterizing statistical distributions, in statistical inference, nonlinear spectral analysis, transportation models, population density models, models for brand-switching in marketing and vote-switching in elections is discussed. Its application to finance, insurance, image reconstruction, pattern recognition, operations research and engineering, biology and medicine, and nonparametric density estimation is considered.
Comparative study of species sensitivity distributions based on non-parametric kernel density estimation for some transition metals.

PubMed

Wang, Ying; Feng, Chenglian; Liu, Yuedan; Zhao, Yujie; Li, Huixian; Zhao, Tianhui; Guo, Wenjing

2017-02-01

Transition metals in the fourth period of the periodic table of the elements are widely widespread in aquatic environments. They could often occur at certain concentrations to cause adverse effects on aquatic life and human health. Generally, parametric models are mostly used to construct species sensitivity distributions (SSDs), which result in comparison for water quality criteria (WQC) of elements in the same period or group of the periodic table might be inaccurate and the results could be biased. To address this inadequacy, the non-parametric kernel density estimation (NPKDE) with its optimal bandwidths and testing methods were developed for establishing SSDs. The NPKDE was better fit, more robustness and better predicted than conventional normal and logistic parametric density estimations for constructing SSDs and deriving acute HC5 and WQC for transition metals in the fourth period of the periodic table. The decreasing sequence of HC5 values for the transition metals in the fourth period was Ti > Mn > V > Ni > Zn > Cu > Fe > Co > Cr(VI), which were not proportional to atomic number in the periodic table, and for different metals the relatively sensitive species were also different. The results indicated that except for physical and chemical properties there are other factors affecting toxicity mechanisms of transition metals. The proposed method enriched the methodological foundation for WQC. Meanwhile, it also provided a relatively innovative, accurate approach for the WQC derivation and risk assessment of the same group and period metals in aquatic environments to support protection of aquatic organisms. Copyright © 2016 Elsevier Ltd. All rights reserved.
Nonparametric functional data estimation applied to ozone data: prediction and extreme value analysis.

PubMed

Quintela-del-Río, Alejandro; Francisco-Fernández, Mario

2011-02-01

The study of extreme values and prediction of ozone data is an important topic of research when dealing with environmental problems. Classical extreme value theory is usually used in air-pollution studies. It consists in fitting a parametric generalised extreme value (GEV) distribution to a data set of extreme values, and using the estimated distribution to compute return levels and other quantities of interest. Here, we propose to estimate these values using nonparametric functional data methods. Functional data analysis is a relatively new statistical methodology that generally deals with data consisting of curves or multi-dimensional variables. In this paper, we use this technique, jointly with nonparametric curve estimation, to provide alternatives to the usual parametric statistical tools. The nonparametric estimators are applied to real samples of maximum ozone values obtained from several monitoring stations belonging to the Automatic Urban and Rural Network (AURN) in the UK. The results show that nonparametric estimators work satisfactorily, outperforming the behaviour of classical parametric estimators. Functional data analysis is also used to predict stratospheric ozone concentrations. We show an application, using the data set of mean monthly ozone concentrations in Arosa, Switzerland, and the results are compared with those obtained by classical time series (ARIMA) analysis. Copyright Â© 2010 Elsevier Ltd. All rights reserved.
A strategy for analysis of (molecular) equilibrium simulations: Configuration space density estimation, clustering, and visualization

NASA Astrophysics Data System (ADS)

Hamprecht, Fred A.; Peter, Christine; Daura, Xavier; Thiel, Walter; van Gunsteren, Wilfred F.

2001-02-01

We propose an approach for summarizing the output of long simulations of complex systems, affording a rapid overview and interpretation. First, multidimensional scaling techniques are used in conjunction with dimension reduction methods to obtain a low-dimensional representation of the configuration space explored by the system. A nonparametric estimate of the density of states in this subspace is then obtained using kernel methods. The free energy surface is calculated from that density, and the configurations produced in the simulation are then clustered according to the topography of that surface, such that all configurations belonging to one local free energy minimum form one class. This topographical cluster analysis is performed using basin spanning trees which we introduce as subgraphs of Delaunay triangulations. Free energy surfaces obtained in dimensions lower than four can be visualized directly using iso-contours and -surfaces. Basin spanning trees also afford a glimpse of higher-dimensional topographies. The procedure is illustrated using molecular dynamics simulations on the reversible folding of peptide analoga. Finally, we emphasize the intimate relation of density estimation techniques to modern enhanced sampling algorithms.
On the Use of Nonparametric Item Characteristic Curve Estimation Techniques for Checking Parametric Model Fit

ERIC Educational Resources Information Center

Lee, Young-Sun; Wollack, James A.; Douglas, Jeffrey

2009-01-01

The purpose of this study was to assess the model fit of a 2PL through comparison with the nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that three nonparametric procedures implemented produced ICCs that are similar to that of the 2PL for items simulated to fit the 2PL. However for misfitting items,…
Nonparametric Transfer Function Models

PubMed Central

Liu, Jun M.; Chen, Rong; Yao, Qiwei

2009-01-01

In this paper a class of nonparametric transfer function models is proposed to model nonlinear relationships between ‘input’ and ‘output’ time series. The transfer function is smooth with unknown functional forms, and the noise is assumed to be a stationary autoregressive-moving average (ARMA) process. The nonparametric transfer function is estimated jointly with the ARMA parameters. By modeling the correlation in the noise, the transfer function can be estimated more efficiently. The parsimonious ARMA structure improves the estimation efficiency in finite samples. The asymptotic properties of the estimators are investigated. The finite-sample properties are illustrated through simulations and one empirical example. PMID:20628584
Nonparametric Item Response Curve Estimation with Correction for Measurement Error

ERIC Educational Resources Information Center

Guo, Hongwen; Sinharay, Sandip

2011-01-01

Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
Benchmark dose analysis via nonparametric regression modeling

PubMed Central

Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

2013-01-01

Estimation of benchmark doses (BMDs) in quantitative risk assessment traditionally is based upon parametric dose-response modeling. It is a well-known concern, however, that if the chosen parametric model is uncertain and/or misspecified, inaccurate and possibly unsafe low-dose inferences can result. We describe a nonparametric approach for estimating BMDs with quantal-response data based on an isotonic regression method, and also study use of corresponding, nonparametric, bootstrap-based confidence limits for the BMD. We explore the confidence limits’ small-sample properties via a simulation study, and illustrate the calculations with an example from cancer risk assessment. It is seen that this nonparametric approach can provide a useful alternative for BMD estimation when faced with the problem of parametric model uncertainty. PMID:23683057
Fast clustering using adaptive density peak detection.

PubMed

Wang, Xiao-Feng; Xu, Yifan

2017-12-01

Common limitations of clustering methods include the slow algorithm convergence, the instability of the pre-specification on a number of intrinsic parameters, and the lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through the nonparametric multivariate kernel estimation. The model parameter is then able to be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method only needs to perform in one single step without any iteration and thus is fast and has a great potential to apply on big data analysis. A user-friendly R package ADPclust is developed for public use.
High throughput nonparametric probability density estimation.

PubMed

Farmer, Jenny; Jacobs, Donald

2018-01-01

In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.
High throughput nonparametric probability density estimation

PubMed Central

Farmer, Jenny

2018-01-01

In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803
Nonparametric methods for doubly robust estimation of continuous treatment effects.

PubMed

Kennedy, Edward H; Ma, Zongming; McHugh, Matthew D; Small, Dylan S

2017-09-01

Continuous treatments (e.g., doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve, and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.
Measuring firm size distribution with semi-nonparametric densities

NASA Astrophysics Data System (ADS)

Cortés, Lina M.; Mora-Valencia, Andrés; Perote, Javier

2017-11-01

In this article, we propose a new methodology based on a (log) semi-nonparametric (log-SNP) distribution that nests the lognormal and enables better fits in the upper tail of the distribution through the introduction of new parameters. We test the performance of the lognormal and log-SNP distributions capturing firm size, measured through a sample of US firms in 2004-2015. Taking different levels of aggregation by type of economic activity, our study shows that the log-SNP provides a better fit of the firm size distribution. We also formally introduce the multivariate log-SNP distribution, which encompasses the multivariate lognormal, to analyze the estimation of the joint distribution of the value of the firm's assets and sales. The results suggest that sales are a better firm size measure, as indicated by other studies in the literature.
A Comparison of Methods for Nonparametric Estimation of Item Characteristic Curves for Binary Items

ERIC Educational Resources Information Center

Lee, Young-Sun

2007-01-01

This study compares the performance of three nonparametric item characteristic curve (ICC) estimation procedures: isotonic regression, smoothed isotonic regression, and kernel smoothing. Smoothed isotonic regression, employed along with an appropriate kernel function, provides better estimates and also satisfies the assumption of strict…

Estimating the extreme low-temperature event using nonparametric methods

NASA Astrophysics Data System (ADS)

D'Silva, Anisha

This thesis presents a new method of estimating the one-in-N low temperature threshold using a non-parametric statistical method called kernel density estimation applied to daily average wind-adjusted temperatures. We apply our One-in-N Algorithm to local gas distribution companies (LDCs), as they have to forecast the daily natural gas needs of their consumers. In winter, demand for natural gas is high. Extreme low temperature events are not directly related to an LDCs gas demand forecasting, but knowledge of extreme low temperatures is important to ensure that an LDC has enough capacity to meet customer demands when extreme low temperatures are experienced. We present a detailed explanation of our One-in-N Algorithm and compare it to the methods using the generalized extreme value distribution, the normal distribution, and the variance-weighted composite distribution. We show that our One-in-N Algorithm estimates the one-in- N low temperature threshold more accurately than the methods using the generalized extreme value distribution, the normal distribution, and the variance-weighted composite distribution according to root mean square error (RMSE) measure at a 5% level of significance. The One-in- N Algorithm is tested by counting the number of times the daily average wind-adjusted temperature is less than or equal to the one-in- N low temperature threshold.
Statistical field estimators for multiscale simulations.

PubMed

Eapen, Jacob; Li, Ju; Yip, Sidney

2005-11-01

We present a systematic approach for generating smooth and accurate fields from particle simulation data using the notions of statistical inference. As an extension to a parametric representation based on the maximum likelihood technique previously developed for velocity and temperature fields, a nonparametric estimator based on the principle of maximum entropy is proposed for particle density and stress fields. Both estimators are applied to represent molecular dynamics data on shear-driven flow in an enclosure which exhibits a high degree of nonlinear characteristics. We show that the present density estimator is a significant improvement over ad hoc bin averaging and is also free of systematic boundary artifacts that appear in the method of smoothing kernel estimates. Similarly, the velocity fields generated by the maximum likelihood estimator do not show any edge effects that can be erroneously interpreted as slip at the wall. For low Reynolds numbers, the velocity fields and streamlines generated by the present estimator are benchmarked against Newtonian continuum calculations. For shear velocities that are a significant fraction of the thermal speed, we observe a form of shear localization that is induced by the confining boundary.
Non-parametric directionality analysis - Extension for removal of a single common predictor and application to time series.

PubMed

Halliday, David M; Senik, Mohd Harizal; Stevenson, Carl W; Mason, Rob

2016-08-01

The ability to infer network structure from multivariate neuronal signals is central to computational neuroscience. Directed network analyses typically use parametric approaches based on auto-regressive (AR) models, where networks are constructed from estimates of AR model parameters. However, the validity of using low order AR models for neurophysiological signals has been questioned. A recent article introduced a non-parametric approach to estimate directionality in bivariate data, non-parametric approaches are free from concerns over model validity. We extend the non-parametric framework to include measures of directed conditional independence, using scalar measures that decompose the overall partial correlation coefficient summatively by direction, and a set of functions that decompose the partial coherence summatively by direction. A time domain partial correlation function allows both time and frequency views of the data to be constructed. The conditional independence estimates are conditioned on a single predictor. The framework is applied to simulated cortical neuron networks and mixtures of Gaussian time series data with known interactions. It is applied to experimental data consisting of local field potential recordings from bilateral hippocampus in anaesthetised rats. The framework offers a non-parametric approach to estimation of directed interactions in multivariate neuronal recordings, and increased flexibility in dealing with both spike train and time series data. The framework offers a novel alternative non-parametric approach to estimate directed interactions in multivariate neuronal recordings, and is applicable to spike train and time series data. Copyright © 2016 Elsevier B.V. All rights reserved.
Robust estimation for ordinary differential equation models.

PubMed

Cao, J; Wang, L; Xu, J

2011-12-01

Applied scientists often like to use ordinary differential equations (ODEs) to model complex dynamic processes that arise in biology, engineering, medicine, and many other areas. It is interesting but challenging to estimate ODE parameters from noisy data, especially when the data have some outliers. We propose a robust method to address this problem. The dynamic process is represented with a nonparametric function, which is a linear combination of basis functions. The nonparametric function is estimated by a robust penalized smoothing method. The penalty term is defined with the parametric ODE model, which controls the roughness of the nonparametric function and maintains the fidelity of the nonparametric function to the ODE model. The basis coefficients and ODE parameters are estimated in two nested levels of optimization. The coefficient estimates are treated as an implicit function of ODE parameters, which enables one to derive the analytic gradients for optimization using the implicit function theorem. Simulation studies show that the robust method gives satisfactory estimates for the ODE parameters from noisy data with outliers. The robust method is demonstrated by estimating a predator-prey ODE model from real ecological data. © 2011, The International Biometric Society.
Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28

ERIC Educational Resources Information Center

Guo, Hongwen; Sinharay, Sandip

2011-01-01

Nonparametric, or kernel, estimation of item response curve (IRC) is a concern theoretically and operationally. Accuracy of this estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. In this study, we investigate…
GEE-Smoothing Spline in Semiparametric Model with Correlated Nominal Data

NASA Astrophysics Data System (ADS)

Ibrahim, Noor Akma; Suliadi

2010-11-01

In this paper we propose GEE-Smoothing spline in the estimation of semiparametric models with correlated nominal data. The method can be seen as an extension of parametric generalized estimating equation to semiparametric models. The nonparametric component is estimated using smoothing spline specifically the natural cubic spline. We use profile algorithm in the estimation of both parametric and nonparametric components. The properties of the estimators are evaluated using simulation studies.
Granger causality revisited

PubMed Central

Friston, Karl J.; Bastos, André M.; Oswal, Ashwini; van Wijk, Bernadette; Richter, Craig; Litvak, Vladimir

2014-01-01

This technical paper offers a critical re-evaluation of (spectral) Granger causality measures in the analysis of biological timeseries. Using realistic (neural mass) models of coupled neuronal dynamics, we evaluate the robustness of parametric and nonparametric Granger causality. Starting from a broad class of generative (state-space) models of neuronal dynamics, we show how their Volterra kernels prescribe the second-order statistics of their response to random fluctuations; characterised in terms of cross-spectral density, cross-covariance, autoregressive coefficients and directed transfer functions. These quantities in turn specify Granger causality — providing a direct (analytic) link between the parameters of a generative model and the expected Granger causality. We use this link to show that Granger causality measures based upon autoregressive models can become unreliable when the underlying dynamics is dominated by slow (unstable) modes — as quantified by the principal Lyapunov exponent. However, nonparametric measures based on causal spectral factors are robust to dynamical instability. We then demonstrate how both parametric and nonparametric spectral causality measures can become unreliable in the presence of measurement noise. Finally, we show that this problem can be finessed by deriving spectral causality measures from Volterra kernels, estimated using dynamic causal modelling. PMID:25003817
Efficient, adaptive estimation of two-dimensional firing rate surfaces via Gaussian process methods.

PubMed

Rad, Kamiar Rahnama; Paninski, Liam

2010-01-01

Estimating two-dimensional firing rate maps is a common problem, arising in a number of contexts: the estimation of place fields in hippocampus, the analysis of temporally nonstationary tuning curves in sensory and motor areas, the estimation of firing rates following spike-triggered covariance analyses, etc. Here we introduce methods based on Gaussian process nonparametric Bayesian techniques for estimating these two-dimensional rate maps. These techniques offer a number of advantages: the estimates may be computed efficiently, come equipped with natural errorbars, adapt their smoothness automatically to the local density and informativeness of the observed data, and permit direct fitting of the model hyperparameters (e.g., the prior smoothness of the rate map) via maximum marginal likelihood. We illustrate the method's flexibility and performance on a variety of simulated and real data.
Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions.

PubMed

Najibi, Seyed Morteza; Maadooliat, Mehdi; Zhou, Lan; Huang, Jianhua Z; Gao, Xin

2017-01-01

Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is how to efficiently model the continuous conformational space of the protein structures based on the differences and similarities between different Ramachandran plots. Despite the presence of statistical methods for modeling angular data of proteins, there is still a substantial need for more sophisticated and faster statistical tools to model the large-scale circular datasets. To address this need, we have developed a nonparametric method for collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method takes into account the circular nature of the angular data using trigonometric spline which is more efficient compared to existing methods. This collective density estimation approach is widely applicable when there is a need to estimate multiple density functions from different populations with common features. Moreover, the coefficients of adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective to two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.
TRAN-STAT: statistics for environmental transuranic studies, July 1978, Number 5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

This issue is concerned with nonparametric procedures for (1) estimating the central tendency of a population, (2) describing data sets through estimating percentiles, (3) estimating confidence limits for the median and other percentiles, (4) estimating tolerance limits and associated numbers of samples, and (5) tests of significance and associated procedures for a variety of testing situations (counterparts to t-tests and analysis of variance). Some characteristics of several nonparametric tests are illustrated using the NAEG /sup 241/Am aliquot data presented and discussed in the April issue of TRAN-STAT. Some of the statistical terms used here are defined in a glossary. Themore » reference list also includes short descriptions of nonparametric books. 31 references, 3 figures, 1 table.« less
Reliability of Test Scores in Nonparametric Item Response Theory.

ERIC Educational Resources Information Center

Sijtsma, Klaas; Molenaar, Ivo W.

1987-01-01

Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four "classical" lower bounds to reliability. (Author/JAZ)
Joint nonparametric correction estimator for excess relative risk regression in survival analysis with exposure measurement error

PubMed Central

Wang, Ching-Yun; Cullings, Harry; Song, Xiao; Kopecky, Kenneth J.

2017-01-01

SUMMARY Observational epidemiological studies often confront the problem of estimating exposure-disease relationships when the exposure is not measured exactly. In the paper, we investigate exposure measurement error in excess relative risk regression, which is a widely used model in radiation exposure effect research. In the study cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies a generalized version of the classical additive measurement error model, but it may or may not have repeated measurements. In addition, an instrumental variable is available for individuals in a subset of the whole cohort. We develop a nonparametric correction (NPC) estimator using data from the subcohort, and further propose a joint nonparametric correction (JNPC) estimator using all observed data to adjust for exposure measurement error. An optimal linear combination estimator of JNPC and NPC is further developed. The proposed estimators are nonparametric, which are consistent without imposing a covariate or error distribution, and are robust to heteroscedastic errors. Finite sample performance is examined via a simulation study. We apply the developed methods to data from the Radiation Effects Research Foundation, in which chromosome aberration is used to adjust for the effects of radiation dose measurement error on the estimation of radiation dose responses. PMID:29354018
A Monte Carlo Study of the Effect of Item Characteristic Curve Estimation on the Accuracy of Three Person-Fit Statistics

ERIC Educational Resources Information Center

St-Onge, Christina; Valois, Pierre; Abdous, Belkacem; Germain, Stephane

2009-01-01

To date, there have been no studies comparing parametric and nonparametric Item Characteristic Curve (ICC) estimation methods on the effectiveness of Person-Fit Statistics (PFS). The primary aim of this study was to determine if the use of ICCs estimated by nonparametric methods would increase the accuracy of item response theory-based PFS for…
Reconstruction of cosmological matter perturbations in modified gravity

NASA Astrophysics Data System (ADS)

Gonzalez, J. E.

2017-12-01

The analysis of perturbative quantities is a powerful tool to distinguish between different dark energy models and gravity theories degenerated at the background level. In this work, we generalize the integral solution of the matter density contrast for general relativity gravity [V. Sahni and A. Starobinsky, Int. J. Mod. Phys. D 15, 2105 (2006)., 10.1142/S0218271806009704, U. Alam, V. Sahni, and A. A. Starobinsky, Astrophys. J. 704, 1086 (2009)., 10.1088/0004-637X/704/2/1086] to a wide class of modified gravity (MG) theories. To calculate this solution, it is necessary to have prior knowledge of the Hubble rate, the density parameter at the present epoch (Ωm 0), and the functional form of the effective Newton's constant that characterizes the gravity theory. We estimate in a model-independent way the Hubble expansion rate by applying a nonparametric reconstruction method to model-independent cosmic chronometer data and high-z quasar data. In order to compare our generalized solution of the matter density contrast, using the nonparametric reconstruction of H (z ) from observational data, with a purely theoretical one, we choose a parametrization of the screened modified gravity and the Ωm 0 from WMAP-9 Collaborations. Finally, we calculate the growth index for the analyzed cases, finding very good agreement between theoretical values and the obtained ones using the approach presented in this work.
Data-driven monitoring for stochastic systems and its application on batch process

NASA Astrophysics Data System (ADS)

Yin, Shen; Ding, Steven X.; Haghani Abandan Sari, Adel; Hao, Haiyang

2013-07-01

Batch processes are characterised by a prescribed processing of raw materials into final products for a finite duration and play an important role in many industrial sectors due to the low-volume and high-value products. Process dynamics and stochastic disturbances are inherent characteristics of batch processes, which cause monitoring of batch processes a challenging problem in practice. To solve this problem, a subspace-aided data-driven approach is presented in this article for batch process monitoring. The advantages of the proposed approach lie in its simple form and its abilities to deal with stochastic disturbances and process dynamics existing in the process. The kernel density estimation, which serves as a non-parametric way of estimating the probability density function, is utilised for threshold calculation. An industrial benchmark of fed-batch penicillin production is finally utilised to verify the effectiveness of the proposed approach.
Bayesian Unimodal Density Regression for Causal Inference

ERIC Educational Resources Information Center

Karabatsos, George; Walker, Stephen G.

2011-01-01

Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…
A comparative study of nonparametric methods for pattern recognition

NASA Technical Reports Server (NTRS)

Hahn, S. F.; Nelson, G. D.

1972-01-01

The applied research discussed in this report determines and compares the correct classification percentage of the nonparametric sign test, Wilcoxon's signed rank test, and K-class classifier with the performance of the Bayes classifier. The performance is determined for data which have Gaussian, Laplacian and Rayleigh probability density functions. The correct classification percentage is shown graphically for differences in modes and/or means of the probability density functions for four, eight and sixteen samples. The K-class classifier performed very well with respect to the other classifiers used. Since the K-class classifier is a nonparametric technique, it usually performed better than the Bayes classifier which assumes the data to be Gaussian even though it may not be. The K-class classifier has the advantage over the Bayes in that it works well with non-Gaussian data without having to determine the probability density function of the data. It should be noted that the data in this experiment was always unimodal.
Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data

PubMed Central

Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

2012-01-01

Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions. PMID:23645976
Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data.

PubMed

Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

2013-01-01

Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions.
Permissible Home Range Estimation (PHRE) in restricted habitats: A new algorithm and an evaluation for sea otters

USGS Publications Warehouse

Tarjan, Lily M; Tinker, M. Tim

2016-01-01

Parametric and nonparametric kernel methods dominate studies of animal home ranges and space use. Most existing methods are unable to incorporate information about the underlying physical environment, leading to poor performance in excluding areas that are not used. Using radio-telemetry data from sea otters, we developed and evaluated a new algorithm for estimating home ranges (hereafter Permissible Home Range Estimation, or “PHRE”) that reflects habitat suitability. We began by transforming sighting locations into relevant landscape features (for sea otters, coastal position and distance from shore). Then, we generated a bivariate kernel probability density function in landscape space and back-transformed this to geographic space in order to define a permissible home range. Compared to two commonly used home range estimation methods, kernel densities and local convex hulls, PHRE better excluded unused areas and required a smaller sample size. Our PHRE method is applicable to species whose ranges are restricted by complex physical boundaries or environmental gradients and will improve understanding of habitat-use requirements and, ultimately, aid in conservation efforts.

CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions

EPA Pesticide Factsheets

Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.
The Breslow estimator of the nonparametric baseline survivor function in Cox's regression model: some heuristics.

PubMed

Hanley, James A

2008-01-01

Most survival analysis textbooks explain how the hazard ratio parameters in Cox's life table regression model are estimated. Fewer explain how the components of the nonparametric baseline survivor function are derived. Those that do often relegate the explanation to an "advanced" section and merely present the components as algebraic or iterative solutions to estimating equations. None comment on the structure of these estimators. This note brings out a heuristic representation that may help to de-mystify the structure.
Nonparametric Estimation of Distribution and Density Functions with Applications.

DTIC Science & Technology

1982-05-01

8 > Z C.. o i, ,- 4 8 ...S C[ 17 : - j 46 0- IA ~ IL ar’ D rn B- - 4 -4A B- 8 ~a *~jai 188 the jackknife may be found in Gray, et al., and Cressie (Refs 15,28). Analogous to...convergence. 21 =7w; Ul I 0083 q C/) 22S , WJ 8 , J035 224 43 UC I a- 04 14 Q) ’I 4 -4a-Q 23z Let R1 be the real line, the borel field on R 1 and P,
Nonparametric estimation of the heterogeneity of a random medium using compound Poisson process modeling of wave multiple scattering.

PubMed

Le Bihan, Nicolas; Margerin, Ludovic

2009-07-01

In this paper, we present a nonparametric method to estimate the heterogeneity of a random medium from the angular distribution of intensity of waves transmitted through a slab of random material. Our approach is based on the modeling of forward multiple scattering using compound Poisson processes on compact Lie groups. The estimation technique is validated through numerical simulations based on radiative transfer theory.
Maximum Likelihood Estimations and EM Algorithms with Length-biased Data

PubMed Central

Qin, Jing; Ning, Jing; Liu, Hao; Shen, Yu

2012-01-01

SUMMARY Length-biased sampling has been well recognized in economics, industrial reliability, etiology applications, epidemiological, genetic and cancer screening studies. Length-biased right-censored data have a unique data structure different from traditional survival data. The nonparametric and semiparametric estimations and inference methods for traditional survival data are not directly applicable for length-biased right-censored data. We propose new expectation-maximization algorithms for estimations based on full likelihoods involving infinite dimensional parameters under three settings for length-biased data: estimating nonparametric distribution function, estimating nonparametric hazard function under an increasing failure rate constraint, and jointly estimating baseline hazards function and the covariate coefficients under the Cox proportional hazards model. Extensive empirical simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and lead to more efficient estimators compared to the estimating equation approaches. The proposed estimates are also more robust to various right-censoring mechanisms. We prove the strong consistency properties of the estimators, and establish the asymptotic normality of the semi-parametric maximum likelihood estimators under the Cox model using modern empirical processes theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online. PMID:22323840
Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

PubMed Central

Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

2018-01-01

This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
Comparison of Parametric and Nonparametric Bootstrap Methods for Estimating Random Error in Equipercentile Equating

ERIC Educational Resources Information Center

Cui, Zhongmin; Kolen, Michael J.

2008-01-01

This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
Variable Selection for Nonparametric Quantile Regression via Smoothing Spline AN OVA

PubMed Central

Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui

2014-01-01

Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792
Empirical prediction intervals improve energy forecasting

PubMed Central

Kaack, Lynn H.; Apt, Jay; Morgan, M. Granger; McSharry, Patrick

2017-01-01

Hundreds of organizations and analysts use energy projections, such as those contained in the US Energy Information Administration (EIA)’s Annual Energy Outlook (AEO), for investment and policy decisions. Retrospective analyses of past AEO projections have shown that observed values can differ from the projection by several hundred percent, and thus a thorough treatment of uncertainty is essential. We evaluate the out-of-sample forecasting performance of several empirical density forecasting methods, using the continuous ranked probability score (CRPS). The analysis confirms that a Gaussian density, estimated on past forecasting errors, gives comparatively accurate uncertainty estimates over a variety of energy quantities in the AEO, in particular outperforming scenario projections provided in the AEO. We report probabilistic uncertainties for 18 core quantities of the AEO 2016 projections. Our work frames how to produce, evaluate, and rank probabilistic forecasts in this setting. We propose a log transformation of forecast errors for price projections and a modified nonparametric empirical density forecasting method. Our findings give guidance on how to evaluate and communicate uncertainty in future energy outlooks. PMID:28760997
Equivalence of truncated count mixture distributions and mixtures of truncated count distributions.

PubMed

Böhning, Dankmar; Kuhnert, Ronny

2006-12-01

This article is about modeling count data with zero truncation. A parametric count density family is considered. The truncated mixture of densities from this family is different from the mixture of truncated densities from the same family. Whereas the former model is more natural to formulate and to interpret, the latter model is theoretically easier to treat. It is shown that for any mixing distribution leading to a truncated mixture, a (usually different) mixing distribution can be found so that the associated mixture of truncated densities equals the truncated mixture, and vice versa. This implies that the likelihood surfaces for both situations agree, and in this sense both models are equivalent. Zero-truncated count data models are used frequently in the capture-recapture setting to estimate population size, and it can be shown that the two Horvitz-Thompson estimators, associated with the two models, agree. In particular, it is possible to achieve strong results for mixtures of truncated Poisson densities, including reliable, global construction of the unique NPMLE (nonparametric maximum likelihood estimator) of the mixing distribution, implying a unique estimator for the population size. The benefit of these results lies in the fact that it is valid to work with the mixture of truncated count densities, which is less appealing for the practitioner but theoretically easier. Mixtures of truncated count densities form a convex linear model, for which a developed theory exists, including global maximum likelihood theory as well as algorithmic approaches. Once the problem has been solved in this class, it might readily be transformed back to the original problem by means of an explicitly given mapping. Applications of these ideas are given, particularly in the case of the truncated Poisson family.
Functional mapping of reaction norms to multiple environmental signals through nonparametric covariance estimation

PubMed Central

2011-01-01

Background The identification of genes or quantitative trait loci that are expressed in response to different environmental factors such as temperature and light, through functional mapping, critically relies on precise modeling of the covariance structure. Previous work used separable parametric covariance structures, such as a Kronecker product of autoregressive one [AR(1)] matrices, that do not account for interaction effects of different environmental factors. Results We implement a more robust nonparametric covariance estimator to model these interactions within the framework of functional mapping of reaction norms to two signals. Our results from Monte Carlo simulations show that this estimator can be useful in modeling interactions that exist between two environmental signals. The interactions are simulated using nonseparable covariance models with spatio-temporal structural forms that mimic interaction effects. Conclusions The nonparametric covariance estimator has an advantage over separable parametric covariance estimators in the detection of QTL location, thus extending the breadth of use of functional mapping in practical settings. PMID:21269481
On non-parametric maximum likelihood estimation of the bivariate survivor function.

PubMed

Prentice, R L

The likelihood function for the bivariate survivor function F, under independent censorship, is maximized to obtain a non-parametric maximum likelihood estimator &Fcirc;. &Fcirc; may or may not be unique depending on the configuration of singly- and doubly-censored pairs. The likelihood function can be maximized by placing all mass on the grid formed by the uncensored failure times, or half lines beyond the failure time grid, or in the upper right quadrant beyond the grid. By accumulating the mass along lines (or regions) where the likelihood is flat, one obtains a partially maximized likelihood as a function of parameters that can be uniquely estimated. The score equations corresponding to these point mass parameters are derived, using a Lagrange multiplier technique to ensure unit total mass, and a modified Newton procedure is used to calculate the parameter estimates in some limited simulation studies. Some considerations for the further development of non-parametric bivariate survivor function estimators are briefly described.
A Bayesian nonparametric method for prediction in EST analysis

PubMed Central

Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor

2007-01-01

Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
Water quality analysis in rivers with non-parametric probability distributions and fuzzy inference systems: application to the Cauca River, Colombia.

PubMed

Ocampo-Duque, William; Osorio, Carolina; Piamba, Christian; Schuhmacher, Marta; Domingo, José L

2013-02-01

The integration of water quality monitoring variables is essential in environmental decision making. Nowadays, advanced techniques to manage subjectivity, imprecision, uncertainty, vagueness, and variability are required in such complex evaluation process. We here propose a probabilistic fuzzy hybrid model to assess river water quality. Fuzzy logic reasoning has been used to compute a water quality integrative index. By applying a Monte Carlo technique, based on non-parametric probability distributions, the randomness of model inputs was estimated. Annual histograms of nine water quality variables were built with monitoring data systematically collected in the Colombian Cauca River, and probability density estimations using the kernel smoothing method were applied to fit data. Several years were assessed, and river sectors upstream and downstream the city of Santiago de Cali, a big city with basic wastewater treatment and high industrial activity, were analyzed. The probabilistic fuzzy water quality index was able to explain the reduction in water quality, as the river receives a larger number of agriculture, domestic, and industrial effluents. The results of the hybrid model were compared to traditional water quality indexes. The main advantage of the proposed method is that it considers flexible boundaries between the linguistic qualifiers used to define the water status, being the belongingness of water quality to the diverse output fuzzy sets or classes provided with percentiles and histograms, which allows classify better the real water condition. The results of this study show that fuzzy inference systems integrated to stochastic non-parametric techniques may be used as complementary tools in water quality indexing methodologies. Copyright © 2012 Elsevier Ltd. All rights reserved.
Bayesian isotonic density regression

PubMed Central

Wang, Lianming; Dunson, David B.

2011-01-01

Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered. PMID:22822259
Nonparametric analysis of bivariate gap time with competing risks.

PubMed

Huang, Chiung-Yu; Wang, Chenguang; Wang, Mei-Cheng

2016-09-01

This article considers nonparametric methods for studying recurrent disease and death with competing risks. We first point out that comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events, and that comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrence disease. We then propose nonparametric estimators for the conditional cumulative incidence function as well as the conditional bivariate cumulative incidence function for the bivariate gap times, that is, the time to disease recurrence and the residual lifetime after recurrence. To quantify the association between the two gap times in the competing risks setting, a modified Kendall's tau statistic is proposed. The proposed estimators for the conditional bivariate cumulative incidence distribution and the association measure account for the induced dependent censoring for the second gap time. Uniform consistency and weak convergence of the proposed estimators are established. Hypothesis testing procedures for two-sample comparisons are discussed. Numerical simulation studies with practical sample sizes are conducted to evaluate the performance of the proposed nonparametric estimators and tests. An application to data from a pancreatic cancer study is presented to illustrate the methods developed in this article. © 2016, The International Biometric Society.
Model-free quantification of dynamic PET data using nonparametric deconvolution

PubMed Central

Zanderigo, Francesca; Parsey, Ramin V; Todd Ogden, R

2015-01-01

Dynamic positron emission tomography (PET) data are usually quantified using compartment models (CMs) or derived graphical approaches. Often, however, CMs either do not properly describe the tracer kinetics, or are not identifiable, leading to nonphysiologic estimates of the tracer binding. The PET data are modeled as the convolution of the metabolite-corrected input function and the tracer impulse response function (IRF) in the tissue. Using nonparametric deconvolution methods, it is possible to obtain model-free estimates of the IRF, from which functionals related to tracer volume of distribution and binding may be computed, but this approach has rarely been applied in PET. Here, we apply nonparametric deconvolution using singular value decomposition to simulated and test–retest clinical PET data with four reversible tracers well characterized by CMs ([11C]CUMI-101, [11C]DASB, [11C]PE2I, and [11C]WAY-100635), and systematically compare reproducibility, reliability, and identifiability of various IRF-derived functionals with that of traditional CMs outcomes. Results show that nonparametric deconvolution, completely free of any model assumptions, allows for estimates of tracer volume of distribution and binding that are very close to the estimates obtained with CMs and, in some cases, show better test–retest performance than CMs outcomes. PMID:25873427
Estimation and model selection of semiparametric multivariate survival functions under general censorship.

PubMed

Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

2010-07-01

We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root- n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.
Estimation and model selection of semiparametric multivariate survival functions under general censorship

PubMed Central

Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

2013-01-01

We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286
Dynamic characteristics of oxygen consumption.

PubMed

Ye, Lin; Argha, Ahmadreza; Yu, Hairong; Celler, Branko G; Nguyen, Hung T; Su, Steven

2018-04-23

Previous studies have indicated that oxygen uptake ([Formula: see text]) is one of the most accurate indices for assessing the cardiorespiratory response to exercise. In most existing studies, the response of [Formula: see text] is often roughly modelled as a first-order system due to the inadequate stimulation and low signal to noise ratio. To overcome this difficulty, this paper proposes a novel nonparametric kernel-based method for the dynamic modelling of [Formula: see text] response to provide a more robust estimation. Twenty healthy non-athlete participants conducted treadmill exercises with monotonous stimulation (e.g., single step function as input). During the exercise, [Formula: see text] was measured and recorded by a popular portable gas analyser ([Formula: see text], COSMED). Based on the recorded data, a kernel-based estimation method was proposed to perform the nonparametric modelling of [Formula: see text]. For the proposed method, a properly selected kernel can represent the prior modelling information to reduce the dependence of comprehensive stimulations. Furthermore, due to the special elastic net formed by [Formula: see text] norm and kernelised [Formula: see text] norm, the estimations are smooth and concise. Additionally, the finite impulse response based nonparametric model which estimated by the proposed method can optimally select the order and fit better in terms of goodness-of-fit comparing to classical methods. Several kernels were introduced for the kernel-based [Formula: see text] modelling method. The results clearly indicated that the stable spline (SS) kernel has the best performance for [Formula: see text] modelling. Particularly, based on the experimental data from 20 participants, the estimated response from the proposed method with SS kernel was significantly better than the results from the benchmark method [i.e., prediction error method (PEM)] ([Formula: see text] vs [Formula: see text]). The proposed nonparametric modelling method is an effective method for the estimation of the impulse response of VO 2 -Speed system. Furthermore, the identified average nonparametric model method can dynamically predict [Formula: see text] response with acceptable accuracy during treadmill exercise.

Thirty Years of Nonparametric Item Response Theory.

ERIC Educational Resources Information Center

Molenaar, Ivo W.

2001-01-01

Discusses relationships between a mathematical measurement model and its real-world applications. Makes a distinction between large-scale data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Also evaluates nonparametric methods for estimating item response functions and…
Conditional Covariance-Based Nonparametric Multidimensionality Assessment.

ERIC Educational Resources Information Center

Stout, William; And Others

1996-01-01

Three nonparametric procedures that use estimates of covariances of item-pair responses conditioned on examinee trait level for assessing dimensionality of a test are described. The HCA/CCPROX, DIMTEST, and DETECT are applied to a dimensionality study of the Law School Admission Test. (SLD)
A parametric interpretation of Bayesian Nonparametric Inference from Gene Genealogies: Linking ecological, population genetics and evolutionary processes.

PubMed

Ponciano, José Miguel

2017-11-22

Using a nonparametric Bayesian approach Palacios and Minin (2013) dramatically improved the accuracy, precision of Bayesian inference of population size trajectories from gene genealogies. These authors proposed an extension of a Gaussian Process (GP) nonparametric inferential method for the intensity function of non-homogeneous Poisson processes. They found that not only the statistical properties of the estimators were improved with their method, but also, that key aspects of the demographic histories were recovered. The authors' work represents the first Bayesian nonparametric solution to this inferential problem because they specify a convenient prior belief without a particular functional form on the population trajectory. Their approach works so well and provides such a profound understanding of the biological process, that the question arises as to how truly "biology-free" their approach really is. Using well-known concepts of stochastic population dynamics, here I demonstrate that in fact, Palacios and Minin's GP model can be cast as a parametric population growth model with density dependence and environmental stochasticity. Making this link between population genetics and stochastic population dynamics modeling provides novel insights into eliciting biologically meaningful priors for the trajectory of the effective population size. The results presented here also bring novel understanding of GP as models for the evolution of a trait. Thus, the ecological principles foundation of Palacios and Minin (2013)'s prior adds to the conceptual and scientific value of these authors' inferential approach. I conclude this note by listing a series of insights brought about by this connection with Ecology. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.
Inter-annual and seasonal trends in cetacean distribution, density and abundance off southern California

NASA Astrophysics Data System (ADS)

Campbell, Gregory S.; Thomas, Len; Whitaker, Katherine; Douglas, Annie B.; Calambokidis, John; Hildebrand, John A.

2015-02-01

Trends in cetacean density and distribution off southern California were assessed through visual line-transect surveys during thirty-seven California Cooperative Oceanic Fisheries Investigations (CalCOFI) cruises from July 2004-November 2013. From sightings of the six most commonly encountered cetacean species, seasonal, annual and overall density estimates were calculated. Blue whales (Balaenoptera musculus), fin whales (Balaenoptera physalus) and humpback whales (Megaptera novaeangliae) were the most frequently sighted baleen whales with overall densities of 0.91/1000 km2 (CV=0.27), 2.73/1000 km2 (CV=0.19), and 1.17/1000 km2 (CV=0.21) respectively. Species specific density estimates, stratified by cruise, were analyzed using a generalized additive model to estimate long-term trends and correct for seasonal imbalances. Variances were estimated using a non-parametric bootstrap with one day of effort as the sampling unit. Blue whales were primarily observed during summer and fall while fin and humpback whales were observed year-round with peaks in density during summer and spring respectively. Short-beaked common dolphins (Delphinus delphis), Pacific white-sided dolphins (Lagenorhynchus obliquidens) and Dall's porpoise (Phocoenoidesdalli) were the most frequently encountered small cetaceans with overall densities of 705.83/1000 km2 (CV=0.22), 51.98/1000 km2 (CV=0.27), and 21.37/1000 km2 (CV=0.19) respectively. Seasonally, short-beaked common dolphins were most abundant in winter whereas Pacific white-sided dolphins and Dall's porpoise were most abundant during spring. There were no significant long-term changes in blue whale, fin whale, humpback whale, short-beaked common dolphin or Dall's porpoise densities while Pacific white-sided dolphins exhibited a significant decrease in density across the ten-year study. The results from this study were fundamentally consistent with earlier studies, but provide greater temporal and seasonal resolution.
A simulation study of nonparametric total deviation index as a measure of agreement based on quantile regression.

PubMed

Lin, Lawrence; Pan, Yi; Hedayat, A S; Barnhart, Huiman X; Haber, Michael

2016-01-01

Total deviation index (TDI) captures a prespecified quantile of the absolute deviation of paired observations from raters, observers, methods, assays, instruments, etc. We compare the performance of TDI using nonparametric quantile regression to the TDI assuming normality (Lin, 2000). This simulation study considers three distributions: normal, Poisson, and uniform at quantile levels of 0.8 and 0.9 for cases with and without contamination. Study endpoints include the bias of TDI estimates (compared with their respective theoretical values), standard error of TDI estimates (compared with their true simulated standard errors), and test size (compared with 0.05), and power. Nonparametric TDI using quantile regression, although it slightly underestimates and delivers slightly less power for data without contamination, works satisfactorily under all simulated cases even for moderate (say, ≥40) sample sizes. The performance of the TDI based on a quantile of 0.8 is in general superior to that of 0.9. The performances of nonparametric and parametric TDI methods are compared with a real data example. Nonparametric TDI can be very useful when the underlying distribution on the difference is not normal, especially when it has a heavy tail.
A Hybrid Index for Characterizing Drought Based on a Nonparametric Kernel Estimator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, Shengzhi; Huang, Qiang; Leng, Guoyong

This study develops a nonparametric multivariate drought index, namely, the Nonparametric Multivariate Standardized Drought Index (NMSDI), by considering the variations of both precipitation and streamflow. Building upon previous efforts in constructing Nonparametric Multivariate Drought Index, we use the nonparametric kernel estimator to derive the joint distribution of precipitation and streamflow, thus providing additional insights in drought index development. The proposed NMSDI are applied in the Wei River Basin (WRB), based on which the drought evolution characteristics are investigated. Results indicate: (1) generally, NMSDI captures the drought onset similar to Standardized Precipitation Index (SPI) and drought termination and persistence similar tomore » Standardized Streamflow Index (SSFI). The drought events identified by NMSDI match well with historical drought records in the WRB. The performances are also consistent with that by an existing Multivariate Standardized Drought Index (MSDI) at various timescales, confirming the validity of the newly constructed NMSDI in drought detections (2) An increasing risk of drought has been detected for the past decades, and will be persistent to a certain extent in future in most areas of the WRB; (3) the identified change points of annual NMSDI are mainly concentrated in the early 1970s and middle 1990s, coincident with extensive water use and soil reservation practices. This study highlights the nonparametric multivariable drought index, which can be used for drought detections and predictions efficiently and comprehensively.« less
Why preferring parametric forecasting to nonparametric methods?

PubMed

Jabot, Franck

2015-05-07

A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the fitted to data parametric model. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.
Geostatistical radar-raingauge combination with nonparametric correlograms: methodological considerations and application in Switzerland

NASA Astrophysics Data System (ADS)

Schiemann, R.; Erdin, R.; Willi, M.; Frei, C.; Berenguer, M.; Sempere-Torres, D.

2011-05-01

Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semivariogram models are fit from available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data fields. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial field under consideration, they yield an estimate of the semivariogram that is unbiased for small lag distances. This justifies the use of this estimation technique in geostatistical applications. Various formulations of geostatistical combination (Kriging) methods are used here for the construction of hourly precipitation grids for Switzerland based on data from a sparse realtime network of raingauges and from a spatially complete radar composite. Two variants of Ordinary Kriging (OK) are used to interpolate the sparse gauge observations. In both OK variants, the radar data are only used to determine the semivariogram model. One variant relies on a traditional parametric semivariogram estimate, whereas the other variant uses the nonparametric correlogram. The variants are tested for three cases and the impact of the semivariogram model on the Kriging prediction is illustrated. For the three test cases, the method using nonparametric correlograms performs equally well or better than the traditional method, and at the same time offers great practical advantages. Furthermore, two variants of Kriging with external drift (KED) are tested, both of which use the radar data to estimate nonparametric correlograms, and as the external drift variable. The first KED variant has been used previously for geostatistical radar-raingauge merging in Catalonia (Spain). The second variant is newly proposed here and is an extension of the first. Both variants are evaluated for the three test cases as well as an extended evaluation period. It is found that both methods yield merged fields of better quality than the original radar field or fields obtained by OK of gauge data. The newly suggested KED formulation is shown to be beneficial, in particular in mountainous regions where the quality of the Swiss radar composite is comparatively low. An analysis of the Kriging variances shows that none of the methods tested here provides a satisfactory uncertainty estimate. A suitable variable transformation is expected to improve this.
Geostatistical radar-raingauge combination with nonparametric correlograms: methodological considerations and application in Switzerland

NASA Astrophysics Data System (ADS)

Schiemann, R.; Erdin, R.; Willi, M.; Frei, C.; Berenguer, M.; Sempere-Torres, D.

2010-09-01

Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semivariogram models are fit from available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data fields. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial field under consideration, they yield an estimate of the semivariogram that is unbiased for small lag distances. This justifies the use of this estimation technique in geostatistical applications. Various formulations of geostatistical combination (Kriging) methods are used here for the construction of hourly precipitation grids for Switzerland based on data from a sparse realtime network of raingauges and from a spatially complete radar composite. Two variants of Ordinary Kriging (OK) are used to interpolate the sparse gauge observations. In both OK variants, the radar data are only used to determine the semivariogram model. One variant relies on a traditional parametric semivariogram estimate, whereas the other variant uses the nonparametric correlogram. The variants are tested for three cases and the impact of the semivariogram model on the Kriging prediction is illustrated. For the three test cases, the method using nonparametric correlograms performs equally well or better than the traditional method, and at the same time offers great practical advantages. Furthermore, two variants of Kriging with external drift (KED) are tested, both of which use the radar data to estimate nonparametric correlograms, and as the external drift variable. The first KED variant has been used previously for geostatistical radar-raingauge merging in Catalonia (Spain). The second variant is newly proposed here and is an extension of the first. Both variants are evaluated for the three test cases as well as an extended evaluation period. It is found that both methods yield merged fields of better quality than the original radar field or fields obtained by OK of gauge data. The newly suggested KED formulation is shown to be beneficial, in particular in mountainous regions where the quality of the Swiss radar composite is comparatively low. An analysis of the Kriging variances shows that none of the methods tested here provides a satisfactory uncertainty estimate. A suitable variable transformation is expected to improve this.
Estimation from PET data of transient changes in dopamine concentration induced by alcohol: support for a non-parametric signal estimation method

NASA Astrophysics Data System (ADS)

Constantinescu, C. C.; Yoder, K. K.; Kareken, D. A.; Bouman, C. A.; O'Connor, S. J.; Normandin, M. D.; Morris, E. D.

2008-03-01

We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest & activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves (FDA(t)) and the change in binding potential (ΔBP). The veracity of the FDA(t) curves recovered by nonparametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) ΔBP should decline with increasing DA peak time, (2) ΔBP should increase as the strength of the temporal correlation between FDA(t) and the free raclopride (FRAC(t)) curve increases, (3) ΔBP should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [11C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, nonparametric ntPET was applied to recover FDA(t), and binding potential values were determined. Kendall rank-correlation analysis confirmed that the FDA(t) data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of FDA(t). Application of nonparametric ntPET may yield important insights into how alterations in timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.
Unveiling acoustic physics of the CMB using nonparametric estimation of the temperature angular power spectrum for Planck

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aghamousa, Amir; Shafieloo, Arman; Arjunwadkar, Mihir

2015-02-01

Estimation of the angular power spectrum is one of the important steps in Cosmic Microwave Background (CMB) data analysis. Here, we present a nonparametric estimate of the temperature angular power spectrum for the Planck 2013 CMB data. The method implemented in this work is model-independent, and allows the data, rather than the model, to dictate the fit. Since one of the main targets of our analysis is to test the consistency of the ΛCDM model with Planck 2013 data, we use the nuisance parameters associated with the best-fit ΛCDM angular power spectrum to remove foreground contributions from the data atmore » multipoles ℓ ≥50. We thus obtain a combined angular power spectrum data set together with the full covariance matrix, appropriately weighted over frequency channels. Our subsequent nonparametric analysis resolves six peaks (and five dips) up to ℓ ∼1850 in the temperature angular power spectrum. We present uncertainties in the peak/dip locations and heights at the 95% confidence level. We further show how these reflect the harmonicity of acoustic peaks, and can be used for acoustic scale estimation. Based on this nonparametric formalism, we found the best-fit ΛCDM model to be at 36% confidence distance from the center of the nonparametric confidence set—this is considerably larger than the confidence distance (9%) derived earlier from a similar analysis of the WMAP 7-year data. Another interesting result of our analysis is that at low multipoles, the Planck data do not suggest any upturn, contrary to the expectation based on the integrated Sachs-Wolfe contribution in the best-fit ΛCDM cosmology.« less
Semi-nonparametric VaR forecasts for hedge funds during the recent crisis

NASA Astrophysics Data System (ADS)

Del Brio, Esther B.; Mora-Valencia, Andrés; Perote, Javier

2014-05-01

The need to provide accurate value-at-risk (VaR) forecasting measures has triggered an important literature in econophysics. Although these accurate VaR models and methodologies are particularly demanded for hedge fund managers, there exist few articles specifically devoted to implement new techniques in hedge fund returns VaR forecasting. This article advances in these issues by comparing the performance of risk measures based on parametric distributions (the normal, Student’s t and skewed-t), semi-nonparametric (SNP) methodologies based on Gram-Charlier (GC) series and the extreme value theory (EVT) approach. Our results show that normal-, Student’s t- and Skewed t- based methodologies fail to forecast hedge fund VaR, whilst SNP and EVT approaches accurately success on it. We extend these results to the multivariate framework by providing an explicit formula for the GC copula and its density that encompasses the Gaussian copula and accounts for non-linear dependences. We show that the VaR obtained by the meta GC accurately captures portfolio risk and outperforms regulatory VaR estimates obtained through the meta Gaussian and Student’s t distributions.
Age-class structure and variability of two populations of the bluemask darter etheostoma (Doration) sp.

USGS Publications Warehouse

Simmons, J.W.; Layzer, J.B.; Smith, D.D.

2008-01-01

The bluemask darter Etheostoma (Doration) sp. is an endangered fish endemic to the upper Caney Fork system in the Cumberland River drainage in central Tennessee. Darters (Etheostoma spp.) are typically short-lived and exhibit rapid growth that quickly decreases with age. Consequently, estimating age of darters from length-frequency distributions can be difficult and subjective. We used a nonparametric kernel density estimator to reduce subjectivity in estimating ages of bluemask darters. Data were collected from a total of 2926 bluemask darters from the Collins River throughout three growing seasons. Additionally, data were collected from 842 bluemask darters from the Rocky River during one growing season. Analysis of length-frequencies indicated the presence of four age classes in both rivers. In each river, the majority of the population was comprised of fish 0.05). In both rivers, females were more abundant than males.
A comparative study between nonlinear regression and nonparametric approaches for modelling Phalaris paradoxa seedling emergence

USDA-ARS?s Scientific Manuscript database

Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
Asymptotics of nonparametric L-1 regression models with dependent data

PubMed Central

ZHAO, ZHIBIAO; WEI, YING; LIN, DENNIS K.J.

2013-01-01

We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile estimates. The obtained Bahadur representations provide deep insights into the asymptotic behavior of the estimates. Our main theoretical development is based on studying the modulus of continuity of kernel weighted empirical process through a coupling argument. Progesterone data is used for an illustration. PMID:24955016
A fully nonparametric estimator of the marginal survival function based on case–control clustered age-at-onset data

PubMed Central

Gorfine, Malka; Bordo, Nadia; Hsu, Li

2017-01-01

Summary Consider a popular case–control family study where individuals with a disease under study (case probands) and individuals who do not have the disease (control probands) are randomly sampled from a well-defined population. Possibly right-censored age at onset and disease status are observed for both probands and their relatives. For example, case probands are men diagnosed with prostate cancer, control probands are men free of prostate cancer, and the prostate cancer history of the fathers of the probands is also collected. Inherited genetic susceptibility, shared environment, and common behavior lead to correlation among the outcomes within a family. In this article, a novel nonparametric estimator of the marginal survival function is provided. The estimator is defined in the presence of intra-cluster dependence, and is based on consistent smoothed kernel estimators of conditional survival functions. By simulation, it is shown that the proposed estimator performs very well in terms of bias. The utility of the estimator is illustrated by the analysis of case–control family data of early onset prostate cancer. To our knowledge, this is the first article that provides a fully nonparametric marginal survival estimator based on case–control clustered age-at-onset data. PMID:27436674
A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data.

PubMed

Ye, Xin; Wang, Ke; Zou, Yajie; Lord, Dominique

2018-01-01

This paper develops a semi-nonparametric Poisson regression model to analyze motor vehicle crash frequency data collected from rural multilane highway segments in California, US. Motor vehicle crash frequency on rural highway is a topic of interest in the area of transportation safety due to higher driving speeds and the resultant severity level. Unlike the traditional Negative Binomial (NB) model, the semi-nonparametric Poisson regression model can accommodate an unobserved heterogeneity following a highly flexible semi-nonparametric (SNP) distribution. Simulation experiments are conducted to demonstrate that the SNP distribution can well mimic a large family of distributions, including normal distributions, log-gamma distributions, bimodal and trimodal distributions. Empirical estimation results show that such flexibility offered by the SNP distribution can greatly improve model precision and the overall goodness-of-fit. The semi-nonparametric distribution can provide a better understanding of crash data structure through its ability to capture potential multimodality in the distribution of unobserved heterogeneity. When estimated coefficients in empirical models are compared, SNP and NB models are found to have a substantially different coefficient for the dummy variable indicating the lane width. The SNP model with better statistical performance suggests that the NB model overestimates the effect of lane width on crash frequency reduction by 83.1%.
Pointwise nonparametric maximum likelihood estimator of stochastically ordered survivor functions

PubMed Central

Park, Yongseok; Taylor, Jeremy M. G.; Kalbfleisch, John D.

2012-01-01

In this paper, we consider estimation of survivor functions from groups of observations with right-censored data when the groups are subject to a stochastic ordering constraint. Many methods and algorithms have been proposed to estimate distribution functions under such restrictions, but none have completely satisfactory properties when the observations are censored. We propose a pointwise constrained nonparametric maximum likelihood estimator, which is defined at each time t by the estimates of the survivor functions subject to constraints applied at time t only. We also propose an efficient method to obtain the estimator. The estimator of each constrained survivor function is shown to be nonincreasing in t, and its consistency and asymptotic distribution are established. A simulation study suggests better small and large sample properties than for alternative estimators. An example using prostate cancer data illustrates the method. PMID:23843661
Nonparametric predictive inference for combining diagnostic tests with parametric copula

NASA Astrophysics Data System (ADS)

Muhammad, Noryanti; Coolen, F. P. A.; Coolen-Maturi, T.

2017-09-01

Measuring the accuracy of diagnostic tests is crucial in many application areas including medicine and health care. The Receiver Operating Characteristic (ROC) curve is a popular statistical tool for describing the performance of diagnostic tests. The area under the ROC curve (AUC) is often used as a measure of the overall performance of the diagnostic test. In this paper, we interest in developing strategies for combining test results in order to increase the diagnostic accuracy. We introduce nonparametric predictive inference (NPI) for combining two diagnostic test results with considering dependence structure using parametric copula. NPI is a frequentist statistical framework for inference on a future observation based on past data observations. NPI uses lower and upper probabilities to quantify uncertainty and is based on only a few modelling assumptions. While copula is a well-known statistical concept for modelling dependence of random variables. A copula is a joint distribution function whose marginals are all uniformly distributed and it can be used to model the dependence separately from the marginal distributions. In this research, we estimate the copula density using a parametric method which is maximum likelihood estimator (MLE). We investigate the performance of this proposed method via data sets from the literature and discuss results to show how our method performs for different family of copulas. Finally, we briefly outline related challenges and opportunities for future research.
Nonparametric Fine Tuning of Mixtures: Application to Non-Life Insurance Claims Distribution Estimation

NASA Astrophysics Data System (ADS)

Sardet, Laure; Patilea, Valentin

When pricing a specific insurance premium, actuary needs to evaluate the claims cost distribution for the warranty. Traditional actuarial methods use parametric specifications to model claims distribution, like lognormal, Weibull and Pareto laws. Mixtures of such distributions allow to improve the flexibility of the parametric approach and seem to be quite well-adapted to capture the skewness, the long tails as well as the unobserved heterogeneity among the claims. In this paper, instead of looking for a finely tuned mixture with many components, we choose a parsimonious mixture modeling, typically a two or three-component mixture. Next, we use the mixture cumulative distribution function (CDF) to transform data into the unit interval where we apply a beta-kernel smoothing procedure. A bandwidth rule adapted to our methodology is proposed. Finally, the beta-kernel density estimate is back-transformed to recover an estimate of the original claims density. The beta-kernel smoothing provides an automatic fine-tuning of the parsimonious mixture and thus avoids inference in more complex mixture models with many parameters. We investigate the empirical performance of the new method in the estimation of the quantiles with simulated nonnegative data and the quantiles of the individual claims distribution in a non-life insurance application.

Simulated maximum likelihood method for estimating kinetic rates in gene expression.

PubMed

Tian, Tianhai; Xu, Songlin; Gao, Junbin; Burrage, Kevin

2007-01-01

Kinetic rate in gene expression is a key measurement of the stability of gene products and gives important information for the reconstruction of genetic regulatory networks. Recent developments in experimental technologies have made it possible to measure the numbers of transcripts and protein molecules in single cells. Although estimation methods based on deterministic models have been proposed aimed at evaluating kinetic rates from experimental observations, these methods cannot tackle noise in gene expression that may arise from discrete processes of gene expression, small numbers of mRNA transcript, fluctuations in the activity of transcriptional factors and variability in the experimental environment. In this paper, we develop effective methods for estimating kinetic rates in genetic regulatory networks. The simulated maximum likelihood method is used to evaluate parameters in stochastic models described by either stochastic differential equations or discrete biochemical reactions. Different types of non-parametric density functions are used to measure the transitional probability of experimental observations. For stochastic models described by biochemical reactions, we propose to use the simulated frequency distribution to evaluate the transitional density based on the discrete nature of stochastic simulations. The genetic optimization algorithm is used as an efficient tool to search for optimal reaction rates. Numerical results indicate that the proposed methods can give robust estimations of kinetic rates with good accuracy.
Probit vs. semi-nonparametric estimation: examining the role of disability on institutional entry for older adults.

PubMed

Sharma, Andy

2017-06-01

The purpose of this study was to showcase an advanced methodological approach to model disability and institutional entry. Both of these are important areas to investigate given the on-going aging of the United States population. By 2020, approximately 15% of the population will be 65 years and older. Many of these older adults will experience disability and require formal care. A probit analysis was employed to determine which disabilities were associated with admission into an institution (i.e. long-term care). Since this framework imposes strong distributional assumptions, misspecification leads to inconsistent estimators. To overcome such a short-coming, this analysis extended the probit framework by employing an advanced semi-nonparamertic maximum likelihood estimation utilizing Hermite polynomial expansions. Specification tests show semi-nonparametric estimation is preferred over probit. In terms of the estimates, semi-nonparametric ratios equal 42 for cognitive difficulty, 64 for independent living, and 111 for self-care disability while probit yields much smaller estimates of 19, 30, and 44, respectively. Public health professionals can use these results to better understand why certain interventions have not shown promise. Equally important, healthcare workers can use this research to evaluate which type of treatment plans may delay institutionalization and improve the quality of life for older adults. Implications for rehabilitation With on-going global aging, understanding the association between disability and institutional entry is important in devising successful rehabilitation interventions. Semi-nonparametric is preferred to probit and shows ambulatory and cognitive impairments present high risk for institutional entry (long-term care). Informal caregiving and home-based care require further examination as forms of rehabilitation/therapy for certain types of disabilities.
Sieve estimation of Cox models with latent structures.

PubMed

Cao, Yongxiu; Huang, Jian; Liu, Yanyan; Zhao, Xingqiu

2016-12-01

This article considers sieve estimation in the Cox model with an unknown regression structure based on right-censored data. We propose a semiparametric pursuit method to simultaneously identify and estimate linear and nonparametric covariate effects based on B-spline expansions through a penalized group selection method with concave penalties. We show that the estimators of the linear effects and the nonparametric component are consistent. Furthermore, we establish the asymptotic normality of the estimator of the linear effects. To compute the proposed estimators, we develop a modified blockwise majorization descent algorithm that is efficient and easy to implement. Simulation studies demonstrate that the proposed method performs well in finite sample situations. We also use the primary biliary cirrhosis data to illustrate its application. © 2016, The International Biometric Society.
Posterior consistency in conditional distribution estimation

PubMed Central

Pati, Debdeep; Dunson, David B.; Tokdar, Surya T.

2014-01-01

A wide variety of priors have been proposed for nonparametric Bayesian estimation of conditional distributions, and there is a clear need for theorems providing conditions on the prior for large support, as well as posterior consistency. Estimation of an uncountable collection of conditional distributions across different regions of the predictor space is a challenging problem, which differs in some important ways from density and mean regression estimation problems. Defining various topologies on the space of conditional distributions, we provide sufficient conditions for posterior consistency focusing on a broad class of priors formulated as predictor-dependent mixtures of Gaussian kernels. This theory is illustrated by showing that the conditions are satisfied for a class of generalized stick-breaking process mixtures in which the stick-breaking lengths are monotone, differentiable functions of a continuous stochastic process. We also provide a set of sufficient conditions for the case where stick-breaking lengths are predictor independent, such as those arising from a fixed Dirichlet process prior. PMID:25067858
Nonparametric regression applied to quantitative structure-activity relationships

PubMed

Constans; Hirst

2000-03-01

Several nonparametric regressors have been applied to modeling quantitative structure-activity relationship (QSAR) data. The simplest regressor, the Nadaraya-Watson, was assessed in a genuine multivariate setting. Other regressors, the local linear and the shifted Nadaraya-Watson, were implemented within additive models--a computationally more expedient approach, better suited for low-density designs. Performances were benchmarked against the nonlinear method of smoothing splines. A linear reference point was provided by multilinear regression (MLR). Variable selection was explored using systematic combinations of different variables and combinations of principal components. For the data set examined, 47 inhibitors of dopamine beta-hydroxylase, the additive nonparametric regressors have greater predictive accuracy (as measured by the mean absolute error of the predictions or the Pearson correlation in cross-validation trails) than MLR. The use of principal components did not improve the performance of the nonparametric regressors over use of the original descriptors, since the original descriptors are not strongly correlated. It remains to be seen if the nonparametric regressors can be successfully coupled with better variable selection and dimensionality reduction in the context of high-dimensional QSARs.
Three Classes of Nonparametric Differential Step Functioning Effect Estimators

ERIC Educational Resources Information Center

Penfield, Randall D.

2008-01-01

The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…
Estimation of Spatial Dynamic Nonparametric Durbin Models with Fixed Effects

ERIC Educational Resources Information Center

Qian, Minghui; Hu, Ridong; Chen, Jianwei

2016-01-01

Spatial panel data models have been widely studied and applied in both scientific and social science disciplines, especially in the analysis of spatial influence. In this paper, we consider the spatial dynamic nonparametric Durbin model (SDNDM) with fixed effects, which takes the nonlinear factors into account base on the spatial dynamic panel…
Nonparametric EROC analysis for observer performance evaluation on joint detection and estimation tasks

NASA Astrophysics Data System (ADS)

Wunderlich, Adam; Goossens, Bart

2014-03-01

The majority of the literature on task-based image quality assessment has focused on lesion detection tasks, using the receiver operating characteristic (ROC) curve, or related variants, to measure performance. However, since many clinical image evaluation tasks involve both detection and estimation (e.g., estimation of kidney stone composition, estimation of tumor size), there is a growing interest in performance evaluation for joint detection and estimation tasks. To evaluate observer performance on such tasks, Clarkson introduced the estimation ROC (EROC) curve, and the area under the EROC curve as a summary figure of merit. In the present work, we propose nonparametric estimators for practical EROC analysis from experimental data, including estimators for the area under the EROC curve and its variance. The estimators are illustrated with a practical example comparing MRI images reconstructed from different k-space sampling trajectories.
Sparsity-promoting and edge-preserving maximum a posteriori estimators in non-parametric Bayesian inverse problems

NASA Astrophysics Data System (ADS)

Agapiou, Sergios; Burger, Martin; Dashti, Masoumeh; Helin, Tapio

2018-04-01

We consider the inverse problem of recovering an unknown functional parameter u in a separable Banach space, from a noisy observation vector y of its image through a known possibly non-linear map {{\\mathcal G}} . We adopt a Bayesian approach to the problem and consider Besov space priors (see Lassas et al (2009 Inverse Problems Imaging 3 87-122)), which are well-known for their edge-preserving and sparsity-promoting properties and have recently attracted wide attention especially in the medical imaging community. Our key result is to show that in this non-parametric setup the maximum a posteriori (MAP) estimates are characterized by the minimizers of a generalized Onsager-Machlup functional of the posterior. This is done independently for the so-called weak and strong MAP estimates, which as we show coincide in our context. In addition, we prove a form of weak consistency for the MAP estimators in the infinitely informative data limit. Our results are remarkable for two reasons: first, the prior distribution is non-Gaussian and does not meet the smoothness conditions required in previous research on non-parametric MAP estimates. Second, the result analytically justifies existing uses of the MAP estimate in finite but high dimensional discretizations of Bayesian inverse problems with the considered Besov priors.
Comparison of methods for estimating the attributable risk in the context of survival analysis.

PubMed

Gassama, Malamine; Bénichou, Jacques; Dartois, Laureen; Thiébaut, Anne C M

2017-01-23

The attributable risk (AR) measures the proportion of disease cases that can be attributed to an exposure in the population. Several definitions and estimation methods have been proposed for survival data. Using simulations, we compared four methods for estimating AR defined in terms of survival functions: two nonparametric methods based on Kaplan-Meier's estimator, one semiparametric based on Cox's model, and one parametric based on the piecewise constant hazards model, as well as one simpler method based on estimated exposure prevalence at baseline and Cox's model hazard ratio. We considered a fixed binary exposure with varying exposure probabilities and strengths of association, and generated event times from a proportional hazards model with constant or monotonic (decreasing or increasing) Weibull baseline hazard, as well as from a nonproportional hazards model. We simulated 1,000 independent samples of size 1,000 or 10,000. The methods were compared in terms of mean bias, mean estimated standard error, empirical standard deviation and 95% confidence interval coverage probability at four equally spaced time points. Under proportional hazards, all five methods yielded unbiased results regardless of sample size. Nonparametric methods displayed greater variability than other approaches. All methods showed satisfactory coverage except for nonparametric methods at the end of follow-up for a sample size of 1,000 especially. With nonproportional hazards, nonparametric methods yielded similar results to those under proportional hazards, whereas semiparametric and parametric approaches that both relied on the proportional hazards assumption performed poorly. These methods were applied to estimate the AR of breast cancer due to menopausal hormone therapy in 38,359 women of the E3N cohort. In practice, our study suggests to use the semiparametric or parametric approaches to estimate AR as a function of time in cohort studies if the proportional hazards assumption appears appropriate.
Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample

PubMed Central

Wang, Ching-Yun; Song, Xiao

2017-01-01

SUMMARY Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women’s Health Initiative. PMID:27546625
Nonparametric estimation of median survival times with applications to multi-site or multi-center studies.

PubMed

Rahbar, Mohammad H; Choi, Sangbum; Hong, Chuan; Zhu, Liang; Jeon, Sangchoon; Gardiner, Joseph C

2018-01-01

We propose a nonparametric shrinkage estimator for the median survival times from several independent samples of right-censored data, which combines the samples and hypothesis information to improve the efficiency. We compare efficiency of the proposed shrinkage estimation procedure to unrestricted estimator and combined estimator through extensive simulation studies. Our results indicate that performance of these estimators depends on the strength of homogeneity of the medians. When homogeneity holds, the combined estimator is the most efficient estimator. However, it becomes inconsistent when homogeneity fails. On the other hand, the proposed shrinkage estimator remains efficient. Its efficiency decreases as the equality of the survival medians is deviated, but is expected to be as good as or equal to the unrestricted estimator. Our simulation studies also indicate that the proposed shrinkage estimator is robust to moderate levels of censoring. We demonstrate application of these methods to estimating median time for trauma patients to receive red blood cells in the Prospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study.
Nonparametric estimation of median survival times with applications to multi-site or multi-center studies

PubMed Central

Choi, Sangbum; Hong, Chuan; Zhu, Liang; Jeon, Sangchoon; Gardiner, Joseph C.

2018-01-01

We propose a nonparametric shrinkage estimator for the median survival times from several independent samples of right-censored data, which combines the samples and hypothesis information to improve the efficiency. We compare efficiency of the proposed shrinkage estimation procedure to unrestricted estimator and combined estimator through extensive simulation studies. Our results indicate that performance of these estimators depends on the strength of homogeneity of the medians. When homogeneity holds, the combined estimator is the most efficient estimator. However, it becomes inconsistent when homogeneity fails. On the other hand, the proposed shrinkage estimator remains efficient. Its efficiency decreases as the equality of the survival medians is deviated, but is expected to be as good as or equal to the unrestricted estimator. Our simulation studies also indicate that the proposed shrinkage estimator is robust to moderate levels of censoring. We demonstrate application of these methods to estimating median time for trauma patients to receive red blood cells in the Prospective Observational Multi-center Major Trauma Transfusion (PROMMTT) study. PMID:29772007
A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

USGS Publications Warehouse

Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

2012-01-01

Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.
Parametric and non-parametric modeling of short-term synaptic plasticity. Part I: computational study

PubMed Central

Marmarelis, Vasilis Z.; Berger, Theodore W.

2009-01-01

Parametric and non-parametric modeling methods are combined to study the short-term plasticity (STP) of synapses in the central nervous system (CNS). The nonlinear dynamics of STP are modeled by means: (1) previously proposed parametric models based on mechanistic hypotheses and/or specific dynamical processes, and (2) non-parametric models (in the form of Volterra kernels) that transforms the presynaptic signals into postsynaptic signals. In order to synergistically use the two approaches, we estimate the Volterra kernels of the parametric models of STP for four types of synapses using synthetic broadband input–output data. Results show that the non-parametric models accurately and efficiently replicate the input–output transformations of the parametric models. Volterra kernels provide a general and quantitative representation of the STP. PMID:18506609
The linear transformation model with frailties for the analysis of item response times.

PubMed

Wang, Chun; Chang, Hua-Hua; Douglas, Jeffrey A

2013-02-01

The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi-parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non-parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non-parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees' latent speeds; whereas the non-parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box-Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two-stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non-parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested. © 2012 The British Psychological Society.
Multiple imputation methods for nonparametric inference on cumulative incidence with missing cause of failure

PubMed Central

Lee, Minjung; Dignam, James J.; Han, Junhee

2014-01-01

We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107
Using multiple decrement models to estimate risk and morbidity from specific AIDS illnesses. Multicenter AIDS Cohort Study (MACS).

PubMed

Hoover, D R; Peng, Y; Saah, A J; Detels, R R; Day, R S; Phair, J P

A simple non-parametric approach is developed to simultaneously estimate net incidence and morbidity time from specific AIDS illnesses in populations at high risk for death from these illnesses and other causes. The disease-death process has four-stages that can be recast as two sandwiching three-state multiple decrement processes. Non-parametric estimation of net incidence and morbidity time with error bounds are achieved from these sandwiching models through modification of methods from Aalen and Greenwood, and bootstrapping. An application to immunosuppressed HIV-1 infected homosexual men reveals that cytomegalovirus disease, Kaposi's sarcoma and Pneumocystis pneumonia are likely to occur and cause significant morbidity time.
A mixture model for robust registration in Kinect sensor

NASA Astrophysics Data System (ADS)

Peng, Li; Zhou, Huabing; Zhu, Shengguo

2018-03-01

The Microsoft Kinect sensor has been widely used in many applications, but it suffers from the drawback of low registration precision between color image and depth image. In this paper, we present a robust method to improve the registration precision by a mixture model that can handle multiply images with the nonparametric model. We impose non-parametric geometrical constraints on the correspondence, as a prior distribution, in a reproducing kernel Hilbert space (RKHS).The estimation is performed by the EM algorithm which by also estimating the variance of the prior model is able to obtain good estimates. We illustrate the proposed method on the public available dataset. The experimental results show that our approach outperforms the baseline methods.
The estimation of time-varying risks in asset pricing modelling using B-Spline method

NASA Astrophysics Data System (ADS)

Nurjannah; Solimun; Rinaldo, Adji

2017-12-01

Asset pricing modelling has been extensively studied in the past few decades to explore the risk-return relationship. The asset pricing literature typically assumed a static risk-return relationship. However, several studies found few anomalies in the asset pricing modelling which captured the presence of the risk instability. The dynamic model is proposed to offer a better model. The main problem highlighted in the dynamic model literature is that the set of conditioning information is unobservable and therefore some assumptions have to be made. Hence, the estimation requires additional assumptions about the dynamics of risk. To overcome this problem, the nonparametric estimators can also be used as an alternative for estimating risk. The flexibility of the nonparametric setting avoids the problem of misspecification derived from selecting a functional form. This paper investigates the estimation of time-varying asset pricing model using B-Spline, as one of nonparametric approach. The advantages of spline method is its computational speed and simplicity, as well as the clarity of controlling curvature directly. The three popular asset pricing models will be investigated namely CAPM (Capital Asset Pricing Model), Fama-French 3-factors model and Carhart 4-factors model. The results suggest that the estimated risks are time-varying and not stable overtime which confirms the risk instability anomaly. The results is more pronounced in Carhart’s 4-factors model.

Nonparametric projections of forest and rangeland condition indicators: A technical document supporting the 2005 USDA Forest Service RPA Assessment Update

Treesearch

John Hof; Curtis Flather; Tony Baltic; Rudy King

2006-01-01

The 2005 Forest and Rangeland Condition Indicator Model is a set of classification trees for forest and rangeland condition indicators at the national scale. This report documents the development of the database and the nonparametric statistical estimation for this analytical structure, with emphasis on three special characteristics of condition indicator production...
Complementary nonparametric analysis of covariance for logistic regression in a randomized clinical trial setting.

PubMed

Tangen, C M; Koch, G G

1999-03-01

In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.
Automated Breast Density Computation in Digital Mammography and Digital Breast Tomosynthesis: Influence on Mean Glandular Dose and BIRADS Density Categorization.

PubMed

Castillo-García, Maria; Chevalier, Margarita; Garayoa, Julia; Rodriguez-Ruiz, Alejandro; García-Pinto, Diego; Valverde, Julio

2017-07-01

The study aimed to compare the breast density estimates from two algorithms on full-field digital mammography (FFDM) and digital breast tomosynthesis (DBT) and to analyze the clinical implications. We selected 561 FFDM and DBT examinations from patients without breast pathologies. Two versions of a commercial software (Quantra 2D and Quantra 3D) calculated the volumetric breast density automatically in FFDM and DBT, respectively. Other parameters such as area breast density and total breast volume were evaluated. We compared the results from both algorithms using the Mann-Whitney U non-parametric test and the Spearman's rank coefficient for data correlation analysis. Mean glandular dose (MGD) was calculated following the methodology proposed by Dance et al. Measurements with both algorithms are well correlated (r ≥ 0.77). However, there are statistically significant differences between the medians (P < 0.05) of most parameters. The volumetric and area breast density median values from FFDM are, respectively, 8% and 77% higher than DBT estimations. Both algorithms classify 35% and 55% of breasts into BIRADS (Breast Imaging-Reporting and Data System) b and c categories, respectively. There are no significant differences between the MGD calculated using the breast density from each algorithm. DBT delivers higher MGD than FFDM, with a lower difference (5%) for breasts in the BIRADS d category. MGD is, on average, 6% higher than values obtained with the breast glandularity proposed by Dance et al. Breast density measurements from both algorithms lead to equivalent BIRADS classification and MGD values, hence showing no difference in clinical outcomes. The median MGD values of FFDM and DBT examinations are similar for dense breasts (BIRADS d category). Published by Elsevier Inc.
Transformation-invariant and nonparametric monotone smooth estimation of ROC curves.

PubMed

Du, Pang; Tang, Liansheng

2009-01-30

When a new diagnostic test is developed, it is of interest to evaluate its accuracy in distinguishing diseased subjects from non-diseased subjects. The accuracy of the test is often evaluated by receiver operating characteristic (ROC) curves. Smooth ROC estimates are often preferable for continuous test results when the underlying ROC curves are in fact continuous. Nonparametric and parametric methods have been proposed by various authors to obtain smooth ROC curve estimates. However, there are certain drawbacks with the existing methods. Parametric methods need specific model assumptions. Nonparametric methods do not always satisfy the inherent properties of the ROC curves, such as monotonicity and transformation invariance. In this paper we propose a monotone spline approach to obtain smooth monotone ROC curves. Our method ensures important inherent properties of the underlying ROC curves, which include monotonicity, transformation invariance, and boundary constraints. We compare the finite sample performance of the newly proposed ROC method with other ROC smoothing methods in large-scale simulation studies. We illustrate our method through a real life example. Copyright (c) 2008 John Wiley & Sons, Ltd.
Boosted Multivariate Trees for Longitudinal Data

PubMed Central

Pande, Amol; Li, Liang; Rajeswaran, Jeevanantham; Ehrlinger, John; Kogalur, Udaya B.; Blackstone, Eugene H.; Ishwaran, Hemant

2017-01-01

Machine learning methods provide a powerful approach for analyzing longitudinal data in which repeated measurements are observed for a subject over time. We boost multivariate trees to fit a novel flexible semi-nonparametric marginal model for longitudinal data. In this model, features are assumed to be nonparametric, while feature-time interactions are modeled semi-nonparametrically utilizing P-splines with estimated smoothing parameter. In order to avoid overfitting, we describe a relatively simple in sample cross-validation method which can be used to estimate the optimal boosting iteration and which has the surprising added benefit of stabilizing certain parameter estimates. Our new multivariate tree boosting method is shown to be highly flexible, robust to covariance misspecification and unbalanced designs, and resistant to overfitting in high dimensions. Feature selection can be used to identify important features and feature-time interactions. An application to longitudinal data of forced 1-second lung expiratory volume (FEV1) for lung transplant patients identifies an important feature-time interaction and illustrates the ease with which our method can find complex relationships in longitudinal data. PMID:29249866
Application of the LSQR algorithm in non-parametric estimation of aerosol size distribution

NASA Astrophysics Data System (ADS)

He, Zhenzong; Qi, Hong; Lew, Zhongyuan; Ruan, Liming; Tan, Heping; Luo, Kun

2016-05-01

Based on the Least Squares QR decomposition (LSQR) algorithm, the aerosol size distribution (ASD) is retrieved in non-parametric approach. The direct problem is solved by the Anomalous Diffraction Approximation (ADA) and the Lambert-Beer Law. An optimal wavelength selection method is developed to improve the retrieval accuracy of the ASD. The proposed optimal wavelength set is selected by the method which can make the measurement signals sensitive to wavelength and decrease the degree of the ill-condition of coefficient matrix of linear systems effectively to enhance the anti-interference ability of retrieval results. Two common kinds of monomodal and bimodal ASDs, log-normal (L-N) and Gamma distributions, are estimated, respectively. Numerical tests show that the LSQR algorithm can be successfully applied to retrieve the ASD with high stability in the presence of random noise and low susceptibility to the shape of distributions. Finally, the experimental measurement ASD over Harbin in China is recovered reasonably. All the results confirm that the LSQR algorithm combined with the optimal wavelength selection method is an effective and reliable technique in non-parametric estimation of ASD.
Statistical inference based on the nonparametric maximum likelihood estimator under double-truncation.

PubMed

Emura, Takeshi; Konno, Yoshihiko; Michimae, Hirofumi

2015-07-01

Doubly truncated data consist of samples whose observed values fall between the right- and left- truncation limits. With such samples, the distribution function of interest is estimated using the nonparametric maximum likelihood estimator (NPMLE) that is obtained through a self-consistency algorithm. Owing to the complicated asymptotic distribution of the NPMLE, the bootstrap method has been suggested for statistical inference. This paper proposes a closed-form estimator for the asymptotic covariance function of the NPMLE, which is computationally attractive alternative to bootstrapping. Furthermore, we develop various statistical inference procedures, such as confidence interval, goodness-of-fit tests, and confidence bands to demonstrate the usefulness of the proposed covariance estimator. Simulations are performed to compare the proposed method with both the bootstrap and jackknife methods. The methods are illustrated using the childhood cancer dataset.
Nonparametric Discrete Survival Function Estimation with Uncertain Endpoints Using an Internal Validation Subsample

PubMed Central

Zee, Jarcy; Xie, Sharon X.

2015-01-01

Summary When a true survival endpoint cannot be assessed for some subjects, an alternative endpoint that measures the true endpoint with error may be collected, which often occurs when obtaining the true endpoint is too invasive or costly. We develop an estimated likelihood function for the situation where we have both uncertain endpoints for all participants and true endpoints for only a subset of participants. We propose a nonparametric maximum estimated likelihood estimator of the discrete survival function of time to the true endpoint. We show that the proposed estimator is consistent and asymptotically normal. We demonstrate through extensive simulations that the proposed estimator has little bias compared to the naïve Kaplan-Meier survival function estimator, which uses only uncertain endpoints, and more efficient with moderate missingness compared to the complete-case Kaplan-Meier survival function estimator, which uses only available true endpoints. Finally, we apply the proposed method to a dataset for estimating the risk of developing Alzheimer's disease from the Alzheimer's Disease Neuroimaging Initiative. PMID:25916510
Robust neural network with applications to credit portfolio data analysis.

PubMed

Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun

2010-01-01

In this article, we study nonparametric conditional quantile estimation via neural network structure. We proposed an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in real data application demonstrate the advantage of the newly proposed procedure.
Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample.

PubMed

Wang, Ching-Yun; Song, Xiao

2016-11-01

Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Modeling SF-6D Hong Kong standard gamble health state preference data using a nonparametric Bayesian method.

PubMed

Kharroubi, Samer A; Brazier, John E; McGhee, Sarah

2013-01-01

This article reports on the findings from applying a recently described approach to modeling health state valuation data and the impact of the respondent characteristics on health state valuations. The approach applies a nonparametric model to estimate a Bayesian six-dimensional health state short form (derived from short-form 36 health survey) health state valuation algorithm. A sample of 197 states defined by the six-dimensional health state short form (derived from short-form 36 health survey)has been valued by a representative sample of the Hong Kong general population by using standard gamble. The article reports the application of the nonparametric model and compares it to the original model estimated by using a conventional parametric random effects model. The two models are compared theoretically and in terms of empirical performance. Advantages of the nonparametric model are that it can be used to predict scores in populations with different distributions of characteristics than observed in the survey sample and that it allows for the impact of respondent characteristics to vary by health state (while ensuring that full health passes through unity). The results suggest an important age effect with sex, having some effect, but the remaining covariates having no discernible effect. The nonparametric Bayesian model is argued to be more theoretically appropriate than previously used parametric models. Furthermore, it is more flexible to take into account the impact of covariates. Copyright © 2013, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc.
Using a Nonparametric Bootstrap to Obtain a Confidence Interval for Pearson's "r" with Cluster Randomized Data: A Case Study

ERIC Educational Resources Information Center

Wagstaff, David A.; Elek, Elvira; Kulis, Stephen; Marsiglia, Flavio

2009-01-01

A nonparametric bootstrap was used to obtain an interval estimate of Pearson's "r," and test the null hypothesis that there was no association between 5th grade students' positive substance use expectancies and their intentions to not use substances. The students were participating in a substance use prevention program in which the unit of…
Learning Circulant Sensing Kernels

DTIC Science & Technology

2014-03-01

Furthermore, we test learning the circulant sensing matrix/operator and the nonparametric dictionary altogether and obtain even better performance. We...scale. Furthermore, we test learning the circulant sensing matrix/operator and the nonparametric dictionary altogether and obtain even better performance...matrices, Tropp et al.[28] de - scribes a random filter for acquiring a signal x̄; Haupt et al.[12] describes a channel estimation problem to identify a
Identification and estimation of survivor average causal effects.

PubMed

Tchetgen Tchetgen, Eric J

2014-09-20

In longitudinal studies, outcomes ascertained at follow-up are typically undefined for individuals who die prior to the follow-up visit. In such settings, outcomes are said to be truncated by death and inference about the effects of a point treatment or exposure, restricted to individuals alive at the follow-up visit, could be biased even if as in experimental studies, treatment assignment were randomized. To account for truncation by death, the survivor average causal effect (SACE) defines the effect of treatment on the outcome for the subset of individuals who would have survived regardless of exposure status. In this paper, the author nonparametrically identifies SACE by leveraging post-exposure longitudinal correlates of survival and outcome that may also mediate the exposure effects on survival and outcome. Nonparametric identification is achieved by supposing that the longitudinal data arise from a certain nonparametric structural equations model and by making the monotonicity assumption that the effect of exposure on survival agrees in its direction across individuals. A novel weighted analysis involving a consistent estimate of the survival process is shown to produce consistent estimates of SACE. A data illustration is given, and the methods are extended to the context of time-varying exposures. We discuss a sensitivity analysis framework that relaxes assumptions about independent errors in the nonparametric structural equations model and may be used to assess the extent to which inference may be altered by a violation of key identifying assumptions. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
Identification and estimation of survivor average causal effects

PubMed Central

Tchetgen, Eric J Tchetgen

2014-01-01

In longitudinal studies, outcomes ascertained at follow-up are typically undefined for individuals who die prior to the follow-up visit. In such settings, outcomes are said to be truncated by death and inference about the effects of a point treatment or exposure, restricted to individuals alive at the follow-up visit, could be biased even if as in experimental studies, treatment assignment were randomized. To account for truncation by death, the survivor average causal effect (SACE) defines the effect of treatment on the outcome for the subset of individuals who would have survived regardless of exposure status. In this paper, the author nonparametrically identifies SACE by leveraging post-exposure longitudinal correlates of survival and outcome that may also mediate the exposure effects on survival and outcome. Nonparametric identification is achieved by supposing that the longitudinal data arise from a certain nonparametric structural equations model and by making the monotonicity assumption that the effect of exposure on survival agrees in its direction across individuals. A novel weighted analysis involving a consistent estimate of the survival process is shown to produce consistent estimates of SACE. A data illustration is given, and the methods are extended to the context of time-varying exposures. We discuss a sensitivity analysis framework that relaxes assumptions about independent errors in the nonparametric structural equations model and may be used to assess the extent to which inference may be altered by a violation of key identifying assumptions. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24889022
Cox regression analysis with missing covariates via nonparametric multiple imputation.

PubMed

Hsu, Chiu-Hsieh; Yu, Mandi

2018-01-01

We consider the situation of estimating Cox regression in which some covariates are subject to missing, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from Surveillance, Epidemiology and End Results (SEER) Program.
Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

PubMed Central

Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.

2014-01-01

Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289
Parametric and non-parametric species delimitation methods result in the recognition of two new Neotropical woody bamboo species.

PubMed

Ruiz-Sanchez, Eduardo

2015-12-01

The Neotropical woody bamboo genus Otatea is one of five genera in the subtribe Guaduinae. Of the eight described Otatea species, seven are endemic to Mexico and one is also distributed in Central and South America. Otatea acuminata has the widest geographical distribution of the eight species, and two of its recently collected populations do not match the known species morphologically. Parametric and non-parametric methods were used to delimit the species in Otatea using five chloroplast markers, one nuclear marker, and morphological characters. The parametric coalescent method and the non-parametric analysis supported the recognition of two distinct evolutionary lineages. Molecular clock estimates were used to estimate divergence times in Otatea. The results for divergence time in Otatea estimated the origin of the speciation events from the Late Miocene to Late Pleistocene. The species delimitation analyses (parametric and non-parametric) identified that the two populations of O. acuminata from Chiapas and Hidalgo are from two separate evolutionary lineages and these new species have morphological characters that separate them from O. acuminata s.s. The geological activity of the Trans-Mexican Volcanic Belt and the Isthmus of Tehuantepec may have isolated populations and limited the gene flow between Otatea species, driving speciation. Based on the results found here, I describe Otatea rzedowskiorum and Otatea victoriae as two new species, morphologically different from O. acuminata. Copyright © 2015 Elsevier Inc. All rights reserved.
[Detection of quadratic phase coupling between EEG signal components by nonparamatric and parametric methods of bispectral analysis].

PubMed

Schmidt, K; Witte, H

1999-11-01

Recently the assumption of the independence of individual frequency components in a signal has been rejected, for example, for the EEG during defined physiological states such as sleep or sedation [9, 10]. Thus, the use of higher-order spectral analysis capable of detecting interrelations between individual signal components has proved useful. The aim of the present study was to investigate the quality of various non-parametric and parametric estimation algorithms using simulated as well as true physiological data. We employed standard algorithms available for the MATLAB. The results clearly show that parametric bispectral estimation is superior to non-parametric estimation in terms of the quality of peak localisation and the discrimination from other peaks.
Spectral decompositions of multiple time series: a Bayesian non-parametric approach.

PubMed

Macaro, Christian; Prado, Raquel

2014-01-01

We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated to the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented.We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.

Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach

PubMed Central

Kang, Le; Chen, Weijie; Petrick, Nicholas A.; Gallas, Brandon D.

2014-01-01

The area under the receiver operating characteristic (ROC) curve (AUC) is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of AUC, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. PMID:25399736
Non-parametric methods for cost-effectiveness analysis: the central limit theorem and the bootstrap compared.

PubMed

Nixon, Richard M; Wonderling, David; Grieve, Richard D

2010-03-01

Cost-effectiveness analyses (CEA) alongside randomised controlled trials commonly estimate incremental net benefits (INB), with 95% confidence intervals, and compute cost-effectiveness acceptability curves and confidence ellipses. Two alternative non-parametric methods for estimating INB are to apply the central limit theorem (CLT) or to use the non-parametric bootstrap method, although it is unclear which method is preferable. This paper describes the statistical rationale underlying each of these methods and illustrates their application with a trial-based CEA. It compares the sampling uncertainty from using either technique in a Monte Carlo simulation. The experiments are repeated varying the sample size and the skewness of costs in the population. The results showed that, even when data were highly skewed, both methods accurately estimated the true standard errors (SEs) when sample sizes were moderate to large (n>50), and also gave good estimates for small data sets with low skewness. However, when sample sizes were relatively small and the data highly skewed, using the CLT rather than the bootstrap led to slightly more accurate SEs. We conclude that while in general using either method is appropriate, the CLT is easier to implement, and provides SEs that are at least as accurate as the bootstrap. (c) 2009 John Wiley & Sons, Ltd.
Experimentally testing the dependence of momentum transport on second derivatives using Gaussian process regression

NASA Astrophysics Data System (ADS)

Chilenski, M. A.; Greenwald, M. J.; Hubbard, A. E.; Hughes, J. W.; Lee, J. P.; Marzouk, Y. M.; Rice, J. E.; White, A. E.

2017-12-01

It remains an open question to explain the dramatic change in intrinsic rotation induced by slight changes in electron density (White et al 2013 Phys. Plasmas 20 056106). One proposed explanation is that momentum transport is sensitive to the second derivatives of the temperature and density profiles (Lee et al 2015 Plasma Phys. Control. Fusion 57 125006), but it is widely considered to be impossible to measure these higher derivatives. In this paper, we show that it is possible to estimate second derivatives of electron density and temperature using a nonparametric regression technique known as Gaussian process regression. This technique avoids over-constraining the fit by not assuming an explicit functional form for the fitted curve. The uncertainties, obtained rigorously using Markov chain Monte Carlo sampling, are small enough that it is reasonable to explore hypotheses which depend on second derivatives. It is found that the differences in the second derivatives of n{e} and T{e} between the peaked and hollow rotation cases are rather small, suggesting that changes in the second derivatives are not likely to explain the experimental results.
Stochastic search, optimization and regression with energy applications

NASA Astrophysics Data System (ADS)

Hannah, Lauren A.

Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage. The one stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DPGLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.
Nonparametric estimation of benchmark doses in environmental risk assessment

PubMed Central

Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

2013-01-01

Summary An important statistical objective in environmental risk analysis is estimation of minimum exposure levels, called benchmark doses (BMDs), that induce a pre-specified benchmark response in a dose-response experiment. In such settings, representations of the risk are traditionally based on a parametric dose-response model. It is a well-known concern, however, that if the chosen parametric form is misspecified, inaccurate and possibly unsafe low-dose inferences can result. We apply a nonparametric approach for calculating benchmark doses, based on an isotonic regression method for dose-response estimation with quantal-response data (Bhattacharya and Kong, 2007). We determine the large-sample properties of the estimator, develop bootstrap-based confidence limits on the BMDs, and explore the confidence limits’ small-sample properties via a short simulation study. An example from cancer risk assessment illustrates the calculations. PMID:23914133
STATISTICAL ESTIMATION AND VISUALIZATION OF GROUND-WATER CONTAMINATION DATA

EPA Science Inventory

This work presents methods of visualizing and animating statistical estimates of ground water and/or soil contamination over a region from observations of the contaminant for that region. The primary statistical methods used to produce the regional estimates are nonparametric re...
Updating estimates of low streamflow statistics to account for possible trends

NASA Astrophysics Data System (ADS)

Blum, A. G.; Archfield, S. A.; Hirsch, R. M.; Vogel, R. M.; Kiang, J. E.; Dudley, R. W.

2017-12-01

Given evidence of both increasing and decreasing trends in low flows in many streams, methods are needed to update estimators of low flow statistics used in water resources management. One such metric is the 10-year annual low-flow statistic (7Q10) calculated as the annual minimum seven-day streamflow which is exceeded in nine out of ten years on average. Historical streamflow records may not be representative of current conditions at a site if environmental conditions are changing. We present a new approach to frequency estimation under nonstationary conditions that applies a stationary nonparametric quantile estimator to a subset of the annual minimum flow record. Monte Carlo simulation experiments were used to evaluate this approach across a range of trend and no trend scenarios. Relative to the standard practice of using the entire available streamflow record, use of a nonparametric quantile estimator combined with selection of the most recent 30 or 50 years for 7Q10 estimation were found to improve accuracy and reduce bias. Benefits of data subset selection approaches were greater for higher magnitude trends annual minimum flow records with lower coefficients of variation. A nonparametric trend test approach for subset selection did not significantly improve upon always selecting the last 30 years of record. At 174 stream gages in the Chesapeake Bay region, 7Q10 estimators based on the most recent 30 years of flow record were compared to estimators based on the entire period of record. Given the availability of long records of low streamflow, using only a subset of the flow record ( 30 years) can be used to update 7Q10 estimators to better reflect current streamflow conditions.
Computation of nonparametric convex hazard estimators via profile methods.

PubMed

Jankowski, Hanna K; Wellner, Jon A

2009-05-01

This paper proposes a profile likelihood algorithm to compute the nonparametric maximum likelihood estimator of a convex hazard function. The maximisation is performed in two steps: First the support reduction algorithm is used to maximise the likelihood over all hazard functions with a given point of minimum (or antimode). Then it is shown that the profile (or partially maximised) likelihood is quasi-concave as a function of the antimode, so that a bisection algorithm can be applied to find the maximum of the profile likelihood, and hence also the global maximum. The new algorithm is illustrated using both artificial and real data, including lifetime data for Canadian males and females.
Nonmechanistic forecasts of seasonal influenza with iterative one-week-ahead distributions.

PubMed

Brooks, Logan C; Farrow, David C; Hyun, Sangwon; Tibshirani, Ryan J; Rosenfeld, Roni

2018-06-15

Accurate and reliable forecasts of seasonal epidemics of infectious disease can assist in the design of countermeasures and increase public awareness and preparedness. This article describes two main contributions we made recently toward this goal: a novel approach to probabilistic modeling of surveillance time series based on "delta densities", and an optimization scheme for combining output from multiple forecasting methods into an adaptively weighted ensemble. Delta densities describe the probability distribution of the change between one observation and the next, conditioned on available data; chaining together nonparametric estimates of these distributions yields a model for an entire trajectory. Corresponding distributional forecasts cover more observed events than alternatives that treat the whole season as a unit, and improve upon multiple evaluation metrics when extracting key targets of interest to public health officials. Adaptively weighted ensembles integrate the results of multiple forecasting methods, such as delta density, using weights that can change from situation to situation. We treat selection of optimal weightings across forecasting methods as a separate estimation task, and describe an estimation procedure based on optimizing cross-validation performance. We consider some details of the data generation process, including data revisions and holiday effects, both in the construction of these forecasting methods and when performing retrospective evaluation. The delta density method and an adaptively weighted ensemble of other forecasting methods each improve significantly on the next best ensemble component when applied separately, and achieve even better cross-validated performance when used in conjunction. We submitted real-time forecasts based on these contributions as part of CDC's 2015/2016 FluSight Collaborative Comparison. Among the fourteen submissions that season, this system was ranked by CDC as the most accurate.
Non-parametric correlative uncertainty quantification and sensitivity analysis: Application to a Langmuir bimolecular adsorption model

NASA Astrophysics Data System (ADS)

Feng, Jinchao; Lansford, Joshua; Mironenko, Alexander; Pourkargar, Davood Babaei; Vlachos, Dionisios G.; Katsoulakis, Markos A.

2018-03-01

We propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, are correlated and are limited in size (small data). The proposed mathematical methodology employs gradient-based methods to compute sensitivity indices. We observe that ranking influential parameters depends critically on whether or not correlations between parameters are taken into account. The impact of uncertainty in the correlation and the necessity of the proposed non-parametric perspective are demonstrated.
Bayesian non-parametric inference for stochastic epidemic models using Gaussian Processes.

PubMed

Xu, Xiaoguang; Kypraios, Theodore; O'Neill, Philip D

2016-10-01

This paper considers novel Bayesian non-parametric methods for stochastic epidemic models. Many standard modeling and data analysis methods use underlying assumptions (e.g. concerning the rate at which new cases of disease will occur) which are rarely challenged or tested in practice. To relax these assumptions, we develop a Bayesian non-parametric approach using Gaussian Processes, specifically to estimate the infection process. The methods are illustrated with both simulated and real data sets, the former illustrating that the methods can recover the true infection process quite well in practice, and the latter illustrating that the methods can be successfully applied in different settings. © The Author 2016. Published by Oxford University Press.
Temporal variations of potential fecundity of southern blue whiting (Micromesistius australis australis) in the Southeast Pacific

NASA Astrophysics Data System (ADS)

Flores, Andrés; Wiff, Rodrigo; Díaz, Eduardo; Carvajal, Bernardita

2017-08-01

Fecundity is a key aspect of fish species reproductive biology because it relates directly to total egg production. Yet, despite such importance, fecundity estimates are lacking or scarce for several fish species. The gravimetric method is the most-used one to estimate fecundity by essentially scaling up the oocyte density to the ovary weight. It is a relatively simple and precise technique, but also time consuming because it requires counting all oocytes in an ovary subsample. The auto-diametric method, on the other hand, is relatively new for estimating fecundity, representing a rapid alternative, because it requires only an estimation of mean oocyte density from mean oocyte diameter. Using the extensive database available from commercial fishery and design surveys for southern blue whiting Micromesistius australis australis in the Southeast Pacific, we compared estimates of fecundity using both gravimetric and auto-diametric methods. Temporal variations in potential fecundity from the auto-diametric method were evaluated using generalised linear models considering predictors from maternal characteristics such as female size, condition factor, oocyte size, and gonadosomatic index. A global and time-invariant auto-diametric equation was evaluated using a simulation procedure based on non-parametric bootstrap. Results indicated there were not significant differences regarding fecundity estimates between the gravimetric and auto-diametric method (p > 0.05). Simulation showed the application of a global equation is unbiased and sufficiently precise to estimate time-invariant fecundity of this species. Temporal variations on fecundity were explained by maternal characteristic, revealing signals of fecundity down-regulation. We discuss how oocyte size and nutritional condition (measured as condition factor) are one of the important factors determining fecundity. We highlighted also the relevance of choosing the appropriate sampling period to conduct maturity studies and ensure precise estimates of fecundity of this species.
Short-term forecasting of meteorological time series using Nonparametric Functional Data Analysis (NPFDA)

NASA Astrophysics Data System (ADS)

Curceac, S.; Ternynck, C.; Ouarda, T.

2015-12-01

Over the past decades, a substantial amount of research has been conducted to model and forecast climatic variables. In this study, Nonparametric Functional Data Analysis (NPFDA) methods are applied to forecast air temperature and wind speed time series in Abu Dhabi, UAE. The dataset consists of hourly measurements recorded for a period of 29 years, 1982-2010. The novelty of the Functional Data Analysis approach is in expressing the data as curves. In the present work, the focus is on daily forecasting and the functional observations (curves) express the daily measurements of the above mentioned variables. We apply a non-linear regression model with a functional non-parametric kernel estimator. The computation of the estimator is performed using an asymmetrical quadratic kernel function for local weighting based on the bandwidth obtained by a cross validation procedure. The proximities between functional objects are calculated by families of semi-metrics based on derivatives and Functional Principal Component Analysis (FPCA). Additionally, functional conditional mode and functional conditional median estimators are applied and the advantages of combining their results are analysed. A different approach employs a SARIMA model selected according to the minimum Akaike (AIC) and Bayessian (BIC) Information Criteria and based on the residuals of the model. The performance of the models is assessed by calculating error indices such as the root mean square error (RMSE), relative RMSE, BIAS and relative BIAS. The results indicate that the NPFDA models provide more accurate forecasts than the SARIMA models. Key words: Nonparametric functional data analysis, SARIMA, time series forecast, air temperature, wind speed
Construction of joint confidence regions for the optimal true class fractions of Receiver Operating Characteristic (ROC) surfaces and manifolds.

PubMed

Bantis, Leonidas E; Nakas, Christos T; Reiser, Benjamin; Myall, Daniel; Dalrymple-Alford, John C

2017-06-01

The three-class approach is used for progressive disorders when clinicians and researchers want to diagnose or classify subjects as members of one of three ordered categories based on a continuous diagnostic marker. The decision thresholds or optimal cut-off points required for this classification are often chosen to maximize the generalized Youden index (Nakas et al., Stat Med 2013; 32: 995-1003). The effectiveness of these chosen cut-off points can be evaluated by estimating their corresponding true class fractions and their associated confidence regions. Recently, in the two-class case, parametric and non-parametric methods were investigated for the construction of confidence regions for the pair of the Youden-index-based optimal sensitivity and specificity fractions that can take into account the correlation introduced between sensitivity and specificity when the optimal cut-off point is estimated from the data (Bantis et al., Biomet 2014; 70: 212-223). A parametric approach based on the Box-Cox transformation to normality often works well while for markers having more complex distributions a non-parametric procedure using logspline density estimation can be used instead. The true class fractions that correspond to the optimal cut-off points estimated by the generalized Youden index are correlated similarly to the two-class case. In this article, we generalize these methods to the three- and to the general k-class case which involves the classification of subjects into three or more ordered categories, where ROC surface or ROC manifold methodology, respectively, is typically employed for the evaluation of the discriminatory capacity of a diagnostic marker. We obtain three- and multi-dimensional joint confidence regions for the optimal true class fractions. We illustrate this with an application to the Trail Making Test Part A that has been used to characterize cognitive impairment in patients with Parkinson's disease.
A non-parametric automatic blending methodology to estimate rainfall fields from rain gauge and radar data

NASA Astrophysics Data System (ADS)

Velasco-Forero, Carlos A.; Sempere-Torres, Daniel; Cassiraga, Eduardo F.; Jaime Gómez-Hernández, J.

2009-07-01

Quantitative estimation of rainfall fields has been a crucial objective from early studies of the hydrological applications of weather radar. Previous studies have suggested that flow estimations are improved when radar and rain gauge data are combined to estimate input rainfall fields. This paper reports new research carried out in this field. Classical approaches for the selection and fitting of a theoretical correlogram (or semivariogram) model (needed to apply geostatistical estimators) are avoided in this study. Instead, a non-parametric technique based on FFT is used to obtain two-dimensional positive-definite correlograms directly from radar observations, dealing with both the natural anisotropy and the temporal variation of the spatial structure of the rainfall in the estimated fields. Because these correlation maps can be automatically obtained at each time step of a given rainfall event, this technique might easily be used in operational (real-time) applications. This paper describes the development of the non-parametric estimator exploiting the advantages of FFT for the automatic computation of correlograms and provides examples of its application on a case study using six rainfall events. This methodology is applied to three different alternatives to incorporate the radar information (as a secondary variable), and a comparison of performances is provided. In particular, their ability to reproduce in estimated rainfall fields (i) the rain gauge observations (in a cross-validation analysis) and (ii) the spatial patterns of radar fields are analyzed. Results seem to indicate that the methodology of kriging with external drift [KED], in combination with the technique of automatically computing 2-D spatial correlograms, provides merged rainfall fields with good agreement with rain gauges and with the most accurate approach to the spatial tendencies observed in the radar rainfall fields, when compared with other alternatives analyzed.
Nonparametric autocovariance estimation from censored time series by Gaussian imputation.

PubMed

Park, Jung Wook; Genton, Marc G; Ghosh, Sujit K

2009-02-01

One of the most frequently used methods to model the autocovariance function of a second-order stationary time series is to use the parametric framework of autoregressive and moving average models developed by Box and Jenkins. However, such parametric models, though very flexible, may not always be adequate to model autocovariance functions with sharp changes. Furthermore, if the data do not follow the parametric model and are censored at a certain value, the estimation results may not be reliable. We develop a Gaussian imputation method to estimate an autocovariance structure via nonparametric estimation of the autocovariance function in order to address both censoring and incorrect model specification. We demonstrate the effectiveness of the technique in terms of bias and efficiency with simulations under various rates of censoring and underlying models. We describe its application to a time series of silicon concentrations in the Arctic.
On an additive partial correlation operator and nonparametric estimation of graphical models.

PubMed

Lee, Kuang-Yao; Li, Bing; Zhao, Hongyu

2016-09-01

We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance.
On an additive partial correlation operator and nonparametric estimation of graphical models

PubMed Central

Li, Bing; Zhao, Hongyu

2016-01-01

Abstract We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance. PMID:29422689
On the estimation of spread rate for a biological population

Treesearch

Jim Clark; Lajos Horváth; Mark Lewis

2001-01-01

We propose a nonparametric estimator for the rate of spread of an introduced population. We prove that the limit distribution of the estimator is normal or stable, depending on the behavior of the moment generating function. We show that resampling methods can also be used to approximate the distribution of the estimators.
Inferring the three-dimensional distribution of dust in the Galaxy with a non-parametric method . Preparing for Gaia

NASA Astrophysics Data System (ADS)

Rezaei Kh., S.; Bailer-Jones, C. A. L.; Hanson, R. J.; Fouesneau, M.

2017-02-01

We present a non-parametric model for inferring the three-dimensional (3D) distribution of dust density in the Milky Way. Our approach uses the extinction measured towards stars at different locations in the Galaxy at approximately known distances. Each extinction measurement is proportional to the integrated dust density along its line of sight (LoS). Making simple assumptions about the spatial correlation of the dust density, we can infer the most probable 3D distribution of dust across the entire observed region, including along sight lines which were not observed. This is possible because our model employs a Gaussian process to connect all LoS. We demonstrate the capability of our model to capture detailed dust density variations using mock data and simulated data from the Gaia Universe Model Snapshot. We then apply our method to a sample of giant stars observed by APOGEE and Kepler to construct a 3D dust map over a small region of the Galaxy. Owing to our smoothness constraint and its isotropy, we provide one of the first maps which does not show the "fingers of God" effect.

An improved nonparametric lower bound of species richness via a modified good-turing frequency formula.

PubMed

Chiu, Chun-Huo; Wang, Yi-Ting; Walther, Bruno A; Chao, Anne

2014-09-01

It is difficult to accurately estimate species richness if there are many almost undetectable species in a hyper-diverse community. Practically, an accurate lower bound for species richness is preferable to an inaccurate point estimator. The traditional nonparametric lower bound developed by Chao (1984, Scandinavian Journal of Statistics 11, 265-270) for individual-based abundance data uses only the information on the rarest species (the numbers of singletons and doubletons) to estimate the number of undetected species in samples. Applying a modified Good-Turing frequency formula, we derive an approximate formula for the first-order bias of this traditional lower bound. The approximate bias is estimated by using additional information (namely, the numbers of tripletons and quadrupletons). This approximate bias can be corrected, and an improved lower bound is thus obtained. The proposed lower bound is nonparametric in the sense that it is universally valid for any species abundance distribution. A similar type of improved lower bound can be derived for incidence data. We test our proposed lower bounds on simulated data sets generated from various species abundance models. Simulation results show that the proposed lower bounds always reduce bias over the traditional lower bounds and improve accuracy (as measured by mean squared error) when the heterogeneity of species abundances is relatively high. We also apply the proposed new lower bounds to real data for illustration and for comparisons with previously developed estimators. © 2014, The International Biometric Society.
Nonparametric estimation of groundwater residence time distributions: What can environmental tracer data tell us about groundwater residence time?

NASA Astrophysics Data System (ADS)

McCallum, James L.; Engdahl, Nicholas B.; Ginn, Timothy R.; Cook, Peter. G.

2014-03-01

Residence time distributions (RTDs) have been used extensively for quantifying flow and transport in subsurface hydrology. In geochemical approaches, environmental tracer concentrations are used in conjunction with simple lumped parameter models (LPMs). Conversely, numerical simulation techniques require large amounts of parameterization and estimated RTDs are certainly limited by associated uncertainties. In this study, we apply a nonparametric deconvolution approach to estimate RTDs using environmental tracer concentrations. The model is based only on the assumption that flow is steady enough that the observed concentrations are well approximated by linear superposition of the input concentrations with the RTD; that is, the convolution integral holds. Even with large amounts of environmental tracer concentration data, the entire shape of an RTD remains highly nonunique. However, accurate estimates of mean ages and in some cases prediction of young portions of the RTD may be possible. The most useful type of data was found to be the use of a time series of tritium. This was due to the sharp variations in atmospheric concentrations and a short half-life. Conversely, the use of CFC compounds with smoothly varying atmospheric concentrations was more prone to nonuniqueness. This work highlights the benefits and limitations of using environmental tracer data to estimate whole RTDs with either LPMs or through numerical simulation. However, the ability of the nonparametric approach developed here to correct for mixing biases in mean ages appears promising.
Comparing two correlated C indices with right-censored survival outcome: a one-shot nonparametric approach.

PubMed

Kang, Le; Chen, Weijie; Petrick, Nicholas A; Gallas, Brandon D

2015-02-20

The area under the receiver operating characteristic curve is often used as a summary index of the diagnostic ability in evaluating biomarkers when the clinical outcome (truth) is binary. When the clinical outcome is right-censored survival time, the C index, motivated as an extension of area under the receiver operating characteristic curve, has been proposed by Harrell as a measure of concordance between a predictive biomarker and the right-censored survival outcome. In this work, we investigate methods for statistical comparison of two diagnostic or predictive systems, of which they could either be two biomarkers or two fixed algorithms, in terms of their C indices. We adopt a U-statistics-based C estimator that is asymptotically normal and develop a nonparametric analytical approach to estimate the variance of the C estimator and the covariance of two C estimators. A z-score test is then constructed to compare the two C indices. We validate our one-shot nonparametric method via simulation studies in terms of the type I error rate and power. We also compare our one-shot method with resampling methods including the jackknife and the bootstrap. Simulation results show that the proposed one-shot method provides almost unbiased variance estimations and has satisfactory type I error control and power. Finally, we illustrate the use of the proposed method with an example from the Framingham Heart Study. Copyright © 2014 John Wiley & Sons, Ltd.
Time-Varying Delay Estimation Applied to the Surface Electromyography Signals Using the Parametric Approach

NASA Astrophysics Data System (ADS)

Luu, Gia Thien; Boualem, Abdelbassit; Duy, Tran Trung; Ravier, Philippe; Butteli, Olivier

Muscle Fiber Conduction Velocity (MFCV) can be calculated from the time delay between the surface electromyographic (sEMG) signals recorded by electrodes aligned with the fiber direction. In order to take into account the non-stationarity during the dynamic contraction (the most daily life situation) of the data, the developed methods have to consider that the MFCV changes over time, which induces time-varying delays and the data is non-stationary (change of Power Spectral Density (PSD)). In this paper, the problem of TVD estimation is considered using a parametric method. First, the polynomial model of TVD has been proposed. Then, the TVD model parameters are estimated by using a maximum likelihood estimation (MLE) strategy solved by a deterministic optimization technique (Newton) and stochastic optimization technique, called simulated annealing (SA). The performance of the two techniques is also compared. We also derive two appropriate Cramer-Rao Lower Bounds (CRLB) for the estimated TVD model parameters and for the TVD waveforms. Monte-Carlo simulation results show that the estimation of both the model parameters and the TVD function is unbiased and that the variance obtained is close to the derived CRBs. A comparison with non-parametric approaches of the TVD estimation is also presented and shows the superiority of the method proposed.
A nonparametric mean-variance smoothing method to assess Arabidopsis cold stress transcriptional regulator CBF2 overexpression microarray data.

PubMed

Hu, Pingsha; Maiti, Tapabrata

2011-01-01

Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request.
A Nonparametric Mean-Variance Smoothing Method to Assess Arabidopsis Cold Stress Transcriptional Regulator CBF2 Overexpression Microarray Data

PubMed Central

Hu, Pingsha; Maiti, Tapabrata

2011-01-01

Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181
Maternal and child mortality indicators across 187 countries of the world: converging or diverging.

PubMed

Goli, Srinivas; Arokiasamy, Perianayagam

2014-01-01

This study reassessed the progress achieved since 1990 in maternal and child mortality indicators to test whether the progress is converging or diverging across countries worldwide. The convergence process is examined using standard parametric and non-parametric econometric models of convergence. The results of absolute convergence estimates reveal that progress in maternal and child mortality indicators is diverging for the entire period of 1990-2010 [maternal mortality ratio (MMR) - β = .00033, p < .574; neonatal mortality rate (NNMR) - β = .04367, p < .000; post-neonatal mortality rate (PNMR) - β = .02677, p < .000; under-five mortality rate (U5MR) - β = .00828, p < .000)]. In the recent period, such divergence is replaced with convergence for MMR but diverged for all the child mortality indicators. The results of Kernel density estimate reveal considerable reduction in divergence of MMR for the recent period; however, the Kernel density distribution plots show more than one 'peak' which indicates the emergence of convergence clubs based on their mortality levels. For child mortality indicators, the Kernel estimates suggest that divergence is in progress across the countries worldwide but tended to converge for countries with low mortality levels. A mere progress in global averages of maternal and child mortality indicators among a global cross-section of countries does not warranty convergence unless there is a considerable reduction in variance, skewness and range of change.
Gaussian process-based Bayesian nonparametric inference of population size trajectories from gene genealogies.

PubMed

Palacios, Julia A; Minin, Vladimir N

2013-03-01

Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method. Copyright © 2013, The International Biometric Society.
Estimation of kinetic parameters from list-mode data using an indirect apporach

NASA Astrophysics Data System (ADS)

Ortiz, Joseph Christian

This dissertation explores the possibility of using an imaging approach to model classical pharmacokinetic (PK) problems. The kinetic parameters which describe the uptake rates of a drug within a biological system, are parameters of interest. Knowledge of the drug uptake in a system is useful in expediting the drug development process, as well as providing a dosage regimen for patients. Traditionally, the uptake rate of a drug in a system is obtained via sampling the concentration of the drug in a central compartment, usually the blood, and fitting the data to a curve. In a system consisting of multiple compartments, the number of kinetic parameters is proportional to the number of compartments, and in classical PK experiments, the number of identifiable parameters is less than the total number of parameters. Using an imaging approach to model classical PK problems, the support region of each compartment within the system will be exactly known, and all the kinetic parameters are uniquely identifiable. To solve for the kinetic parameters, an indirect approach, which is a two part process, was used. First the compartmental activity was obtained from data, and next the kinetic parameters were estimated. The novel aspect of the research is using listmode data to obtain the activity curves from a system as opposed to a traditional binned approach. Using techniques from information theoretic learning, particularly kernel density estimation, a non-parametric probability density function for the voltage outputs on each photo-multiplier tube, for each event, was generated on the fly, which was used in a least squares optimization routine to estimate the compartmental activity. The estimability of the activity curves for varying noise levels as well as time sample densities were explored. Once an estimate for the activity was obtained, the kinetic parameters were obtained using multiple cost functions, and the compared to each other using the mean squared error as the figure of merit.
Nonparametric Estimation of Standard Errors in Covariance Analysis Using the Infinitesimal Jackknife

ERIC Educational Resources Information Center

Jennrich, Robert I.

2008-01-01

The infinitesimal jackknife provides a simple general method for estimating standard errors in covariance structure analysis. Beyond its simplicity and generality what makes the infinitesimal jackknife method attractive is that essentially no assumptions are required to produce consistent standard error estimates, not even the requirement that the…
Geostatistics and petroleum geology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hohn, M.E.

1988-01-01

This book examines purpose and use of geostatistics in exploration and development of oil and gas with an emphasis on appropriate and pertinent case studies. It present an overview of geostatistics. Topics covered include: The semivariogram; Linear estimation; Multivariate geostatistics; Nonlinear estimation; From indicator variables to nonparametric estimation; and More detail, less certainty; conditional simulation.
Nonparametric Model of Smooth Muscle Force Production During Electrical Stimulation.

PubMed

Cole, Marc; Eikenberry, Steffen; Kato, Takahide; Sandler, Roman A; Yamashiro, Stanley M; Marmarelis, Vasilis Z

2017-03-01

A nonparametric model of smooth muscle tension response to electrical stimulation was estimated using the Laguerre expansion technique of nonlinear system kernel estimation. The experimental data consisted of force responses of smooth muscle to energy-matched alternating single pulse and burst current stimuli. The burst stimuli led to at least a 10-fold increase in peak force in smooth muscle from Mytilus edulis, despite the constant energy constraint. A linear model did not fit the data. However, a second-order model fit the data accurately, so the higher-order models were not required to fit the data. Results showed that smooth muscle force response is not linearly related to the stimulation power.
Wavelet Filtering to Reduce Conservatism in Aeroservoelastic Robust Stability Margins

NASA Technical Reports Server (NTRS)

Brenner, Marty; Lind, Rick

1998-01-01

Wavelet analysis for filtering and system identification was used to improve the estimation of aeroservoelastic stability margins. The conservatism of the robust stability margins was reduced with parametric and nonparametric time-frequency analysis of flight data in the model validation process. Nonparametric wavelet processing of data was used to reduce the effects of external desirableness and unmodeled dynamics. Parametric estimates of modal stability were also extracted using the wavelet transform. Computation of robust stability margins for stability boundary prediction depends on uncertainty descriptions derived from the data for model validation. F-18 high Alpha Research Vehicle aeroservoelastic flight test data demonstrated improved robust stability prediction by extension of the stability boundary beyond the flight regime.
Probability Distribution Extraction from TEC Estimates based on Kernel Density Estimation

NASA Astrophysics Data System (ADS)

Demir, Uygar; Toker, Cenk; Çenet, Duygu

2016-07-01

Statistical analysis of the ionosphere, specifically the Total Electron Content (TEC), may reveal important information about its temporal and spatial characteristics. One of the core metrics that express the statistical properties of a stochastic process is its Probability Density Function (pdf). Furthermore, statistical parameters such as mean, variance and kurtosis, which can be derived from the pdf, may provide information about the spatial uniformity or clustering of the electron content. For example, the variance differentiates between a quiet ionosphere and a disturbed one, whereas kurtosis differentiates between a geomagnetic storm and an earthquake. Therefore, valuable information about the state of the ionosphere (and the natural phenomena that cause the disturbance) can be obtained by looking at the statistical parameters. In the literature, there are publications which try to fit the histogram of TEC estimates to some well-known pdf.s such as Gaussian, Exponential, etc. However, constraining a histogram to fit to a function with a fixed shape will increase estimation error, and all the information extracted from such pdf will continue to contain this error. In such techniques, it is highly likely to observe some artificial characteristics in the estimated pdf which is not present in the original data. In the present study, we use the Kernel Density Estimation (KDE) technique to estimate the pdf of the TEC. KDE is a non-parametric approach which does not impose a specific form on the TEC. As a result, better pdf estimates that almost perfectly fit to the observed TEC values can be obtained as compared to the techniques mentioned above. KDE is particularly good at representing the tail probabilities, and outliers. We also calculate the mean, variance and kurtosis of the measured TEC values. The technique is applied to the ionosphere over Turkey where the TEC values are estimated from the GNSS measurement from the TNPGN-Active (Turkish National Permanent GNSS Network) network. This study is supported by by TUBITAK 115E915 and Joint TUBITAK 114E092 and AS CR14/001 projects.
Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.

PubMed

Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R

2018-06-01

Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the amount of diagnostic statistics, in comparison to their parametric counter-parts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the HSIC. If dependence exists, the model does not capture all the variability in the outcome associated with the covariates, otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.
Applications of quantum entropy to statistics

NASA Astrophysics Data System (ADS)

Silver, R. N.; Martz, H. F.

This paper develops two generalizations of the maximum entropy (ME) principle. First, Shannon classical entropy is replaced by von Neumann quantum entropy to yield a broader class of information divergences (or penalty functions) for statistics applications. Negative relative quantum entropy enforces convexity, positivity, non-local extensivity and prior correlations such as smoothness. This enables the extension of ME methods from their traditional domain of ill-posed in-verse problems to new applications such as non-parametric density estimation. Second, given a choice of information divergence, a combination of ME and Bayes rule is used to assign both prior and posterior probabilities. Hyperparameters are interpreted as Lagrange multipliers enforcing constraints. Conservation principles are proposed to act statistical regularization and other hyperparameters, such as conservation of information and smoothness. ME provides an alternative to hierarchical Bayes methods.
Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

USGS Publications Warehouse

Lee, L.; Helsel, D.

2007-01-01

Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
A PDF-based classification of gait cadence patterns in patients with amyotrophic lateral sclerosis.

PubMed

Wu, Yunfeng; Ng, Sin Chun

2010-01-01

Amyotrophic lateral sclerosis (ALS) is a type of neurological disease due to the degeneration of motor neurons. During the course of such a progressive disease, it would be difficult for ALS patients to regulate normal locomotion, so that the gait stability becomes perturbed. This paper presents a pilot statistical study on the gait cadence (or stride interval) in ALS, based on the statistical analysis method. The probability density functions (PDFs) of stride interval were first estimated with the nonparametric Parzen-window method. We computed the mean of the left-foot stride interval and the modified Kullback-Leibler divergence (MKLD) from the PDFs estimated. The analysis results suggested that both of these two statistical parameters were significantly altered in ALS, and the least-squares support vector machine (LS-SVM) may effectively distinguish the stride patterns between the ALS patients and healthy controls, with an accurate rate of 82.8% and an area of 0.87 under the receiver operating characteristic curve.
Nonparametric and Semiparametric Regression Estimation for Length-biased Survival Data

PubMed Central

Shen, Yu; Ning, Jing; Qin, Jing

2016-01-01

For the past several decades, nonparametric and semiparametric modeling for conventional right-censored survival data has been investigated intensively under a noninformative censoring mechanism. However, these methods may not be applicable for analyzing right-censored survival data that arise from prevalent cohorts when the failure times are subject to length-biased sampling. This review article is intended to provide a summary of some newly developed methods as well as established methods for analyzing length-biased data. PMID:27086362
Nonparametric tests for interaction and group differences in a two-way layout.

PubMed

Fisher, A C; Wallenstein, S

1991-01-01

Nonparametric tests of group differences and interaction across strata are developed in which the null hypotheses for these tests are expressed as functions of rho i = P(X > Y) + 1/2P(X = Y), where X refers to a random observation from one group and Y refers to a random observation from the other group within stratum i. The estimator r of the parameter rho is shown to be a useful way to summarize and examine data for ordinal and continuous data.

A Nonparametric Approach to Estimate Classification Accuracy and Consistency

ERIC Educational Resources Information Center

Lathrop, Quinn N.; Cheng, Ying

2014-01-01

When cut scores for classifications occur on the total score scale, popular methods for estimating classification accuracy (CA) and classification consistency (CC) require assumptions about a parametric form of the test scores or about a parametric response model, such as item response theory (IRT). This article develops an approach to estimate CA…
Stochastic Residual-Error Analysis For Estimating Hydrologic Model Predictive Uncertainty

EPA Science Inventory

A hybrid time series-nonparametric sampling approach, referred to herein as semiparametric, is presented for the estimation of model predictive uncertainty. The methodology is a two-step procedure whereby a distributed hydrologic model is first calibrated, then followed by brute ...
Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting

PubMed Central

Chan, Kwun Chuen Gary; Yam, Sheung Chi Phillip; Zhang, Zheng

2015-01-01

Summary The estimation of average treatment effects based on observational data is extremely important in practice and has been studied by generations of statisticians under different frameworks. Existing globally efficient estimators require non-parametric estimation of a propensity score function, an outcome regression function or both, but their performance can be poor in practical sample sizes. Without explicitly estimating either functions, we consider a wide class calibration weights constructed to attain an exact three-way balance of the moments of observed covariates among the treated, the control, and the combined group. The wide class includes exponential tilting, empirical likelihood and generalized regression as important special cases, and extends survey calibration estimators to different statistical problems and with important distinctions. Global semiparametric efficiency for the estimation of average treatment effects is established for this general class of calibration estimators. The results show that efficiency can be achieved by solely balancing the covariate distributions without resorting to direct estimation of propensity score or outcome regression function. We also propose a consistent estimator for the efficient asymptotic variance, which does not involve additional functional estimation of either the propensity score or the outcome regression functions. The proposed variance estimator outperforms existing estimators that require a direct approximation of the efficient influence function. PMID:27346982
Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting.

PubMed

Chan, Kwun Chuen Gary; Yam, Sheung Chi Phillip; Zhang, Zheng

2016-06-01

The estimation of average treatment effects based on observational data is extremely important in practice and has been studied by generations of statisticians under different frameworks. Existing globally efficient estimators require non-parametric estimation of a propensity score function, an outcome regression function or both, but their performance can be poor in practical sample sizes. Without explicitly estimating either functions, we consider a wide class calibration weights constructed to attain an exact three-way balance of the moments of observed covariates among the treated, the control, and the combined group. The wide class includes exponential tilting, empirical likelihood and generalized regression as important special cases, and extends survey calibration estimators to different statistical problems and with important distinctions. Global semiparametric efficiency for the estimation of average treatment effects is established for this general class of calibration estimators. The results show that efficiency can be achieved by solely balancing the covariate distributions without resorting to direct estimation of propensity score or outcome regression function. We also propose a consistent estimator for the efficient asymptotic variance, which does not involve additional functional estimation of either the propensity score or the outcome regression functions. The proposed variance estimator outperforms existing estimators that require a direct approximation of the efficient influence function.
Rasch Model Parameter Estimation in the Presence of a Nonnormal Latent Trait Using a Nonparametric Bayesian Approach

ERIC Educational Resources Information Center

Finch, Holmes; Edwards, Julianne M.

2016-01-01

Standard approaches for estimating item response theory (IRT) model parameters generally work under the assumption that the latent trait being measured by a set of items follows the normal distribution. Estimation of IRT parameters in the presence of nonnormal latent traits has been shown to generate biased person and item parameter estimates. A…
Determining the multi-scale hedge ratios of stock index futures using the lower partial moments method

NASA Astrophysics Data System (ADS)

Dai, Jun; Zhou, Haigang; Zhao, Shaoquan

2017-01-01

This paper considers a multi-scale future hedge strategy that minimizes lower partial moments (LPM). To do this, wavelet analysis is adopted to decompose time series data into different components. Next, different parametric estimation methods with known distributions are applied to calculate the LPM of hedged portfolios, which is the key to determining multi-scale hedge ratios over different time scales. Then these parametric methods are compared with the prevailing nonparametric kernel metric method. Empirical results indicate that in the China Securities Index 300 (CSI 300) index futures and spot markets, hedge ratios and hedge efficiency estimated by the nonparametric kernel metric method are inferior to those estimated by parametric hedging model based on the features of sequence distributions. In addition, if minimum-LPM is selected as a hedge target, the hedging periods, degree of risk aversion, and target returns can affect the multi-scale hedge ratios and hedge efficiency, respectively.
Type I Error Rates and Power Estimates of Selected Parametric and Nonparametric Tests of Scale.

ERIC Educational Resources Information Center

Olejnik, Stephen F.; Algina, James

1987-01-01

Estimated Type I Error rates and power are reported for the Brown-Forsythe, O'Brien, Klotz, and Siegal-Tukey procedures. The effect of aligning the data using deviations from group means or group medians is investigated. (RB)
Accurate Biomass Estimation via Bayesian Adaptive Sampling

NASA Technical Reports Server (NTRS)

Wheeler, Kevin R.; Knuth, Kevin H.; Castle, Joseph P.; Lvov, Nikolay

2005-01-01

The following concepts were introduced: a) Bayesian adaptive sampling for solving biomass estimation; b) Characterization of MISR Rahman model parameters conditioned upon MODIS landcover. c) Rigorous non-parametric Bayesian approach to analytic mixture model determination. d) Unique U.S. asset for science product validation and verification.
Estimation of spline function in nonparametric path analysis based on penalized weighted least square (PWLS)

NASA Astrophysics Data System (ADS)

Fernandes, Adji Achmad Rinaldo; Solimun, Arisoesilaningsih, Endang

2017-12-01

The aim of this research is to estimate the spline in Path Analysis-based on Nonparametric Regression using Penalized Weighted Least Square (PWLS) approach. Approach used is Reproducing Kernel Hilbert Space at sobolev space. Nonparametric path analysis model on the equation y1 i=f1.1(x1 i)+ε1 i; y2 i=f1.2(x1 i)+f2.2(y1 i)+ε2 i; i =1 ,2 ,…,n Nonparametric Path Analysis which meet the criteria of minimizing PWLS min fw .k∈W2m[aw .k,bw .k], k =1 ,2 { (2n ) -1(y˜-f ˜ ) TΣ-1(y ˜-f ˜ ) + ∑k =1 2 ∑w =1 2 λw .k ∫aw .k bw .k [fw.k (m )(xi) ] 2d xi } is f ˜^=Ay ˜ with A=T1(T1TU1-1∑-1T1)-1T1TU1-1∑-1+V1U1-1∑-1[I-T1(T1TU1-1∑-1T1)-1T1TU1-1∑-1] columnalign="left">+T2(T2TU2-1∑-1T2)-1T2TU2-1∑-1+V2U2-1∑-1[I1-T2(T2TU2-1∑-1T2) -1T2TU2-1∑-1
Systematics in lensing reconstruction: dark matter rings in the sky?

NASA Astrophysics Data System (ADS)

Ponente, P. P.; Diego, J. M.

2011-11-01

Context. Non-parametric lensing methods are a useful way of reconstructing the lensing mass of a cluster without making assumptions about the way the mass is distributed in the cluster. These methods are particularly powerful in the case of galaxy clusters with a large number of constraints. The advantage of not assuming implicitly that the luminous matter follows the dark matter is particularly interesting in those cases where the cluster is in a non-relaxed dynamical state. On the other hand, non-parametric methods have several limitations that should be taken into account carefully. Aims: We explore some of these limitations and focus on their implications for the possible ring of dark matter around the galaxy cluster CL0024+17. Methods: We project three background galaxies through a mock cluster of known radial profile density and obtain a map for the arcs (θ map). We also calculate the shear field associated with the mock cluster across the whole field of view (3.3 arcmin). Combining the positions of the arcs and the two-direction shear, we perform an inversion of the lens equation using two separate methods, the biconjugate gradient, and the quadratic programming (QADP) to reconstruct the convergence map of the mock cluster. Results: We explore the space of the solutions of the convergence map and compare the radial density profiles to the density profile of the mock cluster. When the inversion matrix algorithms are forced to find the exact solution, we encounter systematic effects resembling ring structures, that clearly depart from the original convergence map. Conclusions: Overfitting lensing data with a non-parametric method can produce ring-like structures similar to the alleged one in CL0024.
The current duration design for estimating the time to pregnancy distribution: a nonparametric Bayesian perspective.

PubMed

Gasbarra, Dario; Arjas, Elja; Vehtari, Aki; Slama, Rémy; Keiding, Niels

2015-10-01

This paper was inspired by the studies of Niels Keiding and co-authors on estimating the waiting time-to-pregnancy (TTP) distribution, and in particular on using the current duration design in that context. In this design, a cross-sectional sample of women is collected from those who are currently attempting to become pregnant, and then by recording from each the time she has been attempting. Our aim here is to study the identifiability and the estimation of the waiting time distribution on the basis of current duration data. The main difficulty in this stems from the fact that very short waiting times are only rarely selected into the sample of current durations, and this renders their estimation unstable. We introduce here a Bayesian method for this estimation problem, prove its asymptotic consistency, and compare the method to some variants of the non-parametric maximum likelihood estimators, which have been used previously in this context. The properties of the Bayesian estimation method are studied also empirically, using both simulated data and TTP data on current durations collected by Slama et al. (Hum Reprod 27(5):1489-1498, 2012).
Modeling animal movements using stochastic differential equations

Treesearch

Haiganoush K. Preisler; Alan A. Ager; Bruce K. Johnson; John G. Kie

2004-01-01

We describe the use of bivariate stochastic differential equations (SDE) for modeling movements of 216 radiocollared female Rocky Mountain elk at the Starkey Experimental Forest and Range in northeastern Oregon. Spatially and temporally explicit vector fields were estimated using approximating difference equations and nonparametric regression techniques. Estimated...
The Infinitesimal Jackknife with Exploratory Factor Analysis

ERIC Educational Resources Information Center

Zhang, Guangjian; Preacher, Kristopher J.; Jennrich, Robert I.

2012-01-01

The infinitesimal jackknife, a nonparametric method for estimating standard errors, has been used to obtain standard error estimates in covariance structure analysis. In this article, we adapt it for obtaining standard errors for rotated factor loadings and factor correlations in exploratory factor analysis with sample correlation matrices. Both…
A new adaptive algorithm for automated feature extraction in exponentially damped signals for health monitoring of smart structures

NASA Astrophysics Data System (ADS)

Qarib, Hossein; Adeli, Hojjat

2015-12-01

In this paper authors introduce a new adaptive signal processing technique for feature extraction and parameter estimation in noisy exponentially damped signals. The iterative 3-stage method is based on the adroit integration of the strengths of parametric and nonparametric methods such as multiple signal categorization, matrix pencil, and empirical mode decomposition algorithms. The first stage is a new adaptive filtration or noise removal scheme. The second stage is a hybrid parametric-nonparametric signal parameter estimation technique based on an output-only system identification technique. The third stage is optimization of estimated parameters using a combination of the primal-dual path-following interior point algorithm and genetic algorithm. The methodology is evaluated using a synthetic signal and a signal obtained experimentally from transverse vibrations of a steel cantilever beam. The method is successful in estimating the frequencies accurately. Further, it estimates the damping exponents. The proposed adaptive filtration method does not include any frequency domain manipulation. Consequently, the time domain signal is not affected as a result of frequency domain and inverse transformations.
Local Composite Quantile Regression Smoothing for Harris Recurrent Markov Processes

PubMed Central

Li, Degui; Li, Runze

2016-01-01

In this paper, we study the local polynomial composite quantile regression (CQR) smoothing method for the nonlinear and nonparametric models under the Harris recurrent Markov chain framework. The local polynomial CQR regression method is a robust alternative to the widely-used local polynomial method, and has been well studied in stationary time series. In this paper, we relax the stationarity restriction on the model, and allow that the regressors are generated by a general Harris recurrent Markov process which includes both the stationary (positive recurrent) and nonstationary (null recurrent) cases. Under some mild conditions, we establish the asymptotic theory for the proposed local polynomial CQR estimator of the mean regression function, and show that the convergence rate for the estimator in nonstationary case is slower than that in stationary case. Furthermore, a weighted type local polynomial CQR estimator is provided to improve the estimation efficiency, and a data-driven bandwidth selection is introduced to choose the optimal bandwidth involved in the nonparametric estimators. Finally, we give some numerical studies to examine the finite sample performance of the developed methodology and theory. PMID:27667894
A comparison of selected parametric and non-parametric imputation methods for estimating forest biomass and basal area

Treesearch

Donald Gagliasso; Susan Hummel; Hailemariam Temesgen

2014-01-01

Various methods have been used to estimate the amount of above ground forest biomass across landscapes and to create biomass maps for specific stands or pixels across ownership or project areas. Without an accurate estimation method, land managers might end up with incorrect biomass estimate maps, which could lead them to make poorer decisions in their future...
Estimating technical efficiency in the hospital sector with panel data: a comparison of parametric and non-parametric techniques.

PubMed

Siciliani, Luigi

2006-01-01

Policy makers are increasingly interested in developing performance indicators that measure hospital efficiency. These indicators may give the purchasers of health services an additional regulatory tool to contain health expenditure. Using panel data, this study compares different parametric (econometric) and non-parametric (linear programming) techniques for the measurement of a hospital's technical efficiency. This comparison was made using a sample of 17 Italian hospitals in the years 1996-9. Highest correlations are found in the efficiency scores between the non-parametric data envelopment analysis under the constant returns to scale assumption (DEA-CRS) and several parametric models. Correlation reduces markedly when using more flexible non-parametric specifications such as data envelopment analysis under the variable returns to scale assumption (DEA-VRS) and the free disposal hull (FDH) model. Correlation also generally reduces when moving from one output to two-output specifications. This analysis suggests that there is scope for developing performance indicators at hospital level using panel data, but it is important that extensive sensitivity analysis is carried out if purchasers wish to make use of these indicators in practice.
Object-oriented recognition of high-resolution remote sensing image

NASA Astrophysics Data System (ADS)

Wang, Yongyan; Li, Haitao; Chen, Hong; Xu, Yuannan

2016-01-01

With the development of remote sensing imaging technology and the improvement of multi-source image's resolution in satellite visible light, multi-spectral and hyper spectral , the high resolution remote sensing image has been widely used in various fields, for example military field, surveying and mapping, geophysical prospecting, environment and so forth. In remote sensing image, the segmentation of ground targets, feature extraction and the technology of automatic recognition are the hotspot and difficulty in the research of modern information technology. This paper also presents an object-oriented remote sensing image scene classification method. The method is consist of vehicles typical objects classification generation, nonparametric density estimation theory, mean shift segmentation theory, multi-scale corner detection algorithm, local shape matching algorithm based on template. Remote sensing vehicles image classification software system is designed and implemented to meet the requirements .
Nonparametric identification of nonlinear dynamic systems using a synchronisation-based method

NASA Astrophysics Data System (ADS)

Kenderi, Gábor; Fidlin, Alexander

2014-12-01

The present study proposes an identification method for highly nonlinear mechanical systems that does not require a priori knowledge of the underlying nonlinearities to reconstruct arbitrary restoring force surfaces between degrees of freedom. This approach is based on the master-slave synchronisation between a dynamic model of the system as the slave and the real system as the master using measurements of the latter. As the model synchronises to the measurements, it becomes an observer of the real system. The optimal observer algorithm in a least-squares sense is given by the Kalman filter. Using the well-known state augmentation technique, the Kalman filter can be turned into a dual state and parameter estimator to identify parameters of a priori characterised nonlinearities. The paper proposes an extension of this technique towards nonparametric identification. A general system model is introduced by describing the restoring forces as bilateral spring-dampers with time-variant coefficients, which are estimated as augmented states. The estimation procedure is followed by an a posteriori statistical analysis to reconstruct noise-free restoring force characteristics using the estimated states and their estimated variances. Observability is provided using only one measured mechanical quantity per degree of freedom, which makes this approach less demanding in the number of necessary measurement signals compared with truly nonparametric solutions, which typically require displacement, velocity and acceleration signals. Additionally, due to the statistical rigour of the procedure, it successfully addresses signals corrupted by significant measurement noise. In the present paper, the method is described in detail, which is followed by numerical examples of one degree of freedom (1DoF) and 2DoF mechanical systems with strong nonlinearities of vibro-impact type to demonstrate the effectiveness of the proposed technique.
Mixture models for undiagnosed prevalent disease and interval-censored incident disease: applications to a cohort assembled from electronic health records.

PubMed

Cheung, Li C; Pan, Qing; Hyun, Noorie; Schiffman, Mark; Fetterman, Barbara; Castle, Philip E; Lorey, Thomas; Katki, Hormuzd A

2017-09-30

For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers who use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and there can be pre-existing (prevalent) disease. Because prevalent disease is not always immediately diagnosed, some disease diagnosed at later visits are actually undiagnosed prevalent disease. We consider prevalent disease as a point mass at time zero for clinical applications where there is no interest in time of prevalent disease onset. We demonstrate that the naive Kaplan-Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. We propose a general family of mixture models for undiagnosed prevalent disease and interval-censored incident disease that we call prevalence-incidence models. Parameters for parametric prevalence-incidence models, such as the logistic regression and Weibull survival (logistic-Weibull) model, are estimated by direct likelihood maximization or by EM algorithm. Non-parametric methods are proposed to calculate cumulative risks for cases without covariates. We compare naive Kaplan-Meier, logistic-Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan-Meier provided poor estimates while the logistic-Weibull model was a close fit to the non-parametric. Our findings support our use of logistic-Weibull models to develop the risk estimates that underlie current US risk-based cervical cancer screening guidelines. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA.

A nonparametric clustering technique which estimates the number of clusters

NASA Technical Reports Server (NTRS)

Ramey, D. B.

1983-01-01

In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
Nonparametric Signal Extraction and Measurement Error in the Analysis of Electroencephalographic Activity During Sleep

PubMed Central

Crainiceanu, Ciprian M.; Caffo, Brian S.; Di, Chong-Zhi; Punjabi, Naresh M.

2009-01-01

We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep. Using subject specific signals estimated with known variability in a second level regression becomes a nonstandard measurement error problem. We propose and implement methods that take into account cross-sectional and longitudinal measurement error. The research presented here forms the basis for EEG signal processing for the SHHS. PMID:20057925
Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

PubMed

Ishwaran, Hemant; Lu, Min

2018-06-04

Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.
Smoothness of In vivo Spectral Baseline Determined by Mean Squared Error

PubMed Central

Zhang, Yan; Shen, Jun

2013-01-01

Purpose A nonparametric smooth line is usually added to spectral model to account for background signals in vivo magnetic resonance spectroscopy (MRS). The assumed smoothness of the baseline significantly influences quantitative spectral fitting. In this paper, a method is proposed to minimize baseline influences on estimated spectral parameters. Methods In this paper, the non-parametric baseline function with a given smoothness was treated as a function of spectral parameters. Its uncertainty was measured by root-mean-squared error (RMSE). The proposed method was demonstrated with a simulated spectrum and in vivo spectra of both short echo time (TE) and averaged echo times. The estimated in vivo baselines were compared with the metabolite-nulled spectra, and the LCModel-estimated baselines. The accuracies of estimated baseline and metabolite concentrations were further verified by cross-validation. Results An optimal smoothness condition was found that led to the minimal baseline RMSE. In this condition, the best fit was balanced against minimal baseline influences on metabolite concentration estimates. Conclusion Baseline RMSE can be used to indicate estimated baseline uncertainties and serve as the criterion for determining the baseline smoothness of in vivo MRS. PMID:24259436
Efficient multidimensional regularization for Volterra series estimation

NASA Astrophysics Data System (ADS)

Birpoutsoukis, Georgios; Csurcsia, Péter Zoltán; Schoukens, Johan

2018-05-01

This paper presents an efficient nonparametric time domain nonlinear system identification method. It is shown how truncated Volterra series models can be efficiently estimated without the need of long, transient-free measurements. The method is a novel extension of the regularization methods that have been developed for impulse response estimates of linear time invariant systems. To avoid the excessive memory needs in case of long measurements or large number of estimated parameters, a practical gradient-based estimation method is also provided, leading to the same numerical results as the proposed Volterra estimation method. Moreover, the transient effects in the simulated output are removed by a special regularization method based on the novel ideas of transient removal for Linear Time-Varying (LTV) systems. Combining the proposed methodologies, the nonparametric Volterra models of the cascaded water tanks benchmark are presented in this paper. The results for different scenarios varying from a simple Finite Impulse Response (FIR) model to a 3rd degree Volterra series with and without transient removal are compared and studied. It is clear that the obtained models capture the system dynamics when tested on a validation dataset, and their performance is comparable with the white-box (physical) models.
Water Residence Time estimation by 1D deconvolution in the form of a l2 -regularized inverse problem with smoothness, positivity and causality constraints

NASA Astrophysics Data System (ADS)

Meresescu, Alina G.; Kowalski, Matthieu; Schmidt, Frédéric; Landais, François

2018-06-01

The Water Residence Time distribution is the equivalent of the impulse response of a linear system allowing the propagation of water through a medium, e.g. the propagation of rain water from the top of the mountain towards the aquifers. We consider the output aquifer levels as the convolution between the input rain levels and the Water Residence Time, starting with an initial aquifer base level. The estimation of Water Residence Time is important for a better understanding of hydro-bio-geochemical processes and mixing properties of wetlands used as filters in ecological applications, as well as protecting fresh water sources for wells from pollutants. Common methods of estimating the Water Residence Time focus on cross-correlation, parameter fitting and non-parametric deconvolution methods. Here we propose a 1D full-deconvolution, regularized, non-parametric inverse problem algorithm that enforces smoothness and uses constraints of causality and positivity to estimate the Water Residence Time curve. Compared to Bayesian non-parametric deconvolution approaches, it has a fast runtime per test case; compared to the popular and fast cross-correlation method, it produces a more precise Water Residence Time curve even in the case of noisy measurements. The algorithm needs only one regularization parameter to balance between smoothness of the Water Residence Time and accuracy of the reconstruction. We propose an approach on how to automatically find a suitable value of the regularization parameter from the input data only. Tests on real data illustrate the potential of this method to analyze hydrological datasets.
Researches of fruit quality prediction model based on near infrared spectrum

NASA Astrophysics Data System (ADS)

Shen, Yulin; Li, Lian

2018-04-01

With the improvement in standards for food quality and safety, people pay more attention to the internal quality of fruits, therefore the measurement of fruit internal quality is increasingly imperative. In general, nondestructive soluble solid content (SSC) and total acid content (TAC) analysis of fruits is vital and effective for quality measurement in global fresh produce markets, so in this paper, we aim at establishing a novel fruit internal quality prediction model based on SSC and TAC for Near Infrared Spectrum. Firstly, the model of fruit quality prediction based on PCA + BP neural network, PCA + GRNN network, PCA + BP adaboost strong classifier, PCA + ELM and PCA + LS_SVM classifier are designed and implemented respectively; then, in the NSCT domain, the median filter and the SavitzkyGolay filter are used to preprocess the spectral signal, Kennard-Stone algorithm is used to automatically select the training samples and test samples; thirdly, we achieve the optimal models by comparing 15 kinds of prediction model based on the theory of multi-classifier competition mechanism, specifically, the non-parametric estimation is introduced to measure the effectiveness of proposed model, the reliability and variance of nonparametric estimation evaluation of each prediction model to evaluate the prediction result, while the estimated value and confidence interval regard as a reference, the experimental results demonstrate that this model can better achieve the optimal evaluation of the internal quality of fruit; finally, we employ cat swarm optimization to optimize two optimal models above obtained from nonparametric estimation, empirical testing indicates that the proposed method can provide more accurate and effective results than other forecasting methods.
"TNOs are Cool": A survey of the trans-Neptunian region. XIII. Statistical analysis of multiple trans-Neptunian objects observed with Herschel Space Observatory

NASA Astrophysics Data System (ADS)

Kovalenko, I. D.; Doressoundiram, A.; Lellouch, E.; Vilenius, E.; Müller, T.; Stansberry, J.

2017-11-01

Context. Gravitationally bound multiple systems provide an opportunity to estimate the mean bulk density of the objects, whereas this characteristic is not available for single objects. Being a primitive population of the outer solar system, binary and multiple trans-Neptunian objects (TNOs) provide unique information about bulk density and internal structure, improving our understanding of their formation and evolution. Aims: The goal of this work is to analyse parameters of multiple trans-Neptunian systems, observed with Herschel and Spitzer space telescopes. Particularly, statistical analysis is done for radiometric size and geometric albedo, obtained from photometric observations, and for estimated bulk density. Methods: We use Monte Carlo simulation to estimate the real size distribution of TNOs. For this purpose, we expand the dataset of diameters by adopting the Minor Planet Center database list with available values of the absolute magnitude therein, and the albedo distribution derived from Herschel radiometric measurements. We use the 2-sample Anderson-Darling non-parametric statistical method for testing whether two samples of diameters, for binary and single TNOs, come from the same distribution. Additionally, we use the Spearman's coefficient as a measure of rank correlations between parameters. Uncertainties of estimated parameters together with lack of data are taken into account. Conclusions about correlations between parameters are based on statistical hypothesis testing. Results: We have found that the difference in size distributions of multiple and single TNOs is biased by small objects. The test on correlations between parameters shows that the effective diameter of binary TNOs strongly correlates with heliocentric orbital inclination and with magnitude difference between components of binary system. The correlation between diameter and magnitude difference implies that small and large binaries are formed by different mechanisms. Furthermore, the statistical test indicates, although not significant with the sample size, that a moderately strong correlation exists between diameter and bulk density. Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.
Nonparametric instrumental regression with non-convex constraints

NASA Astrophysics Data System (ADS)

Grasmair, M.; Scherzer, O.; Vanhems, A.

2013-03-01

This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Estimators for Clustered Education RCTs Using the Neyman Model for Causal Inference

ERIC Educational Resources Information Center

Schochet, Peter Z.

2013-01-01

This article examines the estimation of two-stage clustered designs for education randomized control trials (RCTs) using the nonparametric Neyman causal inference framework that underlies experiments. The key distinction between the considered causal models is whether potential treatment and control group outcomes are considered to be fixed for…
Amazon plant diversity revealed by a taxonomically verified species list.

PubMed

Cardoso, Domingos; Särkinen, Tiina; Alexander, Sara; Amorim, André M; Bittrich, Volker; Celis, Marcela; Daly, Douglas C; Fiaschi, Pedro; Funk, Vicki A; Giacomin, Leandro L; Goldenberg, Renato; Heiden, Gustavo; Iganci, João; Kelloff, Carol L; Knapp, Sandra; Cavalcante de Lima, Haroldo; Machado, Anderson F P; Dos Santos, Rubens Manoel; Mello-Silva, Renato; Michelangeli, Fabián A; Mitchell, John; Moonlight, Peter; de Moraes, Pedro Luís Rodrigues; Mori, Scott A; Nunes, Teonildes Sacramento; Pennington, Terry D; Pirani, José Rubens; Prance, Ghillean T; de Queiroz, Luciano Paganucci; Rapini, Alessandro; Riina, Ricarda; Rincon, Carlos Alberto Vargas; Roque, Nádia; Shimizu, Gustavo; Sobral, Marcos; Stehmann, João Renato; Stevens, Warren D; Taylor, Charlotte M; Trovó, Marcelo; van den Berg, Cássio; van der Werff, Henk; Viana, Pedro Lage; Zartman, Charles E; Forzza, Rafaela Campostrini

2017-10-03

Recent debates on the number of plant species in the vast lowland rain forests of the Amazon have been based largely on model estimates, neglecting published checklists based on verified voucher data. Here we collate taxonomically verified checklists to present a list of seed plant species from lowland Amazon rain forests. Our list comprises 14,003 species, of which 6,727 are trees. These figures are similar to estimates derived from nonparametric ecological models, but they contrast strongly with predictions of much higher tree diversity derived from parametric models. Based on the known proportion of tree species in neotropical lowland rain forest communities as measured in complete plot censuses, and on overall estimates of seed plant diversity in Brazil and in the neotropics in general, it is more likely that tree diversity in the Amazon is closer to the lower estimates derived from nonparametric models. Much remains unknown about Amazonian plant diversity, but this taxonomically verified dataset provides a valid starting point for macroecological and evolutionary studies aimed at understanding the origin, evolution, and ecology of the exceptional biodiversity of Amazonian forests.
Estimation of a partially linear additive model for data from an outcome-dependent sampling design with a continuous outcome

PubMed Central

Tan, Ziwen; Qin, Guoyou; Zhou, Haibo

2016-01-01

Outcome-dependent sampling (ODS) designs have been well recognized as a cost-effective way to enhance study efficiency in both statistical literature and biomedical and epidemiologic studies. A partially linear additive model (PLAM) is widely applied in real problems because it allows for a flexible specification of the dependence of the response on some covariates in a linear fashion and other covariates in a nonlinear non-parametric fashion. Motivated by an epidemiological study investigating the effect of prenatal polychlorinated biphenyls exposure on children's intelligence quotient (IQ) at age 7 years, we propose a PLAM in this article to investigate a more flexible non-parametric inference on the relationships among the response and covariates under the ODS scheme. We propose the estimation method and establish the asymptotic properties of the proposed estimator. Simulation studies are conducted to show the improved efficiency of the proposed ODS estimator for PLAM compared with that from a traditional simple random sampling design with the same sample size. The data of the above-mentioned study is analyzed to illustrate the proposed method. PMID:27006375
Amazon plant diversity revealed by a taxonomically verified species list

PubMed Central

Cardoso, Domingos; Särkinen, Tiina; Alexander, Sara; Amorim, André M.; Bittrich, Volker; Celis, Marcela; Daly, Douglas C.; Fiaschi, Pedro; Funk, Vicki A.; Giacomin, Leandro L.; Heiden, Gustavo; Iganci, João; Kelloff, Carol L.; Knapp, Sandra; Cavalcante de Lima, Haroldo; Machado, Anderson F. P.; dos Santos, Rubens Manoel; Mello-Silva, Renato; Michelangeli, Fabián A.; Mitchell, John; Moonlight, Peter; de Moraes, Pedro Luís Rodrigues; Mori, Scott A.; Nunes, Teonildes Sacramento; Pennington, Terry D.; Pirani, José Rubens; Prance, Ghillean T.; de Queiroz, Luciano Paganucci; Rapini, Alessandro; Rincon, Carlos Alberto Vargas; Roque, Nádia; Shimizu, Gustavo; Sobral, Marcos; Stehmann, João Renato; Stevens, Warren D.; Taylor, Charlotte M.; Trovó, Marcelo; van den Berg, Cássio; van der Werff, Henk; Viana, Pedro Lage; Zartman, Charles E.; Forzza, Rafaela Campostrini

2017-01-01

Recent debates on the number of plant species in the vast lowland rain forests of the Amazon have been based largely on model estimates, neglecting published checklists based on verified voucher data. Here we collate taxonomically verified checklists to present a list of seed plant species from lowland Amazon rain forests. Our list comprises 14,003 species, of which 6,727 are trees. These figures are similar to estimates derived from nonparametric ecological models, but they contrast strongly with predictions of much higher tree diversity derived from parametric models. Based on the known proportion of tree species in neotropical lowland rain forest communities as measured in complete plot censuses, and on overall estimates of seed plant diversity in Brazil and in the neotropics in general, it is more likely that tree diversity in the Amazon is closer to the lower estimates derived from nonparametric models. Much remains unknown about Amazonian plant diversity, but this taxonomically verified dataset provides a valid starting point for macroecological and evolutionary studies aimed at understanding the origin, evolution, and ecology of the exceptional biodiversity of Amazonian forests. PMID:28923966
How to Deal with Interval-Censored Data Practically while Assessing the Progression-Free Survival: A Step-by-Step Guide Using SAS and R Software.

PubMed

Dugué, Audrey Emmanuelle; Pulido, Marina; Chabaud, Sylvie; Belin, Lisa; Gal, Jocelyn

2016-12-01

We describe how to estimate progression-free survival while dealing with interval-censored data in the setting of clinical trials in oncology. Three procedures with SAS and R statistical software are described: one allowing for a nonparametric maximum likelihood estimation of the survival curve using the EM-ICM (Expectation and Maximization-Iterative Convex Minorant) algorithm as described by Wellner and Zhan in 1997; a sensitivity analysis procedure in which the progression time is assigned (i) at the midpoint, (ii) at the upper limit (reflecting the standard analysis when the progression time is assigned at the first radiologic exam showing progressive disease), or (iii) at the lower limit of the censoring interval; and finally, two multiple imputations are described considering a uniform or the nonparametric maximum likelihood estimation (NPMLE) distribution. Clin Cancer Res; 22(23); 5629-35. ©2016 AACR. ©2016 American Association for Cancer Research.
Smoothing spline ANOVA frailty model for recurrent event data.

PubMed

Du, Pang; Jiang, Yihua; Wang, Yuedong

2011-12-01

Gap time hazard estimation is of particular interest in recurrent event data. This article proposes a fully nonparametric approach for estimating the gap time hazard. Smoothing spline analysis of variance (ANOVA) decompositions are used to model the log gap time hazard as a joint function of gap time and covariates, and general frailty is introduced to account for between-subject heterogeneity and within-subject correlation. We estimate the nonparametric gap time hazard function and parameters in the frailty distribution using a combination of the Newton-Raphson procedure, the stochastic approximation algorithm (SAA), and the Markov chain Monte Carlo (MCMC) method. The convergence of the algorithm is guaranteed by decreasing the step size of parameter update and/or increasing the MCMC sample size along iterations. Model selection procedure is also developed to identify negligible components in a functional ANOVA decomposition of the log gap time hazard. We evaluate the proposed methods with simulation studies and illustrate its use through the analysis of bladder tumor data. © 2011, The International Biometric Society.
Robust point matching via vector field consensus.

PubMed

Jiayi Ma; Ji Zhao; Jinwen Tian; Yuille, Alan L; Zhuowen Tu

2014-04-01

In this paper, we propose an efficient algorithm, called vector field consensus, for establishing robust point correspondences between two sets of points. Our algorithm starts by creating a set of putative correspondences which can contain a very large number of false correspondences, or outliers, in addition to a limited number of true correspondences (inliers). Next, we solve for correspondence by interpolating a vector field between the two point sets, which involves estimating a consensus of inlier points whose matching follows a nonparametric geometrical constraint. We formulate this a maximum a posteriori (MAP) estimation of a Bayesian model with hidden/latent variables indicating whether matches in the putative set are outliers or inliers. We impose nonparametric geometrical constraints on the correspondence, as a prior distribution, using Tikhonov regularizers in a reproducing kernel Hilbert space. MAP estimation is performed by the EM algorithm which by also estimating the variance of the prior model (initialized to a large value) is able to obtain good estimates very quickly (e.g., avoiding many of the local minima inherent in this formulation). We illustrate this method on data sets in 2D and 3D and demonstrate that it is robust to a very large number of outliers (even up to 90%). We also show that in the special case where there is an underlying parametric geometrical model (e.g., the epipolar line constraint) that we obtain better results than standard alternatives like RANSAC if a large number of outliers are present. This suggests a two-stage strategy, where we use our nonparametric model to reduce the size of the putative set and then apply a parametric variant of our approach to estimate the geometric parameters. Our algorithm is computationally efficient and we provide code for others to use it. In addition, our approach is general and can be applied to other problems, such as learning with a badly corrupted training data set.
Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure

PubMed Central

Berisha, Visar; Wisler, Alan; Hero, Alfred O.; Spanias, Andreas

2015-01-01

Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks. PMID:26807014
MEASURING DARK MATTER PROFILES NON-PARAMETRICALLY IN DWARF SPHEROIDALS: AN APPLICATION TO DRACO

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jardel, John R.; Gebhardt, Karl; Fabricius, Maximilian H.

2013-02-15

We introduce a novel implementation of orbit-based (or Schwarzschild) modeling that allows dark matter density profiles to be calculated non-parametrically in nearby galaxies. Our models require no assumptions to be made about velocity anisotropy or the dark matter profile. The technique can be applied to any dispersion-supported stellar system, and we demonstrate its use by studying the Local Group dwarf spheroidal galaxy (dSph) Draco. We use existing kinematic data at larger radii and also present 12 new radial velocities within the central 13 pc obtained with the VIRUS-W integral field spectrograph on the 2.7 m telescope at McDonald Observatory. Ourmore » non-parametric Schwarzschild models find strong evidence that the dark matter profile in Draco is cuspy for 20 {<=} r {<=} 700 pc. The profile for r {>=} 20 pc is well fit by a power law with slope {alpha} = -1.0 {+-} 0.2, consistent with predictions from cold dark matter simulations. Our models confirm that, despite its low baryon content relative to other dSphs, Draco lives in a massive halo.« less
Iterative local Gaussian clustering for expressed genes identification linked to malignancy of human colorectal carcinoma

PubMed Central

Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri

2007-01-01

Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC), was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis. PMID:18305825
Iterative local Gaussian clustering for expressed genes identification linked to malignancy of human colorectal carcinoma.

PubMed

Wasito, Ito; Hashim, Siti Zaiton M; Sukmaningrum, Sri

2007-12-30

Gene expression profiling plays an important role in the identification of biological and clinical properties of human solid tumors such as colorectal carcinoma. Profiling is required to reveal underlying molecular features for diagnostic and therapeutic purposes. A non-parametric density-estimation-based approach called iterative local Gaussian clustering (ILGC), was used to identify clusters of expressed genes. We used experimental data from a previous study by Muro and others consisting of 1,536 genes in 100 colorectal cancer and 11 normal tissues. In this dataset, the ILGC finds three clusters, two large and one small gene clusters, similar to their results which used Gaussian mixture clustering. The correlation of each cluster of genes and clinical properties of malignancy of human colorectal cancer was analysed for the existence of tumor or normal, the existence of distant metastasis and the existence of lymph node metastasis.

Disjunctive Normal Shape and Appearance Priors with Applications to Image Segmentation.

PubMed

Mesadi, Fitsum; Cetin, Mujdat; Tasdizen, Tolga

2015-10-01

The use of appearance and shape priors in image segmentation is known to improve accuracy; however, existing techniques have several drawbacks. Active shape and appearance models require landmark points and assume unimodal shape and appearance distributions. Level set based shape priors are limited to global shape similarity. In this paper, we present a novel shape and appearance priors for image segmentation based on an implicit parametric shape representation called disjunctive normal shape model (DNSM). DNSM is formed by disjunction of conjunctions of half-spaces defined by discriminants. We learn shape and appearance statistics at varying spatial scales using nonparametric density estimation. Our method can generate a rich set of shape variations by locally combining training shapes. Additionally, by studying the intensity and texture statistics around each discriminant of our shape model, we construct a local appearance probability map. Experiments carried out on both medical and natural image datasets show the potential of the proposed method.
Phylogenetic relationships of the dwarf boas and a comparison of Bayesian and bootstrap measures of phylogenetic support.

PubMed

Wilcox, Thomas P; Zwickl, Derrick J; Heath, Tracy A; Hillis, David M

2002-11-01

Four New World genera of dwarf boas (Exiliboa, Trachyboa, Tropidophis, and Ungaliophis) have been placed by many systematists in a single group (traditionally called Tropidophiidae). However, the monophyly of this group has been questioned in several studies. Moreover, the overall relationships among basal snake lineages, including the placement of the dwarf boas, are poorly understood. We obtained mtDNA sequence data for 12S, 16S, and intervening tRNA-val genes from 23 species of snakes representing most major snake lineages, including all four genera of New World dwarf boas. We then examined the phylogenetic position of these species by estimating the phylogeny of the basal snakes. Our phylogenetic analysis suggests that New World dwarf boas are not monophyletic. Instead, we find Exiliboa and Ungaliophis to be most closely related to sand boas (Erycinae), boas (Boinae), and advanced snakes (Caenophidea), whereas Tropidophis and Trachyboa form an independent clade that separated relatively early in snake radiation. Our estimate of snake phylogeny differs significantly in other ways from some previous estimates of snake phylogeny. For instance, pythons do not cluster with boas and sand boas, but instead show a strong relationship with Loxocemus and Xenopeltis. Additionally, uropeltids cluster strongly with Cylindrophis, and together are embedded in what has previously been considered the macrostomatan radiation. These relationships are supported by both bootstrapping (parametric and nonparametric approaches) and Bayesian analysis, although Bayesian support values are consistently higher than those obtained from nonparametric bootstrapping. Simulations show that Bayesian support values represent much better estimates of phylogenetic accuracy than do nonparametric bootstrap support values, at least under the conditions of our study. Copyright 2002 Elsevier Science (USA)
A comparison of Hong Kong and United Kingdom SF-6D health states valuations using a nonparametric Bayesian method.

PubMed

Kharroubi, Samer A; Brazier, John E; McGhee, Sarah

2014-06-01

There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained in different countries. The present study applies a nonparametric model to estimate and compare two HK and UK standard gamble values for six-dimensional health state short form (derived from short-form 36 health survey) (SF-6D) health states using Bayesian methods. The data set is the HK and UK SF-6D valuation studies in which two samples of 197 and 249 states defined by the SF-6D were valued by representative samples of the HK and UK general populations, respectively, both using the standard gamble technique. We estimated a function applicable across both countries that explicitly accounts for the differences between them, and is estimated using the data from both countries. The results suggest that differences in SF-6D health state valuations between the UK and HK general populations are potentially important. In particular, the valuations of Hong Kong were meaningfully higher than those of the United Kingdom for most of the selected SF-6D health states. The magnitude of these country-specific differences in health state valuation depended, however, in a complex way on the levels of individual dimensions. The new Bayesian nonparametric method is a powerful approach for analyzing data from multiple nationalities or ethnic groups to understand the differences between them and potentially to estimate the underlying utility functions more efficiently. Copyright © 2014 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
Density-based empirical likelihood procedures for testing symmetry of data distributions and K-sample comparisons.

PubMed

Vexler, Albert; Tanajian, Hovig; Hutson, Alan D

In practice, parametric likelihood-ratio techniques are powerful statistical tools. In this article, we propose and examine novel and simple distribution-free test statistics that efficiently approximate parametric likelihood ratios to analyze and compare distributions of K groups of observations. Using the density-based empirical likelihood methodology, we develop a Stata package that applies to a test for symmetry of data distributions and compares K -sample distributions. Recognizing that recent statistical software packages do not sufficiently address K -sample nonparametric comparisons of data distributions, we propose a new Stata command, vxdbel, to execute exact density-based empirical likelihood-ratio tests using K samples. To calculate p -values of the proposed tests, we use the following methods: 1) a classical technique based on Monte Carlo p -value evaluations; 2) an interpolation technique based on tabulated critical values; and 3) a new hybrid technique that combines methods 1 and 2. The third, cutting-edge method is shown to be very efficient in the context of exact-test p -value computations. This Bayesian-type method considers tabulated critical values as prior information and Monte Carlo generations of test statistic values as data used to depict the likelihood function. In this case, a nonparametric Bayesian method is proposed to compute critical values of exact tests.
Chemical Mapping of the Milky Way with The Canada-France Imaging Survey: A Non-parametric Metallicity-Distance Decomposition of the Galaxy

NASA Astrophysics Data System (ADS)

Ibata, Rodrigo A.; McConnachie, Alan; Cuillandre, Jean-Charles; Fantin, Nicholas; Haywood, Misha; Martin, Nicolas F.; Bergeron, Pierre; Beckmann, Volker; Bernard, Edouard; Bonifacio, Piercarlo; Caffau, Elisabetta; Carlberg, Raymond; Côté, Patrick; Cabanac, Rémi; Chapman, Scott; Duc, Pierre-Alain; Durret, Florence; Famaey, Benoît; Fabbro, Sébastien; Gwyn, Stephen; Hammer, Francois; Hill, Vanessa; Hudson, Michael J.; Lançon, Ariane; Lewis, Geraint; Malhan, Khyati; di Matteo, Paola; McCracken, Henry; Mei, Simona; Mellier, Yannick; Navarro, Julio; Pires, Sandrine; Pritchet, Chris; Reylé, Celine; Richer, Harvey; Robin, Annie C.; Sánchez-Janssen, Rubén; Sawicki, Marcin; Scott, Douglas; Scottez, Vivien; Spekkens, Kristine; Starkenburg, Else; Thomas, Guillaume; Venn, Kim

2017-10-01

We present the chemical distribution of the Milky Way, based on 2900 {\\deg }2 of u-band photometry taken as part of the Canada-France Imaging Survey. When complete, this survey will cover 10,000 {\\deg }2 of the northern sky. By combing the CFHT u-band photometry together with Sloan Digital Sky Survey and Pan-STARRS g,r, and I, we demonstrate that we are able to reliably measure the metallicities of individual stars to ˜0.2 dex, and hence additionally obtain good photometric distance estimates. This survey thus permits the measurement of metallicities and distances of the dominant main-sequence (MS) population out to approximately 30 {kpc}, and provides a much higher number of stars at large extraplanar distances than have been available from previous surveys. We develop a non-parametric distance-metallicity decomposition algorithm and apply it to the sky at 30^\\circ < | b| < 70^\\circ and to the North Galactic Cap. We find that the metallicity-distance distribution is well-represented by three populations whose metallicity distributions do not vary significantly with vertical height above the disk. As traced in MS stars, the stellar halo component shows a vertical density profile that is close to exponential, with a scale height of around 3 {kpc}. This may indicate that the inner halo was formed partly from disk stars ejected in an ancient minor merger.
Verification of forecast ensembles in complex terrain including observation uncertainty

NASA Astrophysics Data System (ADS)

Dorninger, Manfred; Kloiber, Simon

2017-04-01

Traditionally, verification means to verify a forecast (ensemble) with the truth represented by observations. The observation errors are quite often neglected arguing that they are small when compared to the forecast error. In this study as part of the MesoVICT (Mesoscale Verification Inter-comparison over Complex Terrain) project it will be shown, that observation errors have to be taken into account for verification purposes. The observation uncertainty is estimated from the VERA (Vienna Enhanced Resolution Analysis) and represented via two analysis ensembles which are compared to the forecast ensemble. For the whole study results from COSMO-LEPS provided by Arpae-SIMC Emilia-Romagna are used as forecast ensemble. The time period covers the MesoVICT core case from 20-22 June 2007. In a first step, all ensembles are investigated concerning their distribution. Several tests have been executed (Kolmogorov-Smirnov-Test, Finkelstein-Schafer Test, Chi-Square Test etc.) showing no exact mathematical distribution. So the main focus is on non-parametric statistics (e.g. Kernel density estimation, Boxplots etc.) and also the deviation between "forced" normal distributed data and the kernel density estimations. In a next step the observational deviations due to the analysis ensembles are analysed. In a first approach scores are multiple times calculated with every single ensemble member from the analysis ensemble regarded as "true" observation. The results are presented as boxplots for the different scores and parameters. Additionally, the bootstrapping method is also applied to the ensembles. These possible approaches to incorporating observational uncertainty into the computation of statistics will be discussed in the talk.
A Simple Effect Size Estimator for Single Case Designs Using WinBUGS

ERIC Educational Resources Information Center

Rindskopf, David; Shadish, William; Hedges, Larry

2012-01-01

Data from single case designs (SCDs) have traditionally been analyzed by visual inspection rather than statistical models. As a consequence, effect sizes have been of little interest. Lately, some effect-size estimators have been proposed, but most are either (i) nonparametric, and/or (ii) based on an analogy incompatible with effect sizes from…
Model-based mean square error estimators for k-nearest neighbour predictions and applications using remotely sensed data for forest inventories

Treesearch

Steen Magnussen; Ronald E. McRoberts; Erkki O. Tomppo

2009-01-01

New model-based estimators of the uncertainty of pixel-level and areal k-nearest neighbour (knn) predictions of attribute Y from remotely-sensed ancillary data X are presented. Non-parametric functions predict Y from scalar 'Single Index Model' transformations of X. Variance functions generated...
Kernel-Smoothing Estimation of Item Characteristic Functions for Continuous Personality Items: An Empirical Comparison with the Linear and the Continuous-Response Models

ERIC Educational Resources Information Center

Ferrando, Pere J.

2004-01-01

This study used kernel-smoothing procedures to estimate the item characteristic functions (ICFs) of a set of continuous personality items. The nonparametric ICFs were compared with the ICFs estimated (a) by the linear model and (b) by Samejima's continuous-response model. The study was based on a conditioned approach and used an error-in-variables…
A review of methods to estimate cause-specific mortality in presence of competing risks

USGS Publications Warehouse

Heisey, Dennis M.; Patterson, Brent R.

2006-01-01

Estimating cause-specific mortality is often of central importance for understanding the dynamics of wildlife populations. Despite such importance, methodology for estimating and analyzing cause-specific mortality has received little attention in wildlife ecology during the past 20 years. The issue of analyzing cause-specific, mutually exclusive events in time is not unique to wildlife. In fact, this general problem has received substantial attention in human biomedical applications within the context of biostatistical survival analysis. Here, we consider cause-specific mortality from a modern biostatistical perspective. This requires carefully defining what we mean by cause-specific mortality and then providing an appropriate hazard-based representation as a competing risks problem. This leads to the general solution of cause-specific mortality as the cumulative incidence function (CIF). We describe the appropriate generalization of the fully nonparametric staggered-entry Kaplan–Meier survival estimator to cause-specific mortality via the nonparametric CIF estimator (NPCIFE), which in many situations offers an attractive alternative to the Heisey–Fuller estimator. An advantage of the NPCIFE is that it lends itself readily to risk factors analysis with standard software for Cox proportional hazards model. The competing risks–based approach also clarifies issues regarding another intuitive but erroneous "cause-specific mortality" estimator based on the Kaplan–Meier survival estimator and commonly seen in the life sciences literature.
Investigation of the dynamic stress–strain response of compressible polymeric foam using a non-parametric analysis

DOE PAGES

Koohbor, Behrad; Kidane, Addis; Lu, Wei -Yang; ...

2016-01-25

Dynamic stress–strain response of rigid closed-cell polymeric foams is investigated in this work by subjecting high toughness polyurethane foam specimens to direct impact with different projectile velocities and quantifying their deformation response with high speed stereo-photography together with 3D digital image correlation. The measured transient displacement field developed in the specimens during high stain rate loading is used to calculate the transient axial acceleration field throughout the specimen. A simple mathematical formulation based on conservation of mass is also proposed to determine the local change of density in the specimen during deformation. By obtaining the full-field acceleration and density distributions,more » the inertia stresses at each point in the specimen are determined through a non-parametric analysis and superimposed on the stress magnitudes measured at specimen ends to obtain the full-field stress distribution. Furthermore, the process outlined above overcomes a major challenge in high strain rate experiments with low impedance polymeric foam specimens, i.e. the delayed equilibrium conditions can be quantified.« less
JDINAC: joint density-based non-parametric differential interaction network analysis and classification using high-dimensional sparse omics data.

PubMed

Ji, Jiadong; He, Di; Feng, Yang; He, Yong; Xue, Fuzhong; Xie, Lei

2017-10-01

A complex disease is usually driven by a number of genes interwoven into networks, rather than a single gene product. Network comparison or differential network analysis has become an important means of revealing the underlying mechanism of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. They are restrictive in real application. We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data as well as to adjust confounding factors, without the need of the assumption of a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than that achieved by other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified were consistent with existing experimental studies. Furthermore, JDINAC discriminated the tumor and normal sample with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. R scripts available at https://github.com/jijiadong/JDINAC. lxie@iscb.org. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Pharmacokinetic modelling of intravenous tobramycin in adolescent and adult patients with cystic fibrosis using the nonparametric expectation maximization (NPEM) algorithm.

PubMed

Touw, D J; Vinks, A A; Neef, C

1997-06-01

The availability of personal computer programs for individualizing drug dosage regimens has stimulated the interest in modelling population pharmacokinetics. Data from 82 adolescent and adult patients with cystic fibrosis (CF) who were treated with intravenous tobramycin because of an exacerbation of their pulmonary infection were analysed with a non-parametric expectation maximization (NPEM) algorithm. This algorithm estimates the entire discrete joint probability density of the pharmacokinetic parameters. It also provides traditional parametric statistics such as the means, standard deviation, median, covariances and correlations among the various parameters. It also provides graphic-2- and 3-dimensional representations of the marginal densities of the parameters investigated. Several models for intravenous tobramycin in adolescent and adult patients with CF were compared. Covariates were total body weight (for the volume of distribution) and creatinine clearance (for the total body clearance and elimination rate). Because of lack of data on patients with poor renal function, restricted models with non-renal clearance and the non-renal elimination rate constant fixed at literature values of 0.15 L/h and 0.01 h-1 were also included. In this population, intravenous tobramycin could be best described by median (+/-dispersion factor) volume of distribution per unit of total body weight of 0.28 +/- 0.05 L/kg, elimination rate constant of 0.25 +/- 0.10 h-1 and elimination rate constant per unit of creatinine clearance of 0.0008 +/- 0.0009 h-1/(ml/min/1.73 m2). Analysis of populations of increasing size showed that using a restricted model with a non-renal elimination rate constant fixed at 0.01 h-1, a model based on a population of only 10 to 20 patients, contained parameter values similar to those of the entire population and, using the full model, a larger population (at least 40 patients) was needed.
Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

USGS Publications Warehouse

Granato, Gregory E.

2006-01-01

The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
Change point detection of the Persian Gulf sea surface temperature

NASA Astrophysics Data System (ADS)

Shirvani, A.

2017-01-01

In this study, the Student's t parametric and Mann-Whitney nonparametric change point models (CPMs) were applied to detect change point in the annual Persian Gulf sea surface temperature anomalies (PGSSTA) time series for the period 1951-2013. The PGSSTA time series, which were serially correlated, were transformed to produce an uncorrelated pre-whitened time series. The pre-whitened PGSSTA time series were utilized as the input file of change point models. Both the applied parametric and nonparametric CPMs estimated the change point in the PGSSTA in 1992. The PGSSTA follow the normal distribution up to 1992 and thereafter, but with a different mean value after year 1992. The estimated slope of linear trend in PGSSTA time series for the period 1951-1992 was negative; however, that was positive after the detected change point. Unlike the PGSSTA, the applied CPMs suggested no change point in the Niño3.4SSTA time series.
Using exogenous variables in testing for monotonic trends in hydrologic time series

USGS Publications Warehouse

Alley, William M.

1988-01-01

One approach that has been used in performing a nonparametric test for monotonic trend in a hydrologic time series consists of a two-stage analysis. First, a regression equation is estimated for the variable being tested as a function of an exogenous variable. A nonparametric trend test such as the Kendall test is then performed on the residuals from the equation. By analogy to stagewise regression and through Monte Carlo experiments, it is demonstrated that this approach will tend to underestimate the magnitude of the trend and to result in some loss in power as a result of ignoring the interaction between the exogenous variable and time. An alternative approach, referred to as the adjusted variable Kendall test, is demonstrated to generally have increased statistical power and to provide more reliable estimates of the trend slope. In addition, the utility of including an exogenous variable in a trend test is examined under selected conditions.
Nonparametric estimation of stochastic differential equations with sparse Gaussian processes.

PubMed

García, Constantino A; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G

2017-08-01

The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that requires the use of Gaussian processes, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economy and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.
On-Line Robust Modal Stability Prediction using Wavelet Processing

NASA Technical Reports Server (NTRS)

Brenner, Martin J.; Lind, Rick

1998-01-01

Wavelet analysis for filtering and system identification has been used to improve the estimation of aeroservoelastic stability margins. The conservatism of the robust stability margins is reduced with parametric and nonparametric time- frequency analysis of flight data in the model validation process. Nonparametric wavelet processing of data is used to reduce the effects of external disturbances and unmodeled dynamics. Parametric estimates of modal stability are also extracted using the wavelet transform. Computation of robust stability margins for stability boundary prediction depends on uncertainty descriptions derived from the data for model validation. The F-18 High Alpha Research Vehicle aeroservoelastic flight test data demonstrates improved robust stability prediction by extension of the stability boundary beyond the flight regime. Guidelines and computation times are presented to show the efficiency and practical aspects of these procedures for on-line implementation. Feasibility of the method is shown for processing flight data from time- varying nonstationary test points.
Probability machines: consistent probability estimation using nonparametric learning machines.

PubMed

Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

2012-01-01

Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
Estimating the number of tree species in forest populations using current vegetation survey and forest inventory and analysis approximation plots and grid intensities

Treesearch

Hans T. Schreuder; Jin-Mann S. Lin; John Teply

2000-01-01

We estimate number of tree species in National Forest populations using the nonparametric estimator. Data from the Current Vegetation Survey (CVS) of Region 6 of the USDA Forest Service were used to estimate the number of tree species with a plot close in size to the Forest Inventory and Analysis (FIA) plot and the actual CVS plot for the 5.5 km FIA grid and the 2.7 km...

Calcium and Dairy Products Consumption and Association with Total Hip Bone Mineral Density in Women from Kosovo

PubMed Central

Bahtiri, Elton; Islami, Hilmi; Hoxha, Rexhep; Bytyqi, Hasime Qorraj-; Sermaxhaj, Faton; Halimi, Enis

2014-01-01

Background and objective: There is paucity of evidence in southeastern Europe and Kosovo regarding dairy products consumption and association with bone mineral density (BMD). Therefore, the objective of present study was to assess calcium intake and dairy products consumption and to investigate relationship with total hip BMD in a Kosovo women sample. Methods: This cross-sectional study included a sample of 185 women divided into respective groups according to total hip BMD. All the study participants completed a food frequency questionnaire and underwent dual-energy X-ray absorptiometry (DEXA) to estimate BMD. Nonparametric tests were performed to compare characteristics of the groups. Results: The average dietary calcium intake was 818.41 mg/day. Only 16.75% of the subjects met calcium recommended dietary reference intakes (DRIs). There were no significant differences between low BMD group and normal BMD group regarding average dietary calcium intake, but it was significantly higher in BMDT3 subgroup than in BMDT2 and BMDT1 subgroups. Conclusions: The results of this study demonstrate significant relationship of daily dietary calcium intake with upper BMD tertile. Further initiatives are warranted from this study to highlight the importance of nutrition education. PMID:25568548
Estimating areal means and variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery

Treesearch

Ronald E. McRoberts; Erkki O. Tomppo; Andrew O. Finley; Heikkinen Juha

2007-01-01

The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest...
Statistical Theory for the "RCT-YES" Software: Design-Based Causal Inference for RCTs. NCEE 2015-4011

ERIC Educational Resources Information Center

Schochet, Peter Z.

2015-01-01

This report presents the statistical theory underlying the "RCT-YES" software that estimates and reports impacts for RCTs for a wide range of designs used in social policy research. The report discusses a unified, non-parametric design-based approach for impact estimation using the building blocks of the Neyman-Rubin-Holland causal…
Bayesian nonparametric estimation of EQ-5D utilities for United States using the existing United Kingdom data.

PubMed

Kharroubi, Samer A

2017-10-06

Valuations of health state descriptors such as EQ-5D or SF6D have been conducted in different countries. There is a scope to make use of the results in one country as informative priors to help with the analysis of a study in another, for this to enable better estimation to be obtained in the new country than analyzing its data separately. Data from 2 EQ-5D valuation studies were analyzed using the time trade-off technique, where values for 42 health states were devised from representative samples of the UK and US populations. A Bayesian non-parametric approach has been applied to predict the health utilities of the US population, where the UK results were used as informative priors in the model to improve their estimation. The findings showed that employing additional information from the UK data helped in the production of US utility estimates much more precisely than would have been possible using the US study data alone. It is very plausible that this method would serve useful in countries where the conduction of large evaluation studies is not very feasible.
Non-Parametric Collision Probability for Low-Velocity Encounters

NASA Technical Reports Server (NTRS)

Carpenter, J. Russell

2007-01-01

An implicit, but not necessarily obvious, assumption in all of the current techniques for assessing satellite collision probability is that the relative position uncertainty is perfectly correlated in time. If there is any mis-modeling of the dynamics in the propagation of the relative position error covariance matrix, time-wise de-correlation of the uncertainty will increase the probability of collision over a given time interval. The paper gives some examples that illustrate this point. This paper argues that, for the present, Monte Carlo analysis is the best available tool for handling low-velocity encounters, and suggests some techniques for addressing the issues just described. One proposal is for the use of a non-parametric technique that is widely used in actuarial and medical studies. The other suggestion is that accurate process noise models be used in the Monte Carlo trials to which the non-parametric estimate is applied. A further contribution of this paper is a description of how the time-wise decorrelation of uncertainty increases the probability of collision.
Nonparametric evaluation of birth cohort trends in disease rates.

PubMed

Tarone, R E; Chu, K C

2000-01-01

Although interpretation of age-period-cohort analyses is complicated by the non-identifiability of maximum likelihood estimates, changes in the slope of the birth-cohort effect curve are identifiable and have potential aetiologic significance. A nonparametric test for a change in the slope of the birth-cohort trend has been developed. The test is a generalisation of the sign test and is based on permutational distributions. A method for identifying interactions between age and calendar-period effects is also presented. The nonparametric method is shown to be powerful in detecting changes in the slope of the birth-cohort trend, although its power can be reduced considerably by calendar-period patterns of risk. The method identifies a previously unidentified decrease in the birth-cohort risk of lung-cancer mortality from 1912 to 1919, which appears to reflect a reduction in the initiation of smoking by young men at the beginning of the Great Depression (1930s). The method also detects an interaction between age and calendar period in leukemia mortality rates, reflecting the better response of children to chemotherapy. The proposed nonparametric method provides a data analytic approach, which is a useful adjunct to log-linear Poisson analysis of age-period-cohort models, either in the initial model building stage, or in the final interpretation stage.
Water quality assessment by means of HFNI valvometry and high-frequency data modeling.

PubMed

Sow, Mohamedou; Durrieu, Gilles; Briollais, Laurent; Ciret, Pierre; Massabuau, Jean-Charles

2011-11-01

The high-frequency measurements of valve activity in bivalves (e.g., valvometry) over a long period of time and in various environmental conditions allow a very accurate study of their behaviors as well as a global analysis of possible perturbations due to the environment. Valvometry uses the bivalve's ability to close its shell when exposed to a contaminant or other abnormal environmental conditions as an alarm to indicate possible perturbations in the environment. The modeling of such high-frequency serial valvometry data is statistically challenging, and here, a nonparametric approach based on kernel estimation is proposed. This method has the advantage of summarizing complex data into a simple density profile obtained from each animal at every 24-h period to ultimately make inference about time effect and external conditions on this profile. The statistical properties of the estimator are presented. Through an application to a sample of 16 oysters living in the Bay of Arcachon (France), we demonstrate that this method can be used to first estimate the normal biological rhythms of permanently immersed oysters and second to detect perturbations of these rhythms due to changes in their environment. We anticipate that this approach could have an important contribution to the survey of aquatic systems.
Estimating and comparing microbial diversity in the presence of sequencing errors

PubMed Central

Chiu, Chun-Huo

2016-01-01

Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures’ emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes. PMID:26855872
Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

PubMed

Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

2017-01-01

Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.
Determination of the minimal clinically important difference for seven fatigue measures in rheumatoid arthritis

PubMed Central

Pouchot, Jacques; Kherani, Raheem B.; Brant, Rollin; Lacaille, Diane; Lehman, Allen J.; Ensworth, Stephanie; Kopec, Jacek; Esdaile, John M.; Liang, Matthew H.

2008-01-01

Objective To estimate the minimal clinically important difference (MCID) of seven measures of fatigue in rheumatoid arthritis. Study Design and Setting A cross-sectional study design based on inter-individual comparisons was used. Six to eight subjects participated in a single meeting and completed seven fatigue questionnaires (nine sessions were organized and 61 subjects participated). After completion of the questionnaires, the subjects had five one-on-one 10-minute conversations with different people in the group to discuss their fatigue. After each conversation, each patient compared their fatigue to their conversational partner’s on a global rating. Ratings were compared to the scores of the fatigue measures to estimate the MCID. Both non-parametric and linear regression analyses were used. Results Non-parametric estimates for the MCID relative to “little more fatigue” tended to be smaller than those for “little less fatigue”. The global MCIDs estimated by linear regression were: FSS 20.2, VT 14.8, MAF 18.7, MFI 16.6, FACIT–F 15.9, CFS 9.9, RS 19.7, for normalized scores (0 to 100). The standardized MCIDs for the seven measures were roughly similar (0.67 to 0.76). Conclusion These estimates of MCID will help to interpret changes observed in a fatigue score and will be critical in estimating sample size requirements. PMID:18359189
Nonparametric Conditional Estimation

DTIC Science & Technology

1987-02-01

the data because the statistician has complete control over the method. It is especially reasonable when there is a bone fide loss function to which...For example, the sample mean is m(Fn). Most calculations that statisticians perform on a set of data can be expressed as statistical functionals on...of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering
Interval Estimation of Seismic Hazard Parameters

NASA Astrophysics Data System (ADS)

Orlecka-Sikora, Beata; Lasocki, Stanislaw

2017-03-01

The paper considers Poisson temporal occurrence of earthquakes and presents a way to integrate uncertainties of the estimates of mean activity rate and magnitude cumulative distribution function in the interval estimation of the most widely used seismic hazard functions, such as the exceedance probability and the mean return period. The proposed algorithm can be used either when the Gutenberg-Richter model of magnitude distribution is accepted or when the nonparametric estimation is in use. When the Gutenberg-Richter model of magnitude distribution is used the interval estimation of its parameters is based on the asymptotic normality of the maximum likelihood estimator. When the nonparametric kernel estimation of magnitude distribution is used, we propose the iterated bias corrected and accelerated method for interval estimation based on the smoothed bootstrap and second-order bootstrap samples. The changes resulted from the integrated approach in the interval estimation of the seismic hazard functions with respect to the approach, which neglects the uncertainty of the mean activity rate estimates have been studied using Monte Carlo simulations and two real dataset examples. The results indicate that the uncertainty of mean activity rate affects significantly the interval estimates of hazard functions only when the product of activity rate and the time period, for which the hazard is estimated, is no more than 5.0. When this product becomes greater than 5.0, the impact of the uncertainty of cumulative distribution function of magnitude dominates the impact of the uncertainty of mean activity rate in the aggregated uncertainty of the hazard functions. Following, the interval estimates with and without inclusion of the uncertainty of mean activity rate converge. The presented algorithm is generic and can be applied also to capture the propagation of uncertainty of estimates, which are parameters of a multiparameter function, onto this function.
Spatial hydrological drought characteristics in Karkheh River basin, southwest Iran using copulas

NASA Astrophysics Data System (ADS)

Dodangeh, Esmaeel; Shahedi, Kaka; Shiau, Jenq-Tzong; MirAkbari, Maryam

2017-08-01

Investigation on drought characteristics such as severity, duration, and frequency is crucial for water resources planning and management in a river basin. While the methodology for multivariate drought frequency analysis is well established by applying the copulas, the estimation on the associated parameters by various parameter estimation methods and the effects on the obtained results have not yet been investigated. This research aims at conducting a comparative analysis between the maximum likelihood parametric and non-parametric method of the Kendall τ estimation method for copulas parameter estimation. The methods were employed to study joint severity-duration probability and recurrence intervals in Karkheh River basin (southwest Iran) which is facing severe water-deficit problems. Daily streamflow data at three hydrological gauging stations (Tang Sazbon, Huleilan and Polchehr) near the Karkheh dam were used to draw flow duration curves (FDC) of these three stations. The Q_{75} index extracted from the FDC were set as threshold level to abstract drought characteristics such as drought duration and severity on the basis of the run theory. Drought duration and severity were separately modeled using the univariate probabilistic distributions and gamma-GEV, LN2-exponential, and LN2-gamma were selected as the best paired drought severity-duration inputs for copulas according to the Akaike Information Criteria (AIC), Kolmogorov-Smirnov and chi-square tests. Archimedean Clayton, Frank, and extreme value Gumbel copulas were employed to construct joint cumulative distribution functions (JCDF) of droughts for each station. Frank copula at Tang Sazbon and Gumbel at Huleilan and Polchehr stations were identified as the best copulas based on the performance evaluation criteria including AIC, BIC, log-likelihood and root mean square error (RMSE) values. Based on the RMSE values, nonparametric Kendall-τ is preferred to the parametric maximum likelihood estimation method. The results showed greater drought return periods by the parametric ML method in comparison to the nonparametric Kendall τ estimation method. The results also showed that stations located in tributaries (Huleilan and Polchehr) have close return periods, while the station along the main river (Tang Sazbon) has the smaller return periods for the drought events with identical drought duration and severity.
Nonparametric Residue Analysis of Dynamic PET Data With Application to Cerebral FDG Studies in Normals.

PubMed

O'Sullivan, Finbarr; Muzi, Mark; Spence, Alexander M; Mankoff, David M; O'Sullivan, Janet N; Fitzgerald, Niall; Newman, George C; Krohn, Kenneth A

2009-06-01

Kinetic analysis is used to extract metabolic information from dynamic positron emission tomography (PET) uptake data. The theory of indicator dilutions, developed in the seminal work of Meier and Zierler (1954), provides a probabilistic framework for representation of PET tracer uptake data in terms of a convolution between an arterial input function and a tissue residue. The residue is a scaled survival function associated with tracer residence in the tissue. Nonparametric inference for the residue, a deconvolution problem, provides a novel approach to kinetic analysis-critically one that is not reliant on specific compartmental modeling assumptions. A practical computational technique based on regularized cubic B-spline approximation of the residence time distribution is proposed. Nonparametric residue analysis allows formal statistical evaluation of specific parametric models to be considered. This analysis needs to properly account for the increased flexibility of the nonparametric estimator. The methodology is illustrated using data from a series of cerebral studies with PET and fluorodeoxyglucose (FDG) in normal subjects. Comparisons are made between key functionals of the residue, tracer flux, flow, etc., resulting from a parametric (the standard two-compartment of Phelps et al. 1979) and a nonparametric analysis. Strong statistical evidence against the compartment model is found. Primarily these differences relate to the representation of the early temporal structure of the tracer residence-largely a function of the vascular supply network. There are convincing physiological arguments against the representations implied by the compartmental approach but this is the first time that a rigorous statistical confirmation using PET data has been reported. The compartmental analysis produces suspect values for flow but, notably, the impact on the metabolic flux, though statistically significant, is limited to deviations on the order of 3%-4%. The general advantage of the nonparametric residue analysis is the ability to provide a valid kinetic quantitation in the context of studies where there may be heterogeneity or other uncertainty about the accuracy of a compartmental model approximation of the tissue residue.
Short-term monitoring of benzene air concentration in an urban area: a preliminary study of application of Kruskal-Wallis non-parametric test to assess pollutant impact on global environment and indoor.

PubMed

Mura, Maria Chiara; De Felice, Marco; Morlino, Roberta; Fuselli, Sergio

2010-01-01

In step with the need to develop statistical procedures to manage small-size environmental samples, in this work we have used concentration values of benzene (C6H6), concurrently detected by seven outdoor and indoor monitoring stations over 12 000 minutes, in order to assess the representativeness of collected data and the impact of the pollutant on indoor environment. Clearly, the former issue is strictly connected to sampling-site geometry, which proves critical to correctly retrieving information from analysis of pollutants of sanitary interest. Therefore, according to current criteria for network-planning, single stations have been interpreted as nodes of a set of adjoining triangles; then, a) node pairs have been taken into account in order to estimate pollutant stationarity on triangle sides, as well as b) node triplets, to statistically associate data from air-monitoring with the corresponding territory area, and c) node sextuplets, to assess the impact probability of the outdoor pollutant on indoor environment for each area. Distributions from the various node combinations are all non-Gaussian, in the consequently, Kruskal-Wallis (KW) non-parametric statistics has been exploited to test variability on continuous density function from each pair, triplet and sextuplet. Results from the above-mentioned statistical analysis have shown randomness of site selection, which has not allowed a reliable generalization of monitoring data to the entire selected territory, except for a single "forced" case (70%); most important, they suggest a possible procedure to optimize network design.
Examining the Feasibility and Utility of Estimating Partial Expected Value of Perfect Information (via a Nonparametric Approach) as Part of the Reimbursement Decision-Making Process in Ireland: Application to Drugs for Cancer.

PubMed

McCullagh, Laura; Schmitz, Susanne; Barry, Michael; Walsh, Cathal

2017-11-01

In Ireland, all new drugs for which reimbursement by the healthcare payer is sought undergo a health technology assessment by the National Centre for Pharmacoeconomics. The National Centre for Pharmacoeconomics estimate expected value of perfect information but not partial expected value of perfect information (owing to computational expense associated with typical methodologies). The objective of this study was to examine the feasibility and utility of estimating partial expected value of perfect information via a computationally efficient, non-parametric regression approach. This was a retrospective analysis of evaluations on drugs for cancer that had been submitted to the National Centre for Pharmacoeconomics (January 2010 to December 2014 inclusive). Drugs were excluded if cost effective at the submitted price. Drugs were excluded if concerns existed regarding the validity of the applicants' submission or if cost-effectiveness model functionality did not allow required modifications to be made. For each included drug (n = 14), value of information was estimated at the final reimbursement price, at a threshold equivalent to the incremental cost-effectiveness ratio at that price. The expected value of perfect information was estimated from probabilistic analysis. Partial expected value of perfect information was estimated via a non-parametric approach. Input parameters with a population value at least €1 million were identified as potential targets for research. All partial estimates were determined within minutes. Thirty parameters (across nine models) each had a value of at least €1 million. These were categorised. Collectively, survival analysis parameters were valued at €19.32 million, health state utility parameters at €15.81 million and parameters associated with the cost of treating adverse effects at €6.64 million. Those associated with drug acquisition costs and with the cost of care were valued at €6.51 million and €5.71 million, respectively. This research demonstrates that the estimation of partial expected value of perfect information via this computationally inexpensive approach could be considered feasible as part of the health technology assessment process for reimbursement purposes within the Irish healthcare system. It might be a useful tool in prioritising future research to decrease decision uncertainty.
Hidden Markov models and neural networks for fault detection in dynamic systems

NASA Technical Reports Server (NTRS)

Smyth, Padhraic

1994-01-01

Neural networks plus hidden Markov models (HMM) can provide excellent detection and false alarm rate performance in fault detection applications, as shown in this viewgraph presentation. Modified models allow for novelty detection. Key contributions of neural network models are: (1) excellent nonparametric discrimination capability; (2) a good estimator of posterior state probabilities, even in high dimensions, and thus can be embedded within overall probabilistic model (HMM); and (3) simple to implement compared to other nonparametric models. Neural network/HMM monitoring model is currently being integrated with the new Deep Space Network (DSN) antenna controller software and will be on-line monitoring a new DSN 34-m antenna (DSS-24) by July, 1994.
Proceedings of the Third Annual Symposium on Mathematical Pattern Recognition and Image Analysis

NASA Technical Reports Server (NTRS)

Guseman, L. F., Jr.

1985-01-01

Topics addressed include: multivariate spline method; normal mixture analysis applied to remote sensing; image data analysis; classifications in spatially correlated environments; probability density functions; graphical nonparametric methods; subpixel registration analysis; hypothesis integration in image understanding systems; rectification of satellite scanner imagery; spatial variation in remotely sensed images; smooth multidimensional interpolation; and optimal frequency domain textural edge detection filters.
Reference interval estimation: Methodological comparison using extensive simulations and empirical data.

PubMed

Daly, Caitlin H; Higgins, Victoria; Adeli, Khosrow; Grey, Vijay L; Hamid, Jemila S

2017-12-01

To statistically compare and evaluate commonly used methods of estimating reference intervals and to determine which method is best based on characteristics of the distribution of various data sets. Three approaches for estimating reference intervals, i.e. parametric, non-parametric, and robust, were compared with simulated Gaussian and non-Gaussian data. The hierarchy of the performances of each method was examined based on bias and measures of precision. The findings of the simulation study were illustrated through real data sets. In all Gaussian scenarios, the parametric approach provided the least biased and most precise estimates. In non-Gaussian scenarios, no single method provided the least biased and most precise estimates for both limits of a reference interval across all sample sizes, although the non-parametric approach performed the best for most scenarios. The hierarchy of the performances of the three methods was only impacted by sample size and skewness. Differences between reference interval estimates established by the three methods were inflated by variability. Whenever possible, laboratories should attempt to transform data to a Gaussian distribution and use the parametric approach to obtain the most optimal reference intervals. When this is not possible, laboratories should consider sample size and skewness as factors in their choice of reference interval estimation method. The consequences of false positives or false negatives may also serve as factors in this decision. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
Kernel PLS Estimation of Single-trial Event-related Potentials

NASA Technical Reports Server (NTRS)

Rosipal, Roman; Trejo, Leonard J.

2004-01-01

Nonlinear kernel partial least squaes (KPLS) regressior, is a novel smoothing approach to nonparametric regression curve fitting. We have developed a KPLS approach to the estimation of single-trial event related potentials (ERPs). For improved accuracy of estimation, we also developed a local KPLS method for situations in which there exists prior knowledge about the approximate latency of individual ERP components. To assess the utility of the KPLS approach, we compared non-local KPLS and local KPLS smoothing with other nonparametric signal processing and smoothing methods. In particular, we examined wavelet denoising, smoothing splines, and localized smoothing splines. We applied these methods to the estimation of simulated mixtures of human ERPs and ongoing electroencephalogram (EEG) activity using a dipole simulator (BESA). In this scenario we considered ongoing EEG to represent spatially and temporally correlated noise added to the ERPs. This simulation provided a reasonable but simplified model of real-world ERP measurements. For estimation of the simulated single-trial ERPs, local KPLS provided a level of accuracy that was comparable with or better than the other methods. We also applied the local KPLS method to the estimation of human ERPs recorded in an experiment on co,onitive fatigue. For these data, the local KPLS method provided a clear improvement in visualization of single-trial ERPs as well as their averages. The local KPLS method may serve as a new alternative to the estimation of single-trial ERPs and improvement of ERP averages.

One-shot estimate of MRMC variance: AUC.

PubMed

Gallas, Brandon D

2006-03-01

One popular study design for estimating the area under the receiver operating characteristic curve (AUC) is the one in which a set of readers reads a set of cases: a fully crossed design in which every reader reads every case. The variability of the subsequent reader-averaged AUC has two sources: the multiple readers and the multiple cases (MRMC). In this article, we present a nonparametric estimate for the variance of the reader-averaged AUC that is unbiased and does not use resampling tools. The one-shot estimate is based on the MRMC variance derived by the mechanistic approach of Barrett et al. (2005), as well as the nonparametric variance of a single-reader AUC derived in the literature on U statistics. We investigate the bias and variance properties of the one-shot estimate through a set of Monte Carlo simulations with simulated model observers and images. The different simulation configurations vary numbers of readers and cases, amounts of image noise and internal noise, as well as how the readers are constructed. We compare the one-shot estimate to a method that uses the jackknife resampling technique with an analysis of variance model at its foundation (Dorfman et al. 1992). The name one-shot highlights that resampling is not used. The one-shot and jackknife estimators behave similarly, with the one-shot being marginally more efficient when the number of cases is small. We have derived a one-shot estimate of the MRMC variance of AUC that is based on a probabilistic foundation with limited assumptions, is unbiased, and compares favorably to an established estimate.
Estimation of variance in Cox's regression model with shared gamma frailties.

PubMed

Andersen, P K; Klein, J P; Knudsen, K M; Tabanera y Palacios, R

1997-12-01

The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.
About an adaptively weighted Kaplan-Meier estimate.

PubMed

Plante, Jean-François

2009-09-01

The minimum averaged mean squared error nonparametric adaptive weights use data from m possibly different populations to infer about one population of interest. The definition of these weights is based on the properties of the empirical distribution function. We use the Kaplan-Meier estimate to let the weights accommodate right-censored data and use them to define the weighted Kaplan-Meier estimate. The proposed estimate is smoother than the usual Kaplan-Meier estimate and converges uniformly in probability to the target distribution. Simulations show that the performances of the weighted Kaplan-Meier estimate on finite samples exceed that of the usual Kaplan-Meier estimate. A case study is also presented.
A Semi-parametric Transformation Frailty Model for Semi-competing Risks Survival Data

PubMed Central

Jiang, Fei; Haneuse, Sebastien

2016-01-01

In the analysis of semi-competing risks data interest lies in estimation and inference with respect to a so-called non-terminal event, the observation of which is subject to a terminal event. Multi-state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non-terminal and terminal events specified, in part, by a unit-specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, σ2. When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi-competing risks analysis that permit the non-parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi-parametric efficient score under the complete data setting and propose a non-parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators is derived and small-sample operating characteristics evaluated via simulation. Although the proposed semi-parametric transformation model and non-parametric score imputation method are motivated by the analysis of semi-competing risks data, they are broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation. Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer. PMID:28439147
Survival curve estimation with dependent left truncated data using Cox's model.

PubMed

Mackenzie, Todd

2012-10-19

The Kaplan-Meier and closely related Lynden-Bell estimators are used to provide nonparametric estimation of the distribution of a left-truncated random variable. These estimators assume that the left-truncation variable is independent of the time-to-event. This paper proposes a semiparametric method for estimating the marginal distribution of the time-to-event that does not require independence. It models the conditional distribution of the time-to-event given the truncation variable using Cox's model for left truncated data, and uses inverse probability weighting. We report the results of simulations and illustrate the method using a survival study.
Statistical methods for estimating normal blood chemistry ranges and variance in rainbow trout (Salmo gairdneri), Shasta Strain

USGS Publications Warehouse

Wedemeyer, Gary A.; Nelson, Nancy C.

1975-01-01

Gaussian and nonparametric (percentile estimate and tolerance interval) statistical methods were used to estimate normal ranges for blood chemistry (bicarbonate, bilirubin, calcium, hematocrit, hemoglobin, magnesium, mean cell hemoglobin concentration, osmolality, inorganic phosphorus, and pH for juvenile rainbow (Salmo gairdneri, Shasta strain) trout held under defined environmental conditions. The percentile estimate and Gaussian methods gave similar normal ranges, whereas the tolerance interval method gave consistently wider ranges for all blood variables except hemoglobin. If the underlying frequency distribution is unknown, the percentile estimate procedure would be the method of choice.
Design of adaptive control systems by means of self-adjusting transversal filters

NASA Technical Reports Server (NTRS)

Merhav, S. J.

1986-01-01

The design of closed-loop adaptive control systems based on nonparametric identification was addressed. Implementation is by self-adjusting Least Mean Square (LMS) transversal filters. The design concept is Model Reference Adaptive Control (MRAC). Major issues are to preserve the linearity of the error equations of each LMS filter, and to prevent estimation bias that is due to process or measurement noise, thus providing necessary conditions for the convergence and stability of the control system. The controlled element is assumed to be asymptotically stable and minimum phase. Because of the nonparametric Finite Impulse Response (FIR) estimates provided by the LMS filters, a-priori information on the plant model is needed only in broad terms. Following a survey of control system configurations and filter design considerations, system implementation is shown here in Single Input Single Output (SISO) format which is readily extendable to multivariable forms. In extensive computer simulation studies the controlled element is represented by a second-order system with widely varying damping, natural frequency, and relative degree.
Nonparametric methods for analyzing recurrent gap time data with application to infections after hematopoietic cell transplant.

PubMed

Lee, Chi Hyun; Luo, Xianghua; Huang, Chiung-Yu; DeFor, Todd E; Brunstein, Claudio G; Weisdorf, Daniel J

2016-06-01

Infection is one of the most common complications after hematopoietic cell transplantation. Many patients experience infectious complications repeatedly after transplant. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of interest, and subsequently experience recurrent events of the same type; moreover, for one-sample estimation, the gap times between consecutive events are usually assumed to be identically distributed. Applying these methods to analyze the post-transplant infection data will inevitably lead to incorrect inferential results because the time from transplant to the first infection has a different biological meaning than the gap times between consecutive recurrent infections. Some unbiased yet inefficient methods include univariate survival analysis methods based on data from the first infection or bivariate serial event data methods based on the first and second infections. In this article, we propose a nonparametric estimator of the joint distribution of time from transplant to the first infection and the gap times between consecutive infections. The proposed estimator takes into account the potentially different distributions of the two types of gap times and better uses the recurrent infection data. Asymptotic properties of the proposed estimators are established. © 2015, The International Biometric Society.
Nonparametric methods for analyzing recurrent gap time data with application to infections after hematopoietic cell transplant

PubMed Central

Lee, Chi Hyun; Huang, Chiung-Yu; DeFor, Todd E.; Brunstein, Claudio G.; Weisdorf, Daniel J.

2015-01-01

Summary Infection is one of the most common complications after hematopoietic cell transplantation. Many patients experience infectious complications repeatedly after transplant. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of interest, and subsequently experience recurrent events of the same type; moreover, for one-sample estimation, the gap times between consecutive events are usually assumed to be identically distributed. Applying these methods to analyze the post-transplant infection data will inevitably lead to incorrect inferential results because the time from transplant to the first infection has a different biological meaning than the gap times between consecutive recurrent infections. Some unbiased yet inefficient methods include univariate survival analysis methods based on data from the first infection or bivariate serial event data methods based on the first and second infections. In this paper, we propose a nonparametric estimator of the joint distribution of time from transplant to the first infection and the gap times between consecutive infections. The proposed estimator takes into account the potentially different distributions of the two types of gap times and better uses the recurrent infection data. Asymptotic properties of the proposed estimators are established. PMID:26575402
A Comparison of Japan and U.K. SF-6D Health-State Valuations Using a Non-Parametric Bayesian Method.

PubMed

Kharroubi, Samer A

2015-08-01

There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained in different countries. We sought to estimate and compare two directly elicited valuations for SF-6D health states between the Japan and U.K. general adult populations using Bayesian methods. We analysed data from two SF-6D valuation studies where, using similar standard gamble protocols, values for 241 and 249 states were elicited from representative samples of the Japan and U.K. general adult populations, respectively. We estimate a function applicable across both countries that explicitly accounts for the differences between them, and is estimated using data from both countries. The results suggest that differences in SF-6D health-state valuations between the Japan and U.K. general populations are potentially important. The magnitude of these country-specific differences in health-state valuation depended, however, in a complex way on the levels of individual dimensions. The new Bayesian non-parametric method is a powerful approach for analysing data from multiple nationalities or ethnic groups, to understand the differences between them and potentially to estimate the underlying utility functions more efficiently.
A consistent NPMLE of the joint distribution function with competing risks data under the dependent masking and right-censoring model.

PubMed

Li, Jiahui; Yu, Qiqing

2016-01-01

Dinse (Biometrics, 38:417-431, 1982) provides a special type of right-censored and masked competing risks data and proposes a non-parametric maximum likelihood estimator (NPMLE) and a pseudo MLE of the joint distribution function [Formula: see text] with such data. However, their asymptotic properties have not been studied so far. Under the extention of either the conditional masking probability (CMP) model or the random partition masking (RPM) model (Yu and Li, J Nonparametr Stat 24:753-764, 2012), we show that (1) Dinse's estimators are consistent if [Formula: see text] takes on finitely many values and each point in the support set of [Formula: see text] can be observed; (2) if the failure time is continuous, the NPMLE is not uniquely determined, and the standard approach (which puts weights only on one element in each observed set) leads to an inconsistent NPMLE; (3) in general, Dinse's estimators are not consistent even under the discrete assumption; (4) we construct a consistent NPMLE. The consistency is given under a new model called dependent masking and right-censoring model. The CMP model and the RPM model are indeed special cases of the new model. We compare our estimator to Dinse's estimators through simulation and real data. Simulation study indicates that the consistent NPMLE is a good approximation to the underlying distribution for moderate sample sizes.
A Robust Step Detection Algorithm and Walking Distance Estimation Based on Daily Wrist Activity Recognition Using a Smart Band.

PubMed

Trong Bui, Duong; Nguyen, Nhan Duc; Jeong, Gu-Min

2018-06-25

Human activity recognition and pedestrian dead reckoning are an interesting field because of their importance utilities in daily life healthcare. Currently, these fields are facing many challenges, one of which is the lack of a robust algorithm with high performance. This paper proposes a new method to implement a robust step detection and adaptive distance estimation algorithm based on the classification of five daily wrist activities during walking at various speeds using a smart band. The key idea is that the non-parametric adaptive distance estimator is performed after two activity classifiers and a robust step detector. In this study, two classifiers perform two phases of recognizing five wrist activities during walking. Then, a robust step detection algorithm, which is integrated with an adaptive threshold, peak and valley correction algorithm, is applied to the classified activities to detect the walking steps. In addition, the misclassification activities are fed back to the previous layer. Finally, three adaptive distance estimators, which are based on a non-parametric model of the average walking speed, calculate the length of each strike. The experimental results show that the average classification accuracy is about 99%, and the accuracy of the step detection is 98.7%. The error of the estimated distance is 2.2⁻4.2% depending on the type of wrist activities.
Easy and accurate variance estimation of the nonparametric estimator of the partial area under the ROC curve and its application.

PubMed

Yu, Jihnhee; Yang, Luge; Vexler, Albert; Hutson, Alan D

2016-06-15

The receiver operating characteristic (ROC) curve is a popular technique with applications, for example, investigating an accuracy of a biomarker to delineate between disease and non-disease groups. A common measure of accuracy of a given diagnostic marker is the area under the ROC curve (AUC). In contrast with the AUC, the partial area under the ROC curve (pAUC) looks into the area with certain specificities (i.e., true negative rate) only, and it can be often clinically more relevant than examining the entire ROC curve. The pAUC is commonly estimated based on a U-statistic with the plug-in sample quantile, making the estimator a non-traditional U-statistic. In this article, we propose an accurate and easy method to obtain the variance of the nonparametric pAUC estimator. The proposed method is easy to implement for both one biomarker test and the comparison of two correlated biomarkers because it simply adapts the existing variance estimator of U-statistics. In this article, we show accuracy and other advantages of the proposed variance estimation method by broadly comparing it with previously existing methods. Further, we develop an empirical likelihood inference method based on the proposed variance estimator through a simple implementation. In an application, we demonstrate that, depending on the inferences by either the AUC or pAUC, we can make a different decision on a prognostic ability of a same set of biomarkers. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Mapping Findspots of Roman Military Brickstamps in Mogontiacum (Mainz) and Archaeometrical Analysis

NASA Astrophysics Data System (ADS)

Dolata, Jens; Mucha, Hans-Joachim; Bartel, Hans-Georg

Mainz was a Roman settlement that was established as an important military outpost in 13 BC. Almost 100 years later Mainz, the ancient Mogontiacum, became the seat of the administrative centre of the Roman Province of Germania Superior. About 3,500 brickstamps concerning to the period until the fall of the Roman Empire in the fifth century AD have been found in archaeological excavations. These documents have to be investigated based on several methods for a better understanding the history. Here the focus is on an application of spatial statistical analysis in archaeology. Concretely, about 250 sites have to be investigated. So, we compare maps of different periods graphically by nonparametric density estimation. Here different weights of the sites according to the radius of the finding area are taken into account. Moreover we can test whether archaeological segmentation is statistically significant or not. In combination of smooth mapping, testing and looking for dated brickstamps there is a good chance to achieve new sources for the Roman history of Mainz.
Multi-frequency Phase Unwrap from Noisy Data: Adaptive Least Squares Approach

NASA Astrophysics Data System (ADS)

Katkovnik, Vladimir; Bioucas-Dias, José

2010-04-01

Multiple frequency interferometry is, basically, a phase acquisition strategy aimed at reducing or eliminating the ambiguity of the wrapped phase observations or, equivalently, reducing or eliminating the fringe ambiguity order. In multiple frequency interferometry, the phase measurements are acquired at different frequencies (or wavelengths) and recorded using the corresponding sensors (measurement channels). Assuming that the absolute phase to be reconstructed is piece-wise smooth, we use a nonparametric regression technique for the phase reconstruction. The nonparametric estimates are derived from a local least squares criterion, which, when applied to the multifrequency data, yields denoised (filtered) phase estimates with extended ambiguity (periodized), compared with the phase ambiguities inherent to each measurement frequency. The filtering algorithm is based on local polynomial (LPA) approximation for design of nonlinear filters (estimators) and adaptation of these filters to unknown smoothness of the spatially varying absolute phase [9]. For phase unwrapping, from filtered periodized data, we apply the recently introduced robust (in the sense of discontinuity preserving) PUMA unwrapping algorithm [1]. Simulations give evidence that the proposed algorithm yields state-of-the-art performance for continuous as well as for discontinues phase surfaces, enabling phase unwrapping in extraordinary difficult situations when all other algorithms fail.
The binned bispectrum estimator: template-based and non-parametric CMB non-Gaussianity searches

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bucher, Martin; Racine, Benjamin; Tent, Bartjan van, E-mail: bucher@apc.univ-paris7.fr, E-mail: benjar@uio.no, E-mail: vantent@th.u-psud.fr

2016-05-01

We describe the details of the binned bispectrum estimator as used for the official 2013 and 2015 analyses of the temperature and polarization CMB maps from the ESA Planck satellite. The defining aspect of this estimator is the determination of a map bispectrum (3-point correlation function) that has been binned in harmonic space. For a parametric determination of the non-Gaussianity in the map (the so-called f NL parameters), one takes the inner product of this binned bispectrum with theoretically motivated templates. However, as a complementary approach one can also smooth the binned bispectrum using a variable smoothing scale in ordermore » to suppress noise and make coherent features stand out above the noise. This allows one to look in a model-independent way for any statistically significant bispectral signal. This approach is useful for characterizing the bispectral shape of the galactic foreground emission, for which a theoretical prediction of the bispectral anisotropy is lacking, and for detecting a serendipitous primordial signal, for which a theoretical template has not yet been put forth. Both the template-based and the non-parametric approaches are described in this paper.« less
Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci.

PubMed

Yap, John Stephen; Fan, Jianqing; Wu, Rongling

2009-12-01

Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.'s (2006, Biometrika 93, 85-98) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L(2) penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision, and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology, and health sciences.
PROPERTIES OF THE WEIGHTING CELL ESTIMATOR UNDER A NONPARAMETRIC RESPONSE MECHANISM. (R829095C002)

EPA Science Inventory

The perspectives, information and conclusions conveyed in research project abstracts, progress reports, final reports, journal abstracts and journal publications convey the viewpoints of the principal investigator and may not represent the views and policies of ORD and EPA. Concl...
Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

PubMed

Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

2014-07-01

Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A non-parametric consistency test of the ΛCDM model with Planck CMB data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aghamousa, Amir; Shafieloo, Arman; Hamann, Jan, E-mail: amir@aghamousa.com, E-mail: jan.hamann@unsw.edu.au, E-mail: shafieloo@kasi.re.kr

Non-parametric reconstruction methods, such as Gaussian process (GP) regression, provide a model-independent way of estimating an underlying function and its uncertainty from noisy data. We demonstrate how GP-reconstruction can be used as a consistency test between a given data set and a specific model by looking for structures in the residuals of the data with respect to the model's best-fit. Applying this formalism to the Planck temperature and polarisation power spectrum measurements, we test their global consistency with the predictions of the base ΛCDM model. Our results do not show any serious inconsistencies, lending further support to the interpretation ofmore » the base ΛCDM model as cosmology's gold standard.« less

Mapping Topographic Structure in White Matter Pathways with Level Set Trees

PubMed Central

Kent, Brian P.; Rinaldo, Alessandro; Yeh, Fang-Cheng; Verstynen, Timothy

2014-01-01

Fiber tractography on diffusion imaging data offers rich potential for describing white matter pathways in the human brain, but characterizing the spatial organization in these large and complex data sets remains a challenge. We show that level set trees–which provide a concise representation of the hierarchical mode structure of probability density functions–offer a statistically-principled framework for visualizing and analyzing topography in fiber streamlines. Using diffusion spectrum imaging data collected on neurologically healthy controls (N = 30), we mapped white matter pathways from the cortex into the striatum using a deterministic tractography algorithm that estimates fiber bundles as dimensionless streamlines. Level set trees were used for interactive exploration of patterns in the endpoint distributions of the mapped fiber pathways and an efficient segmentation of the pathways that had empirical accuracy comparable to standard nonparametric clustering techniques. We show that level set trees can also be generalized to model pseudo-density functions in order to analyze a broader array of data types, including entire fiber streamlines. Finally, resampling methods show the reliability of the level set tree as a descriptive measure of topographic structure, illustrating its potential as a statistical descriptor in brain imaging analysis. These results highlight the broad applicability of level set trees for visualizing and analyzing high-dimensional data like fiber tractography output. PMID:24714673
On the Bias-Amplifying Effect of Near Instruments in Observational Studies

ERIC Educational Resources Information Center

Steiner, Peter M.; Kim, Yongnam

2014-01-01

In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…
Confidence Intervals for a Semiparametric Approach to Modeling Nonlinear Relations among Latent Variables

ERIC Educational Resources Information Center

Pek, Jolynn; Losardo, Diane; Bauer, Daniel J.

2011-01-01

Compared to parametric models, nonparametric and semiparametric approaches to modeling nonlinearity between latent variables have the advantage of recovering global relationships of unknown functional form. Bauer (2005) proposed an indirect application of finite mixtures of structural equation models where latent components are estimated in the…
A Bayesian Nonparametric Meta-Analysis Model

ERIC Educational Resources Information Center

Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.

2015-01-01

In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…
A Nonparametric Geostatistical Method For Estimating Species Importance

Treesearch

Andrew J. Lister; Rachel Riemann; Michael Hoppus

2001-01-01

Parametric statistical methods are not always appropriate for conducting spatial analyses of forest inventory data. Parametric geostatistical methods such as variography and kriging are essentially averaging procedures, and thus can be affected by extreme values. Furthermore, non normal distributions violate the assumptions of analyses in which test statistics are...
Estimates of the Sampling Distribution of Scalability Coefficient H

ERIC Educational Resources Information Center

Van Onna, Marieke J. H.

2004-01-01

Coefficient "H" is used as an index of scalability in nonparametric item response theory (NIRT). It indicates the degree to which a set of items rank orders examinees. Theoretical sampling distributions, however, have only been derived asymptotically and only under restrictive conditions. Bootstrap methods offer an alternative possibility to…
Comparison of least squares and exponential sine sweep methods for Parallel Hammerstein Models estimation

NASA Astrophysics Data System (ADS)

Rebillat, Marc; Schoukens, Maarten

2018-05-01

Linearity is a common assumption for many real-life systems, but in many cases the nonlinear behavior of systems cannot be ignored and must be modeled and estimated. Among the various existing classes of nonlinear models, Parallel Hammerstein Models (PHM) are interesting as they are at the same time easy to interpret as well as to estimate. One way to estimate PHM relies on the fact that the estimation problem is linear in the parameters and thus that classical least squares (LS) estimation algorithms can be used. In that area, this article introduces a regularized LS estimation algorithm inspired on some of the recently developed regularized impulse response estimation techniques. Another mean to estimate PHM consists in using parametric or non-parametric exponential sine sweeps (ESS) based methods. These methods (LS and ESS) are founded on radically different mathematical backgrounds but are expected to tackle the same issue. A methodology is proposed here to compare them with respect to (i) their accuracy, (ii) their computational cost, and (iii) their robustness to noise. Tests are performed on simulated systems for several values of methods respective parameters and of signal to noise ratio. Results show that, for a given set of data points, the ESS method is less demanding in computational resources than the LS method but that it is also less accurate. Furthermore, the LS method needs parameters to be set in advance whereas the ESS method is not subject to conditioning issues and can be fully non-parametric. In summary, for a given set of data points, ESS method can provide a first, automatic, and quick overview of a nonlinear system than can guide more computationally demanding and precise methods, such as the regularized LS one proposed here.
A sharper view of Pal 5's tails: discovery of stream perturbations with a novel non-parametric technique

NASA Astrophysics Data System (ADS)

Erkal, Denis; Koposov, Sergey E.; Belokurov, Vasily

2017-09-01

Only in the Milky Way is it possible to conduct an experiment that uses stellar streams to detect low-mass dark matter subhaloes. In smooth and static host potentials, tidal tails of disrupting satellites appear highly symmetric. However, perturbations from dark subhaloes, as well as from GMCs and the Milky Way bar, can induce density fluctuations that destroy this symmetry. Motivated by the recent release of unprecedentedly deep and wide imaging data around the Pal 5 stellar stream, we develop a new probabilistic, adaptive and non-parametric technique that allows us to bring the cluster's tidal tails into clear focus. Strikingly, we uncover a stream whose density exhibits visible changes on a variety of angular scales. We detect significant bumps and dips, both narrow and broad: two peaks on either side of the progenitor, each only a fraction of a degree across, and two gaps, ˜2° and ˜9° wide, the latter accompanied by a gargantuan lump of debris. This largest density feature results in a pronounced intertail asymmetry which cannot be made consistent with an unperturbed stream according to a suite of simulations we have produced. We conjecture that the sharp peaks around Pal 5 are epicyclic overdensities, while the two dips are consistent with impacts by subhaloes. Assuming an age of 3.4 Gyr for Pal 5, these two gaps would correspond to the characteristic size of gaps created by subhaloes in the mass range of 106-107 M⊙ and 107-108 M⊙, respectively. In addition to dark substructure, we find that the bar of the Milky Way can plausibly produce the asymmetric density seen in Pal 5 and that GMCs could cause the smaller gap.
Mean Recency Period for Estimation of HIV-1 Incidence with the BED-Capture EIA and Bio-Rad Avidity in Persons Diagnosed in the United States with Subtype B.

PubMed

Hanson, Debra L; Song, Ruiguang; Masciotra, Silvina; Hernandez, Angela; Dobbs, Trudy L; Parekh, Bharat S; Owen, S Michele; Green, Timothy A

2016-01-01

HIV incidence estimates are used to monitor HIV-1 infection in the United States. Use of laboratory biomarkers that distinguish recent from longstanding infection to quantify HIV incidence rely on having accurate knowledge of the average time that individuals spend in a transient state of recent infection between seroconversion and reaching a specified biomarker cutoff value. This paper describes five estimation procedures from two general statistical approaches, a survival time approach and an approach that fits binomial models of the probability of being classified as recently infected, as a function of time since seroconversion. We compare these procedures for estimating the mean duration of recent infection (MDRI) for two biomarkers used by the U.S. National HIV Surveillance System for determination of HIV incidence, the Aware BED EIA HIV-1 incidence test (BED) and the avidity-based, modified Bio-Rad HIV-1/HIV-2 plus O ELISA (BRAI) assay. Collectively, 953 specimens from 220 HIV-1 subtype B seroconverters, taken from 5 cohorts, were tested with a biomarker assay. Estimates of MDRI using the non-parametric survival approach were 198.4 days (SD 13.0) for BED and 239.6 days (SD 13.9) for BRAI using cutoff values of 0.8 normalized optical density and 30%, respectively. The probability of remaining in the recent state as a function of time since seroconversion, based upon this revised statistical approach, can be applied in the calculation of annual incidence in the United States.
Multiple robustness in factorized likelihood models.

PubMed

Molina, J; Rotnitzky, A; Sued, M; Robins, J M

2017-09-01

We consider inference under a nonparametric or semiparametric model with likelihood that factorizes as the product of two or more variation-independent factors. We are interested in a finite-dimensional parameter that depends on only one of the likelihood factors and whose estimation requires the auxiliary estimation of one or several nuisance functions. We investigate general structures conducive to the construction of so-called multiply robust estimating functions, whose computation requires postulating several dimension-reducing models but which have mean zero at the true parameter value provided one of these models is correct.
Genetic Algorithm Based Framework for Automation of Stochastic Modeling of Multi-Season Streamflows

NASA Astrophysics Data System (ADS)

Srivastav, R. K.; Srinivasan, K.; Sudheer, K.

2009-05-01

Synthetic streamflow data generation involves the synthesis of likely streamflow patterns that are statistically indistinguishable from the observed streamflow data. The various kinds of stochastic models adopted for multi-season streamflow generation in hydrology are: i) parametric models which hypothesize the form of the periodic dependence structure and the distributional form a priori (examples are PAR, PARMA); disaggregation models that aim to preserve the correlation structure at the periodic level and the aggregated annual level; ii) Nonparametric models (examples are bootstrap/kernel based methods), which characterize the laws of chance, describing the stream flow process, without recourse to prior assumptions as to the form or structure of these laws; (k-nearest neighbor (k-NN), matched block bootstrap (MABB)); non-parametric disaggregation model. iii) Hybrid models which blend both parametric and non-parametric models advantageously to model the streamflows effectively. Despite many of these developments that have taken place in the field of stochastic modeling of streamflows over the last four decades, accurate prediction of the storage and the critical drought characteristics has been posing a persistent challenge to the stochastic modeler. This is partly because, usually, the stochastic streamflow model parameters are estimated by minimizing a statistically based objective function (such as maximum likelihood (MLE) or least squares (LS) estimation) and subsequently the efficacy of the models is being validated based on the accuracy of prediction of the estimates of the water-use characteristics, which requires large number of trial simulations and inspection of many plots and tables. Still accurate prediction of the storage and the critical drought characteristics may not be ensured. In this study a multi-objective optimization framework is proposed to find the optimal hybrid model (blend of a simple parametric model, PAR(1) model and matched block bootstrap (MABB) ) based on the explicit objective functions of minimizing the relative bias and relative root mean square error in estimating the storage capacity of the reservoir. The optimal parameter set of the hybrid model is obtained based on the search over a multi- dimensional parameter space (involving simultaneous exploration of the parametric (PAR(1)) as well as the non-parametric (MABB) components). This is achieved using the efficient evolutionary search based optimization tool namely, non-dominated sorting genetic algorithm - II (NSGA-II). This approach helps in reducing the drudgery involved in the process of manual selection of the hybrid model, in addition to predicting the basic summary statistics dependence structure, marginal distribution and water-use characteristics accurately. The proposed optimization framework is used to model the multi-season streamflows of River Beaver and River Weber of USA. In case of both the rivers, the proposed GA-based hybrid model yields a much better prediction of the storage capacity (where simultaneous exploration of both parametric and non-parametric components is done) when compared with the MLE-based hybrid models (where the hybrid model selection is done in two stages, thus probably resulting in a sub-optimal model). This framework can be further extended to include different linear/non-linear hybrid stochastic models at other temporal and spatial scales as well.
Evaluation of non-stationarity of floods in the Northeastern and Upper Midwest United States

NASA Astrophysics Data System (ADS)

Dhakal, N.; Palmer, R. N.

2017-12-01

Climate change is likely to impact precipitation as well as snow accumulation and melt in the Northeastern and Upper Midwest Unites States, ultimately affecting the quantity and seasonal distribution of streamflow. Such information is crucial for flood protection polices for example for regional flood frequency analysis. The objective of this study is to analyze seasonality and magnitude of long-term daily annual maximum streamflow (AMF) records and its changes for 158 sites in Northeastern and Upper Midwest Unites States. Temporal trends were analyzed based on two 30-year blocks (1951-1980 and 1981-2010) of AMF. Seasonality is assessed based on nonparametric directional/circular statistical method that allows for an adaptive estimation of seasonal density. The results for temporal change in seasonality showed mixed pattern/trend across the stations. While for majority of the stations, the distribution of AMF timing is strongly unimodal (concentrated around Spring season) for the earlier time period, the strength in the modes have gotten weaker during the recent time period for a number of stations along the coastal states indicating the emergence of multiple modes and change in seasonality therein. Assessment of the temporal change in magnitude of AMF based on the Mann-Kendall nonparametric test shows that majority of the stations do not show significant increasing or decreasing trend for either time period. It is also observed that comparatively more stations show increasing trends in magnitude based on AMF from earlier time period and most of these stations are coastal sites concentrated in the southeastern part of the region. Our study focused on both seasonality and magnitude of AMF has important implications for flood management and mitigation.
Characterization and modelling of the spatially- and spectrally-varying point-spread function in hyperspectral imaging systems for computational correction of axial optical aberrations

NASA Astrophysics Data System (ADS)

Špiclin, Žiga; Bürmen, Miran; Pernuš, Franjo; Likar, Boštjan

2012-03-01

Spatial resolution of hyperspectral imaging systems can vary significantly due to axial optical aberrations that originate from wavelength-induced index-of-refraction variations of the imaging optics. For systems that have a broad spectral range, the spatial resolution will vary significantly both with respect to the acquisition wavelength and with respect to the spatial position within each spectral image. Variations of the spatial resolution can be effectively characterized as part of the calibration procedure by a local image-based estimation of the pointspread function (PSF) of the hyperspectral imaging system. The estimated PSF can then be used in the image deconvolution methods to improve the spatial resolution of the spectral images. We estimated the PSFs from the spectral images of a line grid geometric caliber. From individual line segments of the line grid, the PSF was obtained by a non-parametric estimation procedure that used an orthogonal series representation of the PSF. By using the non-parametric estimation procedure, the PSFs were estimated at different spatial positions and at different wavelengths. The variations of the spatial resolution were characterized by the radius and the fullwidth half-maximum of each PSF and by the modulation transfer function, computed from images of USAF1951 resolution target. The estimation and characterization of the PSFs and the image deconvolution based spatial resolution enhancement were tested on images obtained by a hyperspectral imaging system with an acousto-optic tunable filter in the visible spectral range. The results demonstrate that the spatial resolution of the acquired spectral images can be significantly improved using the estimated PSFs and image deconvolution methods.
Evaluating effects of developmental education for college students using a regression discontinuity design.

PubMed

Moss, Brian G; Yeaton, William H

2013-10-01

Annually, American colleges and universities provide developmental education (DE) to millions of underprepared students; however, evaluation estimates of DE benefits have been mixed. Using a prototypic exemplar of DE, our primary objective was to investigate the utility of a replicative evaluative framework for assessing program effectiveness. Within the context of the regression discontinuity (RD) design, this research examined the effectiveness of a DE program for five, sequential cohorts of first-time college students. Discontinuity estimates were generated for individual terms and cumulatively, across terms. Participants were 3,589 first-time community college students. DE program effects were measured by contrasting both college-level English grades and a dichotomous measure of pass/fail, for DE and non-DE students. Parametric and nonparametric estimates of overall effect were positive for continuous and dichotomous measures of achievement (grade and pass/fail). The variability of program effects over time was determined by tracking results within individual terms and cumulatively, across terms. Applying this replication strategy, DE's overall impact was modest (an effect size of approximately .20) but quite consistent, based on parametric and nonparametric estimation approaches. A meta-analysis of five RD results yielded virtually the same estimate as the overall, parametric findings. Subset analysis, though tentative, suggested that males benefited more than females, while academic gains were comparable for different ethnicities. The cumulative, within-study comparison, replication approach offers considerable potential for the evaluation of new and existing policies, particularly when effects are relatively small, as is often the case in applied settings.
[Nonparametric method of estimating survival functions containing right-censored and interval-censored data].

PubMed

Xu, Yonghong; Gao, Xiaohuan; Wang, Zhengxi

2014-04-01

Missing data represent a general problem in many scientific fields, especially in medical survival analysis. Dealing with censored data, interpolation method is one of important methods. However, most of the interpolation methods replace the censored data with the exact data, which will distort the real distribution of the censored data and reduce the probability of the real data falling into the interpolation data. In order to solve this problem, we in this paper propose a nonparametric method of estimating the survival function of right-censored and interval-censored data and compare its performance to SC (self-consistent) algorithm. Comparing to the average interpolation and the nearest neighbor interpolation method, the proposed method in this paper replaces the right-censored data with the interval-censored data, and greatly improves the probability of the real data falling into imputation interval. Then it bases on the empirical distribution theory to estimate the survival function of right-censored and interval-censored data. The results of numerical examples and a real breast cancer data set demonstrated that the proposed method had higher accuracy and better robustness for the different proportion of the censored data. This paper provides a good method to compare the clinical treatments performance with estimation of the survival data of the patients. This pro vides some help to the medical survival data analysis.
Exact nonparametric confidence bands for the survivor function.

PubMed

Matthews, David

2013-10-12

A method to produce exact simultaneous confidence bands for the empirical cumulative distribution function that was first described by Owen, and subsequently corrected by Jager and Wellner, is the starting point for deriving exact nonparametric confidence bands for the survivor function of any positive random variable. We invert a nonparametric likelihood test of uniformity, constructed from the Kaplan-Meier estimator of the survivor function, to obtain simultaneous lower and upper bands for the function of interest with specified global confidence level. The method involves calculating a null distribution and associated critical value for each observed sample configuration. However, Noe recursions and the Van Wijngaarden-Decker-Brent root-finding algorithm provide the necessary tools for efficient computation of these exact bounds. Various aspects of the effect of right censoring on these exact bands are investigated, using as illustrations two observational studies of survival experience among non-Hodgkin's lymphoma patients and a much larger group of subjects with advanced lung cancer enrolled in trials within the North Central Cancer Treatment Group. Monte Carlo simulations confirm the merits of the proposed method of deriving simultaneous interval estimates of the survivor function across the entire range of the observed sample. This research was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada. It was begun while the author was visiting the Department of Statistics, University of Auckland, and completed during a subsequent sojourn at the Medical Research Council Biostatistics Unit in Cambridge. The support of both institutions, in addition to that of NSERC and the University of Waterloo, is greatly appreciated.
Semiparametric mixed-effects analysis of PK/PD models using differential equations.

PubMed

Wang, Yi; Eskridge, Kent M; Zhang, Shunpu

2008-08-01

Motivated by the use of semiparametric nonlinear mixed-effects modeling on longitudinal data, we develop a new semiparametric modeling approach to address potential structural model misspecification for population pharmacokinetic/pharmacodynamic (PK/PD) analysis. Specifically, we use a set of ordinary differential equations (ODEs) with form dx/dt = A(t)x + B(t) where B(t) is a nonparametric function that is estimated using penalized splines. The inclusion of a nonparametric function in the ODEs makes identification of structural model misspecification feasible by quantifying the model uncertainty and provides flexibility for accommodating possible structural model deficiencies. The resulting model will be implemented in a nonlinear mixed-effects modeling setup for population analysis. We illustrate the method with an application to cefamandole data and evaluate its performance through simulations.
A nonparametric smoothing method for assessing GEE models with longitudinal binary data.

PubMed

Lin, Kuo-Chin; Chen, Yi-Ju; Shyr, Yu

2008-09-30

Studies involving longitudinal binary responses are widely applied in the health and biomedical sciences research and frequently analyzed by generalized estimating equations (GEE) method. This article proposes an alternative goodness-of-fit test based on the nonparametric smoothing approach for assessing the adequacy of GEE fitted models, which can be regarded as an extension of the goodness-of-fit test of le Cessie and van Houwelingen (Biometrics 1991; 47:1267-1282). The expectation and approximate variance of the proposed test statistic are derived. The asymptotic distribution of the proposed test statistic in terms of a scaled chi-squared distribution and the power performance of the proposed test are discussed by simulation studies. The testing procedure is demonstrated by two real data. Copyright (c) 2008 John Wiley & Sons, Ltd.
Parametric, nonparametric and parametric modelling of a chaotic circuit time series

NASA Astrophysics Data System (ADS)

Timmer, J.; Rust, H.; Horbelt, W.; Voss, H. U.

2000-09-01

The determination of a differential equation underlying a measured time series is a frequently arising task in nonlinear time series analysis. In the validation of a proposed model one often faces the dilemma that it is hard to decide whether possible discrepancies between the time series and model output are caused by an inappropriate model or by bad estimates of parameters in a correct type of model, or both. We propose a combination of parametric modelling based on Bock's multiple shooting algorithm and nonparametric modelling based on optimal transformations as a strategy to test proposed models and if rejected suggest and test new ones. We exemplify this strategy on an experimental time series from a chaotic circuit where we obtain an extremely accurate reconstruction of the observed attractor.
Seed dispersal into wetlands: Techniques and results for a restored tidal freshwater marsh

USGS Publications Warehouse

Neff, K.P.; Baldwin, A.H.

2005-01-01

Although seed dispersal is assumed to be a major factor determining plant community development in restored wetlands, little research exists on density and species richness of seed available through dispersal in these systems. We measured composition and seed dispersal rates at a restored tidal freshwater marsh in Washington, DC, USA by collecting seed dispersing through water and wind. Seed dispersal by water was measured using two methods of seed collection: (1) stationary traps composed of coconut fiber mat along an elevation gradient bracketing the tidal range and (2) a floating surface trawl net attached to a boat. To estimate wind dispersal rates, we collected seed from stationary traps composed of coconut fiber mat positioned above marsh vegetation. We also collected a small number of samples of debris deposited along high tide lines (drift lines) and feces of Canada Goose to explore their seed content. We used the seedling emergence method to determine seed density in all samples, which involved placing the fiber mats or sample material on top of potting soil in a greenhouse misting room and enumerating emerging seedlings. Seedlings from a total of 125 plant species emerged during this study (including 82 in river trawls, 89 in stationary water traps, 21 in drift lines, 39 in wind traps, and 10 in goose feces). The most abundant taxa included Bidens frondosa, Boehmeria cylindrica, Cyperus spp., Eclipta prostrata, and Ludwigia palustris. Total seedling density was significantly greater for the stationary water traps (212 + 30.6 seeds/m2/month) than the equal-sized stationary wind traps (18 + 6.0 seeds/m(2)/month). Lower-bound estimates of total species richness based on the non-parametric Chao 2 asymptotic estimators were greater for seeds in water (106 + 1.4 for stationary water traps and 104 + 5.5 for trawl samples) than for wind (54 + 6.4). Our results indicate that water is the primary source of seeds dispersing to the site and that a species-rich pool of dispersing propagules is present, an interesting result given the urbanized nature of the surrounding landscape. However, species composition of dispersing seeds differed from vegetation of restored and natural tidal freshwater marshes, indicating that planting is necessary for certain species. At other restoration sites, information on densities of dispersing seeds can support decisions on which species to plant.

Emitter Number Estimation by the General Information Theoretic Criterion from Pulse Trains

DTIC Science & Technology

2002-12-01

negative log likelihood function plus a penalty function. The general information criteria by Yin and Krishnaiah [11] are different from the regular...548-551, Victoria, BC, Canada, March 1999 DRDC Ottawa TR 2002-156 11 11. L. Zhao, P. P. Krishnaiah and Z. Bai, “On some nonparametric methods for
The Impact of Conditional Scores on the Performance of DETECT.

ERIC Educational Resources Information Center

Zhang, Yanwei Oliver; Yu, Feng; Nandakumar, Ratna

DETECT is a nonparametric, conditional covariance-based procedure to identify dimensional structure and the degree of multidimensionality of test data. The ability composite or conditional score used to estimate conditional covariance plays a significant role in the performance of DETECT. The number correct score of all items in the test (T) and…
Gender Wage Disparities among the Highly Educated

ERIC Educational Resources Information Center

Black, Dan A.; Haviland, Amelia M.; Sanders, Seth G.; Taylor, Lowell J.

2008-01-01

We examine gender wage disparities for four groups of college-educated women--black, Hispanic, Asian, and non-Hispanic white--using the National Survey of College Graduates. Raw log wage gaps, relative to non-Hispanic white male counterparts, generally exceed -0.30. Estimated gaps decline to between -0.08 and -0.19 in nonparametric analyses that…
Nonparametric Estimation of the Probability of Discovering a New Species.

DTIC Science & Technology

1986-01-01

see Good (1953, 1965), Good and Toulmin (1956), Goodman (1949), Harris (1959, 1968), Knott (1967) and Robbins (1968). We note that our model is not...and Toulmin , G. (1956). The number of new species and the increase of population coverage, when a sample is increased. Biometrika 43, 45-63. Goodman
Estimating a Meaningful Point of Change: A Comparison of Exploratory Techniques Based on Nonparametric Regression

ERIC Educational Resources Information Center

Klotsche, Jens; Gloster, Andrew T.

2012-01-01

Longitudinal studies are increasingly common in psychological research. Characterized by repeated measurements, longitudinal designs aim to observe phenomena that change over time. One important question involves identification of the exact point in time when the observed phenomena begin to meaningfully change above and beyond baseline…
A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

ERIC Educational Resources Information Center

Karabatsos, George; Walker, Stephen G.

2013-01-01

The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…
A robust nonparametric framework for reconstruction of stochastic differential equation models

NASA Astrophysics Data System (ADS)

Rajabzadeh, Yalda; Rezaie, Amir Hossein; Amindavar, Hamidreza

2016-05-01

In this paper, we employ a nonparametric framework to robustly estimate the functional forms of drift and diffusion terms from discrete stationary time series. The proposed method significantly improves the accuracy of the parameter estimation. In this framework, drift and diffusion coefficients are modeled through orthogonal Legendre polynomials. We employ the least squares regression approach along with the Euler-Maruyama approximation method to learn coefficients of stochastic model. Next, a numerical discrete construction of mean squared prediction error (MSPE) is established to calculate the order of Legendre polynomials in drift and diffusion terms. We show numerically that the new method is robust against the variation in sample size and sampling rate. The performance of our method in comparison with the kernel-based regression (KBR) method is demonstrated through simulation and real data. In case of real dataset, we test our method for discriminating healthy electroencephalogram (EEG) signals from epilepsy ones. We also demonstrate the efficiency of the method through prediction in the financial data. In both simulation and real data, our algorithm outperforms the KBR method.
Bayesian dynamic mediation analysis.

PubMed

Huang, Jing; Yuan, Ying

2017-12-01

Most existing methods for mediation analysis assume that mediation is a stationary, time-invariant process, which overlooks the inherently dynamic nature of many human psychological processes and behavioral activities. In this article, we consider mediation as a dynamic process that continuously changes over time. We propose Bayesian multilevel time-varying coefficient models to describe and estimate such dynamic mediation effects. By taking the nonparametric penalized spline approach, the proposed method is flexible and able to accommodate any shape of the relationship between time and mediation effects. Simulation studies show that the proposed method works well and faithfully reflects the true nature of the mediation process. By modeling mediation effect nonparametrically as a continuous function of time, our method provides a valuable tool to help researchers obtain a more complete understanding of the dynamic nature of the mediation process underlying psychological and behavioral phenomena. We also briefly discuss an alternative approach of using dynamic autoregressive mediation model to estimate the dynamic mediation effect. The computer code is provided to implement the proposed Bayesian dynamic mediation analysis. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
Evaluation of rules to distinguish unique female grizzly bears with cubs in Yellowstone

USGS Publications Warehouse

Schwartz, C.C.; Haroldson, M.A.; Cherry, S.; Keating, K.A.

2008-01-01

The United States Fish and Wildlife Service uses counts of unduplicated female grizzly bears (Ursus arctos) with cubs-of-the-year to establish limits of sustainable mortality in the Greater Yellowstone Ecosystem, USA. Sightings are dustered into observations of unique bears based on an empirically derived rule set. The method has never been tested or verified. To evaluate the rule set, we used data from radiocollared females obtained during 1975-2004 to simulate populations under varying densities, distributions, and sighting frequencies. We tested individual rules and rule-set performance, using custom software to apply the rule-set and duster sightings. Results indicated most rules were violated to some degree, and rule-based dustering consistently underestimated the minimum number of females and total population size derived from a nonparametric estimator (Chao2). We conclude that the current rule set returns conservative estimates, but with minor improvements, counts of unduplicated females-with-cubs can serve as a reasonable index of population size useful for establishing annual mortality limits. For the Yellowstone population, the index is more practical and cost-effective than capture-mark-recapture using either DNA hair snagging or aerial surveys with radiomarked bears. The method has useful application in other ecosystems, but we recommend rules used to distinguish unique females be adapted to local conditions and tested.
Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization

NASA Astrophysics Data System (ADS)

Lee, Kyungbook; Song, Seok Goo

2017-09-01

Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying the nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events ( M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.
[Age- and sex-specific reference intervals for 10 health examination items: mega-data from a Japanese Health Service Association].

PubMed

Suka, Machi; Yoshida, Katsumi; Kawai, Tadashi; Aoki, Yoshikazu; Yamane, Noriyuki; Yamauchi, Kuniaki

2005-07-01

To determine age- and sex-specific reference intervals for 10 health examination items in Japanese adults. Health examination data were accumulated from 24 different prefectural health service associations affiliated with the Japan Association of Health Service. Those who were non-smokers, drank less than 7 days/week, and had a body mass index of 18.5-24.9kg/m2 were sampled as a reference population (n = 737,538; 224,947 men and 512,591 women). After classified by age and sex, reference intervals for 10 health examination items (systolic blood pressure, diastolic blood pressure, total cholesterol, triglyceride, glucose, uric acid, AST, ALT, gamma-GT, and hemoglobin) were estimated using the parametric and nonparametric methods. In every item except for hemoglobin, men had higher reference intervals than women. Systolic blood pressure, total cholesterol, and glucose showed an upward trend in values with increasing age. Hemoglobin showed a downward trend in values with increasing age. Triglyceride, ALT, and gamma-GT reached a peak in middle age. Overall, parametric estimates showed narrower reference intervals than non-parametric estimates. Reference intervals vary with age and sex. Age- and sex-specific reference intervals may contribute to better assessment of health examination data.
Estimating scaled treatment effects with multiple outcomes.

PubMed

Kennedy, Edward H; Kangovi, Shreya; Mitra, Nandita

2017-01-01

In classical study designs, the aim is often to learn about the effects of a treatment or intervention on a single outcome; in many modern studies, however, data on multiple outcomes are collected and it is of interest to explore effects on multiple outcomes simultaneously. Such designs can be particularly useful in patient-centered research, where different outcomes might be more or less important to different patients. In this paper, we propose scaled effect measures (via potential outcomes) that translate effects on multiple outcomes to a common scale, using mean-variance and median-interquartile range based standardizations. We present efficient, nonparametric, doubly robust methods for estimating these scaled effects (and weighted average summary measures), and for testing the null hypothesis that treatment affects all outcomes equally. We also discuss methods for exploring how treatment effects depend on covariates (i.e., effect modification). In addition to describing efficiency theory for our estimands and the asymptotic behavior of our estimators, we illustrate the methods in a simulation study and a data analysis. Importantly, and in contrast to much of the literature concerning effects on multiple outcomes, our methods are nonparametric and can be used not only in randomized trials to yield increased efficiency, but also in observational studies with high-dimensional covariates to reduce confounding bias.
Dynamic experiment design regularization approach to adaptive imaging with array radar/SAR sensor systems.

PubMed

Shkvarko, Yuriy; Tuxpan, José; Santos, Stewart

2011-01-01

We consider a problem of high-resolution array radar/SAR imaging formalized in terms of a nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the random wavefield scattered from a remotely sensed scene observed through a kernel signal formation operator and contaminated with random Gaussian noise. First, the Sobolev-type solution space is constructed to specify the class of consistent kernel SSP estimators with the reproducing kernel structures adapted to the metrics in such the solution space. Next, the "model-free" variational analysis (VA)-based image enhancement approach and the "model-based" descriptive experiment design (DEED) regularization paradigm are unified into a new dynamic experiment design (DYED) regularization framework. Application of the proposed DYED framework to the adaptive array radar/SAR imaging problem leads to a class of two-level (DEED-VA) regularized SSP reconstruction techniques that aggregate the kernel adaptive anisotropic windowing with the projections onto convex sets to enforce the consistency and robustness of the overall iterative SSP estimators. We also show how the proposed DYED regularization method may be considered as a generalization of the MVDR, APES and other high-resolution nonparametric adaptive radar sensing techniques. A family of the DYED-related algorithms is constructed and their effectiveness is finally illustrated via numerical simulations.
Estimation of Subpixel Snow-Covered Area by Nonparametric Regression Splines

NASA Astrophysics Data System (ADS)

Kuter, S.; Akyürek, Z.; Weber, G.-W.

2016-10-01

Measurement of the areal extent of snow cover with high accuracy plays an important role in hydrological and climate modeling. Remotely-sensed data acquired by earth-observing satellites offer great advantages for timely monitoring of snow cover. However, the main obstacle is the tradeoff between temporal and spatial resolution of satellite imageries. Soft or subpixel classification of low or moderate resolution satellite images is a preferred technique to overcome this problem. The most frequently employed snow cover fraction methods applied on Moderate Resolution Imaging Spectroradiometer (MODIS) data have evolved from spectral unmixing and empirical Normalized Difference Snow Index (NDSI) methods to latest machine learning-based artificial neural networks (ANNs). This study demonstrates the implementation of subpixel snow-covered area estimation based on the state-of-the-art nonparametric spline regression method, namely, Multivariate Adaptive Regression Splines (MARS). MARS models were trained by using MODIS top of atmospheric reflectance values of bands 1-7 as predictor variables. Reference percentage snow cover maps were generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also employed to estimate the percentage snow-covered area on the same data set. The results indicated that the developed MARS model performed better than th
Temporal and Spatial Analysis of Monogenetic Volcanic Fields

NASA Astrophysics Data System (ADS)

Kiyosugi, Koji

Achieving an understanding of the nature of monogenetic volcanic fields depends on identification of the spatial and temporal patterns of volcanism in these fields, and their relationships to structures mapped in the shallow crust and inferred in the deep crust and mantle through interpretation of geochemical, radiometric and geophysical data. We investigate the spatial and temporal distributions of volcanism in the Abu Monogenetic Volcano Group, Southwest Japan. E-W elongated volcano distribution, which is identified by a nonparametric kernel method, is found to be consistent with the spatial extent of P-wave velocity anomalies in the lower crust and upper mantle, supporting the idea that the spatial density map of volcanic vents reflects the geometry of a mantle diapir. Estimated basalt supply to the lower crust is constant. This observation and the spatial distribution of volcanic vents suggest stability of magma productivity and essentially constant two-dimensional size of the source mantle diapir. We mapped conduits, dike segments, and sills in the San Rafael sub-volcanic field, Utah, where the shallowest part of a Pliocene magmatic system is exceptionally well exposed. The distribution of conduits matches the major features of dike distribution, including development of clusters and distribution of outliers. The comparison of San Rafael conduit distribution and the distributions of volcanoes in several recently active volcanic fields supports the use of statistical models, such as nonparametric kernel methods, in probabilistic hazard assessment for distributed volcanism. We developed a new recurrence rate calculation method that uses a Monte Carlo procedure to better reflect and understand the impact of uncertainties of radiometric age determinations on uncertainty of recurrence rate estimates for volcanic activity in the Abu, Yucca Mountain Region, and Izu-Tobu volcanic fields. Results suggest that the recurrence rates of volcanic fields can change by more than one order of magnitude on time scales of several hundred thousand to several million years. This suggests that magma generation rate beneath volcanic fields may change over these time scales. Also, recurrence rate varies more than one order of magnitude between these volcanic fields, consistent with the idea that distributed volcanism may be influenced by both the rate of magma generation and the potential for dike interaction during ascent.
Data mining approach identifies research priorities and data requirements for resolving the red algal tree of life.

PubMed

Verbruggen, Heroen; Maggs, Christine A; Saunders, Gary W; Le Gall, Line; Yoon, Hwan Su; De Clerck, Olivier

2010-01-20

The assembly of the tree of life has seen significant progress in recent years but algae and protists have been largely overlooked in this effort. Many groups of algae and protists have ancient roots and it is unclear how much data will be required to resolve their phylogenetic relationships for incorporation in the tree of life. The red algae, a group of primary photosynthetic eukaryotes of more than a billion years old, provide the earliest fossil evidence for eukaryotic multicellularity and sexual reproduction. Despite this evolutionary significance, their phylogenetic relationships are understudied. This study aims to infer a comprehensive red algal tree of life at the family level from a supermatrix containing data mined from GenBank. We aim to locate remaining regions of low support in the topology, evaluate their causes and estimate the amount of data required to resolve them. Phylogenetic analysis of a supermatrix of 14 loci and 98 red algal families yielded the most complete red algal tree of life to date. Visualization of statistical support showed the presence of five poorly supported regions. Causes for low support were identified with statistics about the age of the region, data availability and node density, showing that poor support has different origins in different parts of the tree. Parametric simulation experiments yielded optimistic estimates of how much data will be needed to resolve the poorly supported regions (ca. 103 to ca. 104 nucleotides for the different regions). Nonparametric simulations gave a markedly more pessimistic image, some regions requiring more than 2.8 105 nucleotides or not achieving the desired level of support at all. The discrepancies between parametric and nonparametric simulations are discussed in light of our dataset and known attributes of both approaches. Our study takes the red algae one step closer to meaningful inclusion in the tree of life. In addition to the recovery of stable relationships, the recognition of five regions in need of further study is a significant outcome of this work. Based on our analyses of current availability and future requirements of data, we make clear recommendations for forthcoming research.
On the Mean Squared Error of Nonparametric Quantile Estimators under Random Right-Censorship.

DTIC Science & Technology

1986-09-01

SECURITY CI.ASSIFICATION lb. RESTRICTIVE MARKINGS UNCLASSIFIED 2a, SECURITY CLASSIFICATION AUTHORITY 3 . OISTRIBUTIONIAVAILASIL.ITY OF REPORT P16e 2b...UNCLASSIPIEO/UNLIMITEO 3 SAME AS RPT". 0 OTIC USERS 1 UNCLASSIFIED p." " 22. NAME OP RESPONSIBLE INOIVIOUAL 22b. TELEPHONE NUMBER 22c. OFFICE SYMBOL...in Section 3 , and the result for the kernel estimator Qn is derived in Section 4. It should be k. mentioned that the order statistic methods used by
Hematology and plasma chemistry reference intervals for cultured tilapia (Oreochromis hybrid).

PubMed

Hrubec, Terry C.; Cardinale, Jenifer L.; Smith, Stephen A.

2000-01-01

Tilapia are a commonly aquacultured fish yet little is known about their normal physiology and response to disease. In this study we determined the results of complete hematologic (n=40) and plasma biochemical profiles (n=63) in production tilapia (Oreochromis hybrids). The fish were raised in recirculating systems with a high stocking density (120 g/L), and were in the middle of a 15-month production cycle. Blood was analyzed using standard techniques, and reference intervals were determined using nonparametric methods. Non-production tilapia (n=15) from low-density tanks (4 g/L) also were sampled; the clinical chemistry results were compared to reference intervals from the fish raised in high-density tanks. Differences were noted in plasma protein, calcium and phosphorus concentrations, such that reference intervals for high-density production tilapia were not applicable to fish raised under different environmental and management conditions.
A Multidimensional Item Response Model: Constrained Latent Class Analysis Using the Gibbs Sampler and Posterior Predictive Checks.

ERIC Educational Resources Information Center

Hoijtink, Herbert; Molenaar, Ivo W.

1997-01-01

This paper shows that a certain class of constrained latent class models may be interpreted as a special case of nonparametric multidimensional item response models. Parameters of this latent class model are estimated using an application of the Gibbs sampler, and model fit is investigated using posterior predictive checks. (SLD)
Validity Study in Multidimensional Latent Space and Efficient Computerized Adaptive Testing. Final Report.

ERIC Educational Resources Information Center

Samejima, Fumiko

This paper is the final report of a multi-year project sponsored by the Office of Naval Research (ONR) in 1987 through 1990. The main objectives of the research summarized were to: investigate the non-parametric approach to the estimation of the operating characteristics of discrete item responses; revise and strengthen the package computer…

An application of quantile random forests for predictive mapping of forest attributes

Treesearch

E.A. Freeman; G.G. Moisen

2015-01-01

Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...
Hazard Function Estimation with Cause-of-Death Data Missing at Random.

PubMed

Wang, Qihua; Dinse, Gregg E; Liu, Chunling

2012-04-01

Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.
The 'F-complex' and MMN tap different aspects of deviance.

PubMed

Laufer, Ilan; Pratt, Hillel

2005-02-01

To compare the 'F(fusion)-complex' with the Mismatch negativity (MMN), both components associated with automatic detection of changes in the acoustic stimulus flow. Ten right-handed adult native Hebrew speakers discriminated vowel-consonant-vowel (V-C-V) sequences /ada/ (deviant) and /aga/ (standard) in an active auditory 'Oddball' task, and the brain potentials associated with performance of the task were recorded from 21 electrodes. Stimuli were generated by fusing the acoustic elements of the V-C-V sequences as follows: base was always presented in front of the subject, and formant transitions were presented to the front, left or right in a virtual reality room. An illusion of a lateralized echo (duplex sensation) accompanied base fusion with the lateralized formant locations. Source current density estimates were derived for the net response to the fusion of the speech elements (F-complex) and for the MMN, using low-resolution electromagnetic tomography (LORETA). Statistical non-parametric mapping was used to estimate the current density differences between the brain sources of the F-complex and the MMN. Occipito-parietal regions and prefrontal regions were associated with the F-complex in all formant locations, whereas the vicinity of the supratemporal plane was bilaterally associated with the MMN, but only in case of front-fusion (no duplex effect). MMN is sensitive to the novelty of the auditory object in relation to other stimuli in a sequence, whereas the F-complex is sensitive to the acoustic features of the auditory object and reflects a process of matching them with target categories. The F-complex and MMN reflect different aspects of auditory processing in a stimulus-rich and changing environment: content analysis of the stimulus and novelty detection, respectively.
A novel approach to validate satellite soil moisture retrievals using precipitation data

NASA Astrophysics Data System (ADS)

Karthikeyan, L.; Kumar, D. Nagesh

2016-10-01

A novel approach is proposed that attempts to validate passive microwave soil moisture retrievals using precipitation data (applied over India). It is based on the concept that the expectation of precipitation conditioned on soil moisture follows a sigmoidal convex-concave-shaped curve, the characteristic of which was recently shown to be represented by mutual information estimated between soil moisture and precipitation. On this basis, with an emphasis over distribution-free nonparametric computations, a new measure called Copula-Kernel Density Estimator based Mutual Information (CKDEMI) is introduced. The validation approach is generic in nature and utilizes CKDEMI in tandem with a couple of proposed bootstrap strategies, to check accuracy of any two soil moisture products (here Advanced Microwave Scanning Radiometer-EOS sensor's Vrije Universiteit Amsterdam-NASA (VUAN) and University of Montana (MONT) products) using precipitation (India Meteorological Department) data. The proposed technique yields a "best choice soil moisture product" map which contains locations where any one of the two/none of the two/both the products have produced accurate retrievals. The results indicated that in general, VUA-NASA product has performed well over University of Montana's product for India. The best choice soil moisture map is then integrated with land use land cover and elevation information using a novel probability density function-based procedure to gain insight on conditions under which each of the products has performed well. Finally, the impact of using a different precipitation (Asian Precipitation-Highly-Resolved Observational Data Integration Towards Evaluation of Water Resources) data set over the best choice soil moisture product map is also analyzed. The proposed methodology assists researchers and practitioners in selecting the appropriate soil moisture product for various assimilation strategies at both basin and continental scales.
Smooth centile curves for skew and kurtotic data modelled using the Box-Cox power exponential distribution.

PubMed

Rigby, Robert A; Stasinopoulos, D Mikis

2004-10-15

The Box-Cox power exponential (BCPE) distribution, developed in this paper, provides a model for a dependent variable Y exhibiting both skewness and kurtosis (leptokurtosis or platykurtosis). The distribution is defined by a power transformation Y(nu) having a shifted and scaled (truncated) standard power exponential distribution with parameter tau. The distribution has four parameters and is denoted BCPE (mu,sigma,nu,tau). The parameters, mu, sigma, nu and tau, may be interpreted as relating to location (median), scale (approximate coefficient of variation), skewness (transformation to symmetry) and kurtosis (power exponential parameter), respectively. Smooth centile curves are obtained by modelling each of the four parameters of the distribution as a smooth non-parametric function of an explanatory variable. A Fisher scoring algorithm is used to fit the non-parametric model by maximizing a penalized likelihood. The first and expected second and cross derivatives of the likelihood, with respect to mu, sigma, nu and tau, required for the algorithm, are provided. The centiles of the BCPE distribution are easy to calculate, so it is highly suited to centile estimation. This application of the BCPE distribution to smooth centile estimation provides a generalization of the LMS method of the centile estimation to data exhibiting kurtosis (as well as skewness) different from that of a normal distribution and is named here the LMSP method of centile estimation. The LMSP method of centile estimation is applied to modelling the body mass index of Dutch males against age. 2004 John Wiley & Sons, Ltd.
The First measurement of the top quark mass at CDF II in the lepton+jets and dilepton channels simultaneously

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aaltonen, T.; /Helsinki Inst. of Phys.; Adelman, J.

2008-09-01

The authors present a measurement of the mass of the top quark using data corresponding to an integrated luminosity of 1.9 fb{sup -1} of p{bar p} collisions collected at {radical}s = 1.96 TeV with the CDF II detector at Fermilab's Tevatron. This is the first measurement of the top quark mass using top-antitop pair candidate events in the lepton + jets and dilepton decay channels simultaneously. They reconstruct two observables in each channel and use a non-parametric kernel density estimation technique to derive two-dimensional probability density functions from simulated signal and background samples. The observables are the top quark massmore » and the invariant mass of two jets from the W decay in the lepton + jets channel, and the top quark mass and the scalar sum of transverse energy of the event in the diletpon channel. They perform a simultaneous fit for the top quark mass and the jet energy scale, which is constrained in situ by the hadronic W boson mass. using 332 lepton + jets candidate events and 144 diletpon candidate events, they measure the top quark mass to be m{sub top} = 171.9 {+-} 1.7 (stat. + JES) {+-} 1.1 (other sys.) GeV/c{sup 2} = 171.9 {+-} 2.0 GeV/c{sup 2}.« less
Goodness-of-Fit Tests and Nonparametric Adaptive Estimation for Spike Train Analysis

PubMed Central

2014-01-01

When dealing with classical spike train analysis, the practitioner often performs goodness-of-fit tests to test whether the observed process is a Poisson process, for instance, or if it obeys another type of probabilistic model (Yana et al. in Biophys. J. 46(3):323–330, 1984; Brown et al. in Neural Comput. 14(2):325–346, 2002; Pouzat and Chaffiol in Technical report, http://arxiv.org/abs/arXiv:0909.2785, 2009). In doing so, there is a fundamental plug-in step, where the parameters of the supposed underlying model are estimated. The aim of this article is to show that plug-in has sometimes very undesirable effects. We propose a new method based on subsampling to deal with those plug-in issues in the case of the Kolmogorov–Smirnov test of uniformity. The method relies on the plug-in of good estimates of the underlying model that have to be consistent with a controlled rate of convergence. Some nonparametric estimates satisfying those constraints in the Poisson or in the Hawkes framework are highlighted. Moreover, they share adaptive properties that are useful from a practical point of view. We show the performance of those methods on simulated data. We also provide a complete analysis with these tools on single unit activity recorded on a monkey during a sensory-motor task. Electronic Supplementary Material The online version of this article (doi:10.1186/2190-8567-4-3) contains supplementary material. PMID:24742008
Out-of-Sample Extensions for Non-Parametric Kernel Methods.

PubMed

Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang

2017-02-01

Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
Dirichlet Process Gaussian-mixture model: An application to localizing coalescing binary neutron stars with gravitational-wave observations

NASA Astrophysics Data System (ADS)

Del Pozzo, W.; Berry, C. P. L.; Ghosh, A.; Haines, T. S. F.; Singer, L. P.; Vecchio, A.

2018-06-01

We reconstruct posterior distributions for the position (sky area and distance) of a simulated set of binary neutron-star gravitational-waves signals observed with Advanced LIGO and Advanced Virgo. We use a Dirichlet Process Gaussian-mixture model, a fully Bayesian non-parametric method that can be used to estimate probability density functions with a flexible set of assumptions. The ability to reliably reconstruct the source position is important for multimessenger astronomy, as recently demonstrated with GW170817. We show that for detector networks comparable to the early operation of Advanced LIGO and Advanced Virgo, typical localization volumes are ˜104-105 Mpc3 corresponding to ˜102-103 potential host galaxies. The localization volume is a strong function of the network signal-to-noise ratio, scaling roughly ∝ϱnet-6. Fractional localizations improve with the addition of further detectors to the network. Our Dirichlet Process Gaussian-mixture model can be adopted for localizing events detected during future gravitational-wave observing runs, and used to facilitate prompt multimessenger follow-up.
Parametric vs. non-parametric statistics of low resolution electromagnetic tomography (LORETA).

PubMed

Thatcher, R W; North, D; Biver, C

2005-01-01

This study compared the relative statistical sensitivity of non-parametric and parametric statistics of 3-dimensional current sources as estimated by the EEG inverse solution Low Resolution Electromagnetic Tomography (LORETA). One would expect approximately 5% false positives (classification of a normal as abnormal) at the P < .025 level of probability (two tailed test) and approximately 1% false positives at the P < .005 level. EEG digital samples (2 second intervals sampled 128 Hz, 1 to 2 minutes eyes closed) from 43 normal adult subjects were imported into the Key Institute's LORETA program. We then used the Key Institute's cross-spectrum and the Key Institute's LORETA output files (*.lor) as the 2,394 gray matter pixel representation of 3-dimensional currents at different frequencies. The mean and standard deviation *.lor files were computed for each of the 2,394 gray matter pixels for each of the 43 subjects. Tests of Gaussianity and different transforms were computed in order to best approximate a normal distribution for each frequency and gray matter pixel. The relative sensitivity of parametric vs. non-parametric statistics were compared using a "leave-one-out" cross validation method in which individual normal subjects were withdrawn and then statistically classified as being either normal or abnormal based on the remaining subjects. Log10 transforms approximated Gaussian distribution in the range of 95% to 99% accuracy. Parametric Z score tests at P < .05 cross-validation demonstrated an average misclassification rate of approximately 4.25%, and range over the 2,394 gray matter pixels was 27.66% to 0.11%. At P < .01 parametric Z score cross-validation false positives were 0.26% and ranged from 6.65% to 0% false positives. The non-parametric Key Institute's t-max statistic at P < .05 had an average misclassification error rate of 7.64% and ranged from 43.37% to 0.04% false positives. The nonparametric t-max at P < .01 had an average misclassification rate of 6.67% and ranged from 41.34% to 0% false positives of the 2,394 gray matter pixels for any cross-validated normal subject. In conclusion, adequate approximation to Gaussian distribution and high cross-validation can be achieved by the Key Institute's LORETA programs by using a log10 transform and parametric statistics, and parametric normative comparisons had lower false positive rates than the non-parametric tests.
Use of Brain MRI Atlases to Determine Boundaries of Age-Related Pathology: The Importance of Statistical Method

PubMed Central

Dickie, David Alexander; Job, Dominic E.; Gonzalez, David Rodriguez; Shenkin, Susan D.; Wardlaw, Joanna M.

2015-01-01

Introduction Neurodegenerative disease diagnoses may be supported by the comparison of an individual patient’s brain magnetic resonance image (MRI) with a voxel-based atlas of normal brain MRI. Most current brain MRI atlases are of young to middle-aged adults and parametric, e.g., mean ±standard deviation (SD); these atlases require data to be Gaussian. Brain MRI data, e.g., grey matter (GM) proportion images, from normal older subjects are apparently not Gaussian. We created a nonparametric and a parametric atlas of the normal limits of GM proportions in older subjects and compared their classifications of GM proportions in Alzheimer’s disease (AD) patients. Methods Using publicly available brain MRI from 138 normal subjects and 138 subjects diagnosed with AD (all 55–90 years), we created: a mean ±SD atlas to estimate parametrically the percentile ranks and limits of normal ageing GM; and, separately, a nonparametric, rank order-based GM atlas from the same normal ageing subjects. GM images from AD patients were then classified with respect to each atlas to determine the effect statistical distributions had on classifications of proportions of GM in AD patients. Results The parametric atlas often defined the lower normal limit of the proportion of GM to be negative (which does not make sense physiologically as the lowest possible proportion is zero). Because of this, for approximately half of the AD subjects, 25–45% of voxels were classified as normal when compared to the parametric atlas; but were classified as abnormal when compared to the nonparametric atlas. These voxels were mainly concentrated in the frontal and occipital lobes. Discussion To our knowledge, we have presented the first nonparametric brain MRI atlas. In conditions where there is increasing variability in brain structure, such as in old age, nonparametric brain MRI atlases may represent the limits of normal brain structure more accurately than parametric approaches. Therefore, we conclude that the statistical method used for construction of brain MRI atlases should be selected taking into account the population and aim under study. Parametric methods are generally robust for defining central tendencies, e.g., means, of brain structure. Nonparametric methods are advisable when studying the limits of brain structure in ageing and neurodegenerative disease. PMID:26023913
Divergences and estimating tight bounds on Bayes error with applications to multivariate Gaussian copula and latent Gaussian copula

NASA Astrophysics Data System (ADS)

Thelen, Brian J.; Xique, Ismael J.; Burns, Joseph W.; Goley, G. Steven; Nolan, Adam R.; Benson, Jonathan W.

2017-04-01

In Bayesian decision theory, there has been a great amount of research into theoretical frameworks and information- theoretic quantities that can be used to provide lower and upper bounds for the Bayes error. These include well-known bounds such as Chernoff, Battacharrya, and J-divergence. Part of the challenge of utilizing these various metrics in practice is (i) whether they are "loose" or "tight" bounds, (ii) how they might be estimated via either parametric or non-parametric methods, and (iii) how accurate the estimates are for limited amounts of data. In general what is desired is a methodology for generating relatively tight lower and upper bounds, and then an approach to estimate these bounds efficiently from data. In this paper, we explore the so-called triangle divergence which has been around for a while, but was recently made more prominent in some recent research on non-parametric estimation of information metrics. Part of this work is motivated by applications for quantifying fundamental information content in SAR/LIDAR data, and to help in this, we have developed a flexible multivariate modeling framework based on multivariate Gaussian copula models which can be combined with the triangle divergence framework to quantify this information, and provide approximate bounds on Bayes error. In this paper we present an overview of the bounds, including those based on triangle divergence and verify that under a number of multivariate models, the upper and lower bounds derived from triangle divergence are significantly tighter than the other common bounds, and often times, dramatically so. We also propose some simple but effective means for computing the triangle divergence using Monte Carlo methods, and then discuss estimation of the triangle divergence from empirical data based on Gaussian Copula models.
Reliable estimates of predictive uncertainty for an Alpine catchment using a non-parametric methodology

NASA Astrophysics Data System (ADS)

Matos, José P.; Schaefli, Bettina; Schleiss, Anton J.

2017-04-01

Uncertainty affects hydrological modelling efforts from the very measurements (or forecasts) that serve as inputs to the more or less inaccurate predictions that are produced. Uncertainty is truly inescapable in hydrology and yet, due to the theoretical and technical hurdles associated with its quantification, it is at times still neglected or estimated only qualitatively. In recent years the scientific community has made a significant effort towards quantifying this hydrologic prediction uncertainty. Despite this, most of the developed methodologies can be computationally demanding, are complex from a theoretical point of view, require substantial expertise to be employed, and are constrained by a number of assumptions about the model error distribution. These assumptions limit the reliability of many methods in case of errors that show particular cases of non-normality, heteroscedasticity, or autocorrelation. The present contribution builds on a non-parametric data-driven approach that was developed for uncertainty quantification in operational (real-time) forecasting settings. The approach is based on the concept of Pareto optimality and can be used as a standalone forecasting tool or as a postprocessor. By virtue of its non-parametric nature and a general operating principle, it can be applied directly and with ease to predictions of streamflow, water stage, or even accumulated runoff. Also, it is a methodology capable of coping with high heteroscedasticity and seasonal hydrological regimes (e.g. snowmelt and rainfall driven events in the same catchment). Finally, the training and operation of the model are very fast, making it a tool particularly adapted to operational use. To illustrate its practical use, the uncertainty quantification method is coupled with a process-based hydrological model to produce statistically reliable forecasts for an Alpine catchment located in Switzerland. Results are presented and discussed in terms of their reliability and resolution.
Assessing statistical differences between parameters estimates in Partial Least Squares path modeling.

PubMed

Rodríguez-Entrena, Macario; Schuberth, Florian; Gelhard, Carsten

2018-01-01

Structural equation modeling using partial least squares (PLS-SEM) has become a main-stream modeling approach in various disciplines. Nevertheless, prior literature still lacks a practical guidance on how to properly test for differences between parameter estimates. Whereas existing techniques such as parametric and non-parametric approaches in PLS multi-group analysis solely allow to assess differences between parameters that are estimated for different subpopulations, the study at hand introduces a technique that allows to also assess whether two parameter estimates that are derived from the same sample are statistically different. To illustrate this advancement to PLS-SEM, we particularly refer to a reduced version of the well-established technology acceptance model.
Network Reconstruction From High-Dimensional Ordinary Differential Equations.

PubMed

Chen, Shizhe; Shojaie, Ali; Witten, Daniela M

2017-01-01

We consider the task of learning a dynamical system from high-dimensional time-course data. For instance, we might wish to estimate a gene regulatory network from gene expression data measured at discrete time points. We model the dynamical system nonparametrically as a system of additive ordinary differential equations. Most existing methods for parameter estimation in ordinary differential equations estimate the derivatives from noisy observations. This is known to be challenging and inefficient. We propose a novel approach that does not involve derivative estimation. We show that the proposed method can consistently recover the true network structure even in high dimensions, and we demonstrate empirical improvement over competing approaches. Supplementary materials for this article are available online.
Dialect Density in Bilingual Puerto Rican Spanish-English Speaking Children

PubMed Central

Fabiano-Smith, Leah; Shuriff, Rebecca; Barlow, Jessica A.; Goldstein, Brian A.

2014-01-01

It is still largely unknown how the two phonological systems of bilingual children interact. In this exploratory study, we examine children's use of dialect features to determine how their speech sound systems interact. Six monolingual Puerto Rican Spanish-speaking children and 6 bilingual Puerto Rican Spanish-English speaking children, ages 5-7 years, were included in the current study. Children's single word productions were analyzed for (1) dialect density and (2) frequency of occurrence of dialect features (after Oetting & McDonald, 2002). Nonparametric statistical analyses were used to examine differences within and across language groups. Results indicated that monolinguals and bilinguals exhibited similar dialect density, but differed on the types of dialect features used. Findings are discussed within the theoretical framework of the Dual Systems Model (Paradis, 2001) of language acquisition in bilingual children. PMID:25009677
Balancing Score Adjusted Targeted Minimum Loss-based Estimation

PubMed Central

Lendle, Samuel David; Fireman, Bruce; van der Laan, Mark J.

2015-01-01

Adjusting for a balancing score is sufficient for bias reduction when estimating causal effects including the average treatment effect and effect among the treated. Estimators that adjust for the propensity score in a nonparametric way, such as matching on an estimate of the propensity score, can be consistent when the estimated propensity score is not consistent for the true propensity score but converges to some other balancing score. We call this property the balancing score property, and discuss a class of estimators that have this property. We introduce a targeted minimum loss-based estimator (TMLE) for a treatment-specific mean with the balancing score property that is additionally locally efficient and doubly robust. We investigate the new estimator’s performance relative to other estimators, including another TMLE, a propensity score matching estimator, an inverse probability of treatment weighted estimator, and a regression-based estimator in simulation studies. PMID:26561539
Empirical validation of statistical parametric mapping for group imaging of fast neural activity using electrical impedance tomography.

PubMed

Packham, B; Barnes, G; Dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D

2016-06-01

Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity, such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity.
Empirical validation of statistical parametric mapping for group imaging of fast neural activity using electrical impedance tomography

PubMed Central

Packham, B; Barnes, G; dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D

2016-01-01

Abstract Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity, such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity. PMID:27203477
A Novel Nonparametric Approach for Neural Encoding and Decoding Models of Multimodal Receptive Fields.

PubMed

Agarwal, Rahul; Chen, Zhe; Kloosterman, Fabian; Wilson, Matthew A; Sarma, Sridevi V

2016-07-01

Pyramidal neurons recorded from the rat hippocampus and entorhinal cortex, such as place and grid cells, have diverse receptive fields, which are either unimodal or multimodal. Spiking activity from these cells encodes information about the spatial position of a freely foraging rat. At fine timescales, a neuron's spike activity also depends significantly on its own spike history. However, due to limitations of current parametric modeling approaches, it remains a challenge to estimate complex, multimodal neuronal receptive fields while incorporating spike history dependence. Furthermore, efforts to decode the rat's trajectory in one- or two-dimensional space from hippocampal ensemble spiking activity have mainly focused on spike history-independent neuronal encoding models. In this letter, we address these two important issues by extending a recently introduced nonparametric neural encoding framework that allows modeling both complex spatial receptive fields and spike history dependencies. Using this extended nonparametric approach, we develop novel algorithms for decoding a rat's trajectory based on recordings of hippocampal place cells and entorhinal grid cells. Results show that both encoding and decoding models derived from our new method performed significantly better than state-of-the-art encoding and decoding models on 6 minutes of test data. In addition, our model's performance remains invariant to the apparent modality of the neuron's receptive field.

Mapping the Structure-Function Relationship in Glaucoma and Healthy Patients Measured with Spectralis OCT and Humphrey Perimetry

PubMed Central

Muñoz–Negrete, Francisco J.; Oblanca, Noelia; Rebolleda, Gema

2018-01-01

Purpose To study the structure-function relationship in glaucoma and healthy patients assessed with Spectralis OCT and Humphrey perimetry using new statistical approaches. Materials and Methods Eighty-five eyes were prospectively selected and divided into 2 groups: glaucoma (44) and healthy patients (41). Three different statistical approaches were carried out: (1) factor analysis of the threshold sensitivities (dB) (automated perimetry) and the macular thickness (μm) (Spectralis OCT), subsequently applying Pearson's correlation to the obtained regions, (2) nonparametric regression analysis relating the values in each pair of regions that showed significant correlation, and (3) nonparametric spatial regressions using three models designed for the purpose of this study. Results In the glaucoma group, a map that relates structural and functional damage was drawn. The strongest correlation with visual fields was observed in the peripheral nasal region of both superior and inferior hemigrids (r = 0.602 and r = 0.458, resp.). The estimated functions obtained with the nonparametric regressions provided the mean sensitivity that corresponds to each given macular thickness. These functions allowed for accurate characterization of the structure-function relationship. Conclusions Both maps and point-to-point functions obtained linking structure and function damage contribute to a better understanding of this relationship and may help in the future to improve glaucoma diagnosis. PMID:29850196
To Math or Not to Math: The Algebra-Calculus Pipeline and Postsecondary Mathematics Remediation

ERIC Educational Resources Information Center

Showalter, Daniel A.

2017-01-01

This article reports on a study designed to estimate the effect of high school coursetaking in the algebra-calculus pipeline on the likelihood of placing out of postsecondary remedial mathematics. A nonparametric variant of propensity score analysis was used on a nationally representative data set to remove selection bias and test for an effect…
The Rasch Model and Missing Data, with an Emphasis on Tailoring Test Items.

ERIC Educational Resources Information Center

de Gruijter, Dato N. M.

Many applications of educational testing have a missing data aspect (MDA). This MDA is perhaps most pronounced in item banking, where each examinee responds to a different subtest of items from a large item pool and where both person and item parameter estimates are needed. The Rasch model is emphasized, and its non-parametric counterpart (the…
Standard Errors and Confidence Intervals from Bootstrapping for Ramsay-Curve Item Response Theory Model Item Parameters

ERIC Educational Resources Information Center

Gu, Fei; Skorupski, William P.; Hoyle, Larry; Kingston, Neal M.

2011-01-01

Ramsay-curve item response theory (RC-IRT) is a nonparametric procedure that estimates the latent trait using splines, and no distributional assumption about the latent trait is required. For item parameters of the two-parameter logistic (2-PL), three-parameter logistic (3-PL), and polytomous IRT models, RC-IRT can provide more accurate estimates…
Estimating Marginal Returns to Higher Education in the UK. NBER Working Paper No. 13534

ERIC Educational Resources Information Center

Moffitt, Robert

2007-01-01

A long-standing issue in the literature on education is whether marginal returns to education fall as education rises. If the population differs in its rate of return, a closely related question is whether marginal returns to higher education fall as a greater fraction of the population enrolls. This paper proposes a nonparametric method of…
Alternating event processes during lifetimes: population dynamics and statistical inference.

PubMed

Shinohara, Russell T; Sun, Yifei; Wang, Mei-Cheng

2018-01-01

In the literature studying recurrent event data, a large amount of work has been focused on univariate recurrent event processes where the occurrence of each event is treated as a single point in time. There are many applications, however, in which univariate recurrent events are insufficient to characterize the feature of the process because patients experience nontrivial durations associated with each event. This results in an alternating event process where the disease status of a patient alternates between exacerbations and remissions. In this paper, we consider the dynamics of a chronic disease and its associated exacerbation-remission process over two time scales: calendar time and time-since-onset. In particular, over calendar time, we explore population dynamics and the relationship between incidence, prevalence and duration for such alternating event processes. We provide nonparametric estimation techniques for characteristic quantities of the process. In some settings, exacerbation processes are observed from an onset time until death; to account for the relationship between the survival and alternating event processes, nonparametric approaches are developed for estimating exacerbation process over lifetime. By understanding the population dynamics and within-process structure, the paper provide a new and general way to study alternating event processes.
International comparisons of the technical efficiency of the hospital sector: panel data analysis of OECD countries using parametric and non-parametric approaches.

PubMed

Varabyova, Yauheniya; Schreyögg, Jonas

2013-09-01

There is a growing interest in the cross-country comparisons of the performance of national health care systems. The present work provides a comparison of the technical efficiency of the hospital sector using unbalanced panel data from OECD countries over the period 2000-2009. The estimation of the technical efficiency of the hospital sector is performed using nonparametric data envelopment analysis (DEA) and parametric stochastic frontier analysis (SFA). Internal and external validity of findings is assessed by estimating the Spearman rank correlations between the results obtained in different model specifications. The panel-data analyses using two-step DEA and one-stage SFA show that countries, which have higher health care expenditure per capita, tend to have a more technically efficient hospital sector. Whether the expenditure is financed through private or public sources is not related to the technical efficiency of the hospital sector. On the other hand, the hospital sector in countries with higher income inequality and longer average hospital length of stay is less technically efficient. Copyright © 2013 The Authors. Published by Elsevier Ireland Ltd.. All rights reserved.
Forecasting Epidemics Through Nonparametric Estimation of Time-Dependent Transmission Rates Using the SEIR Model.

PubMed

Smirnova, Alexandra; deCamp, Linda; Chowell, Gerardo

2017-05-02

Deterministic and stochastic methods relying on early case incidence data for forecasting epidemic outbreaks have received increasing attention during the last few years. In mathematical terms, epidemic forecasting is an ill-posed problem due to instability of parameter identification and limited available data. While previous studies have largely estimated the time-dependent transmission rate by assuming specific functional forms (e.g., exponential decay) that depend on a few parameters, here we introduce a novel approach for the reconstruction of nonparametric time-dependent transmission rates by projecting onto a finite subspace spanned by Legendre polynomials. This approach enables us to effectively forecast future incidence cases, the clear advantage over recovering the transmission rate at finitely many grid points within the interval where the data are currently available. In our approach, we compare three regularization algorithms: variational (Tikhonov's) regularization, truncated singular value decomposition (TSVD), and modified TSVD in order to determine the stabilizing strategy that is most effective in terms of reliability of forecasting from limited data. We illustrate our methodology using simulated data as well as case incidence data for various epidemics including the 1918 influenza pandemic in San Francisco and the 2014-2015 Ebola epidemic in West Africa.
A Machine Learning Framework for Plan Payment Risk Adjustment.

PubMed

Rose, Sherri

2016-12-01

To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R 2 . Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.
Modeling the human development index and the percentage of poor people using quantile smoothing splines

NASA Astrophysics Data System (ADS)

Mulyani, Sri; Andriyana, Yudhie; Sudartianto

2017-03-01

Mean regression is a statistical method to explain the relationship between the response variable and the predictor variable based on the central tendency of the data (mean) of the response variable. The parameter estimation in mean regression (with Ordinary Least Square or OLS) generates a problem if we apply it to the data with a symmetric, fat-tailed, or containing outlier. Hence, an alternative method is necessary to be used to that kind of data, for example quantile regression method. The quantile regression is a robust technique to the outlier. This model can explain the relationship between the response variable and the predictor variable, not only on the central tendency of the data (median) but also on various quantile, in order to obtain complete information about that relationship. In this study, a quantile regression is developed with a nonparametric approach such as smoothing spline. Nonparametric approach is used if the prespecification model is difficult to determine, the relation between two variables follow the unknown function. We will apply that proposed method to poverty data. Here, we want to estimate the Percentage of Poor People as the response variable involving the Human Development Index (HDI) as the predictor variable.
Monitoring waterbird abundance in wetlands: The importance of controlling results for variation in water depth

USGS Publications Warehouse

Bolduc, F.; Afton, A.D.

2008-01-01

Wetland use by waterbirds is highly dependent on water depth, and depth requirements generally vary among species. Furthermore, water depth within wetlands often varies greatly over time due to unpredictable hydrological events, making comparisons of waterbird abundance among wetlands difficult as effects of habitat variables and water depth are confounded. Species-specific relationships between bird abundance and water depth necessarily are non-linear; thus, we developed a methodology to correct waterbird abundance for variation in water depth, based on the non-parametric regression of these two variables. Accordingly, we used the difference between observed and predicted abundances from non-parametric regression (analogous to parametric residuals) as an estimate of bird abundance at equivalent water depths. We scaled this difference to levels of observed and predicted abundances using the formula: ((observed - predicted abundance)/(observed + predicted abundance)) ?? 100. This estimate also corresponds to the observed:predicted abundance ratio, which allows easy interpretation of results. We illustrated this methodology using two hypothetical species that differed in water depth and wetland preferences. Comparisons of wetlands, using both observed and relative corrected abundances, indicated that relative corrected abundance adequately separates the effect of water depth from the effect of wetlands. ?? 2008 Elsevier B.V.
Dynamic Experiment Design Regularization Approach to Adaptive Imaging with Array Radar/SAR Sensor Systems

PubMed Central

Shkvarko, Yuriy; Tuxpan, José; Santos, Stewart

2011-01-01

We consider a problem of high-resolution array radar/SAR imaging formalized in terms of a nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the random wavefield scattered from a remotely sensed scene observed through a kernel signal formation operator and contaminated with random Gaussian noise. First, the Sobolev-type solution space is constructed to specify the class of consistent kernel SSP estimators with the reproducing kernel structures adapted to the metrics in such the solution space. Next, the “model-free” variational analysis (VA)-based image enhancement approach and the “model-based” descriptive experiment design (DEED) regularization paradigm are unified into a new dynamic experiment design (DYED) regularization framework. Application of the proposed DYED framework to the adaptive array radar/SAR imaging problem leads to a class of two-level (DEED-VA) regularized SSP reconstruction techniques that aggregate the kernel adaptive anisotropic windowing with the projections onto convex sets to enforce the consistency and robustness of the overall iterative SSP estimators. We also show how the proposed DYED regularization method may be considered as a generalization of the MVDR, APES and other high-resolution nonparametric adaptive radar sensing techniques. A family of the DYED-related algorithms is constructed and their effectiveness is finally illustrated via numerical simulations. PMID:22163859
The COS-Halos Survey: Metallicities in the Low-redshift Circumgalactic Medium

NASA Astrophysics Data System (ADS)

Prochaska, J. Xavier; Werk, Jessica K.; Worseck, Gábor; Tripp, Todd M.; Tumlinson, Jason; Burchett, Joseph N.; Fox, Andrew J.; Fumagalli, Michele; Lehner, Nicolas; Peeples, Molly S.; Tejos, Nicolas

2017-03-01

We analyze new far-ultraviolet spectra of 13 quasars from the z˜ 0.2 COS-Halos survey that cover the H I Lyman limit of 14 circumgalactic medium (CGM) systems. These data yield precise estimates or more constraining limits than previous COS-Halos measurements on the H I column densities {N}{{H}{{I}}}. We then apply a Monte-Carlo Markov chain approach on 32 systems from COS-Halos to estimate the metallicity of the cool (T˜ {10}4 K) CGM gas that gives rise to low-ionization state metal lines, under the assumption of photoionization equilibrium with the extragalactic UV background. The principle results are: (1) the CGM of field L* galaxies exhibits a declining H I surface density with impact parameter {R}\\perp (at > 99.5 % confidence), (2) the transmission of ionizing radiation through CGM gas alone is 70 ± 7% (3) the metallicity distribution function of the cool CGM is unimodal with a median of {10}-0.51 {Z}⊙ and a 95% interval ≈ 1/50 {Z}⊙ to > 3 {Z}⊙ ; the incidence of metal-poor (< 1/100 {Z}⊙ ) gas is low, implying any such gas discovered along quasar sightlines is typically unrelated to L* galaxies; (4) we find an unexpected increase in gas metallicity with declining {N}{{H}{{I}}} (at > 99.9 % confidence) and, therefore, also with increasing {R}\\perp ; the high metallicity at large radii implies early enrichment; and (5) a non-parametric estimate of the cool CGM gas mass is {M}{CGM}{cool}=(9.2+/- 4.3)× {10}10 {M}⊙ , which together with new mass estimates for the hot CGM may resolve the galactic missing baryons problem. Future analyses of halo gas should focus on the underlying astrophysics governing the CGM, rather than processes that simply expel the medium from the halo. Based on observations made with the NASA/ESA Hubble Space Telescope, obtained at the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS 5-26555. These observations are associated with programs 13033 and 11598.
Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method.

PubMed

Dwivedi, Alok Kumar; Mallawaarachchi, Indika; Alvarado, Luis A

2017-06-30

Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has small sample size limitation. We used a pooled method in nonparametric bootstrap test that may overcome the problem related with small samples in hypothesis testing. The present study compared nonparametric bootstrap test with pooled resampling method corresponding to parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test while maintaining type I error probability for any conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. Nonparametric bootstrap paired t-test also provided better performance than other alternatives. Nonparametric bootstrap test provided benefit over exact Kruskal-Wallis test. We suggest using nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating the one way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.
VizieR Online Data Catalog: ATLAS3D Project. XXX (McDermid+, 2015)

NASA Astrophysics Data System (ADS)

McDermid, R. M.; Alatalo, K.; Blitz, L.; Bournaud, F.; Bureau, M.; Cappellari, M.; Crocker, A. F.; Davies, R. L.; Davis, T. A.; De Zeeuw, P. T.; Duc, P.-A.; Emsellem, E.; Khochfar, S.; Krajnovic, D.; Kuntschner, H.; Morganti, R.; Naab, T.; Oosterloo, T.; Sarzi, M.; Scott, N.; Serra, P.; Weijmans, A.-M.; Young, L. M.

2015-09-01

We present the stellar population content of early-type galaxies from the ATLAS3D survey. Using spectra integrated within apertures covering up to one effective radius, we apply two methods: one based on measuring line-strength indices and applying single stellar population (SSP) models to derive SSP-equivalent values of stellar age, metallicity, and alpha enhancement; and one based on spectral fitting to derive non-parametric star formation histories, mass-weighted average values of age, metallicity, and half-mass formation time-scales. Using homogeneously derived effective radii and dynamically determined galaxy masses, we present the distribution of stellar population parameters on the Mass Plane (MJAM, σe, Rmaje), showing that at fixed mass, compact early-type galaxies are on average older, more metal-rich, and more alpha-enhanced than their larger counterparts. From non-parametric star formation histories, we find that the duration of star formation is systematically more extended in lower mass objects. Assuming that our sample represents most of the stellar content of today's local Universe, approximately 50 percent of all stars formed within the first 2Gyr following the big bang. Most of these stars reside today in the most massive galaxies (>1010.5M⊙), which themselves formed 90 percent of their stars by z~2. The lower mass objects, in contrast, have formed barely half their stars in this time interval. Stellar population properties are independent of environment over two orders of magnitude in local density, varying only with galaxy mass. In the highest density regions of our volume (dominated by the Virgo cluster), galaxies are older, alpha-enhanced, and have shorter star formation histories with respect to lower density regions. (4 data files).
The ATLAS3D Project - XXX. Star formation histories and stellar population scaling relations of early-type galaxies

NASA Astrophysics Data System (ADS)

McDermid, Richard M.; Alatalo, Katherine; Blitz, Leo; Bournaud, Frédéric; Bureau, Martin; Cappellari, Michele; Crocker, Alison F.; Davies, Roger L.; Davis, Timothy A.; de Zeeuw, P. T.; Duc, Pierre-Alain; Emsellem, Eric; Khochfar, Sadegh; Krajnović, Davor; Kuntschner, Harald; Morganti, Raffaella; Naab, Thorsten; Oosterloo, Tom; Sarzi, Marc; Scott, Nicholas; Serra, Paolo; Weijmans, Anne-Marie; Young, Lisa M.

2015-04-01

We present the stellar population content of early-type galaxies from the ATLAS3D survey. Using spectra integrated within apertures covering up to one effective radius, we apply two methods: one based on measuring line-strength indices and applying single stellar population (SSP) models to derive SSP-equivalent values of stellar age, metallicity, and alpha enhancement; and one based on spectral fitting to derive non-parametric star formation histories, mass-weighted average values of age, metallicity, and half-mass formation time-scales. Using homogeneously derived effective radii and dynamically determined galaxy masses, we present the distribution of stellar population parameters on the Mass Plane (MJAM, σe, R^maj_e), showing that at fixed mass, compact early-type galaxies are on average older, more metal-rich, and more alpha-enhanced than their larger counterparts. From non-parametric star formation histories, we find that the duration of star formation is systematically more extended in lower mass objects. Assuming that our sample represents most of the stellar content of today's local Universe, approximately 50 per cent of all stars formed within the first 2 Gyr following the big bang. Most of these stars reside today in the most massive galaxies (>1010.5 M⊙), which themselves formed 90 per cent of their stars by z ˜ 2. The lower mass objects, in contrast, have formed barely half their stars in this time interval. Stellar population properties are independent of environment over two orders of magnitude in local density, varying only with galaxy mass. In the highest density regions of our volume (dominated by the Virgo cluster), galaxies are older, alpha-enhanced, and have shorter star formation histories with respect to lower density regions.
Hazard Function Estimation with Cause-of-Death Data Missing at Random

PubMed Central

Wang, Qihua; Dinse, Gregg E.; Liu, Chunling

2010-01-01

Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data. PMID:22267874
Software Supportability Risk Assessment in OT&E (Operational Test and Evaluation): Literature Review, Current Research Review, and Data Base Assemblage.

DTIC Science & Technology

1984-09-28

variables before simula- tion of model - Search for reality checks a, - Express uncertainty as a probability density distribution. a. H2 a, H-22 TWIF... probability that the software con- tains errors. This prior is updated as test failure data are accumulated. Only a p of 1 (software known to contain...discusssed; both parametric and nonparametric versions are presented. It is shown by the author that the bootstrap underlies the jackknife method and
Evaluation of species richness estimators based on quantitative performance measures and sensitivity to patchiness and sample grain size

NASA Astrophysics Data System (ADS)

Willie, Jacob; Petre, Charles-Albert; Tagg, Nikki; Lens, Luc

2012-11-01

Data from forest herbaceous plants in a site of known species richness in Cameroon were used to test the performance of rarefaction and eight species richness estimators (ACE, ICE, Chao1, Chao2, Jack1, Jack2, Bootstrap and MM). Bias, accuracy, precision and sensitivity to patchiness and sample grain size were the evaluation criteria. An evaluation of the effects of sampling effort and patchiness on diversity estimation is also provided. Stems were identified and counted in linear series of 1-m2 contiguous square plots distributed in six habitat types. Initially, 500 plots were sampled in each habitat type. The sampling process was monitored using rarefaction and a set of richness estimator curves. Curves from the first dataset suggested adequate sampling in riparian forest only. Additional plots ranging from 523 to 2143 were subsequently added in the undersampled habitats until most of the curves stabilized. Jack1 and ICE, the non-parametric richness estimators, performed better, being more accurate and less sensitive to patchiness and sample grain size, and significantly reducing biases that could not be detected by rarefaction and other estimators. This study confirms the usefulness of non-parametric incidence-based estimators, and recommends Jack1 or ICE alongside rarefaction while describing taxon richness and comparing results across areas sampled using similar or different grain sizes. As patchiness varied across habitat types, accurate estimations of diversity did not require the same number of plots. The number of samples needed to fully capture diversity is not necessarily the same across habitats, and can only be known when taxon sampling curves have indicated adequate sampling. Differences in observed species richness between habitats were generally due to differences in patchiness, except between two habitats where they resulted from differences in abundance. We suggest that communities should first be sampled thoroughly using appropriate taxon sampling curves before explaining differences in diversity.
Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information.

PubMed

Fan, Yue; Wang, Xiao; Peng, Qinke

2017-01-01

Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.

The Highly Adaptive Lasso Estimator

PubMed Central

Benkeser, David; van der Laan, Mark

2017-01-01

Estimation of a regression functions is a common goal of statistical learning. We propose a novel nonparametric regression estimator that, in contrast to many existing methods, does not rely on local smoothness assumptions nor is it constructed using local smoothing techniques. Instead, our estimator respects global smoothness constraints by virtue of falling in a class of right-hand continuous functions with left-hand limits that have variation norm bounded by a constant. Using empirical process theory, we establish a fast minimal rate of convergence of our proposed estimator and illustrate how such an estimator can be constructed using standard software. In simulations, we show that the finite-sample performance of our estimator is competitive with other popular machine learning techniques across a variety of data generating mechanisms. We also illustrate competitive performance in real data examples using several publicly available data sets. PMID:29094111
Estimation of two ordered mean residual lifetime functions.

PubMed

Ebrahimi, N

1993-06-01

In many statistical studies involving failure data, biometric mortality data, and actuarial data, mean residual lifetime (MRL) function is of prime importance. In this paper we introduce the problem of nonparametric estimation of a MRL function on an interval when this function is bounded from below by another such function (known or unknown) on that interval, and derive the corresponding two functional estimators. The first is to be used when there is a known bound, and the second when the bound is another MRL function to be estimated independently. Both estimators are obtained by truncating the empirical estimator discussed by Yang (1978, Annals of Statistics 6, 112-117). In the first case, it is truncated at a known bound; in the second, at a point somewhere between the two empirical estimates. Consistency of both estimators is proved, and a pointwise large-sample distribution theory of the first estimator is derived.
Estimating the Distribution of the Incubation Periods of Human Avian Influenza A(H7N9) Virus Infections.

PubMed

Virlogeux, Victor; Li, Ming; Tsang, Tim K; Feng, Luzhao; Fang, Vicky J; Jiang, Hui; Wu, Peng; Zheng, Jiandong; Lau, Eric H Y; Cao, Yu; Qin, Ying; Liao, Qiaohong; Yu, Hongjie; Cowling, Benjamin J

2015-10-15

A novel avian influenza virus, influenza A(H7N9), emerged in China in early 2013 and caused severe disease in humans, with infections occurring most frequently after recent exposure to live poultry. The distribution of A(H7N9) incubation periods is of interest to epidemiologists and public health officials, but estimation of the distribution is complicated by interval censoring of exposures. Imputation of the midpoint of intervals was used in some early studies, resulting in estimated mean incubation times of approximately 5 days. In this study, we estimated the incubation period distribution of human influenza A(H7N9) infections using exposure data available for 229 patients with laboratory-confirmed A(H7N9) infection from mainland China. A nonparametric model (Turnbull) and several parametric models accounting for the interval censoring in some exposures were fitted to the data. For the best-fitting parametric model (Weibull), the mean incubation period was 3.4 days (95% confidence interval: 3.0, 3.7) and the variance was 2.9 days; results were very similar for the nonparametric Turnbull estimate. Under the Weibull model, the 95th percentile of the incubation period distribution was 6.5 days (95% confidence interval: 5.9, 7.1). The midpoint approximation for interval-censored exposures led to overestimation of the mean incubation period. Public health observation of potentially exposed persons for 7 days after exposure would be appropriate. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Leveraging Genomic Annotations and Pleiotropic Enrichment for Improved Replication Rates in Schizophrenia GWAS

PubMed Central

Wang, Yunpeng; Thompson, Wesley K.; Schork, Andrew J.; Holland, Dominic; Chen, Chi-Hua; Bettella, Francesco; Desikan, Rahul S.; Li, Wen; Witoelar, Aree; Zuber, Verena; Devor, Anna; Nöthen, Markus M.; Rietschel, Marcella; Chen, Qiang; Werge, Thomas; Cichon, Sven; Weinberger, Daniel R.; Djurovic, Srdjan; O’Donovan, Michael; Visscher, Peter M.; Andreassen, Ole A.; Dale, Anders M.

2016-01-01

Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotropy) for each single nucleotide polymorphism (SNP) to enable more accurate estimation of replication probabilities, conditional on the observed test statistic (“z-score”) of the SNP. We use a multiple logistic regression on z-scores to combine information from auxiliary information to derive a “relative enrichment score” for each SNP. For each stratum of these relative enrichment scores, we obtain nonparametric estimates of posterior expected test statistics and replication probabilities as a function of discovery z-scores, using a resampling-based approach that repeatedly and randomly partitions meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians model to each stratum, obtaining parameter estimates that minimize the sum of squared differences of the scale-mixture model with the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores even when both they are genome-wide significant (p < 5x10-8). There were 693 and 219 independent loci with model-based replication rates ≥80% and ≥90%, respectively. Compared to analyses not incorporating relative enrichment scores, CM3 increased out-of-sample yield for SNPs that replicate at a given rate. This demonstrates that replication probabilities can be more accurately estimated using prior enrichment information with CM3. PMID:26808560
When the Single Matters more than the Group (II): Addressing the Problem of High False Positive Rates in Single Case Voxel Based Morphometry Using Non-parametric Statistics.

PubMed

Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea

2016-01-01

In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that family-wise false positive error rate (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM are much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used to characterize neuroanatomical alterations in individual subjects as long as non-parametric statistics are employed.
A comparison of United States and United Kingdom EQ-5D health states valuations using a nonparametric Bayesian method.

PubMed

Kharroubi, Samer A; O'Hagan, Anthony; Brazier, John E

2010-07-10

Cost-effectiveness analysis of alternative medical treatments relies on having a measure of effectiveness, and many regard the quality adjusted life year (QALY) to be the current 'gold standard.' In order to compute QALYs, we require a suitable system for describing a person's health state, and a utility measure to value the quality of life associated with each possible state. There are a number of different health state descriptive systems, and we focus here on one known as the EQ-5D. Data for estimating utilities for different health states have a number of features that mean care is necessary in statistical modelling.There is interest in the extent to which valuations of health may differ between different countries and cultures, but few studies have compared preference values of health states obtained from different countries. This article applies a nonparametric model to estimate and compare EQ-5D health state valuation data obtained from two countries using Bayesian methods. The data set is the US and UK EQ-5D valuation studies where a sample of 42 states defined by the EQ-5D was valued by representative samples of the general population from each country using the time trade-off technique. We estimate a utility function across both countries which explicitly accounts for the differences between them, and is estimated using the data from both countries. The article discusses the implications of these results for future applications of the EQ-5D and for further work in this field. Copyright 2010 John Wiley & Sons, Ltd.
A Frequency-Domain Multipath Parameter Estimation and Mitigation Method for BOC-Modulated GNSS Signals

PubMed Central

Sun, Chao; Feng, Wenquan; Du, Songlin

2018-01-01

As multipath is one of the dominating error sources for high accuracy Global Navigation Satellite System (GNSS) applications, multipath mitigation approaches are employed to minimize this hazardous error in receivers. Binary offset carrier modulation (BOC), as a modernized signal structure, is adopted to achieve significant enhancement. However, because of its multi-peak autocorrelation function, conventional multipath mitigation techniques for binary phase shift keying (BPSK) signal would not be optimal. Currently, non-parametric and parametric approaches have been studied specifically aiming at multipath mitigation for BOC signals. Non-parametric techniques, such as Code Correlation Reference Waveforms (CCRW), usually have good feasibility with simple structures, but suffer from low universal applicability for different BOC signals. Parametric approaches can thoroughly eliminate multipath error by estimating multipath parameters. The problems with this category are at the high computation complexity and vulnerability to the noise. To tackle the problem, we present a practical parametric multipath estimation method in the frequency domain for BOC signals. The received signal is transferred to the frequency domain to separate out the multipath channel transfer function for multipath parameter estimation. During this process, we take the operations of segmentation and averaging to reduce both noise effect and computational load. The performance of the proposed method is evaluated and compared with the previous work in three scenarios. Results indicate that the proposed averaging-Fast Fourier Transform (averaging-FFT) method achieves good robustness in severe multipath environments with lower computational load for both low-order and high-order BOC signals. PMID:29495589
Numerical trials of HISSE

NASA Technical Reports Server (NTRS)

Peters, C.; Kampe, F. (Principal Investigator)

1980-01-01

The mathematical description and implementation of the statistical estimation procedure known as the Houston integrated spatial/spectral estimator (HISSE) is discussed. HISSE is based on a normal mixture model and is designed to take advantage of spectral and spatial information of LANDSAT data pixels, utilizing the initial classification and clustering information provided by the AMOEBA algorithm. The HISSE calculates parametric estimates of class proportions which reduce the error inherent in estimates derived from typical classify and count procedures common to nonparametric clustering algorithms. It also singles out spatial groupings of pixels which are most suitable for labeling classes. These calculations are designed to aid the analyst/interpreter in labeling patches with a crop class label. Finally, HISSE's initial performance on an actual LANDSAT agricultural ground truth data set is reported.
Nonparametric Stochastic Model for Uncertainty Quantifi cation of Short-term Wind Speed Forecasts

NASA Astrophysics Data System (ADS)

AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.

2014-12-01

Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of wind turbine optimal operation, wind energy dispatch and scheduling, efficient energy harvesting etc. It is also considered during planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed carries a particular importance and plays significant roles in the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using it for future prediction. The methods mainly include statistical models such as ARMA, ARIMA model, physical models for instance numerical weather prediction and artificial Intelligence techniques for example support vector machine and neural networks. In this paper, we are interested in estimating hourly wind speed measures in United Arab Emirates (UAE). More precisely, we predict hourly wind speed using a nonparametric kernel estimation of the regression and volatility functions pertaining to nonlinear autoregressive model with ARCH model, which includes unknown nonlinear regression function and volatility function already discussed in the literature. The unknown nonlinear regression function describe the dependence between the value of the wind speed at time t and its historical data at time t -1, t - 2, … , t - d. This function plays a key role to predict hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated to this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using nonparametric kernel methods. In addition, to the pointwise hourly wind speed forecasts, a confidence interval is also provided which allows to quantify the uncertainty around the forecasts.
Efficient Regressions via Optimally Combining Quantile Information*

PubMed Central

Zhao, Zhibiao; Xiao, Zhijie

2014-01-01

We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481
Kernel Density Surface Modelling as a Means to Identify Significant Concentrations of Vulnerable Marine Ecosystem Indicators

PubMed Central

Kenchington, Ellen; Murillo, Francisco Javier; Lirette, Camille; Sacau, Mar; Koen-Alonso, Mariano; Kenny, Andrew; Ollerhead, Neil; Wareham, Vonda; Beazley, Lindsay

2014-01-01

The United Nations General Assembly Resolution 61/105, concerning sustainable fisheries in the marine ecosystem, calls for the protection of vulnerable marine ecosystems (VME) from destructive fishing practices. Subsequently, the Food and Agriculture Organization (FAO) produced guidelines for identification of VME indicator species/taxa to assist in the implementation of the resolution, but recommended the development of case-specific operational definitions for their application. We applied kernel density estimation (KDE) to research vessel trawl survey data from inside the fishing footprint of the Northwest Atlantic Fisheries Organization (NAFO) Regulatory Area in the high seas of the northwest Atlantic to create biomass density surfaces for four VME indicator taxa: large-sized sponges, sea pens, small and large gorgonian corals. These VME indicator taxa were identified previously by NAFO using the fragility, life history characteristics and structural complexity criteria presented by FAO, along with an evaluation of their recovery trajectories. KDE, a non-parametric neighbour-based smoothing function, has been used previously in ecology to identify hotspots, that is, areas of relatively high biomass/abundance. We present a novel approach of examining relative changes in area under polygons created from encircling successive biomass categories on the KDE surface to identify “significant concentrations” of biomass, which we equate to VMEs. This allows identification of the VMEs from the broader distribution of the species in the study area. We provide independent assessments of the VMEs so identified using underwater images, benthic sampling with other gear types (dredges, cores), and/or published species distribution models of probability of occurrence, as available. For each VME indicator taxon we provide a brief review of their ecological function which will be important in future assessments of significant adverse impact on these habitats here and elsewhere. PMID:25289667
How to break the density-anisotropy degeneracy in spherical stellar systems

NASA Astrophysics Data System (ADS)

Read, J. I.; Steger, P.

2017-11-01

We present a new non-parametric Jeans code, GravSphere, that recovers the density ρ(r) and velocity anisotropy β(r) of spherical stellar systems, assuming only that they are in a steady state. Using a large suite of mock data, we confirm that with only line-of-sight velocity data, GravSphere provides a good estimate of the density at the projected stellar half-mass radius, ρ(R1/2), but is not able to measure ρ(r) or β(r), even with 10 000 tracer stars. We then test three popular methods for breaking this ρ - β degeneracy: using multiple populations with different R1/2; using higher order `virial shape parameters' (VSPs); and including proper motion data. We find that two populations provide an excellent recovery of ρ(r) in-between their respective R1/2. However, even with a total of ˜7000 tracers, we are not able to well-constrain β(r) for either population. By contrast, using 1000 tracers with higher order VSPs we are able to measure ρ(r) over the range 0.5 < r/R1/2 < 2 and broadly constrain β(r). Including proper motion data for all stars gives an even better performance, with ρ and β well-measured over the range 0.25 < r/R1/2 < 4. Finally, we test GravSphere on a triaxial mock galaxy that has axis ratios typical of a merger remnant [1 : 0.8 : 0.6]. In this case, GravSphere can become slightly biased. However, we find that when this occurs the data are poorly fit, allowing us to detect when such departures from spherical symmetry become problematic.
Kernel density surface modelling as a means to identify significant concentrations of vulnerable marine ecosystem indicators.

PubMed

Kenchington, Ellen; Murillo, Francisco Javier; Lirette, Camille; Sacau, Mar; Koen-Alonso, Mariano; Kenny, Andrew; Ollerhead, Neil; Wareham, Vonda; Beazley, Lindsay

2014-01-01

The United Nations General Assembly Resolution 61/105, concerning sustainable fisheries in the marine ecosystem, calls for the protection of vulnerable marine ecosystems (VME) from destructive fishing practices. Subsequently, the Food and Agriculture Organization (FAO) produced guidelines for identification of VME indicator species/taxa to assist in the implementation of the resolution, but recommended the development of case-specific operational definitions for their application. We applied kernel density estimation (KDE) to research vessel trawl survey data from inside the fishing footprint of the Northwest Atlantic Fisheries Organization (NAFO) Regulatory Area in the high seas of the northwest Atlantic to create biomass density surfaces for four VME indicator taxa: large-sized sponges, sea pens, small and large gorgonian corals. These VME indicator taxa were identified previously by NAFO using the fragility, life history characteristics and structural complexity criteria presented by FAO, along with an evaluation of their recovery trajectories. KDE, a non-parametric neighbour-based smoothing function, has been used previously in ecology to identify hotspots, that is, areas of relatively high biomass/abundance. We present a novel approach of examining relative changes in area under polygons created from encircling successive biomass categories on the KDE surface to identify "significant concentrations" of biomass, which we equate to VMEs. This allows identification of the VMEs from the broader distribution of the species in the study area. We provide independent assessments of the VMEs so identified using underwater images, benthic sampling with other gear types (dredges, cores), and/or published species distribution models of probability of occurrence, as available. For each VME indicator taxon we provide a brief review of their ecological function which will be important in future assessments of significant adverse impact on these habitats here and elsewhere.
Constrained multiple indicator kriging using sequential quadratic programming

NASA Astrophysics Data System (ADS)

Soltani-Mohammadi, Saeed; Erhan Tercan, A.

2012-11-01

Multiple indicator kriging (MIK) is a nonparametric method used to estimate conditional cumulative distribution functions (CCDF). Indicator estimates produced by MIK may not satisfy the order relations of a valid CCDF which is ordered and bounded between 0 and 1. In this paper a new method has been presented that guarantees the order relations of the cumulative distribution functions estimated by multiple indicator kriging. The method is based on minimizing the sum of kriging variances for each cutoff under unbiasedness and order relations constraints and solving constrained indicator kriging system by sequential quadratic programming. A computer code is written in the Matlab environment to implement the developed algorithm and the method is applied to the thickness data.
Effect of once-yearly zoledronic acid on the spine and hip as measured by quantitative computed tomography: results of the HORIZON Pivotal Fracture Trial

PubMed Central

Lang, T.; Boonen, S.; Cummings, S.; Delmas, P. D.; Cauley, J. A.; Horowitz, Z.; Kerzberg, E.; Bianchi, G.; Kendler, D.; Leung, P.; Man, Z.; Mesenbrink, P.; Eriksen, E. F.; Black, D. M.

2016-01-01

Summary Changes in bone mineral density and bone strength following treatment with zoledronic acid (ZOL) were measured by quantitative computed analysis (QCT) or dual-energy X-ray absorptiometry (DXA). ZOL treatment increased spine and hip BMD vs placebo, assessed by QCT and DXA. Changes in trabecular bone resulted in increased bone strength. Introduction To investigate bone mineral density (BMD) changes in trabecular and cortical bone, estimated by quantitative computed analysis (QCT) or dual-energy X-ray absorptiometry (DXA), and whether zoledronic acid 5 mg (ZOL) affects bone strength. Methods In 233 women from a randomized, controlled trial of once-yearly ZOL, lumbar spine, total hip, femoral neck, and trochanter were assessed by DXA and QCT (baseline, Month 36). Mean percentage changes from baseline and between-treatment differences (ZOL vs placebo, t-test) were evaluated. Results Mean between-treatment differences for lumbar spine BMD were significant by DXA (7.0%, p<0.01) and QCT (5.7%, p<0.0001). Between-treatment differences were significant for trabecular spine (p=0.0017) [non-parametric test], trabecular trochanter (10.7%, p<0.0001), total hip (10.8%, p<0.0001), and compressive strength indices at femoral neck (8.6%, p=0.0001), and trochanter (14.1%, p<0.0001). Conclusions Once-yearly ZOL increased hip and spine BMD vs placebo, assessed by QCT vs DXA. Changes in trabecular bone resulted in increased indices of compressive strength. PMID:19802508
A nonparametric test for Markovianity in the illness-death model.

PubMed

Rodríguez-Girondo, Mar; de Uña-Álvarez, Jacobo

2012-12-30

Multistate models are useful tools for modeling disease progression when survival is the main outcome, but several intermediate events of interest are observed during the follow-up time. The illness-death model is a special multistate model with important applications in the biomedical literature. It provides a suitable representation of the individual's history when a unique intermediate event can be experienced before the main event of interest. Nonparametric estimation of transition probabilities in this and other multistate models is usually performed through the Aalen-Johansen estimator under a Markov assumption. The Markov assumption claims that given the present state, the future evolution of the illness is independent of the states previously visited and the transition times among them. However, this assumption fails in some applications, leading to inconsistent estimates. In this paper, we provide a new approach for testing Markovianity in the illness-death model. The new method is based on measuring the future-past association along time. This results in a detailed inspection of the process, which often reveals a non-Markovian behavior with different trends in the association measure. A test of significance for zero future-past association at each time point is introduced, and a significance trace is proposed accordingly. Besides, we propose a global test for Markovianity based on a supremum-type test statistic. The finite sample performance of the test is investigated through simulations. We illustrate the new method through the analysis of two biomedical data analysis. Copyright © 2012 John Wiley & Sons, Ltd.
A hybrid method in combining treatment effects from matched and unmatched studies.

PubMed

Byun, Jinyoung; Lai, Dejian; Luo, Sheng; Risser, Jan; Tung, Betty; Hardy, Robert J

2013-12-10

The most common data structures in the biomedical studies have been matched or unmatched designs. Data structures resulting from a hybrid of the two may create challenges for statistical inferences. The question may arise whether to use parametric or nonparametric methods on the hybrid data structure. The Early Treatment for Retinopathy of Prematurity study was a multicenter clinical trial sponsored by the National Eye Institute. The design produced data requiring a statistical method of a hybrid nature. An infant in this multicenter randomized clinical trial had high-risk prethreshold retinopathy of prematurity that was eligible for treatment in one or both eyes at entry into the trial. During follow-up, recognition visual acuity was accessed for both eyes. Data from both eyes (matched) and from only one eye (unmatched) were eligible to be used in the trial. The new hybrid nonparametric method is a meta-analysis based on combining the Hodges-Lehmann estimates of treatment effects from the Wilcoxon signed rank and rank sum tests. To compare the new method, we used the classic meta-analysis with the t-test method to combine estimates of treatment effects from the paired and two sample t-tests. We used simulations to calculate the empirical size and power of the test statistics, as well as the bias, mean square and confidence interval width of the corresponding estimators. The proposed method provides an effective tool to evaluate data from clinical trials and similar comparative studies. Copyright © 2013 John Wiley & Sons, Ltd.
Using Multivariate Adaptive Regression Spline and Artificial Neural Network to Simulate Urbanization in Mumbai, India

NASA Astrophysics Data System (ADS)

Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.

2015-12-01

Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.
Statistical Models and Inference Procedures for Structural and Materials Reliability

DTIC Science & Technology

1990-12-01

as an official Department of the Army positio~n, policy, or decision, unless sD designated by other documentazion. 12a. DISTRIBUTION /AVAILABILITY...Some general stress-strength models were also developed and applied to the failure of systems subject to cyclic loading. Involved in the failure of...process control ideas and sequential design and analysis methods. Finally, smooth nonparametric quantile .wJ function estimators were studied. All of
The 2-10 keV unabsorbed luminosity function of AGN from the LSS, CDFS, and COSMOS surveys

NASA Astrophysics Data System (ADS)

Ranalli, P.; Koulouridis, E.; Georgantopoulos, I.; Fotopoulou, S.; Hsu, L.-T.; Salvato, M.; Comastri, A.; Pierre, M.; Cappelluti, N.; Carrera, F. J.; Chiappetti, L.; Clerc, N.; Gilli, R.; Iwasawa, K.; Pacaud, F.; Paltani, S.; Plionis, E.; Vignali, C.

2016-05-01

The XMM-Large scale structure (XMM-LSS), XMM-Cosmological evolution survey (XMM-COSMOS), and XMM-Chandra deep field south (XMM-CDFS) surveys are complementary in terms of sky coverage and depth. Together, they form a clean sample with the least possible variance in instrument effective areas and point spread function. Therefore this is one of the best samples available to determine the 2-10 keV luminosity function of active galactic nuclei (AGN) and their evolution. The samples and the relevant corrections for incompleteness are described. A total of 2887 AGN is used to build the LF in the luminosity interval 1042-1046 erg s-1 and in the redshift interval 0.001-4. A new method to correct for absorption by considering the probability distribution for the column density conditioned on the hardness ratio is presented. The binned luminosity function and its evolution is determined with a variant of the Page-Carrera method, which is improved to include corrections for absorption and to account for the full probability distribution of photometric redshifts. Parametric models, namely a double power law with luminosity and density evolution (LADE) or luminosity-dependent density evolution (LDDE), are explored using Bayesian inference. We introduce the Watanabe-Akaike information criterion (WAIC) to compare the models and estimate their predictive power. Our data are best described by the LADE model, as hinted by the WAIC indicator. We also explore the recently proposed 15-parameter extended LDDE model and find that this extension is not supported by our data. The strength of our method is that it provides unabsorbed, non-parametric estimates, credible intervals for luminosity function parameters, and a model choice based on predictive power for future data. Based on observations obtained with XMM-Newton, an ESA science mission with instruments and contributions directly funded by ESA member states and NASA.Tables with the samples of the posterior probability distributions are only available at the CDS via anonymous ftp to http://cdsarc.u-strasbg.fr (http://130.79.128.5) or via http://cdsarc.u-strasbg.fr/viz-bin/qcat?J/A+A/590/A80

Implications of heterogeneous impacts of protected areas on deforestation and poverty

PubMed Central

Hanauer, Merlin M.; Canavire-Bacarreza, Gustavo

2015-01-01

Protected areas are a popular policy instrument in the global fight against loss of biodiversity and ecosystem services. However, the effectiveness of protected areas in preventing deforestation, and their impacts on poverty, are not well understood. Recent studies have found that Bolivia's protected-area system, on average, reduced deforestation and poverty. We implement several non-parametric and semi-parametric econometric estimators to characterize the heterogeneity in Bolivia's protected-area impacts on joint deforestation and poverty outcomes across a number of socioeconomic and biophysical moderators. Like previous studies from Costa Rica and Thailand, we find that Bolivia's protected areas are not associated with poverty traps. Our results also indicate that protection did not have a differential impact on indigenous populations. However, results from new multidimensional non-parametric estimators provide evidence that the biophysical characteristics associated with the greatest avoided deforestation are the characteristics associated with the potential for poverty exacerbation from protection. We demonstrate that these results would not be identified using the methods implemented in previous studies. Thus, this study provides valuable practical information on the impacts of Bolivia's protected areas for conservation practitioners and demonstrates methods that are likely to be valuable to researchers interested in better understanding the heterogeneity in conservation impacts. PMID:26460125
Implications of heterogeneous impacts of protected areas on deforestation and poverty.

PubMed

Hanauer, Merlin M; Canavire-Bacarreza, Gustavo

2015-11-05

Protected areas are a popular policy instrument in the global fight against loss of biodiversity and ecosystem services. However, the effectiveness of protected areas in preventing deforestation, and their impacts on poverty, are not well understood. Recent studies have found that Bolivia's protected-area system, on average, reduced deforestation and poverty. We implement several non-parametric and semi-parametric econometric estimators to characterize the heterogeneity in Bolivia's protected-area impacts on joint deforestation and poverty outcomes across a number of socioeconomic and biophysical moderators. Like previous studies from Costa Rica and Thailand, we find that Bolivia's protected areas are not associated with poverty traps. Our results also indicate that protection did not have a differential impact on indigenous populations. However, results from new multidimensional non-parametric estimators provide evidence that the biophysical characteristics associated with the greatest avoided deforestation are the characteristics associated with the potential for poverty exacerbation from protection. We demonstrate that these results would not be identified using the methods implemented in previous studies. Thus, this study provides valuable practical information on the impacts of Bolivia's protected areas for conservation practitioners and demonstrates methods that are likely to be valuable to researchers interested in better understanding the heterogeneity in conservation impacts. © 2015 The Author(s).
Nonparametric tests for equality of psychometric functions.

PubMed

García-Pérez, Miguel A; Núñez-Antón, Vicente

2017-12-07

Many empirical studies measure psychometric functions (curves describing how observers' performance varies with stimulus magnitude) because these functions capture the effects of experimental conditions. To assess these effects, parametric curves are often fitted to the data and comparisons are carried out by testing for equality of mean parameter estimates across conditions. This approach is parametric and, thus, vulnerable to violations of the implied assumptions. Furthermore, testing for equality of means of parameters may be misleading: Psychometric functions may vary meaningfully across conditions on an observer-by-observer basis with no effect on the mean values of the estimated parameters. Alternative approaches to assess equality of psychometric functions per se are thus needed. This paper compares three nonparametric tests that are applicable in all situations of interest: The existing generalized Mantel-Haenszel test, a generalization of the Berry-Mielke test that was developed here, and a split variant of the generalized Mantel-Haenszel test also developed here. Their statistical properties (accuracy and power) are studied via simulation and the results show that all tests are indistinguishable as to accuracy but they differ non-uniformly as to power. Empirical use of the tests is illustrated via analyses of published data sets and practical recommendations are given. The computer code in MATLAB and R to conduct these tests is available as Electronic Supplemental Material.
The use of generalized linear models and generalized estimating equations in bioarchaeological studies.

PubMed

Nikita, Efthymia

2014-03-01

The current article explores whether the application of generalized linear models (GLM) and generalized estimating equations (GEE) can be used in place of conventional statistical analyses in the study of ordinal data that code an underlying continuous variable, like entheseal changes. The analysis of artificial data and ordinal data expressing entheseal changes in archaeological North African populations gave the following results. Parametric and nonparametric tests give convergent results particularly for P values <0.1, irrespective of whether the underlying variable is normally distributed or not under the condition that the samples involved in the tests exhibit approximately equal sizes. If this prerequisite is valid and provided that the samples are of equal variances, analysis of covariance may be adopted. GLM are not subject to constraints and give results that converge to those obtained from all nonparametric tests. Therefore, they can be used instead of traditional tests as they give the same amount of information as them, but with the advantage of allowing the study of the simultaneous impact of multiple predictors and their interactions and the modeling of the experimental data. However, GLM should be replaced by GEE for the study of bilateral asymmetry and in general when paired samples are tested, because GEE are appropriate for correlated data. Copyright © 2013 Wiley Periodicals, Inc.
Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

PubMed

Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

2012-01-01

Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.
Estimating extreme losses for the Florida Public Hurricane Model—part II

NASA Astrophysics Data System (ADS)

Gulati, Sneh; George, Florence; Hamid, Shahid

2018-02-01

Rising global temperatures are leading to an increase in the number of extreme events and losses (http://www.epa.gov/climatechange/science/indicators/). Accurate estimation of these extreme losses with the intention of protecting themselves against them is critical to insurance companies. In a previous paper, Gulati et al. (2014) discussed probable maximum loss (PML) estimation for the Florida Public Hurricane Loss Model (FPHLM) using parametric and nonparametric methods. In this paper, we investigate the use of semi-parametric methods to do the same. Detailed analysis of the data shows that the annual losses from FPHLM do not tend to be very heavy tailed, and therefore, neither the popular Hill's method nor the moment's estimator work well. However, Pickand's estimator with threshold around the 84th percentile provides a good fit for the extreme quantiles for the losses.
Intracranial current density (LORETA) differences in QEEG frequency bands between depressed and non-depressed alcoholic patients.

PubMed

Coutin-Churchman, Pedro; Moreno, Rocío

2008-04-01

To assess possible differences in intracranial source distribution of surface QEEG power between depressed and non-depressed alcoholic patients in order to find any symptom-related topographic features of physiopathologic relevance. Low-Resolution Electromagnetic Tomography (LORETA) for the delta, theta, alpha and beta bands of EEG spectra was estimated from 38 alcoholic patients, 20 with and 18 without clinical depression, in which QEEG showed decreased slow and increased beta activity diffusely. Statistical non-parametric mapping was used to compare depressed and non-depressed groups. Measures of intracranial current density in individual patients at areas of significant differences were correlated with BDI scores. Patients with clinical depression showed areas of significantly lower current density than non-depressed patients in delta band at left anterior temporal, left midtemporal (including amygdala and hippocampus), and both frontopolar cortices mostly on the right; and in theta band at bilateral parietal lobe, anterior cingulate and medial frontal cortex. No differences were found at alpha and beta band. Intracranial current density in delta band at left parahippocampal, left midfrontal cortex and right frontopolar cortex was negatively correlated with BDI score. Theta band also showed negative correlations with BDI at sites of significant differences. Diffusely decreased delta and theta activity in the surface QEEG of alcoholic patients has a different intracranial distribution linked to the presence or not of clinical depression that seems to reveal a dysfunctional neuronal state at several specific limbic and other cortical locations that have been related to a specific clinical disorder such as depression. These results provided further evidence on the effects of depression in the context of alcohol dependence, in this case decreased slow activity as a possible marker of neuronal damage secondary to alcohol toxicity, clinically expressed as depressive symptoms when present in structures that are known to be related to clinical depression.
Quadratic semiparametric Von Mises calculus

PubMed Central

Robins, James; Li, Lingling; Tchetgen, Eric

2009-01-01

We discuss a new method of estimation of parameters in semiparametric and nonparametric models. The method is based on U-statistics constructed from quadratic influence functions. The latter extend ordinary linear influence functions of the parameter of interest as defined in semiparametric theory, and represent second order derivatives of this parameter. For parameters for which the matching cannot be perfect the method leads to a bias-variance trade-off, and results in estimators that converge at a slower than n–1/2-rate. In a number of examples the resulting rate can be shown to be optimal. We are particularly interested in estimating parameters in models with a nuisance parameter of high dimension or low regularity, where the parameter of interest cannot be estimated at n–1/2-rate. PMID:23087487
Optimal Combinations of Diagnostic Tests Based on AUC.

PubMed

Huang, Xin; Qin, Gengsheng; Fang, Yixin

2011-06-01

When several diagnostic tests are available, one can combine them to achieve better diagnostic accuracy. This article considers the optimal linear combination that maximizes the area under the receiver operating characteristic curve (AUC); the estimates of the combination's coefficients can be obtained via a nonparametric procedure. However, for estimating the AUC associated with the estimated coefficients, the apparent estimation by re-substitution is too optimistic. To adjust for the upward bias, several methods are proposed. Among them the cross-validation approach is especially advocated, and an approximated cross-validation is developed to reduce the computational cost. Furthermore, these proposed methods can be applied for variable selection to select important diagnostic tests. The proposed methods are examined through simulation studies and applications to three real examples. © 2010, The International Biometric Society.
Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number.

PubMed

Fragkos, Konstantinos C; Tsagris, Michail; Frangos, Christos C

2014-01-01

The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.
Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number

PubMed Central

Fragkos, Konstantinos C.; Tsagris, Michail; Frangos, Christos C.

2014-01-01

The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator. PMID:27437470
Application of nonparametric regression methods to study the relationship between NO2 concentrations and local wind direction and speed at background sites.

PubMed

Donnelly, Aoife; Misstear, Bruce; Broderick, Brian

2011-02-15

Background concentrations of nitrogen dioxide (NO(2)) are not constant but vary temporally and spatially. The current paper presents a powerful tool for the quantification of the effects of wind direction and wind speed on background NO(2) concentrations, particularly in cases where monitoring data are limited. In contrast to previous studies which applied similar methods to sites directly affected by local pollution sources, the current study focuses on background sites with the aim of improving methods for predicting background concentrations adopted in air quality modelling studies. The relationship between measured NO(2) concentration in air at three such sites in Ireland and locally measured wind direction has been quantified using nonparametric regression methods. The major aim was to analyse a method for quantifying the effects of local wind direction on background levels of NO(2) in Ireland. The method was expanded to include wind speed as an added predictor variable. A Gaussian kernel function is used in the analysis and circular statistics employed for the wind direction variable. Wind direction and wind speed were both found to have a statistically significant effect on background levels of NO(2) at all three sites. Frequently environmental impact assessments are based on short term baseline monitoring producing a limited dataset. The presented non-parametric regression methods, in contrast to the frequently used methods such as binning of the data, allow concentrations for missing data pairs to be estimated and distinction between spurious and true peaks in concentrations to be made. The methods were found to provide a realistic estimation of long term concentration variation with wind direction and speed, even for cases where the data set is limited. Accurate identification of the actual variation at each location and causative factors could be made, thus supporting the improved definition of background concentrations for use in air quality modelling studies. Copyright © 2010 Elsevier B.V. All rights reserved.
Characterization of Changes in Gene Expression and Biochemical Pathways at Low Levels of Benzene Exposure

PubMed Central

Thomas, Reuben; Hubbard, Alan E.; McHale, Cliona M.; Zhang, Luoping; Rappaport, Stephen M.; Lan, Qing; Rothman, Nathaniel; Vermeulen, Roel; Guyton, Kathryn Z.; Jinot, Jennifer; Sonawane, Babasaheb R.; Smith, Martyn T.

2014-01-01

Benzene, a ubiquitous environmental pollutant, causes acute myeloid leukemia (AML). Recently, through transcriptome profiling of peripheral blood mononuclear cells (PBMC), we reported dose-dependent effects of benzene exposure on gene expression and biochemical pathways in 83 workers exposed across four airborne concentration ranges (from <1 ppm to >10 ppm) compared with 42 subjects with non-workplace ambient exposure levels. Here, we further characterize these dose-dependent effects with continuous benzene exposure in all 125 study subjects. We estimated air benzene exposure levels in the 42 environmentally-exposed subjects from their unmetabolized urinary benzene levels. We used a novel non-parametric, data-adaptive model selection method to estimate the change with dose in the expression of each gene. We describe non-parametric approaches to model pathway responses and used these to estimate the dose responses of the AML pathway and 4 other pathways of interest. The response patterns of majority of genes as captured by mean estimates of the first and second principal components of the dose-response for the five pathways and the profiles of 6 AML pathway response-representative genes (identified by clustering) exhibited similar apparent supra-linear responses. Responses at or below 0.1 ppm benzene were observed for altered expression of AML pathway genes and CYP2E1. Together, these data show that benzene alters disease-relevant pathways and genes in a dose-dependent manner, with effects apparent at doses as low as 100 ppb in air. Studies with extensive exposure assessment of subjects exposed in the low-dose range between 10 ppb and 1 ppm are needed to confirm these findings. PMID:24786086
Task-based detectability comparison of exponential transformation of free-response operating characteristic (EFROC) curve and channelized Hotelling observer (CHO)

NASA Astrophysics Data System (ADS)

Khobragade, P.; Fan, Jiahua; Rupcich, Franco; Crotty, Dominic J.; Gilat Schmidt, Taly

2016-03-01

This study quantitatively evaluated the performance of the exponential transformation of the free-response operating characteristic curve (EFROC) metric, with the Channelized Hotelling Observer (CHO) as a reference. The CHO has been used for image quality assessment of reconstruction algorithms and imaging systems and often it is applied to study the signal-location-known cases. The CHO also requires a large set of images to estimate the covariance matrix. In terms of clinical applications, this assumption and requirement may be unrealistic. The newly developed location-unknown EFROC detectability metric is estimated from the confidence scores reported by a model observer. Unlike the CHO, EFROC does not require a channelization step and is a non-parametric detectability metric. There are few quantitative studies available on application of the EFROC metric, most of which are based on simulation data. This study investigated the EFROC metric using experimental CT data. A phantom with four low contrast objects: 3mm (14 HU), 5mm (7HU), 7mm (5 HU) and 10 mm (3 HU) was scanned at dose levels ranging from 25 mAs to 270 mAs and reconstructed using filtered backprojection. The area under the curve values for CHO (AUC) and EFROC (AFE) were plotted with respect to different dose levels. The number of images required to estimate the non-parametric AFE metric was calculated for varying tasks and found to be less than the number of images required for parametric CHO estimation. The AFE metric was found to be more sensitive to changes in dose than the CHO metric. This increased sensitivity and the assumption of unknown signal location may be useful for investigating and optimizing CT imaging methods. Future work is required to validate the AFE metric against human observers.
Evaluation of procedures for prediction of unconventional gas in the presence of geologic trends

USGS Publications Warehouse

Attanasi, E.D.; Coburn, T.C.

2009-01-01

This study extends the application of local spatial nonparametric prediction models to the estimation of recoverable gas volumes in continuous-type gas plays to regimes where there is a single geologic trend. A transformation is presented, originally proposed by Tomczak, that offsets the distortions caused by the trend. This article reports on numerical experiments that compare predictive and classification performance of the local nonparametric prediction models based on the transformation with models based on Euclidean distance. The transformation offers improvement in average root mean square error when the trend is not severely misspecified. Because of the local nature of the models, even those based on Euclidean distance in the presence of trends are reasonably robust. The tests based on other model performance metrics such as prediction error associated with the high-grade tracts and the ability of the models to identify sites with the largest gas volumes also demonstrate the robustness of both local modeling approaches. ?? International Association for Mathematical Geology 2009.
Power calculation for comparing diagnostic accuracies in a multi-reader, multi-test design.

PubMed

Kim, Eunhee; Zhang, Zheng; Wang, Youdan; Zeng, Donglin

2014-12-01

Receiver operating characteristic (ROC) analysis is widely used to evaluate the performance of diagnostic tests with continuous or ordinal responses. A popular study design for assessing the accuracy of diagnostic tests involves multiple readers interpreting multiple diagnostic test results, called the multi-reader, multi-test design. Although several different approaches to analyzing data from this design exist, few methods have discussed the sample size and power issues. In this article, we develop a power formula to compare the correlated areas under the ROC curves (AUC) in a multi-reader, multi-test design. We present a nonparametric approach to estimate and compare the correlated AUCs by extending DeLong et al.'s (1988, Biometrics 44, 837-845) approach. A power formula is derived based on the asymptotic distribution of the nonparametric AUCs. Simulation studies are conducted to demonstrate the performance of the proposed power formula and an example is provided to illustrate the proposed procedure. © 2014, The International Biometric Society.
REGRES: A FORTRAN-77 program to calculate nonparametric and ``structural'' parametric solutions to bivariate regression equations

NASA Astrophysics Data System (ADS)

Rock, N. M. S.; Duffy, T. R.

REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits, Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x)—including: (1) major axis ( λ = 1); (2) reduced major axis ( λ = variance of y/variance of x); (3) Y on Xλ = infinity; or (4) X on Y ( λ = 0) solutions. Pearson linear correlation coefficients also are output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.
Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses

PubMed Central

Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.

2016-01-01

High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062
Model risk for European-style stock index options.

PubMed

Gençay, Ramazan; Gibson, Rajna

2007-01-01

In empirical modeling, there have been two strands for pricing in the options literature, namely the parametric and nonparametric models. Often, the support for the nonparametric methods is based on a benchmark such as the Black-Scholes (BS) model with constant volatility. In this paper, we study the stochastic volatility (SV) and stochastic volatility random jump (SVJ) models as parametric benchmarks against feedforward neural network (FNN) models, a class of neural network models. Our choice for FNN models is due to their well-studied universal approximation properties of an unknown function and its partial derivatives. Since the partial derivatives of an option pricing formula are risk pricing tools, an accurate estimation of the unknown option pricing function is essential for pricing and hedging. Our findings indicate that FNN models offer themselves as robust option pricing tools, over their sophisticated parametric counterparts in predictive settings. There are two routes to explain the superiority of FNN models over the parametric models in forecast settings. These are nonnormality of return distributions and adaptive learning.
Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity

PubMed Central

Day, Ryan; Lennox, Kristin P.; Sukhanov, Paul; Dahl, David B.; Vannucci, Marina; Tsai, Jerry

2011-01-01

Unlike the core structural elements of a protein like regular secondary structure, template based modeling (TBM) has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and were enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct test of sampling to the Loopy algorithm, our method demonstrates the ability to sample nearer native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method in loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near native loop structure. The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/. PMID:22028638

Inference on periodicity of circadian time series.

PubMed

Costa, Maria J; Finkenstädt, Bärbel; Roche, Véronique; Lévi, Francis; Gould, Peter D; Foreman, Julia; Halliday, Karen; Hall, Anthony; Rand, David A

2013-09-01

Estimation of the period length of time-course data from cyclical biological processes, such as those driven by the circadian pacemaker, is crucial for inferring the properties of the biological clock found in many living organisms. We propose a methodology for period estimation based on spectrum resampling (SR) techniques. Simulation studies show that SR is superior and more robust to non-sinusoidal and noisy cycles than a currently used routine based on Fourier approximations. In addition, a simple fit to the oscillations using linear least squares is available, together with a non-parametric test for detecting changes in period length which allows for period estimates with different variances, as frequently encountered in practice. The proposed methods are motivated by and applied to various data examples from chronobiology.
A Parametric k-Means Algorithm

PubMed Central

Tarpey, Thaddeus

2007-01-01

Summary The k points that optimally represent a distribution (usually in terms of a squared error loss) are called the k principal points. This paper presents a computationally intensive method that automatically determines the principal points of a parametric distribution. Cluster means from the k-means algorithm are nonparametric estimators of principal points. A parametric k-means approach is introduced for estimating principal points by running the k-means algorithm on a very large simulated data set from a distribution whose parameters are estimated using maximum likelihood. Theoretical and simulation results are presented comparing the parametric k-means algorithm to the usual k-means algorithm and an example on determining sizes of gas masks is used to illustrate the parametric k-means algorithm. PMID:17917692
Smooth conditional distribution function and quantiles under random censorship.

PubMed

Leconte, Eve; Poiraud-Casanova, Sandrine; Thomas-Agnan, Christine

2002-09-01

We consider a nonparametric random design regression model in which the response variable is possibly right censored. The aim of this paper is to estimate the conditional distribution function and the conditional alpha-quantile of the response variable. We restrict attention to the case where the response variable as well as the explanatory variable are unidimensional and continuous. We propose and discuss two classes of estimators which are smooth with respect to the response variable as well as to the covariate. Some simulations demonstrate that the new methods have better mean square error performances than the generalized Kaplan-Meier estimator introduced by Beran (1981) and considered in the literature by Dabrowska (1989, 1992) and Gonzalez-Manteiga and Cadarso-Suarez (1994).
Tremor Detection Using Parametric and Non-Parametric Spectral Estimation Methods: A Comparison with Clinical Assessment

PubMed Central

Martinez Manzanera, Octavio; Elting, Jan Willem; van der Hoeven, Johannes H.; Maurits, Natasha M.

2016-01-01

In the clinic, tremor is diagnosed during a time-limited process in which patients are observed and the characteristics of tremor are visually assessed. For some tremor disorders, a more detailed analysis of these characteristics is needed. Accelerometry and electromyography can be used to obtain a better insight into tremor. Typically, routine clinical assessment of accelerometry and electromyography data involves visual inspection by clinicians and occasionally computational analysis to obtain objective characteristics of tremor. However, for some tremor disorders these characteristics may be different during daily activity. This variability in presentation between the clinic and daily life makes a differential diagnosis more difficult. A long-term recording of tremor by accelerometry and/or electromyography in the home environment could help to give a better insight into the tremor disorder. However, an evaluation of such recordings using routine clinical standards would take too much time. We evaluated a range of techniques that automatically detect tremor segments in accelerometer data, as accelerometer data is more easily obtained in the home environment than electromyography data. Time can be saved if clinicians only have to evaluate the tremor characteristics of segments that have been automatically detected in longer daily activity recordings. We tested four non-parametric methods and five parametric methods on clinical accelerometer data from 14 patients with different tremor disorders. The consensus between two clinicians regarding the presence or absence of tremor on 3943 segments of accelerometer data was employed as reference. The nine methods were tested against this reference to identify their optimal parameters. Non-parametric methods generally performed better than parametric methods on our dataset when optimal parameters were used. However, one parametric method, employing the high frequency content of the tremor bandwidth under consideration (High Freq) performed similarly to non-parametric methods, but had the highest recall values, suggesting that this method could be employed for automatic tremor detection. PMID:27258018
Autonomous Frequency Domain Identification: Theory and Experiment

DTIC Science & Technology

1989-04-15

4,4,3)). This approach is particularly well suited to provide accurate estimation using sampled-data -3 DO 2 ^ UJ H « > x 2 ^ ui M (d 5 -P m...criteria for resonance requires a unimodal search. Search strategies such as golden search, Fibonacci search etc. are well known and can be found for...identified nonparametrically and a frequency domain de - scription is available, a parametric representation of the transfer function can be found by
A note on the IQ of monozygotic twins raised apart and the order of their birth.

PubMed

Pencavel, J H

1976-10-01

This note examines James Shields' sample of monozygotic twins raised apart to entertain the hypothesis that there is a significant association between the measured IQ of these twins and the order of their birth. A non-parametric test supports this hypothesis and then a linear probability function is estimated that discriminates the effects on IQ of birth order from the effects of birth weight.
Integrating geological and geophysical data to improve probabilistic hazard forecasting of Arabian Shield volcanism

NASA Astrophysics Data System (ADS)

Runge, Melody G.; Bebbington, Mark S.; Cronin, Shane J.; Lindsay, Jan M.; Moufti, Mohammed R.

2016-02-01

During probabilistic volcanic hazard analysis of volcanic fields, a greater variety of spatial data on crustal features should help improve forecasts of future vent locations. Without further examination, however, geophysical estimations of crustal or other features may be non-informative. Here, we present a new, robust, non-parametric method to quantitatively determine the existence of any relationship between natural phenomena (e.g., volcanic eruptions) and a variety of geophysical data. This provides a new validation tool for incorporating a range of potentially hazard-diagnostic observable data into recurrence rate estimates and hazard analyses. Through this study it is shown that the location of Cenozoic volcanic fields across the Arabian Shield appear to be related to locations of major and minor faults, at higher elevations, and regions where gravity anomaly values were between - 125 mGal and 0 mGal. These findings support earlier hypotheses that the western shield uplift was related to Cenozoic volcanism. At the harrat (volcanic field)-scale, higher vent density regions are related to both elevation and gravity anomaly values. A by-product of this work is the collection of existing data on the volcanism across Saudi Arabia, with all vent locations provided herein, as well as updated maps for Harrats Kura, Khaybar, Ithnayn, Kishb, and Rahat. This work also highlights the potential dangers of assuming relationships between observed data and the occurrence of a natural phenomenon without quantitative assessment or proper consideration of the effects of data resolution.
Extreme-Scale Bayesian Inference for Uncertainty Quantification of Complex Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Biros, George

Uncertainty quantification (UQ)—that is, quantifying uncertainties in complex mathematical models and their large-scale computational implementations—is widely viewed as one of the outstanding challenges facing the field of CS&E over the coming decade. The EUREKA project set to address the most difficult class of UQ problems: those for which both the underlying PDE model as well as the uncertain parameters are of extreme scale. In the project we worked on these extreme-scale challenges in the following four areas: 1. Scalable parallel algorithms for sampling and characterizing the posterior distribution that exploit the structure of the underlying PDEs and parameter-to-observable map. Thesemore » include structure-exploiting versions of the randomized maximum likelihood method, which aims to overcome the intractability of employing conventional MCMC methods for solving extreme-scale Bayesian inversion problems by appealing to and adapting ideas from large-scale PDE-constrained optimization, which have been very successful at exploring high-dimensional spaces. 2. Scalable parallel algorithms for construction of prior and likelihood functions based on learning methods and non-parametric density estimation. Constructing problem-specific priors remains a critical challenge in Bayesian inference, and more so in high dimensions. Another challenge is construction of likelihood functions that capture unmodeled couplings between observations and parameters. We will create parallel algorithms for non-parametric density estimation using high dimensional N-body methods and combine them with supervised learning techniques for the construction of priors and likelihood functions. 3. Bayesian inadequacy models, which augment physics models with stochastic models that represent their imperfections. The success of the Bayesian inference framework depends on the ability to represent the uncertainty due to imperfections of the mathematical model of the phenomena of interest. This is a central challenge in UQ, especially for large-scale models. We propose to develop the mathematical tools to address these challenges in the context of extreme-scale problems. 4. Parallel scalable algorithms for Bayesian optimal experimental design (OED). Bayesian inversion yields quantified uncertainties in the model parameters, which can be propagated forward through the model to yield uncertainty in outputs of interest. This opens the way for designing new experiments to reduce the uncertainties in the model parameters and model predictions. Such experimental design problems have been intractable for large-scale problems using conventional methods; we will create OED algorithms that exploit the structure of the PDE model and the parameter-to-output map to overcome these challenges. Parallel algorithms for these four problems were created, analyzed, prototyped, implemented, tuned, and scaled up for leading-edge supercomputers, including UT-Austin’s own 10 petaflops Stampede system, ANL’s Mira system, and ORNL’s Titan system. While our focus is on fundamental mathematical/computational methods and algorithms, we will assess our methods on model problems derived from several DOE mission applications, including multiscale mechanics and ice sheet dynamics.« less
Robust estimation for partially linear models with large-dimensional covariates

PubMed Central

Zhu, LiPing; Li, RunZe; Cui, HengJian

2014-01-01

We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a noncon-cave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087
Robust estimation for partially linear models with large-dimensional covariates.

PubMed

Zhu, LiPing; Li, RunZe; Cui, HengJian

2013-10-01

We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a noncon-cave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of [Formula: see text], where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.
A new approach to correct the QT interval for changes in heart rate using a nonparametric regression model in beagle dogs.

PubMed

Watanabe, Hiroyuki; Miyazaki, Hiroyasu

2006-01-01

Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
Maximum Marginal Likelihood Estimation of a Monotonic Polynomial Generalized Partial Credit Model with Applications to Multiple Group Analysis.

PubMed

Falk, Carl F; Cai, Li

2016-06-01

We present a semi-parametric approach to estimating item response functions (IRF) useful when the true IRF does not strictly follow commonly used functions. Our approach replaces the linear predictor of the generalized partial credit model with a monotonic polynomial. The model includes the regular generalized partial credit model at the lowest order polynomial. Our approach extends Liang's (A semi-parametric approach to estimate IRFs, Unpublished doctoral dissertation, 2007) method for dichotomous item responses to the case of polytomous data. Furthermore, item parameter estimation is implemented with maximum marginal likelihood using the Bock-Aitkin EM algorithm, thereby facilitating multiple group analyses useful in operational settings. Our approach is demonstrated on both educational and psychological data. We present simulation results comparing our approach to more standard IRF estimation approaches and other non-parametric and semi-parametric alternatives.
Bayesian estimation of the discrete coefficient of determination.

PubMed

Chen, Ting; Braga-Neto, Ulisses M

2016-12-01

The discrete coefficient of determination (CoD) measures the nonlinear interaction between discrete predictor and target variables and has had far-reaching applications in Genomic Signal Processing. Previous work has addressed the inference of the discrete CoD using classical parametric and nonparametric approaches. In this paper, we introduce a Bayesian framework for the inference of the discrete CoD. We derive analytically the optimal minimum mean-square error (MMSE) CoD estimator, as well as a CoD estimator based on the Optimal Bayesian Predictor (OBP). For the latter estimator, exact expressions for its bias, variance, and root-mean-square (RMS) are given. The accuracy of both Bayesian CoD estimators with non-informative and informative priors, under fixed or random parameters, is studied via analytical and numerical approaches. We also demonstrate the application of the proposed Bayesian approach in the inference of gene regulatory networks, using gene-expression data from a previously published study on metastatic melanoma.
Wind and wave extremes over the world oceans from very large ensembles

NASA Astrophysics Data System (ADS)

Breivik, Øyvind; Aarnes, Ole Johan; Abdalla, Saleh; Bidlot, Jean-Raymond; Janssen, Peter A. E. M.

2014-07-01

Global return values of marine wind speed and significant wave height are estimated from very large aggregates of archived ensemble forecasts at +240 h lead time. Long lead time ensures that the forecasts represent independent draws from the model climate. Compared with ERA-Interim, a reanalysis, the ensemble yields higher return estimates for both wind speed and significant wave height. Confidence intervals are much tighter due to the large size of the data set. The period (9 years) is short enough to be considered stationary even with climate change. Furthermore, the ensemble is large enough for nonparametric 100 year return estimates to be made from order statistics. These direct return estimates compare well with extreme value estimates outside areas with tropical cyclones. Like any method employing modeled fields, it is sensitive to tail biases in the numerical model, but we find that the biases are moderate outside areas with tropical cyclones.
Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding.

PubMed

Sun, Xiaochun; Ma, Ping; Mumm, Rita H

2012-01-01

Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.
Nonparametric Method for Genomics-Based Prediction of Performance of Quantitative Traits Involving Epistasis in Plant Breeding

PubMed Central

Sun, Xiaochun; Ma, Ping; Mumm, Rita H.

2012-01-01

Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression. PMID:23226325
Impact of overgrazing on the transmission of Echinococcus multilocularis in Tibetan pastoral communities of Sichuan Province, China.

PubMed

Wang, Qian; Xiao, Yong-fu; Vuitton, Dominique A; Schantz, Peter M; Raoul, Francis; Budke, Christine; Campos-Ponce, Maiza; Craig, Philip S; Giraudoux, Patrick

2007-02-05

Overgrazing was assumed to increase the population density of small mammals that are the intermediate hosts of Echinococcus multilocularis, the pathogen of alveolar echinococcosis in the Qinghai Tibet Plateau. This research tested the hypothesis that overgrazing might promote Echinococcus multilocularis transmission through increasing populations of small mammal, intermediate hosts in Tibetan pastoral communities. Grazing practices, small mammal indices and dog Echinococcus multilocularis infection data were collected to analyze the relation between overgrazing and Echinococcus multilocularis transmission using nonparametric tests and multiple stepwise logistic regression. In the investigated area, raising livestock was a key industry. The communal pastures existed and the available forage was deficient for grazing. Open (common) pastures were overgrazed and had higher burrow density of small mammals compared with neighboring fenced (private) pastures; this high overgrazing pressure on the open pastures measured by neighboring fenced area led to higher burrow density of small mammals in open pastures. The median burrow density of small mammals in open pastures was independently associated with nearby canine Echinococcus multilocularis infection (P = 0.003, OR = 1.048). Overgrazing may promote the transmission of Echinococcus multilocularis through increasing the population density of small mammals.
Evaluating uncertainty in predicting spatially variable representative elementary scales in fractured aquifers, with application to Turkey Creek Basin, Colorado

USGS Publications Warehouse

Wellman, Tristan P.; Poeter, Eileen P.

2006-01-01

Computational limitations and sparse field data often mandate use of continuum representation for modeling hydrologic processes in large‐scale fractured aquifers. Selecting appropriate element size is of primary importance because continuum approximation is not valid for all scales. The traditional approach is to select elements by identifying a single representative elementary scale (RES) for the region of interest. Recent advances indicate RES may be spatially variable, prompting unanswered questions regarding the ability of sparse data to spatially resolve continuum equivalents in fractured aquifers. We address this uncertainty of estimating RES using two techniques. In one technique we employ data‐conditioned realizations generated by sequential Gaussian simulation. For the other we develop a new approach using conditioned random walks and nonparametric bootstrapping (CRWN). We evaluate the effectiveness of each method under three fracture densities, three data sets, and two groups of RES analysis parameters. In sum, 18 separate RES analyses are evaluated, which indicate RES magnitudes may be reasonably bounded using uncertainty analysis, even for limited data sets and complex fracture structure. In addition, we conduct a field study to estimate RES magnitudes and resulting uncertainty for Turkey Creek Basin, a crystalline fractured rock aquifer located 30 km southwest of Denver, Colorado. Analyses indicate RES does not correlate to rock type or local relief in several instances but is generally lower within incised creek valleys and higher along mountain fronts. Results of this study suggest that (1) CRWN is an effective and computationally efficient method to estimate uncertainty, (2) RES predictions are well constrained using uncertainty analysis, and (3) for aquifers such as Turkey Creek Basin, spatial variability of RES is significant and complex.
Measuring agreement of multivariate discrete survival times using a modified weighted kappa coefficient.

PubMed

Guo, Ying; Manatunga, Amita K

2009-03-01

Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. We present a modified weighted kappa coefficient to measure agreement between bivariate discrete survival times. The proposed kappa coefficient accommodates censoring by redistributing the mass of censored observations within the grid where the unobserved events may potentially happen. A generalized modified weighted kappa is proposed for multivariate discrete survival times. We estimate the modified kappa coefficients nonparametrically through a multivariate survival function estimator. The asymptotic properties of the kappa estimators are established and the performance of the estimators are examined through simulation studies of bivariate and trivariate survival times. We illustrate the application of the modified kappa coefficient in the presence of censored observations with data from a prostate cancer study.
Economic decision making and the application of nonparametric prediction models

USGS Publications Warehouse

Attanasi, E.D.; Coburn, T.C.; Freeman, P.A.

2008-01-01

Sustained increases in energy prices have focused attention on gas resources in low-permeability shale or in coals that were previously considered economically marginal. Daily well deliverability is often relatively small, although the estimates of the total volumes of recoverable resources in these settings are often large. Planning and development decisions for extraction of such resources must be areawide because profitable extraction requires optimization of scale economies to minimize costs and reduce risk. For an individual firm, the decision to enter such plays depends on reconnaissance-level estimates of regional recoverable resources and on cost estimates to develop untested areas. This paper shows how simple nonparametric local regression models, used to predict technically recoverable resources at untested sites, can be combined with economic models to compute regional-scale cost functions. The context of the worked example is the Devonian Antrim-shale gas play in the Michigan basin. One finding relates to selection of the resource prediction model to be used with economic models. Models chosen because they can best predict aggregate volume over larger areas (many hundreds of sites) smooth out granularity in the distribution of predicted volumes at individual sites. This loss of detail affects the representation of economic cost functions and may affect economic decisions. Second, because some analysts consider unconventional resources to be ubiquitous, the selection and order of specific drilling sites may, in practice, be determined arbitrarily by extraneous factors. The analysis shows a 15-20% gain in gas volume when these simple models are applied to order drilling prospects strategically rather than to choose drilling locations randomly. Copyright ?? 2008 Society of Petroleum Engineers.

A parametric approach for simultaneous bias correction and high-resolution downscaling of climate model rainfall

NASA Astrophysics Data System (ADS)

Mamalakis, Antonios; Langousis, Andreas; Deidda, Roberto; Marrocu, Marino

2017-03-01

Distribution mapping has been identified as the most efficient approach to bias-correct climate model rainfall, while reproducing its statistics at spatial and temporal resolutions suitable to run hydrologic models. Yet its implementation based on empirical distributions derived from control samples (referred to as nonparametric distribution mapping) makes the method's performance sensitive to sample length variations, the presence of outliers, the spatial resolution of climate model results, and may lead to biases, especially in extreme rainfall estimation. To address these shortcomings, we propose a methodology for simultaneous bias correction and high-resolution downscaling of climate model rainfall products that uses: (a) a two-component theoretical distribution model (i.e., a generalized Pareto (GP) model for rainfall intensities above a specified threshold u*, and an exponential model for lower rainrates), and (b) proper interpolation of the corresponding distribution parameters on a user-defined high-resolution grid, using kriging for uncertain data. We assess the performance of the suggested parametric approach relative to the nonparametric one, using daily raingauge measurements from a dense network in the island of Sardinia (Italy), and rainfall data from four GCM/RCM model chains of the ENSEMBLES project. The obtained results shed light on the competitive advantages of the parametric approach, which is proved more accurate and considerably less sensitive to the characteristics of the calibration period, independent of the GCM/RCM combination used. This is especially the case for extreme rainfall estimation, where the GP assumption allows for more accurate and robust estimates, also beyond the range of the available data.
Joint radius-length distribution as a measure of anisotropic pore eccentricity: an experimental and analytical framework.

PubMed

Benjamini, Dan; Basser, Peter J

2014-12-07

In this work, we present an experimental design and analytical framework to measure the nonparametric joint radius-length (R-L) distribution of an ensemble of parallel, finite cylindrical pores, and more generally, the eccentricity distribution of anisotropic pores. Employing a novel 3D double pulsed-field gradient acquisition scheme, we first obtain both the marginal radius and length distributions of a population of cylindrical pores and then use these to constrain and stabilize the estimate of the joint radius-length distribution. Using the marginal distributions as constraints allows the joint R-L distribution to be reconstructed from an underdetermined system (i.e., more variables than equations), which requires a relatively small and feasible number of MR acquisitions. Three simulated representative joint R-L distribution phantoms corrupted by different noise levels were reconstructed to demonstrate the process, using this new framework. As expected, the broader the peaks in the joint distribution, the less stable and more sensitive to noise the estimation of the marginal distributions. Nevertheless, the reconstruction of the joint distribution is remarkably robust to increases in noise level; we attribute this characteristic to the use of the marginal distributions as constraints. Axons are known to exhibit local compartment eccentricity variations upon injury; the extent of the variations depends on the severity of the injury. Nonparametric estimation of the eccentricity distribution of injured axonal tissue is of particular interest since generally one cannot assume a parametric distribution a priori. Reconstructing the eccentricity distribution may provide vital information about changes resulting from injury or that occurred during development.
An Empirical Study of Eight Nonparametric Tests in Hierarchical Regression.

ERIC Educational Resources Information Center

Harwell, Michael; Serlin, Ronald C.

When normality does not hold, nonparametric tests represent an important data-analytic alternative to parametric tests. However, the use of nonparametric tests in educational research has been limited by the absence of easily performed tests for complex experimental designs and analyses, such as factorial designs and multiple regression analyses,…
Statistical Methods and Sampling Design for Estimating Step Trends in Surface-Water Quality

USGS Publications Warehouse

Hirsch, Robert M.

1988-01-01

This paper addresses two components of the problem of estimating the magnitude of step trends in surface water quality. The first is finding a robust estimator appropriate to the data characteristics expected in water-quality time series. The J. L. Hodges-E. L. Lehmann class of estimators is found to be robust in comparison to other nonparametric and moment-based estimators. A seasonal Hodges-Lehmann estimator is developed and shown to have desirable properties. Second, the effectiveness of various sampling strategies is examined using Monte Carlo simulation coupled with application of this estimator. The simulation is based on a large set of total phosphorus data from the Potomac River. To assure that the simulated records have realistic properties, the data are modeled in a multiplicative fashion incorporating flow, hysteresis, seasonal, and noise components. The results demonstrate the importance of balancing the length of the two sampling periods and balancing the number of data values between the two periods.
Robustness of survival estimates for radio-marked animals

USGS Publications Warehouse

Bunck, C.M.; Chen, C.-L.

1992-01-01

Telemetry techniques are often used to study the survival of birds and mammals; particularly whcn mark-recapture approaches are unsuitable. Both parametric and nonparametric methods to estimate survival have becn developed or modified from other applications. An implicit assumption in these approaches is that the probability of re-locating an animal with a functioning transmitter is one. A Monte Carlo study was conducted to determine the bias and variance of the Kaplan-Meier estimator and an estimator based also on the assumption of constant hazard and to eva!uate the performance of the two-sample tests associated with each. Modifications of each estimator which allow a re-Iocation probability of less than one are described and evaluated. Generallv the unmodified estimators were biased but had lower variance. At low sample sizes all estimators performed poorly. Under the null hypothesis, the distribution of all test statistics reasonably approximated the null distribution when survival was low but not when it was high. The power of the two-sample tests were similar.
Essays on parametric and nonparametric modeling and estimation with applications to energy economics

NASA Astrophysics Data System (ADS)

Gao, Weiyu

My dissertation research is composed of two parts: a theoretical part on semiparametric efficient estimation and an applied part in energy economics under different dynamic settings. The essays are related in terms of their applications as well as the way in which models are constructed and estimated. In the first essay, efficient estimation of the partially linear model is studied. We work out the efficient score functions and efficiency bounds under four stochastic restrictions---independence, conditional symmetry, conditional zero mean, and partially conditional zero mean. A feasible efficient estimation method for the linear part of the model is developed based on the efficient score. A battery of specification test that allows for choosing between the alternative assumptions is provided. A Monte Carlo simulation is also conducted. The second essay presents a dynamic optimization model for a stylized oilfield resembling the largest developed light oil field in Saudi Arabia, Ghawar. We use data from different sources to estimate the oil production cost function and the revenue function. We pay particular attention to the dynamic aspect of the oil production by employing petroleum-engineering software to simulate the interaction between control variables and reservoir state variables. Optimal solutions are studied under different scenarios to account for the possible changes in the exogenous variables and the uncertainty about the forecasts. The third essay examines the effect of oil price volatility on the level of innovation displayed by the U.S. economy. A measure of innovation is calculated by decomposing an output-based Malmquist index. We also construct a nonparametric measure for oil price volatility. Technical change and oil price volatility are then placed in a VAR system with oil price and a variable indicative of monetary policy. The system is estimated and analyzed for significant relationships. We find that oil price volatility displays a significant negative effect on innovation. A key point of this analysis lies in the fact that we impose no functional forms for technologies and the methods employed keep technical assumptions to a minimum.
Bayesian deconvolution of [corrected] fMRI data using bilinear dynamical systems.

PubMed

Makni, Salima; Beckmann, Christian; Smith, Steve; Woolrich, Mark

2008-10-01

In Penny et al. [Penny, W., Ghahramani, Z., Friston, K.J. 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360(1457) 983-993], a particular case of the Linear Dynamical Systems (LDSs) was used to model the dynamic behavior of the BOLD response in functional MRI. This state-space model, called bilinear dynamical system (BDS), is used to deconvolve the fMRI time series in order to estimate the neuronal response induced by the different stimuli of the experimental paradigm. The BDS model parameters are estimated using an expectation-maximization (EM) algorithm proposed by Ghahramani and Hinton [Ghahramani, Z., Hinton, G.E. 1996. Parameter Estimation for Linear Dynamical Systems. Technical Report, Department of Computer Science, University of Toronto]. In this paper we introduce modifications to the BDS model in order to explicitly model the spatial variations of the haemodynamic response function (HRF) in the brain using a non-parametric approach. While in Penny et al. [Penny, W., Ghahramani, Z., Friston, K.J. 2005. Bilinear dynamical systems. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360(1457) 983-993] the relationship between neuronal activation and fMRI signals is formulated as a first-order convolution with a kernel expansion using basis functions (typically two or three), in this paper, we argue in favor of a spatially adaptive GLM in which a local non-parametric estimation of the HRF is performed. Furthermore, in order to overcome the overfitting problem typically associated with simple EM estimates, we propose a full Variational Bayes (VB) solution to infer the BDS model parameters. We demonstrate the usefulness of our model which is able to estimate both the neuronal activity and the haemodynamic response function in every voxel of the brain. We first examine the behavior of this approach when applied to simulated data with different temporal and noise features. As an example we will show how this method can be used to improve interpretability of estimates from an independent component analysis (ICA) analysis of fMRI data. We finally demonstrate its use on real fMRI data in one slice of the brain.
A comparison of non-parametric techniques to estimate incident photosynthetically active radiation from MODIS for monitoring primary production

NASA Astrophysics Data System (ADS)

Brown, M. G. L.; He, T.; Liang, S.

2016-12-01

Satellite-derived estimates of incident photosynthetically active radiation (PAR) can be used to monitor global change, are required by most terrestrial ecosystem models, and can be used to estimate primary production according to the theory of light use efficiency. Compared with parametric approaches, non-parametric techniques that include an artificial neural network (ANN), support vector machine regression (SVM), an artificial bee colony (ABC), and a look-up table (LUT) do not require many ancillary data as inputs for the estimation of PAR from satellite data. In this study, a selection of machine learning methods to estimate PAR from MODIS top of atmosphere (TOA) radiances are compared to a LUT approach to determine which techniques might best handle the nonlinear relationship between TOA radiance and incident PAR. Evaluation of these methods (ANN, SVM, and LUT) is performed with ground measurements at seven SURFRAD sites. Due to the design of the ANN, it can handle the nonlinear relationship between TOA radiance and PAR better than linearly interpolating between the values in the LUT; however, training the ANN has to be carried out on an angular-bin basis, which results in a LUT of ANNs. The SVM model may be better for incorporating multiple viewing angles than the ANN; however, both techniques require a large amount of training data, which may introduce a regional bias based on where the most training and validation data are available. Based on the literature, the ABC is a promising alternative to an ANN, SVM regression and a LUT, but further development for this application is required before concrete conclusions can be drawn. For now, the LUT method outperforms the machine-learning techniques, but future work should be directed at developing and testing the ABC method. A simple, robust method to estimate direct and diffuse incident PAR, with minimal inputs and a priori knowledge, would be very useful for monitoring global change of primary production, particularly of pastures and rangeland, which have implications for livestock and food security. Future work will delve deeper into the utility of satellite-derived PAR estimation for monitoring primary production in pasture and rangelands.
Health Insurance: The Trade-Off Between Risk Pooling and Moral Hazard.

DTIC Science & Technology

1989-12-01

bias comes about because we suppress the intercept term in estimating VFor the power, the test is against 1, - 1. With this transform, the risk...dealing with the same utility function. As one test of whether families behave in the way economic theory suggests, we have also fitted a probit model of...nonparametric alternative to test our results’ sensitivity to the assumption of a normal error in both the theoretical and empirical models of the
Automatic correction of intensity nonuniformity from sparseness of gradient distribution in medical images.

PubMed

Zheng, Yuanjie; Grossman, Murray; Awate, Suyash P; Gee, James C

2009-01-01

We propose to use the sparseness property of the gradient probability distribution to estimate the intensity nonuniformity in medical images, resulting in two novel automatic methods: a non-parametric method and a parametric method. Our methods are easy to implement because they both solve an iteratively re-weighted least squares problem. They are remarkably accurate as shown by our experiments on images of different imaged objects and from different imaging modalities.
Automatic Correction of Intensity Nonuniformity from Sparseness of Gradient Distribution in Medical Images

PubMed Central

Zheng, Yuanjie; Grossman, Murray; Awate, Suyash P.; Gee, James C.

2013-01-01

We propose to use the sparseness property of the gradient probability distribution to estimate the intensity nonuniformity in medical images, resulting in two novel automatic methods: a non-parametric method and a parametric method. Our methods are easy to implement because they both solve an iteratively re-weighted least squares problem. They are remarkably accurate as shown by our experiments on images of different imaged objects and from different imaging modalities. PMID:20426191
Determination of the appropriate quarantine period following smallpox exposure: an objective approach using the incubation period distribution.

PubMed

Nishiura, Hiroshi

2009-01-01

Determination of the most appropriate quarantine period for those exposed to smallpox is crucial to the construction of an effective preparedness program against a potential bioterrorist attack. This study reanalyzed data on the incubation period distribution of smallpox to allow the optimal quarantine period to be objectively calculated. In total, 131 cases of smallpox were examined; incubation periods were extracted from four different sets of historical data and only cases arising from exposure for a single day were considered. The mean (median and standard deviation (SD)) incubation period was 12.5 (12.0, 2.2) days. Assuming lognormal and gamma distributions for the incubation period, maximum likelihood estimates (and corresponding 95% confidence interval (CI)) of the 95th percentile were 16.4 (95% CI: 15.6, 17.9) and 16.2 (95% CI: 15.5, 17.4) days, respectively. Using a non-parametric method, the 95th percentile point was estimated as 16 (95% CI: 15, 17) days. The upper 95% CIs of the incubation periods at the 90th, 95th and 99th percentiles were shorter than 17, 18 and 23 days, respectively, using both parametric and non-parametric methods. These results suggest that quarantine measures can ensure non-infection among those exposed to smallpox with probabilities higher than 95-99%, if the exposed individuals are quarantined for 18-23 days after the date of contact tracing.
Extending the Distributed Lag Model framework to handle chemical mixtures.

PubMed

Bello, Ghalib A; Arora, Manish; Austin, Christine; Horton, Megan K; Wright, Robert O; Gennings, Chris

2017-07-01

Distributed Lag Models (DLMs) are used in environmental health studies to analyze the time-delayed effect of an exposure on an outcome of interest. Given the increasing need for analytical tools for evaluation of the effects of exposure to multi-pollutant mixtures, this study attempts to extend the classical DLM framework to accommodate and evaluate multiple longitudinally observed exposures. We introduce 2 techniques for quantifying the time-varying mixture effect of multiple exposures on an outcome of interest. Lagged WQS, the first technique, is based on Weighted Quantile Sum (WQS) regression, a penalized regression method that estimates mixture effects using a weighted index. We also introduce Tree-based DLMs, a nonparametric alternative for assessment of lagged mixture effects. This technique is based on the Random Forest (RF) algorithm, a nonparametric, tree-based estimation technique that has shown excellent performance in a wide variety of domains. In a simulation study, we tested the feasibility of these techniques and evaluated their performance in comparison to standard methodology. Both methods exhibited relatively robust performance, accurately capturing pre-defined non-linear functional relationships in different simulation settings. Further, we applied these techniques to data on perinatal exposure to environmental metal toxicants, with the goal of evaluating the effects of exposure on neurodevelopment. Our methods identified critical neurodevelopmental windows showing significant sensitivity to metal mixtures. Copyright © 2017 Elsevier Inc. All rights reserved.
pKWmEB: integration of Kruskal-Wallis test with empirical Bayes under polygenic background control for multi-locus genome-wide association study.

PubMed

Ren, Wen-Long; Wen, Yang-Jun; Dunwell, Jim M; Zhang, Yuan-Ming

2018-03-01

Although nonparametric methods in genome-wide association studies (GWAS) are robust in quantitative trait nucleotide (QTN) detection, the absence of polygenic background control in single-marker association in genome-wide scans results in a high false positive rate. To overcome this issue, we proposed an integrated nonparametric method for multi-locus GWAS. First, a new model transformation was used to whiten the covariance matrix of polygenic matrix K and environmental noise. Using the transferred model, Kruskal-Wallis test along with least angle regression was then used to select all the markers that were potentially associated with the trait. Finally, all the selected markers were placed into multi-locus model, these effects were estimated by empirical Bayes, and all the nonzero effects were further identified by a likelihood ratio test for true QTN detection. This method, named pKWmEB, was validated by a series of Monte Carlo simulation studies. As a result, pKWmEB effectively controlled false positive rate, although a less stringent significance criterion was adopted. More importantly, pKWmEB retained the high power of Kruskal-Wallis test, and provided QTN effect estimates. To further validate pKWmEB, we re-analyzed four flowering time related traits in Arabidopsis thaliana, and detected some previously reported genes that were not identified by the other methods.
Getting It Right Matters: Climate Spectra and Their Estimation

NASA Astrophysics Data System (ADS)

Privalsky, Victor; Yushkov, Vladislav

2018-06-01

In many recent publications, climate spectra estimated with different methods from observed, GCM-simulated, and reconstructed time series contain many peaks at time scales from a few years to many decades and even centuries. However, respective spectral estimates obtained with the autoregressive (AR) and multitapering (MTM) methods showed that spectra of climate time series are smooth and contain no evidence of periodic or quasi-periodic behavior. Four order selection criteria for the autoregressive models were studied and proven sufficiently reliable for 25 time series of climate observations at individual locations or spatially averaged at local-to-global scales. As time series of climate observations are short, an alternative reliable nonparametric approach is Thomson's MTM. These results agree with both the earlier climate spectral analyses and the Markovian stochastic model of climate.
Using a DEA Management Tool through a Nonparametric Approach: An Examination of Urban-Rural Effects on Thai School Efficiency

ERIC Educational Resources Information Center

Kantabutra, Sangchan

2009-01-01

This paper examines urban-rural effects on public upper-secondary school efficiency in northern Thailand. In the study, efficiency was measured by a nonparametric technique, data envelopment analysis (DEA). Urban-rural effects were examined through a Mann-Whitney nonparametric statistical test. Results indicate that urban schools appear to have…
Estimating time-dependent ROC curves using data under prevalent sampling.

PubMed

Li, Shanshan

2017-04-15

Prevalent sampling is frequently a convenient and economical sampling technique for the collection of time-to-event data and thus is commonly used in studies of the natural history of a disease. However, it is biased by design because it tends to recruit individuals with longer survival times. This paper considers estimation of time-dependent receiver operating characteristic curves when data are collected under prevalent sampling. To correct the sampling bias, we develop both nonparametric and semiparametric estimators using extended risk sets and the inverse probability weighting techniques. The proposed estimators are consistent and converge to Gaussian processes, while substantial bias may arise if standard estimators for right-censored data are used. To illustrate our method, we analyze data from an ovarian cancer study and estimate receiver operating characteristic curves that assess the accuracy of the composite markers in distinguishing subjects who died within 3-5 years from subjects who remained alive. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Kappa statistic for clustered matched-pair data.

PubMed

Yang, Zhao; Zhou, Ming

2014-07-10

Kappa statistic is widely used to assess the agreement between two procedures in the independent matched-pair data. For matched-pair data collected in clusters, on the basis of the delta method and sampling techniques, we propose a nonparametric variance estimator for the kappa statistic without within-cluster correlation structure or distributional assumptions. The results of an extensive Monte Carlo simulation study demonstrate that the proposed kappa statistic provides consistent estimation and the proposed variance estimator behaves reasonably well for at least a moderately large number of clusters (e.g., K ≥50). Compared with the variance estimator ignoring dependence within a cluster, the proposed variance estimator performs better in maintaining the nominal coverage probability when the intra-cluster correlation is fair (ρ ≥0.3), with more pronounced improvement when ρ is further increased. To illustrate the practical application of the proposed estimator, we analyze two real data examples of clustered matched-pair data. Copyright © 2014 John Wiley & Sons, Ltd.
Efficient bootstrap estimates for tail statistics

NASA Astrophysics Data System (ADS)

Breivik, Øyvind; Aarnes, Ole Johan

2017-03-01

Bootstrap resamples can be used to investigate the tail of empirical distributions as well as return value estimates from the extremal behaviour of the sample. Specifically, the confidence intervals on return value estimates or bounds on in-sample tail statistics can be obtained using bootstrap techniques. However, non-parametric bootstrapping from the entire sample is expensive. It is shown here that it suffices to bootstrap from a small subset consisting of the highest entries in the sequence to make estimates that are essentially identical to bootstraps from the entire sample. Similarly, bootstrap estimates of confidence intervals of threshold return estimates are found to be well approximated by using a subset consisting of the highest entries. This has practical consequences in fields such as meteorology, oceanography and hydrology where return values are calculated from very large gridded model integrations spanning decades at high temporal resolution or from large ensembles of independent and identically distributed model fields. In such cases the computational savings are substantial.
Doubly robust nonparametric inference on the average treatment effect.

PubMed

Benkeser, D; Carone, M; Laan, M J Van Der; Gilbert, P B

2017-12-01

Doubly robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double robustness does not readily extend to inference. We present a general theoretical study of the behaviour of doubly robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different methods for constructing such estimators and investigate the extent to which they may be modified to also allow doubly robust inference. We find that while targeted minimum loss-based estimation can be used to solve this problem very naturally, common alternative frameworks appear to be inappropriate for this purpose. We provide a theoretical study and a numerical evaluation of the alternatives considered. Our simulations highlight the need for and usefulness of these approaches in practice, while our theoretical developments have broad implications for the construction of estimators that permit doubly robust inference in other problems.

Maximum likelihood estimation for semiparametric transformation models with interval-censored data

PubMed Central

Mao, Lu; Lin, D. Y.

2016-01-01

Abstract Interval censoring arises frequently in clinical, epidemiological, financial and sociological studies, where the event or failure of interest is known only to occur within an interval induced by periodic monitoring. We formulate the effects of potentially time-dependent covariates on the interval-censored failure time through a broad class of semiparametric transformation models that encompasses proportional hazards and proportional odds models. We consider nonparametric maximum likelihood estimation for this class of models with an arbitrary number of monitoring times for each subject. We devise an EM-type algorithm that converges stably, even in the presence of time-dependent covariates, and show that the estimators for the regression parameters are consistent, asymptotically normal, and asymptotically efficient with an easily estimated covariance matrix. Finally, we demonstrate the performance of our procedures through simulation studies and application to an HIV/AIDS study conducted in Thailand. PMID:27279656
L-statistics for Repeated Measurements Data With Application to Trimmed Means, Quantiles and Tolerance Intervals.

PubMed

Assaad, Houssein I; Choudhary, Pankaj K

2013-01-01

The L -statistics form an important class of estimators in nonparametric statistics. Its members include trimmed means and sample quantiles and functions thereof. This article is devoted to theory and applications of L -statistics for repeated measurements data, wherein the measurements on the same subject are dependent and the measurements from different subjects are independent. This article has three main goals: (a) Show that the L -statistics are asymptotically normal for repeated measurements data. (b) Present three statistical applications of this result, namely, location estimation using trimmed means, quantile estimation and construction of tolerance intervals. (c) Obtain a Bahadur representation for sample quantiles. These results are generalizations of similar results for independently and identically distributed data. The practical usefulness of these results is illustrated by analyzing a real data set involving measurement of systolic blood pressure. The properties of the proposed point and interval estimators are examined via simulation.
Experimental determination of frequency response function estimates for flexible joint industrial manipulators with serial kinematics

NASA Astrophysics Data System (ADS)

Saupe, Florian; Knoblach, Andreas

2015-02-01

Two different approaches for the determination of frequency response functions (FRFs) are used for the non-parametric closed loop identification of a flexible joint industrial manipulator with serial kinematics. The two applied experiment designs are based on low power multisine and high power chirp excitations. The main challenge is to eliminate disturbances of the FRF estimates caused by the numerous nonlinearities of the robot. For the experiment design based on chirp excitations, a simple iterative procedure is proposed which allows exploiting the good crest factor of chirp signals in a closed loop setup. An interesting synergy of the two approaches, beyond validation purposes, is pointed out.
Historical HIV incidence modelling in regional subgroups: use of flexible discrete models with penalized splines based on prior curves.

PubMed

Greenland, S

1996-03-15

This paper presents an approach to back-projection (back-calculation) of human immunodeficiency virus (HIV) person-year infection rates in regional subgroups based on combining a log-linear model for subgroup differences with a penalized spline model for trends. The penalized spline approach allows flexible trend estimation but requires far fewer parameters than fully non-parametric smoothers, thus saving parameters that can be used in estimating subgroup effects. Use of reasonable prior curve to construct the penalty function minimizes the degree of smoothing needed beyond model specification. The approach is illustrated in application to acquired immunodeficiency syndrome (AIDS) surveillance data from Los Angeles County.
Sign realized jump risk and the cross-section of stock returns: Evidence from China's stock market.

PubMed

Chao, Youcong; Liu, Xiaoqun; Guo, Shijun

2017-01-01

Using 5-minute high frequency data from the Chinese stock market, we employ a non-parametric method to estimate Fama-French portfolio realized jumps and investigate whether the estimated positive, negative and sign realized jumps could forecast or explain the cross-sectional stock returns. The Fama-MacBeth regression results show that not only have the realized jump components and the continuous volatility been compensated with risk premium, but also that the negative jump risk, the positive jump risk and the sign jump risk, to some extent, could explain the return of the stock portfolios. Therefore, we should pay high attention to the downside tail risk and the upside tail risk.
Statistical methods for astronomical data with upper limits. II - Correlation and regression

NASA Technical Reports Server (NTRS)

Isobe, T.; Feigelson, E. D.; Nelson, P. I.

1986-01-01

Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
Semivarying coefficient models for capture-recapture data: colony size estimation for the little penguin Eudyptula minor.

PubMed

Stoklosa, Jakub; Dann, Peter; Huggins, Richard

2014-09-01

To accommodate seasonal effects that change from year to year into models for the size of an open population we consider a time-varying coefficient model. We fit this model to a capture-recapture data set collected on the little penguin Eudyptula minor in south-eastern Australia over a 25 year period using Jolly-Seber type estimators and nonparametric P-spline techniques. The time-varying coefficient model identified strong changes in the seasonal pattern across the years which we further examined using functional data analysis techniques. To evaluate the methodology we also conducted several simulation studies that incorporate seasonal variation. Copyright © 2014 Elsevier Inc. All rights reserved.
Analysis of survival data from telemetry projects

USGS Publications Warehouse

Bunck, C.M.; Winterstein, S.R.; Pollock, K.H.

1985-01-01

Telemetry techniques can be used to study the survival rates of animal populations and are particularly suitable for species or settings for which band recovery models are not. Statistical methods for estimating survival rates and parameters of survival distributions from observations of radio-tagged animals will be described. These methods have been applied to medical and engineering studies and to the study of nest success. Estimates and tests based on discrete models, originally introduced by Mayfield, and on continuous models, both parametric and nonparametric, will be described. Generalizations, including staggered entry of subjects into the study and identification of mortality factors will be considered. Additional discussion topics will include sample size considerations, relocation frequency for subjects, and use of covariates.
A Nonparametric Multidimensional IRT Approach with Applications to Ability Estimation and Test Bias.

DTIC Science & Technology

1988-04-01

VA 22314 800 N. Quincy Street Attn: TC Arlington, VA 22217-5000 (12 Copies) Dr. Hans Crombag Dr. Stephen Dunbar University of Leyden Lindquist...CenterEducation Research Center for Measurement Boerhaavelaan 2 University of Iowa 2334 EN Leyden Iowa City, IA 52242 The NETHERLANDS Dr. James A. Earles Mr...William Montague Naval Air Station NPRDC Code 13 Pensacola, FL 32508 San Diego, CA 92152-6800 Dr. Gary Marco Ms. Kathleen Moreno Stop 31-E Navy Personnel R
A simple randomisation procedure for validating discriminant analysis: a methodological note.

PubMed

Wastell, D G

1987-04-01

Because the goal of discriminant analysis (DA) is to optimise classification, it designedly exaggerates between-group differences. This bias complicates validation of DA. Jack-knifing has been used for validation but is inappropriate when stepwise selection (SWDA) is employed. A simple randomisation test is presented which is shown to give correct decisions for SWDA. The general superiority of randomisation tests over orthodox significance tests is discussed. Current work on non-parametric methods of estimating the error rates of prediction rules is briefly reviewed.
Quantal Response: Nonparametric Modeling

DTIC Science & Technology

2017-01-01

DATES COVERED (From ‐ To) 4. TITLE AND SUBTITLE 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d...creasing function as P(x) = G ( f (x) ) , where G is a monotone function such as the standard logistic, normal, or Cauchy CDF. Finite -dimensional...examples with dimension k = 5 where various colors distinguish the basis elements . Figure 3 shows logistic response estimates for these 3 basis sets
Prevalence Incidence Mixture Models

Cancer.gov

The R package and webtool fits Prevalence Incidence Mixture models to left-censored and irregularly interval-censored time to event data that is commonly found in screening cohorts assembled from electronic health records. Absolute and relative risk can be estimated for simple random sampling, and stratified sampling (the two approaches of superpopulation and a finite population are supported for target populations). Non-parametric (absolute risks only), semi-parametric, weakly-parametric (using B-splines), and some fully parametric (such as the logistic-Weibull) models are supported.
The Generalized Roy Model and the Cost-Benefit Analysis of Social Programs.

PubMed

Eisenhauer, Philipp; Heckman, James J; Vytlacil, Edward

2015-04-01

The literature on treatment effects focuses on gross benefits from program participation. We extend this literature by developing conditions under which it is possible to identify parameters measuring the cost and net surplus from program participation. Using the generalized Roy model, we nonparametrically identify the cost, benefit, and net surplus of selection into treatment without requiring the analyst to have direct information on the cost. We apply our methodology to estimate the gross benefit and net surplus of attending college.
The Generalized Roy Model and the Cost-Benefit Analysis of Social Programs*

PubMed Central

Eisenhauer, Philipp; Heckman, James J.; Vytlacil, Edward

2015-01-01

The literature on treatment effects focuses on gross benefits from program participation. We extend this literature by developing conditions under which it is possible to identify parameters measuring the cost and net surplus from program participation. Using the generalized Roy model, we nonparametrically identify the cost, benefit, and net surplus of selection into treatment without requiring the analyst to have direct information on the cost. We apply our methodology to estimate the gross benefit and net surplus of attending college. PMID:26709315
The urban heat island in Rio de Janeiro, Brazil, in the last 30 years using remote sensing data

NASA Astrophysics Data System (ADS)

Peres, Leonardo de Faria; Lucena, Andrews José de; Rotunno Filho, Otto Corrêa; França, José Ricardo de Almeida

2018-02-01

The aim of this work is to study urban heat island (UHI) in Metropolitan Area of Rio de Janeiro (MARJ) based on the analysis of land-surface temperature (LST) and land-use patterns retrieved from Landsat-5/Thematic Mapper (TM), Landsat-7/Enhanced Thematic Mapper Plus (ETM+) and Landsat-8/Operational Land Imager (OLI) and Thermal Infrared Sensors (TIRS) data covering a 32-year period between 1984 and 2015. LST temporal evolution is assessed by comparing the average LST composites for 1984-1999 and 2000-2015 where the parametric Student t-test was conducted at 5% significance level to map the pixels where LST for the more recent period is statistically significantly greater than the previous one. The non-parametric Mann-Whitney-Wilcoxon rank sum test has also confirmed at the same 5% significance level that the more recent period (2000-2015) has higher LST values. UHI intensity between ;urban; and ;rural/urban low density; (;vegetation;) areas for 1984-1999 and 2000-2015 was established and confirmed by both parametric and non-parametric tests at 1% significance level as 3.3 °C (5.1 °C) and 4.4 °C (7.1 °C), respectively. LST has statistically significantly (p-value < 0.01) increased over time in two of three land cover classes (;urban; and ;urban low density;), respectively by 1.9 °C and 0.9 °C, except in ;vegetation; class. A spatial analysis was also performed to identify the urban pixels within MARJ where UHI is more intense by subtracting the LST of these pixels from the LST mean value of ;vegetation; land-use class.
VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS

PubMed Central

Huang, Jian; Horowitz, Joel L.; Wei, Fengrong

2010-01-01

We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is “small” relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method. PMID:21127739
Temporal changes and variability in temperature series over Peninsular Malaysia

NASA Astrophysics Data System (ADS)

Suhaila, Jamaludin

2015-02-01

With the current concern over climate change, the descriptions on how temperature series changed over time are very useful. Annual mean temperature has been analyzed for several stations over Peninsular Malaysia. Non-parametric statistical techniques such as Mann-Kendall test and Theil-Sen slope estimation are used primarily for assessing the significance and detection of trends, while a nonparametric Pettitt's test and sequential Mann-Kendall test are adopted to detect any abrupt climate change. Statistically significance increasing trends for annual mean temperature are detected for almost all studied stations with the magnitude of significant trend varied from 0.02°C to 0.05°C per year. The results shows that climate over Peninsular Malaysia is getting warmer than before. In addition, the results of the abrupt changes in temperature using Pettitt's and sequential Mann-Kendall test reveal the beginning of trends which can be related to El Nino episodes that occur in Malaysia. In general, the analysis results can help local stakeholders and water managers to understand the risks and vulnerabilities related to climate change in terms of mean events in the region.
Generalized t-statistic for two-group classification.

PubMed

Komori, Osamu; Eguchi, Shinto; Copas, John B

2015-06-01

In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.
The orbital PDF: general inference of the gravitational potential from steady-state tracers

NASA Astrophysics Data System (ADS)

Han, Jiaxin; Wang, Wenting; Cole, Shaun; Frenk, Carlos S.

2016-02-01

We develop two general methods to infer the gravitational potential of a system using steady-state tracers, I.e. tracers with a time-independent phase-space distribution. Combined with the phase-space continuity equation, the time independence implies a universal orbital probability density function (oPDF) dP(λ|orbit) ∝ dt, where λ is the coordinate of the particle along the orbit. The oPDF is equivalent to Jeans theorem, and is the key physical ingredient behind most dynamical modelling of steady-state tracers. In the case of a spherical potential, we develop a likelihood estimator that fits analytical potentials to the system and a non-parametric method (`phase-mark') that reconstructs the potential profile, both assuming only the oPDF. The methods involve no extra assumptions about the tracer distribution function and can be applied to tracers with any arbitrary distribution of orbits, with possible extension to non-spherical potentials. The methods are tested on Monte Carlo samples of steady-state tracers in dark matter haloes to show that they are unbiased as well as efficient. A fully documented C/PYTHON code implementing our method is freely available at a GitHub repository linked from http://icc.dur.ac.uk/data/#oPDF.
Model-based segmentation of abdominal aortic aneurysms in CTA images

NASA Astrophysics Data System (ADS)

de Bruijne, Marleen; van Ginneken, Bram; Niessen, Wiro J.; Loog, Marco; Viergever, Max A.

2003-05-01

Segmentation of thrombus in abdominal aortic aneurysms is complicated by regions of low boundary contrast and by the presence of many neighboring structures in close proximity to the aneurysm wall. We present an automated method that is similar to the well known Active Shape Models (ASM), combining a three-dimensional shape model with a one-dimensional boundary appearance model. Our contribution is twofold: we developed a non-parametric appearance modeling scheme that effectively deals with a highly varying background, and we propose a way of generalizing models of curvilinear structures from small training sets. In contrast with the conventional ASM approach, the new appearance model trains on both true and false examples of boundary profiles. The probability that a given image profile belongs to the boundary is obtained using k nearest neighbor (kNN) probability density estimation. The performance of this scheme is compared to that of original ASMs, which minimize the Mahalanobis distance to the average true profile in the training set. The generalizability of the shape model is improved by modeling the objects axis deformation independent of its cross-sectional deformation. A leave-one-out experiment was performed on 23 datasets. Segmentation using the kNN appearance model significantly outperformed the original ASM scheme; average volume errors were 5.9% and 46% respectively.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.