Dwivedi, Alok Kumar; Mallawaarachchi, Indika; Alvarado, Luis A
2017-06-30
Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has small-sample-size limitations. We used a pooled method in the nonparametric bootstrap test that may overcome the problems associated with small samples in hypothesis testing. The present study compared the nonparametric bootstrap test with pooled resampling method against the corresponding parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with the unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test while maintaining type I error probability for any conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. The nonparametric bootstrap paired t-test also provided better performance than other alternatives. The nonparametric bootstrap test provided benefit over the exact Kruskal-Wallis test. We suggest using the nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating the one-way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd.
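The pooled resampling idea in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `welch_t` and `pooled_bootstrap_t_test`, the choice of a Welch-type statistic, the number of resamples, and the two-sided p-value rule are all assumptions made for the example. Both bootstrap groups are drawn with replacement from the pooled data, which imposes the null hypothesis of a common distribution.

```python
import math
import random
import statistics


def welch_t(x, y):
    """Welch-type t statistic for two independent samples."""
    diff = statistics.mean(x) - statistics.mean(y)
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0.0:
        # Degenerate resample: both groups are constant.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def pooled_bootstrap_t_test(x, y, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value using resampling from the pooled sample.

    Hypothetical helper: n_boot and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    t_obs = welch_t(x, y)
    hits = 0
    for _ in range(n_boot):
        # Draw both groups from the pooled data (sampling under the null).
        boot_x = rng.choices(pooled, k=len(x))
        boot_y = rng.choices(pooled, k=len(y))
        if abs(welch_t(boot_x, boot_y)) >= abs(t_obs):
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return t_obs, (hits + 1) / (n_boot + 1)


if __name__ == "__main__":
    control = [1.1, 2.0, 1.7, 2.3, 1.5]
    treated = [8.2, 9.1, 7.9, 8.8, 9.4]
    print(pooled_bootstrap_t_test(control, treated))
```

Each group needs at least two observations because the sample variance is undefined otherwise.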
CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions
Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.
Nonparametric analysis of Minnesota spruce and aspen tree data and LANDSAT data
NASA Technical Reports Server (NTRS)
Scott, D. W.; Jee, R.
1984-01-01
The application of nonparametric methods in data-intensive problems faced by NASA is described. The theoretical development of efficient multivariate density estimators and the novel use of color graphics workstations are reviewed. The use of nonparametric density estimates for data representation and for Bayesian classification are described and illustrated. Progress in building a data analysis system in a workstation environment is reviewed and preliminary runs presented.
EEG Correlates of Fluctuation in Cognitive Performance in an Air Traffic Control Task
2014-11-01
...using non-parametric statistical analysis to identify neurophysiological patterns due to the time-on-task effect. Significant changes in EEG power... EEG, Cognitive Performance, Power Spectral Analysis, Non-Parametric Analysis. Document is available to the public through the Internet...
Local kernel nonparametric discriminant analysis for adaptive extraction of complex structures
NASA Astrophysics Data System (ADS)
Li, Quanbao; Wei, Fajie; Zhou, Shenghan
2017-05-01
Linear discriminant analysis (LDA) is one of the most popular means of linear feature extraction. It usually performs well when the global data structure is consistent with the local data structure. Other frequently used approaches to feature extraction usually require linearity, independence, or large-sample conditions. However, in real-world applications, these assumptions are not always satisfied or cannot be tested. In this paper, we introduce an adaptive method, local kernel nonparametric discriminant analysis (LKNDA), which integrates conventional discriminant analysis with nonparametric statistics. LKNDA is adept at identifying both complex nonlinear structures and the ad hoc rule. Six simulation cases demonstrate that LKNDA has the advantages of both parametric and nonparametric algorithms and higher classification accuracy. The quartic unilateral kernel function may provide better robustness of prediction than other functions. LKNDA gives an alternative solution for discriminant cases of complex nonlinear feature extraction or unknown feature extraction. Finally, the application of LKNDA to the complex feature extraction of financial market activities is proposed.
ERIC Educational Resources Information Center
Kantabutra, Sangchan
2009-01-01
This paper examines urban-rural effects on public upper-secondary school efficiency in northern Thailand. In the study, efficiency was measured by a nonparametric technique, data envelopment analysis (DEA). Urban-rural effects were examined through a Mann-Whitney nonparametric statistical test. Results indicate that urban schools appear to have…
Quintela-del-Río, Alejandro; Francisco-Fernández, Mario
2011-02-01
The study of extreme values and prediction of ozone data is an important topic of research when dealing with environmental problems. Classical extreme value theory is usually used in air-pollution studies. It consists in fitting a parametric generalised extreme value (GEV) distribution to a data set of extreme values, and using the estimated distribution to compute return levels and other quantities of interest. Here, we propose to estimate these values using nonparametric functional data methods. Functional data analysis is a relatively new statistical methodology that generally deals with data consisting of curves or multi-dimensional variables. In this paper, we use this technique, jointly with nonparametric curve estimation, to provide alternatives to the usual parametric statistical tools. The nonparametric estimators are applied to real samples of maximum ozone values obtained from several monitoring stations belonging to the Automatic Urban and Rural Network (AURN) in the UK. The results show that nonparametric estimators work satisfactorily, outperforming the behaviour of classical parametric estimators. Functional data analysis is also used to predict stratospheric ozone concentrations. We show an application, using the data set of mean monthly ozone concentrations in Arosa, Switzerland, and the results are compared with those obtained by classical time series (ARIMA) analysis. Copyright © 2010 Elsevier Ltd. All rights reserved.
An Instructional Module on Mokken Scale Analysis
ERIC Educational Resources Information Center
Wind, Stefanie A.
2017-01-01
Mokken scale analysis (MSA) is a probabilistic-nonparametric approach to item response theory (IRT) that can be used to evaluate fundamental measurement properties with less strict assumptions than parametric IRT models. This instructional module provides an introduction to MSA as a probabilistic-nonparametric framework in which to explore…
O'Sullivan, Finbarr; Muzi, Mark; Spence, Alexander M; Mankoff, David M; O'Sullivan, Janet N; Fitzgerald, Niall; Newman, George C; Krohn, Kenneth A
2009-06-01
Kinetic analysis is used to extract metabolic information from dynamic positron emission tomography (PET) uptake data. The theory of indicator dilutions, developed in the seminal work of Meier and Zierler (1954), provides a probabilistic framework for representation of PET tracer uptake data in terms of a convolution between an arterial input function and a tissue residue. The residue is a scaled survival function associated with tracer residence in the tissue. Nonparametric inference for the residue, a deconvolution problem, provides a novel approach to kinetic analysis, critically one that is not reliant on specific compartmental modeling assumptions. A practical computational technique based on regularized cubic B-spline approximation of the residence time distribution is proposed. Nonparametric residue analysis allows formal statistical evaluation of specific parametric models to be considered. This analysis needs to properly account for the increased flexibility of the nonparametric estimator. The methodology is illustrated using data from a series of cerebral studies with PET and fluorodeoxyglucose (FDG) in normal subjects. Comparisons are made between key functionals of the residue, tracer flux, flow, etc., resulting from a parametric (the standard two-compartment model of Phelps et al. 1979) and a nonparametric analysis. Strong statistical evidence against the compartment model is found. Primarily these differences relate to the representation of the early temporal structure of the tracer residence, which is largely a function of the vascular supply network. There are convincing physiological arguments against the representations implied by the compartmental approach but this is the first time that a rigorous statistical confirmation using PET data has been reported. The compartmental analysis produces suspect values for flow but, notably, the impact on the metabolic flux, though statistically significant, is limited to deviations on the order of 3%-4%.
The general advantage of the nonparametric residue analysis is the ability to provide a valid kinetic quantitation in the context of studies where there may be heterogeneity or other uncertainty about the accuracy of a compartmental model approximation of the tissue residue.
Nonparametric Trajectory Analysis (NTA), a receptor-oriented model, was used to assess the impact of local sources of air pollution at monitoring sites located adjacent to highway I-15 in Las Vegas, NV. Measurements of black carbon, carbon monoxide, nitrogen oxides, and sulfur di...
Siciliani, Luigi
2006-01-01
Policy makers are increasingly interested in developing performance indicators that measure hospital efficiency. These indicators may give the purchasers of health services an additional regulatory tool to contain health expenditure. Using panel data, this study compares different parametric (econometric) and non-parametric (linear programming) techniques for the measurement of a hospital's technical efficiency. This comparison was made using a sample of 17 Italian hospitals in the years 1996-9. Highest correlations are found in the efficiency scores between the non-parametric data envelopment analysis under the constant returns to scale assumption (DEA-CRS) and several parametric models. Correlation reduces markedly when using more flexible non-parametric specifications such as data envelopment analysis under the variable returns to scale assumption (DEA-VRS) and the free disposal hull (FDH) model. Correlation also generally reduces when moving from one output to two-output specifications. This analysis suggests that there is scope for developing performance indicators at hospital level using panel data, but it is important that extensive sensitivity analysis is carried out if purchasers wish to make use of these indicators in practice.
NASA Astrophysics Data System (ADS)
Vittal, H.; Singh, Jitendra; Kumar, Pankaj; Karmakar, Subhankar
2015-06-01
In watershed management, flood frequency analysis (FFA) is performed to quantify the risk of flooding at different spatial locations and also to provide guidelines for determining the design periods of flood control structures. Traditional FFA was extensively performed by considering a univariate scenario for both at-site and regional estimation of return periods. However, due to the inherent mutual dependence of the flood variables or characteristics [i.e., peak flow (P), flood volume (V) and flood duration (D), which are random in nature], analysis has been further extended to the multivariate scenario, with some restrictive assumptions. To overcome the assumption of the same family of marginal density function for all flood variables, the concept of copula has been introduced. Although the advancement from univariate to multivariate analyses drew considerable attention from the FFA research community, the basic limitation was that the analyses were performed with the implementation of only parametric families of distributions. The aim of the current study is to emphasize the importance of nonparametric approaches in the field of multivariate FFA; however, the nonparametric distribution may not always be a good fit and capable of replacing well-implemented multivariate parametric and multivariate copula-based applications. Nevertheless, the potential of obtaining the best fit using nonparametric distributions might be improved because such distributions reproduce the sample's characteristics, resulting in more accurate estimations of the multivariate return period. Hence, the current study shows the importance of conjugating the multivariate nonparametric approach with multivariate parametric and copula-based approaches, thereby resulting in a comprehensive framework for complete at-site FFA. Although the proposed framework is designed for at-site FFA, this approach can also be applied to regional FFA because regional estimations ideally include at-site estimations.
The framework is based on the following steps: (i) comprehensive trend analysis to assess nonstationarity in the observed data; (ii) selection of the best-fit univariate marginal distribution with a comprehensive set of parametric and nonparametric distributions for the flood variables; (iii) multivariate frequency analyses with parametric, copula-based and nonparametric approaches; and (iv) estimation of joint and various conditional return periods. The proposed framework for frequency analysis is demonstrated using 110 years of observed data from Allegheny River at Salamanca, New York, USA. The results show that for both univariate and multivariate cases, the nonparametric Gaussian kernel provides the best estimate. Further, we perform FFA for twenty major rivers over continental USA, which shows for seven rivers, all the flood variables followed nonparametric Gaussian kernel; whereas for other rivers, parametric distributions provide the best-fit either for one or two flood variables. Thus the summary of results shows that the nonparametric method cannot substitute the parametric and copula-based approaches, but should be considered during any at-site FFA to provide the broadest choices for best estimation of the flood return periods.
ERIC Educational Resources Information Center
Mittag, Kathleen Cage
Most researchers using factor analysis extract factors from a matrix of Pearson product-moment correlation coefficients. A method is presented for extracting factors in a non-parametric way, by extracting factors from a matrix of Spearman rho (rank correlation) coefficients. It is possible to factor analyze a matrix of association such that…
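As a rough illustration of the first step described in this abstract, the sketch below builds a matrix of Spearman rho coefficients by ranking each variable (with average ranks for ties) and then computing Pearson correlations on the ranks. The function names `midranks`, `pearson`, and `spearman_matrix` are hypothetical, and the factor-extraction step itself is left out; the resulting matrix would be passed to whatever extraction routine is normally applied to a Pearson correlation matrix.

```python
import math


def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_matrix(columns):
    """Spearman rho matrix: Pearson correlations computed on the ranks."""
    ranked = [midranks(col) for col in columns]
    return [[pearson(a, b) for b in ranked] for a in ranked]


if __name__ == "__main__":
    data = [[1, 2, 3, 4, 5], [1, 4, 9, 16, 25], [5, 4, 3, 2, 1]]
    for row in spearman_matrix(data):
        print(row)
```

A constant column has zero rank variance and would cause a division by zero, so such variables need to be dropped beforehand.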
NASA Astrophysics Data System (ADS)
Feng, Jinchao; Lansford, Joshua; Mironenko, Alexander; Pourkargar, Davood Babaei; Vlachos, Dionisios G.; Katsoulakis, Markos A.
2018-03-01
We propose non-parametric methods for both local and global sensitivity analysis of chemical reaction models with correlated parameter dependencies. The developed mathematical and statistical tools are applied to a benchmark Langmuir competitive adsorption model on a close packed platinum surface, whose parameters, estimated from quantum-scale computations, are correlated and are limited in size (small data). The proposed mathematical methodology employs gradient-based methods to compute sensitivity indices. We observe that ranking influential parameters depends critically on whether or not correlations between parameters are taken into account. The impact of uncertainty in the correlation and the necessity of the proposed non-parametric perspective are demonstrated.
A nonparametric analysis of plot basal area growth using tree based models
G. L. Gadbury; H. K. Iyer; H. T. Schreuder; C. Y. Ueng
1997-01-01
Tree based statistical models can be used to investigate data structure and predict future observations. We used nonparametric and nonlinear models to reexamine the data sets on tree growth used by Bechtold et al. (1991) and Ruark et al. (1991). The growth data were collected by Forest Inventory and Analysis (FIA) teams from 1962 to 1972 (4th cycle) and 1972 to 1982 (...
Geometric analysis and restitution of digital multispectral scanner data arrays
NASA Technical Reports Server (NTRS)
Baker, J. R.; Mikhail, E. M.
1975-01-01
An investigation was conducted to define causes of geometric defects within digital multispectral scanner (MSS) data arrays, to analyze the resulting geometric errors, and to investigate restitution methods to correct or reduce these errors. Geometric transformation relationships for scanned data, from which collinearity equations may be derived, served as the basis of parametric methods of analysis and restitution of MSS digital data arrays. The linearization of these collinearity equations is presented. Algorithms considered for use in analysis and restitution included the MSS collinearity equations, piecewise polynomials based on linearized collinearity equations, and nonparametric algorithms. A proposed system for geometric analysis and restitution of MSS digital data arrays was used to evaluate these algorithms, utilizing actual MSS data arrays. It was shown that collinearity equations and nonparametric algorithms both yield acceptable results, but nonparametric algorithms possess definite advantages in computational efficiency. Piecewise polynomials were found to yield inferior results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aghamousa, Amir; Shafieloo, Arman; Arjunwadkar, Mihir
2015-02-01
Estimation of the angular power spectrum is one of the important steps in Cosmic Microwave Background (CMB) data analysis. Here, we present a nonparametric estimate of the temperature angular power spectrum for the Planck 2013 CMB data. The method implemented in this work is model-independent, and allows the data, rather than the model, to dictate the fit. Since one of the main targets of our analysis is to test the consistency of the ΛCDM model with Planck 2013 data, we use the nuisance parameters associated with the best-fit ΛCDM angular power spectrum to remove foreground contributions from the data at multipoles ℓ ≥ 50. We thus obtain a combined angular power spectrum data set together with the full covariance matrix, appropriately weighted over frequency channels. Our subsequent nonparametric analysis resolves six peaks (and five dips) up to ℓ ∼ 1850 in the temperature angular power spectrum. We present uncertainties in the peak/dip locations and heights at the 95% confidence level. We further show how these reflect the harmonicity of acoustic peaks, and can be used for acoustic scale estimation. Based on this nonparametric formalism, we found the best-fit ΛCDM model to be at 36% confidence distance from the center of the nonparametric confidence set; this is considerably larger than the confidence distance (9%) derived earlier from a similar analysis of the WMAP 7-year data. Another interesting result of our analysis is that at low multipoles, the Planck data do not suggest any upturn, contrary to the expectation based on the integrated Sachs-Wolfe contribution in the best-fit ΛCDM cosmology.
Benchmark dose analysis via nonparametric regression modeling
Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen
2013-01-01
Estimation of benchmark doses (BMDs) in quantitative risk assessment traditionally is based upon parametric dose-response modeling. It is a well-known concern, however, that if the chosen parametric model is uncertain and/or misspecified, inaccurate and possibly unsafe low-dose inferences can result. We describe a nonparametric approach for estimating BMDs with quantal-response data based on an isotonic regression method, and also study use of corresponding, nonparametric, bootstrap-based confidence limits for the BMD. We explore the confidence limits’ small-sample properties via a simulation study, and illustrate the calculations with an example from cancer risk assessment. It is seen that this nonparametric approach can provide a useful alternative for BMD estimation when faced with the problem of parametric model uncertainty. PMID:23683057
Sengupta Chattopadhyay, Amrita; Hsiao, Ching-Lin; Chang, Chien Ching; Lian, Ie-Bin; Fann, Cathy S J
2014-01-01
Identifying susceptibility genes that influence complex diseases is extremely difficult because loci often influence the disease state through genetic interactions. Numerous approaches to detect disease-associated SNP-SNP interactions have been developed, but none consistently generates high-quality results under different disease scenarios. Using summarizing techniques to combine a number of existing methods may provide a solution to this problem. Here we used three popular non-parametric methods (Gini, absolute probability difference (APD), and entropy) to develop two novel summary scores, namely principal component score (PCS) and Z-sum score (ZSS), with which to predict disease-associated genetic interactions. We used a simulation study to compare performance of the non-parametric scores, the summary scores, the scaled-sum score (SSS; used in polymorphism interaction analysis (PIA)), and the multifactor dimensionality reduction (MDR). The non-parametric methods achieved high power, but no non-parametric method outperformed all others under a variety of epistatic scenarios. PCS and ZSS, however, outperformed MDR. PCS, ZSS and SSS displayed controlled type-I-errors (<0.05) compared to GS, APDS, ES (>0.05). A real data study using the Genetic Analysis Workshop 16 (GAW 16) rheumatoid arthritis dataset identified a number of interesting SNP-SNP interactions. © 2013 Elsevier B.V. All rights reserved.
Assessment of Dimensionality in Social Science Subtest
ERIC Educational Resources Information Center
Ozbek Bastug, Ozlem Yesim
2012-01-01
Most of the literature on dimensionality has focused on either comparison of parametric and nonparametric dimensionality detection procedures or showing the effectiveness of one type of procedure. There is no known study showing how to do combined parametric and nonparametric dimensionality analysis on real data. The current study aims to fill…
Measuring Youth Development: A Nonparametric Cross-Country "Youth Welfare Index"
ERIC Educational Resources Information Center
Chaaban, Jad M.
2009-01-01
This paper develops an empirical methodology for the construction of a synthetic multi-dimensional cross-country comparison of the performance of governments around the world in improving the livelihood of their younger population. The devised "Youth Welfare Index" is based on the nonparametric Data Envelopment Analysis (DEA) methodology and…
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.
Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye
2016-03-01
The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
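For orientation, the classical (unsmoothed) Good-Turing estimator mentioned in this abstract can be sketched in a few lines. With N_r denoting the number of species observed exactly r times in a sample of size n, the estimated total probability of all species seen r times is (r + 1) N_{r+1} / n, so the probability of discovering a new species (r = 0) is N_1 / n. The function name `good_turing_mass` is hypothetical, and this sketch does not include the smoothing or the Bayesian nonparametric estimators studied in the paper.

```python
from collections import Counter


def good_turing_mass(sample):
    """Return a function r -> estimated total probability of species seen r times.

    Unsmoothed Good-Turing: mass(r) = (r + 1) * N_{r+1} / n.
    """
    n = len(sample)
    species_counts = Counter(sample)                 # species -> frequency
    freq_of_freq = Counter(species_counts.values())  # r -> N_r

    def mass(r):
        return (r + 1) * freq_of_freq.get(r + 1, 0) / n

    return mass


if __name__ == "__main__":
    mass = good_turing_mass(list("aabbbcde"))
    print("P(new species) =", mass(0))
    print("mass of singletons =", mass(1))
```

Because N_{r+1} is often zero for large r, the raw estimator is erratic in the tail, which is the usual motivation for the smoothed versions the abstract refers to.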
Applications of non-parametric statistics and analysis of variance on sample variances
NASA Technical Reports Server (NTRS)
Myers, R. H.
1981-01-01
Nonparametric methods that are available for NASA-type applications are discussed. An attempt will be made here to survey what can be used, to attempt recommendations as to when each would be applicable, and to compare the methods, when possible, with the usual normal-theory procedures that are available for the Gaussian analog. It is important here to point out the hypotheses that are being tested, the assumptions that are being made, and limitations of the nonparametric procedures. The appropriateness of doing analysis of variance on sample variances is also discussed and studied. This procedure is followed in several NASA simulation projects. On the surface this would appear to be a reasonably sound procedure. However, the difficulties involved center around the normality problem and the basic homogeneous variance assumption that is made in usual analysis of variance problems. These difficulties are discussed and guidelines are given for using the methods.
Does Private Tutoring Work? The Effectiveness of Private Tutoring: A Nonparametric Bounds Analysis
ERIC Educational Resources Information Center
Hof, Stefanie
2014-01-01
Private tutoring has become popular throughout the world. However, evidence for the effect of private tutoring on students' academic outcome is inconclusive; therefore, this paper presents an alternative framework: a nonparametric bounds method. The present examination uses, for the first time, a large representative data-set in a European setting…
Estimation of Spatial Dynamic Nonparametric Durbin Models with Fixed Effects
ERIC Educational Resources Information Center
Qian, Minghui; Hu, Ridong; Chen, Jianwei
2016-01-01
Spatial panel data models have been widely studied and applied in both scientific and social science disciplines, especially in the analysis of spatial influence. In this paper, we consider the spatial dynamic nonparametric Durbin model (SDNDM) with fixed effects, which takes the nonlinear factors into account based on the spatial dynamic panel…
Nonparametric Item Response Curve Estimation with Correction for Measurement Error
ERIC Educational Resources Information Center
Guo, Hongwen; Sinharay, Sandip
2011-01-01
Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…
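A kernel regression estimate of an item response curve can be sketched with a Nadaraya-Watson smoother: at each evaluation point, the estimated probability of a correct response is a kernel-weighted average of the 0/1 item responses of examinees with nearby scores. The Gaussian kernel, the bandwidth value, and the name `kernel_irc` are assumptions made for illustration. This is the uncorrected estimator that uses observed scores as the regressor, that is, the version the abstract describes as biased; the measurement-error correction is not shown.

```python
import math


def kernel_irc(scores, responses, grid, bandwidth):
    """Nadaraya-Watson estimate of P(correct | score) at each grid point.

    scores: observed test scores (the regressor).
    responses: 0/1 item responses, one per examinee.
    """
    curve = []
    for t in grid:
        weights = [math.exp(-0.5 * ((t - s) / bandwidth) ** 2) for s in scores]
        total = sum(weights)
        curve.append(sum(w * r for w, r in zip(weights, responses)) / total)
    return curve


if __name__ == "__main__":
    scores = [0, 1, 2, 3, 4, 5, 6, 7]
    responses = [0, 0, 0, 1, 0, 1, 1, 1]
    print(kernel_irc(scores, responses, grid=[1, 6], bandwidth=1.0))
```

Grid points far from every observed score can make all weights underflow to zero, so in practice the grid should stay within the range of the data.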
Measurement Error in Nonparametric Item Response Curve Estimation. Research Report. ETS RR-11-28
ERIC Educational Resources Information Center
Guo, Hongwen; Sinharay, Sandip
2011-01-01
Nonparametric, or kernel, estimation of item response curve (IRC) is a concern theoretically and operationally. Accuracy of this estimation, often used in item analysis in testing programs, is biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. In this study, we investigate…
Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A
2015-05-01
Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one's hypothesis explicitly or implicitly pertains to whole 1D trajectories. Copyright © 2015 Elsevier Ltd. All rights reserved.
Theodorsson-Norheim, E
1986-08-01
Multiple t tests at a fixed p level are frequently used to analyse biomedical data where analysis of variance followed by multiple comparisons or the adjustment of the p values according to Bonferroni would be more appropriate. The Kruskal-Wallis test is a nonparametric 'analysis of variance' which may be used to compare several independent samples. The present program is written in an elementary subset of BASIC and will perform Kruskal-Wallis test followed by multiple comparisons between the groups on practically any computer programmable in BASIC.
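The Kruskal-Wallis statistic described in this abstract is straightforward to compute. The sketch below is in Python rather than the BASIC of the original program, and the function name `kruskal_wallis_h` is hypothetical. It pools the observations, assigns average ranks to ties, computes H = 12 / (N(N + 1)) · Σ R_j² / n_j − 3(N + 1) from the group rank sums R_j, and divides by the standard tie correction. The follow-up multiple comparisons are not shown.

```python
from collections import Counter


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Average (mid) rank for each distinct value in the pooled sample.
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1

    # H from the squared rank sums of each group.
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over tied blocks.
    ties = Counter(pooled)
    correction = 1 - sum(t ** 3 - t for t in ties.values()) / (n ** 3 - n)
    return h / correction


if __name__ == "__main__":
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

H is usually referred to a chi-square distribution with k − 1 degrees of freedom for k groups; with very small groups that approximation is rough and exact tables or permutation methods are preferable. If every observation is identical the tie correction is zero and the statistic is undefined.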
ERIC Educational Resources Information Center
Sabourin, Stephane; Valois, Pierre; Lussier, Yvan
2005-01-01
The main purpose of the current research was to develop an abbreviated form of the Dyadic Adjustment Scale (DAS) with nonparametric item response theory. The authors conducted 5 studies, with a total participation of 8,256 married or cohabiting individuals. Results showed that the item characteristic curves behaved in a monotonically increasing…
ERIC Educational Resources Information Center
Bakir, Saad T.
2010-01-01
We propose a nonparametric (or distribution-free) procedure for testing the equality of several population variances (or scale parameters). The proposed test is a modification of Bakir's (1989, Commun. Statist., Simul-Comp., 18, 757-775) analysis of means by ranks (ANOMR) procedure for testing the equality of several population means. A proof is…
Tangen, C M; Koch, G G
1999-03-01
In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.
NASA Astrophysics Data System (ADS)
Fernandes, Adji Achmad Rinaldo; Solimun, Arisoesilaningsih, Endang
2017-12-01
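The aim of this research is to estimate the spline in path analysis based on nonparametric regression using the penalized weighted least squares (PWLS) approach. The approach used is a reproducing kernel Hilbert space in a Sobolev space. The nonparametric path analysis model is given by the equations $y_{1i} = f_{1.1}(x_{1i}) + \varepsilon_{1i}$; $y_{2i} = f_{1.2}(x_{1i}) + f_{2.2}(y_{1i}) + \varepsilon_{2i}$; $i = 1, 2, \ldots, n$. The nonparametric path analysis estimator that meets the criterion of minimizing the PWLS, $\min_{f_{w.k} \in W_2^m[a_{w.k}, b_{w.k}],\ k = 1, 2} \{ (2n)^{-1} (\tilde{y} - \tilde{f})^T \Sigma^{-1} (\tilde{y} - \tilde{f}) + \sum_{k=1}^{2} \sum_{w=1}^{2} \lambda_{w.k} \int_{a_{w.k}}^{b_{w.k}} [f_{w.k}^{(m)}(x_i)]^2 \, dx_i \}$, is $\hat{\tilde{f}} = A\tilde{y}$ with $A = T_1 (T_1^T U_1^{-1} \Sigma^{-1} T_1)^{-1} T_1^T U_1^{-1} \Sigma^{-1} + V_1 U_1^{-1} \Sigma^{-1} [I - T_1 (T_1^T U_1^{-1} \Sigma^{-1} T_1)^{-1} T_1^T U_1^{-1} \Sigma^{-1}] + T_2 (T_2^T U_2^{-1} \Sigma^{-1} T_2)^{-1} T_2^T U_2^{-1} \Sigma^{-1} + V_2 U_2^{-1} \Sigma^{-1} [I_1 - T_2 (T_2^T U_2^{-1} \Sigma^{-1} T_2)^{-1} T_2^T U_2^{-1} \Sigma^{-1}]$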
NASA Astrophysics Data System (ADS)
Curceac, S.; Ternynck, C.; Ouarda, T.
2015-12-01
Over the past decades, a substantial amount of research has been conducted to model and forecast climatic variables. In this study, Nonparametric Functional Data Analysis (NPFDA) methods are applied to forecast air temperature and wind speed time series in Abu Dhabi, UAE. The dataset consists of hourly measurements recorded for a period of 29 years, 1982-2010. The novelty of the Functional Data Analysis approach is in expressing the data as curves. In the present work, the focus is on daily forecasting and the functional observations (curves) express the daily measurements of the above-mentioned variables. We apply a non-linear regression model with a functional non-parametric kernel estimator. The computation of the estimator is performed using an asymmetrical quadratic kernel function for local weighting based on the bandwidth obtained by a cross validation procedure. The proximities between functional objects are calculated by families of semi-metrics based on derivatives and Functional Principal Component Analysis (FPCA). Additionally, functional conditional mode and functional conditional median estimators are applied and the advantages of combining their results are analysed. A different approach employs a SARIMA model selected according to the minimum Akaike (AIC) and Bayesian (BIC) Information Criteria and based on the residuals of the model. The performance of the models is assessed by calculating error indices such as the root mean square error (RMSE), relative RMSE, BIAS and relative BIAS. The results indicate that the NPFDA models provide more accurate forecasts than the SARIMA models. Key words: Nonparametric functional data analysis, SARIMA, time series forecast, air temperature, wind speed
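The functional kernel estimator described in this abstract can be illustrated with a minimal sketch. It assumes a Nadaraya-Watson style weighted average, a scalar response per curve, and a plain L2 distance between discretised curves in place of the derivative-based and FPCA-based semi-metrics the authors use. The function names and the fixed bandwidth are hypothetical; the abstract selects the bandwidth by cross validation.

```python
import math


def quadratic_kernel(u):
    # Asymmetrical quadratic kernel: supported on [0, 1] only,
    # because distances between curves are never negative.
    return 1.0 - u * u if 0.0 <= u <= 1.0 else 0.0


def l2_distance(curve_a, curve_b):
    # Assumption: a plain L2 distance stands in for the semi-metric.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(curve_a, curve_b)))


def functional_kernel_predict(curves, responses, new_curve, bandwidth):
    """Kernel-weighted average of the responses of nearby training curves."""
    weights = [
        quadratic_kernel(l2_distance(curve, new_curve) / bandwidth)
        for curve in curves
    ]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training curve lies within the bandwidth")
    return sum(w * y for w, y in zip(weights, responses)) / total
```

Curves farther than one bandwidth from the new curve receive zero weight, so the forecast depends only on days whose profile resembles the current one.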
Nonparametric bootstrap analysis with applications to demographic effects in demand functions.
Gozalo, P L
1997-12-01
"A new bootstrap proposal, labeled smooth conditional moment (SCM) bootstrap, is introduced for independent but not necessarily identically distributed data, where the classical bootstrap procedure fails.... A good example of the benefits of using nonparametric and bootstrap methods is the area of empirical demand analysis. In particular, we will be concerned with their application to the study of two important topics: what are the most relevant effects of household demographic variables on demand behavior, and to what extent present parametric specifications capture these effects." excerpt
Bayesian non-parametric inference for stochastic epidemic models using Gaussian Processes.
Xu, Xiaoguang; Kypraios, Theodore; O'Neill, Philip D
2016-10-01
This paper considers novel Bayesian non-parametric methods for stochastic epidemic models. Many standard modeling and data analysis methods use underlying assumptions (e.g. concerning the rate at which new cases of disease will occur) which are rarely challenged or tested in practice. To relax these assumptions, we develop a Bayesian non-parametric approach using Gaussian Processes, specifically to estimate the infection process. The methods are illustrated with both simulated and real data sets, the former illustrating that the methods can recover the true infection process quite well in practice, and the latter illustrating that the methods can be successfully applied in different settings. © The Author 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Sumantari, Y. D.; Slamet, I.; Sugiyanto
2017-06-01
Semiparametric regression is a statistical analysis method that combines parametric and nonparametric regression. There are various approaches to nonparametric regression, one of which is the spline. Central Java is one of the most densely populated provinces in Indonesia. Population density in this province can be modeled by semiparametric regression because it consists of parametric and nonparametric components. Therefore, the purpose of this paper is to determine the factors that influence population density in Central Java using the semiparametric spline regression model. The results show that the factors which influence population density in Central Java are Family Planning (FP) active participants and the district minimum wage.
Generalized Correlation Coefficient for Non-Parametric Analysis of Microarray Time-Course Data.
Tan, Qihua; Thomassen, Mads; Burton, Mark; Mose, Kristian Fredløv; Andersen, Klaus Ejner; Hjelmborg, Jacob; Kruse, Torben
2017-06-06
Modeling complex time-course patterns is a challenging issue in microarray studies due to complex gene expression patterns in response to the time-course experiment. We introduce the generalized correlation coefficient and propose a combinatory approach for detecting, testing and clustering the heterogeneous time-course gene expression patterns. Application of the method identified nonlinear time-course patterns in high agreement with parametric analysis. We conclude that the non-parametric nature of the generalized correlation analysis could be a useful and efficient tool for analyzing microarray time-course data and for exploring the complex relationships in the omics data for studying their association with disease and health.
Halliday, David M; Senik, Mohd Harizal; Stevenson, Carl W; Mason, Rob
2016-08-01
The ability to infer network structure from multivariate neuronal signals is central to computational neuroscience. Directed network analyses typically use parametric approaches based on auto-regressive (AR) models, where networks are constructed from estimates of AR model parameters. However, the validity of using low order AR models for neurophysiological signals has been questioned. A recent article introduced a non-parametric approach to estimate directionality in bivariate data; non-parametric approaches are free from concerns over model validity. We extend the non-parametric framework to include measures of directed conditional independence, using scalar measures that decompose the overall partial correlation coefficient summatively by direction, and a set of functions that decompose the partial coherence summatively by direction. A time domain partial correlation function allows both time and frequency views of the data to be constructed. The conditional independence estimates are conditioned on a single predictor. The framework is applied to simulated cortical neuron networks and mixtures of Gaussian time series data with known interactions. It is applied to experimental data consisting of local field potential recordings from bilateral hippocampus in anaesthetised rats. The framework offers a novel alternative non-parametric approach to estimation of directed interactions in multivariate neuronal recordings, and increased flexibility in dealing with both spike train and time series data. Copyright © 2016 Elsevier B.V. All rights reserved.
TRAN-STAT: statistics for environmental transuranic studies, July 1978, Number 5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
This issue is concerned with nonparametric procedures for (1) estimating the central tendency of a population, (2) describing data sets through estimating percentiles, (3) estimating confidence limits for the median and other percentiles, (4) estimating tolerance limits and associated numbers of samples, and (5) tests of significance and associated procedures for a variety of testing situations (counterparts to t-tests and analysis of variance). Some characteristics of several nonparametric tests are illustrated using the NAEG ²⁴¹Am aliquot data presented and discussed in the April issue of TRAN-STAT. Some of the statistical terms used here are defined in a glossary. The reference list also includes short descriptions of nonparametric books. 31 references, 3 figures, 1 table.
Hanley, James A
2008-01-01
Most survival analysis textbooks explain how the hazard ratio parameters in Cox's life table regression model are estimated. Fewer explain how the components of the nonparametric baseline survivor function are derived. Those that do often relegate the explanation to an "advanced" section and merely present the components as algebraic or iterative solutions to estimating equations. None comment on the structure of these estimators. This note brings out a heuristic representation that may help to de-mystify the structure.
NASA Astrophysics Data System (ADS)
Romero, C.; McWilliam, M.; Macías-Pérez, J.-F.; Adam, R.; Ade, P.; André, P.; Aussel, H.; Beelen, A.; Benoît, A.; Bideaud, A.; Billot, N.; Bourrion, O.; Calvo, M.; Catalano, A.; Coiffard, G.; Comis, B.; de Petris, M.; Désert, F.-X.; Doyle, S.; Goupy, J.; Kramer, C.; Lagache, G.; Leclercq, S.; Lestrade, J.-F.; Mauskopf, P.; Mayet, F.; Monfardini, A.; Pascale, E.; Perotto, L.; Pisano, G.; Ponthieu, N.; Revéret, V.; Ritacco, A.; Roussel, H.; Ruppin, F.; Schuster, K.; Sievers, A.; Triqueneaux, S.; Tucker, C.; Zylka, R.
2018-04-01
Context. In the past decade, sensitive, resolved Sunyaev-Zel'dovich (SZ) studies of galaxy clusters have become common. Whereas many previous SZ studies have parameterized the pressure profiles of galaxy clusters, non-parametric reconstructions will provide insights into the thermodynamic state of the intracluster medium. Aim. We seek to recover the non-parametric pressure profiles of the high redshift (z = 0.89) galaxy cluster CLJ 1226.9+3332 as inferred from SZ data from the MUSTANG, NIKA, Bolocam, and Planck instruments, which all probe different angular scales. Methods. Our non-parametric algorithm makes use of logarithmic interpolation, which under the assumption of ellipsoidal symmetry is analytically integrable. For MUSTANG, NIKA, and Bolocam we derive a non-parametric pressure profile independently and find good agreement among the instruments. In particular, we find that the non-parametric profiles are consistent with a fitted generalized Navarro-Frenk-White (gNFW) profile. Given the ability of Planck to constrain the total signal, we include a prior on the integrated Compton Y parameter as determined by Planck. Results. For a given instrument, constraints on the pressure profile diminish rapidly beyond the field of view. The overlap in spatial scales probed by these four datasets is therefore critical in checking for consistency between instruments. By using multiple instruments, our analysis of CLJ 1226.9+3332 covers a large radial range, from the central regions to the cluster outskirts: 0.05 R500 < r < 1.1 R500. This is a wider range of spatial scales than is typically recovered by SZ instruments. Similar analyses will be possible with the new generation of SZ instruments such as NIKA2 and MUSTANG2.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.
Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification
Feng, Yang; Jiang, Jiancheng; Tong, Xin
2015-01-01
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970
NASA Astrophysics Data System (ADS)
Hasyim, M.; Prastyo, D. D.
2018-03-01
Survival analysis models the relationship between independent variables and survival time as the dependent variable. In practice, not all survival data can be recorded completely, for various reasons; such data are called censored data. Moreover, several models for survival analysis require assumptions. One approach in survival analysis is nonparametric, which relaxes these assumptions. In this research, the nonparametric approach employed is Multivariate Adaptive Regression Splines (MARS). This study aims to measure the performance of private university lecturers. The survival time in this study is the duration needed by a lecturer to obtain their professional certificate. The results show that research activity is a significant factor, along with developing course materials, good publication in international or national journals, and activities in research collaboration.
Identification and estimation of survivor average causal effects.
Tchetgen Tchetgen, Eric J
2014-09-20
In longitudinal studies, outcomes ascertained at follow-up are typically undefined for individuals who die prior to the follow-up visit. In such settings, outcomes are said to be truncated by death and inference about the effects of a point treatment or exposure, restricted to individuals alive at the follow-up visit, could be biased even if as in experimental studies, treatment assignment were randomized. To account for truncation by death, the survivor average causal effect (SACE) defines the effect of treatment on the outcome for the subset of individuals who would have survived regardless of exposure status. In this paper, the author nonparametrically identifies SACE by leveraging post-exposure longitudinal correlates of survival and outcome that may also mediate the exposure effects on survival and outcome. Nonparametric identification is achieved by supposing that the longitudinal data arise from a certain nonparametric structural equations model and by making the monotonicity assumption that the effect of exposure on survival agrees in its direction across individuals. A novel weighted analysis involving a consistent estimate of the survival process is shown to produce consistent estimates of SACE. A data illustration is given, and the methods are extended to the context of time-varying exposures. We discuss a sensitivity analysis framework that relaxes assumptions about independent errors in the nonparametric structural equations model and may be used to assess the extent to which inference may be altered by a violation of key identifying assumptions. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
Identification and estimation of survivor average causal effects
Tchetgen, Eric J Tchetgen
2014-01-01
In longitudinal studies, outcomes ascertained at follow-up are typically undefined for individuals who die prior to the follow-up visit. In such settings, outcomes are said to be truncated by death and inference about the effects of a point treatment or exposure, restricted to individuals alive at the follow-up visit, could be biased even if as in experimental studies, treatment assignment were randomized. To account for truncation by death, the survivor average causal effect (SACE) defines the effect of treatment on the outcome for the subset of individuals who would have survived regardless of exposure status. In this paper, the author nonparametrically identifies SACE by leveraging post-exposure longitudinal correlates of survival and outcome that may also mediate the exposure effects on survival and outcome. Nonparametric identification is achieved by supposing that the longitudinal data arise from a certain nonparametric structural equations model and by making the monotonicity assumption that the effect of exposure on survival agrees in its direction across individuals. A novel weighted analysis involving a consistent estimate of the survival process is shown to produce consistent estimates of SACE. A data illustration is given, and the methods are extended to the context of time-varying exposures. We discuss a sensitivity analysis framework that relaxes assumptions about independent errors in the nonparametric structural equations model and may be used to assess the extent to which inference may be altered by a violation of key identifying assumptions. © 2014 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24889022
Lee, L.; Helsel, D.
2007-01-01
Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.
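The use of Kaplan-Meier methods on left-censored data can be sketched as follows. This is a minimal Python illustration, not the authors' S-language routines: it assumes each observation is either a detected value or a "less than detection limit" value, negates the data so that left-censoring becomes right-censoring, and then applies the usual product-limit formula. The function name `km_left_censored` is hypothetical.

```python
def km_left_censored(values, censored):
    """Kaplan-Meier estimate of P(X < x) at each distinct detected value x.

    values[i]   : measured value, or the detection limit if censored[i]
    censored[i] : True when the observation is reported as "< values[i]"
    Returns (x, P(X < x)) pairs ordered from the largest detected x down.
    """
    # Negating turns "X < limit" into "T > -limit", i.e. right-censoring.
    flipped = [(-v, not c) for v, c in zip(values, censored)]
    event_times = sorted({t for t, is_event in flipped if is_event})

    survival = 1.0
    estimate = []
    for t in event_times:
        at_risk = sum(1 for u, _ in flipped if u >= t)
        events = sum(1 for u, is_event in flipped if is_event and u == t)
        survival *= 1.0 - events / at_risk
        # S(t) = P(T > t) = P(X < -t) on the original scale.
        estimate.append((-t, survival))
    return estimate
```

Because the estimate only changes at detected values, it says nothing below the smallest detection limit or above the largest observation, which matches the abstract's remark that K-M methods neither extrapolate nor interpolate.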
Wavelet Filtering to Reduce Conservatism in Aeroservoelastic Robust Stability Margins
NASA Technical Reports Server (NTRS)
Brenner, Marty; Lind, Rick
1998-01-01
Wavelet analysis for filtering and system identification was used to improve the estimation of aeroservoelastic stability margins. The conservatism of the robust stability margins was reduced with parametric and nonparametric time-frequency analysis of flight data in the model validation process. Nonparametric wavelet processing of data was used to reduce the effects of external disturbances and unmodeled dynamics. Parametric estimates of modal stability were also extracted using the wavelet transform. Computation of robust stability margins for stability boundary prediction depends on uncertainty descriptions derived from the data for model validation. F-18 High Alpha Research Vehicle aeroservoelastic flight test data demonstrated improved robust stability prediction by extension of the stability boundary beyond the flight regime.
Semiparametric mixed-effects analysis of PK/PD models using differential equations.
Wang, Yi; Eskridge, Kent M; Zhang, Shunpu
2008-08-01
Motivated by the use of semiparametric nonlinear mixed-effects modeling on longitudinal data, we develop a new semiparametric modeling approach to address potential structural model misspecification for population pharmacokinetic/pharmacodynamic (PK/PD) analysis. Specifically, we use a set of ordinary differential equations (ODEs) with form dx/dt = A(t)x + B(t) where B(t) is a nonparametric function that is estimated using penalized splines. The inclusion of a nonparametric function in the ODEs makes identification of structural model misspecification feasible by quantifying the model uncertainty and provides flexibility for accommodating possible structural model deficiencies. The resulting model will be implemented in a nonlinear mixed-effects modeling setup for population analysis. We illustrate the method with an application to cefamandole data and evaluate its performance through simulations.
Packham, B; Barnes, G; Dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D
2016-06-01
Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity; such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity.
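One common non-parametric way to correct for multiple testing across voxels is a permutation test on the maximum statistic. The abstract does not say which non-parametric technique was used, so the sketch below is an assumption for illustration only: it takes one effect value per subject and voxel, uses the absolute group mean as the statistic, and enumerates all sign flips. The function name `max_stat_corrected_p` is hypothetical.

```python
from itertools import product


def max_stat_corrected_p(data):
    """Family-wise corrected p-values from a sign-flip permutation test.

    data[s][v] is the effect for subject s at voxel v. Under the null
    hypothesis each subject's sign is exchangeable, so every sign pattern
    is equally likely. Enumerating 2**n patterns is only feasible for
    small n; larger studies would sample patterns at random.
    """
    n_subjects = len(data)
    n_voxels = len(data[0])

    def abs_means(signs):
        return [
            abs(sum(s * row[v] for s, row in zip(signs, data)) / n_subjects)
            for v in range(n_voxels)
        ]

    observed = abs_means([1] * n_subjects)
    # Null distribution of the largest statistic anywhere in the image.
    null_max = [
        max(abs_means(signs)) for signs in product((1, -1), repeat=n_subjects)
    ]
    return [
        sum(m >= obs for m in null_max) / len(null_max) for obs in observed
    ]
```

Comparing each voxel with the distribution of the image-wide maximum controls the family-wise error rate without assuming a smooth random field.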
Packham, B; Barnes, G; dos Santos, G Sato; Aristovich, K; Gilad, O; Ghosh, A; Oh, T; Holder, D
2016-01-01
Electrical impedance tomography (EIT) allows for the reconstruction of internal conductivity from surface measurements. A change in conductivity occurs as ion channels open during neural activity, making EIT a potential tool for functional brain imaging. EIT images can have >10 000 voxels, which means statistical analysis of such images presents a substantial multiple testing problem. One way to optimally correct for these issues and still maintain the flexibility of complicated experimental designs is to use random field theory. This parametric method estimates the distribution of peaks one would expect by chance in a smooth random field of a given size. Random field theory has been used in several other neuroimaging techniques but never validated for EIT images of fast neural activity; such validation can be achieved using non-parametric techniques. Both parametric and non-parametric techniques were used to analyze a set of 22 images collected from 8 rats. Significant group activations were detected using both techniques (corrected p < 0.05). Both parametric and non-parametric analyses yielded similar results, although the latter was less conservative. These results demonstrate the first statistical analysis of such an image set and indicate that such an analysis is an approach for EIT images of neural activity. PMID:27203477
Combined non-parametric and parametric approach for identification of time-variant systems
NASA Astrophysics Data System (ADS)
Dziedziech, Kajetan; Czop, Piotr; Staszewski, Wieslaw J.; Uhl, Tadeusz
2018-03-01
Identification of systems, structures and machines with variable physical parameters is a challenging task especially when time-varying vibration modes are involved. The paper proposes a new combined, two-step - i.e. non-parametric and parametric - modelling approach in order to determine time-varying vibration modes based on input-output measurements. Single-degree-of-freedom (SDOF) vibration modes from multi-degree-of-freedom (MDOF) non-parametric system representation are extracted in the first step with the use of time-frequency wavelet-based filters. The second step involves time-varying parametric representation of extracted modes with the use of recursive linear autoregressive-moving-average with exogenous inputs (ARMAX) models. The combined approach is demonstrated using system identification analysis based on the experimental mass-varying MDOF frame-like structure subjected to random excitation. The results show that the proposed combined method correctly captures the dynamics of the analysed structure, using minimum a priori information on the model.
Spectral analysis method for detecting an element
Blackwood, Larry G [Idaho Falls, ID; Edwards, Andrew J [Idaho Falls, ID; Jewell, James K [Idaho Falls, ID; Reber, Edward L [Idaho Falls, ID; Seabury, Edward H [Idaho Falls, ID
2008-02-12
A method for detecting an element is described and which includes the steps of providing a gamma-ray spectrum which has a region of interest which corresponds with a small amount of an element to be detected; providing nonparametric assumptions about a shape of the gamma-ray spectrum in the region of interest, and which would indicate the presence of the element to be detected; and applying a statistical test to the shape of the gamma-ray spectrum based upon the nonparametric assumptions to detect the small amount of the element to be detected.
Out-of-Sample Extensions for Non-Parametric Kernel Methods.
Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang
2017-02-01
Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.
Further evidence for the increased power of LOD scores compared with nonparametric methods.
Durner, M; Vieland, V J; Greenberg, D A
1999-01-01
In genetic analysis of diseases in which the underlying model is unknown, "model free" methods, such as affected sib pair (ASP) tests, are often preferred over LOD-score methods, although LOD-score methods under the correct or even approximately correct model are more powerful than ASP tests. However, there might be circumstances in which nonparametric methods will outperform LOD-score methods. Recently, Dizier et al. reported that, in some complex two-locus (2L) models, LOD-score methods with segregation analysis-derived parameters had less power to detect linkage than ASP tests. We investigated whether these particular models, in fact, represent a situation in which ASP tests are more powerful than LOD scores. We simulated data according to the parameters specified by Dizier et al. and analyzed the data by using (a) a single locus (SL) LOD-score analysis performed twice, under a simple dominant and a recessive mode of inheritance (MOI), (b) ASP methods, and (c) nonparametric linkage (NPL) analysis. We show that SL analysis performed twice and corrected for the type I-error increase due to multiple testing yields almost as much linkage information as does an analysis under the correct 2L model and is more powerful than either the ASP method or the NPL method. We demonstrate that, even for complex genetic models, the most important condition for linkage analysis is that the assumed MOI at the disease locus being tested is approximately correct, not that the inheritance of the disease per se is correctly specified. In the analysis by Dizier et al., segregation analysis led to estimates of dominance parameters that were grossly misspecified for the locus tested in those models in which ASP tests appeared to be more powerful than LOD-score analyses.
Major strengths and weaknesses of the lod score method.
Ott, J
2001-01-01
Strengths and weaknesses of the lod score method for human genetic linkage analysis are discussed. The main weakness is its requirement for the specification of a detailed inheritance model for the trait. Various strengths are identified. For example, the lod score (likelihood) method has optimality properties when the trait to be studied is known to follow a Mendelian mode of inheritance. The ELOD is a useful measure for information content of the data. The lod score method can emulate various "nonparametric" methods, and this emulation is equivalent to the nonparametric methods. Finally, the possibility of building errors into the analysis will prove to be essential for the large amount of linkage and disequilibrium data expected in the near future.
Schmidt, K; Witte, H
1999-11-01
Recently the assumption of the independence of individual frequency components in a signal has been rejected, for example, for the EEG during defined physiological states such as sleep or sedation [9, 10]. Thus, the use of higher-order spectral analysis capable of detecting interrelations between individual signal components has proved useful. The aim of the present study was to investigate the quality of various non-parametric and parametric estimation algorithms using simulated as well as true physiological data. We employed standard algorithms available for MATLAB. The results clearly show that parametric bispectral estimation is superior to non-parametric estimation in terms of the quality of peak localisation and the discrimination from other peaks.
A SAS(®) macro implementation of a multiple comparison post hoc test for a Kruskal-Wallis analysis.
Elliott, Alan C; Hynan, Linda S
2011-04-01
The Kruskal-Wallis (KW) nonparametric analysis of variance is often used instead of a standard one-way ANOVA when data are from a suspected non-normal population. The KW omnibus procedure tests for some differences between groups, but provides no specific post hoc pairwise comparisons. This paper provides a SAS(®) macro implementation of a multiple comparison test based on significant Kruskal-Wallis results from the SAS NPAR1WAY procedure. The implementation is designed for up to 20 groups at a user-specified alpha significance level. A Monte-Carlo simulation compared this nonparametric procedure to commonly used parametric multiple comparison tests. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
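As a rough illustration of the omnibus step described above, the Kruskal-Wallis H statistic can be computed from pooled ranks. The sketch below is in Python rather than SAS, omits the tie correction, and does not reproduce the paper's post hoc macro; the function names `average_ranks` and `kruskal_wallis_h` are hypothetical.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (assumption: no tie correction applied)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
```

Under the null hypothesis, H is commonly referred to a chi-square distribution with (number of groups minus 1) degrees of freedom; pairwise follow-up comparisons would only be run after a significant omnibus result.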
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression imposes strict assumptions on the form of the model, whereas nonparametric regression does not require an assumed model form. Time series data are observations of a variable recorded over time, so modeling such data by regression first requires determining the response and predictor variables. In a time series, the response variable is the value at time t (yt), while the predictor variables are the significant lags. One developing approach in nonparametric regression modeling is the Fourier series approach. One advantage of nonparametric regression using Fourier series is its ability to handle data that follow a trigonometric pattern. Modeling with Fourier series requires the parameter K, which can be chosen with the Generalized Cross Validation method. In inflation modeling for the transportation, communication, and financial services sector, the Fourier series approach yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
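The choice of K by Generalized Cross Validation can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes a single predictor x, a basis of an intercept plus cos(kx) and sin(kx) for k = 1..K fitted by ordinary least squares, and the usual criterion GCV(K) = (RSS/n) / (1 - p/n)^2 with p = 2K + 1. All function names are hypothetical.

```python
import math


def design_row(x, K):
    """Basis values [1, cos(x), sin(x), ..., cos(Kx), sin(Kx)]."""
    row = [1.0]
    for k in range(1, K + 1):
        row.append(math.cos(k * x))
        row.append(math.sin(k * x))
    return row


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def fit_fourier(xs, ys, K):
    """Least-squares Fourier fit via the normal equations."""
    X = [design_row(x, K) for x in xs]
    p = 2 * K + 1
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    return beta, fitted


def gcv(xs, ys, K):
    """GCV score; the hat-matrix trace equals 2K + 1 for a full-rank basis."""
    n = len(xs)
    _, fitted = fit_fourier(xs, ys, K)
    mse = sum((y - f) ** 2 for y, f in zip(ys, fitted)) / n
    return mse / (1 - (2 * K + 1) / n) ** 2


def choose_k(xs, ys, k_max):
    """Return the K in 1..k_max with the smallest GCV score."""
    return min(range(1, k_max + 1), key=lambda K: gcv(xs, ys, K))
```

The penalty term grows with K, so GCV trades goodness of fit against the number of fitted coefficients; k_max must stay small enough that 2K + 1 is below the sample size.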
The Infinitesimal Jackknife with Exploratory Factor Analysis
ERIC Educational Resources Information Center
Zhang, Guangjian; Preacher, Kristopher J.; Jennrich, Robert I.
2012-01-01
The infinitesimal jackknife, a nonparametric method for estimating standard errors, has been used to obtain standard error estimates in covariance structure analysis. In this article, we adapt it for obtaining standard errors for rotated factor loadings and factor correlations in exploratory factor analysis with sample correlation matrices. Both…
Zeng, Li-ping; Hu, Zheng-mao; Mu, Li-li; Mei, Gui-sen; Lu, Xiu-ling; Zheng, Yong-jun; Li, Pei-jian; Zhang, Ying-xue; Pan, Qian; Long, Zhi-gao; Dai, He-ping; Zhang, Zhuo-hua; Xia, Jia-hui; Zhao, Jing-ping; Xia, Kun
2011-06-01
To investigate the relationship between susceptibility loci in chromosome regions 1q21-25 and 6p21-25 and schizophrenia subtypes in a Chinese population. A genomic scan and parametric and non-parametric analyses were performed on 242 individuals from 36 schizophrenia pedigrees, including 19 paranoid schizophrenia and 17 undifferentiated schizophrenia pedigrees, from Henan province of China using 5 microsatellite markers in the chromosome region 1q21-25 and 8 microsatellite markers in the chromosome region 6p21-25, which were candidate regions in previous studies. All affected subjects were diagnosed and typed according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revised (DSM-IV-TR; American Psychiatric Association, 2000). All subjects signed informed consent. In chromosome 1, parametric analysis under the dominant inheritance mode of all 36 pedigrees showed that the maximum multi-point heterogeneity log of odds (HLOD) score was 1.33 (α = 0.38). In the non-parametric analysis, the single-point and multi-point nonparametric linkage (NPL) scores suggested linkage at D1S484, D1S2878, and D1S196. In the 19 paranoid schizophrenia pedigrees, linkage was not observed for any of the 5 markers. In the 17 undifferentiated schizophrenia pedigrees, the multi-point NPL score was 1.60 (P = 0.0367) at D1S484. The single-point NPL score was 1.95 (P = 0.0145) and the multi-point NPL score was 2.39 (P = 0.0041) at D1S2878. Additionally, the multi-point NPL score was 1.74 (P = 0.0255) at D1S196. These same three loci showed suggestive linkage during the integrative analysis of all 36 pedigrees. In chromosome 6, in the parametric linkage analysis under dominant and recessive inheritance and the non-parametric linkage analysis of all 36 pedigrees and of the 17 undifferentiated schizophrenia pedigrees, linkage was not observed for any of the 8 markers. In the 19 paranoid schizophrenia pedigrees, parametric analysis showed that under the recessive inheritance mode the maximum single-point HLOD score was 1.26 (α = 0.40) and the multi-point HLOD was 1.12 (α = 0.38) at D6S289 in chromosome region 6p23. In nonparametric analysis, the single-point NPL score was 1.52 (P = 0.0402) and the multi-point NPL score was 1.92 (P = 0.0206) at D6S289. Susceptibility genes for undifferentiated schizophrenia, indicated by loci D1S484, D1S2878, and D1S196, and those for paranoid schizophrenia, indicated by locus D6S289, are likely present in chromosome regions 1q23.3 and 1q24.2, and chromosome region 6p23, respectively.
Nonparametric evaluation of birth cohort trends in disease rates.
Tarone, R E; Chu, K C
2000-01-01
Although interpretation of age-period-cohort analyses is complicated by the non-identifiability of maximum likelihood estimates, changes in the slope of the birth-cohort effect curve are identifiable and have potential aetiologic significance. A nonparametric test for a change in the slope of the birth-cohort trend has been developed. The test is a generalisation of the sign test and is based on permutational distributions. A method for identifying interactions between age and calendar-period effects is also presented. The nonparametric method is shown to be powerful in detecting changes in the slope of the birth-cohort trend, although its power can be reduced considerably by calendar-period patterns of risk. The method identifies a previously unidentified decrease in the birth-cohort risk of lung-cancer mortality from 1912 to 1919, which appears to reflect a reduction in the initiation of smoking by young men at the beginning of the Great Depression (1930s). The method also detects an interaction between age and calendar period in leukemia mortality rates, reflecting the better response of children to chemotherapy. The proposed nonparametric method provides a data analytic approach, which is a useful adjunct to log-linear Poisson analysis of age-period-cohort models, either in the initial model building stage, or in the final interpretation stage.
Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data
Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao
2012-01-01
Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions. PMID:23645976
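The three-step procedure (fit a parametric guide, smooth what remains, add the guide back) can be sketched in a deliberately simplified setting. The assumptions here go beyond the abstract: a single covariate, an identity link, a straight-line guide, and a Gaussian-kernel (Nadaraya-Watson) smoother with a hand-picked bandwidth h. The function names are hypothetical and this is not the authors' estimator.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope: the parametric guide."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def kernel_smooth(xs, rs, x0, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel of bandwidth h."""
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return sum(wi * r for wi, r in zip(w, rs)) / sum(w)


def guided_fit(xs, ys, h):
    """Return a predictor: parametric guide plus smoothed residuals."""
    a, b = fit_line(xs, ys)
    # Step 2: remove the parametric trend.
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]

    def predict(x0):
        # Step 3: smooth the remainder and add the trend back.
        return a + b * x0 + kernel_smooth(xs, resid, x0, h)

    return predict
```

When the guide is close to the true regression function, the residuals are nearly flat and the smoother has little left to estimate. A plain kernel smoother applied directly to the same data would have to recover the whole trend on its own.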
Nonparametric analysis of bivariate gap time with competing risks.
Huang, Chiung-Yu; Wang, Chenguang; Wang, Mei-Cheng
2016-09-01
This article considers nonparametric methods for studying recurrent disease and death with competing risks. We first point out that comparisons based on the well-known cumulative incidence function can be confounded by different prevalence rates of the competing events, and that comparisons of the conditional distribution of the survival time given the failure event type are more relevant for investigating the prognosis of different patterns of recurrence disease. We then propose nonparametric estimators for the conditional cumulative incidence function as well as the conditional bivariate cumulative incidence function for the bivariate gap times, that is, the time to disease recurrence and the residual lifetime after recurrence. To quantify the association between the two gap times in the competing risks setting, a modified Kendall's tau statistic is proposed. The proposed estimators for the conditional bivariate cumulative incidence distribution and the association measure account for the induced dependent censoring for the second gap time. Uniform consistency and weak convergence of the proposed estimators are established. Hypothesis testing procedures for two-sample comparisons are discussed. Numerical simulation studies with practical sample sizes are conducted to evaluate the performance of the proposed nonparametric estimators and tests. An application to data from a pancreatic cancer study is presented to illustrate the methods developed in this article. © 2016, The International Biometric Society.
Wang, Ching-Yun; Cullings, Harry; Song, Xiao; Kopecky, Kenneth J.
2017-01-01
SUMMARY Observational epidemiological studies often confront the problem of estimating exposure-disease relationships when the exposure is not measured exactly. In the paper, we investigate exposure measurement error in excess relative risk regression, which is a widely used model in radiation exposure effect research. In the study cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies a generalized version of the classical additive measurement error model, but it may or may not have repeated measurements. In addition, an instrumental variable is available for individuals in a subset of the whole cohort. We develop a nonparametric correction (NPC) estimator using data from the subcohort, and further propose a joint nonparametric correction (JNPC) estimator using all observed data to adjust for exposure measurement error. An optimal linear combination estimator of JNPC and NPC is further developed. The proposed estimators are nonparametric, which are consistent without imposing a covariate or error distribution, and are robust to heteroscedastic errors. Finite sample performance is examined via a simulation study. We apply the developed methods to data from the Radiation Effects Research Foundation, in which chromosome aberration is used to adjust for the effects of radiation dose measurement error on the estimation of radiation dose responses. PMID:29354018
Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data.
Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao
2013-01-01
Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions.
Robust neural network with applications to credit portfolio data analysis.
Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun
2010-01-01
In this article, we study nonparametric conditional quantile estimation via neural network structure. We propose an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm is developed for optimization. A Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in a real data application demonstrates the advantage of the newly proposed procedure.
Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.
Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo
2017-01-01
Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.
Efficiency Analysis of Public Universities in Thailand
ERIC Educational Resources Information Center
Kantabutra, Saranya; Tang, John C. S.
2010-01-01
This paper examines the performance of Thai public universities in terms of efficiency, using a non-parametric approach called data envelopment analysis. Two efficiency models, the teaching efficiency model and the research efficiency model, are developed and the analysis is conducted at the faculty level. Further statistical analyses are also…
A Rational Analysis of the Acquisition of Multisensory Representations
ERIC Educational Resources Information Center
Yildirim, Ilker; Jacobs, Robert A.
2012-01-01
How do people learn multisensory, or amodal, representations, and what consequences do these representations have for perceptual performance? We address this question by performing a rational analysis of the problem of learning multisensory representations. This analysis makes use of a Bayesian nonparametric model that acquires latent multisensory…
Target Identification Using Harmonic Wavelet Based ISAR Imaging
NASA Astrophysics Data System (ADS)
Shreyamsha Kumar, B. K.; Prabhakar, B.; Suryanarayana, K.; Thilagavathi, V.; Rajagopal, R.
2006-12-01
A new approach has been proposed to reduce the computations involved in the ISAR imaging, which uses harmonic wavelet-(HW) based time-frequency representation (TFR). Since the HW-based TFR falls into a category of nonparametric time-frequency (T-F) analysis tool, it is computationally efficient compared to parametric T-F analysis tools such as adaptive joint time-frequency transform (AJTFT), adaptive wavelet transform (AWT), and evolutionary AWT (EAWT). Further, the performance of the proposed method of ISAR imaging is compared with the ISAR imaging by other nonparametric T-F analysis tools such as short-time Fourier transform (STFT) and Choi-Williams distribution (CWD). In the ISAR imaging, the use of HW-based TFR provides similar/better results with significant (92%) computational advantage compared to that obtained by CWD. The ISAR images thus obtained are identified using a neural network-based classification scheme with feature set invariant to translation, rotation, and scaling.
NASA Astrophysics Data System (ADS)
Yang, Yang; Peng, Zhike; Dong, Xingjian; Zhang, Wenming; Clifton, David A.
2018-03-01
A challenge in analysing non-stationary multi-component signals is to isolate nonlinearly time-varying signals, especially when they overlap in the time-frequency plane. In this paper, a framework integrating time-frequency analysis-based demodulation and a non-parametric Gaussian latent feature model is proposed to isolate and recover components of such signals. The former aims to remove high-order frequency modulation (FM) such that the latter is able to infer demodulated components while simultaneously discovering the number of the target components. The proposed method is effective in isolating multiple components that have the same FM behavior. In addition, the results show that the proposed method is superior to the generalised demodulation with singular-value decomposition-based method, the parametric time-frequency analysis with filter-based method, and the empirical mode decomposition-based method in recovering the amplitude and phase of superimposed components.
NASA Astrophysics Data System (ADS)
Hastuti, S.; Harijono; Murtini, E. S.; Fibrianto, K.
2018-03-01
This study aimed to investigate the use of parametric and non-parametric approaches for the sensory RATA (Rate-All-That-Apply) method. Ledre, a local food product unique to Bojonegoro, was used as the point of interest, and 319 panelists were involved in the study. The results showed that ledre is characterized by an easily crushed texture, stickiness in the mouth, a stingy sensation, and ease of swallowing. It also has a strong banana flavour and a brown colour. Compared to eggroll and semprong, ledre shows more variance in taste as well as in roll length. As the RATA questionnaire is designed to collect categorical data, the non-parametric approach is the common statistical procedure. However, similar results were also obtained with the parametric approach, despite the non-normally distributed data. This suggests that the parametric approach can be applicable for consumer studies with a large number of respondents, even though it may not satisfy the assumptions of ANOVA (Analysis of Variance).
Lu, Tao
2016-01-01
The gene regulation network (GRN) evaluates the interactions between genes and looks for models to describe the gene expression behavior. These models have many applications; for instance, by characterizing the gene expression mechanisms that cause certain disorders, it would be possible to target those genes to block the progress of the disease. Many biological processes are driven by nonlinear dynamic GRN. In this article, we propose a nonparametric differential equation (ODE) to model the nonlinear dynamic GRN. Specifically, we address the following questions simultaneously: (i) extract information from noisy time course gene expression data; (ii) model the nonlinear ODE through a nonparametric smoothing function; (iii) identify the important regulatory gene(s) through a group smoothly clipped absolute deviation (SCAD) approach; (iv) test the robustness of the model against possible shortening of experimental duration. We illustrate the usefulness of the model and associated statistical methods through a simulation and a real application example.
Network structure exploration in networks with node attributes
NASA Astrophysics Data System (ADS)
Chen, Yi; Wang, Xiaolong; Bu, Junzhao; Tang, Buzhou; Xiang, Xin
2016-05-01
Complex networks provide a powerful way to represent complex systems and have been widely studied during the past several years. One of the most important tasks of network analysis is to detect structures (also called structural regularities) embedded in networks by determining group number and group partition. Most network structure exploration models only consider network links. However, in real world networks, nodes may have attributes that are useful for network structure exploration. In this paper, we propose a novel Bayesian nonparametric (BNP) model to explore structural regularities in networks with node attributes, called the Bayesian nonparametric attribute (BNPA) model. This model not only takes full advantage of both links between nodes and node attributes for group partition via shared hidden variables, but also determines the group number automatically via Bayesian nonparametric theory. Experiments conducted on a number of real and synthetic networks show that our BNPA model is able to automatically explore structural regularities in networks with node attributes and is competitive with other state-of-the-art models.
Non-Parametric Collision Probability for Low-Velocity Encounters
NASA Technical Reports Server (NTRS)
Carpenter, J. Russell
2007-01-01
An implicit, but not necessarily obvious, assumption in all of the current techniques for assessing satellite collision probability is that the relative position uncertainty is perfectly correlated in time. If there is any mis-modeling of the dynamics in the propagation of the relative position error covariance matrix, time-wise de-correlation of the uncertainty will increase the probability of collision over a given time interval. The paper gives some examples that illustrate this point. This paper argues that, for the present, Monte Carlo analysis is the best available tool for handling low-velocity encounters, and suggests some techniques for addressing the issues just described. One proposal is for the use of a non-parametric technique that is widely used in actuarial and medical studies. The other suggestion is that accurate process noise models be used in the Monte Carlo trials to which the non-parametric estimate is applied. A further contribution of this paper is a description of how the time-wise decorrelation of uncertainty increases the probability of collision.
Sharma, Andy
2017-06-01
The purpose of this study was to showcase an advanced methodological approach to model disability and institutional entry. Both of these are important areas to investigate given the on-going aging of the United States population. By 2020, approximately 15% of the population will be 65 years and older. Many of these older adults will experience disability and require formal care. A probit analysis was employed to determine which disabilities were associated with admission into an institution (i.e. long-term care). Since this framework imposes strong distributional assumptions, misspecification leads to inconsistent estimators. To overcome such a short-coming, this analysis extended the probit framework by employing an advanced semi-nonparametric maximum likelihood estimation utilizing Hermite polynomial expansions. Specification tests show semi-nonparametric estimation is preferred over probit. In terms of the estimates, semi-nonparametric ratios equal 42 for cognitive difficulty, 64 for independent living, and 111 for self-care disability while probit yields much smaller estimates of 19, 30, and 44, respectively. Public health professionals can use these results to better understand why certain interventions have not shown promise. Equally important, healthcare workers can use this research to evaluate which type of treatment plans may delay institutionalization and improve the quality of life for older adults. Implications for rehabilitation: With on-going global aging, understanding the association between disability and institutional entry is important in devising successful rehabilitation interventions. Semi-nonparametric estimation is preferred to probit and shows that ambulatory and cognitive impairments present high risk for institutional entry (long-term care). Informal caregiving and home-based care require further examination as forms of rehabilitation/therapy for certain types of disabilities.
NASA Astrophysics Data System (ADS)
Constantinescu, C. C.; Yoder, K. K.; Kareken, D. A.; Bouman, C. A.; O'Connor, S. J.; Normandin, M. D.; Morris, E. D.
2008-03-01
We previously developed a model-independent technique (non-parametric ntPET) for extracting the transient changes in neurotransmitter concentration from paired (rest & activation) PET studies with a receptor ligand. To provide support for our method, we introduced three hypotheses of validation based on work by Endres and Carson (1998 J. Cereb. Blood Flow Metab. 18 1196-210) and Yoder et al (2004 J. Nucl. Med. 45 903-11), and tested them on experimental data. All three hypotheses describe relationships between the estimated free (synaptic) dopamine curves (FDA(t)) and the change in binding potential (ΔBP). The veracity of the FDA(t) curves recovered by nonparametric ntPET is supported when the data adhere to the following hypothesized behaviors: (1) ΔBP should decline with increasing DA peak time, (2) ΔBP should increase as the strength of the temporal correlation between FDA(t) and the free raclopride (FRAC(t)) curve increases, (3) ΔBP should decline linearly with the effective weighted availability of the receptor sites. We analyzed regional brain data from 8 healthy subjects who received two [11C]raclopride scans: one at rest, and one during which unanticipated IV alcohol was administered to stimulate dopamine release. For several striatal regions, nonparametric ntPET was applied to recover FDA(t), and binding potential values were determined. Kendall rank-correlation analysis confirmed that the FDA(t) data followed the expected trends for all three validation hypotheses. Our findings lend credence to our model-independent estimates of FDA(t). Application of nonparametric ntPET may yield important insights into how alterations in timing of dopaminergic neurotransmission are involved in the pathologies of addiction and other psychiatric disorders.
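The Kendall rank correlation used to check the three validation hypotheses counts concordant and discordant pairs. The sketch below computes the tau-a variant, which makes no adjustment for ties; whether the authors used tau-a or a tie-corrected variant is not stated, so this choice is an assumption, and the function name is hypothetical.

```python
def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = 0
    discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
            # Pairs tied in x or y (s == 0) count toward neither total.
    return (concordant - discordant) / (n * (n - 1) / 2)
```

A value near -1 would correspond to a monotone decline, such as the hypothesized decrease of ΔBP with increasing DA peak time, and a value near +1 to a monotone increase.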
A Bayesian nonparametric method for prediction in EST analysis
Lijoi, Antonio; Mena, Ramsés H; Prünster, Igor
2007-01-01
Background Expressed sequence tags (ESTs) analyses are a fundamental tool for gene identification in organisms. Given a preliminary EST sample from a certain library, several statistical prediction problems arise. In particular, it is of interest to estimate how many new genes can be detected in a future EST sample of given size and also to determine the gene discovery rate: these estimates represent the basis for deciding whether to proceed sequencing the library and, in case of a positive decision, a guideline for selecting the size of the new sample. Such information is also useful for establishing sequencing efficiency in experimental design and for measuring the degree of redundancy of an EST library. Results In this work we propose a Bayesian nonparametric approach for tackling statistical problems related to EST surveys. In particular, we provide estimates for: a) the coverage, defined as the proportion of unique genes in the library represented in the given sample of reads; b) the number of new unique genes to be observed in a future sample; c) the discovery rate of new genes as a function of the future sample size. The Bayesian nonparametric model we adopt conveys, in a statistically rigorous way, the available information into prediction. Our proposal has appealing properties over frequentist nonparametric methods, which become unstable when prediction is required for large future samples. EST libraries, previously studied with frequentist methods, are analyzed in detail. Conclusion The Bayesian nonparametric approach we undertake yields valuable tools for gene capture and prediction in EST libraries. The estimators we obtain do not feature the kind of drawbacks associated with frequentist estimators and are reliable for any size of the additional sample. PMID:17868445
Nonparametric estimation and testing of fixed effects panel data models
Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi
2009-01-01
In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335
Muñoz–Negrete, Francisco J.; Oblanca, Noelia; Rebolleda, Gema
2018-01-01
Purpose To study the structure-function relationship in glaucoma and healthy patients assessed with Spectralis OCT and Humphrey perimetry using new statistical approaches. Materials and Methods Eighty-five eyes were prospectively selected and divided into 2 groups: glaucoma (44) and healthy patients (41). Three different statistical approaches were carried out: (1) factor analysis of the threshold sensitivities (dB) (automated perimetry) and the macular thickness (μm) (Spectralis OCT), subsequently applying Pearson's correlation to the obtained regions, (2) nonparametric regression analysis relating the values in each pair of regions that showed significant correlation, and (3) nonparametric spatial regressions using three models designed for the purpose of this study. Results In the glaucoma group, a map that relates structural and functional damage was drawn. The strongest correlation with visual fields was observed in the peripheral nasal region of both superior and inferior hemigrids (r = 0.602 and r = 0.458, resp.). The estimated functions obtained with the nonparametric regressions provided the mean sensitivity that corresponds to each given macular thickness. These functions allowed for accurate characterization of the structure-function relationship. Conclusions Both maps and point-to-point functions obtained linking structure and function damage contribute to a better understanding of this relationship and may help in the future to improve glaucoma diagnosis. PMID:29850196
Martina, R; Kay, R; van Maanen, R; Ridder, A
2015-01-01
Clinical studies in overactive bladder have traditionally used analysis of covariance or nonparametric methods to analyse the number of incontinence episodes and other count data. It is known that if the underlying distributional assumptions of a particular parametric method do not hold, an alternative parametric method may be more efficient than a nonparametric one, which makes no assumptions regarding the underlying distribution of the data. Therefore, there are advantages in using methods based on the Poisson distribution or extensions of that method, which incorporate specific features that provide a modelling framework for count data. One challenge with count data is overdispersion, but methods are available that can account for this through the introduction of random effect terms in the modelling, and it is this modelling framework that leads to the negative binomial distribution. These models can also provide clinicians with a clearer and more appropriate interpretation of treatment effects in terms of rate ratios. In this paper, the previously used parametric and non-parametric approaches are contrasted with those based on Poisson regression and various extensions in trials evaluating solifenacin and mirabegron in patients with overactive bladder. In these applications, negative binomial models are seen to fit the data well. Copyright © 2014 John Wiley & Sons, Ltd.
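The step from a random effect to the negative binomial distribution described in this abstract can be checked numerically. The following is a minimal Python sketch, not the authors' analysis: it assumes a gamma-distributed random effect with mean 1, and the function names, the mean of 4 episodes, and the shape value of 2 are illustrative choices only.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_counts(n, mu, shape=None, seed=0):
    """Simulate n counts with mean mu.

    With shape=None the counts are plain Poisson. Otherwise each subject
    gets a gamma random effect with mean 1 and variance 1/shape, which
    makes the marginal distribution negative binomial.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        effect = rng.gammavariate(shape, 1.0 / shape) if shape else 1.0
        counts.append(poisson_draw(rng, mu * effect))
    return counts


pois = simulate_counts(20000, mu=4.0)
negbin = simulate_counts(20000, mu=4.0, shape=2.0)

# Variance-to-mean ratio: about 1 for Poisson, about 1 + mu/shape otherwise.
print("Poisson ratio:", variance(pois) / mean(pois))
print("Gamma-mixed ratio:", variance(negbin) / mean(negbin))
```

Under these assumptions the plain Poisson counts have a variance close to their mean, while the gamma-mixed counts have a variance near mu + mu^2/shape, which is the overdispersion that the negative binomial model is meant to absorb.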
NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES
He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.
2017-01-01
Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225
Theory and Application of DNA Histogram Analysis.
ERIC Educational Resources Information Center
Bagwell, Charles Bruce
The underlying principles and assumptions associated with DNA histograms are discussed along with the characteristics of fluorescent probes. Information theory was described and used to calculate the information content of a DNA histogram. Two major types of DNA histogram analyses are proposed: parametric and nonparametric analysis. Three levels…
HBCU Efficiency and Endowments: An Exploratory Analysis
ERIC Educational Resources Information Center
Coupet, Jason; Barnum, Darold
2010-01-01
Discussions of efficiency among Historically Black Colleges and Universities (HBCUs) are often missing in academic conversations. This article seeks to assess efficiency of individual HBCUs using Data Envelopment Analysis (DEA), a non-parametric technique that can synthesize multiple inputs and outputs to determine a single efficiency score for…
Analyzing Single-Molecule Time Series via Nonparametric Bayesian Inference
Hines, Keegan E.; Bankston, John R.; Aldrich, Richard W.
2015-01-01
The ability to measure the properties of proteins at the single-molecule level offers an unparalleled glimpse into biological systems at the molecular scale. The interpretation of single-molecule time series has often been rooted in statistical mechanics and the theory of Markov processes. While existing analysis methods have been useful, they are not without significant limitations including problems of model selection and parameter nonidentifiability. To address these challenges, we introduce the use of nonparametric Bayesian inference for the analysis of single-molecule time series. These methods provide a flexible way to extract structure from data instead of assuming models beforehand. We demonstrate these methods with applications to several diverse settings in single-molecule biophysics. This approach provides a well-constrained and rigorously grounded method for determining the number of biophysical states underlying single-molecule data. PMID:25650922
Bansal, Ravi; Peterson, Bradley S
2018-06-01
Identifying regional effects of interest in MRI datasets usually entails testing a priori hypotheses across many thousands of brain voxels, requiring control for false positive findings in these multiple hypotheses testing. Recent studies have suggested that parametric statistical methods may have incorrectly modeled functional MRI data, thereby leading to higher false positive rates than their nominal rates. Nonparametric methods for statistical inference when conducting multiple statistical tests, in contrast, are thought to produce false positives at the nominal rate, which has thus led to the suggestion that previously reported studies should reanalyze their fMRI data using nonparametric tools. To understand better why parametric methods may yield excessive false positives, we assessed their performance when applied both to simulated datasets of 1D, 2D, and 3D Gaussian Random Fields (GRFs) and to 710 real-world, resting-state fMRI datasets. We showed that both the simulated 2D and 3D GRFs and the real-world data contain a small percentage (<6%) of very large clusters (on average 60 times larger than the average cluster size), which were not present in 1D GRFs. These unexpectedly large clusters were deemed statistically significant using parametric methods, leading to empirical familywise error rates (FWERs) as high as 65%: the high empirical FWERs were not a consequence of parametric methods failing to model spatial smoothness accurately, but rather of these very large clusters that are inherently present in smooth, high-dimensional random fields. In fact, when discounting these very large clusters, the empirical FWER for parametric methods was 3.24%. Furthermore, even an empirical FWER of 65% would yield on average less than one of those very large clusters in each brain-wide analysis. 
Nonparametric methods, in contrast, estimated distributions from those large clusters, and therefore, by construct rejected the large clusters as false positives at the nominal FWERs. Those rejected clusters were outlying values in the distribution of cluster size but cannot be distinguished from true positive findings without further analyses, including assessing whether fMRI signal in those regions correlates with other clinical, behavioral, or cognitive measures. Rejecting the large clusters, however, significantly reduced the statistical power of nonparametric methods in detecting true findings compared with parametric methods, which would have detected most true findings that are essential for making valid biological inferences in MRI data. Parametric analyses, in contrast, detected most true findings while generating relatively few false positives: on average, less than one of those very large clusters would be deemed a true finding in each brain-wide analysis. We therefore recommend the continued use of parametric methods that model nonstationary smoothness for cluster-level, familywise control of false positives, particularly when using a Cluster Defining Threshold of 2.5 or higher, and subsequently assessing rigorously the biological plausibility of the findings, even for large clusters. Finally, because nonparametric methods yielded a large reduction in statistical power to detect true positive findings, we conclude that the modest reduction in false positive findings that nonparametric analyses afford does not warrant a re-analysis of previously published fMRI studies using nonparametric techniques. Copyright © 2018 Elsevier Inc. All rights reserved.
Proceedings of the Third Annual Symposium on Mathematical Pattern Recognition and Image Analysis
NASA Technical Reports Server (NTRS)
Guseman, L. F., Jr.
1985-01-01
Topics addressed include: multivariate spline method; normal mixture analysis applied to remote sensing; image data analysis; classifications in spatially correlated environments; probability density functions; graphical nonparametric methods; subpixel registration analysis; hypothesis integration in image understanding systems; rectification of satellite scanner imagery; spatial variation in remotely sensed images; smooth multidimensional interpolation; and optimal frequency domain textural edge detection filters.
Potential linkage for schizophrenia on chromosome 22q12-q13: A replication study
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schwab, S.G.; Bondy, B.; Wildenauer, D.B.
1995-10-09
In an attempt to replicate a potential linkage on chromosome 22q12-q13.1 reported by Pulver et al., we have analyzed 4 microsatellite markers which span this chromosomal region, including the IL2RB locus, for linkage with schizophrenia in 30 families from Israel and Germany. Linkage analysis by pairwise lod score analysis as well as by multipoint analysis did not provide evidence for a single major gene locus. However, a lod score of Z_max = 0.612 was obtained for a dominant model of inheritance with the marker D22S304 at recombination fraction 0.2 by pairwise analysis. In addition, using a nonparametric method, sib pair analysis, a P value of 0.068 corresponding to a lod score of 0.48 was obtained for this marker. This finding, together with those of Pulver et al., is suggestive of a genetic factor in this region, predisposing for schizophrenia in a subset of families. Further studies using nonparametric methods should be conducted in order to clarify this point. 32 refs., 1 fig., 4 tabs.
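The correspondence between the P value of 0.068 and the lod score of 0.48 can be reproduced with a common conversion. The sketch below is an assumption about the convention, not something the abstract states: it treats the P value as one-sided, maps it to a standard normal deviate z, and uses lod = z^2 / (2 ln 10). The function name is hypothetical.

```python
import math
from statistics import NormalDist


def lod_from_one_sided_p(p):
    """Convert a one-sided P value to a lod score via a 1-df chi-square."""
    z = NormalDist().inv_cdf(1.0 - p)       # normal deviate for the P value
    return z * z / (2.0 * math.log(10.0))   # chi-square / (2 ln 10)


lod = lod_from_one_sided_p(0.068)
print(round(lod, 2))
```

With this convention the value rounds to 0.48, matching the figure quoted in the abstract.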
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. 
This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Nonparametric Estimation of Standard Errors in Covariance Analysis Using the Infinitesimal Jackknife
ERIC Educational Resources Information Center
Jennrich, Robert I.
2008-01-01
The infinitesimal jackknife provides a simple general method for estimating standard errors in covariance structure analysis. Beyond its simplicity and generality what makes the infinitesimal jackknife method attractive is that essentially no assumptions are required to produce consistent standard error estimates, not even the requirement that the…
Can Percentiles Replace Raw Scores in the Statistical Analysis of Test Data?
ERIC Educational Resources Information Center
Zimmerman, Donald W.; Zumbo, Bruno D.
2005-01-01
Educational and psychological testing textbooks typically warn of the inappropriateness of performing arithmetic operations and statistical analysis on percentiles instead of raw scores. This seems inconsistent with the well-established finding that transforming scores to ranks and using nonparametric methods often improves the validity and power…
A Bayesian Nonparametric Meta-Analysis Model
ERIC Educational Resources Information Center
Karabatsos, George; Talbott, Elizabeth; Walker, Stephen G.
2015-01-01
In a meta-analysis, it is important to specify a model that adequately describes the effect-size distribution of the underlying population of studies. The conventional normal fixed-effect and normal random-effects models assume a normal effect-size population distribution, conditionally on parameters and covariates. For estimating the mean overall…
Exploring Rating Quality in Rater-Mediated Assessments Using Mokken Scale Analysis
ERIC Educational Resources Information Center
Wind, Stefanie A.; Engelhard, George, Jr.
2016-01-01
Mokken scale analysis is a probabilistic nonparametric approach that offers statistical and graphical tools for evaluating the quality of social science measurement without placing potentially inappropriate restrictions on the structure of a data set. In particular, Mokken scaling provides a useful method for evaluating important measurement…
Exploring Incomplete Rating Designs with Mokken Scale Analysis
ERIC Educational Resources Information Center
Wind, Stefanie A.; Patil, Yogendra J.
2018-01-01
Recent research has explored the use of models adapted from Mokken scale analysis as a nonparametric approach to evaluating rating quality in educational performance assessments. A potential limiting factor to the widespread use of these techniques is the requirement for complete data, as practical constraints in operational assessment systems…
A neural network approach to cloud classification
NASA Technical Reports Server (NTRS)
Lee, Jonathan; Weger, Ronald C.; Sengupta, Sailes K.; Welch, Ronald M.
1990-01-01
It is shown that, using high-spatial-resolution data, very high cloud classification accuracies can be obtained with a neural network approach. A texture-based neural network classifier using only single-channel visible Landsat MSS imagery achieves an overall cloud identification accuracy of 93 percent. Cirrus can be distinguished from boundary layer cloudiness with an accuracy of 96 percent, without the use of an infrared channel. Stratocumulus is retrieved with an accuracy of 92 percent, cumulus at 90 percent. The use of the neural network does not improve cirrus classification accuracy. Rather, its main effect is in the improved separation between stratocumulus and cumulus cloudiness. While most cloud classification algorithms rely on linear parametric schemes, the present study is based on a nonlinear, nonparametric four-layer neural network approach. A three-layer neural network architecture, the nonparametric K-nearest neighbor approach, and the linear stepwise discriminant analysis procedure are compared. A notable finding is that significantly higher accuracies are attained with the nonparametric approaches using only 20 percent of the database as training data, compared to 67 percent of the database in the linear approach.
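For readers unfamiliar with the K-nearest neighbor approach named in this abstract, a minimal Python sketch follows. It is illustrative only: the two "texture" features, their values, and the class labels are invented for the example and do not come from the study.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional texture features for two cloud classes.
train_points = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
                (0.80, 0.90), (0.90, 0.80), (0.85, 0.75)]
train_labels = ["cirrus"] * 3 + ["cumulus"] * 3

first = knn_predict(train_points, train_labels, (0.2, 0.2))
second = knn_predict(train_points, train_labels, (0.8, 0.8))
print(first, second)
```

No distributional form is assumed for either class, which is what makes the method nonparametric: the decision depends only on distances to the stored training samples.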
Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.
Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R
2018-06-01
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of diagnostic statistics available for them is small in comparison to their parametric counterparts. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the Hilbert-Schmidt independence criterion (HSIC). If dependence exists, the model does not capture all the variability in the outcome associated with the covariates; otherwise, the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.
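The residual-versus-covariate dependence check described in this abstract can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' procedure: a straight-line least-squares fit stands in for the linear smoother, a permutation test is swapped in for the bootstrap, the Gaussian kernel bandwidth is fixed at 1.0, and all names and the simulated data are hypothetical.

```python
import math
import random


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel for a list of scalars."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix (H K H with H = I - 11'/n)."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand
             for j in range(n)] for i in range(n)]


def hsic(centered_k, gram_l, order):
    """Biased HSIC estimate, with the second sample reordered by `order`."""
    n = len(order)
    total = 0.0
    for i in range(n):
        row_k, row_l = centered_k[i], gram_l[order[i]]
        for j in range(n):
            total += row_k[j] * row_l[order[j]]
    return total / (n * n)


def hsic_permutation_pvalue(x, residuals, n_perm=200, seed=0):
    """P value for independence of x and residuals by permuting residuals."""
    rng = random.Random(seed)
    ck = center(gaussian_gram(x))
    gl = gaussian_gram(residuals)
    identity = list(range(len(x)))
    observed = hsic(ck, gl, identity)
    exceed = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)
        if hsic(ck, gl, perm) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


# Simulated data with a quadratic signal, deliberately fitted by a line.
rng = random.Random(1)
x = [rng.uniform(-2.0, 2.0) for _ in range(60)]
y = [xi ** 2 + rng.gauss(0.0, 0.3) for xi in x]
mx, my = sum(x) / len(x), sum(y) / len(y)
slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
residuals = [b - my - slope * (a - mx) for a, b in zip(x, y)]

p_value = hsic_permutation_pvalue(x, residuals)
print("p-value for lack of fit:", p_value)
```

Because the straight line misses the curvature, the residuals remain dependent on the covariate even though they are uncorrelated with it, and the small p-value flags the lack of fit.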
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or mask the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fit over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fit of the nonparametric regression model to the QT-RR relationship was compared with that of four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correction models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Whereas the parametric models fitted poorly at both fast and slow heart rates, the nonparametric regression model improved the fit there. The nonparametric regression model is more flexible than the parametric methods. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
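For reference, the two fixed-exponent formulas that serve as comparators here, Bazett's and Fridericia's, are easy to state in code. The Python sketch below uses their standard forms (QT divided by the square root or the cube root of RR, with RR in seconds); the function names and example values are illustrative and are not data from the study.

```python
import math


def qtc_bazett(qt_ms, rr_s):
    """Bazett's correction: QT divided by the square root of RR (seconds)."""
    return qt_ms / math.sqrt(rr_s)


def qtc_fridericia(qt_ms, rr_s):
    """Fridericia's correction: QT divided by the cube root of RR (seconds)."""
    return qt_ms / rr_s ** (1.0 / 3.0)


# At RR = 1 s (60 beats per minute) both formulas leave QT unchanged.
print(qtc_bazett(400.0, 1.0), qtc_fridericia(400.0, 1.0))

# At faster rates the two formulas diverge for the same raw measurement.
print(qtc_bazett(360.0, 0.64), qtc_fridericia(360.0, 0.64))
```

Both formulas impose one exponent on every heart rate, which is the rigidity a data-driven smoother such as Loess avoids.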
Hu, Pingsha; Maiti, Tapabrata
2011-01-01
Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181
NASA Astrophysics Data System (ADS)
Wunderlich, Adam; Goossens, Bart
2014-03-01
The majority of the literature on task-based image quality assessment has focused on lesion detection tasks, using the receiver operating characteristic (ROC) curve, or related variants, to measure performance. However, since many clinical image evaluation tasks involve both detection and estimation (e.g., estimation of kidney stone composition, estimation of tumor size), there is a growing interest in performance evaluation for joint detection and estimation tasks. To evaluate observer performance on such tasks, Clarkson introduced the estimation ROC (EROC) curve, and the area under the EROC curve as a summary figure of merit. In the present work, we propose nonparametric estimators for practical EROC analysis from experimental data, including estimators for the area under the EROC curve and its variance. The estimators are illustrated with a practical example comparing MRI images reconstructed from different k-space sampling trajectories.
Barnes, A P
2006-09-01
Recent policy changes within the Common Agricultural Policy have led to a shift from a solely production-led agriculture towards the promotion of multi-functionality. Conversely, the removal of production-led supports would indicate that an increased concentration on production efficiencies would seem a critical strategy for a country's future competitiveness. This paper explores the relationship between the 'multi-functional' farming attitude desired by policy makers and its effect on technical efficiency within Scottish dairy farming. Technical efficiency scores are calculated by applying the non-parametric data envelopment analysis technique and then measured against causes of inefficiency. Amongst these explanatory factors is a constructed score of multi-functionality. This research finds that, amongst other factors, a multi-functional attitude has a significant positive effect on technical efficiency. Consequently, this seems to validate the promotion of a multi-functional approach to farming currently being championed by policy-makers.
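Data envelopment analysis in general solves one linear program per decision-making unit. In the special case of a single input and a single output under constant returns to scale, the efficiency score reduces to each unit's output-to-input ratio divided by the best observed ratio. The Python sketch below shows only that special case; the farm names and quantities are hypothetical and unrelated to the Scottish dairy data.

```python
def crs_efficiency(units):
    """Technical efficiency for one input and one output under constant
    returns to scale.

    `units` maps a unit name to (input_quantity, output_quantity).
    """
    productivity = {name: out / inp for name, (inp, out) in units.items()}
    best = max(productivity.values())
    return {name: value / best for name, value in productivity.items()}


# Hypothetical farms: (input used, output produced).
farms = {"A": (10.0, 50.0), "B": (20.0, 80.0), "C": (15.0, 75.0)}
scores = crs_efficiency(farms)
for name, score in sorted(scores.items()):
    print(name, round(score, 2))
```

Units on the frontier score 1, and the others are scored by how far their productivity falls short of it. Such scores can then be related to explanatory factors in a second stage, as the abstract describes.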
On-Line Robust Modal Stability Prediction using Wavelet Processing
NASA Technical Reports Server (NTRS)
Brenner, Martin J.; Lind, Rick
1998-01-01
Wavelet analysis for filtering and system identification has been used to improve the estimation of aeroservoelastic stability margins. The conservatism of the robust stability margins is reduced with parametric and nonparametric time-frequency analysis of flight data in the model validation process. Nonparametric wavelet processing of data is used to reduce the effects of external disturbances and unmodeled dynamics. Parametric estimates of modal stability are also extracted using the wavelet transform. Computation of robust stability margins for stability boundary prediction depends on uncertainty descriptions derived from the data for model validation. The F-18 High Alpha Research Vehicle aeroservoelastic flight test data demonstrates improved robust stability prediction by extension of the stability boundary beyond the flight regime. Guidelines and computation times are presented to show the efficiency and practical aspects of these procedures for on-line implementation. Feasibility of the method is shown for processing flight data from time-varying nonstationary test points.
Nonparametric test of consistency between cosmological models and multiband CMB measurements
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aghamousa, Amir; Shafieloo, Arman, E-mail: amir@apctp.org, E-mail: shafieloo@kasi.re.kr
2015-06-01
We present a novel approach to test the consistency of the cosmological models with multiband CMB data using a nonparametric approach. In our analysis we calibrate the REACT (Risk Estimation and Adaptation after Coordinate Transformation) confidence levels associated with distances in function space (confidence distances) based on the Monte Carlo simulations in order to test the consistency of an assumed cosmological model with observation. To show the applicability of our algorithm, we confront Planck 2013 temperature data with the concordance model of cosmology, considering two different combinations of Planck spectra. In order to have an accurate quantitative statistical measure to compare between the data and the theoretical expectations, we calibrate REACT confidence distances and perform a bias control using many realizations of the data. Our results in this work using Planck 2013 temperature data put the best fit ΛCDM model at 95% (∼ 2σ) confidence distance from the center of the nonparametric confidence set, whereas when the analysis is repeated excluding the Planck 217 × 217 GHz spectrum data, the best fit ΛCDM model shifts to 70% (∼ 1σ) confidence distance. The most prominent features in the data deviating from the best fit ΛCDM model seem to be at low multipoles 18 < ℓ < 26 at greater than 2σ, ℓ ∼ 750 at ∼1 to 2σ and ℓ ∼ 1800 at greater than 2σ level. Excluding the 217 × 217 GHz spectrum, the feature at ℓ ∼ 1800 becomes substantially less significant, at ∼1 to 2σ confidence level. Results of our analysis based on the new approach we propose in this work are in agreement with other analyses done using alternative methods.
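The approximate pairings quoted in this abstract, 95% with 2σ and 70% with 1σ, follow the usual two-sided Gaussian convention. The Python sketch below shows only that conversion, assuming a one-dimensional normal distribution. It does not reproduce the REACT calibration, which the abstract says is obtained from Monte Carlo simulations, and the function names are hypothetical.

```python
from statistics import NormalDist


def coverage_of_sigma(k):
    """Two-sided Gaussian coverage within k standard deviations."""
    return 2.0 * NormalDist().cdf(k) - 1.0


def sigma_of_coverage(level):
    """Number of standard deviations matching a two-sided coverage level."""
    return NormalDist().inv_cdf((1.0 + level) / 2.0)


for k in (1, 2):
    print(k, "sigma covers", round(coverage_of_sigma(k), 4))
for level in (0.70, 0.95):
    print(level, "corresponds to", round(sigma_of_coverage(level), 2), "sigma")
```

Under this convention 1σ covers about 68.3% and 2σ about 95.4%, so the 70% and 95% confidence distances sit close to, but not exactly at, 1σ and 2σ.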
Cox regression analysis with missing covariates via nonparametric multiple imputation.
Hsu, Chiu-Hsieh; Yu, Mandi
2018-01-01
We consider the situation of estimating Cox regression in which some covariates are subject to missing, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from Surveillance, Epidemiology and End Results (SEER) Program.
Ruiz-Sanchez, Eduardo
2015-12-01
The Neotropical woody bamboo genus Otatea is one of five genera in the subtribe Guaduinae. Of the eight described Otatea species, seven are endemic to Mexico and one is also distributed in Central and South America. Otatea acuminata has the widest geographical distribution of the eight species, and two of its recently collected populations do not match the known species morphologically. Parametric and non-parametric methods were used to delimit the species in Otatea using five chloroplast markers, one nuclear marker, and morphological characters. The parametric coalescent method and the non-parametric analysis supported the recognition of two distinct evolutionary lineages. Molecular clock estimates were used to estimate divergence times in Otatea. The results for divergence time in Otatea estimated the origin of the speciation events from the Late Miocene to Late Pleistocene. The species delimitation analyses (parametric and non-parametric) identified that the two populations of O. acuminata from Chiapas and Hidalgo are from two separate evolutionary lineages and these new species have morphological characters that separate them from O. acuminata s.s. The geological activity of the Trans-Mexican Volcanic Belt and the Isthmus of Tehuantepec may have isolated populations and limited the gene flow between Otatea species, driving speciation. Based on the results found here, I describe Otatea rzedowskiorum and Otatea victoriae as two new species, morphologically different from O. acuminata. Copyright © 2015 Elsevier Inc. All rights reserved.
Multilevel Latent Class Analysis: Parametric and Nonparametric Models
ERIC Educational Resources Information Center
Finch, W. Holmes; French, Brian F.
2014-01-01
Latent class analysis is an analytic technique often used in educational and psychological research to identify meaningful groups of individuals within a larger heterogeneous population based on a set of variables. This technique is flexible, encompassing not only a static set of variables but also longitudinal data in the form of growth mixture…
A Comparison of Distribution Free and Non-Distribution Free Factor Analysis Methods
ERIC Educational Resources Information Center
Ritter, Nicola L.
2012-01-01
Many researchers recognize that factor analysis can be conducted on both correlation matrices and variance-covariance matrices. Although most researchers extract factors from non-distribution free or parametric methods, researchers can also extract factors from distribution free or non-parametric methods. The nature of the data dictates the method…
Fusion of Hard and Soft Information in Nonparametric Density Estimation
2015-06-10
... and stochastic optimization models, in analysis of simulation output, and when instantiating probability models. We adopt a constrained maximum ... particular, density estimation is needed for generation of input densities to simulation and stochastic optimization models, in analysis of simulation output ... an essential step in simulation analysis and stochastic optimization is the generation of probability densities for input random variables; see for ...
An Empirical Study of Eight Nonparametric Tests in Hierarchical Regression.
ERIC Educational Resources Information Center
Harwell, Michael; Serlin, Ronald C.
When normality does not hold, nonparametric tests represent an important data-analytic alternative to parametric tests. However, the use of nonparametric tests in educational research has been limited by the absence of easily performed tests for complex experimental designs and analyses, such as factorial designs and multiple regression analyses,…
Nonparametric Estimation of the Probability of Ruin.
1985-02-01
Frees, Edward W. Mathematics Research Center, University of Wisconsin-Madison, 610 Walnut... Report MRC/TSR..., Feb 85.
Marginally specified priors for non-parametric Bayesian estimation
Kessler, David C.; Hoff, Peter D.; Dunson, David B.
2014-01-01
Summary. Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables. PMID:25663813
Martinez Manzanera, Octavio; Elting, Jan Willem; van der Hoeven, Johannes H.; Maurits, Natasha M.
2016-01-01
In the clinic, tremor is diagnosed during a time-limited process in which patients are observed and the characteristics of tremor are visually assessed. For some tremor disorders, a more detailed analysis of these characteristics is needed. Accelerometry and electromyography can be used to obtain a better insight into tremor. Typically, routine clinical assessment of accelerometry and electromyography data involves visual inspection by clinicians and occasionally computational analysis to obtain objective characteristics of tremor. However, for some tremor disorders these characteristics may be different during daily activity. This variability in presentation between the clinic and daily life makes a differential diagnosis more difficult. A long-term recording of tremor by accelerometry and/or electromyography in the home environment could help to give a better insight into the tremor disorder. However, an evaluation of such recordings using routine clinical standards would take too much time. We evaluated a range of techniques that automatically detect tremor segments in accelerometer data, as accelerometer data is more easily obtained in the home environment than electromyography data. Time can be saved if clinicians only have to evaluate the tremor characteristics of segments that have been automatically detected in longer daily activity recordings. We tested four non-parametric methods and five parametric methods on clinical accelerometer data from 14 patients with different tremor disorders. The consensus between two clinicians regarding the presence or absence of tremor on 3943 segments of accelerometer data was employed as reference. The nine methods were tested against this reference to identify their optimal parameters. Non-parametric methods generally performed better than parametric methods on our dataset when optimal parameters were used. 
However, one parametric method, employing the high frequency content of the tremor bandwidth under consideration (High Freq) performed similarly to non-parametric methods, but had the highest recall values, suggesting that this method could be employed for automatic tremor detection. PMID:27258018
Mindfulness, Empathy, and Intercultural Sensitivity amongst Undergraduate Students
ERIC Educational Resources Information Center
Menardo, Dayne Arvin
2017-01-01
This study examined the relationships amongst mindfulness, empathy, and intercultural sensitivity. Non-parametric analyses were conducted using Spearman correlation and Hayes's PROCESS bootstrapping to examine the relationship between mindfulness and intercultural sensitivity, and whether empathy mediates the relationship between mindfulness and…
Nonparametric Trajectory Analysis of R2PIER Data
Strategies to isolate air pollution contributions from sources are of interest as voluntary or regulatory measures are undertaken to reduce air pollution. When different sources are located in close proximity to one another and have similar emissions, separating source emissions ...
NASA Astrophysics Data System (ADS)
Schiemann, R.; Erdin, R.; Willi, M.; Frei, C.; Berenguer, M.; Sempere-Torres, D.
2011-05-01
Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semivariogram models are fit from available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data fields. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial field under consideration, they yield an estimate of the semivariogram that is unbiased for small lag distances. This justifies the use of this estimation technique in geostatistical applications. Various formulations of geostatistical combination (Kriging) methods are used here for the construction of hourly precipitation grids for Switzerland based on data from a sparse realtime network of raingauges and from a spatially complete radar composite. Two variants of Ordinary Kriging (OK) are used to interpolate the sparse gauge observations. In both OK variants, the radar data are only used to determine the semivariogram model. One variant relies on a traditional parametric semivariogram estimate, whereas the other variant uses the nonparametric correlogram. The variants are tested for three cases and the impact of the semivariogram model on the Kriging prediction is illustrated. For the three test cases, the method using nonparametric correlograms performs equally well or better than the traditional method, and at the same time offers great practical advantages. Furthermore, two variants of Kriging with external drift (KED) are tested, both of which use the radar data to estimate nonparametric correlograms, and as the external drift variable. The first KED variant has been used previously for geostatistical radar-raingauge merging in Catalonia (Spain). The second variant is newly proposed here and is an extension of the first. 
Both variants are evaluated for the three test cases as well as an extended evaluation period. It is found that both methods yield merged fields of better quality than the original radar field or fields obtained by OK of gauge data. The newly suggested KED formulation is shown to be beneficial, in particular in mountainous regions where the quality of the Swiss radar composite is comparatively low. An analysis of the Kriging variances shows that none of the methods tested here provides a satisfactory uncertainty estimate. A suitable variable transformation is expected to improve this.
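The abstract above says that a nonparametric correlogram, combined with the sample variance of the field, yields a semivariogram estimate. A minimal sketch of that step follows, assuming the usual second-order-stationary relation gamma(h) = s^2 * (1 - rho(h)). The function names are hypothetical, and the 1-D transect stands in for the 2-D radar fields used in the study; this is an illustration, not the authors' implementation.

```python
import numpy as np


def lag_correlogram_1d(field, max_lag):
    """Empirical autocorrelation of a 1-D transect for lags 0..max_lag.

    Hypothetical helper: the study works with 2-D radar composites, a 1-D
    series is used here only to keep the sketch short.
    """
    x = np.asarray(field, dtype=float)
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array(
        [np.sum(x[: len(x) - h] * x[h:]) / denom for h in range(max_lag + 1)]
    )


def semivariogram_from_correlogram(rho, sample_variance):
    """Convert a correlogram into a semivariogram estimate.

    Assumes second-order stationarity, so gamma(h) = s^2 * (1 - rho(h)).
    """
    rho = np.asarray(rho, dtype=float)
    return sample_variance * (1.0 - rho)


# Illustrative values only, not data from the study.
field = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]
rho = lag_correlogram_1d(field, max_lag=3)
gamma = semivariogram_from_correlogram(rho, np.var(field))
```

By construction the estimate is zero at lag 0 and approaches the sample variance wherever the correlogram decays to zero.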
NASA Astrophysics Data System (ADS)
Schiemann, R.; Erdin, R.; Willi, M.; Frei, C.; Berenguer, M.; Sempere-Torres, D.
2010-09-01
Modelling spatial covariance is an essential part of all geostatistical methods. Traditionally, parametric semivariogram models are fit from available data. More recently, it has been suggested to use nonparametric correlograms obtained from spatially complete data fields. Here, both estimation techniques are compared. Nonparametric correlograms are shown to have a substantial negative bias. Nonetheless, when combined with the sample variance of the spatial field under consideration, they yield an estimate of the semivariogram that is unbiased for small lag distances. This justifies the use of this estimation technique in geostatistical applications. Various formulations of geostatistical combination (Kriging) methods are used here for the construction of hourly precipitation grids for Switzerland based on data from a sparse realtime network of raingauges and from a spatially complete radar composite. Two variants of Ordinary Kriging (OK) are used to interpolate the sparse gauge observations. In both OK variants, the radar data are only used to determine the semivariogram model. One variant relies on a traditional parametric semivariogram estimate, whereas the other variant uses the nonparametric correlogram. The variants are tested for three cases and the impact of the semivariogram model on the Kriging prediction is illustrated. For the three test cases, the method using nonparametric correlograms performs equally well or better than the traditional method, and at the same time offers great practical advantages. Furthermore, two variants of Kriging with external drift (KED) are tested, both of which use the radar data to estimate nonparametric correlograms, and as the external drift variable. The first KED variant has been used previously for geostatistical radar-raingauge merging in Catalonia (Spain). The second variant is newly proposed here and is an extension of the first. 
Both variants are evaluated for the three test cases as well as an extended evaluation period. It is found that both methods yield merged fields of better quality than the original radar field or fields obtained by OK of gauge data. The newly suggested KED formulation is shown to be beneficial, in particular in mountainous regions where the quality of the Swiss radar composite is comparatively low. An analysis of the Kriging variances shows that none of the methods tested here provides a satisfactory uncertainty estimate. A suitable variable transformation is expected to improve this.
Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method
Zhang, Tingting; Kou, S. C.
2010-01-01
Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure. PMID:21258615
Nonparametric Bayesian Dictionary Learning for Analysis of Noisy and Incomplete Images
Zhou, Mingyuan; Chen, Haojun; Paisley, John; Ren, Lu; Li, Lingbo; Xing, Zhengming; Dunson, David; Sapiro, Guillermo; Carin, Lawrence
2013-01-01
Nonparametric Bayesian methods are considered for recovery of imagery based upon compressive, incomplete, and/or noisy measurements. A truncated beta-Bernoulli process is employed to infer an appropriate dictionary for the data under test and also for image recovery. In the context of compressive sensing, significant improvements in image recovery are manifested using learned dictionaries, relative to using standard orthonormal image expansions. The compressive-measurement projections are also optimized for the learned dictionary. Additionally, we consider simpler (incomplete) measurements, defined by measuring a subset of image pixels, uniformly selected at random. Spatial interrelationships within imagery are exploited through use of the Dirichlet and probit stick-breaking processes. Several example results are presented, with comparisons to other methods in the literature. PMID:21693421
Parametric Covariance Model for Horizon-Based Optical Navigation
NASA Technical Reports Server (NTRS)
Hikes, Jacob; Liounis, Andrew J.; Christian, John A.
2016-01-01
This Note presents an entirely parametric version of the covariance for horizon-based optical navigation measurements. The covariance can be written as a function of only the spacecraft position, two sensor design parameters, the illumination direction, the size of the observed planet, the size of the lit arc to be used, and the total number of observed horizon points. As a result, one may now more clearly understand the sensitivity of horizon-based optical navigation performance as a function of these key design parameters, which is insight that was obscured in previous (and nonparametric) versions of the covariance. Finally, the new parametric covariance is shown to agree with both the nonparametric analytic covariance and results from a Monte Carlo analysis.
The linear transformation model with frailties for the analysis of item response times.
Wang, Chun; Chang, Hua-Hua; Douglas, Jeffrey A
2013-02-01
The item response times (RTs) collected from computerized testing represent an underutilized source of information about items and examinees. In addition to knowing the examinees' responses to each item, we can investigate the amount of time examinees spend on each item. In this paper, we propose a semi-parametric model for RTs, the linear transformation model with a latent speed covariate, which combines the flexibility of non-parametric modelling and the brevity as well as interpretability of parametric modelling. In this new model, the RTs, after some non-parametric monotone transformation, become a linear model with latent speed as covariate plus an error term. The distribution of the error term implicitly defines the relationship between the RT and examinees' latent speeds; whereas the non-parametric transformation is able to describe various shapes of RT distributions. The linear transformation model represents a rich family of models that includes the Cox proportional hazards model, the Box-Cox normal model, and many other models as special cases. This new model is embedded in a hierarchical framework so that both RTs and responses are modelled simultaneously. A two-stage estimation method is proposed. In the first stage, the Markov chain Monte Carlo method is employed to estimate the parametric part of the model. In the second stage, an estimating equation method with a recursive algorithm is adopted to estimate the non-parametric transformation. Applicability of the new model is demonstrated with a simulation study and a real data application. Finally, methods to evaluate the model fit are suggested. © 2012 The British Psychological Society.
Robust non-parametric one-sample tests for the analysis of recurrent events.
Rebora, Paola; Galimberti, Stefania; Valsecchi, Maria Grazia
2010-12-30
One-sample non-parametric tests are proposed here for inference on recurring events. The focus is on the marginal mean function of events and the basis for inference is the standardized distance between the observed and the expected number of events under a specified reference rate. Different weights are considered in order to account for various types of alternative hypotheses on the mean function of the recurrent events process. A robust version and a stratified version of the test are also proposed. The performance of these tests was investigated through simulation studies under various underlying event generation processes, such as homogeneous and nonhomogeneous Poisson processes, autoregressive and renewal processes, with and without frailty effects. The robust versions of the test have been shown to be suitable in a wide variety of event generating processes. The motivating context is a study on gene therapy in a very rare immunodeficiency in children, where a major end-point is the recurrence of severe infections. Robust non-parametric one-sample tests for recurrent events can be useful to assess efficacy and especially safety in non-randomized studies or in epidemiological studies for comparison with a standard population. Copyright © 2010 John Wiley & Sons, Ltd.
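The basic quantity described above, a standardized distance between observed and expected event counts under a reference rate, can be sketched as follows. This is a deliberately simplified, unweighted version that assumes a constant reference rate and Poisson variance; it is not the weighted, robust, or stratified statistic proposed in the paper, and the function name and inputs are hypothetical.

```python
import math


def one_sample_recurrent_z(event_counts, follow_up_times, reference_rate):
    """Standardized distance between observed and expected numbers of events.

    Assumptions (not from the paper): constant reference rate per unit time
    and Poisson variance, so Var(observed) is approximated by the expected count.
    """
    observed = sum(event_counts)
    expected = reference_rate * sum(follow_up_times)
    if expected <= 0:
        raise ValueError("expected number of events must be positive")
    z = (observed - expected) / math.sqrt(expected)
    # Two-sided p-value from the standard normal approximation.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return observed, expected, z, p_two_sided


# Illustrative values only: four subjects, reference rate 0.8 events per year.
observed, expected, z, p = one_sample_recurrent_z(
    event_counts=[2, 0, 1, 3],
    follow_up_times=[1.0, 2.0, 1.5, 0.5],
    reference_rate=0.8,
)
```

With frailty or other overdispersion the Poisson variance is too small, which is the kind of situation the robust versions of the test are meant to address.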
Friston, Karl J.; Bastos, André M.; Oswal, Ashwini; van Wijk, Bernadette; Richter, Craig; Litvak, Vladimir
2014-01-01
This technical paper offers a critical re-evaluation of (spectral) Granger causality measures in the analysis of biological timeseries. Using realistic (neural mass) models of coupled neuronal dynamics, we evaluate the robustness of parametric and nonparametric Granger causality. Starting from a broad class of generative (state-space) models of neuronal dynamics, we show how their Volterra kernels prescribe the second-order statistics of their response to random fluctuations; characterised in terms of cross-spectral density, cross-covariance, autoregressive coefficients and directed transfer functions. These quantities in turn specify Granger causality — providing a direct (analytic) link between the parameters of a generative model and the expected Granger causality. We use this link to show that Granger causality measures based upon autoregressive models can become unreliable when the underlying dynamics is dominated by slow (unstable) modes — as quantified by the principal Lyapunov exponent. However, nonparametric measures based on causal spectral factors are robust to dynamical instability. We then demonstrate how both parametric and nonparametric spectral causality measures can become unreliable in the presence of measurement noise. Finally, we show that this problem can be finessed by deriving spectral causality measures from Volterra kernels, estimated using dynamic causal modelling. PMID:25003817
Impact of Business Cycles on US Suicide Rates, 1928–2007
Florence, Curtis S.; Quispe-Agnoli, Myriam; Ouyang, Lijing; Crosby, Alexander E.
2011-01-01
Objectives. We examined the associations of overall and age-specific suicide rates with business cycles from 1928 to 2007 in the United States. Methods. We conducted a graphical analysis of changes in suicide rates during business cycles, used nonparametric analyses to test associations between business cycles and suicide rates, and calculated correlations between the national unemployment rate and suicide rates. Results. Graphical analyses showed that the overall suicide rate generally rose during recessions and fell during expansions. Age-specific suicide rates responded differently to recessions and expansions. Nonparametric tests indicated that the overall suicide rate and the suicide rates of the groups aged 25 to 34 years, 35 to 44 years, 45 to 54 years, and 55 to 64 years rose during contractions and fell during expansions. Suicide rates of the groups aged 15 to 24 years, 65 to 74 years, and 75 years and older did not exhibit this behavior. Correlation results were concordant with all nonparametric results except for the group aged 65 to 74 years. Conclusions. Business cycles may affect suicide rates, although different age groups responded differently. Our findings suggest that public health responses are a necessary component of suicide prevention during recessions. PMID:21493938
Nonparametric Analyses of Log-Periodic Precursors to Financial Crashes
NASA Astrophysics Data System (ADS)
Zhou, Wei-Xing; Sornette, Didier
We apply two nonparametric methods to further test the hypothesis that log-periodicity characterizes the detrended price trajectory of large financial indices prior to financial crashes or strong corrections. The term "parametric" refers here to the use of the log-periodic power law formula to fit the data; in contrast, "nonparametric" refers to the use of general tools such as Fourier transform, and in the present case the Hilbert transform and the so-called (H, q)-analysis. The analysis using the (H, q)-derivative is applied to seven time series ending with the October 1987 crash, the October 1997 correction and the April 2000 crash of the Dow Jones Industrial Average (DJIA), the Standard & Poor 500 and Nasdaq indices. The Hilbert transform is applied to two detrended price time series in terms of the ln(tc-t) variable, where tc is the time of the crash. Taking all results together, we find strong evidence for a universal fundamental log-frequency f=1.02±0.05 corresponding to the scaling ratio λ=2.67±0.12. These values are in very good agreement with those obtained in earlier works with different parametric techniques. This note is extracted from a long unpublished report with 58 figures available at , which extensively describes the evidence we have accumulated on these seven time series, in particular by presenting all relevant details so that the reader can judge for himself or herself the validity and robustness of the results.
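The two quoted numbers are linked by the usual log-periodic convention, stated here as an assumption about the notation: an oscillation in ln(tc-t) with angular log-frequency ω = 2πf repeats each time the distance to tc shrinks by the factor λ. As a quick consistency check of the central values:

```latex
\lambda = e^{2\pi/\omega} = e^{1/f},
\qquad
f = 1.02 \;\Longrightarrow\; \lambda = e^{1/1.02} \approx e^{0.980} \approx 2.67 .
```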
Stochastic Residual-Error Analysis For Estimating Hydrologic Model Predictive Uncertainty
A hybrid time series-nonparametric sampling approach, referred to herein as semiparametric, is presented for the estimation of model predictive uncertainty. The methodology is a two-step procedure whereby a distributed hydrologic model is first calibrated, then followed by brute ...
Bayesian dynamic mediation analysis.
Huang, Jing; Yuan, Ying
2017-12-01
Most existing methods for mediation analysis assume that mediation is a stationary, time-invariant process, which overlooks the inherently dynamic nature of many human psychological processes and behavioral activities. In this article, we consider mediation as a dynamic process that continuously changes over time. We propose Bayesian multilevel time-varying coefficient models to describe and estimate such dynamic mediation effects. By taking the nonparametric penalized spline approach, the proposed method is flexible and able to accommodate any shape of the relationship between time and mediation effects. Simulation studies show that the proposed method works well and faithfully reflects the true nature of the mediation process. By modeling mediation effect nonparametrically as a continuous function of time, our method provides a valuable tool to help researchers obtain a more complete understanding of the dynamic nature of the mediation process underlying psychological and behavioral phenomena. We also briefly discuss an alternative approach of using dynamic autoregressive mediation model to estimate the dynamic mediation effect. The computer code is provided to implement the proposed Bayesian dynamic mediation analysis. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
ERIC Educational Resources Information Center
Lee, Young-Sun; Wollack, James A.; Douglas, Jeffrey
2009-01-01
The purpose of this study was to assess the fit of a two-parameter logistic (2PL) model through comparison with nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that the three nonparametric procedures implemented produced ICCs similar to those of the 2PL for items simulated to fit the 2PL. However, for misfitting items,…
Randomization Procedures Applied to Analysis of Ballistic Data
1991-06-01
Taylor, Malcolm S.; Bodt, Barry A. Technical Report BRL-TR-3245 (AD-A238 389). Keywords: data analysis; computationally intensive statistics; randomization tests; permutation tests; nonparametric statistics. ...be 0.13. Any reasonable statistical procedure would fail to support the notion of improvement of dynamic over standard indexing based on this data...
Wang, Ke; Ye, Xin; Pendyala, Ram M.; Zou, Yajie
2017-01-01
A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Argau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences. PMID:29073152
Mathematical models for nonparametric inferences from line transect data
Burnham, K.P.; Anderson, D.R.
1976-01-01
A general mathematical theory of line transects is developed which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(0) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y/r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires that we know the transformation of r given by f(0/r).
ERIC Educational Resources Information Center
Guccio, Calogero; Martorana, Marco Ferdinando; Mazza, Isidoro
2017-01-01
The paper assesses the evolution of efficiency of Italian public universities for the period 2000-2010. It aims at investigating whether their levels of efficiency showed signs of convergence, and if the well-known disparity between northern and southern regions decreased. For this purpose, we use a refinement of data envelopment analysis, namely…
ERIC Educational Resources Information Center
Douglas, Jeff; Kim, Hae-Rim; Roussos, Louis; Stout, William; Zhang, Jinming
An extensive nonparametric dimensionality analysis of latent structure was conducted on three forms of the Law School Admission Test (LSAT) (December 1991, June 1992, and October 1992) using the DIMTEST model in confirmatory analyses and using DIMTEST, FAC, DETECT, HCA, PROX, and a genetic algorithm in exploratory analyses. Results indicate that…
Analysis of Parasite and Other Skewed Counts
Alexander, Neal
2012-01-01
Objective. To review methods for the statistical analysis of parasite and other skewed count data. Methods. Statistical methods for skewed count data are described and compared, with reference to those used over a ten year period of Tropical Medicine and International Health. Two parasitological datasets are used for illustration. Results. Ninety papers were identified, 89 with descriptive and 60 with inferential analysis. A lack of clarity is noted in identifying measures of location, in particular the Williams and geometric mean. The different measures are compared, emphasizing the legitimacy of the arithmetic mean for skewed data. In the published papers, the t test and related methods were often used on untransformed data, which is likely to be invalid. Several approaches to inferential analysis are described, emphasizing 1) non-parametric methods, while noting that they are not simply comparisons of medians, and 2) generalized linear modelling, in particular with the negative binomial distribution. Additional methods, such as the bootstrap, with potential for greater use are described. Conclusions. Clarity is recommended when describing transformations and measures of location. It is suggested that non-parametric methods and generalized linear models are likely to be sufficient for most analyses. PMID:22943299
In a previously published study, quantitative relationships were developed between landscape metrics and sediment contamination for 25 small estuarine systems within Chesapeake Bay. Nonparametric statistical analysis (rank transformation) was used to develop an empirical relation...
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.
Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen
2011-04-01
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.
Surface Estimation, Variable Selection, and the Nonparametric Oracle Property
Storlie, Curtis B.; Bondell, Howard D.; Reich, Brian J.; Zhang, Hao Helen
2010-01-01
Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting. PMID:21603586
Dugué, Audrey Emmanuelle; Pulido, Marina; Chabaud, Sylvie; Belin, Lisa; Gal, Jocelyn
2016-12-01
We describe how to estimate progression-free survival while dealing with interval-censored data in the setting of clinical trials in oncology. Three procedures with SAS and R statistical software are described: one allowing for a nonparametric maximum likelihood estimation of the survival curve using the EM-ICM (Expectation and Maximization-Iterative Convex Minorant) algorithm as described by Wellner and Zhan in 1997; a sensitivity analysis procedure in which the progression time is assigned (i) at the midpoint, (ii) at the upper limit (reflecting the standard analysis when the progression time is assigned at the first radiologic exam showing progressive disease), or (iii) at the lower limit of the censoring interval; and finally, two multiple imputations are described considering a uniform or the nonparametric maximum likelihood estimation (NPMLE) distribution. Clin Cancer Res; 22(23); 5629-35. ©2016 AACR. ©2016 American Association for Cancer Research.
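The three single-value assignments used in the sensitivity analysis (midpoint, upper limit, lower limit of the censoring interval) might be sketched as follows. The function name and the example interval are hypothetical; the paper's own SAS and R procedures are not reproduced here.

```python
def impute_progression_time(left, right, rule="midpoint"):
    """Assign one progression time to the censoring interval (left, right]."""
    if rule == "midpoint":
        return (left + right) / 2
    if rule == "upper":   # first exam showing progressive disease
        return right
    if rule == "lower":   # last exam without progression
        return left
    raise ValueError(f"unknown rule: {rule}")

# Hypothetical patient: progression-free at month 6, progressed by month 9.
for rule in ("midpoint", "upper", "lower"):
    print(rule, impute_progression_time(6.0, 9.0, rule))
```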
Bioconductor Workflow for Microbiome Data Analysis: from raw reads to community analyses
Callahan, Ben J.; Sankaran, Kris; Fukuyama, Julia A.; McMurdie, Paul J.; Holmes, Susan P.
2016-01-01
High-throughput sequencing of PCR-amplified taxonomic markers (like the 16S rRNA gene) has enabled a new level of analysis of complex bacterial communities known as microbiomes. Many tools exist to quantify and compare abundance levels or OTU composition of communities in different conditions. The sequencing reads have to be denoised and assigned to the closest taxa from a reference database. Common approaches use a notion of 97% similarity and normalize the data by subsampling to equalize library sizes. In this paper, we show that statistical models allow more accurate abundance estimates. By providing a complete workflow in R, we enable the user to do sophisticated downstream statistical analyses, whether parametric or nonparametric. We provide examples of using the R packages dada2, phyloseq, DESeq2, ggplot2 and vegan to filter, visualize and test microbiome data. We also provide examples of supervised analyses using random forests and nonparametric testing using community networks and the ggnetwork package. PMID:27508062
Smoothing spline ANOVA frailty model for recurrent event data.
Du, Pang; Jiang, Yihua; Wang, Yuedong
2011-12-01
Gap time hazard estimation is of particular interest in recurrent event data. This article proposes a fully nonparametric approach for estimating the gap time hazard. Smoothing spline analysis of variance (ANOVA) decompositions are used to model the log gap time hazard as a joint function of gap time and covariates, and general frailty is introduced to account for between-subject heterogeneity and within-subject correlation. We estimate the nonparametric gap time hazard function and parameters in the frailty distribution using a combination of the Newton-Raphson procedure, the stochastic approximation algorithm (SAA), and the Markov chain Monte Carlo (MCMC) method. The convergence of the algorithm is guaranteed by decreasing the step size of parameter update and/or increasing the MCMC sample size along iterations. Model selection procedure is also developed to identify negligible components in a functional ANOVA decomposition of the log gap time hazard. We evaluate the proposed methods with simulation studies and illustrate its use through the analysis of bladder tumor data. © 2011, The International Biometric Society.
On sample size of the Kruskal-Wallis test with application to a mouse peritoneal cavity study.
Fan, Chunpeng; Zhang, Donghui; Zhang, Cun-Hui
2011-03-01
As the nonparametric generalization of the one-way analysis of variance model, the Kruskal-Wallis test applies when the goal is to test the difference between multiple samples and the underlying population distributions are nonnormal or unknown. Although the Kruskal-Wallis test has been widely used for data analysis, power and sample size methods for this test have been investigated to a much lesser extent. This article proposes new power and sample size calculation methods for the Kruskal-Wallis test based on the pilot study in either a completely nonparametric model or a semiparametric location model. No assumption is made on the shape of the underlying population distributions. Simulation results show that, in terms of sample size calculation for the Kruskal-Wallis test, the proposed methods are more reliable and preferable to some more traditional methods. A mouse peritoneal cavity study is used to demonstrate the application of the methods. © 2010, The International Biometric Society.
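For readers unfamiliar with the test itself, the Kruskal-Wallis statistic can be computed as in the sketch below. This is the textbook H statistic with midranks and without a tie correction, not the power or sample size method proposed in the article; the data are hypothetical.

```python
def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1), no tie correction."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = midranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical three-group data (assumption).
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```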
ERIC Educational Resources Information Center
Sueiro, Manuel J.; Abad, Francisco J.
2011-01-01
The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…
The non-parametric Parzen's window in stereo vision matching.
Pajares, G; de la Cruz, J
2002-01-01
This paper presents an approach to the local stereovision matching problem using edge segments as features with four attributes. From these attributes we compute a matching probability between pairs of features of the stereo images. A correspondence is said to be true when such a probability is maximum. We introduce a nonparametric strategy based on Parzen's window (1962) to estimate a probability density function (PDF), which is used to obtain the matching probability. This is the main finding of the paper. A comparative analysis of other recent matching methods is included to show that this finding can be justified theoretically. A generalization of the proposed method is made in order to give guidelines about its use with the similarity constraint and also in different environments where other features and attributes are more suitable.
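A Parzen window density estimate can be sketched in one dimension as below. The Gaussian kernel, the bandwidth h, and the sample values are assumptions made for illustration; the paper works with four-attribute feature vectors and its exact kernel is not given in the abstract.

```python
import math

def parzen_density(x, samples, h):
    """Gaussian-kernel Parzen window estimate of the density at x."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

# Hypothetical attribute differences observed for true matches (assumption).
samples = [0.0, 1.0, 2.0]
print(parzen_density(1.0, samples, h=0.5))
```

The estimate is simply an average of kernels centred on the observed samples, so no parametric form for the density has to be chosen in advance.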
Crainiceanu, Ciprian M.; Caffo, Brian S.; Di, Chong-Zhi; Punjabi, Naresh M.
2009-01-01
We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep. Using subject specific signals estimated with known variability in a second level regression becomes a nonstandard measurement error problem. We propose and implement methods that take into account cross-sectional and longitudinal measurement error. The research presented here forms the basis for EEG signal processing for the SHHS. PMID:20057925
Parametric, nonparametric and parametric modelling of a chaotic circuit time series
NASA Astrophysics Data System (ADS)
Timmer, J.; Rust, H.; Horbelt, W.; Voss, H. U.
2000-09-01
The determination of a differential equation underlying a measured time series is a frequently arising task in nonlinear time series analysis. In the validation of a proposed model one often faces the dilemma that it is hard to decide whether possible discrepancies between the time series and model output are caused by an inappropriate model or by bad estimates of parameters in a correct type of model, or both. We propose a combination of parametric modelling based on Bock's multiple shooting algorithm and nonparametric modelling based on optimal transformations as a strategy to test proposed models and if rejected suggest and test new ones. We exemplify this strategy on an experimental time series from a chaotic circuit where we obtain an extremely accurate reconstruction of the observed attractor.
Adjacent-Categories Mokken Models for Rater-Mediated Assessments
ERIC Educational Resources Information Center
Wind, Stefanie A.
2017-01-01
Molenaar extended Mokken's original probabilistic-nonparametric scaling models for use with polytomous data. These polytomous extensions of Mokken's original scaling procedure have facilitated the use of Mokken scale analysis as an approach to exploring fundamental measurement properties across a variety of domains in which polytomous ratings are…
Treatment of Selective Mutism: A Best-Evidence Synthesis.
ERIC Educational Resources Information Center
Stone, Beth Pionek; Kratochwill, Thomas R.; Sladezcek, Ingrid; Serlin, Ronald C.
2002-01-01
Presents systematic analysis of the major treatment approaches used for selective mutism. Based on nonparametric statistical tests of effect sizes, major findings include the following: treatment of selective mutism is more effective than no treatment; behaviorally oriented treatment approaches are more effective than no treatment; and no…
Mathematical models for non-parametric inferences from line transect data
Burnham, K.P.; Anderson, D.R.
1976-01-01
A general mathematical theory of line transects is developed which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(0) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y | r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires we know the transformation of r given by f(0 | r).
A Semi-parametric Transformation Frailty Model for Semi-competing Risks Survival Data
Jiang, Fei; Haneuse, Sebastien
2016-01-01
In the analysis of semi-competing risks data interest lies in estimation and inference with respect to a so-called non-terminal event, the observation of which is subject to a terminal event. Multi-state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non-terminal and terminal events specified, in part, by a unit-specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, σ². When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi-competing risks analysis that permit the non-parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi-parametric efficient score under the complete data setting and propose a non-parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators are derived and small-sample operating characteristics evaluated via simulation. Although the proposed semi-parametric transformation model and non-parametric score imputation method are motivated by the analysis of semi-competing risks data, they are broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation. Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer. PMID:28439147
J. Chris Toney; Karen G. Schleeweis; Jennifer Dungan; Andrew Michaelis; Todd Schroeder; Gretchen G. Moisen
2015-01-01
The North American Forest Dynamics (NAFD) project's Attribution Team is completing nationwide processing of historic Landsat data to provide a comprehensive annual, wall-to-wall analysis of US disturbance history, with attribution, over the last 25+ years. Per-pixel time series analysis based on a new nonparametric curve fitting algorithm yields several metrics useful...
ERIC Educational Resources Information Center
Kimber, Birgitta; Sandell, Rolf
2009-01-01
The study considers the impact of a program for social and emotional learning in Swedish schools on use of drugs, volatile substances, alcohol and tobacco. The program was evaluated in an effectiveness study. Intervention students were compared longitudinally with non-intervention students using nonparametric latent class analysis to identify…
The Importance of Practice in the Development of Statistics.
1983-01-01
NRC Technical Summary Report #2471: The Importance of Practice in the Development of Statistics...component analysis, bioassay, limits for a ratio, quality control, sampling inspection, non-parametric tests, transformation theory, ARIMA time series...models, sequential tests, cumulative sum charts, data analysis plotting techniques, and a resolution of the Bayes-frequentist controversy. It appears
A Hybrid Index for Characterizing Drought Based on a Nonparametric Kernel Estimator
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huang, Shengzhi; Huang, Qiang; Leng, Guoyong
This study develops a nonparametric multivariate drought index, namely, the Nonparametric Multivariate Standardized Drought Index (NMSDI), by considering the variations of both precipitation and streamflow. Building upon previous efforts in constructing Nonparametric Multivariate Drought Index, we use the nonparametric kernel estimator to derive the joint distribution of precipitation and streamflow, thus providing additional insights in drought index development. The proposed NMSDI is applied in the Wei River Basin (WRB), based on which the drought evolution characteristics are investigated. Results indicate: (1) generally, NMSDI captures the drought onset similar to Standardized Precipitation Index (SPI) and drought termination and persistence similar to Standardized Streamflow Index (SSFI). The drought events identified by NMSDI match well with historical drought records in the WRB. The performances are also consistent with that by an existing Multivariate Standardized Drought Index (MSDI) at various timescales, confirming the validity of the newly constructed NMSDI in drought detections; (2) an increasing risk of drought has been detected for the past decades, and will be persistent to a certain extent in future in most areas of the WRB; (3) the identified change points of annual NMSDI are mainly concentrated in the early 1970s and middle 1990s, coincident with extensive water use and soil reservation practices. This study highlights the nonparametric multivariable drought index, which can be used for drought detections and predictions efficiently and comprehensively.
Why preferring parametric forecasting to nonparametric methods?
Jabot, Franck
2015-05-07
A recent series of papers by Charles T. Perretti and collaborators have shown that nonparametric forecasting methods can outperform parametric methods in noisy nonlinear systems. Such a situation can arise because of two main reasons: the instability of parametric inference procedures in chaotic systems which can lead to biased parameter estimates, and the discrepancy between the real system dynamics and the modeled one, a problem that Perretti and collaborators call "the true model myth". Should ecologists go on using the demanding parametric machinery when trying to forecast the dynamics of complex ecosystems? Or should they rely on the elegant nonparametric approach that appears so promising? It will be here argued that ecological forecasting based on parametric models presents two key comparative advantages over nonparametric approaches. First, the likelihood of parametric forecasting failure can be diagnosed thanks to simple Bayesian model checking procedures. Second, when parametric forecasting is diagnosed to be reliable, forecasting uncertainty can be estimated on virtual data generated with the fitted to data parametric model. In contrast, nonparametric techniques provide forecasts with unknown reliability. This argumentation is illustrated with the simple theta-logistic model that was previously used by Perretti and collaborators to make their point. It should convince ecologists to stick to standard parametric approaches, until methods have been developed to assess the reliability of nonparametric forecasting. Copyright © 2015 Elsevier Ltd. All rights reserved.
DIF Trees: Using Classification Trees to Detect Differential Item Functioning
ERIC Educational Resources Information Center
Vaughn, Brandon K.; Wang, Qiu
2010-01-01
A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…
Nonparametric Trajectory Analysis of CMAPS Data
As part of the Cleveland Multiple Air Pollutant Study (CMAPS), 30-minute average concentrations of the elemental composition of PM2.5 were made at two sites during the months of August 2009 and February 2010. The elements measured were: Al, As, Ba, Be, Ca, Cd, Ce, Co, Cr, Cs, Cu...
On the Bias-Amplifying Effect of Near Instruments in Observational Studies
ERIC Educational Resources Information Center
Steiner, Peter M.; Kim, Yongnam
2014-01-01
In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…
ERIC Educational Resources Information Center
Morgan, Daryle Whitney
To assess the status of welding in various manufacturing industries and to ascertain the occupational preparation needed for welding tradesmen, technicians, and technologists, completed questionnaires were obtained from 138 selected industrial specialists. The hypotheses were tested by Friedman's two-way nonparametric analysis of variance and by…
A nonparametric clustering technique which estimates the number of clusters
NASA Technical Reports Server (NTRS)
Ramey, D. B.
1983-01-01
In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.
NASA Astrophysics Data System (ADS)
Fernández-Llamazares, Álvaro; Belmonte, Jordina; Delgado, Rosario; De Linares, Concepción
2014-04-01
Airborne pollen records are a suitable indicator for the study of climate change. The present work focuses on the role of annual pollen indices for the detection of bioclimatic trends through the analysis of the aerobiological spectra of 11 taxa of great biogeographical relevance in Catalonia over an 18-year period (1994-2011), by means of different parametric and non-parametric statistical methods. Among others, two non-parametric rank-based statistical tests were performed for detecting monotonic trends in time series data of the selected airborne pollen types and we have observed that they have similar power in detecting trends. Except for those cases in which the pollen data can be well-modeled by a normal distribution, it is better to apply non-parametric statistical methods to aerobiological studies. Our results provide a reliable representation of the pollen trends in the region and suggest that greater pollen quantities have been released into the atmosphere in recent years, especially by Mediterranean taxa such as Pinus, Total Quercus and Evergreen Quercus, although the trends may differ geographically. Longer aerobiological monitoring periods are required to corroborate these results and survey the increasing levels of certain pollen types that could exert an impact in terms of public health.
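The abstract does not name the two rank-based trend tests. As an assumption, the sketch below shows the Mann-Kendall S statistic, a common rank-based choice for monotonic trends; the annual pollen index values are hypothetical.

```python
def mann_kendall_s(series):
    """Sum of signs of all pairwise later-minus-earlier differences."""
    s = 0
    for i in range(len(series) - 1):
        for j in range(i + 1, len(series)):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    return s

# Hypothetical annual pollen index over six years (assumption).
annual_index = [310, 295, 340, 360, 355, 410]
print(mann_kendall_s(annual_index))
```

A large positive S points to an increasing trend and a large negative S to a decreasing one; a significance test would compare S with its null distribution, which is omitted here.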
Gorfine, Malka; Bordo, Nadia; Hsu, Li
2017-01-01
Summary Consider a popular case–control family study where individuals with a disease under study (case probands) and individuals who do not have the disease (control probands) are randomly sampled from a well-defined population. Possibly right-censored age at onset and disease status are observed for both probands and their relatives. For example, case probands are men diagnosed with prostate cancer, control probands are men free of prostate cancer, and the prostate cancer history of the fathers of the probands is also collected. Inherited genetic susceptibility, shared environment, and common behavior lead to correlation among the outcomes within a family. In this article, a novel nonparametric estimator of the marginal survival function is provided. The estimator is defined in the presence of intra-cluster dependence, and is based on consistent smoothed kernel estimators of conditional survival functions. By simulation, it is shown that the proposed estimator performs very well in terms of bias. The utility of the estimator is illustrated by the analysis of case–control family data of early onset prostate cancer. To our knowledge, this is the first article that provides a fully nonparametric marginal survival estimator based on case–control clustered age-at-onset data. PMID:27436674
Nixon, Richard M; Wonderling, David; Grieve, Richard D
2010-03-01
Cost-effectiveness analyses (CEA) alongside randomised controlled trials commonly estimate incremental net benefits (INB), with 95% confidence intervals, and compute cost-effectiveness acceptability curves and confidence ellipses. Two alternative non-parametric methods for estimating INB are to apply the central limit theorem (CLT) or to use the non-parametric bootstrap method, although it is unclear which method is preferable. This paper describes the statistical rationale underlying each of these methods and illustrates their application with a trial-based CEA. It compares the sampling uncertainty from using either technique in a Monte Carlo simulation. The experiments are repeated varying the sample size and the skewness of costs in the population. The results showed that, even when data were highly skewed, both methods accurately estimated the true standard errors (SEs) when sample sizes were moderate to large (n>50), and also gave good estimates for small data sets with low skewness. However, when sample sizes were relatively small and the data highly skewed, using the CLT rather than the bootstrap led to slightly more accurate SEs. We conclude that while in general using either method is appropriate, the CLT is easier to implement, and provides SEs that are at least as accurate as the bootstrap. (c) 2009 John Wiley & Sons, Ltd.
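The two ways of obtaining a standard error that the abstract compares can be sketched for a single sample mean. This is an illustration under assumptions, not the authors' simulation: the cost values are hypothetical, and a full INB analysis would combine costs and effects across two trial arms.

```python
import random
import statistics

def clt_standard_error(values):
    """Standard error of the mean from the central limit theorem: s / sqrt(n)."""
    return statistics.stdev(values) / len(values) ** 0.5

def bootstrap_standard_error(values, n_resamples=2000, seed=0):
    """Standard deviation of the means of resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(means)

# Hypothetical, highly skewed per-patient costs (assumption).
costs = [120, 95, 310, 80, 150, 2200, 60, 175, 90, 400, 130, 1100]
print(clt_standard_error(costs), bootstrap_standard_error(costs))
```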
Nonparametric regression applied to quantitative structure-activity relationships
Constans; Hirst
2000-03-01
Several nonparametric regressors have been applied to modeling quantitative structure-activity relationship (QSAR) data. The simplest regressor, the Nadaraya-Watson, was assessed in a genuine multivariate setting. Other regressors, the local linear and the shifted Nadaraya-Watson, were implemented within additive models--a computationally more expedient approach, better suited for low-density designs. Performances were benchmarked against the nonlinear method of smoothing splines. A linear reference point was provided by multilinear regression (MLR). Variable selection was explored using systematic combinations of different variables and combinations of principal components. For the data set examined, 47 inhibitors of dopamine beta-hydroxylase, the additive nonparametric regressors have greater predictive accuracy (as measured by the mean absolute error of the predictions or the Pearson correlation in cross-validation trials) than MLR. The use of principal components did not improve the performance of the nonparametric regressors over use of the original descriptors, since the original descriptors are not strongly correlated. It remains to be seen if the nonparametric regressors can be successfully coupled with better variable selection and dimensionality reduction in the context of high-dimensional QSARs.
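The Nadaraya-Watson regressor named in this abstract is a kernel-weighted average of the observed responses. A minimal one-descriptor sketch follows; the Gaussian kernel, the bandwidth h, and the data are assumptions, and the multivariate and additive variants studied in the paper are not shown.

```python
import math

def nadaraya_watson(x0, xs, ys, h):
    """Kernel-weighted average of ys, with Gaussian weights centred on x0."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

# Hypothetical descriptor values and activities (assumption).
descriptor = [0.0, 1.0, 2.0, 3.0]
activity = [1.0, 1.8, 3.1, 3.9]
print(nadaraya_watson(1.5, descriptor, activity, h=0.7))
```

If x0 lies very far from every observation, all weights can underflow to zero and the division fails, so a practical implementation would guard against that case.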
A nonparametric spatial scan statistic for continuous data.
Jung, Inkyung; Cho, Ho Jin
2015-10-20
Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
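The Wilcoxon rank-sum statistic on which the proposed scan statistic is built can be computed by pair counting, as in the sketch below. Only the two-sample statistic is shown; the scanning over candidate windows is not reproduced, and the values inside and outside the window are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Count pairs with a > b, counting ties as one half."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def wilcoxon_rank_sum(sample_a, sample_b):
    """Sum of the midranks of sample_a in the pooled data: U + n_a (n_a + 1) / 2."""
    n_a = len(sample_a)
    return mann_whitney_u(sample_a, sample_b) + n_a * (n_a + 1) / 2

# Hypothetical observations inside and outside a candidate window (assumption).
inside = [4.0, 5.0, 6.0]
outside = [1.0, 2.0, 3.0]
print(wilcoxon_rank_sum(inside, outside))
```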
Marmarelis, Vasilis Z.; Berger, Theodore W.
2009-01-01
Parametric and non-parametric modeling methods are combined to study the short-term plasticity (STP) of synapses in the central nervous system (CNS). The nonlinear dynamics of STP are modeled by means of: (1) previously proposed parametric models based on mechanistic hypotheses and/or specific dynamical processes, and (2) non-parametric models (in the form of Volterra kernels) that transform the presynaptic signals into postsynaptic signals. In order to synergistically use the two approaches, we estimate the Volterra kernels of the parametric models of STP for four types of synapses using synthetic broadband input–output data. Results show that the non-parametric models accurately and efficiently replicate the input–output transformations of the parametric models. Volterra kernels provide a general and quantitative representation of the STP. PMID:18506609
Nonparametric Bayesian Modeling for Automated Database Schema Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ferragut, Erik M; Laska, Jason A
2015-01-01
The problem of merging databases arises in many government and commercial applications. Schema matching, a common first step, identifies equivalent fields between databases. We introduce a schema matching framework that builds nonparametric Bayesian models for each field and compares them by computing the probability that a single model could have generated both fields. Our experiments show that our method is more accurate and faster than the existing instance-based matching algorithms in part because of the use of nonparametric Bayesian models.
Estimating survival of radio-tagged birds
Bunck, C.M.; Pollock, K.H.; Lebreton, J.-D.; North, P.M.
1993-01-01
Parametric and nonparametric methods for estimating survival of radio-tagged birds are described. The general assumptions of these methods are reviewed. An estimate based on the assumption of constant survival throughout the period is emphasized in the overview of parametric methods. Two nonparametric methods, the Kaplan-Meier estimate of the survival function and the log rank test, are explained in detail. The link between these nonparametric methods and traditional capture-recapture models is discussed along with considerations in designing studies that use telemetry techniques to estimate survival.
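The Kaplan-Meier estimate mentioned here can be sketched as a product over the distinct death times. The tracking data are hypothetical; birds whose signal is lost are treated as right-censored, and a bird censored at a death time is still counted as at risk at that time (the usual convention).

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct death time.

    events[i] is True for an observed death and False for a censored bird.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical days tracked for five radio-tagged birds (assumption).
print(kaplan_meier([2, 3, 3, 5, 8], [True, True, False, True, False]))
```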
Nonparametric estimation of benchmark doses in environmental risk assessment
Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen
2013-01-01
Summary An important statistical objective in environmental risk analysis is estimation of minimum exposure levels, called benchmark doses (BMDs), that induce a pre-specified benchmark response in a dose-response experiment. In such settings, representations of the risk are traditionally based on a parametric dose-response model. It is a well-known concern, however, that if the chosen parametric form is misspecified, inaccurate and possibly unsafe low-dose inferences can result. We apply a nonparametric approach for calculating benchmark doses, based on an isotonic regression method for dose-response estimation with quantal-response data (Bhattacharya and Kong, 2007). We determine the large-sample properties of the estimator, develop bootstrap-based confidence limits on the BMDs, and explore the confidence limits’ small-sample properties via a short simulation study. An example from cancer risk assessment illustrates the calculations. PMID:23914133
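Isotonic regression of quantal dose-response data is commonly computed with the pool-adjacent-violators algorithm. The sketch below assumes that algorithm and uses hypothetical response proportions; it shows only the monotone fit, not the BMD calculation or the bootstrap confidence limits developed in the paper.

```python
def pava(proportions, weights):
    """Weighted non-decreasing fit by pooling adjacent violators."""
    blocks = []  # each block holds [mean, weight, number of doses pooled]
    for p, w in zip(proportions, weights):
        blocks.append([p, w, 1])
        # Merge backwards while the monotone order is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted

# Hypothetical observed response proportions at increasing doses,
# with 10 subjects per dose group (assumption).
print(pava([0.1, 0.3, 0.2, 0.5], [10, 10, 10, 10]))
```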
A New Hybrid-Multiscale SSA Prediction of Non-Stationary Time Series
NASA Astrophysics Data System (ADS)
Ghanbarzadeh, Mitra; Aminghafari, Mina
2016-02-01
Singular spectral analysis (SSA) is a non-parametric method used in the prediction of non-stationary time series. It has two parameters whose values are difficult to determine and to which the results are very sensitive. Since SSA is a deterministic-based method, it does not give good results when the time series is contaminated with a high noise level and correlated noise. Therefore, we introduce a novel method to handle these problems. It is based on the prediction of non-decimated wavelet (NDW) signals by SSA and then, prediction of residuals by wavelet regression. The advantages of our method are the automatic determination of parameters and taking account of the stochastic structure of time series. As shown through the simulated and real data, we obtain better results than SSA, a non-parametric wavelet regression method and Holt-Winters method.
Varabyova, Yauheniya; Schreyögg, Jonas
2013-09-01
There is a growing interest in the cross-country comparisons of the performance of national health care systems. The present work provides a comparison of the technical efficiency of the hospital sector using unbalanced panel data from OECD countries over the period 2000-2009. The estimation of the technical efficiency of the hospital sector is performed using nonparametric data envelopment analysis (DEA) and parametric stochastic frontier analysis (SFA). Internal and external validity of findings is assessed by estimating the Spearman rank correlations between the results obtained in different model specifications. The panel-data analyses using two-step DEA and one-stage SFA show that countries with higher health care expenditure per capita tend to have a more technically efficient hospital sector. Whether the expenditure is financed through private or public sources is not related to the technical efficiency of the hospital sector. On the other hand, the hospital sector in countries with higher income inequality and longer average hospital length of stay is less technically efficient. Copyright © 2013 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Xu, Maoqi; Chen, Liang
2018-01-01
The individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models to identify differentially expressed genes between disease and control groups often overlook the substantial human sample heterogeneity. Meanwhile, traditional nonparametric tests lose detailed data information and sacrifice the analysis power, although they are distribution free and robust to heterogeneity. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq). As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about read-count distribution. It also incorporates a constraint for the read-count overdispersion, which is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood-ratio test when handling heterogeneous groups. It will significantly advance the transcriptomics studies of cancers and other complex diseases. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Nonparametric methods in actigraphy: An update
Gonçalves, Bruno S.B.; Cavalcanti, Paula R.A.; Tavares, Gracilene R.; Campos, Tania F.; Araujo, John F.
2014-01-01
Circadian rhythmicity in humans has been well studied using actigraphy, a method of measuring gross motor movement. As actigraphic technology continues to evolve, it is important for data analysis to keep pace with new variables and features. Our objective is to study the behavior of two variables, interdaily stability (IS) and intradaily variability (IV), to describe the rest-activity rhythm. Simulated data and actigraphy data of humans, rats, and marmosets were used in this study. We modified the method of calculation for IV and IS by modifying the time intervals of analysis. For each variable, we calculated the average value (IVm and ISm) of the results for each time interval. Simulated data showed that (1) synchronization analysis depends on sample size, and (2) fragmentation is independent of the amplitude of the generated noise. We were able to obtain a significant difference in the fragmentation patterns of stroke patients using the IVm variable, while no difference was identified using the variable IV60. Rhythmic synchronization of activity and rest was significantly higher in young than adults with Parkinson's when using the ISm variable; however, this difference was not seen using IS60. We propose an updated format to calculate rhythmic fragmentation, including two additional optional variables. These alternative methods of nonparametric analysis aim to more precisely detect sleep–wake cycle fragmentation and synchronization. PMID:26483921
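The two variables discussed above can be sketched with the commonly used definitions of interdaily stability and intradaily variability. The abstract itself does not spell the formulas out, so the definitions below are an assumption, as are the function names; the averaging over several time intervals that yields ISm and IVm is not shown.

```python
from statistics import mean

def interdaily_stability(x, period):
    """IS: variance of the average daily profile relative to total variance.

    x      -- activity counts in equal time bins (length a multiple of period)
    period -- number of bins per 24 h (24 for hourly bins)
    """
    n = len(x)
    grand = mean(x)
    # average activity at each position within the day
    profile = [mean(x[h::period]) for h in range(period)]
    num = n * sum((m - grand) ** 2 for m in profile)
    den = period * sum((v - grand) ** 2 for v in x)
    return num / den

def intradaily_variability(x):
    """IV: mean squared successive difference relative to total variance."""
    n = len(x)
    grand = mean(x)
    num = n * sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    den = (n - 1) * sum((v - grand) ** 2 for v in x)
    return num / den

# Toy series: five identical "days" of four bins each (perfectly stable rhythm).
stable = [0, 1, 2, 3] * 5
print(interdaily_stability(stable, period=4))
print(intradaily_variability(stable))
```

Under these definitions IS approaches 1 for a rhythm that repeats identically every day, while IV grows as the signal alternates more often between rest and activity.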
Supratentorial lesions contribute to trigeminal neuralgia in multiple sclerosis.
Fröhlich, Kilian; Winder, Klemens; Linker, Ralf A; Engelhorn, Tobias; Dörfler, Arnd; Lee, De-Hyung; Hilz, Max J; Schwab, Stefan; Seifert, Frank
2018-06-01
Background: It has been proposed that multiple sclerosis lesions afflicting the pontine trigeminal afferents contribute to trigeminal neuralgia in multiple sclerosis. So far, there are no imaging studies that have evaluated interactions between supratentorial lesions and trigeminal neuralgia in multiple sclerosis patients. Methods: We conducted a retrospective study and sought multiple sclerosis patients with trigeminal neuralgia and controls in a local database. Multiple sclerosis lesions were manually outlined and transformed into stereotaxic space. We determined the lesion overlap and performed a voxel-wise subtraction analysis. Secondly, we conducted a voxel-wise non-parametric analysis using the Liebermeister test. Results: From 12,210 multiple sclerosis patient records screened, we identified 41 patients with trigeminal neuralgia. The voxel-wise subtraction analysis yielded associations between trigeminal neuralgia and multiple sclerosis lesions in the pontine trigeminal afferents, as well as larger supratentorial lesion clusters in the contralateral insula and hippocampus. The non-parametric statistical analysis using the Liebermeister test yielded similar areas to be associated with multiple sclerosis-related trigeminal neuralgia. Conclusions: Our study confirms previous data on associations between multiple sclerosis-related trigeminal neuralgia and pontine lesions, and showed for the first time an association with lesions in the insular region, a region involved in pain processing and endogenous pain modulation.
Autosomal Dominant Nonsyndromic Cleft Lip and Palate: Significant Evidence of Linkage at 18q21.1
Beiraghi, Soraya ; Nath, Swapan K. ; Gaines, Matthew ; Mandhyan, Desh D. ; Hutchings, David ; Ratnamala, Uppala ; McElreavey, Ken ; Bartoloni, Lucia ; Antonarakis, Gregory S. ; Antonarakis, Stylianos E. ; Radhakrishna, Uppala
2007-01-01
Nonsyndromic cleft lip with or without cleft palate (NSCL/P) is one of the most common congenital facial defects, with an incidence of 1 in 700–1,000 live births among individuals of European descent. Several linkage and association studies of NSCL/P have suggested numerous candidate genes and genomic regions. A genomewide linkage analysis of a large multigenerational family (UR410) with NSCL/P was performed using a single-nucleotide–polymorphism array. Nonparametric linkage (NPL) analysis provided significant evidence of linkage for marker rs728683 on chromosome 18q21.1 (NPL=43.33 and P=.000061; nonparametric LOD=3.97 and P=.00001). Parametric linkage analysis with a dominant mode of inheritance and reduced penetrance resulted in a maximum LOD score of 3.61 at position 47.4 Mb on chromosome 18q21.1. Haplotype analysis with informative crossovers defined a 5.7-Mb genomic region spanned by proximal marker rs1824683 (42,403,918 bp) and distal marker rs768206 (48,132,862 bp). Thus, a novel genomic region on 18q21.1 was identified that most likely harbors a high-risk variant for NSCL/P in this family; we propose to name this locus “OFC11” (orofacial cleft 11). PMID:17564975
A Review of DIMPACK Version 1.0: Conditional Covariance-Based Test Dimensionality Analysis Package
ERIC Educational Resources Information Center
Deng, Nina; Han, Kyung T.; Hambleton, Ronald K.
2013-01-01
DIMPACK Version 1.0 for assessing test dimensionality based on a nonparametric conditional covariance approach is reviewed. This software was originally distributed by Assessment Systems Corporation and now can be freely accessed online. The software consists of Windows-based interfaces of three components: DIMTEST, DETECT, and CCPROX/HAC, which…
The Effect of Sample Size on Parametric and Nonparametric Factor Analytical Methods
ERIC Educational Resources Information Center
Kalkan, Ömür Kaya; Kelecioglu, Hülya
2016-01-01
Linear factor analysis models used to examine constructs underlying the responses are not very suitable for dichotomous or polytomous response formats. The associated problems cannot be eliminated by polychoric or tetrachoric correlations in place of the Pearson correlation. Therefore, we considered parameters obtained from the NOHARM and FACTOR…
Efficiency, Technology and Productivity Change in Australian Universities, 1998-2003
ERIC Educational Resources Information Center
Worthington, Andrew C.; Lee, Boon L.
2008-01-01
In this study, productivity growth in 35 Australian universities is investigated using non-parametric frontier techniques over the period 1998-2003. The five inputs included in the analysis are full-time equivalent academic and non-academic staff, non-labor expenditure and undergraduate and postgraduate student load while the six outputs are…
Forest Stand Canopy Structure Attribute Estimation from High Resolution Digital Airborne Imagery
Demetrios Gatziolis
2006-01-01
A study of forest stand canopy variable assessment using digital, airborne, multispectral imagery is presented. Variable estimation involves stem density, canopy closure, and mean crown diameter, and it is based on quantification of spatial autocorrelation among pixel digital numbers (DN) using variogram analysis and an alternative, non-parametric approach known as...
The Efficiency of Higher Education Institutions in England Revisited: Comparing Alternative Measures
ERIC Educational Resources Information Center
Johnes, Geraint; Tone, Kaoru
2017-01-01
Data envelopment analysis (DEA) has often been used to evaluate efficiency in the context of higher education institutions. Yet there are numerous alternative non-parametric measures of efficiency available. This paper compares efficiency scores obtained for institutions of higher education in England, 2013-2014, using three different methods: the…
Conditional Covariance-Based Subtest Selection for DIMTEST
ERIC Educational Resources Information Center
Froelich, Amy G.; Habing, Brian
2008-01-01
DIMTEST is a nonparametric hypothesis-testing procedure designed to test the assumptions of a unidimensional and locally independent item response theory model. Several previous Monte Carlo studies have found that using linear factor analysis to select the assessment subtest for DIMTEST results in a moderate to severe loss of power when the exam…
Duan, Fenghai; Xu, Ye
2017-01-01
To analyze a microarray experiment to identify the genes with expressions varying after the diagnosis of breast cancer. A total of 44 928 probe sets in an Affymetrix microarray data publicly available on Gene Expression Omnibus from 249 patients with breast cancer were analyzed by the nonparametric multivariate adaptive splines. Then, the identified genes with turning points were grouped by K-means clustering, and their network relationship was subsequently analyzed by the Ingenuity Pathway Analysis. In total, 1640 probe sets (genes) were reliably identified to have turning points along with the age at diagnosis in their expression profiling, of which 927 expressed lower after turning points and 713 expressed higher after the turning points. K-means clustered them into 3 groups with turning points centering at 54, 62.5, and 72, respectively. The pathway analysis showed that the identified genes were actively involved in various cancer-related functions or networks. In this article, we applied the nonparametric multivariate adaptive splines method to a publicly available gene expression data and successfully identified genes with expressions varying before and after breast cancer diagnosis.
Koohbor, Behrad; Kidane, Addis; Lu, Wei -Yang; ...
2016-01-25
Dynamic stress–strain response of rigid closed-cell polymeric foams is investigated in this work by subjecting high toughness polyurethane foam specimens to direct impact with different projectile velocities and quantifying their deformation response with high speed stereo-photography together with 3D digital image correlation. The measured transient displacement field developed in the specimens during high strain rate loading is used to calculate the transient axial acceleration field throughout the specimen. A simple mathematical formulation based on conservation of mass is also proposed to determine the local change of density in the specimen during deformation. By obtaining the full-field acceleration and density distributions, the inertia stresses at each point in the specimen are determined through a non-parametric analysis and superimposed on the stress magnitudes measured at specimen ends to obtain the full-field stress distribution. Furthermore, the process outlined above overcomes a major challenge in high strain rate experiments with low impedance polymeric foam specimens, i.e. the delayed equilibrium conditions can be quantified.
Gini estimation under infinite variance
NASA Astrophysics Data System (ADS)
Fontanari, Andrea; Taleb, Nassim Nicholas; Cirillo, Pasquale
2018-07-01
We study the problems related to the estimation of the Gini index in the presence of a fat-tailed data generating process, i.e. one in the stable distribution class with finite mean but infinite variance (i.e. with tail index α ∈ (1, 2)). We show that, in such a case, the Gini coefficient cannot be reliably estimated using conventional nonparametric methods, because of a downward bias that emerges under fat tails. This has important implications for the ongoing discussion about economic inequality. We start by discussing how the nonparametric estimator of the Gini index undergoes a phase transition in the symmetry structure of its asymptotic distribution, as the data distribution shifts from the domain of attraction of a light-tailed distribution to that of a fat-tailed one, especially in the case of infinite variance. We also show how the nonparametric Gini bias increases with lower values of α. We then prove that maximum likelihood estimation outperforms nonparametric methods, requiring a much smaller sample size to reach efficiency. Finally, for fat-tailed data, we provide a simple correction mechanism to the small sample bias of the nonparametric estimator based on the distance between the mode and the mean of its asymptotic distribution.
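The conventional nonparametric Gini estimator referred to above can be sketched from order statistics as below. This is an illustration only: the function names are hypothetical, and a Pareto sample with shape 1.5 (finite mean, infinite variance) stands in for the stable-class data the authors study, because its true Gini index, 1/(2α − 1), is known in closed form.

```python
import random

def gini(sample):
    """Nonparametric Gini index from the order statistics of a sample."""
    xs = sorted(sample)
    n = len(xs)
    total = sum(xs)
    # G = sum_i (2i - n - 1) * x_(i) / (n * sum_i x_i), with i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)

def pareto_gini(alpha):
    """True Gini index of a Pareto (type I) distribution with shape alpha > 1."""
    return 1.0 / (2.0 * alpha - 1.0)

# Compare the estimate with the true value for a fat-tailed sample.
random.seed(0)
alpha = 1.5
sample = [random.paretovariate(alpha) for _ in range(1000)]
print("estimate:", gini(sample), "true:", pareto_gini(alpha))
```

Repeating the simulation over many seeds and sample sizes is one way to look for the downward bias described in the abstract; a single run is not evidence either way.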
Lin, Lawrence; Pan, Yi; Hedayat, A S; Barnhart, Huiman X; Haber, Michael
2016-01-01
Total deviation index (TDI) captures a prespecified quantile of the absolute deviation of paired observations from raters, observers, methods, assays, instruments, etc. We compare the performance of TDI using nonparametric quantile regression to the TDI assuming normality (Lin, 2000). This simulation study considers three distributions: normal, Poisson, and uniform at quantile levels of 0.8 and 0.9 for cases with and without contamination. Study endpoints include the bias of TDI estimates (compared with their respective theoretical values), standard error of TDI estimates (compared with their true simulated standard errors), and test size (compared with 0.05), and power. Nonparametric TDI using quantile regression, although it slightly underestimates and delivers slightly less power for data without contamination, works satisfactorily under all simulated cases even for moderate (say, ≥40) sample sizes. The performance of the TDI based on a quantile of 0.8 is in general superior to that of 0.9. The performances of nonparametric and parametric TDI methods are compared with a real data example. Nonparametric TDI can be very useful when the underlying distribution on the difference is not normal, especially when it has a heavy tail.
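As a rough sketch of the two estimators being compared, the total deviation index can be computed either as an empirical quantile of the absolute paired differences or under a normality assumption. The code below makes two simplifying assumptions: the nonparametric version uses a plain order statistic, which is what an intercept-only quantile regression reduces to, and the parametric version uses the normal approximation based on the mean squared deviation. Function names and data are hypothetical.

```python
import math
from statistics import NormalDist

def tdi_nonparametric(x, y, p=0.9):
    """Empirical p-quantile of |x - y| (smallest order statistic covering p)."""
    d = sorted(abs(a - b) for a, b in zip(x, y))
    k = math.ceil(p * len(d))
    return d[k - 1]

def tdi_normal(x, y, p=0.9):
    """Normal-theory approximation: z_{(1+p)/2} * sqrt(mean squared deviation)."""
    msd = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return NormalDist().inv_cdf((1 + p) / 2) * math.sqrt(msd)

# Hypothetical paired readings from two instruments.
method_a = [10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2, 9.9]
method_b = [10.0, 10.1, 10.1, 10.2, 9.9, 10.0, 10.6, 9.8]
print(tdi_nonparametric(method_a, method_b, p=0.8))
print(tdi_normal(method_a, method_b, p=0.8))
```

With heavy-tailed differences the two numbers can diverge noticeably, which is the situation where the abstract suggests the nonparametric version is most useful.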
NASA Astrophysics Data System (ADS)
Bugała, Artur; Bednarek, Karol; Kasprzyk, Leszek; Tomczewski, Andrzej
2017-10-01
The paper presents the most representative characteristics, drawn from a three-year measurement period, of daily and monthly electricity production from photovoltaic conversion using modules installed in fixed and 2-axis tracking constructions. Results are presented for selected summer, autumn, spring and winter days. The analyzed measuring stand is located on the roof of the Faculty of Electrical Engineering building at Poznan University of Technology. Basic statistical parameters such as mean value, standard deviation, skewness, kurtosis, median, range, and coefficient of variation were used. It was found that the asymmetry factor can be useful in the analysis of daily electricity production from photovoltaic conversion. To determine the repeatability of monthly electricity production between the summer months, and between the summer and winter months, a non-parametric Mann-Whitney U test was used. To analyze the repeatability of daily peak hours, describing the largest value of hourly electricity production, a non-parametric Kruskal-Wallis test was applied as an extension of the Mann-Whitney U test. Based on the analysis of the electric energy distribution from the prepared monitoring system, it was found that traditional forecasting methods for electricity production from photovoltaic conversion, such as multiple regression models, should not be the preferred methods of analysis.
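The Mann-Whitney U comparison described above can be sketched as follows. This is a minimal illustration with hypothetical daily production figures, not the authors' data or procedure; the p-value uses the large-sample normal approximation without tie or continuity correction, which is only rough for small samples.

```python
import math
from statistics import NormalDist

def mann_whitney_u(a, b):
    """Return (U, two-sided p) for samples a and b.

    U counts the pairs (x in a, y in b) with x > y; ties count one half.
    """
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mu) / sigma
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p

# Hypothetical daily production (kWh) in a summer month and a winter month.
july = [5.1, 4.8, 5.6, 5.3, 4.9, 5.8, 5.0]
january = [1.2, 0.9, 1.8, 1.1, 0.7, 1.5, 1.3]
print(mann_whitney_u(july, january))
```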
Nonparametric method for failures diagnosis in the actuating subsystem of aircraft control system
NASA Astrophysics Data System (ADS)
Terentev, M. N.; Karpenko, S. S.; Zybin, E. Yu; Kosyanchuk, V. V.
2018-02-01
In this paper we design a nonparametric method for failures diagnosis in the aircraft control system that uses the measurements of the control signals and the aircraft states only. It doesn’t require a priori information of the aircraft model parameters, training or statistical calculations, and is based on analytical nonparametric one-step-ahead state prediction approach. This makes it possible to predict the behavior of unidentified and failure dynamic systems, to weaken the requirements to control signals, and to reduce the diagnostic time and problem complexity.
Prentice, Ross L; Zhao, Shanshan
2018-01-01
The Dabrowska (Ann Stat 16:1475-1489, 1988) product integral representation of the multivariate survivor function is extended, leading to a nonparametric survivor function estimator for an arbitrary number of failure time variates that has a simple recursive formula for its calculation. Empirical process methods are used to sketch proofs for this estimator's strong consistency and weak convergence properties. Summary measures of pairwise and higher-order dependencies are also defined and nonparametrically estimated. Simulation evaluation is given for the special case of three failure time variates.
Wilcox, Thomas P; Zwickl, Derrick J; Heath, Tracy A; Hillis, David M
2002-11-01
Four New World genera of dwarf boas (Exiliboa, Trachyboa, Tropidophis, and Ungaliophis) have been placed by many systematists in a single group (traditionally called Tropidophiidae). However, the monophyly of this group has been questioned in several studies. Moreover, the overall relationships among basal snake lineages, including the placement of the dwarf boas, are poorly understood. We obtained mtDNA sequence data for 12S, 16S, and intervening tRNA-val genes from 23 species of snakes representing most major snake lineages, including all four genera of New World dwarf boas. We then examined the phylogenetic position of these species by estimating the phylogeny of the basal snakes. Our phylogenetic analysis suggests that New World dwarf boas are not monophyletic. Instead, we find Exiliboa and Ungaliophis to be most closely related to sand boas (Erycinae), boas (Boinae), and advanced snakes (Caenophidea), whereas Tropidophis and Trachyboa form an independent clade that separated relatively early in snake radiation. Our estimate of snake phylogeny differs significantly in other ways from some previous estimates of snake phylogeny. For instance, pythons do not cluster with boas and sand boas, but instead show a strong relationship with Loxocemus and Xenopeltis. Additionally, uropeltids cluster strongly with Cylindrophis, and together are embedded in what has previously been considered the macrostomatan radiation. These relationships are supported by both bootstrapping (parametric and nonparametric approaches) and Bayesian analysis, although Bayesian support values are consistently higher than those obtained from nonparametric bootstrapping. Simulations show that Bayesian support values represent much better estimates of phylogenetic accuracy than do nonparametric bootstrap support values, at least under the conditions of our study. Copyright 2002 Elsevier Science (USA)
An appraisal of statistical procedures used in derivation of reference intervals.
Ichihara, Kiyoshi; Boyd, James C
2010-11-01
When conducting studies to derive reference intervals (RIs), various statistical procedures are commonly applied at each step, from the planning stages to final computation of RIs. Determination of the necessary sample size is an important consideration, and evaluation of at least 400 individuals in each subgroup has been recommended to establish reliable common RIs in multicenter studies. Multiple regression analysis allows identification of the most important factors contributing to variation in test results, while accounting for possible confounding relationships among these factors. Of the various approaches proposed for judging the necessity of partitioning reference values, nested analysis of variance (ANOVA) is the likely method of choice owing to its ability to handle multiple groups and being able to adjust for multiple factors. Box-Cox power transformation often has been used to transform data to a Gaussian distribution for parametric computation of RIs. However, this transformation occasionally fails. Therefore, the non-parametric method based on determination of the 2.5 and 97.5 percentiles following sorting of the data, has been recommended for general use. The performance of the Box-Cox transformation can be improved by introducing an additional parameter representing the origin of transformation. In simulations, the confidence intervals (CIs) of reference limits (RLs) calculated by the parametric method were narrower than those calculated by the non-parametric approach. However, the margin of difference was rather small owing to additional variability in parametrically-determined RLs introduced by estimation of parameters for the Box-Cox transformation. The parametric calculation method may have an advantage over the non-parametric method in allowing identification and exclusion of extreme values during RI computation.
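The non-parametric reference interval described above (the 2.5th and 97.5th percentiles of the sorted data) can be sketched with the Python standard library. The interpolation rule used here, rank p(n + 1), is an assumption, since the abstract does not say which percentile convention is intended; the function name and data are hypothetical.

```python
from statistics import quantiles

def nonparametric_reference_interval(values):
    """Return the (2.5th, 97.5th) percentiles of the reference values.

    quantiles(..., n=40) yields cut points at 2.5 %, 5 %, ..., 97.5 %;
    the 'exclusive' method interpolates at rank p * (n + 1).
    """
    cuts = quantiles(values, n=40, method="exclusive")
    return cuts[0], cuts[-1]

# Hypothetical results from 399 reference individuals.
reference_values = list(range(1, 400))
print(nonparametric_reference_interval(reference_values))
```

Because the limits rest on a few extreme order statistics, their confidence intervals are wide for small samples, which is consistent with the large subgroup sizes recommended in the abstract.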
Predicting Market Impact Costs Using Nonparametric Machine Learning Models
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural network, Gaussian process, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of the US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving prediction performance. PMID:26926235
Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.
Ding, Mingtao; He, Lihan; Dunson, David; Carin, Lawrence
2012-12-01
A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.
Using exogenous variables in testing for monotonic trends in hydrologic time series
Alley, William M.
1988-01-01
One approach that has been used in performing a nonparametric test for monotonic trend in a hydrologic time series consists of a two-stage analysis. First, a regression equation is estimated for the variable being tested as a function of an exogenous variable. A nonparametric trend test such as the Kendall test is then performed on the residuals from the equation. By analogy to stagewise regression and through Monte Carlo experiments, it is demonstrated that this approach will tend to underestimate the magnitude of the trend and to result in some loss in power as a result of ignoring the interaction between the exogenous variable and time. An alternative approach, referred to as the adjusted variable Kendall test, is demonstrated to generally have increased statistical power and to provide more reliable estimates of the trend slope. In addition, the utility of including an exogenous variable in a trend test is examined under selected conditions.
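The two-stage analysis described above can be sketched as a least-squares fit on the exogenous variable followed by a Kendall trend test on the residuals. This is an illustration of that first approach only; the adjusted variable Kendall test is not implemented here. Function names and data are hypothetical, and the p-value uses the usual normal approximation without tie correction.

```python
import math
from statistics import NormalDist

def ols_residuals(y, x):
    """Residuals from a simple least-squares regression of y on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def mann_kendall(series):
    """Return (S, z, two-sided p) for a monotonic trend in a time-ordered series."""
    n = len(series)
    s = sum((series[j] > series[i]) - (series[j] < series[i])
            for i in range(n) for j in range(i + 1, n))
    var = n * (n - 1) * (2 * n + 5) / 18
    z = 0.0 if s == 0 else (s - (1 if s > 0 else -1)) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return s, z, p

# Hypothetical data: concentration depends on streamflow plus a time trend.
flow = [1, -1, -1, 1, 1, -1, -1, 1]
conc = [2 * f + 0.5 * t for t, f in enumerate(flow)]
print(mann_kendall(ols_residuals(conc, flow)))
```

In this toy series the exogenous variable is uncorrelated with time, so the first stage leaves the trend intact; when the two are correlated, part of the trend is absorbed by the regression, which is the loss the abstract describes.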
On an additive partial correlation operator and nonparametric estimation of graphical models.
Lee, Kuang-Yao; Li, Bing; Zhao, Hongyu
2016-09-01
We introduce an additive partial correlation operator as an extension of partial correlation to the nonlinear setting, and use it to develop a new estimator for nonparametric graphical models. Our graphical models are based on additive conditional independence, a statistical relation that captures the spirit of conditional independence without having to resort to high-dimensional kernels for its estimation. The additive partial correlation operator completely characterizes additive conditional independence, and has the additional advantage of putting marginal variation on appropriate scales when evaluating interdependence, which leads to more accurate statistical inference. We establish the consistency of the proposed estimator. Through simulation experiments and analysis of the DREAM4 Challenge dataset, we demonstrate that our method performs better than existing methods in cases where the Gaussian or copula Gaussian assumption does not hold, and that a more appropriate scaling for our method further enhances its performance. PMID:29422689
NASA Astrophysics Data System (ADS)
Dai, Jun; Zhou, Haigang; Zhao, Shaoquan
2017-01-01
This paper considers a multi-scale future hedge strategy that minimizes lower partial moments (LPM). To do this, wavelet analysis is adopted to decompose time series data into different components. Next, different parametric estimation methods with known distributions are applied to calculate the LPM of hedged portfolios, which is the key to determining multi-scale hedge ratios over different time scales. Then these parametric methods are compared with the prevailing nonparametric kernel metric method. Empirical results indicate that in the China Securities Index 300 (CSI 300) index futures and spot markets, hedge ratios and hedge efficiency estimated by the nonparametric kernel metric method are inferior to those estimated by parametric hedging model based on the features of sequence distributions. In addition, if minimum-LPM is selected as a hedge target, the hedging periods, degree of risk aversion, and target returns can affect the multi-scale hedge ratios and hedge efficiency, respectively.
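The minimum-LPM hedge ratio discussed above can be sketched with the plain empirical estimator of the lower partial moment and a grid search over hedge ratios. This is a simplification and not the authors' procedure: it uses neither the wavelet decomposition nor the kernel or parametric estimators the paper compares, and the function names, grid and return series are hypothetical.

```python
def lower_partial_moment(returns, target=0.0, order=2):
    """Empirical LPM: mean of max(target - r, 0) ** order over the sample."""
    return sum(max(target - r, 0.0) ** order for r in returns) / len(returns)

def min_lpm_hedge_ratio(spot, futures, target=0.0, order=2, grid=None):
    """Grid-search the hedge ratio h minimizing the LPM of spot - h * futures."""
    if grid is None:
        grid = [i / 100 for i in range(0, 201)]  # h from 0.00 to 2.00
    def hedged_lpm(h):
        hedged = [s - h * f for s, f in zip(spot, futures)]
        return lower_partial_moment(hedged, target, order)
    return min(grid, key=hedged_lpm)

# Hypothetical daily returns for a spot index and its futures contract.
spot = [0.012, -0.008, 0.021, -0.015, 0.004, -0.011]
futures = [0.010, -0.009, 0.024, -0.013, 0.006, -0.012]
print(min_lpm_hedge_ratio(spot, futures))
```

Here `order` plays the role of the degree of risk aversion and `target` the target return, the two quantities the abstract reports as affecting the hedge ratio.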
Nonparametric estimation of stochastic differential equations with sparse Gaussian processes.
García, Constantino A; Otero, Abraham; Félix, Paulo; Presedo, Jesús; Márquez, David G
2017-08-01
The application of stochastic differential equations (SDEs) to the analysis of temporal data has attracted increasing attention, due to their ability to describe complex dynamics with physically interpretable equations. In this paper, we introduce a nonparametric method for estimating the drift and diffusion terms of SDEs from a densely observed discrete time series. The use of Gaussian processes as priors permits working directly in a function-space view and thus the inference takes place directly in this space. To cope with the computational complexity that requires the use of Gaussian processes, a sparse Gaussian process approximation is provided. This approximation permits the efficient computation of predictions for the drift and diffusion terms by using a distribution over a small subset of pseudosamples. The proposed method has been validated using both simulated data and real data from economy and paleoclimatology. The application of the method to real data demonstrates its ability to capture the behavior of complex systems.
Liu, Yuewei; Chen, Weihong
2012-02-01
As a nonparametric method, the Kruskal-Wallis test is widely used to compare three or more independent groups when an ordinal or interval level of data is available, especially when the assumptions of analysis of variance (ANOVA) are not met. If the Kruskal-Wallis statistic is statistically significant, the Nemenyi test is an alternative method for further pairwise multiple comparisons to locate the source of significance. Unfortunately, most popular statistical packages do not integrate the Nemenyi test, which is not easy to calculate by hand. We described the theory and applications of the Kruskal-Wallis and Nemenyi tests, and presented a flexible SAS macro to implement the two tests. The SAS macro was demonstrated by two examples from our cohort study in occupational epidemiology. It provides a useful tool for SAS users to test the differences among three or more independent groups using a nonparametric method.
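For readers without SAS, the Kruskal-Wallis statistic itself can be sketched in a few lines of Python. This is not a port of the authors' macro: the Nemenyi follow-up comparisons are not shown, no tie correction is applied, and the p-value is returned only for three groups, where the chi-square reference distribution with 2 degrees of freedom has the closed-form tail exp(-H/2). Names and data are hypothetical.

```python
import math

def kruskal_wallis(groups):
    """Return (H, p); p is given only for three groups, otherwise None."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    # assign each distinct value its average (mid) rank in the pooled sample
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1
        i = j + 1
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    p = math.exp(-h / 2) if len(groups) == 3 else None
    return h, p

# Hypothetical exposure measurements in three job categories.
low = [1.1, 0.9, 1.4, 1.0]
medium = [1.8, 2.1, 1.6, 2.4]
high = [3.0, 2.7, 3.5, 2.9]
print(kruskal_wallis([low, medium, high]))
```

For more than three groups, H would be compared against a chi-square distribution with k − 1 degrees of freedom using a statistical table or library.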
Traffic flow forecasting using approximate nearest neighbor nonparametric regression
DOT National Transportation Integrated Search
2000-12-01
The purpose of this research is to enhance nonparametric regression (NPR) for use in real-time systems by first reducing execution time using advanced data structures and imprecise computations and then developing a methodology for applying NPR. Due ...
Radioactivity Registered With a Small Number of Events
NASA Astrophysics Data System (ADS)
Zlokazov, Victor; Utyonkov, Vladimir
2018-02-01
The synthesis of superheavy elements calls for the analysis of low-statistics experimental data, presumably obeying an unknown exponential distribution, and for a decision on whether they originate from one source or have admixtures. Here we analyze predictions from non-parametric methods that employ only such fundamental sample properties as the sample mean, the median and the mode.
ERIC Educational Resources Information Center
Hoijtink, Herbert; Molenaar, Ivo W.
1997-01-01
This paper shows that a certain class of constrained latent class models may be interpreted as a special case of nonparametric multidimensional item response models. Parameters of this latent class model are estimated using an application of the Gibbs sampler, and model fit is investigated using posterior predictive checks. (SLD)
ERIC Educational Resources Information Center
Yorke, Mantz
2017-01-01
When analysing course-level data by subgroups based upon some demographic characteristics, the numbers in analytical cells are often too small to allow inferences to be drawn that might help in the enhancement of practices. However, relatively simple analyses can provide useful pointers. This article draws upon a study involving a partnership with…
ERIC Educational Resources Information Center
Dimitrov, Dimiter M.
2007-01-01
The validation of cognitive attributes required for correct answers on binary test items or tasks has been addressed in previous research through the integration of cognitive psychology and psychometric models using parametric or nonparametric item response theory, latent class modeling, and Bayesian modeling. All previous models, each with their…
Evaluating kriging as a tool to improve moderate resolution maps of forest biomass
Elizabeth A. Freeman; Gretchen G. Moisen
2007-01-01
The USDA Forest Service, Forest Inventory and Analysis program (FIA) recently produced a nationwide map of forest biomass by modeling biomass collected on forest inventory plots as nonparametric functions of moderate resolution satellite data and other environmental variables using Cubist software. Efforts are underway to develop methods to enhance this initial map. We...
Zhao, Zhibiao
2011-06-01
We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise.
Nonparametric Transfer Function Models
Liu, Jun M.; Chen, Rong; Yao, Qiwei
2009-01-01
In this paper a class of nonparametric transfer function models is proposed to model nonlinear relationships between ‘input’ and ‘output’ time series. The transfer function is smooth with unknown functional forms, and the noise is assumed to be a stationary autoregressive-moving average (ARMA) process. The nonparametric transfer function is estimated jointly with the ARMA parameters. By modeling the correlation in the noise, the transfer function can be estimated more efficiently. The parsimonious ARMA structure improves the estimation efficiency in finite samples. The asymptotic properties of the estimators are investigated. The finite-sample properties are illustrated through simulations and one empirical example. PMID:20628584
A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data.
Ye, Xin; Wang, Ke; Zou, Yajie; Lord, Dominique
2018-01-01
This paper develops a semi-nonparametric Poisson regression model to analyze motor vehicle crash frequency data collected from rural multilane highway segments in California, US. Motor vehicle crash frequency on rural highway is a topic of interest in the area of transportation safety due to higher driving speeds and the resultant severity level. Unlike the traditional Negative Binomial (NB) model, the semi-nonparametric Poisson regression model can accommodate an unobserved heterogeneity following a highly flexible semi-nonparametric (SNP) distribution. Simulation experiments are conducted to demonstrate that the SNP distribution can well mimic a large family of distributions, including normal distributions, log-gamma distributions, bimodal and trimodal distributions. Empirical estimation results show that such flexibility offered by the SNP distribution can greatly improve model precision and the overall goodness-of-fit. The semi-nonparametric distribution can provide a better understanding of crash data structure through its ability to capture potential multimodality in the distribution of unobserved heterogeneity. When estimated coefficients in empirical models are compared, SNP and NB models are found to have a substantially different coefficient for the dummy variable indicating the lane width. The SNP model with better statistical performance suggests that the NB model overestimates the effect of lane width on crash frequency reduction by 83.1%.
Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo
2018-01-01
This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
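As a rough illustration of why Hodges-Lehmann-type estimators are robust to outliers, here is a univariate sketch in Python. The tests in the abstract are multivariate and come in scaled and unscaled versions; this one-dimensional two-sample shift estimate, and the name `hodges_lehmann_shift`, are simplifying assumptions for illustration only.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann shift estimate.

    Returns the median of all pairwise differences y_j - x_i, which
    estimates how far sample y is shifted relative to sample x.
    """
    return median(yj - xi for xi in x for yj in y)


# A single extreme value barely moves the estimate, unlike a difference of means.
clean = hodges_lehmann_shift([1, 2, 3], [4, 5, 6])
contaminated = hodges_lehmann_shift([1, 2, 3], [4, 5, 600])
print(clean, contaminated)
```

Both calls return 3 in this toy example, whereas the difference of sample means jumps from 3 to 201 once the outlier is introduced.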
Comparison of methods for estimating the attributable risk in the context of survival analysis.
Gassama, Malamine; Bénichou, Jacques; Dartois, Laureen; Thiébaut, Anne C M
2017-01-23
The attributable risk (AR) measures the proportion of disease cases that can be attributed to an exposure in the population. Several definitions and estimation methods have been proposed for survival data. Using simulations, we compared four methods for estimating AR defined in terms of survival functions: two nonparametric methods based on Kaplan-Meier's estimator, one semiparametric based on Cox's model, and one parametric based on the piecewise constant hazards model, as well as one simpler method based on estimated exposure prevalence at baseline and Cox's model hazard ratio. We considered a fixed binary exposure with varying exposure probabilities and strengths of association, and generated event times from a proportional hazards model with constant or monotonic (decreasing or increasing) Weibull baseline hazard, as well as from a nonproportional hazards model. We simulated 1,000 independent samples of size 1,000 or 10,000. The methods were compared in terms of mean bias, mean estimated standard error, empirical standard deviation and 95% confidence interval coverage probability at four equally spaced time points. Under proportional hazards, all five methods yielded unbiased results regardless of sample size. Nonparametric methods displayed greater variability than other approaches. All methods showed satisfactory coverage except for nonparametric methods at the end of follow-up for a sample size of 1,000 especially. With nonproportional hazards, nonparametric methods yielded similar results to those under proportional hazards, whereas semiparametric and parametric approaches that both relied on the proportional hazards assumption performed poorly. These methods were applied to estimate the AR of breast cancer due to menopausal hormone therapy in 38,359 women of the E3N cohort. 
In practice, our study suggests using the semiparametric or parametric approaches to estimate AR as a function of time in cohort studies if the proportional hazards assumption appears appropriate.
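The abstract does not give the formula behind the "simpler method" based on baseline exposure prevalence and the Cox hazard ratio. A plausible reading, offered here only as an assumption, is Levin's classical formula with the hazard ratio standing in for the relative risk: AR = p(HR - 1) / (1 + p(HR - 1)). The function name `attributable_risk` is hypothetical.

```python
def attributable_risk(prevalence, hazard_ratio):
    """Levin-type attributable risk from exposure prevalence and a hazard ratio.

    Assumption: the hazard ratio is used in place of the relative risk.
    prevalence is the proportion exposed at baseline (0..1).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    if hazard_ratio <= 0.0:
        raise ValueError("hazard_ratio must be positive")
    excess = prevalence * (hazard_ratio - 1.0)
    return excess / (1.0 + excess)


# Half the population exposed, exposure triples the hazard.
print(attributable_risk(0.5, 3.0))
```

Unlike the survival-function-based estimators compared in the study, this quantity does not vary with follow-up time.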
Uncertainty in determining extreme precipitation thresholds
NASA Astrophysics Data System (ADS)
Liu, Bingjun; Chen, Junfan; Chen, Xiaohong; Lian, Yanqing; Wu, Lili
2013-10-01
Extreme precipitation events are rare and occur mostly on a relatively small and local scale, which makes it difficult to set the thresholds for extreme precipitation in a large basin. Based on long-term daily precipitation data from 62 observation stations in the Pearl River Basin, this study assessed the applicability of the non-parametric, parametric, and detrended fluctuation analysis (DFA) methods in determining the extreme precipitation threshold (EPT), and the certainty of the EPTs from each method. The analyses show that the non-parametric absolute critical value method is easy to use but unable to reflect differences in spatial rainfall distribution. The non-parametric percentile method can account for the spatial distribution of precipitation, but the threshold value is sensitive to the size of the rainfall data series and depends on the choice of percentile, which makes it difficult to determine reasonable threshold values for a large basin. The parametric method can provide the most apt description of extreme precipitation by fitting extreme precipitation distributions with probability distribution functions; however, the selection of probability distribution functions, the goodness-of-fit tests, and the size of the rainfall data series can greatly affect the fitting accuracy. In contrast to the non-parametric and parametric methods, which are unable to provide EPTs with certainty, the DFA method, although computationally complicated, proved to be the most appropriate method and is able to provide a unique set of EPTs for a large basin with uneven spatio-temporal precipitation distribution.
The consistency between the spatial distribution of DFA-based thresholds with the annual average precipitation, the coefficient of variation (CV), and the coefficient of skewness (CS) for the daily precipitation further proves that EPTs determined by the DFA method are more reasonable and applicable for the Pearl River Basin.
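The non-parametric percentile method mentioned above can be sketched as follows. The 95th percentile, the 1 mm wet-day cutoff, and the function names are illustrative assumptions and are not taken from the study; the sketch simply shows why the resulting threshold depends on the chosen percentile and on the data series supplied.

```python
def percentile(values, q):
    """Percentile q (0..100) with linear interpolation between order statistics."""
    data = sorted(values)
    if not data:
        raise ValueError("no data")
    pos = q / 100.0 * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


def percentile_threshold(daily_precip_mm, q=95.0, wet_day_mm=1.0):
    """Extreme precipitation threshold as a percentile of wet-day totals.

    Days below wet_day_mm are dropped first (an assumed convention).
    """
    wet_days = [p for p in daily_precip_mm if p >= wet_day_mm]
    return percentile(wet_days, q)


# Toy series: one station, daily totals of 0..100 mm.
series = list(range(0, 101))
print(percentile_threshold(series, q=95.0))
print(percentile_threshold(series, q=99.0))
```

Running the same function per station gives spatially varying thresholds, in contrast to the absolute critical value method, which applies one fixed cutoff everywhere.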
NASA Astrophysics Data System (ADS)
Lototzis, M.; Papadopoulos, G. K.; Droulia, F.; Tseliou, A.; Tsiros, I. X.
2018-04-01
There are several cases where a circular variable is associated with a linear one. A typical example is wind direction, which is often associated with linear quantities such as air temperature and air humidity. A statistical relationship of this kind can be tested with parametric and non-parametric methods, each of which has its own advantages and drawbacks. This work deals with correlation analysis using both the parametric and the non-parametric procedure on a small set of meteorological data of air temperature and wind direction during a summer period in a Mediterranean climate. Correlations were examined between hourly, daily and maximum-prevailing values, under typical and non-typical meteorological conditions. Both tests indicated a strong correlation between mean hourly wind directions and mean hourly air temperature, whereas mean daily wind direction and mean daily air temperature do not seem to be correlated. In some cases, however, the two procedures were found to give quite dissimilar levels of significance for rejecting or not rejecting the null hypothesis of no correlation. The simple statistical analysis presented in this study, appropriately extended to large sets of meteorological data, may be a useful tool for estimating effects of wind in local climate studies.
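The abstract does not say which parametric coefficient was used. One common choice for a circular-linear pair, assumed here purely for illustration, combines the ordinary Pearson correlations of the linear variable with the cosine and sine of the angle: r^2 = (r_xc^2 + r_xs^2 - 2 r_xc r_xs r_cs) / (1 - r_cs^2). The function names below are hypothetical.

```python
import math


def pearson(a, b):
    """Ordinary Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def circular_linear_corr(angles_deg, x):
    """Circular-linear correlation (0..1) between angles in degrees and x."""
    c = [math.cos(math.radians(a)) for a in angles_deg]
    s = [math.sin(math.radians(a)) for a in angles_deg]
    r_xc, r_xs, r_cs = pearson(x, c), pearson(x, s), pearson(c, s)
    r2 = (r_xc ** 2 + r_xs ** 2 - 2 * r_xc * r_xs * r_cs) / (1 - r_cs ** 2)
    return math.sqrt(max(r2, 0.0))  # guard against tiny negative round-off


# Toy data: temperature depends perfectly on the cosine of wind direction.
directions = [0, 45, 90, 135, 180, 225, 270, 315]
temperature = [20 + 5 * math.cos(math.radians(d)) for d in directions]
print(circular_linear_corr(directions, temperature))
```

A plain Pearson correlation between the raw angle in degrees and temperature would miss this relationship, because 0 and 360 degrees denote the same direction.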
A permutation-based non-parametric analysis of CRISPR screen data.
Jia, Gaoxiang; Wang, Xinlei; Xiao, Guanghua
2017-07-19
Clustered regularly-interspaced short palindromic repeats (CRISPR) screens are usually implemented in cultured cells to identify genes with critical functions. Although several methods have been developed or adapted to analyze CRISPR screening data, no single specific algorithm has gained popularity. Thus, rigorous procedures are needed to overcome the shortcomings of existing algorithms. We developed a Permutation-Based Non-Parametric Analysis (PBNPA) algorithm, which computes p-values at the gene level by permuting sgRNA labels, and thus it avoids restrictive distributional assumptions. Although PBNPA is designed to analyze CRISPR data, it can also be applied to analyze genetic screens implemented with siRNAs or shRNAs and drug screens. We compared the performance of PBNPA with competing methods on simulated data as well as on real data. PBNPA outperformed recent methods designed for CRISPR screen analysis, as well as methods used for analyzing other functional genomics screens, in terms of Receiver Operating Characteristics (ROC) curves and False Discovery Rate (FDR) control for simulated data under various settings. Remarkably, the PBNPA algorithm showed better consistency and FDR control on published real data as well. PBNPA yields more consistent and reliable results than its competitors, especially when the data quality is low. R package of PBNPA is available at: https://cran.r-project.org/web/packages/PBNPA/ .
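The core idea of computing gene-level p-values by permuting sgRNA labels can be sketched as below. This is not the PBNPA package: the gene score (mean log fold change), the one-sided test for depletion, and every name in the code are assumptions made for illustration.

```python
import random
from statistics import mean


def gene_level_pvalues(lfc, gene_of, n_perm=1000, seed=0):
    """Permutation p-values for gene depletion.

    lfc:     dict mapping sgRNA name -> log fold change
    gene_of: dict mapping sgRNA name -> gene name
    Shuffling the values across sgRNAs is equivalent to permuting the
    sgRNA-to-gene labels while keeping the number of guides per gene.
    """
    rng = random.Random(seed)
    guides = sorted(lfc)
    genes = sorted(set(gene_of[g] for g in guides))

    def scores(values):
        by_gene = {gene: [] for gene in genes}
        for guide, value in zip(guides, values):
            by_gene[gene_of[guide]].append(value)
        return {gene: mean(vals) for gene, vals in by_gene.items()}

    values = [lfc[g] for g in guides]
    observed = scores(values)
    hits = {gene: 0 for gene in genes}
    for _ in range(n_perm):
        rng.shuffle(values)
        permuted = scores(values)
        for gene in genes:
            # Count permutations at least as extreme (as negative) as observed.
            if permuted[gene] <= observed[gene]:
                hits[gene] += 1
    # Add-one correction keeps p-values strictly positive.
    return {gene: (hits[gene] + 1) / (n_perm + 1) for gene in genes}


# Toy screen: gene A is strongly depleted, genes B..E are near zero.
lfc, gene_of = {}, {}
for gi, gene in enumerate("ABCDE"):
    for k in range(4):
        name = f"{gene}_{k}"
        gene_of[name] = gene
        lfc[name] = -5.0 - 0.1 * k if gene == "A" else 0.01 * (gi * 4 + k)
print(gene_level_pvalues(lfc, gene_of, n_perm=500, seed=1))
```

Because the null distribution comes from the data themselves, no distributional assumption on the log fold changes is needed, which is the property the abstract emphasizes.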
New analysis methods to push the boundaries of diagnostic techniques in the environmental sciences
NASA Astrophysics Data System (ADS)
Lungaroni, M.; Murari, A.; Peluso, E.; Gelfusa, M.; Malizia, A.; Vega, J.; Talebzadeh, S.; Gaudio, P.
2016-04-01
In recent years, new and more sophisticated measurements have been at the basis of the major progress in various disciplines related to the environment, such as remote sensing and thermonuclear fusion. To maximize the effectiveness of the measurements, new data analysis techniques are required. First data processing tasks, such as filtering and fitting, are of primary importance, since they can have a strong influence on the rest of the analysis. Even though Support Vector Regression (SVR) is a method devised and refined at the end of the 90s, a systematic comparison with more traditional non-parametric regression methods has never been reported. In this paper, a series of systematic tests is described, which indicates that SVR is a very competitive method of non-parametric regression that can usefully complement and often outperform more consolidated approaches. The performance of Support Vector Regression as a method of filtering is investigated first, comparing it with the most popular alternative techniques. Then Support Vector Regression is applied to the problem of non-parametric regression to analyse Lidar surveys for the environmental measurement of particulate matter due to wildfires. The proposed approach has given very positive results and provides new perspectives on the interpretation of the data.
Carvajal, Roberto C; Arias, Luis E; Garces, Hugo O; Sbarbaro, Daniel G
2016-04-01
This work presents a non-parametric method based on a principal component analysis (PCA) and a parametric one based on artificial neural networks (ANN) to remove continuous baseline features from spectra. The non-parametric method estimates the baseline based on a set of sampled basis vectors obtained from PCA applied over a previously composed continuous spectra learning matrix. The parametric method, however, uses an ANN to filter out the baseline. Previous studies have demonstrated that this method is one of the most effective for baseline removal. The evaluation of both methods was carried out by using a synthetic database designed for benchmarking baseline removal algorithms, containing 100 synthetic composed spectra at different signal-to-baseline ratios (SBR), signal-to-noise ratios (SNR), and baseline slopes. In addition, to demonstrate the utility of the proposed methods and to compare them in a real application, a spectral data set measured from a flame radiation process was used. Several performance metrics such as correlation coefficient, chi-square value, and goodness-of-fit coefficient were calculated to quantify and compare both algorithms. Results demonstrate that the PCA-based method outperforms the one based on ANN both in terms of performance and simplicity. © The Author(s) 2016.
Frequency Analysis Using Bootstrap Method and SIR Algorithm for Prevention of Natural Disasters
NASA Astrophysics Data System (ADS)
Kim, T.; Kim, Y. S.
2017-12-01
The frequency analysis of hydrometeorological data is one of the most important factors in responding to natural disaster damage and in setting design standards for disaster prevention facilities. Frequency analysis of hydrometeorological data assumes that the observation data are statistically stationary, and a parametric method considering the parameters of a probability distribution is applied. A parametric method requires a sufficient amount of reliable data; however, in Korea the insufficient snowfall observations need to be compensated for, because the number of days with snowfall observations and the mean maximum daily snowfall depth are decreasing due to climate change. In this study, we conducted the frequency analysis for snowfall using the Bootstrap method and the SIR algorithm, which are resampling methods that can overcome the problems of insufficient data. For the 58 meteorological stations distributed evenly in Korea, the probability of snowfall depth was estimated by non-parametric frequency analysis using the maximum daily snowfall depth data. The results show that the probabilistic daily snowfall depth from frequency analysis decreased at most stations, and the rate of change at most stations was consistent between the parametric and non-parametric frequency analyses. This study shows that resampling methods can support the frequency analysis of snowfall depth when observed samples are insufficient, and they can be applied to the interpretation of other natural disasters with seasonal characteristics, such as summer typhoons. Acknowledgment. This research was supported by a grant (MPSS-NH-2015-79) from the Disaster Prediction and Mitigation Technology Development Program funded by the Korean Ministry of Public Safety and Security (MPSS).
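A minimal sketch of how bootstrap resampling can attach an uncertainty interval to a non-parametric frequency estimate from a short record is given below. It covers only a plain percentile bootstrap of an empirical quantile; the SIR algorithm is not shown, and the function names, the 0.9 non-exceedance probability, and the toy snowfall values are all assumptions.

```python
import random


def empirical_quantile(values, prob):
    """Empirical quantile (prob in 0..1) with linear interpolation."""
    data = sorted(values)
    pos = prob * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


def bootstrap_quantile_interval(annual_maxima, prob=0.9, n_boot=2000,
                                level=0.95, seed=0):
    """Return (lower, estimate, upper) for a quantile of annual maxima.

    Each bootstrap replicate resamples the record with replacement and
    recomputes the quantile; the interval is the percentile interval of
    those replicates.
    """
    rng = random.Random(seed)
    n = len(annual_maxima)
    replicates = []
    for _ in range(n_boot):
        resample = [annual_maxima[rng.randrange(n)] for _ in range(n)]
        replicates.append(empirical_quantile(resample, prob))
    alpha = (1.0 - level) / 2.0
    return (empirical_quantile(replicates, alpha),
            empirical_quantile(annual_maxima, prob),
            empirical_quantile(replicates, 1.0 - alpha))


# Hypothetical annual maximum daily snowfall depths (cm) for one station.
depths = [12.0, 30.5, 8.2, 22.1, 15.0, 41.3, 5.5, 18.7, 27.9, 10.4]
print(bootstrap_quantile_interval(depths))
```

A limitation worth keeping in mind is that a plain bootstrap cannot produce values beyond the observed maximum, so very long return periods still require extrapolation by other means.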
A Nonparametric Statistical Approach to the Validation of Computer Simulation Models
1985-11-01
Ballistic Research Laboratory, the Experimental Design and Analysis Branch of the Systems Engineering and Concepts Analysis Division was funded to... [2] Winter, E. M., Wisemiler, D. P., and Ujihara, J. K. "Verification and Validation of Engineering Simulations with Minimal Data." Proceedings of the 1976 Summer... used by numerous authors. Law [6] has augmented their approach with specific suggestions for each of the three stages: 1. develop high face-validity
Serum and Plasma Metabolomic Biomarkers for Lung Cancer.
Kumar, Nishith; Shahjaman, Md; Mollah, Md Nurul Haque; Islam, S M Shahinul; Hoque, Md Aminul
2017-01-01
In drug invention and early disease prediction of lung cancer, metabolomic biomarker detection is very important. The mortality rate can be decreased if cancer is predicted at an earlier stage. Recent diagnostic techniques for lung cancer are not prognostic. However, if we know the names of the metabolites whose intensity levels change considerably between cancer subjects and control subjects, then it will be easier to diagnose the disease early as well as to discover drugs. Therefore, in this paper we have identified the influential plasma and serum blood sample metabolites for lung cancer and also identified the biomarkers that will be helpful for early disease prediction as well as for drug invention. To identify the influential metabolites, we considered a parametric and a nonparametric test, namely Student's t-test as the parametric test and the Kruskal-Wallis test as the non-parametric test. We also categorized the up-regulated and down-regulated metabolites by the heatmap plot and identified the biomarkers by support vector machine (SVM) classifier and pathway analysis. From our analysis, we obtained 27 influential (p-value<0.05) metabolites from the plasma sample and 13 influential (p-value<0.05) metabolites from the serum sample. According to the importance plot through the SVM classifier, pathway analysis and correlation network analysis, we declared 4 metabolites (taurine, aspartic acid, glutamine and pyruvic acid) as plasma biomarkers and 3 metabolites (aspartic acid, taurine and inosine) as serum biomarkers.
Thirty Years of Nonparametric Item Response Theory.
ERIC Educational Resources Information Center
Molenaar, Ivo W.
2001-01-01
Discusses relationships between a mathematical measurement model and its real-world applications. Makes a distinction between large-scale data matrices commonly found in educational measurement and smaller matrices found in attitude and personality measurement. Also evaluates nonparametric methods for estimating item response functions and…
Conditional Covariance-Based Nonparametric Multidimensionality Assessment.
ERIC Educational Resources Information Center
Stout, William; And Others
1996-01-01
Three nonparametric procedures that use estimates of covariances of item-pair responses conditioned on examinee trait level for assessing dimensionality of a test are described. The HCA/CCPROX, DIMTEST, and DETECT are applied to a dimensionality study of the Law School Admission Test. (SLD)
Nonparametric Regression and the Parametric Bootstrap for Local Dependence Assessment.
ERIC Educational Resources Information Center
Habing, Brian
2001-01-01
Discusses ideas underlying nonparametric regression and the parametric bootstrap with an overview of their application to item response theory and the assessment of local dependence. Illustrates the use of the method in assessing local dependence that varies with examinee trait levels. (SLD)
NASA Astrophysics Data System (ADS)
Liao, Meng; To, Quy-Dong; Léonard, Céline; Monchiet, Vincent
2018-03-01
In this paper, we use the molecular dynamics simulation method to study gas-wall boundary conditions. Discrete scattering information of gas molecules at the wall surface is obtained from collision simulations. The collision data can be used to identify the accommodation coefficients for parametric wall models such as Maxwell and Cercignani-Lampis scattering kernels. Since these scattering kernels are based on a limited number of accommodation coefficients, we adopt non-parametric statistical methods to construct the kernel to overcome these issues. Different from parametric kernels, the non-parametric kernels require no parameter (i.e. accommodation coefficients) and no predefined distribution. We also propose approaches to derive directly the Navier friction and Kapitza thermal resistance coefficients as well as other interface coefficients associated with moment equations from the non-parametric kernels. The methods are applied successfully to systems composed of CH4 or CO2 and graphite, which are of interest to the petroleum industry.
Nonparametric Simulation of Signal Transduction Networks with Semi-Synchronized Update
Nassiri, Isar; Masoudi-Nejad, Ali; Jalili, Mahdi; Moeini, Ali
2012-01-01
Simulating signal transduction in cellular signaling networks provides predictions of network dynamics by quantifying the changes in concentration and activity-level of the individual proteins. Since numerical values of kinetic parameters might be difficult to obtain, it is imperative to develop non-parametric approaches that combine the connectivity of a network with the response of individual proteins to signals which travel through the network. The activity levels of signaling proteins computed through existing non-parametric modeling tools do not show significant correlations with the observed values in experimental results. In this work we developed a non-parametric computational framework to describe the profile of the evolving process and the time course of the proportion of active form of molecules in the signal transduction networks. The model is also capable of incorporating perturbations. The model was validated on four signaling networks showing that it can effectively uncover the activity levels and trends of response during signal transduction process. PMID:22737250
Modeling Predictors of Duties Not Including Flying Status.
Tvaryanas, Anthony P; Griffith, Converse
2018-01-01
The purpose of this study was to reuse available datasets to conduct an analysis of potential predictors of U.S. Air Force aircrew nonavailability in terms of being in "duties not to include flying" (DNIF) status. This study was a retrospective cohort analysis of U.S. Air Force aircrew on active duty during the period from 2003-2012. Predictor variables included age, Air Force Specialty Code (AFSC), clinic location, diagnosis, gender, pay grade, and service component. The response variable was DNIF duration. Nonparametric methods were used for the exploratory analysis and parametric methods were used for model building and statistical inference. Out of a set of 783 potential predictor variables, 339 variables were identified from the nonparametric exploratory analysis for inclusion in the parametric analysis. Of these, 54 variables had significant associations with DNIF duration in the final model fitted to the validation data set. The predicted results of this model for DNIF duration had a correlation of 0.45 with the actual number of DNIF days. Predictor variables included age, 6 AFSCs, 7 clinic locations, and 40 primary diagnosis categories. Specific demographic (i.e., age), occupational (i.e., AFSC), and health (i.e., clinic location and primary diagnosis category) DNIF drivers were identified. Subsequent research should focus on the application of primary, secondary, and tertiary prevention measures to ameliorate the potential impact of these DNIF drivers where possible. Tvaryanas AP, Griffith C Jr. Modeling predictors of duties not including flying status. Aerosp Med Hum Perform. 2018; 89(1):52-57.
Nonparametric model validations for hidden Markov models with applications in financial econometrics
Zhao, Zhibiao
2011-01-01
We address the nonparametric model validation problem for hidden Markov models with partially observable variables and hidden states. We achieve this goal by constructing a nonparametric simultaneous confidence envelope for transition density function of the observable variables and checking whether the parametric density estimate is contained within such an envelope. Our specification test procedure is motivated by a functional connection between the transition density of the observable variables and the Markov transition kernel of the hidden states. Our approach is applicable for continuous time diffusion models, stochastic volatility models, nonlinear time series models, and models with market microstructure noise. PMID:21750601
Variable Selection for Nonparametric Quantile Regression via Smoothing Spline ANOVA
Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui
2014-01-01
Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792
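The abstract does not spell out the loss function, but quantile regression is conventionally built on the check (pinball) loss, which is assumed in the sketch below. It shows only that minimizing this loss over a constant recovers a sample quantile; the smoothing spline ANOVA model and the sparsity penalty of SNQR are not reproduced, and the function names are hypothetical.

```python
def check_loss(residual, tau):
    """Check (pinball) loss: tau * r if r >= 0, else (tau - 1) * r."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(values, tau):
    """Sample value that minimizes the total check loss at quantile level tau."""
    return min(values,
               key=lambda c: sum(check_loss(v - c, tau) for v in values))


data = [1, 2, 3, 4, 5]
print(best_constant(data, 0.5))  # the median minimizes the loss at tau = 0.5
```

Changing tau tilts the loss asymmetrically, which is why a covariate can matter for one quantile and not another, the difficulty for variable selection that the abstract points out.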
Non-parametric analysis of LANDSAT maps using neural nets and parallel computers
NASA Technical Reports Server (NTRS)
Salu, Yehuda; Tilton, James
1991-01-01
Nearest neighbor approaches and a new neural network, the Binary Diamond, are used for the classification of images of ground pixels obtained by LANDSAT satellite. The performances are evaluated by comparing classifications of a scene in the vicinity of Washington DC. The problem of optimal selection of categories is addressed as a step in the classification process.
To Math or Not to Math: The Algebra-Calculus Pipeline and Postsecondary Mathematics Remediation
ERIC Educational Resources Information Center
Showalter, Daniel A.
2017-01-01
This article reports on a study designed to estimate the effect of high school coursetaking in the algebra-calculus pipeline on the likelihood of placing out of postsecondary remedial mathematics. A nonparametric variant of propensity score analysis was used on a nationally representative data set to remove selection bias and test for an effect…
FORTRAN implementation of Friedman's test for several related samples
NASA Technical Reports Server (NTRS)
Davidson, S. A.
1982-01-01
The FRIEDMAN program is a FORTRAN-coded implementation of Friedman's nonparametric test for several related samples with one observation per treatment-block combination, or, as it is sometimes called, the two-way analysis of variance by ranks. The FRIEDMAN program is described and a test data set and its results are presented to aid potential users of this program.
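For readers without access to the FORTRAN program, the statistic it computes can be sketched in Python as follows. This is an independent illustration rather than a port of the FRIEDMAN program; the function names are assumptions, and no tie correction is applied, so it is best suited to data without ties within a block.

```python
def ranks_within_block(row):
    """1-based ranks of the values in one block, ties sharing the average rank."""
    return [sum(v < x for v in row) + (sum(v == x for v in row) + 1) / 2.0
            for x in row]


def friedman_statistic(blocks):
    """Return (chi-square statistic, degrees of freedom).

    blocks is a list of rows, one row per block, each holding one
    observation per treatment in the same treatment order.
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    for row in blocks:
        if len(row) != k:
            raise ValueError("every block needs one value per treatment")
        for j, r in enumerate(ranks_within_block(row)):
            rank_sums[j] += r
    stat = (12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums)
            - 3.0 * n * (k + 1))
    return stat, k - 1


# Four blocks, three treatments, treatment order identical in every block.
print(friedman_statistic([[1, 2, 3], [2, 3, 4], [1, 5, 9], [0, 1, 2]]))
```

The statistic is referred to a chi-square distribution with k - 1 degrees of freedom when the number of blocks is reasonably large.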
Multivariate Density Estimation and Remote Sensing
NASA Technical Reports Server (NTRS)
Scott, D. W.
1983-01-01
Current efforts to develop methods and computer algorithms to effectively represent multivariate data commonly encountered in remote sensing applications are described. While this may involve scatter diagrams, multivariate representations of nonparametric probability density estimates are emphasized. The density function provides a useful graphical tool for looking at data and a useful theoretical tool for classification. This approach is called a thunderstorm data analysis.
ERIC Educational Resources Information Center
Beevers, Christopher G.; Strong, David R.; Meyer, Bjorn; Pilkonis, Paul A.; Miller, Ivan R.
2007-01-01
Despite a central role for dysfunctional attitudes in cognitive theories of depression and the widespread use of the Dysfunctional Attitude Scale, form A (DAS-A; A. Weissman, 1979), the psychometric development of the DAS-A has been relatively limited. The authors used nonparametric item response theory methods to examine the DAS-A items and…
NASA Astrophysics Data System (ADS)
Li, Zhengxiang; Gonzalez, J. E.; Yu, Hongwei; Zhu, Zong-Hong; Alcaniz, J. S.
2016-02-01
We apply two methods, i.e., the Gaussian processes and the nonparametric smoothing procedure, to reconstruct the Hubble parameter H(z) as a function of redshift from 15 measurements of the expansion rate obtained from age estimates of passively evolving galaxies. These reconstructions enable us to derive the luminosity distance to a certain redshift z, calibrate the light-curve fitting parameters accounting for the (unknown) intrinsic magnitude of type Ia supernovae (SNe Ia), and construct cosmological model-independent Hubble diagrams of SNe Ia. In order to test the compatibility between the reconstructed functions of H(z), we perform a statistical analysis considering the latest SNe Ia sample, the so-called joint light-curve compilation. We find that, for the Gaussian processes, the reconstructed functions of Hubble parameter versus redshift, and thus the following analysis on SNe Ia calibrations and cosmological implications, are sensitive to prior mean functions. However, for the nonparametric smoothing method, the reconstructed functions are not dependent on initial guess models, and consistently require high values of H0, which are in excellent agreement with recent measurements of this quantity from Cepheids and other local distance indicators.
Henry, Ronald C; Vette, Alan; Norris, Gary; Vedantham, Ram; Kimbrough, Sue; Shores, Richard C
2011-12-15
Nonparametric Trajectory Analysis (NTA), a receptor-oriented model, was used to assess the impact of local sources of air pollution at monitoring sites located adjacent to highway I-15 in Las Vegas, NV. Measurements of black carbon, carbon monoxide, nitrogen oxides, and sulfur dioxide concentrations were collected from December 2008 to December 2009. The purpose of the study was to determine the impact of the highway at three downwind monitoring stations using an upwind station to measure background concentrations. NTA was used to precisely determine the contribution of the highway to the average concentrations measured at the monitoring stations accounting for the spatially heterogeneous contributions of other local urban sources. NTA uses short time average concentrations, 5 min in this case, and constructed local back-trajectories from similarly short time average wind speed and direction to locate and quantify contributions from local source regions. Averaged over an entire year, the decrease of concentrations with distance from the highway was found to be consistent with previous studies. For this study, the NTA model is shown to be a reliable approach to quantify the impact of the highway on local air quality in an urban area with other local sources.
Regionalizing nonparametric models of precipitation amounts on different temporal scales
NASA Astrophysics Data System (ADS)
Mosthaf, Tobias; Bárdossy, András
2017-05-01
Parametric distribution functions are commonly used to model precipitation amounts corresponding to different durations. The precipitation amounts themselves are crucial for stochastic rainfall generators and weather generators. Nonparametric kernel density estimates (KDEs) offer a more flexible way to model precipitation amounts. As already stated in their name, these models do not exhibit parameters that can be easily regionalized to run rainfall generators at ungauged locations as well as at gauged locations. To overcome this deficiency, we present a new interpolation scheme for nonparametric models and evaluate it for different temporal resolutions ranging from hourly to monthly. During the evaluation, the nonparametric methods are compared to commonly used parametric models like the two-parameter gamma and the mixed-exponential distribution. As water volume is considered to be an essential parameter for applications like flood modeling, a Lorenz-curve-based criterion is also introduced. To add value to the estimation of data at sub-daily resolutions, we incorporated the plentiful daily measurements in the interpolation scheme, and this idea was evaluated. The study region is the federal state of Baden-Württemberg in the southwest of Germany with more than 500 rain gauges. The validation results show that the newly proposed nonparametric interpolation scheme provides reasonable results and that the incorporation of daily values in the regionalization of sub-daily models is very beneficial.
A simple randomisation procedure for validating discriminant analysis: a methodological note.
Wastell, D G
1987-04-01
Because the goal of discriminant analysis (DA) is to optimise classification, it designedly exaggerates between-group differences. This bias complicates validation of DA. Jack-knifing has been used for validation but is inappropriate when stepwise selection (SWDA) is employed. A simple randomisation test is presented which is shown to give correct decisions for SWDA. The general superiority of randomisation tests over orthodox significance tests is discussed. Current work on non-parametric methods of estimating the error rates of prediction rules is briefly reviewed.
Hans T. Schreuder; Jin-Mann S. Lin; John Teply
2000-01-01
We estimate number of tree species in National Forest populations using the nonparametric estimator. Data from the Current Vegetation Survey (CVS) of Region 6 of the USDA Forest Service were used to estimate the number of tree species with a plot close in size to the Forest Inventory and Analysis (FIA) plot and the actual CVS plot for the 5.5 km FIA grid and the 2.7 km...
Sun, Xiaochun; Ma, Ping; Mumm, Rita H.
2012-01-01
Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression. PMID:23226325
Sarkar, Rajarshi
2013-07-01
The validity of the entire renal function tests as a diagnostic tool depends substantially on the Biological Reference Interval (BRI) of urea. Establishment of BRI of urea is difficult partly because exclusion criteria for selection of reference data are quite rigid and partly due to the compartmentalization considerations regarding age and sex of the reference individuals. Moreover, construction of Biological Reference Curve (BRC) of urea is imperative to highlight the partitioning requirements. This a priori study examines the data collected by measuring serum urea of 3202 age and sex matched individuals, aged between 1 and 80 years, by a kinetic UV Urease/GLDH method on a Roche Cobas 6000 auto-analyzer. Mann-Whitney U test of the reference data confirmed the partitioning requirement by both age and sex. Further statistical analysis revealed the incompatibility of the data for a proposed parametric model. Hence the data was non-parametrically analysed. BRI was found to be identical for both sexes till the 2nd decade, and the BRI for males increased progressively from the 6th decade onwards. Four non-parametric models were postulated for construction of BRC: Gaussian kernel, double kernel, local mean and local constant, of which the last one generated the best-fitting curves. Clinical decision making should become easier and diagnostic implications of renal function tests should become more meaningful if this BRI is followed and the BRC is used as a desktop tool in conjunction with similar data for serum creatinine.
A Bayesian Nonparametric Approach to Test Equating
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2009-01-01
A Bayesian nonparametric model is introduced for score equating. It is applicable to all major equating designs, and has advantages over previous equating models. Unlike the previous models, the Bayesian model accounts for positive dependence between distributions of scores from two tests. The Bayesian model and the previous equating models are…
Reliability of Test Scores in Nonparametric Item Response Theory.
ERIC Educational Resources Information Center
Sijtsma, Klaas; Molenaar, Ivo W.
1987-01-01
Three methods for estimating reliability are studied within the context of nonparametric item response theory. Two were proposed originally by Mokken and a third is developed in this paper. Using a Monte Carlo strategy, these three estimation methods are compared with four "classical" lower bounds to reliability. (Author/JAZ)
A Simulation Comparison of Parametric and Nonparametric Dimensionality Detection Procedures
ERIC Educational Resources Information Center
Mroch, Andrew A.; Bolt, Daniel M.
2006-01-01
Recently, nonparametric methods have been proposed that provide a dimensionally based description of test structure for tests with dichotomous items. Because such methods are based on different notions of dimensionality than are assumed when using a psychometric model, it remains unclear whether these procedures might lead to a different…
Bayesian Unimodal Density Regression for Causal Inference
ERIC Educational Resources Information Center
Karabatsos, George; Walker, Stephen G.
2011-01-01
Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…
USDA-ARS's Scientific Manuscript database
Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...
A Comparison of Methods for Nonparametric Estimation of Item Characteristic Curves for Binary Items
ERIC Educational Resources Information Center
Lee, Young-Sun
2007-01-01
This study compares the performance of three nonparametric item characteristic curve (ICC) estimation procedures: isotonic regression, smoothed isotonic regression, and kernel smoothing. Smoothed isotonic regression, employed along with an appropriate kernel function, provides better estimates and also satisfies the assumption of strict…
Order-Constrained Bayes Inference for Dichotomous Models of Unidimensional Nonparametric IRT
ERIC Educational Resources Information Center
Karabatsos, George; Sheu, Ching-Fan
2004-01-01
This study introduces an order-constrained Bayes inference framework useful for analyzing data containing dichotomous scored item responses, under the assumptions of either the monotone homogeneity model or the double monotonicity model of nonparametric item response theory (NIRT). The framework involves the implementation of Gibbs sampling to…
Nonparametric probability density estimation by optimization theoretic techniques
NASA Technical Reports Server (NTRS)
Scott, D. W.
1976-01-01
Two nonparametric probability density estimators are considered. The first is the kernel estimator. The problem of choosing the kernel scaling factor based solely on a random sample is addressed. An interactive mode is discussed and an algorithm proposed to choose the scaling factor automatically. The second nonparametric probability estimate uses penalty function techniques with the maximum likelihood criterion. A discrete maximum penalized likelihood estimator is proposed and is shown to be consistent in the mean square error. A numerical implementation technique for the discrete solution is discussed and examples displayed. An extensive simulation study compares the integrated mean square error of the discrete and kernel estimators. The robustness of the discrete estimator is demonstrated graphically.
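To make the kernel estimator and its scaling factor concrete, here is a minimal Python sketch of a Gaussian kernel density estimate. The function names are hypothetical, and the bandwidth rule shown is the common normal-reference rule of thumb, used here only as a stand-in: it is not the automatic selection algorithm proposed in the report, nor the penalized-likelihood estimator.

```python
import math


def gaussian_kde(sample, h):
    """Return a function x -> kernel density estimate with scaling factor h."""
    n = len(sample)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps of width h centred on each observation.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

    return density


def normal_reference_bandwidth(sample):
    """Rule-of-thumb scaling factor 1.06 * sd * n^(-1/5) (an assumption here)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


# Hypothetical sample for illustration.
data = [1.2, 1.9, 2.1, 2.4, 3.7, 4.0]
f = gaussian_kde(data, normal_reference_bandwidth(data))
print(f(2.0))
```

A smaller scaling factor gives a rougher estimate that follows individual observations; a larger one smooths more heavily, which is the trade-off the choice of factor has to balance.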
Illiquidity premium and expected stock returns in the UK: A new approach
NASA Astrophysics Data System (ADS)
Chen, Jiaqi; Sherif, Mohamed
2016-09-01
This study examines the relative importance of liquidity risk for the time-series and cross-section of stock returns in the UK. We propose a simple way to capture the multidimensionality of illiquidity. Our analysis indicates that existing illiquidity measures have considerable asset specific components, which justifies our new approach. Further, we use an alternative test of the Amihud (2002) measure and parametric and non-parametric methods to investigate whether liquidity risk is priced in the UK. We find that the inclusion of the illiquidity factor in the capital asset pricing model plays a significant role in explaining the cross-sectional variation in stock returns, in particular with the Fama-French three-factor model. Further, using Hansen-Jagannathan non-parametric bounds, we find that the illiquidity-augmented capital asset pricing models yield a small distance error, whereas other non-liquidity based models fail to yield economically plausible distance values. Our findings have important implications for managing the liquidity risk of equity portfolios.
Gender Wage Disparities among the Highly Educated.
Black, Dan A; Haviland, Amelia; Sanders, Seth G; Taylor, Lowell J
2008-01-01
In the U.S. college-educated women earn approximately 30 percent less than their non-Hispanic white male counterparts. We conduct an empirical examination of this wage disparity for four groups of women (non-Hispanic white, black, Hispanic, and Asian) using the National Survey of College Graduates, a large data set that provides unusually detailed information on higher-level education. Nonparametric matching analysis indicates that among men and women who speak English at home, between 44 and 73 percent of the gender wage gaps are accounted for by such pre-market factors as highest degree and major. When we restrict attention further to women who have "high labor force attachment" (i.e., work experience that is similar to male comparables) we account for 54 to 99 percent of gender wage gaps. Our nonparametric approach differs from familiar regression-based decompositions, so for the sake of comparison we conduct parametric analyses as well. Inferences drawn from these latter decompositions can be quite misleading.
Shi, J Q; Wang, B; Will, E J; West, R M
2012-11-20
We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.
Power calculation for comparing diagnostic accuracies in a multi-reader, multi-test design.
Kim, Eunhee; Zhang, Zheng; Wang, Youdan; Zeng, Donglin
2014-12-01
Receiver operating characteristic (ROC) analysis is widely used to evaluate the performance of diagnostic tests with continuous or ordinal responses. A popular study design for assessing the accuracy of diagnostic tests involves multiple readers interpreting multiple diagnostic test results, called the multi-reader, multi-test design. Although several different approaches to analyzing data from this design exist, few methods have discussed the sample size and power issues. In this article, we develop a power formula to compare the correlated areas under the ROC curves (AUC) in a multi-reader, multi-test design. We present a nonparametric approach to estimate and compare the correlated AUCs by extending DeLong et al.'s (1988, Biometrics 44, 837-845) approach. A power formula is derived based on the asymptotic distribution of the nonparametric AUCs. Simulation studies are conducted to demonstrate the performance of the proposed power formula and an example is provided to illustrate the proposed procedure. © 2014, The International Biometric Society.
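For orientation, the nonparametric AUC that this line of work builds on can be written as the Mann-Whitney estimator: the proportion of diseased/non-diseased pairs in which the diseased case scores higher, with ties counted as one half. The Python sketch below shows only that point estimate for a single reader and test. The function name and data are hypothetical, and it does not implement the correlated-AUC comparison or the power formula from the article.

```python
def empirical_auc(diseased, healthy):
    """Mann-Whitney estimate of the area under the ROC curve."""
    total = 0.0
    for x in diseased:
        for y in healthy:
            if x > y:
                total += 1.0      # correctly ordered pair
            elif x == y:
                total += 0.5      # tie counts as half
    return total / (len(diseased) * len(healthy))


# Hypothetical ordinal scores from one reader on one test.
print(empirical_auc([3, 4, 5], [1, 2, 3]))
```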
NASA Astrophysics Data System (ADS)
Houssein, Hend A. A.; Jaafar, M. S.; Ramli, R. M.; Ismail, N. E.; Ahmad, A. L.; Bermakai, M. Y.
2010-07-01
In this study, the subpopulations of human blood parameters including lymphocytes, the mid-cell fractions (eosinophils, basophils, and monocytes), and granulocytes were determined by electronic sizing in the Health Centre of Universiti Sains Malaysia. These parameters have been correlated with human blood characteristics such as age, gender, ethnicity, and blood type, before and after irradiation with a 0.95 mW He-Ne laser (λ = 632.8 nm). The correlations were obtained by finding patterns, paired non-parametric tests, and independent non-parametric tests using SPSS version 11.5, centroid and peak positions, and flux variations. The findings show that the centroid and peak positions, flux peak and total flux, were strongly correlated and can become a significant indicator for blood analyses. Furthermore, the encircled flux analysis demonstrated a good future prospect in blood research, thus leading the way as a vibrant diagnosis tool to clarify diseases associated with blood.
Gender Wage Disparities among the Highly Educated
Black, Dan A.; Haviland, Amelia; Sanders, Seth G.; Taylor, Lowell J.
2015-01-01
In the U.S. college-educated women earn approximately 30 percent less than their non-Hispanic white male counterparts. We conduct an empirical examination of this wage disparity for four groups of women—non-Hispanic white, black, Hispanic, and Asian—using the National Survey of College Graduates, a large data set that provides unusually detailed information on higher-level education. Nonparametric matching analysis indicates that among men and women who speak English at home, between 44 and 73 percent of the gender wage gaps are accounted for by such pre-market factors as highest degree and major. When we restrict attention further to women who have “high labor force attachment” (i.e., work experience that is similar to male comparables) we account for 54 to 99 percent of gender wage gaps. Our nonparametric approach differs from familiar regression-based decompositions, so for the sake of comparison we conduct parametric analyses as well. Inferences drawn from these latter decompositions can be quite misleading. PMID:26097255
Spectral decompositions of multiple time series: a Bayesian non-parametric approach.
Macaro, Christian; Prado, Raquel
2014-01-01
We consider spectral decompositions of multiple time series that arise in studies where the interest lies in assessing the influence of two or more factors. We write the spectral density of each time series as a sum of the spectral densities associated to the different levels of the factors. We then use Whittle's approximation to the likelihood function and follow a Bayesian non-parametric approach to obtain posterior inference on the spectral densities based on Bernstein-Dirichlet prior distributions. The prior is strategically important as it carries identifiability conditions for the models and allows us to quantify our degree of confidence in such conditions. A Markov chain Monte Carlo (MCMC) algorithm for posterior inference within this class of frequency-domain models is presented. We illustrate the approach by analyzing simulated and real data via spectral one-way and two-way models. In particular, we present an analysis of functional magnetic resonance imaging (fMRI) brain responses measured in individuals who participated in a designed experiment to study pain perception in humans.
Monitoring the Level of Students' GPAs over Time
ERIC Educational Resources Information Center
Bakir, Saad T.; McNeal, Bob
2010-01-01
A nonparametric (or distribution-free) statistical quality control chart is used to monitor the cumulative grade point averages (GPAs) of students over time. The chart is designed to detect any statistically significant positive or negative shifts in student GPAs from a desired target level. This nonparametric control chart is based on the…
ERIC Educational Resources Information Center
Kogar, Hakan
2018-01-01
The aim of the present research study was to compare the findings from the nonparametric MSA, DIMTEST and DETECT and the parametric dimensionality determining methods in various simulation conditions by utilizing exploratory and confirmatory methods. For this purpose, various simulation conditions were established based on number of dimensions,…
Three Classes of Nonparametric Differential Step Functioning Effect Estimators
ERIC Educational Resources Information Center
Penfield, Randall D.
2008-01-01
The examination of measurement invariance in polytomous items is complicated by the possibility that the magnitude and sign of lack of invariance may vary across the steps underlying the set of polytomous response options, a concept referred to as differential step functioning (DSF). This article describes three classes of nonparametric DSF effect…
A Nonparametric Framework for Comparing Trends and Gaps across Tests
ERIC Educational Resources Information Center
Ho, Andrew Dean
2009-01-01
Problems of scale typically arise when comparing test score trends, gaps, and gap trends across different tests. To overcome some of these difficulties, test score distributions on the same score scale can be represented by nonparametric graphs or statistics that are invariant under monotone scale transformations. This article motivates and then…
A Nonparametric K-Sample Test for Equality of Slopes.
ERIC Educational Resources Information Center
Penfield, Douglas A.; Koffler, Stephen L.
1986-01-01
The development of a nonparametric K-sample test for equality of slopes using Puri's generalized L statistic is presented. The test is recommended when the assumptions underlying the parametric model are violated. This procedure replaces original data with either ranks (for data with heavy tails) or normal scores (for data with light tails).…
A Note on the Assumption of Identical Distributions for Nonparametric Tests of Location
ERIC Educational Resources Information Center
Nordstokke, David W.; Colp, S. Mitchell
2018-01-01
Often, when testing for shift in location, researchers will utilize nonparametric statistical tests in place of their parametric counterparts when there is evidence or belief that the assumptions of the parametric test are not met (i.e., normally distributed dependent variables). An underlying and often unattended to assumption of nonparametric…
A New Nonparametric Levene Test for Equal Variances
ERIC Educational Resources Information Center
Nordstokke, David W.; Zumbo, Bruno D.
2010-01-01
Tests of the equality of variances are sometimes used on their own to compare variability across groups of experimental or non-experimental conditions but they are most often used alongside other methods to support assumptions made about variances. A new nonparametric test of equality of variances is described and compared to current "gold…
A Bayesian Beta-Mixture Model for Nonparametric IRT (BBM-IRT)
ERIC Educational Resources Information Center
Arenson, Ethan A.; Karabatsos, George
2017-01-01
Item response models typically assume that the item characteristic (step) curves follow a logistic or normal cumulative distribution function, which are strictly monotone functions of person test ability. Such assumptions can be overly-restrictive for real item response data. We propose a simple and more flexible Bayesian nonparametric IRT model…
The relationship of the concentration of air pollutants to wind direction has been determined by nonparametric regression using a Gaussian kernel. The results are smooth curves with error bars that allow for the accurate determination of the wind direction where the concentrat...
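A kernel regression of concentration on wind direction can be sketched as a weighted average in which observations from nearby directions receive larger Gaussian weights. The Python example below is a minimal Nadaraya-Watson sketch under stated assumptions: directions are in degrees, angular differences are wrapped to the range -180 to 180, and the function names, bandwidth, and data are hypothetical. It does not reproduce the error bars mentioned in the record.

```python
import math


def angular_difference(a, b):
    """Signed difference a - b in degrees, wrapped to (-180, 180]."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d


def kernel_regression(directions, concentrations, theta, bandwidth):
    """Gaussian-kernel weighted mean concentration at wind direction theta."""
    weights = [
        math.exp(-0.5 * (angular_difference(theta, d) / bandwidth) ** 2)
        for d in directions
    ]
    return sum(w * c for w, c in zip(weights, concentrations)) / sum(weights)


# Hypothetical observations: wind direction (degrees) and concentration.
dirs = [350.0, 10.0, 90.0, 180.0, 270.0]
conc = [12.0, 14.0, 5.0, 3.0, 4.0]
print(kernel_regression(dirs, conc, 0.0, 20.0))
```

Wrapping the angular difference matters because 350 degrees and 10 degrees are only 20 degrees apart; treating direction as an ordinary linear variable would place them at opposite ends of the axis.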
A Unifying Framework for Teaching Nonparametric Statistical Tests
ERIC Educational Resources Information Center
Bargagliotti, Anna E.; Orrison, Michael E.
2014-01-01
Increased importance is being placed on statistics at both the K-12 and undergraduate level. Research divulging effective methods to teach specific statistical concepts is still widely sought after. In this paper, we focus on best practices for teaching topics in nonparametric statistics at the undergraduate level. To motivate the work, we…
Teaching Nonparametric Statistics Using Student Instrumental Values.
ERIC Educational Resources Information Center
Anderson, Jonathan W.; Diddams, Margaret
Nonparametric statistics are often difficult to teach in introduction to statistics courses because of the lack of real-world examples. This study demonstrated how teachers can use differences in the rankings and ratings of undergraduate and graduate values to discuss: (1) ipsative and normative scaling; (2) uses of the Mann-Whitney U-test; and…
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric models. The regression model comprises several models, each with two components, parametric and nonparametric. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to modeling the effect of regional socio-economic factors on the use of information technology. More specifically, the response variables are the percentage of households with internet access and the percentage of households with a personal computer. The predictor variables are the percentage of literate people, the percentage of electrification, and the percentage of economic growth. Based on identification of the relationship between the response and predictor variables, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The results show that multiresponse semiparametric regression can be applied well, as indicated by the high coefficient of determination, 90 percent.
Potency control of modified live viral vaccines for veterinary use.
Terpstra, C; Kroese, A H
1996-04-01
This paper reviews various aspects of efficacy, and methods for assaying the potency of modified live viral vaccines. The pros and cons of parametric versus non-parametric methods for analysis of potency assays are discussed and critical levels of protection, as determined by the target(s) of vaccination, are exemplified. Recommendations are presented for designing potency assays on master virus seeds and vaccine batches.
Potency control of modified live viral vaccines for veterinary use.
Terpstra, C; Kroese, A H
1996-01-01
This paper reviews various aspects of efficacy, and methods for assaying the potency of modified live viral vaccines. The pros and cons of parametric versus non-parametric methods for analysis of potency assays are discussed and critical levels of protection, as determined by the target(s) of vaccination, are exemplified. Recommendations are presented for designing potency assays on master virus seeds and vaccine batches.
ERIC Educational Resources Information Center
Aristovnik, Aleksander
2012-01-01
The purpose of the paper is to review previous research examining ICT efficiency and the impact of ICT on educational output/outcome as well as different conceptual and methodological issues related to performance measurement. Moreover, a definition, measurements and the empirical application of a model measuring the efficiency of ICT use…
Wang, Tianyu; Nabavi, Sheida
2018-04-24
Differential gene expression analysis is one of the significant efforts in single cell RNA sequencing (scRNAseq) analysis to discover the specific changes in expression levels of individual cell types. Since scRNAseq exhibits multimodality, large amounts of zero counts, and sparsity, it is different from the traditional bulk RNA sequencing (RNAseq) data. The new challenges of scRNAseq data promote the development of new methods for identifying differentially expressed (DE) genes. In this study, we proposed a new method, SigEMD, that combines a data imputation approach, a logistic regression model and a nonparametric method based on the Earth Mover's Distance, to precisely and efficiently identify DE genes in scRNAseq data. The regression model and data imputation are used to reduce the impact of large amounts of zero counts, and the nonparametric method is used to improve the sensitivity of detecting DE genes from multimodal scRNAseq data. By additionally employing gene interaction network information to adjust the final states of DE genes, we further reduce the false positives of calling DE genes. We used simulated datasets and real datasets to evaluate the detection accuracy of the proposed method and to compare its performance with those of other differential expression analysis methods. Results indicate that the proposed method has an overall powerful performance in terms of precision in detection, sensitivity, and specificity. Copyright © 2018 Elsevier Inc. All rights reserved.
Comparison of parametric and bootstrap method in bioequivalence test.
Ahn, Byung-Jin; Yim, Dong-Seok
2009-10-01
The estimation of 90% parametric confidence intervals (CIs) of mean AUC and Cmax ratios in bioequivalence (BE) tests is based upon the assumption that formulation effects in log-transformed data are normally distributed. To compare the parametric CIs with those obtained from nonparametric methods, we performed repeated estimation of bootstrap-resampled datasets. The AUC and Cmax values from 3 archived datasets were used. BE tests on 1,000 resampled datasets from each archived dataset were performed using SAS (Enterprise Guide Ver.3). Bootstrap nonparametric 90% CIs of formulation effects were then compared with the parametric 90% CIs of the original datasets. The 90% CIs of formulation effects estimated from the 3 archived datasets were slightly different from nonparametric 90% CIs obtained from BE tests on resampled datasets. Histograms and density curves of formulation effects obtained from resampled datasets were similar to those of a normal distribution. However, in 2 of 3 resampled log (AUC) datasets, the estimates of formulation effects did not follow the Gaussian distribution. Bias-corrected and accelerated (BCa) CIs, one of the nonparametric CIs of formulation effects, shifted outside the parametric 90% CIs of the archived datasets in these 2 non-normally distributed resampled log (AUC) datasets. Currently, the 80~125% rule based upon the parametric 90% CIs is widely accepted under the assumption of normally distributed formulation effects in log-transformed data. However, nonparametric CIs may be a better choice when data do not follow this assumption.
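The resampling idea in this abstract can be sketched as a percentile bootstrap 90% CI for a test/reference ratio on the log scale. This is a simplified illustration, not the authors' SAS procedure: it assumes paired per-subject values, ignores period and sequence effects of a crossover design, uses the plain percentile interval rather than BCa, and the AUC values and function names are hypothetical.

```python
import math
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_ratio_ci(test, ref, n_boot=1000, level=0.90, seed=0):
    """Percentile bootstrap CI for the geometric mean ratio test/ref."""
    rng = random.Random(seed)
    # Formulation effect per subject on the log scale.
    diffs = [math.log(t) - math.log(r) for t, r in zip(test, ref)]
    n = len(diffs)
    means = []
    for _ in range(n_boot):
        # Resample subjects with replacement and recompute the mean effect.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    alpha = 1 - level
    return (math.exp(percentile(means, alpha / 2)),
            math.exp(percentile(means, 1 - alpha / 2)))


# Hypothetical AUC values for 8 subjects (test and reference formulations).
auc_test = [105.0, 98.0, 120.0, 87.0, 110.0, 95.0, 102.0, 130.0]
auc_ref = [100.0, 101.0, 115.0, 90.0, 104.0, 99.0, 97.0, 118.0]
low, high = bootstrap_ratio_ci(auc_test, auc_ref)
print(low, high, 0.80 <= low and high <= 1.25)
```

The final comparison applies the 80~125% rule mentioned in the abstract to the bootstrap interval instead of the parametric one.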
Comparison of Parametric and Bootstrap Method in Bioequivalence Test
Ahn, Byung-Jin
2009-01-01
The estimation of 90% parametric confidence intervals (CIs) of mean AUC and Cmax ratios in bioequivalence (BE) tests is based upon the assumption that formulation effects in log-transformed data are normally distributed. To compare the parametric CIs with those obtained from nonparametric methods, we performed repeated estimation of bootstrap-resampled datasets. The AUC and Cmax values from 3 archived datasets were used. BE tests on 1,000 resampled datasets from each archived dataset were performed using SAS (Enterprise Guide Ver.3). Bootstrap nonparametric 90% CIs of formulation effects were then compared with the parametric 90% CIs of the original datasets. The 90% CIs of formulation effects estimated from the 3 archived datasets were slightly different from nonparametric 90% CIs obtained from BE tests on resampled datasets. Histograms and density curves of formulation effects obtained from resampled datasets were similar to those of a normal distribution. However, in 2 of 3 resampled log (AUC) datasets, the estimates of formulation effects did not follow the Gaussian distribution. Bias-corrected and accelerated (BCa) CIs, one of the nonparametric CIs of formulation effects, shifted outside the parametric 90% CIs of the archived datasets in these 2 non-normally distributed resampled log (AUC) datasets. Currently, the 80~125% rule based upon the parametric 90% CIs is widely accepted under the assumption of normally distributed formulation effects in log-transformed data. However, nonparametric CIs may be a better choice when data do not follow this assumption. PMID:19915699
Bayard, David S.; Neely, Michael
2016-01-01
An experimental design approach is presented for individualized therapy in the special case where the prior information is specified by a nonparametric (NP) population model. Here, a nonparametric model refers to a discrete probability model characterized by a finite set of support points and their associated weights. An important question arises as to how to best design experiments for this type of model. Many experimental design methods are based on Fisher Information or other approaches originally developed for parametric models. While such approaches have been used with some success across various applications, it is interesting to note that they largely fail to address the fundamentally discrete nature of the nonparametric model. Specifically, the problem of identifying an individual from a nonparametric prior is more naturally treated as a problem of classification, i.e., to find a support point that best matches the patient’s behavior. This paper studies the discrete nature of the NP experiment design problem from a classification point of view. Several new insights are provided including the use of Bayes Risk as an information measure, and new alternative methods for experiment design. One particular method, denoted as MMopt (Multiple-Model Optimal), will be examined in detail and shown to require minimal computation while having distinct advantages compared to existing approaches. Several simulated examples, including a case study involving oral voriconazole in children, are given to demonstrate the usefulness of MMopt in pharmacokinetics applications. PMID:27909942
Kharroubi, Samer A; Brazier, John E; McGhee, Sarah
2013-01-01
This article reports on the findings from applying a recently described approach to modeling health state valuation data and the impact of the respondent characteristics on health state valuations. The approach applies a nonparametric model to estimate a Bayesian six-dimensional health state short form (derived from short-form 36 health survey) health state valuation algorithm. A sample of 197 states defined by the six-dimensional health state short form (derived from short-form 36 health survey) has been valued by a representative sample of the Hong Kong general population by using standard gamble. The article reports the application of the nonparametric model and compares it to the original model estimated by using a conventional parametric random effects model. The two models are compared theoretically and in terms of empirical performance. Advantages of the nonparametric model are that it can be used to predict scores in populations with different distributions of characteristics than observed in the survey sample and that it allows for the impact of respondent characteristics to vary by health state (while ensuring that full health passes through unity). The results suggest an important age effect, with sex having some effect but the remaining covariates having no discernible effect. The nonparametric Bayesian model is argued to be more theoretically appropriate than previously used parametric models. Furthermore, it is more flexible in taking into account the impact of covariates. Copyright © 2013, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc.
Total recognition discriminability in Huntington's and Alzheimer's disease.
Graves, Lisa V; Holden, Heather M; Delano-Wood, Lisa; Bondi, Mark W; Woods, Steven Paul; Corey-Bloom, Jody; Salmon, David P; Delis, Dean C; Gilbert, Paul E
2017-03-01
Both the original and second editions of the California Verbal Learning Test (CVLT) provide an index of total recognition discriminability (TRD) but respectively utilize nonparametric and parametric formulas to compute the index. However, the degree to which population differences in TRD may vary across applications of these nonparametric and parametric formulas has not been explored. We evaluated individuals with Huntington's disease (HD), individuals with Alzheimer's disease (AD), healthy middle-aged adults, and healthy older adults who were administered the CVLT-II. Yes/no recognition memory indices were generated, including raw nonparametric TRD scores (as used in CVLT-I) and raw and standardized parametric TRD scores (as used in CVLT-II), as well as false positive (FP) rates. Overall, the patient groups had significantly lower TRD scores than their comparison groups. The application of nonparametric and parametric formulas resulted in comparable effect sizes for all group comparisons on raw TRD scores. Relative to the HD group, the AD group showed comparable standardized parametric TRD scores (despite lower raw nonparametric and parametric TRD scores), whereas the previous CVLT literature has shown that standardized TRD scores are lower in AD than in HD. Possible explanations for the similarity in standardized parametric TRD scores in the HD and AD groups in the present study are discussed, with an emphasis on the importance of evaluating TRD scores in the context of other indices such as FP rates in an effort to fully capture recognition memory function using the CVLT-II.
New approaches to the analysis of population trends in land birds: Comment
Link, W.A.; Sauer, J.R.
1997-01-01
James et al. (1996, Ecology 77:13-27) used data from the North American Breeding Bird Survey (BBS) to examine geographic variability in patterns of population change for 26 species of wood warblers. They emphasized the importance of evaluating nonlinear patterns of change in bird populations, proposed LOESS-based non-parametric and semi-parametric analyses of BBS data, and contrasted their results with other analyses, including those of Robbins et al. (1989, Proceedings of the National Academy of Sciences 86: 7658-7662) and Peterjohn et al. (1995, Pages 3-39 in T. E. Martin and D. M. Finch, eds. Ecology and management of Neotropical migratory birds: a synthesis and review of critical issues. Oxford University Press, New York.). In this note, we briefly comment on some of the issues that arose from their analysis of BBS data, suggest a few aspects of the survey that should inspire caution in analysts, and review the differences between the LOESS-based procedures and other procedures (e.g., Link and Sauer 1994). We strongly discourage the use of James et al.'s completely non-parametric procedure, which fails to account for observer effects. Our comparisons of estimators add to the evidence already present in the literature of the bias associated with omitting observer information in analyses of BBS data. Bias resulting from change in observer abilities should be a consideration in any analysis of BBS data.
The analysis of professional competencies of a lecturer in adult education.
Žeravíková, Iveta; Tirpáková, Anna; Markechová, Dagmar
2015-01-01
In this article, we present the andragogical research project and evaluation of its results using nonparametric statistical methods and the semantic differential method. The presented research was realized in the years 2012-2013 in the dissertation of I. Žeravíková: Analysis of professional competencies of lecturer and creating his competence profile (Žeravíková 2013), and its purpose was based on the analysis of work activities of a lecturer to identify his most important professional competencies and to create a suggestion of competence profile of a lecturer in adult education.
Testing the Hypothesis of a Homoscedastic Error Term in Simple, Nonparametric Regression
ERIC Educational Resources Information Center
Wilcox, Rand R.
2006-01-01
Consider the nonparametric regression model Y = m(X)+ [tau](X)[epsilon], where X and [epsilon] are independent random variables, [epsilon] has a median of zero and variance [sigma][squared], [tau] is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated…
ERIC Educational Resources Information Center
Sengul Avsar, Asiye; Tavsancil, Ezel
2017-01-01
This study analysed polytomous items' psychometric properties according to nonparametric item response theory (NIRT) models. Thus, simulated datasets--three different test lengths (10, 20 and 30 items), three sample distributions (normal, right and left skewed) and three samples sizes (100, 250 and 500)--were generated by conducting 20…
Investigation of a Nonparametric Procedure for Assessing Goodness-of-Fit in Item Response Theory
ERIC Educational Resources Information Center
Wells, Craig S.; Bolt, Daniel M.
2008-01-01
Tests of model misfit are often performed to validate the use of a particular model in item response theory. Douglas and Cohen (2001) introduced a general nonparametric approach for detecting misfit under the two-parameter logistic model. However, the statistical properties of their approach, and empirical comparisons to other methods, have not…
Joint Entropy Minimization for Learning in Nonparametric Framework
2006-06-09
Tibshirani, G. Sherlock, W. C. Chan, T. C. Greiner, D. D. Weisenburger, J. O. Armitage, R. Warnke, R. Levy, W. Wilson, M. R. Grever, J. C. Byrd, D. Botstein, P...Entropy Minimization for Learning in Nonparametric Framework 33 [11] D.L. Collins, A.P. Zijdenbos, J.G. Kollokian, N.J. Sled, C.J. Kabani, C.J. Holmes
How to Compare Parametric and Nonparametric Person-Fit Statistics Using Real Data
ERIC Educational Resources Information Center
Sinharay, Sandip
2017-01-01
Person-fit assessment (PFA) is concerned with uncovering atypical test performance as reflected in the pattern of scores on individual items on a test. Existing person-fit statistics (PFSs) include both parametric and nonparametric statistics. Comparison of PFSs has been a popular research topic in PFA, but almost all comparisons have employed…
A Comparative Study of Test Data Dimensionality Assessment Procedures Under Nonparametric IRT Models
ERIC Educational Resources Information Center
van Abswoude, Alexandra A. H.; van der Ark, L. Andries; Sijtsma, Klaas
2004-01-01
In this article, an overview of nonparametric item response theory methods for determining the dimensionality of item response data is provided. Four methods were considered: MSP, DETECT, HCA/CCPROX, and DIMTEST. First, the methods were compared theoretically. Second, a simulation study was done to compare the effectiveness of MSP, DETECT, and…
An entropy-based nonparametric test for the validation of surrogate endpoints.
Miao, Xiaopeng; Wang, Yong-Cheng; Gangopadhyay, Ashis
2012-06-30
We present a nonparametric test to validate surrogate endpoints based on measure of divergence and random permutation. This test is a proposal to directly verify the Prentice statistical definition of surrogacy. The test does not impose distributional assumptions on the endpoints, and it is robust to model misspecification. Our simulation study shows that the proposed nonparametric test outperforms the practical test of the Prentice criterion in terms of both robustness of size and power. We also evaluate the performance of three leading methods that attempt to quantify the effect of surrogate endpoints. The proposed method is applied to validate magnetic resonance imaging lesions as the surrogate endpoint for clinical relapses in a multiple sclerosis trial. Copyright © 2012 John Wiley & Sons, Ltd.
Nonparametric estimation of plant density by the distance method
Patil, S.A.; Burnham, K.P.; Kovner, J.L.
1979-01-01
A relation between the plant density and the probability density function of the nearest neighbor distance (squared) from a random point is established under fairly broad conditions. Based upon this relationship, a nonparametric estimator for the plant density is developed and presented in terms of order statistics. Consistency and asymptotic normality of the estimator are discussed. An interval estimator for the density is obtained. The modifications of this estimator and its variance are given when the distribution is truncated. Simulation results are presented for regular, random and aggregated populations to illustrate the nonparametric estimator and its variance. A numerical example from field data is given. Merits and deficiencies of the estimator are discussed with regard to its robustness and variance.
A bias-corrected estimator in multiple imputation for missing data.
Tomita, Hiroaki; Fujisawa, Hironori; Henmi, Masayuki
2018-05-29
Multiple imputation (MI) is one of the most popular methods to deal with missing data, and its use has been rapidly increasing in medical studies. Although MI is rather appealing in practice since it is possible to use ordinary statistical methods for a complete data set once the missing values are fully imputed, the method of imputation is still problematic. If the missing values are imputed from some parametric model, the validity of imputation is not necessarily ensured, and the final estimate for a parameter of interest can be biased unless the parametric model is correctly specified. Nonparametric methods have been also proposed for MI, but it is not so straightforward as to produce imputation values from nonparametrically estimated distributions. In this paper, we propose a new method for MI to obtain a consistent (or asymptotically unbiased) final estimate even if the imputation model is misspecified. The key idea is to use an imputation model from which the imputation values are easily produced and to make a proper correction in the likelihood function after the imputation by using the density ratio between the imputation model and the true conditional density function for the missing variable as a weight. Although the conditional density must be nonparametrically estimated, it is not used for the imputation. The performance of our method is evaluated by both theory and simulation studies. A real data analysis is also conducted to illustrate our method by using the Duke Cardiac Catheterization Coronary Artery Disease Diagnostic Dataset. Copyright © 2018 John Wiley & Sons, Ltd.
Researches of fruit quality prediction model based on near infrared spectrum
NASA Astrophysics Data System (ADS)
Shen, Yulin; Li, Lian
2018-04-01
With rising standards for food quality and safety, the internal quality of fruits draws more attention, so measuring it is increasingly important. Nondestructive analysis of soluble solid content (SSC) and total acid content (TAC) is vital and effective for quality measurement in global fresh produce markets. In this paper, we aim to establish a novel fruit internal quality prediction model based on SSC and TAC from the near infrared spectrum. First, fruit quality prediction models based on PCA + BP neural network, PCA + GRNN network, PCA + BP adaboost strong classifier, PCA + ELM, and PCA + LS_SVM classifier are designed and implemented. Second, in the NSCT domain, the median filter and the Savitzky-Golay filter are used to preprocess the spectral signal, and the Kennard-Stone algorithm is used to automatically select the training and test samples. Third, we obtain the optimal models by comparing 15 prediction models under a multi-classifier competition mechanism: nonparametric estimation is introduced to measure the effectiveness of each model, using the reliability and variance of the nonparametric estimate to evaluate the prediction result, with the estimated value and confidence interval serving as a reference. The experimental results demonstrate that this approach can better achieve the optimal evaluation of the internal quality of fruit. Finally, we employ cat swarm optimization to optimize the two optimal models obtained from the nonparametric estimation; empirical testing indicates that the proposed method provides more accurate and effective results than other forecasting methods.
Statistical Models and Inference Procedures for Structural and Materials Reliability
1990-12-01
as an official Department of the Army position, policy, or decision, unless so designated by other documentation. 12a. DISTRIBUTION /AVAILABILITY...Some general stress-strength models were also developed and applied to the failure of systems subject to cyclic loading. Involved in the failure of...process control ideas and sequential design and analysis methods. Finally, smooth nonparametric quantile function estimators were studied. All of
Nikita, Efthymia
2014-03-01
The current article explores whether the application of generalized linear models (GLM) and generalized estimating equations (GEE) can be used in place of conventional statistical analyses in the study of ordinal data that code an underlying continuous variable, like entheseal changes. The analysis of artificial data and ordinal data expressing entheseal changes in archaeological North African populations gave the following results. Parametric and nonparametric tests give convergent results, particularly for P values <0.1, irrespective of whether the underlying variable is normally distributed or not, under the condition that the samples involved in the tests exhibit approximately equal sizes. If this prerequisite is valid and provided that the samples are of equal variances, analysis of covariance may be adopted. GLM are not subject to constraints and give results that converge to those obtained from all nonparametric tests. Therefore, they can be used instead of traditional tests: they give the same amount of information, with the advantage of allowing the study of the simultaneous impact of multiple predictors and their interactions and the modeling of the experimental data. However, GLM should be replaced by GEE for the study of bilateral asymmetry and in general when paired samples are tested, because GEE are appropriate for correlated data. Copyright © 2013 Wiley Periodicals, Inc.
Nonparametric Statistics Test Software Package.
1983-09-01
statistics because of their acceptance in the academic world, the availability of computer support, and flexibility in model building. Nonparametric...25 I1l,lCELL WRITE(NCF,12 ) IvE (I ,RCCT(I) 122 FORMAT(IlXt 3(H5 9 1) IF( IeLT *NCELL) WRITE (NOF1123 J PARTV(I1J 123 FORMAT( Xll----’,FIo.3J 25 CONT
ERIC Educational Resources Information Center
Cui, Zhongmin; Kolen, Michael J.
2008-01-01
This article considers two methods of estimating standard errors of equipercentile equating: the parametric bootstrap method and the nonparametric bootstrap method. Using a simulation study, these two methods are compared under three sample sizes (300, 1,000, and 3,000), for two test content areas (the Iowa Tests of Basic Skills Maps and Diagrams…
ERIC Educational Resources Information Center
Maydeu-Olivares, Albert
2005-01-01
Chernyshenko, Stark, Chan, Drasgow, and Williams (2001) investigated the fit of Samejima's logistic graded model and Levine's non-parametric MFS model to the scales of two personality questionnaires and found that the graded model did not fit well. We attribute the poor fit of the graded model to small amounts of multidimensionality present in…
ERIC Educational Resources Information Center
Park, Jungkyu; Yu, Hsiu-Ting
2016-01-01
The multilevel latent class model (MLCM) is a multilevel extension of a latent class model (LCM) that is used to analyze data with a nested structure. The nonparametric version of an MLCM assumes a discrete latent variable at a higher-level nesting structure to account for the dependency among observations nested within a higher-level unit. In…
John Hof; Curtis Flather; Tony Baltic; Rudy King
2006-01-01
The 2005 Forest and Rangeland Condition Indicator Model is a set of classification trees for forest and rangeland condition indicators at the national scale. This report documents the development of the database and the nonparametric statistical estimation for this analytical structure, with emphasis on three special characteristics of condition indicator production...
An Assessment of the Nonparametric Approach for Evaluating the Fit of Item Response Models
ERIC Educational Resources Information Center
Liang, Tie; Wells, Craig S.; Hambleton, Ronald K.
2014-01-01
As item response theory has been more widely applied, investigating the fit of a parametric model becomes an important part of the measurement process. There is a lack of promising solutions to the detection of model misfit in IRT. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting…
Robust estimation for ordinary differential equation models.
Cao, J; Wang, L; Xu, J
2011-12-01
Applied scientists often like to use ordinary differential equations (ODEs) to model complex dynamic processes that arise in biology, engineering, medicine, and many other areas. It is interesting but challenging to estimate ODE parameters from noisy data, especially when the data have some outliers. We propose a robust method to address this problem. The dynamic process is represented with a nonparametric function, which is a linear combination of basis functions. The nonparametric function is estimated by a robust penalized smoothing method. The penalty term is defined with the parametric ODE model, which controls the roughness of the nonparametric function and maintains the fidelity of the nonparametric function to the ODE model. The basis coefficients and ODE parameters are estimated in two nested levels of optimization. The coefficient estimates are treated as an implicit function of ODE parameters, which enables one to derive the analytic gradients for optimization using the implicit function theorem. Simulation studies show that the robust method gives satisfactory estimates for the ODE parameters from noisy data with outliers. The robust method is demonstrated by estimating a predator-prey ODE model from real ecological data. © 2011, The International Biometric Society.
Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection
NASA Technical Reports Server (NTRS)
Kumar, Sricharan; Srivastava, Ashok N.
2012-01-01
Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. Subsequently, these prediction intervals can be used to determine if the observed output is anomalous or not, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow for a non-parametric approach to computing prediction intervals with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is theoretically proved. Subsequently, the validity of the bootstrap based prediction intervals is illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.
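One common way to build such an interval is a residual bootstrap around a kernel smoother, sketched below. This is an illustrative sketch and not the procedure proved in the paper: the Nadaraya-Watson smoother, the fixed bandwidth, the homoscedastic residual resampling, and all function names and data are assumptions made for the example.

```python
import math
import random


def nw_fit(xs, ys, x0, bandwidth):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel.

    Assumes x0 lies close enough to the data that the weights do not all
    underflow to zero.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def bootstrap_prediction_interval(xs, ys, x0, bandwidth=0.5,
                                  n_boot=500, level=0.95, seed=0):
    """Residual-bootstrap prediction interval for a new output at x0."""
    rng = random.Random(seed)
    fitted = [nw_fit(xs, ys, x, bandwidth) for x in xs]
    resid = [y - f for y, f in zip(ys, fitted)]
    mean_resid = sum(resid) / len(resid)
    resid = [r - mean_resid for r in resid]  # centre the residuals
    center = nw_fit(xs, ys, x0, bandwidth)
    errors = []
    for _ in range(n_boot):
        # Simulate a new training set, refit, and simulate a future output.
        y_star = [f + rng.choice(resid) for f in fitted]
        refit = nw_fit(xs, y_star, x0, bandwidth)
        future = center + rng.choice(resid)
        errors.append(future - refit)  # bootstrap prediction error
    errors.sort()
    alpha = 1 - level
    lo = errors[int(alpha / 2 * n_boot)]
    hi = errors[int((1 - alpha / 2) * n_boot) - 1]
    return center + lo, center + hi


def is_anomalous(y_obs, interval):
    """Flag an observed output that falls outside the prediction interval."""
    return not (interval[0] <= y_obs <= interval[1])


# Hypothetical data: a noisy sine curve.
data_rng = random.Random(1)
xs = [i * 0.1 for i in range(60)]
ys = [math.sin(x) + data_rng.gauss(0, 0.1) for x in xs]
interval = bootstrap_prediction_interval(xs, ys, x0=3.0)
print(interval, is_anomalous(5.0, interval))
```

An observation is then treated as anomalous, conditioned on its input, when it falls outside the interval computed at that input.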
ERIC Educational Resources Information Center
St-Onge, Christina; Valois, Pierre; Abdous, Belkacem; Germain, Stephane
2009-01-01
To date, there have been no studies comparing parametric and nonparametric Item Characteristic Curve (ICC) estimation methods on the effectiveness of Person-Fit Statistics (PFS). The primary aim of this study was to determine if the use of ICCs estimated by nonparametric methods would increase the accuracy of item response theory-based PFS for…
ERIC Educational Resources Information Center
Wagstaff, David A.; Elek, Elvira; Kulis, Stephen; Marsiglia, Flavio
2009-01-01
A nonparametric bootstrap was used to obtain an interval estimate of Pearson's "r," and test the null hypothesis that there was no association between 5th grade students' positive substance use expectancies and their intentions to not use substances. The students were participating in a substance use prevention program in which the unit of…
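A percentile bootstrap interval for Pearson's r can be sketched as follows. This is a minimal sketch that resamples individual (x, y) pairs; the abstract is truncated where it describes the unit of the program, so any clustering of students is not modeled here, and the function names and data are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation; returns None if either variable is exactly constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def bootstrap_r_ci(xs, ys, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for r, resampling (x, y) pairs with replacement."""
    if pearson_r(xs, ys) is None:
        raise ValueError("r is undefined for constant data")
    rng = random.Random(seed)
    n = len(xs)
    rs = []
    while len(rs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([xs[i] for i in idx], [ys[i] for i in idx])
        if r is not None:  # skip degenerate resamples
            rs.append(r)
    rs.sort()
    alpha = 1 - level
    return rs[int(alpha / 2 * n_boot)], rs[int((1 - alpha / 2) * n_boot) - 1]


# Hypothetical scores: expectancies (x) and intentions (y) for 12 students.
x = [1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8]
y = [8, 7, 9, 6, 7, 6, 5, 5, 4, 4, 3, 2]
low, high = bootstrap_r_ci(x, y)
print(low, high, "reject H0: r = 0" if low > 0 or high < 0 else "do not reject")
```

The null hypothesis of no association is rejected at the chosen level when the interval excludes zero.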
A Nonparametric Approach for Assessing Goodness-of-Fit of IRT Models in a Mixed Format Test
ERIC Educational Resources Information Center
Liang, Tie; Wells, Craig S.
2015-01-01
Investigating the fit of a parametric model plays a vital role in validating an item response theory (IRT) model. An area that has received little attention is the assessment of multiple IRT models used in a mixed-format test. The present study extends the nonparametric approach, proposed by Douglas and Cohen (2001), to assess model fit of three…
Learning Circulant Sensing Kernels
2014-03-01
Furthermore, we test learning the circulant sensing matrix/operator and the nonparametric dictionary altogether and obtain even better performance. We...scale. Furthermore, we test learning the circulant sensing matrix/operator and the nonparametric dictionary altogether and obtain even better performance...matrices, Tropp et al.[28] describes a random filter for acquiring a signal x̄; Haupt et al.[12] describes a channel estimation problem to identify a
Goodness-of-Fit Tests and Nonparametric Adaptive Estimation for Spike Train Analysis
2014-01-01
When dealing with classical spike train analysis, the practitioner often performs goodness-of-fit tests to test whether the observed process is a Poisson process, for instance, or if it obeys another type of probabilistic model (Yana et al. in Biophys. J. 46(3):323–330, 1984; Brown et al. in Neural Comput. 14(2):325–346, 2002; Pouzat and Chaffiol in Technical report, http://arxiv.org/abs/arXiv:0909.2785, 2009). In doing so, there is a fundamental plug-in step, where the parameters of the supposed underlying model are estimated. The aim of this article is to show that plug-in sometimes has very undesirable effects. We propose a new method based on subsampling to deal with those plug-in issues in the case of the Kolmogorov–Smirnov test of uniformity. The method relies on the plug-in of good estimates of the underlying model that have to be consistent with a controlled rate of convergence. Some nonparametric estimates satisfying those constraints in the Poisson or in the Hawkes framework are highlighted. Moreover, they share adaptive properties that are useful from a practical point of view. We show the performance of those methods on simulated data. We also provide a complete analysis with these tools on single unit activity recorded on a monkey during a sensory-motor task. Electronic Supplementary Material: The online version of this article (doi:10.1186/2190-8567-4-3) contains supplementary material. PMID:24742008
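The statistic underlying a Kolmogorov–Smirnov test of uniformity can be sketched as below. This shows only the classical statistic on values already mapped to [0, 1]; it is not the subsampling method proposed in the article, and the function name and input values are hypothetical.

```python
def ks_uniform_statistic(us):
    """Kolmogorov-Smirnov distance between a sample and Uniform(0, 1).

    Compares the empirical CDF with the identity on [0, 1], checking the
    gap just after and just before each jump of the empirical CDF.
    """
    u = sorted(us)
    n = len(u)
    d_plus = max((i + 1) / n - v for i, v in enumerate(u))
    d_minus = max(v - i / n for i, v in enumerate(u))
    return max(d_plus, d_minus)


# Hypothetical rescaled spike times, assumed to lie in [0, 1].
rescaled = [0.05, 0.21, 0.33, 0.48, 0.52, 0.70, 0.88, 0.97]
print(ks_uniform_statistic(rescaled))
```

When the rescaling itself depends on parameters estimated from the same data, the usual critical values for this statistic may no longer apply, which is the plug-in issue the abstract raises.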
Joshi, Ankur; Arutagi, Vishwanath; Nahar, Nitin; Tiwari, Sharad; Singh, Daneshwar; Sethia, Soumitra
2016-01-01
Informational continuity is of paramount importance for a diabetic patient. This pilot study explores the process utility of structured educational modular sessions grounded on the principle of near-peer mentoring. Visual modules were prepared for diabetic patients. These modules were delivered to 25 diabetic patients in logical sequences. In the next phase, 4 of these 25 patients were designated as diabetic-diabetes ongoing sustainable care and treatment (DOST). Each diabetic-DOST was paired with two patients for modular sessions and information delivery during the next 7 days. Process analysis was performed with "proxy-indicators," namely, monthly glycemic status, knowledge assessment scores, and quality of life. Data were analyzed by interval estimates and through nonparametric analysis. Nonparametric analysis indicated a significant improvement in glycemic status in terms of fasting blood sugar (W = 78, z = 3.04, P = 0.002), 2 h-postprandial blood sugar (W = 54, z = 2.01, P = 0.035), and in knowledge score (χ² = 19.53, df = 3; P = 0.0002). Quality of life score showed significant improvement in 2 out of 7 domains, namely, satisfaction with treatment (difference in mean score = 1.40 [1.94 to 0.85]) and symptom botherness (difference in mean score = 0.98 [1.3-0.65]). Because of inherent methodological limitations and innate biases, no conclusive statement can be made at this juncture, although preliminary process evidence indicates a promising role for the diabetic-DOST strategy.
Single molecule force spectroscopy at high data acquisition: A Bayesian nonparametric analysis
NASA Astrophysics Data System (ADS)
Sgouralis, Ioannis; Whitmore, Miles; Lapidus, Lisa; Comstock, Matthew J.; Pressé, Steve
2018-03-01
Bayesian nonparametrics (BNPs) are poised to have a deep impact in the analysis of single molecule data as they provide posterior probabilities over entire models consistent with the supplied data, not just model parameters of one preferred model. Thus they provide an elegant and rigorous solution to the difficult problem encountered when selecting an appropriate candidate model. Nevertheless, BNPs' flexibility to learn models and their associated parameters from experimental data is a double-edged sword. Most importantly, BNPs are prone to increasing the complexity of the estimated models due to artifactual features present in time traces. Thus, because of experimental challenges unique to single molecule methods, naive application of available BNP tools is not possible. Here we consider traces with time correlations and, as a specific example, we deal with force spectroscopy traces collected at high acquisition rates. While high acquisition rates are required in order to capture dwells in short-lived molecular states, in this setup, a slow response of the optical trap instrumentation (i.e., trapped beads, ambient fluid, and tethering handles) distorts the molecular signals introducing time correlations into the data that may be misinterpreted as true states by naive BNPs. Our adaptation of BNP tools explicitly takes into consideration these response dynamics, in addition to drift and noise, and makes unsupervised time series analysis of correlated single molecule force spectroscopy measurements possible, even at acquisition rates similar to or below the trap's response times.
NASA Technical Reports Server (NTRS)
Dasarathy, B. V.
1976-01-01
An algorithm is proposed for dimensionality reduction in the context of clustering techniques based on histogram analysis. The approach is based on an evaluation of the hills and valleys in the unidimensional histograms along the different features and provides an economical means of assessing the significance of the features in a nonparametric unsupervised data environment. The method has relevance to remote sensing applications.
Yoshimoto, Junichiro; Shimizu, Yu; Okada, Go; Takamura, Masahiro; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
We propose a novel method for multiple clustering, which is useful for analysis of high-dimensional data containing heterogeneous types of features. Our method is based on nonparametric Bayesian mixture models in which features are automatically partitioned (into views) for each clustering solution. This feature partition works as feature selection for a particular clustering solution, which screens out irrelevant features. To make our method applicable to high-dimensional data, a co-clustering structure is newly introduced for each view. Further, the outstanding novelty of our method is that we simultaneously model different distribution families, such as Gaussian, Poisson, and multinomial distributions in each cluster block, which widens areas of application to real data. We apply the proposed method to synthetic and real data, and show that our method outperforms other multiple clustering methods both in recovering true cluster structures and in computation time. Finally, we apply our method to a depression dataset with no true cluster structure available, from which useful inferences are drawn about possible clustering structures of the data. PMID:29049392
Diffeomorphic demons: efficient non-parametric image registration.
Vercauteren, Tom; Pennec, Xavier; Perchant, Aymeric; Ayache, Nicholas
2009-03-01
We propose an efficient non-parametric diffeomorphic image registration algorithm based on Thirion's demons algorithm. In the first part of this paper, we show that Thirion's demons algorithm can be seen as an optimization procedure on the entire space of displacement fields. We provide strong theoretical roots to the different variants of Thirion's demons algorithm. This analysis predicts a theoretical advantage for the symmetric forces variant of the demons algorithm. We show on controlled experiments that this advantage is confirmed in practice and yields a faster convergence. In the second part of this paper, we adapt the optimization procedure underlying the demons algorithm to a space of diffeomorphic transformations. In contrast to many diffeomorphic registration algorithms, our solution is computationally efficient since in practice it only replaces an addition of displacement fields by a few compositions. Our experiments show that in addition to being diffeomorphic, our algorithm provides results that are similar to the ones from the demons algorithm but with transformations that are much smoother and closer to the gold standard, available in controlled experiments, in terms of Jacobians.
Bayesian nonparametric clustering in phylogenetics: modeling antigenic evolution in influenza.
Cybis, Gabriela B; Sinsheimer, Janet S; Bedford, Trevor; Rambaut, Andrew; Lemey, Philippe; Suchard, Marc A
2018-01-30
Influenza is responsible for up to 500,000 deaths every year, and antigenic variability represents much of its epidemiological burden. To visualize antigenic differences across many viral strains, antigenic cartography methods use multidimensional scaling on binding assay data to map influenza antigenicity onto a low-dimensional space. Analysis of such assay data ideally leads to natural clustering of influenza strains of similar antigenicity that correlate with sequence evolution. To understand the dynamics of these antigenic groups, we present a framework that jointly models genetic and antigenic evolution by combining multidimensional scaling of binding assay data, Bayesian phylogenetic machinery and nonparametric clustering methods. We propose a phylogenetic Chinese restaurant process that extends the current process to incorporate the phylogenetic dependency structure between strains in the modeling of antigenic clusters. With this method, we are able to use the genetic information to better understand the evolution of antigenicity throughout epidemics, as shown in applications of this model to H1N1 influenza. Copyright © 2017 John Wiley & Sons, Ltd.
Temporal changes and variability in temperature series over Peninsular Malaysia
NASA Astrophysics Data System (ADS)
Suhaila, Jamaludin
2015-02-01
With the current concern over climate change, descriptions of how temperature series have changed over time are very useful. Annual mean temperature has been analyzed for several stations over Peninsular Malaysia. Non-parametric statistical techniques such as the Mann-Kendall test and Theil-Sen slope estimation are used primarily for assessing the significance and detection of trends, while the nonparametric Pettitt's test and sequential Mann-Kendall test are adopted to detect any abrupt climate change. Statistically significant increasing trends in annual mean temperature are detected for almost all studied stations, with the magnitude of the significant trends varying from 0.02°C to 0.05°C per year. The results show that the climate over Peninsular Malaysia is getting warmer than before. In addition, the results of the abrupt changes in temperature using Pettitt's and sequential Mann-Kendall tests reveal the beginning of trends which can be related to El Nino episodes that occur in Malaysia. In general, the analysis results can help local stakeholders and water managers to understand the risks and vulnerabilities related to climate change in terms of mean events in the region.
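The two trend tools named in this abstract can be illustrated with a minimal sketch. It assumes an evenly spaced annual series with no tied values (no tie correction is applied), and the function names and the synthetic series are hypothetical, not taken from the study.

```python
import math
from statistics import median


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.

    Returns (S, z, p) where p is the two-sided p-value from the
    normal approximation.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return s, z, p


def theil_sen_slope(series):
    """Median of all pairwise slopes, in series units per time step."""
    n = len(series)
    slopes = [
        (series[j] - series[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)


# Hypothetical annual mean temperatures (degrees C) rising 0.03 per year.
temps = [26.0 + 0.03 * year for year in range(20)]
s_stat, z_stat, p_value = mann_kendall(temps)
slope = theil_sen_slope(temps)
```

The Theil-Sen slope is the quantity reported in degrees per year, while the Mann-Kendall p-value indicates whether the monotonic trend is statistically significant.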
Multi-object segmentation using coupled nonparametric shape and relative pose priors
NASA Astrophysics Data System (ADS)
Uzunbas, Mustafa Gökhan; Soldea, Octavian; Çetin, Müjdat; Ünal, Gözde; Erçil, Aytül; Unay, Devrim; Ekin, Ahmet; Firat, Zeynep
2009-02-01
We present a new method for multi-object segmentation in a maximum a posteriori estimation framework. Our method is motivated by the observation that neighboring or coupling objects in images generate configurations and co-dependencies which could potentially aid in segmentation if properly exploited. Our approach employs coupled shape and inter-shape pose priors that are computed using training images in a nonparametric multi-variate kernel density estimation framework. The coupled shape prior is obtained by estimating the joint shape distribution of multiple objects and the inter-shape pose priors are modeled via standard moments. Based on such statistical models, we formulate an optimization problem for segmentation, which we solve by an algorithm based on active contours. Our technique provides significant improvements in the segmentation of weakly contrasted objects in a number of applications. In particular for medical image analysis, we use our method to extract brain Basal Ganglia structures, which are members of a complex multi-object system posing a challenging segmentation problem. We also apply our technique to the problem of handwritten character segmentation. Finally, we use our method to segment cars in urban scenes.
Comparison of Salmonella enteritidis phage types isolated from layers and humans in Belgium in 2005.
Welby, Sarah; Imberechts, Hein; Riocreux, Flavien; Bertrand, Sophie; Dierick, Katelijne; Wildemauwe, Christa; Hooyberghs, Jozef; Van der Stede, Yves
2011-08-01
The aim of this study was to investigate the available results for Belgium of the European Union coordinated monitoring program (2004/665 EC) on Salmonella in layers in 2005, as well as the results of the monthly outbreak reports of Salmonella Enteritidis in humans in 2005, to identify a possible statistically significant trend in both populations. Separate descriptive statistics and univariate analysis were carried out and parametric and/or non-parametric hypothesis tests were conducted. A time cluster analysis was performed for all Salmonella Enteritidis phage types (PTs) isolated. The proportions of each Salmonella Enteritidis PT in layers and in humans were compared and the monthly distribution of the most common PT, isolated in both populations, was evaluated. The time cluster analysis revealed significant clusters during the months May and June for layers and May, July, August, and September for humans. PT21, the most frequently isolated PT in both populations in 2005, seemed to be responsible for these significant clusters. PT4 was the second most frequently isolated PT. No significant difference was found for the monthly trend evolution of both PTs in both populations based on parametric and non-parametric methods. A similar monthly trend of PT distribution in humans and layers during the year 2005 was observed. The time cluster analysis and the statistical significance testing confirmed these results. Moreover, the time cluster analysis showed significant clusters during the summer, slightly delayed in time (humans after layers). These results suggest a common link between the prevalence of Salmonella Enteritidis in layers and the occurrence of the pathogen in humans. Phage typing was confirmed to be a useful tool for identifying temporal trends.
McCarron, C Elizabeth; Pullenayegum, Eleanor M; Thabane, Lehana; Goeree, Ron; Tarride, Jean-Eric
2013-04-01
Bayesian methods have been proposed as a way of synthesizing all available evidence to inform decision making. However, few practical applications of the use of Bayesian methods for combining patient-level data (i.e., trial) with additional evidence (e.g., literature) exist in the cost-effectiveness literature. The objective of this study was to compare a Bayesian cost-effectiveness analysis using informative priors to a standard non-Bayesian nonparametric method to assess the impact of incorporating additional information into a cost-effectiveness analysis. Patient-level data from a previously published nonrandomized study were analyzed using traditional nonparametric bootstrap techniques and bivariate normal Bayesian models with vague and informative priors. Two different types of informative priors were considered to reflect different valuations of the additional evidence relative to the patient-level data (i.e., "face value" and "skeptical"). The impact of using different distributions and valuations was assessed in a sensitivity analysis. Models were compared in terms of incremental net monetary benefit (INMB) and cost-effectiveness acceptability frontiers (CEAFs). The bootstrapping and Bayesian analyses using vague priors provided similar results. The most pronounced impact of incorporating the informative priors was the increase in estimated life years in the control arm relative to what was observed in the patient-level data alone. Consequently, the incremental difference in life years originally observed in the patient-level data was reduced, and the INMB and CEAF changed accordingly. The results of this study demonstrate the potential impact and importance of incorporating additional information into an analysis of patient-level data, suggesting this could alter decisions as to whether a treatment should be adopted and whether more information should be acquired.
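The nonparametric bootstrap side of this comparison can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: patients are resampled with replacement within each arm, INMB is taken as willingness-to-pay times the difference in mean effects minus the difference in mean costs, and the function name, data layout, and example values are hypothetical.

```python
import random
from statistics import mean


def bootstrap_inmb(treat, control, wtp, n_boot=2000, seed=0):
    """Bootstrap replicates of incremental net monetary benefit (INMB).

    treat, control: lists of (cost, effect) tuples, one per patient.
    wtp: willingness to pay per unit of effect (e.g. per life year).
    """
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        # Resample patients with replacement, separately within each arm.
        t = [rng.choice(treat) for _ in treat]
        c = [rng.choice(control) for _ in control]
        d_cost = mean(p[0] for p in t) - mean(p[0] for p in c)
        d_effect = mean(p[1] for p in t) - mean(p[1] for p in c)
        replicates.append(wtp * d_effect - d_cost)
    return replicates


# Hypothetical patient-level data: (cost, life years).
treatment_arm = [(12000.0, 4.1), (9500.0, 3.8), (15000.0, 5.0), (11000.0, 4.4)]
control_arm = [(7000.0, 3.5), (6500.0, 3.9), (8000.0, 3.2), (7200.0, 3.6)]
reps = bootstrap_inmb(treatment_arm, control_arm, wtp=50000.0)
# Share of replicates with positive INMB at this willingness to pay.
prob_cost_effective = sum(r > 0 for r in reps) / len(reps)
```

Repeating the last step over a grid of willingness-to-pay values gives the points of an acceptability curve, from which a frontier such as the CEAF mentioned above can be built.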
BROCCOLI: Software for fast fMRI analysis on many-core CPUs and GPUs
Eklund, Anders; Dufort, Paul; Villani, Mattias; LaConte, Stephen
2014-01-01
Analysis of functional magnetic resonance imaging (fMRI) data is becoming ever more computationally demanding as temporal and spatial resolutions improve, and large, publicly available data sets proliferate. Moreover, methodological improvements in the neuroimaging pipeline, such as non-linear spatial normalization, non-parametric permutation tests and Bayesian Markov Chain Monte Carlo approaches, can dramatically increase the computational burden. Despite these challenges, there do not yet exist any fMRI software packages which leverage inexpensive and powerful graphics processing units (GPUs) to perform these analyses. Here, we therefore present BROCCOLI, a free software package written in OpenCL (Open Computing Language) that can be used for parallel analysis of fMRI data on a large variety of hardware configurations. BROCCOLI has, for example, been tested with an Intel CPU, an Nvidia GPU, and an AMD GPU. These tests show that parallel processing of fMRI data can lead to significantly faster analysis pipelines. This speedup can be achieved on relatively standard hardware, but further, dramatic speed improvements require only a modest investment in GPU hardware. BROCCOLI (running on a GPU) can perform non-linear spatial normalization to a 1 mm3 brain template in 4–6 s, and run a second level permutation test with 10,000 permutations in about a minute. These non-parametric tests are generally more robust than their parametric counterparts, and can also enable more sophisticated analyses by estimating complicated null distributions. Additionally, BROCCOLI includes support for Bayesian first-level fMRI analysis using a Gibbs sampler. The new software is freely available under GNU GPL3 and can be downloaded from github (https://github.com/wanderine/BROCCOLI/). PMID:24672471
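The idea behind a non-parametric permutation test can be shown with a small two-group sketch. This is a generic CPU illustration in plain Python and does not reproduce BROCCOLI's OpenCL implementation or its second-level fMRI model; the function name and the sample values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in group means.

    The null distribution is estimated by repeatedly shuffling the
    pooled observations and re-splitting them into groups of the
    original sizes.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical per-subject effect estimates for two groups.
group_a = [0.8, 1.1, 0.9, 1.4, 1.2, 1.0]
group_b = [0.2, 0.5, 0.1, 0.6, 0.4, 0.3]
p_value = permutation_test(group_a, group_b, n_perm=5000)
```

Each permutation is independent of the others, which is why this kind of test parallelizes well on many-core hardware.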
Comparing nonparametric Bayesian tree priors for clonal reconstruction of tumors.
Deshwar, Amit G; Vembu, Shankar; Morris, Quaid
2015-01-01
Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular to infer clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick breaking prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor…
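For readers unfamiliar with the Chinese restaurant process that the treeCRP extends, a minimal sketch of sampling a partition from the standard CRP follows. It shows only the plain CRP, not the tree-structured extension or the split-merge updates described above; the function name and the concentration value are hypothetical.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw one partition from a Chinese restaurant process.

    Customer i (0-based) joins existing table k with probability
    size_k / (i + alpha) and opens a new table with probability
    alpha / (i + alpha). alpha must be positive.
    """
    rng = random.Random(seed)
    table_sizes = []
    assignments = []
    for _ in range(n_customers):
        weights = table_sizes + [alpha]  # last slot stands for a new table
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)
        else:
            table_sizes[k] += 1
        assignments.append(k)
    return assignments, table_sizes


# Hypothetical use: cluster labels for 50 items with concentration 1.5.
labels, sizes = sample_crp(50, alpha=1.5)
```

Larger values of alpha favor opening new tables, so the expected number of clusters grows with alpha and only slowly with the number of items.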
The Probability of Exceedance as a Nonparametric Person-Fit Statistic for Tests of Moderate Length
ERIC Educational Resources Information Center
Tendeiro, Jorge N.; Meijer, Rob R.
2013-01-01
To classify an item score pattern as not fitting a nonparametric item response theory (NIRT) model, the probability of exceedance (PE) of an observed response vector x can be determined as the sum of the probabilities of all response vectors that are, at most, as likely as x, conditional on the test's total score. Vector x is to be considered…
Model-free quantification of dynamic PET data using nonparametric deconvolution
Zanderigo, Francesca; Parsey, Ramin V; Todd Ogden, R
2015-01-01
Dynamic positron emission tomography (PET) data are usually quantified using compartment models (CMs) or derived graphical approaches. Often, however, CMs either do not properly describe the tracer kinetics, or are not identifiable, leading to nonphysiologic estimates of the tracer binding. The PET data are modeled as the convolution of the metabolite-corrected input function and the tracer impulse response function (IRF) in the tissue. Using nonparametric deconvolution methods, it is possible to obtain model-free estimates of the IRF, from which functionals related to tracer volume of distribution and binding may be computed, but this approach has rarely been applied in PET. Here, we apply nonparametric deconvolution using singular value decomposition to simulated and test–retest clinical PET data with four reversible tracers well characterized by CMs ([11C]CUMI-101, [11C]DASB, [11C]PE2I, and [11C]WAY-100635), and systematically compare reproducibility, reliability, and identifiability of various IRF-derived functionals with that of traditional CMs outcomes. Results show that nonparametric deconvolution, completely free of any model assumptions, allows for estimates of tracer volume of distribution and binding that are very close to the estimates obtained with CMs and, in some cases, show better test–retest performance than CMs outcomes. PMID:25873427
Modeling seasonal variation of hip fracture in Montreal, Canada.
Modarres, Reza; Ouarda, Taha B M J; Vanasse, Alain; Orzanco, Maria Gabriela; Gosselin, Pierre
2012-04-01
The investigation of the association of the climate variables with hip fracture incidences is important in social health issues. This study examined and modeled the seasonal variation of monthly population based hip fracture rate (HFr) time series. The seasonal ARIMA time series modeling approach is used to model monthly HFr incidences time series of female and male patients of the ages 40-74 and 75+ of Montreal, Québec province, Canada, in the period of 1993-2004. The correlation coefficients between meteorological variables such as temperature, snow depth, rainfall depth and day length and HFr are significant. The nonparametric Mann-Kendall test for trend assessment and the nonparametric Levene's test and Wilcoxon's test for checking the difference of HFr before and after change point are also used. The seasonality in HFr indicated sharp difference between winter and summer time. The trend assessment showed decreasing trends in HFr of female and male groups. The nonparametric test also indicated a significant change of the mean HFr. A seasonal ARIMA model was applied for HFr time series without trend and a time trend ARIMA model (TT-ARIMA) was developed and fitted to HFr time series with a significant trend. The multi criteria evaluation showed the adequacy of SARIMA and TT-ARIMA models for modeling seasonal hip fracture time series with and without significant trend. In the time series analysis of HFr of the Montreal region, the effects of the seasonal variation of climate variables on hip fracture are clear. The Seasonal ARIMA model is useful for modeling HFr time series without trend. However, for time series with significant trend, the TT-ARIMA model should be applied for modeling HFr time series. Copyright © 2011 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Rogers, Jeffrey N.; Parrish, Christopher E.; Ward, Larry G.; Burdick, David M.
2018-03-01
Salt marsh vegetation tends to increase vertical uncertainty in light detection and ranging (lidar) derived elevation data, often causing the data to become ineffective for analysis of topographic features governing tidal inundation or vegetation zonation. Previous attempts at improving lidar data collected in salt marsh environments range from simply computing and subtracting the global elevation bias to more complex methods such as computing vegetation-specific, constant correction factors. The vegetation specific corrections can be used along with an existing habitat map to apply separate corrections to different areas within a study site. It is hypothesized here that correcting salt marsh lidar data by applying location-specific, point-by-point corrections, which are computed from lidar waveform-derived features, tidal-datum based elevation, distance from shoreline and other lidar digital elevation model based variables, using nonparametric regression will produce better results. The methods were developed and tested using full-waveform lidar and ground truth for three marshes in Cape Cod, Massachusetts, U.S.A. Five different model algorithms for nonparametric regression were evaluated, with TreeNet's stochastic gradient boosting algorithm consistently producing better regression and classification results. Additionally, models were constructed to predict the vegetative zone (high marsh and low marsh). The predictive modeling methods used in this study estimated ground elevation with a mean bias of 0.00 m and a standard deviation of 0.07 m (0.07 m root mean square error). These methods appear very promising for correction of salt marsh lidar data and, importantly, do not require an existing habitat map, biomass measurements, or image based remote sensing data such as multi/hyperspectral imagery.
Nonparametric identification of nonlinear dynamic systems using a synchronisation-based method
NASA Astrophysics Data System (ADS)
Kenderi, Gábor; Fidlin, Alexander
2014-12-01
The present study proposes an identification method for highly nonlinear mechanical systems that does not require a priori knowledge of the underlying nonlinearities to reconstruct arbitrary restoring force surfaces between degrees of freedom. This approach is based on the master-slave synchronisation between a dynamic model of the system as the slave and the real system as the master using measurements of the latter. As the model synchronises to the measurements, it becomes an observer of the real system. The optimal observer algorithm in a least-squares sense is given by the Kalman filter. Using the well-known state augmentation technique, the Kalman filter can be turned into a dual state and parameter estimator to identify parameters of a priori characterised nonlinearities. The paper proposes an extension of this technique towards nonparametric identification. A general system model is introduced by describing the restoring forces as bilateral spring-dampers with time-variant coefficients, which are estimated as augmented states. The estimation procedure is followed by an a posteriori statistical analysis to reconstruct noise-free restoring force characteristics using the estimated states and their estimated variances. Observability is provided using only one measured mechanical quantity per degree of freedom, which makes this approach less demanding in the number of necessary measurement signals compared with truly nonparametric solutions, which typically require displacement, velocity and acceleration signals. Additionally, due to the statistical rigour of the procedure, it successfully addresses signals corrupted by significant measurement noise. In the present paper, the method is described in detail, which is followed by numerical examples of one degree of freedom (1DoF) and 2DoF mechanical systems with strong nonlinearities of vibro-impact type to demonstrate the effectiveness of the proposed technique.
NASA Astrophysics Data System (ADS)
Velasco-Forero, Carlos A.; Sempere-Torres, Daniel; Cassiraga, Eduardo F.; Jaime Gómez-Hernández, J.
2009-07-01
Quantitative estimation of rainfall fields has been a crucial objective from early studies of the hydrological applications of weather radar. Previous studies have suggested that flow estimations are improved when radar and rain gauge data are combined to estimate input rainfall fields. This paper reports new research carried out in this field. Classical approaches for the selection and fitting of a theoretical correlogram (or semivariogram) model (needed to apply geostatistical estimators) are avoided in this study. Instead, a non-parametric technique based on FFT is used to obtain two-dimensional positive-definite correlograms directly from radar observations, dealing with both the natural anisotropy and the temporal variation of the spatial structure of the rainfall in the estimated fields. Because these correlation maps can be automatically obtained at each time step of a given rainfall event, this technique might easily be used in operational (real-time) applications. This paper describes the development of the non-parametric estimator exploiting the advantages of FFT for the automatic computation of correlograms and provides examples of its application on a case study using six rainfall events. This methodology is applied to three different alternatives to incorporate the radar information (as a secondary variable), and a comparison of performances is provided. In particular, their ability to reproduce in estimated rainfall fields (i) the rain gauge observations (in a cross-validation analysis) and (ii) the spatial patterns of radar fields are analyzed. Results seem to indicate that the methodology of kriging with external drift [KED], in combination with the technique of automatically computing 2-D spatial correlograms, provides merged rainfall fields with good agreement with rain gauges and with the most accurate approach to the spatial tendencies observed in the radar rainfall fields, when compared with other alternatives analyzed.
Ocampo-Duque, William; Osorio, Carolina; Piamba, Christian; Schuhmacher, Marta; Domingo, José L
2013-02-01
The integration of water quality monitoring variables is essential in environmental decision making. Nowadays, advanced techniques to manage subjectivity, imprecision, uncertainty, vagueness, and variability are required in such a complex evaluation process. We here propose a probabilistic fuzzy hybrid model to assess river water quality. Fuzzy logic reasoning has been used to compute a water quality integrative index. By applying a Monte Carlo technique, based on non-parametric probability distributions, the randomness of model inputs was estimated. Annual histograms of nine water quality variables were built with monitoring data systematically collected in the Colombian Cauca River, and probability density estimations using the kernel smoothing method were applied to fit data. Several years were assessed, and river sectors upstream and downstream of the city of Santiago de Cali, a big city with basic wastewater treatment and high industrial activity, were analyzed. The probabilistic fuzzy water quality index was able to explain the reduction in water quality, as the river receives a larger number of agriculture, domestic, and industrial effluents. The results of the hybrid model were compared to traditional water quality indexes. The main advantage of the proposed method is that it considers flexible boundaries between the linguistic qualifiers used to define the water status, with the membership of water quality in the various output fuzzy sets or classes reported as percentiles and histograms, which allows better classification of the real water condition. The results of this study show that fuzzy inference systems integrated with stochastic non-parametric techniques may be used as complementary tools in water quality indexing methodologies. Copyright © 2012 Elsevier Ltd. All rights reserved.
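The Monte Carlo step based on kernel-smoothed distributions can be sketched as drawing random inputs from a Gaussian kernel density estimate. This is an assumption-laden illustration: the abstract does not state the kernel or bandwidth rule, so a Gaussian kernel with the normal-reference bandwidth is used here, and the function name and sample values are hypothetical.

```python
import random
from statistics import stdev


def sample_from_kde(data, n_samples, bandwidth=None, seed=0):
    """Draw Monte Carlo samples from a Gaussian kernel density estimate.

    Sampling from a Gaussian KDE is equivalent to picking an observed
    value at random and adding Gaussian noise with standard deviation
    equal to the bandwidth.
    """
    rng = random.Random(seed)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumed default).
        bandwidth = 1.06 * stdev(data) * len(data) ** (-1 / 5)
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(n_samples)]


# Hypothetical dissolved-oxygen readings (mg/L) for one year and sector.
dissolved_oxygen = [5.1, 4.8, 6.2, 5.5, 4.9, 5.8, 6.0, 5.3]
draws = sample_from_kde(dissolved_oxygen, n_samples=1000)
```

Each draw would then be passed through the fuzzy inference system to build a distribution of index values. Because a Gaussian kernel has unbounded support, draws for strictly non-negative variables may need truncation.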
An Exploratory Data Analysis System for Support in Medical Decision-Making
Copeland, J. A.; Hamel, B.; Bourne, J. R.
1979-01-01
An experimental system was developed to allow retrieval and analysis of data collected during a study of neurobehavioral correlates of renal disease. After retrieving data organized in a relational data base, simple bivariate statistics of parametric and nonparametric nature could be conducted. An “exploratory” mode in which the system provided guidance in selection of appropriate statistical analyses was also available to the user. The system traversed a decision tree using the inherent qualities of the data (e.g., the identity and number of patients, tests, and time epochs) to search for the appropriate analyses to employ.
Liver proteomics in progressive alcoholic steatosis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fernando, Harshica; Wiktorowicz, John E.; Soman, Kizhake V.
2013-02-01
Fatty liver is an early stage of alcoholic and nonalcoholic liver disease (ALD and NALD) that progresses to steatohepatitis and other irreversible conditions. In this study, we identified proteins that were differentially expressed in the livers of rats fed 5% ethanol in a Lieber–DeCarli diet daily for 1 and 3 months by discovery proteomics (two-dimensional gel electrophoresis and mass spectrometry) and non-parametric modeling (Multivariate Adaptive Regression Splines). Hepatic fatty infiltration was significantly higher in ethanol-fed animals as compared to controls, and more pronounced at 3 months of ethanol feeding. Discovery proteomics identified changes in the expression of proteins involved in alcohol, lipid, and amino acid metabolism after ethanol feeding. At 1 and 3 months, 12 and 15 different proteins were differentially expressed. Of the identified proteins, down regulation of alcohol dehydrogenase (− 1.6) at 1 month and up regulation of aldehyde dehydrogenase (2.1) at 3 months could be a protective/adaptive mechanism against ethanol toxicity. In addition, betaine-homocysteine S-methyltransferase 2, a protein responsible for methionine metabolism and previously implicated in fatty liver development, was significantly up regulated (1.4) at the ethanol-induced fatty liver stage (1 month) while peroxiredoxin-1 was down regulated (− 1.5) at the late fatty liver stage (3 months). Nonparametric analysis of the protein spots yielded fewer proteins, narrowed the list of possible markers, and identified D-dopachrome tautomerase (− 1.7, at 3 months) as a possible marker for ethanol-induced early steatohepatitis. The observed differential regulation of proteins has the potential to serve as a biomarker signature for the detection of steatosis and its progression to steatohepatitis once validated in plasma/serum.
-- Graphical abstract: The figure shows the hierarchical cluster analysis of differentially expressed protein spots obtained after ethanol feeding for 1 (1–3) and 3 (4–6) months. C and E represent pair-fed control and ethanol-fed rats, respectively. Highlights: ► Proteins related to ethanol-induced steatosis and mild steatohepatitis are identified. ► ADH1C and ALDH2 involved in alcohol metabolism are differentially expressed at 1 and 3 months. ► Discovery proteomics identified a group of proteins to serve as potential biomarkers. ► Using nonparametric analysis DDT is identified as a possible marker for liver damage.
Collective feature selection to identify crucial epistatic variants.
Verma, Shefali S; Lucas, Anastasia; Zhang, Xinyuan; Veturi, Yogasudha; Dudek, Scott; Li, Binglan; Li, Ruowang; Urbanowicz, Ryan; Moore, Jason H; Kim, Dokyoon; Ritchie, Marylyn D
2018-01-01
Machine learning methods have gained popularity and practicality in identifying linear and non-linear effects of variants associated with complex disease/traits. Detection of epistatic interactions still remains a challenge due to the large number of features and relatively small sample size as input, thus leading to the so-called "short fat data" problem. The efficiency of machine learning methods can be increased by limiting the number of input features. Thus, it is very important to perform variable selection before searching for epistasis. Many methods have been evaluated and proposed to perform feature selection, but no single method works best in all scenarios. We demonstrate this by conducting two separate simulation analyses to evaluate the proposed collective feature selection approach. Through our simulation study we propose a collective feature selection approach to select features that are in the "union" of the best performing methods. We explored various parametric, non-parametric, and data mining approaches to perform feature selection. We choose our top performing methods to select the union of the resulting variables based on a user-defined percentage of variants selected from each method to take to downstream analysis. Our simulation analysis shows that non-parametric data mining approaches, such as MDR, may work best under one simulation criteria for the high effect size (penetrance) datasets, while non-parametric methods designed for feature selection, such as Ranger and Gradient boosting, work best under other simulation criteria. Thus, using a collective approach proves to be more beneficial for selecting variables with epistatic effects also in low effect size datasets and different genetic architectures. 
Following this, we applied our proposed collective feature selection approach to select the top 1% of variables to identify potential interacting variables associated with Body Mass Index (BMI) in ~ 44,000 samples obtained from Geisinger's MyCode Community Health Initiative (on behalf of DiscovEHR collaboration). In this study, we were able to show that selecting variables using a collective feature selection approach could help in selecting true positive epistatic variables more frequently than applying any single method for feature selection via simulation studies. We were able to demonstrate the effectiveness of collective feature selection along with a comparison of many methods in our simulation analysis. We also applied our method to identify non-linear networks associated with obesity.
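The "union of the best performing methods" idea can be illustrated with a minimal Python sketch. It assumes each method yields one importance score per variant and that the top fraction is taken per method before the union. The function names, method labels, and scores below are hypothetical and are not taken from the paper.

```python
import math

def top_fraction(scores, fraction):
    """Return the names of the highest-scoring features (the top `fraction`)."""
    k = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def collective_selection(score_tables, fraction):
    """Union of the top-`fraction` features chosen by each method."""
    selected = set()
    for scores in score_tables.values():
        selected |= top_fraction(scores, fraction)
    return selected

# Hypothetical importance scores from two feature-selection methods.
score_tables = {
    "method_a": {"snp1": 0.9, "snp2": 0.1, "snp3": 0.4, "snp4": 0.2},
    "method_b": {"snp1": 0.2, "snp2": 0.8, "snp3": 0.3, "snp4": 0.1},
}
selected = collective_selection(score_tables, fraction=0.25)
print(sorted(selected))
```

Taking the union rather than the intersection keeps a variant that only one method ranks highly, which matches the abstract's point that no single method works best in every scenario.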
McCullagh, Laura; Schmitz, Susanne; Barry, Michael; Walsh, Cathal
2017-11-01
In Ireland, all new drugs for which reimbursement by the healthcare payer is sought undergo a health technology assessment by the National Centre for Pharmacoeconomics. The National Centre for Pharmacoeconomics estimate expected value of perfect information but not partial expected value of perfect information (owing to computational expense associated with typical methodologies). The objective of this study was to examine the feasibility and utility of estimating partial expected value of perfect information via a computationally efficient, non-parametric regression approach. This was a retrospective analysis of evaluations on drugs for cancer that had been submitted to the National Centre for Pharmacoeconomics (January 2010 to December 2014 inclusive). Drugs were excluded if cost effective at the submitted price. Drugs were excluded if concerns existed regarding the validity of the applicants' submission or if cost-effectiveness model functionality did not allow required modifications to be made. For each included drug (n = 14), value of information was estimated at the final reimbursement price, at a threshold equivalent to the incremental cost-effectiveness ratio at that price. The expected value of perfect information was estimated from probabilistic analysis. Partial expected value of perfect information was estimated via a non-parametric approach. Input parameters with a population value at least €1 million were identified as potential targets for research. All partial estimates were determined within minutes. Thirty parameters (across nine models) each had a value of at least €1 million. These were categorised. Collectively, survival analysis parameters were valued at €19.32 million, health state utility parameters at €15.81 million and parameters associated with the cost of treating adverse effects at €6.64 million. Those associated with drug acquisition costs and with the cost of care were valued at €6.51 million and €5.71 million, respectively. 
This research demonstrates that the estimation of partial expected value of perfect information via this computationally inexpensive approach could be considered feasible as part of the health technology assessment process for reimbursement purposes within the Irish healthcare system. It might be a useful tool in prioritising future research to decrease decision uncertainty.
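For orientation, the overall expected value of perfect information is usually computed from probabilistic analysis output as the mean of the per-simulation maximum net benefit minus the maximum of the mean net benefits. The sketch below shows only that standard calculation on hypothetical numbers. It does not reproduce the non-parametric regression step used for the partial estimates, and the function name and sample values are assumptions.

```python
def evpi(net_benefit):
    """Per-decision EVPI from simulated net benefits.

    net_benefit: list of rows, one per simulation; each row holds the
    net benefit of every decision option under that simulated parameter set.
    """
    n = len(net_benefit)
    n_options = len(net_benefit[0])
    # Expected net benefit if the best option could be chosen in each simulation.
    mean_of_max = sum(max(row) for row in net_benefit) / n
    # Expected net benefit of the single option that is best on average.
    max_of_mean = max(
        sum(row[d] for row in net_benefit) / n for d in range(n_options)
    )
    return mean_of_max - max_of_mean

# Hypothetical net benefits for two options over four simulations.
samples = [[10.0, 12.0], [14.0, 9.0], [11.0, 13.0], [15.0, 10.0]]
evpi_value = evpi(samples)
print(evpi_value)
```

A population-level figure, such as the €1 million threshold in the abstract, would additionally scale this per-decision value by the number of patients affected over the decision's time horizon.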
Neural network representation and learning of mappings and their derivatives
NASA Technical Reports Server (NTRS)
White, Halbert; Hornik, Kurt; Stinchcombe, Maxwell; Gallant, A. Ronald
1991-01-01
Discussed here are recent theorems proving that artificial neural networks are capable of approximating an arbitrary mapping and its derivatives as accurately as desired. This fact forms the basis for further results establishing the learnability of the desired approximations, using results from non-parametric statistics. These results have potential applications in robotics, chaotic dynamics, control, and sensitivity analysis. An example involving learning the transfer function and its derivatives for a chaotic map is discussed.
Xu, Yonghong; Gao, Xiaohuan; Wang, Zhengxi
2014-04-01
Missing data are a general problem in many scientific fields, especially in medical survival analysis. Interpolation is one of the important methods for dealing with censored data. However, most interpolation methods replace the censored data with exact data, which distorts the real distribution of the censored data and reduces the probability of the real data falling into the interpolation data. To solve this problem, we propose a nonparametric method of estimating the survival function of right-censored and interval-censored data and compare its performance to the SC (self-consistent) algorithm. Compared with the average interpolation and the nearest neighbor interpolation methods, the proposed method replaces the right-censored data with interval-censored data and greatly improves the probability of the real data falling into the imputation interval. It then uses empirical distribution theory to estimate the survival function of right-censored and interval-censored data. The results of numerical examples and a real breast cancer data set demonstrated that the proposed method had higher accuracy and better robustness for different proportions of censored data. This paper provides a good method for comparing the performance of clinical treatments by estimating the survival data of patients, and provides some help for medical survival data analysis.
Zhang, Qingyang
2018-05-16
Differential co-expression analysis, as a complement of differential expression analysis, offers significant insights into the changes in molecular mechanism of different phenotypes. A prevailing approach to detecting differentially co-expressed genes is to compare Pearson's correlation coefficients in two phenotypes. However, due to the limitations of Pearson's correlation measure, this approach lacks the power to detect nonlinear changes in gene co-expression which is common in gene regulatory networks. In this work, a new nonparametric procedure is proposed to search differentially co-expressed gene pairs in different phenotypes from large-scale data. Our computational pipeline consisted of two main steps, a screening step and a testing step. The screening step is to reduce the search space by filtering out all the independent gene pairs using distance correlation measure. In the testing step, we compare the gene co-expression patterns in different phenotypes by a recently developed edge-count test. Both steps are distribution-free and targeting nonlinear relations. We illustrate the promise of the new approach by analyzing the Cancer Genome Atlas data and the METABRIC data for breast cancer subtypes. Compared with some existing methods, the new method is more powerful in detecting nonlinear type of differential co-expressions. The distance correlation screening can greatly improve computational efficiency, facilitating its application to large data sets.
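The screening step relies on distance correlation, which is zero only under independence and can therefore flag nonlinear dependence that Pearson's coefficient misses. The following is a minimal pure-Python sketch of the sample distance correlation for two univariate samples, using the standard double-centred distance matrices. It is an illustration under those assumptions, not the paper's pipeline, and the example data are hypothetical.

```python
import math

def _centered_distances(values):
    """Double-centred matrix of pairwise absolute differences."""
    n = len(values)
    d = [[abs(values[i] - values[j]) for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in d]  # equal to column means (symmetry)
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]

def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D samples."""
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)

# Hypothetical expression values: y depends on x through a parabola,
# so the linear (Pearson) correlation is essentially zero.
x = [i / 10 for i in range(-10, 11)]
y_quadratic = [v * v for v in x]
y_linear = [2 * v + 1 for v in x]
dcor_quadratic = distance_correlation(x, y_quadratic)
dcor_linear = distance_correlation(x, y_linear)
print(dcor_quadratic, dcor_linear)
```

This direct implementation needs O(n²) time and memory per gene pair, so a large-scale screen would need a faster algorithm or an optimised library.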
A Statistician's View of Upcoming Grand Challenges
NASA Astrophysics Data System (ADS)
Meng, Xiao Li
2010-01-01
In this session we have seen some snapshots of the broad spectrum of challenges in this age of huge, complex, computer-intensive models, data, instruments, and questions. These challenges bridge astronomy at many wavelengths, basic physics, machine learning, and statistics. At one end of our spectrum, we think of 'compressing' the data with non-parametric methods. This raises the question of creating 'pseudo-replicas' of the data for uncertainty estimates. What would be involved in, e.g., bootstrap and related methods? Somewhere in the middle are these non-parametric methods for encapsulating the uncertainty information. At the far end, we find more model-based approaches, with the physics model embedded in the likelihood and analysis. The other distinctive problem is really the 'black-box' problem, where one has a complicated, e.g., fundamental physics-based computer code, or 'black box', and one needs to know how changing the parameters at input (due to uncertainties of any kind) will map to changing the output. All of these connect to challenges in complexity of data and computation speed. Dr. Meng will highlight ways to 'cut corners' with advanced computational techniques, such as Parallel Tempering and Equal Energy methods. As well, there are cautionary tales of running automated analysis with real data, where "30 sigma" outliers due to data artifacts can be more common than the astrophysical event of interest.
Nonparametric Bayesian inference for mean residual life functions in survival analysis.
Poynor, Valerie; Kottas, Athanasios
2018-01-19
Modeling and inference for survival analysis problems typically revolves around different functions related to the survival distribution. Here, we focus on the mean residual life (MRL) function, which provides the expected remaining lifetime given that a subject has survived (i.e. is event-free) up to a particular time. This function is of direct interest in reliability, medical, and actuarial fields. In addition to its practical interpretation, the MRL function characterizes the survival distribution. We develop general Bayesian nonparametric inference for MRL functions built from a Dirichlet process mixture model for the associated survival distribution. The resulting model for the MRL function admits a representation as a mixture of the kernel MRL functions with time-dependent mixture weights. This model structure allows for a wide range of shapes for the MRL function. Particular emphasis is placed on the selection of the mixture kernel, taken to be a gamma distribution, to obtain desirable properties for the MRL function arising from the mixture model. The inference method is illustrated with a data set of two experimental groups and a data set involving right censoring. The supplementary material available at Biostatistics online provides further results on empirical performance of the model, using simulated data examples. © The Author 2018. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
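For reference, the standard definition of the MRL function and the inversion formula behind the statement that it characterizes the survival distribution can be written as follows. The symbols T (survival time), S (survival function), and m (MRL function) are notation assumed here, and the formulas hold for a continuous survival time with finite mean.

```latex
% Mean residual life at time t, defined where S(t) > 0
m(t) = \mathrm{E}\left[\,T - t \mid T > t\,\right]
     = \frac{\int_t^{\infty} S(u)\,du}{S(t)}

% Inversion formula: the MRL function determines the survival function
S(t) = \frac{m(0)}{m(t)}\,
       \exp\!\left(-\int_0^{t} \frac{du}{m(u)}\right)
```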
A review of methods to estimate cause-specific mortality in presence of competing risks
Heisey, Dennis M.; Patterson, Brent R.
2006-01-01
Estimating cause-specific mortality is often of central importance for understanding the dynamics of wildlife populations. Despite such importance, methodology for estimating and analyzing cause-specific mortality has received little attention in wildlife ecology during the past 20 years. The issue of analyzing cause-specific, mutually exclusive events in time is not unique to wildlife. In fact, this general problem has received substantial attention in human biomedical applications within the context of biostatistical survival analysis. Here, we consider cause-specific mortality from a modern biostatistical perspective. This requires carefully defining what we mean by cause-specific mortality and then providing an appropriate hazard-based representation as a competing risks problem. This leads to the general solution of cause-specific mortality as the cumulative incidence function (CIF). We describe the appropriate generalization of the fully nonparametric staggered-entry Kaplan–Meier survival estimator to cause-specific mortality via the nonparametric CIF estimator (NPCIFE), which in many situations offers an attractive alternative to the Heisey–Fuller estimator. An advantage of the NPCIFE is that it lends itself readily to risk factors analysis with standard software for Cox proportional hazards model. The competing risks–based approach also clarifies issues regarding another intuitive but erroneous "cause-specific mortality" estimator based on the Kaplan–Meier survival estimator and commonly seen in the life sciences literature.
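A minimal sketch of a nonparametric cumulative incidence estimator is given below. It uses the usual form in which each cause-specific increment is weighted by the all-cause Kaplan-Meier survival just before the event time. As a simplifying assumption it omits staggered entry (every animal is taken to be at risk from time zero), and the function name and data are hypothetical.

```python
def cumulative_incidence(times, causes, cause):
    """Nonparametric cumulative incidence curve for one cause.

    times:  observed time for each subject
    causes: 0 if censored, otherwise the code of the cause of death
    cause:  the cause code of interest
    Returns a list of (time, CIF) pairs at each distinct event time.
    """
    event_times = sorted({t for t, c in zip(times, causes) if c != 0})
    surv = 1.0   # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_all = sum(1 for u, c in zip(times, causes) if u == t and c != 0)
        d_cause = sum(1 for u, c in zip(times, causes) if u == t and c == cause)
        cif += surv * d_cause / at_risk   # mass assigned to this cause at t
        surv *= 1.0 - d_all / at_risk     # update all-cause survival
        curve.append((t, cif))
    return curve

# Hypothetical data: cause 1 (e.g. predation), cause 2 (e.g. harvest), 0 = censored.
times = [1, 2, 3, 4, 5]
causes = [1, 2, 0, 1, 2]
cif_cause1 = cumulative_incidence(times, causes, 1)[-1][1]
cif_cause2 = cumulative_incidence(times, causes, 2)[-1][1]
print(cif_cause1, cif_cause2)
```

Because each increment is weighted by all-cause survival, the cause-specific curves add up to one minus the overall survival. The estimator the abstract calls erroneous, one minus a Kaplan-Meier curve that treats other causes as censoring, does not have this property.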
Nonparametric Bayesian models through probit stick-breaking processes
Rodríguez, Abel; Dunson, David B.
2013-01-01
We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology. PMID:24358072
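The weight construction described above can be sketched as follows: each stick-breaking proportion is the standard normal CDF of a real-valued variable, and the weight of an atom is its proportion times the mass not yet assigned. The finite truncation, which gives the leftover mass to a final atom, and all names in the code are assumptions made for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF (the inverse of the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_stick_breaking_weights(alphas):
    """Weights w_k = Phi(alpha_k) * prod_{j<k} (1 - Phi(alpha_j)).

    alphas: real-valued (e.g. normally distributed) latent variables.
    The mass left after the last break is assigned to one extra atom,
    so the returned weights sum to one (a truncation assumption).
    """
    weights = []
    remaining = 1.0
    for alpha in alphas:
        v = normal_cdf(alpha)          # stick-breaking proportion in (0, 1)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights

weights = probit_stick_breaking_weights([0.0, 0.0])
print(weights)
```

Since the latent variables live on the real line, dependence across time or space can be introduced by giving them a Gaussian-process or autoregressive structure, which is consistent with the temporal and spatial processes the abstract mentions.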
Nonparametric Bayesian models through probit stick-breaking processes.
Rodríguez, Abel; Dunson, David B
2011-03-01
We describe a novel class of Bayesian nonparametric priors based on stick-breaking constructions where the weights of the process are constructed as probit transformations of normal random variables. We show that these priors are extremely flexible, allowing us to generate a great variety of models while preserving computational simplicity. Particular emphasis is placed on the construction of rich temporal and spatial processes, which are applied to two problems in finance and ecology.
Le Bihan, Nicolas; Margerin, Ludovic
2009-07-01
In this paper, we present a nonparametric method to estimate the heterogeneity of a random medium from the angular distribution of intensity of waves transmitted through a slab of random material. Our approach is based on the modeling of forward multiple scattering using compound Poisson processes on compact Lie groups. The estimation technique is validated through numerical simulations based on radiative transfer theory.
GEE-Smoothing Spline in Semiparametric Model with Correlated Nominal Data
NASA Astrophysics Data System (ADS)
Ibrahim, Noor Akma; Suliadi
2010-11-01
In this paper we propose GEE-Smoothing spline in the estimation of semiparametric models with correlated nominal data. The method can be seen as an extension of parametric generalized estimating equation to semiparametric models. The nonparametric component is estimated using smoothing spline specifically the natural cubic spline. We use profile algorithm in the estimation of both parametric and nonparametric components. The properties of the estimators are evaluated using simulation studies.
A linear programming approach to characterizing norm bounded uncertainty from experimental data
NASA Technical Reports Server (NTRS)
Scheid, R. E.; Bayard, D. S.; Yam, Y.
1991-01-01
The linear programming spectral overbounding and factorization (LPSOF) algorithm, an algorithm for finding a minimum phase transfer function of specified order whose magnitude tightly overbounds a specified nonparametric function of frequency, is introduced. This method has direct application to transforming nonparametric uncertainty bounds (available from system identification experiments) into parametric representations required for modern robust control design software (i.e., a minimum-phase transfer function multiplied by a norm-bounded perturbation).
A Bayesian Nonparametric Approach to Image Super-Resolution.
Polatkan, Gungor; Zhou, Mingyuan; Carin, Lawrence; Blei, David; Daubechies, Ingrid
2015-02-01
Super-resolution methods form high-resolution images from low-resolution images. In this paper, we develop a new Bayesian nonparametric model for super-resolution. Our method uses a beta-Bernoulli process to learn a set of recurring visual patterns, called dictionary elements, from the data. Because it is nonparametric, the number of elements found is also determined from the data. We test the results on both benchmark and natural images, comparing with several other models from the research literature. We perform large-scale human evaluation experiments to assess the visual quality of the results. In a first implementation, we use Gibbs sampling to approximate the posterior. However, this algorithm is not feasible for large-scale data. To circumvent this, we then develop an online variational Bayes (VB) algorithm. This algorithm finds high quality dictionaries in a fraction of the time needed by the Gibbs sampler.
A Powerful Test for Comparing Multiple Regression Functions.
Maity, Arnab
2012-09-01
In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g, nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).
Nonparametric instrumental regression with non-convex constraints
NASA Astrophysics Data System (ADS)
Grasmair, M.; Scherzer, O.; Vanhems, A.
2013-03-01
This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.
Tau-REx: A new look at the retrieval of exoplanetary atmospheres
NASA Astrophysics Data System (ADS)
Waldmann, Ingo
2014-11-01
The field of exoplanetary spectroscopy is as fast moving as it is new. With an increasing number of space- and ground-based instruments obtaining data on a large set of extrasolar planets, we are indeed entering the era of exoplanetary characterisation. Permanently at the edge of instrument feasibility, it is as important as it is difficult to find the most optimal and objective methodologies for analysing and interpreting current data. This is particularly true for smaller and fainter Earth and Super-Earth type planets. For low to mid signal to noise (SNR) observations, we are prone to two sources of biases: 1) Prior selection in the data reduction and analysis; 2) Prior constraints on the spectral retrieval. In Waldmann et al. (2013), Morello et al. (2014) and Waldmann (2012, 2014) we have shown a prior-free approach to data analysis based on non-parametric machine learning techniques. Following these approaches we will present a new take on the spectral retrieval of extrasolar planets. Tau-REx (tau-retrieval of exoplanets) is a new line-by-line, atmospheric retrieval framework. In the past the decision on which opacity sources go into an atmospheric model was usually user defined. Manual input can lead to model biases and poor convergence of the atmospheric model to the data. In Tau-REx we have set out to solve this. Through custom built pattern recognition software, Tau-REx is able to rapidly identify the most likely atmospheric opacities from a large number of possible absorbers/emitters (ExoMol or HiTran data bases) and non-parametrically constrain the prior space for the Bayesian retrieval. Unlike other (MCMC based) techniques, Tau-REx is able to fully integrate high-dimensional log-likelihood spaces and to calculate the full Bayesian Evidence of the atmospheric models. We achieve this through a combination of Nested Sampling and a high degree of code parallelisation.
This allows for an exact and unbiased Bayesian model selection and a full mapping of potential model-data degeneracies. Together with non-parametric data de-trending of exoplanetary spectra, we can reach an unprecedented level of objectivity in our atmospheric characterisation of these foreign worlds.
Spatial hydrological drought characteristics in Karkheh River basin, southwest Iran using copulas
NASA Astrophysics Data System (ADS)
Dodangeh, Esmaeel; Shahedi, Kaka; Shiau, Jenq-Tzong; MirAkbari, Maryam
2017-08-01
Investigation on drought characteristics such as severity, duration, and frequency is crucial for water resources planning and management in a river basin. While the methodology for multivariate drought frequency analysis is well established by applying the copulas, the estimation on the associated parameters by various parameter estimation methods and the effects on the obtained results have not yet been investigated. This research aims at conducting a comparative analysis between the maximum likelihood parametric and non-parametric method of the Kendall τ estimation method for copulas parameter estimation. The methods were employed to study joint severity-duration probability and recurrence intervals in Karkheh River basin (southwest Iran) which is facing severe water-deficit problems. Daily streamflow data at three hydrological gauging stations (Tang Sazbon, Huleilan and Polchehr) near the Karkheh dam were used to draw flow duration curves (FDC) of these three stations. The Q_{75} index extracted from the FDC were set as threshold level to abstract drought characteristics such as drought duration and severity on the basis of the run theory. Drought duration and severity were separately modeled using the univariate probabilistic distributions and gamma-GEV, LN2-exponential, and LN2-gamma were selected as the best paired drought severity-duration inputs for copulas according to the Akaike Information Criteria (AIC), Kolmogorov-Smirnov and chi-square tests. Archimedean Clayton, Frank, and extreme value Gumbel copulas were employed to construct joint cumulative distribution functions (JCDF) of droughts for each station. Frank copula at Tang Sazbon and Gumbel at Huleilan and Polchehr stations were identified as the best copulas based on the performance evaluation criteria including AIC, BIC, log-likelihood and root mean square error (RMSE) values. Based on the RMSE values, nonparametric Kendall-τ is preferred to the parametric maximum likelihood estimation method. 
The results showed greater drought return periods by the parametric ML method in comparison to the nonparametric Kendall τ estimation method. The results also showed that stations located in tributaries (Huleilan and Polchehr) have close return periods, while the station along the main river (Tang Sazbon) has the smaller return periods for the drought events with identical drought duration and severity.
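The Kendall τ approach estimates a one-parameter copula by inverting the known relation between τ and the copula parameter. A minimal sketch for the Clayton and Gumbel families, which have closed-form inversions, is shown below. The Frank copula needs a numerical inversion and is left out, and the duration and severity values are hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau (tau-a, assuming no ties) from concordant/discordant pairs."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)

def clayton_theta(tau):
    """Clayton copula parameter from tau = theta / (theta + 2)."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta(tau):
    """Gumbel copula parameter from tau = 1 - 1 / theta."""
    return 1.0 / (1.0 - tau)

# Hypothetical drought durations and severities for five events.
duration = [1, 2, 3, 4, 5]
severity = [2, 1, 4, 3, 5]
tau = kendall_tau(duration, severity)
theta_clayton = clayton_theta(tau)
theta_gumbel = gumbel_theta(tau)
print(tau, theta_clayton, theta_gumbel)
```

Both inversions assume positive dependence (0 ≤ τ < 1), which is the usual situation for drought duration and severity.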
Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stevens, Andrew J.; Pu, Yunchen; Sun, Yannan
We introduce new dictionary learning methods for tensor-variate data of any order. We represent each data item as a sum of Kruskal decomposed dictionary atoms within the framework of beta-process factor analysis (BPFA). Our model is nonparametric and can infer the tensor-rank of each dictionary atom. This Kruskal-Factor Analysis (KFA) is a natural generalization of BPFA. We also extend KFA to a deep convolutional setting and develop online learning methods. We test our approach on image processing and classification tasks achieving state of the art results for 2D & 3D inpainting and Caltech 101. The experiments also show that atom-rank impacts both overcompleteness and sparsity.
2007-01-01
Background The US Food and Drug Administration approved the Charité artificial disc on October 26, 2004. This approval was based on an extensive analysis and review process; 20 years of disc usage worldwide; and the results of a prospective, randomized, controlled clinical trial that compared lumbar artificial disc replacement to fusion. The results of the investigational device exemption (IDE) study led to a conclusion that clinical outcomes following lumbar arthroplasty were at least as good as outcomes from fusion. Methods The author performed a new analysis of the Visual Analog Scale pain scores and the Oswestry Disability Index scores from the Charité artificial disc IDE study and used a nonparametric statistical test, because observed data distributions were not normal. The analysis included all of the enrolled subjects in both the nonrandomized and randomized phases of the study. Results Subjects from both the treatment and control groups improved from the baseline situation (P < .001) at all follow-up times (6 weeks to 24 months). Additionally, these pain and disability levels with artificial disc replacement were superior (P < .05) to the fusion treatment at all follow-up times including 2 years. Conclusions The a priori statistical plan for an IDE study may not adequately address the final distribution of the data. Therefore, statistical analyses more appropriate to the distribution may be necessary to develop meaningful statistical conclusions from the study. A nonparametric statistical analysis of the Charité artificial disc IDE outcomes scores demonstrates superiority for lumbar arthroplasty versus fusion at all follow-up time points to 24 months. PMID:25802574
Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence
2010-11-09
Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data.
Non-parametric trend analysis of the aridity index for three large arid and semi-arid basins in Iran
NASA Astrophysics Data System (ADS)
Ahani, Hossien; Kherad, Mehrzad; Kousari, Mohammad Reza; van Roosmalen, Lieke; Aryanfar, Ramin; Hosseini, Seyyed Mashaallah
2013-05-01
Currently, an important scientific challenge that researchers are facing is to gain a better understanding of climate change at the regional scale, which can be especially challenging in an area with low and highly variable precipitation amounts such as Iran. Trend analysis of the medium-term change using ground station observations of meteorological variables can enhance our knowledge of the dominant processes in an area and contribute to the analysis of future climate projections. Generally, studies focus on the long-term variability of temperature and precipitation and to a lesser extent on other important parameters such as moisture indices. In this study the recent 50-year trends (1955-2005) of precipitation (P), potential evapotranspiration (PET), and aridity index (AI) in monthly time scale were studied over 14 synoptic stations in three large Iran basins using the Mann-Kendall non-parametric test. Additionally, an analysis of the monthly, seasonal and annual trend of each parameter was performed. Results showed no significant trends in the monthly time series. However, PET showed significant, mostly decreasing trends, for the seasonal values, which resulted in a significant negative trend in annual PET at five stations. Significant negative trends in seasonal P values were only found at a number of stations in spring and summer and no station showed significant negative trends in annual P. Due to the varied positive and negative trends in annual P and to a lesser extent PET, almost as many stations with negative as positive trends in annual AI were found, indicating that both drying and wetting trends occurred in Iran. Overall, the northern part of the study area showed an increasing trend in annual AI which meant that the region became wetter, while the south showed decreasing trends in AI.
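The Mann-Kendall test used above can be sketched in a few lines. This minimal version computes the S statistic, its variance without a tie correction, the continuity-corrected Z score, and a two-sided p-value from the normal approximation. It assumes serially independent data with no ties, and the example series is hypothetical.

```python
import math

def mann_kendall(series):
    """Mann-Kendall trend test (no tie or autocorrelation correction).

    Returns (S, Z, two-sided p-value).
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)     # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return s, z, p_value

# Hypothetical annual aridity-index values with a steady upward trend.
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
s_stat, z_score, p_value = mann_kendall(series)
print(s_stat, round(z_score, 3), p_value)
```

A positive Z indicates an increasing trend and a negative Z a decreasing one. Monthly or seasonal hydro-climatic series often show serial correlation, in which case a pre-whitening or variance-correction step is commonly added before the test.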
Parametric vs. non-parametric statistics of low resolution electromagnetic tomography (LORETA).
Thatcher, R W; North, D; Biver, C
2005-01-01
This study compared the relative statistical sensitivity of non-parametric and parametric statistics of 3-dimensional current sources as estimated by the EEG inverse solution Low Resolution Electromagnetic Tomography (LORETA). One would expect approximately 5% false positives (classification of a normal as abnormal) at the P < .025 level of probability (two tailed test) and approximately 1% false positives at the P < .005 level. EEG digital samples (2 second intervals sampled 128 Hz, 1 to 2 minutes eyes closed) from 43 normal adult subjects were imported into the Key Institute's LORETA program. We then used the Key Institute's cross-spectrum and the Key Institute's LORETA output files (*.lor) as the 2,394 gray matter pixel representation of 3-dimensional currents at different frequencies. The mean and standard deviation *.lor files were computed for each of the 2,394 gray matter pixels for each of the 43 subjects. Tests of Gaussianity and different transforms were computed in order to best approximate a normal distribution for each frequency and gray matter pixel. The relative sensitivity of parametric vs. non-parametric statistics were compared using a "leave-one-out" cross validation method in which individual normal subjects were withdrawn and then statistically classified as being either normal or abnormal based on the remaining subjects. Log10 transforms approximated Gaussian distribution in the range of 95% to 99% accuracy. Parametric Z score tests at P < .05 cross-validation demonstrated an average misclassification rate of approximately 4.25%, and range over the 2,394 gray matter pixels was 27.66% to 0.11%. At P < .01 parametric Z score cross-validation false positives were 0.26% and ranged from 6.65% to 0% false positives. The non-parametric Key Institute's t-max statistic at P < .05 had an average misclassification error rate of 7.64% and ranged from 43.37% to 0.04% false positives. 
The nonparametric t-max at P < .01 had an average misclassification rate of 6.67% and ranged from 41.34% to 0% false positives of the 2,394 gray matter pixels for any cross-validated normal subject. In conclusion, adequate approximation to Gaussian distribution and high cross-validation can be achieved by the Key Institute's LORETA programs by using a log10 transform and parametric statistics, and parametric normative comparisons had lower false positive rates than the non-parametric tests.
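The leave-one-out Z-score classification described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' LORETA pipeline: the function name `loo_false_positive_rate`, the single-measure input, and the 1.96 cutoff are hypothetical choices made for the example.

```python
import math
import statistics


def loo_false_positive_rate(values, z_crit=1.96, log_transform=True):
    """Withdraw each subject in turn, z-score it against the remaining
    subjects, and return the fraction flagged as abnormal.

    Hypothetical helper: one measure per subject, two-tailed cutoff z_crit.
    """
    if log_transform:
        # log10 transform, as the abstract uses to approximate normality
        values = [math.log10(v) for v in values]
    flagged = 0
    for i, held_out in enumerate(values):
        rest = values[:i] + values[i + 1:]
        mean = statistics.fmean(rest)
        sd = statistics.stdev(rest)  # sample SD of the remaining subjects
        z = (held_out - mean) / sd
        if abs(z) > z_crit:
            flagged += 1
    return flagged / len(values)
```

In normal subjects every flag is a false positive, so the returned fraction plays the role of the misclassification rate reported above for one pixel and one frequency.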
Dickie, David Alexander; Job, Dominic E.; Gonzalez, David Rodriguez; Shenkin, Susan D.; Wardlaw, Joanna M.
2015-01-01
Introduction Neurodegenerative disease diagnoses may be supported by the comparison of an individual patient’s brain magnetic resonance image (MRI) with a voxel-based atlas of normal brain MRI. Most current brain MRI atlases are of young to middle-aged adults and parametric, e.g., mean ±standard deviation (SD); these atlases require data to be Gaussian. Brain MRI data, e.g., grey matter (GM) proportion images, from normal older subjects are apparently not Gaussian. We created a nonparametric and a parametric atlas of the normal limits of GM proportions in older subjects and compared their classifications of GM proportions in Alzheimer’s disease (AD) patients. Methods Using publicly available brain MRI from 138 normal subjects and 138 subjects diagnosed with AD (all 55–90 years), we created: a mean ±SD atlas to estimate parametrically the percentile ranks and limits of normal ageing GM; and, separately, a nonparametric, rank order-based GM atlas from the same normal ageing subjects. GM images from AD patients were then classified with respect to each atlas to determine the effect statistical distributions had on classifications of proportions of GM in AD patients. Results The parametric atlas often defined the lower normal limit of the proportion of GM to be negative (which does not make sense physiologically as the lowest possible proportion is zero). Because of this, for approximately half of the AD subjects, 25–45% of voxels were classified as normal when compared to the parametric atlas; but were classified as abnormal when compared to the nonparametric atlas. These voxels were mainly concentrated in the frontal and occipital lobes. Discussion To our knowledge, we have presented the first nonparametric brain MRI atlas. In conditions where there is increasing variability in brain structure, such as in old age, nonparametric brain MRI atlases may represent the limits of normal brain structure more accurately than parametric approaches. 
Therefore, we conclude that the statistical method used for construction of brain MRI atlases should be selected taking into account the population and aim under study. Parametric methods are generally robust for defining central tendencies, e.g., means, of brain structure. Nonparametric methods are advisable when studying the limits of brain structure in ageing and neurodegenerative disease. PMID:26023913
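The contrast between a mean ± SD limit and a rank-based limit can be illustrated with a short Python sketch. The data, the function names, and the nearest-rank percentile rule are assumptions made for this example and are not taken from the study.

```python
import math
import statistics


def parametric_lower_limit(values, z=1.96):
    """Lower normal limit as mean - z * SD (assumes Gaussian data)."""
    return statistics.fmean(values) - z * statistics.stdev(values)


def rank_lower_limit(values, pct=2.5):
    """Lower normal limit as a nearest-rank percentile of the observed data."""
    ordered = sorted(values)
    k = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[k - 1]


# Hypothetical, strongly skewed proportions bounded below by zero
gm = [0.0, 0.0, 0.01, 0.02, 0.02, 0.05, 0.1, 0.3, 0.6, 0.9]
print(parametric_lower_limit(gm))  # negative, which a proportion cannot be
print(rank_lower_limit(gm))        # stays within the observed range
```

Because the rank-based limit is always one of the observed values, it cannot fall below zero for proportion data, whereas the parametric limit can when the distribution is skewed.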
Optimum nonparametric estimation of population density based on ordered distances
Patil, S.A.; Kovner, J.L.; Burnham, Kenneth P.
1982-01-01
The asymptotic mean and error mean square are determined for the nonparametric estimator of plant density by distance sampling proposed by Patil, Burnham and Kovner (1979, Biometrics 35, 597-604). On the basis of these formulae, a bias-reduced version of this estimator is given, and its specific form is determined which gives minimum mean square error under varying assumptions about the true probability density function of the sampled data. Extension is given to line-transect sampling.
Lucyshyn, Joseph M; Fossett, Brenda; Bakeman, Roger; Cheremshynski, Christy; Miller, Lynn; Lohrmann, Sharon; Binnendyk, Lauren; Khan, Sophia; Chinn, Stephen; Kwon, Samantha; Irvin, Larry K
2015-12-01
The efficacy and consequential validity of an ecological approach to behavioral intervention with families of children with developmental disabilities was examined. The approach aimed to transform coercive into constructive parent-child interaction in family routines. Ten families participated, including 10 mothers and fathers and 10 children 3-8 years old with developmental disabilities. Thirty-six family routines were selected (2 to 4 per family). Dependent measures included child problem behavior, routine steps completed, and coercive and constructive parent-child interaction. For each family, a single case, multiple baseline design was employed with three phases: baseline, intervention, and follow-up. Visual analysis evaluated the functional relation between intervention and improvements in child behavior and routine participation. Nonparametric tests across families evaluated the statistical significance of these improvements. Sequential analyses within families and univariate analyses across families examined changes from baseline to intervention in the percentage and odds ratio of coercive and constructive parent-child interaction. Multiple baseline results documented functional or basic effects for 8 of 10 families. Nonparametric tests showed these changes to be significant. Follow-up showed durability at 11 to 24 months postintervention. Sequential analyses documented the transformation of coercive into constructive processes for 9 of 10 families. Univariate analyses across families showed significant improvements in 2- and 4-step coercive and constructive processes but not in odds ratio. Results offer evidence of the efficacy of the approach and consequential validity of the ecological unit of analysis, parent-child interaction in family routines. Future studies should improve efficiency, and outcomes for families experiencing family systems challenges.
Karathanasis, Nestoras; Tsamardinos, Ioannis
2016-01-01
Background The advance of omics technologies has made it possible to measure several data modalities on a system of interest. In this work, we illustrate how the Non-Parametric Combination methodology, namely NPC, can be used for simultaneously assessing the association of different molecular quantities with an outcome of interest. We argue that NPC methods have several potential applications in integrating heterogeneous omics technologies, for example identifying genes whose methylation and transcriptional levels are jointly deregulated, or finding proteins whose abundance shows the same trends as the expression of their encoding genes. Results We implemented the NPC methodology within “omicsNPC”, an R function specifically tailored for the characteristics of omics data. We compare omicsNPC against a range of alternative methods on simulated as well as on real data. Comparisons on simulated data point out that omicsNPC produces unbiased / calibrated p-values and performs equally or significantly better than the other methods included in the study; furthermore, the analysis of real data shows that omicsNPC (a) exhibits higher statistical power than other methods, (b) is easily applicable in a number of different scenarios, and (c) yields results with improved biological interpretability. Conclusions The omicsNPC function behaves competitively in all comparisons conducted in this study. Taking into account that the method (i) requires minimal assumptions, (ii) can be used on different study designs and (iii) captures the dependences among heterogeneous data modalities, omicsNPC provides a flexible and statistically powerful solution for the integrative analysis of different omics data. PMID:27812137
Zornoza-Moreno, Matilde; Fuentes-Hernández, Silvia; Sánchez-Solis, Manuel; Rol, María Ángeles; Larqué, Elvira; Madrid, Juan Antonio
2011-05-01
The authors developed a method useful for home measurement of temperature, activity, and sleep rhythms in infants under normal-living conditions during their first 6 mos of life. In addition, parametric and nonparametric tests for assessing circadian system maturation in these infants were compared. Anthropometric parameters plus ankle skin temperature and activity were evaluated in 10 infants by means of two data loggers, Termochron iButton (DS1291H, Maxim Integrated Products, Sunnyvale, CA) for temperature and HOBO Pendant G (Hobo Pendant G Acceleration, UA-004-64, Onset Computer Corporation, Bourne, MA) for motor activity, located in special baby socks specifically designed for the study. Skin temperature and motor activity were recorded over 3 consecutive days at 15 days, 1, 3, and 6 mos of age. Circadian rhythms of skin temperature and motor activity appeared at 3 mos in most babies. Mean skin temperature decreased significantly by 3 mos of life relative to previous measurements (p = .0001), whereas mean activity continued to increase during the first 6 mos. For most of the parameters analyzed, statistically significant changes occurred at 3-6 mos relative to 0.5-1 mo of age. Major differences were found using nonparametric tests. Intradaily variability in motor activity decreased significantly at 6 mos of age relative to previous measurements, and followed a similar trend for temperature; interdaily stability increased significantly at 6 mos of age relative to previous measurements for both variables; relative amplitude increased significantly at 6 mos for temperature and at 3 mos for activity, both with respect to previous measurements. A high degree of correlation was found between chronobiological parametric and nonparametric tests for mean and mesor and also for relative amplitude versus the cosinor-derived amplitude. 
However, the correlation between parametric and nonparametric equivalent indices (acrophase and midpoint of M5, interdaily stability and Rayleigh test, or intradaily variability and P(1)/P(ultradian)) despite being significant, was lower for both temperature and activity. The circadian function index (CFI index), based on the integrated variable temperature-activity, increased gradually with age and was statistically significant at 6 mos of age. At 6 mos, 90% of the infants' rest period coincided with the standard sleep period of their parents, defined from 23:00 to 07:00 h (dichotomic index I < O; when I < O = 100%, there is a complete coincidence between infant nocturnal rest period and the standard rest period), whereas at 15 days of life the coincidence was only 75%. The combination of thermometry and actimetry using data loggers placed in infants' socks is a reliable method for assessing both variables and also sleep rhythms in infants under ambulatory conditions, with minimal disturbance. Using this methodological approach, circadian rhythms of skin temperature and motor activity appeared by 3 mos in most babies. Nonparametric tests provided more reliable information than cosinor analysis for circadian rhythm assessment in infants.
The x-ray luminosity-redshift relationship of quasars
Segal, I. E.; Segal, W.
1980-01-01
Chronometric cosmology provides an excellent fit for the phenomenological x-ray luminosity-redshift relationship for 49 quasars observed by the Einstein satellite. Analysis of the data on the basis of the Friedmann cosmology leads to a correlation of absolute x-ray luminosity with redshift of >0.8, which is increased to ∼1 in the bright envelope. Although the trend might be ascribed a priori to an observational magnitude bias, it persists after nonparametric, maximum-likelihood removal of this bias. PMID:16592826
The Generalized Roy Model and the Cost-Benefit Analysis of Social Programs.
Eisenhauer, Philipp; Heckman, James J; Vytlacil, Edward
2015-04-01
The literature on treatment effects focuses on gross benefits from program participation. We extend this literature by developing conditions under which it is possible to identify parameters measuring the cost and net surplus from program participation. Using the generalized Roy model, we nonparametrically identify the cost, benefit, and net surplus of selection into treatment without requiring the analyst to have direct information on the cost. We apply our methodology to estimate the gross benefit and net surplus of attending college.
The Generalized Roy Model and the Cost-Benefit Analysis of Social Programs*
Eisenhauer, Philipp; Heckman, James J.; Vytlacil, Edward
2015-01-01
The literature on treatment effects focuses on gross benefits from program participation. We extend this literature by developing conditions under which it is possible to identify parameters measuring the cost and net surplus from program participation. Using the generalized Roy model, we nonparametrically identify the cost, benefit, and net surplus of selection into treatment without requiring the analyst to have direct information on the cost. We apply our methodology to estimate the gross benefit and net surplus of attending college. PMID:26709315
Considerations for monitoring raptor population trends based on counts of migrants
Titus, K.; Fuller, M.R.; Ruos, J.L.; Meyburg, B-U.; Chancellor, R.D.
1989-01-01
Various problems were identified with standardized hawk count data as annually collected at six sites. Some of the hawk lookouts increased their hours of observation from 1979-1985, thereby confounding the total counts. Data recording and missing data hamper coding of data and their use with modern analytical techniques. Coefficients of variation among years in counts averaged about 40%. The advantages and disadvantages of various analytical techniques are discussed including regression, non-parametric rank correlation trend analysis, and moving averages.
Dynamic characteristics of oxygen consumption.
Ye, Lin; Argha, Ahmadreza; Yu, Hairong; Celler, Branko G; Nguyen, Hung T; Su, Steven
2018-04-23
Previous studies have indicated that oxygen uptake ([Formula: see text]) is one of the most accurate indices for assessing the cardiorespiratory response to exercise. In most existing studies, the response of [Formula: see text] is often roughly modelled as a first-order system due to the inadequate stimulation and low signal to noise ratio. To overcome this difficulty, this paper proposes a novel nonparametric kernel-based method for the dynamic modelling of [Formula: see text] response to provide a more robust estimation. Twenty healthy non-athlete participants conducted treadmill exercises with monotonous stimulation (e.g., single step function as input). During the exercise, [Formula: see text] was measured and recorded by a popular portable gas analyser ([Formula: see text], COSMED). Based on the recorded data, a kernel-based estimation method was proposed to perform the nonparametric modelling of [Formula: see text]. For the proposed method, a properly selected kernel can represent the prior modelling information to reduce the dependence on comprehensive stimulations. Furthermore, due to the special elastic net formed by [Formula: see text] norm and kernelised [Formula: see text] norm, the estimations are smooth and concise. Additionally, the finite impulse response based nonparametric model which is estimated by the proposed method can optimally select the order and fit better in terms of goodness-of-fit compared with classical methods. Several kernels were introduced for the kernel-based [Formula: see text] modelling method. The results clearly indicated that the stable spline (SS) kernel has the best performance for [Formula: see text] modelling. Particularly, based on the experimental data from 20 participants, the estimated response from the proposed method with SS kernel was significantly better than the results from the benchmark method [i.e., prediction error method (PEM)] ([Formula: see text] vs [Formula: see text]).
The proposed nonparametric modelling method is an effective method for the estimation of the impulse response of the VO2-Speed system. Furthermore, the identified average nonparametric model method can dynamically predict [Formula: see text] response with acceptable accuracy during treadmill exercise.
Cerruela García, G; García-Pedrajas, N; Luque Ruiz, I; Gómez-Nieto, M Á
2018-03-01
This paper proposes a method for molecular activity prediction in QSAR studies using ensembles of classifiers constructed by means of two supervised subspace projection methods, namely nonparametric discriminant analysis (NDA) and hybrid discriminant analysis (HDA). We studied the performance of the proposed ensembles compared to classical ensemble methods using four molecular datasets and eight different models for the representation of the molecular structure. Using several measures and statistical tests for classifier comparison, we observe that our proposal improves the classification results with respect to classical ensemble methods. Therefore, we show that ensembles constructed using supervised subspace projections offer an effective way of creating classifiers in cheminformatics.
Feder, Paul I; Ma, Zhenxu J; Bull, Richard J; Teuschler, Linda K; Rice, Glenn
2009-01-01
In chemical mixtures risk assessment, the use of dose-response data developed for one mixture to estimate risk posed by a second mixture depends on whether the two mixtures are sufficiently similar. While evaluations of similarity may be made using qualitative judgments, this article uses nonparametric statistical methods based on the "bootstrap" resampling technique to address the question of similarity among mixtures of chemical disinfectant by-products (DBP) in drinking water. The bootstrap resampling technique is a general-purpose, computer-intensive approach to statistical inference that substitutes empirical sampling for theoretically based parametric mathematical modeling. Nonparametric, bootstrap-based inference involves fewer assumptions than parametric normal theory based inference. The bootstrap procedure is appropriate, at least in an asymptotic sense, whether or not the parametric, distributional assumptions hold, even approximately. The statistical analysis procedures in this article are initially illustrated with data from 5 water treatment plants (Schenck et al., 2009), and then extended using data developed from a study of 35 drinking-water utilities (U.S. EPA/AMWA, 1989), which permits inclusion of a greater number of water constituents and increased structure in the statistical models.
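As a generic illustration of the bootstrap resampling idea summarized in this abstract, the following Python sketch computes a percentile bootstrap confidence interval for a statistic. It is not the authors' mixture-similarity analysis: the function name `bootstrap_ci`, the default of 2000 replicates, and the percentile interval are assumptions chosen for the example.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.fmean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(sample).

    Resamples the data with replacement, recomputes the statistic each
    time, and reads the interval off the sorted replicates.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    reps = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo = reps[round(alpha / 2 * n_boot)]
    hi = reps[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The empirical distribution of the replicates stands in for a theoretical sampling distribution, which is the substitution the abstract describes.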
Kharroubi, Samer A
2017-10-06
Valuations of health state descriptors such as EQ-5D or SF6D have been conducted in different countries. There is scope to use the results from one country as informative priors in the analysis of a study in another, so that better estimates are obtained in the new country than by analyzing its data separately. Data from 2 EQ-5D valuation studies were analyzed using the time trade-off technique, where values for 42 health states were devised from representative samples of the UK and US populations. A Bayesian non-parametric approach has been applied to predict the health utilities of the US population, where the UK results were used as informative priors in the model to improve their estimation. The findings showed that employing additional information from the UK data helped produce US utility estimates much more precisely than would have been possible using the US study data alone. It is very plausible that this method would prove useful in countries where conducting large evaluation studies is not very feasible.
Ishwaran, Hemant; Lu, Min
2018-06-04
Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.
Lee, Kit-Hang; Fu, Denny K.C.; Leong, Martin C.W.; Chow, Marco; Fu, Hing-Choi; Althoefer, Kaspar; Sze, Kam Yim; Yeung, Chung-Kwong
2017-01-01
Bioinspired robotic structures comprising soft actuation units have attracted increasing research interest. Taking advantage of their inherent compliance, soft robots can assure safe interaction with external environments, provided that precise and effective manipulation could be achieved. Endoscopy is a typical application. However, previous model-based control approaches often require simplified geometric assumptions on the soft manipulator, which could be very inaccurate in the presence of unmodeled external interaction forces. In this study, we propose a generic control framework based on nonparametric and online, as well as local, training to learn the inverse model directly, without prior knowledge of the robot's structural parameters. Detailed experimental evaluation was conducted on a soft robot prototype with control redundancy, performing trajectory tracking in dynamically constrained environments. Advanced element formulation of finite element analysis is employed to initialize the control policy, hence eliminating the need for random exploration in the robot's workspace. The proposed control framework enabled a soft fluid-driven continuum robot to follow a 3D trajectory precisely, even under dynamic external disturbance. Such enhanced control accuracy and adaptability would facilitate effective endoscopic navigation in complex and changing environments. PMID:29251567
Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics
Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven
2011-01-01
Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957
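The two kinds of error-rate control mentioned in this abstract can be sketched in Python. The Benjamini-Hochberg step-up procedure is a standard way to control the false discovery rate, and the Holm step-down procedure is one standard way to obtain strong family-wise control. Whether the authors used these particular procedures is an assumption here; the inputs are p-values assumed to come from the signed rank tests.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Step-up procedure: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= q * k / m (false discovery rate control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


def holm(pvals, alpha=0.05):
    """Step-down procedure: compare the sorted p-values with
    alpha / (m - rank) and stop at the first failure (FWER control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] > alpha / (m - rank):
            break
        rejected[i] = True
    return rejected
```

On the same set of p-values Holm never rejects more hypotheses than Benjamini-Hochberg, which reflects the stricter error criterion it controls.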
Lee, Kit-Hang; Fu, Denny K C; Leong, Martin C W; Chow, Marco; Fu, Hing-Choi; Althoefer, Kaspar; Sze, Kam Yim; Yeung, Chung-Kwong; Kwok, Ka-Wai
2017-12-01
Bioinspired robotic structures comprising soft actuation units have attracted increasing research interest. Taking advantage of their inherent compliance, soft robots can assure safe interaction with external environments, provided that precise and effective manipulation could be achieved. Endoscopy is a typical application. However, previous model-based control approaches often require simplified geometric assumptions on the soft manipulator, which could be very inaccurate in the presence of unmodeled external interaction forces. In this study, we propose a generic control framework based on nonparametric and online, as well as local, training to learn the inverse model directly, without prior knowledge of the robot's structural parameters. Detailed experimental evaluation was conducted on a soft robot prototype with control redundancy, performing trajectory tracking in dynamically constrained environments. Advanced element formulation of finite element analysis is employed to initialize the control policy, hence eliminating the need for random exploration in the robot's workspace. The proposed control framework enabled a soft fluid-driven continuum robot to follow a 3D trajectory precisely, even under dynamic external disturbance. Such enhanced control accuracy and adaptability would facilitate effective endoscopic navigation in complex and changing environments.
Nonparametric Bayesian clustering to detect bipolar methylated genomic loci.
Wu, Xiaowei; Sun, Ming-An; Zhu, Hongxiao; Xie, Hehuang
2015-01-16
With recent development in sequencing technology, a large number of genome-wide DNA methylation studies have generated massive amounts of bisulfite sequencing data. The analysis of DNA methylation patterns helps researchers understand epigenetic regulatory mechanisms. Highly variable methylation patterns reflect stochastic fluctuations in DNA methylation, whereas well-structured methylation patterns imply deterministic methylation events. Among these methylation patterns, bipolar patterns are important as they may originate from allele-specific methylation (ASM) or cell-specific methylation (CSM). Utilizing nonparametric Bayesian clustering followed by hypothesis testing, we have developed a novel statistical approach to identify bipolar methylated genomic regions in bisulfite sequencing data. Simulation studies demonstrate that the proposed method achieves good performance in terms of specificity and sensitivity. We used the method to analyze data from mouse brain and human blood methylomes. The bipolar methylated segments detected are found highly consistent with the differentially methylated regions identified by using purified cell subsets. Bipolar DNA methylation often indicates epigenetic heterogeneity caused by ASM or CSM. With allele-specific events filtered out or appropriately taken into account, our proposed approach sheds light on the identification of cell-specific genes/pathways under strong epigenetic control in a heterogeneous cell population.
Robust location and spread measures for nonparametric probability density function estimation.
López-Rubio, Ezequiel
2009-10-01
Robustness against outliers is a desirable property of any unsupervised learning scheme. In particular, probability density estimators benefit from incorporating this feature. A possible strategy to achieve this goal is to substitute the sample mean and the sample covariance matrix by more robust location and spread estimators. Here we use the L1-median to develop a nonparametric probability density function (PDF) estimator. We prove its most relevant properties, and we show its performance in density estimation and classification applications.
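The L1-median named in this abstract is the point minimizing the sum of Euclidean distances to the data. A common way to compute it is Weiszfeld's iteration, sketched below in Python. This only illustrates the location estimator, not the density estimator built on it; the function name, the starting point, and the rule of skipping coincident points are assumptions of this sketch.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_median(points, iters=200, tol=1e-9):
    """Approximate the L1-median (geometric median) by Weiszfeld iteration."""
    dim = len(points[0])
    # Start from the ordinary sample mean
    est = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iters):
        num = [0.0] * dim
        den = 0.0
        for p in points:
            d = _dist(p, est)
            if d < tol:
                continue  # simplification: skip a point that coincides with the estimate
            w = 1.0 / d
            den += w
            for k in range(dim):
                num[k] += w * p[k]
        if den == 0.0:
            break
        new = [x / den for x in num]
        converged = _dist(new, est) < tol
        est = new
        if converged:
            break
    return est
```

With four points at the corners of the unit square and one far outlier, the sample mean is pulled strongly toward the outlier while the L1-median stays inside the square, which is the robustness property the abstract relies on.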
Nonparametric Regression Subject to a Given Number of Local Extreme Value
2001-07-01
compilation report: ADP013708 thru ADP013761 UNCLASSIFIED Nonparametric regression subject to a given number of local extreme value Ali Majidi and Laurie...locations of the local extremes for the smoothing algorithm. 280 A. Majidi and L. Davies 3 The smoothing problem We make the smoothing problem precise...is the solution of QP3. k--oo 282 A. Majidi and L. Davies FiG. 2. The captions top-left, top-right, bottom-left, bottom-right show the result of the
Nonparametric and Semiparametric Regression Estimation for Length-biased Survival Data
Shen, Yu; Ning, Jing; Qin, Jing
2016-01-01
For the past several decades, nonparametric and semiparametric modeling for conventional right-censored survival data has been investigated intensively under a noninformative censoring mechanism. However, these methods may not be applicable for analyzing right-censored survival data that arise from prevalent cohorts when the failure times are subject to length-biased sampling. This review article is intended to provide a summary of some newly developed methods as well as established methods for analyzing length-biased data. PMID:27086362
Nonparametric tests for interaction and group differences in a two-way layout.
Fisher, A C; Wallenstein, S
1991-01-01
Nonparametric tests of group differences and interaction across strata are developed in which the null hypotheses for these tests are expressed as functions of rho i = P(X > Y) + 1/2P(X = Y), where X refers to a random observation from one group and Y refers to a random observation from the other group within stratum i. The estimator r of the parameter rho is shown to be a useful way to summarize and examine data for ordinal and continuous data.
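For a single stratum, the natural sample analogue of rho = P(X > Y) + 1/2 P(X = Y) is the proportion of all (x, y) pairs in which x exceeds y, with ties counted as one half. The Python sketch below computes that quantity. Treating it as the estimator r of the abstract is an assumption, and the function name is hypothetical.

```python
def rho_hat(xs, ys):
    """Pairwise estimate of P(X > Y) + 0.5 * P(X = Y) for two samples."""
    score = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                score += 1.0
            elif x == y:
                score += 0.5  # a tie counts as half a win
    return score / (len(xs) * len(ys))
```

A value of 0.5 corresponds to no tendency for observations from one group to exceed those from the other, and only the ordering of the data is used, so the summary applies to ordinal as well as continuous outcomes.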
2011-01-01
Background The identification of genes or quantitative trait loci that are expressed in response to different environmental factors such as temperature and light, through functional mapping, critically relies on precise modeling of the covariance structure. Previous work used separable parametric covariance structures, such as a Kronecker product of autoregressive one [AR(1)] matrices, that do not account for interaction effects of different environmental factors. Results We implement a more robust nonparametric covariance estimator to model these interactions within the framework of functional mapping of reaction norms to two signals. Our results from Monte Carlo simulations show that this estimator can be useful in modeling interactions that exist between two environmental signals. The interactions are simulated using nonseparable covariance models with spatio-temporal structural forms that mimic interaction effects. Conclusions The nonparametric covariance estimator has an advantage over separable parametric covariance estimators in the detection of QTL location, thus extending the breadth of use of functional mapping in practical settings. PMID:21269481
A comparative study of nonparametric methods for pattern recognition
NASA Technical Reports Server (NTRS)
Hahn, S. F.; Nelson, G. D.
1972-01-01
The applied research discussed in this report determines and compares the correct classification percentage of the nonparametric sign test, Wilcoxon's signed rank test, and K-class classifier with the performance of the Bayes classifier. The performance is determined for data which have Gaussian, Laplacian and Rayleigh probability density functions. The correct classification percentage is shown graphically for differences in modes and/or means of the probability density functions for four, eight and sixteen samples. The K-class classifier performed very well with respect to the other classifiers used. Since the K-class classifier is a nonparametric technique, it usually performed better than the Bayes classifier which assumes the data to be Gaussian even though it may not be. The K-class classifier has the advantage over the Bayes in that it works well with non-Gaussian data without having to determine the probability density function of the data. It should be noted that the data in this experiment was always unimodal.
On non-parametric maximum likelihood estimation of the bivariate survivor function.
Prentice, R L
The likelihood function for the bivariate survivor function F, under independent censorship, is maximized to obtain a non-parametric maximum likelihood estimator F̂. F̂ may or may not be unique depending on the configuration of singly- and doubly-censored pairs. The likelihood function can be maximized by placing all mass on the grid formed by the uncensored failure times, or half lines beyond the failure time grid, or in the upper right quadrant beyond the grid. By accumulating the mass along lines (or regions) where the likelihood is flat, one obtains a partially maximized likelihood as a function of parameters that can be uniquely estimated. The score equations corresponding to these point mass parameters are derived, using a Lagrange multiplier technique to ensure unit total mass, and a modified Newton procedure is used to calculate the parameter estimates in some limited simulation studies. Some considerations for the further development of non-parametric bivariate survivor function estimators are briefly described.
Advanced Imaging Methods for Long-Baseline Optical Interferometry
NASA Astrophysics Data System (ADS)
Le Besnerais, G.; Lacour, S.; Mugnier, L. M.; Thiebaut, E.; Perrin, G.; Meimon, S.
2008-11-01
We address the data processing methods needed for imaging with a long baseline optical interferometer. We first describe parametric reconstruction approaches and adopt a general formulation of nonparametric image reconstruction as the solution of a constrained optimization problem. Within this framework, we present two recent reconstruction methods, Mira and Wisard, representative of the two generic approaches for dealing with the missing phase information. Mira is based on an implicit approach and a direct optimization of a Bayesian criterion while Wisard adopts a self-calibration approach and an alternate minimization scheme inspired from radio-astronomy. Both methods can handle various regularization criteria. We review commonly used regularization terms and introduce an original quadratic regularization called “soft support constraint” that favors the object compactness. It yields images of quality comparable to nonquadratic regularizations on the synthetic data we have processed. We then perform image reconstructions, both parametric and nonparametric, on astronomical data from the IOTA interferometer, and discuss the respective roles of parametric and nonparametric approaches for optical interferometric imaging.
Linkage mapping of beta 2 EEG waves via non-parametric regression.
Ghosh, Saurabh; Begleiter, Henri; Porjesz, Bernice; Chorlian, David B; Edenberg, Howard J; Foroud, Tatiana; Goate, Alison; Reich, Theodore
2003-04-01
Parametric linkage methods for analyzing quantitative trait loci are sensitive to violations in trait distributional assumptions. Non-parametric methods are relatively more robust. In this article, we modify the non-parametric regression procedure proposed by Ghosh and Majumder [2000: Am J Hum Genet 66:1046-1061] to map Beta 2 EEG waves using genome-wide data generated in the COGA project. Significant linkage findings are obtained on chromosomes 1, 4, 5, and 15 with findings at multiple regions on chromosomes 4 and 15. We analyze the data both with and without incorporating alcoholism as a covariate. We also test for epistatic interactions between regions of the genome exhibiting significant linkage with the EEG phenotypes and find evidence of epistatic interactions between a region each on chromosome 1 and chromosome 4 with one region on chromosome 15. While regressing out the effect of alcoholism does not affect the linkage findings, the epistatic interactions become statistically insignificant. Copyright 2003 Wiley-Liss, Inc.
Moss, Brian G; Yeaton, William H
2013-10-01
Annually, American colleges and universities provide developmental education (DE) to millions of underprepared students; however, evaluation estimates of DE benefits have been mixed. Using a prototypic exemplar of DE, our primary objective was to investigate the utility of a replicative evaluative framework for assessing program effectiveness. Within the context of the regression discontinuity (RD) design, this research examined the effectiveness of a DE program for five, sequential cohorts of first-time college students. Discontinuity estimates were generated for individual terms and cumulatively, across terms. Participants were 3,589 first-time community college students. DE program effects were measured by contrasting both college-level English grades and a dichotomous measure of pass/fail, for DE and non-DE students. Parametric and nonparametric estimates of overall effect were positive for continuous and dichotomous measures of achievement (grade and pass/fail). The variability of program effects over time was determined by tracking results within individual terms and cumulatively, across terms. Applying this replication strategy, DE's overall impact was modest (an effect size of approximately .20) but quite consistent, based on parametric and nonparametric estimation approaches. A meta-analysis of five RD results yielded virtually the same estimate as the overall, parametric findings. Subset analysis, though tentative, suggested that males benefited more than females, while academic gains were comparable for different ethnicities. The cumulative, within-study comparison, replication approach offers considerable potential for the evaluation of new and existing policies, particularly when effects are relatively small, as is often the case in applied settings.
A nonparametric test for Markovianity in the illness-death model.
Rodríguez-Girondo, Mar; de Uña-Álvarez, Jacobo
2012-12-30
Multistate models are useful tools for modeling disease progression when survival is the main outcome, but several intermediate events of interest are observed during the follow-up time. The illness-death model is a special multistate model with important applications in the biomedical literature. It provides a suitable representation of the individual's history when a unique intermediate event can be experienced before the main event of interest. Nonparametric estimation of transition probabilities in this and other multistate models is usually performed through the Aalen-Johansen estimator under a Markov assumption. The Markov assumption claims that given the present state, the future evolution of the illness is independent of the states previously visited and the transition times among them. However, this assumption fails in some applications, leading to inconsistent estimates. In this paper, we provide a new approach for testing Markovianity in the illness-death model. The new method is based on measuring the future-past association along time. This results in a detailed inspection of the process, which often reveals a non-Markovian behavior with different trends in the association measure. A test of significance for zero future-past association at each time point is introduced, and a significance trace is proposed accordingly. Besides, we propose a global test for Markovianity based on a supremum-type test statistic. The finite sample performance of the test is investigated through simulations. We illustrate the new method through the analysis of two biomedical data sets. Copyright © 2012 John Wiley & Sons, Ltd.
A hybrid method in combining treatment effects from matched and unmatched studies.
Byun, Jinyoung; Lai, Dejian; Luo, Sheng; Risser, Jan; Tung, Betty; Hardy, Robert J
2013-12-10
The most common data structures in biomedical studies have been matched or unmatched designs. Data structures resulting from a hybrid of the two may create challenges for statistical inferences. The question may arise whether to use parametric or nonparametric methods on the hybrid data structure. The Early Treatment for Retinopathy of Prematurity study was a multicenter clinical trial sponsored by the National Eye Institute. The design produced data requiring a statistical method of a hybrid nature. An infant in this multicenter randomized clinical trial had high-risk prethreshold retinopathy of prematurity that was eligible for treatment in one or both eyes at entry into the trial. During follow-up, recognition visual acuity was assessed for both eyes. Data from both eyes (matched) and from only one eye (unmatched) were eligible to be used in the trial. The new hybrid nonparametric method is a meta-analysis based on combining the Hodges-Lehmann estimates of treatment effects from the Wilcoxon signed rank and rank sum tests. To compare the new method, we used the classic meta-analysis with the t-test method to combine estimates of treatment effects from the paired and two sample t-tests. We used simulations to calculate the empirical size and power of the test statistics, as well as the bias, mean square and confidence interval width of the corresponding estimators. The proposed method provides an effective tool to evaluate data from clinical trials and similar comparative studies. Copyright © 2013 John Wiley & Sons, Ltd.
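The Hodges-Lehmann estimates mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the inverse-variance pooling step is an assumed, generic meta-analytic rule whose variances the caller must supply (the abstract does not say how they are obtained).

```python
from statistics import median


def hodges_lehmann_paired(differences):
    """One-sample Hodges-Lehmann estimate (pairs with the signed rank test):
    the median of all Walsh averages (d_i + d_j) / 2 with i <= j."""
    n = len(differences)
    walsh = [(differences[i] + differences[j]) / 2
             for i in range(n) for j in range(i, n)]
    return median(walsh)


def hodges_lehmann_two_sample(treated, control):
    """Two-sample Hodges-Lehmann estimate (pairs with the rank sum test):
    the median of all pairwise differences treated_i - control_j."""
    return median(t - c for t in treated for c in control)


def inverse_variance_combine(estimates, variances):
    """Assumed pooling rule: weight each estimate by 1 / variance.
    Returns the pooled estimate and its variance."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, 1.0 / total
```

In a hybrid design, the paired estimate would come from infants with both eyes enrolled and the two-sample estimate from infants with one eye enrolled; the two are then pooled.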
Mize, T W; Sundararaj, K P; Leite, R S; Huang, Y
2015-06-01
Both gingival tissue destruction and regeneration are associated with chronic periodontitis, although the former overwhelms the latter. Studies have shown that transforming growth factor beta 1 (TGF-β1), a growth factor largely involved in tissue regeneration and remodeling, is upregulated in chronic periodontitis. However, the gingival expression of connective tissue growth factor (CTGF or CCN2), a TGF-β1-upregulated gene, in patients with periodontitis remains undetermined. Although both CTGF/CCN2 and TGF-β1 increase the production of extracellular matrix, they have many different biological functions. Therefore, it is important to delineate the impact of periodontitis on gingival CTGF/CCN2 expression. Periodontal tissue specimens were collected from seven individuals without periodontitis (group 1) and from 14 with periodontitis (group 2). The expression of CTGF and TGFβ1 mRNAs was quantified using real-time PCR. Analysis using the nonparametric Mann-Whitney U-test showed that the levels of expression of both CTGF/CCN2 and TGFβ1 mRNAs were significantly increased in individuals with periodontitis compared with individuals without periodontitis. Furthermore, analysis using a nonparametric correlation (Spearman r) test showed a positive correlation between TGFβ1 and CTGF/CCN2 mRNAs. The gingival expression levels of CTGF/CCN2 and TGFβ1 mRNAs in individuals with periodontitis are upregulated and correlated. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Ramajo, Julián; Cordero, José Manuel; Márquez, Miguel Ángel
2017-10-01
This paper analyses region-level technical efficiency in nine European countries over the 1995-2007 period. We propose the application of a nonparametric conditional frontier approach to account for the presence of heterogeneous conditions in the form of geographical externalities. Such environmental factors are beyond the control of regional authorities, but may affect the production function. Therefore, they need to be considered in the frontier estimation. Specifically, a spatial autoregressive term is included as an external conditioning factor in a robust order-m model. Thus we can test the hypothesis of non-separability (the external factor impacts both the input-output space and the distribution of efficiencies), demonstrating the existence of significant global interregional spillovers into the production process. Our findings show that geographical externalities affect both the frontier level and the probability of being more or less efficient. Specifically, the results support the fact that the spatial lag variable has an inverted U-shaped non-linear impact on the performance of regions. This finding can be interpreted as a differential effect of interregional spillovers depending on the size of the neighboring economies: positive externalities for small values, possibly related to agglomeration economies, and negative externalities for high values, indicating the possibility of production congestion. Additionally, evidence of the existence of a strong geographic pattern of European regional efficiency is reported and the levels of technical efficiency are acknowledged to have converged during the period under analysis.
Chaibub Neto, Elias
2015-01-01
In this paper we propose a vectorized implementation of the non-parametric bootstrap for statistics based on sample moments. Basically, we adopt the multinomial sampling formulation of the non-parametric bootstrap, and compute bootstrap replications of sample moment statistics by simply weighting the observed data according to multinomial counts instead of evaluating the statistic on a resampled version of the observed data. Using this formulation we can generate a matrix of bootstrap weights and compute the entire vector of bootstrap replications with a few matrix multiplications. Vectorization is particularly important for matrix-oriented programming languages such as R, where matrix/vector calculations tend to be faster than scalar operations implemented in a loop. We illustrate the application of the vectorized implementation in real and simulated data sets, when bootstrapping Pearson’s sample correlation coefficient, and compared its performance against two state-of-the-art R implementations of the non-parametric bootstrap, as well as a straightforward one based on a for loop. Our investigations spanned varying sample sizes and number of bootstrap replications. The vectorized bootstrap compared favorably against the state-of-the-art implementations in all cases tested, and was remarkably/considerably faster for small/moderate sample sizes. The same results were observed in the comparison with the straightforward implementation, except for large sample sizes, where the vectorized bootstrap was slightly slower than the straightforward implementation due to increased time expenditures in the generation of weight matrices via multinomial sampling. PMID:26125965
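The multinomial-weighting formulation described in the abstract above can be sketched in plain Python. This is only an illustration of the weighting idea under stated assumptions: it uses ordinary loops rather than the matrix operations the paper relies on for speed, and all function names are hypothetical. Each bootstrap replication draws multinomial counts and evaluates Pearson's correlation from weighted sample moments instead of building a resampled data set.

```python
import math
import random


def multinomial_counts(n, rng):
    """Draw counts (c_1, ..., c_n) from Multinomial(n; 1/n, ..., 1/n)."""
    counts = [0] * n
    for _ in range(n):
        counts[rng.randrange(n)] += 1
    return counts


def weighted_pearson(x, y, w):
    """Pearson correlation computed from weighted moments; w must sum to 1."""
    mx = sum(wi * xi for wi, xi in zip(w, x))
    my = sum(wi * yi for wi, yi in zip(w, y))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) - mx * my
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x)) - mx * mx
    syy = sum(wi * yi * yi for wi, yi in zip(w, y)) - my * my
    if sxx <= 0.0 or syy <= 0.0:
        return float("nan")  # degenerate resample with zero variance
    return sxy / math.sqrt(sxx * syy)


def bootstrap_correlations(x, y, n_boot=1000, seed=0):
    """Bootstrap replications of the correlation via multinomial weights."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        counts = multinomial_counts(n, rng)
        weights = [c / n for c in counts]
        reps.append(weighted_pearson(x, y, weights))
    return reps
```

Weighting observation i by c_i / n gives the same moments as a resample in which observation i appears c_i times, which is why the statistic never needs the resampled data itself. In a matrix-oriented language, the weights for all replications would be stacked into one matrix so that the moments come from a few matrix products.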
Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.
2014-01-01
Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289
Oostenveld, Robert; Fries, Pascal; Maris, Eric; Schoffelen, Jan-Mathijs
2011-01-01
This paper describes FieldTrip, an open source software package that we developed for the analysis of MEG, EEG, and other electrophysiological data. The software is implemented as a MATLAB toolbox and includes a complete set of consistent and user-friendly high-level functions that allow experimental neuroscientists to analyze experimental data. It includes algorithms for simple and advanced analysis, such as time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. The implementation as toolbox allows the user to perform elaborate and structured analyses of large data sets using the MATLAB command line and batch scripting. Furthermore, users and developers can easily extend the functionality and implement new algorithms. The modular design facilitates the reuse in other software packages.
Genetic heterogeneity in Finnish hereditary prostate cancer using ordered subset analysis
Simpson, Claire L; Cropp, Cheryl D; Wahlfors, Tiina; George, Asha; Jones, MaryPat S; Harper, Ursula; Ponciano-Jackson, Damaris; Tammela, Teuvo; Schleutker, Johanna; Bailey-Wilson, Joan E
2013-01-01
Prostate cancer (PrCa) is the most common male cancer in developed countries and the second most common cause of cancer death after lung cancer. We recently reported a genome-wide linkage scan in 69 Finnish hereditary PrCa (HPC) families, which replicated the HPC9 locus on 17q21-q22 and identified a locus on 2q37. The aim of this study was to identify and to detect other loci linked to HPC. Here we used ordered subset analysis (OSA), conditioned on nonparametric linkage to these loci to detect other loci linked to HPC in subsets of families, but not the overall sample. We analyzed the families based on their evidence for linkage to chromosome 2, chromosome 17 and a maximum score using the strongest evidence of linkage from either of the two loci. Significant linkage to a 5-cM linkage interval with a peak OSA nonparametric allele-sharing LOD score of 4.876 on Xq26.3-q27 (ΔLOD=3.193, empirical P=0.009) was observed in a subset of 41 families weakly linked to 2q37, overlapping the HPCX1 locus. Two peaks that were novel to the analysis combining linkage evidence from both primary loci were identified; 18q12.1-q12.2 (OSA LOD=2.541, ΔLOD=1.651, P=0.03) and 22q11.1-q11.21 (OSA LOD=2.395, ΔLOD=2.36, P=0.006), which is close to HPC6. Using OSA allows us to find additional loci linked to HPC in subsets of families, and underlines the complex genetic heterogeneity of HPC even in highly aggregated families. PMID:22948022
Human Rights Event Detection from Heterogeneous Social Media Graphs.
Chen, Feng; Neill, Daniel B
2015-03-01
Human rights organizations are increasingly monitoring social media for identification, verification, and documentation of human rights violations. Since manual extraction of events from the massive amount of online social network data is difficult and time-consuming, we propose an approach for automated, large-scale discovery and analysis of human rights-related events. We apply our recently developed Non-Parametric Heterogeneous Graph Scan (NPHGS), which models social media data such as Twitter as a heterogeneous network (with multiple different node types, features, and relationships) and detects emerging patterns in the network, to identify and characterize human rights events. NPHGS efficiently maximizes a nonparametric scan statistic (an aggregate measure of anomalousness) over connected subgraphs of the heterogeneous network to identify the most anomalous network clusters. It summarizes each event with information such as type of event, geographical locations, time, and participants, and provides documentation such as links to videos and news reports. Building on our previous work that demonstrates the utility of NPHGS for civil unrest prediction and rare disease outbreak detection, we present an analysis of human rights events detected by NPHGS using two years of Twitter data from Mexico. NPHGS was able to accurately detect relevant clusters of human rights-related tweets prior to international news sources, and in some cases, prior to local news reports. Analysis of social media using NPHGS could enhance the information-gathering missions of human rights organizations by pinpointing specific abuses, revealing events and details that may be blocked from traditional media sources, and providing evidence of emerging patterns of human rights violations. This could lead to more timely, targeted, and effective advocacy, as well as other potential interventions.
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
NASA Astrophysics Data System (ADS)
Bono, R. K.; Dare, M. S.; Tarduno, J. A.; Cottrell, R. D.
2016-12-01
Magnetic directions from coarse clastic rocks are typically highly scattered, to the point that the null hypothesis that they are drawn from a random distribution, using the iconic test of Watson (1956), cannot be rejected at a high confidence level (e.g. 95%). Here, we use an alternative approach of searching for directional clusters on a sphere. When applied to a new data set of directions from quartzites from the Jack Hills of Western Australia, we find evidence for distinct and meaningful magnetic directions at low (200 to 300 degrees C) and intermediate (350 to 450 degrees C) unblocking temperatures, whereas the test of Watson (1956) fails to draw a distinction from random distributions for the ensemble of directions at these unblocking temperature ranges. The robustness of the directional groups identified by the cluster analysis is confirmed by non-parametric resampling tests. The lowest unblocking temperature directional mode appears related to the present day field, perhaps contaminated by viscous magnetizations. The intermediate temperature magnetization matches an overprint recorded by the secondary mineral fuchsite (Cottrell et al., 2016) acquired at ca. 2.65 Ga. These data thus indicate that the Jack Hills carry an overprint at intermediate unblocking temperatures of Archean age. We find no evidence for a 1 Ga remagnetization. In general, the application of cluster analysis on a sphere, with directions confirmed by nonparametric tests, represents a new approach that should be applied when evaluating data with high dispersion, such as those that typically come from weak coarse-grained clastic sedimentary rocks, and/or rocks that have seen several tectonic events that could have imparted multiple magnetic overprints.
Meta-analysis of genome-wide linkage studies in BMI and obesity.
Saunders, Catherine L; Chiodini, Benedetta D; Sham, Pak; Lewis, Cathryn M; Abkevich, Victor; Adeyemo, Adebowale A; de Andrade, Mariza; Arya, Rector; Berenson, Gerald S; Blangero, John; Boehnke, Michael; Borecki, Ingrid B; Chagnon, Yvon C; Chen, Wei; Comuzzie, Anthony G; Deng, Hong-Wen; Duggirala, Ravindranath; Feitosa, Mary F; Froguel, Philippe; Hanson, Robert L; Hebebrand, Johannes; Huezo-Dias, Patricia; Kissebah, Ahmed H; Li, Weidong; Luke, Amy; Martin, Lisa J; Nash, Matthew; Ohman, Miina; Palmer, Lyle J; Peltonen, Leena; Perola, Markus; Price, R Arlen; Redline, Susan; Srinivasan, Sathanur R; Stern, Michael P; Stone, Steven; Stringham, Heather; Turner, Stephen; Wijmenga, Cisca; Collier, David A
2007-09-01
The objective was to provide an overall assessment of genetic linkage data of BMI and BMI-defined obesity using a nonparametric genome scan meta-analysis. We identified 37 published studies containing data on over 31,000 individuals from more than 10,000 families and obtained genome-wide logarithm of the odds (LOD) scores, non-parametric linkage (NPL) scores, or maximum likelihood scores (MLS). BMI was analyzed in a pooled set of all studies, as a subgroup of 10 studies that used BMI-defined obesity, and for subgroups ascertained through type 2 diabetes, hypertension, or subjects of European ancestry. Bins at chromosome 13q13.2-q33.1, 12q23-q24.3 achieved suggestive evidence of linkage to BMI in the pooled analysis and samples ascertained for hypertension. Nominal evidence of linkage to these regions and suggestive evidence for 11q13.3-22.3 were also observed for BMI-defined obesity. The FTO obesity gene locus at 16q12.2 also showed nominal evidence for linkage. However, overall distribution of summed rank p values <0.05 is not different from that expected by chance. The strongest evidence was obtained in the families ascertained for hypertension at 9q31.1-qter and 12p11.21-q23 (p < 0.01). Despite having substantial statistical power, we did not unequivocally implicate specific loci for BMI or obesity. This may be because genes influencing adiposity are of very small effect, with substantial genetic heterogeneity and variable dependence on environmental factors. However, the observation that the FTO gene maps to one of the highest ranking bins for obesity is interesting and, while not a validation of this approach, indicates that other potential loci identified in this study should be investigated further.
Marko, Nicholas F.; Weil, Robert J.
2012-01-01
Introduction: Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research. Methods: We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome. Results: Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect. Conclusions: Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects. PMID:23118863
Simpson, Claire L.; Wojciechowski, Robert; Ibay, Grace; Stambolian, Dwight
2011-01-01
Purpose: Despite many years of research, most of the genetic factors contributing to myopia development remain unknown. Genetic studies have pointed to a strong inherited component, but although many candidate regions have been implicated, few genes have been positively identified. Methods: We have previously reported 2 genomewide linkage scans in a population of 63 highly aggregated Ashkenazi Jewish families that identified a locus on chromosome 22. Here we used ordered subset analysis (OSA), conditioned on non-parametric linkage to chromosome 22 to detect other chromosomal regions which had evidence of linkage to myopia in subsets of the families, but not the overall sample. Results: Strong evidence of linkage to a 19-cM linkage interval with a peak OSA nonparametric allele-sharing logarithm-of-odds (LOD) score of 3.14 on 20p12-q11.1 (ΔLOD=2.39, empirical p=0.029) was identified in a subset of 20 families that also exhibited strong evidence of linkage to chromosome 22. One other locus also presented with suggestive LOD scores >2.0 on chromosome 11p14-q14 and one locus on chromosome 6q22-q24 had an OSA LOD score=1.76 (ΔLOD=1.65, empirical p=0.02). Conclusions: The chromosome 6 and 20 loci are entirely novel and appear linked in a subset of families whose myopia is known to be linked to chromosome 22. The chromosome 11 locus overlaps with the known Myopia-7 (MYP7, OMIM 609256) locus. Using ordered subset analysis allows us to find additional loci linked to myopia in subsets of families, and underlines the complex genetic heterogeneity of myopia even in highly aggregated families and genetically isolated populations such as the Ashkenazi Jews. PMID:21738393
Bayesian Nonparametric Inference – Why and How
Müller, Peter; Mitra, Riten
2013-01-01
We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932
A Bayesian nonparametric approach to dynamical noise reduction
NASA Astrophysics Data System (ADS)
Kaloudis, Konstantinos; Hatjispyros, Spyridon J.
2018-06-01
We propose a Bayesian nonparametric approach for the noise reduction of a given chaotic time series contaminated by dynamical noise, based on Markov Chain Monte Carlo methods. The underlying unknown noise process (possibly) exhibits heavy tailed behavior. We introduce the Dynamic Noise Reduction Replicator model with which we reconstruct the unknown dynamic equations and in parallel we replicate the dynamics under reduced noise level dynamical perturbations. The dynamic noise reduction procedure is demonstrated specifically in the case of polynomial maps. Simulations based on synthetic time series are presented.
NASA Astrophysics Data System (ADS)
Karpenko, S. S.; Zybin, E. Yu; Kosyanchuk, V. V.
2018-02-01
In this paper we design a nonparametric method for failure detection and localization in the aircraft control system that uses only measurements of the control signals and the aircraft states. It does not require a priori information about the aircraft model parameters, training, or statistical calculations, and is based on algebraic solvability conditions for the aircraft model identification problem. This makes it possible to significantly increase the efficiency of solving the detection and localization problem by completely eliminating errors associated with aircraft model uncertainties.
2016-05-31
and included explosives such as TATP, HMTD, RDX, ammonium nitrate, potassium perchlorate, potassium nitrate, sugar, and TNT. The approach...Distribution Unlimited 31-05-2016 15-Apr-2014 14-Jan-2015 Final Report: Technical Topic 3.2.2.d Bayesian and Non-parametric Statistics: Integration of Neural
Donnelly, Aoife; Misstear, Bruce; Broderick, Brian
2011-02-15
Background concentrations of nitrogen dioxide (NO(2)) are not constant but vary temporally and spatially. The current paper presents a powerful tool for the quantification of the effects of wind direction and wind speed on background NO(2) concentrations, particularly in cases where monitoring data are limited. In contrast to previous studies which applied similar methods to sites directly affected by local pollution sources, the current study focuses on background sites with the aim of improving methods for predicting background concentrations adopted in air quality modelling studies. The relationship between measured NO(2) concentration in air at three such sites in Ireland and locally measured wind direction has been quantified using nonparametric regression methods. The major aim was to analyse a method for quantifying the effects of local wind direction on background levels of NO(2) in Ireland. The method was expanded to include wind speed as an added predictor variable. A Gaussian kernel function is used in the analysis and circular statistics employed for the wind direction variable. Wind direction and wind speed were both found to have a statistically significant effect on background levels of NO(2) at all three sites. Frequently environmental impact assessments are based on short term baseline monitoring producing a limited dataset. The presented non-parametric regression methods, in contrast to the frequently used methods such as binning of the data, allow concentrations for missing data pairs to be estimated and distinction between spurious and true peaks in concentrations to be made. The methods were found to provide a realistic estimation of long term concentration variation with wind direction and speed, even for cases where the data set is limited. 
Accurate identification of the actual variation at each location and causative factors could be made, thus supporting the improved definition of background concentrations for use in air quality modelling studies. Copyright © 2010 Elsevier B.V. All rights reserved.
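The kernel approach described in this abstract can be illustrated with a minimal sketch, assuming a Nadaraya-Watson estimator with a Gaussian kernel applied to the angular difference between wind directions. The function names, the 20-degree bandwidth, and the sample values are hypothetical and are not taken from the paper; the authors' actual estimator and bandwidth selection may differ.

```python
import math

def angular_difference(a_deg, b_deg):
    """Signed difference a - b between two bearings, wrapped to (-180, 180] degrees."""
    d = (a_deg - b_deg) % 360.0
    return d - 360.0 if d > 180.0 else d

def kernel_smooth(theta_grid, theta_obs, conc_obs, bandwidth_deg=20.0):
    """Nadaraya-Watson estimate of concentration as a function of wind direction.

    Each observation is weighted by a Gaussian kernel of its angular distance
    from the query direction, so 350 degrees and 10 degrees count as neighbours.
    """
    estimates = []
    for t in theta_grid:
        weights = [
            math.exp(-0.5 * (angular_difference(t, th) / bandwidth_deg) ** 2)
            for th in theta_obs
        ]
        total = sum(weights)
        estimates.append(sum(w * c for w, c in zip(weights, conc_obs)) / total)
    return estimates

# Made-up illustration: two observations straddling north.
smoothed = kernel_smooth([0.0], [350.0, 10.0], [10.0, 20.0])
```

Because the weights depend only on the wrapped angular distance, the estimate at 0 degrees draws equally on the two observations, which is the behaviour that simple binning by direction does not provide near the 0/360 boundary.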
Chen, Gang; Taylor, Paul A.; Shin, Yong-Wook; Reynolds, Richard C.; Cox, Robert W.
2016-01-01
It has been argued that naturalistic conditions in FMRI studies provide a useful paradigm for investigating perception and cognition through a synchronization measure, inter-subject correlation (ISC). However, one analytical stumbling block has been the fact that the ISC values associated with each single subject are not independent, and our previous paper (Chen et al., 2016) used simulations and analyses of real data to show that the methodologies adopted in the literature do not have the proper control for false positives. In the same paper, we proposed nonparametric subject-wise bootstrapping and permutation testing techniques for one and two groups, respectively, which account for the correlation structure, and these greatly outperformed the prior methods in controlling the false positive rate (FPR); that is, subject-wise bootstrapping (SWB) worked relatively well for both cases with one and two groups, and subject-wise permutation (SWP) testing was virtually ideal for group comparisons. Here we seek to explicate and adopt a parametric approach through linear mixed-effects (LME) modeling for studying the ISC values, building on the previous correlation framework, with the benefit that the LME platform offers wider adaptability, more powerful interpretations, and quality control checking capability than nonparametric methods. We describe both theoretical and practical issues involved in the modeling and the manner in which LME with crossed random effects (CRE) modeling is applied. A data-doubling step further allows us to conveniently track the subject index, and achieve easy implementations. We pit the LME approach against the best nonparametric methods, and find that the LME framework achieves proper control for false positives. The new LME methodologies are shown to be both efficient and robust, and they will be added as an additional option and settings in an existing open source program, 3dLME, in AFNI (http://afni.nimh.nih.gov). PMID:27751943
Tree-Structured Infinite Sparse Factor Model
Zhang, XianXing; Dunson, David B.; Carin, Lawrence
2013-01-01
A tree-structured multiplicative gamma process (TMGP) is developed, for inferring the depth of a tree-based factor-analysis model. This new model is coupled with the nested Chinese restaurant process, to nonparametrically infer the depth and width (structure) of the tree. In addition to developing the model, theoretical properties of the TMGP are addressed, and a novel MCMC sampler is developed. The structure of the inferred tree is used to learn relationships between high-dimensional data, and the model is also applied to compressive sensing and interpolation of incomplete images. PMID:25279389
Methods for scalar-on-function regression.
Reiss, Philip T; Goldsmith, Jeff; Shang, Han Lin; Ogden, R Todd
2017-08-01
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.
Evaluation of automobiles with alternative fuels utilizing multicriteria techniques
NASA Astrophysics Data System (ADS)
Brey, J. J.; Contreras, I.; Carazo, A. F.; Brey, R.; Hernández-Díaz, A. G.; Castro, A.
This work applies the non-parametric technique of Data Envelopment Analysis (DEA) to conduct a multicriteria comparison of some existing and under-development technologies in the automotive sector. The results indicate that some of the technologies under development, such as hydrogen fuel cell vehicles, can be classified as efficient when evaluated against environmental and economic criteria, with greater importance being given to the environmental criteria. The article also demonstrates the need to improve the hydrogen-based technology, in comparison with the others, in aspects such as vehicle sale costs and fuel price.
1980-05-01
different cells of the histogram are recognisably those of a Gaussian distribution. On the other hand, at Q = 2 there is no way to decide whether the...levels. Note further that the first row (Q=8) contains the same data as in Table 6, but the cells are arranged in a different order. Consequently the...values of the more coarsely merged cells will be different at lower levels of Q. Now since we are dealing with nominal data the order of the bins is
Bayesian Nonparametric Statistical Inference for Shock Models and Wear Processes.
1979-12-01
Naval Research under Contract N00014-75-C-0781 and the National Science Foundation under Grant MCS78-01422 with the University of California...SUPPLEMENTARY NOTES Also supported by the National Science Foundation under Grant MCS78-01422...Barlow and Proschan (1975), among others. The analogy of the shock model in risk and actuarial analysis has been given by Bühlmann (1970, Chapter 2
Variable selection for marginal longitudinal generalized linear models.
Cantoni, Eva; Flemming, Joanna Mills; Ronchetti, Elvezio
2005-06-01
Variable selection is an essential part of any statistical analysis and yet has been somewhat neglected in the context of longitudinal data analysis. In this article, we propose a generalized version of Mallows's C(p) (GC(p)) suitable for use with both parametric and nonparametric models. GC(p) provides an estimate of a measure of model's adequacy for prediction. We examine its performance with popular marginal longitudinal models (fitted using GEE) and contrast results with what is typically done in practice: variable selection based on Wald-type or score-type tests. An application to real data further demonstrates the merits of our approach while at the same time emphasizing some important robust features inherent to GC(p).
Statistical methods for astronomical data with upper limits. II - Correlation and regression
NASA Technical Reports Server (NTRS)
Isobe, T.; Feigelson, E. D.; Nelson, P. I.
1986-01-01
Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
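The Kaplan-Meier estimator mentioned above can be sketched in a few lines. This is a generic textbook product-limit estimator for right-censored data, not the code used by the authors; the function name and the toy data are hypothetical. For astronomical upper limits (left-censored values), a common device is to transform the data first, for example by subtracting each value from a constant, so that upper limits become right-censored points; that step is assumed to have been done before calling the function.

```python
def kaplan_meier(times, observed):
    """Product-limit estimate of the survival function.

    times    -- measured values (event times or censoring times)
    observed -- True where the value is a detection, False where it is censored
    Returns a list of (t, S(t)) pairs, one per distinct detected value.
    """
    data = sorted(zip(times, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        removed = 0
        # Group all points tied at the same value.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            removed += 1
            i += 1
        if events > 0:
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve

# Toy data: the second point is censored.
km_curve = kaplan_meier([1, 2, 3, 4], [True, False, True, True])
```

Censored points do not produce a step in the curve, but they shrink the risk set for later steps, which is how the estimator uses the partial information carried by non-detections.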
Examination of influential observations in penalized spline regression
NASA Astrophysics Data System (ADS)
Türkan, Semra
2013-10-01
In parametric or nonparametric regression models, the results of regression analysis are affected by some anomalous observations in the data set. Thus, detection of these observations is one of the major steps in regression analysis. These observations are precisely detected by well-known influence measures. Pena's statistic is one of them. In this study, Pena's approach is formulated for penalized spline regression in terms of ordinary residuals and leverages. Real and artificial data are used to illustrate the effectiveness of Pena's statistic relative to Cook's distance in detecting influential observations. The results of the study clearly reveal that the proposed measure is superior to Cook's distance for detecting these observations in large data sets.
Astrophysical data analysis with information field theory
NASA Astrophysics Data System (ADS)
Enßlin, Torsten
2014-12-01
Non-parametric imaging and data analysis in astrophysics and cosmology can be addressed by information field theory (IFT), a means of Bayesian, data-based inference on spatially distributed signal fields. IFT is a statistical field theory, which permits the construction of optimal signal recovery algorithms. It exploits spatial correlations of the signal fields even for nonlinear and non-Gaussian signal inference problems. The alleviation of a perception threshold for recovering signals of unknown correlation structure by using IFT will be discussed in particular, as well as a novel improvement on instrumental self-calibration schemes. IFT can be applied to many areas. Here, applications in cosmology (cosmic microwave background, large-scale structure) and astrophysics (galactic magnetism, radio interferometry) are presented.
Two-sample tests and one-way MANOVA for multivariate biomarker data with nondetects.
Thulin, M
2016-09-10
Testing whether the mean vector of a multivariate set of biomarkers differs between several populations is an increasingly common problem in medical research. Biomarker data is often left censored because some measurements fall below the laboratory's detection limit. We investigate how such censoring affects multivariate two-sample and one-way multivariate analysis of variance tests. Type I error rates, power and robustness to increasing censoring are studied, under both normality and non-normality. Parametric tests are found to perform better than non-parametric alternatives, indicating that the current recommendations for analysis of censored multivariate data may have to be revised. Copyright © 2016 John Wiley & Sons, Ltd.
A Study of Specific Fracture Energy at Percussion Drilling
NASA Astrophysics Data System (ADS)
A, Shadrina; T, Kabanova; V, Krets; L, Saruev
2014-08-01
The paper presents experimental studies of rock failure provided by percussion drilling. Quantitative and qualitative analyses were carried out to estimate critical values of rock failure depending on the hammer pre-impact velocity, types of drill bits and cylindrical hammer parameters (weight, length, diameter), and turn angle of a drill bit. The data obtained in this work were compared with results obtained by other researchers. The particle-size distribution in granite-cutting sludge was analyzed in this paper. A statistical approach (Spearman's rank-order correlation, multiple regression analysis with dummy variables, Kruskal-Wallis nonparametric test) was used to analyze the drilling process. Experimental data will be useful for specialists engaged in simulation and illustration of rock failure.
Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea
2016-01-01
In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that the family-wise false positive error rate (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM is much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates.
The critical implication of this finding is that VBM can be used to characterize neuroanatomical alterations in individual subjects as long as non-parametric statistics are employed.
Staid, Andrea; Watson, Jean -Paul; Wets, Roger J. -B.; ...
2017-07-11
Forecasts of available wind power are critical in key electric power systems operations planning problems, including economic dispatch and unit commitment. Such forecasts are necessarily uncertain, limiting the reliability and cost effectiveness of operations planning models based on a single deterministic or “point” forecast. A common approach to address this limitation involves the use of a number of probabilistic scenarios, each specifying a possible trajectory of wind power production, with associated probability. We present and analyze a novel method for generating probabilistic wind power scenarios, leveraging available historical information in the form of forecasted and corresponding observed wind power time series. We estimate non-parametric forecast error densities, specifically using epi-spline basis functions, allowing us to capture the skewed and non-parametric nature of error densities observed in real-world data. We then describe a method to generate probabilistic scenarios from these basis functions that allows users to control for the degree to which extreme errors are captured. We compare the performance of our approach to the current state-of-the-art considering publicly available data associated with the Bonneville Power Administration, analyzing aggregate production of a number of wind farms over a large geographic region. Finally, we discuss the advantages of our approach in the context of specific power systems operations planning problems: stochastic unit commitment and economic dispatch. Here, our methodology is embodied in the joint Sandia – University of California Davis Prescient software package for assessing and analyzing stochastic operations strategies.
Advanced statistical methods for improved data analysis of NASA astrophysics missions
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.
1992-01-01
The investigators under this grant studied ways to improve the statistical analysis of astronomical data. They looked at existing techniques, the development of new techniques, and the production and distribution of specialized software to the astronomical community. Abstracts of nine papers that were produced are included, as well as brief descriptions of four software packages. The articles that are abstracted discuss analytical and Monte Carlo comparisons of six different linear least squares fits, a (second) paper on linear regression in astronomy, two reviews of public domain software for the astronomer, subsample and half-sample methods for estimating sampling distributions, a nonparametric estimation of survival functions under dependent competing risks, censoring in astronomical data due to nondetections, an astronomy survival analysis computer package called ASURV, and improving the statistical methodology of astronomical data analysis.
Transformation-invariant and nonparametric monotone smooth estimation of ROC curves.
Du, Pang; Tang, Liansheng
2009-01-30
When a new diagnostic test is developed, it is of interest to evaluate its accuracy in distinguishing diseased subjects from non-diseased subjects. The accuracy of the test is often evaluated by receiver operating characteristic (ROC) curves. Smooth ROC estimates are often preferable for continuous test results when the underlying ROC curves are in fact continuous. Nonparametric and parametric methods have been proposed by various authors to obtain smooth ROC curve estimates. However, there are certain drawbacks with the existing methods. Parametric methods need specific model assumptions. Nonparametric methods do not always satisfy the inherent properties of the ROC curves, such as monotonicity and transformation invariance. In this paper we propose a monotone spline approach to obtain smooth monotone ROC curves. Our method ensures important inherent properties of the underlying ROC curves, which include monotonicity, transformation invariance, and boundary constraints. We compare the finite sample performance of the newly proposed ROC method with other ROC smoothing methods in large-scale simulation studies. We illustrate our method through a real life example. Copyright (c) 2008 John Wiley & Sons, Ltd.
Boosted Multivariate Trees for Longitudinal Data
Pande, Amol; Li, Liang; Rajeswaran, Jeevanantham; Ehrlinger, John; Kogalur, Udaya B.; Blackstone, Eugene H.; Ishwaran, Hemant
2017-01-01
Machine learning methods provide a powerful approach for analyzing longitudinal data in which repeated measurements are observed for a subject over time. We boost multivariate trees to fit a novel flexible semi-nonparametric marginal model for longitudinal data. In this model, features are assumed to be nonparametric, while feature-time interactions are modeled semi-nonparametrically utilizing P-splines with estimated smoothing parameter. In order to avoid overfitting, we describe a relatively simple in sample cross-validation method which can be used to estimate the optimal boosting iteration and which has the surprising added benefit of stabilizing certain parameter estimates. Our new multivariate tree boosting method is shown to be highly flexible, robust to covariance misspecification and unbalanced designs, and resistant to overfitting in high dimensions. Feature selection can be used to identify important features and feature-time interactions. An application to longitudinal data of forced 1-second lung expiratory volume (FEV1) for lung transplant patients identifies an important feature-time interaction and illustrates the ease with which our method can find complex relationships in longitudinal data. PMID:29249866
MEASURING DARK MATTER PROFILES NON-PARAMETRICALLY IN DWARF SPHEROIDALS: AN APPLICATION TO DRACO
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jardel, John R.; Gebhardt, Karl; Fabricius, Maximilian H.
2013-02-15
We introduce a novel implementation of orbit-based (or Schwarzschild) modeling that allows dark matter density profiles to be calculated non-parametrically in nearby galaxies. Our models require no assumptions to be made about velocity anisotropy or the dark matter profile. The technique can be applied to any dispersion-supported stellar system, and we demonstrate its use by studying the Local Group dwarf spheroidal galaxy (dSph) Draco. We use existing kinematic data at larger radii and also present 12 new radial velocities within the central 13 pc obtained with the VIRUS-W integral field spectrograph on the 2.7 m telescope at McDonald Observatory. Our non-parametric Schwarzschild models find strong evidence that the dark matter profile in Draco is cuspy for 20 ≤ r ≤ 700 pc. The profile for r ≥ 20 pc is well fit by a power law with slope α = -1.0 ± 0.2, consistent with predictions from cold dark matter simulations. Our models confirm that, despite its low baryon content relative to other dSphs, Draco lives in a massive halo.
Application of the LSQR algorithm in non-parametric estimation of aerosol size distribution
NASA Astrophysics Data System (ADS)
He, Zhenzong; Qi, Hong; Lew, Zhongyuan; Ruan, Liming; Tan, Heping; Luo, Kun
2016-05-01
Based on the Least Squares QR decomposition (LSQR) algorithm, the aerosol size distribution (ASD) is retrieved using a non-parametric approach. The direct problem is solved by the Anomalous Diffraction Approximation (ADA) and the Lambert-Beer Law. An optimal wavelength selection method is developed to improve the retrieval accuracy of the ASD. The proposed optimal wavelength set is selected so that the measurement signals are sensitive to wavelength and the ill-conditioning of the coefficient matrix of the linear system is reduced, which improves the robustness of the retrieval results to interference. Two common kinds of monomodal and bimodal ASDs, log-normal (L-N) and Gamma distributions, are estimated, respectively. Numerical tests show that the LSQR algorithm can be successfully applied to retrieve the ASD with high stability in the presence of random noise and low susceptibility to the shape of distributions. Finally, the ASD measured experimentally over Harbin, China, is recovered reasonably well. All the results confirm that the LSQR algorithm combined with the optimal wavelength selection method is an effective and reliable technique in non-parametric estimation of ASD.
Macmillan, N A; Creelman, C D
1996-06-01
Can accuracy and response bias in two-stimulus, two-response recognition or detection experiments be measured nonparametrically? Pollack and Norman (1964) answered this question affirmatively for sensitivity, Hodos (1970) for bias: Both proposed measures based on triangular areas in receiver-operating characteristic space. Their papers, and especially a paper by Grier (1971) that provided computing formulas for the measures, continue to be heavily cited in a wide range of content areas. In our sample of articles, most authors described triangle-based measures as making fewer assumptions than measures associated with detection theory. However, we show that statistics based on products or ratios of right triangle areas, including a recently proposed bias index and a not-yet-proposed but apparently plausible sensitivity index, are consistent with a decision process based on logistic distributions. Even the Pollack and Norman measure, which is based on non-right triangles, is approximately logistic for low values of sensitivity. Simple geometric models for sensitivity and bias are not nonparametric, even if their implications are not acknowledged in the defining publications.
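For readers unfamiliar with the triangle-based measures discussed here, the following is a minimal sketch of the computing formulas in the form in which they are commonly cited from Grier (1971), for a hit rate H at least as large as the false-alarm rate F. The function names are hypothetical, and the sketch does not reproduce the logistic-model analysis of the abstract.

```python
def a_prime(hit_rate, fa_rate):
    """Sensitivity index A' (commonly cited form, valid for hit_rate >= fa_rate)."""
    h, f = hit_rate, fa_rate
    if not (0.0 <= f <= h <= 1.0):
        raise ValueError("requires 0 <= fa_rate <= hit_rate <= 1")
    if h == f:
        return 0.5  # chance performance; also avoids 0/0 at the corners
    return 0.5 + (h - f) * (1.0 + h - f) / (4.0 * h * (1.0 - f))

def b_double_prime(hit_rate, fa_rate):
    """Bias index B'' (commonly cited form, valid for hit_rate >= fa_rate)."""
    h, f = hit_rate, fa_rate
    if not (0.0 <= f <= h <= 1.0):
        raise ValueError("requires 0 <= fa_rate <= hit_rate <= 1")
    denom = h * (1.0 - h) + f * (1.0 - f)
    if denom == 0.0:
        raise ValueError("B'' is undefined when both rates are 0 or 1")
    return (h * (1.0 - h) - f * (1.0 - f)) / denom

# Illustrative values only: H = 0.8, F = 0.2.
sensitivity = a_prime(0.8, 0.2)
bias = b_double_prime(0.8, 0.2)
```

With H = 0.8 and F = 0.2 the sensitivity index is 0.875 and the bias index is 0, since the two rates are symmetric about 0.5.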
Bayesian nonparametric dictionary learning for compressed sensing MRI.
Huang, Yue; Paisley, John; Lin, Qin; Ding, Xinghao; Fu, Xueyang; Zhang, Xiao-Ping
2014-12-01
We develop a Bayesian nonparametric model for reconstructing magnetic resonance images (MRIs) from highly undersampled k-space data. We perform dictionary learning as part of the image reconstruction process. To this end, we use the beta process as a nonparametric dictionary learning prior for representing an image patch as a sparse combination of dictionary elements. The size of the dictionary and patch-specific sparsity pattern are inferred from the data, in addition to other dictionary learning variables. Dictionary learning is performed directly on the compressed image, and so is tailored to the MRI being considered. In addition, we investigate a total variation penalty term in combination with the dictionary learning model, and show how the denoising property of dictionary learning removes dependence on regularization parameters in the noisy setting. We derive a stochastic optimization algorithm based on Markov chain Monte Carlo for the Bayesian model, and use the alternating direction method of multipliers for efficiently performing total variation minimization. We present empirical results on several MRIs, which show that the proposed regularization framework can improve reconstruction accuracy over other methods.
Oostenveld, Robert; Fries, Pascal; Maris, Eric; Schoffelen, Jan-Mathijs
2011-01-01
This paper describes FieldTrip, an open source software package that we developed for the analysis of MEG, EEG, and other electrophysiological data. The software is implemented as a MATLAB toolbox and includes a complete set of consistent and user-friendly high-level functions that allow experimental neuroscientists to analyze experimental data. It includes algorithms for simple and advanced analysis, such as time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. The implementation as a toolbox allows the user to perform elaborate and structured analyses of large data sets using the MATLAB command line and batch scripting. Furthermore, users and developers can easily extend the functionality and implement new algorithms. The modular design facilitates reuse in other software packages. PMID:21253357
Paolini, Marco; Keeser, Daniel; Ingrisch, Michael; Werner, Natalie; Kindermann, Nicole; Reiser, Maximilian; Blautzik, Janusch
2015-05-01
Little research exists on the influence of a magnetic resonance imaging (MRI) head coil's channel count on measured resting-state functional connectivity. This study compared a 32-element (32ch) and an 8-element (8ch) phased array head coil with respect to their potential to detect functional connectivity within resting-state networks (RSNs). Twenty-six healthy adults (mean age, 21.7 years; SD, 2.1 years) underwent resting-state functional MRI at 3.0 Tesla with both coils using equal standard imaging parameters and a counterbalanced design. Independent component analysis (ICA) at different model orders and a dual regression approach were performed. Voxel-wise between-group contrasts were determined using permutation-based non-parametric inference. Phantom measurements demonstrated a generally higher image signal-to-noise ratio using the 32ch head coil. However, the results showed no significant differences between corresponding resting-state networks derived from both coils (p < 0.05, FWE-corrected). Using the identical standard acquisition parameters, the 32ch head coil does not offer any significant advantages in detecting ICA-based functional connectivity within RSNs. © The Foundation Acta Radiologica 2015.
Etter, Nicole M; Mckeon, Patrick O; Dressler, Emily V; Andreatta, Richard D
2017-05-03
Current theoretical models suggest the importance of a bidirectional relationship between sensation and production in the vocal tract to maintain lifelong speech skills. The purpose of this study was to assess age-related changes in orofacial skilled force production and to begin defining the orofacial perception-action relationship in healthy adults. Low-level orofacial force control measures (reaction time, rise time, peak force, mean hold force (N) and force hold SD) were collected from 60 adults (19-84 years). Non-parametric Kruskal-Wallis tests were performed to identify statistical differences between force and group demographics. Non-parametric Spearman's rank correlations were completed to compare force measures against previously published sensory data from the same cohort of participants. Significant group differences in force control were found for age, sex, speech usage and smoking status. Significant correlational relationships were identified between labial vibrotactile thresholds and several low-level force control measures collected during step and ramp-and-hold conditions. These findings demonstrate age-related alterations in orofacial force production. Furthermore, correlational analysis suggests that as vibrotactile detection thresholds increase, the ability to maintain low-level force control accuracy decreases. Possible clinical applications and treatment consequences of these findings for speech disorders in the ageing population are provided.
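Spearman's rank correlation, used in this study to relate sensory thresholds to force measures, can be sketched as the Pearson correlation of average ranks. The code below is a generic illustration and not the authors' analysis; the function names are hypothetical and the numbers are made up solely to show a perfectly monotone relationship between a threshold and a force-control error.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Made-up values: higher threshold paired with larger force error.
thresholds = [1.0, 2.0, 3.0, 4.0, 5.0]
force_error = [2.0, 4.0, 5.0, 9.0, 20.0]
rho = spearman_rho(thresholds, force_error)
```

Because only the ordering of the values enters the calculation, the strongly nonlinear but monotone toy data give a coefficient of 1, which is why rank correlation suits measures with skewed distributions.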
Estimating scaled treatment effects with multiple outcomes.
Kennedy, Edward H; Kangovi, Shreya; Mitra, Nandita
2017-01-01
In classical study designs, the aim is often to learn about the effects of a treatment or intervention on a single outcome; in many modern studies, however, data on multiple outcomes are collected and it is of interest to explore effects on multiple outcomes simultaneously. Such designs can be particularly useful in patient-centered research, where different outcomes might be more or less important to different patients. In this paper, we propose scaled effect measures (via potential outcomes) that translate effects on multiple outcomes to a common scale, using mean-variance and median-interquartile range based standardizations. We present efficient, nonparametric, doubly robust methods for estimating these scaled effects (and weighted average summary measures), and for testing the null hypothesis that treatment affects all outcomes equally. We also discuss methods for exploring how treatment effects depend on covariates (i.e., effect modification). In addition to describing efficiency theory for our estimands and the asymptotic behavior of our estimators, we illustrate the methods in a simulation study and a data analysis. Importantly, and in contrast to much of the literature concerning effects on multiple outcomes, our methods are nonparametric and can be used not only in randomized trials to yield increased efficiency, but also in observational studies with high-dimensional covariates to reduce confounding bias.
Koohbor, Behrad; Kidane, Addis; Lu, Wei-Yang
2016-06-27
As an optimum energy-absorbing material system, polymeric foams are needed to dissipate the kinetic energy of an impact, while maintaining the impact force transferred to the protected object at a low level. As a result, it is crucial to accurately characterize the load bearing and energy dissipation performance of foams at high strain rate loading conditions. There are certain challenges faced in the accurate measurement of the deformation response of foams due to their low mechanical impedance. In the present work, a non-parametric method is successfully implemented to enable the accurate assessment of the compressive constitutive response of rigid polymeric foams subjected to impact loading conditions. The method is based on stereovision high speed photography in conjunction with 3D digital image correlation, and allows for accurate evaluation of inertia stresses developed within the specimen during deformation time. In conclusion, full-field distributions of stress, strain and strain rate are used to extract the local constitutive response of the material at any given location along the specimen axis. In addition, the effective energy absorbed by the material is calculated. Finally, results obtained from the proposed non-parametric analysis are compared with data obtained from conventional test procedures.
Lee, Chi Hyun; Luo, Xianghua; Huang, Chiung-Yu; DeFor, Todd E; Brunstein, Claudio G; Weisdorf, Daniel J
2016-06-01
Infection is one of the most common complications after hematopoietic cell transplantation. Many patients experience infectious complications repeatedly after transplant. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of interest, and subsequently experience recurrent events of the same type; moreover, for one-sample estimation, the gap times between consecutive events are usually assumed to be identically distributed. Applying these methods to analyze the post-transplant infection data will inevitably lead to incorrect inferential results because the time from transplant to the first infection has a different biological meaning than the gap times between consecutive recurrent infections. Some unbiased yet inefficient methods include univariate survival analysis methods based on data from the first infection or bivariate serial event data methods based on the first and second infections. In this article, we propose a nonparametric estimator of the joint distribution of time from transplant to the first infection and the gap times between consecutive infections. The proposed estimator takes into account the potentially different distributions of the two types of gap times and better uses the recurrent infection data. Asymptotic properties of the proposed estimators are established. © 2015, The International Biometric Society.
Lee, Chi Hyun; Huang, Chiung-Yu; DeFor, Todd E.; Brunstein, Claudio G.; Weisdorf, Daniel J.
2015-01-01
Summary Infection is one of the most common complications after hematopoietic cell transplantation. Many patients experience infectious complications repeatedly after transplant. Existing statistical methods for recurrent gap time data typically assume that patients are enrolled due to the occurrence of an event of interest, and subsequently experience recurrent events of the same type; moreover, for one-sample estimation, the gap times between consecutive events are usually assumed to be identically distributed. Applying these methods to analyze the post-transplant infection data will inevitably lead to incorrect inferential results because the time from transplant to the first infection has a different biological meaning than the gap times between consecutive recurrent infections. Some unbiased yet inefficient methods include univariate survival analysis methods based on data from the first infection or bivariate serial event data methods based on the first and second infections. In this paper, we propose a nonparametric estimator of the joint distribution of time from transplant to the first infection and the gap times between consecutive infections. The proposed estimator takes into account the potentially different distributions of the two types of gap times and better uses the recurrent infection data. Asymptotic properties of the proposed estimators are established. PMID:26575402
Shkvarko, Yuriy; Tuxpan, José; Santos, Stewart
2011-01-01
We consider a problem of high-resolution array radar/SAR imaging formalized in terms of a nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the random wavefield scattered from a remotely sensed scene observed through a kernel signal formation operator and contaminated with random Gaussian noise. First, the Sobolev-type solution space is constructed to specify the class of consistent kernel SSP estimators with the reproducing kernel structures adapted to the metrics in such a solution space. Next, the "model-free" variational analysis (VA)-based image enhancement approach and the "model-based" descriptive experiment design (DEED) regularization paradigm are unified into a new dynamic experiment design (DYED) regularization framework. Application of the proposed DYED framework to the adaptive array radar/SAR imaging problem leads to a class of two-level (DEED-VA) regularized SSP reconstruction techniques that aggregate the kernel adaptive anisotropic windowing with the projections onto convex sets to enforce the consistency and robustness of the overall iterative SSP estimators. We also show how the proposed DYED regularization method may be considered as a generalization of the MVDR, APES and other high-resolution nonparametric adaptive radar sensing techniques. A family of the DYED-related algorithms is constructed and their effectiveness is finally illustrated via numerical simulations.
Rodríguez-Álvarez, María Xosé; Roca-Pardiñas, Javier; Cadarso-Suárez, Carmen; Tahoces, Pablo G
2018-03-01
Prior to using a diagnostic test in a routine clinical setting, the rigorous evaluation of its diagnostic accuracy is essential. The receiver-operating characteristic curve is the measure of accuracy most widely used for continuous diagnostic tests. However, the possible impact of extra information about the patient (or even the environment) on diagnostic accuracy also needs to be assessed. In this paper, we focus on an estimator for the covariate-specific receiver-operating characteristic curve based on direct regression modelling and nonparametric smoothing techniques. This approach defines the class of generalised additive models for the receiver-operating characteristic curve. The main aim of the paper is to offer new inferential procedures for testing the effect of covariates on the conditional receiver-operating characteristic curve within the above-mentioned class. Specifically, two different bootstrap-based tests are suggested to check (a) the possible effect of continuous covariates on the receiver-operating characteristic curve and (b) the presence of factor-by-curve interaction terms. The validity of the proposed bootstrap-based procedures is supported by simulations. To facilitate the application of these new procedures in practice, an R-package, known as npROCRegression, is provided and briefly described. Finally, data derived from a computer-aided diagnostic system for the automatic detection of tumour masses in breast cancer is analysed.
2010-01-01
Background: Nonparametric Bayesian techniques have been developed recently to extend the sophistication of factor models, allowing one to infer the number of appropriate factors from the observed data. We consider such techniques for sparse factor analysis, with application to gene-expression data from three virus challenge studies. Particular attention is placed on employing the Beta Process (BP), the Indian Buffet Process (IBP), and related sparseness-promoting techniques to infer a proper number of factors. The posterior density function on the model parameters is computed using Gibbs sampling and variational Bayesian (VB) analysis. Results: Time-evolving gene-expression data are considered for respiratory syncytial virus (RSV), Rhino virus, and influenza, using blood samples from healthy human subjects. These data were acquired in three challenge studies, each executed after receiving institutional review board (IRB) approval from Duke University. Comparisons are made between several alternative means of performing nonparametric factor analysis on these data, with comparisons as well to sparse-PCA and Penalized Matrix Decomposition (PMD), closely related non-Bayesian approaches. Conclusions: Applying the Beta Process to the factor scores, or to the singular values of a pseudo-SVD construction, the proposed algorithms infer the number of factors in gene-expression data. For real data the "true" number of factors is unknown; in our simulations we consider a range of noise variances, and the proposed Bayesian models inferred the number of factors accurately relative to other methods in the literature, such as sparse-PCA and PMD. We have also identified a "pan-viral" factor of importance for each of the three viruses considered in this study. We have identified a set of genes associated with this pan-viral factor, of interest for early detection of such viruses based upon the host response, as quantified via gene-expression data. PMID:21062443
Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure
Berisha, Visar; Wisler, Alan; Hero, Alfred O.; Spanias, Andreas
2015-01-01
Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks. PMID:26807014
Asymptotics of nonparametric L-1 regression models with dependent data
ZHAO, ZHIBIAO; WEI, YING; LIN, DENNIS K.J.
2013-01-01
We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile estimates. The obtained Bahadur representations provide deep insights into the asymptotic behavior of the estimates. Our main theoretical development is based on studying the modulus of continuity of kernel weighted empirical process through a coupling argument. Progesterone data is used for an illustration. PMID:24955016
Revisiting the Distance Duality Relation using a non-parametric regression method
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rana, Akshay; Mahajan, Shobhit; Mukherjee, Amitabha
2016-07-01
The interdependence of luminosity distance, D_L, and angular diameter distance, D_A, given by the distance duality relation (DDR) is very significant in observational cosmology. It is very closely tied with the temperature-redshift relation of Cosmic Microwave Background (CMB) radiation. Any deviation from η(z) ≡ D_L / [D_A (1+z)^2] = 1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method, namely LOESS with SIMEX. This technique avoids dependency on the cosmological model and works with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η(z) data based on a phenomenological model η(z) = (1+z)^ε. The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxies datasets. Since the DDR is linked with the CMB temperature-redshift relation, we also use the CMB temperature data to reconstruct η(z). It is important to note that with CMB data, we are able to study the evolution of the DDR up to a very high redshift, z = 2.418. In this analysis, we find no evidence of deviation from η = 1 within a 1σ region in the entire redshift range used in this analysis (0 < z ≤ 2.418).
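The two expressions for η in this abstract can be written out directly. The sketch below only evaluates the definition and the phenomenological model; the function names and the numerical inputs are hypothetical, and no regression (LOESS or SIMEX) is attempted here.

```python
def eta_observed(d_l, d_a, z):
    """Distance duality ratio: D_L / (D_A * (1 + z)^2).

    d_l and d_a must be in the same distance unit; the ratio is dimensionless.
    """
    return d_l / (d_a * (1.0 + z) ** 2)

def eta_model(z, eps):
    """Phenomenological model (1 + z)^eps; eps = 0 recovers eta = 1."""
    return (1.0 + z) ** eps

# Made-up distances chosen so that the duality relation holds exactly at z = 1.
z = 1.0
d_a = 1000.0
d_l = d_a * (1.0 + z) ** 2
print(eta_observed(d_l, d_a, z))
print(eta_model(z, 0.05))
```

A nonzero value of the exponent is one simple way to parametrize a departure from η = 1 that grows with redshift.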
Taracido Trunk, M; Figueiras, A; Castro Lareo, I
1999-01-01
In the Autonomous Region of Galicia, no study has been made of the impact of air pollution on human health, despite the fact that several of its major cities have moderate levels of pollution. We therefore considered it necessary to carry out such a study in the city of Vigo. The main objective is to analyze the short-term impact of air pollution on daily all-cause mortality in the city of Vigo over the 1991-1994 period, using the analysis procedure set out as part of the EMECAM Project. The daily fluctuations in the number of deaths from all causes except external ones are related to the daily fluctuations of sulfur dioxide and particles using Poisson regression models. A non-parametric model is also used in order to better control for confounding variables. Using the Poisson regression model, no significant relationships were found between the pollutants and the death rate. In the non-parametric model, a relationship was found between the concentration of particles on the day immediately prior to the date of death and the death rate, an effect which remains unchanged on including the autoregressive terms. Particle-based air pollution is a health risk despite the average levels of this pollutant falling within the air quality guideline levels in the city of Vigo.
Mura, Maria Chiara; De Felice, Marco; Morlino, Roberta; Fuselli, Sergio
2010-01-01
In step with the need to develop statistical procedures to manage small-size environmental samples, in this work we have used concentration values of benzene (C6H6), concurrently detected by seven outdoor and indoor monitoring stations over 12 000 minutes, in order to assess the representativeness of collected data and the impact of the pollutant on indoor environment. Clearly, the former issue is strictly connected to sampling-site geometry, which proves critical to correctly retrieving information from analysis of pollutants of sanitary interest. Therefore, according to current criteria for network-planning, single stations have been interpreted as nodes of a set of adjoining triangles; then, a) node pairs have been taken into account in order to estimate pollutant stationarity on triangle sides, as well as b) node triplets, to statistically associate data from air-monitoring with the corresponding territory area, and c) node sextuplets, to assess the impact probability of the outdoor pollutant on indoor environment for each area. Distributions from the various node combinations are all non-Gaussian; consequently, Kruskal-Wallis (KW) non-parametric statistics were used to test variability on continuous density function from each pair, triplet and sextuplet. Results from the above-mentioned statistical analysis have shown randomness of site selection, which has not allowed a reliable generalization of monitoring data to the entire selected territory, except for a single "forced" case (70%); most important, they suggest a possible procedure to optimize network design.
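For readers unfamiliar with the Kruskal-Wallis statistic used in this abstract, a minimal sketch of the H statistic follows. It assumes no correction for ties (tied values simply receive average ranks), and the function names and sample values are hypothetical rather than the study's benzene data.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), without tie correction."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        # Rank sum of this group within the pooled ranking.
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)

# Made-up concentrations from three hypothetical monitoring stations.
stations = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(kruskal_wallis_h(stations))
```

Under the null hypothesis of identical distributions, H is approximately chi-squared with one fewer degrees of freedom than the number of groups; with very small groups an exact or permutation reference distribution is usually preferred.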
Ponciano, José Miguel
2017-11-22
Using a nonparametric Bayesian approach, Palacios and Minin (2013) dramatically improved the accuracy and precision of Bayesian inference of population size trajectories from gene genealogies. These authors proposed an extension of a Gaussian Process (GP) nonparametric inferential method for the intensity function of non-homogeneous Poisson processes. They found that not only the statistical properties of the estimators were improved with their method, but also, that key aspects of the demographic histories were recovered. The authors' work represents the first Bayesian nonparametric solution to this inferential problem because they specify a convenient prior belief without a particular functional form on the population trajectory. Their approach works so well and provides such a profound understanding of the biological process, that the question arises as to how truly "biology-free" their approach really is. Using well-known concepts of stochastic population dynamics, here I demonstrate that in fact, Palacios and Minin's GP model can be cast as a parametric population growth model with density dependence and environmental stochasticity. Making this link between population genetics and stochastic population dynamics modeling provides novel insights into eliciting biologically meaningful priors for the trajectory of the effective population size. The results presented here also bring novel understanding of GP as models for the evolution of a trait. Thus, the ecological principles foundation of Palacios and Minin (2013)'s prior adds to the conceptual and scientific value of these authors' inferential approach. I conclude this note by listing a series of insights brought about by this connection with Ecology. Copyright © 2017 The Author. Published by Elsevier Inc. All rights reserved.
Chaotic map clustering algorithm for EEG analysis
NASA Astrophysics Data System (ADS)
Bellotti, R.; De Carlo, F.; Stramaglia, S.
2004-03-01
The non-parametric chaotic map clustering algorithm has been applied to the analysis of electroencephalographic signals, in order to recognize Huntington's disease, one of the most dangerous pathologies of the central nervous system. The performance of the method has been compared with that obtained through parametric algorithms, such as K-means and deterministic annealing, and a supervised multi-layer perceptron. While supervised neural networks need a training phase, performed by means of data tagged by the genetic test, and the parametric methods require a prior choice of the number of classes to find, the chaotic map clustering gives natural evidence of the pathological class, without any training or supervision, thus providing a new efficient methodology for the recognition of patterns affected by Huntington's disease.
Time Series Imputation via L1 Norm-Based Singular Spectrum Analysis
NASA Astrophysics Data System (ADS)
Kalantari, Mahdi; Yarmohammadi, Masoud; Hassani, Hossein; Silva, Emmanuel Sirimal
Missing values in time series data is a well-known and important problem which many researchers have studied extensively in various fields. In this paper, a new nonparametric approach for missing value imputation in time series is proposed. The main novelty of this research is applying the L1 norm-based version of Singular Spectrum Analysis (SSA), namely L1-SSA which is robust against outliers. The performance of the new imputation method has been compared with many other established methods. The comparison is done by applying them to various real and simulated time series. The obtained results confirm that the SSA-based methods, especially L1-SSA can provide better imputation in comparison to other methods.
Application of meta-analysis methods for identifying proteomic expression level differences.
Amess, Bob; Kluge, Wolfgang; Schwarz, Emanuel; Haenisch, Frieder; Alsaif, Murtada; Yolken, Robert H; Leweke, F Markus; Guest, Paul C; Bahn, Sabine
2013-07-01
We present new statistical approaches for identification of proteins with expression levels that are significantly changed when applying meta-analysis to two or more independent experiments. We showed that the Euclidean distance measure has reduced risk of false positives compared to the rank product method. Our Ψ-ranking method has advantages over the traditional fold-change approach by incorporating both the fold-change direction as well as the p-value. In addition, the second novel method, Π-ranking, considers the ratio of the fold-change and thus integrates all three parameters. We further improved the latter by introducing our third technique, Σ-ranking, which combines all three parameters in a balanced nonparametric approach. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Tests of Mediation: Paradoxical Decline in Statistical Power as a Function of Mediator Collinearity
Beasley, T. Mark
2013-01-01
Increasing the correlation between the independent variable and the mediator (a coefficient) increases the effect size (ab) for mediation analysis; however, increasing a by definition increases collinearity in mediation models. As a result, the standard error of product tests increases. The variance inflation due to increases in a at some point outweighs the increase of the effect size (ab) and results in a loss of statistical power. This phenomenon also occurs with nonparametric bootstrapping approaches because the variance of the bootstrap distribution of ab approximates the variance expected from normal theory. Both variances increase dramatically when a exceeds the b coefficient, thus explaining the power decline with increases in a. Implications for statistical analysis and applied researchers are discussed. PMID:24954952
Algorithm for Identifying Erroneous Rain-Gauge Readings
NASA Technical Reports Server (NTRS)
Rickman, Doug
2005-01-01
An algorithm analyzes rain-gauge data to identify statistical outliers that could be deemed to be erroneous readings. Heretofore, analyses of this type have been performed in burdensome manual procedures that have involved subjective judgements. Sometimes, the analyses have included computational assistance for detecting values falling outside of arbitrary limits. The analyses have been performed without statistically valid knowledge of the spatial and temporal variations of precipitation within rain events. In contrast, the present algorithm makes it possible to automate such an analysis, makes the analysis objective, takes account of the spatial distribution of rain gauges in conjunction with the statistical nature of spatial variations in rainfall readings, and minimizes the use of arbitrary criteria. The algorithm implements an iterative process that involves nonparametric statistics.
Nonparametric Bayesian models for a spatial covariance.
Reich, Brian J; Fuentes, Montserrat
2012-01-01
A crucial step in the analysis of spatial data is to estimate the spatial correlation function that determines the relationship between a spatial process at two locations. The standard approach to selecting the appropriate correlation function is to use prior knowledge or exploratory analysis, such as a variogram analysis, to select the correct parametric correlation function. Rather than selecting a particular parametric correlation function, we treat the covariance function as an unknown function to be estimated from the data. We propose a flexible prior for the correlation function to provide robustness to the choice of correlation function. We specify the prior for the correlation function using spectral methods and the Dirichlet process prior, which is a common prior for an unknown distribution function. Our model does not require Gaussian data or spatial locations on a regular grid. The approach is demonstrated using a simulation study as well as an analysis of California air pollution data.
Karakaya, Jale; Karabulut, Erdem; Yucel, Recai M.
2015-01-01
Modern statistical methods using incomplete data have been increasingly applied in a wide variety of substantive problems. Similarly, receiver operating characteristic (ROC) analysis, a method used in evaluating diagnostic tests or biomarkers in medical research, has also become increasingly popular in both its development and application. While missing-data methods have been applied in ROC analysis, the impact of model mis-specification and/or assumptions (e.g. missing at random) underlying the missing data has not been thoroughly studied. In this work, we study the performance of multiple imputation (MI) inference in ROC analysis. Particularly, we investigate parametric and non-parametric techniques for MI inference under common missingness mechanisms. Depending on the coherency of the imputation model with the underlying data generation mechanism, our results show that MI generally leads to well-calibrated inferences under ignorable missingness mechanisms. PMID:26379316
Regression analysis of informative current status data with the additive hazards model.
Zhao, Shishun; Hu, Tao; Ma, Ling; Wang, Peijie; Sun, Jianguo
2015-04-01
This paper discusses regression analysis of current status failure time data arising from the additive hazards model in the presence of informative censoring. Many methods have been developed for regression analysis of current status data under various regression models if the censoring is noninformative, and there also exists a large literature on parametric analysis of informative current status data in the context of tumorigenicity experiments. In this paper, a semiparametric maximum likelihood estimation procedure is presented, in which a copula model is employed to describe the relationship between the failure time of interest and the censoring time. Furthermore, I-splines are used to approximate the nonparametric functions involved and the asymptotic consistency and normality of the proposed estimators are established. A simulation study is conducted and indicates that the proposed approach works well for practical situations. An illustrative example is also provided.
NASA Astrophysics Data System (ADS)
Phuong, Vu Hung
2018-03-01
This research applies the Data Envelopment Analysis (DEA) approach to analyze Total Factor Productivity (TFP) and efficiency changes in the Vietnam coal mining industry from 2007 to 2013. The TFP of Vietnam coal mining companies decreased due to slow technological progress and unimproved efficiency. The decline of technical efficiency in many enterprises showed that the coal mining industry has a large potential to increase productivity through technical efficiency improvement. Enhancing human resource training and investment in technology and research & development could help the industry improve its efficiency and productivity.
Transfer pricing in hospitals and efficiency of physicians: the case of anesthesia services.
Kuntz, Ludwig; Vera, Antonio
2005-01-01
The objective is to investigate theoretically and empirically how the efficiency of the physicians involved in anesthesia and surgery can be optimized by the introduction of transfer pricing for anesthesia services. The anesthesiology data of approximately 57,000 operations carried out at the University Hospital Hamburg-Eppendorf (UKE) in Germany in the period from 2000 to 2002 are analyzed using parametric and non-parametric methods. The principal finding of the empirical analysis is that the efficiency of the physicians involved in anesthesia and surgery at the UKE improved after the introduction of transfer pricing.
[Trait variability in ontogenesis of epiphytic lichen Hypogymnia physodes (L.) Nyl].
Suetina, Iu G; Glotov, N V
2014-01-01
Ontogenesis of the foliose lichen Hypogymnia physodes has been described on the basis of the material obtained from natural populations. Ontogenetic dynamics (diameter of thallus and the number of lobes) and the features of reproductive structures (the number and diameter of labelloid and galeated sorales) were studied in ecologically different pine forests. We reasonably rejected the use of the variance analysis and nonparametric criteria for the result processing. It was shown that the median dynamics and trait variance may be either similar or different throughout the ontogenesis. The trait variances in ecologically different ecotopes were shown to be different.
Lucijanic, Marko; Petrovecki, Mladen
2012-01-01
Analyzing events over time is often complicated by incomplete, or censored, observations. Special non-parametric statistical methods were developed to overcome difficulties in summarizing and comparing censored data. The life-table (actuarial) method and the Kaplan-Meier method are described with an explanation of survival curves. For didactic purposes, the authors prepared a workbook based on the most widely used method, Kaplan-Meier. It should help the reader understand how the Kaplan-Meier method is conceptualized and how it can be used to obtain the statistics and survival curves needed to completely describe a sample of patients. The log-rank test and hazard ratio are also discussed.
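The product-limit calculation behind the Kaplan-Meier method can be sketched in a few lines. This is an illustrative implementation, not the authors' workbook; the function name, the data layout (1 for an observed event, 0 for a censored observation) and the sample values are assumptions made for the example.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one event.

    times  : follow-up time of each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            # Subjects censored at t still count as at risk at t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Made-up sample of six patients; a 0 marks a censored follow-up.
times = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

The survival estimate only drops at observed event times; censored subjects contribute by shrinking the number at risk for later times.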
Palmar dermatoglyphic patterns in twins.
Jacques, S M; Salzano, F M; Penña, H F
1977-01-01
The role of genetic factors in the determination of palmar dermatoglyphic patterns was investigated in a series of 49 MZ and 51 DZ twins, using Spearman's rank correlation and analysis of variance. Both methods indicated that the genetic effect in the distribution of patterns is highest in the interdigital III and lowest in the interdigital IV regions, the hypothenar and thenar showing intermediate values. As for interdigital II, no evaluation of genetic effects was possible using the nonparametric test, but the estimates of genetic variance indicate that inherited factors may play a relatively minor role in the pattern distribution of this area.
Analysis of quantitative data obtained from toxicity studies showing non-normal distribution.
Kobayashi, Katsumi
2005-05-01
The data obtained from toxicity studies are examined for homogeneity of variance, but, usually, they are not examined for normal distribution. In this study I examined the measured items of a carcinogenicity/chronic toxicity study with rats for both homogeneity of variance and normal distribution. It was observed that a lot of hematology and biochemistry items showed non-normal distribution. For testing normal distribution of the data obtained from toxicity studies, the data of the concurrent control group may be examined, and for the data that show a non-normal distribution, non-parametric tests with robustness may be applied.
Twenty-five years of maximum-entropy principle
NASA Astrophysics Data System (ADS)
Kapur, J. N.
1983-04-01
The strengths and weaknesses of the maximum entropy principle (MEP) are examined and some challenging problems that remain outstanding at the end of the first quarter century of the principle are discussed. The original formalism of the MEP is presented and its relationship to statistical mechanics is set forth. The use of MEP for characterizing statistical distributions, in statistical inference, nonlinear spectral analysis, transportation models, population density models, models for brand-switching in marketing and vote-switching in elections is discussed. Its application to finance, insurance, image reconstruction, pattern recognition, operations research and engineering, biology and medicine, and nonparametric density estimation is considered.
Minimum distance classification in remote sensing
NASA Technical Reports Server (NTRS)
Wacker, A. G.; Landgrebe, D. A.
1972-01-01
The utilization of minimum distance classification methods in remote sensing problems, such as crop species identification, is considered. Literature concerning both minimum distance classification problems and distance measures is reviewed. Experimental results are presented for several examples. The objective of these examples is to: (a) compare the sample classification accuracy of a minimum distance classifier, with the vector classification accuracy of a maximum likelihood classifier, and (b) compare the accuracy of a parametric minimum distance classifier with that of a nonparametric one. Results show the minimum distance classifier performance is 5% to 10% better than that of the maximum likelihood classifier. The nonparametric classifier is only slightly better than the parametric version.
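As a rough illustration of the minimum distance idea, the sketch below assigns a vector to the class whose mean is nearest. It assumes Euclidean distance to class means, which is only the simplest variant; the report reviews other distance measures and classifies whole samples rather than single vectors. The function names and the two-band "wheat"/"corn" data are hypothetical.

```python
import math

def fit_class_means(samples, labels):
    """Compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def classify(x, means):
    """Return the label whose class mean is closest to x (Euclidean distance)."""
    return min(means, key=lambda y: math.dist(x, means[y]))

# Hypothetical two-band measurements for two crop classes.
samples = [(1.0, 2.0), (1.2, 1.8), (4.0, 4.2), (3.8, 4.0)]
labels = ["wheat", "wheat", "corn", "corn"]
means = fit_class_means(samples, labels)
print(classify((1.5, 2.0), means))
print(classify((3.5, 3.9), means))
```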
Computation of nonparametric convex hazard estimators via profile methods.
Jankowski, Hanna K; Wellner, Jon A
2009-05-01
This paper proposes a profile likelihood algorithm to compute the nonparametric maximum likelihood estimator of a convex hazard function. The maximisation is performed in two steps: First the support reduction algorithm is used to maximise the likelihood over all hazard functions with a given point of minimum (or antimode). Then it is shown that the profile (or partially maximised) likelihood is quasi-concave as a function of the antimode, so that a bisection algorithm can be applied to find the maximum of the profile likelihood, and hence also the global maximum. The new algorithm is illustrated using both artificial and real data, including lifetime data for Canadian males and females.
Kerschbamer, Rudolf
2015-05-01
This paper proposes a geometric delineation of distributional preference types and a non-parametric approach for their identification in a two-person context. It starts with a small set of assumptions on preferences and shows that this set (i) naturally results in a taxonomy of distributional archetypes that nests all empirically relevant types considered in previous work; and (ii) gives rise to a clean experimental identification procedure - the Equality Equivalence Test - that discriminates between archetypes according to core features of preferences rather than properties of specific modeling variants. As a by-product the test yields a two-dimensional index of preference intensity.
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2010-07-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.
Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang
2013-01-01
We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286
Stochastic Earthquake Rupture Modeling Using Nonparametric Co-Regionalization
NASA Astrophysics Data System (ADS)
Lee, Kyungbook; Song, Seok Goo
2017-09-01
Accurate predictions of the intensity and variability of ground motions are essential in simulation-based seismic hazard assessment. Advanced simulation-based ground motion prediction methods have been proposed to complement the empirical approach, which suffers from the lack of observed ground motion data, especially in the near-source region for large events. It is important to quantify the variability of the earthquake rupture process for future events and to produce a number of rupture scenario models to capture the variability in simulation-based ground motion predictions. In this study, we improved the previously developed stochastic earthquake rupture modeling method by applying nonparametric co-regionalization, which was proposed in geostatistics, to the correlation models estimated from dynamically derived earthquake rupture models. The nonparametric approach adopted in this study is computationally efficient and, therefore, enables us to simulate numerous rupture scenarios, including large events (M > 7.0). It also gives us an opportunity to check the shape of true input correlation models in stochastic modeling after being deformed for permissibility. We expect that this type of modeling will improve our ability to simulate a wide range of rupture scenario models and thereby predict ground motions and perform seismic hazard assessment more accurately.
Bivariate discrete beta Kernel graduation of mortality data.
Mazza, Angelo; Punzo, Antonio
2015-07-01
Various parametric and nonparametric techniques have been proposed in the literature to graduate mortality data as a function of age. Nonparametric approaches, such as kernel smoothing regression, are often preferred because they do not assume any particular mortality law. Among the existing kernel smoothing approaches, the recently proposed (univariate) discrete beta kernel smoother has been shown to provide some benefits. Bivariate graduation, over age and calendar years or durations, is common practice in demography and actuarial sciences. In this paper, we generalize the discrete beta kernel smoother to the bivariate case, and we introduce an adaptive bandwidth variant that may provide additional benefits when data on exposures to the risk of death are available; furthermore, we outline a cross-validation procedure for bandwidth selection. Using simulation studies, we compare the bivariate approach proposed here with its corresponding univariate formulation and with two popular nonparametric bivariate graduation techniques, based on Epanechnikov kernels and on P-splines. To make simulations realistic, a bivariate dataset, based on probabilities of dying recorded for US males, is used. Simulations have confirmed the gain in performance of the new bivariate approach with respect to both the univariate and the bivariate competitors.
Statistical methods used in articles published by the Journal of Periodontal and Implant Science.
Choi, Eunsil; Lyu, Jiyoung; Park, Jinyoung; Kim, Hae-Young
2014-12-01
The purposes of this study were to assess the trend of use of statistical methods including parametric and nonparametric methods and to evaluate the use of complex statistical methodology in recent periodontal studies. This study analyzed 123 articles published in the Journal of Periodontal & Implant Science (JPIS) between 2010 and 2014. Frequencies and percentages were calculated according to the number of statistical methods used, the type of statistical method applied, and the type of statistical software used. Most of the published articles considered (64.4%) used statistical methods. Since 2011, the percentage of JPIS articles using statistics has increased. On the basis of multiple counting, we found that the percentage of studies in JPIS using parametric methods was 61.1%. Further, complex statistical methods were applied in only 6 of the published studies (5.0%), and nonparametric statistical methods were applied in 77 of the published studies (38.9% of a total of 198 studies considered). We found an increasing trend towards the application of statistical methods and nonparametric methods in recent periodontal studies and thus, concluded that increased use of complex statistical methodology might be preferred by the researchers in the fields of study covered by JPIS.
Maximum Likelihood Estimations and EM Algorithms with Length-biased Data
Qin, Jing; Ning, Jing; Liu, Hao; Shen, Yu
2012-01-01
SUMMARY Length-biased sampling has been well recognized in economics, industrial reliability, etiology applications, epidemiological, genetic and cancer screening studies. Length-biased right-censored data have a unique data structure different from traditional survival data. The nonparametric and semiparametric estimations and inference methods for traditional survival data are not directly applicable for length-biased right-censored data. We propose new expectation-maximization algorithms for estimations based on full likelihoods involving infinite dimensional parameters under three settings for length-biased data: estimating nonparametric distribution function, estimating nonparametric hazard function under an increasing failure rate constraint, and jointly estimating baseline hazards function and the covariate coefficients under the Cox proportional hazards model. Extensive empirical simulation studies show that the maximum likelihood estimators perform well with moderate sample sizes and lead to more efficient estimators compared to the estimating equation approaches. The proposed estimates are also more robust to various right-censoring mechanisms. We prove the strong consistency properties of the estimators, and establish the asymptotic normality of the semi-parametric maximum likelihood estimators under the Cox model using modern empirical processes theory. We apply the proposed methods to a prevalent cohort medical study. Supplemental materials are available online. PMID:22323840
The Statistical Consulting Center for Astronomy (SCCA)
NASA Technical Reports Server (NTRS)
Akritas, Michael
2001-01-01
The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like χ² minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigation, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers in using sophisticated tools and in matching these tools to specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression; density estimation and smoothing; general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkably high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.
Reidpath, Daniel D; Ahmadi, Keivan
2014-01-01
Background Measures of household socio-economic position (SEP) are widely used in health research. There exist a number of approaches to their measurement, with Principal Components Analysis (PCA) applied to a basket of household assets being one of the most common. PCA, however, carries a number of assumptions about the distribution of the data which may be untenable, and alternative, non-parametric, approaches may be preferred. Mokken scale analysis is a non-parametric, item response theory approach to scale development which appears never to have been applied to household asset data. A Mokken scale can be used to rank order items (measures of wealth) as well as households. Using data on household asset ownership from a national sample of 4,154 consenting households in the World Health Survey from Vietnam, 2003, we construct two measures of household SEP. Seventeen items asking about assets, and utility and infrastructure use were used. Mokken Scaling and PCA were applied to the data. A single item measure of total household expenditure is used as a point of contrast. Results An 11 item scale, out of the 17 items, was identified that conformed to the assumptions of a Mokken Scale. All the items in the scale were identified as strong items (Hi > .5). Two PCA measures of SEP were developed as a point of contrast. One PCA measure was developed using all 17 available asset items, the other used the reduced set of 11 items identified in the Mokken scale analysis. The Mokken Scale measure of SEP and the 17 item PCA measure had a very high correlation (r = .98), and they both correlated moderately with total household expenditure: r = .59 and r = .57 respectively. In contrast the 11 item PCA measure correlated moderately with the Mokken scale (r = .68), and weakly with the total household expenditure (r = .18).
Conclusion The Mokken scale measure of household SEP performed at least as well as PCA, and outperformed the PCA measure developed with the 11 items used in the Mokken scale. Unlike PCA, Mokken scaling carries no assumptions about the underlying shape of the distribution of the data, and can be used simultaneously to order household SEP and items. The approach, however, has not been tested with data from other countries and remains an interesting, but under-researched approach. PMID:25126103
Heidema, A Geert; Boer, Jolanda M A; Nagelkerke, Nico; Mariman, Edwin C M; van der A, Daphne L; Feskens, Edith J M
2006-04-21
Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN) and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods to approach association studies with large numbers of predictor variables. GPNN on the other hand may be a useful approach to select and model important predictors, but its performance to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight in combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. 
As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.
Biostatistics Series Module 3: Comparing Groups: Numerical Variables.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Numerical data that are normally distributed can be analyzed with parametric tests, that is, tests which are based on the parameters that define a normal distribution curve. If the distribution is uncertain, the data can be plotted as a normal probability plot and visually inspected, or tested for normality using one of a number of goodness of fit tests, such as the Kolmogorov-Smirnov test. The widely used Student's t-test has three variants. The one-sample t-test is used to assess if a sample mean (as an estimate of the population mean) differs significantly from a given population mean. The means of two independent samples may be compared for a statistically significant difference by the unpaired or independent samples t-test. If the data sets are related in some way, their means may be compared by the paired or dependent samples t-test. The t-test should not be used to compare the means of more than two groups. Although it is possible to compare groups in pairs, when there are more than two groups, this will increase the probability of a Type I error. The one-way analysis of variance (ANOVA) is employed to compare the means of three or more independent data sets that are normally distributed. Multiple measurements from the same set of subjects cannot be treated as separate, unrelated data sets. Comparison of means in such a situation requires repeated measures ANOVA. It is to be noted that while a multiple group comparison test such as ANOVA can point to a significant difference, it does not identify exactly between which two groups the difference lies. To do this, multiple group comparison needs to be followed up by an appropriate post hoc test. An example is the Tukey's honestly significant difference test following ANOVA. If the assumptions for parametric tests are not met, there are nonparametric alternatives for comparing data sets. 
These include Mann-Whitney U-test as the nonparametric counterpart of the unpaired Student's t-test, Wilcoxon signed-rank test as the counterpart of the paired Student's t-test, Kruskal-Wallis test as the nonparametric equivalent of ANOVA and the Friedman's test as the counterpart of repeated measures ANOVA.
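The pairing of parametric tests with their nonparametric counterparts described above can be summarized as a small lookup, sketched here in Python. The function name `suggest_test` and its three arguments are hypothetical; the sketch only encodes the correspondences stated in the abstract for two or more groups and is not a substitute for checking each test's assumptions.

```python
def suggest_test(n_groups, paired, normal):
    """Suggest a test for comparing the means (or locations) of n_groups groups.

    n_groups -- number of data sets being compared (2 or more)
    paired   -- True if the data sets are related (same subjects measured repeatedly)
    normal   -- True if the assumptions for parametric tests are met
    """
    if n_groups < 2:
        raise ValueError("this sketch covers comparisons of two or more groups")
    if n_groups == 2:
        if normal:
            return "paired t-test" if paired else "unpaired t-test"
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    # Three or more groups: pairwise t-tests would inflate the Type I error.
    if normal:
        return "repeated measures ANOVA" if paired else "one-way ANOVA"
    return "Friedman test" if paired else "Kruskal-Wallis test"

print(suggest_test(2, paired=False, normal=False))
print(suggest_test(4, paired=True, normal=True))
```

A significant result from any of the multiple-group tests would still need a post hoc test to locate the differing groups, as the abstract notes.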
Maiorino, Leonardo; Farke, Andrew A; Kotsakis, Tassos; Teresi, Luciano; Piras, Paolo
2015-11-01
Ceratopsidae represents a group of quadrupedal herbivorous dinosaurs that inhabited western North America and eastern Asia during the Late Cretaceous. Although horns and frills of the cranium are highly variable across species, the lower jaw historically has been considered to be relatively conservative in morphology. Here, the lower jaws from 58 specimens representing 21 ceratopsoid taxa were sampled, using geometric morphometrics and 2D finite element analysis (FEA) to explore differences in morphology and mechanical performance across Ceratopsoidea (the clade including Ceratopsidae, Turanoceratops and Zuniceratops). Principal component analyses and non-parametric permuted manovas highlight Triceratopsini as a morphologically distinct clade within the sample. A relatively robust and elongate dentary, a larger and more elongated coronoid process, and a small and dorso-ventrally compressed angular characterize this clade, as well as the absolutely larger size. By contrast, non-triceratopsin chasmosaurines, Centrosaurini and Pachyrhinosaurini have similar morphologies to each other. Zuniceratops and Avaceratops are distinct from other taxa. No differences in size between Pachyrhinosaurini and Centrosaurini are recovered using non-parametric permuted anovas. Structural performance, as evaluated using a 2D FEA, is similar across all groups as measured by overall stress, with the exception of Triceratopsini. Shape, size and stress are phylogenetically constrained. A longer dentary as well as a long coronoid process result in a lower jaw that is reconstructed as relatively much more stressed in triceratopsins. © 2015 Anatomical Society.
Tao, Chenyang; Nichols, Thomas E.; Hua, Xue; Ching, Christopher R.K.; Rolls, Edmund T.; Thompson, Paul M.; Feng, Jianfeng
2017-01-01
We propose a generalized reduced rank latent factor regression model (GRRLF) for the analysis of tensor field responses and high dimensional covariates. The model is motivated by the need from imaging-genetic studies to identify genetic variants that are associated with brain imaging phenotypes, often in the form of high dimensional tensor fields. GRRLF identifies from the structure in the data the effective dimensionality of the data, and then jointly performs dimension reduction of the covariates, dynamic identification of latent factors, and nonparametric estimation of both covariate and latent response fields. After accounting for the latent and covariate effects, GRRLF performs a nonparametric test on the remaining factor of interest. GRRLF provides a better factorization of the signals compared with common solutions, and is less susceptible to overfitting because it exploits the effective dimensionality. The generality and the flexibility of GRRLF also allow various statistical models to be handled in a unified framework and solutions can be efficiently computed. Within the field of neuroimaging, it improves the sensitivity for weak signals and is a promising alternative to existing approaches. The operation of the framework is demonstrated with both synthetic datasets and a real-world neuroimaging example in which the effects of a set of genes on the structure of the brain at the voxel level were measured, and the results compared favorably with those from existing approaches. PMID:27666385
Shkvarko, Yuriy; Tuxpan, José; Santos, Stewart
2011-01-01
We consider a problem of high-resolution array radar/SAR imaging formalized in terms of a nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) of the random wavefield scattered from a remotely sensed scene observed through a kernel signal formation operator and contaminated with random Gaussian noise. First, the Sobolev-type solution space is constructed to specify the class of consistent kernel SSP estimators with the reproducing kernel structures adapted to the metrics in such a solution space. Next, the “model-free” variational analysis (VA)-based image enhancement approach and the “model-based” descriptive experiment design (DEED) regularization paradigm are unified into a new dynamic experiment design (DYED) regularization framework. Application of the proposed DYED framework to the adaptive array radar/SAR imaging problem leads to a class of two-level (DEED-VA) regularized SSP reconstruction techniques that aggregate the kernel adaptive anisotropic windowing with the projections onto convex sets to enforce the consistency and robustness of the overall iterative SSP estimators. We also show how the proposed DYED regularization method may be considered as a generalization of the MVDR, APES and other high-resolution nonparametric adaptive radar sensing techniques. A family of DYED-related algorithms is constructed and their effectiveness is finally illustrated via numerical simulations. PMID:22163859
Villanueva, Pia; Newbury, Dianne F; Jara, Lilian; De Barbieri, Zulema; Mirza, Ghazala; Palomino, Hernán M; Fernández, María Angélica; Cazier, Jean-Baptiste; Monaco, Anthony P; Palomino, Hernán
2011-01-01
Specific language impairment (SLI) is an unexpected deficit in the acquisition of language skills and affects between 5 and 8% of pre-school children. Despite its prevalence and high heritability, our understanding of the aetiology of this disorder is only emerging. In this paper, we apply genome-wide techniques to investigate an isolated Chilean population who exhibit an increased frequency of SLI. Loss of heterozygosity (LOH) mapping and parametric and non-parametric linkage analyses indicate that complex genetic factors are likely to underlie susceptibility to SLI in this population. Across all analyses performed, the most consistently implicated locus was on chromosome 7q. This locus achieved highly significant linkage under all three non-parametric models (max NPL = 6.73, P = 4.0 × 10⁻¹¹). In addition, it yielded a HLOD of 1.24 in the recessive parametric linkage analyses and contained a segment that was homozygous in two affected individuals. Further, investigation of this region identified a two-SNP haplotype that occurs at an increased frequency in language-impaired individuals (P = 0.008). We hypothesise that the linkage regions identified here, in particular that on chromosome 7, may contain variants that underlie the high prevalence of SLI observed in this isolated population and may be of relevance to other populations affected by language impairments. PMID:21248734
Repeat sample intraocular pressure variance in induced and naturally ocular hypertensive monkeys.
Dawson, William W; Dawson, Judyth C; Hope, George M; Brooks, Dennis E; Percicot, Christine L
2005-12-01
To compare the repeat-sample mean variance of laser-induced ocular hypertension (OH) in rhesus monkeys with the repeat-sample mean variance of natural OH in age-range-matched monkeys of similar and dissimilar pedigrees. Multiple monocular, retrospective intraocular pressure (IOP) measures were recorded repeatedly during a short sampling interval (SSI, 1-5 months) and a long sampling interval (LSI, 6-36 months). There were 5-13 eyes in each SSI and LSI subgroup. Each interval contained subgroups of Florida monkeys with natural hypertension (NHT), Florida monkeys with induced hypertension (IHT1), unrelated (Strasbourg, France) induced hypertensives (IHT2), and Florida age-range-matched controls (C). Repeat-sample individual variance means and related IOPs were analyzed by a parametric analysis of variance (ANOV), and the results were compared with those of a non-parametric Kruskal-Wallis ANOV. As designed, all group intraocular pressure distributions were significantly different (P ≤ 0.009) except for the two (Florida/Strasbourg) induced OH groups. A parametric 2 x 4 design ANOV for mean variance showed large significant effects due to treatment group and sampling interval. Similar results were produced by the nonparametric ANOV. Induced OH sample variance (LSI) was 43x the natural OH sample variance-mean. The same relationship for the SSI was 12x. Laser-induced ocular hypertension in rhesus monkeys produces large IOP repeat-sample variance mean results compared to controls and natural OH.
Reconstruction of cosmological matter perturbations in modified gravity
NASA Astrophysics Data System (ADS)
Gonzalez, J. E.
2017-12-01
The analysis of perturbative quantities is a powerful tool to distinguish between different dark energy models and gravity theories that are degenerate at the background level. In this work, we generalize the integral solution of the matter density contrast for general relativity gravity [V. Sahni and A. Starobinsky, Int. J. Mod. Phys. D 15, 2105 (2006), 10.1142/S0218271806009704; U. Alam, V. Sahni, and A. A. Starobinsky, Astrophys. J. 704, 1086 (2009), 10.1088/0004-637X/704/2/1086] to a wide class of modified gravity (MG) theories. To calculate this solution, it is necessary to have prior knowledge of the Hubble rate, the density parameter at the present epoch (Ωm0), and the functional form of the effective Newton's constant that characterizes the gravity theory. We estimate in a model-independent way the Hubble expansion rate by applying a nonparametric reconstruction method to model-independent cosmic chronometer data and high-z quasar data. In order to compare our generalized solution of the matter density contrast, using the nonparametric reconstruction of H(z) from observational data, with a purely theoretical one, we choose a parametrization of the screened modified gravity and the Ωm0 from the WMAP-9 Collaboration. Finally, we calculate the growth index for the analyzed cases, finding very good agreement between theoretical values and those obtained using the approach presented in this work.
Karak, Tanmoy; Paul, Ranjit Kumar; Kutu, Funso Raphael; Mehra, Aradhana; Khare, Puja; Dutta, Amrit Kumar; Bora, Krishnamoni; Boruah, Romesh Kumar
2017-02-01
The current study aims to assess the infusion pattern of three important micronutrients, namely copper (Cu), iron (Fe), and zinc (Zn), from black tea samples produced in Assam (India) and Thohoyandou (South Africa). Average daily intakes and hazard quotients were reported for these micronutrients. Total content for Cu, Fe, and Zn varied from 2.25 to 48.82 mg kg⁻¹, 14.75 to 148.18 mg kg⁻¹, and 28.48 to 106.68 mg kg⁻¹, respectively. The average contents of each of the three micronutrients were higher in tea leaf samples collected from South Africa than in those from India, while the contents in tea infusions were higher in Indian samples than in South African samples. Results of this study revealed that the consumption of 600 mL tea infusion produced from 24 g of made tea per day may be beneficial to humans in terms of the content of these micronutrients. Application of nonparametric tests revealed that most of the data sets do not satisfy the normality assumption. Hence, both parametric and nonparametric statistical analyses were used, and they revealed significant differences in elemental contents between Indian and South African tea.
Observed changes in relative humidity and dew point temperature in coastal regions of Iran
NASA Astrophysics Data System (ADS)
Hosseinzadeh Talaee, P.; Sabziparvar, A. A.; Tabari, Hossein
2012-12-01
The analysis of trends in hydroclimatic parameters and the assessment of their statistical significance have recently received considerable attention as a way to clarify whether or not there is an obvious climate change. In the current study, parametric linear regression and nonparametric Mann-Kendall tests were applied to detect annual and seasonal trends in the relative humidity (RH) and dew point temperature (T dew) time series at ten coastal weather stations in Iran during 1966-2005. The serial structure of the data was considered, and the significant serial correlations were eliminated using the trend-free pre-whitening method. The results showed that annual RH increased by 1.03 and 0.28 %/decade at the northern and southern coastal regions of the country, respectively, while annual T dew increased by 0.29 and 0.15 °C/decade at the northern and southern regions, respectively. Significant trends were frequent in the T dew series, but they were observed in only 2 of the 50 RH series. The difference between the results of the parametric and nonparametric tests was small, although the parametric test detected larger significant trends in the RH and T dew time series. Furthermore, the differences between the results of the trend tests were not related to the normality of the statistical distribution.
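As a rough illustration of the nonparametric side of this comparison, the Mann-Kendall statistic can be sketched in a few lines of Python. This is a minimal sketch only: the function name is hypothetical, the variance formula assumes no tied values, and the trend-free pre-whitening step used in the study is not reproduced.

```python
import math
from itertools import combinations


def mann_kendall(series):
    """Two-sided Mann-Kendall trend test.

    Assumptions of this sketch: no correction for ties and no
    pre-whitening for serial correlation.
    """
    n = len(series)
    # S = (number of increasing pairs) - (number of decreasing pairs),
    # taken over all pairs in time order.
    s = sum((later > earlier) - (later < earlier)
            for earlier, later in combinations(series, 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Normal approximation with the usual continuity correction.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value
```

Calling `mann_kendall` on an annual series returns the S statistic, the standardized Z score, and a two-sided p-value; a positive S with a small p-value suggests an increasing trend.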
Methods of analysis speech rate: a pilot study.
Costa, Luanna Maria Oliveira; Martins-Reis, Vanessa de Oliveira; Celeste, Letícia Côrrea
2016-01-01
To describe the performance of fluent adults in different measures of speech rate. The study included 24 fluent adults, of both genders, speakers of Brazilian Portuguese, who were born and still living in the metropolitan region of Belo Horizonte, state of Minas Gerais, aged between 18 and 59 years. Participants were grouped by age: G1 (18-29 years), G2 (30-39 years), G3 (40-49 years), and G4 (50-59 years). The speech samples were obtained following the methodology of the Speech Fluency Assessment Protocol. In addition to the measures of speech rate proposed by the protocol (speech rate in words and syllables per minute), the rate of speech into phonemes per second and the articulation rate with and without the disfluencies were calculated. We used the nonparametric Friedman test and the Wilcoxon test for multiple comparisons. Groups were compared using the nonparametric Kruskal Wallis. The significance level was of 5%. There were significant differences between measures of speech rate involving syllables. The multiple comparisons showed that all the three measures were different. There was no effect of age for the studied measures. These findings corroborate previous studies. The inclusion of temporal acoustic measures such as speech rate in phonemes per second and articulation rates with and without disfluencies can be a complementary approach in the evaluation of speech rate.
NASA Astrophysics Data System (ADS)
Diakogiannis, Foivos I.; Lewis, Geraint F.; Ibata, Rodrigo A.; Guglielmo, Magda; Kafle, Prajwal R.; Wilkinson, Mark I.; Power, Chris
2017-09-01
Dwarf galaxies, among the most dark matter dominated structures of our Universe, are excellent test-beds for dark matter theories. Unfortunately, mass modelling of these systems suffers from the well-documented mass-velocity anisotropy degeneracy. For the case of spherically symmetric systems, we describe a method for non-parametric modelling of the radial and tangential velocity moments. The method is a numerical velocity anisotropy 'inversion', with parametric mass models, where the radial velocity dispersion profile, σ_{rr}^2, is modelled as a B-spline, and the optimization is a three-step process that consists of (I) an evolutionary modelling to determine the mass model form and the best B-spline basis to represent σ_{rr}^2; (II) an optimization of the smoothing parameters and (III) a Markov chain Monte Carlo analysis to determine the physical parameters. The mass-anisotropy degeneracy is reduced into mass model inference, irrespective of kinematics. We test our method using synthetic data. Our algorithm constructs the best kinematic profile and discriminates between competing dark matter models. We apply our method to the Fornax dwarf spheroidal galaxy. Using a King brightness profile and testing various dark matter mass models, our model inference favours a simple mass-follows-light system. We find that the anisotropy profile of Fornax is tangential (β(r) < 0) and we estimate a total mass of M_{tot} = 1.613^{+0.050}_{-0.075} × 10^8 M_{⊙}, and a mass-to-light ratio of Υ_V = 8.93^{+0.32}_{-0.47} (M_{⊙}/L_{⊙}). The algorithm we present is a robust and computationally inexpensive method for non-parametric modelling of spherical clusters independent of the mass-anisotropy degeneracy.
Khan, Anzalee; Lindenmayer, Jean-Pierre; Opler, Mark; Yavorsky, Christian; Rothman, Brian; Lucic, Luka
2013-10-01
Debate persists with regard to how best to categorize the syndromal dimension of negative symptoms in schizophrenia. The aim was first to review published Principal Components Analyses (PCAs) of the PANSS and extract the items most frequently included in the negative domain, and second to examine the quality of items using Item Response Theory (IRT) to select items that best represent a measurable dimension (or dimensions) of negative symptoms. First, 22 factor analyses and PCAs were included. Second, using a large dataset (n=7187) of participants in clinical trials with chronic schizophrenia, we extracted items loading on one or more PCA. Third, items not loading with a value of ≥ 0.5, or loading on more than one component with values of ≥ 0.5, were discarded. Fourth, resulting items were included in a non-parametric IRT and retained based on Option Characteristic Curves (OCCs) and Item Characteristic Curves (ICCs). Fifteen items loaded on a negative domain in at least one study, with Emotional Withdrawal loading in all studies. Non-parametric IRT retained nine items as an Integrated Negative Factor: Emotional Withdrawal, Blunted Affect, Passive/Apathetic Social Withdrawal, Poor Rapport, Lack of Spontaneity/Conversation Flow, Active Social Avoidance, Disturbance of Volition, Stereotyped Thinking and Difficulty in Abstract Thinking. This is the first study to use a psychometric IRT process to arrive at a set of negative symptom items. Future steps will include further examination of these nine items in terms of their stability, sensitivity to change, and correlations with functional and cognitive outcomes. © 2013 Elsevier B.V. All rights reserved.
Descriptive quantitative analysis of hallux abductovalgus transverse plane radiographic parameters.
Meyr, Andrew J; Myers, Adam; Pontious, Jane
2014-01-01
Although the transverse plane radiographic parameters of the first intermetatarsal angle (IMA), hallux abductus angle (HAA), and the metatarsal-sesamoid position (MSP) form the basis of preoperative procedure selection and postoperative surgical evaluation of the hallux abductovalgus deformity, the so-called normal values of these measurements have not been well established. The objectives of the present study were to (1) evaluate the descriptive statistics of the first IMA, HAA, and MSP from a large patient population and (2) to determine an objective basis for defining "normal" versus "abnormal" measurements. Anteroposterior foot radiographs from 373 consecutive patients without a history of previous foot and ankle surgery and/or trauma were evaluated for the measurements of the first IMA, HAA, and MSP. The results revealed a mean measurement of 9.93°, 17.59°, and position 3.63 for the first IMA, HAA, and MSP, respectively. An advanced descriptive analysis demonstrated data characteristics of both parametric and nonparametric distributions. Furthermore, clear differentiations in deformity progression were appreciated when the variables were graphically depicted against each other. This could represent a quantitative basis for defining "normal" versus "abnormal" values. From the results of the present study, we have concluded that these radiographic parameters can be more conservatively reported and analyzed using nonparametric descriptive and comparative statistics within medical studies and that the combination of a first IMA, HAA, and MSP at or greater than approximately 10°, 18°, and position 4, respectively, appears to be an objective "tipping point" in terms of deformity progression and might represent an upper limit of acceptable in terms of surgical deformity correction. Copyright © 2014 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Rajwa, Bartek; Wallace, Paul K.; Griffiths, Elizabeth A.; Dundar, Murat
2017-01-01
Objective Flow cytometry (FC) is a widely acknowledged technology in diagnosis of acute myeloid leukemia (AML) and has been indispensable in determining progression of the disease. Although FC plays a key role as a post-therapy prognosticator and evaluator of therapeutic efficacy, the manual analysis of cytometry data is a barrier to optimization of reproducibility and objectivity. This study investigates the utility of our recently introduced non-parametric Bayesian framework in accurately predicting the direction of change in disease progression in AML patients using FC data. Methods The highly flexible non-parametric Bayesian model based on the infinite mixture of infinite Gaussian mixtures is used for jointly modeling data from multiple FC samples to automatically identify functionally distinct cell populations and their local realizations. Phenotype vectors are obtained by characterizing each sample by the proportions of recovered cell populations, which are in turn used to predict the direction of change in disease progression for each patient. Results We used 200 diseased and non-diseased immunophenotypic panels for training and tested the system with 36 additional AML cases collected at multiple time points. The proposed framework identified the change in direction of disease progression with accuracies of 90% (9 out of 10) for relapsing cases and 100% (26 out of 26) for the remaining cases. Conclusions We believe that these promising results are an important first step towards the development of automated predictive systems for disease monitoring and continuous response evaluation. Significance Automated measurement and monitoring of therapeutic response is critical not only for objective evaluation of disease status prognosis but also for timely assessment of treatment strategies. PMID:27416585
A non-parametric peak calling algorithm for DamID-Seq.
Li, Renhua; Hempel, Leonie U; Jiang, Tingbo
2015-01-01
Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data, to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
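The four steps listed in this abstract can be sketched loosely in Python. This is an illustrative sketch, not the authors' NPPC implementation: it works on hypothetical per-bin read counts, all function and parameter names are invented for the example, and a fixed fold-change cutoff stands in for the statistically derived threshold that the abstract mentions but does not specify.

```python
import random


def bootstrap_background(control_counts, n_boot=200, seed=0):
    """Step 1: average per-bin background from bootstrap resamples of control reads."""
    rng = random.Random(seed)
    # Expand bin counts into one bin index per read so reads can be resampled.
    reads = [b for b, c in enumerate(control_counts) for _ in range(c)]
    totals = [0] * len(control_counts)
    for _ in range(n_boot):
        for b in rng.choices(reads, k=len(reads)):
            totals[b] += 1
    return [t / n_boot for t in totals]


def call_peaks(treat_counts, control_counts,
               min_reads=5, fold_cutoff=4.0, pseudocount=1.0):
    """Steps 2-4: scale, compute fold change, filter, and threshold.

    min_reads, fold_cutoff and pseudocount are assumed values for
    illustration only.
    """
    background = bootstrap_background(control_counts)
    # Step 2: scale the background to the treatment library size.
    scale = sum(treat_counts) / sum(control_counts)
    peaks = []
    for b, (t, bg) in enumerate(zip(treat_counts, background)):
        # Step 3: drop bins with too few treatment reads.
        if t < min_reads:
            continue
        fold = (t + pseudocount) / (bg * scale + pseudocount)
        # Step 4: keep bins whose signal-to-noise fold change passes the cutoff.
        if fold >= fold_cutoff:
            peaks.append((b, fold))
    return peaks
```

`call_peaks` returns a list of `(bin_index, fold_change)` pairs. Averaging over bootstrap resamples smooths the control signal, which is the role the abstract assigns to resampling.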
Zou, Kelly H; Resnic, Frederic S; Talos, Ion-Florin; Goldberg-Zimring, Daniel; Bhagwat, Jui G; Haker, Steven J; Kikinis, Ron; Jolesz, Ferenc A; Ohno-Machado, Lucila
2005-10-01
Medical classification accuracy studies often yield continuous data based on predictive models for treatment outcomes. A popular method for evaluating the performance of diagnostic tests is the receiver operating characteristic (ROC) curve analysis. The main objective was to develop a global statistical hypothesis test for assessing the goodness-of-fit (GOF) for parametric ROC curves via the bootstrap. A simple log (or logit) and a more flexible Box-Cox normality transformations were applied to untransformed or transformed data from two clinical studies to predict complications following percutaneous coronary interventions (PCIs) and for image-guided neurosurgical resection results predicted by tumor volume, respectively. We compared a non-parametric with a parametric binormal estimate of the underlying ROC curve. To construct such a GOF test, we used the non-parametric and parametric areas under the curve (AUCs) as the metrics, with a resulting p value reported. In the interventional cardiology example, logit and Box-Cox transformations of the predictive probabilities led to satisfactory AUCs (AUC=0.888; p=0.78, and AUC=0.888; p=0.73, respectively), while in the brain tumor resection example, log and Box-Cox transformations of the tumor size also led to satisfactory AUCs (AUC=0.898; p=0.61, and AUC=0.899; p=0.42, respectively). In contrast, significant departures from GOF were observed without applying any transformation prior to assuming a binormal model (AUC=0.766; p=0.004, and AUC=0.831; p=0.03), respectively. In both studies the p values suggested that transformations were important to consider before applying any binormal model to estimate the AUC. Our analyses also demonstrated and confirmed the predictive values of different classifiers for determining the interventional complications following PCIs and resection outcomes in image-guided neurosurgery.
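The two AUC estimates that this goodness-of-fit test compares can be sketched as follows. The sketch assumes the standard Mann-Whitney form of the nonparametric AUC and the standard binormal formula Φ((μ1 − μ0)/√(σ0² + σ1²)); the function names are hypothetical, and the bootstrap test and the Box-Cox transformation from the study are not reproduced.

```python
from statistics import NormalDist, mean, variance


def auc_nonparametric(neg, pos):
    """Mann-Whitney estimate: P(pos > neg) + 0.5 * P(pos == neg)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_binormal(neg, pos):
    """AUC when both groups are modelled as normal distributions."""
    delta = mean(pos) - mean(neg)
    # Sample variances of the two groups, combined as in the binormal formula.
    spread = (variance(neg) + variance(pos)) ** 0.5
    return NormalDist().cdf(delta / spread)
```

A large gap between the two values on untransformed scores is the kind of departure the test is designed to flag; applying a log or logit transformation to the scores before calling `auc_binormal` mirrors the remedy described in the abstract.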
The urban heat island in Rio de Janeiro, Brazil, in the last 30 years using remote sensing data
NASA Astrophysics Data System (ADS)
Peres, Leonardo de Faria; Lucena, Andrews José de; Rotunno Filho, Otto Corrêa; França, José Ricardo de Almeida
2018-02-01
The aim of this work is to study the urban heat island (UHI) in the Metropolitan Area of Rio de Janeiro (MARJ) based on the analysis of land-surface temperature (LST) and land-use patterns retrieved from Landsat-5/Thematic Mapper (TM), Landsat-7/Enhanced Thematic Mapper Plus (ETM+) and Landsat-8/Operational Land Imager (OLI) and Thermal Infrared Sensors (TIRS) data covering a 32-year period between 1984 and 2015. LST temporal evolution is assessed by comparing the average LST composites for 1984-1999 and 2000-2015, where the parametric Student t-test was conducted at the 5% significance level to map the pixels where LST for the more recent period is statistically significantly greater than for the previous one. The non-parametric Mann-Whitney-Wilcoxon rank sum test has also confirmed at the same 5% significance level that the more recent period (2000-2015) has higher LST values. UHI intensity between 'urban' and 'rural/urban low density' ('vegetation') areas for 1984-1999 and 2000-2015 was established and confirmed by both parametric and non-parametric tests at the 1% significance level as 3.3 °C (5.1 °C) and 4.4 °C (7.1 °C), respectively. LST has statistically significantly (p-value < 0.01) increased over time in two of three land cover classes ('urban' and 'urban low density'), respectively by 1.9 °C and 0.9 °C, except in the 'vegetation' class. A spatial analysis was also performed to identify the urban pixels within MARJ where UHI is more intense by subtracting the LST of these pixels from the LST mean value of the 'vegetation' land-use class.
Complete genomic screen in Parkinson disease: evidence for multiple genes.
Scott, W K; Nance, M A; Watts, R L; Hubble, J P; Koller, W C; Lyons, K; Pahwa, R; Stern, M B; Colcher, A; Hiner, B C; Jankovic, J; Ondo, W G; Allen, F H; Goetz, C G; Small, G W; Masterman, D; Mastaglia, F; Laing, N G; Stajich, J M; Slotterbeck, B; Booze, M W; Ribble, R C; Rampersaud, E; West, S G; Gibson, R A; Middleton, L T; Roses, A D; Haines, J L; Scott, B L; Vance, J M; Pericak-Vance, M A
2001-11-14
The relative contribution of genes vs environment in idiopathic Parkinson disease (PD) is controversial. Although genetic studies have identified 2 genes in which mutations cause rare single-gene variants of PD and observational studies have suggested a genetic component, twin studies have suggested that little genetic contribution exists in the common forms of PD. To identify genetic risk factors for idiopathic PD. Genetic linkage study conducted 1995-2000 in which a complete genomic screen (n = 344 markers) was performed in 174 families with multiple individuals diagnosed as having idiopathic PD, identified through probands in 13 clinic populations in the continental United States and Australia. A total of 870 family members were studied: 378 diagnosed as having PD, 379 unaffected by PD, and 113 with unclear status. Logarithm of odds (lod) scores generated from parametric and nonparametric genetic linkage analysis. Two-point parametric maximum parametric lod score (MLOD) and multipoint nonparametric lod score (LOD) linkage analysis detected significant evidence for linkage to 5 distinct chromosomal regions: chromosome 6 in the parkin gene (MLOD = 5.07; LOD = 5.47) in families with at least 1 individual with PD onset at younger than 40 years, chromosomes 17q (MLOD = 2.28; LOD = 2.62), 8p (MLOD = 2.01; LOD = 2.22), and 5q (MLOD = 2.39; LOD = 1.50) overall and in families with late-onset PD, and chromosome 9q (MLOD = 1.52; LOD = 2.59) in families with both levodopa-responsive and levodopa-nonresponsive patients. Our data suggest that the parkin gene is important in early-onset PD and that multiple genetic factors may be important in the development of idiopathic late-onset PD.
Guindon, Stéphane; Dufayard, Jean-François; Lefort, Vincent; Anisimova, Maria; Hordijk, Wim; Gascuel, Olivier
2010-05-01
PhyML is a phylogeny software based on the maximum-likelihood principle. Early PhyML versions used a fast algorithm performing nearest neighbor interchanges to improve a reasonable starting tree topology. Since the original publication (Guindon S., Gascuel O. 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52:696-704), PhyML has been widely used (>2500 citations in ISI Web of Science) because of its simplicity and a fair compromise between accuracy and speed. In the meantime, research around PhyML has continued, and this article describes the new algorithms and methods implemented in the program. First, we introduce a new algorithm to search the tree space with user-defined intensity using subtree pruning and regrafting topological moves. The parsimony criterion is used here to filter out the least promising topology modifications with respect to the likelihood function. The analysis of a large collection of real nucleotide and amino acid data sets of various sizes demonstrates the good performance of this method. Second, we describe a new test to assess the support of the data for internal branches of a phylogeny. This approach extends the recently proposed approximate likelihood-ratio test and relies on a nonparametric, Shimodaira-Hasegawa-like procedure. A detailed analysis of real alignments sheds light on the links between this new approach and the more classical nonparametric bootstrap method. Overall, our tests show that the last version (3.0) of PhyML is fast, accurate, stable, and ready to use. A Web server and binary files are available from http://www.atgc-montpellier.fr/phyml/.
Carmignani, Lucio O; Pedro, Adriana Orcesi; Montemor, Eliana B; Arias, Victor A; Costa-Paiva, Lucia H; Pinto-Neto, Aarão M
2015-07-01
This study aims to compare the effects of a soy-based dietary supplement, low-dose hormone therapy (HT), and placebo on the urogenital system in postmenopausal women. In this double-blind, randomized, placebo-controlled trial, 60 healthy postmenopausal women aged 40 to 60 years (mean time since menopause, 4.1 y) were randomized into three groups: a soy dietary supplement group (90 mg of isoflavone), a low-dose HT group (1 mg of estradiol plus 0.5 mg of norethisterone), and a placebo group. Urinary, vaginal, and sexual complaints were evaluated using the urogenital subscale of the Menopause Rating Scale. Vaginal maturation value was calculated. Transvaginal sonography was performed to evaluate endometrial thickness. Genital bleeding pattern was assessed. Statistical analysis was performed using the χ² test, Fisher's exact test, paired Student's t test, Kruskal-Wallis nonparametric test, and analysis of variance. For intergroup comparisons, the Kruskal-Wallis nonparametric test (followed by the Mann-Whitney U test) was used. Vaginal dryness improved significantly in the soy and HT groups (P = 0.04). Urinary and sexual symptoms did not change with treatment in the three groups. After 16 weeks of treatment, there was a significant increase in maturation value only in the HT group (P < 0.01). Vaginal pH decreased only in this group (P < 0.01). There were no statistically significant differences in endometrial thickness between the three groups, and the adverse effects evaluated were similar. This study shows that a soy-based dietary supplement used for 16 weeks fails to exert estrogenic action on the urogenital tract but improves vaginal dryness.
Ji, Jiadong; He, Di; Feng, Yang; He, Yong; Xue, Fuzhong; Xie, Lei
2017-10-01
A complex disease is usually driven by a number of genes interwoven into networks, rather than a single gene product. Network comparison or differential network analysis has become an important means of revealing the underlying mechanism of pathogenesis and identifying clinical biomarkers for disease classification. Most studies, however, are limited to network correlations that mainly capture the linear relationship among genes, or rely on the assumption of a parametric probability distribution of gene measurements. They are restrictive in real application. We propose a new Joint density based non-parametric Differential Interaction Network Analysis and Classification (JDINAC) method to identify differential interaction patterns of network activation between two groups. At the same time, JDINAC uses the network biomarkers to build a classification model. The novelty of JDINAC lies in its potential to capture non-linear relations between molecular interactions using high-dimensional sparse data as well as to adjust confounding factors, without the need of the assumption of a parametric probability distribution of gene measurements. Simulation studies demonstrate that JDINAC provides more accurate differential network estimation and lower classification error than that achieved by other state-of-the-art methods. We apply JDINAC to a Breast Invasive Carcinoma dataset, which includes 114 patients who have both tumor and matched normal samples. The hub genes and differential interaction patterns identified were consistent with existing experimental studies. Furthermore, JDINAC discriminated the tumor and normal sample with high accuracy by virtue of the identified biomarkers. JDINAC provides a general framework for feature selection and classification using high-dimensional sparse omics data. R scripts available at https://github.com/jijiadong/JDINAC. lxie@iscb.org. Supplementary data are available at Bioinformatics online. © The Author (2017). 
Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Churchill, Nathan W.; Oder, Anita; Abdi, Hervé; Tam, Fred; Lee, Wayne; Thomas, Christopher; Ween, Jon E.; Graham, Simon J.; Strother, Stephen C.
2016-01-01
Subject-specific artifacts caused by head motion and physiological noise are major confounds in BOLD fMRI analyses. However, there is little consensus on the optimal choice of data preprocessing steps to minimize these effects. To evaluate the effects of various preprocessing strategies, we present a framework which comprises a combination of (1) nonparametric testing including reproducibility and prediction metrics of the data-driven NPAIRS framework (Strother et al. [2002]: NeuroImage 15:747–771), and (2) intersubject comparison of SPM effects, using DISTATIS (a three-way version of metric multidimensional scaling) (Abdi et al. [2009]: NeuroImage 45:89–95). It is shown that the quality of brain activation maps may be significantly limited by sub-optimal choices of data preprocessing steps (or “pipeline”) in a clinical task-design, an fMRI adaptation of the widely used Trail-Making Test. The relative importance of motion correction, physiological noise correction, motion parameter regression, and temporal detrending was examined for fMRI data acquired in young, healthy adults. Analysis performance and the quality of activation maps were evaluated based on Penalized Discriminant Analysis (PDA). The relative importance of different preprocessing steps was assessed by (1) a nonparametric Friedman rank test for fixed sets of preprocessing steps, applied to all subjects; and (2) evaluating pipelines chosen specifically for each subject. Results demonstrate that preprocessing choices have significant, but subject-dependent effects, and that individually-optimized pipelines may significantly improve the reproducibility of fMRI results over fixed pipelines. This was demonstrated by the detection of a significant interaction with motion parameter regression and physiological noise correction, even though the range of subject head motion was small across the group (≪ 1 voxel). 
Optimizing pipelines on an individual-subject basis also revealed brain activation patterns either weak or absent under fixed pipelines, which has implications for the overall interpretation of fMRI data, and the relative importance of preprocessing methods. PMID:21455942
A Genomewide Linkage Scan of Cocaine Dependence and Major Depressive Episode in Two Populations
Yang, Bao-Zhu; Han, Shizhong; Kranzler, Henry R; Farrer, Lindsay A; Gelernter, Joel
2011-01-01
Cocaine dependence (CD) and major depressive episode (MDE) frequently co-occur with poorer treatment outcome and higher relapse risk. Shared genetic risk was affirmed; to date, there have been no reports of genomewide linkage scans (GWLSs) surveying the susceptibility regions for comorbid CD and MDE (CD–MDE). We aimed to identify chromosomal regions and candidate genes susceptible to CD, MDE, and CD–MDE in African Americans (AAs) and European Americans (EAs). A total of 1896 individuals were recruited from 384 AA and 355 EA families, each with at least a sibling-pair with CD and/or opioid dependence. Array-based genotyping of about 6000 single-nucleotide polymorphisms was completed for all individuals. Parametric and non-parametric genomewide linkage analyses were performed. We found a genomewide-significant linkage peak on chromosome 7 at 183.4 cM for non-parametric analysis of CD–MDE in AAs (lod=3.8, genomewide empirical p=0.016; point-wise p=0.00001). A nearly genomewide significant linkage was identified for CD–MDE in EAs on chromosome 5 at 14.3 cM (logarithm of odds (lod)=2.95, genomewide empirical p=0.055; point-wise p=0.00012). Parametric analysis corroborated the findings in these two regions and improved the support for the peak on chromosome 5 so that it reached genomewide significance (heterogeneity lod=3.28, genomewide empirical p=0.046; point-wise p=0.00053). This is the first GWLS for CD–MDE. The genomewide significant linkage regions on chromosomes 5 and 7 harbor four particularly promising candidate genes: SRD5A1, UBE3C, PTPRN2, and VIPR2. Replication of the linkage findings in other populations is warranted, as is a focused analysis of the genes located in the linkage regions implicated here. PMID:21849985
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include models commonly used in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), in the hope of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most up-to-date information and newly added models. PMID:21546994
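The rank calculations mentioned in this abstract underlie tests such as the Wilcoxon rank sum test. The sketch below is not SOCR code (SOCR is written in Java); it is a minimal Python illustration, with the hypothetical function names `midranks` and `rank_sum`, of how tied values are commonly given the average of the ranks they span.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j correspond to ranks i+1..j+1; take their average.
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum(x, y):
    """Sum of the pooled-sample ranks belonging to the first sample."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[:len(x)])
```

The rank sum of the first sample is the statistic that a Wilcoxon rank sum test would then compare against its null distribution; that final step is omitted here.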
Casella, Ivan Benaduce; Fukushima, Rodrigo Bono; Marques, Anita Battistini de Azevedo; Cury, Marcus Vinícius Martins; Presti, Calógero
2015-03-01
To compare a new dedicated software program and Adobe Photoshop for gray-scale median (GSM) analysis of B-mode images of carotid plaques. A series of 42 carotid plaques generating ≥50% diameter stenosis was evaluated by a single observer. The best segment for visualization of internal carotid artery plaque was identified on a single longitudinal view and images were recorded in JPEG format. Plaque analysis was performed by both programs. After normalization of image intensity (blood = 0, adventitial layer = 190), histograms were obtained after manual delineation of plaque. Results were compared with the nonparametric Wilcoxon signed rank test and Kendall tau-b correlation analysis. GSM ranged from 0 to 100 with Adobe Photoshop and from 0 to 96 with IMTPC, with a high degree of similarity between image pairs and a highly significant correlation (R = 0.94, p < .0001). IMTPC software appears suitable for the GSM analysis of carotid plaques. © 2014 Wiley Periodicals, Inc.
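The normalization step described in this abstract (blood = 0, adventitial layer = 190) can be illustrated with a short sketch. The abstract does not say how either program performs the rescaling, so the linear mapping, the clipping to the 0-255 range, and the function names below are assumptions made for illustration only.

```python
from statistics import median


def normalize(pixels, blood_ref, adventitia_ref, adventitia_target=190):
    """Linearly rescale gray levels so blood maps to 0 and adventitia to 190.

    Assumption: a linear mapping with clipping to 0..255; the abstract only
    states the two reference values, not the transform itself.
    """
    scale = adventitia_target / (adventitia_ref - blood_ref)
    normalized = []
    for p in pixels:
        value = round((p - blood_ref) * scale)
        normalized.append(min(255, max(0, value)))
    return normalized


def gray_scale_median(plaque_pixels, blood_ref, adventitia_ref):
    """Median of the normalized gray levels inside the delineated plaque."""
    return median(normalize(plaque_pixels, blood_ref, adventitia_ref))
```

Here `plaque_pixels` stands for the gray levels inside the manually delineated plaque region, and the two reference values would be sampled from the lumen and the adventitia of the same image.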
Maximum unbiased validation (MUV) data sets for virtual screening based on PubChem bioactivity data.
Rohrer, Sebastian G; Baumann, Knut
2009-02-01
Refined nearest neighbor analysis was recently introduced for the analysis of virtual screening benchmark data sets. It constitutes a technique from the field of spatial statistics and provides a mathematical framework for the nonparametric analysis of mapped point patterns. Here, refined nearest neighbor analysis is used to design benchmark data sets for virtual screening based on PubChem bioactivity data. A workflow is devised that purges data sets of compounds active against pharmaceutically relevant targets from unselective hits. Topological optimization using experimental design strategies monitored by refined nearest neighbor analysis functions is applied to generate corresponding data sets of actives and decoys that are unbiased with regard to analogue bias and artificial enrichment. These data sets provide a tool for Maximum Unbiased Validation (MUV) of virtual screening methods. The data sets and a software package implementing the MUV design workflow are freely available at http://www.pharmchem.tu-bs.de/lehre/baumann/MUV.html.
Locally-Based Kernel PLS Smoothing to Non-Parametric Regression Curve Fitting
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Trejo, Leonard J.; Wheeler, Kevin; Korsmeyer, David (Technical Monitor)
2002-01-01
We present a novel smoothing approach to non-parametric regression curve fitting. This is based on kernel partial least squares (PLS) regression in reproducing kernel Hilbert space. It is our concern to apply the methodology for smoothing experimental data where some level of knowledge about the approximate shape, local inhomogeneities or points where the desired function changes its curvature is known a priori or can be derived based on the observed noisy data. We propose locally-based kernel PLS regression that extends the previous kernel PLS methodology by incorporating this knowledge. We compare our approach with existing smoothing splines, hybrid adaptive splines and wavelet shrinkage techniques on two generated data sets.
Nonparametric Model of Smooth Muscle Force Production During Electrical Stimulation.
Cole, Marc; Eikenberry, Steffen; Kato, Takahide; Sandler, Roman A; Yamashiro, Stanley M; Marmarelis, Vasilis Z
2017-03-01
A nonparametric model of smooth muscle tension response to electrical stimulation was estimated using the Laguerre expansion technique of nonlinear system kernel estimation. The experimental data consisted of force responses of smooth muscle to energy-matched alternating single pulse and burst current stimuli. The burst stimuli led to at least a 10-fold increase in peak force in smooth muscle from Mytilus edulis, despite the constant energy constraint. A linear model did not fit the data. However, a second-order model fit the data accurately, so the higher-order models were not required to fit the data. Results showed that smooth muscle force response is not linearly related to the stimulation power.
Non-parametric diffeomorphic image registration with the demons algorithm.
Vercauteren, Tom; Pennec, Xavier; Perchant, Aymeric; Ayache, Nicholas
2007-01-01
We propose a non-parametric diffeomorphic image registration algorithm based on Thirion's demons algorithm. The demons algorithm can be seen as an optimization procedure on the entire space of displacement fields. The main idea of our algorithm is to adapt this procedure to a space of diffeomorphic transformations. In contrast to many diffeomorphic registration algorithms, our solution is computationally efficient since in practice it only replaces an addition of free form deformations by a few compositions. Our experiments show that in addition to being diffeomorphic, our algorithm provides results that are similar to the ones from the demons algorithm but with transformations that are much smoother and closer to the true ones in terms of Jacobians.
Lee, Minjung; Dignam, James J.; Han, Junhee
2014-01-01
We propose a nonparametric approach for cumulative incidence estimation when causes of failure are unknown or missing for some subjects. Under the missing at random assumption, we estimate the cumulative incidence function using multiple imputation methods. We develop asymptotic theory for the cumulative incidence estimators obtained from multiple imputation methods. We also discuss how to construct confidence intervals for the cumulative incidence function and perform a test for comparing the cumulative incidence functions in two samples with missing cause of failure. Through simulation studies, we show that the proposed methods perform well. The methods are illustrated with data from a randomized clinical trial in early stage breast cancer. PMID:25043107
Hidden Markov models and neural networks for fault detection in dynamic systems
NASA Technical Reports Server (NTRS)
Smyth, Padhraic
1994-01-01
Neural networks plus hidden Markov models (HMM) can provide excellent detection and false alarm rate performance in fault detection applications, as shown in this viewgraph presentation. Modified models allow for novelty detection. Key contributions of neural network models are: (1) excellent nonparametric discrimination capability; (2) a good estimator of posterior state probabilities, even in high dimensions, and thus can be embedded within overall probabilistic model (HMM); and (3) simple to implement compared to other nonparametric models. Neural network/HMM monitoring model is currently being integrated with the new Deep Space Network (DSN) antenna controller software and will be on-line monitoring a new DSN 34-m antenna (DSS-24) by July, 1994.
Hoover, D R; Peng, Y; Saah, A J; Detels, R R; Day, R S; Phair, J P
A simple non-parametric approach is developed to simultaneously estimate net incidence and morbidity time from specific AIDS illnesses in populations at high risk for death from these illnesses and other causes. The disease-death process has four stages that can be recast as two sandwiching three-state multiple decrement processes. Non-parametric estimation of net incidence and morbidity time with error bounds is achieved from these sandwiching models through modification of methods from Aalen and Greenwood, and bootstrapping. An application to immunosuppressed HIV-1 infected homosexual men reveals that cytomegalovirus disease, Kaposi's sarcoma and Pneumocystis pneumonia are likely to occur and cause significant morbidity time.
Bhattacharya, Abhishek; Dunson, David B.
2012-01-01
This article considers a broad class of kernel mixture density models on compact metric spaces and manifolds. Following a Bayesian approach with a nonparametric prior on the location mixing distribution, sufficient conditions are obtained on the kernel, prior and the underlying space for strong posterior consistency at any continuous density. The prior is also allowed to depend on the sample size n and sufficient conditions are obtained for weak and strong consistency. These conditions are verified on compact Euclidean spaces using multivariate Gaussian kernels, on the hypersphere using a von Mises-Fisher kernel and on the planar shape space using complex Watson kernels. PMID:22984295
A mixture model for robust registration in Kinect sensor
NASA Astrophysics Data System (ADS)
Peng, Li; Zhou, Huabing; Zhu, Shengguo
2018-03-01
The Microsoft Kinect sensor has been widely used in many applications, but it suffers from low registration precision between the color image and the depth image. In this paper, we present a robust method to improve the registration precision by a mixture model that can handle multiple images with the nonparametric model. We impose non-parametric geometrical constraints on the correspondence, as a prior distribution, in a reproducing kernel Hilbert space (RKHS). The estimation is performed by the EM algorithm, which obtains good estimates by also estimating the variance of the prior model. We illustrate the proposed method on a publicly available dataset. The experimental results show that our approach outperforms the baseline methods.
Burroughs, N J; Pillay, D; Mutimer, D
1999-01-01
Bayesian analysis using a virus dynamics model is demonstrated to facilitate hypothesis testing of patterns in clinical time-series. Our Markov chain Monte Carlo implementation demonstrates that the viraemia time-series observed in two sets of hepatitis B patients on antiviral (lamivudine) therapy, chronic carriers and liver transplant patients, are significantly different, overcoming clinical trial design differences that question the validity of non-parametric tests. We show that lamivudine-resistant mutants grow faster in transplant patients than in chronic carriers, which probably explains the differences in emergence times and failure rates between these two sets of patients. Incorporation of dynamic models into Bayesian parameter analysis is of general applicability in medical statistics. PMID:10643081
[The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].
Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel
2017-01-01
Statistical analysis can be divided into two main components: descriptive analysis and inferential analysis. An inference draws conclusions from tests performed on data obtained from a sample of a population. Statistical tests are used to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test generally poses a challenge for novice researchers. To choose the statistical test, it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
Survivor experience of a child sexual abuse prevention program: a pilot study.
Barron, Ian G; Topping, Keith J
2013-09-01
Addressing gaps in the research, the current study assesses the impact of a community-based child sexual abuse prevention program on known survivor knowledge/skills, disclosures, and subjective experience. Methodologically, novel measures of program fidelity and implementation cost are applied. A pre-/posttest wait-list control design was utilized with intervention (n = 10) and comparison groups (n = 10). Measures included a standardized knowledge/skill questionnaire, coding of disclosures, subjective experience questionnaires, in-depth interviews, video analysis of program adherence, and a measure of cost. Analysis involved nonparametric tests and thematic analysis of interview and video data. Cost was calculated for the group and per survivor. Survivors achieved significant gains in knowledge/skills, made further disclosures, and were positive about their program experience. No gains were identified in the control group. Costs were small. Future studies need to explore survivor experience of programs delivered in classrooms.
Comparison of four approaches to a rock facies classification problem
Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.
2007-01-01
In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and feed forward-back propagating artificial neural network. Determining the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field, in Southwest Kansas, was the objective. Study data include 3600 samples with known rock facies class (from core) with each sample having either four or five measured properties (wire-line log curves), and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high dimensional and not linearly correlated), and feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. © 2006 Elsevier Ltd. All rights reserved.
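Of the non-parametric classifiers named in this abstract, k-nearest neighbor is the simplest to sketch, together with the train/test evaluation the study describes. The code below is a generic illustration, not the authors' implementation: the Euclidean distance, the default k = 3, and the function names are assumptions, and no feature scaling is applied.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Assign x the majority class among its k nearest training samples."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of held-out samples whose predicted class matches the known one."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)
```

In the setting of the study, each row of `train_X` would hold the wire-line log values and derived geologic variables for one sample, and `train_y` the facies class known from core.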
Confidence intervals for single-case effect size measures based on randomization test inversion.
Michiels, Bart; Heyvaert, Mieke; Meulders, Ann; Onghena, Patrick
2017-02-01
In the current paper, we present a method to construct nonparametric confidence intervals (CIs) for single-case effect size measures in the context of various single-case designs. We use the relationship between a two-sided statistical hypothesis test at significance level α and a 100 (1 - α) % two-sided CI to construct CIs for any effect size measure θ that contain all point null hypothesis θ values that cannot be rejected by the hypothesis test at significance level α. This method of hypothesis test inversion (HTI) can be employed using a randomization test as the statistical hypothesis test in order to construct a nonparametric CI for θ. We will refer to this procedure as randomization test inversion (RTI). We illustrate RTI in a situation in which θ is the unstandardized and the standardized difference in means between two treatments in a completely randomized single-case design. Additionally, we demonstrate how RTI can be extended to other types of single-case designs. Finally, we discuss a few challenges for RTI as well as possibilities when using the method with other effect size measures, such as rank-based nonoverlap indices. Supplementary to this paper, we provide easy-to-use R code, which allows the user to construct nonparametric CIs according to the proposed method.
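The inversion idea in this abstract can be sketched for the unstandardized difference in means. This is not the authors' R code. It assumes a constant additive treatment effect (so testing θ amounts to subtracting θ from one condition's scores), an exact randomization test over all assignments, and a user-supplied grid of candidate θ values; the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(a, b):
    """Two-sided exact randomization p-value for the difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    count = 0
    total = 0
    # Enumerate every way of assigning len(a) of the pooled scores to condition A.
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance guards against floating-point ties.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total


def rti_confidence_interval(a, b, grid, alpha=0.05):
    """Return (min, max) of the grid values theta that are not rejected."""
    kept = []
    for theta in grid:
        # Under H0: mean difference = theta, removing theta from A's scores
        # makes the two conditions exchangeable (additive-effect assumption).
        shifted_a = [x - theta for x in a]
        if randomization_p_value(shifted_a, b) > alpha:
            kept.append(theta)
    return (min(kept), max(kept)) if kept else None
```

The resolution of the interval is limited by the spacing of `grid`, and full enumeration is only practical for small designs; a Monte Carlo sample of assignments would be the usual substitute for larger ones.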