Sample records for multivariate nonparametric regression

  1. Modelling lecturer performance index of private university in Tulungagung by using survival analysis with multivariate adaptive regression spline

    NASA Astrophysics Data System (ADS)

    Hasyim, M.; Prastyo, D. D.

    2018-03-01

    Survival analysis performs relationship between independent variables and survival time as dependent variable. In fact, not all survival data can be recorded completely by any reasons. In such situation, the data is called censored data. Moreover, several model for survival analysis requires assumptions. One of the approaches in survival analysis is nonparametric that gives more relax assumption. In this research, the nonparametric approach that is employed is Multivariate Regression Adaptive Spline (MARS). This study is aimed to measure the performance of private university’s lecturer. The survival time in this study is duration needed by lecturer to obtain their professional certificate. The results show that research activities is a significant factor along with developing courses material, good publication in international or national journal, and activities in research collaboration.

  2. Non-parametric directionality analysis - Extension for removal of a single common predictor and application to time series.

    PubMed

    Halliday, David M; Senik, Mohd Harizal; Stevenson, Carl W; Mason, Rob

    2016-08-01

    The ability to infer network structure from multivariate neuronal signals is central to computational neuroscience. Directed network analyses typically use parametric approaches based on auto-regressive (AR) models, where networks are constructed from estimates of AR model parameters. However, the validity of using low order AR models for neurophysiological signals has been questioned. A recent article introduced a non-parametric approach to estimate directionality in bivariate data, non-parametric approaches are free from concerns over model validity. We extend the non-parametric framework to include measures of directed conditional independence, using scalar measures that decompose the overall partial correlation coefficient summatively by direction, and a set of functions that decompose the partial coherence summatively by direction. A time domain partial correlation function allows both time and frequency views of the data to be constructed. The conditional independence estimates are conditioned on a single predictor. The framework is applied to simulated cortical neuron networks and mixtures of Gaussian time series data with known interactions. It is applied to experimental data consisting of local field potential recordings from bilateral hippocampus in anaesthetised rats. The framework offers a non-parametric approach to estimation of directed interactions in multivariate neuronal recordings, and increased flexibility in dealing with both spike train and time series data. The framework offers a novel alternative non-parametric approach to estimate directed interactions in multivariate neuronal recordings, and is applicable to spike train and time series data. Copyright © 2016 Elsevier B.V. All rights reserved.

  3. Nonparametric regression applied to quantitative structure-activity relationships

    PubMed

    Constans; Hirst

    2000-03-01

    Several nonparametric regressors have been applied to modeling quantitative structure-activity relationship (QSAR) data. The simplest regressor, the Nadaraya-Watson, was assessed in a genuine multivariate setting. Other regressors, the local linear and the shifted Nadaraya-Watson, were implemented within additive models--a computationally more expedient approach, better suited for low-density designs. Performances were benchmarked against the nonlinear method of smoothing splines. A linear reference point was provided by multilinear regression (MLR). Variable selection was explored using systematic combinations of different variables and combinations of principal components. For the data set examined, 47 inhibitors of dopamine beta-hydroxylase, the additive nonparametric regressors have greater predictive accuracy (as measured by the mean absolute error of the predictions or the Pearson correlation in cross-validation trails) than MLR. The use of principal components did not improve the performance of the nonparametric regressors over use of the original descriptors, since the original descriptors are not strongly correlated. It remains to be seen if the nonparametric regressors can be successfully coupled with better variable selection and dimensionality reduction in the context of high-dimensional QSARs.

  4. Local polynomial estimation of heteroscedasticity in a multivariate linear regression model and its applications in economics.

    PubMed

    Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan

    2012-01-01

    Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.

  5. Stock price forecasting for companies listed on Tehran stock exchange using multivariate adaptive regression splines model and semi-parametric splines technique

    NASA Astrophysics Data System (ADS)

    Rounaghi, Mohammad Mahdi; Abbaszadeh, Mohammad Reza; Arashi, Mohammad

    2015-11-01

    One of the most important topics of interest to investors is stock price changes. Investors whose goals are long term are sensitive to stock price and its changes and react to them. In this regard, we used multivariate adaptive regression splines (MARS) model and semi-parametric splines technique for predicting stock price in this study. The MARS model as a nonparametric method is an adaptive method for regression and it fits for problems with high dimensions and several variables. semi-parametric splines technique was used in this study. Smoothing splines is a nonparametric regression method. In this study, we used 40 variables (30 accounting variables and 10 economic variables) for predicting stock price using the MARS model and using semi-parametric splines technique. After investigating the models, we select 4 accounting variables (book value per share, predicted earnings per share, P/E ratio and risk) as influencing variables on predicting stock price using the MARS model. After fitting the semi-parametric splines technique, only 4 accounting variables (dividends, net EPS, EPS Forecast and P/E Ratio) were selected as variables effective in forecasting stock prices.

  6. Diagnostic tools for nearest neighbors techniques when used with satellite imagery

    Treesearch

    Ronald E. McRoberts

    2009-01-01

    Nearest neighbors techniques are non-parametric approaches to multivariate prediction that are useful for predicting both continuous and categorical forest attribute variables. Although some assumptions underlying nearest neighbor techniques are common to other prediction techniques such as regression, other assumptions are unique to nearest neighbor techniques....

  7. Surface Estimation, Variable Selection, and the Nonparametric Oracle Property.

    PubMed

    Storlie, Curtis B; Bondell, Howard D; Reich, Brian J; Zhang, Hao Helen

    2011-04-01

    Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting.

  8. Surface Estimation, Variable Selection, and the Nonparametric Oracle Property

    PubMed Central

    Storlie, Curtis B.; Bondell, Howard D.; Reich, Brian J.; Zhang, Hao Helen

    2010-01-01

    Variable selection for multivariate nonparametric regression is an important, yet challenging, problem due, in part, to the infinite dimensionality of the function space. An ideal selection procedure should be automatic, stable, easy to use, and have desirable asymptotic properties. In particular, we define a selection procedure to be nonparametric oracle (np-oracle) if it consistently selects the correct subset of predictors and at the same time estimates the smooth surface at the optimal nonparametric rate, as the sample size goes to infinity. In this paper, we propose a model selection procedure for nonparametric models, and explore the conditions under which the new method enjoys the aforementioned properties. Developed in the framework of smoothing spline ANOVA, our estimator is obtained via solving a regularization problem with a novel adaptive penalty on the sum of functional component norms. Theoretical properties of the new estimator are established. Additionally, numerous simulated and real examples further demonstrate that the new approach substantially outperforms other existing methods in the finite sample setting. PMID:21603586

  9. Complementary nonparametric analysis of covariance for logistic regression in a randomized clinical trial setting.

    PubMed

    Tangen, C M; Koch, G G

    1999-03-01

    In the randomized clinical trial setting, controlling for covariates is expected to produce variance reduction for the treatment parameter estimate and to adjust for random imbalances of covariates between the treatment groups. However, for the logistic regression model, variance reduction is not obviously obtained. This can lead to concerns about the assumptions of the logistic model. We introduce a complementary nonparametric method for covariate adjustment. It provides results that are usually compatible with expectations for analysis of covariance. The only assumptions required are based on randomization and sampling arguments. The resulting treatment parameter is a (unconditional) population average log-odds ratio that has been adjusted for random imbalance of covariates. Data from a randomized clinical trial are used to compare results from the traditional maximum likelihood logistic method with those from the nonparametric logistic method. We examine treatment parameter estimates, corresponding standard errors, and significance levels in models with and without covariate adjustment. In addition, we discuss differences between unconditional population average treatment parameters and conditional subpopulation average treatment parameters. Additional features of the nonparametric method, including stratified (multicenter) and multivariate (multivisit) analyses, are illustrated. Extensions of this methodology to the proportional odds model are also made.

  10. Using Multivariate Adaptive Regression Spline and Artificial Neural Network to Simulate Urbanization in Mumbai, India

    NASA Astrophysics Data System (ADS)

    Ahmadlou, M.; Delavar, M. R.; Tayyebi, A.; Shafizadeh-Moghadam, H.

    2015-12-01

    Land use change (LUC) models used for modelling urban growth are different in structure and performance. Local models divide the data into separate subsets and fit distinct models on each of the subsets. Non-parametric models are data driven and usually do not have a fixed model structure or model structure is unknown before the modelling process. On the other hand, global models perform modelling using all the available data. In addition, parametric models have a fixed structure before the modelling process and they are model driven. Since few studies have compared local non-parametric models with global parametric models, this study compares a local non-parametric model called multivariate adaptive regression spline (MARS), and a global parametric model called artificial neural network (ANN) to simulate urbanization in Mumbai, India. Both models determine the relationship between a dependent variable and multiple independent variables. We used receiver operating characteristic (ROC) to compare the power of the both models for simulating urbanization. Landsat images of 1991 (TM) and 2010 (ETM+) were used for modelling the urbanization process. The drivers considered for urbanization in this area were distance to urban areas, urban density, distance to roads, distance to water, distance to forest, distance to railway, distance to central business district, number of agricultural cells in a 7 by 7 neighbourhoods, and slope in 1991. The results showed that the area under the ROC curve for MARS and ANN was 94.77% and 95.36%, respectively. Thus, ANN performed slightly better than MARS to simulate urban areas in Mumbai, India.

  11. A framework for multivariate data-based at-site flood frequency analysis: Essentiality of the conjugal application of parametric and nonparametric approaches

    NASA Astrophysics Data System (ADS)

    Vittal, H.; Singh, Jitendra; Kumar, Pankaj; Karmakar, Subhankar

    2015-06-01

    In watershed management, flood frequency analysis (FFA) is performed to quantify the risk of flooding at different spatial locations and also to provide guidelines for determining the design periods of flood control structures. The traditional FFA was extensively performed by considering univariate scenario for both at-site and regional estimation of return periods. However, due to inherent mutual dependence of the flood variables or characteristics [i.e., peak flow (P), flood volume (V) and flood duration (D), which are random in nature], analysis has been further extended to multivariate scenario, with some restrictive assumptions. To overcome the assumption of same family of marginal density function for all flood variables, the concept of copula has been introduced. Although, the advancement from univariate to multivariate analyses drew formidable attention to the FFA research community, the basic limitation was that the analyses were performed with the implementation of only parametric family of distributions. The aim of the current study is to emphasize the importance of nonparametric approaches in the field of multivariate FFA; however, the nonparametric distribution may not always be a good-fit and capable of replacing well-implemented multivariate parametric and multivariate copula-based applications. Nevertheless, the potential of obtaining best-fit using nonparametric distributions might be improved because such distributions reproduce the sample's characteristics, resulting in more accurate estimations of the multivariate return period. Hence, the current study shows the importance of conjugating multivariate nonparametric approach with multivariate parametric and copula-based approaches, thereby results in a comprehensive framework for complete at-site FFA. Although the proposed framework is designed for at-site FFA, this approach can also be applied to regional FFA because regional estimations ideally include at-site estimations. The framework is based on the following steps: (i) comprehensive trend analysis to assess nonstationarity in the observed data; (ii) selection of the best-fit univariate marginal distribution with a comprehensive set of parametric and nonparametric distributions for the flood variables; (iii) multivariate frequency analyses with parametric, copula-based and nonparametric approaches; and (iv) estimation of joint and various conditional return periods. The proposed framework for frequency analysis is demonstrated using 110 years of observed data from Allegheny River at Salamanca, New York, USA. The results show that for both univariate and multivariate cases, the nonparametric Gaussian kernel provides the best estimate. Further, we perform FFA for twenty major rivers over continental USA, which shows for seven rivers, all the flood variables followed nonparametric Gaussian kernel; whereas for other rivers, parametric distributions provide the best-fit either for one or two flood variables. Thus the summary of results shows that the nonparametric method cannot substitute the parametric and copula-based approaches, but should be considered during any at-site FFA to provide the broadest choices for best estimation of the flood return periods.

  12. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555

  13. A Hybrid Index for Characterizing Drought Based on a Nonparametric Kernel Estimator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Huang, Shengzhi; Huang, Qiang; Leng, Guoyong

    This study develops a nonparametric multivariate drought index, namely, the Nonparametric Multivariate Standardized Drought Index (NMSDI), by considering the variations of both precipitation and streamflow. Building upon previous efforts in constructing Nonparametric Multivariate Drought Index, we use the nonparametric kernel estimator to derive the joint distribution of precipitation and streamflow, thus providing additional insights in drought index development. The proposed NMSDI are applied in the Wei River Basin (WRB), based on which the drought evolution characteristics are investigated. Results indicate: (1) generally, NMSDI captures the drought onset similar to Standardized Precipitation Index (SPI) and drought termination and persistence similar tomore » Standardized Streamflow Index (SSFI). The drought events identified by NMSDI match well with historical drought records in the WRB. The performances are also consistent with that by an existing Multivariate Standardized Drought Index (MSDI) at various timescales, confirming the validity of the newly constructed NMSDI in drought detections (2) An increasing risk of drought has been detected for the past decades, and will be persistent to a certain extent in future in most areas of the WRB; (3) the identified change points of annual NMSDI are mainly concentrated in the early 1970s and middle 1990s, coincident with extensive water use and soil reservation practices. This study highlights the nonparametric multivariable drought index, which can be used for drought detections and predictions efficiently and comprehensively.« less

  14. Nonparametric estimation of the multivariate survivor function: the multivariate Kaplan-Meier estimator.

    PubMed

    Prentice, Ross L; Zhao, Shanshan

    2018-01-01

    The Dabrowska (Ann Stat 16:1475-1489, 1988) product integral representation of the multivariate survivor function is extended, leading to a nonparametric survivor function estimator for an arbitrary number of failure time variates that has a simple recursive formula for its calculation. Empirical process methods are used to sketch proofs for this estimator's strong consistency and weak convergence properties. Summary measures of pairwise and higher-order dependencies are also defined and nonparametrically estimated. Simulation evaluation is given for the special case of three failure time variates.

  15. Finding structure in data using multivariate tree boosting

    PubMed Central

    Miller, Patrick J.; Lubke, Gitta H.; McArtor, Daniel B.; Bergeman, C. S.

    2016-01-01

    Technology and collaboration enable dramatic increases in the size of psychological and psychiatric data collections, but finding structure in these large data sets with many collected variables is challenging. Decision tree ensembles such as random forests (Strobl, Malley, & Tutz, 2009) are a useful tool for finding structure, but are difficult to interpret with multiple outcome variables which are often of interest in psychology. To find and interpret structure in data sets with multiple outcomes and many predictors (possibly exceeding the sample size), we introduce a multivariate extension to a decision tree ensemble method called gradient boosted regression trees (Friedman, 2001). Our extension, multivariate tree boosting, is a method for nonparametric regression that is useful for identifying important predictors, detecting predictors with nonlinear effects and interactions without specification of such effects, and for identifying predictors that cause two or more outcome variables to covary. We provide the R package ‘mvtboost’ to estimate, tune, and interpret the resulting model, which extends the implementation of univariate boosting in the R package ‘gbm’ (Ridgeway et al., 2015) to continuous, multivariate outcomes. To illustrate the approach, we analyze predictors of psychological well-being (Ryff & Keyes, 1995). Simulations verify that our approach identifies predictors with nonlinear effects and achieves high prediction accuracy, exceeding or matching the performance of (penalized) multivariate multiple regression and multivariate decision trees over a wide range of conditions. PMID:27918183

  16. A new approach to correct the QT interval for changes in heart rate using a nonparametric regression model in beagle dogs.

    PubMed

    Watanabe, Hiroyuki; Miyazaki, Hiroyasu

    2006-01-01

    Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.

  17. [Multivariate Adaptive Regression Splines (MARS), an alternative for the analysis of time series].

    PubMed

    Vanegas, Jairo; Vásquez, Fabián

    Multivariate Adaptive Regression Splines (MARS) is a non-parametric modelling method that extends the linear model, incorporating nonlinearities and interactions between variables. It is a flexible tool that automates the construction of predictive models: selecting relevant variables, transforming the predictor variables, processing missing values and preventing overshooting using a self-test. It is also able to predict, taking into account structural factors that might influence the outcome variable, thereby generating hypothetical models. The end result could identify relevant cut-off points in data series. It is rarely used in health, so it is proposed as a tool for the evaluation of relevant public health indicators. For demonstrative purposes, data series regarding the mortality of children under 5 years of age in Costa Rica were used, comprising the period 1978-2008. Copyright © 2016 SESPAS. Publicado por Elsevier España, S.L.U. All rights reserved.

  18. Regional vertical total electron content (VTEC) modeling together with satellite and receiver differential code biases (DCBs) using semi-parametric multivariate adaptive regression B-splines (SP-BMARS)

    NASA Astrophysics Data System (ADS)

    Durmaz, Murat; Karslioglu, Mahmut Onur

    2015-04-01

    There are various global and regional methods that have been proposed for the modeling of ionospheric vertical total electron content (VTEC). Global distribution of VTEC is usually modeled by spherical harmonic expansions, while tensor products of compactly supported univariate B-splines can be used for regional modeling. In these empirical parametric models, the coefficients of the basis functions as well as differential code biases (DCBs) of satellites and receivers can be treated as unknown parameters which can be estimated from geometry-free linear combinations of global positioning system observables. In this work we propose a new semi-parametric multivariate adaptive regression B-splines (SP-BMARS) method for the regional modeling of VTEC together with satellite and receiver DCBs, where the parametric part of the model is related to the DCBs as fixed parameters and the non-parametric part adaptively models the spatio-temporal distribution of VTEC. The latter is based on multivariate adaptive regression B-splines which is a non-parametric modeling technique making use of compactly supported B-spline basis functions that are generated from the observations automatically. This algorithm takes advantage of an adaptive scale-by-scale model building strategy that searches for best-fitting B-splines to the data at each scale. The VTEC maps generated from the proposed method are compared numerically and visually with the global ionosphere maps (GIMs) which are provided by the Center for Orbit Determination in Europe (CODE). The VTEC values from SP-BMARS and CODE GIMs are also compared with VTEC values obtained through calibration using local ionospheric model. The estimated satellite and receiver DCBs from the SP-BMARS model are compared with the CODE distributed DCBs. The results show that the SP-BMARS algorithm can be used to estimate satellite and receiver DCBs while adaptively and flexibly modeling the daily regional VTEC.

  19. Estimation of Subpixel Snow-Covered Area by Nonparametric Regression Splines

    NASA Astrophysics Data System (ADS)

    Kuter, S.; Akyürek, Z.; Weber, G.-W.

    2016-10-01

    Measurement of the areal extent of snow cover with high accuracy plays an important role in hydrological and climate modeling. Remotely-sensed data acquired by earth-observing satellites offer great advantages for timely monitoring of snow cover. However, the main obstacle is the tradeoff between temporal and spatial resolution of satellite imageries. Soft or subpixel classification of low or moderate resolution satellite images is a preferred technique to overcome this problem. The most frequently employed snow cover fraction methods applied on Moderate Resolution Imaging Spectroradiometer (MODIS) data have evolved from spectral unmixing and empirical Normalized Difference Snow Index (NDSI) methods to latest machine learning-based artificial neural networks (ANNs). This study demonstrates the implementation of subpixel snow-covered area estimation based on the state-of-the-art nonparametric spline regression method, namely, Multivariate Adaptive Regression Splines (MARS). MARS models were trained by using MODIS top of atmospheric reflectance values of bands 1-7 as predictor variables. Reference percentage snow cover maps were generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also employed to estimate the percentage snow-covered area on the same data set. The results indicated that the developed MARS model performed better than th

  20. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data.

    PubMed

    Abram, Samantha V; Helwig, Nathaniel E; Moodie, Craig A; DeYoung, Colin G; MacDonald, Angus W; Waller, Niels G

    2016-01-01

    Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks.

  1. Bootstrap Enhanced Penalized Regression for Variable Selection with Neuroimaging Data

    PubMed Central

    Abram, Samantha V.; Helwig, Nathaniel E.; Moodie, Craig A.; DeYoung, Colin G.; MacDonald, Angus W.; Waller, Niels G.

    2016-01-01

    Recent advances in fMRI research highlight the use of multivariate methods for examining whole-brain connectivity. Complementary data-driven methods are needed for determining the subset of predictors related to individual differences. Although commonly used for this purpose, ordinary least squares (OLS) regression may not be ideal due to multi-collinearity and over-fitting issues. Penalized regression is a promising and underutilized alternative to OLS regression. In this paper, we propose a nonparametric bootstrap quantile (QNT) approach for variable selection with neuroimaging data. We use real and simulated data, as well as annotated R code, to demonstrate the benefits of our proposed method. Our results illustrate the practical potential of our proposed bootstrap QNT approach. Our real data example demonstrates how our method can be used to relate individual differences in neural network connectivity with an externalizing personality measure. Also, our simulation results reveal that the QNT method is effective under a variety of data conditions. Penalized regression yields more stable estimates and sparser models than OLS regression in situations with large numbers of highly correlated neural predictors. Our results demonstrate that penalized regression is a promising method for examining associations between neural predictors and clinically relevant traits or behaviors. These findings have important implications for the growing field of functional connectivity research, where multivariate methods produce numerous, highly correlated brain networks. PMID:27516732

  2. NONPARAMETRIC MANOVA APPROACHES FOR NON-NORMAL MULTIVARIATE OUTCOMES WITH MISSING VALUES

    PubMed Central

    He, Fanyin; Mazumdar, Sati; Tang, Gong; Bhatia, Triptish; Anderson, Stewart J.; Dew, Mary Amanda; Krafty, Robert; Nimgaonkar, Vishwajit; Deshpande, Smita; Hall, Martica; Reynolds, Charles F.

    2017-01-01

    Between-group comparisons often entail many correlated response variables. The multivariate linear model, with its assumption of multivariate normality, is the accepted standard tool for these tests. When this assumption is violated, the nonparametric multivariate Kruskal-Wallis (MKW) test is frequently used. However, this test requires complete cases with no missing values in response variables. Deletion of cases with missing values likely leads to inefficient statistical inference. Here we extend the MKW test to retain information from partially-observed cases. Results of simulated studies and analysis of real data show that the proposed method provides adequate coverage and superior power to complete-case analyses. PMID:29416225

  3. An Empirical Study of Eight Nonparametric Tests in Hierarchical Regression.

    ERIC Educational Resources Information Center

    Harwell, Michael; Serlin, Ronald C.

    When normality does not hold, nonparametric tests represent an important data-analytic alternative to parametric tests. However, the use of nonparametric tests in educational research has been limited by the absence of easily performed tests for complex experimental designs and analyses, such as factorial designs and multiple regression analyses,…

  4. A New Approach of Juvenile Age Estimation using Measurements of the Ilium and Multivariate Adaptive Regression Splines (MARS) Models for Better Age Prediction.

    PubMed

    Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal

    2017-01-01

    Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standard approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models were tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional iliac data (module and surface) were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module and area give overall better and statistically valid age estimates. These models integrate punctual nonlinearities of the relationship between age and osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase of individual variability. MARS models can qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.

  5. Integrating Growth Variability of the Ilium, Fifth Lumbar Vertebra, and Clavicle with Multivariate Adaptive Regression Splines Models for Subadult Age Estimation.

    PubMed

    Corron, Louise; Marchal, François; Condemi, Silvana; Telmon, Norbert; Chaumoitre, Kathia; Adalian, Pascal

    2018-05-31

    Subadult age estimation should rely on sampling and statistical protocols capturing development variability for more accurate age estimates. In this perspective, measurements were taken on the fifth lumbar vertebrae and/or clavicles of 534 French males and females aged 0-19 years and the ilia of 244 males and females aged 0-12 years. These variables were fitted in nonparametric multivariate adaptive regression splines (MARS) models with 95% prediction intervals (PIs) of age. The models were tested on two independent samples from Marseille and the Luis Lopes reference collection from Lisbon. Models using ilium width and module, maximum clavicle length, and lateral vertebral body heights were more than 92% accurate. Precision was lower for postpubertal individuals. Integrating punctual nonlinearities of the relationship between age and the variables and dynamic prediction intervals incorporated the normal increase in interindividual growth variability (heteroscedasticity of variance) with age for more biologically accurate predictions. © 2018 American Academy of Forensic Sciences.

  6. Bayesian Unimodal Density Regression for Causal Inference

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2011-01-01

    Karabatsos and Walker (2011) introduced a new Bayesian nonparametric (BNP) regression model. Through analyses of real and simulated data, they showed that the BNP regression model outperforms other parametric and nonparametric regression models of common use, in terms of predictive accuracy of the outcome (dependent) variable. The other,…

  7. Application of Semiparametric Spline Regression Model in Analyzing Factors that In uence Population Density in Central Java

    NASA Astrophysics Data System (ADS)

    Sumantari, Y. D.; Slamet, I.; Sugiyanto

    2017-06-01

    Semiparametric regression is a statistical analysis method that consists of parametric and nonparametric regression. There are various approach techniques in nonparametric regression. One of the approach techniques is spline. Central Java is one of the most densely populated province in Indonesia. Population density in this province can be modeled by semiparametric regression because it consists of parametric and nonparametric component. Therefore, the purpose of this paper is to determine the factors that in uence population density in Central Java using the semiparametric spline regression model. The result shows that the factors which in uence population density in Central Java is Family Planning (FP) active participants and district minimum wage.

  8. Variable Selection for Nonparametric Quantile Regression via Smoothing Spline AN OVA

    PubMed Central

    Lin, Chen-Yen; Bondell, Howard; Zhang, Hao Helen; Zou, Hui

    2014-01-01

    Quantile regression provides a more thorough view of the effect of covariates on a response. Nonparametric quantile regression has become a viable alternative to avoid restrictive parametric assumption. The problem of variable selection for quantile regression is challenging, since important variables can influence various quantiles in different ways. We tackle the problem via regularization in the context of smoothing spline ANOVA models. The proposed sparse nonparametric quantile regression (SNQR) can identify important variables and provide flexible estimates for quantiles. Our numerical study suggests the promising performance of the new procedure in variable selection and function estimation. Supplementary materials for this article are available online. PMID:24554792

  9. CADDIS Volume 4. Data Analysis: PECBO Appendix - R Scripts for Non-Parametric Regressions

    EPA Pesticide Factsheets

    Script for computing nonparametric regression analysis. Overview of using scripts to infer environmental conditions from biological observations, statistically estimating species-environment relationships, statistical scripts.

  10. A Comparison of Methods for Nonparametric Estimation of Item Characteristic Curves for Binary Items

    ERIC Educational Resources Information Center

    Lee, Young-Sun

    2007-01-01

    This study compares the performance of three nonparametric item characteristic curve (ICC) estimation procedures: isotonic regression, smoothed isotonic regression, and kernel smoothing. Smoothed isotonic regression, employed along with an appropriate kernel function, provides better estimates and also satisfies the assumption of strict…

  11. Multivariate decoding of brain images using ordinal regression.

    PubMed

    Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F

    2013-11-01

    Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection. Copyright © 2013. Published by Elsevier Inc.

  12. Multiresponse semiparametric regression for modelling the effect of regional socio-economic variables on the use of information technology

    NASA Astrophysics Data System (ADS)

    Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania

    2017-03-01

    Multiresponse semiparametric regression is simultaneous equation regression model and fusion of parametric and nonparametric model. The regression model comprise several models and each model has two components, parametric and nonparametric. The used model has linear function as parametric and polynomial truncated spline as nonparametric component. The model can handle both linearity and nonlinearity relationship between response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling of effect of regional socio-economic on use of information technology. More specific, the response variables are percentage of households has access to internet and percentage of households has personal computer. Then, predictor variables are percentage of literacy people, percentage of electrification and percentage of economic growth. Based on identification of the relationship between response and predictor variable, economic growth is treated as nonparametric predictor and the others are parametric predictors. The result shows that the multiresponse semiparametric regression can be applied well as indicate by the high coefficient determination, 90 percent.

  13. Estimation and model selection of semiparametric multivariate survival functions under general censorship.

    PubMed

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2010-07-01

    We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root- n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided.

  14. Estimation and model selection of semiparametric multivariate survival functions under general censorship

    PubMed Central

    Chen, Xiaohong; Fan, Yanqin; Pouzo, Demian; Ying, Zhiliang

    2013-01-01

    We study estimation and model selection of semiparametric models of multivariate survival functions for censored data, which are characterized by possibly misspecified parametric copulas and nonparametric marginal survivals. We obtain the consistency and root-n asymptotic normality of a two-step copula estimator to the pseudo-true copula parameter value according to KLIC, and provide a simple consistent estimator of its asymptotic variance, allowing for a first-step nonparametric estimation of the marginal survivals. We establish the asymptotic distribution of the penalized pseudo-likelihood ratio statistic for comparing multiple semiparametric multivariate survival functions subject to copula misspecification and general censorship. An empirical application is provided. PMID:24790286

  15. Traffic flow forecasting using approximate nearest neighbor nonparametric regression

    DOT National Transportation Integrated Search

    2000-12-01

    The purpose of this research is to enhance nonparametric regression (NPR) for use in real-time systems by first reducing execution time using advanced data structures and imprecise computations and then developing a methodology for applying NPR. Due ...

  16. Nonparametric analysis of Minnesota spruce and aspen tree data and LANDSAT data

    NASA Technical Reports Server (NTRS)

    Scott, D. W.; Jee, R.

    1984-01-01

    The application of nonparametric methods in data-intensive problems faced by NASA is described. The theoretical development of efficient multivariate density estimators and the novel use of color graphics workstations are reviewed. The use of nonparametric density estimates for data representation and for Bayesian classification are described and illustrated. Progress in building a data analysis system in a workstation environment is reviewed and preliminary runs presented.

  17. Episiotomy increases perineal laceration length in primiparous women.

    PubMed

    Nager, C W; Helliwell, J P

    2001-08-01

    The aim of this study was to determine the clinical factors that contribute to posterior perineal laceration length. A prospective observational study was performed in 80 consenting, mostly primiparous women with term pregnancies. Posterior perineal lacerations were measured immediately after delivery. Numerous maternal, fetal, and operator variables were evaluated against laceration length and degree of tear. Univariate and multivariate regression analyses were performed to evaluate laceration length and parametric clinical variables. Nonparametric clinical variables were evaluated against laceration length by the Mann-Whitney U test. A multivariate stepwise linear regression equation revealed that episiotomy adds nearly 3 cm to perineal lacerations. Tear length was highly associated with the degree of tear (R = 0.86, R(2) = 0.73) and the risk of recognized anal sphincter disruption. None of 35 patients without an episiotomy had a recognized anal sphincter disruption, but 6 of 27 patients with an episiotomy did (P <.001). Body mass index was the only maternal or fetal variable that showed even a slight correlation with laceration length (R = 0.30, P =.04). Episiotomy is the overriding determinant of perineal laceration length and recognized anal sphincter disruption.

  18. PM10 modeling in the Oviedo urban area (Northern Spain) by using multivariate adaptive regression splines

    NASA Astrophysics Data System (ADS)

    Nieto, Paulino José García; Antón, Juan Carlos Álvarez; Vilán, José Antonio Vilán; García-Gonzalo, Esperanza

    2014-10-01

    The aim of this research work is to build a regression model of the particulate matter up to 10 micrometers in size (PM10) by using the multivariate adaptive regression splines (MARS) technique in the Oviedo urban area (Northern Spain) at local scale. This research work explores the use of a nonparametric regression algorithm known as multivariate adaptive regression splines (MARS) which has the ability to approximate the relationship between the inputs and outputs, and express the relationship mathematically. In this sense, hazardous air pollutants or toxic air contaminants refer to any substance that may cause or contribute to an increase in mortality or serious illness, or that may pose a present or potential hazard to human health. To accomplish the objective of this study, the experimental dataset of nitrogen oxides (NOx), carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3) and dust (PM10) were collected over 3 years (2006-2008) and they are used to create a highly nonlinear model of the PM10 in the Oviedo urban nucleus (Northern Spain) based on the MARS technique. One main objective of this model is to obtain a preliminary estimate of the dependence between PM10 pollutant in the Oviedo urban area at local scale. A second aim is to determine the factors with the greatest bearing on air quality with a view to proposing health and lifestyle improvements. The United States National Ambient Air Quality Standards (NAAQS) establishes the limit values of the main pollutants in the atmosphere in order to ensure the health of healthy people. Firstly, this MARS regression model captures the main perception of statistical learning theory in order to obtain a good prediction of the dependence among the main pollutants in the Oviedo urban area. Secondly, the main advantages of MARS are its capacity to produce simple, easy-to-interpret models, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, on the basis of these numerical calculations, using the multivariate adaptive regression splines (MARS) technique, conclusions of this research work are exposed.

  19. Nonparametric Regression and the Parametric Bootstrap for Local Dependence Assessment.

    ERIC Educational Resources Information Center

    Habing, Brian

    2001-01-01

    Discusses ideas underlying nonparametric regression and the parametric bootstrap with an overview of their application to item response theory and the assessment of local dependence. Illustrates the use of the method in assessing local dependence that varies with examinee trait levels. (SLD)

  20. Classification and regression tree analysis vs. multivariable linear and logistic regression methods as statistical tools for studying haemophilia.

    PubMed

    Henrard, S; Speybroeck, N; Hermans, C

    2015-11-01

    Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.

  1. A comparative study between nonlinear regression and nonparametric approaches for modelling Phalaris paradoxa seedling emergence

    USDA-ARS?s Scientific Manuscript database

    Parametric non-linear regression (PNR) techniques commonly are used to develop weed seedling emergence models. Such techniques, however, require statistical assumptions that are difficult to meet. To examine and overcome these limitations, we compared PNR with a nonparametric estimation technique. F...

  2. Regional trends in short-duration precipitation extremes: a flexible multivariate monotone quantile regression approach

    NASA Astrophysics Data System (ADS)

    Cannon, Alex

    2017-04-01

    Estimating historical trends in short-duration rainfall extremes at regional and local scales is challenging due to low signal-to-noise ratios and the limited availability of homogenized observational data. In addition to being of scientific interest, trends in rainfall extremes are of practical importance, as their presence calls into question the stationarity assumptions that underpin traditional engineering and infrastructure design practice. Even with these fundamental challenges, increasingly complex questions are being asked about time series of extremes. For instance, users may not only want to know whether or not rainfall extremes have changed over time, they may also want information on the modulation of trends by large-scale climate modes or on the nonstationarity of trends (e.g., identifying hiatus periods or periods of accelerating positive trends). Efforts have thus been devoted to the development and application of more robust and powerful statistical estimators for regional and local scale trends. While a standard nonparametric method like the regional Mann-Kendall test, which tests for the presence of monotonic trends (i.e., strictly non-decreasing or non-increasing changes), makes fewer assumptions than parametric methods and pools information from stations within a region, it is not designed to visualize detected trends, include information from covariates, or answer questions about the rate of change in trends. As a remedy, monotone quantile regression (MQR) has been developed as a nonparametric alternative that can be used to estimate a common monotonic trend in extremes at multiple stations. Quantile regression makes efficient use of data by directly estimating conditional quantiles based on information from all rainfall data in a region, i.e., without having to precompute the sample quantiles. The MQR method is also flexible and can be used to visualize and analyze the nonlinearity of the detected trend. However, it is fundamentally a univariate technique, and cannot incorporate information from additional covariates, for example ENSO state or physiographic controls on extreme rainfall within a region. Here, the univariate MQR model is extended to allow the use of multiple covariates. Multivariate monotone quantile regression (MMQR) is based on a single hidden-layer feedforward network with the quantile regression error function and partial monotonicity constraints. The MMQR model is demonstrated via Monte Carlo simulations and the estimation and visualization of regional trends in moderate rainfall extremes based on homogenized sub-daily precipitation data at stations in Canada.

  3. Boosted Multivariate Trees for Longitudinal Data

    PubMed Central

    Pande, Amol; Li, Liang; Rajeswaran, Jeevanantham; Ehrlinger, John; Kogalur, Udaya B.; Blackstone, Eugene H.; Ishwaran, Hemant

    2017-01-01

    Machine learning methods provide a powerful approach for analyzing longitudinal data in which repeated measurements are observed for a subject over time. We boost multivariate trees to fit a novel flexible semi-nonparametric marginal model for longitudinal data. In this model, features are assumed to be nonparametric, while feature-time interactions are modeled semi-nonparametrically utilizing P-splines with estimated smoothing parameter. In order to avoid overfitting, we describe a relatively simple in sample cross-validation method which can be used to estimate the optimal boosting iteration and which has the surprising added benefit of stabilizing certain parameter estimates. Our new multivariate tree boosting method is shown to be highly flexible, robust to covariance misspecification and unbalanced designs, and resistant to overfitting in high dimensions. Feature selection can be used to identify important features and feature-time interactions. An application to longitudinal data of forced 1-second lung expiratory volume (FEV1) for lung transplant patients identifies an important feature-time interaction and illustrates the ease with which our method can find complex relationships in longitudinal data. PMID:29249866

  4. A Powerful Test for Comparing Multiple Regression Functions.

    PubMed

    Maity, Arnab

    2012-09-01

    In this article, we address the important problem of comparison of two or more population regression functions. Recently, Pardo-Fernández, Van Keilegom and González-Manteiga (2007) developed test statistics for simple nonparametric regression models: Y(ij) = θ(j)(Z(ij)) + σ(j)(Z(ij))∊(ij), based on empirical distributions of the errors in each population j = 1, … , J. In this paper, we propose a test for equality of the θ(j)(·) based on the concept of generalized likelihood ratio type statistics. We also generalize our test for other nonparametric regression setups, e.g, nonparametric logistic regression, where the loglikelihood for population j is any general smooth function [Formula: see text]. We describe a resampling procedure to obtain the critical values of the test. In addition, we present a simulation study to evaluate the performance of the proposed test and compare our results to those in Pardo-Fernández et al. (2007).

  5. LOCATING NEARBY SOURCES OF AIR POLLUTION BY NONPARAMETRIC REGRESSION OF ATMOSPHERIC CONCENTRATIONS ON WIND DIRECTION. (R826238)

    EPA Science Inventory

    The relationship of the concentration of air pollutants to wind direction has been determined by nonparametric regression using a Gaussian kernel. The results are smooth curves with error bars that allow for the accurate determination of the wind direction where the concentrat...

  6. Benchmark dose analysis via nonparametric regression modeling

    PubMed Central

    Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

    2013-01-01

    Estimation of benchmark doses (BMDs) in quantitative risk assessment traditionally is based upon parametric dose-response modeling. It is a well-known concern, however, that if the chosen parametric model is uncertain and/or misspecified, inaccurate and possibly unsafe low-dose inferences can result. We describe a nonparametric approach for estimating BMDs with quantal-response data based on an isotonic regression method, and also study use of corresponding, nonparametric, bootstrap-based confidence limits for the BMD. We explore the confidence limits’ small-sample properties via a simulation study, and illustrate the calculations with an example from cancer risk assessment. It is seen that this nonparametric approach can provide a useful alternative for BMD estimation when faced with the problem of parametric model uncertainty. PMID:23683057

  7. Introduction to multivariate discrimination

    NASA Astrophysics Data System (ADS)

    Kégl, Balázs

    2013-07-01

    Multivariate discrimination or classification is one of the best-studied problem in machine learning, with a plethora of well-tested and well-performing algorithms. There are also several good general textbooks [1-9] on the subject written to an average engineering, computer science, or statistics graduate student; most of them are also accessible for an average physics student with some background on computer science and statistics. Hence, instead of writing a generic introduction, we concentrate here on relating the subject to a practitioner experimental physicist. After a short introduction on the basic setup (Section 1) we delve into the practical issues of complexity regularization, model selection, and hyperparameter optimization (Section 2), since it is this step that makes high-complexity non-parametric fitting so different from low-dimensional parametric fitting. To emphasize that this issue is not restricted to classification, we illustrate the concept on a low-dimensional but non-parametric regression example (Section 2.1). Section 3 describes the common algorithmic-statistical formal framework that unifies the main families of multivariate classification algorithms. We explain here the large-margin principle that partly explains why these algorithms work. Section 4 is devoted to the description of the three main (families of) classification algorithms, neural networks, the support vector machine, and AdaBoost. We do not go into the algorithmic details; the goal is to give an overview on the form of the functions these methods learn and on the objective functions they optimize. Besides their technical description, we also make an attempt to put these algorithm into a socio-historical context. We then briefly describe some rather heterogeneous applications to illustrate the pattern recognition pipeline and to show how widespread the use of these methods is (Section 5). We conclude the chapter with three essentially open research problems that are either relevant to or even motivated by certain unorthodox applications of multivariate discrimination in experimental physics.

  8. Multilevel covariance regression with correlated random effects in the mean and variance structure.

    PubMed

    Quintero, Adrian; Lesaffre, Emmanuel

    2017-09-01

    Multivariate regression methods generally assume a constant covariance matrix for the observations. In case a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches can be restrictive in the literature. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewedly distributed responses. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Testing the Hypothesis of a Homoscedastic Error Term in Simple, Nonparametric Regression

    ERIC Educational Resources Information Center

    Wilcox, Rand R.

    2006-01-01

    Consider the nonparametric regression model Y = m(X)+ [tau](X)[epsilon], where X and [epsilon] are independent random variables, [epsilon] has a median of zero and variance [sigma][squared], [tau] is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated…

  10. A simulation study of nonparametric total deviation index as a measure of agreement based on quantile regression.

    PubMed

    Lin, Lawrence; Pan, Yi; Hedayat, A S; Barnhart, Huiman X; Haber, Michael

    2016-01-01

    Total deviation index (TDI) captures a prespecified quantile of the absolute deviation of paired observations from raters, observers, methods, assays, instruments, etc. We compare the performance of TDI using nonparametric quantile regression to the TDI assuming normality (Lin, 2000). This simulation study considers three distributions: normal, Poisson, and uniform at quantile levels of 0.8 and 0.9 for cases with and without contamination. Study endpoints include the bias of TDI estimates (compared with their respective theoretical values), standard error of TDI estimates (compared with their true simulated standard errors), and test size (compared with 0.05), and power. Nonparametric TDI using quantile regression, although it slightly underestimates and delivers slightly less power for data without contamination, works satisfactorily under all simulated cases even for moderate (say, ≥40) sample sizes. The performance of the TDI based on a quantile of 0.8 is in general superior to that of 0.9. The performances of nonparametric and parametric TDI methods are compared with a real data example. Nonparametric TDI can be very useful when the underlying distribution on the difference is not normal, especially when it has a heavy tail.

  11. Nonparametric instrumental regression with non-convex constraints

    NASA Astrophysics Data System (ADS)

    Grasmair, M.; Scherzer, O.; Vanhems, A.

    2013-03-01

    This paper considers the nonparametric regression model with an additive error that is dependent on the explanatory variables. As is common in empirical studies in epidemiology and economics, it also supposes that valid instrumental variables are observed. A classical example in microeconomics considers the consumer demand function as a function of the price of goods and the income, both variables often considered as endogenous. In this framework, the economic theory also imposes shape restrictions on the demand function, such as integrability conditions. Motivated by this illustration in microeconomics, we study an estimator of a nonparametric constrained regression function using instrumental variables by means of Tikhonov regularization. We derive rates of convergence for the regularized model both in a deterministic and stochastic setting under the assumption that the true regression function satisfies a projected source condition including, because of the non-convexity of the imposed constraints, an additional smallness condition.

  12. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis is an analysis to model the relationship between response variables and predictor variables. The parametric approach to the regression model is very strict with the assumption, but nonparametric regression model isn’t need assumption of model. Time series data is the data of a variable that is observed based on a certain time, so if the time series data wanted to be modeled by regression, then we should determined the response and predictor variables first. Determination of the response variable in time series is variable in t-th (yt), while the predictor variable is a significant lag. In nonparametric regression modeling, one developing approach is to use the Fourier series approach. One of the advantages of nonparametric regression approach using Fourier series is able to overcome data having trigonometric distribution. In modeling using Fourier series needs parameter of K. To determine the number of K can be used Generalized Cross Validation method. In inflation modeling for the transportation sector, communication and financial services using Fourier series yields an optimal K of 120 parameters with R-square 99%. Whereas if it was modeled by multiple linear regression yield R-square 90%.

  13. Bootstrap Prediction Intervals in Non-Parametric Regression with Applications to Anomaly Detection

    NASA Technical Reports Server (NTRS)

    Kumar, Sricharan; Srivistava, Ashok N.

    2012-01-01

    Prediction intervals provide a measure of the probable interval in which the outputs of a regression model can be expected to occur. Subsequently, these prediction intervals can be used to determine if the observed output is anomalous or not, conditioned on the input. In this paper, a procedure for determining prediction intervals for outputs of nonparametric regression models using bootstrap methods is proposed. Bootstrap methods allow for a non-parametric approach to computing prediction intervals with no specific assumptions about the sampling distribution of the noise or the data. The asymptotic fidelity of the proposed prediction intervals is theoretically proved. Subsequently, the validity of the bootstrap based prediction intervals is illustrated via simulations. Finally, the bootstrap prediction intervals are applied to the problem of anomaly detection on aviation data.

  14. Robust neural network with applications to credit portfolio data analysis.

    PubMed

    Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun

    2010-01-01

    In this article, we study nonparametric conditional quantile estimation via neural network structure. We proposed an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in real data application demonstrate the advantage of the newly proposed procedure.

  15. A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data.

    PubMed

    Ye, Xin; Wang, Ke; Zou, Yajie; Lord, Dominique

    2018-01-01

    This paper develops a semi-nonparametric Poisson regression model to analyze motor vehicle crash frequency data collected from rural multilane highway segments in California, US. Motor vehicle crash frequency on rural highway is a topic of interest in the area of transportation safety due to higher driving speeds and the resultant severity level. Unlike the traditional Negative Binomial (NB) model, the semi-nonparametric Poisson regression model can accommodate an unobserved heterogeneity following a highly flexible semi-nonparametric (SNP) distribution. Simulation experiments are conducted to demonstrate that the SNP distribution can well mimic a large family of distributions, including normal distributions, log-gamma distributions, bimodal and trimodal distributions. Empirical estimation results show that such flexibility offered by the SNP distribution can greatly improve model precision and the overall goodness-of-fit. The semi-nonparametric distribution can provide a better understanding of crash data structure through its ability to capture potential multimodality in the distribution of unobserved heterogeneity. When estimated coefficients in empirical models are compared, SNP and NB models are found to have a substantially different coefficient for the dummy variable indicating the lane width. The SNP model with better statistical performance suggests that the NB model overestimates the effect of lane width on crash frequency reduction by 83.1%.

  16. Cox regression analysis with missing covariates via nonparametric multiple imputation.

    PubMed

    Hsu, Chiu-Hsieh; Yu, Mandi

    2018-01-01

    We consider the situation of estimating Cox regression in which some covariates are subject to missing, and there exists additional information (including observed event time, censoring indicator and fully observed covariates) which may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missing probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to non-ignorable missing mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from Surveillance, Epidemiology and End Results (SEER) Program.

  17. Multivariate Density Estimation and Remote Sensing

    NASA Technical Reports Server (NTRS)

    Scott, D. W.

    1983-01-01

    Current efforts to develop methods and computer algorithms to effectively represent multivariate data commonly encountered in remote sensing applications are described. While this may involve scatter diagrams, multivariate representations of nonparametric probability density estimates are emphasized. The density function provides a useful graphical tool for looking at data and a useful theoretical tool for classification. This approach is called a thunderstorm data analysis.

  18. Penalized spline estimation for functional coefficient regression models.

    PubMed

    Cao, Yanrong; Lin, Haiqun; Wu, Tracy Z; Yu, Yan

    2010-04-01

    The functional coefficient regression models assume that the regression coefficients vary with some "threshold" variable, providing appreciable flexibility in capturing the underlying dynamics in data and avoiding the so-called "curse of dimensionality" in multivariate nonparametric estimation. We first investigate the estimation, inference, and forecasting for the functional coefficient regression models with dependent observations via penalized splines. The P-spline approach, as a direct ridge regression shrinkage type global smoothing method, is computationally efficient and stable. With established fixed-knot asymptotics, inference is readily available. Exact inference can be obtained for fixed smoothing parameter λ, which is most appealing for finite samples. Our penalized spline approach gives an explicit model expression, which also enables multi-step-ahead forecasting via simulations. Furthermore, we examine different methods of choosing the important smoothing parameter λ: modified multi-fold cross-validation (MCV), generalized cross-validation (GCV), and an extension of empirical bias bandwidth selection (EBBS) to P-splines. In addition, we implement smoothing parameter selection using mixed model framework through restricted maximum likelihood (REML) for P-spline functional coefficient regression models with independent observations. The P-spline approach also easily allows different smoothness for different functional coefficients, which is enabled by assigning different penalty λ accordingly. We demonstrate the proposed approach by both simulation examples and a real data application.

  19. High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics

    PubMed Central

    Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike

    2010-01-01

    We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139

  20. Patient acceptance of non-invasive testing for fetal aneuploidy via cell-free fetal DNA.

    PubMed

    Vahanian, Sevan A; Baraa Allaf, M; Yeh, Corinne; Chavez, Martin R; Kinzler, Wendy L; Vintzileos, Anthony M

    2014-01-01

    To evaluate factors associated with patient acceptance of noninvasive prenatal testing for trisomy 21, 18 and 13 via cell-free fetal DNA. This was a retrospective study of all patients who were offered noninvasive prenatal testing at a single institution from 1 March 2012 to 2 July 2012. Patients were identified through our perinatal ultrasound database; demographic information, testing indication and insurance coverage were compared between patients who accepted the test and those who declined. Parametric and nonparametric tests were used as appropriate. Significant variables were assessed using multivariate logistic regression. The value p < 0.05 was considered significant. Two hundred thirty-five patients were offered noninvasive prenatal testing. Ninety-three patients (40%) accepted testing and 142 (60%) declined. Women who accepted noninvasive prenatal testing were more commonly white, had private insurance and had more than one testing indication. There was no statistical difference in the number or the type of testing indications. Multivariable logistic regression analysis was then used to assess individual variables. After controlling for race, patients with public insurance were 83% less likely to accept noninvasive prenatal testing than those with private insurance (3% vs. 97%, adjusted RR 0.17, 95% CI 0.05-0.62). In our population, having public insurance was the factor most strongly associated with declining noninvasive prenatal testing.

  1. Non-Gaussian spatiotemporal simulation of multisite daily precipitation: downscaling framework

    NASA Astrophysics Data System (ADS)

    Ben Alaya, M. A.; Ouarda, T. B. M. J.; Chebana, F.

    2018-01-01

    Probabilistic regression approaches for downscaling daily precipitation are very useful. They provide the whole conditional distribution at each forecast step to better represent the temporal variability. The question addressed in this paper is: how to simulate spatiotemporal characteristics of multisite daily precipitation from probabilistic regression models? Recent publications point out the complexity of multisite properties of daily precipitation and highlight the need for using a non-Gaussian flexible tool. This work proposes a reasonable compromise between simplicity and flexibility avoiding model misspecification. A suitable nonparametric bootstrapping (NB) technique is adopted. A downscaling model which merges a vector generalized linear model (VGLM as a probabilistic regression tool) and the proposed bootstrapping technique is introduced to simulate realistic multisite precipitation series. The model is applied to data sets from the southern part of the province of Quebec, Canada. It is shown that the model is capable of reproducing both at-site properties and the spatial structure of daily precipitations. Results indicate the superiority of the proposed NB technique, over a multivariate autoregressive Gaussian framework (i.e. Gaussian copula).

  2. Nonparametric estimation and testing of fixed effects panel data models

    PubMed Central

    Henderson, Daniel J.; Carroll, Raymond J.; Li, Qi

    2009-01-01

    In this paper we consider the problem of estimating nonparametric panel data models with fixed effects. We introduce an iterative nonparametric kernel estimator. We also extend the estimation method to the case of a semiparametric partially linear fixed effects model. To determine whether a parametric, semiparametric or nonparametric model is appropriate, we propose test statistics to test between the three alternatives in practice. We further propose a test statistic for testing the null hypothesis of random effects against fixed effects in a nonparametric panel data regression model. Simulations are used to examine the finite sample performance of the proposed estimators and the test statistics. PMID:19444335

  3. Marginally specified priors for non-parametric Bayesian estimation

    PubMed Central

    Kessler, David C.; Hoff, Peter D.; Dunson, David B.

    2014-01-01

    Summary Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables. PMID:25663813

  4. Locally-Based Kernal PLS Smoothing to Non-Parametric Regression Curve Fitting

    NASA Technical Reports Server (NTRS)

    Rosipal, Roman; Trejo, Leonard J.; Wheeler, Kevin; Korsmeyer, David (Technical Monitor)

    2002-01-01

    We present a novel smoothing approach to non-parametric regression curve fitting. This is based on kernel partial least squares (PLS) regression in reproducing kernel Hilbert space. It is our concern to apply the methodology for smoothing experimental data where some level of knowledge about the approximate shape, local inhomogeneities or points where the desired function changes its curvature is known a priori or can be derived based on the observed noisy data. We propose locally-based kernel PLS regression that extends the previous kernel PLS methodology by incorporating this knowledge. We compare our approach with existing smoothing splines, hybrid adaptive splines and wavelet shrinkage techniques on two generated data sets.

  5. Penalized nonparametric scalar-on-function regression via principal coordinates

    PubMed Central

    Reiss, Philip T.; Miller, David L.; Wu, Pei-Shien; Hua, Wen-Yu

    2016-01-01

    A number of classical approaches to nonparametric regression have recently been extended to the case of functional predictors. This paper introduces a new method of this type, which extends intermediate-rank penalized smoothing to scalar-on-function regression. In the proposed method, which we call principal coordinate ridge regression, one regresses the response on leading principal coordinates defined by a relevant distance among the functional predictors, while applying a ridge penalty. Our publicly available implementation, based on generalized additive modeling software, allows for fast optimal tuning parameter selection and for extensions to multiple functional predictors, exponential family-valued responses, and mixed-effects models. In an application to signature verification data, principal coordinate ridge regression, with dynamic time warping distance used to define the principal coordinates, is shown to outperform a functional generalized linear model. PMID:29217963

  6. The Breslow estimator of the nonparametric baseline survivor function in Cox's regression model: some heuristics.

    PubMed

    Hanley, James A

    2008-01-01

    Most survival analysis textbooks explain how the hazard ratio parameters in Cox's life table regression model are estimated. Fewer explain how the components of the nonparametric baseline survivor function are derived. Those that do often relegate the explanation to an "advanced" section and merely present the components as algebraic or iterative solutions to estimating equations. None comment on the structure of these estimators. This note brings out a heuristic representation that may help to de-mystify the structure.

  7. Nonparametric Regression Subject to a Given Number of Local Extreme Value

    DTIC Science & Technology

    2001-07-01

    compilation report: ADP013708 thru ADP013761 UNCLASSIFIED Nonparametric regression subject to a given number of local extreme value Ali Majidi and Laurie...locations of the local extremes for the smoothing algorithm. 280 A. Majidi and L. Davies 3 The smoothing problem We make the smoothing problem precise...is the solution of QP3. k--oo 282 A. Majidi and L. Davies FiG. 2. The captions top-left, top-right, bottom-left, bottom-right show the result of the

  8. Linkage mapping of beta 2 EEG waves via non-parametric regression.

    PubMed

    Ghosh, Saurabh; Begleiter, Henri; Porjesz, Bernice; Chorlian, David B; Edenberg, Howard J; Foroud, Tatiana; Goate, Alison; Reich, Theodore

    2003-04-01

    Parametric linkage methods for analyzing quantitative trait loci are sensitive to violations in trait distributional assumptions. Non-parametric methods are relatively more robust. In this article, we modify the non-parametric regression procedure proposed by Ghosh and Majumder [2000: Am J Hum Genet 66:1046-1061] to map Beta 2 EEG waves using genome-wide data generated in the COGA project. Significant linkage findings are obtained on chromosomes 1, 4, 5, and 15 with findings at multiple regions on chromosomes 4 and 15. We analyze the data both with and without incorporating alcoholism as a covariate. We also test for epistatic interactions between regions of the genome exhibiting significant linkage with the EEG phenotypes and find evidence of epistatic interactions between a region each on chromosome 1 and chromosome 4 with one region on chromosome 15. While regressing out the effect of alcoholism does not affect the linkage findings, the epistatic interactions become statistically insignificant. Copyright 2003 Wiley-Liss, Inc.

  9. Parametric and Nonparametric Statistical Methods for Genomic Selection of Traits with Additive and Epistatic Genetic Architectures

    PubMed Central

    Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.

    2014-01-01

    Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289

  10. Robust, Adaptive Functional Regression in Functional Mixed Model Framework.

    PubMed

    Zhu, Hongxiao; Brown, Philip J; Morris, Jeffrey S

    2011-09-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets.

  11. Robust, Adaptive Functional Regression in Functional Mixed Model Framework

    PubMed Central

    Zhu, Hongxiao; Brown, Philip J.; Morris, Jeffrey S.

    2012-01-01

    Functional data are increasingly encountered in scientific studies, and their high dimensionality and complexity lead to many analytical challenges. Various methods for functional data analysis have been developed, including functional response regression methods that involve regression of a functional response on univariate/multivariate predictors with nonparametrically represented functional coefficients. In existing methods, however, the functional regression can be sensitive to outlying curves and outlying regions of curves, so is not robust. In this paper, we introduce a new Bayesian method, robust functional mixed models (R-FMM), for performing robust functional regression within the general functional mixed model framework, which includes multiple continuous or categorical predictors and random effect functions accommodating potential between-function correlation induced by the experimental design. The underlying model involves a hierarchical scale mixture model for the fixed effects, random effect and residual error functions. These modeling assumptions across curves result in robust nonparametric estimators of the fixed and random effect functions which down-weight outlying curves and regions of curves, and produce statistics that can be used to flag global and local outliers. These assumptions also lead to distributions across wavelet coefficients that have outstanding sparsity and adaptive shrinkage properties, with great flexibility for the data to determine the sparsity and the heaviness of the tails. Together with the down-weighting of outliers, these within-curve properties lead to fixed and random effect function estimates that appear in our simulations to be remarkably adaptive in their ability to remove spurious features yet retain true features of the functions. We have developed general code to implement this fully Bayesian method that is automatic, requiring the user to only provide the functional data and design matrices. It is efficient enough to handle large data sets, and yields posterior samples of all model parameters that can be used to perform desired Bayesian estimation and inference. Although we present details for a specific implementation of the R-FMM using specific distributional choices in the hierarchical model, 1D functions, and wavelet transforms, the method can be applied more generally using other heavy-tailed distributions, higher dimensional functions (e.g. images), and using other invertible transformations as alternatives to wavelets. PMID:22308015

  12. Bayesian isotonic density regression

    PubMed Central

    Wang, Lianming; Dunson, David B.

    2011-01-01

    Density regression models allow the conditional distribution of the response given predictors to change flexibly over the predictor space. Such models are much more flexible than nonparametric mean regression models with nonparametric residual distributions, and are well supported in many applications. A rich variety of Bayesian methods have been proposed for density regression, but it is not clear whether such priors have full support so that any true data-generating model can be accurately approximated. This article develops a new class of density regression models that incorporate stochastic-ordering constraints which are natural when a response tends to increase or decrease monotonely with a predictor. Theory is developed showing large support. Methods are developed for hypothesis testing, with posterior computation relying on a simple Gibbs sampler. Frequentist properties are illustrated in a simulation study, and an epidemiology application is considered. PMID:22822259

  13. Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics

    PubMed Central

    Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven

    2011-01-01

    Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957

  14. Out-of-Sample Extensions for Non-Parametric Kernel Methods.

    PubMed

    Pan, Binbin; Chen, Wen-Sheng; Chen, Bo; Xu, Chen; Lai, Jianhuang

    2017-02-01

    Choosing suitable kernels plays an important role in the performance of kernel methods. Recently, a number of studies were devoted to developing nonparametric kernels. Without assuming any parametric form of the target kernel, nonparametric kernel learning offers a flexible scheme to utilize the information of the data, which may potentially characterize the data similarity better. The kernel methods using nonparametric kernels are referred to as nonparametric kernel methods. However, many nonparametric kernel methods are restricted to transductive learning, where the prediction function is defined only over the data points given beforehand. They have no straightforward extension for the out-of-sample data points, and thus cannot be applied to inductive learning. In this paper, we show how to make the nonparametric kernel methods applicable to inductive learning. The key problem of out-of-sample extension is how to extend the nonparametric kernel matrix to the corresponding kernel function. A regression approach in the hyper reproducing kernel Hilbert space is proposed to solve this problem. Empirical results indicate that the out-of-sample performance is comparable to the in-sample performance in most cases. Experiments on face recognition demonstrate the superiority of our nonparametric kernel method over the state-of-the-art parametric kernel methods.

  15. Value of Information Analysis for Time-lapse Seismic Data by Simulation-Regression

    NASA Astrophysics Data System (ADS)

    Dutta, G.; Mukerji, T.; Eidsvik, J.

    2016-12-01

    A novel method to estimate the Value of Information (VOI) of time-lapse seismic data in the context of reservoir development is proposed. VOI is a decision analytic metric quantifying the incremental value that would be created by collecting information prior to making a decision under uncertainty. The VOI has to be computed before collecting the information and can be used to justify its collection. Previous work on estimating the VOI of geophysical data has involved explicit approximation of the posterior distribution of reservoir properties given the data and then evaluating the prospect values for that posterior distribution of reservoir properties. Here, we propose to directly estimate the prospect values given the data by building a statistical relationship between them using regression. Various regression techniques such as Partial Least Squares Regression (PLSR), Multivariate Adaptive Regression Splines (MARS) and k-Nearest Neighbors (k-NN) are used to estimate the VOI, and the results compared. For a univariate Gaussian case, the VOI obtained from simulation-regression has been shown to be close to the analytical solution. Estimating VOI by simulation-regression is much less computationally expensive since the posterior distribution of reservoir properties given each possible dataset need not be modeled and the prospect values need not be evaluated for each such posterior distribution of reservoir properties. This method is flexible, since it does not require rigid model specification of posterior but rather fits conditional expectations non-parametrically from samples of values and data.

  16. Measuring firm size distribution with semi-nonparametric densities

    NASA Astrophysics Data System (ADS)

    Cortés, Lina M.; Mora-Valencia, Andrés; Perote, Javier

    2017-11-01

    In this article, we propose a new methodology based on a (log) semi-nonparametric (log-SNP) distribution that nests the lognormal and enables better fits in the upper tail of the distribution through the introduction of new parameters. We test the performance of the lognormal and log-SNP distributions capturing firm size, measured through a sample of US firms in 2004-2015. Taking different levels of aggregation by type of economic activity, our study shows that the log-SNP provides a better fit of the firm size distribution. We also formally introduce the multivariate log-SNP distribution, which encompasses the multivariate lognormal, to analyze the estimation of the joint distribution of the value of the firm's assets and sales. The results suggest that sales are a better firm size measure, as indicated by other studies in the literature.

  17. Nonparametric Item Response Curve Estimation with Correction for Measurement Error

    ERIC Educational Resources Information Center

    Guo, Hongwen; Sinharay, Sandip

    2011-01-01

    Nonparametric or kernel regression estimation of item response curves (IRCs) is often used in item analysis in testing programs. These estimates are biased when the observed scores are used as the regressor because the observed scores are contaminated by measurement error. Accuracy of this estimation is a concern theoretically and operationally.…

  18. Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.

    PubMed

    Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R

    2018-06-01

    Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the amount of diagnostic statistics, in comparison to their parametric counter-parts, is small. We propose a goodness-of-fit test for nonparametric regression models with linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack-of-fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the HSIC. If dependence exists, the model does not capture all the variability in the outcome associated with the covariates, otherwise the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct type I error as well as power performance through simulations.

  19. A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION

    EPA Science Inventory

    We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...

  20. Applying Multivariate Adaptive Splines to Identify Genes With Expressions Varying After Diagnosis in Microarray Experiments.

    PubMed

    Duan, Fenghai; Xu, Ye

    2017-01-01

    To analyze a microarray experiment to identify the genes with expressions varying after the diagnosis of breast cancer. A total of 44 928 probe sets in an Affymetrix microarray data publicly available on Gene Expression Omnibus from 249 patients with breast cancer were analyzed by the nonparametric multivariate adaptive splines. Then, the identified genes with turning points were grouped by K-means clustering, and their network relationship was subsequently analyzed by the Ingenuity Pathway Analysis. In total, 1640 probe sets (genes) were reliably identified to have turning points along with the age at diagnosis in their expression profiling, of which 927 expressed lower after turning points and 713 expressed higher after the turning points. K-means clustered them into 3 groups with turning points centering at 54, 62.5, and 72, respectively. The pathway analysis showed that the identified genes were actively involved in various cancer-related functions or networks. In this article, we applied the nonparametric multivariate adaptive splines method to a publicly available gene expression data and successfully identified genes with expressions varying before and after breast cancer diagnosis.

  1. Asymptotics of nonparametric L-1 regression models with dependent data

    PubMed Central

    ZHAO, ZHIBIAO; WEI, YING; LIN, DENNIS K.J.

    2013-01-01

    We investigate asymptotic properties of least-absolute-deviation or median quantile estimates of the location and scale functions in nonparametric regression models with dependent data from multiple subjects. Under a general dependence structure that allows for longitudinal data and some spatially correlated data, we establish uniform Bahadur representations for the proposed median quantile estimates. The obtained Bahadur representations provide deep insights into the asymptotic behavior of the estimates. Our main theoretical development is based on studying the modulus of continuity of kernel weighted empirical process through a coupling argument. Progesterone data is used for an illustration. PMID:24955016

  2. Reexamining the effects of gestational age, fetal growth, and maternal smoking on neonatal mortality

    PubMed Central

    Ananth, Cande V; Platt, Robert W

    2004-01-01

    Background Low birth weight (<2,500 g) is a strong predictor of infant mortality. Yet low birth weight, in isolation, is uninformative since it is comprised of two intertwined components: preterm delivery and reduced fetal growth. Through nonparametric logistic regression models, we examine the effects of gestational age, fetal growth, and maternal smoking on neonatal mortality. Methods We derived data on over 10 million singleton live births delivered at ≥ 24 weeks from the 1998–2000 U.S. natality data files. Nonparametric multivariable logistic regression based on generalized additive models was used to examine neonatal mortality (deaths within the first 28 days) in relation to fetal growth (gestational age-specific standardized birth weight), gestational age, and number of cigarettes smoked per day. All analyses were further adjusted for the confounding effects due to maternal age and gravidity. Results The relationship between standardized birth weight and neonatal mortality is nonlinear; mortality is high at low z-score birth weights, drops precipitously with increasing z-score birth weight, and begins to flatten for heavier infants. Gestational age is also strongly associated with mortality, with patterns similar to those of z-score birth weight. Although the direct effect of smoking on neonatal mortality is weak, its effects (on mortality) appear to be largely mediated through reduced fetal growth and, to a lesser extent, through shortened gestation. In fact, the association between smoking and reduced fetal growth gets stronger as pregnancies approach term. Conclusions Our study provides important insights regarding the combined effects of fetal growth, gestational age, and smoking on neonatal mortality. The findings suggest that the effect of maternal smoking on neonatal mortality is largely mediated through reduced fetal growth. PMID:15574192

  3. Comparison Between Linear and Non-parametric Regression Models for Genome-Enabled Prediction in Wheat

    PubMed Central

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-01-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882

  4. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat.

    PubMed

    Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne

    2012-12-01

    In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.

  5. The Statistical Consulting Center for Astronomy (SCCA)

    NASA Technical Reports Server (NTRS)

    Akritas, Michael

    2001-01-01

    The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi(sup 2) minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigations, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers with the use of sophisticated tools, and to match these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression, density estimation and smoothing, general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkable high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.

  6. Nonparametric Bayesian Segmentation of a Multivariate Inhomogeneous Space-Time Poisson Process.

    PubMed

    Ding, Mingtao; He, Lihan; Dunson, David; Carin, Lawrence

    2012-12-01

    A nonparametric Bayesian model is proposed for segmenting time-evolving multivariate spatial point process data. An inhomogeneous Poisson process is assumed, with a logistic stick-breaking process (LSBP) used to encourage piecewise-constant spatial Poisson intensities. The LSBP explicitly favors spatially contiguous segments, and infers the number of segments based on the observed data. The temporal dynamics of the segmentation and of the Poisson intensities are modeled with exponential correlation in time, implemented in the form of a first-order autoregressive model for uniformly sampled discrete data, and via a Gaussian process with an exponential kernel for general temporal sampling. We consider and compare two different inference techniques: a Markov chain Monte Carlo sampler, which has relatively high computational complexity; and an approximate and efficient variational Bayesian analysis. The model is demonstrated with a simulated example and a real example of space-time crime events in Cincinnati, Ohio, USA.

  7. Two-sample tests and one-way MANOVA for multivariate biomarker data with nondetects.

    PubMed

    Thulin, M

    2016-09-10

    Testing whether the mean vector of a multivariate set of biomarkers differs between several populations is an increasingly common problem in medical research. Biomarker data is often left censored because some measurements fall below the laboratory's detection limit. We investigate how such censoring affects multivariate two-sample and one-way multivariate analysis of variance tests. Type I error rates, power and robustness to increasing censoring are studied, under both normality and non-normality. Parametric tests are found to perform better than non-parametric alternatives, indicating that the current recommendations for analysis of censored multivariate data may have to be revised. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  8. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    USGS Publications Warehouse

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.

  9. Associations Between Physician Empathy, Physician Characteristics, and Standardized Measures of Patient Experience.

    PubMed

    Chaitoff, Alexander; Sun, Bob; Windover, Amy; Bokar, Daniel; Featherall, Joseph; Rothberg, Michael B; Misra-Hebert, Anita D

    2017-10-01

    To identify correlates of physician empathy and determine whether physician empathy is related to standardized measures of patient experience. Demographic, professional, and empathy data were collected during 2013-2015 from Cleveland Clinic Health System physicians prior to participation in mandatory communication skills training. Empathy was assessed using the Jefferson Scale of Empathy. Data were also collected for seven measures (six provider communication items and overall provider rating) from the visit-specific and 12-month Consumer Assessment of Healthcare Providers and Systems Clinician and Group (CG-CAHPS) surveys. Associations between empathy and provider characteristics were assessed by linear regression, ANOVA, or a nonparametric equivalent. Significant predictors were included in a multivariable linear regression model. Correlations between empathy and CG-CAHPS scores were assessed using Spearman rank correlation coefficients. In bivariable analysis (n = 847 physicians), female sex (P < .001), specialty (P < .01), outpatient practice setting (P < .05), and DO degree (P < .05) were associated with higher empathy scores. In multivariable analysis, female sex (P < .001) and four specialties (obstetrics-gynecology, pediatrics, psychiatry, and thoracic surgery; all P < .05) were significantly associated with higher empathy scores. Of the seven CG-CAHPS measures, scores on five for the 583 physicians with visit-specific data and on three for the 277 physicians with 12-month data were positively correlated with empathy. Specialty and sex were independently associated with physician empathy. Empathy was correlated with higher scores on multiple CG-CAHPS items, suggesting improving physician empathy might play a role in improving patient experience.

  10. Comparative study of some robust statistical methods: weighted, parametric, and nonparametric linear regression of HPLC convoluted peak responses using internal standard method in drug bioavailability studies.

    PubMed

    Korany, Mohamed A; Maher, Hadir M; Galal, Shereen M; Ragab, Marwa A A

    2013-05-01

    This manuscript discusses the application and the comparison between three statistical regression methods for handling data: parametric, nonparametric, and weighted regression (WR). These data were obtained from different chemometric methods applied to the high-performance liquid chromatography response data using the internal standard method. This was performed on a model drug Acyclovir which was analyzed in human plasma with the use of ganciclovir as internal standard. In vivo study was also performed. Derivative treatment of chromatographic response ratio data was followed by convolution of the resulting derivative curves using 8-points sin x i polynomials (discrete Fourier functions). This work studies and also compares the application of WR method and Theil's method, a nonparametric regression (NPR) method with the least squares parametric regression (LSPR) method, which is considered the de facto standard method used for regression. When the assumption of homoscedasticity is not met for analytical data, a simple and effective way to counteract the great influence of the high concentrations on the fitted regression line is to use WR method. WR was found to be superior to the method of LSPR as the former assumes that the y-direction error in the calibration curve will increase as x increases. Theil's NPR method was also found to be superior to the method of LSPR as the former assumes that errors could occur in both x- and y-directions and that might not be normally distributed. Most of the results showed a significant improvement in the precision and accuracy on applying WR and NPR methods relative to LSPR.

  11. Semiparametric regression during 2003–2007*

    PubMed Central

    Ruppert, David; Wand, M.P.; Carroll, Raymond J.

    2010-01-01

    Semiparametric regression is a fusion between parametric regression and nonparametric regression that integrates low-rank penalized splines, mixed model and hierarchical Bayesian methodology – thus allowing more streamlined handling of longitudinal and spatial correlation. We review progress in the field over the five-year period between 2003 and 2007. We find semiparametric regression to be a vibrant field with substantial involvement and activity, continual enhancement and widespread application. PMID:20305800

  12. Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data

    PubMed Central

    Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

    2012-01-01

    Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions. PMID:23645976

  13. Joint nonparametric correction estimator for excess relative risk regression in survival analysis with exposure measurement error

    PubMed Central

    Wang, Ching-Yun; Cullings, Harry; Song, Xiao; Kopecky, Kenneth J.

    2017-01-01

    SUMMARY Observational epidemiological studies often confront the problem of estimating exposure-disease relationships when the exposure is not measured exactly. In the paper, we investigate exposure measurement error in excess relative risk regression, which is a widely used model in radiation exposure effect research. In the study cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies a generalized version of the classical additive measurement error model, but it may or may not have repeated measurements. In addition, an instrumental variable is available for individuals in a subset of the whole cohort. We develop a nonparametric correction (NPC) estimator using data from the subcohort, and further propose a joint nonparametric correction (JNPC) estimator using all observed data to adjust for exposure measurement error. An optimal linear combination estimator of JNPC and NPC is further developed. The proposed estimators are nonparametric, which are consistent without imposing a covariate or error distribution, and are robust to heteroscedastic errors. Finite sample performance is examined via a simulation study. We apply the developed methods to data from the Radiation Effects Research Foundation, in which chromosome aberration is used to adjust for the effects of radiation dose measurement error on the estimation of radiation dose responses. PMID:29354018

  14. Parametrically Guided Generalized Additive Models with Application to Mergers and Acquisitions Data.

    PubMed

    Fan, Jianqing; Maity, Arnab; Wang, Yihui; Wu, Yichao

    2013-01-01

    Generalized nonparametric additive models present a flexible way to evaluate the effects of several covariates on a general outcome of interest via a link function. In this modeling framework, one assumes that the effect of each of the covariates is nonparametric and additive. However, in practice, often there is prior information available about the shape of the regression functions, possibly from pilot studies or exploratory analysis. In this paper, we consider such situations and propose an estimation procedure where the prior information is used as a parametric guide to fit the additive model. Specifically, we first posit a parametric family for each of the regression functions using the prior information (parametric guides). After removing these parametric trends, we then estimate the remainder of the nonparametric functions using a nonparametric generalized additive model, and form the final estimates by adding back the parametric trend. We investigate the asymptotic properties of the estimates and show that when a good guide is chosen, the asymptotic variance of the estimates can be reduced significantly while keeping the asymptotic variance same as the unguided estimator. We observe the performance of our method via a simulation study and demonstrate our method by applying to a real data set on mergers and acquisitions.

  15. Mixed-effects Gaussian process functional regression models with application to dose-response curve prediction.

    PubMed

    Shi, J Q; Wang, B; Will, E J; West, R M

    2012-11-20

    We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.

  16. REGRES: A FORTRAN-77 program to calculate nonparametric and ``structural'' parametric solutions to bivariate regression equations

    NASA Astrophysics Data System (ADS)

    Rock, N. M. S.; Duffy, T. R.

    REGRES allows a range of regression equations to be calculated for paired sets of data values in which both variables are subject to error (i.e. neither is the "independent" variable). Nonparametric regressions, based on medians of all possible pairwise slopes and intercepts, are treated in detail. Estimated slopes and intercepts are output, along with confidence limits, Spearman and Kendall rank correlation coefficients. Outliers can be rejected with user-determined stringency. Parametric regressions can be calculated for any value of λ (the ratio of the variances of the random errors for y and x)—including: (1) major axis ( λ = 1); (2) reduced major axis ( λ = variance of y/variance of x); (3) Y on Xλ = infinity; or (4) X on Y ( λ = 0) solutions. Pearson linear correlation coefficients also are output. REGRES provides an alternative to conventional isochron assessment techniques where bivariate normal errors cannot be assumed, or weighting methods are inappropriate.

  17. Mapping the Structure-Function Relationship in Glaucoma and Healthy Patients Measured with Spectralis OCT and Humphrey Perimetry

    PubMed Central

    Muñoz–Negrete, Francisco J.; Oblanca, Noelia; Rebolleda, Gema

    2018-01-01

    Purpose To study the structure-function relationship in glaucoma and healthy patients assessed with Spectralis OCT and Humphrey perimetry using new statistical approaches. Materials and Methods Eighty-five eyes were prospectively selected and divided into 2 groups: glaucoma (44) and healthy patients (41). Three different statistical approaches were carried out: (1) factor analysis of the threshold sensitivities (dB) (automated perimetry) and the macular thickness (μm) (Spectralis OCT), subsequently applying Pearson's correlation to the obtained regions, (2) nonparametric regression analysis relating the values in each pair of regions that showed significant correlation, and (3) nonparametric spatial regressions using three models designed for the purpose of this study. Results In the glaucoma group, a map that relates structural and functional damage was drawn. The strongest correlation with visual fields was observed in the peripheral nasal region of both superior and inferior hemigrids (r = 0.602 and r = 0.458, resp.). The estimated functions obtained with the nonparametric regressions provided the mean sensitivity that corresponds to each given macular thickness. These functions allowed for accurate characterization of the structure-function relationship. Conclusions Both maps and point-to-point functions obtained linking structure and function damage contribute to a better understanding of this relationship and may help in the future to improve glaucoma diagnosis. PMID:29850196

  18. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    PubMed

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously taking into account the correlation between the outcomes. Likelihood-based approaches, in particular restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analysis with small number of component studies. The use of REML also requires iterative estimation between parameters, needing moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistic and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods are illustrated by their application to data from two published meta-analysis from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistic for testing significance of between-study heterogeneity and for extending the work to meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.

  19. Predicting Market Impact Costs Using Nonparametric Machine Learning Models.

    PubMed

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance.

  20. Predicting Market Impact Costs Using Nonparametric Machine Learning Models

    PubMed Central

    Park, Saerom; Lee, Jaewook; Son, Youngdoo

    2016-01-01

    Market impact cost is the most significant portion of implicit transaction costs that can reduce the overall transaction cost, although it cannot be measured directly. In this paper, we employed the state-of-the-art nonparametric machine learning models: neural networks, Bayesian neural network, Gaussian process, and support vector regression, to predict market impact cost accurately and to provide the predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a-state-of-the-art benchmark parametric model such as I-star model in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives in reducing transaction costs by considerably improving in prediction performance. PMID:26926235

  1. Does the high–tech industry consistently reduce CO{sub 2} emissions? Results from nonparametric additive regression model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Bin; Research Center of Applied Statistics, Jiangxi University of Finance and Economics, Nanchang, Jiangxi 330013; Lin, Boqiang, E-mail: bqlin@xmu.edu.cn

    China is currently the world's largest carbon dioxide (CO{sub 2}) emitter. Moreover, total energy consumption and CO{sub 2} emissions in China will continue to increase due to the rapid growth of industrialization and urbanization. Therefore, vigorously developing the high–tech industry becomes an inevitable choice to reduce CO{sub 2} emissions at the moment or in the future. However, ignoring the existing nonlinear links between economic variables, most scholars use traditional linear models to explore the impact of the high–tech industry on CO{sub 2} emissions from an aggregate perspective. Few studies have focused on nonlinear relationships and regional differences in China. Basedmore » on panel data of 1998–2014, this study uses the nonparametric additive regression model to explore the nonlinear effect of the high–tech industry from a regional perspective. The estimated results show that the residual sum of squares (SSR) of the nonparametric additive regression model in the eastern, central and western regions are 0.693, 0.054 and 0.085 respectively, which are much less those that of the traditional linear regression model (3.158, 4.227 and 7.196). This verifies that the nonparametric additive regression model has a better fitting effect. Specifically, the high–tech industry produces an inverted “U–shaped” nonlinear impact on CO{sub 2} emissions in the eastern region, but a positive “U–shaped” nonlinear effect in the central and western regions. Therefore, the nonlinear impact of the high–tech industry on CO{sub 2} emissions in the three regions should be given adequate attention in developing effective abatement policies. - Highlights: • The nonlinear effect of the high–tech industry on CO{sub 2} emissions was investigated. • The high–tech industry yields an inverted “U–shaped” effect in the eastern region. • The high–tech industry has a positive “U–shaped” nonlinear effect in other regions. • The linear impact of the high–tech industry in the eastern region is the strongest.« less

  2. Subpixel Snow Cover Mapping from MODIS Data by Nonparametric Regression Splines

    NASA Astrophysics Data System (ADS)

    Akyurek, Z.; Kuter, S.; Weber, G. W.

    2016-12-01

    Spatial extent of snow cover is often considered as one of the key parameters in climatological, hydrological and ecological modeling due to its energy storage, high reflectance in the visible and NIR regions of the electromagnetic spectrum, significant heat capacity and insulating properties. A significant challenge in snow mapping by remote sensing (RS) is the trade-off between the temporal and spatial resolution of satellite imageries. In order to tackle this issue, machine learning-based subpixel snow mapping methods, like Artificial Neural Networks (ANNs), from low or moderate resolution images have been proposed. Multivariate Adaptive Regression Splines (MARS) is a nonparametric regression tool that can build flexible models for high dimensional and complex nonlinear data. Although MARS is not often employed in RS, it has various successful implementations such as estimation of vertical total electron content in ionosphere, atmospheric correction and classification of satellite images. This study is the first attempt in RS to evaluate the applicability of MARS for subpixel snow cover mapping from MODIS data. Total 16 MODIS-Landsat ETM+ image pairs taken over European Alps between March 2000 and April 2003 were used in the study. MODIS top-of-atmospheric reflectance, NDSI, NDVI and land cover classes were used as predictor variables. Cloud-covered, cloud shadow, water and bad-quality pixels were excluded from further analysis by a spatial mask. MARS models were trained and validated by using reference fractional snow cover (FSC) maps generated from higher spatial resolution Landsat ETM+ binary snow cover maps. A multilayer feed-forward ANN with one hidden layer trained with backpropagation was also developed. The mutual comparison of obtained MARS and ANN models was accomplished on independent test areas. The MARS model performed better than the ANN model with an average RMSE of 0.1288 over the independent test areas; whereas the average RMSE of the ANN model was 0.1500. MARS estimates for low FSC values (i.e., FSC<0.3) were better than that of ANN. Both ANN and MARS tended to overestimate medium FSC values (i.e., 0.30.7).

  3. Relationship between substances in seminal plasma and Acrobeads Test results.

    PubMed

    Komori, Kazuhiko; Tsujimura, Akira; Okamoto, Yoshio; Matsuoka, Yasuhiro; Takao, Tetsuya; Miyagawa, Yasushi; Takada, Shingo; Nonomura, Norio; Okuyama, Akihiko

    2009-01-01

    To asses the effects of seminal plasma on sperm function. Retrospective case-control study. University hospital. One hundred fourteen infertile men. Acrobeads Test scores (0-4) and measurement of interleukin (IL)-6, soluble IL-6 receptor, epidermal growth factor, insulin-like growth factor-I (IGF-I), transforming growth factor-beta I, superoxide dismutase, calcitonin, and macrophage migration inhibitory factor (MIF) levels in seminal plasma. Kruskal-Wallis test to compare the concentrations of substances as a nonparametric test for differences among Acrobeads Test scores and a multivariable logistic regression model to find independent risk factors associated with abnormal Acrobeads Test results. The Acrobeads Test score was 0 for 7 samples, 1 for 20 samples, 2 for 18 samples, 3 for 28 samples, and 4 for 41 samples. Age, abstinence period, and semen parameters, except for sperm motility and percentage of sperm with abnormal morphology, had no effect on the Acrobeads Test results. Concentrations of IGF-I and MIF were significantly higher in patients with abnormal Acrobeads Test results. Multivariate analysis indicated that MIF and IGF-I were significantly associated with abnormal Acrobeads Test results (scores 0 to 1). Although further studies are needed, IGF-I and MIF in seminal plasma may have negative effects on sperm function.

  4. Considerations in cross-validation type density smoothing with a look at some data

    NASA Technical Reports Server (NTRS)

    Schuster, E. F.

    1982-01-01

    Experience gained in applying nonparametric maximum likelihood techniques of density estimation to judge the comparative quality of various estimators is reported. Two invariate data sets of one hundered samples (one Cauchy, one natural normal) are considered as well as studies in the multivariate case.

  5. Geostatistics and petroleum geology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hohn, M.E.

    1988-01-01

    This book examines purpose and use of geostatistics in exploration and development of oil and gas with an emphasis on appropriate and pertinent case studies. It present an overview of geostatistics. Topics covered include: The semivariogram; Linear estimation; Multivariate geostatistics; Nonlinear estimation; From indicator variables to nonparametric estimation; and More detail, less certainty; conditional simulation.

  6. A nonparametric clustering technique which estimates the number of clusters

    NASA Technical Reports Server (NTRS)

    Ramey, D. B.

    1983-01-01

    In applications of cluster analysis, one usually needs to determine the number of clusters, K, and the assignment of observations to each cluster. A clustering technique based on recursive application of a multivariate test of bimodality which automatically estimates both K and the cluster assignments is presented.

  7. Nonparametric Methods in Astronomy: Think, Regress, Observe—Pick Any Three

    NASA Astrophysics Data System (ADS)

    Steinhardt, Charles L.; Jermyn, Adam S.

    2018-02-01

    Telescopes are much more expensive than astronomers, so it is essential to minimize required sample sizes by using the most data-efficient statistical methods possible. However, the most commonly used model-independent techniques for finding the relationship between two variables in astronomy are flawed. In the worst case they can lead without warning to subtly yet catastrophically wrong results, and even in the best case they require more data than necessary. Unfortunately, there is no single best technique for nonparametric regression. Instead, we provide a guide for how astronomers can choose the best method for their specific problem and provide a python library with both wrappers for the most useful existing algorithms and implementations of two new algorithms developed here.

  8. On the Bias-Amplifying Effect of Near Instruments in Observational Studies

    ERIC Educational Resources Information Center

    Steiner, Peter M.; Kim, Yongnam

    2014-01-01

    In contrast to randomized experiments, the estimation of unbiased treatment effects from observational data requires an analysis that conditions on all confounding covariates. Conditioning on covariates can be done via standard parametric regression techniques or nonparametric matching like propensity score (PS) matching. The regression or…

  9. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    PubMed

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.

  10. AucPR: an AUC-based approach using penalized regression for disease prediction with high-dimensional omics data.

    PubMed

    Yu, Wenbao; Park, Taesung

    2014-01-01

    It is common to get an optimal combination of markers for disease classification and prediction when multiple markers are available. Many approaches based on the area under the receiver operating characteristic curve (AUC) have been proposed. Existing works based on AUC in a high-dimensional context depend mainly on a non-parametric, smooth approximation of AUC, with no work using a parametric AUC-based approach, for high-dimensional data. We propose an AUC-based approach using penalized regression (AucPR), which is a parametric method used for obtaining a linear combination for maximizing the AUC. To obtain the AUC maximizer in a high-dimensional context, we transform a classical parametric AUC maximizer, which is used in a low-dimensional context, into a regression framework and thus, apply the penalization regression approach directly. Two kinds of penalization, lasso and elastic net, are considered. The parametric approach can avoid some of the difficulties of a conventional non-parametric AUC-based approach, such as the lack of an appropriate concave objective function and a prudent choice of the smoothing parameter. We apply the proposed AucPR for gene selection and classification using four real microarray and synthetic data. Through numerical studies, AucPR is shown to perform better than the penalized logistic regression and the nonparametric AUC-based method, in the sense of AUC and sensitivity for a given specificity, particularly when there are many correlated genes. We propose a powerful parametric and easily-implementable linear classifier AucPR, for gene selection and disease prediction for high-dimensional data. AucPR is recommended for its good prediction performance. Beside gene expression microarray data, AucPR can be applied to other types of high-dimensional omics data, such as miRNA and protein data.

  11. Strong consistency of nonparametric Bayes density estimation on compact metric spaces with applications to specific manifolds

    PubMed Central

    Bhattacharya, Abhishek; Dunson, David B.

    2012-01-01

    This article considers a broad class of kernel mixture density models on compact metric spaces and manifolds. Following a Bayesian approach with a nonparametric prior on the location mixing distribution, sufficient conditions are obtained on the kernel, prior and the underlying space for strong posterior consistency at any continuous density. The prior is also allowed to depend on the sample size n and sufficient conditions are obtained for weak and strong consistency. These conditions are verified on compact Euclidean spaces using multivariate Gaussian kernels, on the hypersphere using a von Mises-Fisher kernel and on the planar shape space using complex Watson kernels. PMID:22984295

  12. Identifying cytokine predictors of cognitive functioning in breast cancer survivors up to 10 years post chemotherapy using machine learning.

    PubMed

    Henneghan, Ashley M; Palesh, Oxana; Harrison, Michelle; Kesler, Shelli R

    2018-07-15

    The purpose of this study is to explore 13 cytokine predictors of chemotherapy-related cognitive impairment (CRCI) in breast cancer survivors (BCS) 6 months to 10 years after chemotherapy completion using a multivariate, non-parametric approach. Cross sectional data collection included completion of a survey, cognitive testing, and non-fasting blood from 66 participants. Data were analyzed using random forest regression to identify the most significant predictors for each of the cognitive test scores. A different cytokine profile predicted each cognitive test. Adjusted R 2 for each model ranged from 0.71-0.77 (p's < 9.50 -10 ). The relationships between all the cytokine predictors and cognitive test scores were non-linear. Our findings are unique to the field of CRCI and suggest non-linear cytokine specificity to neural networks underlying cognitive functions assessed in this study. Copyright © 2018 Elsevier B.V. All rights reserved.

  13. Nonparametric and Semiparametric Regression Estimation for Length-biased Survival Data

    PubMed Central

    Shen, Yu; Ning, Jing; Qin, Jing

    2016-01-01

    For the past several decades, nonparametric and semiparametric modeling for conventional right-censored survival data has been investigated intensively under a noninformative censoring mechanism. However, these methods may not be applicable for analyzing right-censored survival data that arise from prevalent cohorts when the failure times are subject to length-biased sampling. This review article is intended to provide a summary of some newly developed methods as well as established methods for analyzing length-biased data. PMID:27086362

  14. Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample

    PubMed Central

    Wang, Ching-Yun; Song, Xiao

    2017-01-01

    SUMMARY Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women’s Health Initiative. PMID:27546625

  15. A Bayesian Nonparametric Causal Model for Regression Discontinuity Designs

    ERIC Educational Resources Information Center

    Karabatsos, George; Walker, Stephen G.

    2013-01-01

    The regression discontinuity (RD) design (Thistlewaite & Campbell, 1960; Cook, 2008) provides a framework to identify and estimate causal effects from a non-randomized design. Each subject of a RD design is assigned to the treatment (versus assignment to a non-treatment) whenever her/his observed value of the assignment variable equals or…

  16. Recent trends in counts of migrant hawks from northeastern North America

    USGS Publications Warehouse

    Titus, K.; Fuller, M.R.

    1990-01-01

    Using simple regression, pooled-sites route-regression, and nonparametric rank-trend analyses, we evaluated trends in counts of hawks migrating past 6 eastern hawk lookouts from 1972 to 1987. The indexing variable was the total count for a season. Bald eagle (Haliaeetus leucocephalus), peregrine falcon (Falco peregrinus), merlin (F. columbarius), osprey (Pandion haliaetus), and Cooper's hawk (Accipiter cooperii) counts increased using route-regression and nonparametric methods (P 0.10). We found no consistent trends (P > 0.10) in counts of sharp-shinned hawks (A. striatus), northern goshawks (A. gentilis) red-shouldered hawks (Buteo lineatus), red-tailed hawks (B. jamaicensis), rough-legged hawsk (B. lagopus), and American kestrels (F. sparverius). Broad-winged hawk (B. platypterus) counts declined (P < 0.05) based on the route-regression method. Empirical comparisons of our results with those for well-studied species such as the peregrine falcon, bald eagle, and osprey indicated agreement with nesting surveys. We suggest that counts of migrant hawks are a useful and economical method for detecting long-term trends in species across regions, particularly for species that otherwise cannot be easily surveyed.

  17. Local Composite Quantile Regression Smoothing for Harris Recurrent Markov Processes

    PubMed Central

    Li, Degui; Li, Runze

    2016-01-01

    In this paper, we study the local polynomial composite quantile regression (CQR) smoothing method for the nonlinear and nonparametric models under the Harris recurrent Markov chain framework. The local polynomial CQR regression method is a robust alternative to the widely-used local polynomial method, and has been well studied in stationary time series. In this paper, we relax the stationarity restriction on the model, and allow that the regressors are generated by a general Harris recurrent Markov process which includes both the stationary (positive recurrent) and nonstationary (null recurrent) cases. Under some mild conditions, we establish the asymptotic theory for the proposed local polynomial CQR estimator of the mean regression function, and show that the convergence rate for the estimator in nonstationary case is slower than that in stationary case. Furthermore, a weighted type local polynomial CQR estimator is provided to improve the estimation efficiency, and a data-driven bandwidth selection is introduced to choose the optimal bandwidth involved in the nonparametric estimators. Finally, we give some numerical studies to examine the finite sample performance of the developed methodology and theory. PMID:27667894

  18. Modeling the human development index and the percentage of poor people using quantile smoothing splines

    NASA Astrophysics Data System (ADS)

    Mulyani, Sri; Andriyana, Yudhie; Sudartianto

    2017-03-01

    Mean regression is a statistical method to explain the relationship between the response variable and the predictor variable based on the central tendency of the data (mean) of the response variable. The parameter estimation in mean regression (with Ordinary Least Square or OLS) generates a problem if we apply it to the data with a symmetric, fat-tailed, or containing outlier. Hence, an alternative method is necessary to be used to that kind of data, for example quantile regression method. The quantile regression is a robust technique to the outlier. This model can explain the relationship between the response variable and the predictor variable, not only on the central tendency of the data (median) but also on various quantile, in order to obtain complete information about that relationship. In this study, a quantile regression is developed with a nonparametric approach such as smoothing spline. Nonparametric approach is used if the prespecification model is difficult to determine, the relation between two variables follow the unknown function. We will apply that proposed method to poverty data. Here, we want to estimate the Percentage of Poor People as the response variable involving the Human Development Index (HDI) as the predictor variable.

  19. Bayesian nonparametric regression with varying residual density

    PubMed Central

    Pati, Debdeep; Dunson, David B.

    2013-01-01

    We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053

  20. Variable-Domain Functional Regression for Modeling ICU Data.

    PubMed

    Gellar, Jonathan E; Colantuoni, Elizabeth; Needham, Dale M; Crainiceanu, Ciprian M

    2014-12-01

    We introduce a class of scalar-on-function regression models with subject-specific functional predictor domains. The fundamental idea is to consider a bivariate functional parameter that depends both on the functional argument and on the width of the functional predictor domain. Both parametric and nonparametric models are introduced to fit the functional coefficient. The nonparametric model is theoretically and practically invariant to functional support transformation, or support registration. Methods were motivated by and applied to a study of association between daily measures of the Intensive Care Unit (ICU) Sequential Organ Failure Assessment (SOFA) score and two outcomes: in-hospital mortality, and physical impairment at hospital discharge among survivors. Methods are generally applicable to a large number of new studies that record a continuous variables over unequal domains.

  1. New analysis methods to push the boundaries of diagnostic techniques in the environmental sciences

    NASA Astrophysics Data System (ADS)

    Lungaroni, M.; Murari, A.; Peluso, E.; Gelfusa, M.; Malizia, A.; Vega, J.; Talebzadeh, S.; Gaudio, P.

    2016-04-01

    In the last years, new and more sophisticated measurements have been at the basis of the major progress in various disciplines related to the environment, such as remote sensing and thermonuclear fusion. To maximize the effectiveness of the measurements, new data analysis techniques are required. First data processing tasks, such as filtering and fitting, are of primary importance, since they can have a strong influence on the rest of the analysis. Even if Support Vector Regression is a method devised and refined at the end of the 90s, a systematic comparison with more traditional non parametric regression methods has never been reported. In this paper, a series of systematic tests is described, which indicates how SVR is a very competitive method of non-parametric regression that can usefully complement and often outperform more consolidated approaches. The performance of Support Vector Regression as a method of filtering is investigated first, comparing it with the most popular alternative techniques. Then Support Vector Regression is applied to the problem of non-parametric regression to analyse Lidar surveys for the environments measurement of particulate matter due to wildfires. The proposed approach has given very positive results and provides new perspectives to the interpretation of the data.

  2. Reporting quality of multivariable logistic regression in selected Indian medical journals.

    PubMed

    Kumar, R; Indrayan, A; Chhabra, P

    2012-01-01

    Use of multivariable logistic regression (MLR) modeling has steeply increased in the medical literature over the past few years. Testing of model assumptions and adequate reporting of MLR allow the reader to interpret results more accurately. To review the fulfillment of assumptions and reporting quality of MLR in selected Indian medical journals using established criteria. Analysis of published literature. Medknow.com publishes 68 Indian medical journals with open access. Eight of these journals had at least five articles using MLR between the years 1994 to 2008. Articles from each of these journals were evaluated according to the previously established 10-point quality criteria for reporting and to test the MLR model assumptions. SPSS 17 software and non-parametric test (Kruskal-Wallis H, Mann Whitney U, Spearman Correlation). One hundred and nine articles were finally found using MLR for analyzing the data in the selected eight journals. The number of such articles gradually increased after year 2003, but quality score remained almost similar over time. P value, odds ratio, and 95% confidence interval for coefficients in MLR was reported in 75.2% and sufficient cases (>10) per covariate of limiting sample size were reported in the 58.7% of the articles. No article reported the test for conformity of linear gradient for continuous covariates. Total score was not significantly different across the journals. However, involvement of statistician or epidemiologist as a co-author improved the average quality score significantly (P=0.014). Reporting of MLR in many Indian journals is incomplete. Only one article managed to score 8 out of 10 among 109 articles under review. All others scored less. Appropriate guidelines in instructions to authors, and pre-publication review of articles using MLR by a qualified statistician may improve quality of reporting.

  3. Stochastic search, optimization and regression with energy applications

    NASA Astrophysics Data System (ADS)

    Hannah, Lauren A.

    Designing clean energy systems will be an important task over the next few decades. One of the major roadblocks is a lack of mathematical tools to economically evaluate those energy systems. However, solutions to these mathematical problems are also of interest to the operations research and statistical communities in general. This thesis studies three problems that are of interest to the energy community itself or provide support for solution methods: R&D portfolio optimization, nonparametric regression and stochastic search with an observable state variable. First, we consider the one stage R&D portfolio optimization problem to avoid the sequential decision process associated with the multi-stage. The one stage problem is still difficult because of a non-convex, combinatorial decision space and a non-convex objective function. We propose a heuristic solution method that uses marginal project values---which depend on the selected portfolio---to create a linear objective function. In conjunction with the 0-1 decision space, this new problem can be solved as a knapsack linear program. This method scales well to large decision spaces. We also propose an alternate, provably convergent algorithm that does not exploit problem structure. These methods are compared on a solid oxide fuel cell R&D portfolio problem. Next, we propose Dirichlet Process mixtures of Generalized Linear Models (DPGLM), a new method of nonparametric regression that accommodates continuous and categorical inputs, and responses that can be modeled by a generalized linear model. We prove conditions for the asymptotic unbiasedness of the DP-GLM regression mean function estimate. We also give examples for when those conditions hold, including models for compactly supported continuous distributions and a model with continuous covariates and categorical response. We empirically analyze the properties of the DP-GLM and why it provides better results than existing Dirichlet process mixture regression models. We evaluate DP-GLM on several data sets, comparing it to modern methods of nonparametric regression like CART, Bayesian trees and Gaussian processes. Compared to existing techniques, the DP-GLM provides a single model (and corresponding inference algorithms) that performs well in many regression settings. Finally, we study convex stochastic search problems where a noisy objective function value is observed after a decision is made. There are many stochastic search problems whose behavior depends on an exogenous state variable which affects the shape of the objective function. Currently, there is no general purpose algorithm to solve this class of problems. We use nonparametric density estimation to take observations from the joint state-outcome distribution and use them to infer the optimal decision for a given query state. We propose two solution methods that depend on the problem characteristics: function-based and gradient-based optimization. We examine two weighting schemes, kernel-based weights and Dirichlet process-based weights, for use with the solution methods. The weights and solution methods are tested on a synthetic multi-product newsvendor problem and the hour-ahead wind commitment problem. Our results show that in some cases Dirichlet process weights offer substantial benefits over kernel based weights and more generally that nonparametric estimation methods provide good solutions to otherwise intractable problems.

  4. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  5. Using Evidence-Based Decision Trees Instead of Formulas to Identify At-Risk Readers. REL 2014-036

    ERIC Educational Resources Information Center

    Koon, Sharon; Petscher, Yaacov; Foorman, Barbara R.

    2014-01-01

    This study examines whether the classification and regression tree (CART) model improves the early identification of students at risk for reading comprehension difficulties compared with the more difficult to interpret logistic regression model. CART is a type of predictive modeling that relies on nonparametric techniques. It presents results in…

  6. Bayesian Nonparametric Inference – Why and How

    PubMed Central

    Müller, Peter; Mitra, Riten

    2013-01-01

    We review inference under models with nonparametric Bayesian (BNP) priors. The discussion follows a set of examples for some common inference problems. The examples are chosen to highlight problems that are challenging for standard parametric inference. We discuss inference for density estimation, clustering, regression and for mixed effects models with random effects distributions. While we focus on arguing for the need for the flexibility of BNP models, we also review some of the more commonly used BNP models, thus hopefully answering a bit of both questions, why and how to use BNP. PMID:24368932

  7. A New Hybrid-Multiscale SSA Prediction of Non-Stationary Time Series

    NASA Astrophysics Data System (ADS)

    Ghanbarzadeh, Mitra; Aminghafari, Mina

    2016-02-01

    Singular spectral analysis (SSA) is a non-parametric method used in the prediction of non-stationary time series. It has two parameters, which are difficult to determine and very sensitive to their values. Since, SSA is a deterministic-based method, it does not give good results when the time series is contaminated with a high noise level and correlated noise. Therefore, we introduce a novel method to handle these problems. It is based on the prediction of non-decimated wavelet (NDW) signals by SSA and then, prediction of residuals by wavelet regression. The advantages of our method are the automatic determination of parameters and taking account of the stochastic structure of time series. As shown through the simulated and real data, we obtain better results than SSA, a non-parametric wavelet regression method and Holt-Winters method.

  8. Robust best linear estimator for Cox regression with instrumental variables in whole cohort and surrogates with additive measurement error in calibration sample.

    PubMed

    Wang, Ching-Yun; Song, Xiao

    2016-11-01

    Biomedical researchers are often interested in estimating the effect of an environmental exposure in relation to a chronic disease endpoint. However, the exposure variable of interest may be measured with errors. In a subset of the whole cohort, a surrogate variable is available for the true unobserved exposure variable. The surrogate variable satisfies an additive measurement error model, but it may not have repeated measurements. The subset in which the surrogate variables are available is called a calibration sample. In addition to the surrogate variables that are available among the subjects in the calibration sample, we consider the situation when there is an instrumental variable available for all study subjects. An instrumental variable is correlated with the unobserved true exposure variable, and hence can be useful in the estimation of the regression coefficients. In this paper, we propose a nonparametric method for Cox regression using the observed data from the whole cohort. The nonparametric estimator is the best linear combination of a nonparametric correction estimator from the calibration sample and the difference of the naive estimators from the calibration sample and the whole cohort. The asymptotic distribution is derived, and the finite sample performance of the proposed estimator is examined via intensive simulation studies. The methods are applied to the Nutritional Biomarkers Study of the Women's Health Initiative. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  9. Error Covariance Penalized Regression: A novel multivariate model combining penalized regression with multivariate error structure.

    PubMed

    Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C

    2018-06-29

    A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.

  10. Approximating prediction uncertainty for random forest regression models

    Treesearch

    John W. Coulston; Christine E. Blinn; Valerie A. Thomas; Randolph H. Wynne

    2016-01-01

    Machine learning approaches such as random forest have increased for the spatial modeling and mapping of continuous variables. Random forest is a non-parametric ensemble approach, and unlike traditional regression approaches there is no direct quantification of prediction error. Understanding prediction uncertainty is important when using model-based continuous maps as...

  11. Bayesian quantile regression-based partially linear mixed-effects joint models for longitudinal data with multiple features.

    PubMed

    Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara

    2017-01-01

    In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.

  12. Proceedings of the Third Annual Symposium on Mathematical Pattern Recognition and Image Analysis

    NASA Technical Reports Server (NTRS)

    Guseman, L. F., Jr.

    1985-01-01

    Topics addressed include: multivariate spline method; normal mixture analysis applied to remote sensing; image data analysis; classifications in spatially correlated environments; probability density functions; graphical nonparametric methods; subpixel registration analysis; hypothesis integration in image understanding systems; rectification of satellite scanner imagery; spatial variation in remotely sensed images; smooth multidimensional interpolation; and optimal frequency domain textural edge detection filters.

  13. Estimating areal means and variances of forest attributes using the k-Nearest Neighbors technique and satellite imagery

    Treesearch

    Ronald E. McRoberts; Erkki O. Tomppo; Andrew O. Finley; Heikkinen Juha

    2007-01-01

    The k-Nearest Neighbor (k-NN) technique has become extremely popular for a variety of forest inventory mapping and estimation applications. Much of this popularity may be attributed to the non-parametric, multivariate features of the technique, its intuitiveness, and its ease of use. When used with satellite imagery and forest...

  14. Divergences and estimating tight bounds on Bayes error with applications to multivariate Gaussian copula and latent Gaussian copula

    NASA Astrophysics Data System (ADS)

    Thelen, Brian J.; Xique, Ismael J.; Burns, Joseph W.; Goley, G. Steven; Nolan, Adam R.; Benson, Jonathan W.

    2017-04-01

    In Bayesian decision theory, there has been a great amount of research into theoretical frameworks and information- theoretic quantities that can be used to provide lower and upper bounds for the Bayes error. These include well-known bounds such as Chernoff, Battacharrya, and J-divergence. Part of the challenge of utilizing these various metrics in practice is (i) whether they are "loose" or "tight" bounds, (ii) how they might be estimated via either parametric or non-parametric methods, and (iii) how accurate the estimates are for limited amounts of data. In general what is desired is a methodology for generating relatively tight lower and upper bounds, and then an approach to estimate these bounds efficiently from data. In this paper, we explore the so-called triangle divergence which has been around for a while, but was recently made more prominent in some recent research on non-parametric estimation of information metrics. Part of this work is motivated by applications for quantifying fundamental information content in SAR/LIDAR data, and to help in this, we have developed a flexible multivariate modeling framework based on multivariate Gaussian copula models which can be combined with the triangle divergence framework to quantify this information, and provide approximate bounds on Bayes error. In this paper we present an overview of the bounds, including those based on triangle divergence and verify that under a number of multivariate models, the upper and lower bounds derived from triangle divergence are significantly tighter than the other common bounds, and often times, dramatically so. We also propose some simple but effective means for computing the triangle divergence using Monte Carlo methods, and then discuss estimation of the triangle divergence from empirical data based on Gaussian Copula models.

  15. Using exogenous variables in testing for monotonic trends in hydrologic time series

    USGS Publications Warehouse

    Alley, William M.

    1988-01-01

    One approach that has been used in performing a nonparametric test for monotonic trend in a hydrologic time series consists of a two-stage analysis. First, a regression equation is estimated for the variable being tested as a function of an exogenous variable. A nonparametric trend test such as the Kendall test is then performed on the residuals from the equation. By analogy to stagewise regression and through Monte Carlo experiments, it is demonstrated that this approach will tend to underestimate the magnitude of the trend and to result in some loss in power as a result of ignoring the interaction between the exogenous variable and time. An alternative approach, referred to as the adjusted variable Kendall test, is demonstrated to generally have increased statistical power and to provide more reliable estimates of the trend slope. In addition, the utility of including an exogenous variable in a trend test is examined under selected conditions.

  16. The analysis of incontinence episodes and other count data in patients with overactive bladder by Poisson and negative binomial regression.

    PubMed

    Martina, R; Kay, R; van Maanen, R; Ridder, A

    2015-01-01

    Clinical studies in overactive bladder have traditionally used analysis of covariance or nonparametric methods to analyse the number of incontinence episodes and other count data. It is known that if the underlying distributional assumptions of a particular parametric method do not hold, an alternative parametric method may be more efficient than a nonparametric one, which makes no assumptions regarding the underlying distribution of the data. Therefore, there are advantages in using methods based on the Poisson distribution or extensions of that method, which incorporate specific features that provide a modelling framework for count data. One challenge with count data is overdispersion, but methods are available that can account for this through the introduction of random effect terms in the modelling, and it is this modelling framework that leads to the negative binomial distribution. These models can also provide clinicians with a clearer and more appropriate interpretation of treatment effects in terms of rate ratios. In this paper, the previously used parametric and non-parametric approaches are contrasted with those based on Poisson regression and various extensions in trials evaluating solifenacin and mirabegron in patients with overactive bladder. In these applications, negative binomial models are seen to fit the data well. Copyright © 2014 John Wiley & Sons, Ltd.

  17. Estimation of variance in Cox's regression model with shared gamma frailties.

    PubMed

    Andersen, P K; Klein, J P; Knudsen, K M; Tabanera y Palacios, R

    1997-12-01

    The Cox regression model with a shared frailty factor allows for unobserved heterogeneity or for statistical dependence between the observed survival times. Estimation in this model when the frailties are assumed to follow a gamma distribution is reviewed, and we address the problem of obtaining variance estimates for regression coefficients, frailty parameter, and cumulative baseline hazards using the observed nonparametric information matrix. A number of examples are given comparing this approach with fully parametric inference in models with piecewise constant baseline hazards.

  18. A program for the Bayesian Neural Network in the ROOT framework

    NASA Astrophysics Data System (ADS)

    Zhong, Jiahang; Huang, Run-Sheng; Lee, Shih-Chang

    2011-12-01

    We present a Bayesian Neural Network algorithm implemented in the TMVA package (Hoecker et al., 2007 [1]), within the ROOT framework (Brun and Rademakers, 1997 [2]). Comparing to the conventional utilization of Neural Network as discriminator, this new implementation has more advantages as a non-parametric regression tool, particularly for fitting probabilities. It provides functionalities including cost function selection, complexity control and uncertainty estimation. An example of such application in High Energy Physics is shown. The algorithm is available with ROOT release later than 5.29. Program summaryProgram title: TMVA-BNN Catalogue identifier: AEJX_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEJX_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: BSD license No. of lines in distributed program, including test data, etc.: 5094 No. of bytes in distributed program, including test data, etc.: 1,320,987 Distribution format: tar.gz Programming language: C++ Computer: Any computer system or cluster with C++ compiler and UNIX-like operating system Operating system: Most UNIX/Linux systems. The application programs were thoroughly tested under Fedora and Scientific Linux CERN. Classification: 11.9 External routines: ROOT package version 5.29 or higher ( http://root.cern.ch) Nature of problem: Non-parametric fitting of multivariate distributions Solution method: An implementation of Neural Network following the Bayesian statistical interpretation. Uses Laplace approximation for the Bayesian marginalizations. Provides the functionalities of automatic complexity control and uncertainty estimation. Running time: Time consumption for the training depends substantially on the size of input sample, the NN topology, the number of training iterations, etc. For the example in this manuscript, about 7 min was used on a PC/Linux with 2.0 GHz processors.

  19. Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting

    PubMed Central

    Chan, Kwun Chuen Gary; Yam, Sheung Chi Phillip; Zhang, Zheng

    2015-01-01

    Summary The estimation of average treatment effects based on observational data is extremely important in practice and has been studied by generations of statisticians under different frameworks. Existing globally efficient estimators require non-parametric estimation of a propensity score function, an outcome regression function or both, but their performance can be poor in practical sample sizes. Without explicitly estimating either functions, we consider a wide class calibration weights constructed to attain an exact three-way balance of the moments of observed covariates among the treated, the control, and the combined group. The wide class includes exponential tilting, empirical likelihood and generalized regression as important special cases, and extends survey calibration estimators to different statistical problems and with important distinctions. Global semiparametric efficiency for the estimation of average treatment effects is established for this general class of calibration estimators. The results show that efficiency can be achieved by solely balancing the covariate distributions without resorting to direct estimation of propensity score or outcome regression function. We also propose a consistent estimator for the efficient asymptotic variance, which does not involve additional functional estimation of either the propensity score or the outcome regression functions. The proposed variance estimator outperforms existing estimators that require a direct approximation of the efficient influence function. PMID:27346982

  20. Globally efficient non-parametric inference of average treatment effects by empirical balancing calibration weighting.

    PubMed

    Chan, Kwun Chuen Gary; Yam, Sheung Chi Phillip; Zhang, Zheng

    2016-06-01

    The estimation of average treatment effects based on observational data is extremely important in practice and has been studied by generations of statisticians under different frameworks. Existing globally efficient estimators require non-parametric estimation of a propensity score function, an outcome regression function or both, but their performance can be poor in practical sample sizes. Without explicitly estimating either functions, we consider a wide class calibration weights constructed to attain an exact three-way balance of the moments of observed covariates among the treated, the control, and the combined group. The wide class includes exponential tilting, empirical likelihood and generalized regression as important special cases, and extends survey calibration estimators to different statistical problems and with important distinctions. Global semiparametric efficiency for the estimation of average treatment effects is established for this general class of calibration estimators. The results show that efficiency can be achieved by solely balancing the covariate distributions without resorting to direct estimation of propensity score or outcome regression function. We also propose a consistent estimator for the efficient asymptotic variance, which does not involve additional functional estimation of either the propensity score or the outcome regression functions. The proposed variance estimator outperforms existing estimators that require a direct approximation of the efficient influence function.

  1. A comparison of selected parametric and imputation methods for estimating snag density and snag quality attributes

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Hagar, Joan; Temesgen, Hailemariam

    2012-01-01

    Snags (standing dead trees) are an essential structural component of forests. Because wildlife use of snags depends on size and decay stage, snag density estimation without any information about snag quality attributes is of little value for wildlife management decision makers. Little work has been done to develop models that allow multivariate estimation of snag density by snag quality class. Using climate, topography, Landsat TM data, stand age and forest type collected for 2356 forested Forest Inventory and Analysis plots in western Washington and western Oregon, we evaluated two multivariate techniques for their abilities to estimate density of snags by three decay classes. The density of live trees and snags in three decay classes (D1: recently dead, little decay; D2: decay, without top, some branches and bark missing; D3: extensive decay, missing bark and most branches) with diameter at breast height (DBH) ≥ 12.7 cm was estimated using a nonparametric random forest nearest neighbor imputation technique (RF) and a parametric two-stage model (QPORD), for which the number of trees per hectare was estimated with a Quasipoisson model in the first stage and the probability of belonging to a tree status class (live, D1, D2, D3) was estimated with an ordinal regression model in the second stage. The presence of large snags with DBH ≥ 50 cm was predicted using a logistic regression and RF imputation. Because of the more homogenous conditions on private forest lands, snag density by decay class was predicted with higher accuracies on private forest lands than on public lands, while presence of large snags was more accurately predicted on public lands, owing to the higher prevalence of large snags on public lands. RF outperformed the QPORD model in terms of percent accurate predictions, while QPORD provided smaller root mean square errors in predicting snag density by decay class. The logistic regression model achieved more accurate presence/absence classification of large snags than the RF imputation approach. Adjusting the decision threshold to account for unequal size for presence and absence classes is more straightforward for the logistic regression than for the RF imputation approach. Overall, model accuracies were poor in this study, which can be attributed to the poor predictive quality of the explanatory variables and the large range of forest types and geographic conditions observed in the data.

  2. Factors affecting plant species composition of hedgerows: relative importance and hierarchy

    NASA Astrophysics Data System (ADS)

    Deckers, Bart; Hermy, Martin; Muys, Bart

    2004-07-01

    Although there has been a clear quantitative and qualitative decline in traditional hedgerow network landscapes during last century, hedgerows are crucial for the conservation of rural biodiversity, functioning as an important habitat, refuge and corridor for numerous species. To safeguard this conservation function, insight in the basic organizing principles of hedgerow plant communities is needed. The vegetation composition of 511 individual hedgerows situated within an ancient hedgerow network landscape in Flanders, Belgium was recorded, in combination with a wide range of explanatory variables, including a selection of spatial variables. Non-parametric statistics in combination with multivariate data analysis techniques were used to study the effect of individual explanatory variables. Next, variables were grouped in five distinct subsets and the relative importance of these variable groups was assessed by two related variation partitioning techniques, partial regression and partial canonical correspondence analysis, taking into account explicitly the existence of intercorrelations between variables of different factor groups. Most explanatory variables affected significantly hedgerow species richness and composition. Multivariate analysis showed that, besides adjacent land use, hedgerow management, soil conditions, hedgerow type and origin, the role of other factors such as hedge dimensions, intactness, etc., could certainly not be neglected. Furthermore, both methods revealed the same overall ranking of the five distinct factor groups. Besides a predominant impact of abiotic environmental conditions, it was found that management variables and structural aspects have a relatively larger influence on the distribution of plant species in hedgerows than their historical background or spatial configuration.

  3. Measuring agreement of multivariate discrete survival times using a modified weighted kappa coefficient.

    PubMed

    Guo, Ying; Manatunga, Amita K

    2009-03-01

    Assessing agreement is often of interest in clinical studies to evaluate the similarity of measurements produced by different raters or methods on the same subjects. We present a modified weighted kappa coefficient to measure agreement between bivariate discrete survival times. The proposed kappa coefficient accommodates censoring by redistributing the mass of censored observations within the grid where the unobserved events may potentially happen. A generalized modified weighted kappa is proposed for multivariate discrete survival times. We estimate the modified kappa coefficients nonparametrically through a multivariate survival function estimator. The asymptotic properties of the kappa estimators are established and the performance of the estimators are examined through simulation studies of bivariate and trivariate survival times. We illustrate the application of the modified kappa coefficient in the presence of censored observations with data from a prostate cancer study.

  4. Testing and Estimating Shape-Constrained Nonparametric Density and Regression in the Presence of Measurement Error.

    PubMed

    Carroll, Raymond J; Delaigle, Aurore; Hall, Peter

    2011-03-01

    In many applications we can expect that, or are interested to know if, a density function or a regression curve satisfies some specific shape constraints. For example, when the explanatory variable, X, represents the value taken by a treatment or dosage, the conditional mean of the response, Y , is often anticipated to be a monotone function of X. Indeed, if this regression mean is not monotone (in the appropriate direction) then the medical or commercial value of the treatment is likely to be significantly curtailed, at least for values of X that lie beyond the point at which monotonicity fails. In the case of a density, common shape constraints include log-concavity and unimodality. If we can correctly guess the shape of a curve, then nonparametric estimators can be improved by taking this information into account. Addressing such problems requires a method for testing the hypothesis that the curve of interest satisfies a shape constraint, and, if the conclusion of the test is positive, a technique for estimating the curve subject to the constraint. Nonparametric methodology for solving these problems already exists, but only in cases where the covariates are observed precisely. However in many problems, data can only be observed with measurement errors, and the methods employed in the error-free case typically do not carry over to this error context. In this paper we develop a novel approach to hypothesis testing and function estimation under shape constraints, which is valid in the context of measurement errors. Our method is based on tilting an estimator of the density or the regression mean until it satisfies the shape constraint, and we take as our test statistic the distance through which it is tilted. Bootstrap methods are used to calibrate the test. The constrained curve estimators that we develop are also based on tilting, and in that context our work has points of contact with methodology in the error-free case.

  5. Improving salt marsh digital elevation model accuracy with full-waveform lidar and nonparametric predictive modeling

    NASA Astrophysics Data System (ADS)

    Rogers, Jeffrey N.; Parrish, Christopher E.; Ward, Larry G.; Burdick, David M.

    2018-03-01

    Salt marsh vegetation tends to increase vertical uncertainty in light detection and ranging (lidar) derived elevation data, often causing the data to become ineffective for analysis of topographic features governing tidal inundation or vegetation zonation. Previous attempts at improving lidar data collected in salt marsh environments range from simply computing and subtracting the global elevation bias to more complex methods such as computing vegetation-specific, constant correction factors. The vegetation specific corrections can be used along with an existing habitat map to apply separate corrections to different areas within a study site. It is hypothesized here that correcting salt marsh lidar data by applying location-specific, point-by-point corrections, which are computed from lidar waveform-derived features, tidal-datum based elevation, distance from shoreline and other lidar digital elevation model based variables, using nonparametric regression will produce better results. The methods were developed and tested using full-waveform lidar and ground truth for three marshes in Cape Cod, Massachusetts, U.S.A. Five different model algorithms for nonparametric regression were evaluated, with TreeNet's stochastic gradient boosting algorithm consistently producing better regression and classification results. Additionally, models were constructed to predict the vegetative zone (high marsh and low marsh). The predictive modeling methods used in this study estimated ground elevation with a mean bias of 0.00 m and a standard deviation of 0.07 m (0.07 m root mean square error). These methods appear very promising for correction of salt marsh lidar data and, importantly, do not require an existing habitat map, biomass measurements, or image based remote sensing data such as multi/hyperspectral imagery.

  6. A Machine Learning Framework for Plan Payment Risk Adjustment.

    PubMed

    Rose, Sherri

    2016-12-01

    To introduce cross-validation and a nonparametric machine learning framework for plan payment risk adjustment and then assess whether they have the potential to improve risk adjustment. 2011-2012 Truven MarketScan database. We compare the performance of multiple statistical approaches within a broad machine learning framework for estimation of risk adjustment formulas. Total annual expenditure was predicted using age, sex, geography, inpatient diagnoses, and hierarchical condition category variables. The methods included regression, penalized regression, decision trees, neural networks, and an ensemble super learner, all in concert with screening algorithms that reduce the set of variables considered. The performance of these methods was compared based on cross-validated R 2 . Our results indicate that a simplified risk adjustment formula selected via this nonparametric framework maintains much of the efficiency of a traditional larger formula. The ensemble approach also outperformed classical regression and all other algorithms studied. The implementation of cross-validated machine learning techniques provides novel insight into risk adjustment estimation, possibly allowing for a simplified formula, thereby reducing incentives for increased coding intensity as well as the ability of insurers to "game" the system with aggressive diagnostic upcoding. © Health Research and Educational Trust.

  7. Source Region Identification Using Kernel Smoothing

    EPA Science Inventory

    As described in this paper, Nonparametric Wind Regression is a source-to-receptor source apportionment model that can be used to identify and quantify the impact of possible source regions of pollutants as defined by wind direction sectors. It is described in detail with an exam...

  8. Examination of influential observations in penalized spline regression

    NASA Astrophysics Data System (ADS)

    Türkan, Semra

    2013-10-01

    In parametric or nonparametric regression models, the results of regression analysis are affected by some anomalous observations in the data set. Thus, detection of these observations is one of the major steps in regression analysis. These observations are precisely detected by well-known influence measures. Pena's statistic is one of them. In this study, Pena's approach is formulated for penalized spline regression in terms of ordinary residuals and leverages. The real data and artificial data are used to see illustrate the effectiveness of Pena's statistic as to Cook's distance on detecting influential observations. The results of the study clearly reveal that the proposed measure is superior to Cook's Distance to detect these observations in large data set.

  9. Zero- vs. one-dimensional, parametric vs. non-parametric, and confidence interval vs. hypothesis testing procedures in one-dimensional biomechanical trajectory analysis.

    PubMed

    Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A

    2015-05-01

    Biomechanical processes are often manifested as one-dimensional (1D) trajectories. It has been shown that 1D confidence intervals (CIs) are biased when based on 0D statistical procedures, and the non-parametric 1D bootstrap CI has emerged in the Biomechanics literature as a viable solution. The primary purpose of this paper was to clarify that, for 1D biomechanics datasets, the distinction between 0D and 1D methods is much more important than the distinction between parametric and non-parametric procedures. A secondary purpose was to demonstrate that a parametric equivalent to the 1D bootstrap exists in the form of a random field theory (RFT) correction for multiple comparisons. To emphasize these points we analyzed six datasets consisting of force and kinematic trajectories in one-sample, paired, two-sample and regression designs. Results showed, first, that the 1D bootstrap and other 1D non-parametric CIs were qualitatively identical to RFT CIs, and all were very different from 0D CIs. Second, 1D parametric and 1D non-parametric hypothesis testing results were qualitatively identical for all six datasets. Last, we highlight the limitations of 1D CIs by demonstrating that they are complex, design-dependent, and thus non-generalizable. These results suggest that (i) analyses of 1D data based on 0D models of randomness are generally biased unless one explicitly identifies 0D variables before the experiment, and (ii) parametric and non-parametric 1D hypothesis testing provide an unambiguous framework for analysis when one׳s hypothesis explicitly or implicitly pertains to whole 1D trajectories. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Model Robust Calibration: Method and Application to Electronically-Scanned Pressure Transducers

    NASA Technical Reports Server (NTRS)

    Walker, Eric L.; Starnes, B. Alden; Birch, Jeffery B.; Mays, James E.

    2010-01-01

    This article presents the application of a recently developed statistical regression method to the controlled instrument calibration problem. The statistical method of Model Robust Regression (MRR), developed by Mays, Birch, and Starnes, is shown to improve instrument calibration by reducing the reliance of the calibration on a predetermined parametric (e.g. polynomial, exponential, logarithmic) model. This is accomplished by allowing fits from the predetermined parametric model to be augmented by a certain portion of a fit to the residuals from the initial regression using a nonparametric (locally parametric) regression technique. The method is demonstrated for the absolute scale calibration of silicon-based pressure transducers.

  11. Some New Approaches to Multivariate Probability Distributions.

    DTIC Science & Technology

    1986-12-01

    Krishnaiah (1977). The following example may serve as an illustration of this point. EXAMPLE 2. (Fre^*chet’s bivariate continuous distribution...the error in the theorem of "" Prakasa Rao (1974) and to Dr. P.R. Krishnaiah for his valuable comments on the initial draft, his monumental patience and...M. and Proschan, F. (1984). Nonparametric Concepts and Methods in Reliability, Handbook of Statistics, 4, 613-655, (eds. P.R. Krishnaiah and P.K

  12. Modeling animal movements using stochastic differential equations

    Treesearch

    Haiganoush K. Preisler; Alan A. Ager; Bruce K. Johnson; John G. Kie

    2004-01-01

    We describe the use of bivariate stochastic differential equations (SDE) for modeling movements of 216 radiocollared female Rocky Mountain elk at the Starkey Experimental Forest and Range in northeastern Oregon. Spatially and temporally explicit vector fields were estimated using approximating difference equations and nonparametric regression techniques. Estimated...

  13. Bayesian Estimation of Multivariate Latent Regression Models: Gauss versus Laplace

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew; Park, Trevor

    2017-01-01

    A latent multivariate regression model is developed that employs a generalized asymmetric Laplace (GAL) prior distribution for regression coefficients. The model is designed for high-dimensional applications where an approximate sparsity condition is satisfied, such that many regression coefficients are near zero after accounting for all the model…

  14. DIF Trees: Using Classification Trees to Detect Differential Item Functioning

    ERIC Educational Resources Information Center

    Vaughn, Brandon K.; Wang, Qiu

    2010-01-01

    A nonparametric tree classification procedure is used to detect differential item functioning for items that are dichotomously scored. Classification trees are shown to be an alternative procedure to detect differential item functioning other than the use of traditional Mantel-Haenszel and logistic regression analysis. A nonparametric…

  15. A Bayesian Semiparametric Latent Variable Model for Mixed Responses

    ERIC Educational Resources Information Center

    Fahrmeir, Ludwig; Raach, Alexander

    2007-01-01

    In this paper we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric Gaussian regression model. We extend existing LVMs with the usual linear covariate effects by including nonparametric components for nonlinear…

  16. Comparing methods for estimation of heterogeneous treatment effects using observational data from health care databases.

    PubMed

    Wendling, T; Jung, K; Callahan, A; Schuler, A; Shah, N H; Gallego, B

    2018-06-03

    There is growing interest in using routinely collected data from health care databases to study the safety and effectiveness of therapies in "real-world" conditions, as it can provide complementary evidence to that of randomized controlled trials. Causal inference from health care databases is challenging because the data are typically noisy, high dimensional, and most importantly, observational. It requires methods that can estimate heterogeneous treatment effects while controlling for confounding in high dimensions. Bayesian additive regression trees, causal forests, causal boosting, and causal multivariate adaptive regression splines are off-the-shelf methods that have shown good performance for estimation of heterogeneous treatment effects in observational studies of continuous outcomes. However, it is not clear how these methods would perform in health care database studies where outcomes are often binary and rare and data structures are complex. In this study, we evaluate these methods in simulation studies that recapitulate key characteristics of comparative effectiveness studies. We focus on the conditional average effect of a binary treatment on a binary outcome using the conditional risk difference as an estimand. To emulate health care database studies, we propose a simulation design where real covariate and treatment assignment data are used and only outcomes are simulated based on nonparametric models of the real outcomes. We apply this design to 4 published observational studies that used records from 2 major health care databases in the United States. Our results suggest that Bayesian additive regression trees and causal boosting consistently provide low bias in conditional risk difference estimates in the context of health care database studies. Copyright © 2018 John Wiley & Sons, Ltd.

  17. Multivariate Regression Analysis and Slaughter Livestock,

    DTIC Science & Technology

    AGRICULTURE, *ECONOMICS), (*MEAT, PRODUCTION), MULTIVARIATE ANALYSIS, REGRESSION ANALYSIS , ANIMALS, WEIGHT, COSTS, PREDICTIONS, STABILITY, MATHEMATICAL MODELS, STORAGE, BEEF, PORK, FOOD, STATISTICAL DATA, ACCURACY

  18. An efficient surrogate-based simulation-optimization method for calibrating a regional MODFLOW model

    NASA Astrophysics Data System (ADS)

    Chen, Mingjie; Izady, Azizallah; Abdalla, Osman A.

    2017-01-01

    Simulation-optimization method entails a large number of model simulations, which is computationally intensive or even prohibitive if the model simulation is extremely time-consuming. Statistical models have been examined as a surrogate of the high-fidelity physical model during simulation-optimization process to tackle this problem. Among them, Multivariate Adaptive Regression Splines (MARS), a non-parametric adaptive regression method, is superior in overcoming problems of high-dimensions and discontinuities of the data. Furthermore, the stability and accuracy of MARS model can be improved by bootstrap aggregating methods, namely, bagging. In this paper, Bagging MARS (BMARS) method is integrated to a surrogate-based simulation-optimization framework to calibrate a three-dimensional MODFLOW model, which is developed to simulate the groundwater flow in an arid hardrock-alluvium region in northwestern Oman. The physical MODFLOW model is surrogated by the statistical model developed using BMARS algorithm. The surrogate model, which is fitted and validated using training dataset generated by the physical model, can approximate solutions rapidly. An efficient Sobol' method is employed to calculate global sensitivities of head outputs to input parameters, which are used to analyze their importance for the model outputs spatiotemporally. Only sensitive parameters are included in the calibration process to further improve the computational efficiency. Normalized root mean square error (NRMSE) between measured and simulated heads at observation wells is used as the objective function to be minimized during optimization. The reasonable history match between the simulated and observed heads demonstrated feasibility of this high-efficient calibration framework.

  19. Liver proteomics in progressive alcoholic steatosis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fernando, Harshica; Wiktorowicz, John E.; Soman, Kizhake V.

    2013-02-01

    Fatty liver is an early stage of alcoholic and nonalcoholic liver disease (ALD and NALD) that progresses to steatohepatitis and other irreversible conditions. In this study, we identified proteins that were differentially expressed in the livers of rats fed 5% ethanol in a Lieber–DeCarli diet daily for 1 and 3 months by discovery proteomics (two-dimensional gel electrophoresis and mass spectrometry) and non-parametric modeling (Multivariate Adaptive Regression Splines). Hepatic fatty infiltration was significantly higher in ethanol-fed animals as compared to controls, and more pronounced at 3 months of ethanol feeding. Discovery proteomics identified changes in the expression of proteins involved inmore » alcohol, lipid, and amino acid metabolism after ethanol feeding. At 1 and 3 months, 12 and 15 different proteins were differentially expressed. Of the identified proteins, down regulation of alcohol dehydrogenase (− 1.6) at 1 month and up regulation of aldehyde dehydrogenase (2.1) at 3 months could be a protective/adaptive mechanism against ethanol toxicity. In addition, betaine-homocysteine S-methyltransferase 2 a protein responsible for methionine metabolism and previously implicated in fatty liver development was significantly up regulated (1.4) at ethanol-induced fatty liver stage (1 month) while peroxiredoxin-1 was down regulated (− 1.5) at late fatty liver stage (3 months). Nonparametric analysis of the protein spots yielded fewer proteins and narrowed the list of possible markers and identified D-dopachrome tautomerase (− 1.7, at 3 months) as a possible marker for ethanol-induced early steatohepatitis. The observed differential regulation of proteins have potential to serve as biomarker signature for the detection of steatosis and its progression to steatohepatitis once validated in plasma/serum. -- Graphical abstract: The figure shows the Hierarchial cluster analysis of differentially expressed protein spots obtained after ethanol feeding for 1 (1–3) and 3 (4–6) months. C and E represent pair-fed control and ethanol-fed rats, respectively. Highlights: ► Proteins related to ethanol-induced steatosis and mild steatohepatitis are identified. ► ADH1C and ALDH2 involved in alcohol metabolism are differentially expressed at 1 and 3 months. ► Discovery proteomics identified a group of proteins to serve as potential biomarkers. ► Using nonparametric analysis DDT is identified as a possible marker for liver damage.« less

  20. Empirical study of the dependence of the results of multivariable flexible survival analyses on model selection strategy.

    PubMed

    Binquet, C; Abrahamowicz, M; Mahboubi, A; Jooste, V; Faivre, J; Bonithon-Kopp, C; Quantin, C

    2008-12-30

    Flexible survival models, which avoid assumptions about hazards proportionality (PH) or linearity of continuous covariates effects, bring the issues of model selection to a new level of complexity. Each 'candidate covariate' requires inter-dependent decisions regarding (i) its inclusion in the model, and representation of its effects on the log hazard as (ii) either constant over time or time-dependent (TD) and, for continuous covariates, (iii) either loglinear or non-loglinear (NL). Moreover, 'optimal' decisions for one covariate depend on the decisions regarding others. Thus, some efficient model-building strategy is necessary.We carried out an empirical study of the impact of the model selection strategy on the estimates obtained in flexible multivariable survival analyses of prognostic factors for mortality in 273 gastric cancer patients. We used 10 different strategies to select alternative multivariable parametric as well as spline-based models, allowing flexible modeling of non-parametric (TD and/or NL) effects. We employed 5-fold cross-validation to compare the predictive ability of alternative models.All flexible models indicated significant non-linearity and changes over time in the effect of age at diagnosis. Conventional 'parametric' models suggested the lack of period effect, whereas more flexible strategies indicated a significant NL effect. Cross-validation confirmed that flexible models predicted better mortality. The resulting differences in the 'final model' selected by various strategies had also impact on the risk prediction for individual subjects.Overall, our analyses underline (a) the importance of accounting for significant non-parametric effects of covariates and (b) the need for developing accurate model selection strategies for flexible survival analyses. Copyright 2008 John Wiley & Sons, Ltd.

  1. Neuropathic pain in Systemic Sclerosis patients: A cross-sectional study.

    PubMed

    Sousa-Neves, Joana; Cerqueira, Marcos; Santos-Faria, Daniela; Afonso, Carmo; Teixeira, Filipa

    2018-01-31

    To investigate if patients with Systemic Sclerosis (SSc) show a higher prevalence of neuropathic pain (NP) in comparison with controls. To study the relationship between clinical variables of the disease and NP among SSc patients. 48 patients and 45 controls were included. Presence of NP was assessed applying the DN4 "Douleur Neuropathique en 4 Questions" questionnaire. Different clinical variables were also assessed in patients. Statistical analysis included parametric, nonparametric tests and multivariate logistic regression. NP was significantly higher in SSc patients (56.2% vs 13.3%, p<0.001). Mean Modified Rodnan Skin Score was independently associated with the presence of NP (p<0.05, OR 1.90). Peripheral nervous system involvement in SSc is not well studied and, as far as the authors are aware, this is the first study published evaluating NP in SSc patients and controls. These findings should raise the awareness of the clinician to recognize and address the presence of NP in these patients, especially in those with severe skin involvement. Copyright © 2018 Elsevier España, S.L.U. and Sociedad Española de Reumatología y Colegio Mexicano de Reumatología. All rights reserved.

  2. Mapping Quantitative Traits in Unselected Families: Algorithms and Examples

    PubMed Central

    Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David

    2009-01-01

    Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016

  3. Detecting correlation changes in multivariate time series: A comparison of four non-parametric change point detection methods.

    PubMed

    Cabrieto, Jedelyn; Tuerlinckx, Francis; Kuppens, Peter; Grassmann, Mariel; Ceulemans, Eva

    2017-06-01

    Change point detection in multivariate time series is a complex task since next to the mean, the correlation structure of the monitored variables may also alter when change occurs. DeCon was recently developed to detect such changes in mean and\\or correlation by combining a moving windows approach and robust PCA. However, in the literature, several other methods have been proposed that employ other non-parametric tools: E-divisive, Multirank, and KCP. Since these methods use different statistical approaches, two issues need to be tackled. First, applied researchers may find it hard to appraise the differences between the methods. Second, a direct comparison of the relative performance of all these methods for capturing change points signaling correlation changes is still lacking. Therefore, we present the basic principles behind DeCon, E-divisive, Multirank, and KCP and the corresponding algorithms, to make them more accessible to readers. We further compared their performance through extensive simulations using the settings of Bulteel et al. (Biological Psychology, 98 (1), 29-42, 2014) implying changes in mean and in correlation structure and those of Matteson and James (Journal of the American Statistical Association, 109 (505), 334-345, 2014) implying different numbers of (noise) variables. KCP emerged as the best method in almost all settings. However, in case of more than two noise variables, only DeCon performed adequately in detecting correlation changes.

  4. Statistical methods for astronomical data with upper limits. II - Correlation and regression

    NASA Technical Reports Server (NTRS)

    Isobe, T.; Feigelson, E. D.; Nelson, P. I.

    1986-01-01

    Statistical methods for calculating correlations and regressions in bivariate censored data where the dependent variable can have upper or lower limits are presented. Cox's regression and the generalization of Kendall's rank correlation coefficient provide significant levels of correlations, and the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator, give estimates for the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.

  5. Nonparametric Stochastic Model for Uncertainty Quantifi cation of Short-term Wind Speed Forecasts

    NASA Astrophysics Data System (ADS)

    AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.

    2014-12-01

    Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of wind turbine optimal operation, wind energy dispatch and scheduling, efficient energy harvesting etc. It is also considered during planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed carries a particular importance and plays significant roles in the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical fixed time intervals of the wind speed data and using it for future prediction. The methods mainly include statistical models such as ARMA, ARIMA model, physical models for instance numerical weather prediction and artificial Intelligence techniques for example support vector machine and neural networks. In this paper, we are interested in estimating hourly wind speed measures in United Arab Emirates (UAE). More precisely, we predict hourly wind speed using a nonparametric kernel estimation of the regression and volatility functions pertaining to nonlinear autoregressive model with ARCH model, which includes unknown nonlinear regression function and volatility function already discussed in the literature. The unknown nonlinear regression function describe the dependence between the value of the wind speed at time t and its historical data at time t -1, t - 2, … , t - d. This function plays a key role to predict hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated to this prediction. Since the regression and the volatility functions are supposed to be unknown, they are estimated using nonparametric kernel methods. In addition, to the pointwise hourly wind speed forecasts, a confidence interval is also provided which allows to quantify the uncertainty around the forecasts.

  6. Remote-sensing data processing with the multivariate regression analysis method for iron mineral resource potential mapping: a case study in the Sarvian area, central Iran

    NASA Astrophysics Data System (ADS)

    Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran

    2018-03-01

    This paper uses multivariate regression to create a mathematical model for iron skarn exploration in the Sarvian area, central Iran, using multivariate regression for mineral prospectivity mapping (MPM). The main target of this paper is to apply multivariate regression analysis (as an MPM method) to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two types of multivariate regression models using two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the results of the probability value (p value), coefficient of determination value (R2) and adjusted determination coefficient (Radj2), the second regression model (which consistent of multiple UIVs) fitted better than other models. The accuracy of the model was confirmed by iron outcrops map and geological observation. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).

  7. Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method

    PubMed Central

    Zhang, Tingting; Kou, S. C.

    2010-01-01

    Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure. PMID:21258615

  8. Nonparametric Inference of Doubly Stochastic Poisson Process Data via the Kernel Method.

    PubMed

    Zhang, Tingting; Kou, S C

    2010-01-01

    Doubly stochastic Poisson processes, also known as the Cox processes, frequently occur in various scientific fields. In this article, motivated primarily by analyzing Cox process data in biophysics, we propose a nonparametric kernel-based inference method. We conduct a detailed study, including an asymptotic analysis, of the proposed method, and provide guidelines for its practical use, introducing a fast and stable regression method for bandwidth selection. We apply our method to real photon arrival data from recent single-molecule biophysical experiments, investigating proteins' conformational dynamics. Our result shows that conformational fluctuation is widely present in protein systems, and that the fluctuation covers a broad range of time scales, highlighting the dynamic and complex nature of proteins' structure.

  9. A non-parametric consistency test of the ΛCDM model with Planck CMB data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aghamousa, Amir; Shafieloo, Arman; Hamann, Jan, E-mail: amir@aghamousa.com, E-mail: jan.hamann@unsw.edu.au, E-mail: shafieloo@kasi.re.kr

    Non-parametric reconstruction methods, such as Gaussian process (GP) regression, provide a model-independent way of estimating an underlying function and its uncertainty from noisy data. We demonstrate how GP-reconstruction can be used as a consistency test between a given data set and a specific model by looking for structures in the residuals of the data with respect to the model's best-fit. Applying this formalism to the Planck temperature and polarisation power spectrum measurements, we test their global consistency with the predictions of the base ΛCDM model. Our results do not show any serious inconsistencies, lending further support to the interpretation ofmore » the base ΛCDM model as cosmology's gold standard.« less

  10. Machine Learning Based Evaluation of Reading and Writing Difficulties.

    PubMed

    Iwabuchi, Mamoru; Hirabayashi, Rumi; Nakamura, Kenryu; Dim, Nem Khan

    2017-01-01

    The possibility of auto evaluation of reading and writing difficulties was investigated using non-parametric machine learning (ML) regression technique for URAWSS (Understanding Reading and Writing Skills of Schoolchildren) [1] test data of 168 children of grade 1 - 9. The result showed that the ML had better prediction than the ordinary rule-based decision.

  11. Comparing Inference Approaches for RD Designs: A Reexamination of the Effect of Head Start on Child Mortality

    ERIC Educational Resources Information Center

    Cattaneo, Matias D.; Titiunik, Rocío; Vazquez-Bare, Gonzalo

    2017-01-01

    The regression discontinuity (RD) design is a popular quasi-experimental design for causal inference and policy evaluation. The most common inference approaches in RD designs employ "flexible" parametric and nonparametric local polynomial methods, which rely on extrapolation and large-sample approximations of conditional expectations…

  12. Estimating a Meaningful Point of Change: A Comparison of Exploratory Techniques Based on Nonparametric Regression

    ERIC Educational Resources Information Center

    Klotsche, Jens; Gloster, Andrew T.

    2012-01-01

    Longitudinal studies are increasingly common in psychological research. Characterized by repeated measurements, longitudinal designs aim to observe phenomena that change over time. One important question involves identification of the exact point in time when the observed phenomena begin to meaningfully change above and beyond baseline…

  13. Heterogeneity in Schooling Rates of Return

    ERIC Educational Resources Information Center

    Henderson, Daniel J.; Polachek, Solomon W.; Wang, Le

    2011-01-01

    This paper relaxes the assumption of homogeneous rates of return to schooling by employing nonparametric kernel regression. This approach allows us to examine the differences in rates of return to education both across and within groups. Similar to previous studies we find that on average blacks have higher returns to education than whites,…

  14. Self-organising mixture autoregressive model for non-stationary time series modelling.

    PubMed

    Ni, He; Yin, Hujun

    2008-12-01

    Modelling non-stationary time series has been a difficult task for both parametric and nonparametric methods. One promising solution is to combine the flexibility of nonparametric models with the simplicity of parametric models. In this paper, the self-organising mixture autoregressive (SOMAR) network is adopted as a such mixture model. It breaks time series into underlying segments and at the same time fits local linear regressive models to the clusters of segments. In such a way, a global non-stationary time series is represented by a dynamic set of local linear regressive models. Neural gas is used for a more flexible structure of the mixture model. Furthermore, a new similarity measure has been introduced in the self-organising network to better quantify the similarity of time series segments. The network can be used naturally in modelling and forecasting non-stationary time series. Experiments on artificial, benchmark time series (e.g. Mackey-Glass) and real-world data (e.g. numbers of sunspots and Forex rates) are presented and the results show that the proposed SOMAR network is effective and superior to other similar approaches.

  15. A robust nonparametric framework for reconstruction of stochastic differential equation models

    NASA Astrophysics Data System (ADS)

    Rajabzadeh, Yalda; Rezaie, Amir Hossein; Amindavar, Hamidreza

    2016-05-01

    In this paper, we employ a nonparametric framework to robustly estimate the functional forms of drift and diffusion terms from discrete stationary time series. The proposed method significantly improves the accuracy of the parameter estimation. In this framework, drift and diffusion coefficients are modeled through orthogonal Legendre polynomials. We employ the least squares regression approach along with the Euler-Maruyama approximation method to learn coefficients of stochastic model. Next, a numerical discrete construction of mean squared prediction error (MSPE) is established to calculate the order of Legendre polynomials in drift and diffusion terms. We show numerically that the new method is robust against the variation in sample size and sampling rate. The performance of our method in comparison with the kernel-based regression (KBR) method is demonstrated through simulation and real data. In case of real dataset, we test our method for discriminating healthy electroencephalogram (EEG) signals from epilepsy ones. We also demonstrate the efficiency of the method through prediction in the financial data. In both simulation and real data, our algorithm outperforms the KBR method.

  16. Correlative and multivariate analysis of increased radon concentration in underground laboratory.

    PubMed

    Maletić, Dimitrije M; Udovičić, Vladimir I; Banjanac, Radomir M; Joković, Dejan R; Dragić, Aleksandar L; Veselinović, Nikola B; Filipović, Jelena

    2014-11-01

    The results of analysis using correlative and multivariate methods, as developed for data analysis in high-energy physics and implemented in the Toolkit for Multivariate Analysis software package, of the relations of the variation of increased radon concentration with climate variables in shallow underground laboratory is presented. Multivariate regression analysis identified a number of multivariate methods which can give a good evaluation of increased radon concentrations based on climate variables. The use of the multivariate regression methods will enable the investigation of the relations of specific climate variable with increased radon concentrations by analysis of regression methods resulting in 'mapped' underlying functional behaviour of radon concentrations depending on a wide spectrum of climate variables. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  17. A nonparametric method for assessment of interactions in a median regression model for analyzing right censored data.

    PubMed

    Lee, MinJae; Rahbar, Mohammad H; Talebi, Hooshang

    2018-01-01

    We propose a nonparametric test for interactions when we are concerned with investigation of the simultaneous effects of two or more factors in a median regression model with right censored survival data. Our approach is developed to detect interaction in special situations, when the covariates have a finite number of levels with a limited number of observations in each level, and it allows varying levels of variance and censorship at different levels of the covariates. Through simulation studies, we compare the power of detecting an interaction between the study group variable and a covariate using our proposed procedure with that of the Cox Proportional Hazard (PH) model and censored quantile regression model. We also assess the impact of censoring rate and type on the standard error of the estimators of parameters. Finally, we illustrate application of our proposed method to real life data from Prospective Observational Multicenter Major Trauma Transfusion (PROMMTT) study to test an interaction effect between type of injury and study sites using median time for a trauma patient to receive three units of red blood cells. The results from simulation studies indicate that our procedure performs better than both Cox PH model and censored quantile regression model based on statistical power for detecting the interaction, especially when the number of observations is small. It is also relatively less sensitive to censoring rates or even the presence of conditionally independent censoring that is conditional on the levels of covariates.

  18. Spatio-Temporal Regression Based Clustering of Precipitation Extremes in a Presence of Systematically Missing Covariates

    NASA Astrophysics Data System (ADS)

    Kaiser, Olga; Martius, Olivia; Horenko, Illia

    2017-04-01

    Regression based Generalized Pareto Distribution (GPD) models are often used to describe the dynamics of hydrological threshold excesses relying on the explicit availability of all of the relevant covariates. But, in real application the complete set of relevant covariates might be not available. In this context, it was shown that under weak assumptions the influence coming from systematically missing covariates can be reflected by a nonstationary and nonhomogenous dynamics. We present a data-driven, semiparametric and an adaptive approach for spatio-temporal regression based clustering of threshold excesses in a presence of systematically missing covariates. The nonstationary and nonhomogenous behavior of threshold excesses is describes by a set of local stationary GPD models, where the parameters are expressed as regression models, and a non-parametric spatio-temporal hidden switching process. Exploiting nonparametric Finite Element time-series analysis Methodology (FEM) with Bounded Variation of the model parameters (BV) for resolving the spatio-temporal switching process, the approach goes beyond strong a priori assumptions made is standard latent class models like Mixture Models and Hidden Markov Models. Additionally, the presented FEM-BV-GPD provides a pragmatic description of the corresponding spatial dependence structure by grouping together all locations that exhibit similar behavior of the switching process. The performance of the framework is demonstrated on daily accumulated precipitation series over 17 different locations in Switzerland from 1981 till 2013 - showing that the introduced approach allows for a better description of the historical data.

  19. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.

    PubMed

    Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing.

  20. Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification

    PubMed Central

    Feng, Yang; Jiang, Jiancheng; Tong, Xin

    2015-01-01

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970

  1. Application of nonparametric regression methods to study the relationship between NO2 concentrations and local wind direction and speed at background sites.

    PubMed

    Donnelly, Aoife; Misstear, Bruce; Broderick, Brian

    2011-02-15

    Background concentrations of nitrogen dioxide (NO(2)) are not constant but vary temporally and spatially. The current paper presents a powerful tool for the quantification of the effects of wind direction and wind speed on background NO(2) concentrations, particularly in cases where monitoring data are limited. In contrast to previous studies which applied similar methods to sites directly affected by local pollution sources, the current study focuses on background sites with the aim of improving methods for predicting background concentrations adopted in air quality modelling studies. The relationship between measured NO(2) concentration in air at three such sites in Ireland and locally measured wind direction has been quantified using nonparametric regression methods. The major aim was to analyse a method for quantifying the effects of local wind direction on background levels of NO(2) in Ireland. The method was expanded to include wind speed as an added predictor variable. A Gaussian kernel function is used in the analysis and circular statistics employed for the wind direction variable. Wind direction and wind speed were both found to have a statistically significant effect on background levels of NO(2) at all three sites. Frequently environmental impact assessments are based on short term baseline monitoring producing a limited dataset. The presented non-parametric regression methods, in contrast to the frequently used methods such as binning of the data, allow concentrations for missing data pairs to be estimated and distinction between spurious and true peaks in concentrations to be made. The methods were found to provide a realistic estimation of long term concentration variation with wind direction and speed, even for cases where the data set is limited. Accurate identification of the actual variation at each location and causative factors could be made, thus supporting the improved definition of background concentrations for use in air quality modelling studies. Copyright © 2010 Elsevier B.V. All rights reserved.

  2. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression

    USDA-ARS?s Scientific Manuscript database

    In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly ...

  3. Local Linear Regression for Data with AR Errors.

    PubMed

    Li, Runze; Li, Yan

    2009-07-01

    In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.

  4. Design of adaptive control systems by means of self-adjusting transversal filters

    NASA Technical Reports Server (NTRS)

    Merhav, S. J.

    1986-01-01

    The design of closed-loop adaptive control systems based on nonparametric identification was addressed. Implementation is by self-adjusting Least Mean Square (LMS) transversal filters. The design concept is Model Reference Adaptive Control (MRAC). Major issues are to preserve the linearity of the error equations of each LMS filter, and to prevent estimation bias that is due to process or measurement noise, thus providing necessary conditions for the convergence and stability of the control system. The controlled element is assumed to be asymptotically stable and minimum phase. Because of the nonparametric Finite Impulse Response (FIR) estimates provided by the LMS filters, a-priori information on the plant model is needed only in broad terms. Following a survey of control system configurations and filter design considerations, system implementation is shown here in Single Input Single Output (SISO) format which is readily extendable to multivariable forms. In extensive computer simulation studies the controlled element is represented by a second-order system with widely varying damping, natural frequency, and relative degree.

  5. Methods for scalar-on-function regression.

    PubMed

    Reiss, Philip T; Goldsmith, Jeff; Shang, Han Lin; Ogden, R Todd

    2017-08-01

    Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in which curves, spectra, images, etc. are considered as basic functional data units. A central problem in FDA is how to fit regression models with scalar responses and functional data points as predictors. We review some of the main approaches to this problem, categorizing the basic model types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and illustrate some of the procedures by application to a functional magnetic resonance imaging dataset.

  6. Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys

    ERIC Educational Resources Information Center

    Si, Yajuan; Reiter, Jerome P.

    2013-01-01

    In many surveys, the data comprise a large number of categorical variables that suffer from item nonresponse. Standard methods for multiple imputation, like log-linear models or sequential regression imputation, can fail to capture complex dependencies and can be difficult to implement effectively in high dimensions. We present a fully Bayesian,…

  7. Shape selection in Landsat time series: A tool for monitoring forest dynamics

    Treesearch

    Gretchen G. Moisen; Mary C. Meyer; Todd A. Schroeder; Xiyue Liao; Karen G. Schleeweis; Elizabeth A. Freeman; Chris Toney

    2016-01-01

    We present a new methodology for fitting nonparametric shape-restricted regression splines to time series of Landsat imagery for the purpose of modeling, mapping, and monitoring annual forest disturbance dynamics over nearly three decades. For each pixel and spectral band or index of choice in temporal Landsat data, our method delivers a smoothed rendition of...

  8. Consequences of ignoring geologic variation in evaluating grazing impacts

    Treesearch

    Jonathan W. Long; Alvin L. Medina

    2006-01-01

    The geologic diversity of landforms in the Southwest complicates efforts to evaluate impacts of land uses such as livestock grazing. We examined a research study that evaluated relationships between trout biomass and stream habitat in the White Mountains of east-central Arizona. That study interpreted results of stepwise regressions and a nonparametric test of “grazed...

  9. An application of quantile random forests for predictive mapping of forest attributes

    Treesearch

    E.A. Freeman; G.G. Moisen

    2015-01-01

    Increasingly, random forest models are used in predictive mapping of forest attributes. Traditional random forests output the mean prediction from the random trees. Quantile regression forests (QRF) is an extension of random forests developed by Nicolai Meinshausen that provides non-parametric estimates of the median predicted value as well as prediction quantiles. It...

  10. Bayesian Nonparametric Ordination for the Analysis of Microbial Communities.

    PubMed

    Ren, Boyu; Bacallado, Sergio; Favaro, Stefano; Holmes, Susan; Trippa, Lorenzo

    2017-01-01

    Human microbiome studies use sequencing technologies to measure the abundance of bacterial species or Operational Taxonomic Units (OTUs) in samples of biological material. Typically the data are organized in contingency tables with OTU counts across heterogeneous biological samples. In the microbial ecology community, ordination methods are frequently used to investigate latent factors or clusters that capture and describe variations of OTU counts across biological samples. It remains important to evaluate how uncertainty in estimates of each biological sample's microbial distribution propagates to ordination analyses, including visualization of clusters and projections of biological samples on low dimensional spaces. We propose a Bayesian analysis for dependent distributions to endow frequently used ordinations with estimates of uncertainty. A Bayesian nonparametric prior for dependent normalized random measures is constructed, which is marginally equivalent to the normalized generalized Gamma process, a well-known prior for nonparametric analyses. In our prior, the dependence and similarity between microbial distributions is represented by latent factors that concentrate in a low dimensional space. We use a shrinkage prior to tune the dimensionality of the latent factors. The resulting posterior samples of model parameters can be used to evaluate uncertainty in analyses routinely applied in microbiome studies. Specifically, by combining them with multivariate data analysis techniques we can visualize credible regions in ecological ordination plots. The characteristics of the proposed model are illustrated through a simulation study and applications in two microbiome datasets.

  11. Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery.

    PubMed

    Liu, Han; Wang, Lie; Zhao, Tuo

    2015-08-01

    We propose a calibrated multivariate regression method named CMR for fitting high dimensional multivariate regression models. Compared with existing methods, CMR calibrates regularization for each regression task with respect to its noise level so that it simultaneously attains improved finite-sample performance and tuning insensitiveness. Theoretically, we provide sufficient conditions under which CMR achieves the optimal rate of convergence in parameter estimation. Computationally, we propose an efficient smoothed proximal gradient algorithm with a worst-case numerical rate of convergence O (1/ ϵ ), where ϵ is a pre-specified accuracy of the objective function value. We conduct thorough numerical simulations to illustrate that CMR consistently outperforms other high dimensional multivariate regression methods. We also apply CMR to solve a brain activity prediction problem and find that it is as competitive as a handcrafted model created by human experts. The R package camel implementing the proposed method is available on the Comprehensive R Archive Network http://cran.r-project.org/web/packages/camel/.

  12. Nonparametric Signal Extraction and Measurement Error in the Analysis of Electroencephalographic Activity During Sleep

    PubMed Central

    Crainiceanu, Ciprian M.; Caffo, Brian S.; Di, Chong-Zhi; Punjabi, Naresh M.

    2009-01-01

    We introduce methods for signal and associated variability estimation based on hierarchical nonparametric smoothing with application to the Sleep Heart Health Study (SHHS). SHHS is the largest electroencephalographic (EEG) collection of sleep-related data, which contains, at each visit, two quasi-continuous EEG signals for each subject. The signal features extracted from EEG data are then used in second level analyses to investigate the relation between health, behavioral, or biometric outcomes and sleep. Using subject specific signals estimated with known variability in a second level regression becomes a nonstandard measurement error problem. We propose and implement methods that take into account cross-sectional and longitudinal measurement error. The research presented here forms the basis for EEG signal processing for the SHHS. PMID:20057925

  13. Sieve estimation of Cox models with latent structures.

    PubMed

    Cao, Yongxiu; Huang, Jian; Liu, Yanyan; Zhao, Xingqiu

    2016-12-01

    This article considers sieve estimation in the Cox model with an unknown regression structure based on right-censored data. We propose a semiparametric pursuit method to simultaneously identify and estimate linear and nonparametric covariate effects based on B-spline expansions through a penalized group selection method with concave penalties. We show that the estimators of the linear effects and the nonparametric component are consistent. Furthermore, we establish the asymptotic normality of the estimator of the linear effects. To compute the proposed estimators, we develop a modified blockwise majorization descent algorithm that is efficient and easy to implement. Simulation studies demonstrate that the proposed method performs well in finite sample situations. We also use the primary biliary cirrhosis data to illustrate its application. © 2016, The International Biometric Society.

  14. Methodologic considerations in the design and analysis of nested case-control studies: association between cytokines and postoperative delirium.

    PubMed

    Ngo, Long H; Inouye, Sharon K; Jones, Richard N; Travison, Thomas G; Libermann, Towia A; Dillon, Simon T; Kuchel, George A; Vasunilashorn, Sarinnapha M; Alsop, David C; Marcantonio, Edward R

    2017-06-06

    The nested case-control study (NCC) design within a prospective cohort study is used when outcome data are available for all subjects, but the exposure of interest has not been collected, and is difficult or prohibitively expensive to obtain for all subjects. A NCC analysis with good matching procedures yields estimates that are as efficient and unbiased as estimates from the full cohort study. We present methodological considerations in a matched NCC design and analysis, which include the choice of match algorithms, analysis methods to evaluate the association of exposures of interest with outcomes, and consideration of overmatching. Matched, NCC design within a longitudinal observational prospective cohort study in the setting of two academic hospitals. Study participants are patients aged over 70 years who underwent scheduled major non-cardiac surgery. The primary outcome was postoperative delirium from in-hospital interviews and medical record review. The main exposure was IL-6 concentration (pg/ml) from blood sampled at three time points before delirium occurred. We used nonparametric signed ranked test to test for the median of the paired differences. We used conditional logistic regression to model the risk of IL-6 on delirium incidence. Simulation was used to generate a sample of cohort data on which unconditional multivariable logistic regression was used, and the results were compared to those of the conditional logistic regression. Partial R-square was used to assess the level of overmatching. We found that the optimal match algorithm yielded more matched pairs than the greedy algorithm. The choice of analytic strategy-whether to consider measured cytokine levels as the predictor or outcome-- yielded inferences that have different clinical interpretations but similar levels of statistical significance. Estimation results from NCC design using conditional logistic regression, and from simulated cohort design using unconditional logistic regression, were similar. We found minimal evidence for overmatching. Using a matched NCC approach introduces methodological challenges into the study design and data analysis. Nonetheless, with careful selection of the match algorithm, match factors, and analysis methods, this design is cost effective and, for our study, yields estimates that are similar to those from a prospective cohort study design.

  15. Methodology for the AutoRegressive Planet Search (ARPS) Project

    NASA Astrophysics Data System (ADS)

    Feigelson, Eric; Caceres, Gabriel; ARPS Collaboration

    2018-01-01

    The detection of periodic signals of transiting exoplanets is often impeded by the presence of aperiodic photometric variations. This variability is intrinsic to the host star in space-based observations (typically arising from magnetic activity) and from observational conditions in ground-based observations. The most common statistical procedures to remove stellar variations are nonparametric, such as wavelet decomposition or Gaussian Processes regression. However, many stars display variability with autoregressive properties, wherein later flux values are correlated with previous ones. Providing the time series is evenly spaced, parametric autoregressive models can prove very effective. Here we present the methodology of the Autoregessive Planet Search (ARPS) project which uses Autoregressive Integrated Moving Average (ARIMA) models to treat a wide variety of stochastic short-memory processes, as well as nonstationarity. Additionally, we introduce a planet-search algorithm to detect periodic transits in the time-series residuals after application of ARIMA models. Our matched-filter algorithm, the Transit Comb Filter (TCF), replaces the traditional box-fitting step. We construct a periodogram based on the TCF to concentrate the signal of these periodic spikes. Various features of the original light curves, the ARIMA fits, the TCF periodograms, and folded light curves at peaks of the TCF periodogram can then be collected to provide constraints for planet detection. These features provide input into a multivariate classifier when a training set is available. The ARPS procedure has been applied NASA's Kepler mission observations of ~200,000 stars (Caceres, Dissertation Talk, this meeting) and will be applied in the future to other datasets.

  16. Seropositivity for the human heat shock protein (Hsp)60 accompanying seropositivity for Chlamydia trachomatis is less prevalent among tubal ectopic pregnancy cases than individuals with normal reproductive history.

    PubMed

    Ozyurek, Eser S; Karacan, Tolga; Ozdalgicoglu, Cenk; Yilmaz, Salih; Isik, Salman; San, Mevlide; Kaya, Erdal

    2018-04-01

    To investigate the role of anti-human heat shock protein 60 (hHsp60) antibody positivity in the pathogenesis of ectopic pregnancy, following Chlamydia trachomatis (CT) infection. In a case-control study, serological tests for anti-hHsp60 were performed in ectopic pregnancies (study group) and parturients with normal reproductive histories (control group). All participants in both groups were CT IgG(+). hHsp60 IgG(+) prevalences were compared between the two groups, by semiquantitative ELISA. Data were evaluated using nonparametric and parametric tests and multivariable regression. After an initial pilot study, two groups were formed: 63 ectopic gestations (study group) and 95 normal parturients (control group), all CT IgG(+). Blood samples from all cases were tested for anti-hHsp60 IgG. Age, gravidity, and practising contraception were higher in the control group, while a history of pelvic infections were more common in the study group. Hsp60 IgG(+) was found to be significantly higher in the control group (63/95, 66.3%) compared to study group (30/63, 47.6%). Regression analysis revealed anti-hHsp60 positivity was an independent factor delineating the two groups. Immunity to hHsp60 is less common in CT IgG(+) ectopic pregnancies than CT IgG(+) fertile subjects without a history of ectopic pregnancies. Hence, our findings suggest that hHsp60 seropositivity may decrease the probability of an ectopic gestation in subjects with previous CT infections. Copyright © 2018 Elsevier B.V. All rights reserved.

  17. Self-injurious behavior among Greek male prisoners: prevalence and risk factors.

    PubMed

    Sakelliadis, E I; Papadodima, S A; Sergentanis, T N; Giotakos, O; Spiliopoulou, C A

    2010-04-01

    Self-harm among prisoners is a common phenomenon. This study aims to estimate the prevalence of self-injurious behavior (SIB) among Greek male prisoners, record their motives and determine independent risk factors. A self-administered, anonymous questionnaire was administered to 173 male prisoners in the Chalkida prison, Greece. The questionnaire included items on self-harm/SIB, demographic parameters, childhood history, family history, physical and mental disease, lifestyle and smoking habits, alcohol dependence (CAGE questionnaire), illicit substance use, aggression (Buss-Perry Aggression Questionnaire [BPAQ] and Lifetime History of Aggression [LTHA]), impulsivity (Barrat Impulsivity Scale-11) and suicidal ideation (Spectrum of Suicidal Behavior Scale). Univariate nonparametric statistics and multivariate ordinal logistic regression were performed. Of all the participants, 49.4% (95% CI: 41.5-57.3%) disclosed self-harm (direct or indirect). The prevalence of SIB was equal to 34.8% (95% CI: 27.5-42.6%). Most frequently, SIB coexisted with indirect self-harm (80.7%). The most common underlying motives were to obtain emotional release (31.6%) and to release anger (21.1%). At the univariate analysis, SIB was positively associated with a host of closely related factors: low education, physical/sexual abuse in childhood, parental neglect, parental divorce, alcoholism in family, psychiatric condition in family, recidivism, age, sentence already served, impulsivity, aggression, alcohol dependence, self-reported diagnosed psychiatric condition and illicit substance use. Childhood variables were particularly associated with the presence of diagnosed psychiatric condition. At the multivariate analysis, however, only three parameters were proven independent risk factors: self-reported diagnosed psychiatric condition, illicit substance use and aggression (BPAQ scale). The prevalence of SIB is particularly high. Psychiatric condition, illicit substance use and aggression seem to be the most meaningful risk factors; childhood events seem only to act indirectly. Copyright (c) 2009 Elsevier Masson SAS. All rights reserved.

  18. Linear regression analysis and its application to multivariate chromatographic calibration for the quantitative analysis of two-component mixtures.

    PubMed

    Dinç, Erdal; Ozdemir, Abdil

    2005-01-01

    Multivariate chromatographic calibration technique was developed for the quantitative analysis of binary mixtures enalapril maleate (EA) and hydrochlorothiazide (HCT) in tablets in the presence of losartan potassium (LST). The mathematical algorithm of multivariate chromatographic calibration technique is based on the use of the linear regression equations constructed using relationship between concentration and peak area at the five-wavelength set. The algorithm of this mathematical calibration model having a simple mathematical content was briefly described. This approach is a powerful mathematical tool for an optimum chromatographic multivariate calibration and elimination of fluctuations coming from instrumental and experimental conditions. This multivariate chromatographic calibration contains reduction of multivariate linear regression functions to univariate data set. The validation of model was carried out by analyzing various synthetic binary mixtures and using the standard addition technique. Developed calibration technique was applied to the analysis of the real pharmaceutical tablets containing EA and HCT. The obtained results were compared with those obtained by classical HPLC method. It was observed that the proposed multivariate chromatographic calibration gives better results than classical HPLC.

  19. SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES

    PubMed Central

    Zhu, Liping; Huang, Mian; Li, Runze

    2012-01-01

    This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536

  20. Estimation of spline function in nonparametric path analysis based on penalized weighted least square (PWLS)

    NASA Astrophysics Data System (ADS)

    Fernandes, Adji Achmad Rinaldo; Solimun, Arisoesilaningsih, Endang

    2017-12-01

    The aim of this research is to estimate the spline in Path Analysis-based on Nonparametric Regression using Penalized Weighted Least Square (PWLS) approach. Approach used is Reproducing Kernel Hilbert Space at sobolev space. Nonparametric path analysis model on the equation y1 i=f1.1(x1 i)+ε1 i; y2 i=f1.2(x1 i)+f2.2(y1 i)+ε2 i; i =1 ,2 ,…,n Nonparametric Path Analysis which meet the criteria of minimizing PWLS min fw .k∈W2m[aw .k,bw .k], k =1 ,2 { (2n ) -1(y˜-f ˜ ) TΣ-1(y ˜-f ˜ ) + ∑k =1 2 ∑w =1 2 λw .k ∫aw .k bw .k [fw.k (m )(xi) ] 2d xi } is f ˜^=Ay ˜ with A=T1(T1TU1-1∑-1T1)-1T1TU1-1∑-1+V1U1-1∑-1[I-T1(T1TU1-1∑-1T1)-1T1TU1-1∑-1] columnalign="left">+T2(T2TU2-1∑-1T2)-1T2TU2-1∑-1+V2U2-1∑-1[I1-T2(T2TU2-1∑-1T2) -1T2TU2-1∑-1

  1. Statistical Package User’s Guide.

    DTIC Science & Technology

    1980-08-01

    261 C. STACH Nonparametric Descriptive Statistics ... ......... ... 265 D. CHIRA Coefficient of Concordance...135 I.- -a - - W 7- Test Data: This program was tested using data from John Neter and William Wasserman, Applied Linear Statistical Models: Regression...length of data file e. new fileý name (not same as raw data file) 5. Printout as optioned for only. Comments: Ranked data are used for program CHIRA

  2. ShapeSelectForest: a new r package for modeling landsat time series

    Treesearch

    Mary Meyer; Xiyue Liao; Gretchen Moisen; Elizabeth Freeman

    2015-01-01

    We present a new R package called ShapeSelectForest recently posted to the Comprehensive R Archival Network. The package was developed to fit nonparametric shape-restricted regression splines to time series of Landsat imagery for the purpose of modeling, mapping, and monitoring annual forest disturbance dynamics over nearly three decades. For each pixel and spectral...

  3. Robust Machine Learning Variable Importance Analyses of Medical Conditions for Health Care Spending.

    PubMed

    Rose, Sherri

    2018-03-11

    To propose nonparametric double robust machine learning in variable importance analyses of medical conditions for health spending. 2011-2012 Truven MarketScan database. I evaluate how much more, on average, commercially insured enrollees with each of 26 of the most prevalent medical conditions cost per year after controlling for demographics and other medical conditions. This is accomplished within the nonparametric targeted learning framework, which incorporates ensemble machine learning. Previous literature studying the impact of medical conditions on health care spending has almost exclusively focused on parametric risk adjustment; thus, I compare my approach to parametric regression. My results demonstrate that multiple sclerosis, congestive heart failure, severe cancers, major depression and bipolar disorders, and chronic hepatitis are the most costly medical conditions on average per individual. These findings differed from those obtained using parametric regression. The literature may be underestimating the spending contributions of several medical conditions, which is a potentially critical oversight. If current methods are not capturing the true incremental effect of medical conditions, undesirable incentives related to care may remain. Further work is needed to directly study these issues in the context of federal formulas. © Health Research and Educational Trust.

  4. Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival.

    PubMed

    Ishwaran, Hemant; Lu, Min

    2018-06-04

    Random forests are a popular nonparametric tree ensemble procedure with broad applications to data analysis. While its widespread popularity stems from its prediction performance, an equally important feature is that it provides a fully nonparametric measure of variable importance (VIMP). A current limitation of VIMP, however, is that no systematic method exists for estimating its variance. As a solution, we propose a subsampling approach that can be used to estimate the variance of VIMP and for constructing confidence intervals. The method is general enough that it can be applied to many useful settings, including regression, classification, and survival problems. Using extensive simulations, we demonstrate the effectiveness of the subsampling estimator and in particular find that the delete-d jackknife variance estimator, a close cousin, is especially effective under low subsampling rates due to its bias correction properties. These 2 estimators are highly competitive when compared with the .164 bootstrap estimator, a modified bootstrap procedure designed to deal with ties in out-of-sample data. Most importantly, subsampling is computationally fast, thus making it especially attractive for big data settings. Copyright © 2018 John Wiley & Sons, Ltd.

  5. Monitoring waterbird abundance in wetlands: The importance of controlling results for variation in water depth

    USGS Publications Warehouse

    Bolduc, F.; Afton, A.D.

    2008-01-01

    Wetland use by waterbirds is highly dependent on water depth, and depth requirements generally vary among species. Furthermore, water depth within wetlands often varies greatly over time due to unpredictable hydrological events, making comparisons of waterbird abundance among wetlands difficult as effects of habitat variables and water depth are confounded. Species-specific relationships between bird abundance and water depth necessarily are non-linear; thus, we developed a methodology to correct waterbird abundance for variation in water depth, based on the non-parametric regression of these two variables. Accordingly, we used the difference between observed and predicted abundances from non-parametric regression (analogous to parametric residuals) as an estimate of bird abundance at equivalent water depths. We scaled this difference to levels of observed and predicted abundances using the formula: ((observed - predicted abundance)/(observed + predicted abundance)) ?? 100. This estimate also corresponds to the observed:predicted abundance ratio, which allows easy interpretation of results. We illustrated this methodology using two hypothetical species that differed in water depth and wetland preferences. Comparisons of wetlands, using both observed and relative corrected abundances, indicated that relative corrected abundance adequately separates the effect of water depth from the effect of wetlands. ?? 2008 Elsevier B.V.

  6. Non-parametric causality detection: An application to social media and financial data

    NASA Astrophysics Data System (ADS)

    Tsapeli, Fani; Musolesi, Mirco; Tino, Peter

    2017-10-01

    According to behavioral finance, stock market returns are influenced by emotional, social and psychological factors. Several recent works support this theory by providing evidence of correlation between stock market prices and collective sentiment indexes measured using social media data. However, a pure correlation analysis is not sufficient to prove that stock market returns are influenced by such emotional factors since both stock market prices and collective sentiment may be driven by a third unmeasured factor. Controlling for factors that could influence the study by applying multivariate regression models is challenging given the complexity of stock market data. False assumptions about the linearity or non-linearity of the model and inaccuracies on model specification may result in misleading conclusions. In this work, we propose a novel framework for causal inference that does not require any assumption about a particular parametric form of the model expressing statistical relationships among the variables of the study and can effectively control a large number of observed factors. We apply our method in order to estimate the causal impact that information posted in social media may have on stock market returns of four big companies. Our results indicate that social media data not only correlate with stock market returns but also influence them.

  7. Effects of alopecia on body image and quality of life of Turkish cancer women with or without headscarf.

    PubMed

    Erol, Ozgul; Can, Gulbeyaz; Aydıner, Adnan

    2012-10-01

    The aim of this study was to find out the effects of chemotherapy-related alopecia on body image and quality of life of Turkish women who have cancer with or without headscarves and factors affecting them. This descriptive study was conducted with 204 women who received chemotherapy at the Istanbul University Institute of Oncology, Turkey. The Patient Description Form, Body Image Scale and Nightingale Symptom Assessment Scale were used in data collection. Statistical analyses were performed using descriptive statistics and non-parametric tests. Logistic regression analysis was done to predict the factors affecting body image and quality of life of the patients. No difference was found between women wearing headscarves and those who did not in respect of their body image. However, women who wore headscarves who had no alopecia felt less dissatisfied with their scars, and women not wearing headscarves who had no alopecia have been feeling less self-conscious, less dissatisfied with their appearance. There was difference in terms of quality of life: women wearing headscarves had worse physical, psychological and general well-being than others. Although there were many important factors, multivariate analysis showed that for body image, having alopecia and wearing headscarves; and for quality of life, having alopecia were the variables that had considerable effects.

  8. Bivariate discrete beta Kernel graduation of mortality data.

    PubMed

    Mazza, Angelo; Punzo, Antonio

    2015-07-01

    Various parametric/nonparametric techniques have been proposed in literature to graduate mortality data as a function of age. Nonparametric approaches, as for example kernel smoothing regression, are often preferred because they do not assume any particular mortality law. Among the existing kernel smoothing approaches, the recently proposed (univariate) discrete beta kernel smoother has been shown to provide some benefits. Bivariate graduation, over age and calendar years or durations, is common practice in demography and actuarial sciences. In this paper, we generalize the discrete beta kernel smoother to the bivariate case, and we introduce an adaptive bandwidth variant that may provide additional benefits when data on exposures to the risk of death are available; furthermore, we outline a cross-validation procedure for bandwidths selection. Using simulations studies, we compare the bivariate approach proposed here with its corresponding univariate formulation and with two popular nonparametric bivariate graduation techniques, based on Epanechnikov kernels and on P-splines. To make simulations realistic, a bivariate dataset, based on probabilities of dying recorded for the US males, is used. Simulations have confirmed the gain in performance of the new bivariate approach with respect to both the univariate and the bivariate competitors.

  9. Nonparametric estimation of benchmark doses in environmental risk assessment

    PubMed Central

    Piegorsch, Walter W.; Xiong, Hui; Bhattacharya, Rabi N.; Lin, Lizhen

    2013-01-01

    Summary An important statistical objective in environmental risk analysis is estimation of minimum exposure levels, called benchmark doses (BMDs), that induce a pre-specified benchmark response in a dose-response experiment. In such settings, representations of the risk are traditionally based on a parametric dose-response model. It is a well-known concern, however, that if the chosen parametric form is misspecified, inaccurate and possibly unsafe low-dose inferences can result. We apply a nonparametric approach for calculating benchmark doses, based on an isotonic regression method for dose-response estimation with quantal-response data (Bhattacharya and Kong, 2007). We determine the large-sample properties of the estimator, develop bootstrap-based confidence limits on the BMDs, and explore the confidence limits’ small-sample properties via a short simulation study. An example from cancer risk assessment illustrates the calculations. PMID:23914133

  10. Alternatives for using multivariate regression to adjust prospective payment rates

    PubMed Central

    Sheingold, Steven H.

    1990-01-01

    Multivariate regression analysis has been used in structuring three of the adjustments to Medicare's prospective payment rates. Because the indirect-teaching adjustment, the disproportionate-share adjustment, and the adjustment for large cities are responsible for distributing approximately $3 billion in payments each year, the specification of regression models for these adjustments is of critical importance. In this article, the application of regression for adjusting Medicare's prospective rates is discussed, and the implications that differing specifications could have for these adjustments are demonstrated. PMID:10113271

  11. Electricity Consumption in the Industrial Sector of Jordan: Application of Multivariate Linear Regression and Adaptive Neuro-Fuzzy Techniques

    NASA Astrophysics Data System (ADS)

    Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.

    2009-08-01

    In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.

  12. Efficient Regressions via Optimally Combining Quantile Information*

    PubMed Central

    Zhao, Zhibiao; Xiao, Zhijie

    2014-01-01

    We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481

  13. Semi-nonparametric VaR forecasts for hedge funds during the recent crisis

    NASA Astrophysics Data System (ADS)

    Del Brio, Esther B.; Mora-Valencia, Andrés; Perote, Javier

    2014-05-01

    The need to provide accurate value-at-risk (VaR) forecasting measures has triggered an important literature in econophysics. Although these accurate VaR models and methodologies are particularly demanded for hedge fund managers, there exist few articles specifically devoted to implement new techniques in hedge fund returns VaR forecasting. This article advances in these issues by comparing the performance of risk measures based on parametric distributions (the normal, Student’s t and skewed-t), semi-nonparametric (SNP) methodologies based on Gram-Charlier (GC) series and the extreme value theory (EVT) approach. Our results show that normal-, Student’s t- and Skewed t- based methodologies fail to forecast hedge fund VaR, whilst SNP and EVT approaches accurately success on it. We extend these results to the multivariate framework by providing an explicit formula for the GC copula and its density that encompasses the Gaussian copula and accounts for non-linear dependences. We show that the VaR obtained by the meta GC accurately captures portfolio risk and outperforms regulatory VaR estimates obtained through the meta Gaussian and Student’s t distributions.

  14. A Non-parametric Cutout Index for Robust Evaluation of Identified Proteins*

    PubMed Central

    Serang, Oliver; Paulo, Joao; Steen, Hanno; Steen, Judith A.

    2013-01-01

    This paper proposes a novel, automated method for evaluating sets of proteins identified using mass spectrometry. The remaining peptide-spectrum match score distributions of protein sets are compared to an empirical absent peptide-spectrum match score distribution, and a Bayesian non-parametric method reminiscent of the Dirichlet process is presented to accurately perform this comparison. Thus, for a given protein set, the process computes the likelihood that the proteins identified are correctly identified. First, the method is used to evaluate protein sets chosen using different protein-level false discovery rate (FDR) thresholds, assigning each protein set a likelihood. The protein set assigned the highest likelihood is used to choose a non-arbitrary protein-level FDR threshold. Because the method can be used to evaluate any protein identification strategy (and is not limited to mere comparisons of different FDR thresholds), we subsequently use the method to compare and evaluate multiple simple methods for merging peptide evidence over replicate experiments. The general statistical approach can be applied to other types of data (e.g. RNA sequencing) and generalizes to multivariate problems. PMID:23292186

  15. Multi-object segmentation using coupled nonparametric shape and relative pose priors

    NASA Astrophysics Data System (ADS)

    Uzunbas, Mustafa Gökhan; Soldea, Octavian; Çetin, Müjdat; Ünal, Gözde; Erçil, Aytül; Unay, Devrim; Ekin, Ahmet; Firat, Zeynep

    2009-02-01

    We present a new method for multi-object segmentation in a maximum a posteriori estimation framework. Our method is motivated by the observation that neighboring or coupling objects in images generate configurations and co-dependencies which could potentially aid in segmentation if properly exploited. Our approach employs coupled shape and inter-shape pose priors that are computed using training images in a nonparametric multi-variate kernel density estimation framework. The coupled shape prior is obtained by estimating the joint shape distribution of multiple objects and the inter-shape pose priors are modeled via standard moments. Based on such statistical models, we formulate an optimization problem for segmentation, which we solve by an algorithm based on active contours. Our technique provides significant improvements in the segmentation of weakly contrasted objects in a number of applications. In particular for medical image analysis, we use our method to extract brain Basal Ganglia structures, which are members of a complex multi-object system posing a challenging segmentation problem. We also apply our technique to the problem of handwritten character segmentation. Finally, we use our method to segment cars in urban scenes.

  16. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory.

    PubMed

    Kruppa, Jochen; Liu, Yufeng; Biau, Gérard; Kohler, Michael; König, Inke R; Malley, James D; Ziegler, Andreas

    2014-07-01

    Probability estimation for binary and multicategory outcome using logistic and multinomial logistic regression has a long-standing tradition in biostatistics. However, biases may occur if the model is misspecified. In contrast, outcome probabilities for individuals can be estimated consistently with machine learning approaches, including k-nearest neighbors (k-NN), bagged nearest neighbors (b-NN), random forests (RF), and support vector machines (SVM). Because machine learning methods are rarely used by applied biostatisticians, the primary goal of this paper is to explain the concept of probability estimation with these methods and to summarize recent theoretical findings. Probability estimation in k-NN, b-NN, and RF can be embedded into the class of nonparametric regression learning machines; therefore, we start with the construction of nonparametric regression estimates and review results on consistency and rates of convergence. In SVMs, outcome probabilities for individuals are estimated consistently by repeatedly solving classification problems. For SVMs we review classification problem and then dichotomous probability estimation. Next we extend the algorithms for estimating probabilities using k-NN, b-NN, and RF to multicategory outcomes and discuss approaches for the multicategory probability estimation problem using SVM. In simulation studies for dichotomous and multicategory dependent variables we demonstrate the general validity of the machine learning methods and compare it with logistic regression. However, each method fails in at least one simulation scenario. We conclude with a discussion of the failures and give recommendations for selecting and tuning the methods. Applications to real data and example code are provided in a companion article (doi:10.1002/bimj.201300077). © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Bayesian Local Contamination Models for Multivariate Outliers

    PubMed Central

    Page, Garritt L.; Dunson, David B.

    2013-01-01

    In studies where data are generated from multiple locations or sources it is common for there to exist observations that are quite unlike the majority. Motivated by the application of establishing a reference value in an inter-laboratory setting when outlying labs are present, we propose a local contamination model that is able to accommodate unusual multivariate realizations in a flexible way. The proposed method models the process level of a hierarchical model using a mixture with a parametric component and a possibly nonparametric contamination. Much of the flexibility in the methodology is achieved by allowing varying random subsets of the elements in the lab-specific mean vectors to be allocated to the contamination component. Computational methods are developed and the methodology is compared to three other possible approaches using a simulation study. We apply the proposed method to a NIST/NOAA sponsored inter-laboratory study which motivated the methodological development. PMID:24363465

  18. Assessment of benthic changes during 20 years of monitoring the Mexican Salina Cruz Bay.

    PubMed

    González-Macías, C; Schifter, I; Lluch-Cota, D B; Méndez-Rodríguez, L; Hernández-Vázquez, S

    2009-02-01

    In this work a non-parametric multivariate analysis was used to assess the impact of metals and organic compounds in the macro infaunal component of the mollusks benthic community using surface sediment data from several monitoring programs collected over 20 years in Salina Cruz Bay, Mexico. The data for benthic mollusks community characteristics (richness, abundance and diversity) were linked to multivariate environmental patterns, using the Alternating Conditional Expectations method to correlate the biological measurements of the mollusk community with the physicochemical properties of water and sediments. Mollusks community variation is related to environmental characteristics as well as lead content. Surface deposit feeders are increasing their relative density, while subsurface deposit feeders are decreasing with respect to time, these last are expected to be more related with sediment and more affected then by its quality. However gastropods with predatory carnivore as well as chemosymbiotic deposit feeder bivalves have maintained their relative densities along time.

  19. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    Outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. The well-known such design is the case-control design for binary response, the case-cohort design for the failure time data and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking the advantage of the available responses for subjects in a cluster, we propose a multivariate outcome dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and developed the asymptotically normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of PCB exposure to hearing loss in children born to the Collaborative Perinatal Study. PMID:27966260

  20. Short-term forecasting of meteorological time series using Nonparametric Functional Data Analysis (NPFDA)

    NASA Astrophysics Data System (ADS)

    Curceac, S.; Ternynck, C.; Ouarda, T.

    2015-12-01

    Over the past decades, a substantial amount of research has been conducted to model and forecast climatic variables. In this study, Nonparametric Functional Data Analysis (NPFDA) methods are applied to forecast air temperature and wind speed time series in Abu Dhabi, UAE. The dataset consists of hourly measurements recorded for a period of 29 years, 1982-2010. The novelty of the Functional Data Analysis approach is in expressing the data as curves. In the present work, the focus is on daily forecasting and the functional observations (curves) express the daily measurements of the above mentioned variables. We apply a non-linear regression model with a functional non-parametric kernel estimator. The computation of the estimator is performed using an asymmetrical quadratic kernel function for local weighting based on the bandwidth obtained by a cross validation procedure. The proximities between functional objects are calculated by families of semi-metrics based on derivatives and Functional Principal Component Analysis (FPCA). Additionally, functional conditional mode and functional conditional median estimators are applied and the advantages of combining their results are analysed. A different approach employs a SARIMA model selected according to the minimum Akaike (AIC) and Bayessian (BIC) Information Criteria and based on the residuals of the model. The performance of the models is assessed by calculating error indices such as the root mean square error (RMSE), relative RMSE, BIAS and relative BIAS. The results indicate that the NPFDA models provide more accurate forecasts than the SARIMA models. Key words: Nonparametric functional data analysis, SARIMA, time series forecast, air temperature, wind speed

  1. Applied Statistics: From Bivariate through Multivariate Techniques [with CD-ROM

    ERIC Educational Resources Information Center

    Warner, Rebecca M.

    2007-01-01

    This book provides a clear introduction to widely used topics in bivariate and multivariate statistics, including multiple regression, discriminant analysis, MANOVA, factor analysis, and binary logistic regression. The approach is applied and does not require formal mathematics; equations are accompanied by verbal explanations. Students are asked…

  2. Pressure Ulcers in the United States' Inpatient Population From 2008 to 2012: Results of a Retrospective Nationwide Study.

    PubMed

    Bauer, Karen; Rock, Kathryn; Nazzal, Munier; Jones, Olivia; Qu, Weikai

    2016-11-01

    Pressure ulcers are common, increase patient morbidity and mortality, and costly for patients, their families, and the health care system. A retrospective study was conducted to evaluate the impact of pressure ulcers on short-term outcomes in United States inpatient populations and to identify patient characteristics associated with having 1 or more pressure ulcers. The US Nationwide Inpatient Sample (NIS) database was analyzed using the International Classification of Disease, 9th Revision, Clinical Modification (ICD-9 CM) diagnosis codes as the screening tool for all inpatient pressure ulcers recorded from 2008 to 2012. Patient demographics and comorbid conditions, as identified by ICD-9 code, were extracted, along with primary outcomes of length of stay (LOS), total hospital charge (TC), inhospital mortality, and discharge disposition. Continuous variables with normal distribution were expressed in terms of mean and standard deviation. Group comparisons were performed using t-test or ANOVA test. Continuous nonnormal distributed variables such as LOS and TC were expressed in terms of median, and nonparametric tests were used to compare the differences between groups. Categorical data were presented in terms of percentages of the number of cases within each group. Chi-squared tests were used to compare categorical data in different groups. For multivariate analysis, linear regressions (for continuous variable) and logistic regression (for categorical variables) were used to analyze the possible risk factors for the investigated outcomes of LOS, TC, inhospital mortality, and patient disposition. Coefficients were calculated with multivariate regression with all included patients versus patients with pressure ulcers alone. The 5-year average number of admitted patients with at least 1 pressure ulcer was determined to be 670 767 (average overall rate: 1.8%). Statistically significant differences between patients with and without pressure ulcers were observed for median LOS (7 days [mean 11.1 ± 15] compared to 3 days [mean 4.6 ± 6.8]) and median TC ($36 500 [mean $72 000 ± $122 900] compared to $17 200 [mean $32 200 ± $57 500]). The mortality rate in patients with a pressure ulcer was significantly higher than in patients without a pressure ulcer (9.1% versus 1.8%, OR = 5.08, CI: 5.03-5.1, P <0.001). Pressure ulcers were significantly more common in patients who were older or had malnutrition. The results of this study confirm the importance of prevention initiatives to help reduce the negative impact of pressure ulcers on patient outcomes and costs of care.

  3. Age and mortality after injury: is the association linear?

    PubMed

    Friese, R S; Wynne, J; Joseph, B; Hashmi, A; Diven, C; Pandit, V; O'Keeffe, T; Zangbar, B; Kulvatunyou, N; Rhee, P

    2014-10-01

    Multiple studies have demonstrated a linear association between advancing age and mortality after injury. An inflection point, or an age at which outcomes begin to differ, has not been previously described. We hypothesized that the relationship between age and mortality after injury is non-linear and an inflection point exists. We performed a retrospective cohort analysis at our urban level I center from 2007 through 2009. All patients aged 65 years and older with the admission diagnosis of injury were included. Non-parametric logistic regression was used to identify the functional form between mortality and age. Multivariate logistic regression was utilized to explore the association between age and mortality. Age 65 years was used as the reference. Significance was defined as p < 0.05. A total of 1,107 patients were included in the analysis. One-third required intensive care unit (ICU) admission and 48 % had traumatic brain injury. 229 patients (20.6 %) were 84 years of age or older. The overall mortality was 7.2 %. Our model indicates that mortality is a quadratic function of age. After controlling for confounders, age is associated with mortality with a regression coefficient of 1.08 for the linear term (p = 0.02) and a regression coefficient of -0.006 for the quadratic term (p = 0.03). The model identified 84.4 years of age as the inflection point at which mortality rates begin to decline. The risk of death after injury varies linearly with age until 84 years. After 84 years of age, the mortality rates decline. These findings may reflect the varying severity of comorbidities and differences in baseline functional status in elderly trauma patients. Specifically, a proportion of our injured patient population less than 84 years old may be more frail, contributing to increased mortality after trauma, whereas a larger proportion of our injured patients over 84 years old, by virtue of reaching this advanced age, may, in fact, be less frail, contributing to less risk of death.

  4. Educational opportunities in bladder cancer: increasing cystoscopic adherence and the availability of smoking-cessation programs.

    PubMed

    Kowalkowski, Marc A; Goltz, Heather Honoré; Petersen, Nancy J; Amiel, Gilad E; Lerner, Seth P; Latini, David M

    2014-12-01

    Cancer survivors who continue to smoke following diagnosis are at increased risk for recurrence. Yet, smoking prevalence among survivors is similar to the general population. Adherence to cystoscopic surveillance is an important disease-management strategy for non-muscle-invasive bladder cancer (NMIBC) survivors, but data from Surveillance, Epidemiology, and End Results program (SEER) suggest current adherence levels are insufficient to identify recurrences at critically early stages. This study was conducted to identify actionable targets for educational intervention to increase adherence to cystoscopic monitoring for disease recurrence or progression. NMIBC survivors (n = 109) completed telephone-based surveys. Adherence was determined by measuring time from diagnosis to interview date; cystoscopies received were then compared to American Urological Association (AUA) guidelines. Data were analyzed using non-parametric tests for univariate and logistic regression for multivariable analyses. Participants averaged 65 years (SD = 9.3) and were primarily white (95 %), male (75 %), married (75 %), and non-smokers (84 %). Eighty-three percent reported either Ta- or T1-stage bladder tumors. Forty-five percent met AUA guidelines for adherence. Compared to non-smokers, current smokers reported increased fear of recurrence and psychological distress (p < 0.05). In regression analyses, non-adherence was associated with smoking (OR = 33.91, p < 0.01), providing a behavioral marker to describe a survivor group with unmet needs that may contribute to low cystoscopic adherence. Research assessing survivorship needs and designing and evaluating educational programs for NMIBC survivors should be a high priority. Identifying unmet needs among NMIBC survivors and developing programs to address these needs may increase compliance with cystoscopic monitoring, improve outcomes, and enhance quality of life.

  5. Effect of Contact Damage on the Strength of Ceramic Materials.

    DTIC Science & Technology

    1982-10-01

    variables that are important to erosion, and a multivariate , linear regression analysis is used to fit the data to the dimensional analysis. The...of Equations 7 and 8 by a multivariable regression analysis (room tem- perature data) Exponent Regression Standard error Computed coefficient of...1980) 593. WEAVER, Proc. Brit. Ceram. Soc. 22 (1973) 125. 39. P. W. BRIDGMAN, "Dimensional Analaysis ", (Yale 18. R. W. RICE, S. W. FREIMAN and P. F

  6. Inference of Gene Regulatory Networks Using Bayesian Nonparametric Regression and Topology Information.

    PubMed

    Fan, Yue; Wang, Xiao; Peng, Qinke

    2017-01-01

    Gene regulatory networks (GRNs) play an important role in cellular systems and are important for understanding biological processes. Many algorithms have been developed to infer the GRNs. However, most algorithms only pay attention to the gene expression data but do not consider the topology information in their inference process, while incorporating this information can partially compensate for the lack of reliable expression data. Here we develop a Bayesian group lasso with spike and slab priors to perform gene selection and estimation for nonparametric models. B-spline basis functions are used to capture the nonlinear relationships flexibly and penalties are used to avoid overfitting. Further, we incorporate the topology information into the Bayesian method as a prior. We present the application of our method on DREAM3 and DREAM4 datasets and two real biological datasets. The results show that our method performs better than existing methods and the topology information prior can improve the result.

  7. A Selective Review of Group Selection in High-Dimensional Models

    PubMed Central

    Huang, Jian; Breheny, Patrick; Ma, Shuangge

    2013-01-01

    Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. PMID:24174707

  8. Design of neural networks for classification of remotely sensed imagery

    NASA Technical Reports Server (NTRS)

    Chettri, Samir R.; Cromp, Robert F.; Birmingham, Mark

    1992-01-01

    Classification accuracies of a backpropagation neural network are discussed and compared with a maximum likelihood classifier (MLC) with multivariate normal class models. We have found that, because of its nonparametric nature, the neural network outperforms the MLC in this area. In addition, we discuss techniques for constructing optimal neural nets on parallel hardware like the MasPar MP-1 currently at GSFC. Other important discussions are centered around training and classification times of the two methods, and sensitivity to the training data. Finally, we discuss future work in the area of classification and neural nets.

  9. Bootstrap-based procedures for inference in nonparametric receiver-operating characteristic curve regression analysis.

    PubMed

    Rodríguez-Álvarez, María Xosé; Roca-Pardiñas, Javier; Cadarso-Suárez, Carmen; Tahoces, Pablo G

    2018-03-01

    Prior to using a diagnostic test in a routine clinical setting, the rigorous evaluation of its diagnostic accuracy is essential. The receiver-operating characteristic curve is the measure of accuracy most widely used for continuous diagnostic tests. However, the possible impact of extra information about the patient (or even the environment) on diagnostic accuracy also needs to be assessed. In this paper, we focus on an estimator for the covariate-specific receiver-operating characteristic curve based on direct regression modelling and nonparametric smoothing techniques. This approach defines the class of generalised additive models for the receiver-operating characteristic curve. The main aim of the paper is to offer new inferential procedures for testing the effect of covariates on the conditional receiver-operating characteristic curve within the above-mentioned class. Specifically, two different bootstrap-based tests are suggested to check (a) the possible effect of continuous covariates on the receiver-operating characteristic curve and (b) the presence of factor-by-curve interaction terms. The validity of the proposed bootstrap-based procedures is supported by simulations. To facilitate the application of these new procedures in practice, an R-package, known as npROCRegression, is provided and briefly described. Finally, data derived from a computer-aided diagnostic system for the automatic detection of tumour masses in breast cancer is analysed.

  10. Variable selection and model choice in geoadditive regression models.

    PubMed

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.

  11. Regression analysis of informative current status data with the additive hazards model.

    PubMed

    Zhao, Shishun; Hu, Tao; Ma, Ling; Wang, Peijie; Sun, Jianguo

    2015-04-01

    This paper discusses regression analysis of current status failure time data arising from the additive hazards model in the presence of informative censoring. Many methods have been developed for regression analysis of current status data under various regression models if the censoring is noninformative, and also there exists a large literature on parametric analysis of informative current status data in the context of tumorgenicity experiments. In this paper, a semiparametric maximum likelihood estimation procedure is presented and in the method, the copula model is employed to describe the relationship between the failure time of interest and the censoring time. Furthermore, I-splines are used to approximate the nonparametric functions involved and the asymptotic consistency and normality of the proposed estimators are established. A simulation study is conducted and indicates that the proposed approach works well for practical situations. An illustrative example is also provided.

  12. Variable Selection in Logistic Regression.

    DTIC Science & Technology

    1987-06-01

    23 %. AUTIOR(.) S. CONTRACT OR GRANT NUMBE Rf.i %Z. D. Bai, P. R. Krishnaiah and . C. Zhao F49620-85- C-0008 " PERFORMING ORGANIZATION NAME AND AOORESS...d I7 IOK-TK- d 7 -I0 7’ VARIABLE SELECTION IN LOGISTIC REGRESSION Z. D. Bai, P. R. Krishnaiah and L. C. Zhao Center for Multivariate Analysis...University of Pittsburgh Center for Multivariate Analysis University of Pittsburgh Y !I VARIABLE SELECTION IN LOGISTIC REGRESSION Z- 0. Bai, P. R. Krishnaiah

  13. Comparing lagged linear correlation, lagged regression, Granger causality, and vector autoregression for uncovering associations in EHR data.

    PubMed

    Levine, Matthew E; Albers, David J; Hripcsak, George

    2016-01-01

    Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.

  14. A matrix-based method of moments for fitting the multivariate random effects model for meta-analysis and meta-regression

    PubMed Central

    Jackson, Dan; White, Ian R; Riley, Richard D

    2013-01-01

    Multivariate meta-analysis is becoming more commonly used. Methods for fitting the multivariate random effects model include maximum likelihood, restricted maximum likelihood, Bayesian estimation and multivariate generalisations of the standard univariate method of moments. Here, we provide a new multivariate method of moments for estimating the between-study covariance matrix with the properties that (1) it allows for either complete or incomplete outcomes and (2) it allows for covariates through meta-regression. Further, for complete data, it is invariant to linear transformations. Our method reduces to the usual univariate method of moments, proposed by DerSimonian and Laird, in a single dimension. We illustrate our method and compare it with some of the alternatives using a simulation study and a real example. PMID:23401213

  15. SMURC: High-Dimension Small-Sample Multivariate Regression With Covariance Estimation.

    PubMed

    Bayar, Belhassen; Bouaynaya, Nidhal; Shterenberg, Roman

    2017-03-01

    We consider a high-dimension low sample-size multivariate regression problem that accounts for correlation of the response variables. The system is underdetermined as there are more parameters than samples. We show that the maximum likelihood approach with covariance estimation is senseless because the likelihood diverges. We subsequently propose a normalization of the likelihood function that guarantees convergence. We call this method small-sample multivariate regression with covariance (SMURC) estimation. We derive an optimization problem and its convex approximation to compute SMURC. Simulation results show that the proposed algorithm outperforms the regularized likelihood estimator with known covariance matrix and the sparse conditional Gaussian graphical model. We also apply SMURC to the inference of the wing-muscle gene network of the Drosophila melanogaster (fruit fly).

  16. Keeping data continuous when analyzing the prognostic impact of a tumor marker: an example with cathepsin D in breast cancer.

    PubMed

    Bossard, N; Descotes, F; Bremond, A G; Bobin, Y; De Saint Hilaire, P; Golfier, F; Awada, A; Mathevet, P M; Berrerd, L; Barbier, Y; Estève, J

    2003-11-01

    The prognostic value of cathepsin D has been recently recognized, but as many quantitative tumor markers, its clinical use remains unclear partly because of methodological issues in defining cut-off values. Guidelines have been proposed for analyzing quantitative prognostic factors, underlining the need for keeping data continuous, instead of categorizing them. Flexible approaches, parametric and non-parametric, have been proposed in order to improve the knowledge of the functional form relating a continuous factor to the risk. We studied the prognostic value of cathepsin D in a retrospective hospital cohort of 771 patients with breast cancer, and focused our overall survival analysis, based on the Cox regression, on two flexible approaches: smoothing splines and fractional polynomials. We also determined a cut-off value from the maximum likelihood estimate of a threshold model. These different approaches complemented each other for (1) identifying the functional form relating cathepsin D to the risk, and obtaining a cut-off value and (2) optimizing the adjustment for complex covariate like age at diagnosis in the final multivariate Cox model. We found a significant increase in the death rate, reaching 70% with a doubling of the level of cathepsin D, after the threshold of 37.5 pmol mg(-1). The proper prognostic impact of this marker could be confirmed and a methodology providing appropriate ways to use markers in clinical practice was proposed.

  17. Identifying Pathways for Improving Household Food Self-Sufficiency Outcomes in the Hills of Nepal

    PubMed Central

    Karki, Tika B.; Sah, Shrawan K.; Thapa, Resam B.; McDonald, Andrew J.; Davis, Adam S.

    2015-01-01

    Maintaining and improving household food self-sufficiency (FSS) in mountain regions is an ongoing challenge. There are many facets to the issue, including comparatively high levels of land fragmentation, challenging terrain and transportation bottlenecks, declining labor availability due to out-migration, and low technical knowledge, among others. Using a nonparametric multivariate approach, we quantified primary associations underlying current levels of FSS in the mid-hills of Nepal. A needs assessment survey was administered to 77 households in Lungaun (Baglung District), Pang (Parbat District), and Pathlekhet (Myagdi District), with a total of 80 variables covering five performance areas; resulting data were analyzed using Classification and Regression Trees. The most parsimonious statistical model for household FSS highlighted associations with agronomic management, including yields of maize and fingermillet within a relay cropping system and adoption of improved crop cultivars. Secondary analyses of the variables retained in the first model again focused primarily on crop and livestock management. It thus appears that continued emphasis on technical agricultural improvements is warranted, independent of factors such as land holding size that, in any case, are very difficult to change through development interventions. Initiatives to increase household FSS in the mid-hills of Nepal will benefit from placing a primary focus on methods of agricultural intensification to improve crop yields and effective technology transfer to increase adoption of these methods. PMID:26047508

  18. Seasonal trends in Ceratitis capitata reproductive potential derived from live-caught females in Greece

    PubMed Central

    Kouloussis, Nikos A.; Papadopoulos, Nikos T.; Katsoyannos, Byron I.; Müller, Hans-Georg; Wang, Jane-Ling; Su, Yu-Ru; Molleman, Freerk; Carey, James R.

    2012-01-01

    Reproductive data of individual insects are extremely hard to collect under natural conditions, thus the study of research questions related to oviposition has not advanced. Patterns of oviposition are often inferred only indirectly, through monitoring of host infestation, whereas the influence of age structure and several other factors on oviposition remains unknown. Using a new approach, in this article, we live-trapped wild Ceratitis capitata (Wiedemann) (Diptera: Tephritidae) females on the Greek island of Chios during two field seasons. For their remaining lifetime, these females were placed individually in small cages and their daily oviposition was monitored. Reproduction rates between cohorts from different collection dates were then compared. The results showed that in the different captive cohorts the average remaining lifetime and reproduction were highly variable within and between seasons. Multivariate regression analysis showed that the month of capture had a significant effect on captive life span, average daily reproduction, and patterns of egg laying. The effect of year was significant on reproduction, but not on captive life span. These differences between sampling periods probably reflect differences in the availability of hosts and other factors that vary during the season and affect age structure and reproduction. Using a non-parametric generalized additive model, we found a statistically significant correlation between the captive life span and the average daily reproduction. These findings and the experimental approach have several important implications. PMID:22791908

  19. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.

  20. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

    PubMed

    Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

    2017-03-15

    Outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme where one observes the exposure with a probability that depends on the outcome. The well-known such design is the case-control design for binary response, the case-cohort design for the failure time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate response case, statistical inference and design for the ODS with multivariate cases remain under-developed. Motivated by the need in biological studies for taking the advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric where all the underlying distributions of covariates are modeled nonparametrically using the empirical likelihood methods. We show that the proposed estimator is consistent and developed the asymptotically normality properties. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample with the same sample size. The multivariate-ODS design together with the proposed estimator provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of association of polychlorinated biphenyl exposure to hearing loss in children born to the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.

  1. Bayesian Nonparametric Prediction and Statistical Inference

    DTIC Science & Technology

    1989-09-07

    Kadane, J. (1980), "Bayesian decision theory and the sim- plification of models," in Evaluation of Econometric Models, J. Kmenta and J. Ramsey , eds...the random model and weighted least squares regression," in Evaluation of Econometric Models, ed. by J. Kmenta and J. Ramsey , Academic Press, 197-217...likelihood function. On the other hand, H. Jeffreys’s theory of hypothesis testing covers the most important situations in which the prior is not diffuse. See

  2. Network structure of multivariate time series.

    PubMed

    Lacasa, Lucas; Nicosia, Vincenzo; Latora, Vito

    2015-10-21

    Our understanding of a variety of phenomena in physics, biology and economics crucially depends on the analysis of multivariate time series. While a wide range tools and techniques for time series analysis already exist, the increasing availability of massive data structures calls for new approaches for multidimensional signal processing. We present here a non-parametric method to analyse multivariate time series, based on the mapping of a multidimensional time series into a multilayer network, which allows to extract information on a high dimensional dynamical system through the analysis of the structure of the associated multiplex network. The method is simple to implement, general, scalable, does not require ad hoc phase space partitioning, and is thus suitable for the analysis of large, heterogeneous and non-stationary time series. We show that simple structural descriptors of the associated multiplex networks allow to extract and quantify nontrivial properties of coupled chaotic maps, including the transition between different dynamical phases and the onset of various types of synchronization. As a concrete example we then study financial time series, showing that a multiplex network analysis can efficiently discriminate crises from periods of financial stability, where standard methods based on time-series symbolization often fail.

  3. Multivariate regression model for predicting lumber grade volumes of northern red oak sawlogs

    Treesearch

    Daniel A. Yaussy; Robert L. Brisbin

    1983-01-01

    A multivariate regression model was developed to predict green board-foot yields for the seven common factory lumber grades processed from northern red oak (Quercus rubra L.) factory grade logs. The model uses the standard log measurements of grade, scaling diameter, length, and percent defect. It was validated with an independent data set. The model...

  4. Predictive and mechanistic multivariate linear regression models for reaction development

    PubMed Central

    Santiago, Celine B.; Guo, Jing-Yao

    2018-01-01

    Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711

  5. Multivariate regression model for predicting yields of grade lumber from yellow birch sawlogs

    Treesearch

    Andrew F. Howard; Daniel A. Yaussy

    1986-01-01

    A multivariate regression model was developed to predict green board-foot yields for the common grades of factory lumber processed from yellow birch factory-grade logs. The model incorporates the standard log measurements of scaling diameter, length, proportion of scalable defects, and the assigned USDA Forest Service log grade. Differences in yields between band and...

  6. G/SPLINES: A hybrid of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm with Holland's genetic algorithm

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1991-01-01

    G/SPLINES are a hybrid of Friedman's Multivariable Adaptive Regression Splines (MARS) algorithm with Holland's Genetic Algorithm. In this hybrid, the incremental search is replaced by a genetic search. The G/SPLINE algorithm exhibits performance comparable to that of the MARS algorithm, requires fewer least squares computations, and allows significantly larger problems to be considered.

  7. Validation of cross-sectional time series and multivariate adaptive regression splines models for the prediction of energy expenditure in children and adolescents using doubly labeled water

    USDA-ARS?s Scientific Manuscript database

    Accurate, nonintrusive, and inexpensive techniques are needed to measure energy expenditure (EE) in free-living populations. Our primary aim in this study was to validate cross-sectional time series (CSTS) and multivariate adaptive regression splines (MARS) models based on observable participant cha...

  8. Analytical framework for reconstructing heterogeneous environmental variables from mammal community structure.

    PubMed

    Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C

    2015-01-01

    We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.

  9. Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

    NASA Astrophysics Data System (ADS)

    Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

    2016-03-01

    From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states.

  10. Higher-order Multivariable Polynomial Regression to Estimate Human Affective States

    PubMed Central

    Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin

    2016-01-01

    From direct observations, facial, vocal, gestural, physiological, and central nervous signals, estimating human affective states through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural network, have been proposed in the past decade. In these models, linear models are generally lack of precision because of ignoring intrinsic nonlinearities of complex psychophysiological processes; and nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify model, we introduce a new computational modeling method named as higher-order multivariable polynomial regression to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method is able to obtain efficient correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide certain indirect evidences that valence and arousal have their brain’s motivational circuit origins. Thus, the proposed method can serve as a novel one for efficiently estimating human affective states. PMID:26996254

  11. Regression analysis of current-status data: an application to breast-feeding.

    PubMed

    Grummer-strawn, L M

    1993-09-01

    "Although techniques for calculating mean survival time from current-status data are well known, their use in multiple regression models is somewhat troublesome. Using data on current breast-feeding behavior, this article considers a number of techniques that have been suggested in the literature, including parametric, nonparametric, and semiparametric models as well as the application of standard schedules. Models are tested in both proportional-odds and proportional-hazards frameworks....I fit [the] models to current status data on breast-feeding from the Demographic and Health Survey (DHS) in six countries: two African (Mali and Ondo State, Nigeria), two Asian (Indonesia and Sri Lanka), and two Latin American (Colombia and Peru)." excerpt

  12. Regression Models For Multivariate Count Data

    PubMed Central

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2016-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. PMID:28348500

  13. Regression Models For Multivariate Count Data.

    PubMed

    Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei

    2017-01-01

    Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data.

  14. A nonparametric mean-variance smoothing method to assess Arabidopsis cold stress transcriptional regulator CBF2 overexpression microarray data.

    PubMed

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request.

  15. A Nonparametric Mean-Variance Smoothing Method to Assess Arabidopsis Cold Stress Transcriptional Regulator CBF2 Overexpression Microarray Data

    PubMed Central

    Hu, Pingsha; Maiti, Tapabrata

    2011-01-01

    Microarray is a powerful tool for genome-wide gene expression analysis. In microarray expression data, often mean and variance have certain relationships. We present a non-parametric mean-variance smoothing method (NPMVS) to analyze differentially expressed genes. In this method, a nonlinear smoothing curve is fitted to estimate the relationship between mean and variance. Inference is then made upon shrinkage estimation of posterior means assuming variances are known. Different methods have been applied to simulated datasets, in which a variety of mean and variance relationships were imposed. The simulation study showed that NPMVS outperformed the other two popular shrinkage estimation methods in some mean-variance relationships; and NPMVS was competitive with the two methods in other relationships. A real biological dataset, in which a cold stress transcription factor gene, CBF2, was overexpressed, has also been analyzed with the three methods. Gene ontology and cis-element analysis showed that NPMVS identified more cold and stress responsive genes than the other two methods did. The good performance of NPMVS is mainly due to its shrinkage estimation for both means and variances. In addition, NPMVS exploits a non-parametric regression between mean and variance, instead of assuming a specific parametric relationship between mean and variance. The source code written in R is available from the authors on request. PMID:21611181

  16. The relationship between multilevel models and non-parametric multilevel mixture models: Discrete approximation of intraclass correlation, random coefficient distributions, and residual heteroscedasticity.

    PubMed

    Rights, Jason D; Sterba, Sonya K

    2016-11-01

    Multilevel data structures are common in the social sciences. Often, such nested data are analysed with multilevel models (MLMs) in which heterogeneity between clusters is modelled by continuously distributed random intercepts and/or slopes. Alternatively, the non-parametric multilevel regression mixture model (NPMM) can accommodate the same nested data structures through discrete latent class variation. The purpose of this article is to delineate analytic relationships between NPMM and MLM parameters that are useful for understanding the indirect interpretation of the NPMM as a non-parametric approximation of the MLM, with relaxed distributional assumptions. We define how seven standard and non-standard MLM specifications can be indirectly approximated by particular NPMM specifications. We provide formulas showing how the NPMM can serve as an approximation of the MLM in terms of intraclass correlation, random coefficient means and (co)variances, heteroscedasticity of residuals at level 1, and heteroscedasticity of residuals at level 2. Further, we discuss how these relationships can be useful in practice. The specific relationships are illustrated with simulated graphical demonstrations, and direct and indirect interpretations of NPMM classes are contrasted. We provide an R function to aid in implementing and visualizing an indirect interpretation of NPMM classes. An empirical example is presented and future directions are discussed. © 2016 The British Psychological Society.

  17. Generalized reduced rank latent factor regression for high dimensional tensor fields, and neuroimaging-genetic applications

    PubMed Central

    Tao, Chenyang; Nichols, Thomas E.; Hua, Xue; Ching, Christopher R.K.; Rolls, Edmund T.; Thompson, Paul M.; Feng, Jianfeng

    2017-01-01

    We propose a generalized reduced rank latent factor regression model (GRRLF) for the analysis of tensor field responses and high dimensional covariates. The model is motivated by the need from imaging-genetic studies to identify genetic variants that are associated with brain imaging phenotypes, often in the form of high dimensional tensor fields. GRRLF identifies from the structure in the data the effective dimensionality of the data, and then jointly performs dimension reduction of the covariates, dynamic identification of latent factors, and nonparametric estimation of both covariate and latent response fields. After accounting for the latent and covariate effects, GRLLF performs a nonparametric test on the remaining factor of interest. GRRLF provides a better factorization of the signals compared with common solutions, and is less susceptible to overfitting because it exploits the effective dimensionality. The generality and the flexibility of GRRLF also allow various statistical models to be handled in a unified framework and solutions can be efficiently computed. Within the field of neuroimaging, it improves the sensitivity for weak signals and is a promising alternative to existing approaches. The operation of the framework is demonstrated with both synthetic datasets and a real-world neuroimaging example in which the effects of a set of genes on the structure of the brain at the voxel level were measured, and the results compared favorably with those from existing approaches. PMID:27666385

  18. Kernel-based whole-genome prediction of complex traits: a review.

    PubMed

    Morota, Gota; Gianola, Daniel

    2014-01-01

    Prediction of genetic values has been a focus of applied quantitative genetics since the beginning of the 20th century, with renewed interest following the advent of the era of whole genome-enabled prediction. Opportunities offered by the emergence of high-dimensional genomic data fueled by post-Sanger sequencing technologies, especially molecular markers, have driven researchers to extend Ronald Fisher and Sewall Wright's models to confront new challenges. In particular, kernel methods are gaining consideration as a regression method of choice for genome-enabled prediction. Complex traits are presumably influenced by many genomic regions working in concert with others (clearly so when considering pathways), thus generating interactions. Motivated by this view, a growing number of statistical approaches based on kernels attempt to capture non-additive effects, either parametrically or non-parametrically. This review centers on whole-genome regression using kernel methods applied to a wide range of quantitative traits of agricultural importance in animals and plants. We discuss various kernel-based approaches tailored to capturing total genetic variation, with the aim of arriving at an enhanced predictive performance in the light of available genome annotation information. Connections between prediction machines born in animal breeding, statistics, and machine learning are revisited, and their empirical prediction performance is discussed. Overall, while some encouraging results have been obtained with non-parametric kernels, recovering non-additive genetic variation in a validation dataset remains a challenge in quantitative genetics.

  19. Determination of the minimal clinically important difference for seven fatigue measures in rheumatoid arthritis

    PubMed Central

    Pouchot, Jacques; Kherani, Raheem B.; Brant, Rollin; Lacaille, Diane; Lehman, Allen J.; Ensworth, Stephanie; Kopec, Jacek; Esdaile, John M.; Liang, Matthew H.

    2008-01-01

    Objective To estimate the minimal clinically important difference (MCID) of seven measures of fatigue in rheumatoid arthritis. Study Design and Setting A cross-sectional study design based on inter-individual comparisons was used. Six to eight subjects participated in a single meeting and completed seven fatigue questionnaires (nine sessions were organized and 61 subjects participated). After completion of the questionnaires, the subjects had five one-on-one 10-minute conversations with different people in the group to discuss their fatigue. After each conversation, each patient compared their fatigue to their conversational partner’s on a global rating. Ratings were compared to the scores of the fatigue measures to estimate the MCID. Both non-parametric and linear regression analyses were used. Results Non-parametric estimates for the MCID relative to “little more fatigue” tended to be smaller than those for “little less fatigue”. The global MCIDs estimated by linear regression were: FSS 20.2, VT 14.8, MAF 18.7, MFI 16.6, FACIT–F 15.9, CFS 9.9, RS 19.7, for normalized scores (0 to 100). The standardized MCIDs for the seven measures were roughly similar (0.67 to 0.76). Conclusion These estimates of MCID will help to interpret changes observed in a fatigue score and will be critical in estimating sample size requirements. PMID:18359189

  20. Extending the Distributed Lag Model framework to handle chemical mixtures.

    PubMed

    Bello, Ghalib A; Arora, Manish; Austin, Christine; Horton, Megan K; Wright, Robert O; Gennings, Chris

    2017-07-01

    Distributed Lag Models (DLMs) are used in environmental health studies to analyze the time-delayed effect of an exposure on an outcome of interest. Given the increasing need for analytical tools for evaluation of the effects of exposure to multi-pollutant mixtures, this study attempts to extend the classical DLM framework to accommodate and evaluate multiple longitudinally observed exposures. We introduce 2 techniques for quantifying the time-varying mixture effect of multiple exposures on an outcome of interest. Lagged WQS, the first technique, is based on Weighted Quantile Sum (WQS) regression, a penalized regression method that estimates mixture effects using a weighted index. We also introduce Tree-based DLMs, a nonparametric alternative for assessment of lagged mixture effects. This technique is based on the Random Forest (RF) algorithm, a nonparametric, tree-based estimation technique that has shown excellent performance in a wide variety of domains. In a simulation study, we tested the feasibility of these techniques and evaluated their performance in comparison to standard methodology. Both methods exhibited relatively robust performance, accurately capturing pre-defined non-linear functional relationships in different simulation settings. Further, we applied these techniques to data on perinatal exposure to environmental metal toxicants, with the goal of evaluating the effects of exposure on neurodevelopment. Our methods identified critical neurodevelopmental windows showing significant sensitivity to metal mixtures. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. Assessing the response of area burned to changing climate in western boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach

    Treesearch

    Michael S. Balshi; A. David McGuire; Paul Duffy; Mike Flannigan; John Walsh; Jerry Melillo

    2009-01-01

    We developed temporally and spatially explicit relationships between air temperature and fuel moisture codes derived from the Canadian Fire Weather Index System to estimate annual area burned at 2.5o (latitude x longitude) resolution using a Multivariate Adaptive Regression Spline (MARS) approach across Alaska and Canada. Burned area was...

  2. Development of a new outcome prediction model for Chinese patients with penile squamous cell carcinoma based on preoperative serum C-reactive protein, body mass index, and standard pathological risk factors: the TNCB score group system.

    PubMed

    Li, Zai-Shang; Chen, Peng; Yao, Kai; Wang, Bin; Li, Jing; Mi, Qi-Wu; Chen, Xiao-Feng; Zhao, Qi; Li, Yong-Hong; Chen, Jie-Ping; Deng, Chuang-Zhong; Ye, Yun-Lin; Zhong, Ming-Zhu; Liu, Zhuo-Wei; Qin, Zi-Ke; Lin, Xiang-Tian; Liang, Wei-Cong; Han, Hui; Zhou, Fang-Jian

    2016-04-12

    To determine the predictive value and feasibility of the new outcome prediction model for Chinese patients with penile squamous cell carcinoma. The 3-year disease-specific survival (DSS) survival (DSS) was 92.3% in patients with < 8.70 mg/L CRP and 54.9% in those with elevated CRP (P < 0.001). The 3-year DSS was 86.5% in patients with a BMI < 22.6 Kg/m2 and 69.9% in those with a higher BMI (P = 0.025). In a multivariate analysis, pathological T stage (P < 0.001), pathological N stage (P = 0.002), BMI (P = 0.002), and CRP (P = 0.004) were independent predictors of DSS. A new scoring model was developed, consisting of BMI, CRP, and tumor T and N classification. In our study, we found that the addition of the above-mentioned parameters significantly increased the predictive accuracy of the system of the American Joint Committee on Cancer (AJCC) anatomic stage group. The accuracy of the new prediction category was verified. A total of 172 Chinese patients with penile squamous cell cancer were analyzed retrospectively between November 2005 and November 2014. Statistical data analysis was conducted using the nonparametric method. Survival analysis was performed with the log-rank test and the Cox proportional hazard model. Based on regression estimates of significant parameters in multivariate analysis, a new BMI-, CRP- and pathologic factors-based scoring model was developed to predict disease--specific outcomes. The predictive accuracy of the model was evaluated using the internal and external validation. The present study demonstrated that the TNCB score group system maybe a precise and easy to use tool for predicting outcomes in Chinese penile squamous cell carcinoma patients.

  3. SPICE: exploration and analysis of post-cytometric complex multivariate datasets.

    PubMed

    Roederer, Mario; Nozzi, Joshua L; Nason, Martha C

    2011-02-01

    Polychromatic flow cytometry results in complex, multivariate datasets. To date, tools for the aggregate analysis of these datasets across multiple specimens grouped by different categorical variables, such as demographic information, have not been optimized. Often, the exploration of such datasets is accomplished by visualization of patterns with pie charts or bar charts, without easy access to statistical comparisons of measurements that comprise multiple components. Here we report on algorithms and a graphical interface we developed for these purposes. In particular, we discuss thresholding necessary for accurate representation of data in pie charts, the implications for display and comparison of normalized versus unnormalized data, and the effects of averaging when samples with significant background noise are present. Finally, we define a statistic for the nonparametric comparison of complex distributions to test for difference between groups of samples based on multi-component measurements. While originally developed to support the analysis of T cell functional profiles, these techniques are amenable to a broad range of datatypes. Published 2011 Wiley-Liss, Inc.

  4. Improving Prediction Accuracy for WSN Data Reduction by Applying Multivariate Spatio-Temporal Correlation

    PubMed Central

    Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman

    2011-01-01

    This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate one. In addition to that, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, we believe that we are probably the first to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626

  5. Characterizing multivariate decoding models based on correlated EEG spectral features

    PubMed Central

    McFarland, Dennis J.

    2013-01-01

    Objective Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Methods Data from sensorimotor rhythm-based cursor control experiments was analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). Results The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Conclusions Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. Significance While multivariate decoding algorithms are very useful for prediction their utility for interpretation may be limited when predictors are correlated. PMID:23466267

  6. Gender Wage Disparities among the Highly Educated.

    PubMed

    Black, Dan A; Haviland, Amelia; Sanders, Seth G; Taylor, Lowell J

    2008-01-01

    In the U.S. college-educated women earn approximately 30 percent less than their non-Hispanic white male counterparts. We conduct an empirical examination of this wage disparity for four groups of women-non-Hispanic white, black, Hispanic, and Asian-using the National Survey of College Graduates, a large data set that provides unusually detailed information on higher-level education. Nonparametric matching analysis indicates that among men and women who speak English at home, between 44 and 73 percent of the gender wage gaps are accounted for by such pre-market factors as highest degree and major. When we restrict attention further to women who have "high labor force attachment" (i.e., work experience that is similar to male comparables) we account for 54 to 99 percent of gender wage gaps. Our nonparametric approach differs from familiar regression-based decompositions, so for the sake of comparison we conduct parametric analyses as well. Inferences drawn from these latter decompositions can be quite misleading.

  7. Order-restricted inference for means with missing values.

    PubMed

    Wang, Heng; Zhong, Ping-Shou

    2017-09-01

    Missing values appear very often in many applications, but the problem of missing values has not received much attention in testing order-restricted alternatives. Under the missing at random (MAR) assumption, we impute the missing values nonparametrically using kernel regression. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood does not exist. This article proposes a novel method for constructing test statistics for assessing means with an increasing order or a decreasing order based on jackknife empirical likelihood (JEL) ratio. It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar-square distribution, whose weights depend on missing probabilities and nonparametric imputation. Simulation study shows that the proposed test performs well under various missing scenarios and is robust for normally and nonnormally distributed data. The proposed method is applied to an Alzheimer's disease neuroimaging initiative data set for finding a biomarker for the diagnosis of the Alzheimer's disease. © 2017, The International Biometric Society.

  8. Gender Wage Disparities among the Highly Educated

    PubMed Central

    Black, Dan A.; Haviland, Amelia; Sanders, Seth G.; Taylor, Lowell J.

    2015-01-01

    In the U.S. college-educated women earn approximately 30 percent less than their non-Hispanic white male counterparts. We conduct an empirical examination of this wage disparity for four groups of women—non-Hispanic white, black, Hispanic, and Asian—using the National Survey of College Graduates, a large data set that provides unusually detailed information on higher-level education. Nonparametric matching analysis indicates that among men and women who speak English at home, between 44 and 73 percent of the gender wage gaps are accounted for by such pre-market factors as highest degree and major. When we restrict attention further to women who have “high labor force attachment” (i.e., work experience that is similar to male comparables) we account for 54 to 99 percent of gender wage gaps. Our nonparametric approach differs from familiar regression-based decompositions, so for the sake of comparison we conduct parametric analyses as well. Inferences drawn from these latter decompositions can be quite misleading. PMID:26097255

  9. Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data

    PubMed Central

    Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian

    2015-01-01

    In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models and utilize boosting-type method to handle the high dimensionality as well as model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213

  10. Quantitative monitoring of sucrose, reducing sugar and total sugar dynamics for phenotyping of water-deficit stress tolerance in rice through spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Das, Bappa; Sahoo, Rabi N.; Pargal, Sourabh; Krishna, Gopal; Verma, Rakesh; Chinnusamy, Viswanathan; Sehgal, Vinay K.; Gupta, Vinod K.; Dash, Sushanta K.; Swain, Padmini

    2018-03-01

    In the present investigation, the changes in sucrose, reducing and total sugar content due to water-deficit stress in rice leaves were modeled using visible, near infrared (VNIR) and shortwave infrared (SWIR) spectroscopy. The objectives of the study were to identify the best vegetation indices and suitable multivariate technique based on precise analysis of hyperspectral data (350 to 2500 nm) and sucrose, reducing sugar and total sugar content measured at different stress levels from 16 different rice genotypes. Spectral data analysis was done to identify suitable spectral indices and models for sucrose estimation. Novel spectral indices in near infrared (NIR) range viz. ratio spectral index (RSI) and normalised difference spectral indices (NDSI) sensitive to sucrose, reducing sugar and total sugar content were identified which were subsequently calibrated and validated. The RSI and NDSI models had R2 values of 0.65, 0.71 and 0.67; RPD values of 1.68, 1.95 and 1.66 for sucrose, reducing sugar and total sugar, respectively for validation dataset. Different multivariate spectral models such as artificial neural network (ANN), multivariate adaptive regression splines (MARS), multiple linear regression (MLR), partial least square regression (PLSR), random forest regression (RFR) and support vector machine regression (SVMR) were also evaluated. The best performing multivariate models for sucrose, reducing sugars and total sugars were found to be, MARS, ANN and MARS, respectively with respect to RPD values of 2.08, 2.44, and 1.93. Results indicated that VNIR and SWIR spectroscopy combined with multivariate calibration can be used as a reliable alternative to conventional methods for measurement of sucrose, reducing sugars and total sugars of rice under water-deficit stress as this technique is fast, economic, and noninvasive.

  11. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression

    PubMed Central

    Jiang, Feng; Han, Ji-zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088

  12. A Cross-Domain Collaborative Filtering Algorithm Based on Feature Construction and Locally Weighted Linear Regression.

    PubMed

    Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong

    2018-01-01

    Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.

  13. Reference Charts for Fetal Cerebellar Vermis Height: A Prospective Cross-Sectional Study of 10605 Fetuses

    PubMed Central

    Cignini, Pietro; Giorlandino, Maurizio; Brutti, Pierpaolo; Mangiafico, Lucia; Aloisi, Alessia; Giorlandino, Claudio

    2016-01-01

    Objective To establish reference charts for fetal cerebellar vermis height in an unselected population. Methods A prospective cross-sectional study between September 2009 and December 2014 was carried out at ALTAMEDICA Fetal–Maternal Medical Centre, Rome, Italy. Of 25203 fetal biometric measurements, 12167 (48%) measurements of the cerebellar vermis were available. After excluding 1562 (12.8%) measurements, a total of 10605 (87.2%) fetuses were considered and analyzed once only. Parametric and nonparametric quantile regression models were used for the statistical analysis. In order to evaluate the robustness of the proposed reference charts regarding various distributional assumptions on the ultrasound measurements at hand, we compared the gestational age-specific reference curves we produced through the statistical methods used. Normal mean height based on parametric and nonparametric methods were defined for each week of gestation and the regression equation expressing the height of the cerebellar vermis as a function of gestational age was calculated. Finally the correlation between dimension/gestation was measured. Results The mean height of the cerebellar vermis was 12.7mm (SD, 1.6mm; 95% confidence interval, 12.7–12.8mm). The regression equation expressing the height of the CV as a function of the gestational age was: height (mm) = -4.85+0.78 x gestational age. The correlation between dimension/gestation was expressed by the coefficient r = 0.87. Conclusion This is the first prospective cross-sectional study on fetal cerebellar vermis biometry with such a large sample size reported in literature. It is a detailed statistical survey and contains new centile-based reference charts for fetal height of cerebellar vermis measurements. PMID:26812238

  14. The PIT-trap-A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses.

    PubMed

    Warton, David I; Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)-common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of "model-free bootstrap", adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods.

  15. The PIT-trap—A “model-free” bootstrap procedure for inference about regression models with discrete, multivariate responses

    PubMed Central

    Thibaut, Loïc; Wang, Yi Alice

    2017-01-01

    Bootstrap methods are widely used in statistics, and bootstrapping of residuals can be especially useful in the regression context. However, difficulties are encountered extending residual resampling to regression settings where residuals are not identically distributed (thus not amenable to bootstrapping)—common examples including logistic or Poisson regression and generalizations to handle clustered or multivariate data, such as generalised estimating equations. We propose a bootstrap method based on probability integral transform (PIT-) residuals, which we call the PIT-trap, which assumes data come from some marginal distribution F of known parametric form. This method can be understood as a type of “model-free bootstrap”, adapted to the problem of discrete and highly multivariate data. PIT-residuals have the key property that they are (asymptotically) pivotal. The PIT-trap thus inherits the key property, not afforded by any other residual resampling approach, that the marginal distribution of data can be preserved under PIT-trapping. This in turn enables the derivation of some standard bootstrap properties, including second-order correctness of pivotal PIT-trap test statistics. In multivariate data, bootstrapping rows of PIT-residuals affords the property that it preserves correlation in data without the need for it to be modelled, a key point of difference as compared to a parametric bootstrap. The proposed method is illustrated on an example involving multivariate abundance data in ecology, and demonstrated via simulation to have improved properties as compared to competing resampling methods. PMID:28738071

  16. Is Heart Rate Variability Better Than Routine Vital Signs for Prehospital Identification of Major Hemorrhage

    DTIC Science & Technology

    2015-01-01

    different PRBC transfusion volumes. We performed multivariate regression analysis using HRV metrics and routine vital signs to test the hypothesis that...study sponsors did not have any role in the study design, data collection, analysis and interpretation of data, report writing, or the decision to...primary outcome was hemorrhagic injury plus different PRBC transfusion volumes. We performed multivariate regression analysis using HRV metrics and

  17. Methods for estimating population density in data-limited areas: evaluating regression and tree-based models in Peru.

    PubMed

    Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

    2014-01-01

    Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies.

  18. Methods for Estimating Population Density in Data-Limited Areas: Evaluating Regression and Tree-Based Models in Peru

    PubMed Central

    Anderson, Weston; Guikema, Seth; Zaitchik, Ben; Pan, William

    2014-01-01

    Obtaining accurate small area estimates of population is essential for policy and health planning but is often difficult in countries with limited data. In lieu of available population data, small area estimate models draw information from previous time periods or from similar areas. This study focuses on model-based methods for estimating population when no direct samples are available in the area of interest. To explore the efficacy of tree-based models for estimating population density, we compare six different model structures including Random Forest and Bayesian Additive Regression Trees. Results demonstrate that without information from prior time periods, non-parametric tree-based models produced more accurate predictions than did conventional regression methods. Improving estimates of population density in non-sampled areas is important for regions with incomplete census data and has implications for economic, health and development policies. PMID:24992657

  19. Genome-wide regression and prediction with the BGLR statistical package.

    PubMed

    Pérez, Paulino; de los Campos, Gustavo

    2014-10-01

    Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis. Copyright © 2014 by the Genetics Society of America.

  20. Considerations for monitoring raptor population trends based on counts of migrants

    USGS Publications Warehouse

    Titus, K.; Fuller, M.R.; Ruos, J.L.; Meyburg, B-U.; Chancellor, R.D.

    1989-01-01

    Various problems were identified with standardized hawk count data as annually collected at six sites. Some of the hawk lookouts increased their hours of observation from 1979-1985, thereby confounding the total counts. Data recording and missing data hamper coding of data and their use with modern analytical techniques. Coefficients of variation among years in counts averaged about 40%. The advantages and disadvantages of various analytical techniques are discussed including regression, non-parametric rank correlation trend analysis, and moving averages.

  1. Human immunophenotyping via low-variance, low-bias, interpretive regression modeling of small, wide data sets: Application to aging and immune response to influenza vaccination.

    PubMed

    Holmes, Tyson H; He, Xiao-Song

    2016-10-01

    Small, wide data sets are commonplace in human immunophenotyping research. As defined here, a small, wide data set is constructed by sampling a small to modest quantity n,1

  2. Human Immunophenotyping via Low-Variance, Low-Bias, Interpretive Regression Modeling of Small, Wide Data Sets: Application to Aging and Immune Response to Influenza Vaccination

    PubMed Central

    Holmes, Tyson H.; He, Xiao-Song

    2016-01-01

    Small, wide data sets are commonplace in human immunophenotyping research. As defined here, a small, wide data set is constructed by sampling a small to modest quantity n, 1 < n < 50, of human participants for the purpose of estimating many parameters p, such that n < p < 1,000. We offer a set of prescriptions that are designed to facilitate low-variance (i.e. stable), low-bias, interpretive regression modeling of small, wide data sets. These prescriptions are distinctive in their especially heavy emphasis on minimizing use of out-of-sample information for conducting statistical inference. That allows the working immunologist to proceed without being encumbered by imposed and often untestable statistical assumptions. Problems of unmeasured confounders, confidence-interval coverage, feature selection, and shrinkage/denoising are defined clearly and treated in detail. We propose an extension of an existing nonparametric technique for improved small-sample confidence-interval tail coverage from the univariate case (single immune feature) to the multivariate (many, possibly correlated immune features). An important role for derived features in the immunological interpretation of regression analyses is stressed. Areas of further research are discussed. Presented principles and methods are illustrated through application to a small, wide data set of adults spanning a wide range in ages and multiple immunophenotypes that were assayed before and after immunization with inactivated influenza vaccine (IIV). Our regression modeling prescriptions identify some potentially important topics for future immunological research. 1) Immunologists may wish to distinguish age-related differences in immune features from changes in immune features caused by aging. 2) A form of the bootstrap that employs linear extrapolation may prove to be an invaluable analytic tool because it allows the working immunologist to obtain accurate estimates of the stability of immune parameter estimates with a bare minimum of imposed assumptions. 3) Liberal inclusion of immune features in phenotyping panels can facilitate accurate separation of biological signal of interest from noise. In addition, through a combination of denoising and potentially improved confidence interval coverage, we identify some candidate immune correlates (frequency of cell subset and concentration of cytokine) with B cell response as measured by quantity of IIV-specific IgA antibody-secreting cells and quantity of IIV-specific IgG antibody-secreting cells. PMID:27196789

  3. Characterizing multivariate decoding models based on correlated EEG spectral features.

    PubMed

    McFarland, Dennis J

    2013-07-01

    Multivariate decoding methods are popular techniques for analysis of neurophysiological data. The present study explored potential interpretative problems with these techniques when predictors are correlated. Data from sensorimotor rhythm-based cursor control experiments was analyzed offline with linear univariate and multivariate models. Features were derived from autoregressive (AR) spectral analysis of varying model order which produced predictors that varied in their degree of correlation (i.e., multicollinearity). The use of multivariate regression models resulted in much better prediction of target position as compared to univariate regression models. However, with lower order AR features interpretation of the spectral patterns of the weights was difficult. This is likely to be due to the high degree of multicollinearity present with lower order AR features. Care should be exercised when interpreting the pattern of weights of multivariate models with correlated predictors. Comparison with univariate statistics is advisable. While multivariate decoding algorithms are very useful for prediction their utility for interpretation may be limited when predictors are correlated. Copyright © 2013 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.

  4. The Toxicological Evaluation of Realistic Emissions of Source Aerosols Study: Statistical Methods

    PubMed Central

    Coull, Brent A.; Wellenius, Gregory A.; Gonzalez-Flecha, Beatriz; Diaz, Edgar; Koutrakis, Petros; Godleski, John J.

    2013-01-01

    The Toxicological Evaluation of Realistic Emissions of Source Aerosols (TERESA) study involved withdrawal, aging, and atmospheric transformation of emissions of three coal-fired power plants. Toxicological evaluations were carried out in rats exposed to different emission scenarios with extensive exposure characterization. Data generated had multiple levels of resolution: exposure, scenario and constituent chemical composition. Here, we outline a multilayered approach to analyze the associations between exposure and health effects beginning with standard ANOVA models that treat exposure as a categorical variable. The model assessed differences in exposure effects across scenarios (by plant). To assess unadjusted associations between pollutant concentrations and health, univariate analyses were conducted using the difference between the response means under exposed and control conditions and a single constituent concentration as the predictor. Then, a novel multivariate analysis of exposure composition and health was used based on random forests, a recent extension of classification and regression trees that were applied to the outcome differences. For each exposure constituent, this approach yielded a nonparametric measure of the importance of that constituent in predicting differences in response on a given day, controlling for the other measured constituent concentrations in the model. Finally, an R2 analysis compared the relative importance of exposure scenario, plant, and constituent concentrations on each outcome. Peak expiratory flow is used to demonstrate how the multiple levels of the analysis complement each other to assess constituents most strongly associated with health effects. PMID:21913820

  5. The toxicological evaluation of realistic emissions of source aerosols study: statistical methods.

    PubMed

    Coull, Brent A; Wellenius, Gregory A; Gonzalez-Flecha, Beatriz; Diaz, Edgar; Koutrakis, Petros; Godleski, John J

    2011-08-01

    The Toxicological Evaluation of Realistic Emissions of Source Aerosols (TERESA) study involved withdrawal, aging, and atmospheric transformation of emissions of three coal-fired power plants. Toxicological evaluations were carried out in rats exposed to different emission scenarios with extensive exposure characterization. Data generated had multiple levels of resolution: exposure, scenario, and constituent chemical composition. Here, we outline a multilayered approach to analyze the associations between exposure and health effects beginning with standard ANOVA models that treat exposure as a categorical variable. The model assessed differences in exposure effects across scenarios (by plant). To assess unadjusted associations between pollutant concentrations and health, univariate analyses were conducted using the difference between the response means under exposed and control conditions and a single constituent concentration as the predictor. Then, a novel multivariate analysis of exposure composition and health was used based on Random Forests(™), a recent extension of classification and regression trees that were applied to the outcome differences. For each exposure constituent, this approach yielded a nonparametric measure of the importance of that constituent in predicting differences in response on a given day, controlling for the other measured constituent concentrations in the model. Finally, an R(2) analysis compared the relative importance of exposure scenario, plant, and constituent concentrations on each outcome. Peak expiratory flow (PEF) is used to demonstrate how the multiple levels of the analysis complement each other to assess constituents most strongly associated with health effects.

  6. SPReM: Sparse Projection Regression Model For High-dimensional Linear Regression *

    PubMed Central

    Sun, Qiang; Zhu, Hongtu; Liu, Yufeng; Ibrahim, Joseph G.

    2014-01-01

    The aim of this paper is to develop a sparse projection regression modeling (SPReM) framework to perform multivariate regression modeling with a large number of responses and a multivariate covariate of interest. We propose two novel heritability ratios to simultaneously perform dimension reduction, response selection, estimation, and testing, while explicitly accounting for correlations among multivariate responses. Our SPReM is devised to specifically address the low statistical power issue of many standard statistical approaches, such as the Hotelling’s T2 test statistic or a mass univariate analysis, for high-dimensional data. We formulate the estimation problem of SPREM as a novel sparse unit rank projection (SURP) problem and propose a fast optimization algorithm for SURP. Furthermore, we extend SURP to the sparse multi-rank projection (SMURP) by adopting a sequential SURP approximation. Theoretically, we have systematically investigated the convergence properties of SURP and the convergence rate of SURP estimates. Our simulation results and real data analysis have shown that SPReM out-performs other state-of-the-art methods. PMID:26527844

  7. Non-parametric identification of multivariable systems: A local rational modeling approach with application to a vibration isolation benchmark

    NASA Astrophysics Data System (ADS)

    Voorhoeve, Robbert; van der Maas, Annemiek; Oomen, Tom

    2018-05-01

    Frequency response function (FRF) identification is often used as a basis for control systems design and as a starting point for subsequent parametric system identification. The aim of this paper is to develop a multiple-input multiple-output (MIMO) local parametric modeling approach for FRF identification of lightly damped mechanical systems with improved speed and accuracy. The proposed method is based on local rational models, which can efficiently handle the lightly-damped resonant dynamics. A key aspect herein is the freedom in the multivariable rational model parametrizations. Several choices for such multivariable rational model parametrizations are proposed and investigated. For systems with many inputs and outputs the required number of model parameters can rapidly increase, adversely affecting the performance of the local modeling approach. Therefore, low-order model structures are investigated. The structure of these low-order parametrizations leads to an undesired directionality in the identification problem. To address this, an iterative local rational modeling algorithm is proposed. As a special case recently developed SISO algorithms are recovered. The proposed approach is successfully demonstrated on simulations and on an active vibration isolation system benchmark, confirming good performance of the method using significantly less parameters compared with alternative approaches.

  8. Nonparametric method for genomics-based prediction of performance of quantitative traits involving epistasis in plant breeding.

    PubMed

    Sun, Xiaochun; Ma, Ping; Mumm, Rita H

    2012-01-01

    Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression.

  9. Nonparametric Method for Genomics-Based Prediction of Performance of Quantitative Traits Involving Epistasis in Plant Breeding

    PubMed Central

    Sun, Xiaochun; Ma, Ping; Mumm, Rita H.

    2012-01-01

    Genomic selection (GS) procedures have proven useful in estimating breeding value and predicting phenotype with genome-wide molecular marker information. However, issues of high dimensionality, multicollinearity, and the inability to deal effectively with epistasis can jeopardize accuracy and predictive ability. We, therefore, propose a new nonparametric method, pRKHS, which combines the features of supervised principal component analysis (SPCA) and reproducing kernel Hilbert spaces (RKHS) regression, with versions for traits with no/low epistasis, pRKHS-NE, to high epistasis, pRKHS-E. Instead of assigning a specific relationship to represent the underlying epistasis, the method maps genotype to phenotype in a nonparametric way, thus requiring fewer genetic assumptions. SPCA decreases the number of markers needed for prediction by filtering out low-signal markers with the optimal marker set determined by cross-validation. Principal components are computed from reduced marker matrix (called supervised principal components, SPC) and included in the smoothing spline ANOVA model as independent variables to fit the data. The new method was evaluated in comparison with current popular methods for practicing GS, specifically RR-BLUP, BayesA, BayesB, as well as a newer method by Crossa et al., RKHS-M, using both simulated and real data. Results demonstrate that pRKHS generally delivers greater predictive ability, particularly when epistasis impacts trait expression. Beyond prediction, the new method also facilitates inferences about the extent to which epistasis influences trait expression. PMID:23226325

  10. Establishment of Biological Reference Intervals and Reference Curve for Urea by Exploratory Parametric and Non-Parametric Quantile Regression Models.

    PubMed

    Sarkar, Rajarshi

    2013-07-01

    The validity of the entire renal function tests as a diagnostic tool depends substantially on the Biological Reference Interval (BRI) of urea. Establishment of BRI of urea is difficult partly because exclusion criteria for selection of reference data are quite rigid and partly due to the compartmentalization considerations regarding age and sex of the reference individuals. Moreover, construction of Biological Reference Curve (BRC) of urea is imperative to highlight the partitioning requirements. This a priori study examines the data collected by measuring serum urea of 3202 age and sex matched individuals, aged between 1 and 80 years, by a kinetic UV Urease/GLDH method on a Roche Cobas 6000 auto-analyzer. Mann-Whitney U test of the reference data confirmed the partitioning requirement by both age and sex. Further statistical analysis revealed the incompatibility of the data for a proposed parametric model. Hence the data was non-parametrically analysed. BRI was found to be identical for both sexes till the 2(nd) decade, and the BRI for males increased progressively 6(th) decade onwards. Four non-parametric models were postulated for construction of BRC: Gaussian kernel, double kernel, local mean and local constant, of which the last one generated the best-fitting curves. Clinical decision making should become easier and diagnostic implications of renal function tests should become more meaningful if this BRI is followed and the BRC is used as a desktop tool in conjunction with similar data for serum creatinine.

  11. Evaluating effects of developmental education for college students using a regression discontinuity design.

    PubMed

    Moss, Brian G; Yeaton, William H

    2013-10-01

    Annually, American colleges and universities provide developmental education (DE) to millions of underprepared students; however, evaluation estimates of DE benefits have been mixed. Using a prototypic exemplar of DE, our primary objective was to investigate the utility of a replicative evaluative framework for assessing program effectiveness. Within the context of the regression discontinuity (RD) design, this research examined the effectiveness of a DE program for five, sequential cohorts of first-time college students. Discontinuity estimates were generated for individual terms and cumulatively, across terms. Participants were 3,589 first-time community college students. DE program effects were measured by contrasting both college-level English grades and a dichotomous measure of pass/fail, for DE and non-DE students. Parametric and nonparametric estimates of overall effect were positive for continuous and dichotomous measures of achievement (grade and pass/fail). The variability of program effects over time was determined by tracking results within individual terms and cumulatively, across terms. Applying this replication strategy, DE's overall impact was modest (an effect size of approximately .20) but quite consistent, based on parametric and nonparametric estimation approaches. A meta-analysis of five RD results yielded virtually the same estimate as the overall, parametric findings. Subset analysis, though tentative, suggested that males benefited more than females, while academic gains were comparable for different ethnicities. The cumulative, within-study comparison, replication approach offers considerable potential for the evaluation of new and existing policies, particularly when effects are relatively small, as is often the case in applied settings.

  12. SigEMD: A powerful method for differential gene expression analysis in single-cell RNA sequencing data.

    PubMed

    Wang, Tianyu; Nabavi, Sheida

    2018-04-24

    Differential gene expression analysis is one of the significant efforts in single cell RNA sequencing (scRNAseq) analysis to discover the specific changes in expression levels of individual cell types. Since scRNAseq exhibits multimodality, large amounts of zero counts, and sparsity, it is different from the traditional bulk RNA sequencing (RNAseq) data. The new challenges of scRNAseq data promote the development of new methods for identifying differentially expressed (DE) genes. In this study, we proposed a new method, SigEMD, that combines a data imputation approach, a logistic regression model and a nonparametric method based on the Earth Mover's Distance, to precisely and efficiently identify DE genes in scRNAseq data. The regression model and data imputation are used to reduce the impact of large amounts of zero counts, and the nonparametric method is used to improve the sensitivity of detecting DE genes from multimodal scRNAseq data. By additionally employing gene interaction network information to adjust the final states of DE genes, we further reduce the false positives of calling DE genes. We used simulated datasets and real datasets to evaluate the detection accuracy of the proposed method and to compare its performance with those of other differential expression analysis methods. Results indicate that the proposed method has an overall powerful performance in terms of precision in detection, sensitivity, and specificity. Copyright © 2018 Elsevier Inc. All rights reserved.

  13. EXTENDING MULTIVARIATE DISTANCE MATRIX REGRESSION WITH AN EFFECT SIZE MEASURE AND THE ASYMPTOTIC NULL DISTRIBUTION OF THE TEST STATISTIC

    PubMed Central

    McArtor, Daniel B.; Lubke, Gitta H.; Bergeman, C. S.

    2017-01-01

    Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains. PMID:27738957

  14. Extending multivariate distance matrix regression with an effect size measure and the asymptotic null distribution of the test statistic.

    PubMed

    McArtor, Daniel B; Lubke, Gitta H; Bergeman, C S

    2017-12-01

    Person-centered methods are useful for studying individual differences in terms of (dis)similarities between response profiles on multivariate outcomes. Multivariate distance matrix regression (MDMR) tests the significance of associations of response profile (dis)similarities and a set of predictors using permutation tests. This paper extends MDMR by deriving and empirically validating the asymptotic null distribution of its test statistic, and by proposing an effect size for individual outcome variables, which is shown to recover true associations. These extensions alleviate the computational burden of permutation tests currently used in MDMR and render more informative results, thus making MDMR accessible to new research domains.

  15. Serum cardiac troponin I in canine syncope and seizures.

    PubMed

    Dutton, E; Dukes-McEwan, J; Cripps, P J

    2017-02-01

    To determine if serum cardiac troponin I (cTnI) concentration distinguishes between cardiogenic syncope and collapsing dogs presenting with either generalized epileptic seizures (both with and without cardiac disease) or vasovagal syncope. Seventy-nine prospectively recruited dogs, grouped according to aetiology of collapse: generalized epileptic seizures (group E), cardiogenic syncope (group C), dogs with both epileptic seizures and cardiac disease (group B), vasovagal syncope (group V) or unclassified (group U). Most patients had ECG (n = 78), echocardiography (n = 78) and BP measurement (n = 74) performed. Dogs with a history of intoxications, trauma, evidence of metabolic disorders or renal insufficiency (based on serum creatinine concentrations >150 μmol/L and urine specific gravity <1.030) were excluded. Serum cTnI concentrations were measured and compared between groups using non-parametric statistical methods. Multivariable regression analysis investigated factors associated with cTnI. Receiver operator characteristic curve analysis examined whether cTnI could identify cardiogenic syncope. Median cTnI concentrations were higher in group C than E (cTnI: 0.165 [0.02-27.41] vs. 0.03 [0.01-1.92] ng/mL; p<0.05). Regression analysis found that serum cTnI concentrations decreased with increasing time from collapse (p=0.015) and increased with increasing creatinine concentration (p=0.028). Serum cTnI diagnosed cardiogenic syncope with a sensitivity of 75% and specificity of 80%. Serum cTnI concentrations were significantly different between groups C and E. However, due to the overlap in cTnI concentrations between groups cTnI, measurement in an individual is not optimally discriminatory to differentiate cardiogenic syncope from collapse with generalized epileptic seizures (both with and without cardiac disease) or vasovagal syncope. Copyright © 2016 Elsevier B.V. All rights reserved.

  16. Tryptophan Metabolism and Its Relationship with Depression and Cognitive Impairment Among HIV-infected Individuals.

    PubMed

    Keegan, Michael R; Chittiprol, Seetharamaiah; Letendre, Scott L; Winston, Alan; Fuchs, Dietmar; Boasso, Adriano; Iudicello, Jennifer; Ellis, Ronald J

    2016-01-01

    Cognitive impairment (CI) and major depressive disorder (MDD) remain prevalent in treated HIV-1 disease; however, the pathogenesis remains elusive. A possible contributing mechanism is immune-mediated degradation of tryptophan (TRP) via the kynurenine (KYN) pathway, resulting in decreased production of serotonin and accumulation of TRP degradation products. We explored the association of these biochemical pathways and their relationship with CI and MDD in HIV-positive (HIV+) individuals. In a cross-sectional analysis, concentrations of neopterin (NEO), tumor necrosis factor-alpha, TRP, KYN, KYN/TRP ratio, phenylalanine (PHE), tyrosine (TYR), PHE/TYR ratio, and nitrite were assessed in the cerebrospinal fluid (CSF) and plasma of HIV+ ( n = 91) and HIV-negative (HIV-) individuals ( n = 66). CI and MDD were assessed via a comprehensive neuropsychological test battery. A Global Deficit Score ≥0.5 was defined as CI. Nonparametric statistical analyses included Kruskal-Wallis and Mann-Whitney U tests, and multivariate logistic regression. Following Bonferroni correction, NEO concentrations were found to be greater in CSF and TRP concentration was found to be lower in the plasma of HIV+ versus HIV- individuals, including a subgroup of aviremic (defined as HIV-1 RNA <50 cps/mL) HIV+ participants receiving antiretroviral therapy ( n = 44). There was a nonsignificant trend toward higher KYN/TRP ratios in plasma in the HIV+ group ( P = 0.027; Bonferroni corrected α = 0.0027). In a logistic regression model, lower KYN/TRP ratios in plasma were associated with CI and MDD in the overall HIV+ group ( P = 0.038 and P = 0.063, respectively) and the aviremic subgroup ( P = 0.066 and P = 0.027, respectively), though this observation was not statistically significant following Bonferroni correction (Bonferroni corrected α = 0.0031). We observed a trend toward lower KYN/TRP ratios in aviremic HIV+ patients with CI and MDD.

  17. The relationship between psychosocial status, acculturation and country of origin in mid-life Hispanic women: data from the Study of Women's Health Across the Nation (SWAN).

    PubMed

    Green, R; Santoro, N F; McGinn, A P; Wildman, R P; Derby, C A; Polotsky, A J; Weiss, G

    2010-12-01

    To test the hypothesis that psychosocial symptomatology differs by country of origin and acculturation among Hispanic women, we examined 419 women, aged 42-52 years at baseline, enrolled in the New Jersey site of the Study of Women's Health Across the Nation (SWAN). Women were categorized into six groups: Central (CA, n = 29) or South American (SA, n = 106), Puerto Rican (PR, n = 56), Dominican (D, n = 42), Cuban (Cu, n = 44) and non-Hispanic Caucasian (NHC, n = 142). Acculturation, depressive symptoms, hostility/cynicism, mistreatment/discrimination, sleep quality, social support, and perceived stress were assessed at baseline. Physical functioning, trait anxiety and anger were assessed at the fourth annual follow-up. Comparisons between Hispanic and non-Hispanic Caucasians used χ², t test or non-parametric alternatives; ANOVA or Kruskal-Wallis testing examined differences among the five Hispanic sub-groups. Multivariable regression models used PR women as the reference group. Hispanic women were overall less educated, less acculturated (p < 0.001 for both) and reported more depressive symptoms, cynicism, perceived stress, and less mistreatment/discrimination than NHCs. Along with D women, PR women reported worse sleep than Cu women (p < 0.01) and more trait anxiety than SA and Cu women (p < 0.01). Yet, PR women were most acculturated (21.4% highly acculturated vs. CA (0.0%), D (4.8%), SA (4.8%) and Cu (2.3%) women; p < 0.001). In regression models, PR women reported depressive symptoms more frequently than D, Cu, or SA women, and reported trait anxiety more frequently than Cu or SA women. Greater acculturation was associated with more favorable psychosocial status, but PR ethnicity was negatively related to psychosocial status. Psychosocial symptomatology among Hispanic women differs by country of origin and the relatively adverse profile of Puerto Rican women is not explained by acculturation.

  18. Logistic models--an odd(s) kind of regression.

    PubMed

    Jupiter, Daniel C

    2013-01-01

    The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  19. The challenge for genetic epidemiologists: how to analyze large numbers of SNPs in relation to complex diseases.

    PubMed

    Heidema, A Geert; Boer, Jolanda M A; Nagelkerke, Nico; Mariman, Edwin C M; van der A, Daphne L; Feskens, Edith J M

    2006-04-21

    Genetic epidemiologists have taken the challenge to identify genetic polymorphisms involved in the development of diseases. Many have collected data on large numbers of genetic markers but are not familiar with available methods to assess their association with complex diseases. Statistical methods have been developed for analyzing the relation between large numbers of genetic and environmental predictors to disease or disease-related variables in genetic association studies. In this commentary we discuss logistic regression analysis, neural networks, including the parameter decreasing method (PDM) and genetic programming optimized neural networks (GPNN) and several non-parametric methods, which include the set association approach, combinatorial partitioning method (CPM), restricted partitioning method (RPM), multifactor dimensionality reduction (MDR) method and the random forests approach. The relative strengths and weaknesses of these methods are highlighted. Logistic regression and neural networks can handle only a limited number of predictor variables, depending on the number of observations in the dataset. Therefore, they are less useful than the non-parametric methods to approach association studies with large numbers of predictor variables. GPNN on the other hand may be a useful approach to select and model important predictors, but its performance to select the important effects in the presence of large numbers of predictors needs to be examined. Both the set association approach and random forests approach are able to handle a large number of predictors and are useful in reducing these predictors to a subset of predictors with an important contribution to disease. The combinatorial methods give more insight in combination patterns for sets of genetic and/or environmental predictor variables that may be related to the outcome variable. As the non-parametric methods have different strengths and weaknesses we conclude that to approach genetic association studies using the case-control design, the application of a combination of several methods, including the set association approach, MDR and the random forests approach, will likely be a useful strategy to find the important genes and interaction patterns involved in complex diseases.

  20. Sinogram restoration for ultra-low-dose x-ray multi-slice helical CT by nonparametric regression

    NASA Astrophysics Data System (ADS)

    Jiang, Lu; Siddiqui, Khan; Zhu, Bin; Tao, Yang; Siegel, Eliot

    2007-03-01

    During the last decade, x-ray computed tomography (CT) has been applied to screen large asymptomatic smoking and nonsmoking populations for early lung cancer detection. Because a larger population will be involved in such screening exams, more and more attention has been paid to studying low-dose, even ultra-low-dose x-ray CT. However, reducing CT radiation exposure will increase noise level in the sinogram, thereby degrading the quality of reconstructed CT images as well as causing more streak artifacts near the apices of the lung. Thus, how to reduce the noise levels and streak artifacts in the low-dose CT images is becoming a meaningful topic. Since multi-slice helical CT has replaced conventional stop-and-shoot CT in many clinical applications, this research mainly focused on the noise reduction issue in multi-slice helical CT. The experiment data were provided by Siemens SOMATOM Sensation 16-Slice helical CT. It included both conventional CT data acquired under 120 kvp voltage and 119 mA current and ultra-low-dose CT data acquired under 120 kvp and 10 mA protocols. All other settings are the same as that of conventional CT. In this paper, a nonparametric smoothing method with thin plate smoothing splines and the roughness penalty was proposed to restore the ultra-low-dose CT raw data. Each projection frame was firstly divided into blocks, and then the 2D data in each block was fitted to a thin-plate smoothing splines' surface via minimizing a roughness-penalized least squares objective function. By doing so, the noise in each ultra-low-dose CT projection was reduced by leveraging the information contained not only within each individual projection profile, but also among nearby profiles. Finally the restored ultra-low-dose projection data were fed into standard filtered back projection (FBP) algorithm to reconstruct CT images. The rebuilt results as well as the comparison between proposed approach and traditional method were given in the results and discussions section, and showed effectiveness of proposed thin-plate based nonparametric regression method.

  1. Essays in applied macroeconomics: Asymmetric price adjustment, exchange rate and treatment effect

    NASA Astrophysics Data System (ADS)

    Gu, Jingping

    This dissertation consists of three essays. Chapter II examines the possible asymmetric response of gasoline prices to crude oil price changes using an error correction model with GARCH errors. Recent papers have looked at this issue. Some of these papers estimate a form of error correction model, but none of them accounts for autoregressive heteroskedasticity in estimation and testing for asymmetry and none of them takes the response of crude oil price into consideration. We find that time-varying volatility of gasoline price disturbances is an important feature of the data, and when we allow for asymmetric GARCH errors and investigate the system wide impulse response function, we find evidence of asymmetric adjustment to crude oil price changes in weekly retail gasoline prices. Chapter III discusses the relationship between fiscal deficit and exchange rate. Economic theory predicts that fiscal deficits can significantly affect real exchange rate movements, but existing empirical evidence reports only a weak impact of fiscal deficits on exchange rates. Based on US dollar-based real exchange rates in G5 countries and a flexible varying coefficient model, we show that the previously documented weak relationship between fiscal deficits and exchange rates may be the result of additive specifications, and that the relationship is stronger if we allow fiscal deficits to impact real exchange rates non-additively as well as nonlinearly. We find that the speed of exchange rate adjustment toward equilibrium depends on the state of the fiscal deficit; a fiscal contraction in the US can lead to less persistence in the deviation of exchange rates from fundamentals, and faster mean reversion to the equilibrium. Chapter IV proposes a kernel method to deal with the nonparametric regression model with only discrete covariates as regressors. This new approach is based on recently developed least squares cross-validation kernel smoothing method. It can not only automatically smooth the irrelevant variables out of the nonparametric regression model, but also avoid the problem of loss of efficiency related to the traditional nonparametric frequency-based method and the problem of misspecification based on parametric model.

  2. Retro-regression--another important multivariate regression improvement.

    PubMed

    Randić, M

    2001-01-01

    We review the serious problem associated with instabilities of the coefficients of regression equations, referred to as the MRA (multivariate regression analysis) "nightmare of the first kind". This is manifested when in a stepwise regression a descriptor is included or excluded from a regression. The consequence is an unpredictable change of the coefficients of the descriptors that remain in the regression equation. We follow with consideration of an even more serious problem, referred to as the MRA "nightmare of the second kind", arising when optimal descriptors are selected from a large pool of descriptors. This process typically causes at different steps of the stepwise regression a replacement of several previously used descriptors by new ones. We describe a procedure that resolves these difficulties. The approach is illustrated on boiling points of nonanes which are considered (1) by using an ordered connectivity basis; (2) by using an ordering resulting from application of greedy algorithm; and (3) by using an ordering derived from an exhaustive search for optimal descriptors. A novel variant of multiple regression analysis, called retro-regression (RR), is outlined showing how it resolves the ambiguities associated with both "nightmares" of the first and the second kind of MRA.

  3. Linear Multivariable Regression Models for Prediction of Eddy Dissipation Rate from Available Meteorological Data

    NASA Technical Reports Server (NTRS)

    MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.

    2005-01-01

    Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.

  4. Assessing Principal Component Regression Prediction of Neurochemicals Detected with Fast-Scan Cyclic Voltammetry

    PubMed Central

    2011-01-01

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586

  5. Parameter estimation of multivariate multiple regression model using bayesian with non-informative Jeffreys’ prior distribution

    NASA Astrophysics Data System (ADS)

    Saputro, D. R. S.; Amalia, F.; Widyaningsih, P.; Affan, R. C.

    2018-05-01

    Bayesian method is a method that can be used to estimate the parameters of multivariate multiple regression model. Bayesian method has two distributions, there are prior and posterior distributions. Posterior distribution is influenced by the selection of prior distribution. Jeffreys’ prior distribution is a kind of Non-informative prior distribution. This prior is used when the information about parameter not available. Non-informative Jeffreys’ prior distribution is combined with the sample information resulting the posterior distribution. Posterior distribution is used to estimate the parameter. The purposes of this research is to estimate the parameters of multivariate regression model using Bayesian method with Non-informative Jeffreys’ prior distribution. Based on the results and discussion, parameter estimation of β and Σ which were obtained from expected value of random variable of marginal posterior distribution function. The marginal posterior distributions for β and Σ are multivariate normal and inverse Wishart. However, in calculation of the expected value involving integral of a function which difficult to determine the value. Therefore, approach is needed by generating of random samples according to the posterior distribution characteristics of each parameter using Markov chain Monte Carlo (MCMC) Gibbs sampling algorithm.

  6. Field applications of stand-off sensing using visible/NIR multivariate optical computing

    NASA Astrophysics Data System (ADS)

    Eastwood, DeLyle; Soyemi, Olusola O.; Karunamuni, Jeevanandra; Zhang, Lixia; Li, Hongli; Myrick, Michael L.

    2001-02-01

    12 A novel multivariate visible/NIR optical computing approach applicable to standoff sensing will be demonstrated with porphyrin mixtures as examples. The ultimate goal is to develop environmental or counter-terrorism sensors for chemicals such as organophosphorus (OP) pesticides or chemical warfare simulants in the near infrared spectral region. The mathematical operation that characterizes prediction of properties via regression from optical spectra is a calculation of inner products between the spectrum and the pre-determined regression vector. The result is scaled appropriately and offset to correspond to the basis from which the regression vector is derived. The process involves collecting spectroscopic data and synthesizing a multivariate vector using a pattern recognition method. Then, an interference coating is designed that reproduces the pattern of the multivariate vector in its transmission or reflection spectrum, and appropriate interference filters are fabricated. High and low refractive index materials such as Nb2O5 and SiO2 are excellent choices for the visible and near infrared regions. The proof of concept has now been established for this system in the visible and will later be extended to chemicals such as OP compounds in the near and mid-infrared.

  7. Assessing principal component regression prediction of neurochemicals detected with fast-scan cyclic voltammetry.

    PubMed

    Keithley, Richard B; Wightman, R Mark

    2011-06-07

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.

  8. A refined method for multivariate meta-analysis and meta-regression.

    PubMed

    Jackson, Daniel; Riley, Richard D

    2014-02-20

    Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects' standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. Copyright © 2013 John Wiley & Sons, Ltd.

  9. Newer classification and regression tree techniques: Bagging and Random Forests for ecological prediction

    Treesearch

    Anantha M. Prasad; Louis R. Iverson; Andy Liaw; Andy Liaw

    2006-01-01

    We evaluated four statistical models - Regression Tree Analysis (RTA), Bagging Trees (BT), Random Forests (RF), and Multivariate Adaptive Regression Splines (MARS) - for predictive vegetation mapping under current and future climate scenarios according to the Canadian Climate Centre global circulation model.

  10. A Semiparametric Change-Point Regression Model for Longitudinal Observations.

    PubMed

    Xing, Haipeng; Ying, Zhiliang

    2012-12-01

    Many longitudinal studies involve relating an outcome process to a set of possibly time-varying covariates, giving rise to the usual regression models for longitudinal data. When the purpose of the study is to investigate the covariate effects when experimental environment undergoes abrupt changes or to locate the periods with different levels of covariate effects, a simple and easy-to-interpret approach is to introduce change-points in regression coefficients. In this connection, we propose a semiparametric change-point regression model, in which the error process (stochastic component) is nonparametric and the baseline mean function (functional part) is completely unspecified, the observation times are allowed to be subject-specific, and the number, locations and magnitudes of change-points are unknown and need to be estimated. We further develop an estimation procedure which combines the recent advance in semiparametric analysis based on counting process argument and multiple change-points inference, and discuss its large sample properties, including consistency and asymptotic normality, under suitable regularity conditions. Simulation results show that the proposed methods work well under a variety of scenarios. An application to a real data set is also given.

  11. Application of empirical mode decomposition with local linear quantile regression in financial time series forecasting.

    PubMed

    Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M

    2014-01-01

    This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which EMD-LPQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LPQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices.

  12. Functional Relationships and Regression Analysis.

    ERIC Educational Resources Information Center

    Preece, Peter F. W.

    1978-01-01

    Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…

  13. A comparison of patency and interventions in thigh versus Hemodialysis Reliable Outflow grafts for chronic hemodialysis vascular access.

    PubMed

    Brownie, Evan R; Kensinger, Clark D; Feurer, Irene D; Moore, Derek E; Shaffer, David

    2016-11-01

    With improvements in medical management and survival of patients with end-stage renal disease, maintaining durable vascular access is increasingly challenging. This study compared primary, assisted primary, and secondary patency, and procedure-specific complications, and evaluated whether the number of interventions to maintain or restore patency differed between prosthetic femoral-femoral looped inguinal access (thigh) grafts and Hemodialysis Reliable Outflow (HeRO; Hemosphere Inc, Minneapolis, Minn) grafts. A single-center, retrospective, intention-to-treat analysis was conducted of consecutive thigh and HeRO grafts placed between May 2004 and June 2015. Medical history, interventions to maintain or restore patency, and complications were abstracted from the electronic medical record. Data were analyzed using parametric and nonparametric statistical tests, Kaplan-Meier survival methods, and multivariable proportional hazards regression and logistic regression. Seventy-six (43 thigh, 33 HeRO) grafts were placed in 61 patients (54% male; age 53 [standard deviation, 13] years). Median follow-up time in the intention-to-treat analysis was 21.2 months (min, 0.0; max, 85.3 months) for thigh grafts and 6.7 months (min, 0.0; max, 56.3 months) for HeRO grafts (P = .02). The groups were comparable for sex, age, coronary artery disease, diabetes mellitus, peripheral vascular disease, and smoking history (all P ≥ .12). One thigh graft (2%) and five HeRO (15%) grafts failed primarily. In the intention-to-treat analysis, patency durations were significantly longer in the thigh grafts (all log-rank P ≤ .01). Point estimates of primary patency at 6 months, 1 year, and 3 years were 61%, 46%, and 4% for the thigh grafts and 25%, 15%, and 6% for the HeRO grafts. Point estimates of assisted primary patency at 6 months, 1 year, and 3 years were 75%, 66%, and 54% for the thigh grafts and 41%, 30%, and 10% for the HeRO grafts. Point estimates of secondary patency at 6 months, 1 year, and 3 years were 88%, 88%, and 70% for the thigh grafts and 53%, 43%, and 12% for the HeRO grafts. There were no differences in ischemic (P = .63) or infectious (P = .79) complications between the groups. Multivariable logistic regression demonstrated that after adjusting for follow-up time, HeRO grafts were associated with an increased number of interventions (P = .03). Thigh grafts have significantly better primary, assisted primary, and secondary patency compared with HeRO grafts. There is no significant difference between thigh grafts and HeRO grafts in ischemic or infectious complications. Our logistic regression model demonstrated an association between HeRO grafts and an increased number of interventions to maintain or restore patency. Although HeRO grafts may extend the use of the upper extremity, thigh grafts provide a more durable option for chronic hemodialysis. Copyright © 2016 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.

  14. Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.

    PubMed

    Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C

    2014-03-01

    In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.

  15. High-throughput quantitative biochemical characterization of algal biomass by NIR spectroscopy; multiple linear regression and multivariate linear regression analysis.

    PubMed

    Laurens, L M L; Wolfrum, E J

    2013-12-18

    One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.

  16. Membrane Introduction Mass Spectrometry Combined with an Orthogonal Partial-Least Squares Calibration Model for Mixture Analysis.

    PubMed

    Li, Min; Zhang, Lu; Yao, Xiaolong; Jiang, Xingyu

    2017-01-01

    The emerging membrane introduction mass spectrometry technique has been successfully used to detect benzene, toluene, ethyl benzene and xylene (BTEX), while overlapped spectra have unfortunately hindered its further application to the analysis of mixtures. Multivariate calibration, an efficient method to analyze mixtures, has been widely applied. In this paper, we compared univariate and multivariate analyses for quantification of the individual components of mixture samples. The results showed that the univariate analysis creates poor models with regression coefficients of 0.912, 0.867, 0.440 and 0.351 for BTEX, respectively. For multivariate analysis, a comparison to the partial-least squares (PLS) model shows that the orthogonal partial-least squares (OPLS) regression exhibits an optimal performance with regression coefficients of 0.995, 0.999, 0.980 and 0.976, favorable calibration parameters (RMSEC and RMSECV) and a favorable validation parameter (RMSEP). Furthermore, the OPLS exhibits a good recovery of 73.86 - 122.20% and relative standard deviation (RSD) of the repeatability of 1.14 - 4.87%. Thus, MIMS coupled with the OPLS regression provides an optimal approach for a quantitative BTEX mixture analysis in monitoring and predicting water pollution.

  17. Local linear regression for function learning: an analysis based on sample discrepancy.

    PubMed

    Cervellera, Cristiano; Macciò, Danilo

    2014-11-01

    Local linear regression models, a kind of nonparametric structures that locally perform a linear estimation of the target function, are analyzed in the context of empirical risk minimization (ERM) for function learning. The analysis is carried out with emphasis on geometric properties of the available data. In particular, the discrepancy of the observation points used both to build the local regression models and compute the empirical risk is considered. This allows to treat indifferently the case in which the samples come from a random external source and the one in which the input space can be freely explored. Both consistency of the ERM procedure and approximating capabilities of the estimator are analyzed, proving conditions to ensure convergence. Since the theoretical analysis shows that the estimation improves as the discrepancy of the observation points becomes smaller, low-discrepancy sequences, a family of sampling methods commonly employed for efficient numerical integration, are also analyzed. Simulation results involving two different examples of function learning are provided.

  18. Non-Parametric Blur Map Regression for Depth of Field Extension.

    PubMed

    D'Andres, Laurent; Salvador, Jordi; Kochale, Axel; Susstrunk, Sabine

    2016-04-01

    Real camera systems have a limited depth of field (DOF) which may cause an image to be degraded due to visible misfocus or too shallow DOF. In this paper, we present a blind deblurring pipeline able to restore such images by slightly extending their DOF and recovering sharpness in regions slightly out of focus. To address this severely ill-posed problem, our algorithm relies first on the estimation of the spatially varying defocus blur. Drawing on local frequency image features, a machine learning approach based on the recently introduced regression tree fields is used to train a model able to regress a coherent defocus blur map of the image, labeling each pixel by the scale of a defocus point spread function. A non-blind spatially varying deblurring algorithm is then used to properly extend the DOF of the image. The good performance of our algorithm is assessed both quantitatively, using realistic ground truth data obtained with a novel approach based on a plenoptic camera, and qualitatively with real images.

  19. Mixed effect Poisson log-linear models for clinical and epidemiological sleep hypnogram data

    PubMed Central

    Swihart, Bruce J.; Caffo, Brian S.; Crainiceanu, Ciprian; Punjabi, Naresh M.

    2013-01-01

    Bayesian Poisson log-linear multilevel models scalable to epidemiological studies are proposed to investigate population variability in sleep state transition rates. Hierarchical random effects are used to account for pairings of subjects and repeated measures within those subjects, as comparing diseased to non-diseased subjects while minimizing bias is of importance. Essentially, non-parametric piecewise constant hazards are estimated and smoothed, allowing for time-varying covariates and segment of the night comparisons. The Bayesian Poisson regression is justified through a re-derivation of a classical algebraic likelihood equivalence of Poisson regression with a log(time) offset and survival regression assuming exponentially distributed survival times. Such re-derivation allows synthesis of two methods currently used to analyze sleep transition phenomena: stratified multi-state proportional hazards models and log-linear models with GEE for transition counts. An example data set from the Sleep Heart Health Study is analyzed. Supplementary material includes the analyzed data set as well as the code for a reproducible analysis. PMID:22241689

  20. Analysis of small sample size studies using nonparametric bootstrap test with pooled resampling method.

    PubMed

    Dwivedi, Alok Kumar; Mallawaarachchi, Indika; Alvarado, Luis A

    2017-06-30

    Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has small sample size limitation. We used a pooled method in nonparametric bootstrap test that may overcome the problem related with small samples in hypothesis testing. The present study compared nonparametric bootstrap test with pooled resampling method corresponding to parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test while maintaining type I error probability for any conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. Nonparametric bootstrap paired t-test also provided better performance than other alternatives. Nonparametric bootstrap test provided benefit over exact Kruskal-Wallis test. We suggest using nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating the one way analysis of variance test results for non-normal data in small sample size studies. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  1. Flood-frequency prediction methods for unregulated streams of Tennessee, 2000

    USGS Publications Warehouse

    Law, George S.; Tasker, Gary D.

    2003-01-01

    Up-to-date flood-frequency prediction methods for unregulated, ungaged rivers and streams of Tennessee have been developed. Prediction methods include the regional-regression method and the newer region-of-influence method. The prediction methods were developed using stream-gage records from unregulated streams draining basins having from 1 percent to about 30 percent total impervious area. These methods, however, should not be used in heavily developed or storm-sewered basins with impervious areas greater than 10 percent. The methods can be used to estimate 2-, 5-, 10-, 25-, 50-, 100-, and 500-year recurrence-interval floods of most unregulated rural streams in Tennessee. A computer application was developed that automates the calculation of flood frequency for unregulated, ungaged rivers and streams of Tennessee. Regional-regression equations were derived by using both single-variable and multivariable regional-regression analysis. Contributing drainage area is the explanatory variable used in the single-variable equations. Contributing drainage area, main-channel slope, and a climate factor are the explanatory variables used in the multivariable equations. Deleted-residual standard error for the single-variable equations ranged from 32 to 65 percent. Deleted-residual standard error for the multivariable equations ranged from 31 to 63 percent. These equations are included in the computer application to allow easy comparison of results produced by the different methods. The region-of-influence method calculates multivariable regression equations for each ungaged site and recurrence interval using basin characteristics from 60 similar sites selected from the study area. Explanatory variables that may be used in regression equations computed by the region-of-influence method include contributing drainage area, main-channel slope, a climate factor, and a physiographic-region factor. Deleted-residual standard error for the region-of-influence method tended to be only slightly smaller than those for the regional-regression method and ranged from 27 to 62 percent.

  2. Multivariate logistic regression analysis of postoperative complications and risk model establishment of gastrectomy for gastric cancer: A single-center cohort report.

    PubMed

    Zhou, Jinzhe; Zhou, Yanbing; Cao, Shougen; Li, Shikuan; Wang, Hao; Niu, Zhaojian; Chen, Dong; Wang, Dongsheng; Lv, Liang; Zhang, Jian; Li, Yu; Jiao, Xuelong; Tan, Xiaojie; Zhang, Jianli; Wang, Haibo; Zhang, Bingyuan; Lu, Yun; Sun, Zhenqing

    2016-01-01

    Reporting of surgical complications is common, but few provide information about the severity and estimate risk factors of complications. If have, but lack of specificity. We retrospectively analyzed data on 2795 gastric cancer patients underwent surgical procedure at the Affiliated Hospital of Qingdao University between June 2007 and June 2012, established multivariate logistic regression model to predictive risk factors related to the postoperative complications according to the Clavien-Dindo classification system. Twenty-four out of 86 variables were identified statistically significant in univariate logistic regression analysis, 11 significant variables entered multivariate analysis were employed to produce the risk model. Liver cirrhosis, diabetes mellitus, Child classification, invasion of neighboring organs, combined resection, introperative transfusion, Billroth II anastomosis of reconstruction, malnutrition, surgical volume of surgeons, operating time and age were independent risk factors for postoperative complications after gastrectomy. Based on logistic regression equation, p=Exp∑BiXi / (1+Exp∑BiXi), multivariate logistic regression predictive model that calculated the risk of postoperative morbidity was developed, p = 1/(1 + e((4.810-1.287X1-0.504X2-0.500X3-0.474X4-0.405X5-0.318X6-0.316X7-0.305X8-0.278X9-0.255X10-0.138X11))). The accuracy, sensitivity and specificity of the model to predict the postoperative complications were 86.7%, 76.2% and 88.6%, respectively. This risk model based on Clavien-Dindo grading severity of complications system and logistic regression analysis can predict severe morbidity specific to an individual patient's risk factors, estimate patients' risks and benefits of gastric surgery as an accurate decision-making tool and may serve as a template for the development of risk models for other surgical groups.

  3. PARAMETRIC AND NON PARAMETRIC (MARS: MULTIVARIATE ADDITIVE REGRESSION SPLINES) LOGISTIC REGRESSIONS FOR PREDICTION OF A DICHOTOMOUS RESPONSE VARIABLE WITH AN EXAMPLE FOR PRESENCE/ABSENCE OF AMPHIBIANS

    EPA Science Inventory

    The purpose of this report is to provide a reference manual that could be used by investigators for making informed use of logistic regression using two methods (standard logistic regression and MARS). The details for analyses of relationships between a dependent binary response ...

  4. Causal diagrams and multivariate analysis II: precision work.

    PubMed

    Jupiter, Daniel C

    2014-01-01

    In this Investigators' Corner, I continue my discussion of when and why we researchers should include variables in multivariate regression. My examination focuses on studies comparing treatment groups and situations for which we can either exclude variables from multivariate analyses or include them for reasons of precision. Copyright © 2014 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  5. Why are we regressing?

    PubMed

    Jupiter, Daniel C

    2012-01-01

    In this first of a series of statistical methodology commentaries for the clinician, we discuss the use of multivariate linear regression. Copyright © 2012 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  6. Revisiting the Distance Duality Relation using a non-parametric regression method

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rana, Akshay; Mahajan, Shobhit; Mukherjee, Amitabha

    2016-07-01

    The interdependence of luminosity distance, D {sub L} and angular diameter distance, D {sub A} given by the distance duality relation (DDR) is very significant in observational cosmology. It is very closely tied with the temperature-redshift relation of Cosmic Microwave Background (CMB) radiation. Any deviation from η( z )≡ D {sub L} / D {sub A} (1+ z ){sup 2} =1 indicates a possible emergence of new physics. Our aim in this work is to check the consistency of these relations using a non-parametric regression method namely, LOESS with SIMEX. This technique avoids dependency on the cosmological model and worksmore » with a minimal set of assumptions. Further, to analyze the efficiency of the methodology, we simulate a dataset of 020 points of η ( z ) data based on a phenomenological model η( z )= (1+ z ){sup ε}. The error on the simulated data points is obtained by using the temperature of CMB radiation at various redshifts. For testing the distance duality relation, we use the JLA SNe Ia data for luminosity distances, while the angular diameter distances are obtained from radio galaxies datasets. Since the DDR is linked with CMB temperature-redshift relation, therefore we also use the CMB temperature data to reconstruct η ( z ). It is important to note that with CMB data, we are able to study the evolution of DDR upto a very high redshift z = 2.418. In this analysis, we find no evidence of deviation from η=1 within a 1σ region in the entire redshift range used in this analysis (0 < z ≤ 2.418).« less

  7. [The short-term effects of air pollution on mortality. The results of the EMECAM project in the city of Vigo, 1991-94. Estudio Multicéntrico Español sobre la Relación entre la Contaminación Atmosférica y la Mortalidad].

    PubMed

    Taracido Trunk, M; Figueiras, A; Castro Lareo, I

    1999-01-01

    In the Autonomous Region of Galicia, no study has been made of the impacts of air pollution on human health, despite the fact that several of its major cities have moderate levels of pollution. Therefore, we have considered the need of making this study in the city of Vigo. The main objective of this analysis is that of analyzing the short-term impact of air pollution on the daily death rate for all reasons in the city of Vigo throughout the 1991-1994 period, by using the procedure for analysis set out as part of the EMECAM Project. The daily fluctuations in the number of deaths for all causes with the exception of the external ones are listed with the daily fluctuations of sulfur dioxide and particles using Poisson regression models. A non-parametric model is also used in order to better control the confusion variables. Using the Poisson regression model, no significant relationships have been found to exist between the pollutants and the death rate. In the non-parametric model, a relationship was found between the concentration of particles on the day immediately prior to the date of death and the death rate, an effect which remains unchanged on including the autoregressive terms. Particle-based air pollution is a health risk despite the average levels of this pollutant falling within the air quality guideline levels in the city of Vigo.

  8. A Semi-parametric Transformation Frailty Model for Semi-competing Risks Survival Data

    PubMed Central

    Jiang, Fei; Haneuse, Sebastien

    2016-01-01

    In the analysis of semi-competing risks data interest lies in estimation and inference with respect to a so-called non-terminal event, the observation of which is subject to a terminal event. Multi-state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non-terminal and terminal events specified, in part, by a unit-specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, σ2. When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi-competing risks analysis that permit the non-parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi-parametric efficient score under the complete data setting and propose a non-parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators is derived and small-sample operating characteristics evaluated via simulation. Although the proposed semi-parametric transformation model and non-parametric score imputation method are motivated by the analysis of semi-competing risks data, they are broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation. Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer. PMID:28439147

  9. The Highly Adaptive Lasso Estimator

    PubMed Central

    Benkeser, David; van der Laan, Mark

    2017-01-01

    Estimation of a regression functions is a common goal of statistical learning. We propose a novel nonparametric regression estimator that, in contrast to many existing methods, does not rely on local smoothness assumptions nor is it constructed using local smoothing techniques. Instead, our estimator respects global smoothness constraints by virtue of falling in a class of right-hand continuous functions with left-hand limits that have variation norm bounded by a constant. Using empirical process theory, we establish a fast minimal rate of convergence of our proposed estimator and illustrate how such an estimator can be constructed using standard software. In simulations, we show that the finite-sample performance of our estimator is competitive with other popular machine learning techniques across a variety of data generating mechanisms. We also illustrate competitive performance in real data examples using several publicly available data sets. PMID:29094111

  10. A refined method for multivariate meta-analysis and meta-regression

    PubMed Central

    Jackson, Daniel; Riley, Richard D

    2014-01-01

    Making inferences about the average treatment effect using the random effects model for meta-analysis is problematic in the common situation where there is a small number of studies. This is because estimates of the between-study variance are not precise enough to accurately apply the conventional methods for testing and deriving a confidence interval for the average effect. We have found that a refined method for univariate meta-analysis, which applies a scaling factor to the estimated effects’ standard error, provides more accurate inference. We explain how to extend this method to the multivariate scenario and show that our proposal for refined multivariate meta-analysis and meta-regression can provide more accurate inferences than the more conventional approach. We explain how our proposed approach can be implemented using standard output from multivariate meta-analysis software packages and apply our methodology to two real examples. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:23996351

  11. Multivariate meta-analysis for non-linear and other multi-parameter associations

    PubMed Central

    Gasparrini, A; Armstrong, B; Kenward, M G

    2012-01-01

    In this paper, we formalize the application of multivariate meta-analysis and meta-regression to synthesize estimates of multi-parameter associations obtained from different studies. This modelling approach extends the standard two-stage analysis used to combine results across different sub-groups or populations. The most straightforward application is for the meta-analysis of non-linear relationships, described for example by regression coefficients of splines or other functions, but the methodology easily generalizes to any setting where complex associations are described by multiple correlated parameters. The modelling framework of multivariate meta-analysis is implemented in the package mvmeta within the statistical environment R. As an illustrative example, we propose a two-stage analysis for investigating the non-linear exposure–response relationship between temperature and non-accidental mortality using time-series data from multiple cities. Multivariate meta-analysis represents a useful analytical tool for studying complex associations through a two-stage procedure. Copyright © 2012 John Wiley & Sons, Ltd. PMID:22807043

  12. Full-waveform and discrete-return lidar in salt marsh environments: An assessment of biophysical parameters, vertical uncertatinty, and nonparametric dem correction

    NASA Astrophysics Data System (ADS)

    Rogers, Jeffrey N.

    High-resolution and high-accuracy elevation data sets of coastal salt marsh environments are necessary to support restoration and other management initiatives, such as adaptation to sea level rise. Lidar (light detection and ranging) data may serve this need by enabling efficient acquisition of detailed elevation data from an airborne platform. However, previous research has revealed that lidar data tend to have lower vertical accuracy (i.e., greater uncertainty) in salt marshes than in other environments. The increase in vertical uncertainty in lidar data of salt marshes can be attributed primarily to low, dense-growing salt marsh vegetation. Unfortunately, this increased vertical uncertainty often renders lidar-derived digital elevation models (DEM) ineffective for analysis of topographic features controlling tidal inundation frequency and ecology. This study aims to address these challenges by providing a detailed assessment of the factors influencing lidar-derived elevation uncertainty in marshes. The information gained from this assessment is then used to: 1) test the ability to predict marsh vegetation biophysical parameters from lidar-derived metrics, and 2) develop a method for improving salt marsh DEM accuracy. Discrete-return and full-waveform lidar, along with RTK GNSS (Real-time Kinematic Global Navigation Satellite System) reference data, were acquired for four salt marsh systems characterized by four major taxa (Spartina alterniflora, Spartina patens, Distichlis spicata, and Salicornia spp.) on Cape Cod, Massachusetts. These data were used to: 1) develop an innovative combination of full-waveform lidar and field methods to assess the vertical distribution of aboveground biomass as well as its light blocking properties; 2) investigate lidar elevation bias and standard deviation using varying interpolation and filtering methods; 3) evaluate the effects of seasonality (temporal differences between peak growth and senescent conditions) using lidar data flown in summer and spring; 4) create new products, called Relative Uncertainty Surfaces (RUS), from lidar waveform-derived metrics and determine their utility; and 5) develop and test five nonparametric regression model algorithms (MARS -- Multivariate Adaptive Regression, CART -- Classification and Regression Trees, TreeNet, Random Forests, and GPSM -- Generalized Path Seeker) with 13 predictor variables derived from both discrete and full waveform lidar sources in order to develop a method of improving lidar DEM quality. Results of this study indicate strong correlations for Spartina alterniflora (r > 0.9) between vertical biomass (VB), the distribution of vegetation biomass by height, and vertical obscuration (VO), the measure of the vertical distribution of the ratio of vegetation to airspace. It was determined that simple, feature-based lidar waveform metrics, such as waveform width, can provide new information to estimate salt marsh vegetation biophysical parameters such as vegetation height. The results also clearly illustrate the importance of seasonality, species, and lidar interpolation and filtering methods on elevation uncertainty in salt marshes. Relative uncertainty surfaces generated from lidar waveform features were determined useful in qualitative/visual assessment of lidar elevation uncertainty and correlate well with vegetation height and presence of Spartina alterniflora. Finally, DEMs generated using full-waveform predictor models produced corrections (compared to ground based RTK GNSS elevations) with R2 values of up to 0.98 and slopes within 4% of a perfect 1:1 correlation. The findings from this research have strong potential to advance tidal marsh mapping, research and management initiatives.

  13. Spatial Variability of Soil-Water Storage in the Southern Sierra Critical Zone Observatory: Measurement and Prediction

    NASA Astrophysics Data System (ADS)

    Oroza, C.; Bales, R. C.; Zheng, Z.; Glaser, S. D.

    2017-12-01

    Predicting the spatial distribution of soil moisture in mountain environments is confounded by multiple factors, including complex topography, spatial variably of soil texture, sub-surface flow paths, and snow-soil interactions. While remote-sensing tools such as passive-microwave monitoring can measure spatial variability of soil moisture, they only capture near-surface soil layers. Large-scale sensor networks are increasingly providing soil-moisture measurements at high temporal resolution across a broader range of depths than are accessible from remote sensing. It may be possible to combine these in-situ measurements with high-resolution LIDAR topography and canopy cover to estimate the spatial distribution of soil moisture at high spatial resolution at multiple depths. We study the feasibility of this approach using six years (2009-2014) of daily volumetric water content measurements at 10-, 30-, and 60-cm depths from the Southern Sierra Critical Zone Observatory. A non-parametric, multivariate regression algorithm, Random Forest, was used to predict the spatial distribution of depth-integrated soil-water storage, based on the in-situ measurements and a combination of node attributes (topographic wetness, northness, elevation, soil texture, and location with respect to canopy cover). We observe predictable patterns of predictor accuracy and independent variable ranking during the six-year study period. Predictor accuracy is highest during the snow-cover and early recession periods but declines during the dry period. Soil texture has consistently high feature importance. Other landscape attributes exhibit seasonal trends: northness peaks during the wet-up period, and elevation and topographic-wetness index peak during the recession and dry period, respectively.

  14. [Cost comparison of three treatments for localized prostate cancer in Spain: radical prostatectomy, prostate brachytherapy and external 3D conformal radiotherapy].

    PubMed

    Becerra Bachino, Virginia; Cots, Francesc; Guedea, Ferran; Pera, Joan; Boladeras, Ana; Aguiló, Ferran; Suárez, José Francisco; Gallo, Pedro; Murgui, Lluis; Pont, Angels; Cunillera, Oriol; Pardo, Yolanda; Ferrer, Montserrat

    2011-01-01

    To compare the initial costs of the three most established treatments for clinically localized prostate cancer according to risk, age and comorbidity groups, from the healthcare provider's perspective. We carried out a cost comparison study in a sample of patients consecutively recruited between 2003 and 2005 from a functional unit for prostate cancer treatment in Catalonia (Spain). The use of services up to 6 months after the treatment start date was obtained from hospital databases and direct costs were estimated by micro-cost calculation. Information on the clinical characteristics of patients and treatments was collected prospectively. Costs were compared by using nonparametric tests comparing medians (Kruskall-Wallis) and a semi-logarithmic multiple regression model. Among the 398 patients included, the cost difference among treatments was statistically significant: medians were € 3,229.10, € 5,369.00 and € 6,265.60, respectively, for the groups of patients treated with external 3D conformal radiotherapy, brachytherapy and radical retropublic prostatectomy, (p<0.001). In the multivariate analysis (adjusted R(2)=0.8), the average costs of brachytherapy and external radiotherapy were significantly lower than that of prostatectomy (coefficient -0.212 and -0.729, respectively). Radical prostatectomy proved to be the most expensive treatment option. Overall, the estimated costs in our study were lower than those published elsewhere. Most of the costs were explained by the therapeutic option and neither comorbidity nor risk groups showed an effect on total costs independent of treatment. Copyright © 2010 SESPAS. Published by Elsevier Espana. All rights reserved.

  15. Attributes of a surgical chairperson associated with extramural funding of a department of surgery.

    PubMed

    Turaga, Kiran K; Green, Danielle E; Jayakrishnan, Thejus T; Hwang, Michael; Gamblin, T Clark

    2013-12-01

    Chairpersons of surgery departments are key stakeholders and role models and leaders of research in academic medical institutions. However, the characteristics of surgical chairpersons are understudied. This study aimed to investigate the association between the personal academic achievement of a surgical chairperson and the National Institutes of Health (NIH) funding of the department. We calculated the Hirsch index (H-index), a measure of research productivity, for chairpersons of surgery of the top 90 research medical schools that were ranked by U.S. News & World Report. Specialty training, y as chairperson, location, and NIH institutional and department funding were analyzed. Nonparametric tests and linear regression methods were used to compare the different groups. Of the 90 chairpersons, 20 positions for chairs (22%) are either recent (<1 y) or unfilled (n = 6). Only 3% of all chairpersons are women, and the median H-index for the chairpersons is 20 (Interquartile range 14-27) with a median 101 publications with 14 cites per publication. Median surgery-specific NIH funding in 2011 was $1.7 million (Interquartile range $721,042-5,085,305). The chairperson's H-index was exponentially associated with department funding in multivariate models adjusting for institution rank, except when the H-index was extreme (<4 or >49) (coefficient 0.32, P = 0.02). The research productivity of a chairperson is the only personal attribute of those studied that is associated with the departmental NIH funding. This suggests the important role an academically productive surgical leader may play as a champion for the academic success of the department. Copyright © 2013 Elsevier Inc. All rights reserved.

  16. Hematologic and serum biochemical reference intervals for free-ranging common bottlenose dolphins (Tursiops truncatus) and variation in the distributions of clinicopathologic values related to geographic sampling site.

    PubMed

    Schwacke, Lori H; Hall, Ailsa J; Townsend, Forrest I; Wells, Randall S; Hansen, Larry J; Hohn, Aleta A; Bossart, Gregory D; Fair, Patricia A; Rowles, Teresa K

    2009-08-01

    To develop robust reference intervals for hematologic and serum biochemical variables by use of data derived from free-ranging bottlenose dolphins (Tursiops truncatus) and examine potential variation in distributions of clinicopathologic values related to sampling sites' geographic locations. 255 free-ranging bottlenose dolphins. Data from samples collected during multiple bottlenose dolphin capture-release projects conducted at 4 southeastern US coastal locations in 2000 through 2006 were combined to determine reference intervals for 52 clinicopathologic variables. A nonparametric bootstrap approach was applied to estimate 95th percentiles and associated 90% confidence intervals; the need for partitioning by length and sex classes was determined by testing for differences in estimated thresholds with a bootstrap method. When appropriate, quantile regression was used to determine continuous functions for 95th percentiles dependent on length. The proportion of out-of-range samples for all clinicopathologic measurements was examined for each geographic site, and multivariate ANOVA was applied to further explore variation in leukocyte subgroups. A need for partitioning by length and sex classes was indicated for many clinicopathologic variables. For each geographic site, few significant deviations from expected number of out-of-range samples were detected. Although mean leukocyte counts did not vary among sites, differences in the mean counts for leukocyte subgroups were identified. Although differences in the centrality of distributions for some variables were detected, the 95th percentiles estimated from the pooled data were robust and applicable across geographic sites. The derived reference intervals provide critical information for conducting bottlenose dolphin population health studies.

  17. Gut microbiota in early pediatric multiple sclerosis: a case-control study.

    PubMed

    Tremlett, Helen; Fadrosh, Douglas W; Faruqi, Ali A; Zhu, Feng; Hart, Janace; Roalstad, Shelly; Graves, Jennifer; Lynch, Susan; Waubant, Emmanuelle

    2016-08-01

    Alterations in the gut microbial community composition may be influential in neurological disease. Microbial community profiles were compared between early onset pediatric multiple sclerosis (MS) and control children similar for age and sex. Children ≤18 years old within 2 years of MS onset or controls without autoimmune disorders attending a University of California, San Francisco, USA, pediatric clinic were examined for fecal bacterial community composition and predicted function by 16S ribosomal RNA sequencing and phylogenetic reconstruction of unobserved states (PICRUSt) analysis. Associations between subject characteristics and the microbiota, including beta diversity and taxa abundance, were identified using non-parametric tests, permutational multivariate analysis of variance and negative binomial regression. Eighteen relapsing-remitting MS cases and 17 controls (mean age 13 years; range 4-18) were studied. Cases had a short disease duration (mean 11 months; range 2-24) and half were immunomodulatory drug (IMD) naïve. Whilst overall gut bacterial beta diversity was not significantly related to MS status, IMD exposure was (Canberra, P < 0.02). However, relative to controls, MS cases had a significant enrichment in relative abundance for members of the Desulfovibrionaceae (Bilophila, Desulfovibrio and Christensenellaceae) and depletion in Lachnospiraceae and Ruminococcaceae (all P and q < 0.000005). Microbial genes predicted as enriched in MS versus controls included those involved in glutathione metabolism (Mann-Whitney, P = 0.017), findings that were consistent regardless of IMD exposure. In recent onset pediatric MS, perturbations in the gut microbiome composition were observed, in parallel with predicted enrichment of metabolic pathways associated with neurodegeneration. Findings were suggestive of a pro-inflammatory milieu. © 2016 EAN.

  18. Demographic factors associated with moral sensitivity among nursing students.

    PubMed

    Tuvesson, Hanna; Lützén, Kim

    2017-11-01

    Today's healthcare environment is often characterized by an ethically demanding work situation, and nursing students need to prepare to meet ethical challenges in their future role. Moral sensitivity is an important aspect of the ethical decision-making process, but little is known regarding nursing students' moral sensitivity and its possible development during nursing education. The aims of this study were to investigate moral sensitivity among nursing students, differences in moral sensitivity according to sample sub-group, and the relation between demographic characteristics of nursing students and moral sensitivity. A convenience sample of 299 nursing students from one university completed a questionnaire comprising questions about demographic information and the revised Moral Sensitivity Questionnaire. With the use of SPSS, non-parametric statistics, including logistic regression models, were used to investigate the relationship between demographic characteristics and moral sensitivity. Ethical considerations: The study followed the regulations according to the Swedish Ethical Review Act and was reviewed by the Ethics Committee of South-East Sweden. The findings showed that mean scores of nursing students' moral sensitivity were found in the middle to upper segment of the rating scale. Multivariate analysis showed that gender (odds ratio = 3.32), age (odds ratio = 2.09; 1.73), and parental status (odds ratio = 0.31) were of relevance to nursing students' moral sensitivity. Academic year was found to be unrelated to moral sensitivity. These demographic aspects should be considered when designing ethics education for nursing students. Future studies should continue to investigate moral sensitivity in nursing students, such as if and how various pedagogical strategies in ethics may contribute to moral sensitivity in nursing students.

  19. Ten-year Survival and Its Associated Factors in the Patients Undergoing Pacemaker Implantation in Hospitals Affiliated to Shiraz University of Medical Sciences During 2002 - 2012

    PubMed Central

    Rajaeefard, Abdolreza; Ghorbani, Mohammad; Babaee Baigi, Mohammad Ali; Tabatabae, Hamidreza

    2015-01-01

    Background: Heart failure is a prevalent disease affecting about 4.9 million people in the U.S. and more than 22 million individuals worldwide. Using electric pacemaker is the most common treatment for the patients with heart conduction problems. The present study aimed to determine the factors affecting survival in the patients undergoing pacemaker implantation in the hospitals affiliated to Shiraz University of Medical Sciences. Objectives: The aim of the present study was to identify the factors affecting the survival of the patients suffering from arrhythmia. Patients and Methods: This retrospective survival analysis was conducted on all 1207 patients with heart failure who had undergone permanent pacemaker implantation in the hospitals affiliated to Shiraz University of Medical Sciences from 2002 to 2012. The data were analyzed using non-parametric methods such as Kaplan-Meier method, life table, and Cox regression model. The risk factors of mortality were determined using multivariate Cox proportional hazards method. Results: Survival data were available for 1030 (80%) patients (median age = 71 years [5th to 95th percentile range: 26 - 86 years]) and follow-up was completed for 84.28% of them. According to the results, 56% of the patients had received dual-chamber systems, while 44% had been implanted by single-chamber ventricular systems. Moreover, sick sinus syndrome and pacemaker mode were independent predictors of increased mortality. Conclusions: In this study, sick sinus syndrome and pacemaker mode followed by syncope were independently associated with increased mortality. PMID:26734484

  20. C-reactive protein as a marker of melanoma progression.

    PubMed

    Fang, Shenying; Wang, Yuling; Sui, Dawen; Liu, Huey; Ross, Merrick I; Gershenwald, Jeffrey E; Cormier, Janice N; Royal, Richard E; Lucci, Anthony; Schacherer, Christopher W; Gardner, Julie M; Reveille, John D; Bassett, Roland L; Wang, Li-E; Wei, Qingyi; Amos, Christopher I; Lee, Jeffrey E

    2015-04-20

    To investigate the association between blood levels of C-reactive protein (CRP) in patients with melanoma and overall survival (OS), melanoma-specific survival (MSS), and disease-free survival. Two independent sets of plasma samples from a total of 1,144 patients with melanoma (587 initial and 557 confirmatory) were available for CRP determination. Kaplan-Meier method and Cox regression were used to evaluate the relationship between CRP and clinical outcome. Among 115 patients who underwent sequential blood draws, we evaluated the relationship between change in disease status and change in CRP using nonparametric tests. Elevated CRP level was associated with poorer OS and MSS in the initial, confirmatory, and combined data sets (combined data set: OS hazard ratio, 1.44 per unit increase of logarithmic CRP; 95% CI, 1.30 to 1.59; P < .001; MSS hazard ratio, 1.51 per unit increase of logarithmic CRP; 95% CI, 1.36 to 1.68; P < .001). These findings persisted after multivariable adjustment. As compared with CRP < 10 mg/L, CRP ≥ 10 mg/L conferred poorer OS in patients with any-stage, stage I/II, or stage III/IV disease and poorer disease-free survival in those with stage I/II disease. In patients who underwent sequential evaluation of CRP, an association was identified between an increase in CRP and melanoma disease progression. CRP is an independent prognostic marker in patients with melanoma. CRP measurement should be considered for incorporation into prospective studies of outcome in patients with melanoma and clinical trials of systemic therapies for those with melanoma. © 2015 by American Society of Clinical Oncology.

  1. C-Reactive Protein As a Marker of Melanoma Progression

    PubMed Central

    Fang, Shenying; Wang, Yuling; Sui, Dawen; Liu, Huey; Ross, Merrick I.; Gershenwald, Jeffrey E.; Cormier, Janice N.; Royal, Richard E.; Lucci, Anthony; Schacherer, Christopher W.; Gardner, Julie M.; Reveille, John D.; Bassett, Roland L.; Wang, Li-E; Wei, Qingyi; Amos, Christopher I.; Lee, Jeffrey E.

    2015-01-01

    Purpose To investigate the association between blood levels of C-reactive protein (CRP) in patients with melanoma and overall survival (OS), melanoma-specific survival (MSS), and disease-free survival. Patients and Methods Two independent sets of plasma samples from a total of 1,144 patients with melanoma (587 initial and 557 confirmatory) were available for CRP determination. Kaplan-Meier method and Cox regression were used to evaluate the relationship between CRP and clinical outcome. Among 115 patients who underwent sequential blood draws, we evaluated the relationship between change in disease status and change in CRP using nonparametric tests. Results Elevated CRP level was associated with poorer OS and MSS in the initial, confirmatory, and combined data sets (combined data set: OS hazard ratio, 1.44 per unit increase of logarithmic CRP; 95% CI, 1.30 to 1.59; P < .001; MSS hazard ratio, 1.51 per unit increase of logarithmic CRP; 95% CI, 1.36 to 1.68; P < .001). These findings persisted after multivariable adjustment. As compared with CRP < 10 mg/L, CRP ≥ 10 mg/L conferred poorer OS in patients with any-stage, stage I/II, or stage III/IV disease and poorer disease-free survival in those with stage I/II disease. In patients who underwent sequential evaluation of CRP, an association was identified between an increase in CRP and melanoma disease progression. Conclusion CRP is an independent prognostic marker in patients with melanoma. CRP measurement should be considered for incorporation into prospective studies of outcome in patients with melanoma and clinical trials of systemic therapies for those with melanoma. PMID:25779565

  2. A land use regression model for ambient ultrafine particles in Montreal, Canada: A comparison of linear regression and a machine learning approach.

    PubMed

    Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne

    2016-04-01

    Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.

  3. Access disparities to Magnet hospitals for patients undergoing neurosurgical operations

    PubMed Central

    Missios, Symeon; Bekelis, Kimon

    2017-01-01

    Background Centers of excellence focusing on quality improvement have demonstrated superior outcomes for a variety of surgical interventions. We investigated the presence of access disparities to hospitals recognized by the Magnet Recognition Program of the American Nurses Credentialing Center (ANCC) for patients undergoing neurosurgical operations. Methods We performed a cohort study of all neurosurgery patients who were registered in the New York Statewide Planning and Research Cooperative System (SPARCS) database from 2009–2013. We examined the association of African-American race and lack of insurance with Magnet status hospitalization for neurosurgical procedures. A mixed effects propensity adjusted multivariable regression analysis was used to control for confounding. Results During the study period, 190,535 neurosurgical patients met the inclusion criteria. Using a multivariable logistic regression, we demonstrate that African-Americans had lower admission rates to Magnet institutions (OR 0.62; 95% CI, 0.58–0.67). This persisted in a mixed effects logistic regression model (OR 0.77; 95% CI, 0.70–0.83) to adjust for clustering at the patient county level, and a propensity score adjusted logistic regression model (OR 0.75; 95% CI, 0.69–0.82). Additionally, lack of insurance was associated with lower admission rates to Magnet institutions (OR 0.71; 95% CI, 0.68–0.73), in a multivariable logistic regression model. This persisted in a mixed effects logistic regression model (OR 0.72; 95% CI, 0.69–0.74), and a propensity score adjusted logistic regression model (OR 0.72; 95% CI, 0.69–0.75). Conclusions Using a comprehensive all-payer cohort of neurosurgery patients in New York State we identified an association of African-American race and lack of insurance with lower rates of admission to Magnet hospitals. PMID:28684152

  4. Primary intervention for memory structuring and meaning acquisition (PIMSMA): study of a mental health first-aid intervention in the ED with injured survivors of suicide bombing attacks.

    PubMed

    Schreiber, Shaul; Dolberg, Ornah T; Barkai, Gabriel; Peles, Einat; Leor, Agnes; Rapoport, Elena; Heinik, Jeremia; Bloch, Miki

    2007-01-01

    To assess the impact of a structured intervention, the "primary intervention for memory structuring and meaning acquisition" (PIMSMA) performed randomly in the emergency department with survivors of suicide bombing attacks, on their medium-term mental health outcome. Follow up and assessment 3-9 months postinjury, and 24 months thereafter. A tertiary referral general hospital in Tel Aviv, Israel. Injured survivors of 9 suicide bombing and suicide shooting, men and women aged 16-72 at the time of the incident. Diagnosis of posttraumatic stress disorder (PTSD) was made using the Hebrew validated version of the DSM-IV SCID-PTSD rating scale. Other psychiatric symptoms were assessed using the following rating scales: impact of event scale (IES), Hamilton rating scale for depression (HAM-D) and for anxiety (HAM-A), and the Pittsburgh sleep quality index (PSQI). Effects of PIMSMA and PTSD level of psychological distress were analyzed using ANOVA and for change over time for continuous variables repeated measured multivariate analyses was performed, and for categorical variables nonparametric-related sample McNemar. Logistic regression for variable associated with PTSD was performed. Out of 213 eligible injured survivors evacuated to our ER, 129 were retrieved 3-9 months after the incident, and 53 were available for assessment 2 years later. Multivariate analyses for being PTSD vs non-PTSD at the first evaluation, being hospitalized OR = 5.6 (95 percent CI 1.1-27.6) and treated OR = 24.5 (95 percent CI 2.8-200) were the only predictors, with no effect (p = 0.9) for PIMSMA vs other supportive intervention. Predictor for PTSD at the second evaluation were IES severity score at first evaluation OR = 1.1 (95 percent CI 1.04-1.2). The PIMSMA approach is as good as the nonspecific supportive treatment performed routinely in the ED with all survivors of traumatic events of any origin. Further studies are needed to establish valid, evidence-based treatment approaches for the acute aftermath of exposure to severe potentially traumatic events.

  5. Reservoir characterization using core, well log, and seismic data and intelligent software

    NASA Astrophysics Data System (ADS)

    Soto Becerra, Rodolfo

    We have developed intelligent software, Oilfield Intelligence (OI), as an engineering tool to improve the characterization of oil and gas reservoirs. OI integrates neural networks and multivariate statistical analysis. It is composed of five main subsystems: data input, preprocessing, architecture design, graphics design, and inference engine modules. More than 1,200 lines of programming code as M-files using the language MATLAB been written. The degree of success of many oil and gas drilling, completion, and production activities depends upon the accuracy of the models used in a reservoir description. Neural networks have been applied for identification of nonlinear systems in almost all scientific fields of humankind. Solving reservoir characterization problems is no exception. Neural networks have a number of attractive features that can help to extract and recognize underlying patterns, structures, and relationships among data. However, before developing a neural network model, we must solve the problem of dimensionality such as determining dominant and irrelevant variables. We can apply principal components and factor analysis to reduce the dimensionality and help the neural networks formulate more realistic models. We validated OI by obtaining confident models in three different oil field problems: (1) A neural network in-situ stress model using lithology and gamma ray logs for the Travis Peak formation of east Texas, (2) A neural network permeability model using porosity and gamma ray and a neural network pseudo-gamma ray log model using 3D seismic attributes for the reservoir VLE 196 Lamar field located in Block V of south-central Lake Maracaibo (Venezuela), and (3) Neural network primary ultimate oil recovery (PRUR), initial waterflooding ultimate oil recovery (IWUR), and infill drilling ultimate oil recovery (IDUR) models using reservoir parameters for San Andres and Clearfork carbonate formations in west Texas. In all cases, we compared the results from the neural network models with the results from regression statistical and non-parametric approach models. The results show that it is possible to obtain the highest cross-correlation coefficient between predicted and actual target variables, and the lowest average absolute errors using the integrated techniques of multivariate statistical analysis and neural networks in our intelligent software.

  6. The Role of Environmental Toxins on ALS: A Case-Control Study of Occupational Risk Factors

    PubMed Central

    Su, Feng-Chiao.; Goutman, Stephen A.; Chernyak, Sergey; Mukherjee, Bhramar; Callaghan, Brian C.; Batterman, Stuart; Feldman, Eva L.

    2016-01-01

    Importance Persistent environmental pollutants may represent a modifiable risk factor involved in the gene-time-environment hypothesis in amyotrophic lateral sclerosis (ALS). Objective To evaluate the association of occupational exposures and environmental toxins on the odds of developing ALS in Michigan, a state with historically high levels of environmental pollution. Design Case-control study conducted between 2011 and 2014. Setting Tertiary referral center/ALS referral center Participants ALS cases (n=156) with a diagnosis of definitive, probable, probable with laboratory support, or possible ALS by revised El Escorial criteria. Controls (n=128) were excluded if they had a diagnosis of ALS, another neurodegenerative condition, or a family history of ALS in a first- or second-degree blood relative. Additional exclusions included age less than 18 or inability to communicate in English. Main Outcome and Measure(s) Cases and controls completed a survey assessing occupational and residential exposures. Blood concentrations of 122 persistent environmental pollutants, including organochlorine pesticides (OCP), polychlorinated biphenyls (PCBs), and brominated flame retardants (BFRs), were measured using gas chromatography/mass spectrometry. Multivariable models with self-reported occupational exposures in various exposure time windows and environmental toxin blood concentrations were separately fit by logistic regression models. Concordance between the survey data and pollutant measurements was assessed using the nonparametric Kendall’s Tau correlation coefficient. Results Survey data revealed that reported pesticide exposure in the cumulative exposure windows was significantly associated with ALS (OR = 5.09, 95% CI = 1.85–14.0). Military service was also associated with ALS in two time windows. A multivariable model of measured persistent environmental pollutants in the blood, representing cumulative occupational and residential exposure, showed increased odds of ALS for 2 OCPs, 2 PCBs, and 1 BFR. There was modest concordance between survey data and the measurements of persistent environmental pollutants in blood. Conclusions and Relevance Persistent environmental pollutants measured in blood are significantly associated with ALS. These environmental pollutants may represent a modifiable ALS disease risk factor and should be further studied. PMID:27159543

  7. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia

    NASA Astrophysics Data System (ADS)

    Pradhan, Biswajeet

    2010-05-01

    This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (ndvi), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficient in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), where as Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross application of logistic regression coefficient in other two areas, the case of Selangor based on logistic coefficient of Cameron showed highest (90%) prediction accuracy where as the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.

  8. Serum dehydroepiandrosterone sulphate, psychosocial factors and musculoskeletal pain in workers.

    PubMed

    Marinelli, A; Prodi, A; Pesel, G; Ronchese, F; Bovenzi, M; Negro, C; Larese Filon, F

    2017-12-30

    The serum level of dehydroepiandrosterone sulphate (DHEA-S) has been suggested as a biological marker of stress. To assess the association between serum DHEA-S, psychosocial factors and musculoskeletal (MS) pain in university workers. The study population included voluntary workers at the scientific departments of the University of Trieste (Italy) who underwent periodical health surveillance from January 2011 to June 2012. DHEA-S level was analysed in serum. The assessment tools included the General Health Questionnaire (GHQ) and a modified Nordic musculoskeletal symptoms questionnaire. The relation between DHEA-S, individual characteristics, pain perception and psychological factors was assessed by means of multivariable linear regression analysis. There were 189 study participants. The study population was characterized by high reward and low effort. Pain perception in the neck, shoulder, upper limbs, upper back and lower back was reported by 42, 32, 19, 29 and 43% of people, respectively. In multivariable regression analysis, gender, age and pain perception in the shoulder and upper limbs were significantly related to serum DHEA-S. Effort and overcommitment were related to shoulder and neck pain but not to DHEA-S. The GHQ score was associated with pain perception in different body sites and inversely to DHEA-S but significance was lost in multivariable regression analysis. DHEA-S was associated with age, gender and perception of MS pain, while effort-reward imbalance dimensions and GHQ score failed to reach the statistical significance in multivariable regression analysis. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society of Occupational Medicine. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  9. Independent Prognostic Factors for Acute Organophosphorus Pesticide Poisoning.

    PubMed

    Tang, Weidong; Ruan, Feng; Chen, Qi; Chen, Suping; Shao, Xuebo; Gao, Jianbo; Zhang, Mao

    2016-07-01

    Acute organophosphorus pesticide poisoning (AOPP) is becoming a significant problem and a potential cause of human mortality because of the abuse of organophosphate compounds. This study aims to determine the independent prognostic factors of AOPP by using multivariate logistic regression analysis. The clinical data for 71 subjects with AOPP admitted to our hospital were retrospectively analyzed. This information included the Acute Physiology and Chronic Health Evaluation II (APACHE II) scores, 6-h post-admission blood lactate levels, post-admission 6-h lactate clearance rates, admission blood cholinesterase levels, 6-h post-admission blood cholinesterase levels, cholinesterase activity, blood pH, and other factors. Univariate analysis and multivariate logistic regression analyses were conducted to identify all prognostic factors and independent prognostic factors, respectively. A receiver operating characteristic curve was plotted to analyze the testing power of independent prognostic factors. Twelve of 71 subjects died. Admission blood lactate levels, 6-h post-admission blood lactate levels, post-admission 6-h lactate clearance rates, blood pH, and APACHE II scores were identified as prognostic factors for AOPP according to the univariate analysis, whereas only 6-h post-admission blood lactate levels, post-admission 6-h lactate clearance rates, and blood pH were independent prognostic factors identified by multivariate logistic regression analysis. The receiver operating characteristic analysis suggested that post-admission 6-h lactate clearance rates were of moderate diagnostic value. High 6-h post-admission blood lactate levels, low blood pH, and low post-admission 6-h lactate clearance rates were independent prognostic factors identified by multivariate logistic regression analysis. Copyright © 2016 by Daedalus Enterprises.

  10. Quality Reporting of Multivariable Regression Models in Observational Studies: Review of a Representative Sample of Articles Published in Biomedical Journals.

    PubMed

    Real, Jordi; Forné, Carles; Roso-Llorach, Albert; Martínez-Sánchez, Jose M

    2016-05-01

    Controlling for confounders is a crucial step in analytical observational studies, and multivariable models are widely used as statistical adjustment techniques. However, the validation of the assumptions of the multivariable regression models (MRMs) should be made clear in scientific reporting. The objective of this study is to review the quality of statistical reporting of the most commonly used MRMs (logistic, linear, and Cox regression) that were applied in analytical observational studies published between 2003 and 2014 by journals indexed in MEDLINE.Review of a representative sample of articles indexed in MEDLINE (n = 428) with observational design and use of MRMs (logistic, linear, and Cox regression). We assessed the quality of reporting about: model assumptions and goodness-of-fit, interactions, sensitivity analysis, crude and adjusted effect estimate, and specification of more than 1 adjusted model.The tests of underlying assumptions or goodness-of-fit of the MRMs used were described in 26.2% (95% CI: 22.0-30.3) of the articles and 18.5% (95% CI: 14.8-22.1) reported the interaction analysis. Reporting of all items assessed was higher in articles published in journals with a higher impact factor.A low percentage of articles indexed in MEDLINE that used multivariable techniques provided information demonstrating rigorous application of the model selected as an adjustment method. Given the importance of these methods to the final results and conclusions of observational studies, greater rigor is required in reporting the use of MRMs in the scientific literature.

  11. MODELING SNAKE MICROHABITAT FROM RADIOTELEMETRY STUDIES USING POLYTOMOUS LOGISTIC REGRESSION

    EPA Science Inventory

    Multivariate analysis of snake microhabitat has historically used techniques that were derived under assumptions of normality and common covariance structure (e.g., discriminant function analysis, MANOVA). In this study, polytomous logistic regression (PLR which does not require ...

  12. Nonparametric methods for doubly robust estimation of continuous treatment effects.

    PubMed

    Kennedy, Edward H; Ma, Zongming; McHugh, Matthew D; Small, Dylan S

    2017-09-01

    Continuous treatments (e.g., doses) arise often in practice, but many available causal effect estimators are limited by either requiring parametric models for the effect curve, or by not allowing doubly robust covariate adjustment. We develop a novel kernel smoothing approach that requires only mild smoothness assumptions on the effect curve, and still allows for misspecification of either the treatment density or outcome regression. We derive asymptotic properties and give a procedure for data-driven bandwidth selection. The methods are illustrated via simulation and in a study of the effect of nurse staffing on hospital readmissions penalties.

  13. Salting-out assisted liquid-liquid extraction and partial least squares regression to assay low molecular weight polycyclic aromatic hydrocarbons leached from soils and sediments

    NASA Astrophysics Data System (ADS)

    Bressan, Lucas P.; do Nascimento, Paulo Cícero; Schmidt, Marcella E. P.; Faccin, Henrique; de Machado, Leandro Carvalho; Bohrer, Denise

    2017-02-01

    A novel method was developed to determine low molecular weight polycyclic aromatic hydrocarbons in aqueous leachates from soils and sediments using a salting-out assisted liquid-liquid extraction, synchronous fluorescence spectrometry and a multivariate calibration technique. Several experimental parameters were controlled and the optimum conditions were: sodium carbonate as the salting-out agent at concentration of 2 mol L- 1, 3 mL of acetonitrile as extraction solvent, 6 mL of aqueous leachate, vortexing for 5 min and centrifuging at 4000 rpm for 5 min. The partial least squares calibration was optimized to the lowest values of root mean squared error and five latent variables were chosen for each of the targeted compounds. The regression coefficients for the true versus predicted concentrations were higher than 0.99. Figures of merit for the multivariate method were calculated, namely sensitivity, multivariate detection limit and multivariate quantification limit. The selectivity was also evaluated and other polycyclic aromatic hydrocarbons did not interfere in the analysis. Likewise, high performance liquid chromatography was used as a comparative methodology, and the regression analysis between the methods showed no statistical difference (t-test). The proposed methodology was applied to soils and sediments of a Brazilian river and the recoveries ranged from 74.3% to 105.8%. Overall, the proposed methodology was suitable for the targeted compounds, showing that the extraction method can be applied to spectrofluorometric analysis and that the multivariate calibration is also suitable for these compounds in leachates from real samples.

  14. Predicting volumes in four Hawaii hardwoods...first multivariate equations developed

    Treesearch

    David A. Sharpnack

    1966-01-01

    Multivariate regression equations were developed for predicting board-foot (Int. 1/ 4-inch log rule ) and cubic-foot volumes in each 8.15-foot section of trees of four Hawaii hardwood species. The species are koa (Acacia koa), ohia (Metrosideros polymorpha), robusta eucalyptus (Eucalyptus robusta), and...

  15. A Multivariate Test of the Bott Hypothesis in an Urban Irish Setting

    ERIC Educational Resources Information Center

    Gordon, Michael; Downing, Helen

    1978-01-01

    Using a sample of 686 married Irish women in Cork City the Bott hypothesis was tested, and the results of a multivariate regression analysis revealed that neither network connectedness nor the strength of the respondent's emotional ties to the network had any explanatory power. (Author)

  16. Multi-frequency Phase Unwrap from Noisy Data: Adaptive Least Squares Approach

    NASA Astrophysics Data System (ADS)

    Katkovnik, Vladimir; Bioucas-Dias, José

    2010-04-01

    Multiple frequency interferometry is, basically, a phase acquisition strategy aimed at reducing or eliminating the ambiguity of the wrapped phase observations or, equivalently, reducing or eliminating the fringe ambiguity order. In multiple frequency interferometry, the phase measurements are acquired at different frequencies (or wavelengths) and recorded using the corresponding sensors (measurement channels). Assuming that the absolute phase to be reconstructed is piece-wise smooth, we use a nonparametric regression technique for the phase reconstruction. The nonparametric estimates are derived from a local least squares criterion, which, when applied to the multifrequency data, yields denoised (filtered) phase estimates with extended ambiguity (periodized), compared with the phase ambiguities inherent to each measurement frequency. The filtering algorithm is based on local polynomial (LPA) approximation for design of nonlinear filters (estimators) and adaptation of these filters to unknown smoothness of the spatially varying absolute phase [9]. For phase unwrapping, from filtered periodized data, we apply the recently introduced robust (in the sense of discontinuity preserving) PUMA unwrapping algorithm [1]. Simulations give evidence that the proposed algorithm yields state-of-the-art performance for continuous as well as for discontinues phase surfaces, enabling phase unwrapping in extraordinary difficult situations when all other algorithms fail.

  17. Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci.

    PubMed

    Yap, John Stephen; Fan, Jianqing; Wu, Rongling

    2009-12-01

    Estimation of the covariance structure of longitudinal processes is a fundamental prerequisite for the practical deployment of functional mapping designed to study the genetic regulation and network of quantitative variation in dynamic complex traits. We present a nonparametric approach for estimating the covariance structure of a quantitative trait measured repeatedly at a series of time points. Specifically, we adopt Huang et al.'s (2006, Biometrika 93, 85-98) approach of invoking the modified Cholesky decomposition and converting the problem into modeling a sequence of regressions of responses. A regularized covariance estimator is obtained using a normal penalized likelihood with an L(2) penalty. This approach, embedded within a mixture likelihood framework, leads to enhanced accuracy, precision, and flexibility of functional mapping while preserving its biological relevance. Simulation studies are performed to reveal the statistical properties and advantages of the proposed method. A real example from a mouse genome project is analyzed to illustrate the utilization of the methodology. The new method will provide a useful tool for genome-wide scanning for the existence and distribution of quantitative trait loci underlying a dynamic trait important to agriculture, biology, and health sciences.

  18. RBF kernel based support vector regression to estimate the blood volume and heart rate responses during hemodialysis.

    PubMed

    Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H

    2009-01-01

    This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial bias function (RBF) kernels the non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The e-insensitivity based loss function was used for SVR modeling. Selection of the design parameters which includes capacity (C), insensitivity region (e) and the RBF kernel parameter (sigma) was made based on a grid search approach and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).

  19. An appraisal of statistical procedures used in derivation of reference intervals.

    PubMed

    Ichihara, Kiyoshi; Boyd, James C

    2010-11-01

    When conducting studies to derive reference intervals (RIs), various statistical procedures are commonly applied at each step, from the planning stages to final computation of RIs. Determination of the necessary sample size is an important consideration, and evaluation of at least 400 individuals in each subgroup has been recommended to establish reliable common RIs in multicenter studies. Multiple regression analysis allows identification of the most important factors contributing to variation in test results, while accounting for possible confounding relationships among these factors. Of the various approaches proposed for judging the necessity of partitioning reference values, nested analysis of variance (ANOVA) is the likely method of choice owing to its ability to handle multiple groups and being able to adjust for multiple factors. Box-Cox power transformation often has been used to transform data to a Gaussian distribution for parametric computation of RIs. However, this transformation occasionally fails. Therefore, the non-parametric method based on determination of the 2.5 and 97.5 percentiles following sorting of the data, has been recommended for general use. The performance of the Box-Cox transformation can be improved by introducing an additional parameter representing the origin of transformation. In simulations, the confidence intervals (CIs) of reference limits (RLs) calculated by the parametric method were narrower than those calculated by the non-parametric approach. However, the margin of difference was rather small owing to additional variability in parametrically-determined RLs introduced by estimation of parameters for the Box-Cox transformation. The parametric calculation method may have an advantage over the non-parametric method in allowing identification and exclusion of extreme values during RI computation.

  20. Mixture models for undiagnosed prevalent disease and interval-censored incident disease: applications to a cohort assembled from electronic health records.

    PubMed

    Cheung, Li C; Pan, Qing; Hyun, Noorie; Schiffman, Mark; Fetterman, Barbara; Castle, Philip E; Lorey, Thomas; Katki, Hormuzd A

    2017-09-30

    For cost-effectiveness and efficiency, many large-scale general-purpose cohort studies are being assembled within large health-care providers who use electronic health records. Two key features of such data are that incident disease is interval-censored between irregular visits and there can be pre-existing (prevalent) disease. Because prevalent disease is not always immediately diagnosed, some disease diagnosed at later visits are actually undiagnosed prevalent disease. We consider prevalent disease as a point mass at time zero for clinical applications where there is no interest in time of prevalent disease onset. We demonstrate that the naive Kaplan-Meier cumulative risk estimator underestimates risks at early time points and overestimates later risks. We propose a general family of mixture models for undiagnosed prevalent disease and interval-censored incident disease that we call prevalence-incidence models. Parameters for parametric prevalence-incidence models, such as the logistic regression and Weibull survival (logistic-Weibull) model, are estimated by direct likelihood maximization or by EM algorithm. Non-parametric methods are proposed to calculate cumulative risks for cases without covariates. We compare naive Kaplan-Meier, logistic-Weibull, and non-parametric estimates of cumulative risk in the cervical cancer screening program at Kaiser Permanente Northern California. Kaplan-Meier provided poor estimates while the logistic-Weibull model was a close fit to the non-parametric. Our findings support our use of logistic-Weibull models to develop the risk estimates that underlie current US risk-based cervical cancer screening guidelines. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA. Published 2017. This article has been contributed to by US Government employees and their work is in the public domain in the USA.

  1. Multivariate research in areas of phosphorus cast-iron brake shoes manufacturing using the statistical analysis and the multiple regression equations

    NASA Astrophysics Data System (ADS)

    Kiss, I.; Cioată, V. G.; Alexa, V.; Raţiu, S. A.

    2017-05-01

    The braking system is one of the most important and complex subsystems of railway vehicles, especially when it comes for safety. Therefore, installing efficient safe brakes on the modern railway vehicles is essential. Nowadays is devoted attention to solving problems connected with using high performance brake materials and its impact on thermal and mechanical loading of railway wheels. The main factor that influences the selection of a friction material for railway applications is the performance criterion, due to the interaction between the brake block and the wheel produce complex thermos-mechanical phenomena. In this work, the investigated subjects are the cast-iron brake shoes, which are still widely used on freight wagons. Therefore, the cast-iron brake shoes - with lamellar graphite and with a high content of phosphorus (0.8-1.1%) - need a special investigation. In order to establish the optimal condition for the cast-iron brake shoes we proposed a mathematical modelling study by using the statistical analysis and multiple regression equations. Multivariate research is important in areas of cast-iron brake shoes manufacturing, because many variables interact with each other simultaneously. Multivariate visualization comes to the fore when researchers have difficulties in comprehending many dimensions at one time. Technological data (hardness and chemical composition) obtained from cast-iron brake shoes were used for this purpose. In order to settle the multiple correlation between the hardness of the cast-iron brake shoes, and the chemical compositions elements several model of regression equation types has been proposed. Because a three-dimensional surface with variables on three axes is a common way to illustrate multivariate data, in which the maximum and minimum values are easily highlighted, we plotted graphical representation of the regression equations in order to explain interaction of the variables and locate the optimal level of each variable for maximal response. For the calculation of the regression coefficients, dispersion and correlation coefficients, the software Matlab was used.

  2. A Statistical Approach for Testing Cross-Phenotype Effects of Rare Variants

    PubMed Central

    Broadaway, K. Alaine; Cutler, David J.; Duncan, Richard; Moore, Jacob L.; Ware, Erin B.; Jhun, Min A.; Bielak, Lawrence F.; Zhao, Wei; Smith, Jennifer A.; Peyser, Patricia A.; Kardia, Sharon L.R.; Ghosh, Debashis; Epstein, Michael P.

    2016-01-01

    Increasing empirical evidence suggests that many genetic variants influence multiple distinct phenotypes. When cross-phenotype effects exist, multivariate association methods that consider pleiotropy are often more powerful than univariate methods that model each phenotype separately. Although several statistical approaches exist for testing cross-phenotype effects for common variants, there is a lack of similar tests for gene-based analysis of rare variants. In order to fill this important gap, we introduce a statistical method for cross-phenotype analysis of rare variants using a nonparametric distance-covariance approach that compares similarity in multivariate phenotypes to similarity in rare-variant genotypes across a gene. The approach can accommodate both binary and continuous phenotypes and further can adjust for covariates. Our approach yields a closed-form test whose significance can be evaluated analytically, thereby improving computational efficiency and permitting application on a genome-wide scale. We use simulated data to demonstrate that our method, which we refer to as the Gene Association with Multiple Traits (GAMuT) test, provides increased power over competing approaches. We also illustrate our approach using exome-chip data from the Genetic Epidemiology Network of Arteriopathy. PMID:26942286

  3. Effect of selected gastrointestinal parasites and viral agents on fecal S100A12 concentrations in puppies as a potential comparative model.

    PubMed

    Heilmann, Romy M; Grellet, Aurélien; Grützner, Niels; Cranford, Shannon M; Suchodolski, Jan S; Chastant-Maillard, Sylvie; Steiner, Jörg M

    2018-04-17

    Previous data suggest that fecal S100A12 has clinical utility as a biomarker of chronic gastrointestinal inflammation (idiopathic inflammatory bowel disease) in both people and dogs, but the effect of gastrointestinal pathogens on fecal S100A12 concentrations is largely unknown. The role of S100A12 in parasite and viral infections is also difficult to study in traditional animal models due to the lack of S100A12 expression in rodents. Thus, the aim of this study was to evaluate fecal S100A12 concentrations in a cohort of puppies with intestinal parasites (Cystoisospora spp., Toxocara canis, Giardia sp.) and viral agents that are frequently encountered and known to cause gastrointestinal signs in dogs (coronavirus, parvovirus) as a comparative model. Spot fecal samples were collected from 307 puppies [median age (range): 7 (4-13) weeks; 29 different breeds] in French breeding kennels, and fecal scores (semiquantitative system; scores 1-13) were assigned. Fecal samples were tested for Cystoisospora spp. (C. canis and C. ohioensis), Toxocara canis, Giardia sp., as well as canine coronavirus (CCV) and parvovirus (CPV). S100A12 concentrations were measured in all fecal samples using an in-house radioimmunoassay. Statistical analyses were performed using non-parametric 2-group or multiple-group comparisons, non-parametric correlation analysis, association testing between nominal variables, and construction of a multivariate mixed model. Fecal S100A12 concentrations ranged from < 24-14,363 ng/g. Univariate analysis only showed increased fecal S100A12 concentrations in dogs shedding Cystoisospora spp. (P = 0.0384) and in dogs infected with parvovirus (P = 0.0277), whereas dogs infected with coronavirus had decreased fecal S100A12 concentrations (P = 0.0345). However, shedding of any single enteropathogen did not affect fecal S100A12 concentrations in multivariate analysis (all P > 0.05) in this study. Only fecal score and breed size had an effect on fecal S100A12 concentrations in multivariate analysis (P < 0.0001). An infection with any single enteropathogen tested in this study is unlikely to alter fecal S100A12 concentrations, and these preliminary data are important for further studies evaluating fecal S100A12 concentrations in dogs or when using fecal S100A12 concentrations as a biomarker in patients with chronic idiopathic gastrointestinal inflammation.

  4. Assessment of Communications-related Admissions Criteria in a Three-year Pharmacy Program

    PubMed Central

    Tejada, Frederick R.; Lang, Lynn A.; Purnell, Miriam; Acedera, Lisa; Ngonga, Ferdinand

    2015-01-01

    Objective. To determine if there is a correlation between TOEFL and other admissions criteria that assess communications skills (ie, PCAT variables: verbal, reading, essay, and composite), interview, and observational scores and to evaluate TOEFL and these admissions criteria as predictors of academic performance. Methods. Statistical analyses included two sample t tests, multiple regression and Pearson’s correlations for parametric variables, and Mann-Whitney U for nonparametric variables, which were conducted on the retrospective data of 162 students, 57 of whom were foreign-born. Results. The multiple regression model of the other admissions criteria on TOEFL was significant. There was no significant correlation between TOEFL scores and academic performance. However, significant correlations were found between the other admissions criteria and academic performance. Conclusion. Since TOEFL is not a significant predictor of either communication skills or academic success of foreign-born PharmD students in the program, it may be eliminated as an admissions criterion. PMID:26430273

  5. Assessment of Communications-related Admissions Criteria in a Three-year Pharmacy Program.

    PubMed

    Parmar, Jayesh R; Tejada, Frederick R; Lang, Lynn A; Purnell, Miriam; Acedera, Lisa; Ngonga, Ferdinand

    2015-08-25

    To determine if there is a correlation between TOEFL and other admissions criteria that assess communications skills (ie, PCAT variables: verbal, reading, essay, and composite), interview, and observational scores and to evaluate TOEFL and these admissions criteria as predictors of academic performance. Statistical analyses included two sample t tests, multiple regression and Pearson's correlations for parametric variables, and Mann-Whitney U for nonparametric variables, which were conducted on the retrospective data of 162 students, 57 of whom were foreign-born. The multiple regression model of the other admissions criteria on TOEFL was significant. There was no significant correlation between TOEFL scores and academic performance. However, significant correlations were found between the other admissions criteria and academic performance. Since TOEFL is not a significant predictor of either communication skills or academic success of foreign-born PharmD students in the program, it may be eliminated as an admissions criterion.

  6. Confidence limits for data mining models of options prices

    NASA Astrophysics Data System (ADS)

    Healy, J. V.; Dixon, M.; Read, B. J.; Cai, F. F.

    2004-12-01

    Non-parametric methods such as artificial neural nets can successfully model prices of financial options, out-performing the Black-Scholes analytic model (Eur. Phys. J. B 27 (2002) 219). However, the accuracy of such approaches is usually expressed only by a global fitting/error measure. This paper describes a robust method for determining prediction intervals for models derived by non-linear regression. We have demonstrated it by application to a standard synthetic example (29th Annual Conference of the IEEE Industrial Electronics Society, Special Session on Intelligent Systems, pp. 1926-1931). The method is used here to obtain prediction intervals for option prices using market data for LIFFE “ESX” FTSE 100 index options ( http://www.liffe.com/liffedata/contracts/month_onmonth.xls). We avoid special neural net architectures and use standard regression procedures to determine local error bars. The method is appropriate for target data with non constant variance (or volatility).

  7. Construction of reactive potential energy surfaces with Gaussian process regression: active data selection

    NASA Astrophysics Data System (ADS)

    Guan, Yafu; Yang, Shuo; Zhang, Dong H.

    2018-04-01

    Gaussian process regression (GPR) is an efficient non-parametric method for constructing multi-dimensional potential energy surfaces (PESs) for polyatomic molecules. Since not only the posterior mean but also the posterior variance can be easily calculated, GPR provides a well-established model for active learning, through which PESs can be constructed more efficiently and accurately. We propose a strategy of active data selection for the construction of PESs with emphasis on low energy regions. Through three-dimensional (3D) example of H3, the validity of this strategy is verified. The PESs for two prototypically reactive systems, namely, H + H2O ↔ H2 + OH reaction and H + CH4 ↔ H2 + CH3 reaction are reconstructed. Only 920 and 4000 points are assembled to reconstruct these two PESs respectively. The accuracy of the GP PESs is not only tested by energy errors but also validated by quantum scattering calculations.

  8. Computation of nonlinear least squares estimator and maximum likelihood using principles in matrix calculus

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.

    2017-11-01

    This paper uses matrix calculus techniques to obtain Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and Linear Pseudo model for nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedatistic regression model. Lemcoff [3] discussed a procedure to get linear pseudo model for nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et.al used empirical process techniques to study the asymptotic of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation

  9. Multivariate Drought Characterization in India for Monitoring and Prediction

    NASA Astrophysics Data System (ADS)

    Sreekumaran Unnithan, P.; Mondal, A.

    2016-12-01

    Droughts are one of the most important natural hazards that affect the society significantly in terms of mortality and productivity. The metric that is most widely used by the India Meteorological Department (IMD) to monitor and predict the occurrence, spread, intensification and termination of drought is based on the univariate Standardized Precipitation Index (SPI). However, droughts may be caused by the influence and interaction of many variables (such as precipitation, soil moisture, runoff, etc.), emphasizing the need for a multivariate approach for drought characterization. This study advocates and illustrates use of the recently proposed multivariate standardized drought index (MSDI) in monitoring and prediction of drought and assessing its concerned risk in the Indian region. MSDI combines information from multiple sources: precipitation and soil moisture, and has been deemed to be a more reliable drought index. All-India monthly rainfall and soil moisture data sets are analysed for the period 1980 to 2014 to characterize historical droughts using both the univariate indices, the precipitation-based SPI and the standardized soil moisture index (SSI), as well as the multivariate MSDI using parametric and non-parametric approaches. We confirm that MSDI can capture droughts of 1986 and 1990 that aren't detected by using SPI alone. Moreover, in 1987, MSDI indicated a higher severity of drought when a deficiency in both soil moisture and precipitation was encountered. Further, this study also explores the use of MSDI for drought forecasts and assesses its performance vis-à-vis existing predictions from the IMD. Future research efforts will be directed towards formulating a more robust standardized drought indicator that can take into account socio-economic aspects that also play a key role for water-stressed regions such as India.

  10. Rex fortran 4 system for combinatorial screening or conventional analysis of multivariate regressions

    Treesearch

    L.R. Grosenbaugh

    1967-01-01

    Describes an expansible computerized system that provides data needed in regression or covariance analysis of as many as 50 variables, 8 of which may be dependent. Alternatively, it can screen variously generated combinations of independent variables to find the regression with the smallest mean-squared-residual, which will be fitted if desired. The user can easily...

  11. Multivariate logistic regression for predicting total culturable virus presence at the intake of a potable-water treatment plant: novel application of the atypical coliform/total coliform ratio.

    PubMed

    Black, L E; Brion, G M; Freitas, S J

    2007-06-01

    Predicting the presence of enteric viruses in surface waters is a complex modeling problem. Multiple water quality parameters that indicate the presence of human fecal material, the load of fecal material, and the amount of time fecal material has been in the environment are needed. This paper presents the results of a multiyear study of raw-water quality at the inlet of a potable-water plant that related 17 physical, chemical, and biological indices to the presence of enteric viruses as indicated by cytopathic changes in cell cultures. It was found that several simple, multivariate logistic regression models that could reliably identify observations of the presence or absence of total culturable virus could be fitted. The best models developed combined a fecal age indicator (the atypical coliform [AC]/total coliform [TC] ratio), the detectable presence of a human-associated sterol (epicoprostanol) to indicate the fecal source, and one of several fecal load indicators (the levels of Giardia species cysts, coliform bacteria, and coprostanol). The best fit to the data was found when the AC/TC ratio, the presence of epicoprostanol, and the density of fecal coliform bacteria were input into a simple, multivariate logistic regression equation, resulting in 84.5% and 78.6% accuracies for the identification of the presence and absence of total culturable virus, respectively. The AC/TC ratio was the most influential input variable in all of the models generated, but producing the best prediction required additional input related to the fecal source and the fecal load. The potential for replacing microbial indicators of fecal load with levels of coprostanol was proposed and evaluated by multivariate logistic regression modeling for the presence and absence of virus.

  12. A regressive methodology for estimating missing data in rainfall daily time series

    NASA Astrophysics Data System (ADS)

    Barca, E.; Passarella, G.

    2009-04-01

    The "presence" of gaps in environmental data time series represents a very common, but extremely critical problem, since it can produce biased results (Rubin, 1976). Missing data plagues almost all surveys. The problem is how to deal with missing data once it has been deemed impossible to recover the actual missing values. Apart from the amount of missing data, another issue which plays an important role in the choice of any recovery approach is the evaluation of "missingness" mechanisms. When data missing is conditioned by some other variable observed in the data set (Schafer, 1997) the mechanism is called MAR (Missing at Random). Otherwise, when the missingness mechanism depends on the actual value of the missing data, it is called NCAR (Not Missing at Random). This last is the most difficult condition to model. In the last decade interest arose in the estimation of missing data by using regression (single imputation). More recently multiple imputation has become also available, which returns a distribution of estimated values (Scheffer, 2002). In this paper an automatic methodology for estimating missing data is presented. In practice, given a gauging station affected by missing data (target station), the methodology checks the randomness of the missing data and classifies the "similarity" between the target station and the other gauging stations spread over the study area. Among different methods useful for defining the similarity degree, whose effectiveness strongly depends on the data distribution, the Spearman correlation coefficient was chosen. Once defined the similarity matrix, a suitable, nonparametric, univariate, and regressive method was applied in order to estimate missing data in the target station: the Theil method (Theil, 1950). Even though the methodology revealed to be rather reliable an improvement of the missing data estimation can be achieved by a generalization. A first possible improvement consists in extending the univariate technique to the multivariate approach. Another approach follows the paradigm of the "multiple imputation" (Rubin, 1987; Rubin, 1988), which consists in using a set of "similar stations" instead than the most similar. This way, a sort of estimation range can be determined allowing the introduction of uncertainty. Finally, time series can be grouped on the basis of monthly rainfall rates defining classes of wetness (i.e.: dry, moderately rainy and rainy), in order to achieve the estimation using homogeneous data subsets. We expect that integrating the methodology with these enhancements will certainly improve its reliability. The methodology was applied to the daily rainfall time series data registered in the Candelaro River Basin (Apulia - South Italy) from 1970 to 2001. REFERENCES D.B., Rubin, 1976. Inference and Missing Data. Biometrika 63 581-592 D.B. Rubin, 1987. Multiple Imputation for Nonresponce in Surveys, New York: John Wiley & Sons, Inc. D.B. Rubin, 1988. An overview of multiple imputation. In Survey Research Section, pp. 79-84, American Statistical Association, 1988. J.L., Schafer, 1997. Analysis of Incomplete Multivariate Data, Chapman & Hall. J., Scheffer, 2002. Dealing with Missing Data. Res. Lett. Inf. Math. Sci. 3, 153-160. Available online at http://www.massey.ac.nz/~wwiims/research/letters/ H. Theil, 1950. A rank-invariant method of linear and polynomial regression analysis. Indicationes Mathematicae, 12, pp.85-91.

  13. SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.

    PubMed

    Chu, Annie; Cui, Jenny; Dinov, Ivo D

    2009-03-01

    The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test.The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website.In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.

  14. Application of artificial neural network to fMRI regression analysis.

    PubMed

    Misaki, Masaya; Miyauchi, Satoru

    2006-01-15

    We used an artificial neural network (ANN) to detect correlations between event sequences and fMRI (functional magnetic resonance imaging) signals. The layered feed-forward neural network, given a series of events as inputs and the fMRI signal as a supervised signal, performed a non-linear regression analysis. This type of ANN is capable of approximating any continuous function, and thus this analysis method can detect any fMRI signals that correlated with corresponding events. Because of the flexible nature of ANNs, fitting to autocorrelation noise is a problem in fMRI analyses. We avoided this problem by using cross-validation and an early stopping procedure. The results showed that the ANN could detect various responses with different time courses. The simulation analysis also indicated an additional advantage of ANN over non-parametric methods in detecting parametrically modulated responses, i.e., it can detect various types of parametric modulations without a priori assumptions. The ANN regression analysis is therefore beneficial for exploratory fMRI analyses in detecting continuous changes in responses modulated by changes in input values.

  15. Nonparametric methods for drought severity estimation at ungauged sites

    NASA Astrophysics Data System (ADS)

    Sadri, S.; Burn, D. H.

    2012-12-01

    The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.

  16. Experimentally testing the dependence of momentum transport on second derivatives using Gaussian process regression

    NASA Astrophysics Data System (ADS)

    Chilenski, M. A.; Greenwald, M. J.; Hubbard, A. E.; Hughes, J. W.; Lee, J. P.; Marzouk, Y. M.; Rice, J. E.; White, A. E.

    2017-12-01

    It remains an open question to explain the dramatic change in intrinsic rotation induced by slight changes in electron density (White et al 2013 Phys. Plasmas 20 056106). One proposed explanation is that momentum transport is sensitive to the second derivatives of the temperature and density profiles (Lee et al 2015 Plasma Phys. Control. Fusion 57 125006), but it is widely considered to be impossible to measure these higher derivatives. In this paper, we show that it is possible to estimate second derivatives of electron density and temperature using a nonparametric regression technique known as Gaussian process regression. This technique avoids over-constraining the fit by not assuming an explicit functional form for the fitted curve. The uncertainties, obtained rigorously using Markov chain Monte Carlo sampling, are small enough that it is reasonable to explore hypotheses which depend on second derivatives. It is found that the differences in the second derivatives of n{e} and T{e} between the peaked and hollow rotation cases are rather small, suggesting that changes in the second derivatives are not likely to explain the experimental results.

  17. Enhanced ID Pit Sizing Using Multivariate Regression Algorithm

    NASA Astrophysics Data System (ADS)

    Krzywosz, Kenji

    2007-03-01

    EPRI is funding a program to enhance and improve the reliability of inside diameter (ID) pit sizing for balance-of plant heat exchangers, such as condensers and component cooling water heat exchangers. More traditional approaches to ID pit sizing involve the use of frequency-specific amplitude or phase angles. The enhanced multivariate regression algorithm for ID pit depth sizing incorporates three simultaneous input parameters of frequency, amplitude, and phase angle. A set of calibration data sets consisting of machined pits of various rounded and elongated shapes and depths was acquired in the frequency range of 100 kHz to 1 MHz for stainless steel tubing having nominal wall thickness of 0.028 inch. To add noise to the acquired data set, each test sample was rotated and test data acquired at 3, 6, 9, and 12 o'clock positions. The ID pit depths were estimated using a second order and fourth order regression functions by relying on normalized amplitude and phase angle information from multiple frequencies. Due to unique damage morphology associated with the microbiologically-influenced ID pits, it was necessary to modify the elongated calibration standard-based algorithms by relying on the algorithm developed solely from the destructive sectioning results. This paper presents the use of transformed multivariate regression algorithm to estimate ID pit depths and compare the results with the traditional univariate phase angle analysis. Both estimates were then compared with the destructive sectioning results.

  18. Determination of boiling point of petrochemicals by gas chromatography-mass spectrometry and multivariate regression analysis of structural activity relationship.

    PubMed

    Fakayode, Sayo O; Mitchell, Breanna S; Pollard, David A

    2014-08-01

    Accurate understanding of analyte boiling points (BP) is of critical importance in gas chromatographic (GC) separation and crude oil refinery operation in petrochemical industries. This study reported the first combined use of GC separation and partial-least-square (PLS1) multivariate regression analysis of petrochemical structural activity relationship (SAR) for accurate BP determination of two commercially available (D3710 and MA VHP) calibration gas mix samples. The results of the BP determination using PLS1 multivariate regression were further compared with the results of traditional simulated distillation method of BP determination. The developed PLS1 regression was able to correctly predict analytes BP in D3710 and MA VHP calibration gas mix samples, with a root-mean-square-%-relative-error (RMS%RE) of 6.4%, and 10.8% respectively. In contrast, the overall RMS%RE of 32.9% and 40.4%, respectively obtained for BP determination in D3710 and MA VHP using a traditional simulated distillation method were approximately four times larger than the corresponding RMS%RE of BP prediction using MRA, demonstrating the better predictive ability of MRA. The reported method is rapid, robust, and promising, and can be potentially used routinely for fast analysis, pattern recognition, and analyte BP determination in petrochemical industries. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. Inference for multivariate regression model based on multiply imputed synthetic data generated via posterior predictive sampling

    NASA Astrophysics Data System (ADS)

    Moura, Ricardo; Sinha, Bimal; Coelho, Carlos A.

    2017-06-01

    The recent popularity of the use of synthetic data as a Statistical Disclosure Control technique has enabled the development of several methods of generating and analyzing such data, but almost always relying in asymptotic distributions and in consequence being not adequate for small sample datasets. Thus, a likelihood-based exact inference procedure is derived for the matrix of regression coefficients of the multivariate regression model, for multiply imputed synthetic data generated via Posterior Predictive Sampling. Since it is based in exact distributions this procedure may even be used in small sample datasets. Simulation studies compare the results obtained from the proposed exact inferential procedure with the results obtained from an adaptation of Reiters combination rule to multiply imputed synthetic datasets and an application to the 2000 Current Population Survey is discussed.

  20. Functional Groups Based on Leaf Physiology: Are they Spatially and Temporally Robust?

    NASA Technical Reports Server (NTRS)

    Foster, Tammy E.; Brooks, J. Renee

    2004-01-01

    The functional grouping hypothesis, which suggests that complexity in ecosystem function can be simplified by grouping species with similar responses, was tested in the Florida scrub habitat. Functional groups were identified based on how species in fire maintained Florida scrub regulate exchange of carbon and water with the atmosphere as indicated by both instantaneous gas exchange measurements and integrated measures of function (%N, delta C-13, delta N-15, C-N ratio). Using cluster analysis, five distinct physiologically-based functional groups were identified in the fire maintained scrub. These functional groups were tested to determine if they were robust spatially, temporally, and with management regime. Analysis of Similarities (ANOSIM), a non-parametric multivariate analysis, indicated that these five physiologically-based groupings were not altered by plot differences (R = -0.115, p = 0.893) or by the three different management regimes; prescribed burn, mechanically treated and burn, and fire-suppressed (R = 0.018, p = 0.349). The physiological groupings also remained robust between the two climatically different years 1999 and 2000 (R = -0.027, p = 0.725). Easy-to-measure morphological characteristics indicating functional groups would be more practical for scaling and modeling ecosystem processes than detailed gas-exchange measurements, therefore we tested a variety of morphological characteristics as functional indicators. A combination of non-parametric multivariate techniques (Hierarchical cluster analysis, non-metric Multi-Dimensional Scaling, and ANOSIM) were used to compare the ability of life form, leaf thickness, and specific leaf area classifications to identify the physiologically-based functional groups. Life form classifications (ANOSIM; R = 0.629, p 0.001) were able to depict the physiological groupings more adequately than either specific leaf area (ANOSIM; R = 0.426, p = 0.001) or leaf thickness (ANOSIM; R 0.344, p 0.001). The ability of life forms to depict the physiological groupings was improved by separating the parasitic Ximenia americana from the shrub category (ANOSIM; R = 0.794, p = 0.001). Therefore, a life form classification including parasites was determined to be a good indicator of the physiological processes of scrub species, and would be a useful method of grouping for scaling physiological processes to the ecosystem level.

  1. Multivariate random-parameters zero-inflated negative binomial regression model: an application to estimate crash frequencies at intersections.

    PubMed

    Dong, Chunjiao; Clarke, David B; Yan, Xuedong; Khattak, Asad; Huang, Baoshan

    2014-09-01

    Crash data are collected through police reports and integrated with road inventory data for further analysis. Integrated police reports and inventory data yield correlated multivariate data for roadway entities (e.g., segments or intersections). Analysis of such data reveals important relationships that can help focus on high-risk situations and coming up with safety countermeasures. To understand relationships between crash frequencies and associated variables, while taking full advantage of the available data, multivariate random-parameters models are appropriate since they can simultaneously consider the correlation among the specific crash types and account for unobserved heterogeneity. However, a key issue that arises with correlated multivariate data is the number of crash-free samples increases, as crash counts have many categories. In this paper, we describe a multivariate random-parameters zero-inflated negative binomial (MRZINB) regression model for jointly modeling crash counts. The full Bayesian method is employed to estimate the model parameters. Crash frequencies at urban signalized intersections in Tennessee are analyzed. The paper investigates the performance of MZINB and MRZINB regression models in establishing the relationship between crash frequencies, pavement conditions, traffic factors, and geometric design features of roadway intersections. Compared to the MZINB model, the MRZINB model identifies additional statistically significant factors and provides better goodness of fit in developing the relationships. The empirical results show that MRZINB model possesses most of the desirable statistical properties in terms of its ability to accommodate unobserved heterogeneity and excess zero counts in correlated data. Notably, in the random-parameters MZINB model, the estimated parameters vary significantly across intersections for different crash types. Copyright © 2014 Elsevier Ltd. All rights reserved.

  2. Learning investment indicators through data extension

    NASA Astrophysics Data System (ADS)

    Dvořák, Marek

    2017-07-01

    Stock prices in the form of time series were analysed using single and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this single variate time series to a multivariate representation. This method makes use of sliding windows to calculate several dozen of new variables using simple statistic tools like first and second moments as well as more complicated statistic, like auto-regression coefficients and residual analysis, followed by an optional quadratic transformation that was further used for data extension. These were used as a explanatory variables in a regularized logistic LASSO regression which tried to estimate Buy-Sell Index (BSI) from real stock market data.

  3. Advanced statistics: linear regression, part II: multiple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.

  4. Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis.

    PubMed

    Liu, Fei; Ye, Lanhan; Peng, Jiyu; Song, Kunlin; Shen, Tingting; Zhang, Chu; He, Yong

    2018-02-27

    Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R 2 more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where R c 2 and R p 2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice.

  5. Fast Detection of Copper Content in Rice by Laser-Induced Breakdown Spectroscopy with Uni- and Multivariate Analysis

    PubMed Central

    Ye, Lanhan; Song, Kunlin; Shen, Tingting

    2018-01-01

    Fast detection of heavy metals is very important for ensuring the quality and safety of crops. Laser-induced breakdown spectroscopy (LIBS), coupled with uni- and multivariate analysis, was applied for quantitative analysis of copper in three kinds of rice (Jiangsu rice, regular rice, and Simiao rice). For univariate analysis, three pre-processing methods were applied to reduce fluctuations, including background normalization, the internal standard method, and the standard normal variate (SNV). Linear regression models showed a strong correlation between spectral intensity and Cu content, with an R2 more than 0.97. The limit of detection (LOD) was around 5 ppm, lower than the tolerance limit of copper in foods. For multivariate analysis, partial least squares regression (PLSR) showed its advantage in extracting effective information for prediction, and its sensitivity reached 1.95 ppm, while support vector machine regression (SVMR) performed better in both calibration and prediction sets, where Rc2 and Rp2 reached 0.9979 and 0.9879, respectively. This study showed that LIBS could be considered as a constructive tool for the quantification of copper contamination in rice. PMID:29495445

  6. [Use of multiple regression models in observational studies (1970-2013) and requirements of the STROBE guidelines in Spanish scientific journals].

    PubMed

    Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M

    In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic linear Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals author guidelines explicitly recommend to follow the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models in published observational studies such as logistic regression, linear, Cox and Poisson are increasingly used both at international level, as well as in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.

  7. The microbiological profile and presence of bloodstream infection influence mortality rates in necrotizing fasciitis

    PubMed Central

    2011-01-01

    Introduction Necrotizing fasciitis (NF) is a life threatening infectious disease with a high mortality rate. We carried out a microbiological characterization of the causative pathogens. We investigated the correlation of mortality in NF with bloodstream infection and with the presence of co-morbidities. Methods In this retrospective study, we analyzed 323 patients who presented with necrotizing fasciitis at two different institutions. Bloodstream infection (BSI) was defined as a positive blood culture result. The patients were categorized as survivors and non-survivors. Eleven clinically important variables which were statistically significant by univariate analysis were selected for multivariate regression analysis and a stepwise logistic regression model was developed to determine the association between BSI and mortality. Results Univariate logistic regression analysis showed that patients with hypotension, heart disease, liver disease, presence of Vibrio spp. in wound cultures, presence of fungus in wound cultures, and presence of Streptococcus group A, Aeromonas spp. or Vibrio spp. in blood cultures, had a significantly higher risk of in-hospital mortality. Our multivariate logistic regression analysis showed a higher risk of mortality in patients with pre-existing conditions like hypotension, heart disease, and liver disease. Multivariate logistic regression analysis also showed that presence of Vibrio spp in wound cultures, and presence of Streptococcus Group A in blood cultures were associated with a high risk of mortality while debridement > = 3 was associated with improved survival. Conclusions Mortality in patients with necrotizing fasciitis was significantly associated with the presence of Vibrio in wound cultures and Streptococcus group A in blood cultures. PMID:21693053

  8. The Heart´s rhythm 'n' blues: Sex differences in circadian variation patterns of vagal activity vary by depressive symptoms in predominantly healthy employees.

    PubMed

    Jarczok, Marc N; Aguilar-Raab, Corina; Koenig, Julian; Kaess, Michael; Borniger, Jeremy C; Nelson, Randy J; Hall, Martica; Ditzen, Beate; Thayer, Julian F; Fischer, Joachim E

    2018-03-15

    Successful regulation of emotional states is positively associated to mental health, while difficulties in regulating emotions are negatively associated to overall mental health and in particular associated with anxiety or depression symptoms. A key structure associated to socio-emotional regulatory processes is the central autonomic network. Activity in this structure is associated to vagal activity can be indexed noninvasively and simply by measures of peripheral cardiac autonomic modulations such as heart rate variability. Vagal activity exhibits a circadian variation pattern, with a maximum during nighttime. Depression is known to affect chronobiology. Also, depressive symptoms are known to be associated with decreased resting state vagal activity, but studies investigating the association between circadian variation pattern of vagal activity and depressive symptoms are scarce. We aim to examine these patterns in association to symptom severity of depression using chronobiologic methods. Data from the Manheim Industrial Cohort Studies (MICS) were used. A total of 3,030 predominantly healthy working adults underwent, among others, ambulatory 24-h hear rate-recordings, detailed health examination and online questionnaires and were available for this analysis. The root mean sum of successive differences (RMSSD) was used as an indicator of vagally mediated heart rate variability. Three individual-level cosine function parameters (MESOR, amplitude, acrophase) were estimated to quantify circadian variation pattern. Multivariate linear regression models including important covariates such as age, sex, and lifestyle factors as well as an interaction effect of sex with depressive symptoms were used to estimate the association of circadian variation pattern of vagal activity with depressive symptoms simultaneously. The analysis sample consisted of 20.2% females and an average age 41 with standard deviation of 11 years. Nonparametric bivariate analysis revealed significant MESOR and amplitude differences between the 90 th percentile split, but not on acrophase. Multivariate linear regression models estimated depressive symptoms to be negatively associated with the 24h mean (MESOR) and oscillation amplitude in men but positively associated in women. This pattern of findings indicates a blunted day-night rhythm of vagal activity in men with greater depressive symptoms as well as a moderation effect of sex in the association of CVP and depressive symptoms. This is the first study investigating circadian variation pattern by mild depressive symptoms in a large, rather healthy occupational sample. Depressive symptoms were associated with decreased circadian variation pattern of vagal activity in men but with increased circadian variation pattern in women. The possible underlying mechanism(s) are discussed using the neurovisceral integration model. These findings may have implications for the knowledge on etiology, diagnosis, course, and treatment of depressive symptoms and thus may be of significant public health relevance.

  9. An institutional study of time delays for symptomatic carotid endarterectomy.

    PubMed

    Charbonneau, Philippe; Bonaventure, Paule Lessard; Drudi, Laura M; Beaudoin, Nathalie; Blair, Jean-François; Elkouri, Stéphane

    2016-12-01

    The aim of this study was to assess time delays between first cerebrovascular symptoms and carotid endarterectomy (CEA) at a single center and to systematically evaluate causes of these delays. Consecutive adult patients who underwent CEAs between January 2010 and September 2011 at a single university-affiliated center (Centre Hospitalier de l'Université Montréal-Hôtel-Dieu Hospital, Montreal) were identified from a clinical database and operative records. Covariates of interest were extracted from electronic medical records. Timing and nature of the first cerebrovascular symptoms were also documented. The first medical contact and pathway of referral were also assessed. When possible, the ABCD 2 score (age, blood pressure, clinical features, duration of symptoms, and diabetes) was calculated to calculate further risk of stroke. The nonparametric Wilcoxon test was used to assess differences in time intervals between two variables. The Kruskal-Wallis test was used to assess differences in time intervals in comparing more than two variables. A multivariate linear regression analysis was performed using covariates that were determined to be statistically significant in our sensitivity analyses. The cohort consisted of 111 patients with documented symptomatic carotid stenosis undergoing surgical intervention. Thirty-nine percent of all patients were operated on within 2 weeks from the first cerebrovascular symptoms. The median time between the occurrence of the first neurologic symptom and the CEA procedure was 25 (interquartile range [IQR], 11-85) days. The patient-dependent delay, defined as the median delay between the first neurologic symptom and the first medical contact, was 1 (IQR, 0-14) day. The medical-dependent delay was defined as the time interval between the first medical contact and CEA. This included the delay between the first medical contact and the request for surgery consultation (median, 3 [IQR, 1-10] days). The multivariate regression model demonstrated that the emergency physician as referral source (P = .0002) was statistically significant for reducing CEA delay. Patients who were investigated as an outpatient (P = .02), first medical contact with a general practitioner (P = .0002), and hospital center I as referral center (P = .045) were also found to be statistically significant to extend CEA delay when the model was adjusted over all covariates. In this center, there was no correlation between ABCD 2 risk score and waiting time for surgery. The majority of our cohort falls short of the recommended 2-week interval to perform CEA. Factors contributing to reduced CEA delay were presentation to an emergency department, in-patient investigations, and a stroke center where a vascular surgeon is available. Copyright © 2016 Society for Vascular Surgery. Published by Elsevier Inc. All rights reserved.

  10. Using Logistic Regression To Predict the Probability of Debris Flows Occurring in Areas Recently Burned By Wildland Fires

    USGS Publications Warehouse

    Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.

    2003-01-01

    Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.

  11. Long-term Impact of Adjuvant Versus Early Salvage Radiation Therapy in pT3N0 Prostate Cancer Patients Treated with Radical Prostatectomy: Results from a Multi-institutional Series.

    PubMed

    Fossati, Nicola; Karnes, R Jeffrey; Boorjian, Stephen A; Moschini, Marco; Morlacco, Alessandro; Bossi, Alberto; Seisen, Thomas; Cozzarini, Cesare; Fiorino, Claudio; Noris Chiorda, Barbara; Gandaglia, Giorgio; Dell'Oglio, Paolo; Joniau, Steven; Tosco, Lorenzo; Shariat, Shahrokh; Goldner, Gregor; Hinkelbein, Wolfgang; Bartkowiak, Detlef; Haustermans, Karin; Tombal, Bertrand; Montorsi, Francesco; Van Poppel, Hein; Wiegel, Thomas; Briganti, Alberto

    2017-06-01

    Three prospective randomised trials reported discordant findings regarding the impact of adjuvant radiation therapy (aRT) versus observation for metastasis-free survival (MFS) and overall survival (OS) among patients with pT3N0 prostate cancer treated with radical prostatectomy (RP). None of these trials systematically included patients who underwent early salvage radiation therapy (esRT). To test the hypothesis that aRT was associated with better cancer control and survival compared with observation followed by esRT. Using a multi-institutional cohort from seven tertiary referral centres, we retrospectively identified 510 pT3pN0 patients with undetectable prostate-specific antigen (PSA) after RP between 1996 and 2009. Patients were stratified into two groups: aRT (group 1) versus observation followed by esRT in case of PSA relapse (group 2). Specifically, esRT was administered at a PSA level ≤0.5ng/ml. We compared aRT versus observation followed by esRT. The evaluated outcomes were MFS and OS. Multivariable Cox regression analyses tested the association between groups (aRT vs observation followed by esRT) and oncologic outcomes. Covariates consisted of pathologic stage (pT3a vs pT3b or higher), pathologic Gleason score (≤6, 7, or ≥8), surgical margin status (negative vs positive), and year of surgery. An interaction with groups and baseline patient risk was tested for the hypothesis that the impact of aRT versus observation followed by esRT was different by pathologic characteristics. The nonparametric curve fitting method was used to explore graphically the relationship between MFS and OS at 8 yr and baseline patient risk (derived from the multivariable analysis). Overall, 243 patients (48%) underwent aRT, and 267 (52%) underwent initial observation. Within the latter group, 141 patients experienced PSA relapse and received esRT. Median follow-up after RP was 94 mo (interquartile range [IQR]: 53-126) and 92 mo (IQR: 70-136), respectively (p=0.2). MFS (92% vs 91%; p=0.9) and OS (89% vs 92%; p=0.9) at 8 yr after surgery were not significantly different between the two groups. These results were confirmed in multivariable analysis, in which observation followed by esRT was not associated with a significantly higher risk of distant metastasis (hazard ratio [HR]: 1.35; p=0.4) and overall mortality (HR: 1.39; p=0.4) compared with aRT. Using the nonparametric curve fitting method, a comparable proportion of MFS and OS at 8 yr among groups was observed regardless of pathologic cancer features (p=0.9 and p=0.7, respectively). Limitations consisted of the retrospective nature of the study and the relatively small size of the patient population. At long-term follow-up, no significant differences between aRT and esRT were observed for MFS and OS. Our study, although based on retrospective data, suggests that esRT does not compromise cancer control and potentially reduces overtreatment associated with aRT. At long-term follow-up, no significant differences in terms of distant metastasis and mortality were observed between immediate postoperative adjuvant radiation therapy (aRT) and initial observation followed by early salvage radiation therapy (esRT) in case of prostate-specific antigen relapse. Our study suggests that esRT does not compromise cancer control and potentially reduces overtreatment associated with aRT. Copyright © 2016 European Association of Urology. Published by Elsevier B.V. All rights reserved.

  12. The association between pelvic girdle pain and sick leave during pregnancy; a retrospective study of a Norwegian population.

    PubMed

    Malmqvist, Stefan; Kjaermann, Inger; Andersen, Knut; Økland, Inger; Larsen, Jan Petter; Brønnick, Kolbjørn

    2015-10-05

    The incidence of pelvic girdle pain (PGP) in pregnancy is wide ranged depending on definition, the utilised diagnostic means, and the design of the studies. PGP during pregnancy has negative effects on activities of daily living and causes long sick leave, which makes it a major public health issue. Our objectives were to explore the frequency of sick leave in pregnancy due to PGP, assess the relationship between different types of pain-related activities of daily living, examine physical workload, type of work in relation to sick leave, and to explore factors that make women less likely to take sick leave for PGP. All women giving birth at the maternity ward of Stavanger University Hospital, Norway, were asked to participate and complete a questionnaire on demographic features, PGP, pain-related activities of daily living, sick leave in general and for PGP, frequency of exercising before and during pregnancy. Drawings of pelvic girdle and low back area were used for the localization of pain. PGP intensity was then rated retrospectively on a numerical rating scale. Non-parametric tests, multinomial logistic regression and sequential linear regression analysis were used in the statistical analysis. PGP is a frequent and major cause of sick leave during pregnancy among Norwegian women, which is also reflected in activities of daily living as measured with scores on all Oswestry disability index items. In the multivariate analysis of factors related to sick leave and PGP we found that work satisfaction, problems with lifting and sleeping, and pain intensity were risk factors for sick leave. In addition, women with longer education, higher work satisfaction and fewer problems with sitting, walking and standing, were less likely to take sick leave in pregnancy, despite the same pain intensity as women being on sick leave. A coping factor in pregnant women with PGP was discovered, most likely dependant on education, associated with work situation and/or work posture, which decreases sick leave. We recommend these issues to be further examined in a prospective longitudinal study since it may have important implications for sick leave frequency during pregnancy.

  13. Hospital Care for Mental Health and Substance Abuse in Children with Epilepsy

    PubMed Central

    Thibault, Dylan P.; Mendizabal, Adys; Abend, Nicholas S.; Davis, Kathryn A.; Crispo, James; Willis, Allison. W.

    2017-01-01

    Background Reducing the burden of pediatric mental illness requires greater knowledge of Mental Health and Substance Abuse (MHSA) outcomes in children who are at an increased risk of primary psychiatric illness. National data on hospital care for psychiatric illness in children with epilepsy are limited. Methods We used the Kids’ Inpatient Database (KID), Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality, from 2003-2009 to examine MHSA hospitalization patterns in children with comorbid epilepsy. Non-parametric and regression analyses determined the association of comorbid epilepsy with specific MHSA diagnoses and examined the impact of epilepsy on length of stay (LOS) for such MHSA diagnoses while controlling for demographic, payer and hospital characteristics. Results We observed 353,319 weighted MHSA hospitalizations of children ages 6 - 20; 3,280 of these involved a child with epilepsy. Depression was the most common MHSA diagnosis in the general population (39.5%) whereas bipolar disorder was the most common MHSA diagnosis among children with epilepsy (36.2%). Multivariate logistic regression models revealed that children with comorbid epilepsy had greater adjusted odds of bipolar disorder (AOR 1.17, 1.04-1.30), psychosis (AOR 1.78, 1.51-2.09), sleep disorder (AOR 5.90, 1.90-18.34), and suicide attempt/ ideation (AOR 3.20, 1.46-6.99) compared to the general MHSA inpatient population. Epilepsy was associated with a greater LOS and a higher adjusted incidence rate ratio (IRR) for prolonged LOS (IRR 1.12, 1.09-1.17), particularly for suicide attempt/ideation (IRR 3.74, 1.68-8.34). Conclusions Children with epilepsy have distinct patterns of hospital care for mental illness and substance abuse, and experience prolonged hospitalization for MHSA conditions. Strategies to reduce psychiatric hospitalizations in this population may require disease specific approaches and should measure disease relevant outcomes. Hospitals caring for large numbers of children with neurological disease (such as academic centers) may have inaccurate measurements of mental health care quality unless the impact of key comorbid conditions such as epilepsy is considered. PMID:26963820

  14. Radiation-Induced Liver Injury in Three-Dimensional Conformal Radiation Therapy (3D-CRT) for Postoperative or Locoregional Recurrent Gastric Cancer: Risk Factors and Dose Limitations.

    PubMed

    Li, Guichao; Wang, Jiazhou; Hu, Weigang; Zhang, Zhen

    2015-01-01

    This study examined the status of radiation-induced liver injury in adjuvant or palliative gastric cancer radiation therapy (RT), identified risk factors of radiation-induced liver injury in gastric cancer RT, analysed the dose-volume effects of liver injury, and developed a liver dose limitation reference for gastric cancer RT. Data for 56 post-operative gastric cancer patients and 6 locoregional recurrent gastric cancer patients treated with three-dimensional conformal radiation therapy (3D-CRT) or intensity-modulated radiation therapy (IMRT) from Sep 2007 to Sep 2009 were analysed. Forty patients (65%) were administered concurrent chemotherapy. Pre- and post-radiation chemotherapy were given to 61 patients and 43 patients, respectively. The radiation dose was 45-50.4 Gy in 25-28 fractions. Clinical parameters, including gender, age, hepatic B virus status, concurrent chemotherapy, and the total number of chemotherapy cycles, were included in the analysis. Univariate analyses with a non-parametric rank test (Mann-Whitney test) and logistic regression test and a multivariate analysis using a logistic regression test were completed. We also analysed the correlation between RT and the changes in serum chemistry parameters [including total bilirubin, (TB), direct bilirubin (D-TB), alkaline phosphatase (ALP), alanine aminotransferase (ALT), aspartate aminotransferase (AST) and serum albumin (ALB)] after RT. The Child-Pugh grade progressed from grade A to grade B after radiotherapy in 10 patients. A total of 16 cases of classic radiation-induced liver disease (RILD) were observed, and 2 patients had both Child-Pugh grade progression and classic RILD. No cases of non-classic radiation liver injury occurred in the study population. Among the tested clinical parameters, the total number of chemotherapy cycles correlated with liver function injury. V35 and ALP levels were significant predictive factors for radiation liver injury. In 3D-CRT for gastric cancer patients, radiation-induced liver injury may occur and affect the overall treatment plan. The total number of chemotherapy cycles correlated with liver function injury, and V35 and ALP are significant predictive factors for radiation-induced liver injury. Our dose limitation reference for liver protection is feasible.

  15. Hospital care for mental health and substance abuse in children with epilepsy.

    PubMed

    Thibault, Dylan P; Mendizabal, Adys; Abend, Nicholas S; Davis, Kathryn A; Crispo, James; Willis, Allison W

    2016-04-01

    Reducing the burden of pediatric mental illness requires greater knowledge of mental health and substance abuse (MHSA) outcomes in children who are at an increased risk of primary psychiatric illness. National data on hospital care for psychiatric illness in children with epilepsy are limited. We used the Kids' Inpatient Database (KID), the Healthcare Cost and Utilization Project (HCUP), and the Agency for Healthcare Research and Quality from 2003 to 2009 to examine MHSA hospitalization patterns in children with comorbid epilepsy. Nonparametric and regression analyses determined the association of comorbid epilepsy with specific MHSA diagnoses and examined the impact of epilepsy on length of stay (LOS) for such MHSA diagnoses while controlling for demographic, payer, and hospital characteristics. We observed 353,319 weighted MHSA hospitalizations of children ages 6-20; 3280 of these involved a child with epilepsy. Depression was the most common MHSA diagnosis in the general population (39.5%) whereas bipolar disorder was the most common MHSA diagnosis among children with epilepsy (36.2%). Multivariate logistic regression models revealed that children with comorbid epilepsy had greater adjusted odds of bipolar disorder (AOR: 1.17, 1.04-1.30), psychosis (AOR: 1.78, 1.51-2.09), sleep disorder (AOR: 5.90, 1.90-18.34), and suicide attempt/ideation (AOR: 3.20, 1.46-6.99) compared to the general MHSA inpatient population. Epilepsy was associated with a greater LOS and a higher adjusted incidence rate ratio (IRR) for prolonged LOS (IRR: 1.12, 1.09-1.17), particularly for suicide attempt/ideation (IRR: 3.74, 1.68-8.34). Children with epilepsy have distinct patterns of hospital care for mental illness and substance abuse and experience prolonged hospitalization for MHSA conditions. Strategies to reduce psychiatric hospitalizations in this population may require disease-specific approaches and should measure disease-relevant outcomes. Hospitals caring for large numbers of children with neurological disease (such as academic centers) may have inaccurate measurements of mental health-care quality unless the impact of key comorbid conditions such as epilepsy is considered. Copyright © 2016 Elsevier Inc. All rights reserved.

  16. Improved estimation of PM2.5 using Lagrangian satellite-measured aerosol optical depth

    NASA Astrophysics Data System (ADS)

    Olivas Saunders, Rolando

    Suspended particulate matter (aerosols) with aerodynamic diameters less than 2.5 mum (PM2.5) has negative effects on human health, plays an important role in climate change and also causes the corrosion of structures by acid deposition. Accurate estimates of PM2.5 concentrations are thus relevant in air quality, epidemiology, cloud microphysics and climate forcing studies. Aerosol optical depth (AOD) retrieved by the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument has been used as an empirical predictor to estimate ground-level concentrations of PM2.5 . These estimates usually have large uncertainties and errors. The main objective of this work is to assess the value of using upwind (Lagrangian) MODIS-AOD as predictors in empirical models of PM2.5. The upwind locations of the Lagrangian AOD were estimated using modeled backward air trajectories. Since the specification of an arrival elevation is somewhat arbitrary, trajectories were calculated to arrive at four different elevations at ten measurement sites within the continental United States. A systematic examination revealed trajectory model calculations to be sensitive to starting elevation. With a 500 m difference in starting elevation, the 48-hr mean horizontal separation of trajectory endpoints was 326 km. When the difference in starting elevation was doubled and tripled to 1000 m and 1500m, the mean horizontal separation of trajectory endpoints approximately doubled and tripled to 627 km and 886 km, respectively. A seasonal dependence of this sensitivity was also found: the smallest mean horizontal separation of trajectory endpoints was exhibited during the summer and the largest separations during the winter. A daily average AOD product was generated and coupled to the trajectory model in order to determine AOD values upwind of the measurement sites during the period 2003-2007. Empirical models that included in situ AOD and upwind AOD as predictors of PM2.5 were generated by multivariate linear regressions using the least squares method. The multivariate models showed improved performance over the single variable regression (PM2.5 and in situ AOD) models. The statistical significance of the improvement of the multivariate models over the single variable regression models was tested using the extra sum of squares principle. In many cases, even when the R-squared was high for the multivariate models, the improvement over the single models was not statistically significant. The R-squared of these multivariate models varied with respect to seasons, with the best performance occurring during the summer months. A set of seasonal categorical variables was included in the regressions to exploit this variability. The multivariate regression models that included these categorical seasonal variables performed better than the models that didn't account for seasonal variability. Furthermore, 71% of these regressions exhibited improvement over the single variable models that was statistically significant at a 95% confidence level.

  17. Multivariate regression model for partitioning tree volume of white oak into round-product classes

    Treesearch

    Daniel A. Yaussy; David L. Sonderman

    1984-01-01

    Describes the development of multivariate equations that predict the expected cubic volume of four round-product classes from independent variables composed of individual tree-quality characteristics. Although the model has limited application at this time, it does demonstrate the feasibility of partitioning total tree cubic volume into round-product classes based on...

  18. Prognostic value of long noncoding RNA HOTAIR in digestive system malignancies.

    PubMed

    Wang, Shuai; Wang, Zhou

    2015-07-01

    HOX transcript antisense intergenic RNA (HOTAIR), a well-known long noncoding RNA, has been found to play significant roles in several tumors. However, the clinical application value of HOTAIR in digestive system malignancies remains to be clarified. We aimed to explore comprehensively the potential role of HOTAIR as a prognostic indicator in digestive system malignancies. Systematic search was performed in Pubmed, Embase, Cochrane Library, and Web of Science until July 5, 2014. A quantitative meta-analysis was conducted with standard statistical methods for eligible papers on the prognostic value of HOTAIR in digestive system cancers. A total of 1059 patients from 13 studies were included in the meta-analysis. A significant association was found between HOTAIR abundance and poor overall survival (OS) of patients with digestive system malignancies, with pooled hazard ratio (HR) of 2.587 (95% confidence interval [CI]: 2.054-3.259, P < 0.001). By combining HRs from Cox multivariate analyses, we found HOTAIR was an independent prognostic factor for OS without obvious heterogeneity (HR: 2.405, 95% CI: 1.883-3.0722, P < 0.001). Subgroup analysis showed that tumor type, histology type, region, publication year, sample size, and quality score did not alter the predictive value of HOTAIR as an independent factor for survival. Meta-regression and sensitivity analysis both suggested the reliability of our findings. A slight publication bias was observed. After adjustment by nonparametric "trim-and-fill" method, the corrected HRs had no significant change. HOTAIR could be exploited as a novel prognostic biomarker for patients with digestive system malignancies. © 2015 Journal of Gastroenterology and Hepatology Foundation and Wiley Publishing Asia Pty Ltd.

  19. Comparative Associations of Working Memory and Pain Catastrophizing With Chronic Low Back Pain Intensity.

    PubMed

    Simon, Corey B; Lentz, Trevor A; Bishop, Mark D; Riley, Joseph L; Fillingim, Roger B; George, Steven Z

    2016-07-01

    Because of its high global burden, determining biopsychosocial influences of chronic low back pain (CLBP) is a research priority. Psychological factors such as pain catastrophizing are well established. However, cognitive factors such as working memory warrant further investigation to be clinically useful. The purpose of this study was to determine how working memory and pain catastrophizing are associated with CLBP measures of daily pain intensity and movement-evoked pain intensity. This study was a cross-sectional analysis of individuals with ≥3 months of CLBP (n=60) compared with pain-free controls (n=30). Participants completed measures of working memory, pain catastrophizing, and daily pain intensity. Movement-evoked pain intensity was assessed using the Back Performance Scale. Outcome measures were compared between individuals with CLBP and those who were pain-free using nonparametric testing. Associations were determined using multivariate regression analyses. Participants with CLBP (mean age=47.7 years, 68% female) had lower working memory performance (P=.008) and higher pain catastrophizing (P<.001) compared with pain-free controls (mean age=47.6 years, 63% female). For individuals with CLBP, only working memory remained associated with daily pain intensity (R(2)=.07, standardized beta=-.308, P=.041) and movement-evoked pain intensity (R(2)=.14, standardized beta=-.502, P=.001) after accounting for age, sex, education, and interactions between pain catastrophizing and working memory. The cross-sectional design prevented prospective analysis. Findings also are not indicative of overall working memory (eg, spatial) or cognitive performance. Working memory demonstrated the strongest association with daily pain and movement-evoked pain intensity compared with (and after accounting for) established CLBP factors. Future research will elucidate the prognostic value of working memory on prevention and recovery of CLBP. © 2016 American Physical Therapy Association.

  20. Patterns of Ninety-Day Readmissions Following Total Joint Replacement in a Bundled Payment Initiative.

    PubMed

    Behery, Omar A; Kester, Benjamin S; Williams, Jarrett; Bosco, Joseph A; Slover, James D; Iorio, Richard; Schwarzkopf, Ran

    2017-04-01

    Alternative payment models aim to improve quality and decrease costs associated with total joint replacement. Postoperative readmissions within 90 days are of interest to clinicians and administrators as there is no additional reimbursement beyond the episode bundled payment target price. The aim of this study is to improve the understanding of the patterns of readmission which would better guide perioperative patient management affecting readmissions. We hypothesize that readmissions have different timing, location, and patient health profile patterns based on whether the readmission is related to a medical or surgical diagnosis. A retrospective cohort of 80 readmissions out of 1412 total joint replacement patients reimbursed through a bundled payment plan was analyzed. Patients were grouped by readmission diagnosis (surgical or medical) and the main variables analyzed were time to readmission, location of readmission, and baseline Perioperative Orthopaedic Surgical Home and American Society of Anesthesiologists scores capturing pre-existing state of health. Nonparametric tests and multivariable regressions were used to test associations. Surgical readmissions occurred earlier than medical readmissions (mean 18 vs 33 days, P = .011), and were more likely to occur at the hospital where the surgery was performed (P = .035). Perioperative Orthopaedic Surgical Home and American Society of Anesthesiologists scores did not predict medical vs surgical readmissions (P = .466 and .879) after adjusting for confounding variables. Readmissions appear to follow different patterns depending on whether they are surgical or medical. Surgical readmissions occur earlier than medical readmissions, and more often at the hospital where the surgery was performed. The results of this study suggest that these 2 types of readmissions have different patterns with different implications toward perioperative care and follow-up after total joint replacement. Copyright © 2016 Elsevier Inc. All rights reserved.

  1. Applying Additive Hazards Models for Analyzing Survival in Patients with Colorectal Cancer in Fars Province, Southern Iran

    PubMed

    Madadizadeh, Farzan; Ghanbarnejad, Amin; Ghavami, Vahid; Zare Bandamiri, Mohammad; Mohammadianpanah, Mohammad

    2017-04-01

    Introduction: Colorectal cancer (CRC) is a commonly fatal cancer that ranks as third worldwide and third and the fifth in Iranian women and men, respectively. There are several methods for analyzing time to event data. Additive hazards regression models take priority over the popular Cox proportional hazards model if the absolute hazard (risk) change instead of hazard ratio is of primary concern, or a proportionality assumption is not made. Methods: This study used data gathered from medical records of 561 colorectal cancer patients who were admitted to Namazi Hospital, Shiraz, Iran, during 2005 to 2010 and followed until December 2015. The nonparametric Aalen’s additive hazards model, semiparametric Lin and Ying’s additive hazards model and Cox proportional hazards model were applied for data analysis. The proportionality assumption for the Cox model was evaluated with a test based on the Schoenfeld residuals and for test goodness of fit in additive models, Cox-Snell residual plots were used. Analyses were performed with SAS 9.2 and R3.2 software. Results: The median follow-up time was 49 months. The five-year survival rate and the mean survival time after cancer diagnosis were 59.6% and 68.1±1.4 months, respectively. Multivariate analyses using Lin and Ying’s additive model and the Cox proportional model indicated that the age of diagnosis, site of tumor, stage, and proportion of positive lymph nodes, lymphovascular invasion and type of treatment were factors affecting survival of the CRC patients. Conclusion: Additive models are suitable alternatives to the Cox proportionality model if there is interest in evaluation of absolute hazard change, or no proportionality assumption is made. Creative Commons Attribution License

  2. Variations in BMI and prevalence of health risks in diverse racial and ethnic populations.

    PubMed

    Stommel, Manfred; Schoenborn, Charlotte A

    2010-09-01

    When examining health risks associated with the BMI, investigators often rely on the customary BMI thresholds of the 1995 World Health Organization report. However, within-interval variations in morbidity and mortality can be substantial, and the thresholds do not necessarily correspond to identifiable risk increases. Comparing the prevalence of hypertension, diabetes, coronary heart disease (CHD), asthma, and arthritis among non-Hispanic whites, blacks, East Asians and Hispanics, we examine differences in the BMI-health-risk relationships for small BMI increments. The analysis is based on 11 years of data of the National Health Interview Survey (NHIS), with a sample size of 337,375 for the combined 1997-2007 Sample Adult. The analysis uses multivariate logistic regression models, employing a nonparametric approach to modeling the BMI-health-risk relationship, while relying on narrowly defined BMI categories. Rising BMI levels are associated with higher levels of chronic disease burdens in four major racial and ethnic groups, even after adjusting for many socio-demographic characteristics and three important health-related behaviors (smoking, physical activity, alcohol consumption). For all population groups, except East Asians, a modestly higher disease risk was noted for persons with a BMI <20 compared with persons with BMI in the range of 20-21. Using five chronic conditions as risk criteria, a categorization of the BMI into normal weight, overweight, or obesity appears arbitrary. Although the prevalence of disease risks differs among racial and ethnic groups regardless of BMI levels, the evidence presented here does not support the notion that the BMI-health-risk profile of East Asians and others warrants race-specific BMI cutoff points.

  3. Prevalence and Safety of Intravenous Immunoglobulin Administration During Maintenance Chemotherapy in Children with Acute Lymphoblastic Leukemia in First Complete Remission: A Health Maintenance Organization Perspective.

    PubMed

    Van Winkle, Patrick; Burchette, Raoul; Kim, Raymond; Raghunathan, Rukmani; Qureshi, Naveen

    2018-04-09

    Children with acute lymphoblastic leukemia (ALL) in first complete remission (CR1) experience hypogammaglobulinemia and are at risk of sepsis during maintenance chemotherapy. Intravenous immunoglobulin (IVIG) has been used to try to circumvent this risk, but no data exist regarding its safety and prevalence in a health maintenance organization. To evaluate the prevalence and safety of IVIG in children with ALL in CR1 during maintenance chemotherapy. A multicenter, retrospective cohort study of consecutive children with ALL in CR1 during maintenance chemotherapy from 2008 to 2014. Groups treated with or without IVIG were compared using nonparametric statistics. Multivariate logistic regression involved all variables available before maintenance therapy began. One hundred eighteen patients were included (53% males), aged 9 months to 19 years. Thirty of 31 patients (97%) who had immunoglobulins analyzed before IVIG were hypogammaglobulinemic. Thirty-six patients (30%) received IVIG during maintenance chemotherapy. Patients received an average of 10.5 IVIG doses (range = 1-31). Ninety-seven percent of doses were administered without a transfusion reaction. Other factors associated with IVIG use were prior double-delayed intensification (odds ratio = 5.36, 95% confidence interval = 1.3-27.49, p = 0.026) and episodes of bacteremia or fungemia before maintenance chemotherapy (odds ratio = 3.04, 95% confidence interval = 1.25-7.51, p = 0.015). Use of IVIG in children with ALL in CR1 with hypogammaglobulinemia occurred in approximately 30% of patients and was well tolerated. Administration of IVIG significantly correlated with a history of double-delayed intensification and prior bacteremia or fungemia.

  4. Gastroenterologist and nurse management of symptoms after pelvic radiotherapy for cancer: an economic evaluation of a clinical randomized controlled trial (the ORBIT study).

    PubMed

    Jordan, Jake; Gage, Heather; Benton, Barbara; Lalji, Amyn; Norton, Christine; Andreyev, H Jervoise N

    2017-01-01

    Over 20 distressing gastrointestinal symptoms affect many patients after pelvic radiotherapy, but in the United Kingdom few are referred for assessment. Algorithmic-based treatment delivered by either a consultant gastroenterologist or a clinical nurse specialist has been shown in a randomized trial to be statistically and clinically more effective than provision of a self-help booklet. In this study, we assessed cost-effectiveness. Outcomes were measured at baseline (pre-randomization) and 6 months. Change in quality-adjusted life years (QALYs) was the primary outcome for the economic evaluation; a secondary analysis used change in the bowel subset score of the modified Inflammatory Bowel Disease Questionnaire (IBDQ-B). Intervention costs, British pounds 2013, covered visits with the gastroenterologist or nurse, investigations, medications and treatments. Incremental outcomes and incremental costs were estimated simultaneously using multivariate linear regression. Uncertainty was handled non-parametrically using bootstrap with replacement. The mean (SD) cost of treatment was £895 (499) for the nurse and £1101 (567) for the consultant. The nurse was dominated by usual care, which was cheaper and achieved better outcomes. The mean cost per QALY gained from the consultant, compared to usual care, was £250,455; comparing the consultant to the nurse, it was £25,875. Algorithmic care produced better outcomes compared to the booklet only, as reflected in the IBDQ-B results, at a cost of ~£1,000. Algorithmic treatment of radiation bowel injury by a consultant or a nurse results in significant symptom relief for patients but was not found to be cost-effective according to the National Institute for Health and Care Excellence (NICE) criteria.

  5. Gene-gene interaction between CETP and APOE polymorphisms confers higher risk for hypertriglyceridemia in oldest-old Chinese women.

    PubMed

    Sun, Liang; Hu, Caiyou; Zheng, Chenguang; Huang, Zezhi; Lv, Zeping; Huang, Jin; Liang, Siying; Shi, Xiaohong; Zhu, Xiaoquan; Yuan, Huiping; Yang, Ze

    2014-07-01

    The knowledge of dyslipidemia and its genetic contributors in oldest-old subjects is limited; in addition, the majority of oldest-old subjects are females. Evidence has accumulated that multiple genetic factors play important roles in determining susceptibility to dyslipidemia and extended life span. Cholesterol ester transfer protein (CETP) and apolipoprotein E (APOE) are two plausible candidate genes for human longevity owing to their functionally related modulation of circulating lipid homeostasis; however, few studies have considered their interplay. In this study, we analyzed the distribution of CETP*V (rs5882) and APOE*4 (rs429358 and rs7412) in 372 oldest-old Chinese women (aged 80-109) and 340 controls (aged 20-58). In addition to replicating the association of longevity, our main goal was to evaluate the contribution of CETP*V, APOE*4 and CETP*APOE interaction to the risk of dyslipidemia. Only APOE*4 conferred a risk against longevity and was associated with high-cholesterol (hTC) and mixed dyslipidemia for oldest-old females. Moreover, CETP*V was found to be associated with hypertriglyceridemia (hTG) independently from APOE*4, age, BMI, alcohol drinking, TC, TG, HDL-c, and LDL-c. The stratification test, multivariable-adjusted logistic regression, and nonparametric MDR analysis all suggested a significant CETP*APOE interaction associated with hTG. The unadjusted odds for hTG were more than 4-fold in subjects with CETP*V and APOE*4 than those without either (OR=4.36, P<0.001). These results provide evidence of strong independent associations between hTG and CETP*V in oldest-old Chinese females, and APOE*4, as an independently non-significant variant, might interact with CETP*V resulting in an increased risk for hTG. Copyright © 2014 Elsevier Inc. All rights reserved.

  6. Intraoperative adverse events can be compensated by technical performance in neonates and infants after cardiac surgery: a prospective study.

    PubMed

    Nathan, Meena; Karamichalis, John M; Liu, Hua; del Nido, Pedro; Pigula, Frank; Thiagarajan, Ravi; Bacha, Emile A

    2011-11-01

    Our objective was to define the relationship between surgical technical performance score, intraoperative adverse events, and major postoperative adverse events in complex pediatric cardiac repairs. Infants younger than 6 months were prospectively followed up until discharge from the hospital. Technical performance scores were graded as optimal, adequate, or inadequate based on discharge echocardiograms and need for reintervention after initial surgery. Case complexity was determined by Risk Adjustment in Congenital Heart Surgery (RACHS-1) category, and preoperative illness severity was assessed by Pediatric Risk of Mortality (PRISM) III score. Intraoperative adverse events were prospectively monitored. Outcomes were analyzed using nonparametric methods and a logistic regression model. A total of 166 patients (RACHS 4-6 [49%]), neonates [50%]) were observed. Sixty-one (37%) had at least 1 intraoperative adverse event, and 47 (28.3%) had at least 1 major postoperative adverse event. There was no correlation between intraoperative adverse events and RACHS, preoperative PRISM III, technical performance score, or postoperative adverse events on multivariate analysis. For the entire cohort, better technical performance score resulted in lower postoperative adverse events, lower postoperative PRISM, and lower length of stay and ventilation time (P < .001). Patients requiring intraoperative revisions fared as well as patients without, provided the technical score was at least adequate. In neonatal and infant open heart repairs, technical performance score is one of the main predictors of postoperative morbidity. Outcomes are not affected by intraoperative adverse events, including surgical revisions, provided technical performance score is at least adequate. Copyright © 2011 The American Association for Thoracic Surgery. Published by Mosby, Inc. All rights reserved.

  7. Does the prevalence of latent toxoplasmosis and frequency of Rhesus-negative subjects correlate with the nationwide rate of traffic accidents?

    PubMed

    Flegr, Jaroslav; Dama, Madhukar

    2014-12-01

    Latent toxoplasmosis is probably the most common protistan parasitic disease with many indirect negative impacts on human health. One of the important impacts is impaired psychomotor function leading to reduced driving efficiency in Toxoplasma-seropositive subjects. Numerous case-control studies have established a positive relation between the seroprevalence of Toxoplasma gondii (Nicolle et Manceaux, 1908) and probability of traffic accidents in study populations. The prevalence of toxoplasmosis varies between populations according to local geographical conditions, hygienic practices and kitchen habits. Similarly, we see a striking variation in the incidence of traffic accidents across countries. Hence, we compiled the largest ever data set on the seroprevalence of toxoplasmosis and tried to understand its role in traffic accident-related deaths and disabilities across 87 countries. Simple non-parametric analysis showed a positive and strong relation of T. gondii seroprevalence and traffic accident related disabilities. Further, we conducted multivariate analysis to control for confounding factors. After controlling for wealth, geographical latitude, health of population, length of roads and number of vehicles, the correlation disappeared. When the frequency of RhD negativity and its interaction with toxoplasmosis were included into the model, the effects of toxoplasmosis seemingly returned. However, the analysed data suffered from the problem of multicollinearity. When a proper method of analysis, ridge regression, was applied, the effects of toxoplasmosis prevalence and RhD negativity frequency disappeared again. The existence of a strong correlation between the prevalence of toxoplasmosis and health of population in particular countries, which was the probable cause of multicollinearity and possible reason for the negative result of the present study, suggests that 'asymptomatic' latent toxoplasmosis could have a large impact on public health.

  8. Comparative Associations of Working Memory and Pain Catastrophizing With Chronic Low Back Pain Intensity

    PubMed Central

    Lentz, Trevor A.; Bishop, Mark D.; Riley, Joseph L.; Fillingim, Roger B.; George, Steven Z.

    2016-01-01

    Background Because of its high global burden, determining biopsychosocial influences of chronic low back pain (CLBP) is a research priority. Psychological factors such as pain catastrophizing are well established. However, cognitive factors such as working memory warrant further investigation to be clinically useful. Objective The purpose of this study was to determine how working memory and pain catastrophizing are associated with CLBP measures of daily pain intensity and movement-evoked pain intensity. Design This study was a cross-sectional analysis of individuals with ≥3 months of CLBP (n=60) compared with pain-free controls (n=30). Method Participants completed measures of working memory, pain catastrophizing, and daily pain intensity. Movement-evoked pain intensity was assessed using the Back Performance Scale. Outcome measures were compared between individuals with CLBP and those who were pain-free using nonparametric testing. Associations were determined using multivariate regression analyses. Results Participants with CLBP (mean age=47.7 years, 68% female) had lower working memory performance (P=.008) and higher pain catastrophizing (P<.001) compared with pain-free controls (mean age=47.6 years, 63% female). For individuals with CLBP, only working memory remained associated with daily pain intensity (R2=.07, standardized beta=−.308, P=.041) and movement-evoked pain intensity (R2=.14, standardized beta=−.502, P=.001) after accounting for age, sex, education, and interactions between pain catastrophizing and working memory. Limitations The cross-sectional design prevented prospective analysis. Findings also are not indicative of overall working memory (eg, spatial) or cognitive performance. Conclusion Working memory demonstrated the strongest association with daily pain and movement-evoked pain intensity compared with (and after accounting for) established CLBP factors. Future research will elucidate the prognostic value of working memory on prevention and recovery of CLBP. PMID:26700272

  9. Relation between cost of drug treatment and body mass index in people with type 2 diabetes in Latin America.

    PubMed

    Elgart, Jorge Federico; Prestes, Mariana; Gonzalez, Lorena; Rucci, Enzo; Gagliardino, Juan Jose

    2017-01-01

    Despite the frequent association of obesity with type 2 diabetes (T2D), the effect of the former on the cost of drug treatment of the latest has not been specifically addressed. We studied the association of overweight/obesity on the cost of drug treatment of hyperglycemia, hypertension and dyslipidemia in a population with T2D. This observational study utilized data from the QUALIDIAB database on 3,099 T2D patients seen in Diabetes Centers in Argentina, Chile, Colombia, Peru, and Venezuela. Data were grouped according to body mass index (BMI) as Normal (18.5≤BMI<25), Overweight (25≤BMI<30), and Obese (BMI≥30). Thereafter, we assessed clinical and metabolic data and cost of drug treatment in each category. Statistical analyses included group comparisons for continuous variables (parametric or non-parametric tests), Chi-square tests for differences between proportions, and multivariable regression analysis to assess the association between BMI and monthly cost of drug treatment. Although all groups showed comparable degree of glycometabolic control (FBG, HbA1c), we found significant differences in other metabolic control indicators. Total cost of drug treatment of hyperglycemia and associated cardiovascular risk factors (CVRF) increased significantly (p<0.001) with increment of BMI. Hyperglycemia treatment cost showed a significant increase concordant with BMI whereas hypertension and dyslipidemia did not. Despite different values and percentages of increase, this growing cost profile was reproduced in every participating country. BMI significantly and independently affected hyperglycemia treatment cost. Our study shows for the first time that BMI significantly increases total expenditure on drugs for T2D and its associated CVRF treatment in Latin America.

  10. The effects of austerity measures on quality of healthcare services: a national survey of physicians in the public and private sectors in Portugal.

    PubMed

    Correia, Tiago; Carapinheiro, Graça; Carvalho, Helena; Silva, José Manuel; Dussault, Gilles

    2017-12-12

    The European Union member countries reacted differently to the 2008 economic and financial crisis. However, few countries have monitored the outcomes of their policy responses, and there is therefore little evidence as to whether or not savings undermined the performance of health systems. We discuss the situation in Portugal, where a financial adjustment program was implemented between 2011 and 2014, and explore the views of health workers on the effects of austerity measures on quality of care delivery. A nationwide survey of physicians' experiences was conducted in 2013-2014 (n = 3442). We used a two-step model to compare public and private services and look at the possible moderating effects of the physicians' specialty and years of practice. Our data analysis included descriptive statistics, the independent t test, analysis of variance (ANOVA), multivariate logistic regression, General Linear Model Univariate Analysis, non-parametric methods (bootstrap), and post hoc probing. Mainly in the public sector, the policy goal of maintaining quality of care was undermined by a lack of resources, the deterioration in medical residency conditions, and to a lesser extent, greater administrative interference in clinical decision-making. Differences in public and private services showed that the effects of the austerity measures were not the same throughout the health system. Our results also showed that physicians with similar years of practice and in the same medical specialty did not necessarily experience the same pressures. The debate on the effects of austerity measures should focus more closely on health workers' concrete experiences, as they demonstrate the non-linearity between policy setting and expected outcomes. We also suggest that it is necessary to explore the interplay between lower quality and the undermining of trust relationships in health.

  11. Caries prevalence and fluoride use in low SES children in Clermont-Ferrand (France).

    PubMed

    Tubert-Jeannin, S; Riordan, P J; Manevy, R; Lecuyer, M M; Pegon-Machat, E

    2009-03-01

    To evaluate the association between dental caries experience and preventive behaviours of children residing in a deprived area in Clermont-Ferrand (France). All 4-5 yr-olds attending nine schools in deprived areas of the city were invited to participate and 81% (n=282) consented and were examined. Dental caries was recorded at the dentine threshold. Parents completed a questionnaire concerning family demographics and the child's use of fluoride. Non-parametric tests and logistic regression assessed the relative importance of SES and fluoride variables on dental status (dt>1). Fifty four (19%) of the examined children were living in families with an immigrant background, 33% were fully covered by the national health insurance programme for deprived families. Caries experience was high; mean dft was 1.94 (3.31) and 30% of the children had >1 carious teeth. Thirty percent of the families reported using fluoridated salt. Tooth brushing once daily was reported for 39% and twice daily for 26%. Parents declared supervising tooth brushing for 60%. Two thirds of the children, according to their parents, used fluoride supplement between birth and two years. Supervised tooth brushing was significantly correlated with lower mean dt scores. Systemic fluoride use was poorly related to dental caries Immigrant background, family size, type of health insurance and mother's unemployment were significantly correlated with caries prevalence. In multivariate analysis, immigrant status, supervised tooth brushing and parental knowledge about fluoride in toothpastes were significant caries predictors. The majority of low SES children did not practice effective caries prevention; few reported twice daily brushing with fluoride toothpaste. Caries experience was very high and much was untreated. Immigrant status, supervised tooth brushing and parental knowledge about fluoride in toothpastes were significant caries predictors.

  12. Associations between host characteristics and antimicrobial resistance of Salmonella typhimurium.

    PubMed

    Ruddat, I; Tietze, E; Ziehm, D; Kreienbrock, L

    2014-10-01

    A collection of Salmonella Typhimurium isolates obtained from sporadic salmonellosis cases in humans from Lower Saxony, Germany between June 2008 and May 2010 was used to perform an exploratory risk-factor analysis on antimicrobial resistance (AMR) using comprehensive host information on sociodemographic attributes, medical history, food habits and animal contact. Multivariate resistance profiles of minimum inhibitory concentrations for 13 antimicrobial agents were analysed using a non-parametric approach with multifactorial models adjusted for phage types. Statistically significant associations were observed for consumption of antimicrobial agents, region type and three factors on egg-purchasing behaviour, indicating that besides antimicrobial use the proximity to other community members, health consciousness and other lifestyle-related attributes may play a role in the dissemination of resistances. Furthermore, a statistically significant increase in AMR from the first study year to the second year was observed.

  13. A survey of kernel-type estimators for copula and their applications

    NASA Astrophysics Data System (ADS)

    Sumarjaya, I. W.

    2017-10-01

    Copulas have been widely used to model nonlinear dependence structure. Main applications of copulas include areas such as finance, insurance, hydrology, rainfall to name but a few. The flexibility of copula allows researchers to model dependence structure beyond Gaussian distribution. Basically, a copula is a function that couples multivariate distribution functions to their one-dimensional marginal distribution functions. In general, there are three methods to estimate copula. These are parametric, nonparametric, and semiparametric method. In this article we survey kernel-type estimators for copula such as mirror reflection kernel, beta kernel, transformation method and local likelihood transformation method. Then, we apply these kernel methods to three stock indexes in Asia. The results of our analysis suggest that, albeit variation in information criterion values, the local likelihood transformation method performs better than the other kernel methods.

  14. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol lowering drugs

    PubMed Central

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G.; Shah, Arvind K.; Lin, Jianxin

    2013-01-01

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data (IPD) in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the Deviance Information Criterion (DIC) is used to select the best transformation model. Since the model is quite complex, a novel Monte Carlo Markov chain (MCMC) sampling scheme is developed to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol lowering drugs where the goal is to jointly model the three dimensional response consisting of Low Density Lipoprotein Cholesterol (LDL-C), High Density Lipoprotein Cholesterol (HDL-C), and Triglycerides (TG) (LDL-C, HDL-C, TG). Since the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately: however, a multivariate approach would be more appropriate since these variables are correlated with each other. A detailed analysis of these data is carried out using the proposed methodology. PMID:23580436

  15. Bayesian inference for multivariate meta-analysis Box-Cox transformation models for individual patient data with applications to evaluation of cholesterol-lowering drugs.

    PubMed

    Kim, Sungduk; Chen, Ming-Hui; Ibrahim, Joseph G; Shah, Arvind K; Lin, Jianxin

    2013-10-15

    In this paper, we propose a class of Box-Cox transformation regression models with multidimensional random effects for analyzing multivariate responses for individual patient data in meta-analysis. Our modeling formulation uses a multivariate normal response meta-analysis model with multivariate random effects, in which each response is allowed to have its own Box-Cox transformation. Prior distributions are specified for the Box-Cox transformation parameters as well as the regression coefficients in this complex model, and the deviance information criterion is used to select the best transformation model. Because the model is quite complex, we develop a novel Monte Carlo Markov chain sampling scheme to sample from the joint posterior of the parameters. This model is motivated by a very rich dataset comprising 26 clinical trials involving cholesterol-lowering drugs where the goal is to jointly model the three-dimensional response consisting of low density lipoprotein cholesterol (LDL-C), high density lipoprotein cholesterol (HDL-C), and triglycerides (TG) (LDL-C, HDL-C, TG). Because the joint distribution of (LDL-C, HDL-C, TG) is not multivariate normal and in fact quite skewed, a Box-Cox transformation is needed to achieve normality. In the clinical literature, these three variables are usually analyzed univariately; however, a multivariate approach would be more appropriate because these variables are correlated with each other. We carry out a detailed analysis of these data by using the proposed methodology. Copyright © 2013 John Wiley & Sons, Ltd.

  16. Incremental online learning in high dimensions.

    PubMed

    Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan

    2005-12-01

    Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of-possibly redundant-inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.

  17. Quantifying Vegetation Biophysical Variables from Imaging Spectroscopy Data: A Review on Retrieval Methods

    NASA Astrophysics Data System (ADS)

    Verrelst, Jochem; Malenovský, Zbyněk; Van der Tol, Christiaan; Camps-Valls, Gustau; Gastellu-Etchegorry, Jean-Philippe; Lewis, Philip; North, Peter; Moreno, Jose

    2018-06-01

    An unprecedented spectroscopic data stream will soon become available with forthcoming Earth-observing satellite missions equipped with imaging spectroradiometers. This data stream will open up a vast array of opportunities to quantify a diversity of biochemical and structural vegetation properties. The processing requirements for such large data streams require reliable retrieval techniques enabling the spatiotemporally explicit quantification of biophysical variables. With the aim of preparing for this new era of Earth observation, this review summarizes the state-of-the-art retrieval methods that have been applied in experimental imaging spectroscopy studies inferring all kinds of vegetation biophysical variables. Identified retrieval methods are categorized into: (1) parametric regression, including vegetation indices, shape indices and spectral transformations; (2) nonparametric regression, including linear and nonlinear machine learning regression algorithms; (3) physically based, including inversion of radiative transfer models (RTMs) using numerical optimization and look-up table approaches; and (4) hybrid regression methods, which combine RTM simulations with machine learning regression methods. For each of these categories, an overview of widely applied methods with application to mapping vegetation properties is given. In view of processing imaging spectroscopy data, a critical aspect involves the challenge of dealing with spectral multicollinearity. The ability to provide robust estimates, retrieval uncertainties and acceptable retrieval processing speed are other important aspects in view of operational processing. Recommendations towards new-generation spectroscopy-based processing chains for operational production of biophysical variables are given.

  18. Multiple Imputation of a Randomly Censored Covariate Improves Logistic Regression Analysis.

    PubMed

    Atem, Folefac D; Qian, Jing; Maye, Jacqueline E; Johnson, Keith A; Betensky, Rebecca A

    2016-01-01

    Randomly censored covariates arise frequently in epidemiologic studies. The most commonly used methods, including complete case and single imputation or substitution, suffer from inefficiency and bias. They make strong parametric assumptions or they consider limit of detection censoring only. We employ multiple imputation, in conjunction with semi-parametric modeling of the censored covariate, to overcome these shortcomings and to facilitate robust estimation. We develop a multiple imputation approach for randomly censored covariates within the framework of a logistic regression model. We use the non-parametric estimate of the covariate distribution or the semiparametric Cox model estimate in the presence of additional covariates in the model. We evaluate this procedure in simulations, and compare its operating characteristics to those from the complete case analysis and a survival regression approach. We apply the procedures to an Alzheimer's study of the association between amyloid positivity and maternal age of onset of dementia. Multiple imputation achieves lower standard errors and higher power than the complete case approach under heavy and moderate censoring and is comparable under light censoring. The survival regression approach achieves the highest power among all procedures, but does not produce interpretable estimates of association. Multiple imputation offers a favorable alternative to complete case analysis and ad hoc substitution methods in the presence of randomly censored covariates within the framework of logistic regression.

  19. Nonparametric Estimation of the Probability of Ruin.

    DTIC Science & Technology

    1985-02-01

    MATHEMATICS RESEARCH CENTER I E N FREES FEB 85 MRC/TSR...in NONPARAMETRIC ESTIMATION OF THE PROBABILITY OF RUIN Lf Edward W. Frees * Mathematics Research Center University of Wisconsin-Madison 610 Walnut...34 - .. --- - • ’. - -:- - - ..- . . .- -- .-.-. . -. . .- •. . - . . - . . .’ . ’- - .. -’vi . .-" "-- -" ,’- UNIVERSITY OF WISCONSIN-MADISON MATHEMATICS RESEARCH CENTER NONPARAMETRIC ESTIMATION OF THE PROBABILITY

  20. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert M.

    2013-01-01

    A new regression model search algorithm was developed that may be applied to both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The algorithm is a simplified version of a more complex algorithm that was originally developed for the NASA Ames Balance Calibration Laboratory. The new algorithm performs regression model term reduction to prevent overfitting of data. It has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a regression model search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression model. Therefore, the simplified algorithm is not intended to replace the original algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new search algorithm.

  1. A spline-based regression parameter set for creating customized DARTEL MRI brain templates from infancy to old age.

    PubMed

    Wilke, Marko

    2018-02-01

    This dataset contains the regression parameters derived by analyzing segmented brain MRI images (gray matter and white matter) from a large population of healthy subjects, using a multivariate adaptive regression splines approach. A total of 1919 MRI datasets ranging in age from 1-75 years from four publicly available datasets (NIH, C-MIND, fCONN, and IXI) were segmented using the CAT12 segmentation framework, writing out gray matter and white matter images normalized using an affine-only spatial normalization approach. These images were then subjected to a six-step DARTEL procedure, employing an iterative non-linear registration approach and yielding increasingly crisp intermediate images. The resulting six datasets per tissue class were then analyzed using multivariate adaptive regression splines, using the CerebroMatic toolbox. This approach allows for flexibly modelling smoothly varying trajectories while taking into account demographic (age, gender) as well as technical (field strength, data quality) predictors. The resulting regression parameters described here can be used to generate matched DARTEL or SHOOT templates for a given population under study, from infancy to old age. The dataset and the algorithm used to generate it are publicly available at https://irc.cchmc.org/software/cerebromatic.php.

  2. [Influences of environmental factors and interaction of several chemokines gene-environmental on systemic lupus erythematosus].

    PubMed

    Ye, Dong-qing; Hu, Yi-song; Li, Xiang-pei; Huang, Fen; Yang, Shi-gui; Hao, Jia-hu; Yin, Jing; Zhang, Guo-qing; Liu, Hui-hui

    2004-11-01

    To explore the impact of environmental factors, daily lifestyle, psycho-social factors and the interactions between environmental factors and chemokines genes on systemic lupus erythematosus (SLE). Case-control study was carried out and environmental factors for SLE were analyzed by univariate and multivariate unconditional logistic regression. Interactions between environmental factors and chemokines polymorphism contributing to systemic lupus erythematosus were also analyzed by logistic regression model. There were nineteen factors associated with SLE when univariate unconditional logistic regression was used. However, when multivariate unconditional logistic regression was used, only five factors showed having impacts on the disease, in which drinking well water (OR=0.099) was protective factor for SLE, and multiple drug allergy (OR=8.174), over-exposure to sunshine (OR=18.339), taking antibiotics (OR=9.630) and oral contraceptives were risk factors for SLE. When unconditional logistic regression model was used, results showed that there was interaction between eating irritable food and -2518MCP-1G/G genotype (OR=4.387). No interaction between environmental factors was found that contributing to SLE in this study. Many environmental factors were related to SLE, and there was an interaction between -2518MCP-1G/G genotype and eating irritable food.

  3. Spatial hydrological drought characteristics in Karkheh River basin, southwest Iran using copulas

    NASA Astrophysics Data System (ADS)

    Dodangeh, Esmaeel; Shahedi, Kaka; Shiau, Jenq-Tzong; MirAkbari, Maryam

    2017-08-01

    Investigation on drought characteristics such as severity, duration, and frequency is crucial for water resources planning and management in a river basin. While the methodology for multivariate drought frequency analysis is well established by applying the copulas, the estimation on the associated parameters by various parameter estimation methods and the effects on the obtained results have not yet been investigated. This research aims at conducting a comparative analysis between the maximum likelihood parametric and non-parametric method of the Kendall τ estimation method for copulas parameter estimation. The methods were employed to study joint severity-duration probability and recurrence intervals in Karkheh River basin (southwest Iran) which is facing severe water-deficit problems. Daily streamflow data at three hydrological gauging stations (Tang Sazbon, Huleilan and Polchehr) near the Karkheh dam were used to draw flow duration curves (FDC) of these three stations. The Q_{75} index extracted from the FDC were set as threshold level to abstract drought characteristics such as drought duration and severity on the basis of the run theory. Drought duration and severity were separately modeled using the univariate probabilistic distributions and gamma-GEV, LN2-exponential, and LN2-gamma were selected as the best paired drought severity-duration inputs for copulas according to the Akaike Information Criteria (AIC), Kolmogorov-Smirnov and chi-square tests. Archimedean Clayton, Frank, and extreme value Gumbel copulas were employed to construct joint cumulative distribution functions (JCDF) of droughts for each station. Frank copula at Tang Sazbon and Gumbel at Huleilan and Polchehr stations were identified as the best copulas based on the performance evaluation criteria including AIC, BIC, log-likelihood and root mean square error (RMSE) values. Based on the RMSE values, nonparametric Kendall-τ is preferred to the parametric maximum likelihood estimation method. The results showed greater drought return periods by the parametric ML method in comparison to the nonparametric Kendall τ estimation method. The results also showed that stations located in tributaries (Huleilan and Polchehr) have close return periods, while the station along the main river (Tang Sazbon) has the smaller return periods for the drought events with identical drought duration and severity.

  4. Binding affinity toward human prion protein of some anti-prion compounds - Assessment based on QSAR modeling, molecular docking and non-parametric ranking.

    PubMed

    Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija

    2018-01-01

    The present study is based on the quantitative structure-activity relationship (QSAR) analysis of binding affinity toward human prion protein (huPrP C ) of quinacrine, pyridine dicarbonitrile, diphenylthiazole and diphenyloxazole analogs applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor that contributes to the binding affinity. Principal component analysis was used in order to reveal similarities or dissimilarities among the studied compounds. The analysis of in silico absorption, distribution, metabolism, excretion and toxicity (ADMET) parameters was conducted. The ranking of the studied analogs on the basis of their ADMET parameters was done applying the sum of ranking differences, as a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to the changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking were proven to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.

  5. Resting-state networks in healthy adult subjects: a comparison between a 32-element and an 8-element phased array head coil at 3.0 Tesla.

    PubMed

    Paolini, Marco; Keeser, Daniel; Ingrisch, Michael; Werner, Natalie; Kindermann, Nicole; Reiser, Maximilian; Blautzik, Janusch

    2015-05-01

    Little research exists on the influence of a magnetic resonance imaging (MRI) head coil's channel count on measured resting-state functional connectivity. To compare a 32-element (32ch) and an 8-element (8ch) phased array head coil with respect to their potential to detect functional connectivity within resting-state networks. Twenty-six healthy adults (mean age, 21.7 years; SD, 2.1 years) underwent resting-state functional MRI at 3.0 Tesla with both coils using equal standard imaging parameters and a counterbalanced design. Independent component analysis (ICA) at different model orders and a dual regression approach were performed. Voxel-wise non-parametric statistical between-group contrasts were determined using permutation-based non-parametric inference. Phantom measurements demonstrated a generally higher image signal-to-noise ratio using the 32ch head coil. However, the results showed no significant differences between corresponding resting-state networks derived from both coils (p < 0.05, FWE-corrected). Using the identical standard acquisition parameters, the 32ch head coil does not offer any significant advantages in detecting ICA-based functional connectivity within RSNs. © The Foundation Acta Radiologica 2015 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.

  6. Nonparametric Subgroup Identification by PRIM and CART: A Simulation and Application Study

    PubMed Central

    2017-01-01

    Two nonparametric methods for the identification of subgroups with outstanding outcome values are described and compared to each other in a simulation study and an application to clinical data. The Patient Rule Induction Method (PRIM) searches for box-shaped areas in the given data which exceed a minimal size and average outcome. This is achieved via a combination of iterative peeling and pasting steps, where small fractions of the data are removed or added to the current box. As an alternative, Classification and Regression Trees (CART) prediction models perform sequential binary splits of the data to produce subsets which can be interpreted as subgroups of heterogeneous outcome. PRIM and CART were compared in a simulation study to investigate their strengths and weaknesses under various data settings, taking different performance measures into account. PRIM was shown to be superior in rather complex settings such as those with few observations, a smaller signal-to-noise ratio, and more than one subgroup. CART showed the best performance in simpler situations. A practical application of the two methods was illustrated using a clinical data set. For this application, both methods produced similar results but the higher amount of user involvement of PRIM became apparent. PRIM can be flexibly tuned by the user, whereas CART, although simpler to implement, is rather static. PMID:28611849

  7. Nonparametric Subgroup Identification by PRIM and CART: A Simulation and Application Study.

    PubMed

    Ott, Armin; Hapfelmeier, Alexander

    2017-01-01

    Two nonparametric methods for the identification of subgroups with outstanding outcome values are described and compared to each other in a simulation study and an application to clinical data. The Patient Rule Induction Method (PRIM) searches for box-shaped areas in the given data which exceed a minimal size and average outcome. This is achieved via a combination of iterative peeling and pasting steps, where small fractions of the data are removed or added to the current box. As an alternative, Classification and Regression Trees (CART) prediction models perform sequential binary splits of the data to produce subsets which can be interpreted as subgroups of heterogeneous outcome. PRIM and CART were compared in a simulation study to investigate their strengths and weaknesses under various data settings, taking different performance measures into account. PRIM was shown to be superior in rather complex settings such as those with few observations, a smaller signal-to-noise ratio, and more than one subgroup. CART showed the best performance in simpler situations. A practical application of the two methods was illustrated using a clinical data set. For this application, both methods produced similar results but the higher amount of user involvement of PRIM became apparent. PRIM can be flexibly tuned by the user, whereas CART, although simpler to implement, is rather static.

  8. Predictive equations for the estimation of body size in seals and sea lions (Carnivora: Pinnipedia)

    PubMed Central

    Churchill, Morgan; Clementz, Mark T; Kohno, Naoki

    2014-01-01

    Body size plays an important role in pinniped ecology and life history. However, body size data is often absent for historical, archaeological, and fossil specimens. To estimate the body size of pinnipeds (seals, sea lions, and walruses) for today and the past, we used 14 commonly preserved cranial measurements to develop sets of single variable and multivariate predictive equations for pinniped body mass and total length. Principal components analysis (PCA) was used to test whether separate family specific regressions were more appropriate than single predictive equations for Pinnipedia. The influence of phylogeny was tested with phylogenetic independent contrasts (PIC). The accuracy of these regressions was then assessed using a combination of coefficient of determination, percent prediction error, and standard error of estimation. Three different methods of multivariate analysis were examined: bidirectional stepwise model selection using Akaike information criteria; all-subsets model selection using Bayesian information criteria (BIC); and partial least squares regression. The PCA showed clear discrimination between Otariidae (fur seals and sea lions) and Phocidae (earless seals) for the 14 measurements, indicating the need for family-specific regression equations. The PIC analysis found that phylogeny had a minor influence on relationship between morphological variables and body size. The regressions for total length were more accurate than those for body mass, and equations specific to Otariidae were more accurate than those for Phocidae. Of the three multivariate methods, the all-subsets approach required the fewest number of variables to estimate body size accurately. We then used the single variable predictive equations and the all-subsets approach to estimate the body size of two recently extinct pinniped taxa, the Caribbean monk seal (Monachus tropicalis) and the Japanese sea lion (Zalophus japonicus). Body size estimates using single variable regressions generally under or over-estimated body size; however, the all-subset regression produced body size estimates that were close to historically recorded body length for these two species. This indicates that the all-subset regression equations developed in this study can estimate body size accurately. PMID:24916814

  9. [Correlation between gaseous exchange rate, body temperature, and mitochondrial protein content in the liver of mice].

    PubMed

    Muradian, Kh K; Utko, N O; Mozzhukhina, T H; Pishel', I M; Litoshenko, O Ia; Bezrukov, V V; Fraĭfel'd, V E

    2002-01-01

    Correlative and regressive relations between the gaseous exchange, thermoregulation and mitochondrial protein content were analyzed by two- and three-dimensional statistics in mice. It has been shown that the pair wise linear methods of analysis did not reveal any significant correlation between the parameters under exploration. However, it became evident at three-dimensional and non-linear plotting for which the coefficients of multivariable correlation reached and even exceeded 0.7-0.8. The calculations based on partial differentiation of the multivariable regression equations allow to conclude that at certain values of VO2, VCO2 and body temperature negative relations between the systems of gaseous exchange and thermoregulation become dominating.

  10. A High-Dimensional, Multivariate Copula Approach to Modeling Multivariate Agricultural Price Relationships and Tail Dependencies

    Treesearch

    Xuan Chi; Barry Goodwin

    2012-01-01

    Spatial and temporal relationships among agricultural prices have been an important topic of applied research for many years. Such research is used to investigate the performance of markets and to examine linkages up and down the marketing chain. This research has empirically evaluated price linkages by using correlation and regression models and, later, linear and...

  11. Multivariate time series analysis of neuroscience data: some challenges and opportunities.

    PubMed

    Pourahmadi, Mohsen; Noorbaloochi, Siamak

    2016-04-01

    Neuroimaging data may be viewed as high-dimensional multivariate time series, and analyzed using techniques from regression analysis, time series analysis and spatiotemporal analysis. We discuss issues related to data quality, model specification, estimation, interpretation, dimensionality and causality. Some recent research areas addressing aspects of some recurring challenges are introduced. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Utility of an Abbreviated Dizziness Questionnaire to Differentiate between Causes of Vertigo and Guide Appropriate Referral: A Multicenter Prospective Blinded Study

    PubMed Central

    Roland, Lauren T.; Kallogjeri, Dorina; Sinks, Belinda C.; Rauch, Steven D.; Shepard, Neil T.; White, Judith A.; Goebel, Joel A.

    2015-01-01

    Objective Test performance of a focused dizziness questionnaire’s ability to discriminate between peripheral and non-peripheral causes of vertigo. Study Design Prospective multi-center Setting Four academic centers with experienced balance specialists Patients New dizzy patients Interventions A 32-question survey was given to participants. Balance specialists were blinded and a diagnosis was established for all participating patients within 6 months. Main outcomes Multinomial logistic regression was used to evaluate questionnaire performance in predicting final diagnosis and differentiating between peripheral and non-peripheral vertigo. Univariate and multivariable stepwise logistic regression were used to identify questions as significant predictors of the ultimate diagnosis. C-index was used to evaluate performance and discriminative power of the multivariable models. Results 437 patients participated in the study. Eight participants without confirmed diagnoses were excluded and 429 were included in the analysis. Multinomial regression revealed that the model had good overall predictive accuracy of 78.5% for the final diagnosis and 75.5% for differentiating between peripheral and non-peripheral vertigo. Univariate logistic regression identified significant predictors of three main categories of vertigo: peripheral, central and other. Predictors were entered into forward stepwise multivariable logistic regression. The discriminative power of the final models for peripheral, central and other causes were considered good as measured by c-indices of 0.75, 0.7 and 0.78, respectively. Conclusions This multicenter study demonstrates a focused dizziness questionnaire can accurately predict diagnosis for patients with chronic/relapsing dizziness referred to outpatient clinics. Additionally, this survey has significant capability to differentiate peripheral from non-peripheral causes of vertigo and may, in the future, serve as a screening tool for specialty referral. Clinical utility of this questionnaire to guide specialty referral is discussed. PMID:26485598

  13. Utility of an Abbreviated Dizziness Questionnaire to Differentiate Between Causes of Vertigo and Guide Appropriate Referral: A Multicenter Prospective Blinded Study.

    PubMed

    Roland, Lauren T; Kallogjeri, Dorina; Sinks, Belinda C; Rauch, Steven D; Shepard, Neil T; White, Judith A; Goebel, Joel A

    2015-12-01

    Test performance of a focused dizziness questionnaire's ability to discriminate between peripheral and nonperipheral causes of vertigo. Prospective multicenter. Four academic centers with experienced balance specialists. New dizzy patients. A 32-question survey was given to participants. Balance specialists were blinded and a diagnosis was established for all participating patients within 6 months. Multinomial logistic regression was used to evaluate questionnaire performance in predicting final diagnosis and differentiating between peripheral and nonperipheral vertigo. Univariate and multivariable stepwise logistic regression were used to identify questions as significant predictors of the ultimate diagnosis. C-index was used to evaluate performance and discriminative power of the multivariable models. In total, 437 patients participated in the study. Eight participants without confirmed diagnoses were excluded and 429 were included in the analysis. Multinomial regression revealed that the model had good overall predictive accuracy of 78.5% for the final diagnosis and 75.5% for differentiating between peripheral and nonperipheral vertigo. Univariate logistic regression identified significant predictors of three main categories of vertigo: peripheral, central, and other. Predictors were entered into forward stepwise multivariable logistic regression. The discriminative power of the final models for peripheral, central, and other causes was considered good as measured by c-indices of 0.75, 0.7, and 0.78, respectively. This multicenter study demonstrates a focused dizziness questionnaire can accurately predict diagnosis for patients with chronic/relapsing dizziness referred to outpatient clinics. Additionally, this survey has significant capability to differentiate peripheral from nonperipheral causes of vertigo and may, in the future, serve as a screening tool for specialty referral. Clinical utility of this questionnaire to guide specialty referral is discussed.

  14. Experimental variability and data pre-processing as factors affecting the discrimination power of some chemometric approaches (PCA, CA and a new algorithm based on linear regression) applied to (+/-)ESI/MS and RPLC/UV data: Application on green tea extracts.

    PubMed

    Iorgulescu, E; Voicu, V A; Sârbu, C; Tache, F; Albu, F; Medvedovici, A

    2016-08-01

    The influence of the experimental variability (instrumental repeatability, instrumental intermediate precision and sample preparation variability) and data pre-processing (normalization, peak alignment, background subtraction) on the discrimination power of multivariate data analysis methods (Principal Component Analysis -PCA- and Cluster Analysis -CA-) as well as a new algorithm based on linear regression was studied. Data used in the study were obtained through positive or negative ion monitoring electrospray mass spectrometry (+/-ESI/MS) and reversed phase liquid chromatography/UV spectrometric detection (RPLC/UV) applied to green tea extracts. Extractions in ethanol and heated water infusion were used as sample preparation procedures. The multivariate methods were directly applied to mass spectra and chromatograms, involving strictly a holistic comparison of shapes, without assignment of any structural identity to compounds. An alternative data interpretation based on linear regression analysis mutually applied to data series is also discussed. Slopes, intercepts and correlation coefficients produced by the linear regression analysis applied on pairs of very large experimental data series successfully retain information resulting from high frequency instrumental acquisition rates, obviously better defining the profiles being compared. Consequently, each type of sample or comparison between samples produces in the Cartesian space an ellipsoidal volume defined by the normal variation intervals of the slope, intercept and correlation coefficient. Distances between volumes graphically illustrates (dis)similarities between compared data. The instrumental intermediate precision had the major effect on the discrimination power of the multivariate data analysis methods. Mass spectra produced through ionization from liquid state in atmospheric pressure conditions of bulk complex mixtures resulting from extracted materials of natural origins provided an excellent data basis for multivariate analysis methods, equivalent to data resulting from chromatographic separations. The alternative evaluation of very large data series based on linear regression analysis produced information equivalent to results obtained through application of PCA an CA. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. A Comparison of Conventional Linear Regression Methods and Neural Networks for Forecasting Educational Spending.

    ERIC Educational Resources Information Center

    Baker, Bruce D.; Richards, Craig E.

    1999-01-01

    Applies neural network methods for forecasting 1991-95 per-pupil expenditures in U.S. public elementary and secondary schools. Forecasting models included the National Center for Education Statistics' multivariate regression model and three neural architectures. Regarding prediction accuracy, neural network results were comparable or superior to…

  16. "Let Me Count the Ways:" Fostering Reasons for Living among Low-Income, Suicidal, African American Women

    ERIC Educational Resources Information Center

    West, Lindsey M.; Davis, Telsie A.; Thompson, Martie P.; Kaslow, Nadine J.

    2011-01-01

    Protective factors for fostering reasons for living were examined among low-income, suicidal, African American women. Bivariate logistic regressions revealed that higher levels of optimism, spiritual well-being, and family social support predicted reasons for living. Multivariate logistic regressions indicated that spiritual well-being showed…

  17. Fresh Biomass Estimation in Heterogeneous Grassland Using Hyperspectral Measurements and Multivariate Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Darvishzadeh, R.; Skidmore, A. K.; Mirzaie, M.; Atzberger, C.; Schlerf, M.

    2014-12-01

    Accurate estimation of grassland biomass at their peak productivity can provide crucial information regarding the functioning and productivity of the rangelands. Hyperspectral remote sensing has proved to be valuable for estimation of vegetation biophysical parameters such as biomass using different statistical techniques. However, in statistical analysis of hyperspectral data, multicollinearity is a common problem due to large amount of correlated hyper-spectral reflectance measurements. The aim of this study was to examine the prospect of above ground biomass estimation in a heterogeneous Mediterranean rangeland employing multivariate calibration methods. Canopy spectral measurements were made in the field using a GER 3700 spectroradiometer, along with concomitant in situ measurements of above ground biomass for 170 sample plots. Multivariate calibrations including partial least squares regression (PLSR), principal component regression (PCR), and Least-Squared Support Vector Machine (LS-SVM) were used to estimate the above ground biomass. The prediction accuracy of the multivariate calibration methods were assessed using cross validated R2 and RMSE. The best model performance was obtained using LS_SVM and then PLSR both calibrated with first derivative reflectance dataset with R2cv = 0.88 & 0.86 and RMSEcv= 1.15 & 1.07 respectively. The weakest prediction accuracy was appeared when PCR were used (R2cv = 0.31 and RMSEcv= 2.48). The obtained results highlight the importance of multivariate calibration methods for biomass estimation when hyperspectral data are used.

  18. Predicting exposure-response associations of ambient particulate matter with mortality in 73 Chinese cities.

    PubMed

    Madaniyazi, Lina; Guo, Yuming; Chen, Renjie; Kan, Haidong; Tong, Shilu

    2016-01-01

    Estimating the burden of mortality associated with particulates requires knowledge of exposure-response associations. However, the evidence on exposure-response associations is limited in many cities, especially in developing countries. In this study, we predicted associations of particulates smaller than 10 μm in aerodynamic diameter (PM10) with mortality in 73 Chinese cities. The meta-regression model was used to test and quantify which city-specific characteristics contributed significantly to the heterogeneity of PM10-mortality associations for 16 Chinese cities. Then, those city-specific characteristics with statistically significant regression coefficients were treated as independent variables to build multivariate meta-regression models. The model with the best fitness was used to predict PM10-mortality associations in 73 Chinese cities in 2010. Mean temperature, PM10 concentration and green space per capita could best explain the heterogeneity in PM10-mortality associations. Based on city-specific characteristics, we were able to develop multivariate meta-regression models to predict associations between air pollutants and health outcomes reasonably well. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Barriers to health-care and psychological distress among mothers living with HIV in Quebec (Canada).

    PubMed

    Blais, Martin; Fernet, Mylène; Proulx-Boucher, Karène; Lebouché, Bertrand; Rodrigue, Carl; Lapointe, Normand; Otis, Joanne; Samson, Johanne

    2015-01-01

    Health-care providers play a major role in providing good quality care and in preventing psychological distress among mothers living with HIV (MLHIV). The objectives of this study are to explore the impact of health-care services and satisfaction with care providers on psychological distress in MLHIV. One hundred MLHIV were recruited from community and clinical settings in the province of Quebec (Canada). Prevalence estimation of clinical psychological distress and univariate and multivariable logistic regression models were performed to predict clinical psychological distress. Forty-five percent of the participants reported clinical psychological distress. In the multivariable regression, the following variables were significantly associated with psychological distress while controlling for sociodemographic variables: resilience, quality of communication with the care providers, resources, and HIV disclosure concerns. The multivariate results support the key role of personal, structural, and medical resources in understanding psychological distress among MLHIV. Interventions that can support the psychological health of MLHIV are discussed.

  20. Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets

    USGS Publications Warehouse

    Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.

    2013-01-01

    In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.

  1. Using Time Series Analysis to Predict Cardiac Arrest in a PICU.

    PubMed

    Kennedy, Curtis E; Aoki, Noriaki; Mariscalco, Michele; Turley, James P

    2015-11-01

    To build and test cardiac arrest prediction models in a PICU, using time series analysis as input, and to measure changes in prediction accuracy attributable to different classes of time series data. Retrospective cohort study. Thirty-one bed academic PICU that provides care for medical and general surgical (not congenital heart surgery) patients. Patients experiencing a cardiac arrest in the PICU and requiring external cardiac massage for at least 2 minutes. None. One hundred three cases of cardiac arrest and 109 control cases were used to prepare a baseline dataset that consisted of 1,025 variables in four data classes: multivariate, raw time series, clinical calculations, and time series trend analysis. We trained 20 arrest prediction models using a matrix of five feature sets (combinations of data classes) with four modeling algorithms: linear regression, decision tree, neural network, and support vector machine. The reference model (multivariate data with regression algorithm) had an accuracy of 78% and 87% area under the receiver operating characteristic curve. The best model (multivariate + trend analysis data with support vector machine algorithm) had an accuracy of 94% and 98% area under the receiver operating characteristic curve. Cardiac arrest predictions based on a traditional model built with multivariate data and a regression algorithm misclassified cases 3.7 times more frequently than predictions that included time series trend analysis and built with a support vector machine algorithm. Although the final model lacks the specificity necessary for clinical application, we have demonstrated how information from time series data can be used to increase the accuracy of clinical prediction models.

  2. Physical Function in Older Men With Hyperkyphosis

    PubMed Central

    Harrison, Stephanie L.; Fink, Howard A.; Marshall, Lynn M.; Orwoll, Eric; Barrett-Connor, Elizabeth; Cawthon, Peggy M.; Kado, Deborah M.

    2015-01-01

    Background. Age-related hyperkyphosis has been associated with poor physical function and is a well-established predictor of adverse health outcomes in older women, but its impact on health in older men is less well understood. Methods. We conducted a cross-sectional study to evaluate the association of hyperkyphosis and physical function in 2,363 men, aged 71–98 (M = 79) from the Osteoporotic Fractures in Men Study. Kyphosis was measured using the Rancho Bernardo Study block method. Measurements of grip strength and lower extremity function, including gait speed over 6 m, narrow walk (measure of dynamic balance), repeated chair stands ability and time, and lower extremity power (Nottingham Power Rig) were included separately as primary outcomes. We investigated associations of kyphosis and each outcome in age-adjusted and multivariable linear or logistic regression models, controlling for age, clinic, education, race, bone mineral density, height, weight, diabetes, and physical activity. Results. In multivariate linear regression, we observed a dose-related response of worse scores on each lower extremity physical function test as number of blocks increased, p for trend ≤.001. Using a cutoff of ≥4 blocks, 20% (N = 469) of men were characterized with hyperkyphosis. In multivariate logistic regression, men with hyperkyphosis had increased odds (range 1.5–1.8) of being in the worst quartile of performing lower extremity physical function tasks (p < .001 for each outcome). Kyphosis was not associated with grip strength in any multivariate analysis. Conclusions. Hyperkyphosis is associated with impaired lower extremity physical function in older men. Further studies are needed to determine the direction of causality. PMID:25431353

  3. Multivariate functional response regression, with application to fluorescence spectroscopy in a cervical pre-cancer study.

    PubMed

    Zhu, Hongxiao; Morris, Jeffrey S; Wei, Fengrong; Cox, Dennis D

    2017-07-01

    Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors. This novel modeling framework simultaneously accounts for correlations between functional variables and potential multi-level structures in data that are induced by experimental design. The model is fitted by performing a two-stage linear transformation-a basis expansion to each functional variable followed by principal component analysis for the concatenated basis coefficients. This transformation effectively reduces the intra-and inter-function correlations and facilitates fast and convenient calculation. A fully Bayesian approach is adopted to sample the model parameters in the transformed space, and posterior inference is performed after inverse-transforming the regression coefficients back to the original data domain. The proposed approach produces functional tests that flag local regions on the functional effects, while controlling the overall experiment-wise error rate or false discovery rate. It also enables functional discriminant analysis through posterior predictive calculation. Analysis of the fluorescence spectroscopy data reveals local regions with differential expressions across the pre-cancer and normal samples. These regions may serve as biomarkers for prognosis and disease assessment.

  4. Development of the Complex General Linear Model in the Fourier Domain: Application to fMRI Multiple Input-Output Evoked Responses for Single Subjects

    PubMed Central

    Rio, Daniel E.; Rawlings, Robert R.; Woltz, Lawrence A.; Gilman, Jodi; Hommer, Daniel W.

    2013-01-01

    A linear time-invariant model based on statistical time series analysis in the Fourier domain for single subjects is further developed and applied to functional MRI (fMRI) blood-oxygen level-dependent (BOLD) multivariate data. This methodology was originally developed to analyze multiple stimulus input evoked response BOLD data. However, to analyze clinical data generated using a repeated measures experimental design, the model has been extended to handle multivariate time series data and demonstrated on control and alcoholic subjects taken from data previously analyzed in the temporal domain. Analysis of BOLD data is typically carried out in the time domain where the data has a high temporal correlation. These analyses generally employ parametric models of the hemodynamic response function (HRF) where prewhitening of the data is attempted using autoregressive (AR) models for the noise. However, this data can be analyzed in the Fourier domain. Here, assumptions made on the noise structure are less restrictive, and hypothesis tests can be constructed based on voxel-specific nonparametric estimates of the hemodynamic transfer function (HRF in the Fourier domain). This is especially important for experimental designs involving multiple states (either stimulus or drug induced) that may alter the form of the response function. PMID:23840281

  5. Development of the complex general linear model in the Fourier domain: application to fMRI multiple input-output evoked responses for single subjects.

    PubMed

    Rio, Daniel E; Rawlings, Robert R; Woltz, Lawrence A; Gilman, Jodi; Hommer, Daniel W

    2013-01-01

    A linear time-invariant model based on statistical time series analysis in the Fourier domain for single subjects is further developed and applied to functional MRI (fMRI) blood-oxygen level-dependent (BOLD) multivariate data. This methodology was originally developed to analyze multiple stimulus input evoked response BOLD data. However, to analyze clinical data generated using a repeated measures experimental design, the model has been extended to handle multivariate time series data and demonstrated on control and alcoholic subjects taken from data previously analyzed in the temporal domain. Analysis of BOLD data is typically carried out in the time domain where the data has a high temporal correlation. These analyses generally employ parametric models of the hemodynamic response function (HRF) where prewhitening of the data is attempted using autoregressive (AR) models for the noise. However, this data can be analyzed in the Fourier domain. Here, assumptions made on the noise structure are less restrictive, and hypothesis tests can be constructed based on voxel-specific nonparametric estimates of the hemodynamic transfer function (HRF in the Fourier domain). This is especially important for experimental designs involving multiple states (either stimulus or drug induced) that may alter the form of the response function.

  6. A nonparametric multivariate standardized drought index for characterizing socioeconomic drought: A case study in the Heihe River Basin

    NASA Astrophysics Data System (ADS)

    Huang, Shengzhi; Huang, Qiang; Leng, Guoyong; Liu, Saiyan

    2016-11-01

    Among various drought types, socioeconomic drought is the least investigated type of droughts. Most existing drought indicators ignore the role of local reservoirs and water demand in coping with climatic extremes. In this study, a Multivariate Standardized Reliability and Resilience Index (MSRRI) combining inflow-demand reliability index (IDR) and water storage resilience index (WSR) was applied to examine the evolution characteristics of the socioeconomic droughts in the Heihe River Basin, the second largest inland river basin in northwestern China. Furthermore, the cross wavelet analysis was adopted to explore the associations between annual MSRRI series and El Niño Southern Oscillation (ENSO)/Atlantic Oscillation (AO). Results indicated that: (1) the developed MSRRI is more sensitive to the onset and termination of socioeconomic droughts than IDR and WSR, owing to its joint distribution function of IDR and WSR, responding to changes in either or both of the indices; (2) the MSRRI series in the Heihe River Basin shows non-significant trends at both monthly and annual scales; (3) both ENSO and AO contribute to the changes in the socioeconomic droughts in the Heihe River Basin, and the impacts of ENSO on the socioeconomic droughts are stronger than those of AO.

  7. Exploratory multivariate modeling and prediction of the physico-chemical properties of surface water and groundwater

    NASA Astrophysics Data System (ADS)

    Ayoko, Godwin A.; Singh, Kirpal; Balerea, Steven; Kokot, Serge

    2007-03-01

    SummaryPhysico-chemical properties of surface water and groundwater samples from some developing countries have been subjected to multivariate analyses by the non-parametric multi-criteria decision-making methods, PROMETHEE and GAIA. Complete ranking information necessary to select one source of water in preference to all others was obtained, and this enabled relationships between the physico-chemical properties and water quality to be assessed. Thus, the ranking of the quality of the water bodies was found to be strongly dependent on the total dissolved solid, phosphate, sulfate, ammonia-nitrogen, calcium, iron, chloride, magnesium, zinc, nitrate and fluoride contents of the waters. However, potassium, manganese and zinc composition showed the least influence in differentiating the water bodies. To model and predict the water quality influencing parameters, partial least squares analyses were carried out on a matrix made up of the results of water quality assessment studies carried out in Nigeria, Papua New Guinea, Egypt, Thailand and India/Pakistan. The results showed that the total dissolved solid, calcium, sulfate, sodium and chloride contents can be used to predict a wide range of physico-chemical characteristics of water. The potential implications of these observations on the financial and opportunity costs associated with elaborate water quality monitoring are discussed.

  8. Logistic regression analysis of factors associated with avascular necrosis of the femoral head following femoral neck fractures in middle-aged and elderly patients.

    PubMed

    Ai, Zi-Sheng; Gao, You-Shui; Sun, Yuan; Liu, Yue; Zhang, Chang-Qing; Jiang, Cheng-Hua

    2013-03-01

    Risk factors for femoral neck fracture-induced avascular necrosis of the femoral head have not been elucidated clearly in middle-aged and elderly patients. Moreover, the high incidence of screw removal in China and its effect on the fate of the involved femoral head require statistical methods to reflect their intrinsic relationship. Ninety-nine patients older than 45 years with femoral neck fracture were treated by internal fixation between May 1999 and April 2004. Descriptive analysis, interaction analysis between associated factors, single factor logistic regression, multivariate logistic regression, and detailed interaction analysis were employed to explore potential relationships among associated factors. Avascular necrosis of the femoral head was found in 15 cases (15.2 %). Age × the status of implants (removal vs. maintenance) and gender × the timing of reduction were interactive according to two-factor interactive analysis. Age, the displacement of fractures, the quality of reduction, and the status of implants were found to be significant factors in single factor logistic regression analysis. Age, age × the status of implants, and the quality of reduction were found to be significant factors in multivariate logistic regression analysis. In fine interaction analysis after multivariate logistic regression analysis, implant removal was the most important risk factor for avascular necrosis in 56-to-85-year-old patients, with a risk ratio of 26.00 (95 % CI = 3.076-219.747). The middle-aged and elderly have less incidence of avascular necrosis of the femoral head following femoral neck fractures treated by cannulated screws. The removal of cannulated screws can induce a significantly high incidence of avascular necrosis of the femoral head in elderly patients, while a high-quality reduction is helpful to reduce avascular necrosis.

  9. Multivariate adaptive regression splines analysis to predict biomarkers of spontaneous preterm birth.

    PubMed

    Menon, Ramkumar; Bhat, Geeta; Saade, George R; Spratt, Heidi

    2014-04-01

    To develop classification models of demographic/clinical factors and biomarker data from spontaneous preterm birth in African Americans and Caucasians. Secondary analysis of biomarker data using multivariate adaptive regression splines (MARS), a supervised machine learning algorithm method. Analysis of data on 36 biomarkers from 191 women was reduced by MARS to develop predictive models for preterm birth in African Americans and Caucasians. Maternal plasma, cord plasma collected at admission for preterm or term labor and amniotic fluid at delivery. Data were partitioned into training and testing sets. Variable importance, a relative indicator (0-100%) and area under the receiver operating characteristic curve (AUC) characterized results. Multivariate adaptive regression splines generated models for combined and racially stratified biomarker data. Clinical and demographic data did not contribute to the model. Racial stratification of data produced distinct models in all three compartments. In African Americans maternal plasma samples IL-1RA, TNF-α, angiopoietin 2, TNFRI, IL-5, MIP1α, IL-1β and TGF-α modeled preterm birth (AUC train: 0.98, AUC test: 0.86). In Caucasians TNFR1, ICAM-1 and IL-1RA contributed to the model (AUC train: 0.84, AUC test: 0.68). African Americans cord plasma samples produced IL-12P70, IL-8 (AUC train: 0.82, AUC test: 0.66). Cord plasma in Caucasians modeled IGFII, PDGFBB, TGF-β1 , IL-12P70, and TIMP1 (AUC train: 0.99, AUC test: 0.82). Amniotic fluid in African Americans modeled FasL, TNFRII, RANTES, KGF, IGFI (AUC train: 0.95, AUC test: 0.89) and in Caucasians, TNF-α, MCP3, TGF-β3 , TNFR1 and angiopoietin 2 (AUC train: 0.94 AUC test: 0.79). Multivariate adaptive regression splines models multiple biomarkers associated with preterm birth and demonstrated racial disparity. © 2014 Nordic Federation of Societies of Obstetrics and Gynecology.

  10. Using a DEA Management Tool through a Nonparametric Approach: An Examination of Urban-Rural Effects on Thai School Efficiency

    ERIC Educational Resources Information Center

    Kantabutra, Sangchan

    2009-01-01

    This paper examines urban-rural effects on public upper-secondary school efficiency in northern Thailand. In the study, efficiency was measured by a nonparametric technique, data envelopment analysis (DEA). Urban-rural effects were examined through a Mann-Whitney nonparametric statistical test. Results indicate that urban schools appear to have…

  11. Rediscovery of Good-Turing estimators via Bayesian nonparametrics.

    PubMed

    Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye

    2016-03-01

    The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.

  12. A comparison of methods to handle skew distributed cost variables in the analysis of the resource consumption in schizophrenia treatment.

    PubMed

    Kilian, Reinhold; Matschinger, Herbert; Löeffler, Walter; Roick, Christiane; Angermeyer, Matthias C

    2002-03-01

    Transformation of the dependent cost variable is often used to solve the problems of heteroscedasticity and skewness in linear ordinary least square regression of health service cost data. However, transformation may cause difficulties in the interpretation of regression coefficients and the retransformation of predicted values. The study compares the advantages and disadvantages of different methods to estimate regression based cost functions using data on the annual costs of schizophrenia treatment. Annual costs of psychiatric service use and clinical and socio-demographic characteristics of the patients were assessed for a sample of 254 patients with a diagnosis of schizophrenia (ICD-10 F 20.0) living in Leipzig. The clinical characteristics of the participants were assessed by means of the BPRS 4.0, the GAF, and the CAN for service needs. Quality of life was measured by WHOQOL-BREF. A linear OLS regression model with non-parametric standard errors, a log-transformed OLS model and a generalized linear model with a log-link and a gamma distribution were used to estimate service costs. For the estimation of robust non-parametric standard errors, the variance estimator by White and a bootstrap estimator based on 2000 replications were employed. Models were evaluated by the comparison of the R2 and the root mean squared error (RMSE). RMSE of the log-transformed OLS model was computed with three different methods of bias-correction. The 95% confidence intervals for the differences between the RMSE were computed by means of bootstrapping. A split-sample-cross-validation procedure was used to forecast the costs for the one half of the sample on the basis of a regression equation computed for the other half of the sample. All three methods showed significant positive influences of psychiatric symptoms and met psychiatric service needs on service costs. Only the log- transformed OLS model showed a significant negative impact of age, and only the GLM shows a significant negative influences of employment status and partnership on costs. All three models provided a R2 of about.31. The Residuals of the linear OLS model revealed significant deviances from normality and homoscedasticity. The residuals of the log-transformed model are normally distributed but still heteroscedastic. The linear OLS model provided the lowest prediction error and the best forecast of the dependent cost variable. The log-transformed model provided the lowest RMSE if the heteroscedastic bias correction was used. The RMSE of the GLM with a log link and a gamma distribution was higher than those of the linear OLS model and the log-transformed OLS model. The difference between the RMSE of the linear OLS model and that of the log-transformed OLS model without bias correction was significant at the 95% level. As result of the cross-validation procedure, the linear OLS model provided the lowest RMSE followed by the log-transformed OLS model with a heteroscedastic bias correction. The GLM showed the weakest model fit again. None of the differences between the RMSE resulting form the cross- validation procedure were found to be significant. The comparison of the fit indices of the different regression models revealed that the linear OLS model provided a better fit than the log-transformed model and the GLM, but the differences between the models RMSE were not significant. Due to the small number of cases in the study the lack of significance does not sufficiently proof that the differences between the RSME for the different models are zero and the superiority of the linear OLS model can not be generalized. The lack of significant differences among the alternative estimators may reflect a lack of sample size adequate to detect important differences among the estimators employed. Further studies with larger case number are necessary to confirm the results. Specification of an adequate regression models requires a careful examination of the characteristics of the data. Estimation of standard errors and confidence intervals by nonparametric methods which are robust against deviations from the normal distribution and the homoscedasticity of residuals are suitable alternatives to the transformation of the skew distributed dependent variable. Further studies with more adequate case numbers are needed to confirm the results.

  13. An Investigation of Multivariate Adaptive Regression Splines for Modeling and Analysis of Univariate and Semi-Multivariate Time Series Systems

    DTIC Science & Technology

    1991-09-01

    However, there is no guarantee that this would work; for instance if the data were generated by an ARCH model (Tong, 1990 pp. 116-117) then a simple...Hill, R., Griffiths, W., Lutkepohl, H., and Lee, T., Introduction to the Theory and Practice of Econometrics , 2th ed., Wiley, 1985. Kendall, M., Stuart

  14. Marginal regression approach for additive hazards models with clustered current status data.

    PubMed

    Su, Pei-Fang; Chi, Yunchan

    2014-01-15

    Current status data arise naturally from tumorigenicity experiments, epidemiology studies, biomedicine, econometrics and demographic and sociology studies. Moreover, clustered current status data may occur with animals from the same litter in tumorigenicity experiments or with subjects from the same family in epidemiology studies. Because the only information extracted from current status data is whether the survival times are before or after the monitoring or censoring times, the nonparametric maximum likelihood estimator of survival function converges at a rate of n(1/3) to a complicated limiting distribution. Hence, semiparametric regression models such as the additive hazards model have been extended for independent current status data to derive the test statistics, whose distributions converge at a rate of n(1/2) , for testing the regression parameters. However, a straightforward application of these statistical methods to clustered current status data is not appropriate because intracluster correlation needs to be taken into account. Therefore, this paper proposes two estimating functions for estimating the parameters in the additive hazards model for clustered current status data. The comparative results from simulation studies are presented, and the application of the proposed estimating functions to one real data set is illustrated. Copyright © 2013 John Wiley & Sons, Ltd.

  15. Profile local linear estimation of generalized semiparametric regression model for longitudinal data.

    PubMed

    Sun, Yanqing; Sun, Liuquan; Zhou, Jie

    2013-07-01

    This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically model such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criteria for selecting the link function is proposed to provide better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.

  16. The quantile regression approach to efficiency measurement: insights from Monte Carlo simulations.

    PubMed

    Liu, Chunping; Laporte, Audrey; Ferguson, Brian S

    2008-09-01

    In the health economics literature there is an ongoing debate over approaches used to estimate the efficiency of health systems at various levels, from the level of the individual hospital - or nursing home - up to that of the health system as a whole. The two most widely used approaches to evaluating the efficiency with which various units deliver care are non-parametric data envelopment analysis (DEA) and parametric stochastic frontier analysis (SFA). Productivity researchers tend to have very strong preferences over which methodology to use for efficiency estimation. In this paper, we use Monte Carlo simulation to compare the performance of DEA and SFA in terms of their ability to accurately estimate efficiency. We also evaluate quantile regression as a potential alternative approach. A Cobb-Douglas production function, random error terms and a technical inefficiency term with different distributions are used to calculate the observed output. The results, based on these experiments, suggest that neither DEA nor SFA can be regarded as clearly dominant, and that, depending on the quantile estimated, the quantile regression approach may be a useful addition to the armamentarium of methods for estimating technical efficiency.

  17. Semiparametric regression analysis of interval-censored competing risks data.

    PubMed

    Mao, Lu; Lin, Dan-Yu; Zeng, Donglin

    2017-09-01

    Interval-censored competing risks data arise when each study subject may experience an event or failure from one of several causes and the failure time is not observed directly but rather is known to lie in an interval between two examinations. We formulate the effects of possibly time-varying (external) covariates on the cumulative incidence or sub-distribution function of competing risks (i.e., the marginal probability of failure from a specific cause) through a broad class of semiparametric regression models that captures both proportional and non-proportional hazards structures for the sub-distribution. We allow each subject to have an arbitrary number of examinations and accommodate missing information on the cause of failure. We consider nonparametric maximum likelihood estimation and devise a fast and stable EM-type algorithm for its computation. We then establish the consistency, asymptotic normality, and semiparametric efficiency of the resulting estimators for the regression parameters by appealing to modern empirical process theory. In addition, we show through extensive simulation studies that the proposed methods perform well in realistic situations. Finally, we provide an application to a study on HIV-1 infection with different viral subtypes. © 2017, The International Biometric Society.

  18. Improved spatial regression analysis of diffusion tensor imaging for lesion detection during longitudinal progression of multiple sclerosis in individual subjects

    NASA Astrophysics Data System (ADS)

    Liu, Bilan; Qiu, Xing; Zhu, Tong; Tian, Wei; Hu, Rui; Ekholm, Sven; Schifitto, Giovanni; Zhong, Jianhui

    2016-03-01

    Subject-specific longitudinal DTI study is vital for investigation of pathological changes of lesions and disease evolution. Spatial Regression Analysis of Diffusion tensor imaging (SPREAD) is a non-parametric permutation-based statistical framework that combines spatial regression and resampling techniques to achieve effective detection of localized longitudinal diffusion changes within the whole brain at individual level without a priori hypotheses. However, boundary blurring and dislocation limit its sensitivity, especially towards detecting lesions of irregular shapes. In the present study, we propose an improved SPREAD (dubbed improved SPREAD, or iSPREAD) method by incorporating a three-dimensional (3D) nonlinear anisotropic diffusion filtering method, which provides edge-preserving image smoothing through a nonlinear scale space approach. The statistical inference based on iSPREAD was evaluated and compared with the original SPREAD method using both simulated and in vivo human brain data. Results demonstrated that the sensitivity and accuracy of the SPREAD method has been improved substantially by adapting nonlinear anisotropic filtering. iSPREAD identifies subject-specific longitudinal changes in the brain with improved sensitivity, accuracy, and enhanced statistical power, especially when the spatial correlation is heterogeneous among neighboring image pixels in DTI.

  19. Incorporating Measurement Error from Modeled Air Pollution Exposures into Epidemiological Analyses.

    PubMed

    Samoli, Evangelia; Butland, Barbara K

    2017-12-01

    Outdoor air pollution exposures used in epidemiological studies are commonly predicted from spatiotemporal models incorporating limited measurements, temporal factors, geographic information system variables, and/or satellite data. Measurement error in these exposure estimates leads to imprecise estimation of health effects and their standard errors. We reviewed methods for measurement error correction that have been applied in epidemiological studies that use model-derived air pollution data. We identified seven cohort studies and one panel study that have employed measurement error correction methods. These methods included regression calibration, risk set regression calibration, regression calibration with instrumental variables, the simulation extrapolation approach (SIMEX), and methods under the non-parametric or parameter bootstrap. Corrections resulted in small increases in the absolute magnitude of the health effect estimate and its standard error under most scenarios. Limited application of measurement error correction methods in air pollution studies may be attributed to the absence of exposure validation data and the methodological complexity of the proposed methods. Future epidemiological studies should consider in their design phase the requirements for the measurement error correction method to be later applied, while methodological advances are needed under the multi-pollutants setting.

  20. MicroRNA-Related DNA Repair/Cell-Cycle Genes Independently Associated With Relapse After Radiation Therapy for Early Breast Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gee, Harriet E., E-mail: harriet.gee@sydney.edu.au; The Chris O'Brien Lifehouse, Missenden Road, Camperdown, NSW; Central Clinical School, Sydney Medical School, University of Sydney, NSW

    Purpose: Local recurrence and distant failure after adjuvant radiation therapy for breast cancer remain significant clinical problems, incompletely predicted by conventional clinicopathologic markers. We had previously identified microRNA-139-5p and microRNA-1274a as key regulators of breast cancer radiation response in vitro. The purpose of this study was to investigate standard clinicopathologic markers of local recurrence in a contemporary series and to establish whether putative target genes of microRNAs involved in DNA repair and cell cycle control could better predict radiation therapy response in vivo. Methods and Materials: With institutional ethics board approval, local recurrence was measured in a contemporary, prospectively collected series ofmore » 458 patients treated with radiation therapy after breast-conserving surgery. Additionally, independent publicly available mRNA/microRNA microarray expression datasets totaling >1000 early-stage breast cancer patients, treated with adjuvant radiation therapy, with >10 years of follow-up, were analyzed. The expression of putative microRNA target biomarkers—TOP2A, POLQ, RAD54L, SKP2, PLK2, and RAG1—were correlated with standard clinicopathologic variables using 2-sided nonparametric tests, and to local/distant relapse and survival using Kaplan-Meier and Cox regression analysis. Results: We found a low rate of isolated local recurrence (1.95%) in our modern series, and that few clinicopathologic variables (such as lymphovascular invasion) were significantly predictive. In multiple independent datasets (n>1000), however, high expression of RAD54L, TOP2A, POLQ, and SKP2 significantly correlated with local recurrence, survival, or both in univariate and multivariate analyses (P<.001). Low RAG1 expression significantly correlated with local recurrence (multivariate, P=.008). Additionally, RAD54L, SKP2, and PLK2 may be predictive, being prognostic in radiation therapy–treated patients but not in untreated matched control individuals (n=107; P<.05). Conclusions: Biomarkers of DNA repair and cell cycle control can identify patients at high risk of treatment failure in those receiving radiation therapy for early breast cancer in independent cohorts. These should be further investigated prospectively, especially TOP2A and SKP2, for which targeted therapies are available.« less

  1. On the Use of Nonparametric Item Characteristic Curve Estimation Techniques for Checking Parametric Model Fit

    ERIC Educational Resources Information Center

    Lee, Young-Sun; Wollack, James A.; Douglas, Jeffrey

    2009-01-01

    The purpose of this study was to assess the model fit of a 2PL through comparison with the nonparametric item characteristic curve (ICC) estimation procedures. Results indicate that three nonparametric procedures implemented produced ICCs that are similar to that of the 2PL for items simulated to fit the 2PL. However for misfitting items,…

  2. Sign realized jump risk and the cross-section of stock returns: Evidence from China's stock market.

    PubMed

    Chao, Youcong; Liu, Xiaoqun; Guo, Shijun

    2017-01-01

    Using 5-minute high frequency data from the Chinese stock market, we employ a non-parametric method to estimate Fama-French portfolio realized jumps and investigate whether the estimated positive, negative and sign realized jumps could forecast or explain the cross-sectional stock returns. The Fama-MacBeth regression results show that not only have the realized jump components and the continuous volatility been compensated with risk premium, but also that the negative jump risk, the positive jump risk and the sign jump risk, to some extent, could explain the return of the stock portfolios. Therefore, we should pay high attention to the downside tail risk and the upside tail risk.

  3. A new method of real-time detection of changes in periodic data stream

    NASA Astrophysics Data System (ADS)

    Lyu, Chen; Lu, Guoliang; Cheng, Bin; Zheng, Xiangwei

    2017-07-01

    The change point detection in periodic time series is much desirable in many practical usages. We present a novel algorithm for this task, which includes two phases: 1) anomaly measure- on the basis of a typical regression model, we propose a new computation method to measure anomalies in time series which does not require any reference data from other measurement(s); 2) change detection- we introduce a new martingale test for detection which can be operated in an unsupervised and nonparametric way. We have conducted extensive experiments to systematically test our algorithm. The results make us believe that our algorithm can be directly applicable in many real-world change-point-detection applications.

  4. A Study of Specific Fracture Energy at Percussion Drilling

    NASA Astrophysics Data System (ADS)

    A, Shadrina; T, Kabanova; V, Krets; L, Saruev

    2014-08-01

    The paper presents experimental studies of rock failure provided by percussion drilling. Quantification and qualitative analysis were carried out to estimate critical values of rock failure depending on the hammer pre-impact velocity, types of drill bits and cylindrical hammer parameters (weight, length, diameter), and turn angle of a drill bit. Obtained data in this work were compared with obtained results by other researchers. The particle-size distribution in granite-cutting sludge was analyzed in this paper. Statistical approach (Spearmen's rank-order correlation, multiple regression analysis with dummy variables, Kruskal-Wallis nonparametric test) was used to analyze the drilling process. Experimental data will be useful for specialists engaged in simulation and illustration of rock failure.

  5. A Fast, Locally Adaptive, Interactive Retrieval Algorithm for the Analysis of DIAL Measurements

    NASA Astrophysics Data System (ADS)

    Samarov, D. V.; Rogers, R.; Hair, J. W.; Douglass, K. O.; Plusquellic, D.

    2010-12-01

    Differential absorption light detection and ranging (DIAL) is a laser-based tool which is used for remote, range-resolved measurement of particular gases in the atmosphere, such as carbon-dioxide and methane. In many instances it is of interest to study how these gases are distributed over a region such as a landfill, factory, or farm. While a single DIAL measurement only tells us about the distribution of a gas along a single path, a sequence of consecutive measurements provides us with information on how that gas is distributed over a region, making DIAL a natural choice for such studies. DIAL measurements present a number of interesting challenges; first, in order to convert the raw data to concentration it is necessary to estimate the derivative along the path of the measurement. Second, as the distribution of gases across a region can be highly heterogeneous it is important that the spatial nature of the measurements be taken into account. Finally, since it is common for the set of collected measurements to be quite large it is important for the method to be computationally efficient. Existing work based on Local Polynomial Regression (LPR) has been developed which addresses the first two issues, but the issue of computational speed remains an open problem. In addition to the latter, another desirable property is to allow user input into the algorithm. In this talk we present a novel method based on LPR which utilizes a variant of the RODEO algorithm to provide a fast, locally adaptive and interactive approach to the analysis of DIAL measurements. This methodology is motivated by and applied to several simulated examples and a study out of NASA Langley Research Center (LaRC) looking at the estimation of aerosol extinction in the atmosphere. A comparison study of our method against several other algorithms is also presented. References Chaudhuri, P., Marron, J.S., Scale-space view of curve estimation, Annals of Statistics 28 (2000) 408-428. Duong, T., Cowling, A., Koch, I., Wand, M.P., Feature significance for multivariate kernel density estimation, Computational Statistics and Data Analysis 52 (2008) 4225-4242. Godtliebsen, F., Marron, J.S., Chaudhuri, P., Statistical Significance of features in digital images, Image and Vision Computing 22 (2004) 1093-1104. Lafferty, J., Wasserman, L., RODEO: Sparse, Greedy Nonparametric Regression, Annals of Statistics 36 (2008) 28-63. Lindstrom, T., Holst, U., Weibring, P., Analysis of lidar fields using local polynomial regression, Environmetrics 16 (2005) 619-634

  6. Depression and Anxiety in Greek Male Veterans After Retirement.

    PubMed

    Kypraiou, Aspa; Sarafis, Pavlos; Tsounis, Andreas; Bitsi, Georgia; Andreanides, Elias; Constantinidis, Theodoros; Kotrotsiou, Evaggelia; Malliarou, Maria

    2017-03-01

    Retirement is a turning point in human life, resulting in changes to physical and mental health status. The aim of this study was to examine the factors that are related with depression and anxiety symptoms in Greek male veterans after retirement. A total of 502 veterans participated in a cross-sectional study. Beck Depression Inventory for depression assessment and Spielberger Trait Anxiety Inventory for anxiety assessment were used. The Ethics Committee of the Technological Educational Institution of Thessaly granted permission for conducting the research, and informed consent was obtained from all the participants. Questionnaires were filled in electronically using a platform that was made for the specific research. Mean values, standard deviations, Student t test, nonparametric cluster analysis of variance, Pearson's and Spearman's coefficients, and linear regression were conducted, using the Statistical Program for Social Services version 19.0. Severe depression was found in 3.8% of veterans with a mean score of 6.78, whereas 23.2% displayed mild-to-moderate symptoms of depression. Mean score of state anxiety was found to be 36.55 and of trait anxiety 33.60. Veterans who were discharged because of stressful working conditions, those who have a high body mass index, consume regularly alcohol, smoke and were not satisfied by changes in their everyday life after retirement had significantly more symptoms of depression and anxiety, although those who retired because of family problems had significantly more symptoms of depression. Multivariate linear regression analyses indicated that dissatisfaction related to lifestyle changes had statistically significant effect on symptoms of depression and anxiety, and stressful working conditions as a leading cause for retirement had statistically significant effect on depression. Finally, according to linear regression analyses results, those who were satisfied with their professional evolution had 1.80 times lower score in depression scale. The sense of satisfaction derived from fulfilling work-related expectations when finishing a career, with changes in everyday life, and smoking and alcohol reduction, may contribute to a better adjustment during the retirement period. To our knowledge, this was the first study examining depression and anxiety levels in Greek veterans, and the sample size was large, covering a randomly chosen veteran population. On the other, it was a convenient sample, although the study results could not focus on direct-term effects of retirement (up to 3 years of retirement from active service). Primitive data may be used for research directions in the future. Reprint & Copyright © 2017 Association of Military Surgeons of the U.S.

  7. Multivariate Linear Regression and CART Regression Analysis of TBM Performance at Abu Hamour Phase-I Tunnel

    NASA Astrophysics Data System (ADS)

    Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.

    2017-12-01

    The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.

  8. Constraining geostatistical models with hydrological data to improve prediction realism

    NASA Astrophysics Data System (ADS)

    Demyanov, V.; Rojas, T.; Christie, M.; Arnold, D.

    2012-04-01

    Geostatistical models reproduce spatial correlation based on the available on site data and more general concepts about the modelled patters, e.g. training images. One of the problem of modelling natural systems with geostatistics is in maintaining realism spatial features and so they agree with the physical processes in nature. Tuning the model parameters to the data may lead to geostatistical realisations with unrealistic spatial patterns, which would still honour the data. Such model would result in poor predictions, even though although fit the available data well. Conditioning the model to a wider range of relevant data provide a remedy that avoid producing unrealistic features in spatial models. For instance, there are vast amounts of information about the geometries of river channels that can be used in describing fluvial environment. Relations between the geometrical channel characteristics (width, depth, wave length, amplitude, etc.) are complex and non-parametric and are exhibit a great deal of uncertainty, which is important to propagate rigorously into the predictive model. These relations can be described within a Bayesian approach as multi-dimensional prior probability distributions. We propose a way to constrain multi-point statistics models with intelligent priors obtained from analysing a vast collection of contemporary river patterns based on previously published works. We applied machine learning techniques, namely neural networks and support vector machines, to extract multivariate non-parametric relations between geometrical characteristics of fluvial channels from the available data. An example demonstrates how ensuring geological realism helps to deliver more reliable prediction of a subsurface oil reservoir in a fluvial depositional environment.

  9. Non-parametric combination and related permutation tests for neuroimaging.

    PubMed

    Winkler, Anderson M; Webster, Matthew A; Brooks, Jonathan C; Tracey, Irene; Smith, Stephen M; Nichols, Thomas E

    2016-04-01

    In this work, we show how permutation methods can be applied to combination analyses such as those that include multiple imaging modalities, multiple data acquisitions of the same modality, or simply multiple hypotheses on the same data. Using the well-known definition of union-intersection tests and closed testing procedures, we use synchronized permutations to correct for such multiplicity of tests, allowing flexibility to integrate imaging data with different spatial resolutions, surface and/or volume-based representations of the brain, including non-imaging data. For the problem of joint inference, we propose and evaluate a modification of the recently introduced non-parametric combination (NPC) methodology, such that instead of a two-phase algorithm and large data storage requirements, the inference can be performed in a single phase, with reasonable computational demands. The method compares favorably to classical multivariate tests (such as MANCOVA), even when the latter is assessed using permutations. We also evaluate, in the context of permutation tests, various combining methods that have been proposed in the past decades, and identify those that provide the best control over error rate and power across a range of situations. We show that one of these, the method of Tippett, provides a link between correction for the multiplicity of tests and their combination. Finally, we discuss how the correction can solve certain problems of multiple comparisons in one-way ANOVA designs, and how the combination is distinguished from conjunctions, even though both can be assessed using permutation tests. We also provide a common algorithm that accommodates combination and correction. © 2016 The Authors Human Brain Mapping Published by Wiley Periodicals, Inc.

  10. Nonparametric, Coupled ,Bayesian ,Dictionary ,and Classifier Learning for Hyperspectral Classification.

    PubMed

    Akhtar, Naveed; Mian, Ajmal

    2017-10-03

    We present a principled approach to learn a discriminative dictionary along a linear classifier for hyperspectral classification. Our approach places Gaussian Process priors over the dictionary to account for the relative smoothness of the natural spectra, whereas the classifier parameters are sampled from multivariate Gaussians. We employ two Beta-Bernoulli processes to jointly infer the dictionary and the classifier. These processes are coupled under the same sets of Bernoulli distributions. In our approach, these distributions signify the frequency of the dictionary atom usage in representing class-specific training spectra, which also makes the dictionary discriminative. Due to the coupling between the dictionary and the classifier, the popularity of the atoms for representing different classes gets encoded into the classifier. This helps in predicting the class labels of test spectra that are first represented over the dictionary by solving a simultaneous sparse optimization problem. The labels of the spectra are predicted by feeding the resulting representations to the classifier. Our approach exploits the nonparametric Bayesian framework to automatically infer the dictionary size--the key parameter in discriminative dictionary learning. Moreover, it also has the desirable property of adaptively learning the association between the dictionary atoms and the class labels by itself. We use Gibbs sampling to infer the posterior probability distributions over the dictionary and the classifier under the proposed model, for which, we derive analytical expressions. To establish the effectiveness of our approach, we test it on benchmark hyperspectral images. The classification performance is compared with the state-of-the-art dictionary learning-based classification methods.

  11. Effects of Social Class and School Conditions on Educational Enrollment and Achievement of Boys and Girls in Rural Viet Nam

    ERIC Educational Resources Information Center

    Nguyen, Phuong L.

    2006-01-01

    This study examines the effects of parental SES, school quality, and community factors on children's enrollment and achievement in rural areas in Viet Nam, using logistic regression and ordered logistic regression. Multivariate analysis reveals significant differences in educational enrollment and outcomes by level of household expenditures and…

  12. Procedures for using signals from one sensor as substitutes for signals of another

    NASA Technical Reports Server (NTRS)

    Suits, G.; Malila, W.; Weller, T.

    1988-01-01

    Long-term monitoring of surface conditions may require a transfer from using data from one satellite sensor to data from a different sensor having different spectral characteristics. Two general procedures for spectral signal substitution are described in this paper, a principal-components procedure and a complete multivariate regression procedure. They are evaluated through a simulation study of five satellite sensors (MSS, TM, AVHRR, CZCS, and HRV). For illustration, they are compared to another recently described procedure for relating AVHRR and MSS signals. The multivariate regression procedure is shown to be best. TM can accurately emulate the other sensors, but they, on the other hand, have difficulty in accurately emulating its shortwave infrared bands (TM5 and TM7).

  13. Multivariate Analysis of Seismic Field Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alam, M. Kathleen

    1999-06-01

    This report includes the details of the model building procedure and prediction of seismic field data. Principal Components Regression, a multivariate analysis technique, was used to model seismic data collected as two pieces of equipment were cycled on and off. Models built that included only the two pieces of equipment of interest had trouble predicting data containing signals not included in the model. Evidence for poor predictions came from the prediction curves as well as spectral F-ratio plots. Once the extraneous signals were included in the model, predictions improved dramatically. While Principal Components Regression performed well for the present datamore » sets, the present data analysis suggests further work will be needed to develop more robust modeling methods as the data become more complex.« less

  14. Non-proportional odds multivariate logistic regression of ordinal family data.

    PubMed

    Zaloumis, Sophie G; Scurrah, Katrina J; Harrap, Stephen B; Ellis, Justine A; Gurrin, Lyle C

    2015-03-01

    Methods to examine whether genetic and/or environmental sources can account for the residual variation in ordinal family data usually assume proportional odds. However, standard software to fit the non-proportional odds model to ordinal family data is limited because the correlation structure of family data is more complex than for other types of clustered data. To perform these analyses we propose the non-proportional odds multivariate logistic regression model and take a simulation-based approach to model fitting using Markov chain Monte Carlo methods, such as partially collapsed Gibbs sampling and the Metropolis algorithm. We applied the proposed methodology to male pattern baldness data from the Victorian Family Heart Study. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Post-fire debris flow prediction in Western United States: Advancements based on a nonparametric statistical technique

    NASA Astrophysics Data System (ADS)

    Nikolopoulos, E. I.; Destro, E.; Bhuiyan, M. A. E.; Borga, M., Sr.; Anagnostou, E. N.

    2017-12-01

    Fire disasters affect modern societies at global scale inducing significant economic losses and human casualties. In addition to their direct impacts they have various adverse effects on hydrologic and geomorphologic processes of a region due to the tremendous alteration of the landscape characteristics (vegetation, soil properties etc). As a consequence, wildfires often initiate a cascade of hazards such as flash floods and debris flows that usually follow the occurrence of a wildfire thus magnifying the overall impact in a region. Post-fire debris flows (PFDF) is one such type of hazards frequently occurring in Western United States where wildfires are a common natural disaster. Prediction of PDFD is therefore of high importance in this region and over the last years a number of efforts from United States Geological Survey (USGS) and National Weather Service (NWS) have been focused on the development of early warning systems that will help mitigate PFDF risk. This work proposes a prediction framework that is based on a nonparametric statistical technique (random forests) that allows predicting the occurrence of PFDF at regional scale with a higher degree of accuracy than the commonly used approaches that are based on power-law thresholds and logistic regression procedures. The work presented is based on a recently released database from USGS that reports a total of 1500 storms that triggered and did not trigger PFDF in a number of fire affected catchments in Western United States. The database includes information on storm characteristics (duration, accumulation, max intensity etc) and other auxiliary information of land surface properties (soil erodibility index, local slope etc). Results show that the proposed model is able to achieve a satisfactory prediction accuracy (threat score > 0.6) superior of previously published prediction frameworks highlighting the potential of nonparametric statistical techniques for development of PFDF prediction systems.

  16. Persistence and stability of fish community structure in a southwest New York stream

    USGS Publications Warehouse

    Hansen, Michael J.; Ramm, Carl W.

    1994-01-01

    We used multivariate and nonparametric statistics to examine persistence and stability of fish species in the upper 43 km of French Creek, New York. Species occurred in upstream and downstream groups in 1937 that persisted in 1979. However, the downstream group expanded its range in the drainage from 1937 to 1979 at the expense of the upstream group. A dam prevented further upstream expansion of the downstream group. Ranks of species abundances were stable, as tests of group similarity were significant. The abundances and distributions of benthic species were stable across seven sampling dates in 1980 despite several floods and repeated removals by sampling that could have altered community structure. We conclude that the fish community in French Creek persisted and was stable over the 42-yr interval, 1937-1979, and that abundances of benthic species were stable in summer 1980.

  17. Genetic parameters for growth characteristics of free-range chickens under univariate random regression models.

    PubMed

    Rovadoscki, Gregori A; Petrini, Juliana; Ramirez-Diaz, Johanna; Pertile, Simone F N; Pertille, Fábio; Salvian, Mayara; Iung, Laiza H S; Rodriguez, Mary Ana P; Zampar, Aline; Gaya, Leila G; Carvalho, Rachel S B; Coelho, Antonio A D; Savino, Vicente J M; Coutinho, Luiz L; Mourão, Gerson B

    2016-09-01

    Repeated measures from the same individual have been analyzed by using repeatability and finite dimension models under univariate or multivariate analyses. However, in the last decade, the use of random regression models for genetic studies with longitudinal data have become more common. Thus, the aim of this research was to estimate genetic parameters for body weight of four experimental chicken lines by using univariate random regression models. Body weight data from hatching to 84 days of age (n = 34,730) from four experimental free-range chicken lines (7P, Caipirão da ESALQ, Caipirinha da ESALQ and Carijó Barbado) were used. The analysis model included the fixed effects of contemporary group (gender and rearing system), fixed regression coefficients for age at measurement, and random regression coefficients for permanent environmental effects and additive genetic effects. Heterogeneous variances for residual effects were considered, and one residual variance was assigned for each of six subclasses of age at measurement. Random regression curves were modeled by using Legendre polynomials of the second and third orders, with the best model chosen based on the Akaike Information Criterion, Bayesian Information Criterion, and restricted maximum likelihood. Multivariate analyses under the same animal mixed model were also performed for the validation of the random regression models. The Legendre polynomials of second order were better for describing the growth curves of the lines studied. Moderate to high heritabilities (h(2) = 0.15 to 0.98) were estimated for body weight between one and 84 days of age, suggesting that selection for body weight at all ages can be used as a selection criteria. Genetic correlations among body weight records obtained through multivariate analyses ranged from 0.18 to 0.96, 0.12 to 0.89, 0.06 to 0.96, and 0.28 to 0.96 in 7P, Caipirão da ESALQ, Caipirinha da ESALQ, and Carijó Barbado chicken lines, respectively. Results indicate that genetic gain for body weight can be achieved by selection. Also, selection for body weight at 42 days of age can be maintained as a selection criterion. © 2016 Poultry Science Association Inc.

  18. On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices

    PubMed Central

    Ye, Xin; Pendyala, Ram M.; Zou, Yajie

    2017-01-01

    A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Argau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences. PMID:29073152

  19. On the development of a semi-nonparametric generalized multinomial logit model for travel-related choices.

    PubMed

    Wang, Ke; Ye, Xin; Pendyala, Ram M; Zou, Yajie

    2017-01-01

    A semi-nonparametric generalized multinomial logit model, formulated using orthonormal Legendre polynomials to extend the standard Gumbel distribution, is presented in this paper. The resulting semi-nonparametric function can represent a probability density function for a large family of multimodal distributions. The model has a closed-form log-likelihood function that facilitates model estimation. The proposed method is applied to model commute mode choice among four alternatives (auto, transit, bicycle and walk) using travel behavior data from Argau, Switzerland. Comparisons between the multinomial logit model and the proposed semi-nonparametric model show that violations of the standard Gumbel distribution assumption lead to considerable inconsistency in parameter estimates and model inferences.

  20. Mathematical models for nonparametric inferences from line transect data

    USGS Publications Warehouse

    Burnham, K.P.; Anderson, D.R.

    1976-01-01

    A general mathematical theory of line transects is develoepd which supplies a framework for nonparametric density estimation based on either right angle or sighting distances. The probability of observing a point given its right angle distance (y) from the line is generalized to an arbitrary function g(y). Given only that g(O) = 1, it is shown there are nonparametric approaches to density estimation using the observed right angle distances. The model is then generalized to include sighting distances (r). Let f(y/r) be the conditional distribution of right angle distance given sighting distance. It is shown that nonparametric estimation based only on sighting distances requires we know the transformation of r given by f(O/r).

  1. Multivariate generalized hidden Markov regression models with random covariates: Physical exercise in an elderly population.

    PubMed

    Punzo, Antonio; Ingrassia, Salvatore; Maruotti, Antonello

    2018-04-22

    A time-varying latent variable model is proposed to jointly analyze multivariate mixed-support longitudinal data. The proposal can be viewed as an extension of hidden Markov regression models with fixed covariates (HMRMFCs), which is the state of the art for modelling longitudinal data, with a special focus on the underlying clustering structure. HMRMFCs are inadequate for applications in which a clustering structure can be identified in the distribution of the covariates, as the clustering is independent from the covariates distribution. Here, hidden Markov regression models with random covariates are introduced by explicitly specifying state-specific distributions for the covariates, with the aim of improving the recovering of the clusters in the data with respect to a fixed covariates paradigm. The hidden Markov regression models with random covariates class is defined focusing on the exponential family, in a generalized linear model framework. Model identifiability conditions are sketched, an expectation-maximization algorithm is outlined for parameter estimation, and various implementation and operational issues are discussed. Properties of the estimators of the regression coefficients, as well as of the hidden path parameters, are evaluated through simulation experiments and compared with those of HMRMFCs. The method is applied to physical activity data. Copyright © 2018 John Wiley & Sons, Ltd.

  2. A novel strategy for forensic age prediction by DNA methylation and support vector regression model

    PubMed Central

    Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei

    2015-01-01

    High deviations resulting from prediction model, gender and population difference have limited age estimation application of DNA methylation markers. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R2 > 0.5) in blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate < 0.01), along with three other reported sites, were further validated in 49 unrelated female volunteers with ages of 20–80 years by Sequenom Massarray. A total of 95 CpGs were covered in the PCR products and 11 of them were built the age prediction models. After comparing four different models including, multivariate linear regression, multivariate nonlinear regression, back propagation neural network and support vector regression, SVR was identified as the most robust model with the least mean absolute deviation from real chronological age (2.8 years) and an average accuracy of 4.7 years predicted by only six loci from the 11 loci, as well as an less cross-validated error compared with linear regression model. Our novel strategy provides an accurate measurement that is highly useful in estimating the individual age in forensic practice as well as in tracking the aging process in other related applications. PMID:26635134

  3. Analysis of Multivariate Experimental Data Using A Simplified Regression Model Search Algorithm

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred

    2013-01-01

    A new regression model search algorithm was developed in 2011 that may be used to analyze both general multivariate experimental data sets and wind tunnel strain-gage balance calibration data. The new algorithm is a simplified version of a more complex search algorithm that was originally developed at the NASA Ames Balance Calibration Laboratory. The new algorithm has the advantage that it needs only about one tenth of the original algorithm's CPU time for the completion of a search. In addition, extensive testing showed that the prediction accuracy of math models obtained from the simplified algorithm is similar to the prediction accuracy of math models obtained from the original algorithm. The simplified algorithm, however, cannot guarantee that search constraints related to a set of statistical quality requirements are always satisfied in the optimized regression models. Therefore, the simplified search algorithm is not intended to replace the original search algorithm. Instead, it may be used to generate an alternate optimized regression model of experimental data whenever the application of the original search algorithm either fails or requires too much CPU time. Data from a machine calibration of NASA's MK40 force balance is used to illustrate the application of the new regression model search algorithm.

  4. Comprehensive ripeness-index for prediction of ripening level in mangoes by multivariate modelling of ripening behaviour

    NASA Astrophysics Data System (ADS)

    Eyarkai Nambi, Vijayaram; Thangavel, Kuladaisamy; Manickavasagan, Annamalai; Shahir, Sultan

    2017-01-01

    Prediction of ripeness level in climacteric fruits is essential for post-harvest handling. An index capable of predicting ripening level with minimum inputs would be highly beneficial to the handlers, processors and researchers in fruit industry. A study was conducted with Indian mango cultivars to develop a ripeness index and associated model. Changes in physicochemical, colour and textural properties were measured throughout the ripening period and the period was classified into five stages (unripe, early ripe, partially ripe, ripe and over ripe). Multivariate regression techniques like partial least square regression, principal component regression and multi linear regression were compared and evaluated for its prediction. Multi linear regression model with 12 parameters was found more suitable in ripening prediction. Scientific variable reduction method was adopted to simplify the developed model. Better prediction was achieved with either 2 or 3 variables (total soluble solids, colour and acidity). Cross validation was done to increase the robustness and it was found that proposed ripening index was more effective in prediction of ripening stages. Three-variable model would be suitable for commercial applications where reasonable accuracies are sufficient. However, 12-variable model can be used to obtain more precise results in research and development applications.

  5. Assessing Goodness of Fit in Item Response Theory with Nonparametric Models: A Comparison of Posterior Probabilities and Kernel-Smoothing Approaches

    ERIC Educational Resources Information Center

    Sueiro, Manuel J.; Abad, Francisco J.

    2011-01-01

    The distance between nonparametric and parametric item characteristic curves has been proposed as an index of goodness of fit in item response theory in the form of a root integrated squared error index. This article proposes to use the posterior distribution of the latent trait as the nonparametric model and compares the performance of an index…

  6. TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis

    NASA Astrophysics Data System (ADS)

    Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.

    2016-02-01

    In this paper, the kinetic analysis of Li-Zn ferrite synthesis was studied using thermogravimetry (TG) method through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG-curves obtained for the four heating rates and Netzsch Thermokinetics software package, the kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. It was shown that the experimental TG-curves clearly suggest a two-step process for the ferrite synthesis and therefore a model-fitting kinetic analysis based on multivariate non-linear regressions was conducted. The complex reaction was described by a two-step reaction scheme consisting of sequential reaction steps. It is established that the best results were obtained using the Yander three-dimensional diffusion model at the first stage and Ginstling-Bronstein model at the second step. The kinetic parameters for lithium-zinc ferrite synthesis reaction were found and discussed.

  7. Multivariate statistical analysis: Principles and applications to coorbital streams of meteorite falls

    NASA Technical Reports Server (NTRS)

    Wolf, S. F.; Lipschutz, M. E.

    1993-01-01

    Multivariate statistical analysis techniques (linear discriminant analysis and logistic regression) can provide powerful discrimination tools which are generally unfamiliar to the planetary science community. Fall parameters were used to identify a group of 17 H chondrites (Cluster 1) that were part of a coorbital stream which intersected Earth's orbit in May, from 1855 - 1895, and can be distinguished from all other H chondrite falls. Using multivariate statistical techniques, it was demonstrated that a totally different criterion, labile trace element contents - hence thermal histories - or 13 Cluster 1 meteorites are distinguishable from those of 45 non-Cluster 1 H chondrites. Here, we focus upon the principles of multivariate statistical techniques and illustrate their application using non-meteoritic and meteoritic examples.

  8. Nonlinear multivariate and time series analysis by neural network methods

    NASA Astrophysics Data System (ADS)

    Hsieh, William W.

    2004-03-01

    Methods in multivariate statistical analysis are essential for working with large amounts of geophysical data, data from observational arrays, from satellites, or from numerical model output. In classical multivariate statistical analysis, there is a hierarchy of methods, starting with linear regression at the base, followed by principal component analysis (PCA) and finally canonical correlation analysis (CCA). A multivariate time series method, the singular spectrum analysis (SSA), has been a fruitful extension of the PCA technique. The common drawback of these classical methods is that only linear structures can be correctly extracted from the data. Since the late 1980s, neural network methods have become popular for performing nonlinear regression and classification. More recently, neural network methods have been extended to perform nonlinear PCA (NLPCA), nonlinear CCA (NLCCA), and nonlinear SSA (NLSSA). This paper presents a unified view of the NLPCA, NLCCA, and NLSSA techniques and their applications to various data sets of the atmosphere and the ocean (especially for the El Niño-Southern Oscillation and the stratospheric quasi-biennial oscillation). These data sets reveal that the linear methods are often too simplistic to describe real-world systems, with a tendency to scatter a single oscillatory phenomenon into numerous unphysical modes or higher harmonics, which can be largely alleviated in the new nonlinear paradigm.

  9. Structural brain connectivity and cognitive ability differences: A multivariate distance matrix regression analysis.

    PubMed

    Ponsoda, Vicente; Martínez, Kenia; Pineda-Pardo, José A; Abad, Francisco J; Olea, Julio; Román, Francisco J; Barbey, Aron K; Colom, Roberto

    2017-02-01

    Neuroimaging research involves analyses of huge amounts of biological data that might or might not be related with cognition. This relationship is usually approached using univariate methods, and, therefore, correction methods are mandatory for reducing false positives. Nevertheless, the probability of false negatives is also increased. Multivariate frameworks have been proposed for helping to alleviate this balance. Here we apply multivariate distance matrix regression for the simultaneous analysis of biological and cognitive data, namely, structural connections among 82 brain regions and several latent factors estimating cognitive performance. We tested whether cognitive differences predict distances among individuals regarding their connectivity pattern. Beginning with 3,321 connections among regions, the 36 edges better predicted by the individuals' cognitive scores were selected. Cognitive scores were related to connectivity distances in both the full (3,321) and reduced (36) connectivity patterns. The selected edges connect regions distributed across the entire brain and the network defined by these edges supports high-order cognitive processes such as (a) (fluid) executive control, (b) (crystallized) recognition, learning, and language processing, and (c) visuospatial processing. This multivariate study suggests that one widespread, but limited number, of regions in the human brain, supports high-level cognitive ability differences. Hum Brain Mapp 38:803-816, 2017. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.

  10. Multivariate analysis of cytokine profiles in pregnancy complications.

    PubMed

    Azizieh, Fawaz; Dingle, Kamaludin; Raghupathy, Raj; Johnson, Kjell; VanderPlas, Jacob; Ansari, Ali

    2018-03-01

    The immunoregulation to tolerate the semiallogeneic fetus during pregnancy includes a harmonious dynamic balance between anti- and pro-inflammatory cytokines. Several earlier studies reported significantly different levels and/or ratios of several cytokines in complicated pregnancy as compared to normal pregnancy. However, as cytokines operate in networks with potentially complex interactions, it is also interesting to compare groups with multi-cytokine data sets, with multivariate analysis. Such analysis will further examine how great the differences are, and which cytokines are more different than others. Various multivariate statistical tools, such as Cramer test, classification and regression trees, partial least squares regression figures, 2-dimensional Kolmogorov-Smirmov test, principal component analysis and gap statistic, were used to compare cytokine data of normal vs anomalous groups of different pregnancy complications. Multivariate analysis assisted in examining if the groups were different, how strongly they differed, in what ways they differed and further reported evidence for subgroups in 1 group (pregnancy-induced hypertension), possibly indicating multiple causes for the complication. This work contributes to a better understanding of cytokines interaction and may have important implications on targeting cytokine balance modulation or design of future medications or interventions that best direct management or prevention from an immunological approach. © 2018 The Authors. American Journal of Reproductive Immunology Published by John Wiley & Sons Ltd.

  11. Clinical management provided by board-certificated physiatrists in early rehabilitation is a significant determinant of functional improvement in acute stroke patients: a retrospective analysis of Japan rehabilitation database.

    PubMed

    Kinoshita, Shoji; Kakuda, Wataru; Momosaki, Ryo; Yamada, Naoki; Sugawara, Hidekazu; Watanabe, Shu; Abo, Masahiro

    2015-05-01

    Early rehabilitation for acute stroke patients is widely recommended. We tested the hypothesis that clinical outcome of stroke patients who receive early rehabilitation managed by board-certificated physiatrists (BCP) is generally better than that provided by other medical specialties. Data of stroke patients who underwent early rehabilitation in 19 acute hospitals between January 2005 and December 2013 were collected from the Japan Rehabilitation Database and analyzed retrospectively. Multivariate linear regression analysis using generalized estimating equations method was performed to assess the association between Functional Independence Measure (FIM) effectiveness and management provided by BCP in early rehabilitation. In addition, multivariate logistic regression analysis was also performed to assess the impact of management provided by BCP in acute phase on discharge destination. After setting the inclusion criteria, data of 3838 stroke patients were eligible for analysis. BCP provided early rehabilitation in 814 patients (21.2%). Both the duration of daily exercise time and the frequency of regular conferencing were significantly higher for patients managed by BCP than by other specialties. Although the mortality rate was not different, multivariate regression analysis showed that FIM effectiveness correlated significantly and positively with the management provided by BCP (coefficient, .35; 95% confidence interval [CI], .012-.059; P < .005). In addition, multivariate logistic analysis identified clinical management by BCP as a significant determinant of home discharge (odds ratio, 1.24; 95% CI, 1.08-1.44; P < .005). Our retrospective cohort study demonstrated that clinical management provided by BCP in early rehabilitation can lead to functional recovery of acute stroke. Copyright © 2015 National Stroke Association. Published by Elsevier Inc. All rights reserved.

  12. Physical function in older men with hyperkyphosis.

    PubMed

    Katzman, Wendy B; Harrison, Stephanie L; Fink, Howard A; Marshall, Lynn M; Orwoll, Eric; Barrett-Connor, Elizabeth; Cawthon, Peggy M; Kado, Deborah M

    2015-05-01

    Age-related hyperkyphosis has been associated with poor physical function and is a well-established predictor of adverse health outcomes in older women, but its impact on health in older men is less well understood. We conducted a cross-sectional study to evaluate the association of hyperkyphosis and physical function in 2,363 men, aged 71-98 (M = 79) from the Osteoporotic Fractures in Men Study. Kyphosis was measured using the Rancho Bernardo Study block method. Measurements of grip strength and lower extremity function, including gait speed over 6 m, narrow walk (measure of dynamic balance), repeated chair stands ability and time, and lower extremity power (Nottingham Power Rig) were included separately as primary outcomes. We investigated associations of kyphosis and each outcome in age-adjusted and multivariable linear or logistic regression models, controlling for age, clinic, education, race, bone mineral density, height, weight, diabetes, and physical activity. In multivariate linear regression, we observed a dose-related response of worse scores on each lower extremity physical function test as number of blocks increased, p for trend ≤.001. Using a cutoff of ≥4 blocks, 20% (N = 469) of men were characterized with hyperkyphosis. In multivariate logistic regression, men with hyperkyphosis had increased odds (range 1.5-1.8) of being in the worst quartile of performing lower extremity physical function tasks (p < .001 for each outcome). Kyphosis was not associated with grip strength in any multivariate analysis. Hyperkyphosis is associated with impaired lower extremity physical function in older men. Further studies are needed to determine the direction of causality. © The Author 2014. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  13. Flexible link functions in nonparametric binary regression with Gaussian process priors.

    PubMed

    Li, Dan; Wang, Xia; Lin, Lizhen; Dey, Dipak K

    2016-09-01

    In many scientific fields, it is a common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim at better prediction in the future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems; in particular to model the latent structure in a binary regression model allowing nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link functions. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to allow the data to determine the degree of the skewness. To address this limitation, we propose a flexible binary regression model which combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models, which only have either a Gaussian process prior on the latent regression function or a Dirichlet prior on the link function. © 2015, The International Biometric Society.

  14. Flexible Link Functions in Nonparametric Binary Regression with Gaussian Process Priors

    PubMed Central

    Li, Dan; Lin, Lizhen; Dey, Dipak K.

    2015-01-01

    Summary In many scientific fields, it is a common practice to collect a sequence of 0-1 binary responses from a subject across time, space, or a collection of covariates. Researchers are interested in finding out how the expected binary outcome is related to covariates, and aim at better prediction in the future 0-1 outcomes. Gaussian processes have been widely used to model nonlinear systems; in particular to model the latent structure in a binary regression model allowing nonlinear functional relationship between covariates and the expectation of binary outcomes. A critical issue in modeling binary response data is the appropriate choice of link functions. Commonly adopted link functions such as probit or logit links have fixed skewness and lack the flexibility to allow the data to determine the degree of the skewness. To address this limitation, we propose a flexible binary regression model which combines a generalized extreme value link function with a Gaussian process prior on the latent structure. Bayesian computation is employed in model estimation. Posterior consistency of the resulting posterior distribution is demonstrated. The flexibility and gains of the proposed model are illustrated through detailed simulation studies and two real data examples. Empirical results show that the proposed model outperforms a set of alternative models, which only have either a Gaussian process prior on the latent regression function or a Dirichlet prior on the link function. PMID:26686333

  15. Estimating the expected value of partial perfect information in health economic evaluations using integrated nested Laplace approximation.

    PubMed

    Heath, Anna; Manolopoulou, Ioanna; Baio, Gianluca

    2016-10-15

    The Expected Value of Perfect Partial Information (EVPPI) is a decision-theoretic measure of the 'cost' of parametric uncertainty in decision making used principally in health economic decision making. Despite this decision-theoretic grounding, the uptake of EVPPI calculations in practice has been slow. This is in part due to the prohibitive computational time required to estimate the EVPPI via Monte Carlo simulations. However, recent developments have demonstrated that the EVPPI can be estimated by non-parametric regression methods, which have significantly decreased the computation time required to approximate the EVPPI. Under certain circumstances, high-dimensional Gaussian Process (GP) regression is suggested, but this can still be prohibitively expensive. Applying fast computation methods developed in spatial statistics using Integrated Nested Laplace Approximations (INLA) and projecting from a high-dimensional into a low-dimensional input space allows us to decrease the computation time for fitting these high-dimensional GP, often substantially. We demonstrate that the EVPPI calculated using our method for GP regression is in line with the standard GP regression method and that despite the apparent methodological complexity of this new method, R functions are available in the package BCEA to implement it simply and efficiently. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  16. Death Anxiety as a Predictor of Posttraumatic Stress Levels among Individuals with Spinal Cord Injuries

    ERIC Educational Resources Information Center

    Martz, Erin

    2004-01-01

    Because the onset of a spinal cord injury may involve a brush with death and because serious injury and disability can act as a reminder of death, death anxiety was examined as a predictor of posttraumatic stress levels among individuals with disabilities. This cross-sectional study used multiple regression and multivariate multiple regression to…

  17. Cellulose I crystallinity determination using FT-Raman spectroscopy : univariate and multivariate methods

    Treesearch

    Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph

    2010-01-01

    Two new methods based on FT–Raman spectroscopy, one simple, based on band intensity ratio, and the other using a partial least squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in cellulose I samples was determined based on univariate regression that was first developed using the Raman band...

  18. Predicting Potential Changes in Suitable Habitat and Distribution by 2100 for Tree Species of the Eastern United States

    Treesearch

    Louis R Iverson; Anantha M. Prasad; Mark W. Schwartz; Mark W. Schwartz

    2005-01-01

    We predict current distribution and abundance for tree species present in eastern North America, and subsequently estimate potential suitable habitat for those species under a changed climate with 2 x CO2. We used a series of statistical models (i.e., Regression Tree Analysis (RTA), Multivariate Adaptive Regression Splines (MARS), Bagging Trees (...

  19. Per capita community-level effects of an invasive grass, Microstegium vimineum, on vegetation in mesic forests in northern Mississippi (USA)

    Treesearch

    J. Stephen Brewer

    2010-01-01

    Quantifying per capita impacts of invasive species on resident communities requires integrating regression analyses with experiments under natural conditions. Using multivariate and univariate approaches, I regressed the abundance of 105 resident species of groundcover plants and tree seedlings against the abundance and height of an invasive grass, Microstegium...

  20. Quantitative analysis of binary polymorphs mixtures of fusidic acid by diffuse reflectance FTIR spectroscopy, diffuse reflectance FT-NIR spectroscopy, Raman spectroscopy and multivariate calibration.

    PubMed

    Guo, Canyong; Luo, Xuefang; Zhou, Xiaohua; Shi, Beijia; Wang, Juanjuan; Zhao, Jinqi; Zhang, Xiaoxia

    2017-06-05

    Vibrational spectroscopic techniques such as infrared, near-infrared and Raman spectroscopy have become popular in detecting and quantifying polymorphism of pharmaceutics since they are fast and non-destructive. This study assessed the ability of three vibrational spectroscopy combined with multivariate analysis to quantify a low-content undesired polymorph within a binary polymorphic mixture. Partial least squares (PLS) regression and support vector machine (SVM) regression were employed to build quantitative models. Fusidic acid, a steroidal antibiotic, was used as the model compound. It was found that PLS regression performed slightly better than SVM regression in all the three spectroscopic techniques. Root mean square errors of prediction (RMSEP) were ranging from 0.48% to 1.17% for diffuse reflectance FTIR spectroscopy and 1.60-1.93% for diffuse reflectance FT-NIR spectroscopy and 1.62-2.31% for Raman spectroscopy. The results indicate that diffuse reflectance FTIR spectroscopy offers significant advantages in providing accurate measurement of polymorphic content in the fusidic acid binary mixtures, while Raman spectroscopy is the least accurate technique for quantitative analysis of polymorphs. Copyright © 2017 Elsevier B.V. All rights reserved.

Top