Sample records for component regression model

  1. Variable selection and model choice in geoadditive regression models.

    PubMed

    Kneib, Thomas; Hothorn, Torsten; Tutz, Gerhard

    2009-06-01

    Model choice and variable selection are issues of major concern in practical regression analyses, arising in many biometric applications such as habitat suitability analyses, where the aim is to identify the influence of potentially many environmental conditions on certain species. We describe regression models for breeding bird communities that facilitate both model choice and variable selection, by a boosting algorithm that works within a class of geoadditive regression models comprising spatial effects, nonparametric effects of continuous covariates, interaction surfaces, and varying coefficients. The major modeling components are penalized splines and their bivariate tensor product extensions. All smooth model terms are represented as the sum of a parametric component and a smooth component with one degree of freedom to obtain a fair comparison between the model terms. A generic representation of the geoadditive model allows us to devise a general boosting algorithm that automatically performs model choice and variable selection.
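
    A minimal Python sketch of the componentwise boosting idea described above — not the authors' geoadditive, penalized-spline implementation, but the same selection mechanism with simple univariate linear base learners and hypothetical data: at each step every covariate is fitted to the current residuals and only the best-fitting component is updated, which performs variable selection automatically.

    ```python
    import numpy as np

    def componentwise_l2_boosting(X, y, n_steps=200, nu=0.1):
        """Componentwise L2 boosting with univariate linear base learners.

        At every step each covariate is fitted to the current residuals by
        simple least squares and only the best-fitting component is updated
        (scaled by the step length nu), which yields implicit variable selection.
        """
        n, p = X.shape
        coef = np.zeros(p)
        offset = y.mean()                      # start from the intercept-only model
        resid = y - offset
        for _ in range(n_steps):
            # fit every univariate base learner to the residuals
            betas = X.T @ resid / (X ** 2).sum(axis=0)
            rss = ((resid[:, None] - X * betas) ** 2).sum(axis=0)
            j = rss.argmin()                   # component with the best fit
            coef[j] += nu * betas[j]           # weak update of that component only
            resid = y - offset - X @ coef
        return offset, coef

    # toy example: only the first two of six covariates carry signal
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 6))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=300)
    offset, coef = componentwise_l2_boosting(X, y)
    print(np.round(coef, 2))   # coefficients of unselected covariates stay near zero
    ```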

  2. Regression Models for Identifying Noise Sources in Magnetic Resonance Images

    PubMed Central

    Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.

    2009-01-01

    Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478

  3. Accounting for measurement error in log regression models with applications to accelerated testing.

    PubMed

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.
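
    The abstract approximates the measurement-error log model by a weighted regression fitted with Iteratively Re-weighted Least Squares. The sketch below shows a generic IRLS loop only; the weight function and data are hypothetical stand-ins, not the paper's reduced-Eyring weighting.

    ```python
    import numpy as np

    def irls(X, y, weight_fn, n_iter=25, tol=1e-8):
        """Generic iteratively re-weighted least squares.

        weight_fn maps the current fitted values and residuals to observation
        weights; the loop alternates weighted least-squares fits until the
        coefficients stabilise.
        """
        beta = np.linalg.lstsq(X, y, rcond=None)[0]      # OLS start
        for _ in range(n_iter):
            fitted = X @ beta
            w = weight_fn(fitted, y - fitted)
            Xw = X * w[:, None]
            beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
            if np.max(np.abs(beta_new - beta)) < tol:
                break
            beta = beta_new
        return beta

    # hypothetical weights: variance grows with the squared mean, as it might if
    # an additive error were carried through a log-scale model
    rng = np.random.default_rng(1)
    x = rng.uniform(1, 5, size=200)
    X = np.column_stack([np.ones_like(x), x])
    y = 0.5 + 1.2 * x + rng.normal(scale=0.2 * (0.5 + 1.2 * x), size=200)
    beta = irls(X, y, weight_fn=lambda mu, r: 1.0 / np.maximum(mu, 1e-6) ** 2)
    print(beta)
    ```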

  4. Multiresponse semiparametric regression for modelling the effect of regional socio-economic variables on the use of information technology

    NASA Astrophysics Data System (ADS)

    Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania

    2017-03-01

    Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric components. The regression model comprises several equations, and each equation has two components, one parametric and one nonparametric. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component, so it can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the model to the effect of regional socio-economic variables on the use of information technology. Specifically, the response variables are the percentage of households with internet access and the percentage of households with a personal computer, and the predictor variables are the literacy rate, the electrification rate and the rate of economic growth. Based on the identified relationships between responses and predictors, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The results show that multiresponse semiparametric regression fits the data well, as indicated by a high coefficient of determination of 90 percent.


  5. Application of third molar development and eruption models in estimating dental age in Malay sub-adults.

    PubMed

    Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc

    2015-08-01

    Third molar development (TMD) has been widely utilized as one of the radiographic methods for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated into the TMD regression model. This study aims to evaluate the performance of dental age estimation in the individual-method models and the combined model (TMD and TME) based on classic multiple linear and principal component regressions. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage TMD and TME, respectively. The data were divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between the models. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² increased in the linear regressions of the combined model as compared to the individual models. An overall decrease in RMSE was detected in the combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, the combined model exhibited low adjusted R² and high RMSE, except in males. Dental age estimation is therefore better predicted using the combined model in multiple linear regression. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  6. Mixed-effects Gaussian process functional regression models with application to dose-response curve prediction.

    PubMed

    Shi, J Q; Wang, B; Will, E J; West, R M

    2012-11-20

    We propose a new semiparametric model for functional regression analysis, combining a parametric mixed-effects model with a nonparametric Gaussian process regression model, namely a mixed-effects Gaussian process functional regression model. The parametric component can provide explanatory information between the response and the covariates, whereas the nonparametric component can add nonlinearity. We can model the mean and covariance structures simultaneously, combining the information borrowed from other subjects with the information collected from each individual subject. We apply the model to dose-response curves that describe changes in the responses of subjects for differing levels of the dose of a drug or agent and have a wide application in many areas. We illustrate the method for the management of renal anaemia. An individual dose-response curve is improved when more information is included by this mechanism from the subject/patient over time, enabling a patient-specific treatment regime. Copyright © 2012 John Wiley & Sons, Ltd.
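
    A rough two-stage stand-in for the model described above, using scikit-learn: a parametric linear mean is fitted first and a Gaussian process is then fitted to its residuals, so the GP supplies the nonlinear part. The authors' model estimates both parts jointly and borrows strength across subjects; the dose-response data here are hypothetical.

    ```python
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(2)
    dose = np.sort(rng.uniform(0, 10, size=80))
    X = dose.reshape(-1, 1)
    # hypothetical dose-response: linear trend plus a smooth nonlinear bump
    y = 0.8 * dose + 2.0 * np.sin(dose) + rng.normal(scale=0.3, size=80)

    # parametric component: linear mean structure
    mean_model = LinearRegression().fit(X, y)
    resid = y - mean_model.predict(X)

    # nonparametric component: Gaussian process regression on the residuals
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(0.1))
    gp.fit(X, resid)

    X_new = np.linspace(0, 10, 5).reshape(-1, 1)
    prediction = mean_model.predict(X_new) + gp.predict(X_new)
    print(np.round(prediction, 2))
    ```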

  7. Error Covariance Penalized Regression: A novel multivariate model combining penalized regression with multivariate error structure.

    PubMed

    Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C

    2018-06-29

    A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.

  8. Modeling vertebrate diversity in Oregon using satellite imagery

    NASA Astrophysics Data System (ADS)

    Cablk, Mary Elizabeth

    Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS Data Center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, six greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates of richness for amphibians, birds, all vertebrates, reptiles, and mammals. The variation explained by each regression tree was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and to assess the validity of the predictions from the regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data, and graphical results indicated the models fit the data well.

  9. [New method of mixed gas infrared spectrum analysis based on SVM].

    PubMed

    Bai, Peng; Xie, Wen-Jun; Liu, Jun-Hua

    2007-07-01

    A new method of infrared spectrum analysis based on the support vector machine (SVM) was proposed for gas mixtures. The kernel function in the SVM maps the seriously overlapping absorption spectra into a high-dimensional space while allowing the computation to be carried out in the original space; on this basis a regression calibration model was established and applied to analyze the concentration of each component gas. It was also shown that the SVM regression calibration model can be used for component recognition in gas mixtures. The method was applied to the analysis of different data samples, and factors that affect the model, such as the scan interval, wavelength range, kernel function and penalty coefficient C, were discussed. Experimental results show that the maximal mean absolute error of the component concentrations is 0.132%, and the component recognition accuracy is higher than 94%. The problems of overlapping absorption spectra, of using the same method for qualitative and quantitative analysis, and of a limited number of training samples were addressed. The method could be used in other infrared spectrum analyses of gas mixtures, with promising theoretical and practical value.
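
    A small scikit-learn sketch in the spirit of the SVM calibration step: an RBF-kernel support vector regression is fitted to synthetic overlapping spectra, with the penalty coefficient C tuned by cross-validation. The spectra, peak positions and grid values are all hypothetical, not the paper's data or settings.

    ```python
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(3)
    # hypothetical absorbance spectra (200 wavelengths) of two overlapping components
    wavelengths = np.linspace(0, 1, 200)
    conc = rng.uniform(0, 1, size=(120, 2))
    pure = np.vstack([np.exp(-((wavelengths - 0.45) ** 2) / 0.01),
                      np.exp(-((wavelengths - 0.55) ** 2) / 0.01)])
    spectra = conc @ pure + rng.normal(scale=0.01, size=(120, 200))

    # RBF-kernel SVM regression calibration for the first component's concentration,
    # with the penalty coefficient C tuned by cross-validation
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
    search = GridSearchCV(model, {"svr__C": [1, 10, 100], "svr__epsilon": [0.01, 0.1]}, cv=5)
    search.fit(spectra, conc[:, 0])
    print(search.best_params_, round(search.best_score_, 3))
    ```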

  10. Poisson Mixture Regression Models for Heart Disease Prediction.

    PubMed

    Mufudza, Chipo; Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models are addressed under two different classes: standard and concomitant-variable mixture regression models. Results show that a two-component concomitant-variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary generalized linear Poisson regression model, owing to its lower Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model for heart disease prediction overall, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the available clusters. It is concluded that heart disease prediction can be done effectively by identifying the major risks componentwise using Poisson mixture regression models.
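
    A minimal EM sketch of a two-component Poisson mixture regression, the basic building block behind the models above (without the concomitant-variable or zero-inflated extensions). The M-step reuses statsmodels Poisson GLMs with the EM responsibilities as weights; all data are simulated.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import poisson

    def poisson_mixture_em(X, y, n_iter=50, seed=0):
        """EM for a two-component Poisson mixture regression with shared design X."""
        rng = np.random.default_rng(seed)
        resp = rng.uniform(0.25, 0.75, size=len(y))      # soft assignments to component 1
        for _ in range(n_iter):
            # M-step: weighted Poisson GLM per component, mixing weight from responsibilities
            fit1 = sm.GLM(y, X, family=sm.families.Poisson(), freq_weights=resp).fit()
            fit2 = sm.GLM(y, X, family=sm.families.Poisson(), freq_weights=1 - resp).fit()
            pi = resp.mean()
            # E-step: posterior probability that each observation belongs to component 1
            p1 = pi * poisson.pmf(y, fit1.fittedvalues)
            p2 = (1 - pi) * poisson.pmf(y, fit2.fittedvalues)
            resp = p1 / (p1 + p2)
        return pi, fit1.params, fit2.params

    rng = np.random.default_rng(4)
    x = rng.normal(size=400)
    X = sm.add_constant(x)
    z = rng.random(400) < 0.4                             # latent high-risk group
    y = rng.poisson(np.exp(np.where(z, 1.5 + 0.8 * x, 0.2 + 0.8 * x)))
    print(poisson_mixture_em(X, y))
    ```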

  11. Poisson Mixture Regression Models for Heart Disease Prediction

    PubMed Central

    Erol, Hamza

    2016-01-01

    Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models are addressed under two different classes: standard and concomitant-variable mixture regression models. Results show that a two-component concomitant-variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary generalized linear Poisson regression model, owing to its lower Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model for heart disease prediction overall, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the available clusters. It is concluded that heart disease prediction can be done effectively by identifying the major risks componentwise using Poisson mixture regression models. PMID:27999611

  12. An Excel Solver Exercise to Introduce Nonlinear Regression

    ERIC Educational Resources Information Center

    Pinder, Jonathan P.

    2013-01-01

    Business students taking business analytics courses that have significant predictive modeling components, such as marketing research, data mining, forecasting, and advanced financial modeling, are introduced to nonlinear regression using application software that is a "black box" to the students. Thus, although correct models are…

  13. Dirichlet Component Regression and its Applications to Psychiatric Data.

    PubMed

    Gueorguieva, Ralitza; Rosenheck, Robert; Zelterman, Daniel

    2008-08-15

    We describe a Dirichlet multivariable regression method useful for modeling data representing components as a percentage of a total. This model is motivated by the unmet need in psychiatry and other areas to simultaneously assess the effects of covariates on the relative contributions of different components of a measure. The model is illustrated using the Positive and Negative Syndrome Scale (PANSS) for assessment of schizophrenia symptoms which, like many other metrics in psychiatry, is composed of a sum of scores on several components, each in turn, made up of sums of evaluations on several questions. We simultaneously examine the effects of baseline socio-demographic and co-morbid correlates on all of the components of the total PANSS score of patients from a schizophrenia clinical trial and identify variables associated with increasing or decreasing relative contributions of each component. Several definitions of residuals are provided. Diagnostics include measures of overdispersion, Cook's distance, and a local jackknife influence metric.
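
    A simplified maximum-likelihood sketch of Dirichlet regression: each component's concentration parameter gets a log link in the covariates, and the Dirichlet log-likelihood is maximized with scipy. The parameterization, data and starting values are illustrative assumptions, and the paper's residual and influence diagnostics are not reproduced.

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import gammaln

    def dirichlet_regression(X, Y):
        """ML fit of a Dirichlet regression with alpha_ij = exp(x_i' beta_j)."""
        n, p = X.shape
        k = Y.shape[1]

        def negloglik(b):
            B = b.reshape(p, k)
            alpha = np.exp(X @ B)                      # n x k concentration parameters
            ll = (gammaln(alpha.sum(axis=1))
                  - gammaln(alpha).sum(axis=1)
                  + ((alpha - 1) * np.log(Y)).sum(axis=1))
            return -ll.sum()

        res = minimize(negloglik, np.zeros(p * k), method="BFGS")
        return res.x.reshape(p, k)

    # toy data: a covariate shifts the relative contribution of component 1
    rng = np.random.default_rng(5)
    x = rng.normal(size=300)
    X = np.column_stack([np.ones_like(x), x])
    alpha = np.exp(X @ np.array([[1.0, 1.0, 1.0], [0.8, 0.0, -0.4]]))
    Y = np.vstack([rng.dirichlet(a) for a in alpha])
    print(np.round(dirichlet_regression(X, Y), 2))
    ```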

  14. Regression to fuzziness method for estimation of remaining useful life in power plant components

    NASA Astrophysics Data System (ADS)

    Alamaniotis, Miltiadis; Grelle, Austin; Tsoukalas, Lefteri H.

    2014-10-01

    Mitigation of severe accidents in power plants requires the reliable operation of all systems and the on-time replacement of mechanical components. Therefore, the continuous surveillance of power systems is a crucial concern for the overall safety, cost control, and on-time maintenance of a power plant. In this paper a methodology called regression to fuzziness is presented that estimates the remaining useful life (RUL) of power plant components. The RUL is defined as the difference between the time that a measurement was taken and the estimated failure time of that component. The methodology aims to compensate for a potential lack of historical data by modeling an expert's operational experience and expertise applied to the system. It initially identifies critical degradation parameters and their associated value range. Once completed, the operator's experience is modeled through fuzzy sets which span the entire parameter range. This model is then synergistically used with linear regression and a component's failure point to estimate the RUL. The proposed methodology is tested on estimating the RUL of a turbine (the basic electrical generating component of a power plant) in three different cases. Results demonstrate the benefits of the methodology for components for which operational data are not readily available and emphasize the significance of the selection of fuzzy sets and the effect of knowledge representation on the predicted output. To verify the effectiveness of the methodology, it was benchmarked against the data-based simple linear regression model used for predictions, which was shown to perform equal to or worse than the presented methodology. Furthermore, the methodology comparison highlighted the improvement in estimation offered by the adoption of appropriate fuzzy sets for parameter representation.

  15. Modeling health survey data with excessive zero and K responses.

    PubMed

    Lin, Ting Hsiang; Tsai, Min-Hsiao

    2013-04-30

    Zero-inflated Poisson regression is a popular tool used to analyze data with excessive zeros. Although much work has already been performed to fit zero-inflated data, most models heavily depend on special features of the individual data. To be specific, this means that there is a sizable group of respondents who endorse the same answers making the data have peaks. In this paper, we propose a new model with the flexibility to model excessive counts other than zero, and the model is a mixture of multinomial logistic and Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts, including zeros, K (where K is a positive integer) and all other values. The Poisson regression component models the counts that are assumed to follow a Poisson distribution. Two examples are provided to illustrate our models when the data have counts containing many ones and sixes. As a result, the zero-inflated and K-inflated models exhibit a better fit than the zero-inflated Poisson and standard Poisson regressions. Copyright © 2012 John Wiley & Sons, Ltd.
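
    A compact sketch of the mixture described above, a zero- and K-inflated Poisson regression fitted by direct maximization of the mixture log-likelihood with scipy. For brevity the class probabilities are intercept-only here (the paper puts covariates in the multinomial logistic component as well), and the data and K = 6 are hypothetical.

    ```python
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import softmax
    from scipy.stats import poisson

    K = 6   # the extra inflated count besides zero

    def negloglik(theta, X, y):
        """Mixture: 3 class probabilities (softmax) + Poisson regression for the count class."""
        a0, aK, beta = theta[0], theta[1], theta[2:]
        p0, pK, pc = softmax([a0, aK, 0.0])             # P(structural 0), P(structural K), P(count)
        mu = np.exp(X @ beta)
        lik = pc * poisson.pmf(y, mu)
        lik = lik + np.where(y == 0, p0, 0.0) + np.where(y == K, pK, 0.0)
        return -np.log(lik).sum()

    rng = np.random.default_rng(6)
    x = rng.normal(size=500)
    X = np.column_stack([np.ones_like(x), x])
    cls = rng.choice(3, size=500, p=[0.15, 0.20, 0.65])  # 0-inflated, K-inflated, Poisson
    y = np.where(cls == 0, 0, np.where(cls == 1, K, rng.poisson(np.exp(0.5 + 0.4 * x))))

    fit = minimize(negloglik, x0=np.zeros(2 + X.shape[1]), args=(X, y), method="BFGS")
    print(np.round(fit.x, 2))   # class parameters on the logit scale, then Poisson coefficients
    ```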

  16. Classical Testing in Functional Linear Models.

    PubMed

    Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab

    2016-01-01

    We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
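
    A dense-grid sketch of the re-expression described above: ordinary PCA of the observed curves stands in for functional PCA, the leading scores enter a standard linear model, and the overall F test of that model plays the role of the association test. The curves and response are simulated; the paper's Wald, score and likelihood-ratio versions and the sparse-data case are not covered.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(7)
    t = np.linspace(0, 1, 100)                       # dense observation grid
    n = 150
    # hypothetical functional covariates: random combinations of two smooth shapes
    scores_true = rng.normal(size=(n, 2))
    curves = scores_true @ np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
    curves += rng.normal(scale=0.1, size=curves.shape)
    y = 1.0 + 0.8 * scores_true[:, 0] + rng.normal(scale=0.5, size=n)

    # re-express the functional linear model through the leading principal component scores
    pca = PCA(n_components=4)
    pc_scores = pca.fit_transform(curves)
    fit = sm.OLS(y, sm.add_constant(pc_scores)).fit()
    print(f"F = {fit.fvalue:.2f}, p = {fit.f_pvalue:.3g}")   # test of no association
    ```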

  17. Classical Testing in Functional Linear Models

    PubMed Central

    Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab

    2016-01-01

    We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155

  18. A Demonstration of Regression False Positive Selection in Data Mining

    ERIC Educational Resources Information Center

    Pinder, Jonathan P.

    2014-01-01

    Business analytics courses, such as marketing research, data mining, forecasting, and advanced financial modeling, have substantial predictive modeling components. The predictive modeling in these courses requires students to estimate and test many linear regressions. As a result, false positive variable selection ("type I errors") is…

  19. Construction of mathematical model for measuring material concentration by colorimetric method

    NASA Astrophysics Data System (ADS)

    Liu, Bing; Gao, Lingceng; Yu, Kairong; Tan, Xianghua

    2018-06-01

    This paper uses multiple linear regression to analyze the data of Problem C of the 2017 mathematical modeling contest. First, we established regression models for the concentrations of five substances, but only the model for the concentration of urea in milk passed the significance test. The regression model established from the second set of data passed the significance test but suffered from serious multicollinearity, so we improved the model by principal component analysis. The improved model is used to control the system so that it is possible to measure material concentration by a direct colorimetric method.
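
    A minimal sketch of the principal-component-analysis remedy mentioned above: standardize collinear predictors, keep the leading components, regress the response on the component scores, and map the coefficients back to the original (standardized) variables. The data are simulated, not the colorimetric measurements from the paper.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(8)
    n = 200
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.05, size=n)          # nearly collinear with x1
    x3 = rng.normal(size=n)
    X = np.column_stack([x1, x2, x3])
    y = 3.0 * x1 + 1.5 * x3 + rng.normal(scale=0.5, size=n)

    scaler = StandardScaler().fit(X)
    Z = scaler.transform(X)
    pca = PCA(n_components=2).fit(Z)                   # drop the near-null direction
    scores = pca.transform(Z)
    pcr = LinearRegression().fit(scores, y)

    # coefficients mapped back to the standardized original variables
    beta_std = pca.components_.T @ pcr.coef_
    print(np.round(beta_std, 2))
    ```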

  20. Boosted Regression Tree Models to Explain Watershed Nutrient Concentrations and Biological Condition

    EPA Science Inventory

    Boosted regression tree (BRT) models were developed to quantify the nonlinear relationships between landscape variables and nutrient concentrations in a mesoscale mixed land cover watershed during base-flow conditions. Factors that affect instream biological components, based on ...

  1. Plant, soil, and shadow reflectance components of row crops

    NASA Technical Reports Server (NTRS)

    Richardson, A. J.; Wiegand, C. L.; Gausman, H. W.; Cuellar, J. A.; Gerbermann, A. H.

    1975-01-01

    Data from the first Earth Resource Technology Satellite (LANDSAT-1) multispectral scanner (MSS) were used to develop three plant canopy models (Kubelka-Munk (K-M), regression, and combined K-M and regression models) for extracting plant, soil, and shadow reflectance components of cropped fields. The combined model gave the best correlation between MSS data and ground truth, by accounting for essentially all of the reflectance of plants, soil, and shadow between crop rows. The principles presented can be used to better forecast crop yield and to estimate acreage.

  2. Source apportionment of soil heavy metals using robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR) receptor model.

    PubMed

    Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun

    2018-06-01

    The traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in regional geochemical datasets. Furthermore, the models are built only on variable space instead of geographical space and thus cannot effectively capture the local spatial characteristics of each source contribution. To overcome these limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. The new method was then applied to the source apportionment of soil metal elements in a region of Wuhan City, China as a case study. Evaluations revealed that: (i) the RAPCS-RGWR model had better performance than the APCS-MLR model in the identification of the major sources of soil metal elements, and (ii) source contributions estimated by the RAPCS-RGWR model were closer to the true soil metal concentrations than those estimated by the APCS-MLR model. It is shown that the proposed RAPCS-RGWR model is a more effective source apportionment method than APCS-MLR (i.e., a non-robust and global model) in dealing with regional geochemical datasets. Copyright © 2018 Elsevier B.V. All rights reserved.

  3. An Alternative Way to Model Population Ability Distributions in Large-Scale Educational Surveys

    ERIC Educational Resources Information Center

    Wetzel, Eunike; Xu, Xueli; von Davier, Matthias

    2015-01-01

    In large-scale educational surveys, a latent regression model is used to compensate for the shortage of cognitive information. Conventionally, the covariates in the latent regression model are principal components extracted from background data. This operational method has several important disadvantages, such as the handling of missing data and…

  4. The Breslow estimator of the nonparametric baseline survivor function in Cox's regression model: some heuristics.

    PubMed

    Hanley, James A

    2008-01-01

    Most survival analysis textbooks explain how the hazard ratio parameters in Cox's life table regression model are estimated. Fewer explain how the components of the nonparametric baseline survivor function are derived. Those that do often relegate the explanation to an "advanced" section and merely present the components as algebraic or iterative solutions to estimating equations. None comment on the structure of these estimators. This note brings out a heuristic representation that may help to de-mystify the structure.
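
    The structure the note de-mystifies can be written in a few lines: given the fitted Cox coefficients, the Breslow estimate of the cumulative baseline hazard sums, over the event times, the number of events divided by the risk-set total of exp(x'β̂), and the baseline survivor function is exp(−Ĥ₀(t)). The sketch below assumes the risk scores come from an already-fitted Cox model and uses toy numbers.

    ```python
    import numpy as np

    def breslow_baseline(time, event, risk_score):
        """Breslow estimate of the cumulative baseline hazard in a Cox model.

        risk_score holds exp(x_i' beta_hat) for every subject. At each distinct
        event time the increment is (#events) / (sum of risk scores still at risk).
        """
        order = np.argsort(time)
        time, event, risk_score = time[order], event[order], risk_score[order]
        event_times = np.unique(time[event == 1])
        cum, H0 = 0.0, []
        for t in event_times:
            d = np.sum((time == t) & (event == 1))            # events at t
            at_risk = risk_score[time >= t].sum()             # risk-set total
            cum += d / at_risk
            H0.append((t, cum))
        return H0   # baseline survivor function: S0(t) = exp(-cumulative hazard)

    # toy illustration with a hypothetical fitted linear predictor
    time = np.array([2.0, 3.0, 3.0, 5.0, 8.0])
    event = np.array([1, 1, 0, 1, 0])
    risk_score = np.exp(np.array([0.2, -0.1, 0.4, 0.0, -0.3]))
    for t, H in breslow_baseline(time, event, risk_score):
        print(f"t = {t}: H0 = {H:.3f}, S0 = {np.exp(-H):.3f}")
    ```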

  5. Oil and gas pipeline construction cost analysis and developing regression models for cost estimation

    NASA Astrophysics Data System (ADS)

    Thaduri, Ravi Kiran

    In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, Labor, ROW and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The Compressor stations are analyzed based on the capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor station, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor station for various capacities and locations.

  6. Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression

    NASA Astrophysics Data System (ADS)

    Khikmah, L.; Wijayanto, H.; Syafitri, U. D.

    2017-04-01

    A problem often encountered in logistic regression modeling is multicollinearity. When explanatory variables are multicollinear, the parameter estimates become biased, and multicollinearity also leads to errors in classification. Stepwise regression is commonly used to overcome multicollinearity. Another method, which keeps all variables for prediction, is Principal Component Analysis (PCA); however, classical PCA is only suitable for numeric data. When the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were part of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under the Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, the classification of contraceptive method use with the stepwise method (58.66%) is better than with the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the results using sensitivity shows the opposite, where the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Because this study focuses on classifying the major class (women using a contraceptive method), the selected model is CATPCA, since it raises the accuracy for the major class.
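
    CATPCA itself is not available in the common Python libraries, so the sketch below only approximates the idea: the categorical predictors are one-hot encoded, components are extracted from the encoded matrix (TruncatedSVD, because the encoding is sparse), and a logistic regression on the components is evaluated by AUC. The optimal-scaling step of CATPCA is not reproduced, and the data are simulated.

    ```python
    import numpy as np
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.decomposition import TruncatedSVD
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(9)
    n = 1000
    # hypothetical categorical explanatory variables (e.g. education, region, wealth level)
    Xcat = rng.integers(0, 4, size=(n, 6))
    logit = 0.9 * (Xcat[:, 0] == 3) - 0.7 * (Xcat[:, 1] == 0) + 0.3 * (Xcat[:, 2] >= 2)
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X_tr, X_te, y_tr, y_te = train_test_split(Xcat, y, test_size=0.3, random_state=0)
    model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                          TruncatedSVD(n_components=8),
                          LogisticRegression(max_iter=1000))
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"AUC = {auc:.3f}")
    ```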

  7. Dirichlet Component Regression and its Applications to Psychiatric Data

    PubMed Central

    Gueorguieva, Ralitza; Rosenheck, Robert; Zelterman, Daniel

    2011-01-01

    We describe a Dirichlet multivariable regression method useful for modeling data representing components as a percentage of a total. This model is motivated by the unmet need in psychiatry and other areas to simultaneously assess the effects of covariates on the relative contributions of different components of a measure. The model is illustrated using the Positive and Negative Syndrome Scale (PANSS) for assessment of schizophrenia symptoms which, like many other metrics in psychiatry, is composed of a sum of scores on several components, each in turn, made up of sums of evaluations on several questions. We simultaneously examine the effects of baseline socio-demographic and co-morbid correlates on all of the components of the total PANSS score of patients from a schizophrenia clinical trial and identify variables associated with increasing or decreasing relative contributions of each component. Several definitions of residuals are provided. Diagnostics include measures of overdispersion, Cook’s distance, and a local jackknife influence metric. PMID:22058582

  8. Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures.

    PubMed

    Austin, Peter C

    2010-04-22

    Multilevel logistic regression models are increasingly being used to analyze clustered data in medical, public health, epidemiological, and educational research. Procedures for estimating the parameters of such models are available in many statistical software packages. There is currently little evidence on the minimum number of clusters necessary to reliably fit multilevel regression models. We conducted a Monte Carlo study to compare the performance of different statistical software procedures for estimating multilevel logistic regression models when the number of clusters was low. We examined procedures available in BUGS, HLM, R, SAS, and Stata. We found that there were qualitative differences in the performance of different software procedures for estimating multilevel logistic models when the number of clusters was low. Among the likelihood-based procedures, estimation methods based on adaptive Gauss-Hermite approximations to the likelihood (glmer in R and xtlogit in Stata) or adaptive Gaussian quadrature (Proc NLMIXED in SAS) tended to have superior performance for estimating variance components when the number of clusters was small, compared to software procedures based on penalized quasi-likelihood. However, only Bayesian estimation with BUGS allowed for accurate estimation of variance components when there were fewer than 10 clusters. For all statistical software procedures, estimation of variance components tended to be poor when there were only five subjects per cluster, regardless of the number of clusters.

  9. QSAR modeling of flotation collectors using principal components extracted from topological indices.

    PubMed

    Natarajan, R; Nirdosh, Inderjit; Basak, Subhash C; Mills, Denise R

    2002-01-01

    Several topological indices were calculated for substituted-cupferrons that were tested as collectors for the froth flotation of uranium. The principal component analysis (PCA) was used for data reduction. Seven principal components (PC) were found to account for 98.6% of the variance among the computed indices. The principal components thus extracted were used in stepwise regression analyses to construct regression models for the prediction of separation efficiencies (Es) of the collectors. A two-parameter model with a correlation coefficient of 0.889 and a three-parameter model with a correlation coefficient of 0.913 were formed. PCs were found to be better than partition coefficient to form regression equations, and inclusion of an electronic parameter such as Hammett sigma or quantum mechanically derived electronic charges on the chelating atoms did not improve the correlation coefficient significantly. The method was extended to model the separation efficiencies of mercaptobenzothiazoles (MBT) and aminothiophenols (ATP) used in the flotation of lead and zinc ores, respectively. Five principal components were found to explain 99% of the data variability in each series. A three-parameter equation with correlation coefficient of 0.985 and a two-parameter equation with correlation coefficient of 0.926 were obtained for MBT and ATP, respectively. The amenability of separation efficiencies of chelating collectors to QSAR modeling using PCs based on topological indices might lead to the selection of collectors for synthesis and testing from a virtual database.

  10. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method.

    PubMed

    Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun

    2018-03-01

    A multivariate regression statistics strategy was developed to clarify the multi-component content-effect correlation of Panax ginseng saponins extract and to predict the pharmacological effect from the component content. In example 1, we first compared the pharmacological effects of Panax ginseng saponins extract and individual saponin combinations. Second, we examined the anti-platelet aggregation effect of seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, 18 common peaks were first identified in ten different batches of Panax ginseng saponins extracts from different origins. We then investigated the anti-myocardial ischemia-reperfusion injury effects of the ten different extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples, the relationship between component content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points marked on the partial least squares regression model. This study provides evidence that multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.
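
    A short scikit-learn sketch of the partial least squares step that links a component-content matrix to a measured effect and then predicts the effect for held-out samples. The content matrix and effect values are random stand-ins for the saponin data, and the number of components is an arbitrary choice.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(10)
    n = 60
    # hypothetical content matrix: six component concentrations per extract batch
    content = rng.uniform(0.1, 2.0, size=(n, 6))
    effect = 1.5 * content[:, 0] + 0.8 * content[:, 3] + rng.normal(scale=0.2, size=n)

    X_tr, X_te, y_tr, y_te = train_test_split(content, effect, test_size=0.25, random_state=0)
    pls = PLSRegression(n_components=3).fit(X_tr, y_tr)
    pred = pls.predict(X_te).ravel()
    print(np.round(np.corrcoef(pred, y_te)[0, 1], 3))   # agreement of predicted and observed effect
    ```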

  11. Application of Semiparametric Spline Regression Model in Analyzing Factors that Influence Population Density in Central Java

    NASA Astrophysics Data System (ADS)

    Sumantari, Y. D.; Slamet, I.; Sugiyanto

    2017-06-01

    Semiparametric regression is a statistical analysis method that combines parametric and nonparametric regression. There are various approach techniques in nonparametric regression, one of which is the spline. Central Java is one of the most densely populated provinces in Indonesia, and its population density can be modeled by semiparametric regression because it involves both parametric and nonparametric components. The purpose of this paper is therefore to determine the factors that influence population density in Central Java using the semiparametric spline regression model. The results show that the factors which influence population density in Central Java are the number of active Family Planning (FP) participants and the district minimum wage.

  12. Evaluation of the Use of Zero-Augmented Regression Techniques to Model Incidence of Campylobacter Infections in FoodNet.

    PubMed

    Tremblay, Marlène; Crim, Stacy M; Cole, Dana J; Hoekstra, Robert M; Henao, Olga L; Döpfer, Dörte

    2017-10-01

    The Foodborne Diseases Active Surveillance Network (FoodNet) is currently using a negative binomial (NB) regression model to estimate temporal changes in the incidence of Campylobacter infection. FoodNet active surveillance in 483 counties collected data on 40,212 Campylobacter cases between years 2004 and 2011. We explored models that disaggregated these data to allow us to account for demographic, geographic, and seasonal factors when examining changes in incidence of Campylobacter infection. We hypothesized that modeling structural zeros and including demographic variables would increase the fit of FoodNet's Campylobacter incidence regression models. Five different models were compared: NB without demographic covariates, NB with demographic covariates, hurdle NB with covariates in the count component only, hurdle NB with covariates in both zero and count components, and zero-inflated NB with covariates in the count component only. Of the models evaluated, the nonzero-augmented NB model with demographic variables provided the best fit. Results suggest that even though zero inflation was not present at this level, individualizing the level of aggregation and using different model structures and predictors per site might be required to correctly distinguish between structural and observational zeros and account for risk factors that vary geographically.
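
    A small statsmodels comparison in the spirit of the model screening above: fit a plain negative binomial and a zero-inflated negative binomial to the same counts and compare AIC. The counts are simulated, the inflation part is intercept-only, and the FoodNet covariates and hurdle variants are not reproduced.

    ```python
    import numpy as np
    import statsmodels.api as sm
    from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

    rng = np.random.default_rng(11)
    n = 800
    x = rng.normal(size=n)
    X = sm.add_constant(x)
    mu = np.exp(0.4 + 0.5 * x)
    counts = rng.poisson(mu * rng.gamma(2.0, 0.5, size=n))   # overdispersed counts
    counts[rng.random(n) < 0.25] = 0                          # extra structural zeros

    nb = sm.NegativeBinomial(counts, X).fit(disp=False)
    zinb = ZeroInflatedNegativeBinomialP(counts, X, exog_infl=np.ones((n, 1))).fit(disp=False, maxiter=200)
    print(f"NB AIC = {nb.aic:.1f}, ZINB AIC = {zinb.aic:.1f}")
    ```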

  13. Coping with Multicollinearity: An Example on Application of Principal Components Regression in Dendroecology

    Treesearch

    B. Desta Fekedulegn; J.J. Colbert; R.R., Jr. Hicks; Michael E. Schuckers

    2002-01-01

    The theory and application of principal components regression, a method for coping with multicollinearity among independent variables in analyzing ecological data, is exhibited in detail. A concrete example of the complex procedures that must be carried out in developing a diagnostic growth-climate model is provided. We use tree radial increment data taken from breast...

  14. Multivariate Analysis of Seismic Field Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alam, M. Kathleen

    1999-06-01

    This report includes the details of the model building procedure and prediction of seismic field data. Principal Components Regression, a multivariate analysis technique, was used to model seismic data collected as two pieces of equipment were cycled on and off. Models built that included only the two pieces of equipment of interest had trouble predicting data containing signals not included in the model. Evidence for poor predictions came from the prediction curves as well as spectral F-ratio plots. Once the extraneous signals were included in the model, predictions improved dramatically. While Principal Components Regression performed well for the present data sets, the present data analysis suggests further work will be needed to develop more robust modeling methods as the data become more complex.

  15. Application of principal component regression and artificial neural network in FT-NIR soluble solids content determination of intact pear fruit

    NASA Astrophysics Data System (ADS)

    Ying, Yibin; Liu, Yande; Fu, Xiaping; Lu, Huishan

    2005-11-01

    Artificial neural networks (ANNs) have been used successfully in applications such as pattern recognition, image processing, automation and control. However, the majority of today's ANN applications use the back-propagation feed-forward ANN (BP-ANN). In this paper, back-propagation artificial neural networks (BP-ANN) were applied to model the soluble solids content (SSC) of intact pears from their Fourier transform near-infrared (FT-NIR) spectra. One hundred and sixty-four pear samples were used to build the calibration models and to evaluate the models' predictive ability. The results are compared to classical calibration approaches, i.e. principal component regression (PCR), partial least squares (PLS) and non-linear PLS (NPLS). The effects of the choice of training parameters on the prediction model were also investigated. BP-ANN combined with principal component regression (PCR) always performed better than the classical PCR, PLS and Weight-PLS methods in terms of predictive ability. Based on the results, it can be concluded that FT-NIR spectroscopy and BP-ANN models can be properly employed for rapid and nondestructive determination of fruit internal quality.

  16. Analysis and improvement measures of flight delay in China

    NASA Astrophysics Data System (ADS)

    Zang, Yuhang

    2017-03-01

    This paper first establishes a principal component regression model to analyze the data quantitatively, using principal component analysis to obtain three principal component factors of flight delays. The least squares method is then used to analyze the factors and obtain the regression equation by substitution; this shows that the main reason for flight delays is the airlines, followed by weather and traffic. Targeting the controllable part of the problem, traffic flow control, an adaptive genetic queuing model is established for the runway terminal area. An optimization method is developed for fifteen planes landing on three runways, based on Beijing Capital International Airport, and comparing the results with the existing FCFS algorithm demonstrates the superiority of the model.

  17. Sparse modeling of spatial environmental variables associated with asthma

    PubMed Central

    Chang, Timothy S.; Gangnon, Ronald E.; Page, C. David; Buckingham, William R.; Tandias, Aman; Cowan, Kelly J.; Tomasallo, Carrie D.; Arndt, Brian G.; Hanrahan, Lawrence P.; Guilbert, Theresa W.

    2014-01-01

    Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin’s Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5–50 years over a three-year period. Each patient’s home address was geocoded to one of 3,456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin’s geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. PMID:25533437

  18. Sparse modeling of spatial environmental variables associated with asthma.

    PubMed

    Chang, Timothy S; Gangnon, Ronald E; David Page, C; Buckingham, William R; Tandias, Aman; Cowan, Kelly J; Tomasallo, Carrie D; Arndt, Brian G; Hanrahan, Lawrence P; Guilbert, Theresa W

    2015-02-01

    Geographically distributed environmental factors influence the burden of diseases such as asthma. Our objective was to identify sparse environmental variables associated with asthma diagnosis gathered from a large electronic health record (EHR) dataset while controlling for spatial variation. An EHR dataset from the University of Wisconsin's Family Medicine, Internal Medicine and Pediatrics Departments was obtained for 199,220 patients aged 5-50years over a three-year period. Each patient's home address was geocoded to one of 3456 geographic census block groups. Over one thousand block group variables were obtained from a commercial database. We developed a Sparse Spatial Environmental Analysis (SASEA). Using this method, the environmental variables were first dimensionally reduced with sparse principal component analysis. Logistic thin plate regression spline modeling was then used to identify block group variables associated with asthma from sparse principal components. The addresses of patients from the EHR dataset were distributed throughout the majority of Wisconsin's geography. Logistic thin plate regression spline modeling captured spatial variation of asthma. Four sparse principal components identified via model selection consisted of food at home, dog ownership, household size, and disposable income variables. In rural areas, dog ownership and renter occupied housing units from significant sparse principal components were associated with asthma. Our main contribution is the incorporation of sparsity in spatial modeling. SASEA sequentially added sparse principal components to Logistic thin plate regression spline modeling. This method allowed association of geographically distributed environmental factors with asthma using EHR and environmental datasets. SASEA can be applied to other diseases with environmental risk factors. Copyright © 2014 Elsevier Inc. All rights reserved.

  19. Characterization of shrubland ecosystem components as continuous fields in the northwest United States

    USGS Publications Warehouse

    Xian, George Z.; Homer, Collin G.; Rigge, Matthew B.; Shi, Hua; Meyer, Debbie

    2015-01-01

    Accurate and consistent estimates of shrubland ecosystem components are crucial to a better understanding of ecosystem conditions in arid and semiarid lands. An innovative approach was developed by integrating multiple sources of information to quantify shrubland components as continuous field products within the National Land Cover Database (NLCD). The approach consists of several procedures including field sample collections, high-resolution mapping of shrubland components using WorldView-2 imagery and regression tree models, Landsat 8 radiometric balancing and phenological mosaicking, medium resolution estimates of shrubland components following different climate zones using Landsat 8 phenological mosaics and regression tree models, and product validation. Fractional covers of nine shrubland components were estimated: annual herbaceous, bare ground, big sagebrush, herbaceous, litter, sagebrush, shrub, sagebrush height, and shrub height. Our study area included the footprint of six Landsat 8 scenes in the northwestern United States. Results show that most components have relatively significant correlations with validation data, have small normalized root mean square errors, and correspond well with expected ecological gradients. While some uncertainties remain with height estimates, the model formulated in this study provides a cross-validated, unbiased, and cost effective approach to quantify shrubland components at a regional scale and advances knowledge of horizontal and vertical variability of these components.

  20. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and increases their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set, comparing the two methods on an inflation model for Turkey. The methods are compared in terms of predictive ability and goodness of fit using a robust Root Mean Squared Error of Cross-Validation (R-RMSECV), a robust R² value and the Robust Component Selection (RCS) statistic.
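
    RPCR and RSIMPLS as published are not available in scikit-learn, so the sketch below only follows the same outline with stock pieces: a robust covariance estimate (minimum covariance determinant) supplies the principal directions, and a Huber regression is fitted on the resulting scores so outlying observations get less weight. Data and the number of components are arbitrary.

    ```python
    import numpy as np
    from sklearn.covariance import MinCovDet
    from sklearn.linear_model import HuberRegressor

    rng = np.random.default_rng(12)
    n = 300
    latent = rng.normal(size=(n, 2))
    X = latent @ rng.normal(size=(2, 8)) + rng.normal(scale=0.1, size=(n, 8))
    y = 2.0 * latent[:, 0] - latent[:, 1] + rng.normal(scale=0.3, size=n)
    X[:10] += 15.0                                     # a block of leverage outliers
    y[:10] -= 20.0

    # robust "PCA": eigenvectors of the minimum-covariance-determinant estimate
    mcd = MinCovDet(random_state=0).fit(X)
    eigval, eigvec = np.linalg.eigh(mcd.covariance_)
    components = eigvec[:, np.argsort(eigval)[::-1][:2]]   # two leading robust directions
    scores = (X - mcd.location_) @ components

    # robust regression of the response on the robust scores
    huber = HuberRegressor().fit(scores, y)
    print(np.round(huber.coef_, 2))
    ```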

  1. Temporal trend and climate factors of hemorrhagic fever with renal syndrome epidemic in Shenyang City, China

    PubMed Central

    2011-01-01

    Background Hemorrhagic fever with renal syndrome (HFRS) is an important infectious disease caused by different species of hantaviruses. As a rodent-borne disease with a seasonal distribution, external environmental factors including climate factors may play a significant role in its transmission. The city of Shenyang is one of the most seriously endemic areas for HFRS. Here, we characterized the dynamic temporal trend of HFRS, and identified climate-related risk factors and their roles in HFRS transmission in Shenyang, China. Methods The annual and monthly cumulative numbers of HFRS cases from 2004 to 2009 were calculated and plotted to show the annual and seasonal fluctuation in Shenyang. Cross-correlation and autocorrelation analyses were performed to detect the lagged effect of climate factors on HFRS transmission and the autocorrelation of monthly HFRS cases. Principal component analysis was constructed by using climate data from 2004 to 2009 to extract principal components of climate factors to reduce co-linearity. The extracted principal components and autocorrelation terms of monthly HFRS cases were added into a multiple regression model called principal components regression model (PCR) to quantify the relationship between climate factors, autocorrelation terms and transmission of HFRS. The PCR model was compared to a general multiple regression model conducted only with climate factors as independent variables. Results A distinctly declining temporal trend of annual HFRS incidence was identified. HFRS cases were reported every month, and the two peak periods occurred in spring (March to May) and winter (November to January), during which, nearly 75% of the HFRS cases were reported. Three principal components were extracted with a cumulative contribution rate of 86.06%. Component 1 represented MinRH0, MT1, RH1, and MWV1; component 2 represented RH2, MaxT3, and MAP3; and component 3 represented MaxT2, MAP2, and MWV2. The PCR model was composed of three principal components and two autocorrelation terms. The association between HFRS epidemics and climate factors was better explained in the PCR model (F = 446.452, P < 0.001, adjusted R2 = 0.75) than in the general multiple regression model (F = 223.670, P < 0.000, adjusted R2 = 0.51). Conclusion The temporal distribution of HFRS in Shenyang varied in different years with a distinctly declining trend. The monthly trends of HFRS were significantly associated with local temperature, relative humidity, precipitation, air pressure, and wind velocity of the different previous months. The model conducted in this study will make HFRS surveillance simpler and the control of HFRS more targeted in Shenyang. PMID:22133347

  2. A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test

    NASA Technical Reports Server (NTRS)

    Messer, Bradley P.

    2004-01-01

    Propulsion ground test facilities face the daily challenges of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Due to budgetary and schedule constraints, NASA and industry customers are pushing to test more components, for less money, in a shorter period of time. As these new rocket engine component test programs are undertaken, the lack of technology maturity in the test articles, combined with pushing the test facilities' capabilities to their limits, tends to lead to an increase in facility breakdowns and unsuccessful tests. Over the last five years Stennis Space Center's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and broken numerous test facility and test article parts. While various initiatives have been implemented to provide better propulsion test techniques and improve the quality, reliability, and maintainability of goods and parts used in the propulsion test facilities, unexpected failures during testing still occur quite regularly due to the harsh environment in which the propulsion test facilities operate. Previous attempts at modeling the lifecycle of a propulsion component test project have met with little success. Each of the attempts suffered from incomplete or inconsistent data on which to base the models. By focusing on the actual test phase of the test project rather than the formulation, design or construction phases, the quality and quantity of available data increases dramatically. A logistic regression model has been developed from the data collected over the last five years, allowing the probability of successfully completing a rocket propulsion component test to be calculated. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X1, X2, ..., Xk to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure. Logistic regression has primarily been used in the fields of epidemiology and biomedical research, but lends itself to many other applications. As indicated, the use of logistic regression is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from the models provide project managers with insight and confidence into the effectiveness of rocket engine component ground test projects. The initial success in modeling rocket propulsion ground test projects clears the way for more complex models to be developed in this area.
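
    A minimal statsmodels sketch of the kind of model described: a binary success/failure outcome regressed on a few predictors, with predicted probabilities returned for new projects. The predictors (test-article maturity, planned duration, facility load) and all numbers are hypothetical, not the Stennis data.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(13)
    n = 400
    # hypothetical predictors: test-article maturity, planned duration, facility load
    maturity = rng.uniform(1, 9, size=n)
    duration = rng.uniform(10, 600, size=n)
    load = rng.uniform(0.2, 1.0, size=n)
    logit = -1.0 + 0.45 * maturity - 0.003 * duration - 1.2 * load
    success = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X = sm.add_constant(np.column_stack([maturity, duration, load]))
    fit = sm.Logit(success, X).fit(disp=False)

    # probability of successfully completing two hypothetical new test projects
    X_new = sm.add_constant(np.array([[8.0, 120.0, 0.5],
                                      [3.0, 450.0, 0.9]]), has_constant="add")
    print(np.round(fit.predict(X_new), 3))
    ```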

  3. Importance of spatial autocorrelation in modeling bird distributions at a continental scale

    USGS Publications Warehouse

    Bahn, V.; O'Connor, R.J.; Krohn, W.B.

    2006-01-01

    Spatial autocorrelation in species' distributions has been recognized as inflating the probability of a type I error in hypothesis tests, causing biases in variable selection, and violating the assumption of independence of error terms in models such as correlation or regression. However, it remains unclear whether these problems occur at all spatial resolutions and extents, and under which conditions spatially explicit modeling techniques are superior. Our goal was to determine whether spatial models were superior at large extents and across many different species. In addition, we investigated the importance of purely spatial effects in distribution patterns relative to the variation that could be explained through environmental conditions. We studied distribution patterns of 108 bird species in the conterminous United States using ten years of data from the Breeding Bird Survey. We compared the performance of spatially explicit regression models with non-spatial regression models using Akaike's information criterion. In addition, we partitioned the variance in species distributions into an environmental, a pure spatial and a shared component. The spatially explicit conditional autoregressive regression models strongly outperformed the ordinary least squares regression models. In addition, partialling out the spatial component underlying the species' distributions showed that an average of 17% of the explained variation could be attributed to purely spatial effects independent of the spatial autocorrelation induced by the underlying environmental variables. We concluded that location in the range and neighborhood play an important role in the distribution of species. Spatially explicit models are expected to yield better predictions especially for mobile species such as birds, even in coarse-grained models with a large extent. © Ecography.

  4. A principal component regression model to forecast airborne concentration of Cupressaceae pollen in the city of Granada (SE Spain), during 1995-2006.

    PubMed

    Ocaña-Peinado, Francisco M; Valderrama, Mariano J; Bouzas, Paula R

    2013-05-01

    The problem of developing a 2-week-ahead forecast of atmospheric cypress pollen levels is tackled in this paper by developing a principal component multiple regression model involving several climatic variables. The efficacy of the proposed model is validated by means of an application to real data of Cupressaceae pollen concentration in the city of Granada (southeast of Spain). The model was applied to data from 11 consecutive years (1995-2005), with 2006 being used to validate the forecasts. Based on the work of different authors, factors such as temperature, humidity, hours of sun and wind speed were incorporated in the model. This methodology explains approximately 75-80% of the variability in the airborne Cupressaceae pollen concentration.

  5. An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

    NASA Astrophysics Data System (ADS)

    Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

    2018-03-01

    In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
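    A stripped-down sketch of the iteratively reweighted least squares idea for t-distributed errors is shown below, assuming Python with NumPy, a fixed degree of freedom, and a straight-line model; the autoregressive colored-noise component and the degree-of-freedom estimation handled by the paper are deliberately omitted.

      import numpy as np

      def irls_t_regression(X, y, nu=4.0, n_iter=50):
          """Robust linear regression assuming scaled t-distributed errors,
          fitted by iteratively reweighted least squares (simplified sketch)."""
          n, p = X.shape
          beta = np.linalg.lstsq(X, y, rcond=None)[0]     # OLS starting values
          sigma2 = np.mean((y - X @ beta) ** 2)
          for _ in range(n_iter):
              r = y - X @ beta
              # Weights shrink toward zero for residuals that look like outliers.
              w = (nu + 1.0) / (nu + r ** 2 / sigma2)
              W = np.diag(w)
              beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
              sigma2 = np.sum(w * (y - X @ beta) ** 2) / n
          return beta, sigma2

      # Illustrative use: a straight-line fit contaminated with a few outliers.
      rng = np.random.default_rng(2)
      t = np.linspace(0, 1, 100)
      X = np.column_stack([np.ones_like(t), t])
      y = 1.0 + 2.0 * t + 0.05 * rng.standard_normal(100)
      y[::17] += 1.5                                       # inject outliers
      print(irls_t_regression(X, y)[0])                    # roughly [1.0, 2.0]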

  6. Functional Data Analysis Applied to Modeling of Severe Acute Mucositis and Dysphagia Resulting From Head and Neck Radiation Therapy.

    PubMed

    Dean, Jamie A; Wong, Kee H; Gay, Hiram; Welsh, Liam C; Jones, Ann-Britt; Schick, Ulrike; Oh, Jung Hun; Apte, Aditya; Newbold, Kate L; Bhide, Shreerang A; Harrington, Kevin J; Deasy, Joseph O; Nutting, Christopher M; Gulliford, Sarah L

    2016-11-15

    Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue-sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares-logistic regression [FPLS-LR] and functional principal component-logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate-response associations, assessed using bootstrapping. The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/-0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/-0.96, 0.79/-0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling. Copyright © 2016 The Author(s). Published by Elsevier Inc. All rights reserved.

  7. Structured functional additive regression in reproducing kernel Hilbert spaces.

    PubMed

    Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen

    2014-06-01

    Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application.

  8. A kinetic model of municipal sludge degradation during non-catalytic wet oxidation.

    PubMed

    Prince-Pike, Arrian; Wilson, David I; Baroutian, Saeid; Andrews, John; Gapes, Daniel J

    2015-12-15

    Wet oxidation is a successful process for the treatment of municipal sludge. In addition, the resulting effluent from wet oxidation is a useful carbon source for subsequent biological nutrient removal processes in wastewater treatment. Owing to limitations with current kinetic models, this study produced a kinetic model which predicts the concentrations of key intermediate components during wet oxidation. The model was regressed from lab-scale experiments and then subsequently validated using data from a wet oxidation pilot plant. The model was shown to be accurate in predicting the concentrations of each component, and produced good results when applied to a plant 500 times larger in size. A statistical study was undertaken to investigate the validity of the regressed model parameters. Finally the usefulness of the model was demonstrated by suggesting optimum operating conditions such that volatile fatty acids were maximised. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Wavelet regression model in forecasting crude oil price

    NASA Astrophysics Data System (ADS)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), autoregressive integrated moving average (ARIMA) and generalized autoregressive conditional heteroscedasticity (GARCH) models using root mean square error (RMSE) and mean absolute error (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.
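    A rough outline of the WMLR idea follows, assuming Python with PyWavelets and scikit-learn; the random-walk price series, the db4 wavelet, the decomposition level, and the one-day lag are illustrative choices rather than the settings used in the study.

      import numpy as np
      import pywt
      from sklearn.linear_model import LinearRegression

      # Hypothetical daily price series (a random walk stands in for WTI prices).
      rng = np.random.default_rng(3)
      price = np.cumsum(rng.normal(0, 1, 512)) + 60.0

      # 1) Decompose the series into an approximation and detail sub-series by
      #    zeroing all but one coefficient band before reconstruction.
      coeffs = pywt.wavedec(price, 'db4', level=3)
      subseries = []
      for i in range(len(coeffs)):
          kept = [c if j == i else np.zeros_like(c) for j, c in enumerate(coeffs)]
          subseries.append(pywt.waverec(kept, 'db4')[:len(price)])
      subseries = np.array(subseries)            # shape: (levels + 1, n_days)

      # 2) Use yesterday's sub-series values as MLR inputs to predict today's price.
      X = subseries[:, :-1].T                    # lagged decomposed components
      y = price[1:]
      wmlr = LinearRegression().fit(X, y)
      rmse = np.sqrt(np.mean((wmlr.predict(X) - y) ** 2))
      print("in-sample RMSE:", round(rmse, 3))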

  10. Least Principal Components Analysis (LPCA): An Alternative to Regression Analysis.

    ERIC Educational Resources Information Center

    Olson, Jeffery E.

    Often, all of the variables in a model are latent, random, or subject to measurement error, or there is not an obvious dependent variable. When any of these conditions exist, an appropriate method for estimating the linear relationships among the variables is Least Principal Components Analysis. Least Principal Components are robust, consistent,…

  11. An EM-based semi-parametric mixture model approach to the regression analysis of competing-risks data.

    PubMed

    Ng, S K; McLachlan, G J

    2003-04-15

    We consider a mixture model approach to the regression analysis of competing-risks data. Attention is focused on inference concerning the effects of factors on both the probability of occurrence and the hazard rate conditional on each of the failure types. These two quantities are specified in the mixture model using the logistic model and the proportional hazards model, respectively. We propose a semi-parametric mixture method to estimate the logistic and regression coefficients jointly, whereby the component-baseline hazard functions are completely unspecified. Estimation is based on maximum likelihood on the basis of the full likelihood, implemented via an expectation-conditional maximization (ECM) algorithm. Simulation studies are performed to compare the performance of the proposed semi-parametric method with a fully parametric mixture approach. The results show that when the component-baseline hazard is monotonic increasing, the semi-parametric and fully parametric mixture approaches are comparable for mildly and moderately censored samples. When the component-baseline hazard is not monotonic increasing, the semi-parametric method consistently provides less biased estimates than a fully parametric approach and is comparable in efficiency in the estimation of the parameters for all levels of censoring. The methods are illustrated using a real data set of prostate cancer patients treated with different dosages of the drug diethylstilbestrol. Copyright 2003 John Wiley & Sons, Ltd.

  12. Estimated long-term outdoor air pollution concentrations in a cohort study

    NASA Astrophysics Data System (ADS)

    Beelen, Rob; Hoek, Gerard; Fischer, Paul; Brandt, Piet A. van den; Brunekreef, Bert

    Several recent studies associated long-term exposure to air pollution with increased mortality. An ongoing cohort study, the Netherlands Cohort Study on Diet and Cancer (NLCS), was used to study the association between long-term exposure to traffic-related air pollution and mortality. Following on a previous exposure assessment study in the NLCS, we improved the exposure assessment methods. Long-term exposure to nitrogen dioxide (NO2), nitrogen oxide (NO), black smoke (BS), and sulphur dioxide (SO2) was estimated. Exposure at each home address (N = 21,868) was considered as a function of a regional, an urban and a local component. The regional component was estimated using inverse distance weighted interpolation of measurement data from regional background sites in a national monitoring network. Regression models with urban concentrations as dependent variables, and number of inhabitants in different buffers and land use variables, derived with a Geographic Information System (GIS), as predictor variables were used to estimate the urban component. The local component was assessed using a GIS and a digital road network with linked traffic intensities. Traffic intensity on the nearest road and on the nearest major road, and the sum of traffic intensity in a buffer of 100 m around each home address were assessed. Further, a quantitative estimate of the local component was derived. The regression models to estimate the urban component explained 67%, 46%, 49% and 35% of the variances of NO2, NO, BS, and SO2 concentrations, respectively. Overall regression models which incorporated the regional, urban and local components explained 84%, 44%, 59% and 56% of the variability in concentrations for NO2, NO, BS and SO2, respectively. We were able to develop an exposure assessment model using GIS methods and traffic intensities that explained a large part of the variation in outdoor air pollution concentrations.

  13. Prediction by regression and intrarange data scatter in surface-process studies

    USGS Publications Warehouse

    Toy, T.J.; Osterkamp, W.R.; Renard, K.G.

    1993-01-01

    Modeling is a major component of contemporary earth science, and regression analysis occupies a central position in the parameterization, calibration, and validation of geomorphic and hydrologic models. Although this methodology can be used in many ways, we are primarily concerned with the prediction of values for one variable from another variable. Examination of the literature reveals considerable inconsistency in the presentation of the results of regression analysis and the occurrence of patterns in the scatter of data points about the regression line. Both circumstances confound utilization and evaluation of the models. Statisticians are well aware of various problems associated with the use of regression analysis and offer improved practices; often, however, their guidelines are not followed. After a review of the aforementioned circumstances and until standard criteria for model evaluation become established, we recommend, as a minimum, inclusion of scatter diagrams, the standard error of the estimate, and sample size in reporting the results of regression analyses for most surface-process studies. © 1993 Springer-Verlag.

  14. Principal component regression analysis with SPSS.

    PubMed

    Liu, R X; Kuang, J; Gong, Q; Hou, X L

    2003-06-01

    The paper introduces the indices used to diagnose multicollinearity, the basic principle of principal component regression, and a method for determining the 'best' equation. The paper uses an example to describe how to perform principal component regression analysis with SPSS 10.0, including all calculation steps of the principal component regression and all operations of the linear regression, factor analysis, descriptives, compute variable and bivariate correlations procedures in SPSS 10.0. Principal component regression analysis can be used to overcome the disturbance of multicollinearity, and carrying it out with SPSS yields a simplified, faster, and accurate statistical analysis.

  15. Structured functional additive regression in reproducing kernel Hilbert spaces

    PubMed Central

    Zhu, Hongxiao; Yao, Fang; Zhang, Hao Helen

    2013-01-01

    Summary Functional additive models (FAMs) provide a flexible yet simple framework for regressions involving functional predictors. The utilization of data-driven basis in an additive rather than linear structure naturally extends the classical functional linear model. However, the critical issue of selecting nonlinear additive components has been less studied. In this work, we propose a new regularization framework for the structure estimation in the context of Reproducing Kernel Hilbert Spaces. The proposed approach takes advantage of the functional principal components which greatly facilitates the implementation and the theoretical analysis. The selection and estimation are achieved by penalized least squares using a penalty which encourages the sparse structure of the additive components. Theoretical properties such as the rate of convergence are investigated. The empirical performance is demonstrated through simulation studies and a real data application. PMID:25013362

  16. Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert

    2013-01-01

    Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.

  17. Quantile regression in the presence of monotone missingness with sensitivity analysis

    PubMed Central

    Liu, Minzhao; Daniels, Michael J.; Perri, Michael G.

    2016-01-01

    In this paper, we develop methods for longitudinal quantile regression when there is monotone missingness. In particular, we propose pattern mixture models with a constraint that provides a straightforward interpretation of the marginal quantile regression parameters. Our approach allows sensitivity analysis which is an essential component in inference for incomplete data. To facilitate computation of the likelihood, we propose a novel way to obtain analytic forms for the required integrals. We conduct simulations to examine the robustness of our approach to modeling assumptions and compare its performance to competing approaches. The model is applied to data from a recent clinical trial on weight management. PMID:26041008
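    Only the basic quantile regression building block is sketched below, assuming Python with statsmodels and synthetic cross-sectional data; the pattern-mixture constraint and sensitivity-analysis machinery of the paper are not reproduced.

      import numpy as np
      import statsmodels.api as sm
      from statsmodels.regression.quantile_regression import QuantReg

      # Hypothetical data: weight change vs. months in a weight-management program,
      # with error variance growing over time so the quantile slopes differ.
      rng = np.random.default_rng(4)
      months = rng.uniform(0, 12, 300)
      weight_change = -0.8 * months + rng.standard_normal(300) * (1 + 0.3 * months)

      X = sm.add_constant(months)
      for q in (0.25, 0.5, 0.75):
          fit = QuantReg(weight_change, X).fit(q=q)
          print(f"q={q}: intercept={fit.params[0]:.2f}, slope={fit.params[1]:.2f}")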

  18. Forecasting daily meteorological time series using ARIMA and regression models

    NASA Astrophysics Data System (ADS)

    Murat, Małgorzata; Malinowska, Iwona; Gos, Magdalena; Krzyszczak, Jaromir

    2018-04-01

    The daily air temperature and precipitation time series recorded between January 1, 1980 and December 31, 2010 at four European sites (Jokioinen, Dikopshof, Lleida and Lublin) from different climatic zones were modeled and forecasted. In our forecasting we used the Box-Jenkins and Holt-Winters seasonal autoregressive integrated moving-average methods, the autoregressive integrated moving-average with external regressors in the form of Fourier terms, and time series regression including trend and seasonality components, implemented with R software. It was demonstrated that the obtained models are able to capture the dynamics of the time series data and to produce sensible forecasts.
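    One of the approaches named above, ARIMA with Fourier terms as external regressors, can be sketched as follows; this assumes Python with statsmodels rather than the R software used in the study, and the synthetic series, model orders, and forecast horizon are illustrative only.

      import numpy as np
      import pandas as pd
      from statsmodels.tsa.statespace.sarimax import SARIMAX

      # Hypothetical daily temperature series with an annual cycle.
      rng = np.random.default_rng(5)
      days = pd.date_range("1980-01-01", periods=3 * 365, freq="D")
      temp = (10 + 12 * np.sin(2 * np.pi * np.arange(len(days)) / 365.25)
              + rng.standard_normal(len(days)))
      series = pd.Series(temp, index=days)

      # Fourier terms approximate the seasonal cycle, keeping the ARIMA part low-order.
      t = np.arange(len(series))
      fourier = np.column_stack([np.sin(2 * np.pi * t / 365.25),
                                 np.cos(2 * np.pi * t / 365.25)])
      model = SARIMAX(series, exog=fourier, order=(1, 0, 1)).fit(disp=False)

      # Forecast one week ahead, supplying the Fourier terms for the future dates.
      t_future = np.arange(len(series), len(series) + 7)
      fourier_future = np.column_stack([np.sin(2 * np.pi * t_future / 365.25),
                                        np.cos(2 * np.pi * t_future / 365.25)])
      print(model.forecast(steps=7, exog=fourier_future))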

  19. Improving Global Models of Remotely Sensed Ocean Chlorophyll Content Using Partial Least Squares and Geographically Weighted Regression

    NASA Astrophysics Data System (ADS)

    Gholizadeh, H.; Robeson, S. M.

    2015-12-01

    Empirical models have been widely used to estimate global chlorophyll content from remotely sensed data. Here, we focus on the standard NASA empirical models that use blue-green band ratios. These band ratio ocean color (OC) algorithms are in the form of fourth-order polynomials, and the parameters of these polynomials (i.e. coefficients) are estimated from the NASA bio-Optical Marine Algorithm Data set (NOMAD). Most of the points in this data set have been sampled from tropical and temperate regions. However, polynomial coefficients obtained from this data set are used to estimate chlorophyll content in all ocean regions with different properties such as sea-surface temperature, salinity, and downwelling/upwelling patterns. Further, the polynomial terms in these models are highly correlated. In sum, the limitations of these empirical models are as follows: 1) the independent variables within the empirical models, in their current form, are correlated (multicollinear), and 2) current algorithms are global approaches and are based on the spatial stationarity assumption, so they are independent of location. The multicollinearity problem is resolved by using partial least squares (PLS). PLS, which transforms the data into a set of independent components, can be considered as a combined form of principal component regression (PCR) and multiple regression. Geographically weighted regression (GWR) is also used to investigate the validity of the spatial stationarity assumption. GWR solves a regression model over each sample point by using the observations within its neighbourhood. PLS results show that the empirical method underestimates chlorophyll content in high latitudes, including the Southern Ocean region, when compared to PLS (see Figure 1). Cluster analysis of GWR coefficients also shows that the spatial stationarity assumption in empirical models is likely not a valid assumption.
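    The PLS step can be illustrated as below, assuming Python with scikit-learn; the collinear polynomial band-ratio terms and chlorophyll values are synthetic, and the real OC algorithms and NOMAD data are not reproduced.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      # Hypothetical band-ratio predictors that are strongly collinear, as in the
      # fourth-order polynomial OC algorithms (all values are synthetic).
      rng = np.random.default_rng(6)
      log_ratio = rng.uniform(-0.5, 0.5, 500)
      X = np.column_stack([log_ratio ** k for k in range(1, 5)])    # correlated terms
      chl = 0.3 - 2.9 * log_ratio + 0.2 * rng.standard_normal(500)  # log10 chlorophyll

      # PLS projects the correlated polynomial terms onto a few orthogonal
      # components before regressing, which sidesteps the multicollinearity.
      pls = PLSRegression(n_components=2).fit(X, chl)
      print("R^2:", round(pls.score(X, chl), 3))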

  20. Functional mixture regression.

    PubMed

    Yao, Fang; Fu, Yuejiao; Lee, Thomas C M

    2011-04-01

    In functional linear models (FLMs), the relationship between the scalar response and the functional predictor process is often assumed to be identical for all subjects. Motivated by both practical and methodological considerations, we relax this assumption and propose a new class of functional regression models that allow the regression structure to vary for different groups of subjects. By projecting the predictor process onto its eigenspace, the new functional regression model is simplified to a framework that is similar to classical mixture regression models. This leads to the proposed approach named as functional mixture regression (FMR). The estimation of FMR can be readily carried out using existing software implemented for functional principal component analysis and mixture regression. The practical necessity and performance of FMR are illustrated through applications to a longevity analysis of female medflies and a human growth study. Theoretical investigations concerning the consistent estimation and prediction properties of FMR along with simulation experiments illustrating its empirical properties are presented in the supplementary material available at Biostatistics online. Corresponding results demonstrate that the proposed approach could potentially achieve substantial gains over traditional FLMs.

  1. Impact of multicollinearity on small sample hydrologic regression models

    NASA Astrophysics Data System (ADS)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
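    A small illustration of the VIF screening technique compared here, assuming Python with statsmodels and synthetic basin characteristics (the variable names and the common cutoff of 10 are illustrative, not the study's settings):

      import numpy as np
      import pandas as pd
      import statsmodels.api as sm
      from statsmodels.stats.outliers_influence import variance_inflation_factor

      # Hypothetical basin characteristics with two highly correlated columns
      # (drainage area and main-channel length), as often seen in regional models.
      rng = np.random.default_rng(7)
      area = rng.lognormal(3, 1, 200)
      length = 1.5 * area ** 0.6 * np.exp(0.05 * rng.standard_normal(200))
      slope = rng.uniform(0.001, 0.05, 200)
      X = pd.DataFrame({"log_area": np.log(area),
                        "log_length": np.log(length),
                        "log_slope": np.log(slope)})

      # VIF screening: flag explanatory variables whose VIF exceeds a threshold
      # (10 is a common cutoff) before fitting the OLS regional regression.
      X_const = sm.add_constant(X)
      vifs = [variance_inflation_factor(X_const.values, i)
              for i in range(1, X_const.shape[1])]
      for name, v in zip(X.columns, vifs):
          print(f"{name}: VIF = {v:.1f}")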

  2. Experimental and computational prediction of glass transition temperature of drugs.

    PubMed

    Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S

    2014-12-22

    Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse drug-like compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein, the support vector regression gave the best result, with an RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, even before compound synthesis.
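    A sketch of the support vector regression approach, assuming Python with scikit-learn; the four descriptors, their values, and the kernel settings are placeholders, not the descriptors selected in the study.

      import numpy as np
      from sklearn.svm import SVR
      from sklearn.preprocessing import StandardScaler
      from sklearn.pipeline import make_pipeline
      from sklearn.model_selection import train_test_split

      # Hypothetical data: four molecular descriptors per compound and measured Tg (K).
      rng = np.random.default_rng(8)
      n = 71
      descriptors = rng.normal(size=(n, 4))
      tg = 320 + descriptors @ np.array([15.0, -8.0, 5.0, 3.0]) + 10 * rng.standard_normal(n)

      X_tr, X_te, y_tr, y_te = train_test_split(descriptors, tg, test_size=0.25,
                                                random_state=0)
      # Standardize descriptors, then fit an RBF-kernel support vector regression.
      svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=1.0))
      svr.fit(X_tr, y_tr)
      rmse = np.sqrt(np.mean((svr.predict(X_te) - y_te) ** 2))
      print("test RMSE (K):", round(rmse, 1))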

  3. Profile local linear estimation of generalized semiparametric regression model for longitudinal data.

    PubMed

    Sun, Yanqing; Sun, Liuquan; Zhou, Jie

    2013-07-01

    This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some covariates and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. A K-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit to the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.

  4. Analysis of the Importance of Oxides and Clays in Cd, Cr, Cu, Ni, Pb and Zn Adsorption and Retention with Regression Trees

    PubMed Central

    González-Costa, Juan José; Reigosa, Manuel Joaquín; Matías, José María; Fernández-Covelo, Emma

    2017-01-01

    This study determines the influence of the different soil components and of the cation-exchange capacity on the adsorption and retention of different heavy metals: cadmium, chromium, copper, nickel, lead and zinc. In order to do so, regression models were created through decision trees and the importance of soil components was assessed. Used variables were: humified organic matter, specific cation-exchange capacity, percentages of sand and silt, proportions of Mn, Fe and Al oxides and hematite, and the proportion of quartz, plagioclase and mica, and the proportions of the different clays: kaolinite, vermiculite, gibbsite and chlorite. The most important components in the obtained models were vermiculite and gibbsite, especially for the adsorption of cadmium and zinc, while clays were less relevant. Oxides are less important than clays, especially for the adsorption of chromium and lead and the retention of chromium, copper and lead. PMID:28072849
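    The regression-tree importance idea can be sketched as follows, assuming Python with scikit-learn; the soil variables and adsorption values are synthetic and only mimic the kind of data described above.

      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      # Hypothetical soil composition data; column names mirror the kinds of
      # variables listed above but the values are simulated.
      rng = np.random.default_rng(9)
      n = 150
      cols = ["organic_matter", "CEC", "sand", "silt",
              "Fe_oxides", "vermiculite", "gibbsite", "kaolinite"]
      X = rng.uniform(0, 1, size=(n, len(cols)))
      # Simulated adsorption driven mainly by vermiculite and gibbsite.
      adsorbed_cd = 0.6 * X[:, 5] + 0.3 * X[:, 6] + 0.05 * rng.standard_normal(n)

      # Fit a regression tree and rank soil components by their importance.
      tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, adsorbed_cd)
      for name, imp in sorted(zip(cols, tree.feature_importances_),
                              key=lambda p: -p[1]):
          print(f"{name}: {imp:.2f}")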

  5. High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics

    PubMed Central

    Carvalho, Carlos M.; Chang, Jeffrey; Lucas, Joseph E.; Nevins, Joseph R.; Wang, Quanli; West, Mike

    2010-01-01

    We describe studies in molecular profiling and biological pathway analysis that use sparse latent factor and regression models for microarray gene expression data. We discuss breast cancer applications and key aspects of the modeling and computational methodology. Our case studies aim to investigate and characterize heterogeneity of structure related to specific oncogenic pathways, as well as links between aggregate patterns in gene expression profiles and clinical biomarkers. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents and investigate how these components overlay multiple aspects of known biological activity. Our methodology is based on sparsity modeling of multivariate regression, ANOVA, and latent factor models, as well as a class of models that combines all components. Hierarchical sparsity priors address questions of dimension reduction and multiple comparisons, as well as scalability of the methodology. The models include practically relevant non-Gaussian/nonparametric components for latent structure, underlying often quite complex non-Gaussianity in multivariate expression patterns. Model search and fitting are addressed through stochastic simulation and evolutionary stochastic search methods that are exemplified in the oncogenic pathway studies. Supplementary supporting material provides more details of the applications, as well as examples of the use of freely available software tools for implementing the methodology. PMID:21218139

  6. Variable Selection for Regression Models of Percentile Flows

    NASA Astrophysics Data System (ADS)

    Fouad, G.

    2017-12-01

    Percentile flows describe the flow magnitude equaled or exceeded for a given percent of time, and are widely used in water resource management. However, these statistics are normally unavailable since most basins are ungauged. Percentile flows of ungauged basins are often predicted using regression models based on readily observable basin characteristics, such as mean elevation. The number of these independent variables is too large to evaluate all possible models. A subset of models is typically evaluated using automatic procedures, like stepwise regression. This ignores a large variety of methods from the field of feature (variable) selection and physical understanding of percentile flows. A study of 918 basins in the United States was conducted to compare an automatic regression procedure to the following variable selection methods: (1) principal component analysis, (2) correlation analysis, (3) random forests, (4) genetic programming, (5) Bayesian networks, and (6) physical understanding. The automatic regression procedure only performed better than principal component analysis. Poor performance of the regression procedure was due to a commonly used filter for multicollinearity, which rejected the strongest models because they had cross-correlated independent variables. Multicollinearity did not decrease model performance in validation because of a representative set of calibration basins. Variable selection methods based strictly on predictive power (numbers 2-5 from above) performed similarly, likely indicating a limit to the predictive power of the variables. Similar performance was also reached using variables selected based on physical understanding, a finding that substantiates recent calls to emphasize physical understanding in modeling for predictions in ungauged basins. The strongest variables highlighted the importance of geology and land cover, whereas widely used topographic variables were the weakest predictors. Variables suffered from a high degree of multicollinearity, possibly illustrating the co-evolution of climatic and physiographic conditions. Given the ineffectiveness of many variables used here, future work should develop new variables that target specific processes associated with percentile flows.

  7. Time series modeling by a regression approach based on a latent process.

    PubMed

    Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice

    2009-01-01

    Time series are used in many domains including finance, engineering, economics and bioinformatics generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process allowing for activating smoothly or abruptly different polynomial regression models. The model parameters are estimated by the maximum likelihood method performed by a dedicated Expectation Maximization (EM) algorithm. The M step of the EM algorithm uses a multi-class Iterative Reweighted Least-Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.

  8. On Using the Average Intercorrelation Among Predictor Variables and Eigenvector Orientation to Choose a Regression Solution.

    ERIC Educational Resources Information Center

    Mugrage, Beverly; And Others

    Three ridge regression solutions are compared with ordinary least squares regression and with principal components regression using all components. Ridge regression, particularly the Lawless-Wang solution, outperformed ordinary least squares regression and the principal components solution on the criteria of coefficient stability and closeness…
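    A minimal contrast between ordinary least squares and ordinary ridge regression on intercorrelated predictors, assuming Python with scikit-learn; the specific ridge solutions compared in the study, such as Lawless-Wang, are not implemented here.

      import numpy as np
      from sklearn.linear_model import LinearRegression, Ridge

      # Hypothetical predictors with a high average intercorrelation, so OLS
      # coefficients are unstable while ridge shrinks them toward stability.
      rng = np.random.default_rng(10)
      z = rng.standard_normal(80)
      X = np.column_stack([z + 0.1 * rng.standard_normal(80) for _ in range(4)])
      y = X @ np.array([1.0, 0.5, -0.5, 1.0]) + rng.standard_normal(80)

      print("OLS coefficients:  ", LinearRegression().fit(X, y).coef_.round(2))
      print("Ridge coefficients:", Ridge(alpha=5.0).fit(X, y).coef_.round(2))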

  9. A Bayesian Semiparametric Latent Variable Model for Mixed Responses

    ERIC Educational Resources Information Center

    Fahrmeir, Ludwig; Raach, Alexander

    2007-01-01

    In this paper we introduce a latent variable model (LVM) for mixed ordinal and continuous responses, where covariate effects on the continuous latent variables are modelled through a flexible semiparametric Gaussian regression model. We extend existing LVMs with the usual linear covariate effects by including nonparametric components for nonlinear…

  10. Comparison between Two Linear Supervised Learning Machines' Methods with Principle Component Based Methods for the Spectrofluorimetric Determination of Agomelatine and Its Degradants.

    PubMed

    Elkhoudary, Mahmoud M; Naguib, Ibrahim A; Abdel Salam, Randa A; Hadad, Ghada M

    2017-05-01

    Four accurate, sensitive and reliable stability-indicating chemometric methods were developed for the quantitative determination of Agomelatine (AGM), whether in pure form or in pharmaceutical formulations. Two supervised learning machine methods, linear artificial neural networks preceded by principal component analysis (PC-linANN) and linear support vector regression (linSVR), were compared with two principal component based methods, principal component regression (PCR) and partial least squares (PLS), for the spectrofluorimetric determination of AGM and its degradants. The results showed the benefits of using linear learning machine methods and the inherent merits of their algorithms in handling overlapped noisy spectral data, especially during the challenging determination of AGM alkaline and acidic degradants (DG1 and DG2). Root mean squared errors of prediction (RMSEP) for the proposed models in the determination of AGM were 1.68, 1.72, 0.68 and 0.22 for PCR, PLS, linSVR and PC-linANN, respectively. The results showed the superiority of supervised learning machine methods over principal component based methods. Besides, the results suggested that linANN is the method of choice for determination of components in low amounts with similar overlapped spectra and a narrow linearity range. Comparison between the proposed chemometric models and a reported HPLC method revealed the comparable performance and quantification power of the proposed models.

  11. Understanding software faults and their role in software reliability modeling

    NASA Technical Reports Server (NTRS)

    Munson, John C.

    1994-01-01

    This study is a direct result of an on-going project to model the reliability of a large real-time control avionics system. In previous modeling efforts with this system, hardware reliability models were applied in modeling the reliability behavior of this system. In an attempt to enhance the performance of the adapted reliability models, certain software attributes were introduced in these models to control for differences between programs and also sequential executions of the same program. As the basic nature of the software attributes that affect software reliability become better understood in the modeling process, this information begins to have important implications on the software development process. A significant problem arises when raw attribute measures are to be used in statistical models as predictors, for example, of measures of software quality. This is because many of the metrics are highly correlated. Consider the two attributes: lines of code, LOC, and number of program statements, Stmts. In this case, it is quite obvious that a program with a high value of LOC probably will also have a relatively high value of Stmts. In the case of low level languages, such as assembly language programs, there might be a one-to-one relationship between the statement count and the lines of code. When there is a complete absence of linear relationship among the metrics, they are said to be orthogonal or uncorrelated. Usually the lack of orthogonality is not serious enough to affect a statistical analysis. However, for the purposes of some statistical analysis such as multiple regression, the software metrics are so strongly interrelated that the regression results may be ambiguous and possibly even misleading. Typically, it is difficult to estimate the unique effects of individual software metrics in the regression equation. The estimated values of the coefficients are very sensitive to slight changes in the data and to the addition or deletion of variables in the regression equation. Since most of the existing metrics have common elements and are linear combinations of these common elements, it seems reasonable to investigate the structure of the underlying common factors or components that make up the raw metrics. The technique we have chosen to use to explore this structure is a procedure called principal components analysis. Principal components analysis is a decomposition technique that may be used to detect and analyze collinearity in software metrics. When confronted with a large number of metrics measuring a single construct, it may be desirable to represent the set by some smaller number of variables that convey all, or most, of the information in the original set. Principal components are linear transformations of a set of random variables that summarize the information contained in the variables. The transformations are chosen so that the first component accounts for the maximal amount of variation of the measures of any possible linear transform; the second component accounts for the maximal amount of residual variation; and so on. The principal components are constructed so that they represent transformed scores on dimensions that are orthogonal. Through the use of principal components analysis, it is possible to have a set of highly related software attributes mapped into a small number of uncorrelated attribute domains. This definitively solves the problem of multi-collinearity in subsequent regression analysis. 
There are many software metrics in the literature, but principal component analysis reveals that there are few distinct sources of variation, i.e. dimensions, in this set of metrics. It would appear perfectly reasonable to characterize the measurable attributes of a program with a simple function of a small number of orthogonal metrics each of which represents a distinct software attribute domain.
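    The principal components idea for collinear software metrics can be sketched as below, assuming Python with scikit-learn; the metrics (LOC, statement count, cyclomatic complexity, comment count) and their values are synthetic placeholders.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import StandardScaler

      # Hypothetical raw metrics for 300 modules; LOC and statement count are
      # nearly collinear by construction, mimicking the situation described above.
      rng = np.random.default_rng(11)
      loc = rng.integers(50, 2000, 300).astype(float)
      stmts = 0.8 * loc + rng.normal(0, 20, 300)
      cyclomatic = 0.01 * loc + rng.normal(5, 2, 300)
      comments = rng.normal(0.2, 0.05, 300) * loc
      metrics = np.column_stack([loc, stmts, cyclomatic, comments])

      # Standardize, then decompose into orthogonal components; the component
      # scores can replace the raw metrics as uncorrelated regressors.
      scaled = StandardScaler().fit_transform(metrics)
      pca = PCA().fit(scaled)
      print("variance explained by each component:",
            pca.explained_variance_ratio_.round(3))
      scores = pca.transform(scaled)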

  12. Seasonal forecasting of high wind speeds over Western Europe

    NASA Astrophysics Data System (ADS)

    Palutikof, J. P.; Holt, T.

    2003-04-01

    As financial losses associated with extreme weather events escalate, there is interest from end users in the forestry and insurance industries, for example, in the development of seasonal forecasting models with a long lead time. This study uses exceedences of the 90th, 95th, and 99th percentiles of daily maximum wind speed over the period 1958 to present to derive predictands of winter wind extremes. The source data is the 6-hourly NCEP Reanalysis gridded surface wind field. Predictor variables include principal components of Atlantic sea surface temperature and several indices of climate variability, including the NAO and SOI. Lead times of up to a year are considered, in monthly increments. Three regression techniques are evaluated; multiple linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLS). PCR and PLS proved considerably superior to MLR with much lower standard errors. PLS was chosen to formulate the predictive model since it offers more flexibility in experimental design and gave slightly better results than PCR. The results indicate that winter windiness can be predicted with considerable skill one year ahead for much of coastal Europe, but that this deteriorates rapidly in the hinterland. The experiment succeeded in highlighting PLS as a very useful method for developing more precise forecasting models, and in identifying areas of high predictability.

  13. Crude oil price forecasting based on hybridizing wavelet multiple linear regression model, particle swarm optimization techniques, and principal component analysis.

    PubMed

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series.

  14. Crude Oil Price Forecasting Based on Hybridizing Wavelet Multiple Linear Regression Model, Particle Swarm Optimization Techniques, and Principal Component Analysis

    PubMed Central

    Shabri, Ani; Samsudin, Ruhaidah

    2014-01-01

    Crude oil prices do play significant role in the global economy and are a key input into option pricing formulas, portfolio allocation, and risk measurement. In this paper, a hybrid model integrating wavelet and multiple linear regressions (MLR) is proposed for crude oil price forecasting. In this model, Mallat wavelet transform is first selected to decompose an original time series into several subseries with different scale. Then, the principal component analysis (PCA) is used in processing subseries data in MLR for crude oil price forecasting. The particle swarm optimization (PSO) is used to adopt the optimal parameters of the MLR model. To assess the effectiveness of this model, daily crude oil market, West Texas Intermediate (WTI), has been used as the case study. Time series prediction capability performance of the WMLR model is compared with the MLR, ARIMA, and GARCH models using various statistics measures. The experimental results show that the proposed model outperforms the individual models in forecasting of the crude oil prices series. PMID:24895666

  15. Non-destructive evaluation of chlorophyll content in quinoa and amaranth leaves by simple and multiple regression analysis of RGB image components.

    PubMed

    Riccardi, M; Mele, G; Pulvento, C; Lavini, A; d'Andria, R; Jacobsen, S-E

    2014-06-01

    Leaf chlorophyll content provides valuable information about the physiological status of plants; it is directly linked to photosynthetic potential and primary production. In vitro assessment by wet chemical extraction is the standard method for leaf chlorophyll determination. This measurement is expensive, laborious, and time consuming. Over the years alternative methods, rapid and non-destructive, have been explored. The aim of this work was to evaluate the applicability of a fast and non-invasive field method for estimation of chlorophyll content in quinoa and amaranth leaves based on RGB component analysis of digital images acquired with a standard SLR camera. Digital images of leaves from different genotypes of quinoa and amaranth were acquired directly in the field. Mean values of each RGB component were evaluated via image analysis software and correlated to leaf chlorophyll determined by the standard laboratory procedure. Single and multiple regression models using RGB color components as independent variables have been tested and validated. The performance of the proposed method was compared to that of the widely used non-destructive SPAD method. Sensitivity of the best regression models for different genotypes of quinoa and amaranth was also checked. Color data acquisition of the leaves in the field with a digital camera was quick, more effective, and lower in cost than SPAD. The proposed RGB models provided better correlation (highest R2) and prediction (lowest RMSEP) of the true value of foliar chlorophyll content and had a lower amount of noise over the whole range of chlorophyll studied compared with SPAD and other leaf image processing based models when applied to quinoa and amaranth.

  16. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    Soares dos Santos, T.; Mendes, D.; Rodrigues Torres, R.

    2016-01-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANNs) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon; northeastern Brazil; and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used general circulation model (GCM) experiments for the 20th century (RCP historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANNs significantly outperform the MLR downscaling of monthly precipitation variability.

  17. Combining multiple regression and principal component analysis for accurate predictions for column ozone in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Rajab, Jasim M.; MatJafri, M. Z.; Lim, H. S.

    2013-06-01

    This study encompasses columnar ozone modelling in peninsular Malaysia. A data set of eight atmospheric parameters [air surface temperature (AST), carbon monoxide (CO), methane (CH4), water vapour (H2Ovapour), skin surface temperature (SSKT), atmosphere temperature (AT), relative humidity (RH), and mean surface pressure (MSP)], retrieved from NASA's Atmospheric Infrared Sounder (AIRS) for the entire period (2003-2008), was employed to develop models to predict the value of columnar ozone (O3) in the study area. The combined method, based on multiple regression combined with principal component analysis (PCA) modelling, was used to predict columnar ozone and to improve its prediction accuracy. Separate analyses were carried out for the northeast monsoon (NEM) and southwest monsoon (SWM) seasons. O3 was negatively correlated with CH4, H2Ovapour, RH, and MSP, whereas it was positively correlated with CO, AST, SSKT, and AT during both the NEM and SWM season periods. Multiple regression analysis was used to fit the columnar ozone data using the atmospheric parameters as predictors. A variable selection method based on high loadings of varimax-rotated principal components was used to acquire subsets of the predictor variables to be included in the linear regression model. It was found that an increase in the columnar O3 value is associated with an increase in the values of AST, SSKT, AT, and CO and with a drop in the levels of CH4, H2Ovapour, RH, and MSP. Fitting the best models for the columnar O3 value using eight of the independent variables gave about the same values of R (≈0.93) and R2 (≈0.86) for both the NEM and SWM seasons. The common variables that appeared in both regression equations were SSKT, CH4 and RH, and the principal precursor of the columnar O3 value in both the NEM and SWM seasons was SSKT.

  18. The PX-EM algorithm for fast stable fitting of Henderson's mixed model

    PubMed Central

    Foulley, Jean-Louis; Van Dyk, David A

    2000-01-01

    This paper presents procedures for implementing the PX-EM algorithm of Liu, Rubin and Wu to compute REML estimates of variance-covariance components in Henderson's linear mixed models. The class of models considered encompasses several correlated random factors having the same vector length, e.g., as in random regression models for longitudinal data analysis and in sire-maternal grandsire models for genetic evaluation. Numerical examples are presented to illustrate the procedures. Much better results in terms of convergence characteristics (number of iterations and time required for convergence) are obtained for PX-EM relative to the basic EM algorithm in the random regression case. PMID:14736399

  19. 2012 Workplace and Gender Relations Survey of Reserve Component Members: Statistical Methodology Report

    DTIC Science & Technology

    2012-09-01

    A logistic regression model was used to predict the probability of eligibility for the survey (known eligibility vs. unknown eligibility). A second logistic regression model was used to predict the probability of response among eligible sample members (complete response vs. non-response). CHAID (Chi…

  20. Addressing the identification problem in age-period-cohort analysis: a tutorial on the use of partial least squares and principal components analysis.

    PubMed

    Tu, Yu-Kang; Krämer, Nicole; Lee, Wen-Chung

    2012-07-01

    In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. As these 3 variables are perfectly collinear by definition, regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem, and how this problem may be tackled using partial least squares and principal components regression analyses. Both methods produce regression coefficients that fulfill the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, this leads to more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how to use partial least squares regression to analyze the trends in body heights with 3 continuous variables for age, period, and cohort. We then use another dataset of hepatocellular carcinoma mortality rates for Taiwanese men to illustrate how to use partial least squares regression to analyze tables with aggregated data. We use the second dataset to show the relation between the intrinsic estimator, a recently proposed method for the age-period-cohort analysis, and partial least squares regression. We also show that the inclusion of all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix.

  1. Linear models for calculating digestible energy for sheep diets.

    PubMed

    Fonnesbeck, P V; Christiansen, M L; Harris, L E

    1981-05-01

    Equations for estimating the digestible energy (DE) content of sheep diets were generated from the chemical contents and a factorial description of diets fed to lambs in digestion trials. The diet factors were two forages (alfalfa and grass hay), harvested at three stages of maturity (late vegetative, early bloom and full bloom), fed in two ingredient combinations (all hay or a 50:50 hay and corn grain mixture) and prepared by two forage texture processes (coarsely chopped or finely chopped and pelleted). The 2 x 3 x 2 x 2 factorial arrangement produced 24 diet treatments. These were replicated twice, for a total of 48 lamb digestion trials. In model 1 regression equations, DE was calculated directly from chemical composition of the diet. In model 2, regression equations predicted the percentage of digested nutrient from the chemical contents of the diet and then DE of the diet was calculated as the sum of the gross energy of the digested organic components. Expanded forms of model 1 and model 2 were also developed that included diet factors as qualitative indicator variables to adjust the regression constant and regression coefficients for the diet description. The expanded forms of the equations accounted for significantly more variation in DE than did the simple models and more accurately estimated DE of the diet. Information provided by the diet description proved as useful as chemical analyses for the prediction of digestibility of nutrients. The statistics indicate that, with model 1, neutral detergent fiber and plant cell wall analyses provided as much information for the estimation of DE as did model 2 with the combined information from crude protein, available carbohydrate, total lipid, cellulose and hemicellulose. Regression equations are presented for estimating DE with the most currently analyzed organic components, including linear and curvilinear variables and diet factors that significantly reduce the standard error of the estimate. To estimate DE of a diet, the user selects the equation that makes the most effective use of the available chemical analysis information and diet description.

  2. Decoding and modelling of time series count data using Poisson hidden Markov model and Markov ordinal logistic regression models.

    PubMed

    Sebastian, Tunny; Jeyaseelan, Visalakshi; Jeyaseelan, Lakshmanan; Anandan, Shalini; George, Sebastian; Bangdiwala, Shrikant I

    2018-01-01

    Hidden Markov models are stochastic models in which the observations are assumed to follow a mixture distribution, but the parameters of the components are governed by a Markov chain which is unobservable. The issues related to the estimation of Poisson hidden Markov models, in which the observations come from a mixture of Poisson distributions and the parameters of the component Poisson distributions are governed by an m-state Markov chain with an unknown transition probability matrix, are explained here. These methods were applied to the data on Vibrio cholerae counts reported every month over an 11-year span at Christian Medical College, Vellore, India. Using the Viterbi algorithm, the best estimate of the state sequence was obtained, and hence the transition probability matrix. The mean passage times between the states were estimated. The 95% confidence interval for the mean passage time was estimated via Monte Carlo simulation. The three hidden states of the estimated Markov chain are labelled as 'Low', 'Moderate' and 'High', with mean counts of 1.4, 6.6 and 20.2 and estimated average durations of stay of 3, 3 and 4 months, respectively. Environmental risk factors were studied using Markov ordinal logistic regression analysis. No significant association was found between disease severity levels and climate components.
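
    A minimal, self-contained sketch of Viterbi decoding for a three-state Poisson hidden Markov model; the emission means follow the 'Low'/'Moderate'/'High' values quoted above, but the transition matrix and the count series are invented for illustration only:

      import numpy as np
      from scipy.stats import poisson

      means = np.array([1.4, 6.6, 20.2])             # Poisson mean per hidden state
      trans = np.array([[0.7, 0.2, 0.1],
                        [0.2, 0.6, 0.2],
                        [0.1, 0.3, 0.6]])            # illustrative transition matrix
      start = np.array([0.6, 0.3, 0.1])
      counts = np.array([0, 2, 1, 5, 8, 7, 22, 18, 25, 6, 3, 1])   # toy monthly counts

      log_b = poisson.logpmf(counts[:, None], means[None, :])      # T x K emission log-probabilities
      T, K = log_b.shape
      delta = np.zeros((T, K))
      psi = np.zeros((T, K), dtype=int)
      delta[0] = np.log(start) + log_b[0]
      for t in range(1, T):
          cand = delta[t - 1][:, None] + np.log(trans)             # best predecessor for each state
          psi[t] = cand.argmax(axis=0)
          delta[t] = cand.max(axis=0) + log_b[t]

      states = np.zeros(T, dtype=int)
      states[-1] = delta[-1].argmax()
      for t in range(T - 2, -1, -1):                               # backtrack the optimal path
          states[t] = psi[t + 1, states[t + 1]]
      print("decoded states (0=Low, 1=Moderate, 2=High):", states)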

  3. Selection of a Geostatistical Method to Interpolate Soil Properties of the State Crop Testing Fields using Attributes of a Digital Terrain Model

    NASA Astrophysics Data System (ADS)

    Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.

    2018-03-01

    The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with a multiple linear regression drift model (RK + MLR), and regression kriging with a principal component regression drift model (RK + PCR)—were examined. The results of the study were compiled into an algorithm for choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced and application of the PCR becomes unwarranted. In that case, multiple linear regression should be used instead.
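
    A conceptual Python sketch of the RK + PCR idea under stated assumptions: the drift is a regression of the soil property on principal components of (synthetic) terrain attributes, and a Gaussian-process regressor is used as a simple stand-in for ordinary kriging of the residuals (not the authors' geostatistical software):

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression
      from sklearn.gaussian_process import GaussianProcessRegressor
      from sklearn.gaussian_process.kernels import RBF, WhiteKernel

      rng = np.random.default_rng(2)
      coords = rng.uniform(0, 1000, size=(200, 2))            # x, y sampling locations (m)
      terrain = rng.normal(size=(200, 6))                     # elevation, slope, ... (illustrative)
      soil = (1.5 * terrain[:, 0] - 0.8 * terrain[:, 2]
              + 0.002 * coords[:, 0] + rng.normal(scale=0.5, size=200))

      pca = PCA(n_components=3).fit(terrain)                  # PCR drift: regress on terrain PCs
      drift = LinearRegression().fit(pca.transform(terrain), soil)
      residuals = soil - drift.predict(pca.transform(terrain))

      # Gaussian-process interpolation of the residuals, used here in place of ordinary kriging
      gp = GaussianProcessRegressor(kernel=RBF(length_scale=200.0) + WhiteKernel(),
                                    normalize_y=True).fit(coords, residuals)

      new_coords = rng.uniform(0, 1000, size=(5, 2))
      new_terrain = rng.normal(size=(5, 6))
      prediction = drift.predict(pca.transform(new_terrain)) + gp.predict(new_coords)
      print(np.round(prediction, 2))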

  4. ECOPASS - a multivariate model used as an index of growth performance of poplar clones

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ceulemans, R.; Impens, I.

    The model (ECOlogical PASSport) reported here was constructed by principal component analysis from a combination of biochemical, anatomical/morphological and ecophysiological gas exchange parameters measured on 5 fast-growing poplar clones. Productivity data came from 10 selected trees in 3 plantations in Belgium and were given as m.a.i.(b.a.). The model is shown to reflect not only the genetic origin and the relative effects of the different parameters of the clones, but also their production potential. Multiple regression analysis of the 4 principal components showed a high cumulative correlation (96%) between productivity and the 3 components related to ecophysiological, biochemical and morphological parameters; the ecophysiological component alone correlated 85% with productivity.

  5. Statistical methods and regression analysis of stratospheric ozone and meteorological variables in Isfahan

    NASA Astrophysics Data System (ADS)

    Hassanzadeh, S.; Hosseinibalam, F.; Omidvari, M.

    2008-04-01

    Data on seven meteorological variables (relative humidity, wet temperature, dry temperature, maximum temperature, minimum temperature, ground temperature and sun radiation time) and ozone values have been used for statistical analysis. The meteorological variables and ozone values were analyzed using both multiple linear regression and principal component methods. Data for the period 1999-2004 are analyzed jointly using both methods. For all periods, the temperature-related variables were highly correlated with one another, but all were negatively correlated with relative humidity. Multiple regression analysis was used to fit the ozone values using the meteorological variables as predictors. A variable selection method based on high loadings of varimax-rotated principal components was used to obtain subsets of the predictor variables to be included in the linear regression model of the meteorological variables. In 1999, 2001 and 2002, one of the meteorological variables was weakly but predominantly influenced by the ozone concentrations. For the year 2000, however, the model did not indicate that the meteorological variables were predominantly influenced by the ozone concentrations, which points to variation in sun radiation. This could be due to other factors that were not explicitly considered in this study.

  6. Trend Estimation and Regression Analysis in Climatological Time Series: An Application of Structural Time Series Models and the Kalman Filter.

    NASA Astrophysics Data System (ADS)

    Visser, H.; Molenaar, J.

    1995-05-01

    The detection of trends in climatological data has become central to the discussion on climate change due to the enhanced greenhouse effect. To prove detection, a method is needed (i) to make inferences on significant rises or declines in trends, (ii) to take into account natural variability in climate series, and (iii) to compare output from GCMs with the trends in observed climate data. To meet these requirements, flexible mathematical tools are needed. A structural time series model is proposed with which a stochastic trend, a deterministic trend, and regression coefficients can be estimated simultaneously. The stochastic trend component is described using the class of ARIMA models. The regression component is assumed to be linear; however, the regression coefficients corresponding to the explanatory variables may be allowed to be time dependent in order to validate this assumption. The mathematical technique used to estimate this trend-regression model is the Kalman filter. The main features of the filter are discussed. Examples of trend estimation are given using annual mean temperatures at a single station in the Netherlands (1706-1990) and annual mean temperatures at Northern Hemisphere land stations (1851-1990). The inclusion of explanatory variables is shown by regressing the latter temperature series on four variables: Southern Oscillation index (SOI), volcanic dust index (VDI), sunspot numbers (SSN), and a simulated temperature signal induced by increasing greenhouse gases (GHG). In all analyses, the influence of SSN on global temperatures is found to be negligible. The correlations between temperatures and SOI and VDI appear to be negative. For SOI, this correlation is significant, but for VDI it is not, probably because of a lack of volcanic eruptions during the sample period. The relation between temperatures and GHG is positive, which is in agreement with the hypothesis of a warming climate because of increasing levels of greenhouse gases. The prediction performance of the model is rather poor, and possible explanations are discussed.
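
    A small Python sketch in the same spirit, assuming statsmodels' UnobservedComponents (a local linear trend plus linear regression on explanatory variables, estimated by the Kalman filter); the temperature series and the SOI/VDI/GHG stand-ins are simulated, not the observed records:

      import numpy as np
      from statsmodels.tsa.statespace.structural import UnobservedComponents

      rng = np.random.default_rng(3)
      n = 140                                                  # e.g. 140 annual mean temperatures
      soi = rng.normal(size=n)                                 # stand-ins for SOI, VDI and GHG
      vdi = rng.normal(size=n)
      ghg = np.linspace(0.0, 1.0, n)
      trend = np.cumsum(rng.normal(scale=0.02, size=n))        # slowly evolving stochastic trend
      temp = trend - 0.10 * soi - 0.05 * vdi + 0.6 * ghg + rng.normal(scale=0.1, size=n)

      exog = np.column_stack([soi, vdi, ghg])
      model = UnobservedComponents(temp, level="local linear trend", exog=exog)
      result = model.fit(disp=False)                           # maximum likelihood via the Kalman filter
      print(result.summary())                                  # regression betas follow the state variances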

  7. Selective principal component regression analysis of fluorescence hyperspectral image to assess aflatoxin contamination in corn

    USDA-ARS?s Scientific Manuscript database

    Selective principal component regression analysis (SPCR) uses a subset of the original image bands for principal component transformation and regression. For optimal band selection before the transformation, this paper used genetic algorithms (GA). In this case, the GA process used the regression co...

  8. Artificial neural networks and multiple linear regression model using principal components to estimate rainfall over South America

    NASA Astrophysics Data System (ADS)

    dos Santos, T. S.; Mendes, D.; Torres, R. R.

    2015-08-01

    Several studies have been devoted to dynamic and statistical downscaling for analysis of both climate variability and climate change. This paper introduces an application of artificial neural networks (ANN) and multiple linear regression (MLR) by principal components to estimate rainfall in South America. This method is proposed for downscaling monthly precipitation time series over South America for three regions: the Amazon, Northeastern Brazil and the La Plata Basin, which is one of the regions of the planet that will be most affected by the climate change projected for the end of the 21st century. The downscaling models were developed and validated using CMIP5 model output and observed monthly precipitation. We used GCM experiments for the 20th century (RCP Historical; 1970-1999) and two scenarios (RCP 2.6 and 8.5; 2070-2100). The model test results indicate that the ANN significantly outperforms the MLR downscaling of monthly precipitation variability.
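
    A toy Python sketch of the comparison, assuming scikit-learn and purely synthetic data: principal components of the large-scale predictors feed both a multiple linear regression and a small neural network, and the two are compared on a hold-out set (no claim is made about the paper's CMIP5 set-up):

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression
      from sklearn.neural_network import MLPRegressor
      from sklearn.model_selection import train_test_split
      from sklearn.metrics import mean_squared_error

      rng = np.random.default_rng(4)
      predictors = rng.normal(size=(360, 20))               # 30 years of monthly large-scale fields (toy)
      rain = np.maximum(0, 2 + predictors[:, 0] - 0.5 * predictors[:, 3] ** 2
                        + rng.normal(scale=0.5, size=360))

      pcs = PCA(n_components=5).fit_transform(predictors)   # principal components as inputs
      X_tr, X_te, y_tr, y_te = train_test_split(pcs, rain, test_size=0.3, random_state=0)

      mlr = LinearRegression().fit(X_tr, y_tr)
      ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0).fit(X_tr, y_tr)

      for name, m in [("MLR", mlr), ("ANN", ann)]:
          rmse = mean_squared_error(y_te, m.predict(X_te)) ** 0.5
          print(name, "test RMSE:", round(rmse, 3))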

  9. Multilevel covariance regression with correlated random effects in the mean and variance structure.

    PubMed

    Quintero, Adrian; Lesaffre, Emmanuel

    2017-09-01

    Multivariate regression methods generally assume a constant covariance matrix for the observations. When a heteroscedastic model is needed, the parametric and nonparametric covariance regression approaches available in the literature can be restrictive. We propose a multilevel regression model for the mean and covariance structure, including random intercepts in both components and allowing for correlation between them. The implied conditional covariance function can be different across clusters as a result of the random effect in the variance structure. In addition, allowing for correlation between the random intercepts in the mean and covariance makes the model convenient for skewed response distributions. Furthermore, it permits us to analyse directly the relation between the mean response level and the variability in each cluster. Parameter estimation is carried out via Gibbs sampling. We compare the performance of our model to other covariance modelling approaches in a simulation study. Finally, the proposed model is applied to the RN4CAST dataset to identify the variables that impact burnout of nurses in Belgium. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  10. EMD-regression for modelling multi-scale relationships, and application to weather-related cardiovascular mortality

    NASA Astrophysics Data System (ADS)

    Masselot, Pierre; Chebana, Fateh; Bélanger, Diane; St-Hilaire, André; Abdous, Belkacem; Gosselin, Pierre; Ouarda, Taha B. M. J.

    2018-01-01

    In a number of environmental studies, relationships between natural processes are often assessed through regression analyses using time series data. Such data are often multi-scale and non-stationary, leading to poor accuracy of the resulting regression models and therefore to results with only moderate reliability. To deal with this issue, the present paper introduces the EMD-regression methodology, which consists in applying the empirical mode decomposition (EMD) algorithm to the data series and then using the resulting components in regression models. The proposed methodology presents a number of advantages. First, it accounts for the non-stationarity of the data series. Second, the approach acts as a scan of the relationship between a response variable and the predictors at different time scales, providing new insights about this relationship. To illustrate the proposed methodology, it is applied to study the relationship between weather and cardiovascular mortality in Montreal, Canada. The results shed new light on the studied relationship. For instance, they show that humidity can cause excess mortality at the monthly time scale, a scale not visible in classical models. A comparison is also conducted with state-of-the-art methods, namely generalized additive models and distributed lag models, both widely used in weather-related health studies. The comparison shows that EMD-regression achieves better prediction performance and provides more detail than classical models concerning the relationship.
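
    A conceptual Python sketch of EMD-regression under stated assumptions (the third-party PyEMD package, published on PyPI as EMD-signal, and simulated series rather than the Montreal data): the predictor is decomposed into intrinsic mode functions, which then enter a regression jointly so that each coefficient reflects one time scale.

      import numpy as np
      from PyEMD import EMD
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(5)
      t = np.arange(1000)
      temperature = (np.sin(2 * np.pi * t / 365) + 0.3 * np.sin(2 * np.pi * t / 30)
                     + 0.2 * rng.normal(size=t.size))
      mortality = (0.5 * np.sin(2 * np.pi * t / 365) + 0.4 * np.sin(2 * np.pi * t / 30)
                   + 0.3 * rng.normal(size=t.size))

      imfs = EMD()(temperature)                      # rows are IMFs, from fast to slow time scales
      X = imfs.T                                     # one regressor column per time scale
      fit = LinearRegression().fit(X, mortality)
      print("number of IMFs:", imfs.shape[0])
      print("coefficient per IMF (scale-specific association):", np.round(fit.coef_, 3))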

  11. Relationship between body composition and postural control in prepubertal overweight/obese children: A cross-sectional study.

    PubMed

    Villarrasa-Sapiña, Israel; Álvarez-Pitti, Julio; Cabeza-Ruiz, Ruth; Redón, Pau; Lurbe, Empar; García-Massó, Xavier

    2018-02-01

    Excess body weight during childhood causes reduced motor functionality and problems in postural control, a negative influence which has been reported in the literature. Nevertheless, no information regarding the effect of body composition on the postural control of overweight and obese children is available. The objective of this study was therefore to establish these relationships. A cross-sectional design was used to establish relationships between body composition and postural control variables obtained in bipedal eyes-open and eyes-closed conditions in twenty-two children. Centre of pressure signals were analysed in the temporal and frequency domains. Pearson correlations were applied to establish relationships between variables. Principal component analysis was applied to the body composition variables to avoid potential multicollinearity in the regression models. These principal components were used to perform a multiple linear regression analysis, from which regression models were obtained to predict postural control. Height and leg mass were the body composition variables that showed the highest correlation with postural control. Multiple regression models were also obtained and several of these models showed a higher correlation coefficient in predicting postural control than simple correlations. These models revealed that leg and trunk mass were good predictors of postural control. More equations were found in the eyes-open than eyes-closed condition. Body weight and height are negatively correlated with postural control. However, leg and trunk mass are better postural control predictors than arm or body mass. Finally, body composition variables are more useful in predicting postural control when the eyes are open. Copyright © 2017 Elsevier Ltd. All rights reserved.

  12. Quantitative structure-activity relationship study of P2X7 receptor inhibitors using combination of principal component analysis and artificial intelligence methods.

    PubMed

    Ahmadi, Mehdi; Shahlaei, Mohsen

    2015-01-01

    P2X7 antagonist activity for a set of 49 molecules of the P2X7 receptor antagonists, derivatives of purine, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of a combination of principal component analysis (PCA), as a well-known data reduction method, genetic algorithm (GA), as a variable selection technique, and artificial neural network (ANN), as a non-linear modeling method. First, a linear regression combined with PCA (principal component regression) was performed to model the structure-activity relationships, and afterwards a combination of PCA and an ANN algorithm was employed to accurately predict the biological activity of the P2X7 antagonists. PCA preserves as much of the information as possible contained in the original data set. The seven PCs most important to the studied activity were selected as the inputs of the ANN by an efficient variable selection method, GA. The best computational neural network model was a fully-connected, feed-forward model with 7-7-1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation, and chemical applicability domain. All validations showed that the quantitative structure-activity relationship model constructed is robust and satisfactory.

  13. Quantitative structure–activity relationship study of P2X7 receptor inhibitors using combination of principal component analysis and artificial intelligence methods

    PubMed Central

    Ahmadi, Mehdi; Shahlaei, Mohsen

    2015-01-01

    P2X7 antagonist activity for a set of 49 molecules of the P2X7 receptor antagonists, derivatives of purine, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of a combination of principal component analysis (PCA), as a well-known data reduction method, genetic algorithm (GA), as a variable selection technique, and artificial neural network (ANN), as a non-linear modeling method. First, a linear regression combined with PCA (principal component regression) was performed to model the structure–activity relationships, and afterwards a combination of PCA and an ANN algorithm was employed to accurately predict the biological activity of the P2X7 antagonists. PCA preserves as much of the information as possible contained in the original data set. The seven PCs most important to the studied activity were selected as the inputs of the ANN by an efficient variable selection method, GA. The best computational neural network model was a fully-connected, feed-forward model with 7-7-1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation, and chemical applicability domain. All validations showed that the quantitative structure–activity relationship model constructed is robust and satisfactory. PMID:26600858

  14. Functional Data Analysis Applied to Modeling of Severe Acute Mucositis and Dysphagia Resulting From Head and Neck Radiation Therapy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dean, Jamie A., E-mail: jamie.dean@icr.ac.uk; Wong, Kee H.; Gay, Hiram

    Purpose: Current normal tissue complication probability modeling using logistic regression suffers from bias and high uncertainty in the presence of highly correlated radiation therapy (RT) dose data. This hinders robust estimates of dose-response associations and, hence, optimal normal tissue–sparing strategies from being elucidated. Using functional data analysis (FDA) to reduce the dimensionality of the dose data could overcome this limitation. Methods and Materials: FDA was applied to modeling of severe acute mucositis and dysphagia resulting from head and neck RT. Functional partial least squares regression (FPLS) and functional principal component analysis were used for dimensionality reduction of the dose-volume histogram data. The reduced dose data were input into functional logistic regression models (functional partial least squares–logistic regression [FPLS-LR] and functional principal component–logistic regression [FPC-LR]) along with clinical data. This approach was compared with penalized logistic regression (PLR) in terms of predictive performance and the significance of treatment covariate–response associations, assessed using bootstrapping. Results: The area under the receiver operating characteristic curve for the PLR, FPC-LR, and FPLS-LR models was 0.65, 0.69, and 0.67, respectively, for mucositis (internal validation) and 0.81, 0.83, and 0.83, respectively, for dysphagia (external validation). The calibration slopes/intercepts for the PLR, FPC-LR, and FPLS-LR models were 1.6/−0.67, 0.45/0.47, and 0.40/0.49, respectively, for mucositis (internal validation) and 2.5/−0.96, 0.79/−0.04, and 0.79/0.00, respectively, for dysphagia (external validation). The bootstrapped odds ratios indicated significant associations between RT dose and severe toxicity in the mucositis and dysphagia FDA models. Cisplatin was significantly associated with severe dysphagia in the FDA models. None of the covariates was significantly associated with severe toxicity in the PLR models. Dose levels greater than approximately 1.0 Gy/fraction were most strongly associated with severe acute mucositis and dysphagia in the FDA models. Conclusions: FPLS and functional principal component analysis marginally improved predictive performance compared with PLR and provided robust dose-response associations. FDA is recommended for use in normal tissue complication probability modeling.
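
    A simplified stand-in for the FPC-LR idea, assuming that functional PCA can be approximated by ordinary PCA on discretised dose-volume histogram curves; all patients, curves and outcomes below are simulated, and the clinical covariate is illustrative only:

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(6)
      n, grid = 300, 50                                     # patients x dose grid points
      shift = rng.normal(size=n)                            # per-patient shift of the DVH curve
      dvh = 1.0 / (1.0 + np.exp(np.linspace(-4, 6, grid)[None, :] - shift[:, None]))
      cisplatin = rng.integers(0, 2, size=n)                # illustrative clinical covariate
      logit = 1.5 * shift + 1.0 * cisplatin - 1.0
      toxicity = rng.binomial(1, 1 / (1 + np.exp(-logit)))

      scores = PCA(n_components=3).fit_transform(dvh)       # "functional" principal component scores
      X = np.column_stack([scores, cisplatin])
      X_tr, X_te, y_tr, y_te = train_test_split(X, toxicity, test_size=0.3, random_state=0)
      clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
      print("AUC:", round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))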

  15. Time-Dependent Moment Tensors of the First Four Source Physics Experiments (SPE) Explosions

    NASA Astrophysics Data System (ADS)

    Yang, X.

    2015-12-01

    We use mainly vertical-component geophone data within 2 km of the epicenter to invert for time-dependent moment tensors of the first four SPE explosions: SPE-1, SPE-2, SPE-3 and SPE-4Prime. We employ a one-dimensional (1D) velocity model developed from P- and Rg-wave travel times for Green's function calculations. The attenuation structure of the model is developed from P- and Rg-wave amplitudes. We select data for the inversion based on the criterion that they show travel times and amplitude behavior consistent with those predicted by the 1D model. Due to the limited azimuthal coverage of the sources and the mostly vertical-component-only nature of the dataset, only the long-period, diagonal components of the moment tensors are well constrained. Nevertheless, the moment tensors, particularly their isotropic components, provide reasonable estimates of the long-period source amplitudes as well as estimates of corner frequencies, albeit with larger uncertainties. The estimated corner frequencies, however, are consistent with estimates from ratios of seismogram spectra from different explosions. These long-period source amplitudes and corner frequencies cannot be fit by classical P-wave explosion source models. The results motivate the development of new P-wave source models suitable for these chemical explosions. To that end, we fit the inverted moment-tensor spectra by modifying the classical explosion model using regressions of estimated source parameters. Although the number of data points used in the regression is small, the approach suggests a way forward for developing the new model as more data are collected.

  16. Rapid Detection of Volatile Oil in Mentha haplocalyx by Near-Infrared Spectroscopy and Chemometrics.

    PubMed

    Yan, Hui; Guo, Cheng; Shao, Yang; Ouyang, Zhen

    2017-01-01

    Near-infrared spectroscopy combined with partial least squares regression (PLSR) and support vector machine (SVM) modelling was applied for the rapid determination of the volatile oil content in Mentha haplocalyx, a chemical component whose level directly links to the quality, and hence the clinical efficacy, of the medicine. The effects of data pre-processing methods on the accuracy of the PLSR calibration models were investigated. The performance of the final model was evaluated according to the correlation coefficient (R) and the root mean square error of prediction (RMSEP). For the PLSR model, the best pre-processing combination was first-order derivative, standard normal variate transformation (SNV), and mean centering, which gave correlation coefficients of 0.8805 for calibration and 0.8719 for prediction, with an RMSEC of 0.091 and an RMSEP of 0.097, respectively. The wavenumber variables linked to volatile oil lie in the 5500-4000 cm-1 region, as identified by analysing the loading weights and variable importance in projection (VIP) scores. For the SVM model, six LVs (fewer than the seven LVs in the PLSR model) were adopted, and the results were better than those of the PLSR model: the correlation coefficients were 0.9232 and 0.9202, respectively, with RMSEC and RMSEP of 0.084 and 0.082, respectively, indicating that the predicted values were accurate and reliable. This work demonstrated that near-infrared reflectance spectroscopy with chemometrics can be used to rapidly determine the main volatile oil content in M. haplocalyx. Abbreviations used: 1st der: first-order derivative; 2nd der: second-order derivative; LOO: leave-one-out; LVs: latent variables; MC: mean centering; NIR: near-infrared; NIRS: near-infrared spectroscopy; PCR: principal component regression; PLSR: partial least squares regression; RBF: radial basis function; RMSECV: root mean square error of cross-validation; RMSEC: root mean square error of calibration; RMSEP: root mean square error of prediction; SNV: standard normal variate transformation; SVM: support vector machine; VIP: variable importance in projection.
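
    A sketch of this calibration workflow in Python, assuming simulated spectra rather than the Mentha haplocalyx measurements: SNV and a Savitzky-Golay first derivative are applied before a six-latent-variable PLS regression evaluated by cross-validation.

      import numpy as np
      from scipy.signal import savgol_filter
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.model_selection import cross_val_predict
      from sklearn.metrics import r2_score

      rng = np.random.default_rng(7)
      wavenumbers = np.linspace(4000, 10000, 600)
      oil = rng.uniform(0.2, 1.2, size=80)                   # reference volatile-oil contents (toy)
      peak = np.exp(-((wavenumbers - 5200) / 150) ** 2)      # one absorbing band, for illustration
      spectra = (oil[:, None] * peak[None, :]
                 + rng.normal(scale=0.02, size=(80, 600))
                 + rng.normal(size=(80, 1)))                 # additive baseline offsets per sample

      # Pre-processing: standard normal variate, then first derivative (Savitzky-Golay)
      snv = (spectra - spectra.mean(axis=1, keepdims=True)) / spectra.std(axis=1, keepdims=True)
      deriv = savgol_filter(snv, window_length=15, polyorder=2, deriv=1, axis=1)

      pls = PLSRegression(n_components=6)
      predicted = cross_val_predict(pls, deriv, oil, cv=10)
      print("cross-validated R^2:", round(r2_score(oil, predicted.ravel()), 3))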

  17. A Two-Step Method to Select Major Surge-Producing Extratropical Cyclones from a 10,000-Year Stochastic Catalog

    NASA Astrophysics Data System (ADS)

    Keshtpoor, M.; Carnacina, I.; Yablonsky, R. M.

    2016-12-01

    Extratropical cyclones (ETCs) are the primary driver of storm surge events along the UK and northwest mainland Europe coastlines. In an effort to evaluate the storm surge risk to coastal communities in this region, a stochastic catalog is developed by perturbing the historical storm seeds of European ETCs to account for 10,000 years of possible ETCs. Numerical simulation of the storm surge generated by the full 10,000-year stochastic catalog, however, is computationally expensive and may take several months to complete with available computational resources. A new statistical regression model is developed to select the major surge-generating events from the stochastic ETC catalog. This regression model is based on the maximum storm surge, obtained via numerical simulations using a calibrated version of the Delft3D-FM hydrodynamic model with a relatively coarse mesh, of 1750 historical ETC events that occurred over the past 38 years in Europe. These numerically simulated surge values were regressed against the local sea level pressure and the U and V components of the wind field at the locations of 196 tide gauge stations near the UK and northwest mainland Europe coastal areas. The regression model suggests that storm surge values in the area of interest are highly correlated with the U- and V-components of wind speed, as well as with the sea level pressure. Based on these correlations, the regression model was then used to select surge-generating storms from the 10,000-year stochastic catalog. Results suggest that roughly 105,000 events out of 480,000 stochastic storms are surge-generating events and need to be considered for numerical simulation using a hydrodynamic model. The selected stochastic storms were then simulated in Delft3D-FM, and the final refinement of the storm population was performed based on return period analysis of the 1750 historical event simulations at each of the 196 tide gauges in preparation for Delft3D-FM fine mesh simulations.

  18. Using Patient Demographics and Statistical Modeling to Predict Knee Tibia Component Sizing in Total Knee Arthroplasty.

    PubMed

    Ren, Anna N; Neher, Robert E; Bell, Tyler; Grimm, James

    2018-06-01

    Preoperative planning is important to achieve successful implantation in primary total knee arthroplasty (TKA). However, traditional TKA templating techniques are not accurate enough to predict the component size to a very close range. With the goal of developing a general predictive statistical model using patient demographic information, ordinal logistic regression was applied to build a proportional odds model to predict the tibia component size. The study retrospectively collected the data of 1992 primary Persona Knee System TKA procedures. Of them, 199 procedures were randomly selected as testing data and the rest of the data were randomly partitioned between model training data and model evaluation data with a ratio of 7:3. Different models were trained and evaluated on the training and validation data sets after data exploration. The final model had patient gender, age, weight, and height as independent variables and predicted the tibia size within 1 size difference 96% of the time on the validation data, 94% of the time on the testing data, and 92% on a prospective cadaver data set. The study results indicated the statistical model built by ordinal logistic regression can increase the accuracy of tibia sizing information for Persona Knee preoperative templating. This research shows statistical modeling may be used with radiographs to dramatically enhance the templating accuracy, efficiency, and quality. In general, this methodology can be applied to other TKA products when the data are applicable. Copyright © 2018 Elsevier Inc. All rights reserved.
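
    A Python sketch of the proportional-odds (ordinal logistic) approach, assuming statsmodels' OrderedModel (available from statsmodels 0.13) and simulated demographics; the size categories and coefficients are invented, not the Persona Knee data:

      import numpy as np
      import pandas as pd
      from statsmodels.miscmodels.ordinal_model import OrderedModel

      rng = np.random.default_rng(8)
      n = 1500
      demo = pd.DataFrame({
          "female": rng.integers(0, 2, size=n),
          "age": rng.normal(68, 9, size=n),
          "height_cm": rng.normal(168, 9, size=n),
          "weight_kg": rng.normal(85, 15, size=n),
      })
      latent = (0.08 * demo["height_cm"] + 0.02 * demo["weight_kg"]
                - 1.0 * demo["female"] + rng.logistic(size=n))
      size = pd.cut(latent, bins=5, labels=["1", "2", "3", "4", "5"])   # ordered size categories

      model = OrderedModel(size, demo, distr="logit")                   # proportional-odds model
      result = model.fit(method="bfgs", disp=False)
      probs = model.predict(result.params, exog=demo)                   # per-category probabilities
      predicted = np.asarray(probs).argmax(axis=1)
      within_one = (np.abs(predicted - size.cat.codes.to_numpy()) <= 1).mean()
      print(result.summary())
      print("predicted within one size:", round(float(within_one), 3))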

  19. Impact of covariate models on the assessment of the air pollution-mortality association in a single- and multipollutant context.

    PubMed

    Sacks, Jason D; Ito, Kazuhiko; Wilson, William E; Neas, Lucas M

    2012-10-01

    With the advent of multicity studies, uniform statistical approaches have been developed to examine air pollution-mortality associations across cities. To assess the sensitivity of the air pollution-mortality association to different model specifications in a single and multipollutant context, the authors applied various regression models developed in previous multicity time-series studies of air pollution and mortality to data from Philadelphia, Pennsylvania (May 1992-September 1995). Single-pollutant analyses used daily cardiovascular mortality, fine particulate matter (particles with an aerodynamic diameter ≤2.5 µm; PM(2.5)), speciated PM(2.5), and gaseous pollutant data, while multipollutant analyses used source factors identified through principal component analysis. In single-pollutant analyses, risk estimates were relatively consistent across models for most PM(2.5) components and gaseous pollutants. However, risk estimates were inconsistent for ozone in all-year and warm-season analyses. Principal component analysis yielded factors with species associated with traffic, crustal material, residual oil, and coal. Risk estimates for these factors exhibited less sensitivity to alternative regression models compared with single-pollutant models. Factors associated with traffic and crustal material showed consistently positive associations in the warm season, while the coal combustion factor showed consistently positive associations in the cold season. Overall, mortality risk estimates examined using a source-oriented approach yielded more stable and precise risk estimates, compared with single-pollutant analyses.

  20. Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.

    PubMed

    Chatzis, Sotirios P; Andreou, Andreas S

    2015-11-01

    Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics, and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using publicly available benchmark data sets.

  1. Boosting structured additive quantile regression for longitudinal childhood obesity data.

    PubMed

    Fenske, Nora; Fahrmeir, Ludwig; Hothorn, Torsten; Rzehak, Peter; Höhle, Michael

    2013-07-25

    Childhood obesity and the investigation of its risk factors has become an important public health issue. Our work is based on and motivated by a German longitudinal study including 2,226 children with up to ten measurements on their body mass index (BMI) and risk factors from birth to the age of 10 years. We introduce boosting of structured additive quantile regression as a novel distribution-free approach for longitudinal quantile regression. The quantile-specific predictors of our model include conventional linear population effects, smooth nonlinear functional effects, varying-coefficient terms, and individual-specific effects, such as intercepts and slopes. Estimation is based on boosting, a computer intensive inference method for highly complex models. We propose a component-wise functional gradient descent boosting algorithm that allows for penalized estimation of the large variety of different effects, particularly leading to individual-specific effects shrunken toward zero. This concept allows us to flexibly estimate the nonlinear age curves of upper quantiles of the BMI distribution, both on population and on individual-specific level, adjusted for further risk factors and to detect age-varying effects of categorical risk factors. Our model approach can be regarded as the quantile regression analog of Gaussian additive mixed models (or structured additive mean regression models), and we compare both model classes with respect to our obesity data.
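
    As a rough stand-in (the study itself uses component-wise functional gradient boosting of structured additive predictors, closer to R's mboost), the following Python sketch shows boosted quantile regression with scikit-learn's quantile loss, estimating an upper conditional quantile of a BMI-like outcome as a nonlinear function of age on simulated data:

      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(9)
      age = rng.uniform(0, 10, size=2000)                            # age in years (toy)
      bmi = 13 + 1.5 * np.log1p(age) + rng.gamma(shape=2.0, scale=0.5 + 0.1 * age)

      # Gradient boosting with the quantile (pinball) loss targets the 90th percentile
      q90 = GradientBoostingRegressor(loss="quantile", alpha=0.9,
                                      n_estimators=300, max_depth=2, learning_rate=0.05)
      q90.fit(age.reshape(-1, 1), bmi)

      grid = np.linspace(0, 10, 6).reshape(-1, 1)
      print("estimated 90th BMI percentile by age:", np.round(q90.predict(grid), 2))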

  2. Comprehensive ripeness-index for prediction of ripening level in mangoes by multivariate modelling of ripening behaviour

    NASA Astrophysics Data System (ADS)

    Eyarkai Nambi, Vijayaram; Thangavel, Kuladaisamy; Manickavasagan, Annamalai; Shahir, Sultan

    2017-01-01

    Prediction of the ripeness level of climacteric fruits is essential for post-harvest handling. An index capable of predicting ripening level from a minimum of inputs would be highly beneficial to handlers, processors and researchers in the fruit industry. A study was conducted with Indian mango cultivars to develop a ripeness index and associated model. Changes in physicochemical, colour and textural properties were measured throughout the ripening period, and the period was classified into five stages (unripe, early ripe, partially ripe, ripe and over ripe). Multivariate regression techniques such as partial least squares regression, principal component regression and multiple linear regression were compared and evaluated for their predictive ability. A multiple linear regression model with 12 parameters was found most suitable for ripening prediction. A scientific variable reduction method was adopted to simplify the developed model. Better prediction was achieved with either 2 or 3 variables (total soluble solids, colour and acidity). Cross-validation was carried out to increase robustness, and it was found that the proposed ripening index was more effective in predicting the ripening stages. The three-variable model would be suitable for commercial applications where reasonable accuracies are sufficient. However, the 12-variable model can be used to obtain more precise results in research and development applications.

  3. Linking the Components of a University Program to the Qualification Profile of Graduates: The Case of a Sustainability-Oriented Environmental Science Curriculum

    ERIC Educational Resources Information Center

    Hansmann, Ralf

    2009-01-01

    A university Environmental Sciences curriculum is described against the background of requirements for environmental problem solving for sustainability and then analyzed using data from regular surveys of graduates (N = 373). Three types of multiple regression models examine links between qualifications and curriculum components in order to derive…

  4. Case study on prediction of remaining methane potential of landfilled municipal solid waste by statistical analysis of waste composition data.

    PubMed

    Sel, İlker; Çakmakcı, Mehmet; Özkaya, Bestamin; Suphi Altan, H

    2016-10-01

    The main objective of this study was to develop a statistical model for easier and faster prediction of the Biochemical Methane Potential (BMP) of landfilled municipal solid waste, by analyzing the waste composition of samples excavated from 12 sampling points and three waste depths, representing different landfilling ages of the closed and active sections of a sanitary landfill site located in İstanbul, Turkey. Results of Principal Component Analysis (PCA) were used as a decision support tool to evaluate and describe the waste composition variables. Four principal components were extracted, describing 76% of the data set variance. The most effective components were determined as PCB, PO, T, D, W, FM, moisture and BMP for the data set. Multiple Linear Regression (MLR) models were built with both the original compositional data and transformed data to determine differences. It was observed that, even though the residual plots were better for the transformed data, the R(2) and adjusted R(2) values were not improved significantly. The best preliminary BMP prediction models consisted of the D, W, T and FM waste fractions for both versions of the regressions. Adjusted R(2) values of the raw and transformed models were determined as 0.69 and 0.57, respectively. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.

    PubMed

    Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W

    1992-03-01

    The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy of yield estimation and the selection of appropriate germ plasm. The sum of squares (SS) of the GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by the Additive Main effects and Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment recommended AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.

  6. Obscure phenomena in statistical analysis of quantitative structure-activity relationships. Part 1: Multicollinearity of physicochemical descriptors.

    PubMed

    Mager, P P; Rothe, H

    1990-10-01

    Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics for the regression coefficients of the ordinary least-squares (OLS) model usually applied to QSARs. Besides the diagnosis of simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. Only if the absolute values of the PCRA estimators are order statistics that decrease monotonically can the effects of multicollinearity be circumvented. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive power of a QSAR model.

  7. Prediction of Emergency Department Hospital Admission Based on Natural Language Processing and Neural Networks.

    PubMed

    Zhang, Xingyu; Kim, Joyce; Patzer, Rachel E; Pitts, Stephen R; Patzer, Aaron; Schrager, Justin D

    2017-10-26

    To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage, with and without the addition of natural language processing elements. Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from the 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN), which included natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross-validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model. Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason-for-visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated an AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free-text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN. The predictive accuracy of hospital admission or transfer for patients who presented to ED triage was good overall, and it improved with the inclusion of free-text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
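
    A toy Python sketch of the combined structured-plus-free-text idea, using TF-IDF with truncated SVD as the NLP/principal-components step and a logistic regression classifier; the visit reasons, ages and outcomes are invented examples, not NHAMCS data:

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.decomposition import TruncatedSVD
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import roc_auc_score

      rng = np.random.default_rng(10)
      reasons = ["chest pain and shortness of breath", "ankle sprain after fall",
                 "severe abdominal pain with vomiting", "medication refill",
                 "head injury with loss of consciousness", "sore throat and cough"] * 50
      admitted = np.array([1, 0, 1, 0, 1, 0] * 50)
      age = np.where(admitted == 1, 68, 35) + rng.normal(0, 8, size=admitted.size)
      acuity = np.where(admitted == 1, 2, 4)                       # triage acuity (lower = sicker)

      tfidf = TfidfVectorizer().fit_transform(reasons)             # free-text "reason for visit"
      text_pcs = TruncatedSVD(n_components=5, random_state=0).fit_transform(tfidf)
      X = np.column_stack([age, acuity, text_pcs])                 # structured + text components

      clf = LogisticRegression(max_iter=1000).fit(X, admitted)
      score = roc_auc_score(admitted, clf.predict_proba(X)[:, 1])
      print("in-sample AUC:", round(score, 3))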

  8. Gene set analysis using variance component tests.

    PubMed

    Huang, Yen-Tsung; Lin, Xihong

    2013-06-28

    Gene set analyses have become increasingly important in genomic research, as many complex diseases are contributed jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network and are highly correlated. However, most of the existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to tackle this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects by assuming a common distribution for regression coefficients in multivariate linear regression models, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that type I error is protected under different choices of working covariance matrices and power is improved as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulation data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analyses method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and global test in both simulation and a diabetes microarray data.

  9. Modeling and managing risk early in software development

    NASA Technical Reports Server (NTRS)

    Briand, Lionel C.; Thomas, William M.; Hetmanski, Christopher J.

    1993-01-01

    In order to improve the quality of the software development process, we need to be able to build empirical multivariate models based on data collectable early in the software process. These models need to be both useful for prediction and easy to interpret, so that remedial actions may be taken in order to control and optimize the development process. We present an automated modeling technique which can be used as an alternative to regression techniques. We show how it can be used to facilitate the identification and aid the interpretation of the significant trends which characterize 'high risk' components in several Ada systems. Finally, we evaluate the effectiveness of our technique based on a comparison with logistic regression based models.

  10. Analytical framework for reconstructing heterogeneous environmental variables from mammal community structure.

    PubMed

    Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C

    2015-01-01

    We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test

    NASA Technical Reports Server (NTRS)

    Messer, Bradley

    2007-01-01

    Propulsion ground test facilities face the daily challenge of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Over the last decade, NASA's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and exceeded the capabilities of numerous test facility and test article components. A logistic regression mathematical modeling technique has been developed to predict the probability of successfully completing a rocket propulsion test. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X1, X2, ..., Xk to a binary or dichotomous dependent variable Y, where Y can take only one of two possible outcomes, in this case success or failure of accomplishing a full-duration test. The use of logistic regression modeling is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from this type of model provide project managers with insight into and confidence in the effectiveness of rocket propulsion ground testing.

  12. Weighted functional linear regression models for gene-based association analysis.

    PubMed

    Belonogova, Nadezhda M; Svishcheva, Gulnara R; Wilson, James F; Campbell, Harry; Axenovich, Tatiana I

    2018-01-01

    Functional linear regression models are effectively used in gene-based association analysis of complex traits. These models combine information about individual genetic variants, taking into account their positions and reducing the influence of noise and/or observation errors. To increase the power of methods, where several differently informative components are combined, weights are introduced to give the advantage to more informative components. Allele-specific weights have been introduced to collapsing and kernel-based approaches to gene-based association analysis. Here we have for the first time introduced weights to functional linear regression models adapted for both independent and family samples. Using data simulated on the basis of GAW17 genotypes and weights defined by allele frequencies via the beta distribution, we demonstrated that type I errors correspond to declared values and that increasing the weights of causal variants allows the power of functional linear models to be increased. We applied the new method to real data on blood pressure from the ORCADES sample. Five of the six known genes with P < 0.1 in at least one analysis had lower P values with weighted models. Moreover, we found an association between diastolic blood pressure and the VMP1 gene (P = 8.18×10-6), when we used a weighted functional model. For this gene, the unweighted functional and weighted kernel-based models had P = 0.004 and 0.006, respectively. The new method has been implemented in the program package FREGAT, which is freely available at https://cran.r-project.org/web/packages/FREGAT/index.html.

  13. Specific gravity of bovine colostrum immunoglobulins as affected by temperature and colostrum components.

    PubMed

    Mechor, G D; Gröhn, Y T; McDowell, L R; Van Saun, R J

    1992-11-01

    The effects of temperature and colostrum components on specific gravity in bovine colostrum were investigated. Thirty-nine first milking colostrum samples were collected from Holstein cows. The samples were assayed for alpha-tocopherol, fat, protein, total solids, and IgG. The concentrations of total solids, total protein, total IgG, and fat in colostrum were 26.6, 12.5, 3.7, and 9.4 g/100 g, respectively. A range of 1.8 to 24.7 micrograms/ml for alpha-tocopherol was measured in the colostrum samples. Specific gravity of the colostrum was measured using a hydrometer in increments of 5 degrees C from 0 to 40 degrees C. Specific gravity explained 76% of the variation in colostral total IgG at a colostrum temperature of 20 degrees C. The regression model was improved only slightly with the addition of protein, fat, and total solids. The model for samples at 20 degrees C was IgG (milligrams per milliliter) = 958 x (specific gravity) - 969. Measurement of specific gravity at variable temperatures necessitated inclusion of temperature in the model for estimation of IgG. Inclusion of the other components of colostrum into the model slightly improved the fit. The regression model for samples at variable temperatures was as follows: IgG (milligrams per milliliter) = 853 x (specific gravity) + .4 x temperature (Celsius degrees) - 866.

  14. Habitat suitability index model for brook trout in streams of the Southern Blue Ridge Province: surrogate variables, model evaluation, and suggested improvements

    Treesearch

    Christoper J. Schmitt; A. Dennis Lemly; Parley V. Winger

    1993-01-01

    Data from several sources were collated and analyzed by correlation, regression, and principal components analysis to define surrogate variables for use in the brook trout (Salvelinus fontinalis) habitat suitability index (HSI) model, and to evaluate the applicability of the model for assessing habitat in high-elevation streams of the southern Blue Ridge Province (...

  15. Household Food Waste: Multivariate Regression and Principal Components Analyses of Awareness and Attitudes among U.S. Consumers

    PubMed Central

    2016-01-01

    We estimate models of consumer food waste awareness and attitudes using responses from a national survey of U.S. residents. Our models are interpreted through the lens of several theories that describe how pro-social behaviors relate to awareness, attitudes and opinions. Our analysis of patterns among respondents’ food waste attitudes yields a model with three principal components: one that represents perceived practical benefits households may lose if food waste were reduced, one that represents the guilt associated with food waste, and one that represents whether households feel they could be doing more to reduce food waste. We find our respondents express significant agreement that some perceived practical benefits are ascribed to throwing away uneaten food, e.g., nearly 70% of respondents agree that throwing away food after the package date has passed reduces the odds of foodborne illness, while nearly 60% agree that some food waste is necessary to ensure meals taste fresh. We identify that these attitudinal responses significantly load onto a single principal component that may represent a key attitudinal construct useful for policy guidance. Further, multivariate regression analysis reveals a significant positive association between the strength of this component and household income, suggesting that higher income households most strongly agree with statements that link throwing away uneaten food to perceived private benefits. PMID:27441687

  16. Application and interpretation of functional data analysis techniques to differential scanning calorimetry data from lupus patients.

    PubMed

    Kendrick, Sarah K; Zheng, Qi; Garbett, Nichola C; Brock, Guy N

    2017-01-01

    DSC is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are considered functional data. In this paper we apply functional data analysis techniques to analyze differential scanning calorimetry (DSC) data from individuals from the Lupus Family Registry and Repository (LFRR). The aim was to assess the effect of lupus disease status as well as additional covariates on the thermogram profiles, and use FD analysis methods to create models for classifying lupus vs. control patients on the basis of the thermogram curves. Thermograms were collected for 300 lupus patients and 300 controls without lupus who were matched with diseased individuals based on sex, race, and age. First, functional regression with a functional response (DSC) and categorical predictor (disease status) was used to determine how thermogram curve structure varied according to disease status and other covariates including sex, race, and year of birth. Next, functional logistic regression with disease status as the response and functional principal component analysis (FPCA) scores as the predictors was used to model the effect of thermogram structure on disease status prediction. The prediction accuracy for patients with Osteoarthritis and Rheumatoid Arthritis but without Lupus was also calculated to determine the ability of the classifier to differentiate between Lupus and other diseases. Data were divided 1000 times into separate 2/3 training and 1/3 test data for evaluation of predictions. Finally, derivatives of thermogram curves were included in the models to determine whether they aided in prediction of disease status. Functional regression with thermogram as a functional response and disease status as predictor showed a clear separation in thermogram curve structure between cases and controls. The logistic regression model with FPCA scores as the predictors gave the most accurate results with a mean 79.22% correct classification rate with a mean sensitivity = 79.70%, and specificity = 81.48%. The model correctly classified OA and RA patients without Lupus as controls at a rate of 75.92% on average with a mean sensitivity = 79.70% and specificity = 77.6%. Regression models including FPCA scores for derivative curves did not perform as well, nor did regression models including covariates. Changes in thermograms observed in the disease state likely reflect covalent modifications of plasma proteins or changes in large protein-protein interacting networks resulting in the stabilization of plasma proteins towards thermal denaturation. By relating functional principal components from thermograms to disease status, our Functional Principal Component Analysis model provides results that are more easily interpretable compared to prior studies. Further, the model could also potentially be coupled with other biomarkers to improve diagnostic classification for lupus.
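
    The classification step described above (functional PCA scores feeding a logistic regression) can be approximated for densely sampled curves by ordinary PCA on the discretized thermograms. The sketch below is a minimal, non-authoritative illustration of that pipeline with scikit-learn; the arrays, the number of retained components, and the split are assumptions, not values from the study.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    # thermograms: rows are subjects, columns are excess heat capacity values on a
    # common temperature grid; labels: 1 = lupus, 0 = control. Simulated stand-ins.
    rng = np.random.default_rng(0)
    thermograms = rng.normal(size=(600, 200))
    labels = rng.integers(0, 2, size=600)

    # For curves observed on a dense common grid, PCA scores play the role of
    # FPCA scores; the logistic model then predicts disease status from them.
    clf = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))

    X_train, X_test, y_train, y_test = train_test_split(
        thermograms, labels, test_size=1 / 3, random_state=0)
    clf.fit(X_train, y_train)
    print("test accuracy:", clf.score(X_test, y_test))
    ```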

  17. Development and Validation of the Work-Related Well-Being Index: Analysis of the Federal Employee Viewpoint Survey.

    PubMed

    Eaton, Jennifer L; Mohr, David C; Hodgson, Michael J; McPhaul, Kathleen M

    2018-02-01

    To describe development and validation of the work-related well-being (WRWB) index. Principal components analysis was performed using Federal Employee Viewpoint Survey (FEVS) data (N = 392,752) to extract variables representing worker well-being constructs. Confirmatory factor analysis was performed to verify factor structure. To validate the WRWB index, we used multiple regression analysis to examine relationships with burnout-associated outcomes. Principal components analysis identified three positive psychology constructs: "Work Positivity", "Co-worker Relationships", and "Work Mastery". An 11-item index explaining 63.5% of the variance was achieved. The structural equation model provided a very good fit to the data. Higher WRWB scores were positively associated with all three employee experience measures examined in regression models. The new WRWB index shows promise as a valid and widely accessible instrument to assess worker well-being.

  18. Development of a hybrid proximal sensing method for rapid identification of petroleum contaminated soils.

    PubMed

    Chakraborty, Somsubhra; Weindorf, David C; Li, Bin; Ali Aldabaa, Abdalsamad Abdalsatar; Ghosh, Rakesh Kumar; Paul, Sathi; Nasim Ali, Md

    2015-05-01

    Using 108 petroleum-contaminated soil samples, this pilot study proposed a new analytical approach combining visible near-infrared diffuse reflectance spectroscopy (VisNIR DRS) and portable X-ray fluorescence spectrometry (PXRF) for rapid and improved quantification of soil petroleum contamination. Results indicated that an advanced fused model, in which VisNIR DRS spectra-based penalized spline regression (PSR) was used to predict total petroleum hydrocarbon and PXRF elemental data-based random forest regression was then used to model the PSR residuals, outperformed (R² = 0.78, residual prediction deviation (RPD) = 2.19) all other models tested, even producing better generalization than using VisNIR DRS alone (RPDs of 1.64, 1.86, and 1.96 for random forest, penalized spline regression, and partial least squares regression, respectively). Additionally, unsupervised principal component analysis using the PXRF+VisNIR DRS system qualitatively separated contaminated soils from control samples. Fusion of PXRF elemental data and VisNIR derivative spectra produced an optimized model for total petroleum hydrocarbon quantification in soils. Copyright © 2015 Elsevier B.V. All rights reserved.
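
    The fusion idea, fitting a spectra-based model first and then modeling its residuals with a random forest on the PXRF elements, can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: ridge regression substitutes for penalized spline regression, and all arrays are simulated placeholders.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(1)
    n = 108
    visnir = rng.normal(size=(n, 300))   # VisNIR DRS spectra (placeholder)
    pxrf = rng.normal(size=(n, 10))      # PXRF elemental concentrations (placeholder)
    tph = rng.normal(size=n)             # total petroleum hydrocarbon (placeholder)

    # Stage 1: spectra-based model (ridge regression standing in for penalized splines).
    stage1 = Ridge(alpha=1.0).fit(visnir, tph)
    residuals = tph - stage1.predict(visnir)

    # Stage 2: random forest on PXRF elements models what the spectra missed.
    stage2 = RandomForestRegressor(n_estimators=500, random_state=0).fit(pxrf, residuals)

    # Fused prediction = stage-1 prediction + stage-2 residual correction.
    fused = stage1.predict(visnir) + stage2.predict(pxrf)
    print("fused prediction for first sample:", fused[0])
    ```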

  19. Exposure assessment models for elemental components of particulate matter in an urban environment: A comparison of regression and random forest approaches

    NASA Astrophysics Data System (ADS)

    Brokamp, Cole; Jandarov, Roman; Rao, M. B.; LeMasters, Grace; Ryan, Patrick

    2017-02-01

    Exposure assessment for elemental components of particulate matter (PM) using land use modeling is a complex problem due to the high spatial and temporal variations in pollutant concentrations at the local scale. Land use regression (LUR) models may fail to capture complex interactions and non-linear relationships between pollutant concentrations and land use variables. The increasing availability of big spatial data and machine learning methods presents an opportunity for improvement in PM exposure assessment models. In this manuscript, our objective was to develop a novel land use random forest (LURF) model and compare its accuracy and precision to a LUR model for elemental components of PM in the urban city of Cincinnati, Ohio. PM smaller than 2.5 μm (PM2.5) and eleven elemental components were measured at 24 sampling stations from the Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS). Over 50 different predictors associated with transportation, physical features, community socioeconomic characteristics, greenspace, land cover, and emission point sources were used to construct LUR and LURF models. Cross validation was used to quantify and compare model performance. LURF and LUR models were created for aluminum (Al), copper (Cu), iron (Fe), potassium (K), manganese (Mn), nickel (Ni), lead (Pb), sulfur (S), silicon (Si), vanadium (V), zinc (Zn), and total PM2.5 in the CCAAPS study area. LURF models utilized a more diverse and greater number of predictors than LUR models, and LURF models for Al, K, Mn, Pb, Si, Zn, TRAP, and PM2.5 all showed a decrease in fractional predictive error of at least 5% compared with their LUR counterparts. LURF models for Al, Cu, Fe, K, Mn, Pb, Si, Zn, TRAP, and PM2.5 all had a cross-validated fractional predictive error less than 30%. Furthermore, LUR models showed a differential exposure assessment bias and had a higher prediction error variance. Random forest and other machine learning methods may provide more accurate exposure assessment.
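
    A minimal, hedged sketch of the LUR-versus-LURF comparison: the same land use predictors feed a linear model and a random forest, and leave-one-out cross-validation (a reasonable choice with only 24 monitoring sites) quantifies predictive error. The variable names and data below are placeholders, not the CCAAPS data.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import LeaveOneOut, cross_val_predict

    rng = np.random.default_rng(2)
    land_use = rng.normal(size=(24, 12))   # placeholder land use predictors at 24 sites
    element = rng.lognormal(size=24)       # placeholder measured elemental concentration

    models = {
        "LUR (linear regression)": LinearRegression(),
        "LURF (random forest)": RandomForestRegressor(n_estimators=500, random_state=0),
    }
    for name, model in models.items():
        pred = cross_val_predict(model, land_use, element, cv=LeaveOneOut())
        # Fractional predictive error, loosely analogous to the paper's criterion.
        frac_err = np.sqrt(np.mean((pred - element) ** 2)) / np.mean(element)
        print(f"{name}: fractional predictive error = {frac_err:.2f}")
    ```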

  20. Exposure assessment models for elemental components of particulate matter in an urban environment: A comparison of regression and random forest approaches.

    PubMed

    Brokamp, Cole; Jandarov, Roman; Rao, M B; LeMasters, Grace; Ryan, Patrick

    2017-02-01

    Exposure assessment for elemental components of particulate matter (PM) using land use modeling is a complex problem due to the high spatial and temporal variations in pollutant concentrations at the local scale. Land use regression (LUR) models may fail to capture complex interactions and non-linear relationships between pollutant concentrations and land use variables. The increasing availability of big spatial data and machine learning methods presents an opportunity for improvement in PM exposure assessment models. In this manuscript, our objective was to develop a novel land use random forest (LURF) model and compare its accuracy and precision to a LUR model for elemental components of PM in the urban city of Cincinnati, Ohio. PM smaller than 2.5 μm (PM2.5) and eleven elemental components were measured at 24 sampling stations from the Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS). Over 50 different predictors associated with transportation, physical features, community socioeconomic characteristics, greenspace, land cover, and emission point sources were used to construct LUR and LURF models. Cross validation was used to quantify and compare model performance. LURF and LUR models were created for aluminum (Al), copper (Cu), iron (Fe), potassium (K), manganese (Mn), nickel (Ni), lead (Pb), sulfur (S), silicon (Si), vanadium (V), zinc (Zn), and total PM2.5 in the CCAAPS study area. LURF models utilized a more diverse and greater number of predictors than LUR models, and LURF models for Al, K, Mn, Pb, Si, Zn, TRAP, and PM2.5 all showed a decrease in fractional predictive error of at least 5% compared with their LUR counterparts. LURF models for Al, Cu, Fe, K, Mn, Pb, Si, Zn, TRAP, and PM2.5 all had a cross-validated fractional predictive error less than 30%. Furthermore, LUR models showed a differential exposure assessment bias and had a higher prediction error variance. Random forest and other machine learning methods may provide more accurate exposure assessment.

  1. Exposure assessment models for elemental components of particulate matter in an urban environment: A comparison of regression and random forest approaches

    PubMed Central

    Brokamp, Cole; Jandarov, Roman; Rao, M.B.; LeMasters, Grace; Ryan, Patrick

    2017-01-01

    Exposure assessment for elemental components of particulate matter (PM) using land use modeling is a complex problem due to the high spatial and temporal variations in pollutant concentrations at the local scale. Land use regression (LUR) models may fail to capture complex interactions and non-linear relationships between pollutant concentrations and land use variables. The increasing availability of big spatial data and machine learning methods presents an opportunity for improvement in PM exposure assessment models. In this manuscript, our objective was to develop a novel land use random forest (LURF) model and compare its accuracy and precision to a LUR model for elemental components of PM in the urban city of Cincinnati, Ohio. PM smaller than 2.5 μm (PM2.5) and eleven elemental components were measured at 24 sampling stations from the Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS). Over 50 different predictors associated with transportation, physical features, community socioeconomic characteristics, greenspace, land cover, and emission point sources were used to construct LUR and LURF models. Cross validation was used to quantify and compare model performance. LURF and LUR models were created for aluminum (Al), copper (Cu), iron (Fe), potassium (K), manganese (Mn), nickel (Ni), lead (Pb), sulfur (S), silicon (Si), vanadium (V), zinc (Zn), and total PM2.5 in the CCAAPS study area. LURF models utilized a more diverse and greater number of predictors than LUR models, and LURF models for Al, K, Mn, Pb, Si, Zn, TRAP, and PM2.5 all showed a decrease in fractional predictive error of at least 5% compared with their LUR counterparts. LURF models for Al, Cu, Fe, K, Mn, Pb, Si, Zn, TRAP, and PM2.5 all had a cross-validated fractional predictive error less than 30%. Furthermore, LUR models showed a differential exposure assessment bias and had a higher prediction error variance. Random forest and other machine learning methods may provide more accurate exposure assessment. PMID:28959135

  2. A Developmental Model of Cross-Cultural Competence at the Tactical Level

    DTIC Science & Technology

    2010-11-01

    components of 3C and describe how 3C develops in Soldiers. Five components of 3C were identified: Cultural Maturity, Cognitive Flexibility, Cultural...a result of the data analysis: Cultural Maturity, Cognitive Flexibility, Cultural Knowledge, Cultural Acuity, and Interpersonal Skills. These five...create regressions in the 3C development process. In short, KSAAs mature interdependently and simultaneously. Thus, development and transitions across

  3. Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle.

    PubMed

    Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan

    2016-12-01

    The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran.
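
    Constructing the fourth-order Legendre covariables used in such test-day models is straightforward; fitting the full animal model with additive genetic and permanent environmental random regressions requires specialized mixed-model software, so the sketch below only builds the design columns on a standardized days-in-milk scale. The DIM range used here is an assumption for illustration, not taken from the study.

    ```python
    import numpy as np
    from numpy.polynomial import legendre

    def legendre_covariables(dim, dim_min=5, dim_max=305, order=4):
        """Evaluate Legendre polynomials 0..order at days in milk standardized to [-1, 1]."""
        x = 2.0 * (np.asarray(dim, dtype=float) - dim_min) / (dim_max - dim_min) - 1.0
        # legvander returns one column per polynomial degree, evaluated at each x.
        return legendre.legvander(x, order)

    # Design columns for a few test days of one cow; these feed the fixed and
    # random regression parts of the test-day mixed model.
    print(legendre_covariables([5, 60, 150, 305]))
    ```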

  4. Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle

    PubMed Central

    Naserkheil, Masoumeh; Miraie-Ashtiani, Seyed Reza; Nejati-Javaremi, Ardeshir; Son, Jihyun; Lee, Deukhwan

    2016-01-01

    The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran. PMID:26954192

  5. [Correlation coefficient-based classification method of hydrological dependence variability: With auto-regression model as example].

    PubMed

    Zhao, Yu Xi; Xie, Ping; Sang, Yan Fang; Wu, Zi Yi

    2018-04-01

    Hydrological process evaluation is temporally dependent. Hydrological time series that include dependence components do not meet the data consistency assumption for hydrological computation. Both of these factors cause great difficulty for water research. Given the existence of hydrological dependence variability, we proposed a correlation coefficient-based method for significance evaluation of hydrological dependence based on an auto-regression model. By calculating the correlation coefficient between the original series and its dependence component and selecting reasonable thresholds of the correlation coefficient, this method divides the significance degree of dependence into no variability, weak variability, moderate variability, strong variability, and drastic variability. By deducing the relationship between the correlation coefficient and the auto-correlation coefficients of each order of the series, we found that the correlation coefficient was mainly determined by the magnitude of the auto-correlation coefficients from order 1 to order p, which clarifies the theoretical basis of the method. With the first-order and second-order auto-regression models as examples, the reasonableness of the deduced formula was verified through Monte Carlo experiments classifying the relationship between the correlation coefficient and the auto-correlation coefficient. The method was used to analyze three observed hydrological time series. The results indicated the coexistence of stochastic and dependence characteristics in hydrological processes.

  6. Random regression analyses using B-splines functions to model growth from birth to adult age in Canchim cattle.

    PubMed

    Baldi, F; Alencar, M M; Albuquerque, L G

    2010-12-01

    The objective of this work was to estimate covariance functions using random regression models on B-spline functions of animal age, for weights from birth to adult age in Canchim cattle. Data comprised 49,011 records on 2435 females. The model of analysis included fixed effects of contemporary groups, age of dam as a quadratic covariable and the population mean trend taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were modelled through a step function with four classes. The direct and maternal additive genetic effects, and animal and maternal permanent environmental effects were included as random effects in the model. A total of seventeen analyses, considering linear, quadratic and cubic B-spline functions and up to seven knots, were carried out. B-spline functions of the same order were considered for all random effects. Random regression models on B-spline functions were compared with a random regression model on Legendre polynomials and with a multitrait model. Results from different models of analyses were compared using the REML form of the Akaike information criterion and Schwarz's Bayesian information criterion. In addition, the variance components and genetic parameters estimated for each random regression model were also used as criteria to choose the most adequate model to describe the covariance structure of the data. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most adequate to describe the covariance structure of the data. Random regression models using B-spline functions as base functions fitted the data better than Legendre polynomials, especially at mature ages, but a higher number of parameters needs to be estimated with B-spline functions. © 2010 Blackwell Verlag GmbH.

  7. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy

    NASA Astrophysics Data System (ADS)

    Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.

    2015-05-01

    The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other type of LIBS data is calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py), and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
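
    A compressed sketch of the kind of model comparison described above, cross-validating several linear regression methods on spectra to predict an oxide abundance. The models, preprocessing, and data here are placeholders chosen for brevity rather than the authors' exact protocol.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.linear_model import Lasso, ElasticNet
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVR

    rng = np.random.default_rng(3)
    spectra = rng.normal(size=(100, 6144))   # 100 samples x 6144 channels (placeholder)
    sio2 = rng.uniform(40, 75, size=100)     # placeholder SiO2 wt.% values

    models = {
        "PLS-1": PLSRegression(n_components=10),
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1, max_iter=10000)),
        "elastic net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.1, max_iter=10000)),
        "SVR-Lin": make_pipeline(StandardScaler(), LinearSVR(max_iter=10000)),
    }
    for name, model in models.items():
        rmse = -cross_val_score(model, spectra, sio2, cv=5,
                                scoring="neg_root_mean_squared_error").mean()
        print(f"{name}: cross-validated RMSE = {rmse:.2f}")
    ```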

  8. Improving the long-lead predictability of El Niño using a novel forecasting scheme based on a dynamic components model

    NASA Astrophysics Data System (ADS)

    Petrova, Desislava; Koopman, Siem Jan; Ballester, Joan; Rodó, Xavier

    2017-02-01

    El Niño (EN) is a dominant feature of climate variability on inter-annual time scales driving changes in the climate throughout the globe, and having widespread natural and socio-economic consequences. Its forecast is therefore an important task, and predictions are issued on a regular basis by a wide array of prediction schemes and climate centres around the world. This study explores a novel method for EN forecasting. The advantageous statistical technique of unobserved components time series modeling, also known as structural time series modeling, has not previously been applied to this problem. Therefore, we have developed such a model in which the statistical analysis, including parameter estimation and forecasting, is based on state space methods and includes the celebrated Kalman filter. The distinguishing feature of this dynamic model is the decomposition of a time series into a range of stochastically time-varying components such as level (or trend), seasonal, cycles of different frequencies, irregular, and regression effects incorporated as explanatory covariates. These components are modeled separately and ultimately combined in a single forecasting scheme. Customary statistical models for EN prediction essentially use SST and wind stress in the equatorial Pacific. In addition to these, we introduce a new domain of regression variables accounting for the state of the subsurface ocean temperature in the western and central equatorial Pacific, motivated by our analysis, as well as by recent and classical research, showing that subsurface processes and heat accumulation there are fundamental for the genesis of EN. An important feature of the scheme is that different regression predictors are used at different lead months, thus capturing the dynamical evolution of the system and yielding more efficient forecasts. The new model has been tested with the prediction of all warm events that occurred in the period 1996-2015. Retrospective forecasts of these events were made for long lead times of at least two and a half years. Hence, the present study demonstrates that the theoretical limit of ENSO prediction should be sought at lead times much longer than the commonly accepted "Spring Barrier". The high correspondence between the forecasts and observations indicates that the proposed model outperforms all current operational statistical models, and behaves comparably to the best dynamical models used for EN prediction. Thus, the novel way in which the modeling scheme has been structured could also be used for improving other statistical and dynamical modeling systems.
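
    For readers unfamiliar with unobserved components (structural) time series models, the sketch below shows how a model with a trend, a stochastic cycle, and exogenous regressors can be set up in statsmodels and used to forecast. The series and regressors are synthetic placeholders, not the SST, wind stress, or subsurface temperature predictors used in the study.

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 240  # e.g., 20 years of monthly values (placeholder series)
    exog = rng.normal(size=(n, 2))  # placeholder regression covariates
    y = 0.5 * exog[:, 0] + np.sin(np.arange(n) / 12.0) + rng.normal(scale=0.3, size=n)

    # Unobserved components model: local linear trend + stochastic cycle + regression effects.
    model = sm.tsa.UnobservedComponents(
        y, level="local linear trend", cycle=True, stochastic_cycle=True, exog=exog)
    result = model.fit(disp=False)

    # Forecasting requires future values of the exogenous covariates.
    future_exog = rng.normal(size=(12, 2))
    print(result.forecast(steps=12, exog=future_exog))
    ```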

  9. Fast determination of total ginsenosides content in ginseng powder by near infrared reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Chen, Hua-cai; Chen, Xing-dan; Lu, Yong-jun; Cao, Zhi-qiang

    2006-01-01

    Near infrared (NIR) reflectance spectroscopy was used to develop a fast determination method for total ginsenosides in ginseng (Panax ginseng) powder. The spectra were analyzed with the multiplicative signal correction (MSC) correlation method. The spectral regions best correlated with total ginsenosides content were 1660-1880 nm and 2230-2380 nm. NIR calibration models for ginsenosides were built with multiple linear regression (MLR), principal component regression (PCR), and partial least squares (PLS) regression, respectively. The results showed that the calibration model built with PLS, combined with MSC and the optimal spectral region, was the best one. The correlation coefficient and the root mean square error of calibration (RMSEC) of the best calibration model were 0.98 and 0.15%, respectively. The optimal spectral region for calibration was 1204-2014 nm. The results suggested that using NIR to rapidly determine the total ginsenosides content in ginseng powder is feasible.
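
    A minimal sketch of building a PLS calibration of the kind described, regressing ginsenoside content on NIR spectra after a simple multiplicative scatter correction. The arrays, component count, and MSC reference are illustrative assumptions rather than the authors' settings.

    ```python
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    def msc(spectra):
        """Multiplicative scatter correction against the mean spectrum."""
        reference = spectra.mean(axis=0)
        corrected = np.empty_like(spectra)
        for i, s in enumerate(spectra):
            slope, intercept = np.polyfit(reference, s, 1)  # fit each spectrum to the reference
            corrected[i] = (s - intercept) / slope
        return corrected

    rng = np.random.default_rng(5)
    spectra = rng.normal(size=(60, 400)) + 1.0   # placeholder NIR absorbance spectra
    ginsenosides = rng.uniform(2, 8, size=60)    # placeholder total ginsenoside content (%)

    X = msc(spectra)
    model = PLSRegression(n_components=6)
    r2 = cross_val_score(model, X, ginsenosides, cv=5, scoring="r2").mean()
    print(f"cross-validated R^2: {r2:.2f}")
    ```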

  10. Kendall-Theil Robust Line (KTRLine--version 1.0)-A Visual Basic Program for Calculating and Graphing Robust Nonparametric Estimates of Linear-Regression Coefficients Between Two Continuous Variables

    USGS Publications Warehouse

    Granato, Gregory E.

    2006-01-01

    The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
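
    The core slope and intercept computation that the program automates is small enough to show directly; scipy's theilslopes implements the same median-of-pairwise-slopes estimator. This is a generic illustration of the Kendall-Theil (Theil-Sen) line on simulated data, not the KTRLine program itself.

    ```python
    import numpy as np
    from scipy.stats import theilslopes

    rng = np.random.default_rng(6)
    x = rng.uniform(0, 10, size=50)
    y = 2.0 * x + 1.0 + rng.standard_cauchy(size=50) * 0.5   # heavy-tailed noise with outliers

    # Median of all pairwise slopes; intercept chosen so the line passes through the medians.
    slope, intercept, low_slope, high_slope = theilslopes(y, x, alpha=0.95)
    print(f"slope = {slope:.3f} (95% CI {low_slope:.3f} to {high_slope:.3f}), "
          f"intercept = {intercept:.3f}")

    # The same slope computed by brute force over all point pairs.
    i, j = np.triu_indices(len(x), k=1)
    pairwise = (y[j] - y[i]) / (x[j] - x[i])
    print("median pairwise slope:", np.median(pairwise))
    ```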

  11. “Smooth” Semiparametric Regression Analysis for Arbitrarily Censored Time-to-Event Data

    PubMed Central

    Zhang, Min; Davidian, Marie

    2008-01-01

    A general framework for regression analysis of time-to-event data subject to arbitrary patterns of censoring is proposed. The approach is relevant when the analyst is willing to assume that distributions governing model components that are ordinarily left unspecified in popular semiparametric regression models, such as the baseline hazard function in the proportional hazards model, have densities satisfying mild “smoothness” conditions. Densities are approximated by a truncated series expansion that, for fixed degree of truncation, results in a “parametric” representation, which makes likelihood-based inference coupled with adaptive choice of the degree of truncation, and hence flexibility of the model, computationally and conceptually straightforward with data subject to any pattern of censoring. The formulation allows popular models, such as the proportional hazards, proportional odds, and accelerated failure time models, to be placed in a common framework; provides a principled basis for choosing among them; and renders useful extensions of the models straightforward. The utility and performance of the methods are demonstrated via simulations and by application to data from time-to-event studies. PMID:17970813

  12. Cross-validation pitfalls when selecting and assessing regression and classification models.

    PubMed

    Krstajic, Damjan; Buturovic, Ljubomir J; Leahy, David E; Thomas, Simon

    2014-03-29

    We address the problem of selecting and assessing classification and regression models using cross-validation. Current state-of-the-art methods can yield models with high variance, rendering them unsuitable for a number of practical applications including QSAR. In this paper we describe and evaluate best practices which improve reliability and increase confidence in selected models. A key operational component of the proposed methods is cloud computing which enables routine use of previously infeasible approaches. We describe in detail an algorithm for repeated grid-search V-fold cross-validation for parameter tuning in classification and regression, and we define a repeated nested cross-validation algorithm for model assessment. As regards variable selection and parameter tuning we define two algorithms (repeated grid-search cross-validation and double cross-validation), and provide arguments for using the repeated grid-search in the general case. We show results of our algorithms on seven QSAR datasets. The variation of the prediction performance, which is the result of choosing different splits of the dataset in V-fold cross-validation, needs to be taken into account when selecting and assessing classification and regression models. We demonstrate the importance of repeating cross-validation when selecting an optimal model, as well as the importance of repeating nested cross-validation when assessing a prediction error.
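
    A compact sketch of the repeated nested cross-validation idea: an inner grid-search loop tunes parameters, an outer loop estimates prediction error, and the whole procedure is repeated over different random splits. The estimator, grid, and data below are placeholders, not the QSAR setups from the paper.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

    rng = np.random.default_rng(7)
    X = rng.normal(size=(200, 30))
    y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

    param_grid = {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]}
    outer_scores = []
    for repeat in range(10):                      # repeat nested CV over different splits
        inner_cv = KFold(n_splits=5, shuffle=True, random_state=repeat)
        outer_cv = KFold(n_splits=5, shuffle=True, random_state=100 + repeat)
        tuned = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=inner_cv)
        scores = cross_val_score(tuned, X, y, cv=outer_cv, scoring="r2")
        outer_scores.append(scores.mean())

    print("mean R^2 over repeats:", np.mean(outer_scores),
          "variation across repeats:", np.std(outer_scores))
    ```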

  13. Research on Fault Rate Prediction Method of T/R Component

    NASA Astrophysics Data System (ADS)

    Hou, Xiaodong; Yang, Jiangping; Bi, Zengjun; Zhang, Yu

    2017-07-01

    The T/R component is an important part of a large phased-array radar antenna; because these components are numerous and have a high fault rate, fault prediction for them is of practical significance. To address the problems of the traditional grey model GM(1,1) in practical operation, this paper establishes a discrete grey model based on the original model, introduces an optimization factor to optimize the background value, and adds a linear form of the prediction model, yielding an improved discrete grey model with linear regression. Finally, an example is simulated and compared with other models. The results show that the proposed method has higher accuracy, a simple solution, and a wider scope of application.
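
    For readers unfamiliar with the baseline method, the sketch below implements the classical GM(1,1) grey model (cumulative sum, background values, least-squares estimation of the development coefficient and grey input, then prediction). It does not reproduce the paper's improved discrete model with an optimized background value, and the fault-rate series is a made-up placeholder.

    ```python
    import numpy as np

    def gm11_forecast(x0, steps_ahead=1):
        """Classical GM(1,1): fit to series x0 and forecast steps_ahead future values."""
        x0 = np.asarray(x0, dtype=float)
        x1 = np.cumsum(x0)                                   # accumulated generating operation
        z1 = 0.5 * (x1[1:] + x1[:-1])                        # background values
        B = np.column_stack([-z1, np.ones_like(z1)])
        a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]     # development coefficient, grey input

        k = np.arange(len(x0) + steps_ahead)
        x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a    # time response of the AGO series
        x0_hat = np.diff(x1_hat, prepend=0.0)                # inverse AGO; x0_hat[0] == x0[0]
        return x0_hat[len(x0):]

    # Hypothetical T/R module fault-rate observations (per thousand hours), not real data.
    fault_rate = [0.8, 0.9, 1.1, 1.2, 1.4, 1.6]
    print(gm11_forecast(fault_rate, steps_ahead=2))
    ```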

  14. Retrieve sea surface salinity using principal component regression model based on SMOS satellite data

    NASA Astrophysics Data System (ADS)

    Zhao, Hong; Li, Changjun; Li, Hongping; Lv, Kebo; Zhao, Qinghui

    2016-06-01

    Sea surface salinity (SSS) is a key parameter in monitoring ocean states, and observing SSS can promote understanding of the global water cycle. This paper provides a new approach for retrieving SSS from Soil Moisture and Ocean Salinity (SMOS) satellite data. Based on a principal component regression (PCR) model, SSS can be retrieved from the brightness temperature data of SMOS L2 measurements and auxiliary data. Twenty-six pairs of matchup data were used for model validation in the South China Sea (4°-25°N, 105°-125°E). The RMSE of the PCR-retrieved SSS is 0.37 psu (practical salinity units), compared with 1.65 psu for SMOS SSS1, when evaluated against in-situ SSS. The corresponding Argo daily salinity data from April to June 2013 were also used in the validation, giving an RMSE of 0.46 psu compared to 1.82 psu for daily averaged SMOS L2 products. This indicates that the PCR model is valid and may provide a good approach for retrieving SSS from SMOS satellite data.
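
    Principal component regression itself is a short pipeline: project the predictors (here, brightness temperatures plus auxiliary variables) onto their leading principal components and regress the target on those scores. The sketch below is generic scikit-learn code with placeholder arrays, not the SMOS processing chain.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(8)
    X = rng.normal(size=(26, 15))                 # placeholder brightness temperatures + auxiliary data
    sss = 33.0 + rng.normal(scale=0.5, size=26)   # placeholder in-situ sea surface salinity (psu)

    # PCR = standardize -> PCA -> ordinary least squares on the component scores.
    pcr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())
    pcr.fit(X, sss)

    rmse = mean_squared_error(sss, pcr.predict(X)) ** 0.5
    print(f"in-sample RMSE: {rmse:.2f} psu")
    ```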

  15. Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk

    PubMed Central

    Czarnota, Jenna; Gennings, Chris; Wheeler, David C

    2015-01-01

    In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case–control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome. PMID:26005323
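
    A rough, simplified illustration of the weighted quantile sum idea: each exposure is scored into quantiles, non-negative weights constrained to sum to one are estimated, and the resulting weighted index enters a regression on the outcome. Real WQS implementations estimate the weights over bootstrap samples with a held-out validation set; the sketch below collapses that into a single constrained fit on simulated data.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(9)
    n, p = 500, 8
    chems = rng.lognormal(size=(n, p))   # placeholder chemical exposure concentrations
    y = 0.6 * (chems[:, 0] > np.median(chems[:, 0])) + rng.normal(scale=1.0, size=n)

    # Score each chemical into quartiles (0-3).
    q = np.column_stack([np.digitize(chems[:, j],
                                     np.quantile(chems[:, j], [0.25, 0.5, 0.75]))
                         for j in range(p)])

    def loss(params):
        beta0, beta1 = params[:2]
        w = params[2:]
        index = q @ w                                  # weighted quantile sum index
        return np.mean((y - beta0 - beta1 * index) ** 2)

    start = np.concatenate([[0.0, 0.0], np.full(p, 1.0 / p)])
    constraints = [{"type": "eq", "fun": lambda params: params[2:].sum() - 1.0}]
    bounds = [(None, None), (None, None)] + [(0.0, 1.0)] * p   # non-negative weights
    fit = minimize(loss, start, bounds=bounds, constraints=constraints)
    print("estimated weights:", np.round(fit.x[2:], 3))
    ```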

  16. Assessment of weighted quantile sum regression for modeling chemical mixtures and cancer risk.

    PubMed

    Czarnota, Jenna; Gennings, Chris; Wheeler, David C

    2015-01-01

    In evaluation of cancer risk related to environmental chemical exposures, the effect of many chemicals on disease is ultimately of interest. However, because of potentially strong correlations among chemicals that occur together, traditional regression methods suffer from collinearity effects, including regression coefficient sign reversal and variance inflation. In addition, penalized regression methods designed to remediate collinearity may have limitations in selecting the truly bad actors among many correlated components. The recently proposed method of weighted quantile sum (WQS) regression attempts to overcome these problems by estimating a body burden index, which identifies important chemicals in a mixture of correlated environmental chemicals. Our focus was on assessing through simulation studies the accuracy of WQS regression in detecting subsets of chemicals associated with health outcomes (binary and continuous) in site-specific analyses and in non-site-specific analyses. We also evaluated the performance of the penalized regression methods of lasso, adaptive lasso, and elastic net in correctly classifying chemicals as bad actors or unrelated to the outcome. We based the simulation study on data from the National Cancer Institute Surveillance Epidemiology and End Results Program (NCI-SEER) case-control study of non-Hodgkin lymphoma (NHL) to achieve realistic exposure situations. Our results showed that WQS regression had good sensitivity and specificity across a variety of conditions considered in this study. The shrinkage methods had a tendency to incorrectly identify a large number of components, especially in the case of strong association with the outcome.

  17. Assessing Principal Component Regression Prediction of Neurochemicals Detected with Fast-Scan Cyclic Voltammetry

    PubMed Central

    2011-01-01

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586

  18. Assessing principal component regression prediction of neurochemicals detected with fast-scan cyclic voltammetry.

    PubMed

    Keithley, Richard B; Wightman, R Mark

    2011-06-07

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.

  19. System Testing of Ground Cooling System Components

    NASA Technical Reports Server (NTRS)

    Ensey, Tyler Steven

    2014-01-01

    This internship focused primarily upon software unit testing of Ground Cooling System (GCS) components, one of the three types of tests (unit, integrated, and COTS/regression) utilized in software verification. Unit tests are used to test the software of necessary components before it is implemented into the hardware. A unit test exercises the control data, usage procedures, and operating procedures of a particular component to determine whether the program is fit for use. Three different files are used to build and complete an efficient unit test. These files include the following: Model Test file (.mdl), Simulink SystemTest (.test), and autotest (.m). The Model Test file includes the component that is being tested with the appropriate Discrete Physical Interface (DPI) for testing. The Simulink SystemTest is a program used to test all of the requirements of the component. The autotest verifies that the component passes Model Advisor and System Testing, and puts the results into the proper files. Once unit testing is completed on the GCS components, they can be implemented into the GCS Schematic and the software of the GCS model as a whole can be tested using integrated testing. Unit testing is a critical part of software verification; it allows for the testing of more basic components before a model of higher fidelity is tested, making the process of testing flow in an orderly manner.

  20. Intermediate and advanced topics in multilevel logistic regression analysis

    PubMed Central

    Merlo, Juan

    2017-01-01

    Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher‐level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within‐cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population‐average effect of covariates measured at the subject and cluster level, in contrast to the within‐cluster or cluster‐specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster‐level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R 2 measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28543517
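
    Two of the summary measures described, the variance partition coefficient and the median odds ratio, are simple functions of the cluster-level (random-intercept) variance on the logit scale. The sketch below computes both from an assumed variance estimate; the numeric value is illustrative, not from the acute myocardial infarction analysis.

    ```python
    import math

    def variance_partition_coefficient(sigma2_u):
        """Share of total latent variance at the cluster level (logistic model)."""
        return sigma2_u / (sigma2_u + math.pi ** 2 / 3.0)

    def median_odds_ratio(sigma2_u):
        """Median odds ratio comparing two identical subjects from different clusters."""
        # Phi^{-1}(0.75) ~= 0.6745
        return math.exp(math.sqrt(2.0 * sigma2_u) * 0.6745)

    sigma2_u = 0.25   # hypothetical between-cluster variance on the log-odds scale
    print("VPC:", round(variance_partition_coefficient(sigma2_u), 3))
    print("MOR:", round(median_odds_ratio(sigma2_u), 3))
    ```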

  1. Intermediate and advanced topics in multilevel logistic regression analysis.

    PubMed

    Austin, Peter C; Merlo, Juan

    2017-09-10

    Multilevel data occur frequently in health services, population and public health, and epidemiologic research. In such research, binary outcomes are common. Multilevel logistic regression models allow one to account for the clustering of subjects within clusters of higher-level units when estimating the effect of subject and cluster characteristics on subject outcomes. A search of the PubMed database demonstrated that the use of multilevel or hierarchical regression models is increasing rapidly. However, our impression is that many analysts simply use multilevel regression models to account for the nuisance of within-cluster homogeneity that is induced by clustering. In this article, we describe a suite of analyses that can complement the fitting of multilevel logistic regression models. These ancillary analyses permit analysts to estimate the marginal or population-average effect of covariates measured at the subject and cluster level, in contrast to the within-cluster or cluster-specific effects arising from the original multilevel logistic regression model. We describe the interval odds ratio and the proportion of opposed odds ratios, which are summary measures of effect for cluster-level covariates. We describe the variance partition coefficient and the median odds ratio which are measures of components of variance and heterogeneity in outcomes. These measures allow one to quantify the magnitude of the general contextual effect. We describe an R² measure that allows analysts to quantify the proportion of variation explained by different multilevel logistic regression models. We illustrate the application and interpretation of these measures by analyzing mortality in patients hospitalized with a diagnosis of acute myocardial infarction. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

  2. Use of Cox's Cure Model to Establish Clinical Determinants of Long-Term Disease-Free Survival in Neoadjuvant-Chemotherapy-Treated Breast Cancer Patients without Pathologic Complete Response.

    PubMed

    Asano, Junichi; Hirakawa, Akihiro; Hamada, Chikuma; Yonemori, Kan; Hirata, Taizo; Shimizu, Chikako; Tamura, Kenji; Fujiwara, Yasuhiro

    2013-01-01

    In prognostic studies for breast cancer patients treated with neoadjuvant chemotherapy (NAC), the ordinary Cox proportional-hazards (PH) model has been often used to identify prognostic factors for disease-free survival (DFS). This model assumes that all patients eventually experience relapse or death. However, a subset of NAC-treated breast cancer patients never experience these events during long-term follow-up (>10 years) and may be considered clinically "cured." Clinical factors associated with cure have not been studied adequately. Because the ordinary Cox PH model cannot be used to identify such clinical factors, we used the Cox PH cure model, a recently developed statistical method. This model includes both a logistic regression component for the cure rate and a Cox regression component for the hazard for uncured patients. The purpose of this study was to identify the clinical factors associated with cure and the variables associated with the time to recurrence or death in NAC-treated breast cancer patients without a pathologic complete response, by using the Cox PH cure model. We found that hormone receptor status, clinical response, human epidermal growth factor receptor 2 status, histological grade, and the number of lymph node metastases were associated with cure.

  3. The Impact of State Legislation and Model Policies on Bullying in Schools.

    PubMed

    Terry, Amanda

    2018-04-01

    The purpose of this study was to determine the impact of the coverage of state legislation and the expansiveness ratings of state model policies on the state-level prevalence of bullying in schools. The state-level prevalence of bullying in schools was based on cross-sectional data from the 2013 High School Youth Risk Behavior Survey. Multiple regression was conducted to determine whether the coverage of state legislation and the expansiveness rating of a state model policy affected the state-level prevalence of bullying in schools. The purpose and definition category of components in state legislation and the expansiveness rating of a state model policy were statistically significant predictors of the state-level prevalence of bullying in schools. The other three categories of components in state legislation (District Policy Development and Review, District Policy Components, and Additional Components) were not statistically significant predictors in the model. Extensive coverage in the purpose and definition category of components in state legislation and a high expansiveness rating of a state model policy may be important in efforts to reduce bullying in schools. Improving these areas may reduce the state-level prevalence of bullying in schools. © 2018, American School Health Association.

  4. Comparison of random regression models with Legendre polynomials and linear splines for production traits and somatic cell score of Canadian Holstein cows.

    PubMed

    Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G

    2008-09-01

    A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.

  5. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures.

    PubMed

    Bobb, Jennifer F; Valeri, Linda; Claus Henn, Birgit; Christiani, David C; Wright, Robert O; Mazumdar, Maitreyi; Godleski, John J; Coull, Brent A

    2015-07-01

    Because humans are invariably exposed to complex chemical mixtures, estimating the health effects of multi-pollutant exposures is of critical concern in environmental epidemiology, and to regulatory agencies such as the U.S. Environmental Protection Agency. However, most health effects studies focus on single agents or consider simple two-way interaction models, in part because we lack the statistical methodology to more realistically capture the complexity of mixed exposures. We introduce Bayesian kernel machine regression (BKMR) as a new approach to study mixtures, in which the health outcome is regressed on a flexible function of the mixture (e.g. air pollution or toxic waste) components that is specified using a kernel function. In high-dimensional settings, a novel hierarchical variable selection approach is incorporated to identify important mixture components and account for the correlated structure of the mixture. Simulation studies demonstrate the success of BKMR in estimating the exposure-response function and in identifying the individual components of the mixture responsible for health effects. We demonstrate the features of the method through epidemiology and toxicology applications. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  6. Exploring the use of random regression models with legendre polynomials to analyze measures of volume of ejaculate in Holstein bulls.

    PubMed

    Carabaño, M J; Díaz, C; Ugarte, C; Serrano, M

    2007-02-01

    Artificial insemination centers routinely collect records of quantity and quality of semen of bulls throughout the animals' productive period. The goal of this paper was to explore the use of random regression models with orthogonal polynomials to analyze repeated measures of semen production of Spanish Holstein bulls. A total of 8,773 records of volume of first ejaculate (VFE) collected between 12 and 30 mo of age from 213 Spanish Holstein bulls was analyzed under alternative random regression models. Legendre polynomial functions of increasing order (0 to 6) were fitted to the average trajectory, additive genetic and permanent environmental effects. Age at collection and days in production were used as time variables. Heterogeneous and homogeneous residual variances were alternatively assumed. Analyses were carried out within a Bayesian framework. The logarithm of the marginal density and the cross-validation predictive ability of the data were used as model comparison criteria. Based on both criteria, age at collection as a time variable and heterogeneous residuals models are recommended to analyze changes of VFE over time. Both criteria indicated that fitting random curves for genetic and permanent environmental components as well as for the average trajectory improved the quality of the models. Furthermore, models with a higher order polynomial for the permanent environmental (5 to 6) than for the genetic components (4 to 5) and the average trajectory (2 to 3) tended to perform best. High-order polynomials were needed to accommodate the highly oscillating nature of the phenotypic values. Heritability and repeatability estimates, disregarding the extremes of the studied period, ranged from 0.15 to 0.35 and from 0.20 to 0.50, respectively, indicating that selection for VFE may be effective at any stage. Small differences among models were observed. Apart from the extremes, estimated correlations between ages decreased steadily from 0.9 and 0.4 for measures 1 mo apart to 0.4 and 0.2 for most distant measures for additive genetic and phenotypic components, respectively. Further investigation to account for environmental factors that may be responsible for the oscillating observations of VFE is needed.

  7. Robust estimation for partially linear models with large-dimensional covariates

    PubMed Central

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2014-01-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of the linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of the nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087

  8. Robust estimation for partially linear models with large-dimensional covariates.

    PubMed

    Zhu, LiPing; Li, RunZe; Cui, HengJian

    2013-10-01

    We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a nonconcave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of [Formula: see text], where n is the sample size. We show that the robust estimate of the linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of the nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.

  9. A Semiparametric Change-Point Regression Model for Longitudinal Observations.

    PubMed

    Xing, Haipeng; Ying, Zhiliang

    2012-12-01

    Many longitudinal studies involve relating an outcome process to a set of possibly time-varying covariates, giving rise to the usual regression models for longitudinal data. When the purpose of the study is to investigate the covariate effects when the experimental environment undergoes abrupt changes or to locate the periods with different levels of covariate effects, a simple and easy-to-interpret approach is to introduce change-points in regression coefficients. In this connection, we propose a semiparametric change-point regression model, in which the error process (stochastic component) is nonparametric and the baseline mean function (functional part) is completely unspecified, the observation times are allowed to be subject-specific, and the number, locations and magnitudes of change-points are unknown and need to be estimated. We further develop an estimation procedure which combines recent advances in semiparametric analysis based on counting-process arguments with multiple change-point inference, and discuss its large-sample properties, including consistency and asymptotic normality, under suitable regularity conditions. Simulation results show that the proposed methods work well under a variety of scenarios. An application to a real data set is also given.

  10. A new approach for continuous estimation of baseflow using discrete water quality data: Method description and comparison with baseflow estimates from two existing approaches

    USGS Publications Warehouse

    Miller, Matthew P.; Johnson, Henry M.; Susong, David D.; Wolock, David M.

    2015-01-01

    Understanding how watershed characteristics and climate influence the baseflow component of stream discharge is a topic of interest to both the scientific and water management communities. Therefore, the development of baseflow estimation methods is a topic of active research. Previous studies have demonstrated that graphical hydrograph separation (GHS) and conductivity mass balance (CMB) methods can be applied to stream discharge data to estimate daily baseflow. While CMB is generally considered to be a more objective approach than GHS, its application across broad spatial scales is limited by a lack of high frequency specific conductance (SC) data. We propose a new method that uses discrete SC data, which are widely available, to estimate baseflow at a daily time step using the CMB method. The proposed approach involves the development of regression models that relate discrete SC concentrations to stream discharge and time. Regression-derived CMB baseflow estimates were more similar to baseflow estimates obtained using a CMB approach with measured high frequency SC data than were the GHS baseflow estimates at twelve snowmelt-dominated streams and rivers. There was a near-perfect fit between the regression-derived and measured CMB baseflow estimates at sites where the regression models were able to accurately predict daily SC concentrations. We propose that the regression-derived approach could be applied to estimate baseflow at large numbers of sites, thereby enabling future investigations of watershed and climatic characteristics that influence the baseflow component of stream discharge across large spatial scales.
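
    A minimal sketch of the proposed workflow, with entirely synthetic discharge and specific conductance (SC) data and simplified end-member choices: a regression relating discrete SC samples to discharge and a seasonal time term is used to predict daily SC, and the standard conductivity mass balance is then applied to the predicted series.

```python
# Minimal sketch (hypothetical data and column names): regress discrete specific
# conductance (SC) on discharge and seasonal terms, predict daily SC, then apply
# the conductivity mass balance (CMB) to estimate daily baseflow.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

daily = pd.DataFrame({
    "Q": 10 + 40 * np.abs(np.sin(np.linspace(0, 2 * np.pi, 365))),   # daily discharge
    "doy": np.arange(1, 366),
})
# Discrete SC samples (a small subset of days), assumed available from grab samples
samples = daily.iloc[::30].copy()
samples["SC"] = 250 - 2.5 * samples["Q"] + np.random.default_rng(1).normal(0, 5, len(samples))

def features(df):
    return np.column_stack([np.log(df["Q"]),
                            np.sin(2 * np.pi * df["doy"] / 365),
                            np.cos(2 * np.pi * df["doy"] / 365)])

reg = LinearRegression().fit(features(samples), samples["SC"])
daily["SC_hat"] = reg.predict(features(daily))

SC_bf, SC_ro = daily["SC_hat"].max(), daily["SC_hat"].min()   # simplified end-member SC values
daily["baseflow"] = daily["Q"] * (daily["SC_hat"] - SC_ro) / (SC_bf - SC_ro)
daily["baseflow"] = daily["baseflow"].clip(0, daily["Q"])     # keep 0 <= baseflow <= Q
print(daily[["Q", "baseflow"]].head())
```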

  11. Support vector regression and artificial neural network models for stability indicating analysis of mebeverine hydrochloride and sulpiride mixtures in pharmaceutical preparation: A comparative study

    NASA Astrophysics Data System (ADS)

    Naguib, Ibrahim A.; Darwish, Hany W.

    2012-02-01

    A comparison between support vector regression (SVR) and artificial neural network (ANN) multivariate regression methods is established, outlining the underlying algorithm of each and indicating their inherent advantages and limitations. In this paper we compare SVR to ANN with and without a variable selection procedure (genetic algorithm, GA). To present the comparison in a practical setting, the methods are used for the stability-indicating quantitative analysis of mebeverine hydrochloride and sulpiride in binary mixtures as a case study, in the presence of their reported impurities and degradation products (summing up to 6 components), in raw materials and pharmaceutical dosage form, based on the UV spectral data. For proper analysis, a 6-factor, 5-level experimental design was established, resulting in a training set of 25 mixtures containing different ratios of the interfering species. An independent test set consisting of 5 mixtures was used to validate the prediction ability of the suggested models. The proposed methods (linear SVR (without GA) and linear GA-ANN) were successfully applied to the analysis of pharmaceutical tablets containing mebeverine hydrochloride and sulpiride mixtures. The results illustrate the problem of nonlinearity and how models like SVR and ANN can handle it. The methods indicate the ability of the mentioned multivariate calibration models to deconvolute the highly overlapped UV spectra of the 6 components' mixtures, yet using cheap and easy-to-handle instruments like the UV spectrophotometer.
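
    As a hedged, self-contained illustration (synthetic Gaussian absorption bands rather than the study's UV spectra, and illustrative tuning values), the sketch below compares SVR and a small neural network for predicting one component's concentration from overlapped spectra of a binary mixture.

```python
# Minimal sketch (synthetic spectra, hypothetical components): comparing support
# vector regression and a small neural network for predicting one component's
# concentration from overlapped spectra.
import numpy as np
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
wavelengths = np.linspace(200, 400, 101)
n_mix = 25                                        # e.g. a 25-mixture training design
conc = rng.uniform(0.2, 1.0, size=(n_mix, 2))     # two absorbing species
band1 = np.exp(-((wavelengths - 260) / 15) ** 2)  # overlapping Gaussian bands
band2 = np.exp(-((wavelengths - 280) / 15) ** 2)
spectra = conc @ np.vstack([band1, band2]) + rng.normal(0, 0.01, (n_mix, wavelengths.size))
target = conc[:, 0]                               # concentration of component 1

for name, model in [("SVR", SVR(kernel="linear", C=10.0)),
                    ("ANN", MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000, random_state=0))]:
    r2 = cross_val_score(model, spectra, target, cv=5).mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.3f}")
```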

  12. Analysis of nonlinear relationships in dual epidemics, and its application to the management of grapevine downy and powdery mildews.

    PubMed

    Savary, Serge; Delbac, Lionel; Rochas, Amélie; Taisant, Guillaume; Willocquet, Laetitia

    2009-08-01

    Dual epidemics are defined as epidemics developing on two or several plant organs in the course of a cropping season. Agricultural pathosystems where such epidemics develop are often very important, because the harvestable part is one of the organs affected. These epidemics also are often difficult to manage, because the linkage between epidemiological components occurring on different organs is poorly understood, and because prediction of the risk toward the harvestable organs is difficult. In the case of downy mildew (DM) and powdery mildew (PM) of grapevine, nonlinear modeling and logistic regression indicated nonlinearity in the foliage-cluster relationships. Nonlinear modeling enabled the parameterization of a transmission coefficient that numerically links the two components, leaves and clusters, in DM and PM epidemics. Logistic regression analysis yielded a series of probabilistic models that enabled predicting preset levels of cluster infection risks based on DM and PM severities on the foliage at successive crop stages. The usefulness of this framework for tactical decision-making for disease control is discussed.

  13. A classical regression framework for mediation analysis: fitting one model to estimate mediation effects.

    PubMed

    Saunders, Christina T; Blume, Jeffrey D

    2017-10-26

    Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches.

  14. Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method

    NASA Astrophysics Data System (ADS)

    Asavaskulkiet, Krissada

    2018-04-01

    In this paper, we propose a new face hallucination technique that reconstructs face images in HSV color space with a semi-orthogonal multilinear principal component analysis method. This novel hallucination technique can operate directly on tensors via tensor-to-vector projection by imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from the FERET database to test our hallucination approach, which produces high-quality hallucinated color faces in extensive experiments. The experimental results clearly demonstrate that we can generate photorealistic color face images by using the SO-MPCA subspace with a linear regression model.

  15. Modeling daily soil temperature over diverse climate conditions in Iran—a comparison of multiple linear regression and support vector regression techniques

    NASA Astrophysics Data System (ADS)

    Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan

    2018-02-01

    The knowledge of soil temperature at different depths is important for the agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth under different climate conditions over Iran. The obtained results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and the periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and atmospheric pressure (reduced to sea level), collected from five synoptic stations (Kerman, Ahvaz, Tabriz, Saghez, and Rasht) located, respectively, in hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, the performance of both MLR and SVR models was quite good at the surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers, especially at 100 cm depth. Moreover, both models performed better in humid climate conditions than in arid and hyper-arid areas. Further, adding a periodicity component into the modeling process considerably improved the models' performance, especially in the case of SVR.
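
    A minimal sketch of the comparison, with simulated meteorological inputs and a simple periodicity component (sine and cosine of day of year) appended to the predictors; the station data, depths, and tuning constants from the study are not reproduced.

```python
# Minimal sketch (hypothetical meteorological data): multiple linear regression
# versus support vector regression for daily soil temperature, with a simple
# periodicity component (sine/cosine of day of year) added to the inputs.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
doy = np.arange(1, 731)                                      # two years of daily records
tair = 18 + 12 * np.sin(2 * np.pi * (doy - 100) / 365) + rng.normal(0, 2, doy.size)
rh = np.clip(55 - 0.8 * (tair - 18) + rng.normal(0, 5, doy.size), 5, 100)
soil_t = 17 + 10 * np.sin(2 * np.pi * (doy - 120) / 365) + rng.normal(0, 1, doy.size)

periodic = np.column_stack([np.sin(2 * np.pi * doy / 365), np.cos(2 * np.pi * doy / 365)])
X = np.column_stack([tair, rh, periodic])

mlr = LinearRegression()
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.2))
for name, model in [("MLR", mlr), ("SVR", svr)]:
    print(name, "R^2:", cross_val_score(model, X, soil_t, cv=5).mean().round(3))
```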

  16. Genetic improvement in mastitis resistance: comparison of selection criteria from cross-sectional and random regression sire models for somatic cell score.

    PubMed

    Odegård, J; Klemetsdal, G; Heringstad, B

    2005-04-01

    Several selection criteria for reducing incidence of mastitis were developed from a random regression sire model for test-day somatic cell score (SCS). For comparison, sire transmitting abilities were also predicted based on a cross-sectional model for lactation mean SCS. Only first-crop daughters were used in genetic evaluation of SCS, and the different selection criteria were compared based on their correlation with incidence of clinical mastitis in second-crop daughters (measured as mean daughter deviations). Selection criteria were predicted based on both complete and reduced first-crop daughter groups (261 or 65 daughters per sire, respectively). For complete daughter groups, predicted transmitting abilities at around 30 d in milk showed the best predictive ability for incidence of clinical mastitis, closely followed by average predicted transmitting abilities over the entire lactation. Both of these criteria were derived from the random regression model. These selection criteria improved accuracy of selection by approximately 2% relative to a cross-sectional model. However, for reduced daughter groups, the cross-sectional model yielded increased predictive ability compared with the selection criteria based on the random regression model. This result may be explained by the cross-sectional model being more robust, i.e., less sensitive to precision of (co)variance components estimates and effects of data structure.

  17. The potential of non-invasive pre- and post-mortem carcass measurements to predict the contribution of carcass components to slaughter yield of guinea pigs.

    PubMed

    Barba, Lida; Sánchez-Macías, Davinia; Barba, Iván; Rodríguez, Nibaldo

    2018-06-01

    Guinea pig meat consumption is increasing exponentially worldwide. Evaluating the contribution of carcass components to carcass quality can potentially allow the value added to food of animal origin to be estimated and make research in guinea pigs more practicable. The aim of this study was to propose a methodology for modelling the contribution of different carcass components to the overall carcass quality of guinea pigs by using non-invasive pre- and post-mortem carcass measurements. The selection of predictors was carried out through correlation analysis and statistical significance, whereas the prediction models were based on multiple linear regression. Prediction accuracy was higher when carcass component contributions were expressed in grams than when they were expressed as a percentage of carcass quality components. The proposed prediction models can be useful for the guinea pig meat industry and research institutions by using non-invasive and time- and cost-efficient carcass component measuring techniques.

  18. Dynamic linear models using the Kalman filter for early detection and early warning of malaria outbreaks

    NASA Astrophysics Data System (ADS)

    Merkord, C. L.; Liu, Y.; DeVos, M.; Wimberly, M. C.

    2015-12-01

    Malaria early detection and early warning systems are important tools for public health decision makers in regions where malaria transmission is seasonal and varies from year to year with fluctuations in rainfall and temperature. Here we present a new data-driven dynamic linear model based on the Kalman filter with time-varying coefficients that are used to identify malaria outbreaks as they occur (early detection) and predict the location and timing of future outbreaks (early warning). We fit linear models of malaria incidence with trend and Fourier form seasonal components using three years of weekly malaria case data from 30 districts in the Amhara Region of Ethiopia. We identified past outbreaks by comparing the modeled prediction envelopes with observed case data. Preliminary results demonstrated the potential for improved accuracy and timeliness over commonly-used methods in which thresholds are based on simpler summary statistics of historical data. Other benefits of the dynamic linear modeling approach include robustness to missing data and the ability to fit models with relatively few years of training data. To predict future outbreaks, we started with the early detection model for each district and added a regression component based on satellite-derived environmental predictor variables including precipitation data from the Tropical Rainfall Measuring Mission (TRMM) and land surface temperature (LST) and spectral indices from the Moderate Resolution Imaging Spectroradiometer (MODIS). We included lagged environmental predictors in the regression component of the model, with lags chosen based on cross-correlation of the one-step-ahead forecast errors from the first model. Our results suggest that predictions of future malaria outbreaks can be improved by incorporating lagged environmental predictors.
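
    The sketch below is a stand-in for the described approach, not the authors' model: statsmodels' UnobservedComponents (a Kalman-filter-based structural time series model) with a local linear trend and a Fourier-form seasonal component is fitted to simulated weekly counts, and weeks whose observed cases exceed the upper prediction bound are flagged. Lagged environmental covariates could be supplied through the exog argument; all counts and dates are made up.

```python
# Minimal sketch (simulated weekly counts): a structural time series model with
# trend and Fourier-form seasonal components, fitted via the Kalman filter; weeks
# whose observed cases exceed the one-step-ahead prediction envelope are flagged
# as possible outbreaks (early detection).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
weeks = 156                                   # three years of weekly case data
t = np.arange(weeks)
cases = 50 + 0.1 * t + 20 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 5, weeks)
cases[120:124] += 40                          # injected outbreak
y = pd.Series(cases, index=pd.date_range("2012-01-01", periods=weeks, freq="W"))

model = sm.tsa.UnobservedComponents(
    y, level="local linear trend",
    freq_seasonal=[{"period": 52, "harmonics": 2}])
res = model.fit(disp=False)

pred = res.get_prediction()                   # in-sample one-step-ahead predictions
upper = pred.conf_int(alpha=0.05).iloc[:, 1]
flagged = y.index[y > upper]
print("weeks flagged as possible outbreaks:", list(flagged.strftime("%Y-%m-%d"))[:5])
```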

  19. A use of regression analysis in acoustical diagnostics of gear drives

    NASA Technical Reports Server (NTRS)

    Balitskiy, F. Y.; Genkin, M. D.; Ivanova, M. A.; Kobrinskiy, A. A.; Sokolova, A. G.

    1973-01-01

    A study is presented of components of the vibration spectrum as the filtered first and second harmonics of the tooth frequency which permits information to be obtained on the physical characteristics of the vibration excitation process, and an approach to be made to comparison of models of the gearing. Regression analysis of two random processes has shown a strong dependence of the second harmonic on the first, and independence of the first from the second. The nature of change in the regression line, with change in loading moment, gives rise to the idea of a variable phase shift between the first and second harmonics.

  20. Prediction of Knee Joint Contact Forces From External Measures Using Principal Component Prediction and Reconstruction.

    PubMed

    Saliba, Christopher M; Clouthier, Allison L; Brandon, Scott C E; Rainbow, Michael J; Deluzio, Kevin J

    2018-05-29

    Abnormal loading of the knee joint contributes to the pathogenesis of knee osteoarthritis. Gait retraining is a non-invasive intervention that aims to reduce knee loads by providing audible, visual, or haptic feedback of gait parameters. The computational expense of joint contact force prediction has limited real-time feedback to surrogate measures of the contact force, such as the knee adduction moment. We developed a method to predict knee joint contact forces using motion analysis and a statistical regression model that can be implemented in near real-time. Gait waveform variables were deconstructed using principal component analysis and a linear regression was used to predict the principal component scores of the contact force waveforms. Knee joint contact force waveforms were reconstructed using the predicted scores. We tested our method using a heterogeneous population of asymptomatic controls and subjects with knee osteoarthritis. The reconstructed contact force waveforms had mean (SD) RMS differences of 0.17 (0.05) bodyweight compared to the contact forces predicted by a musculoskeletal model. Our method successfully predicted subject-specific shape features of contact force waveforms and is a potentially powerful tool in biofeedback and clinical gait analysis.
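
    A minimal sketch of the principal component prediction-and-reconstruction step, using synthetic waveforms and made-up gait features rather than the study's motion-capture and musculoskeletal-model outputs: PCA deconstructs the force waveforms, linear regression predicts the PC scores from external measures, and the inverse transform reconstructs predicted waveforms.

```python
# Minimal sketch (synthetic gait data): deconstruct contact-force waveforms with
# PCA, predict the PC scores from external gait measures with linear regression,
# then reconstruct predicted waveforms from the predicted scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n_subj, n_time, n_feat = 80, 101, 6
gait = rng.normal(size=(n_subj, n_feat))                # summarized external gait measures
phase = np.linspace(0, 1, n_time)
# Hypothetical contact-force waveforms driven by the gait features
forces = (2.5 + 0.5 * gait[:, [0]]) * np.sin(np.pi * phase) \
         + 0.3 * gait[:, [1]] * np.sin(2 * np.pi * phase) \
         + rng.normal(0, 0.05, (n_subj, n_time))

Xtr, Xte, Ftr, Fte = train_test_split(gait, forces, random_state=0)
pca = PCA(n_components=4).fit(Ftr)
scores_tr = pca.transform(Ftr)

reg = LinearRegression().fit(Xtr, scores_tr)            # predict PC scores from gait measures
forces_hat = pca.inverse_transform(reg.predict(Xte))    # reconstruct waveforms

rmse = np.sqrt(np.mean((forces_hat - Fte) ** 2))
print("RMS difference (synthetic bodyweight units):", round(rmse, 3))
```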

  1. Flooding Fragility Experiments and Prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, Curtis L.; Tahhan, Antonio; Muchmore, Cody

    2016-09-01

    This report describes the work that has been performed on flooding fragility, both the experimental tests being carried out and the probabilistic fragility predictive models being produced in order to use the test results. Flooding experiments involving full-scale doors have commenced in the Portal Evaluation Tank. The goal of these experiments is to develop a full-scale component flooding experiment protocol and to acquire data that can be used to create Bayesian regression models representing the fragility of these components. This work is in support of the Risk-Informed Safety Margin Characterization (RISMC) Pathway external hazards evaluation research and development.

  2. A hybrid sales forecasting scheme by combining independent component analysis with K-means clustering and support vector regression.

    PubMed

    Lu, Chi-Jie; Chang, Chi-Chang

    2014-01-01

    Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue in operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses ICA to extract hidden information from the observed sales data. The extracted features are then applied to the K-means algorithm for clustering the sales data into several disjoint clusters. Finally, SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting.
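
    A compact sketch of the three-stage scheme on simulated sales data (ICA feature extraction, K-means clustering, then one SVR per cluster); the lag structure, number of components, and cluster count are arbitrary illustrative choices.

```python
# Minimal sketch (simulated sales series) of the hybrid scheme: ICA feature
# extraction, K-means clustering of the samples, then one SVR per cluster.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(6)
n, lags = 200, 6
sales = 100 + 10 * np.sin(np.arange(n + lags) / 8) + rng.normal(0, 3, n + lags)
# Lagged-sales design matrix: predict next week's sales from the previous 6 weeks
X = np.column_stack([sales[i:i + n] for i in range(lags)])
y = sales[lags:lags + n]

ica = FastICA(n_components=3, random_state=0)
features = ica.fit_transform(X)                         # hidden components of the sales data

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

models = {}
for c in np.unique(clusters):
    idx = clusters == c
    models[c] = SVR(kernel="rbf", C=10.0).fit(X[idx], y[idx])   # one SVR per cluster

# Forecast: route each sample to its cluster's SVR
pred = np.array([models[c].predict(x[None, :])[0] for x, c in zip(X, clusters)])
print("in-sample RMSE:", round(np.sqrt(np.mean((pred - y) ** 2)), 2))
```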

  3. A Hybrid Sales Forecasting Scheme by Combining Independent Component Analysis with K-Means Clustering and Support Vector Regression

    PubMed Central

    2014-01-01

    Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue in operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses ICA to extract hidden information from the observed sales data. The extracted features are then applied to the K-means algorithm for clustering the sales data into several disjoint clusters. Finally, SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting. PMID:25045738

  4. [Impacts of multicomponent environment on solubility of puerarin in biopharmaceutics classification system of Chinese materia medica].

    PubMed

    Hou, Cheng-Bo; Wang, Guo-Peng; Zhang, Qiang; Yang, Wen-Ning; Lv, Bei-Ran; Wei, Li; Dong, Ling

    2014-12-01

    To illustrate the solubility component of the biopharmaceutics classification system of Chinese materia medica (CMMBCS), this study investigated the influence of an artificial multicomponent environment on the solubility of puerarin. A mathematical model was built to describe the trend of this influence. The analysis proceeded through progressive levels: single-component environments (baicalin, berberine, or glycyrrhizic acid), double-component environments (baicalin and glycyrrhizic acid, baicalin and berberine, and glycyrrhizic acid and berberine), and a treble-component environment (baicalin, berberine, and glycyrrhizic acid), each used to describe the variation in their influence on the solubility of puerarin. A mathematical regression equation model was then established to characterize the solubility of puerarin under the multicomponent environment.

  5. A kinetic energy model of two-vehicle crash injury severity.

    PubMed

    Sobhani, Amir; Young, William; Logan, David; Bahrololoom, Sareh

    2011-05-01

    An important part of any model of vehicle crashes is the development of a procedure to estimate crash injury severity. After reviewing existing models of crash severity, this paper outlines the development of a modelling approach aimed at measuring the injury severity of people in two-vehicle road crashes. This model can be incorporated into a discrete event traffic simulation model, using simulation model outputs as its input. The model can then serve as an integral part of a simulation model estimating the crash potential of components of the traffic system. The model is developed using Newtonian Mechanics and Generalised Linear Regression. The factors contributing to the speed change (ΔV(s)) of a subject vehicle are identified using the law of conservation of momentum. A Log-Gamma regression model is fitted to measure speed change (ΔV(s)) of the subject vehicle based on the identified crash characteristics. The kinetic energy applied to the subject vehicle is calculated by the model, which in turn uses a Log-Gamma Regression Model to estimate the Injury Severity Score of the crash from the calculated kinetic energy, crash impact type, presence of airbag and/or seat belt and occupant age.
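
    As a hedged stand-in for the paper's Log-Gamma regressions, the sketch below fits a Gamma GLM with a log link in statsmodels (assuming a recent version in which the link class is spelled Log) to hypothetical crash characteristics; the same pattern would apply to the Injury Severity Score stage.

```python
# Minimal sketch (hypothetical crash data): a Gamma GLM with a log link as a
# stand-in for the paper's Log-Gamma regressions, here for the speed change
# (delta-V) of the subject vehicle.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "mass_ratio": rng.uniform(0.5, 2.0, n),          # other vehicle mass / subject vehicle mass
    "closing_speed": rng.uniform(20, 100, n),        # km/h
    "impact_front": rng.integers(0, 2, n),           # 1 = frontal impact
})
mu = 0.3 * df.closing_speed * df.mass_ratio / (1 + df.mass_ratio) * (1 + 0.1 * df.impact_front)
df["delta_v"] = rng.gamma(shape=5.0, scale=mu / 5.0)  # positive, right-skewed response

model = smf.glm("delta_v ~ closing_speed + mass_ratio + impact_front",
                data=df,
                family=sm.families.Gamma(link=sm.families.links.Log())).fit()
print(model.summary().tables[1])
```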

  6. Assessing long-term variations in sagebrush habitat: characterization of spatial extents and distribution patterns using multi-temporal satellite remote-sensing data

    USGS Publications Warehouse

    Xian, George; Homer, Collin G.; Aldridge, Cameron L.

    2012-01-01

    An approach that can generate sagebrush habitat change estimates for monitoring large-area sagebrush ecosystems has been developed and tested in southwestern Wyoming, USA. This prototype method uses a satellite-based image change detection algorithm and regression models to estimate sub-pixel percentage cover for five sagebrush habitat components: bare ground, herbaceous, litter, sagebrush and shrub. Landsat images from three different months in 1988, 1996 and 2006 were selected to identify potential landscape change during these time periods using change vector (CV) analysis incorporated with an image normalization algorithm. Regression tree (RT) models were used to estimate percentage cover for five components on all change areas identified in 1988 and 1996, using unchanged 2006 baseline data as training for both estimates. Over the entire study area (24 950 km2), a net increase of 98.83 km2, or 0.7%, for bare ground was measured between 1988 and 2006. Over the same period, the other four components had net losses of 20.17 km2, or 0.6%, for herbaceous vegetation; 30.16 km2, or 0.7%, for litter; 32.81 km2, or 1.5%, for sagebrush; and 33.34 km2, or 1.2%, for shrubs. The overall accuracy for shrub vegetation change between 1988 and 2006 was 89.56%. Change patterns within sagebrush habitat components differ spatially and quantitatively from each other, potentially indicating unique responses by these components to disturbances imposed upon them.

  7. A statistical forecast model using the time-scale decomposition technique to predict rainfall during flood period over the middle and lower reaches of the Yangtze River Valley

    NASA Astrophysics Data System (ADS)

    Hu, Yijia; Zhong, Zhong; Zhu, Yimin; Ha, Yao

    2018-04-01

    In this paper, a statistical forecast model using the time-scale decomposition method is established to perform seasonal prediction of the rainfall during the flood period (FPR) over the middle and lower reaches of the Yangtze River Valley (MLYRV). This method decomposes the rainfall over the MLYRV into three time-scale components, namely, the interannual component with the period less than 8 years, the interdecadal component with the period from 8 to 30 years, and the interdecadal component with the period larger than 30 years. Then, the predictors are selected for the three time-scale components of FPR through correlation analysis. Finally, a statistical forecast model is established using the multiple linear regression technique to predict the three time-scale components of the FPR, respectively. The results show that this forecast model can capture the interannual and interdecadal variation of FPR. The hindcast of FPR during 14 years from 2001 to 2014 shows that the FPR can be predicted successfully in 11 out of the 14 years. This forecast model performs better than a model using the traditional scheme without time-scale decomposition. Therefore, the statistical forecast model using the time-scale decomposition technique has good skill and application value in the operational prediction of FPR over the MLYRV.

  8. Regression and multivariate models for predicting particulate matter concentration level.

    PubMed

    Nazif, Amina; Mohammed, Nurul Izma; Malakahmad, Amirhossein; Abualqumboz, Motasem S

    2018-01-01

    The devastating health effects of particulate matter (PM10) exposure among susceptible populations have made it necessary to evaluate PM10 pollution. Meteorological parameters and seasonal variation increase PM10 concentration levels, especially in areas that have multiple anthropogenic activities. Hence, stepwise regression (SR), multiple linear regression (MLR) and principal component regression (PCR) analyses were used to analyse daily average PM10 concentration levels. The analyses were carried out using daily average PM10 concentration, temperature, humidity, wind speed and wind direction data from 2006 to 2010. The data were from an industrial air quality monitoring station in Malaysia. The SR analysis established that meteorological parameters had less influence on PM10 concentration levels, with coefficients of determination (R2) of 23 to 29% for the seasoned and unseasoned analyses. The prediction analysis showed that PCR models had better R2 results than the MLR methods. For both seasoned and unseasoned data, MLR models had R2 values from 0.50 to 0.60, while PCR models had R2 values from 0.66 to 0.89. In addition, the validation analysis using 2016 data also recognised that the PCR model outperformed the MLR model, with the PCR model for the seasoned analysis having the best result. These analyses will aid in achieving sustainable air quality management strategies.
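
    A minimal sketch of the MLR-versus-PCR comparison on hypothetical meteorological inputs, with PCR implemented as a scaling, PCA, and linear regression pipeline; the variable set and number of retained components are illustrative only.

```python
# Minimal sketch (hypothetical air-quality data): multiple linear regression
# versus principal component regression (PCA followed by linear regression) for
# daily average PM10 from meteorological inputs.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(8)
n = 1500
temp = rng.normal(28, 3, n)
humidity = np.clip(rng.normal(75, 10, n), 30, 100)
wind_speed = np.abs(rng.normal(2.5, 1.0, n))
wind_dir = rng.uniform(0, 360, n)
X = np.column_stack([temp, humidity, wind_speed,
                     np.sin(np.radians(wind_dir)), np.cos(np.radians(wind_dir))])
pm10 = 60 + 1.5 * temp - 0.3 * humidity - 6 * wind_speed + rng.normal(0, 10, n)

mlr = make_pipeline(StandardScaler(), LinearRegression())
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
for name, model in [("MLR", mlr), ("PCR", pcr)]:
    print(name, "cross-validated R^2:", cross_val_score(model, X, pm10, cv=5).mean().round(2))
```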

  9. Modeling heat stress effect on Holstein cows under hot and dry conditions: selection tools.

    PubMed

    Carabaño, M J; Bachagha, K; Ramón, M; Díaz, C

    2014-12-01

    Data from milk recording of Holstein-Friesian cows together with weather information from 2 regions in Southern Spain were used to define the models that can better describe heat stress response for production traits and somatic cell score (SCS). Two sets of analyses were performed, one aimed at defining the population phenotypic response and the other at studying the genetic components. The first involved 2,514,762 test-day records from up to 5 lactations of 128,112 cows. Two models, one fitting a comfort threshold for temperature and a slope of decay after the threshold, and the other a cubic Legendre polynomial (LP) model, were tested. Average (TAVE) and maximum daily temperatures were alternatively considered as covariates. The LP model using TAVE as covariate showed the best goodness of fit for all traits. Estimated rates of decay from this model for production at 25 and 34°C were 36 and 170, 3.8 and 3.0, and 3.9 and 8.2 g/d per degree Celsius for milk, fat, and protein yield, respectively. In the second set of analyses, a sample of 280,958 test-day records from first lactations of 29,114 cows was used. Random regression models including quadratic or cubic LP regressions (TEM_) on TAVE or a fixed threshold and an unknown slope (DUMMY), with or without cubic regressions on days in milk (DIM3_), were tested. For milk and SCS, the best models were the DIM3_ models. In contrast, for fat and protein yield, the best model was TEM3. The DIM3DUMMY models showed similar performance to DIM3TEM3. The estimated genetic correlations between the same trait under cold and hot temperatures (ρ) indicated the existence of a large genotype by environment interaction for fat (ρ=0.53 for model TEM3) and protein yield (ρ around 0.6 for DIM3TEM3) and for SCS (ρ=0.64 for model DIM3TEM3), and a small genotype by environment interaction for milk (ρ over 0.8). The eigendecomposition of the additive genetic covariance matrix from model TEM3 showed the existence of a dominant component, a constant term that is not affected by temperature, representing from 64% of the variation for SCS to 91% of the variation for milk. The second component, showing a flat pattern at intermediate temperatures and increasing or decreasing slopes for the extremes, gathered 15, 11, and 24% of the variation for fat and protein yield and SCS, respectively. This component could be further evaluated as a selection criterion for heat tolerance independently of the production level.

  10. A questionnaire-wide association study of personality and mortality: the Vietnam Experience Study.

    PubMed

    Weiss, Alexander; Gale, Catharine R; Batty, G David; Deary, Ian J

    2013-06-01

    We examined the association between the Minnesota Multiphasic Personality Inventory (MMPI) and all-cause mortality in 4462 middle-aged Vietnam-era veterans. We split the study population into half-samples. In each half, we used proportional hazards (Cox) regression to test the 550 MMPI items' associations with mortality over 15 years. In all participants, we subjected significant (p<.01) items in both halves to principal-components analysis (PCA). We used Cox regression to test whether these components predicted mortality when controlling for other predictors (demographics, cognitive ability, health behaviors, and mental/physical health). Eighty-nine items were associated with mortality in both half-samples. PCA revealed Neuroticism/Negative Affectivity, Somatic Complaints, Psychotic/Paranoia, and Antisocial components, and a higher-order component, Personal Disturbance. Individually, Neuroticism/Negative Affectivity (HR=1.55; 95% CI=1.39, 1.72), Somatic Complaints (HR=1.66; 95% CI=1.52, 1.80), Psychotic/Paranoid (HR=1.44; 95% CI=1.32, 1.57), Antisocial (HR=1.79; 95% CI=1.59, 2.01), and Personal Disturbance (HR=1.74; 95% CI=1.58, 1.91) were associated with risk. Including covariates attenuated these associations (28.4 to 54.5%), though they were still significant. After entering Personal Disturbance into models with each component, Neuroticism/Negative Affectivity and Somatic Complaints were significant, although Neuroticism/Negative Affectivity's association was now protective (HR=0.73; 95% CI=0.58, 0.92). When the four components were entered together with or without covariates, Somatic Complaints and Antisocial were significant risk factors. Somatic Complaints and Personal Disturbance are associated with increased mortality risk. Other components' effects varied as a function of variables in the model.

  11. Protein-coding genes combined with long noncoding RNA as a novel transcriptome molecular staging model to predict the survival of patients with esophageal squamous cell carcinoma.

    PubMed

    Guo, Jin-Cheng; Wu, Yang; Chen, Yang; Pan, Feng; Wu, Zhi-Yong; Zhang, Jia-Sheng; Wu, Jian-Yi; Xu, Xiu-E; Zhao, Jian-Mei; Li, En-Min; Zhao, Yi; Xu, Li-Yan

    2018-04-09

    Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal carcinoma in China. The aim of this study was to develop a staging model to predict outcomes of patients with ESCC. Using Cox regression analysis, principal component analysis (PCA), partitioning clustering, Kaplan-Meier analysis, receiver operating characteristic (ROC) curve analysis, and classification and regression tree (CART) analysis, we mined the Gene Expression Omnibus database to determine the expression profiles of genes in 179 patients with ESCC from the GSE63624 and GSE63622 datasets. Univariate Cox regression analysis of the GSE63624 dataset revealed that 2404 protein-coding genes (PCGs) and 635 long non-coding RNAs (lncRNAs) were associated with the survival of patients with ESCC. PCA categorized these PCGs and lncRNAs into three principal components (PCs), which were used to cluster the patients into three groups. ROC analysis demonstrated that the predictive ability of the PCG-lncRNA PCs when applied to new patients was better than that of tumor-node-metastasis staging (area under ROC curve [AUC]: 0.69 vs. 0.65, P < 0.05). Accordingly, we constructed a molecular disaggregated model comprising one lncRNA and two PCGs, which we designated the LSB staging model, using CART analysis in the GSE63624 dataset. This LSB staging model classified the GSE63622 dataset of patients into three different groups, and its effectiveness was validated by analysis of another cohort of 105 patients.

  12. Regression approaches in the test-negative study design for assessment of influenza vaccine effectiveness.

    PubMed

    Bond, H S; Sullivan, S G; Cowling, B J

    2016-06-01

    Influenza vaccination is the most practical means available for preventing influenza virus infection and is widely used in many countries. Because vaccine components and circulating strains frequently change, it is important to continually monitor vaccine effectiveness (VE). The test-negative design is frequently used to estimate VE. In this design, patients meeting the same clinical case definition are recruited and tested for influenza; those who test positive are the cases and those who test negative form the comparison group. When determining VE in these studies, the typical approach has been to use logistic regression, adjusting for potential confounders. Because vaccine coverage and influenza incidence change throughout the season, time is included among these confounders. While most studies use unconditional logistic regression, adjusting for time, an alternative approach is to use conditional logistic regression, matching on time. Here, we used simulation data to examine the potential for both regression approaches to permit accurate and robust estimates of VE. In situations where vaccine coverage changed during the influenza season, the conditional model and unconditional models adjusting for categorical week and using a spline function for week provided more accurate estimates. We illustrated the two approaches on data from a test-negative study of influenza VE against hospitalization in children in Hong Kong which resulted in the conditional logistic regression model providing the best fit to the data.
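
    A small simulation sketch of the unconditional approach described above: logistic regression of case status on vaccination with calendar week as a categorical confounder, with vaccine effectiveness recovered as one minus the odds ratio. The coverage trend, attack rates, and true VE are invented; a week-matched conditional analysis could be run instead (e.g., with statsmodels' conditional logit).

```python
# Minimal sketch (simulated test-negative data): unconditional logistic regression
# adjusting for categorical week; VE is estimated as 1 minus the vaccination odds ratio.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 4000
week = rng.integers(1, 21, n)                         # influenza season weeks
coverage = 0.3 + 0.02 * week                          # vaccine coverage rises over the season
vaccinated = rng.binomial(1, coverage)
true_ve = 0.5
p_flu = 0.35 * np.sin(np.pi * week / 21) * (1 - true_ve * vaccinated)
case = rng.binomial(1, p_flu)                         # 1 = influenza positive, 0 = test negative

df = pd.DataFrame({"case": case, "vaccinated": vaccinated, "week": week})
fit = smf.logit("case ~ vaccinated + C(week)", data=df).fit(disp=False)
ve_hat = 1 - np.exp(fit.params["vaccinated"])
print(f"estimated VE = {ve_hat:.2f} (simulated true VE = {true_ve})")
```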

  13. Random regression models using Legendre orthogonal polynomials to evaluate the milk production of Alpine goats.

    PubMed

    Silva, F G; Torres, R A; Brito, L F; Euclydes, R F; Melo, A L P; Souza, N O; Ribeiro, J I; Rodrigues, M T

    2013-12-11

    The objective of this study was to identify the best random regression model using Legendre orthogonal polynomials to evaluate Alpine goats genetically and to estimate the parameters for test-day milk yield. We analyzed 20,710 test-day milk yield records of 667 goats from the Goat Sector of the Universidade Federal de Viçosa. The evaluated models combined distinct fitting orders for the fixed (2-5), random genetic (1-7), and permanent environmental (1-7) curves and different numbers of residual variance classes (2, 4, 5, and 6). WOMBAT software was used for all genetic analyses. The best random regression model using Legendre orthogonal polynomials for the genetic evaluation of test-day milk yield of Alpine goats considered a fixed curve of order 4, a curve of additive genetic effects of order 2, a curve of permanent environmental effects of order 7, and a minimum of 5 classes of residual variance, because it was the most parsimonious model among those equivalent to the complete model by the likelihood ratio test. Phenotypic variance and heritability were higher at the end of the lactation period, indicating that the length of lactation has a stronger genetic component in relation to the production peak and persistence. It is very important that genetic evaluation with random regression models uses the best combination of fixed, additive genetic, and permanent environmental regressions and number of heterogeneous residual variance classes, thereby enhancing the precision and accuracy of parameter estimates and predicted genetic values.

  14. Hybrid Support Vector Regression and Autoregressive Integrated Moving Average Models Improved by Particle Swarm Optimization for Property Crime Rates Forecasting with Economic Indicators

    PubMed Central

    Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina

    2013-01-01

    Crimes forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crimes data, it is common that the data consists of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data sets and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models. PMID:23766729

  15. Hybrid support vector regression and autoregressive integrated moving average models improved by particle swarm optimization for property crime rates forecasting with economic indicators.

    PubMed

    Alwee, Razana; Shamsuddin, Siti Mariyam Hj; Sallehuddin, Roselina

    2013-01-01

    Crimes forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, in real crimes data, it is common that the data consists of both linear and nonlinear components. A single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) to be applied in crime rates forecasting. SVR is very robust with small training data sets and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results as compared to the individual models.

  16. Genetic parameters of legendre polynomials for first parity lactation curves.

    PubMed

    Pool, M H; Janss, L L; Meuwissen, T H

    2000-11-01

    Variance components of the covariance function coefficients in a random regression test-day model were estimated by Legendre polynomials up to a fifth order for first-parity records of Dutch dairy cows using Gibbs sampling. Two Legendre polynomials of equal order were used to model the random part of the lactation curve, one for the genetic component and one for permanent environment. Test-day records from cows registered between 1990 to 1996 and collected by regular milk recording were available. For the data set, 23,700 complete lactations were selected from 475 herds sired by 262 sires. Because the application of a random regression model is limited by computing capacity, we investigated the minimum order needed to fit the variance structure in the data sufficiently. Predictions of genetic and permanent environmental variance structures were compared with bivariate estimates on 30-d intervals. A third-order or higher polynomial modeled the shape of variance curves over DIM with sufficient accuracy for the genetic and permanent environment part. Also, the genetic correlation structure was fitted with sufficient accuracy by a third-order polynomial, but, for the permanent environmental component, a fourth order was needed. Because equal orders are suggested in the literature, a fourth-order Legendre polynomial is recommended in this study. However, a rank of three for the genetic covariance matrix and of four for permanent environment allows a simpler covariance function with a reduced number of parameters based on the eigenvalues and eigenvectors.

  17. Solving a mixture of many random linear equations by tensor decomposition and alternating minimization.

    DOT National Transportation Integrated Search

    2016-09-01

    We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample...
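
    A minimal sketch of the alternating-minimization idea in the noiseless two-component case (not the paper's tensor-decomposition initialization): each sample is assigned to the component whose current coefficients fit it best, and each component is then refit by least squares.

```python
# Minimal sketch (noiseless simulation): alternating minimization for a mixture
# of k = 2 linear regressions.
import numpy as np

rng = np.random.default_rng(10)
n, d, k = 400, 5, 2
beta_true = rng.normal(size=(k, d))
X = rng.normal(size=(n, d))
labels_true = rng.integers(0, k, n)
y = np.einsum("ij,ij->i", X, beta_true[labels_true])    # noiseless mixed linear equations

beta = rng.normal(size=(k, d))                           # random initialization
for _ in range(50):
    resid = np.abs(y[:, None] - X @ beta.T)              # |y_i - x_i^T beta_j| per component j
    assign = resid.argmin(axis=1)                        # assignment step
    for j in range(k):                                   # refit step
        idx = assign == j
        if idx.sum() >= d:
            beta[j], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)

err = min(np.linalg.norm(beta - beta_true),              # account for label switching
          np.linalg.norm(beta[::-1] - beta_true))
print("parameter error after alternating minimization:", round(err, 6))
```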

  18. Prediction of sweetness and amino acid content in soybean crops from hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Monteiro, Sildomar Takahashi; Minekawa, Yohei; Kosugi, Yukio; Akazawa, Tsuneya; Oda, Kunio

    Hyperspectral image data provides a powerful tool for non-destructive crop analysis. This paper investigates a hyperspectral image data-processing method to predict the sweetness and amino acid content of soybean crops. Regression models based on artificial neural networks were developed in order to calculate the level of sucrose, glucose, fructose, and nitrogen concentrations, which can be related to the sweetness and amino acid content of vegetables. A performance analysis was conducted comparing regression models obtained using different preprocessing methods, namely, raw reflectance, second derivative, and principal components analysis. This method is demonstrated using high-resolution hyperspectral data of wavelengths ranging from the visible to the near infrared acquired from an experimental field of green vegetable soybeans. The best predictions were achieved using a nonlinear regression model of the second-derivative-transformed dataset. Glucose could be predicted with greater accuracy, followed by sucrose, fructose and nitrogen. The proposed method makes it possible to produce relatively accurate maps of the chemical content of soybean crop fields.

  19. Identifying Nanoscale Structure-Function Relationships Using Multimodal Atomic Force Microscopy, Dimensionality Reduction, and Regression Techniques.

    PubMed

    Kong, Jessica; Giridharagopal, Rajiv; Harrison, Jeffrey S; Ginger, David S

    2018-05-31

    Correlating nanoscale chemical specificity with operational physics is a long-standing goal of functional scanning probe microscopy (SPM). We employ a data analytic approach combining multiple microscopy modes, using compositional information in infrared vibrational excitation maps acquired via photoinduced force microscopy (PiFM) with electrical information from conductive atomic force microscopy. We study a model polymer blend comprising insulating poly(methyl methacrylate) (PMMA) and semiconducting poly(3-hexylthiophene) (P3HT). We show that PiFM spectra are different from FTIR spectra, but can still be used to identify local composition. We use principal component analysis to extract statistically significant principal components and principal component regression to predict local current and identify local polymer composition. In doing so, we observe evidence of semiconducting P3HT within PMMA aggregates. These methods are generalizable to correlated SPM data and provide a meaningful technique for extracting complex compositional information that is impossible to measure with any one technique.

  20. Interstellar matter in early-type galaxies. II - The relationship between gaseous components and galaxy types

    NASA Technical Reports Server (NTRS)

    Bregman, Joel N.; Hogg, David E.; Roberts, Morton S.

    1992-01-01

    Interstellar components of early-type galaxies are established by galactic type and luminosity in order to search for relationships between the different interstellar components and to test the predictions of theoretical models. Some of the data include observations of neutral hydrogen, carbon monoxide, and radio continuum emission. An alternative distance model, which yields L_X varying as L_B^2.45, a relation which is in conflict with simple cooling flow models, is discussed. The dispersion of the X-ray luminosity about this regression line is unlikely to result from stripping. The striking lack of clear correlations between hot and cold interstellar components, taken together with their morphologies, suggests that the cold gas is a disk phenomenon while the hot gas is a bulge phenomenon, with little interaction between the two. The progression of galaxy type from E to Sa is not only a sequence of decreasing stellar bulge-to-disk ratio, but also of hot-to-cold-gas ratio.

  1. Isolating the cow-specific part of residual energy intake in lactating dairy cows using random regressions.

    PubMed

    Fischer, A; Friggens, N C; Berry, D P; Faverdin, P

    2018-07-01

    The ability to properly assess and accurately phenotype true differences in feed efficiency among dairy cows is key to the development of breeding programs for improving feed efficiency. The variability among individuals in feed efficiency is commonly characterised by the residual intake approach. Residual feed intake is represented by the residuals of a linear regression of intake on the corresponding quantities of the biological functions that consume (or release) energy. However, the residuals include both model fitting and measurement errors as well as any variability in cow efficiency. The objective of this study was to isolate the individual animal variability in feed efficiency from the residual component. Two separate models were fitted. In one, the standard residual energy intake (REI) was calculated as the residual of a multiple linear regression of lactation average net energy intake (NEI) on lactation average milk energy output, average metabolic BW, as well as lactation loss and gain of body condition score. In the other, a linear mixed model was used to simultaneously fit fixed linear regressions and random cow-specific levels on the biological traits and the intercept, using fortnightly repeated measures of the variables. This method split the predicted NEI into two parts: one quantifying the population mean intercept and coefficients, and one quantifying cow-specific deviations in the intercept and coefficients. The cow-specific part of predicted NEI was assumed to isolate true differences in feed efficiency among cows. NEI and associated energy expenditure phenotypes were available for the first 17 fortnights of lactation from 119 Holstein cows, all fed a constant energy-rich diet. Mixed models fitting cow-specific intercept and coefficients to different combinations of the aforementioned energy expenditure traits, calculated on a fortnightly basis, were compared. The variance of REI estimated with the lactation average model represented only 8% of the variance of measured NEI. Among all compared mixed models, the variance of the cow-specific part of predicted NEI represented between 53% and 59% of the variance of REI estimated from the lactation average model or between 4% and 5% of the variance of measured NEI. The remaining 41% to 47% of the variance of REI estimated with the lactation average model may therefore reflect model fitting errors or measurement errors. In conclusion, the use of a mixed model framework with cow-specific random regressions seems to be a promising method to isolate the cow-specific component of REI in dairy cows.
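
    The sketch below illustrates the mixed-model idea with statsmodels' MixedLM on invented fortnightly records: population-level fixed regressions of NEI on energy sinks plus cow-specific random intercepts and slopes, with the cow-specific part of the prediction taken as the efficiency signal. Variable names, the trait set, and effect sizes are hypothetical, and only one random slope is fitted for brevity.

```python
# Minimal sketch (hypothetical fortnightly records): fixed population-level
# regressions of net energy intake (NEI) on energy sinks plus cow-specific random
# intercepts and slopes; the cow-specific part of the prediction carries the
# efficiency signal, the residual the noise.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n_cows, n_fortnights = 40, 17
cow = np.repeat(np.arange(n_cows), n_fortnights)
milk_e = rng.normal(120, 15, cow.size)                        # milk energy output (MJ/d)
mbw = np.repeat(rng.normal(130, 8, n_cows), n_fortnights)     # metabolic body weight
cow_eff = np.repeat(rng.normal(0, 5, n_cows), n_fortnights)   # cow-specific efficiency
nei = 60 + 0.9 * milk_e + 0.5 * mbw + cow_eff + rng.normal(0, 6, cow.size)

df = pd.DataFrame({"nei": nei, "milk_e": milk_e, "mbw": mbw, "cow": cow})
mm = smf.mixedlm("nei ~ milk_e + mbw", data=df, groups="cow",
                 re_formula="~ milk_e").fit()

# Cow-specific part of predicted NEI: random intercept plus random slope deviation
re = mm.random_effects                                        # dict: cow -> deviations
intercept_dev = df["cow"].map(lambda c: re[c].iloc[0])
slope_dev = df["cow"].map(lambda c: re[c].iloc[1])
cow_part = intercept_dev + slope_dev * df["milk_e"]
print("variance of the cow-specific component:", round(cow_part.var(), 2))
```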

  2. Development of Regional Power Sector Coal Fuel Costs (Prices) for the Short-Term Energy Outlook (STEO) Model

    EIA Publications

    2017-01-01

    The U.S. Energy Information Administration's Short-Term Energy Outlook (STEO) produces monthly projections of energy supply, demand, trade, and prices over a 13-24 month period. Every January, the forecast horizon is extended through December of the following year. The STEO model is an integrated system of econometric regression equations and identities that link data on the various components of the U.S. energy industry together in order to develop consistent forecasts. The regression equations are estimated and the STEO model is solved using the EViews 9.5 econometric software package from IHS Global Inc. The model consists of various modules specific to each energy resource. All modules provide projections for the United States, and some modules provide more detailed forecasts for different regions of the country.

  3. Model-Based Clustering of Regression Time Series Data via APECM -- An AECM Algorithm Sung to an Even Faster Beat

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Wei-Chen; Maitra, Ranjan

    2011-01-01

    We propose a model-based approach for clustering time series regression data in an unsupervised machine learning framework to identify groups under the assumption that each mixture component follows a Gaussian autoregressive regression model of order p. Given the number of groups, the traditional maximum likelihood approach of estimating the parameters using the expectation-maximization (EM) algorithm can be employed, although it is computationally demanding. The somewhat fast tune to the EM folk song provided by the Alternating Expectation Conditional Maximization (AECM) algorithm can alleviate the problem to some extent. In this article, we develop an alternative partial expectation conditional maximization algorithm (APECM) that uses an additional data augmentation storage step to efficiently implement AECM for finite mixture models. Results of our simulation experiments show improved performance in both the number of iterations and computation time. The methodology is applied to the problem of clustering mutual funds data on the basis of their average annual per cent returns and in the presence of economic indicators.

  4. Harvest-time prediction of apple physiological indices using fiber optic Fourier transform near-infrared spectrometer

    NASA Astrophysics Data System (ADS)

    Liu, Yande; Ying, Yibin; Lu, Huishan; Fu, Xiaping

    2004-12-01

    This work evaluates the feasibility of Fourier transform near infrared (FT-NIR) spectrometry for rapidly determining the total soluble solids content and acidity of apple fruit. Intact apple fruit were measured by reflectance FT-NIR in the 800-2500 nm range. FT-NIR models were developed based on partial least squares (PLS) regression and principal component regression (PCR) with respect to the reflectance and its first derivative, and the logarithm of the reflectance reciprocal and its second derivative. These regression models related the FT-NIR spectra to soluble solids content (SSC), titratable acidity (TA) and available acidity (pH). The best combination, based on the prediction results, was PLS models with respect to the logarithm of the reflectance reciprocal. Predictions with PLS models resulted in standard errors of prediction (SEP) of 0.455, 0.044 and 0.068, and correlation coefficients of 0.968, 0.728 and 0.831 for SSC, TA and pH, respectively. It was concluded that by using the FT-NIR spectrometry measurement system, in the appropriate spectral range, it is possible to nondestructively assess the maturity factors of apple fruit.
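    For readers who want to reproduce the general idea, a minimal sketch of fitting PLS and PCR calibrations to spectra with scikit-learn is shown below; the data are simulated stand-ins, not the apple spectra used in the study, and the number of components is arbitrary.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 200))                               # stand-in for log(1/R) spectra
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=100)    # stand-in for SSC

pls = PLSRegression(n_components=8)
pcr = make_pipeline(PCA(n_components=8), LinearRegression())

for name, model in [("PLS", pls), ("PCR", pcr)]:
    pred = np.ravel(cross_val_predict(model, X, y, cv=5))
    sep = np.sqrt(np.mean((pred - y) ** 2))                   # standard error of prediction
    r = np.corrcoef(pred, y)[0, 1]
    print(f"{name}: SEP = {sep:.3f}, r = {r:.3f}")
```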

  5. The Application of Censored Regression Models in Low Streamflow Analyses

    NASA Astrophysics Data System (ADS)

    Kroll, C.; Luz, J.

    2003-12-01

    Estimation of low streamflow statistics at gauged and ungauged river sites is often a daunting task. This process is further confounded by the presence of intermittent streamflows, where streamflow is sometimes reported as zero, within a region. Streamflows recorded as zero may be zero, or may be less than the measurement detection limit. Such data are often referred to as censored data. Numerous methods have been developed to characterize intermittent streamflow series. Logit regression has been proposed to develop regional models of the probability that annual lowflow series (such as 7-day lowflows) are zero. In addition, Tobit regression, a method of regression that allows for censored dependent variables, has been proposed for lowflow regional regression models in regions where the lowflow statistic of interest is estimated as zero at some sites. While these methods have been proposed, their use in practice has been limited. Here a delete-one jackknife simulation is presented to examine the performance of Logit and Tobit models of 7-day annual minimum flows in 6 USGS water resource regions in the United States. For the Logit model, an assessment is made of whether sites are correctly classified as having at least 10% of 7-day annual lowflows equal to zero. In such a situation, the 7-day, 10-year lowflow (Q710), a commonly employed low streamflow statistic, would be reported as zero. For the Tobit model, a comparison is made between results from the Tobit model, and from performing either ordinary least squares (OLS) or principal component regression (PCR) after the zero sites are dropped from the analysis. Initial results for the Logit model indicate that this method has a high probability of correctly classifying sites into groups with Q710 values of zero and non-zero. Initial results also indicate the Tobit model produces better results than PCR and OLS when more than 5% of the sites in the region have Q710 values calculated as zero.
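    A Tobit regression of the kind mentioned above can be estimated by maximizing a censored-normal likelihood; the sketch below (simulated data, hypothetical basin descriptors) is an illustration of the idea rather than the authors' implementation.

```python
import numpy as np
from scipy import optimize, stats

def tobit_negloglik(params, X, y, c=0.0):
    """Negative log-likelihood of a regression with the response left-censored at c."""
    beta, sigma = params[:-1], np.exp(params[-1])
    mu = X @ beta
    ll = np.where(y <= c,
                  stats.norm.logcdf((c - mu) / sigma),                  # censored observations
                  stats.norm.logpdf((y - mu) / sigma) - np.log(sigma))  # observed flows
    return -ll.sum()

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # intercept + basin descriptors
latent = X @ np.array([0.5, 1.0, -0.7]) + rng.normal(scale=0.8, size=n)
y = np.maximum(latent, 0.0)                                  # low flows reported as zero when censored

res = optimize.minimize(tobit_negloglik, x0=np.zeros(X.shape[1] + 1), args=(X, y))
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
print("coefficients:", beta_hat, "sigma:", sigma_hat)
```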

  6. Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data

    PubMed Central

    Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian

    2015-01-01

    In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low-sample-size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models, and utilize a boosting-type method to handle the high dimensionality as well as to model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213
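    The following is a highly simplified sketch of component-wise L2 boosting for a multivariate response (it is not the authors' algorithm): at each step the single (predictor, response) coefficient that most reduces the residual sum of squares is updated by a shrunken least-squares amount, which yields sparse coefficient paths in the high-dimension, low-sample-size setting.

```python
import numpy as np

def componentwise_boost(X, Y, n_steps=300, nu=0.1):
    """Simplified L2 component-wise boosting for a multivariate response."""
    n = X.shape[0]
    Xc = (X - X.mean(0)) / X.std(0)           # standardized predictors
    B = np.zeros((X.shape[1], Y.shape[1]))    # coefficient matrix
    F = np.tile(Y.mean(0), (n, 1))            # current fit, start at response means
    for _ in range(n_steps):
        R = Y - F                             # residual matrix
        C = Xc.T @ R / n                      # univariate least-squares coefficients
        j, k = np.unravel_index(np.argmax(np.abs(C)), C.shape)
        B[j, k] += nu * C[j, k]               # shrunken update of the best pair
        F[:, k] += nu * C[j, k] * Xc[:, j]
    return B

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 500))                # e.g. copy-number features (stand-in)
Y = X[:, [0, 1]] @ np.array([[2.0, 0.0], [0.0, -1.5]]) + rng.normal(size=(60, 2))
B_hat = componentwise_boost(X, Y)
print("selected predictors:", np.flatnonzero(np.abs(B_hat).sum(axis=1) > 0.1))
```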

  7. SPATIAL ANALYSIS OF AIR POLLUTION AND DEVELOPMENT OF A LAND-USE REGRESSION (LUR) MODEL IN AN URBAN AIRSHED

    EPA Science Inventory

    The Detroit Children's Health Study is an epidemiologic study examining associations between chronic ambient environmental exposures to gaseous air pollutants and respiratory health outcomes among elementary school-age children in an urban airshed. The exposure component of this...

  8. Drought Patterns Forecasting using an Auto-Regressive Logistic Model

    NASA Astrophysics Data System (ADS)

    del Jesus, M.; Sheffield, J.; Méndez Incera, F. J.; Losada, I. J.; Espejo, A.

    2014-12-01

    Drought is characterized by a water deficit that may manifest across a large range of spatial and temporal scales. Drought may create important socio-economic consequences, often of catastrophic dimensions. A quantifiable definition of drought is elusive because, depending on its impacts, consequences and generation mechanism, different water deficit periods may be identified as a drought by virtue of some definitions but not by others. Droughts are linked to the water cycle and, although a climate change signal may not have emerged yet, they are also intimately linked to climate. In this work we develop an auto-regressive logistic model for drought prediction at different temporal scales that makes use of a spatially explicit framework. Our model allows the inclusion of covariates, continuous or categorical, to improve the performance of the auto-regressive component. Our approach makes use of dimensionality reduction (principal component analysis) and classification techniques (K-means and maximum dissimilarity) to simplify the representation of complex climatic patterns, such as sea surface temperature (SST) and sea level pressure (SLP), while including information on their spatial structure, i.e. considering their spatial patterns. This procedure allows us to include in the analysis multivariate representations of complex climatic phenomena, such as the El Niño-Southern Oscillation. We also explore the impact of other climate-related variables such as sun spots. The model allows us to quantify the uncertainty of the forecasts and can be easily adapted to make predictions under future climatic scenarios. The framework herein presented may be extended to other applications such as flash flood analysis, or risk assessment of natural hazards.

  9. Bayesian semi-parametric analysis of Poisson change-point regression models: application to policy making in Cali, Colombia.

    PubMed

    Park, Taeyoung; Krafty, Robert T; Sánchez, Alvaro I

    2012-07-27

    A Poisson regression model with an offset assumes a constant baseline rate after accounting for measured covariates, which may lead to biased estimates of coefficients in an inhomogeneous Poisson process. To correctly estimate the effect of time-dependent covariates, we propose a Poisson change-point regression model with an offset that allows a time-varying baseline rate. When the nonconstant pattern of a log baseline rate is modeled with a nonparametric step function, the resulting semi-parametric model involves a model component of varying dimension and thus requires a sophisticated varying-dimensional inference to obtain correct estimates of model parameters of fixed dimension. To fit the proposed varying-dimensional model, we devise a state-of-the-art MCMC-type algorithm based on partial collapse. The proposed model and methods are used to investigate an association between daily homicide rates in Cali, Colombia and policies that restrict the hours during which the legal sale of alcoholic beverages is permitted. While simultaneously identifying the latent changes in the baseline homicide rate which correspond to the incidence of sociopolitical events, we explore the effect of policies governing the sale of alcohol on homicide rates and seek a policy that balances the economic and cultural dependencies on alcohol sales to the health of the public.
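    The proposed Bayesian change-point machinery is beyond a few lines, but the basic building block described above - a Poisson regression with an offset and a step change in the log baseline rate - can be sketched with statsmodels on simulated data (all names and numbers below are hypothetical, and the change point is treated as known for the sketch).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n_days = 730
day = np.arange(n_days)
policy = (day % 7 >= 5).astype(int)             # stand-in exposure covariate (e.g. restricted sale days)
baseline = np.where(day < 365, -10.5, -10.2)    # latent step change in the log baseline rate
pop = np.full(n_days, 2.2e6)                    # offset: population at risk
counts = rng.poisson(np.exp(baseline + 0.15 * policy + np.log(pop)))
df = pd.DataFrame(dict(counts=counts, policy=policy,
                       period=(day >= 365).astype(int), pop=pop))

# Poisson regression with an offset and a known change point as a step term
fit = smf.glm("counts ~ policy + period", data=df,
              family=sm.families.Poisson(),
              offset=np.log(df["pop"])).fit()
print(fit.summary())
```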

  10. Influence factors and forecast of carbon emission in China: structure adjustment for emission peak

    NASA Astrophysics Data System (ADS)

    Wang, B.; Cui, C. Q.; Li, Z. P.

    2018-02-01

    This paper introduces principal component analysis and a multivariate linear regression model to verify long-term balance relationships between carbon emissions and their impact factors. The integrated model of improved PCA and multivariate regression analysis makes it possible to identify the pattern of carbon emission sources. The main empirical results indicate that, among all selected variables, the role of energy consumption scale was largest. GDP and population follow and also have significant impacts on carbon emissions. Industrialization rate and fossil fuel proportion, the indicators reflecting the economic structure and energy structure, have a higher importance than the urbanization rate and the dweller consumption level of urban areas. On this basis, some suggestions are put forward for the government to achieve the peak of carbon emissions.

  11. Membrane Introduction Mass Spectrometry Combined with an Orthogonal Partial-Least Squares Calibration Model for Mixture Analysis.

    PubMed

    Li, Min; Zhang, Lu; Yao, Xiaolong; Jiang, Xingyu

    2017-01-01

    The emerging membrane introduction mass spectrometry technique has been successfully used to detect benzene, toluene, ethyl benzene and xylene (BTEX), while overlapped spectra have unfortunately hindered its further application to the analysis of mixtures. Multivariate calibration, an efficient method to analyze mixtures, has been widely applied. In this paper, we compared univariate and multivariate analyses for quantification of the individual components of mixture samples. The results showed that the univariate analysis creates poor models with regression coefficients of 0.912, 0.867, 0.440 and 0.351 for BTEX, respectively. For multivariate analysis, a comparison to the partial-least squares (PLS) model shows that the orthogonal partial-least squares (OPLS) regression exhibits an optimal performance with regression coefficients of 0.995, 0.999, 0.980 and 0.976, favorable calibration parameters (RMSEC and RMSECV) and a favorable validation parameter (RMSEP). Furthermore, the OPLS exhibits a good recovery of 73.86 - 122.20% and relative standard deviation (RSD) of the repeatability of 1.14 - 4.87%. Thus, MIMS coupled with the OPLS regression provides an optimal approach for a quantitative BTEX mixture analysis in monitoring and predicting water pollution.

  12. Statistical downscaling modeling with quantile regression using lasso to estimate extreme rainfall

    NASA Astrophysics Data System (ADS)

    Santri, Dewi; Wigena, Aji Hamim; Djuraidah, Anik

    2016-02-01

    Rainfall is one of the climatic elements with high diversity and has many negative impacts, especially extreme rainfall. Therefore, several methods are required to minimize the damage that may occur. So far, global circulation models (GCMs) are the best method to forecast global climate changes, including extreme rainfall. Statistical downscaling (SD) is a technique to develop the relationship between GCM output as global-scale independent variables and rainfall as a local-scale response variable. Using the GCM output directly is difficult when assessed against observations because the GCM output has high dimension and multicollinearity between the variables. The common methods used to handle this problem are principal component analysis (PCA) and partial least squares regression. A newer method that can be used is the lasso. The lasso has advantages in simultaneously controlling the variance of the fitted coefficients and performing automatic variable selection. Quantile regression is a method that can be used to detect extreme rainfall in the dry and wet extremes. The objective of this study is to model SD using quantile regression with the lasso to predict extreme rainfall in Indramayu. The results showed that extreme rainfall (extreme wet in January, February and December) in Indramayu could be predicted properly by the model at the 90th quantile.
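    scikit-learn's QuantileRegressor combines the pinball (quantile) loss with an L1 penalty, so a lasso-type quantile regression at the 90th percentile can be sketched as follows (simulated stand-in data, arbitrary penalty strength):

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 40))                  # stand-in for GCM grid-cell predictors
y = 5 + 2 * X[:, 0] + rng.gamma(2.0, 2.0, 300)  # skewed local rainfall (stand-in)

# Pinball loss at the 90th quantile with lasso-type (L1) shrinkage of the coefficients
model = make_pipeline(StandardScaler(),
                      QuantileRegressor(quantile=0.9, alpha=0.05, solver="highs"))
model.fit(X, y)
coefs = model[-1].coef_
print("non-zero predictors:", np.flatnonzero(np.abs(coefs) > 1e-8))
```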

  13. [Optimization of processing technology for semen cuscuta by uniform and regression analysis].

    PubMed

    Li, Chun-yu; Luo, Hui-yu; Wang, Shu; Zhai, Ya-nan; Tian, Shu-hui; Zhang, Dan-shen

    2011-02-01

    To optimize the preparation technology for the contents of total flavonoids and polysaccharides and the percentages of water- and alcohol-soluble components in Semen Cuscuta herb processing. UV-spectrophotometry was applied to determine the contents of total flavonoids and polysaccharides extracted from Semen Cuscuta, and the processing was optimized by means of uniform design and contour maps. The best preparation technology was obtained under the following conditions: baking temperature 150 degrees C, baking time 140 seconds. The regression models are notable and reasonable, and can forecast results precisely.

  14. A hybrid model for PM₂.₅ forecasting based on ensemble empirical mode decomposition and a general regression neural network.

    PubMed

    Zhou, Qingping; Jiang, Haiyan; Wang, Jianzhou; Zhou, Jianling

    2014-10-15

    Exposure to high concentrations of fine particulate matter (PM₂.₅) can cause serious health problems because PM₂.₅ contains microscopic solid or liquid droplets that are sufficiently small to be ingested deep into human lungs. Thus, daily prediction of PM₂.₅ levels is notably important for regulatory plans that inform the public and restrict social activities in advance when harmful episodes are foreseen. A hybrid EEMD-GRNN (ensemble empirical mode decomposition-general regression neural network) model based on data preprocessing and analysis is firstly proposed in this paper for one-day-ahead prediction of PM₂.₅ concentrations. The EEMD part is utilized to decompose original PM₂.₅ data into several intrinsic mode functions (IMFs), while the GRNN part is used for the prediction of each IMF. The hybrid EEMD-GRNN model is trained using input variables obtained from principal component regression (PCR) model to remove redundancy. These input variables accurately and succinctly reflect the relationships between PM₂.₅ and both air quality and meteorological data. The model is trained with data from January 1 to November 1, 2013 and is validated with data from November 2 to November 21, 2013 in Xi'an Province, China. The experimental results show that the developed hybrid EEMD-GRNN model outperforms a single GRNN model without EEMD, a multiple linear regression (MLR) model, a PCR model, and a traditional autoregressive integrated moving average (ARIMA) model. The hybrid model with fast and accurate results can be used to develop rapid air quality warning systems. Copyright © 2014 Elsevier B.V. All rights reserved.

  15. Study on Hyperspectral Estimation Model of Total Nitrogen Content in Soil of Shaanxi Province

    NASA Astrophysics Data System (ADS)

    Liu, Jinbao; Dong, Zhenyu; Chen, Xi

    2018-01-01

    The development of hyperspectral remote sensing technology has been widely used in soil nutrient prediction. The studied soil is a representative soil type of Shaanxi Province. In this study, the soil total nitrogen content in Shaanxi soil was used as the research target, and the reflectance spectra of the soil samples were measured with an ASD spectrometer. After pre-treatment, the first-order differential, second-order differential and reciprocal logarithmic transformations of the reflectance were calculated, and hyperspectral estimation models were established using the least squares regression method and the principal component regression method. The results show that these transformations significantly improve the correlation between the reflectance spectrum and the total nitrogen content of the soil. The correlation between the original reflectance and soil total nitrogen content was examined over the 350~2500 nm range. The correlation coefficient between soil total nitrogen content and the first derivative of reflectance is more than 0.5 at 142 nm, 1963 nm, 2204 nm and 2307 nm, and the second derivative has significant positive correlations at 1114 nm, 1470 nm, 1967 nm, 2372 nm and 2402 nm, respectively. Correlation analysis after the reciprocal logarithmic transformation of the reflectance showed no obvious improvement. The best model achieved Rc2 = 0.7102 and RMSEC = 0.0788 in calibration, and Rv2 = 0.8480 and RMSEP = 0.0663 in validation, which can achieve rapid prediction of the total nitrogen content in the region. The results show that the principal component regression model is the best.

  16. MRI morphometry of mamillary bodies, caudate nuclei, and prefrontal cortices after chemotherapy for childhood leukemia: multivariate models of early and late developing memory subsystems.

    PubMed

    Ciesielski, K T; Lesnik, P G; Benzel, E C; Hart, B L; Sanders, J A

    1999-06-01

    Neurotoxic intrathecal chemotherapy for childhood acute lymphoblastic leukemia (ALL) affects developing structures and functions of memory and learning subsystems selectively. Results show: significant reductions in magnetic resonance imaging morphometry of the mamillary bodies, components of the corticolimbic-diencephalic subsystem subserving functionally later developing, single-trial memory; nonsignificant changes in the bilateral heads of the caudate nuclei, components of the corticostriatal subsystem subserving functionally earlier developing, multitrial learning; significant reductions in prefrontal cortical volume; visual and verbal single-trial memory deficits; and visuospatial, but not verbal, multitrial learning deficits. Multiple regression models provide evidence for partial dissociation and connectivity between the subsystems, and suggest that greater involvement of the caudate may compensate for inefficient corticolimbic-diencephalic components.

  17. Soil moisture sensitivity of autotrophic and heterotrophic forest floor respiration in boreal xeric pine and mesic spruce forests

    NASA Astrophysics Data System (ADS)

    Ťupek, Boris; Launiainen, Samuli; Peltoniemi, Mikko; Heikkinen, Jukka; Lehtonen, Aleksi

    2016-04-01

    In most process-based soil carbon models, litter decomposition rates are modified by environmental conditions, linked with soil heterotrophic CO2 emissions, and used to estimate soil carbon sequestration; thus, by the mass balance equation, the variation in measured litter inputs and measured heterotrophic soil CO2 effluxes should indicate the soil carbon stock changes needed by soil carbon management for mitigation of anthropogenic CO2 emissions, provided that the sensitivity functions of the applied model suit the environmental conditions, e.g. soil temperature and moisture. We evaluated the response forms of autotrophic and heterotrophic forest floor respiration to soil temperature and moisture in four boreal forest sites of the International Cooperative Programme on Assessment and Monitoring of Air Pollution Effects on Forests (ICP Forests) with a soil trenching experiment during 2015 in southern Finland. As expected, both autotrophic and heterotrophic forest floor respiration components were primarily controlled by soil temperature, and exponential regression models generally explained more than 90% of the variance. Soil moisture regression models on average explained less than 10% of the variance, and the response forms varied between Gaussian for the autotrophic forest floor respiration component and linear for the heterotrophic forest floor respiration component. Although the percentage of variance in soil heterotrophic respiration explained by soil moisture was small, the observed reduction of CO2 emissions at higher moisture levels suggested that the soil moisture response of soil carbon models not accounting for the reduction due to excessive moisture should be re-evaluated in order to estimate correct levels of soil carbon stock changes. Our further study will include evaluation of process-based soil carbon models against the annual heterotrophic respiration and soil carbon stocks.
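    The two response forms reported above (exponential in temperature, Gaussian versus linear in moisture) can be illustrated with scipy's curve_fit on synthetic data; the functional forms are standard choices and all numbers are invented for the sketch.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(6)
temp = rng.uniform(2, 20, 150)                 # soil temperature (degC)
moist = rng.uniform(0.05, 0.45, 150)           # volumetric soil moisture

def exp_temp(t, r10, q10):                     # exponential (Q10-type) temperature response
    return r10 * q10 ** ((t - 10.0) / 10.0)

def gauss_moist(m, a, m_opt, s):               # Gaussian response with a moisture optimum
    return a * np.exp(-0.5 * ((m - m_opt) / s) ** 2)

# synthetic respiration: temperature-driven with a moisture optimum near 0.25
resp = exp_temp(temp, 2.0, 2.3) * gauss_moist(moist, 1.0, 0.25, 0.12) \
       + rng.normal(0, 0.1, temp.size)

pt, _ = curve_fit(exp_temp, temp, resp, p0=[1.0, 2.0])
ratio = resp / exp_temp(temp, *pt)             # moisture signal after removing the temperature effect
pg, _ = curve_fit(gauss_moist, moist, ratio, p0=[1.0, 0.2, 0.1])
pl = np.polyfit(moist, ratio, 1)               # linear alternative (heterotrophic-type response)
print("Q10:", pt[1], "moisture optimum:", pg[1], "linear slope:", pl[0])
```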

  18. Structural Time Series Model for El Niño Prediction

    NASA Astrophysics Data System (ADS)

    Petrova, Desislava; Koopman, Siem Jan; Ballester, Joan; Rodo, Xavier

    2015-04-01

    ENSO is a dominant feature of climate variability on inter-annual time scales, destabilizing weather patterns throughout the globe and having far-reaching socio-economic consequences. It not only leads to extensive rainfall and flooding in some regions of the world and anomalous droughts in others, thus ruining local agriculture, but also substantially affects marine ecosystems and the sustained exploitation of marine resources in particular coastal zones, especially the Pacific South American coast. As a result, forecasting of ENSO, and especially of the warm phase of the oscillation (El Niño/EN), has long been a subject of intense research and improvement. The present study explores a novel method for the prediction of the Niño 3.4 index: the advantageous statistical modeling approach of structural time series analysis, which has not previously been applied to this problem. We have developed such a model using a state space approach for the unobserved components of the time series. Its distinguishing feature is that observations consist of various components - level, seasonality, cycle, disturbance, and regression variables incorporated as explanatory covariates. These components are aimed at capturing the various modes of variability of the N3.4 time series. They are modeled separately, then combined in a single model for analysis and forecasting. Customary statistical ENSO prediction models essentially use SST, SLP and wind stress in the equatorial Pacific. We introduce new regression variables - subsurface ocean temperature in the western equatorial Pacific, motivated by recent (Ramesh and Murtugudde, 2012) and classical research (Jin, 1997; Wyrtki, 1985) showing that subsurface processes and heat accumulation there are fundamental for the initiation of an El Niño event, and a southern Pacific temperature-difference tracer, the Rossbell dipole, leading EN by about nine months (Ballester, 2011).
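    statsmodels offers a structural (unobserved components) time series model of this general flavour; the sketch below fits a local level plus stochastic cycle with one lagged exogenous regressor to synthetic data, with the subsurface-temperature covariate only as a hypothetical stand-in for the predictors named above.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 360                                             # 30 years of monthly data (synthetic)
t = np.arange(n)
subsurface = np.sin(2 * np.pi * t / 48) + rng.normal(0, 0.3, n)   # stand-in subsurface tracer
nino34 = 0.6 * np.roll(subsurface, 6) + rng.normal(0, 0.4, n)     # lagged synthetic response

# local level + stochastic cycle + regression on the lagged subsurface covariate
exog = pd.DataFrame({"subsurface_lag6": np.roll(subsurface, 6)})
model = sm.tsa.UnobservedComponents(nino34, level="local level",
                                    cycle=True, stochastic_cycle=True,
                                    exog=exog)
res = model.fit(disp=False)
print(res.summary())
# 9-month-ahead forecast; the future covariate values here are placeholders only
forecast = res.forecast(steps=9, exog=exog.iloc[-9:])
```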

  19. QSPR models for half-wave reduction potential of steroids: a comparative study between feature selection and feature extraction from subsets of or entire set of descriptors.

    PubMed

    Hemmateenejad, Bahram; Yazdani, Mahdieh

    2009-02-16

    Steroids are widely distributed in nature and are found in plants, animals, and fungi in abundance. A data set consisting of a diverse set of steroids has been used to develop quantitative structure-electrochemistry relationship (QSER) models for their half-wave reduction potential. Modeling was established by means of multiple linear regression (MLR) and principal component regression (PCR) analyses. In the MLR analysis, the QSPR models were constructed by first grouping descriptors and then stepwise selection of variables from each group (MLR1), and by stepwise selection of predictor variables from the pool of all calculated descriptors (MLR2). A similar procedure was used in the PCR analysis, so that the principal components (or features) were extracted from the different groups of descriptors (PCR1) and from the entire set of descriptors (PCR2). The resulting models were evaluated using cross-validation, chance correlation, application to the prediction of the reduction potential of some test samples, and assessment of the applicability domain. Both MLR approaches gave accurate results; however, the QSPR model found by MLR1 was statistically more significant. The PCR1 approach produced a model as accurate as the MLR approaches, whereas less accurate results were obtained by the PCR2 approach. Overall, the correlation coefficients of cross-validation and prediction of the QSPR models resulting from the MLR1, MLR2 and PCR1 approaches were higher than 90%, which shows the high ability of the models to predict the reduction potential of the studied steroids.

  20. Independent variable complexity for regional regression of the flow duration curve in ungauged basins

    NASA Astrophysics Data System (ADS)

    Fouad, Geoffrey; Skupin, André; Hope, Allen

    2016-04-01

    The flow duration curve (FDC) is one of the most widely used tools to quantify streamflow. Its percentile flows are often required for water resource applications, but these values must be predicted for ungauged basins with insufficient or no streamflow data. Regional regression is a commonly used approach for predicting percentile flows that involves identifying hydrologic regions and calibrating regression models to each region. The independent variables used to describe the physiographic and climatic setting of the basins are a critical component of regional regression, yet few studies have investigated their effect on resulting predictions. In this study, the complexity of the independent variables needed for regional regression is investigated. Different levels of variable complexity are applied for a regional regression consisting of 918 basins in the US. Both the hydrologic regions and regression models are determined according to the different sets of variables, and the accuracy of resulting predictions is assessed. The different sets of variables include (1) a simple set of three variables strongly tied to the FDC (mean annual precipitation, potential evapotranspiration, and baseflow index), (2) a traditional set of variables describing the average physiographic and climatic conditions of the basins, and (3) a more complex set of variables extending the traditional variables to include statistics describing the distribution of physiographic data and temporal components of climatic data. The latter set of variables is not typically used in regional regression, and is evaluated for its potential to predict percentile flows. The simplest set of only three variables performed similarly to the other more complex sets of variables. Traditional variables used to describe climate, topography, and soil offered little more to the predictions, and the experimental set of variables describing the distribution of basin data in more detail did not improve predictions. These results are largely reflective of cross-correlation existing in hydrologic datasets, and highlight the limited predictive power of many traditionally used variables for regional regression. A parsimonious approach including fewer variables chosen based on their connection to streamflow may be more efficient than a data mining approach including many different variables. Future regional regression studies may benefit from having a hydrologic rationale for including different variables and attempting to create new variables related to streamflow.

  1. Multivariate methods for indoor PM10 and PM2.5 modelling in naturally ventilated schools buildings

    NASA Astrophysics Data System (ADS)

    Elbayoumi, Maher; Ramli, Nor Azam; Md Yusof, Noor Faizah Fitri; Yahaya, Ahmad Shukri Bin; Al Madhoun, Wesam; Ul-Saufie, Ahmed Zia

    2014-09-01

    In this study, PM10, PM2.5, CO and CO2 concentrations and meteorological variables (wind speed, air temperature, and relative humidity) were employed to predict the annual and seasonal indoor concentrations of PM10 and PM2.5 using multivariate statistical methods. The data were collected in twelve naturally ventilated schools in the Gaza Strip (Palestine) from October 2011 to May 2012 (academic year). Bivariate correlation analysis showed that indoor PM10 and PM2.5 were highly positively correlated with the outdoor concentrations of PM10 and PM2.5. Further, multiple linear regression (MLR) was used for modelling, and R2 values were determined as 0.62 and 0.84 for indoor PM10 and PM2.5, respectively. The performance indicators of the MLR models indicated that the predictions of the annual PM10 and PM2.5 models were better than those of the seasonal models. In order to reduce the number of input variables, principal component analysis (PCA) and principal component regression (PCR) were applied using annual data. The predicted R2 values were 0.40 and 0.73 for PM10 and PM2.5, respectively. The PM10 models (MLR and PCR) show a tendency to underestimate indoor PM10 concentrations because they do not take into account the occupants' activities, which strongly affect the indoor concentrations during class hours.

  2. A Questionnaire-Wide Association Study of Personality and Mortality: The Vietnam Experience Study

    PubMed Central

    Weiss, Alexander; Gale, Catharine R.; Batty, G. David; Deary, Ian J.

    2013-01-01

    Objective We examined the association between the Minnesota Multiphasic Personality Inventory (MMPI) and all-cause mortality in 4462 middle-aged Vietnam-era veterans. Methods We split the study population into half samples. In each half, we used proportional hazards (Cox) regression to test the 550 MMPI items’ associations with mortality over 15 years. In all participants, we subjected significant (p < .01) items in both halves to principal-components analysis (PCA). We used Cox regression to test whether these components predicted mortality when controlling for other predictors (demographics, cognitive ability, health behaviors, mental/physical health). Results Eighty-nine items were associated with mortality in both half-samples. PCA revealed Neuroticism/Negative Affectivity, Somatic Complaints, Psychotic/Paranoia, and Antisocial components, and a higher-order component, Personal Disturbance. Individually, Neuroticism/Negative Affectivity (HR = 1.55, 95% CI = 1.39,1.72), Somatic Complaints (HR = 1.66; 95% CI = 1.52,1.80), Psychotic/Paranoid (HR = 1.44; 95% CI = 1.32,1.57), Antisocial (HR = 1.79; 95% CI = 1.59,2.01), and Personal Disturbance (HR = 1.74; 95% CI = 1.58,1.91) were associated with risk. Including covariates attenuated these associations (28.4 to 54.5%), though they were still significant. After entering Personal Disturbance into models with each component, Neuroticism/Negative Affectivity and Somatic Complaints were significant, although Neuroticism/Negative Affectivity’s were now protective (HR = 0.73, 95% CI = 0.58,0.92). When the four components were entered together with or without covariates, Somatic Complaints and Antisocial were significant risk factors. Conclusions Somatic Complaints and Personal Disturbance are associated with increased mortality risk. Other components’ effects varied as a function of variables in the model. PMID:23731751

  3. Orthogonal decomposition of left ventricular remodeling in myocardial infarction

    PubMed Central

    Zhang, Xingyu; Medrano-Gracia, Pau; Ambale-Venkatesh, Bharath; Bluemke, David A.; Cowan, Brett R; Finn, J. Paul; Kadish, Alan H.; Lee, Daniel C.; Lima, Joao A. C.; Young, Alistair A.; Suinesiaputra, Avan

    2017-01-01

    Abstract Left ventricular size and shape are important for quantifying cardiac remodeling in response to cardiovascular disease. Geometric remodeling indices have been shown to have prognostic value in predicting adverse events in the clinical literature, but these often describe interrelated shape changes. We developed a novel method for deriving orthogonal remodeling components directly from any (moderately independent) set of clinical remodeling indices. Results: Six clinical remodeling indices (end-diastolic volume index, sphericity, relative wall thickness, ejection fraction, apical conicity, and longitudinal shortening) were evaluated using cardiac magnetic resonance images of 300 patients with myocardial infarction, and 1991 asymptomatic subjects, obtained from the Cardiac Atlas Project. Partial least squares (PLS) regression of left ventricular shape models resulted in remodeling components that were optimally associated with each remodeling index. A Gram–Schmidt orthogonalization process, by which remodeling components were successively removed from the shape space in the order of shape variance explained, resulted in a set of orthonormal remodeling components. Remodeling scores could then be calculated that quantify the amount of each remodeling component present in each case. A one-factor PLS regression led to more decoupling between scores from the different remodeling components across the entire cohort, and zero correlation between clinical indices and subsequent scores. Conclusions: The PLS orthogonal remodeling components had similar power to describe differences between myocardial infarction patients and asymptomatic subjects as principal component analysis, but were better associated with well-understood clinical indices of cardiac remodeling. The data and analyses are available from www.cardiacatlas.org. PMID:28327972
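    The Gram–Schmidt step described above - successively removing each remodeling direction from the shape space and renormalising - can be sketched in a few lines; the shape dimension and component vectors below are random stand-ins, not the PLS directions from the study.

```python
import numpy as np

def gram_schmidt(components):
    """Successively orthonormalise remodeling component vectors (shape-space directions)."""
    ortho = []
    for v in components:
        w = v.astype(float).copy()
        for u in ortho:
            w -= (w @ u) * u                 # remove the part already explained
        norm = np.linalg.norm(w)
        if norm > 1e-12:
            ortho.append(w / norm)
    return np.array(ortho)

rng = np.random.default_rng(8)
shape_dim = 50
# stand-ins for directions associated with EDVI, sphericity, ejection fraction, ...
raw = rng.normal(size=(6, shape_dim))
Q = gram_schmidt(raw)
print(np.round(Q @ Q.T, 6))                  # identity: components are orthonormal

# remodeling scores: projection of each subject's shape vector onto the components
shapes = rng.normal(size=(300, shape_dim))
scores = shapes @ Q.T
```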

  4. A metabolism-based whole lake eutrophication model to estimate the magnitude and time scales of the effects of restoration in Upper Klamath Lake, south-central Oregon

    USGS Publications Warehouse

    Wherry, Susan A.; Wood, Tamara M.

    2018-04-27

    A whole lake eutrophication (WLE) model approach for phosphorus and cyanobacterial biomass in Upper Klamath Lake, south-central Oregon, is presented here. The model is a successor to a previous model developed to inform a Total Maximum Daily Load (TMDL) for phosphorus in the lake, but is based on net primary production (NPP), which can be calculated from dissolved oxygen, rather than scaling up a small-scale description of cyanobacterial growth and respiration rates. This phase 3 WLE model is a refinement of the proof-of-concept developed in phase 2, which was the first attempt to use NPP to simulate cyanobacteria in the TMDL model. The calibration of the calculated NPP WLE model was successful, with performance metrics indicating a good fit to calibration data, and the calculated NPP WLE model was able to simulate mid-season bloom decreases, a feature that previous models could not reproduce. In order to use the model to simulate future scenarios based on phosphorus load reduction, a multivariate regression model was created to simulate NPP as a function of the model state variables (phosphorus and chlorophyll a) and measured meteorological and temperature model inputs. The NPP time series was split into a low- and high-frequency component using wavelet analysis, and regression models were fit to the components separately, with moderate success. The regression models for NPP were incorporated in the WLE model, referred to as the “scenario” WLE (SWLE), and the fit statistics for phosphorus during the calibration period were mostly unchanged. The fit statistics for chlorophyll a, however, were degraded. These statistics are still an improvement over prior models, and indicate that the SWLE is appropriate for long-term predictions even though it misses some of the seasonal variations in chlorophyll a. The complete whole lake SWLE model, with multivariate regression to predict NPP, was used to make long-term simulations of the response to 10-, 20-, and 40-percent reductions in tributary nutrient loads. The long-term mean water column concentration of total phosphorus was reduced by 9, 18, and 36 percent, respectively, in response to these load reductions. The long-term water column chlorophyll a concentration was reduced by 4, 13, and 44 percent, respectively. The adjustment to a new equilibrium between the water column and sediments occurred over about 30 years.

  5. Quantitative structure-activity relationship of the curcumin-related compounds using various regression methods

    NASA Astrophysics Data System (ADS)

    Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi

    2016-03-01

    Quantitative structure-activity relationships were used to study a series of curcumin-related compounds with inhibitory effects on prostate cancer PC-3 cells, pancreas cancer Panc-1 cells, and colon cancer HT-29 cells. The sphere exclusion method was used to split the data set into train and test sets. Multiple linear regression, principal component regression and partial least squares were used as the regression methods. On the other hand, to investigate the effect of feature selection methods, stepwise selection, genetic algorithm, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise selection (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43; Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise selection (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) is the best method. The QSAR study reveals descriptors which have a crucial role in the inhibitory property of curcumin-like compounds. 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors with the greatest effect. To design and optimize novel efficient curcumin-related compounds, it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur atoms into the chemical structure (reducing the contribution of the T_C_C_1 descriptor) and to increase the contributions of the 6ChainCount and T_O_O_7 descriptors. The models can be useful in the better design of some novel curcumin-related compounds that can be used in the treatment of prostate, pancreas, and colon cancers.

  6. Hot Swapping Protocol Implementations in the OPNET Modeler Development Environment

    DTIC Science & Technology

    2008-03-01

    components. Unfortunately, this style is not efficient or particularly human–readable. Even purely pedagogical scenarios consisting of a client and a...definition provided by the mock object. sion of this kernel procedure steers all packets sent with op pk deliver() to the unit testing’s specialized...forms of development. Moreover, batteries of unit tests could ship with the accompanying process models and serve as robust regression tests

  7. A regression approach to the mapping of bio-physical characteristics of surface sediment using in situ and airborne hyperspectral acquisitions

    NASA Astrophysics Data System (ADS)

    Ibrahim, Elsy; Kim, Wonkook; Crawford, Melba; Monbaliu, Jaak

    2017-02-01

    Remote sensing has been successfully utilized to distinguish and quantify sediment properties in the intertidal environment. Classification approaches of imagery are popular and powerful yet can lead to site- and case-specific results. Such specificity creates challenges for temporal studies. Thus, this paper investigates the use of regression models to quantify sediment properties instead of classifying them. Two regression approaches, namely multiple regression (MR) and support vector regression (SVR), are used in this study for the retrieval of bio-physical variables of the intertidal surface sediment of the IJzermonding, a Belgian nature reserve. In the regression analysis, mud content, chlorophyll a concentration, organic matter content, and soil moisture are estimated using radiometric variables of two airborne sensors, namely the airborne hyperspectral sensor (AHS) and the airborne prism experiment (APEX), and using field hyperspectral acquisitions by an analytical spectral device (ASD). The performance of the two regression approaches is best for the estimation of moisture content. SVR attains the highest accuracy without feature reduction, while MR achieves good results when feature reduction is carried out. Sediment property maps are successfully obtained using the models and hyperspectral imagery, where SVR used with all bands achieves the best performance. The study also involves the extraction of weights identifying the contribution of each band of the images in the quantification of each sediment property when MR and principal component analysis are used.
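    A minimal comparison of multiple regression and support vector regression of the kind used here can be set up with scikit-learn as below; the band matrix and the target are simulated stand-ins and the SVR hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(9)
X = rng.normal(size=(150, 60))                  # stand-in hyperspectral band reflectances
y = np.tanh(X[:, 5]) + 0.3 * X[:, 10] + rng.normal(0, 0.1, 150)   # e.g. moisture content

mr = make_pipeline(StandardScaler(), LinearRegression())
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))

for name, model in [("MR", mr), ("SVR", svr)]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: mean CV R2 = {r2.mean():.3f}")
```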

  8. Estimating overall exposure effects for the clustered and censored outcome using random effect Tobit regression models.

    PubMed

    Wang, Wei; Griswold, Michael E

    2016-11-30

    The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference of overall exposure effects on the original outcome scale. Marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for the clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal space and boundary components of the censored response to estimate overall exposure effects at population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure status in a designated reference group by integrating over the random effects and then use the calculated difference to assess the overall exposure effect. The maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration of the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd.

  9. Modified multiblock partial least squares path modeling algorithm with backpropagation neural networks approach

    NASA Astrophysics Data System (ADS)

    Yuniarto, Budi; Kurniawan, Robert

    2017-03-01

    PLS Path Modeling (PLS-PM) is different from covariance-based SEM in that PLS-PM uses an approach based on variance or components; therefore, PLS-PM is also known as component-based SEM. Multiblock Partial Least Squares (MBPLS) is a method in PLS regression which can be used in PLS Path Modeling, known as Multiblock PLS Path Modeling (MBPLS-PM). This method uses an iterative procedure in its algorithm. This research aims to modify MBPLS-PM with a Back Propagation Neural Network approach. The result is that the MBPLS-PM algorithm can be modified using the Back Propagation Neural Network approach to replace the iterative process in the backward and forward steps used to obtain the matrix t and the matrix u in the algorithm. By modifying the MBPLS-PM algorithm using the Back Propagation Neural Network approach, the model parameters obtained are not significantly different from the model parameters obtained by the original MBPLS-PM algorithm.

  10. Inferring the use of forelimb suspensory locomotion by extinct primate species via shape exploration of the ulna.

    PubMed

    Rein, Thomas R; Harvati, Katerina; Harrison, Terry

    2015-01-01

    Uncovering links between skeletal morphology and locomotor behavior is an essential component of paleobiology because it allows researchers to infer the locomotor repertoire of extinct species based on preserved fossils. In this study, we explored ulnar shape in anthropoid primates using 3D geometric morphometrics to discover novel aspects of shape variation that correspond to observed differences in the relative amount of forelimb suspensory locomotion performed by species. The ultimate goal of this research was to construct an accurate predictive model that can be applied to infer the significance of these behaviors. We studied ulnar shape variation in extant species using principal component analysis. Species mainly clustered into phylogenetic groups along the first two principal components. Upon closer examination, the results showed that the position of species within each major clade corresponded closely with the proportion of forelimb suspensory locomotion that they have been observed to perform in nature. We used principal component regression to construct a predictive model for the proportion of these behaviors that would be expected to occur in the locomotor repertoire of anthropoid primates. We then applied this regression analysis to Pliopithecus vindobonensis, a stem catarrhine from the Miocene of central Europe, and found strong evidence that this species was adapted to perform a proportion of forelimb suspensory locomotion similar to that observed in the extant woolly monkey, Lagothrix lagothricha. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. Tests of Alignment among Assessment, Standards, and Instruction Using Generalized Linear Model Regression

    ERIC Educational Resources Information Center

    Fulmer, Gavin W.; Polikoff, Morgan S.

    2014-01-01

    An essential component in school accountability efforts is for assessments to be well-aligned with the standards or curriculum they are intended to measure. However, relatively little prior research has explored methods to determine statistical significance of alignment or misalignment. This study explores analyses of alignment as a special case…

  12. Global Potential Net Primary Production Predicted from Vegetation Class, Precipitation, and Temperature

    USDA-ARS?s Scientific Manuscript database

    Net Primary Production (NPP), the difference between CO2 fixed by photosynthesis and CO2 lost to autotrophic respiration, is one of the most important components of the carbon cycle. Our goal was to develop a simple regression model to estimate global NPP using climate and land cover data. Approxima...

  13. Nonparametric regression applied to quantitative structure-activity relationships

    PubMed

    Constans; Hirst

    2000-03-01

    Several nonparametric regressors have been applied to modeling quantitative structure-activity relationship (QSAR) data. The simplest regressor, the Nadaraya-Watson, was assessed in a genuine multivariate setting. Other regressors, the local linear and the shifted Nadaraya-Watson, were implemented within additive models--a computationally more expedient approach, better suited for low-density designs. Performances were benchmarked against the nonlinear method of smoothing splines. A linear reference point was provided by multilinear regression (MLR). Variable selection was explored using systematic combinations of different variables and combinations of principal components. For the data set examined, 47 inhibitors of dopamine beta-hydroxylase, the additive nonparametric regressors have greater predictive accuracy (as measured by the mean absolute error of the predictions or the Pearson correlation in cross-validation trials) than MLR. The use of principal components did not improve the performance of the nonparametric regressors over use of the original descriptors, since the original descriptors are not strongly correlated. It remains to be seen if the nonparametric regressors can be successfully coupled with better variable selection and dimensionality reduction in the context of high-dimensional QSARs.
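    The Nadaraya-Watson regressor is easy to state directly: the prediction is a kernel-weighted average of the training responses. The sketch below uses a Gaussian kernel and leave-one-out validation on simulated descriptors (a stand-in for the 47-compound data set, with an arbitrary bandwidth).

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=1.0):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (multivariate inputs)."""
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(-1)
    w = np.exp(-0.5 * d2 / bandwidth ** 2)
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(10)
X = rng.normal(size=(47, 5))              # e.g. 47 compounds, 5 descriptors (stand-in)
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 47)

# leave-one-out cross-validation of the predictions
preds = np.array([
    nadaraya_watson(np.delete(X, i, 0), np.delete(y, i), X[i:i + 1], bandwidth=1.5)[0]
    for i in range(len(y))
])
print("LOO mean absolute error:", np.abs(preds - y).mean())
```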

  14. Robust colour calibration of an imaging system using a colour space transform and advanced regression modelling.

    PubMed

    Jackman, Patrick; Sun, Da-Wen; Elmasry, Gamal

    2012-08-01

    A new algorithm for the conversion of device dependent RGB colour data into device independent L*a*b* colour data without introducing noticeable error has been developed. By combining a linear colour space transform and advanced multiple regression methodologies it was possible to predict L*a*b* colour data with less than 2.2 colour units of error (CIE 1976). By transforming the red, green and blue colour components into new variables that better reflect the structure of the L*a*b* colour space, a low colour calibration error was immediately achieved (ΔE(CAL) = 14.1). Application of a range of regression models on the data further reduced the colour calibration error substantially (multilinear regression ΔE(CAL) = 5.4; response surface ΔE(CAL) = 2.9; PLSR ΔE(CAL) = 2.6; LASSO regression ΔE(CAL) = 2.1). Only the PLSR models deteriorated substantially under cross validation. The algorithm is adaptable and can be easily recalibrated to any working computer vision system. The algorithm was tested on a typical working laboratory computer vision system and delivered only a very marginal loss of colour information ΔE(CAL) = 2.35. Colour features derived on this system were able to safely discriminate between three classes of ham with 100% correct classification whereas colour features measured on a conventional colourimeter were not. Copyright © 2012 Elsevier Ltd. All rights reserved.

  15. Combinations of NIR, Raman spectroscopy and physicochemical measurements for improved monitoring of solvent extraction processes using hierarchical multivariate analysis models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nee, K.; Bryan, S.; Levitskaia, T.

    The reliability of chemical processes can be greatly improved by implementing inline monitoring systems. Combining multivariate analysis with non-destructive sensors can enhance the process without interfering with the operation. Here, we present hierarchical models using both principal component analysis and partial least squares analysis developed for different chemical components representative of solvent extraction process streams. A training set of 380 samples and an external validation set of 95 samples were prepared, and near-infrared and Raman spectral data as well as conductivity under variable temperature conditions were collected. The results from the models indicate that careful selection of the spectral range is important. By compressing the data through Principal Component Analysis (PCA), we lower the rank of the data set to its most dominant features while maintaining the key principal components to be used in the regression analysis. Within the studied data set, concentrations of five chemical components were modeled: total nitrate (NO3-), total acid (H+), neodymium (Nd3+), sodium (Na+), and ionic strength (I.S.). The best overall model prediction for each of the species studied used a combined data set comprised of complementary techniques including NIR, Raman, and conductivity. Finally, our study shows that chemometric models are powerful but require a significant amount of carefully analyzed data to capture variations in the chemistry.

  16. Combinations of NIR, Raman spectroscopy and physicochemical measurements for improved monitoring of solvent extraction processes using hierarchical multivariate analysis models

    DOE PAGES

    Nee, K.; Bryan, S.; Levitskaia, T.; ...

    2017-12-28

    The reliability of chemical processes can be greatly improved by implementing inline monitoring systems. Combining multivariate analysis with non-destructive sensors can enhance the process without interfering with the operation. Here, we present hierarchical models using both principal component analysis and partial least squares analysis developed for different chemical components representative of solvent extraction process streams. A training set of 380 samples and an external validation set of 95 samples were prepared, and near-infrared and Raman spectral data as well as conductivity under variable temperature conditions were collected. The results from the models indicate that careful selection of the spectral range is important. By compressing the data through Principal Component Analysis (PCA), we lower the rank of the data set to its most dominant features while maintaining the key principal components to be used in the regression analysis. Within the studied data set, concentrations of five chemical components were modeled: total nitrate (NO3-), total acid (H+), neodymium (Nd3+), sodium (Na+), and ionic strength (I.S.). The best overall model prediction for each of the species studied used a combined data set comprised of complementary techniques including NIR, Raman, and conductivity. Finally, our study shows that chemometric models are powerful but require a significant amount of carefully analyzed data to capture variations in the chemistry.

  17. Use of principal-component, correlation, and stepwise multiple-regression analyses to investigate selected physical and hydraulic properties of carbonate-rock aquifers

    USGS Publications Warehouse

    Brown, C. Erwin

    1993-01-01

    Correlation analysis in conjunction with principal-component and multiple-regression analyses was applied to laboratory chemical and petrographic data to assess the usefulness of these techniques in evaluating selected physical and hydraulic properties of carbonate-rock aquifers in central Pennsylvania. Correlation and principal-component analyses were used to establish relations and associations among variables, to determine dimensions of property variation of samples, and to filter the variables containing similar information. Principal-component and correlation analyses showed that porosity is related to other measured variables and that permeability is most related to porosity and grain size. Four principal components are found to be significant in explaining the variance of data. Stepwise multiple-regression analysis was used to see how well the measured variables could predict porosity and (or) permeability for this suite of rocks. The variation in permeability and porosity is not totally predicted by the other variables, but the regression is significant at the 5% significance level. © 1993.

  18. A gap-filling model for eddy covariance latent heat flux: Estimating evapotranspiration of a subtropical seasonal evergreen broad-leaved forest as an example

    NASA Astrophysics Data System (ADS)

    Chen, Yi-Ying; Chu, Chia-Ren; Li, Ming-Hsu

    2012-10-01

    In this paper we present a semi-parametric multivariate gap-filling model for tower-based measurements of latent heat flux (LE). Two statistical techniques, principal component analysis (PCA) and a nonlinear interpolation approach, were integrated into this LE gap-filling model. PCA was first used to resolve the multicollinearity relationships among various environmental variables, including radiation, soil moisture deficit, leaf area index, wind speed, etc. Two nonlinear interpolation methods, multiple regression (MRS) and K-nearest neighbors (KNN), were examined with randomly selected flux gaps for both clear-sky and nighttime/cloudy data for incorporation into this LE gap-filling model. Experimental results indicated that the KNN interpolation approach is able to provide consistent LE estimations, while MRS overestimates during nighttime/cloudy conditions. Rather than using empirical regression parameters, the KNN approach resolves the nonlinear relationship between the gap-filled LE flux and the principal components with adaptive K values under different atmospheric states. The developed LE gap-filling model (PCA with KNN) works with an RMSE of 2.4 W m-2 (~0.09 mm day-1) at a weekly time scale when 40% artificial flux gaps are added to the original dataset. Annual evapotranspiration at this study site was estimated at 736 mm (1803 MJ) and 728 mm (1785 MJ) for the years 2008 and 2009, respectively.
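    The PCA-plus-KNN idea - compress the correlated drivers to a few components and interpolate LE among the K nearest neighbours in that reduced space - can be sketched with scikit-learn; the driver matrix, gap fraction and neighbour count below are illustrative stand-ins only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(11)
n = 2000
drivers = rng.normal(size=(n, 6))                # radiation, soil moisture, LAI, wind, ... (stand-ins)
le = 120 * np.tanh(drivers[:, 0]) + 20 * drivers[:, 2] + rng.normal(0, 10, n)

gap = rng.random(n) < 0.4                        # 40% artificial gaps
model = make_pipeline(StandardScaler(), PCA(n_components=3),
                      KNeighborsRegressor(n_neighbors=10))
model.fit(drivers[~gap], le[~gap])               # train on the observed half-hours
le_filled = le.copy()
le_filled[gap] = model.predict(drivers[gap])     # fill the gaps in the reduced space
rmse = np.sqrt(np.mean((le_filled[gap] - le[gap]) ** 2))
print(f"gap-filling RMSE: {rmse:.1f} W m-2")
```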

  19. Prediction models for solitary pulmonary nodules based on curvelet textural features and clinical parameters.

    PubMed

    Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua

    2013-01-01

    Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs) which are hard to diagnose using the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistical regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied. The results obtained, respectively, were 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. All in all, it was found that using curvelet-based textural features after dimensionality reduction and using clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.

  20. An improved geographically weighted regression model for PM2.5 concentration estimation in large areas

    NASA Astrophysics Data System (ADS)

    Zhai, Liang; Li, Shuang; Zou, Bin; Sang, Huiyong; Fang, Xin; Xu, Shan

    2018-05-01

    Considering the spatially non-stationary contributions of environmental variables to PM2.5 variations, the geographically weighted regression (GWR) modeling method has been widely used to estimate PM2.5 concentrations. However, most of the GWR models in studies reported so far were established using predictors screened through pretreatment correlation analysis, and this process may omit factors that actually drive PM2.5 variations. This study therefore developed a best subsets regression (BSR) enhanced principal component analysis-GWR (PCA-GWR) modeling approach to estimate PM2.5 concentration while considering all the potential variables' contributions simultaneously. A performance comparison between PCA-GWR and regular GWR was conducted in the Beijing-Tianjin-Hebei (BTH) region over a one-year period. Results indicated that PCA-GWR modeling outperforms regular GWR modeling, with clearly higher model-fitting and cross-validation based adjusted R2 and lower RMSE. Meanwhile, the distribution map of PM2.5 concentration from PCA-GWR modeling also depicts more spatial variation detail than the one from regular GWR modeling. It can be concluded that BSR-enhanced PCA-GWR modeling could be a reliable way to estimate air pollution concentrations effectively by involving all the potential predictor variables' contributions to PM2.5 variations.

  1. Climate change and future wildfire in the western USA: what model projections do and don't tell us

    NASA Astrophysics Data System (ADS)

    Littell, J. S.; McKenzie, D.; Cushman, S. A.; Wan, H. Y.

    2017-12-01

    We developed statistical climate-fire models describing area burned for 70 ecosections in the western U.S. Historically, these ecosections collectively represent a gradient of climate-fire relationships from purely fuel limited (characterized by antecedent positive water balance anomalies and/or negative energy balance anomalies) to purely flammability limited (characterized by antecedent negative water balance anomalies and/or positive energy balance anomalies). Sixty-eight ecosection linear models included significant climate predictors, and 56 ecosections satisfied regression diagnostics, yielding acceptable climate-fire models. There is considerable diversity in seasonality, dominant variables, and prevalence of lagged climatic terms in the climate-fire regression models, indicating variation in mechanisms of climate-fire linkages across ecosystems. This diversity, however, is not random - there is a clear pattern in the fuzzy set membership of the relative dominance of regression predictor variables. This pattern defines a fuel-flammability gradient of limitations, with a tendency toward warm-season drought on the flammability end and a tendency toward antecedent moisture on the fuel end. Projected area burned under a multi-model composite future climate scenario varies, with increasing area burned in 41 ecosections in the West by 2030-2059 (median 132% among 10 purely flammability limited ecosections, median 240% among 25 flammability limited systems with a fuel limitation component, and median 43% among 6 systems with equal control) but decreasing area burned (median -119% among 13 fuel limited systems with a flammability component). For the period 2070-2099, the projected area burned increases much more in the flammability (769%) and flammability-fuel hybrid (442%) systems than in those with joint control (139%), and continues to decrease (-178%) in fuel-flammability hybrid systems. Filtering the projected results with fire rotation limits projections biased high by the static assumptions of the statistical models. Exceedance probabilities for 95th-percentile fire years increase for the 2040s and 2080s and are largest in exclusively flammability limited ecosections compared with other fuel controls.

  2. Reliability analysis of C-130 turboprop engine components using artificial neural network

    NASA Astrophysics Data System (ADS)

    Qattan, Nizar A.

    In this study, we predict the failure rate of the Lockheed C-130 engine turbine. More than thirty years of local operational field data were used for failure rate prediction and validation. The Weibull regression model and artificial neural network models (feed-forward back-propagation, radial basis, and multilayer perceptron networks) were used to perform this study. For this purpose, the thesis is divided into five major parts. The first part deals with the Weibull regression model to predict the turbine's general failure rate and the rate of failures that require overhaul maintenance. The second part covers the artificial neural network (ANN) model utilizing the feed-forward back-propagation algorithm as a learning rule; MATLAB was used to build and design a code to simulate the given data, with the independent variables as inputs to the neural network and the general failure rate of the turbine and the failures requiring overhaul maintenance as outputs. In the third part, we predict the general failure rate of the turbine and the failures requiring overhaul maintenance using a radial basis neural network model in the MATLAB toolbox. In the fourth part, we compare the predictions of the feed-forward back-propagation model with those of the Weibull regression model and the radial basis neural network model. The results show that the failure rates predicted by the feed-forward back-propagation and radial basis neural network models agree more closely with the actual field data than the failure rate predicted by the Weibull model. By the end of the study, we forecast the general failure rate of the Lockheed C-130 engine turbine, the failures requiring overhaul maintenance, and six categorical failures using a multilayer perceptron (MLP) neural network model in the DTREG commercial software. The results also give insight into the reliability of the engine turbine under actual operating conditions, which can be used by aircraft operators for assessing system and component failures and customizing the maintenance programs recommended by the manufacturer.
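
    As a loose illustration of the kind of comparison described above (not the thesis code), the Python sketch below fits a Weibull model to synthetic failure times and trains a small feed-forward neural network on the empirical cumulative failure fraction, then compares the two curves. All data and settings are placeholders.

      import numpy as np
      from scipy.stats import weibull_min
      from sklearn.neural_network import MLPRegressor

      failure_times = weibull_min.rvs(c=1.8, scale=4000, size=300, random_state=3)  # synthetic hours to failure

      # Parametric view: fit Weibull shape and scale
      shape, _, scale = weibull_min.fit(failure_times, floc=0)

      # ANN view: learn the empirical cumulative fraction failed vs. operating hours
      t_sorted = np.sort(failure_times)
      frac_failed = np.arange(1, t_sorted.size + 1) / t_sorted.size
      ann = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0)
      ann.fit(t_sorted.reshape(-1, 1) / 1000.0, frac_failed)      # scale hours for easier training

      t = np.linspace(500, 8000, 6)
      print("Weibull CDF:", np.round(weibull_min.cdf(t, shape, scale=scale), 2))
      print("ANN CDF:    ", np.round(ann.predict(t.reshape(-1, 1) / 1000.0), 2))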

  3. Reconstruction of spatio-temporal temperature from sparse historical records using robust probabilistic principal component regression

    USGS Publications Warehouse

    Tipton, John; Hooten, Mevin B.; Goring, Simon

    2017-01-01

    Scientific records of temperature and precipitation have been kept for several hundred years, but for many areas, only a shorter record exists. To understand climate change, there is a need for rigorous statistical reconstructions of the paleoclimate using proxy data. Paleoclimate proxy data are often sparse, noisy, indirect measurements of the climate process of interest, making each proxy uniquely challenging to model statistically. We reconstruct spatially explicit temperature surfaces from sparse and noisy measurements recorded at historical United States military forts and other observer stations from 1820 to 1894. One common method for reconstructing the paleoclimate from proxy data is principal component regression (PCR). With PCR, one learns a statistical relationship between the paleoclimate proxy data and a set of climate observations that are used as patterns for potential reconstruction scenarios. We explore PCR in a Bayesian hierarchical framework, extending classical PCR in a variety of ways. First, we model the latent principal components probabilistically, accounting for measurement error in the observational data. Next, we extend our method to better accommodate outliers that occur in the proxy data. Finally, we explore alternatives to the truncation of lower-order principal components using different regularization techniques. One fundamental challenge in paleoclimate reconstruction efforts is the lack of out-of-sample data for predictive validation. Cross-validation is of potential value, but is computationally expensive and potentially sensitive to outliers in sparse data scenarios. To overcome the limitations that a lack of out-of-sample records presents, we test our methods using a simulation study, applying proper scoring rules including a computationally efficient approximation to leave-one-out cross-validation using the log score to validate model performance. The result of our analysis is a spatially explicit reconstruction of spatio-temporal temperature from a very sparse historical record.

  4. Rank estimation and the multivariate analysis of in vivo fast-scan cyclic voltammetric data

    PubMed Central

    Keithley, Richard B.; Carelli, Regina M.; Wightman, R. Mark

    2010-01-01

    Principal component regression has been used in the past to separate current contributions from different neuromodulators measured with in vivo fast-scan cyclic voltammetry. Traditionally, a percent cumulative variance approach has been used to determine the rank of the training set voltammetric matrix during model development; however, this approach suffers from several disadvantages, including the use of arbitrary percentages and the requirement of extreme precision of training sets. Here we propose that Malinowski's F-test, a method based on a statistical analysis of the variance contained within the training set, can be used to improve factor selection for the analysis of in vivo fast-scan cyclic voltammetric data. These two methods of rank estimation were compared at all steps in the calibration protocol, including the number of principal components retained, overall noise levels, model validation as determined using a residual analysis procedure, and predicted concentration information. By analyzing 119 training sets from two different laboratories amassed over several years, we were able to gain insight into the heterogeneity of in vivo fast-scan cyclic voltammetric data and study how differences in factor selection propagate throughout the entire principal component regression analysis procedure. Visualizing cyclic voltammetric representations of the data contained in the retained and discarded principal components showed that using Malinowski's F-test for rank estimation of in vivo training sets allowed noise to be removed more accurately. Malinowski's F-test also improved the robustness of our criterion for judging multivariate model validity, even though signal-to-noise ratios of the data varied. In addition, pH change was the dominant noise carrier in in vivo training sets, while dopamine prediction was more sensitive to noise. PMID:20527815

  5. Application of near-infrared spectroscopy for the rapid quality assessment of Radix Paeoniae Rubra

    NASA Astrophysics Data System (ADS)

    Zhan, Hao; Fang, Jing; Tang, Liying; Yang, Hongjun; Li, Hua; Wang, Zhuju; Yang, Bin; Wu, Hongwei; Fu, Meihong

    2017-08-01

    Near-infrared (NIR) spectroscopy with multivariate analysis was used to quantify gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra, and the feasibility of classifying samples originating from different areas was investigated. A new high-performance liquid chromatography method was developed and validated to analyze gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra as the reference. Partial least squares (PLS), principal component regression (PCR), and stepwise multivariate linear regression (SMLR) were performed to calibrate the regression models. Different data pretreatments, such as derivatives (1st and 2nd), multiplicative scatter correction, standard normal variate, Savitzky-Golay filter, and Norris derivative filter, were applied to remove systematic errors. The performance of the models was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and correlation coefficient (r). The results show that, compared to PCR and SMLR, PLS had lower RMSEC, RMSECV, and RMSEP and higher r for all four analytes. PLS coupled with proper pretreatments showed good performance in both fitting and prediction. Furthermore, the original areas of Radix Paeoniae Rubra samples were partly distinguished by principal component analysis. This study shows that NIR with PLS is a reliable, inexpensive, and rapid tool for the quality assessment of Radix Paeoniae Rubra.
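
    The calibration comparison can be mimicked with a generic chemometrics sketch. The Python code below (simulated spectra, not the study's data) fits PLS and PCR models for one analyte and reports RMSEC, RMSECV, and RMSEP; the number of latent variables is an arbitrary choice here.

      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_predict, train_test_split
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(4)
      spectra = rng.normal(size=(120, 200))                                  # stand-in NIR absorbances
      conc = spectra[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=120)   # stand-in analyte concentration

      Xc, Xp, yc, yp = train_test_split(spectra, conc, test_size=0.3, random_state=0)
      rmse = lambda a, b: float(np.sqrt(np.mean((np.ravel(a) - np.ravel(b)) ** 2)))

      models = {"PLS": PLSRegression(n_components=8),
                "PCR": make_pipeline(PCA(n_components=8), LinearRegression())}
      for name, model in models.items():
          model.fit(Xc, yc)
          print(name,
                "RMSEC", round(rmse(yc, model.predict(Xc)), 3),
                "RMSECV", round(rmse(yc, cross_val_predict(model, Xc, yc, cv=10)), 3),
                "RMSEP", round(rmse(yp, model.predict(Xp)), 3))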

  6. Determination of polyphenolic compounds of red wines by UV-VIS-NIR spectroscopy and chemometrics tools.

    PubMed

    Martelo-Vidal, M J; Vázquez, M

    2014-09-01

    Spectral analysis is a quick and non-destructive method to analyse wine. In this work, trans-resveratrol, oenin, malvin, catechin, epicatechin, quercetin, and syringic acid were determined in commercial red wines from DO Rías Baixas and DO Ribeira Sacra (Spain) by UV-VIS-NIR spectroscopy. Calibration models were developed using principal component regression (PCR) or partial least squares (PLS) regression. HPLC was used as the reference method. The results showed that reliable PLS models were obtained to quantify all polyphenols in Rías Baixas wines. For Ribeira Sacra, feasible models were obtained to determine quercetin, epicatechin, oenin, and syringic acid. PCR calibration models showed worse prediction reliability than PLS models. For red wines from mencía grapes, feasible models were obtained for catechin and oenin, regardless of geographical origin. The results obtained demonstrate that UV-VIS-NIR spectroscopy can be used to determine individual polyphenolic compounds in red wines. Copyright © 2014 Elsevier Ltd. All rights reserved.

  7. Principal component analysis-based pattern analysis of dose-volume histograms and influence on rectal toxicity.

    PubMed

    Söhn, Matthias; Alber, Markus; Yan, Di

    2007-09-01

    The variability of dose-volume histogram (DVH) shapes in a patient population can be quantified using principal component analysis (PCA). We applied this to rectal DVHs of prostate cancer patients and investigated the correlation of the PCA parameters with late bleeding. PCA was applied to the rectal wall DVHs of 262 patients, who had been treated with a four-field box, conformal adaptive radiotherapy technique. The correlated changes in the DVH pattern were revealed as "eigenmodes," which were ordered by their importance to represent data set variability. Each DVH is uniquely characterized by its principal components (PCs). The correlation of the first three PCs and chronic rectal bleeding of Grade 2 or greater was investigated with uni- and multivariate logistic regression analyses. Rectal wall DVHs in four-field conformal RT can primarily be represented by the first two or three PCs, which describe approximately 94% or 96% of the DVH shape variability, respectively. The first eigenmode models the total irradiated rectal volume; thus, PC1 correlates to the mean dose. Mode 2 describes the interpatient differences of the relative rectal volume in the two- or four-field overlap region. Mode 3 reveals correlations of volumes with intermediate doses (approximately 40-45 Gy) and volumes with doses >70 Gy; thus, PC3 is associated with the maximal dose. According to univariate logistic regression analysis, only PC2 correlated significantly with toxicity. However, multivariate logistic regression analysis with the first two or three PCs revealed an increased probability of bleeding for DVHs with more than one large PC. PCA can reveal the correlation structure of DVHs for a patient population as imposed by the treatment technique and provide information about its relationship to toxicity. It proves useful for augmenting normal tissue complication probability modeling approaches.
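
    The analysis pattern (PCA on DVH curves, then regression of outcome on PC scores) is simple to sketch. The Python fragment below uses synthetic DVH-like curves and a simulated toxicity label purely to show the mechanics; it is not the study's data or code.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LogisticRegression

      rng = np.random.default_rng(5)
      dose_bins = np.linspace(0, 80, 81)
      dvh = np.clip(1 - dose_bins / (40 + 20 * rng.random((262, 1))), 0, 1)  # fake rectal-wall DVH curves
      toxicity = (rng.random(262) < 0.2).astype(int)                         # simulated Grade >= 2 bleeding

      pca = PCA(n_components=3).fit(dvh)
      print("cumulative variance of PC1-3:", np.round(pca.explained_variance_ratio_.cumsum(), 3))

      scores = pca.transform(dvh)
      logit = LogisticRegression().fit(scores, toxicity)
      print("odds ratios per unit PC score:", np.round(np.exp(logit.coef_), 2))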

  8. Principal components analysis in clinical studies.

    PubMed

    Zhang, Zhongheng; Castelló, Adela

    2017-09-01

    In multivariate analysis, independent variables are usually correlated with each other, which can introduce multicollinearity in the regression models. One approach to solve this problem is to apply principal components analysis (PCA) over these variables. This method uses orthogonal transformation to represent sets of potentially correlated variables with principal components (PC) that are linearly uncorrelated. PCs are ordered so that the first PC has the largest possible variance and only some components are selected to represent the correlated variables. As a result, the dimension of the variable space is reduced. This tutorial illustrates how to perform PCA in the R environment; the example is a simulated dataset in which two PCs are responsible for the majority of the variance in the data. Furthermore, the visualization of PCA is highlighted.

  9. Analysis of Forest Foliage Using a Multivariate Mixture Model

    NASA Technical Reports Server (NTRS)

    Hlavka, C. A.; Peterson, David L.; Johnson, L. F.; Ganapol, B.

    1997-01-01

    Data comprising wet chemical measurements and near-infrared spectra of ground leaf samples were analyzed to test a multivariate regression technique, based on a linear mixture model for absorbance, for estimating component spectra. The resulting unmixed spectra for carbohydrates, lignin, and protein resemble the spectra of extracted plant starches, cellulose, lignin, and protein. The unmixed protein spectrum has prominent absorption features at wavelengths that have been associated with nitrogen bonds.
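
    The linear mixture idea can be shown in a few lines: if each absorbance spectrum is a composition-weighted sum of component spectra, regressing the spectra on known composition fractions recovers the component spectra. The Python sketch below uses simulated data and ordinary least squares; the study's estimator may differ in detail.

      import numpy as np

      rng = np.random.default_rng(6)
      n_samples, n_wavelengths = 40, 150
      true_spectra = np.abs(rng.normal(size=(3, n_wavelengths)))     # carbohydrate, lignin, protein (simulated)
      fractions = rng.dirichlet(np.ones(3), size=n_samples)          # wet-chemistry composition (simulated)
      absorbance = fractions @ true_spectra + rng.normal(scale=0.01, size=(n_samples, n_wavelengths))

      # Multivariate regression of absorbance on composition unmixes the component spectra
      est_spectra, *_ = np.linalg.lstsq(fractions, absorbance, rcond=None)
      r = np.corrcoef(est_spectra[2], true_spectra[2])[0, 1]
      print(f"correlation of recovered vs. true protein spectrum: {r:.3f}")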

  10. New methodology for modeling annual-aircraft emissions at airports

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Woodmansey, B.G.; Patterson, J.G.

    An as-accurate-as-possible estimate of total aircraft emissions is an essential component of any environmental-impact assessment done for proposed expansions at major airports. To determine the amount of emissions generated by aircraft using present models, it is necessary to know the emission characteristics of all engines on all planes using the airport. However, the published database does not cover all engine types; therefore, a new methodology is needed to assist in estimating annual emissions from aircraft at airports. Linear regression equations relating quantity of emissions to aircraft weight using a known fleet mix are developed in this paper. Total annual emissions of CO, NOx, NMHC, SOx, CO2, and N2O are tabulated for Toronto's international airport for 1990. The regression equations are statistically significant for all emissions except NMHC from large jets and NOx and NMHC from piston-engine aircraft. This regression model is a relatively simple, fast, and inexpensive method of obtaining an annual emission inventory for an airport.

  11. A componential model of human interaction with graphs: 1. Linear regression modeling

    NASA Technical Reports Server (NTRS)

    Gillan, Douglas J.; Lewis, Robert

    1994-01-01

    Task analyses served as the basis for developing the Mixed Arithmetic-Perceptual (MA-P) model, which proposes (1) that people interacting with common graphs to answer common questions apply a set of component processes: searching for indicators, encoding the value of indicators, performing arithmetic operations on the values, making spatial comparisons among indicators, and responding; and (2) that the type of graph and the user's task determine the combination and order of the components applied (i.e., the processing steps). Two experiments investigated the prediction that response time will be linearly related to the number of processing steps according to the MA-P model. Subjects used line graphs, scatter plots, and stacked bar graphs to answer comparison questions and questions requiring arithmetic calculations. A one-parameter version of the model (with equal weights for all components) and a two-parameter version (with different weights for arithmetic and nonarithmetic processes) accounted for 76%-85% of individual subjects' variance in response time and 61%-68% of the variance taken across all subjects. The discussion addresses possible modifications of the MA-P model, alternative models, and design implications of the MA-P model.

  12. Analytics of Radioactive Materials Released in the Fukushima Daiichi Nuclear Accident

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Egarievwe, Stephen U.; Nuclear Engineering Department, University of Tennessee, Knoxville, TN; Coble, Jamie B.

    The 2011 Fukushima Daiichi nuclear accident in Japan resulted in the release of radioactive materials into the atmosphere, the nearby sea, and the surrounding land. Following the accident, several meteorological models were used to predict the transport of the radioactive materials to other continents such as North America and Europe. Also of high importance is the dispersion of radioactive materials locally and within Japan. Based on the International Atomic Energy Agency (IAEA) Convention on Early Notification of a nuclear accident, several radiological data sets were collected on the accident by the Japanese authorities. Among the radioactive materials monitored are I-131 and Cs-137, which are the major contributors to the contamination of drinking water. The radiation dose in the atmosphere was also measured. It is impractical to measure contamination and radiation dose in every place of interest; therefore, modeling helps to predict contamination and radiation dose. Modeling studies reported in the literature include the simulation of transport and deposition of I-131 and Cs-137 from the accident, Cs-137 deposition and contamination of Japanese soils, and preliminary estimates of I-131 and Cs-137 discharged from the plant into the atmosphere. In this paper, we present statistical analytics of I-131 and Cs-137 with the goal of predicting gamma dose from the Fukushima Daiichi nuclear accident. The data sets used in our study were collected from the IAEA Fukushima Monitoring Database. As part of this study, we investigated several regression models to find the best algorithm for modeling the gamma dose. The modeling techniques used include linear regression, principal component regression (PCR), partial least squares (PLS) regression, and ridge regression. Our preliminary results on the first set of data showed that the linear regression model with one variable was the best, with a root mean square error of 0.0133 μSv/h, compared to 0.0210 μSv/h for PCR, 0.231 μSv/h for ridge regression (L-curve), 0.0856 μSv/h for PLS, and 0.0860 μSv/h for ridge regression (cross-validation). Complete results using the full datasets for these models will also be presented. (authors)

  13. Clinical Components of Borderline Personality Disorder and Personality Functioning.

    PubMed

    Ferrer, Marc; Andión, Óscar; Calvo, Natalia; Hörz, Susanne; Fischer-Kern, Melitta; Kapusta, Nestor D; Schneider, Gudrun; Clarkin, John F; Doering, Stephan

    2018-01-01

    Impairment in personality functioning (PF) represents a salient criterion of the DSM-5 alternative diagnostic model for personality disorders (AMPD). The main goal of this study is to analyze the relationship of the borderline personality disorder (BPD) clinical components derived from the DSM-5 categorical diagnostic model (affective dysregulation, behavioral dysregulation, and disturbed relatedness) with personality organization (PO), i.e., PF, assessed by the Structured Interview of Personality Organization (STIPO). The STIPO and the Structured Clinical Interviews for DSM-IV (SCID-I and -II) were administered to 206 BPD patients. The relationship between PO and the BPD components was studied using Spearman correlations and independent linear regression analyses. Significant positive correlations were observed between STIPO scores and several DSM-5 BPD criteria and comorbid psychiatric disorders. STIPO dimensions mainly correlated with the disturbed relatedness and, to a lesser extent, affective dysregulation components. Each BPD clinical component was associated with specific STIPO dimensions. Both diagnostic models, DSM-5 BPD criteria and PO, are not only related but complementary concepts. The results of this study particularly recommend the STIPO for the assessment of relational functioning, which is a major domain of the Level of Personality Functioning Scale of the DSM-5 AMPD. © 2018 S. Karger AG, Basel.

  14. Restricted maximum likelihood estimation of genetic principal components and smoothed covariance matrices

    PubMed Central

    Meyer, Karin; Kirkpatrick, Mark

    2005-01-01

    Principal component analysis is a widely used 'dimension reduction' technique, albeit generally at a phenotypic level. It is shown that we can estimate genetic principal components directly through a simple reparameterisation of the usual linear mixed model. This is applicable to any analysis fitting multiple, correlated genetic effects, whether effects for individual traits or sets of random regression coefficients to model trajectories. Depending on the magnitude of genetic correlation, a subset of the principal components generally suffices to capture the bulk of genetic variation. Corresponding estimates of genetic covariance matrices are more parsimonious, have reduced rank and are smoothed, with the number of parameters required to model the dispersion structure reduced from k(k + 1)/2 to m(2k - m + 1)/2 for k effects and m principal components. Estimation of these parameters, the largest eigenvalues and pertaining eigenvectors of the genetic covariance matrix, via restricted maximum likelihood using derivatives of the likelihood, is described. It is shown that reduced rank estimation can reduce the computational requirements of multivariate analyses substantially. An application to the analysis of eight traits recorded via live ultrasound scanning of beef cattle is given. PMID:15588566

  15. Homogenization Issues in the Combustion of Heterogeneous Solid Propellants

    NASA Technical Reports Server (NTRS)

    Chen, M.; Buckmaster, J.; Jackson, T. L.; Massa, L.

    2002-01-01

    We examine random packs of discs or spheres, models for ammonium-perchlorate-in-binder propellants, and discuss their average properties. An analytical strategy is described for calculating the mean or effective heat conduction coefficient in terms of the heat conduction coefficients of the individual components, and the results are verified by comparison with those of direct numerical simulations (dns) for both 2-D (disc) and 3-D (sphere) packs across which a temperature difference is applied. Similarly, when the surface regression speed of each component is related to the surface temperature via a simple Arrhenius law, an analytical strategy is developed for calculating an effective Arrhenius law for the combination, and these results are verified using dns in which a uniform heat flux is applied to the pack surface, causing it to regress. These results are needed for homogenization strategies necessary for fully integrated 2-D or 3-D simulations of heterogeneous propellant combustion.

  16. Association between Personality Traits and Sleep Quality in Young Korean Women

    PubMed Central

    Kim, Han-Na; Cho, Juhee; Chang, Yoosoo; Ryu, Seungho

    2015-01-01

    Personality is a trait that affects behavior and lifestyle, and sleep quality is an important component of a healthy life. We analyzed the association between personality traits and sleep quality in a cross-section of 1,406 young women (from 18 to 40 years of age) who were not reporting clinically meaningful depression symptoms. Surveys were carried out from December 2011 to February 2012, using the Revised NEO Personality Inventory and the Pittsburgh Sleep Quality Index (PSQI). All analyses were adjusted for demographic and behavioral variables. We considered beta weights, structure coefficients, unique effects, and common effects when evaluating the importance of sleep quality predictors in multiple linear regression models. Neuroticism was the most important contributor to PSQI global scores in the multiple regression models. By contrast, despite being strongly correlated with sleep quality, conscientiousness had a near-zero beta weight in linear regression models, because most variance was shared with other personality traits. However, conscientiousness was the most noteworthy predictor of poor sleep quality status (PSQI≥6) in logistic regression models and individuals high in conscientiousness were least likely to have poor sleep quality, which is consistent with an OR of 0.813, with conscientiousness being protective against poor sleep quality. Personality may be a factor in poor sleep quality and should be considered in sleep interventions targeting young women. PMID:26030141

  17. Separation of the long-term thermal effects from the strain measurements in the Geodynamics Laboratory of Lanzarote

    NASA Astrophysics Data System (ADS)

    Venedikov, A. P.; Arnoso, J.; Cai, W.; Vieira, R.; Tan, S.; Velez, E. J.

    2006-01-01

    A 12-year series (1992-2004) of strain measurements recorded in the Geodynamics Laboratory of Lanzarote is investigated. Through a tidal analysis, the non-tidal component of the data is separated in order to use it for studying signals useful for monitoring the volcanic activity on the island. This component contains various perturbations of meteorological and oceanic origin, which should be eliminated in order to make the useful signals discernible. The paper is devoted to the estimation and elimination of the effect of the air temperature inside the station, which strongly dominates the strainmeter data. For this task, a regression model is applied that includes a linear relation with the temperature and time-dependent polynomials. The regression depends nonlinearly on a set of parameters, which are estimated by a properly applied Bayesian approach. The results obtained are the regression coefficient of the strain data on temperature, equal to (-367.4 ± 0.8) × 10⁻⁹ °C⁻¹, the curve of the non-tidal component reduced by the effect of the temperature, and a polynomial approximation of the reduced curve. The technique used here can be helpful to investigators in the domain of earthquake and volcano monitoring. However, the fundamental and extremely difficult problem of what kind of signals in the reduced curves might be useful in this field is not considered here.

  18. Partial Least Squares Regression Calibration of an Ultraviolet-Visible Spectrophotometer for Measurements of Chemical Oxygen Demand in Dye Wastewater

    NASA Astrophysics Data System (ADS)

    Mai, W.; Zhang, J.-F.; Zhao, X.-M.; Li, Z.; Xu, Z.-W.

    2017-11-01

    Wastewater from the dye industry is typically analyzed using a standard method for measurement of chemical oxygen demand (COD) or by a single-wavelength spectroscopic method. To overcome the disadvantages of these methods, ultraviolet-visible (UV-Vis) spectroscopy was combined with principal component regression (PCR) and partial least squares regression (PLSR) in this study. Unlike the standard method, this method does not require digestion of the samples for preparation. Experiments showed that the PLSR model offered high prediction performance for COD, with a mean relative error of about 5% for two dyes. This error is similar to that obtained with the standard method. In this study, the precision of the PLSR model decreased with the number of dye compounds present. It is likely that multiple models will be required in reality, and the complexity of a COD monitoring system would be greatly reduced if the PLSR model is used because it can include several dyes. UV-Vis spectroscopy with PLSR successfully enhanced the performance of COD prediction for dye wastewater and showed good potential for application in on-line water quality monitoring.

  19. Evaluating observations in the context of predictions for the death valley regional groundwater system

    USGS Publications Warehouse

    Ely, D.M.; Hill, M.C.; Tiedeman, C.R.; O'Brien, G. M.

    2004-01-01

    When a model is calibrated by nonlinear regression, calculated diagnostic and inferential statistics provide a wealth of information about many aspects of the system. This work uses linear inferential statistics that are measures of prediction uncertainty to investigate the likely importance of continued monitoring of hydraulic head to the accuracy of model predictions. The measurements evaluated are hydraulic heads; the predictions of interest are subsurface transport from 15 locations. The advective component of transport is considered because it is the component most affected by the system dynamics represented by the regional-scale model being used. The problem is addressed using the capabilities of the U.S. Geological Survey computer program MODFLOW-2000, with its Advective Travel Observation (ADV) Package. Copyright ASCE 2004.

  20. Essays in energy economics: The electricity industry

    NASA Astrophysics Data System (ADS)

    Martinez-Chombo, Eduardo

    Electricity demand analysis using cointegration and error-correction models with time varying parameters: The Mexican case. In this essay we show how some flexibility can be allowed in modeling the parameters of the electricity demand function by employing the time varying coefficient (TVC) cointegrating model developed by Park and Hahn (1999). With the income elasticity of electricity demand modeled as a TVC, we perform tests to examine the adequacy of the proposed model against the cointegrating regression with fixed coefficients, as well as against the spuriousness of the regression with TVC. The results reject the specification of the model with fixed coefficients and favor the proposed model. We also show how some flexibility is gained in the specification of the error correction model based on the proposed TVC cointegrating model, by including more lags of the error correction term as predetermined variables. Finally, we present the results of some out-of-sample forecast comparison among competing models. Electricity demand and supply in Mexico. In this essay we present a simplified model of the Mexican electricity transmission network. We use the model to approximate the marginal cost of supplying electricity to consumers in different locations and at different times of the year. We examine how costs and system operations will be affected by proposed investments in generation and transmission capacity given a forecast of growth in regional electricity demands. Decomposing electricity prices with jumps. In this essay we propose a model that decomposes electricity prices into two independent stochastic processes: one that represents the "normal" pattern of electricity prices and the other that captures temporary shocks, or "jumps", with non-lasting effects in the market. Each contains specific mean reverting parameters to estimate. In order to identify such components we specify a state-space model with regime switching. Using Kim's (1994) filtering algorithm we estimate the parameters of the model, the transition probabilities and the unobservable components for the mean adjusted series of New South Wales' electricity prices. Finally, bootstrap simulations were performed to estimate the expected contribution of each of the components in the overall electricity prices.

  1. Patient phenotypes associated with outcomes after aneurysmal subarachnoid hemorrhage: a principal component analysis.

    PubMed

    Ibrahim, George M; Morgan, Benjamin R; Macdonald, R Loch

    2014-03-01

    Predictors of outcome after aneurysmal subarachnoid hemorrhage have been determined previously through hypothesis-driven methods that often exclude putative covariates and require a priori knowledge of potential confounders. Here, we apply a data-driven approach, principal component analysis, to identify baseline patient phenotypes that may predict neurological outcomes. Principal component analysis was performed on 120 subjects enrolled in a prospective randomized trial of clazosentan for the prevention of angiographic vasospasm. Correlation matrices were created using a combination of Pearson, polyserial, and polychoric regressions among 46 variables. Scores of significant components (with eigenvalues>1) were included in multivariate logistic regression models with incidence of severe angiographic vasospasm, delayed ischemic neurological deficit, and long-term outcome as outcomes of interest. Sixteen significant principal components accounting for 74.6% of the variance were identified. A single component dominated by the patients' initial hemodynamic status, World Federation of Neurosurgical Societies score, neurological injury, and initial neutrophil/leukocyte counts was significantly associated with poor outcome. Two additional components were associated with angiographic vasospasm, of which one was also associated with delayed ischemic neurological deficit. The first was dominated by the aneurysm-securing procedure, subarachnoid clot clearance, and intracerebral hemorrhage, whereas the second had high contributions from markers of anemia and albumin levels. Principal component analysis, a data-driven approach, identified patient phenotypes that are associated with worse neurological outcomes. Such data reduction methods may provide a better approximation of unique patient phenotypes and may inform clinical care as well as patient recruitment into clinical trials. http://www.clinicaltrials.gov. Unique identifier: NCT00111085.
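
    A compressed Python version of that workflow, with simulated placeholders for the 46 baseline variables and the outcome, might look like the sketch below: retain components whose eigenvalues exceed 1 on standardized data, then enter their scores into a logistic regression.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LogisticRegression
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(7)
      baseline = rng.normal(size=(120, 46))               # stand-in admission variables
      poor_outcome = (rng.random(120) < 0.3).astype(int)  # simulated outcome label

      Z = StandardScaler().fit_transform(baseline)
      pca = PCA().fit(Z)
      keep = pca.explained_variance_ > 1.0                # components with eigenvalues > 1
      scores = pca.transform(Z)[:, keep]
      print(f"{keep.sum()} retained components explain "
            f"{pca.explained_variance_ratio_[keep].sum():.1%} of the variance")

      model = LogisticRegression(max_iter=1000).fit(scores, poor_outcome)
      print("component coefficients:", np.round(model.coef_, 2))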

  2. Orthogonal decomposition of left ventricular remodeling in myocardial infarction.

    PubMed

    Zhang, Xingyu; Medrano-Gracia, Pau; Ambale-Venkatesh, Bharath; Bluemke, David A; Cowan, Brett R; Finn, J Paul; Kadish, Alan H; Lee, Daniel C; Lima, Joao A C; Young, Alistair A; Suinesiaputra, Avan

    2017-03-01

    Left ventricular size and shape are important for quantifying cardiac remodeling in response to cardiovascular disease. Geometric remodeling indices have been shown to have prognostic value in predicting adverse events in the clinical literature, but these often describe interrelated shape changes. We developed a novel method for deriving orthogonal remodeling components directly from any (moderately independent) set of clinical remodeling indices. Six clinical remodeling indices (end-diastolic volume index, sphericity, relative wall thickness, ejection fraction, apical conicity, and longitudinal shortening) were evaluated using cardiac magnetic resonance images of 300 patients with myocardial infarction, and 1991 asymptomatic subjects, obtained from the Cardiac Atlas Project. Partial least squares (PLS) regression of left ventricular shape models resulted in remodeling components that were optimally associated with each remodeling index. A Gram-Schmidt orthogonalization process, by which remodeling components were successively removed from the shape space in the order of shape variance explained, resulted in a set of orthonormal remodeling components. Remodeling scores could then be calculated that quantify the amount of each remodeling component present in each case. A one-factor PLS regression led to more decoupling between scores from the different remodeling components across the entire cohort, and zero correlation between clinical indices and subsequent scores. The PLS orthogonal remodeling components had similar power to describe differences between myocardial infarction patients and asymptomatic subjects as principal component analysis, but were better associated with well-understood clinical indices of cardiac remodeling. The data and analyses are available from www.cardiacatlas.org. © The Author 2017. Published by Oxford University Press.
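
    The orthogonalization step is the part that translates most directly into code. The Python sketch below (synthetic shape vectors, not the cardiac atlas data) applies Gram-Schmidt to a set of correlated remodeling directions and then scores each case by projection onto the orthonormal components.

      import numpy as np

      rng = np.random.default_rng(8)
      shape_dim, n_cases = 30, 200
      components = rng.normal(size=(6, shape_dim))    # one direction per clinical remodeling index (simulated)

      def gram_schmidt(vectors):
          """Orthonormalize row vectors in order, so earlier components keep priority."""
          basis = []
          for v in vectors:
              w = v - sum(np.dot(v, b) * b for b in basis)
              if np.linalg.norm(w) > 1e-10:
                  basis.append(w / np.linalg.norm(w))
          return np.array(basis)

      ortho = gram_schmidt(components)
      print("max deviation from orthonormality:",
            np.abs(ortho @ ortho.T - np.eye(len(ortho))).max())

      shapes = rng.normal(size=(n_cases, shape_dim))  # simulated shape vectors
      remodeling_scores = shapes @ ortho.T            # one orthogonal remodeling score per case and component
      print("score matrix shape:", remodeling_scores.shape)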

  3. Functional Generalized Additive Models.

    PubMed

    McLean, Mathew W; Hooker, Giles; Staicu, Ana-Maria; Scheipl, Fabian; Ruppert, David

    2014-01-01

    We introduce the functional generalized additive model (FGAM), a novel regression model for association studies between a scalar response and a functional predictor. We model the link-transformed mean response as the integral with respect to t of F{X(t), t}, where F(·,·) is an unknown regression function and X(t) is a functional covariate. Rather than having an additive model in a finite number of principal components as in Müller and Yao (2008), our model incorporates the functional predictor directly and thus our model can be viewed as the natural functional extension of generalized additive models. We estimate F(·,·) using tensor-product B-splines with roughness penalties. A pointwise quantile transformation of the functional predictor is also considered to ensure each tensor-product B-spline has observed data on its support. The methods are evaluated using simulated data and their predictive performance is compared with other competing scalar-on-function regression alternatives. We illustrate the usefulness of our approach through an application to brain tractography, where X(t) is a signal from diffusion tensor imaging at position t along a tract in the brain. In one example, the response is disease status (case or control) and in a second example, it is the score on a cognitive test. R code for performing the simulations and fitting the FGAM can be found in supplemental materials available online.

  4. Rainfall and streamflow from small tree-covered and fern-covered and burned watersheds in Hawaii

    Treesearch

    H. W. Anderson; P. D. Duffy; Teruo Yamamoto

    1966-01-01

    Streamflow from two 30-acre watersheds near Honolulu was studied by using principal components regression analysis. Models using data on monthly, storm, and peak discharges were tested against several variables expressing amount and intensity of rainfall, and against variables expressing antecedent rainfall. Explained variation ranged from 78 to 94 percent. The...

  5. Forest dynamics to precipitation and temperature in the Gulf of Mexico coastal region.

    PubMed

    Li, Tianyu; Meng, Qingmin

    2017-05-01

    The forest is one of the most significant components of the Gulf of Mexico (GOM) coast. It provides livelihood to inhabitants and is known to be sensitive to climatic fluctuations. This study focuses on examining the impacts of temperature and precipitation variations on coastal forest. Two different regression methods, ordinary least squares (OLS) and geographically weighted regression (GWR), were employed to reveal the relationship between meteorological variables and forest dynamics. OLS regression analysis shows that changes in precipitation and temperature, over a span of 12 months, are responsible for 56% of NDVI variation. The forest, which is not particularly affected by the average monthly precipitation in most months, is observed to be affected by cumulative seasonal and annual precipitation explicitly. Temperature and precipitation almost equally impact NDVI changes; about 50% of the NDVI variation is explained in OLS modeling, and about 74% of the NDVI variation is explained in GWR modeling. GWR analysis indicated that both precipitation and temperature characterize the spatial heterogeneity patterns of forest dynamics.

  6. An interactive website for analytical method comparison and bias estimation.

    PubMed

    Bahar, Burak; Tuncel, Ayse F; Holmes, Earle W; Holmes, Daniel T

    2017-12-01

    Regulatory standards mandate laboratories to perform studies to ensure accuracy and reliability of their test results. Method comparison and bias estimation are important components of these studies. We developed an interactive website for evaluating the relative performance of two analytical methods using R programming language tools. The website can be accessed at https://bahar.shinyapps.io/method_compare/. The site has an easy-to-use interface that allows both copy-pasting and manual entry of data. It also allows selection of a regression model and creation of regression and difference plots. Available regression models include Ordinary Least Squares, Weighted-Ordinary Least Squares, Deming, Weighted-Deming, Passing-Bablok and Passing-Bablok for large datasets. The server processes the data and generates downloadable reports in PDF or HTML format. Our website provides clinical laboratories a practical way to assess the relative performance of two analytical methods. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
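
    Of the regression options listed, Deming regression has a short closed form that is worth sketching. The Python function below is a minimal implementation assuming a known error-variance ratio delta; it is an illustration, not the website's code.

      import numpy as np

      def deming(x, y, delta=1.0):
          """Return (intercept, slope) of the Deming fit of y on x, for error-variance ratio delta."""
          x, y = np.asarray(x, float), np.asarray(y, float)
          mx, my = x.mean(), y.mean()
          sxx = np.sum((x - mx) ** 2) / (x.size - 1)
          syy = np.sum((y - my) ** 2) / (x.size - 1)
          sxy = np.sum((x - mx) * (y - my)) / (x.size - 1)
          slope = (syy - delta * sxx + np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
          return my - slope * mx, slope

      rng = np.random.default_rng(9)
      truth = rng.uniform(1, 10, 80)
      method_a = truth + rng.normal(scale=0.3, size=80)
      method_b = 1.05 * truth + 0.2 + rng.normal(scale=0.3, size=80)
      print(deming(method_a, method_b))   # expect roughly (0.2, 1.05)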

  7. Forest dynamics to precipitation and temperature in the Gulf of Mexico coastal region

    NASA Astrophysics Data System (ADS)

    Li, Tianyu; Meng, Qingmin

    2017-05-01

    The forest is one of the most significant components of the Gulf of Mexico (GOM) coast. It provides livelihood to inhabitants and is known to be sensitive to climatic fluctuations. This study focuses on examining the impacts of temperature and precipitation variations on coastal forest. Two different regression methods, ordinary least squares (OLS) and geographically weighted regression (GWR), were employed to reveal the relationship between meteorological variables and forest dynamics. OLS regression analysis shows that changes in precipitation and temperature, over a span of 12 months, are responsible for 56% of NDVI variation. The forest, which is not particularly affected by the average monthly precipitation in most months, is observed to be affected by cumulative seasonal and annual precipitation explicitly. Temperature and precipitation almost equally impact NDVI changes; about 50% of the NDVI variation is explained in OLS modeling, and about 74% of the NDVI variation is explained in GWR modeling. GWR analysis indicated that both precipitation and temperature characterize the spatial heterogeneity patterns of forest dynamics.

  8. Climatic change projections for winter streamflow in Guadalquivir river

    NASA Astrophysics Data System (ADS)

    Jesús Esteban Parra, María; Hidalgo Muñoz, José Manuel; García-Valdecasas-Ojeda, Matilde; Raquel Gámiz Fortis, Sonia; Castro Díez, Yolanda

    2015-04-01

    In this work we have obtained climate change projections for winter streamflow of the Guadalquivir River in the period 2071-2100 using the Principal Component Regression (PCR) method. The streamflow database used was provided by the Center for Studies and Experimentation of Public Works, CEDEX. Series from gauging stations and reservoirs with less than 10% of missing data (filled by regression with well-correlated neighboring stations) have been considered. The homogeneity of these series has been evaluated through the Pettitt test, and the degree of human alteration by the Common Area Index. The application of these criteria led to the selection of 13 streamflow time series homogeneously distributed over the basin, covering the period 1952-2011. For these streamflow data, winter seasonal values were obtained by averaging the monthly values from January to March. The PCR method has been applied using the principal components of the mean anomalies of sea level pressure (SLP) in winter (December to February averaged) as predictors of streamflow for the development of a downscaled statistical model. The SLP database is the NCEP reanalysis covering the North Atlantic region, and the calibration and validation periods used for fitting and evaluating the ability of the model are 1952-1992 and 1993-2011, respectively. In general, using four principal components, the regression models are able to explain up to 70% of the variance of the streamflow data. Finally, the statistical model obtained for the observational data was applied to the SLP data for the period 2071-2100, using the outputs of different GCMs of the CMIP5 under the RCP8.5 scenario. The results found for the end of the century show no significant change or a moderate decrease in the streamflow of this river for most GCMs in winter, but for some of them the decrease is very strong. Keywords: Statistical downscaling, streamflow, Guadalquivir River, climate change. ACKNOWLEDGEMENTS: This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía-Spain) and CGL2013-48539-R (MINECO-Spain, FEDER).
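
    The downscaling chain (principal components of the SLP field as predictors of streamflow, with separate calibration and validation periods) can be outlined as follows. The Python sketch uses a synthetic pressure grid and streamflow series merely to show the structure; it does not reproduce the study's data or skill.

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(10)
      years = np.arange(1952, 2012)
      slp = rng.normal(size=(years.size, 30 * 40))        # flattened winter SLP anomaly grid (simulated)
      flow = 3 * slp[:, :50].mean(axis=1) + rng.normal(scale=0.5, size=years.size)  # simulated winter streamflow

      calib = years <= 1992                               # calibration period, validation afterwards
      pca = PCA(n_components=4).fit(slp[calib])           # four principal components, as in the study
      Xc, Xv = pca.transform(slp[calib]), pca.transform(slp[~calib])

      model = LinearRegression().fit(Xc, flow[calib])
      print(f"validation R^2 on the held-out years: {model.score(Xv, flow[~calib]):.2f}")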

  9. Is More Always Better in Designing Workplace Wellness Programs?: A Comparison of Wellness Program Components Versus Outcomes.

    PubMed

    Batorsky, Benjamin; Van Stolk, Christian; Liu, Hangsheng

    2016-10-01

    Assess whether adding more components to a workplace wellness program is associated with better outcomes by measuring the relationship of program components to one another and to employee participation and perceptions of program effectiveness. Data came from a 2014 survey of 24,393 employees of 81 employers about services offered, leadership, incentives, and promotion. Logistic regressions were used to model the relationship between program characteristics and outcomes. Components individually are related to better outcomes, but this relationship is weaker in the presence of other components and non-significant for incentives. Within components, a moderate level of services and work time participation opportunities are associated with higher participation and effectiveness. The "more of everything" approach does not appear to be advisable for all programs. Programs should focus on providing ample opportunities for employees to participate and initiatives like results-based incentives.

  10. [Studies on the brand traceability of milk powder based on NIR spectroscopy technology].

    PubMed

    Guan, Xiao; Gu, Fang-Qing; Liu, Jing; Yang, Yong-Jian

    2013-10-01

    The brand traceability of several different kinds of milk powder was studied in the present paper by combining near infrared spectroscopy in diffuse reflectance mode with soft independent modeling of class analogy (SIMCA). The near infrared spectra of 138 samples, including 54 Guangming milk powder samples, 43 Netherlands samples, 33 Nestle samples, and 8 Yili samples, were collected. After pretreatment of the full-spectrum data variables in the training set, principal component analysis was performed, and the cumulative variance contribution of the first three principal components was about 99.07%. A milk powder principal component regression model based on SIMCA was established and used to classify the milk powder samples in the prediction sets. The results showed that the recognition rates for Guangming, Netherlands, and Nestle milk powder were 78%, 75%, and 100%, and the rejection rates were 100%, 87%, and 88%, respectively. Therefore, near infrared spectroscopy combined with the SIMCA model can classify milk powder with high accuracy and is a promising method for identifying milk powder variety.

  11. A Nonlinear Model for Gene-Based Gene-Environment Interaction.

    PubMed

    Sa, Jian; Liu, Xu; He, Tao; Liu, Guifen; Cui, Yuehua

    2016-06-04

    A vast amount of literature has confirmed the role of gene-environment (G×E) interaction in the etiology of complex human diseases. Traditional methods are predominantly focused on the analysis of interaction between a single nucleotide polymorphism (SNP) and an environmental variable. Given that genes are the functional units, it is crucial to understand how gene effects (rather than single-SNP effects) are influenced by an environmental variable to affect disease risk. Motivated by the increasing awareness of the power of gene-based association analysis over single-variant-based approaches, in this work we proposed a sparse principal component regression (sPCR) model to understand the gene-based G×E interaction effect on complex disease. We first extracted the sparse principal components for SNPs in a gene; the effect of each principal component was then modeled by a varying-coefficient (VC) model. The model can jointly model variants in a gene whose effects are nonlinearly influenced by an environmental variable. In addition, the varying-coefficient sPCR (VC-sPCR) model has good interpretability, since the sparsity of the principal component loadings indicates the relative importance of the corresponding SNPs in each component. We applied our method to a human birth weight dataset from a Thai population. We analyzed 12,005 genes across 22 chromosomes and found one significant interaction effect using the Bonferroni correction method and one suggestive interaction. The model performance was further evaluated through simulation studies. Our model provides a systems approach to evaluating gene-based G×E interaction.

  12. Optical system for tablet variety discrimination using visible/near-infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Shao, Yongni; He, Yong; Hu, Xingyue

    2007-12-01

    An optical system based on visible/near-infrared spectroscopy (Vis/NIRS) for variety discrimination of ginkgo (Ginkgo biloba L.) tablets was developed. This system consisted of a light source, a beam splitter system, a sample chamber, an optical detector (diffuse reflection detector), and a data collection unit. The tablet varieties used in the research included Da na kang, Xin bang, Tian bao ning, Yi kang, Hua na xing, Dou le, Lv yuan, Hai wang, and Ji yao. All samples (n=270) were scanned in the Vis/NIR region between 325 and 1075 nm using a spectrograph. The chemometrics method of principal component artificial neural network (PC-ANN) was used to establish discrimination models. In the PC-ANN models, the scores of the principal components were chosen as the input nodes for the input layer of the ANN, and the best discrimination rate of 91.1% was reached. Principal component analysis was also executed to select several optimal wavelengths based on loading values. Wavelengths at 481, 458, 466, 570, 1000, 662, and 400 nm were then used as the input data of stepwise multiple linear regression; the regression equation for ginkgo tablets was obtained, and the discrimination rate reached 84.4%. The results indicated that this optical system could be applied to discriminating ginkgo (Ginkgo biloba L.) tablets, and it supplies a new method for fast ginkgo tablet variety discrimination.

  13. Simulating stream transport of nutrients in the eastern United States, 2002, using a spatially-referenced regression model and 1:100,000-scale hydrography

    USGS Publications Warehouse

    Hoos, Anne B.; Moore, Richard B.; Garcia, Ana Maria; Noe, Gregory B.; Terziotti, Silvia E.; Johnston, Craig M.; Dennis, Robin L.

    2013-01-01

    Existing Spatially Referenced Regression on Watershed attributes (SPARROW) nutrient models for the northeastern and southeastern regions of the United States were recalibrated to achieve a hydrographically consistent model with which to assess nutrient sources and stream transport and investigate specific management questions about the effects of wetlands and atmospheric deposition on nutrient transport. Recalibrated nitrogen models for the northeast and southeast were sufficiently similar to be merged into a single nitrogen model for the eastern United States. The atmospheric deposition source in the nitrogen model has been improved to account for individual components of atmospheric input, derived from emissions from agricultural manure, agricultural livestock, vehicles, power plants, other industry, and background sources. This accounting makes it possible to simulate the effects of altering an individual component of atmospheric deposition, such as nitrate emissions from vehicles or power plants. Regional differences in transport of phosphorus through wetlands and reservoirs were investigated and resulted in two distinct phosphorus models for the northeast and southeast. The recalibrated nitrogen and phosphorus models account explicitly for the influence of wetlands on regional-scale land-phase and aqueous-phase transport of nutrients and therefore allow comparison of the water-quality functions of different wetland systems over large spatial scales. Seven wetland systems were associated with enhanced transport of either nitrogen or phosphorus in streams, probably because of the export of dissolved organic nitrogen and bank erosion. Six wetland systems were associated with mitigating the delivery of either nitrogen or phosphorus to streams, probably because of sedimentation, phosphate sorption, and ground water infiltration.

  14. Construction and analysis of a modular model of caspase activation in apoptosis

    PubMed Central

    Harrington, Heather A; Ho, Kenneth L; Ghosh, Samik; Tung, KC

    2008-01-01

    Background A key physiological mechanism employed by multicellular organisms is apoptosis, or programmed cell death. Apoptosis is triggered by the activation of caspases in response to both extracellular (extrinsic) and intracellular (intrinsic) signals. The extrinsic and intrinsic pathways are characterized by the formation of the death-inducing signaling complex (DISC) and the apoptosome, respectively; both the DISC and the apoptosome are oligomers with complex formation dynamics. Additionally, the extrinsic and intrinsic pathways are coupled through the mitochondrial apoptosis-induced channel via the Bcl-2 family of proteins. Results A model of caspase activation is constructed and analyzed. The apoptosis signaling network is simplified through modularization methodologies and equilibrium abstractions for three functional modules. The mathematical model is composed of a system of ordinary differential equations which is solved numerically. Multiple linear regression analysis investigates the role of each module, and reduced models are constructed to identify key contributions of the extrinsic and intrinsic pathways in triggering apoptosis for different cell lines. Conclusion Through linear regression techniques, we identified the feedbacks, dissociation of complexes, and negative regulators as the key components in apoptosis. The analysis and reduced models for our model formulation reveal that the chosen cell lines predominantly exhibit strong extrinsic caspase behavior, typical of type I cells. Furthermore, under the simplified model framework, the selected cell lines exhibit different modes by which caspase activation may occur. Finally, the proposed modularized model of apoptosis may generalize behavior for additional cells and tissues, specifically identifying and predicting components responsible for the transition from type I to type II cell behavior. PMID:19077196

  15. Modeling strategies for pharmaceutical blend monitoring and end-point determination by near-infrared spectroscopy.

    PubMed

    Igne, Benoît; de Juan, Anna; Jaumot, Joaquim; Lallemand, Jordane; Preys, Sébastien; Drennen, James K; Anderson, Carl A

    2014-10-01

    The implementation of a blend monitoring and control method based on a process analytical technology such as near-infrared spectroscopy requires the selection and optimization of numerous criteria that affect the monitoring outputs and the expected blend end-point. Using a five-component formulation, the present article contrasts the modeling strategies and end-point determination of a traditional quantitative method, based on the prediction of the blend parameters employing partial least-squares regression, with a qualitative strategy based on principal component analysis and Hotelling's T(2) and residual distance to the model, called Prototype. The possibility of monitoring and controlling blend homogeneity with multivariate curve resolution was also assessed. The implementation of the above methods in the presence of designed experiments (with variation of the amounts of active ingredient and excipients) and with normal operating condition samples (nominal concentrations of the active ingredient and excipients) was tested. The impact of the criteria used to stop the blends (related to precision and/or accuracy) was assessed. Results demonstrated that while all methods showed similarities in their outputs, some approaches were preferred for decision making. The selectivity of regression-based methods was also contrasted with the capacity of qualitative methods to determine the homogeneity of the entire formulation. Copyright © 2014. Published by Elsevier B.V.
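
    The qualitative (Prototype-style) strategy contrasted above can be illustrated with a short sketch in which a PCA model built on spectra of acceptable blends flags new spectra by Hotelling's T^2 and the residual distance to the model. The spectra, number of components and control limits below are placeholders, not the study's settings.

```python
# Sketch of a qualitative blend-monitoring check: a PCA model is built on spectra
# of blends known to be homogeneous, and new spectra are flagged using Hotelling's
# T^2 on the scores and the residual (Q) statistic. Data and limits are illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X_ref = rng.normal(size=(60, 200))       # reference spectra of acceptable blends (assumed)
x_new = rng.normal(size=(1, 200))        # spectrum acquired during blending

pca = PCA(n_components=3).fit(X_ref)
scores_ref = pca.transform(X_ref)

def t2_and_q(x):
    """Hotelling's T^2 on the scores and squared residual distance to the PCA model."""
    t = pca.transform(x)
    t2 = np.sum(t**2 / scores_ref.var(axis=0, ddof=1), axis=1)
    resid = x - pca.inverse_transform(t)
    q = np.sum(resid**2, axis=1)
    return t2, q

t2_new, q_new = t2_and_q(x_new)
t2_lim = np.percentile(t2_and_q(X_ref)[0], 95)   # simple empirical 95% limits
q_lim = np.percentile(t2_and_q(X_ref)[1], 95)
print("in control:", bool(t2_new[0] < t2_lim and q_new[0] < q_lim))
```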

  16. Simultaneous determination of penicillin G salts by infrared spectroscopy: Evaluation of combining orthogonal signal correction with radial basis function-partial least squares regression

    NASA Astrophysics Data System (ADS)

    Talebpour, Zahra; Tavallaie, Roya; Ahmadi, Seyyed Hamid; Abdollahpour, Assem

    2010-09-01

    In this study, a new method for the simultaneous determination of penicillin G salts in a pharmaceutical mixture via FT-IR spectroscopy combined with chemometrics was investigated. The mixture of penicillin G salts is a complex system due to the similar analytical characteristics of its components. Partial least squares (PLS) and radial basis function-partial least squares (RBF-PLS) were used to develop the linear and nonlinear relations between spectra and components, respectively. The orthogonal signal correction (OSC) preprocessing method was used to correct for unexpected information, such as spectral overlapping and scattering effects. In order to compare the influence of OSC on the PLS and RBF-PLS models, the optimal linear (PLS) and nonlinear (RBF-PLS) models based on conventional and OSC-preprocessed spectra were established and compared. The obtained results demonstrated that OSC clearly enhanced the performance of both the RBF-PLS and PLS calibration models. Also, in the case of a nonlinear relation between spectra and components, OSC-RBF-PLS gave more satisfactory results than the OSC-PLS model, which indicated that OSC was helpful for removing extrinsic deviations from linearity without eliminating nonlinear information related to the components. The chemometric models were tested on an external dataset and finally applied to the analysis of a commercialized injection product of penicillin G salts.

  17. Random regression analyses using B-splines to model growth of Australian Angus cattle

    PubMed Central

    Meyer, Karin

    2005-01-01

    Regression on basis functions of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals having 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between the level of detail of the model, the number of parameters to be estimated, the plausibility of results, and fit, measured as residual mean square error. PMID:16093011
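
    As a rough illustration of modelling a growth curve with a quadratic B-spline basis of age with knots near 0, 200, 400, 600 and 821 days, the sketch below fits only a simple fixed-effect curve; the full random regression (animal model) machinery of the study is not reproduced, and the data are simulated.

```python
# Sketch of a growth curve modelled with a quadratic B-spline basis of age,
# analogous in spirit to the fixed part of the random regression model above.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(2)
age = rng.uniform(0, 821, size=500).reshape(-1, 1)            # days, simulated data
weight = 40 + 0.6 * age.ravel() - 2e-4 * age.ravel() ** 2 + rng.normal(0, 15, 500)

# Quadratic B-splines with knots at 0, 200, 400, 600 and 821 days (assumed placement).
knots = np.array([[0.0], [200.0], [400.0], [600.0], [821.0]])
model = make_pipeline(
    SplineTransformer(degree=2, knots=knots, include_bias=True),
    LinearRegression(fit_intercept=False),
)
model.fit(age, weight)
print(model.predict(np.array([[100.0], [500.0], [800.0]])))
```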

  18. Model selection for semiparametric marginal mean regression accounting for within-cluster subsampling variability and informative cluster size.

    PubMed

    Shen, Chung-Wei; Chen, Yi-Hau

    2018-03-13

    We propose a model selection criterion for semiparametric marginal mean regression based on generalized estimating equations. The work is motivated by a longitudinal study on the physical frailty outcome in the elderly, where the cluster size, that is, the number of the observed outcomes in each subject, is "informative" in the sense that it is related to the frailty outcome itself. The new proposal, called Resampling Cluster Information Criterion (RCIC), is based on the resampling idea utilized in the within-cluster resampling method (Hoffman, Sen, and Weinberg, 2001, Biometrika 88, 1121-1134) and accommodates informative cluster size. The implementation of RCIC, however, is free of performing actual resampling of the data and hence is computationally convenient. Compared with the existing model selection methods for marginal mean regression, the RCIC method incorporates an additional component accounting for variability of the model over within-cluster subsampling, and leads to remarkable improvements in selecting the correct model, regardless of whether the cluster size is informative or not. Applying the RCIC method to the longitudinal frailty study, we identify being female, old age, low income and life satisfaction, and chronic health conditions as significant risk factors for physical frailty in the elderly. © 2018, The International Biometric Society.

  19. Physiological and anthropometric determinants of rhythmic gymnastics performance.

    PubMed

    Douda, Helen T; Toubekis, Argyris G; Avloniti, Alexandra A; Tokmakidis, Savvas P

    2008-03-01

    The aim was to identify the physiological and anthropometric predictors of rhythmic gymnastics performance, defined as the total ranking score of each athlete in a national competition. Thirty-four rhythmic gymnasts were divided into 2 groups, elite (n = 15) and nonelite (n = 19), and they underwent a battery of anthropometric, physical fitness, and physiological measurements. Principal-component analysis extracted 6 components: anthropometric, flexibility, explosive strength, aerobic capacity, body dimensions, and anaerobic metabolism. These were used in a simultaneous multiple-regression procedure to determine which best explained the variance in rhythmic gymnastics performance. Based on the principal-component analysis, the anthropometric component explained 45% of the total variance, flexibility 12.1%, explosive strength 9.2%, aerobic capacity 7.4%, body dimensions 6.8%, and anaerobic metabolism 4.6%. The anthropometric (r = .50) and aerobic capacity (r = .49) components were significantly correlated with performance (P < .01). When the multiple-regression model y = 10.708 + 0.0005121 × VO2max + 0.157 × arm span + 0.814 × midthigh circumference − 0.293 × body mass was applied to elite gymnasts, 92.5% of the variation was explained by VO2max (58.9%), arm span (12%), midthigh circumference (13.1%), and body mass (8.5%). Selected anthropometric characteristics, aerobic power, flexibility, and explosive strength are important determinants of successful performance. These findings might have practical implications for both training and talent identification in rhythmic gymnastics.

  20. SAND: an automated VLBI imaging and analysing pipeline - I. Stripping component trajectories

    NASA Astrophysics Data System (ADS)

    Zhang, M.; Collioud, A.; Charlot, P.

    2018-02-01

    We present our implementation of an automated very long baseline interferometry (VLBI) data-reduction pipeline that is dedicated to interferometric data imaging and analysis. The pipeline can handle massive VLBI data efficiently, which makes it an appropriate tool to investigate multi-epoch multiband VLBI data. Compared to traditional manual data reduction, our pipeline provides more objective results as less human interference is involved. The source extraction is carried out in the image plane, while deconvolution and model fitting are performed in both the image plane and the uv plane for parallel comparison. The output from the pipeline includes catalogues of CLEANed images and reconstructed models, polarization maps, proper motion estimates, core light curves and multiband spectra. We have developed a regression STRIP algorithm to automatically detect linear or non-linear patterns in the jet component trajectories. This algorithm offers an objective method to match jet components at different epochs and to determine their proper motions.

  1. A regression-kriging model for estimation of rainfall in the Laohahe basin

    NASA Astrophysics Data System (ADS)

    Wang, Hong; Ren, Li L.; Liu, Gao H.

    2009-10-01

    This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors: latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges in the Laohahe basin in northeast China during 1986-2005. Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis, with three Principal Components (PCs) extracted. The rainfall data were then fitted using step-wise regression and the residuals were interpolated using simple kriging (SK). The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). Because correlated topographic factors are taken into account, RK improves the efficiency of the predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no nearby stations and where topography has a major influence on rainfall.
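
    A simplified regression-kriging workflow in the spirit of the study can be sketched as a trend fitted on principal components of the topographic factors plus a spatial (kriging-type) interpolation of the trend residuals. The GLS estimation step is replaced by ordinary least squares here, and the coordinates, covariates and rainfall values are all synthetic placeholders.

```python
# Simplified regression-kriging (RK) sketch: a trend is fitted on principal
# components of topographic covariates, and the trend residuals are interpolated
# over space with a Gaussian process (a kriging-type model). Everything is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, size=(44, 2))        # gauge locations (km), assumed
topo = rng.normal(size=(44, 5))                   # latitude, longitude, altitude, slope, aspect
rain = 400 + topo @ np.array([20, -5, 30, 10, 3]) + rng.normal(0, 25, 44)

pca = PCA(n_components=3).fit(topo)               # PCs remove multicollinearity
trend = LinearRegression().fit(pca.transform(topo), rain)
resid = rain - trend.predict(pca.transform(topo))

krige = GaussianProcessRegressor(kernel=RBF(20.0) + WhiteKernel(1.0), normalize_y=True)
krige.fit(coords, resid)

# RK prediction at a new site = regression trend + kriged residual.
new_topo, new_coord = rng.normal(size=(1, 5)), np.array([[50.0, 50.0]])
print(trend.predict(pca.transform(new_topo)) + krige.predict(new_coord))
```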

  2. Estimation of aboveground biomass in Mediterranean forests by statistical modelling of ASTER fraction images

    NASA Astrophysics Data System (ADS)

    Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.

    2014-09-01

    Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed pixels when a medium-spatial-resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables comprising original bands, fraction imagery, the Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and the green vegetation fraction (from LSMA) was the best AGB predictor (Radj2 = 0.632; the cross-validated root-mean-squared error of estimated AGB was 13.3 Mg ha-1, or 37.7%), outperforming other combinations of the above-cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.

  3. Separation mechanism of nortriptyline and amitriptyline in RPLC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gritti, Fabrice; Guiochon, Georges A

    2005-08-01

    The single and the competitive equilibrium isotherms of nortriptyline and amitriptyline were acquired by frontal analysis (FA) on the C18-bonded Discovery column, using a 28/72 (v/v) mixture of acetonitrile and water buffered with phosphate (20 mM, pH 2.70). The adsorption energy distributions (AED) of each compound were calculated from the raw adsorption data. Both the fitting of the adsorption data using multi-linear regression analysis and the AEDs are consistent with a trimodal isotherm model. The single-component isotherm data fit well to the tri-Langmuir isotherm model. The extension to a competitive two-component tri-Langmuir isotherm model based on the best parameters of the single-component isotherms does not account well for the breakthrough curves or for the overloaded band profiles measured for mixtures of nortriptyline and amitriptyline. However, it was possible to derive adjusted parameters of a competitive tri-Langmuir model based on the fitting of the adsorption data obtained for these mixtures. A very good agreement was then found between the calculated and the experimental overloaded band profiles of all the mixtures injected.

  4. Indirect Measurement Of Nitrogen In A Multi-Component Natural Gas By Heating The Gas

    DOEpatents

    Morrow, Thomas B.; Behring, II, Kendricks A.

    2004-06-22

    Methods of indirectly measuring the nitrogen concentration in a natural gas by heating the gas. In two embodiments, the heating energy is correlated to the speed of sound in the gas, the diluent concentrations in the gas, and constant values, resulting in a model equation. Regression analysis is used to calculate the constant values, which can then be substituted into the model equation. If the diluent concentrations other than nitrogen (typically carbon dioxide) are known, the model equation can be solved for the nitrogen concentration.

  5. Estimating ground-level PM(10) in a Chinese city by combining satellite data, meteorological information and a land use regression model.

    PubMed

    Meng, Xia; Fu, Qingyan; Ma, Zongwei; Chen, Li; Zou, Bin; Zhang, Yan; Xue, Wenbo; Wang, Jinnan; Wang, Dongfang; Kan, Haidong; Liu, Yang

    2016-01-01

    Development of an exposure assessment model is a key component of epidemiological studies concerning air pollution, but evidence from China is limited. Therefore, a linear mixed effects (LME) model was established in this study in a Chinese metropolis by incorporating aerosol optical depth (AOD), meteorological information and the land use regression (LUR) model to predict ground PM10 levels at high spatiotemporal resolution. The cross-validation (CV) R(2) and the RMSE of the LME model were 0.87 and 19.2 μg/m(3), respectively. The relative prediction errors (RPE) of the daily and annual mean predicted PM10 concentrations were 19.1% and 7.5%, respectively. This study was the first attempt in China to estimate both short-term and long-term variation of PM10 levels with high spatial resolution in a Chinese metropolis with the LME model. The results suggested that the LME model could provide exposure assessment for short-term and long-term epidemiological studies in China. Copyright © 2015 Elsevier Ltd. All rights reserved.
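
    A minimal sketch of the kind of AOD-to-PM10 linear mixed effects calibration described above is given below, with day-specific random effects for the intercept and the AOD slope. The data frame, variable names and effect sizes are simulated assumptions, not values from the study.

```python
# Sketch of a linear mixed effects (LME) calibration: ground PM10 regressed on
# AOD and meteorology with day-specific random intercept and AOD slope.
# The data are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame({
    "day": rng.integers(0, 50, n),
    "aod": rng.uniform(0.1, 1.5, n),
    "temp": rng.uniform(0, 35, n),
    "rh": rng.uniform(20, 95, n),
})
day_effect = rng.normal(0, 15, 50)[df["day"]]
df["pm10"] = (30 + 60 * df["aod"] + 0.5 * df["temp"] - 0.1 * df["rh"]
              + day_effect + rng.normal(0, 10, n))

# Random intercept and AOD slope varying by day.
model = smf.mixedlm("pm10 ~ aod + temp + rh", df, groups=df["day"], re_formula="~aod")
result = model.fit()
print(result.summary())
```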

  6. Dual Incorporation of the in vitro Data (IC50) and in vivo (Cmax) Data for the Prediction of Area Under the Curve (AUC) for Statins using Regression Models Developed for Either Pravastatin or Simvastatin.

    PubMed

    Srinivas, N R

    2016-08-01

    Linear regression models utilizing a single time point (Cmax) have been reported for pravastatin and simvastatin. A new model was developed for the prediction of the AUC of statins that utilized the slopes of the above 2 models, with a pharmacokinetic (Cmax) and a pharmacodynamic (IC50) component for the statins. The prediction of AUCs for various statins (pravastatin, atorvastatin, simvastatin and rosuvastatin) was carried out using the newly developed dual pharmacokinetic and pharmacodynamic model. Generally, the AUC predictions were contained within a 0.5- to 2-fold difference of the observed AUC, suggesting utility of the new models. The root mean square errors of prediction were < 45% for the 2 models. On the basis of the present work, it is feasible to utilize both pharmacokinetic (Cmax) and pharmacodynamic (IC50) data for effectively predicting the AUC for statins. Such a concept, as described in this work, may have utility in both drug discovery and development stages. © Georg Thieme Verlag KG Stuttgart · New York.

  7. Serum Folate Shows an Inverse Association with Blood Pressure in a Cohort of Chinese Women of Childbearing Age: A Cross-Sectional Study

    PubMed Central

    Shen, Minxue; Tan, Hongzhuan; Zhou, Shujin; Retnakaran, Ravi; Smith, Graeme N.; Davidge, Sandra T.; Trasler, Jacquetta; Walker, Mark C.; Wen, Shi Wu

    2016-01-01

    Background It has been reported that higher folate intake from food and supplementation is associated with decreased blood pressure (BP). The association between serum folate concentration and BP has been examined in few studies. We aimed to examine the association between serum folate and BP levels in a cohort of young Chinese women. Methods We used the baseline data from a pre-conception cohort of women of childbearing age in Liuyang, China, for this study. Demographic data were collected by structured interview. Serum folate concentration was measured by immunoassay, and homocysteine, blood glucose, triglyceride and total cholesterol were measured through standardized clinical procedures. Multiple linear regression and principal component regression models were applied in the analysis. Results A total of 1,532 healthy normotensive non-pregnant women were included in the final analysis. The mean concentration of serum folate was 7.5 ± 5.4 nmol/L and 55% of the women presented with folate deficiency (< 6.8 nmol/L). Multiple linear regression and principal component regression showed that serum folate levels were inversely associated with systolic and diastolic BP, after adjusting for demographic, anthropometric, and biochemical factors. Conclusions Serum folate is inversely associated with BP in non-pregnant women of childbearing age with a high prevalence of folate deficiency. PMID:27182603
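
    The principal component regression step mentioned above can be sketched in a few lines: covariates are standardized, reduced to a small number of principal components, and the response is regressed on the component scores. The covariates and blood pressure values below are simulated placeholders.

```python
# Minimal principal component regression (PCR) sketch of the kind used to relate
# blood pressure to correlated covariates; all data below are simulated.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 1532
X = rng.normal(size=(n, 8))                      # folate, age, BMI, lipids, ... (assumed)
y = 110 - 0.8 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0, 8, n)   # systolic BP

pcr = make_pipeline(StandardScaler(), PCA(n_components=4), LinearRegression())
pcr.fit(X, y)
print("coefficients on the retained PCs:", pcr.named_steps["linearregression"].coef_)
```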

  8. Assessing the prediction accuracy of cure in the Cox proportional hazards cure model: an application to breast cancer data.

    PubMed

    Asano, Junichi; Hirakawa, Akihiro; Hamada, Chikuma

    2014-01-01

    A cure rate model is a survival model incorporating the cure rate, with the assumption that the population contains both uncured and cured individuals. It is a powerful statistical tool for prognostic studies, especially in cancer. The cure rate is important for making treatment decisions in clinical practice. The proportional hazards (PH) cure model can predict the cure rate for each patient. It contains a logistic regression component for the cure rate and a Cox regression component to estimate the hazard for uncured patients. A measure for quantifying the predictive accuracy of the cure rate estimated by the Cox PH cure model is required, as there has been a lack of previous research in this area. We used the Cox PH cure model for the breast cancer data; however, the area under the receiver operating characteristic curve (AUC) could not be estimated because many patients were censored. In this study, we used imputation-based AUCs to assess the predictive accuracy of the cure rate from the PH cure model. We examined the precision of these AUCs using simulation studies. The results demonstrated that the imputation-based AUCs were estimable and their biases were negligibly small in many cases, although the ordinary AUC could not be estimated. Additionally, we introduced a bias-correction method for the imputation-based AUCs and found that the bias-corrected estimate successfully compensated for the overestimation in the simulation studies. We also illustrated the estimation of the imputation-based AUCs using the breast cancer data. Copyright © 2014 John Wiley & Sons, Ltd.

  9. SkyFACT: high-dimensional modeling of gamma-ray emission with adaptive templates and penalized likelihoods

    NASA Astrophysics Data System (ADS)

    Storm, Emma; Weniger, Christoph; Calore, Francesca

    2017-08-01

    We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (≳ 10^5) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |l| < 90° and |b| < 20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as basis for future studies of diffuse emission in and outside the Galactic disk.
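
    The core numerical ingredient, penalized Poisson likelihood regression solved with L-BFGS-B, can be illustrated on a toy problem in which per-pixel modulation parameters absorb imperfections of a fixed template. A simple quadratic penalty stands in for SkyFACT's maximum-entropy-motivated regularizers, and all sizes and strengths are assumptions.

```python
# Toy penalized Poisson likelihood regression solved with L-BFGS-B: observed
# counts are modelled as a fixed template times per-pixel modulation parameters,
# with a quadratic regularizer keeping the modulations close to one.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
npix = 100
template = rng.uniform(1.0, 5.0, npix)            # predicted diffuse emission (assumed)
true_mod = np.exp(rng.normal(0.0, 0.2, npix))     # true multiplicative imperfections
counts = rng.poisson(template * true_mod)

lam = 10.0                                        # regularization strength (assumption)

def neg_penalized_loglike(theta):
    mu = template * np.exp(theta)                 # theta are log modulation parameters
    loglike = np.sum(counts * np.log(mu) - mu)    # Poisson log-likelihood (up to a constant)
    penalty = lam * np.sum(theta ** 2)            # quadratic stand-in for the MEM penalty
    return -(loglike - penalty)

res = minimize(neg_penalized_loglike, np.zeros(npix), method="L-BFGS-B")
print("converged:", res.success, "max |log modulation|:", np.abs(res.x).max())
```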

  10. Compatible Models of Carbon Content of Individual Trees on a Cunninghamia lanceolata Plantation in Fujian Province, China

    PubMed Central

    Zhuo, Lin; Tao, Hong; Wei, Hong; Chengzhen, Wu

    2016-01-01

    We tried to establish compatible carbon content models of individual trees for a Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) plantation in Fujian province in southeast China. In general, compatibility requires that the sum of the components equal the whole tree, meaning that the sum of percentages calculated from the component equations should equal 100%. Thus, we used multiple approaches to simulate carbon content in boles, branches, foliage leaves, roots and whole individual trees. The approaches included (i) single optimal fitting (SOF), (ii) nonlinear adjustment in proportion (NAP) and (iii) nonlinear seemingly unrelated regression (NSUR). These approaches were used in combination with variables relating diameter at breast height (D) and tree height (H), such as D, D2H, DH and D&H (where D&H means two separate variables in a bivariate model). Power, exponential and polynomial functions were tested, and a new general function model was proposed in this study. Weighted least squares regression models were employed to eliminate heteroscedasticity. Model performances were evaluated using mean residuals, residual variance, mean square error and the determination coefficient. The results indicated that models with two-dimensional variables (DH, D2H and D&H) were always superior to those with a single variable (D). The D&H variable combination was found to be the most useful predictor. Of all the approaches, SOF could establish a single optimal model separately, but there were deviations in the estimated results due to incompatibility, while NAP and NSUR ensured compatibility of the predictions. Simultaneously, we found that the new general model had better accuracy than the others. In conclusion, we recommend that the new general model be used to estimate carbon content for Chinese fir and be considered for other vegetation types as well. PMID:26982054

  11. Self-consistent asset pricing models

    NASA Astrophysics Data System (ADS)

    Malevergne, Y.; Sornette, D.

    2007-08-01

    We discuss the foundations of factor or regression models in the light of the self-consistency condition that the market portfolio (and more generally the risk factors) is (are) constituted of the assets whose returns it is (they are) supposed to explain. As already reported in several articles, self-consistency implies correlations between the return disturbances. As a consequence, the alphas and betas of the factor model are unobservable. Self-consistency leads to renormalized betas with zero effective alphas, which are observable with standard OLS regressions. When the conditions derived from internal consistency are not met, the model is necessarily incomplete, which means that some sources of risk cannot be replicated (or hedged) by a portfolio of stocks traded on the market, even for infinite economies. Analytical derivations and numerical simulations show that, for arbitrary choices of the proxy which are different from the true market portfolio, a modified linear regression holds with a non-zero value αi at the origin between an asset i's return and the proxy's return. Self-consistency also introduces “orthogonality” and “normality” conditions linking the betas, alphas (as well as the residuals) and the weights of the proxy portfolio. Two diagnostics based on these orthogonality and normality conditions are implemented on a basket of 323 assets which have been components of the S&P500 in the period from January 1990 to February 2005. These two diagnostics show interesting departures from dynamical self-consistency starting about 2 years before the end of the Internet bubble. Assuming that the CAPM holds with the self-consistency condition, the OLS method automatically obeys the resulting orthogonality and normality conditions and therefore provides a simple way to self-consistently assess the parameters of the model by using proxy portfolios made only of the assets which are used in the CAPM regressions. Finally, the factor decomposition with the self-consistency condition derives a risk-factor decomposition in the multi-factor case which is identical to the principal component analysis (PCA), thus providing a direct link between model-driven and data-driven constructions of risk factors. This correspondence shows that PCA will therefore suffer from the same limitations as the CAPM and its multi-factor generalization, namely lack of out-of-sample explanatory power and predictability. In the multi-period context, the self-consistency conditions force the betas to be time-dependent with specific constraints.

  12. Optimisation of the formulation of a bubble bath by a chemometric approach: market segmentation and optimisation.

    PubMed

    Marengo, Emilio; Robotti, Elisa; Gennaro, Maria Carla; Bertetto, Mariella

    2003-03-01

    The optimisation of the formulation of a commercial bubble bath was performed by chemometric analysis of Panel Test results. A first Panel Test was performed to choose the best essence among four proposed to the consumers; the best essence chosen was used in the revised commercial bubble bath. Afterwards, the effect of changing the amount of four components of the bubble bath (the primary surfactant, the essence, the hydratant and the colouring agent) was studied by a fractional factorial design. The segmentation of the bubble bath market was performed by a second Panel Test, in which the consumers were requested to evaluate the samples coming from the experimental design. The results were then treated by Principal Component Analysis. The market had two segments: people preferring a product with a rich formulation and people preferring a poor product. The final target, i.e. the optimisation of the formulation for each segment, was achieved by calculating regression models relating the subjective evaluations given by the Panel to the compositions of the samples. The regression models made it possible to identify the best formulations for the two segments of the market.

  13. Structured penalties for functional linear models-partially empirical eigenvectors for regression.

    PubMed

    Randolph, Timothy W; Harezlak, Jaroslaw; Feng, Ziding

    2012-01-01

    One of the challenges with functional data is incorporating geometric structure, or local correlation, into the analysis. This structure is inherent in the output from an increasing number of biomedical technologies, and a functional linear model is often used to estimate the relationship between the predictor functions and scalar responses. Common approaches to the problem of estimating a coefficient function typically involve two stages: regularization and estimation. Regularization is usually done via dimension reduction, projecting onto a predefined span of basis functions or a reduced set of eigenvectors (principal components). In contrast, we present a unified approach that directly incorporates geometric structure into the estimation process by exploiting the joint eigenproperties of the predictors and a linear penalty operator. In this sense, the components in the regression are 'partially empirical' and the framework is provided by the generalized singular value decomposition (GSVD). The form of the penalized estimation is not new, but the GSVD clarifies the process and informs the choice of penalty by making explicit the joint influence of the penalty and predictors on the bias, variance and performance of the estimated coefficient function. Laboratory spectroscopy data and simulations are used to illustrate the concepts.
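
    The penalized estimation form discussed above, minimizing ||y - X beta||^2 + lambda ||L beta||^2 for a penalty operator L, can be sketched directly from the normal equations; the GSVD-based analysis of the joint eigenproperties is not reproduced here. The data, the second-difference penalty and the penalty weight are illustrative assumptions.

```python
# Sketch of penalized functional regression in the form
#   beta_hat = argmin ||y - X beta||^2 + lam * ||L beta||^2,
# where L is a second-difference operator penalizing curvature of the
# coefficient function. Data are synthetic.
import numpy as np

rng = np.random.default_rng(7)
n, p = 80, 120                                    # more "wavelengths" than samples
X = rng.normal(size=(n, p))                       # predictor functions sampled on a grid
beta_true = np.sin(np.linspace(0, np.pi, p))      # smooth coefficient function
y = X @ beta_true + rng.normal(0, 0.5, n)

L = np.diff(np.eye(p), n=2, axis=0)               # second-difference penalty operator
lam = 5.0                                         # penalty weight (assumption)

# Solve the penalized normal equations (X'X + lam * L'L) beta = X'y.
beta_hat = np.linalg.solve(X.T @ X + lam * (L.T @ L), X.T @ y)
print("correlation with true coefficient:", np.corrcoef(beta_hat, beta_true)[0, 1])
```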

  14. A SAR and QSAR study of new artemisinin compounds with antimalarial activity.

    PubMed

    Santos, Cleydson Breno R; Vieira, Josinete B; Lobato, Cleison C; Hage-Melim, Lorane I S; Souto, Raimundo N P; Lima, Clarissa S; Costa, Elizabeth V M; Brasil, Davi S B; Macêdo, Williams Jorge C; Carvalho, José Carlos T

    2013-12-30

    The Hartree-Fock method and the 6-31G** basis set were employed to calculate the molecular properties of artemisinin and 20 derivatives with antimalarial activity. Maps of molecular electrostatic potential (MEPs) and molecular docking were used to investigate the interaction between ligands and the receptor (heme). Principal component analysis and hierarchical cluster analysis were employed to select the most important descriptors related to activity. The correlation between biological activity and molecular properties was obtained using the partial least squares and principal component regression methods. The regression PLS and PCR models built in this study were also used to predict the antimalarial activity of 30 new artemisinin compounds with unknown activity. The models obtained showed not only statistical significance but also predictive ability. The significant molecular descriptors related to the compounds with antimalarial activity were the hydration energy (HE), the charge on the O11 oxygen atom (QO11), the torsion angle O1-O2-Fe-N2 (D2) and the maximum rate of R/Sanderson Electronegativity (RTe+). These variables led to a physical and structural explanation of the molecular properties that should be selected for when designing new ligands to be used as antimalarial agents.

  15. Applying different independent component analysis algorithms and support vector regression for IT chain store sales forecasting.

    PubMed

    Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method and a prediction tool, such as support vector regression (SVR), is a useful way of constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., the temporal ICA model) has been applied to the sales forecasting problem. In this paper, we utilize three different ICA methods, including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to extract features from the sales data and compare their performance in sales forecasting of an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting.

  16. Applying Different Independent Component Analysis Algorithms and Support Vector Regression for IT Chain Store Sales Forecasting

    PubMed Central

    Dai, Wensheng

    2014-01-01

    Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales, since an IT chain store has many branches. Integrating a feature extraction method and a prediction tool, such as support vector regression (SVR), is a useful way of constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., the temporal ICA model) has been applied to the sales forecasting problem. In this paper, we utilize three different ICA methods, including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA), to extract features from the sales data and compare their performance in sales forecasting of an IT chain store. Experimental results from real sales data show that the sales forecasting scheme integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data, and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740
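
    The ICA-plus-SVR forecasting scheme described in the two records above can be sketched by extracting independent components from multi-branch sales series and training a support vector regression on them. Only the basic FastICA transform is used here, and the sales data, forecasting setup and hyperparameters are simulated assumptions.

```python
# Sketch of an ICA + SVR forecasting scheme: FastICA extracts components from
# multi-branch sales series and an SVR predicts next-period total sales.
# All data and settings are simulated placeholders.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(8)
n_weeks, n_branches = 150, 12
sales = rng.gamma(5.0, 100.0, size=(n_weeks, n_branches))    # weekly branch sales
target = sales.sum(axis=1)                                   # chain-wide total
X, y = sales[:-1], target[1:]                                 # one-step-ahead setup

model = make_pipeline(StandardScaler(),
                      FastICA(n_components=4, random_state=0),
                      SVR(kernel="rbf", C=10.0))
model.fit(X[:100], y[:100])
print("test R^2:", model.score(X[100:], y[100:]))
```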

  17. Future climate data from RCP 4.5 and occurrence of malaria in Korea.

    PubMed

    Kwak, Jaewon; Noh, Huiseong; Kim, Soojun; Singh, Vijay P; Hong, Seung Jin; Kim, Duckgil; Lee, Keonhaeng; Kang, Narae; Kim, Hung Soo

    2014-10-15

    Since its reappearance at the Military Demarcation Line in 1993, malaria has been occurring annually in Korea. Malaria is regarded as a third grade nationally notifiable disease susceptible to climate change. The objective of this study is to quantify the effect of climatic factors on the occurrence of malaria in Korea and construct a malaria occurrence model for predicting the future trend of malaria under the influence of climate change. Using data from 2001-2011, the effect of time lag between malaria occurrence and mean temperature, relative humidity and total precipitation was investigated using spectral analysis. Also, a principal component regression model was constructed, considering multicollinearity. Future climate data, generated from RCP 4.5 climate change scenario and CNCM3 climate model, was applied to the constructed regression model to simulate future malaria occurrence and analyze the trend of occurrence. Results show an increase in the occurrence of malaria and the shortening of annual time of occurrence in the future.

  18. Future Climate Data from RCP 4.5 and Occurrence of Malaria in Korea

    PubMed Central

    Kwak, Jaewon; Noh, Huiseong; Kim, Soojun; Singh, Vijay P.; Hong, Seung Jin; Kim, Duckgil; Lee, Keonhaeng; Kang, Narae; Kim, Hung Soo

    2014-01-01

    Since its reappearance at the Military Demarcation Line in 1993, malaria has been occurring annually in Korea. Malaria is regarded as a third grade nationally notifiable disease susceptible to climate change. The objective of this study is to quantify the effect of climatic factors on the occurrence of malaria in Korea and construct a malaria occurrence model for predicting the future trend of malaria under the influence of climate change. Using data from 2001–2011, the effect of time lag between malaria occurrence and mean temperature, relative humidity and total precipitation was investigated using spectral analysis. Also, a principal component regression model was constructed, considering multicollinearity. Future climate data, generated from RCP 4.5 climate change scenario and CNCM3 climate model, was applied to the constructed regression model to simulate future malaria occurrence and analyze the trend of occurrence. Results show an increase in the occurrence of malaria and the shortening of annual time of occurrence in the future. PMID:25321875

  19. Effect of removing the common mode errors on linear regression analysis of noise amplitudes in position time series of a regional GPS network & a case study of GPS stations in Southern California

    NASA Astrophysics Data System (ADS)

    Jiang, Weiping; Ma, Jun; Li, Zhao; Zhou, Xiaohui; Zhou, Boye

    2018-05-01

    Analysis of the correlations between the noise in different components of GPS stations is valuable for obtaining more accurate uncertainties of station velocities. Previous research into noise in GPS position time series focused mainly on single-component evaluation, which affects the acquisition of precise station positions, the velocity field, and its uncertainty. In this study, before and after removing the common-mode error (CME), we performed one-dimensional linear regression analysis of the noise amplitude vectors in different components of 126 GPS stations in Southern California, considering a combination of white noise, flicker noise, and random walk noise. The results show that, on the one hand, there are above-moderate degrees of correlation between the white noise amplitude vectors in all components of the stations before and after removal of the CME, while the correlations between flicker noise amplitude vectors in the horizontal and vertical components are enhanced from uncorrelated to moderately correlated by removing the CME. On the other hand, the significance tests show that all of the obtained linear regression equations, which represent a unique function of the noise amplitude in any two components, are of practical value after removing the CME. According to the noise amplitude estimates in two components and the linear regression equations, more accurate noise amplitudes can be acquired in the two components.

  20. Mathematical Modelling of Optimization of Structures of Monolithic Coverings Based on Liquid Rubbers

    NASA Astrophysics Data System (ADS)

    Turgumbayeva, R. Kh; Abdikarimov, M. N.; Mussabekov, R.; Sartayev, D. T.

    2018-05-01

    The paper considers the optimization of monolithic coating compositions using a computer and MPE methods. The goal of the paper was to construct a mathematical model of the complete factorial experiment, taking into account its plan and conditions. Several regression equations were obtained. The dependence between the content of the components and the parameters of the rubber, as well as the quantity of rubber crumb, was considered. An optimal composition for manufacturing the material of monolithic coatings was recommended based on the experimental data.

  1. High dimensional linear regression models under long memory dependence and measurement error

    NASA Astrophysics Data System (ADS)

    Kaul, Abhishek

    This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates the problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of the Lasso under long range dependent model errors. The Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show its asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n), where p can increase exponentially with n. Finally, we show the n^(1/2-d)-consistency of the Lasso, along with the oracle property of the adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of the Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement error models, where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where the unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. We study these estimators in both the fixed dimension and high dimensional sparse setups; in the latter setup, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation that hold with asymptotic probability 1, thereby providing the ℓ1-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.
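
    A small numerical illustration of Lasso selection under strongly dependent errors in a p > n setup is sketched below; an AR(1) error sequence is used as a crude stand-in for the long memory moving average errors studied in the dissertation, and the dimensions, sparsity and tuning parameter are assumptions.

```python
# Lasso model selection with p > n and strongly dependent regression errors.
# The AR(1) errors are only a crude proxy for long memory; all settings are assumed.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(9)
n, p = 200, 500                                    # high-dimensional setup, p > n
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]             # sparse true signal

eps = np.zeros(n)                                  # strongly autocorrelated errors
for t in range(1, n):
    eps[t] = 0.9 * eps[t - 1] + rng.normal(0, 1)
y = X @ beta + eps

fit = Lasso(alpha=0.3).fit(X, y)
print("indices of selected coefficients:", np.flatnonzero(fit.coef_)[:10])
```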

  2. Real time damage detection using recursive principal components and time varying auto-regressive modeling

    NASA Astrophysics Data System (ADS)

    Krishnan, M.; Bhowmik, B.; Hazra, B.; Pakrashi, V.

    2018-02-01

    In this paper, a novel baseline-free approach for continuous online damage detection of multi-degree-of-freedom vibrating structures using Recursive Principal Component Analysis (RPCA) in conjunction with Time Varying Auto-Regressive Modeling (TVAR) is proposed. In this method, the acceleration data is used to obtain recursive proper orthogonal components online using the rank-one perturbation method, followed by TVAR modeling of the first transformed response, to detect the change in the dynamic behavior of the vibrating system from its pristine state to contiguous linear/nonlinear states that indicate damage. Most of the works available in the literature deal with algorithms that require windowing of the gathered data owing to their data-driven nature, which renders them ineffective for online implementation. Algorithms focused on mathematically consistent recursive techniques within a rigorous theoretical framework for structural damage detection are missing; this motivates the development of the present framework, which is amenable to online implementation and can be used alongside a suite of experimental and numerical investigations. The RPCA algorithm iterates the eigenvector and eigenvalue estimates for the sample covariance matrix with each new data point at successive time instants, using the rank-one perturbation method. TVAR modeling of the principal component explaining maximum variance is utilized, and damage is identified by tracking the TVAR coefficients. This eliminates the need for offline post-processing and facilitates online damage detection, especially when applied to streaming data, without requiring any baseline data. Numerical simulations performed on a 5-dof nonlinear system under white noise excitation and El Centro (1940 Imperial Valley earthquake) excitation, for different damage scenarios, demonstrate the robustness of the proposed algorithm. The method is further validated on results from case studies involving experiments on a cantilever beam subjected to earthquake excitation and a two-storey bench-scale model with a TMD, as well as data from recorded responses of the UCLA Factor building, demonstrating the efficacy of the proposed methodology as an ideal candidate for real-time, reference-free structural health monitoring.
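
    A heavily simplified online sketch of the RPCA-plus-TVAR idea is given below: a forgetting-factor covariance update (with a full eigendecomposition in place of the paper's rank-one perturbation eigen-update) yields the leading principal direction, and recursive least squares tracks time-varying AR coefficients of the projected response. The synthetic signal, forgetting factor and model order are assumptions.

```python
# Simplified streaming sketch: recursive covariance update -> leading principal
# direction -> time-varying AR coefficients of the projected response tracked
# with recursive least squares. This is not the paper's rank-one eigen-update.
import numpy as np

rng = np.random.default_rng(10)
T, ndof, order, lam = 2000, 5, 2, 0.995            # samples, channels, AR order, forgetting

# Synthetic 5-channel response driven by a common AR(1) signal whose
# coefficient drops at mid-record (a crude stand-in for damage).
s = np.zeros(T)
for t in range(1, T):
    a = 0.8 if t < T // 2 else 0.3
    s[t] = a * s[t - 1] + rng.normal()
loading = np.array([1.0, 0.8, 0.6, 0.4, 0.2])
x = s[:, None] * loading + 0.3 * rng.normal(size=(T, ndof))

cov, mean = np.eye(ndof), np.zeros(ndof)
P = np.eye(order) * 1e3                            # RLS covariance
theta, buf = np.zeros(order), np.zeros(order)
w_prev = np.ones(ndof)
ar1 = []

for t in range(T):
    # Recursive mean/covariance update with forgetting factor lam.
    mean = lam * mean + (1 - lam) * x[t]
    d = (x[t] - mean)[:, None]
    cov = lam * cov + (1 - lam) * (d @ d.T)
    w = np.linalg.eigh(cov)[1][:, -1]              # leading principal direction
    if w @ w_prev < 0:                             # keep a consistent sign over time
        w = -w
    w_prev = w
    r = float(x[t] @ w)                            # first transformed response

    # Recursive least squares update of the time-varying AR coefficients.
    phi = buf.copy()
    k = P @ phi / (lam + phi @ P @ phi)
    theta = theta + k * (r - phi @ theta)
    P = (P - np.outer(k, phi @ P)) / lam
    buf = np.roll(buf, 1)
    buf[0] = r
    ar1.append(theta[0])

ar1 = np.array(ar1)
print("mean AR(1) coefficient before vs after the change:",
      round(ar1[500:1000].mean(), 2), round(ar1[1500:].mean(), 2))
```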

  3. Association Between Awareness of Hypertension and Health-Related Quality of Life in a Cross-Sectional Population-Based Study in Rural Area of Northwest China.

    PubMed

    Mi, Baibing; Dang, Shaonong; Li, Qiang; Zhao, Yaling; Yang, Ruihai; Wang, Duolao; Yan, Hong

    2015-07-01

    Hypertensive patients have more complex health care needs and are more likely to have poorer health-related quality of life than normotensive people. Awareness of hypertension could be related to reduced health-related quality of life. We propose the use of quantile regression to explore in more detail the relationship between awareness of hypertension and health-related quality of life. In a cross-sectional, population-based study, 2737 participants (1035 hypertensive patients and 1702 normotensive participants) completed the Short-Form Health Survey. A quantile regression model was employed to investigate the association of physical component summary scores and mental component summary scores with awareness of hypertension and to evaluate the associated factors. Patients who were aware of hypertension (N = 554) had lower scores than patients who were unaware of hypertension (N = 481). The median (IQR) physical component summary scores were 48.20 (13.88) versus 53.27 (10.79), P < 0.01; the mental component summary scores were 50.68 (15.09) versus 51.70 (10.65), P = 0.03. After adjusting for covariates, the quantile regression results suggest that awareness of hypertension was associated with most physical component summary score quantiles (P < 0.05 except for the 10th and 20th quantiles), with β-estimates ranging from -2.14 (95% CI: -3.80 to -0.48) to -1.45 (95% CI: -2.42 to -0.47), and with a similar significant trend at some of the poorer mental component summary score quantiles, with β-estimates from -3.47 (95% CI: -6.65 to -0.39) to -2.18 (95% CI: -4.30 to -0.06). The awareness of hypertension had a greater effect on those with intermediate physical component summary status: the β-estimate was -2.04 (95% CI: -3.51 to -0.57, P < 0.05) at the 40th quantile and attenuated to -1.45 (95% CI: -2.42 to -0.47, P < 0.01) at the 90th quantile. Awareness of hypertension was negatively related to health-related quality of life in hypertensive patients in rural western China, with a greater effect on mental component summary scores for those with poorer status and on physical component summary scores for those with intermediate status.
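
    The quantile regression analysis described above can be sketched with a standard package call, regressing a physical component summary score on awareness of hypertension and covariates at several quantiles. The data frame and covariates below are simulated placeholders, not the survey data.

```python
# Quantile regression sketch: physical component summary (PCS) score regressed on
# awareness of hypertension and covariates at several quantiles. Simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(11)
n = 1000
df = pd.DataFrame({
    "aware": rng.integers(0, 2, n),
    "age": rng.uniform(35, 75, n),
    "female": rng.integers(0, 2, n),
})
df["pcs"] = 55 - 2.0 * df["aware"] - 0.1 * df["age"] + rng.normal(0, 8, n)

for q in (0.4, 0.5, 0.9):
    fit = smf.quantreg("pcs ~ aware + age + female", df).fit(q=q)
    print(f"q={q}: awareness effect = {fit.params['aware']:.2f}")
```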

  4. Analyzing Seasonal Variations in Suicide With Fourier Poisson Time-Series Regression: A Registry-Based Study From Norway, 1969-2007.

    PubMed

    Bramness, Jørgen G; Walby, Fredrik A; Morken, Gunnar; Røislien, Jo

    2015-08-01

    Seasonal variation in the number of suicides has long been acknowledged. It has been suggested that this seasonality has declined in recent years, but studies have generally used statistical methods incapable of confirming this. We examined all suicides occurring in Norway during 1969-2007 (more than 20,000 suicides in total) to establish whether seasonality decreased over time. Fitting additive Fourier Poisson time-series regression models allowed for formal testing of a possible linear decrease in seasonality, or a reduction at a specific point in time, while adjusting for a possible smooth nonlinear long-term change without having to categorize time into discrete yearly units. The models were compared using Akaike's Information Criterion and analysis of variance. A model with a seasonal pattern was significantly superior to a model without one. There was a reduction in seasonality during the period. The model assuming a linear decrease in seasonality and the model assuming a change at a specific point in time were both superior to a model assuming constant seasonality, thus confirming by formal statistical testing that the magnitude of the seasonality in suicides has diminished. The additive Fourier Poisson time-series regression model would also be useful for studying other temporal phenomena with seasonal components. © The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
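
    An additive Fourier Poisson time-series regression of the kind described above can be sketched with a generalized linear model in which sine and cosine terms capture seasonality and their interaction with time tests for a linear change in seasonal amplitude. The monthly counts below are simulated, not the Norwegian registry data.

```python
# Fourier Poisson time-series regression sketch: sine/cosine terms model the
# seasonal pattern in monthly counts, and their interaction with time allows the
# seasonal amplitude to change linearly. Counts are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(12)
months = np.arange(12 * 39)                       # 39 years of monthly counts (assumed)
t = months / 12.0
season = (1.0 - 0.02 * t) * np.sin(2 * np.pi * months / 12)   # decaying seasonality
counts = rng.poisson(np.exp(3.5 + 0.3 * season))

df = pd.DataFrame({
    "y": counts,
    "t": t,
    "s": np.sin(2 * np.pi * months / 12),
    "c": np.cos(2 * np.pi * months / 12),
})

# Seasonal terms plus their interactions with time test for a linear amplitude change.
model = smf.glm("y ~ t + s + c + s:t + c:t", df, family=sm.families.Poisson()).fit()
print(model.summary())
```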

  5. Semiparametric modeling and estimation of the terminal behavior of recurrent marker processes before failure events.

    PubMed

    Chan, Kwun Chuen Gary; Wang, Mei-Cheng

    2017-01-01

    Recurrent event processes with marker measurements are mostly and largely studied with forward time models starting from an initial event. Interestingly, the processes could exhibit important terminal behavior during a time period before occurrence of the failure event. A natural and direct way to study recurrent events prior to a failure event is to align the processes using the failure event as the time origin and to examine the terminal behavior by a backward time model. This paper studies regression models for backward recurrent marker processes by counting time backward from the failure event. A three-level semiparametric regression model is proposed for jointly modeling the time to a failure event, the backward recurrent event process, and the marker observed at the time of each backward recurrent event. The first level is a proportional hazards model for the failure time, the second level is a proportional rate model for the recurrent events occurring before the failure event, and the third level is a proportional mean model for the marker given the occurrence of a recurrent event backward in time. By jointly modeling the three components, estimating equations can be constructed for marked counting processes to estimate the target parameters in the three-level regression models. Large sample properties of the proposed estimators are studied and established. The proposed models and methods are illustrated by a community-based AIDS clinical trial to examine the terminal behavior of frequencies and severities of opportunistic infections among HIV infected individuals in the last six months of life.

  6. [Research on the method of interference correction for nondispersive infrared multi-component gas analysis].

    PubMed

    Sun, You-Wen; Liu, Wen-Qing; Wang, Shi-Mei; Huang, Shu-Hua; Yu, Xiao-Man

    2011-10-01

    A method of interference correction for nondispersive infrared (NDIR) multi-component gas analysis is described. According to the successive integral gas absorption models and methods, the influence of temperature and air pressure on the integral line strengths and line shapes was considered, and, based on Lorentz detuning line shapes, the absorption cross sections and response coefficients of H2O, CO2, CO, and NO on each filter channel were obtained. Four-dimensional linear regression equations for interference correction were established from the response coefficients; the absorption cross-interference was corrected by solving these multi-dimensional linear regression equations, and after interference correction the pure absorbance signal on each filter channel was controlled only by the corresponding target gas concentration. When the sample cell was filled with a gas mixture containing a certain concentration proportion of CO, NO and CO2, the pure absorbance after interference correction was used for concentration inversion; the inversion concentration error was 2.0% for CO2, 1.6% for CO, and 1.7% for NO. Both theory and experiment prove that the proposed interference correction method for NDIR multi-component gas analysis is feasible.

  7. A survey on the measure of combat readiness

    NASA Astrophysics Data System (ADS)

    Wen, Kwong Fook; Nor, Norazman Mohamad; Soon, Lee Lai

    2014-09-01

    Measuring combat readiness in military forces involves measures of both tangible and intangible elements of combat power. Though these measures are applicable, the mathematical models and formulae used focus mainly on either the tangible or the intangible elements. In this paper, a review is done to highlight the research gap in the formulation of a mathematical model that incorporates tangible elements with intangible elements to measure the combat readiness of a military force. It highlights the missing link between the tangible and intangible elements of combat power. To bridge this gap, a mathematical model could be formulated that measures both the tangible and intangible aspects of combat readiness by establishing the relationship between the causal (tangible and intangible) elements and their effects on the measure of combat readiness. The model uses multiple regression analysis as well as mathematical modeling and simulation, incorporating a capability component reflecting assets and resources, a morale component reflecting human needs, and a quality-of-life component reflecting soldiers' satisfaction with life. The results of the review provide a means to bridge the research gap through the formulation of a mathematical model that gives the total measure of a military force's combat readiness. The results also identify parameters for each of the variables and factors in the model.

  8. Comprehensive investigation into historical pipeline construction costs and engineering economic analysis of Alaska in-state gas pipeline

    NASA Astrophysics Data System (ADS)

    Rui, Zhenhua

    This study analyzes historical cost data of 412 pipelines and 220 compressor stations. On the basis of this analysis, the study also evaluates the feasibility of an Alaska in-state gas pipeline using Monte Carlo simulation techniques. Analysis of pipeline construction costs shows that component costs, shares of cost components, and learning rates for material and labor costs vary by diameter, length, volume, year, and location. Overall average learning rates for pipeline material and labor costs are 6.1% and 12.4%, respectively. Overall average cost shares for pipeline material, labor, miscellaneous, and right of way (ROW) are 31%, 40%, 23%, and 7%, respectively. Regression models are developed to estimate pipeline component costs for different lengths, cross-sectional areas, and locations. An analysis of inaccuracy in pipeline cost estimation demonstrates that the estimation of pipeline cost components is biased except in the case of total costs. Overall overrun rates for pipeline material, labor, miscellaneous, ROW, and total costs are 4.9%, 22.4%, -0.9%, 9.1%, and 6.5%, respectively, and project size, capacity, diameter, location, and year of completion have different degrees of impact on cost overruns of pipeline cost components. Analysis of compressor station costs shows that component costs, shares of cost components, and learning rates for material and labor costs vary in terms of capacity, year, and location. Average learning rates for compressor station material and labor costs are 12.1% and 7.48%, respectively. Overall average cost shares of material, labor, miscellaneous, and ROW are 50.6%, 27.2%, 21.5%, and 0.8%, respectively. Regression models are developed to estimate compressor station component costs for different capacities and locations. An investigation into inaccuracies in compressor station cost estimation demonstrates that the cost estimation for compressor stations is biased except in the case of material costs. Overall average overrun rates for compressor station material, labor, miscellaneous, land, and total costs are 3%, 60%, 2%, -14%, and 11%, respectively, and cost overruns for cost components are influenced by location and year of completion to different degrees. Monte Carlo models are developed and simulated to evaluate the feasibility of an Alaska in-state gas pipeline by assigning triangular distributions to the values of economic parameters. Simulated results show that the construction of an Alaska in-state natural gas pipeline is feasible under three scenarios: 500 million cubic feet per day (mmcfd), 750 mmcfd, and 1000 mmcfd.
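
    A minimal sketch of the Monte Carlo feasibility step, assuming triangular distributions for a few economic parameters (all values are placeholders, not the study's inputs): each draw yields an NPV for the pipeline, and the fraction of positive draws approximates the probability that the project is economic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of Monte Carlo draws

# Triangular distributions (min, mode, max) for key economic inputs;
# the numbers here are placeholders, not the study's values.
capex  = rng.triangular(6e9, 7.5e9, 10e9, n)      # construction cost, $
tariff = rng.triangular(1.5, 2.0, 2.8, n)          # $ per mcf transported
volume = rng.triangular(450, 500, 550, n) * 1e3    # mcf per day
opex   = rng.triangular(0.10, 0.15, 0.25, n)       # fraction of revenue
rate   = 0.08                                      # discount rate
years  = 25

# Simple NPV of annual net revenue against the up-front capital cost.
annual_net = tariff * volume * 365 * (1 - opex)
annuity = (1 - (1 + rate) ** -years) / rate
npv = annual_net * annuity - capex

print("P(NPV > 0) =", (npv > 0).mean())
print("mean NPV   =", npv.mean() / 1e9, "billion $")
```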

  9. Protein subcellular location pattern classification in cellular images using latent discriminative models.

    PubMed

    Li, Jieyue; Xiong, Liang; Schneider, Jeff; Murphy, Robert F

    2012-06-15

    Knowledge of the subcellular location of a protein is crucial for understanding its functions. The subcellular pattern of a protein is typically represented as the set of cellular components in which it is located, and an important task is to determine this set from microscope images. In this article, we address this classification problem using confocal immunofluorescence images from the Human Protein Atlas (HPA) project. The HPA contains images of cells stained for many proteins; each is also stained for three reference components, but there are many other components that are invisible. Given one such cell, the task is to classify the pattern type of the stained protein. We first randomly select local image regions within the cells, and then extract various carefully designed features from these regions. This region-based approach enables us to explicitly study the relationship between proteins and different cell components, as well as the interactions between these components. To achieve these two goals, we propose two discriminative models that extend logistic regression with structured latent variables. The first model allows the same protein pattern class to be expressed differently according to the underlying components in different regions. The second model further captures the spatial dependencies between the components within the same cell so that we can better infer these components. To learn these models, we propose a fast approximate algorithm for inference, and then use gradient-based methods to maximize the data likelihood. In the experiments, we show that the proposed models help improve the classification accuracies on synthetic data and real cellular images. The best overall accuracy we report in this article for classifying 942 proteins into 13 classes of patterns is about 84.6%, which to our knowledge is the best so far. In addition, the dependencies learned are consistent with prior knowledge of cell organization. http://murphylab.web.cmu.edu/software/.

  10. Developing a Collection of Composable Data Translation Software Units to Improve Efficiency and Reproducibility in Ecohydrologic Modeling Workflows

    NASA Astrophysics Data System (ADS)

    Olschanowsky, C.; Flores, A. N.; FitzGerald, K.; Masarik, M. T.; Rudisill, W. J.; Aguayo, M.

    2017-12-01

    Dynamic models of the spatiotemporal evolution of water, energy, and nutrient cycling are important tools to assess impacts of climate and other environmental changes on ecohydrologic systems. These models require spatiotemporally varying environmental forcings like precipitation, temperature, humidity, windspeed, and solar radiation. These input data originate from a variety of sources, including global and regional weather and climate models, global and regional reanalysis products, and geostatistically interpolated surface observations. Data translation measures, often subsetting in space and/or time and transforming and converting variable units, represent a seemingly mundane but critical step in the application workflows. Translation steps can introduce errors and misrepresentations of data, slow execution time, and interrupt data provenance. We leverage a workflow that subsets a large regional dataset derived from the Weather Research and Forecasting (WRF) model and prepares inputs to the ParFlow integrated hydrologic model to demonstrate the impact of translation tool software quality on scientific workflow results and performance. We propose that such workflows will benefit from a community-approved collection of data transformation components. The components should be self-contained, composable units of code. This design pattern enables automated parallelization and software verification, improving performance and reliability. Ensuring that individual translation components are self-contained and target small, well-defined tasks increases reliability. The small code size of each component enables effective unit and regression testing. The components can be automatically composed for efficient execution, as sketched below. An efficient data translation framework should be written to minimize data movement; composing components within a single streaming process reduces data movement. Each component will typically have low arithmetic intensity, meaning that it requires about the same number of bytes to be read as the number of computations it performs. When several components' executions are coordinated, the overall arithmetic intensity increases, leading to increased efficiency.
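
    A minimal sketch of the composable, streaming translation-unit pattern advocated above, under assumed names and a toy unit conversion (none of these are the project's actual components): each unit is a small self-contained generator function, and composing them into one streaming pipeline avoids materializing intermediate copies of the forcing data.

```python
from functools import reduce
from typing import Callable, Dict, Iterable

Record = Dict[str, float]
Translator = Callable[[Iterable[Record]], Iterable[Record]]

def subset_time(start: float, end: float) -> Translator:
    """Keep only records whose timestamp falls inside [start, end)."""
    def unit(records):
        return (r for r in records if start <= r["time"] < end)
    return unit

def kelvin_to_celsius(field: str = "temperature") -> Translator:
    """Convert one field from Kelvin to Celsius, record by record."""
    def unit(records):
        for r in records:
            yield {**r, field: r[field] - 273.15}
    return unit

def compose(*units: Translator) -> Translator:
    """Chain translation units into one streaming pipeline."""
    return lambda records: reduce(lambda acc, u: u(acc), units, records)

# Example: subset in time, then convert units, without materializing
# intermediate copies (everything stays a generator until consumed).
pipeline = compose(subset_time(0, 48), kelvin_to_celsius())
forcing = [{"time": t, "temperature": 280.0 + t} for t in range(96)]
print(list(pipeline(forcing))[:2])
```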

  11. Polycyclic aromatic hydrocarbons in ambient air, surface soil and wheat grain near a large steel-smelting manufacturer in northern China.

    PubMed

    Liu, Weijian; Wang, Yilong; Chen, Yuanchen; Tao, Shu; Liu, Wenxin

    2017-07-01

    The total concentrations and component profiles of polycyclic aromatic hydrocarbons (PAHs) in ambient air, surface soil and wheat grain collected from wheat fields near a large steel-smelting manufacturer in Northern China were determined. Based on the specific isomeric ratios of paired species in ambient air, principal component analysis and multivariate linear regression, the main emission source of local PAHs was identified as a mixture of industrial and domestic coal combustion, biomass burning and traffic exhaust. The total organic carbon (TOC) fraction was significantly correlated with the total and individual PAH concentrations in surface soil. The total concentrations of PAHs in wheat grain were relatively low and dominated by low molecular weight constituents, and the compositional profile was more similar to that in ambient air than in topsoil. Combined with the more significant results from partial correlation and linear regression models, the contribution of air PAHs to grain PAHs may be greater than that of soil PAHs. Copyright © 2016. Published by Elsevier B.V.

  12. Short wavelength Raman spectroscopy applied to the discrimination and characterization of three cultivars of extra virgin olive oils in different maturation stages.

    PubMed

    Gouvinhas, Irene; Machado, Nelson; Carvalho, Teresa; de Almeida, José M M M; Barros, Ana I R N A

    2015-01-01

    Extra virgin olive oils produced from three cultivars at different maturation stages were characterized using Raman spectroscopy. Chemometric methods (principal component analysis, discriminant analysis, principal component regression and partial least squares regression) applied to the Raman spectral data were used to evaluate and quantify the statistical differences between cultivars and their ripening process. The models for predicting the peroxide value and free acidity of olive oils showed good calibration and prediction values and presented high coefficients of determination (>0.933). Both the R², and the correlation equations between the measured chemical parameters and the values predicted by each approach, are presented; these cover both PCR and PLS, used to assess SNV-normalized Raman data as well as the first and second derivatives of the spectra. This study demonstrates that a combination of Raman spectroscopy with multivariate analysis methods can be useful for rapidly predicting olive oil chemical characteristics during the maturation process. Copyright © 2014 Elsevier B.V. All rights reserved.
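
    A minimal sketch of the PLS-on-SNV-normalized-spectra step, using synthetic stand-ins for the Raman spectra and a chemical reference value (nothing here reproduces the study's data or calibration):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Synthetic stand-in for Raman spectra (rows = oils, columns = wavenumbers)
# and a chemical reference value such as free acidity.
X = rng.normal(size=(60, 500))
y = X[:, 100] * 0.8 + X[:, 300] * 0.5 + rng.normal(scale=0.1, size=60)

# Standard normal variate (SNV) normalization: center and scale each spectrum.
X_snv = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)

X_tr, X_te, y_tr, y_te = train_test_split(X_snv, y, random_state=0)
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)
print("R2 on held-out oils:", pls.score(X_te, y_te))
```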

  13. Revealing the underlying drivers of disaster risk: a global analysis

    NASA Astrophysics Data System (ADS)

    Peduzzi, Pascal

    2017-04-01

    Disaster events are perfect examples of compound events. Disaster risk lies at the intersection of several independent components such as hazard, exposure and vulnerability. Understanding the weight of each component requires extensive standardisation. Here, I show how footprints of past disastrous events were generated using GIS modelling techniques and used for extracting population and economic exposures based on distribution models. Using past event losses, it was possible to identify and quantify a wide range of socio-politico-economic drivers associated with human vulnerability. The analysis was applied to about nine thousand individual past disastrous events covering earthquakes, floods and tropical cyclones. Using a multiple regression analysis on these individual events it was possible to quantify each risk component and assess how vulnerability is influenced by various hazard intensities. The results show that hazard intensity, exposure, poverty, governance as well as other underlying factors (e.g. remoteness) can explain the magnitude of past disasters. Analysis was also performed to highlight the role of future trends in population and climate change and how these may impact exposure to tropical cyclones in the future. GIS models combined with statistical multiple regression analysis provided a powerful methodology to identify, quantify and model disaster risk taking into account its various components. The same methodology can be applied to various types of risk at local to global scales. This method was applied and developed for the Global Risk Analysis of the Global Assessment Report on Disaster Risk Reduction (GAR). It was first applied to mortality risk in GAR 2009 and GAR 2011. New models ranging from global assets exposure to global flood hazard models were also recently developed to improve the resolution of the risk analysis and applied through the CAPRA software to provide probabilistic economic risk assessments such as Average Annual Losses (AAL) and Probable Maximum Losses (PML) in GAR 2013 and GAR 2015. In parallel, similar methodologies were developed to highlight the role of ecosystems for Climate Change Adaptation (CCA) and Disaster Risk Reduction (DRR). New developments may include slow hazards (e.g. soil degradation and droughts) and natech hazards (by intersecting with georeferenced critical infrastructures). The various global hazard, exposure and risk models can be visualized and downloaded through the PREVIEW Global Risk Data Platform.

  14. Short communication: Principal components and factor analytic models for test-day milk yield in Brazilian Holstein cattle.

    PubMed

    Bignardi, A B; El Faro, L; Rosa, G J M; Cardoso, V L; Machado, P F; Albuquerque, L G

    2012-04-01

    A total of 46,089 individual monthly test-day (TD) milk yields (10 test-days) from 7,331 complete first lactations of Holstein cattle were analyzed. A standard multivariate analysis (MV), reduced rank analyses fitting the first 2, 3, and 4 genetic principal components (PC2, PC3, PC4), and analyses fitting a factor analytic structure with 2, 3, and 4 factors (FAS2, FAS3, FAS4) were carried out. The models included the random animal genetic effect and fixed effects of the contemporary groups (herd-year-month of test-day), age of cow (linear and quadratic effects), and days in milk (linear effect). The residual covariance matrix was assumed to have full rank. Moreover, 2 random regression models were applied. Variance components were estimated by the restricted maximum likelihood method. The heritability estimates ranged from 0.11 to 0.24. The genetic correlation estimates between TD obtained with the PC2 model were higher than those obtained with the MV model, especially for adjacent test-days at the end of lactation, where they were close to unity. The results indicate that for the data considered in this study, only 2 principal components are required to summarize the bulk of genetic variation among the 10 traits. Copyright © 2012 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  15. Equilibrium isotherm and kinetic studies for the simultaneous removal of phenol and cyanide by use of S. odorifera (MTCC 5700) immobilized on coconut shell activated carbon

    NASA Astrophysics Data System (ADS)

    Singh, Neetu; Balomajumder, Chandrajit

    2017-10-01

    In this study, the simultaneous removal of phenol and cyanide by the microorganism S. odorifera (MTCC 5700) immobilized onto a coconut shell activated carbon surface (CSAC) was studied in a batch reactor using mono- and binary-component aqueous solutions. Activated carbon was derived from coconut shell by a chemical activation method. Ferric chloride (FeCl3) was used as a surface modification agent and applied to the biomass. Optimum biosorption conditions were obtained as a function of biosorbent dosage, pH, temperature, contact time and initial phenol and cyanide concentration. To define the equilibrium isotherms, the experimental data were analyzed with five mono-component and six binary-component isotherm models. The maximum uptake capacities of phenol and cyanide on the CSAC biosorbent surface were 450.02 and 2.58 mg/g, respectively. Nonlinear regression analysis was used to determine the best-fitting model on the basis of error functions and to calculate the parameters involved in the kinetic and isotherm models. The kinetic study revealed that the fractal-like mixed first-second order model and the Brouers-Weron-Sotolongo model for phenol and cyanide were able to describe the biosorption kinetics accurately. According to the experimental results, CSAC with immobilized S. odorifera (MTCC 5700) appears to be an effective alternative biosorbent for the elimination of phenol and cyanide from binary-component aqueous solutions.
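
    A minimal sketch of the nonlinear regression fitting step, using the Langmuir equation as one example of the mono-component isotherms mentioned above and entirely hypothetical equilibrium data; the sum of squared errors plays the role of the error functions used for model comparison.

```python
import numpy as np
from scipy.optimize import curve_fit

def langmuir(ce, qmax, b):
    """Langmuir isotherm: uptake q as a function of equilibrium conc. ce."""
    return qmax * b * ce / (1.0 + b * ce)

# Hypothetical equilibrium data (mg/L vs. mg/g); not the study's numbers.
ce = np.array([5, 10, 25, 50, 100, 200, 400], dtype=float)
qe = np.array([40, 75, 150, 230, 310, 380, 420], dtype=float)

# Nonlinear least squares fit; p0 gives rough starting values.
popt, pcov = curve_fit(langmuir, ce, qe, p0=[450.0, 0.01])
residuals = qe - langmuir(ce, *popt)
sse = float(np.sum(residuals ** 2))  # one error function for model comparison
print("qmax, b =", popt, "SSE =", sse)
```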

  16. A framework for evaluating student perceptions of health policy training in medical school.

    PubMed

    Patel, Mitesh S; Lypson, Monica L; Miller, D Douglas; Davis, Matthew M

    2014-10-01

    Nearly half of graduating medical students in the United States report that medical school provides inadequate instruction in topics related to health policy. Although most medical schools report some form of policy education, there is no standard for teaching core concepts and evaluating student satisfaction. Responses to the Association of American Medical Colleges' Medical School Graduation Questionnaire were obtained for the years 2007-2008 and 2011-2012 and mapped to four domains of health policy training: systems and principles; value and equity; quality and safety; and politics and law. Chi-square tests were used to test differences among unadjusted temporal trends. Multiple logistic regression models were fit to the outcome variables and adjusted for student characteristics, student preferences, and medical school characteristics. Compared with 2007-2008, students' perceptions of training in 2011-2012 increased on a relative basis by 11.7% for components within systems and principles, 2.8% for quality and safety, and 6.8% for value and equity. Components within politics and law had a composite decline of 4.8%. Multiple logistic regression models found higher odds of reporting satisfaction with training over time for all components within the domains of systems and principles, quality and safety, and value and equity (P < .01), with the exception of medical economics. Medical student perceptions of training in health policy improved over time. Causal factors for these trends require further study. Despite the improvement, nearly 40% of graduating medical students still report inadequate instruction in health policy.

  17. Parenting Interacts with Oxytocin Polymorphisms to Predict Adolescent Social Anxiety Symptom Development: A Novel Polygenic Approach.

    PubMed

    Nelemans, Stefanie A; van Assche, Evelien; Bijttebier, Patricia; Colpin, Hilde; van Leeuwen, Karla; Verschueren, Karine; Claes, Stephan; van den Noortgate, Wim; Goossens, Luc

    2018-04-26

    Guided by a developmental psychopathology framework, research has increasingly focused on the interplay of genetics and environment as a predictor of different forms of psychopathology, including social anxiety. In these efforts, the polygenic nature of complex phenotypes such as social anxiety is increasingly recognized, but studies applying polygenic approaches are still scarce. In this study, we applied Principal Covariates Regression as a novel approach to creating polygenic components for the oxytocin system, which has recently been put forward as particularly relevant to social anxiety. Participants were 978 adolescents (49.4% girls; mean age at T1 = 13.8 years). Across 3 years, questionnaires were used to assess adolescent social anxiety symptoms and multi-informant reports of parental psychological control and autonomy support. All adolescents were genotyped for 223 oxytocin single nucleotide polymorphisms (SNPs) in 14 genes. Using Principal Covariates Regression, these SNPs could be reduced to five polygenic components. Four components reflected the underlying linkage disequilibrium and ancestry structure, whereas the fifth component, which consisted of small contributions of many SNPs across multiple genes, was strongly positively associated with adolescent social anxiety symptoms, pointing to an index of genetic risk. Moreover, significant interactions were found between this polygenic component and the environmental variables of interest. Specifically, adolescents who scored high on this polygenic component and experienced less adequate parenting (i.e., high psychological control or low autonomy support) showed the highest levels of social anxiety. Implications of these findings are discussed in the context of individual-by-environment models.
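
    Principal Covariates Regression extracts components that explain both SNP variance and outcome variance; as a loose stand-in (not the PCovR algorithm itself), the sketch below reduces a simulated SNP matrix with ordinary PCA and then tests a component-by-parenting interaction in a regression model. All data, names, and effect sizes are simulated assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, p = 978, 223  # adolescents, oxytocin SNPs (sizes echo the study)

snps = rng.integers(0, 3, size=(n, p)).astype(float)   # 0/1/2 allele counts
parenting = rng.normal(size=n)                          # e.g. autonomy support
pcs = PCA(n_components=5).fit_transform(snps)           # polygenic components

# Simulated outcome with a gene-by-environment interaction on component 5.
anxiety = (0.3 * pcs[:, 4] - 0.2 * parenting
           - 0.25 * pcs[:, 4] * parenting + rng.normal(size=n))

X = sm.add_constant(np.column_stack([pcs[:, 4], parenting,
                                     pcs[:, 4] * parenting]))
fit = sm.OLS(anxiety, X).fit()
print(fit.params)   # the interaction coefficient recovers the simulated G x E term
```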

  18. [Mental Health Determines the Quality of Life in Patients With Cancer-Related Neuropathic Pain in Quito, Ecuador].

    PubMed

    Gordillo Altamirano, Fernando; Fierro Torres, María José; Cevallos Salas, Nelson; Cervantes Vélez, María Cristina

    To identify the main factors determining health-related quality of life (HRQL) in patients with cancer-related neuropathic pain in a tertiary care hospital, a cross-sectional analytical study was performed on a sample of 237 patients meeting criteria for cancer-related neuropathic pain. Clinical and demographic variables were recorded, including cancer type, stage, time since diagnosis, pain intensity, physical functionality with the Palliative Performance Scale (PPS), and anxiety and depression with the Hospital Anxiety and Depression Scale (HADS). Their respective correlation coefficients (r) with HRQL, assessed with the SF-36v2 Questionnaire, were then calculated. Linear regression equations were then constructed with the variables that showed an r≥.5 with HRQL. The HRQL scores of the sample were 39.3±9.1 (Physical Component) and 45.5±13.8 (Mental Component). Anxiety and depression correlated strongly with the mental component (r=-.641 and r=-.741, respectively) while the PPS score correlated with the physical component (r=.617). The linear regression model that best explained the variance of the mental component combined the anxiety and depression variables (R=77.3%; P<.001). The strong influence of psychiatric comorbidity on the HRQL of patients with cancer-related neuropathic pain makes an integral management plan essential for these patients, including interventions for its timely diagnosis and treatment. Copyright © 2016 Asociación Colombiana de Psiquiatría. Publicado por Elsevier España. All rights reserved.

  19. Evolution of the Marine Officer Fitness Report: A Multivariate Analysis

    DTIC Science & Technology

    This thesis explores the evaluation behavior of United States Marine Corps (USMC) Reporting Seniors (RSs) from 2010 to 2017. Using fitness report...RSs evaluate the performance of subordinate active component unrestricted officer MROs over time. I estimate logistic regression models of the...lowest. However, these correlations indicating the effects of race matching on FITREP evaluations narrow in significance when performance-based factors

  20. Forecasting space weather: Can new econometric methods improve accuracy?

    NASA Astrophysics Data System (ADS)

    Reikard, Gordon

    2011-06-01

    Space weather forecasts are currently used in areas ranging from navigation and communication to electric power system operations. The relevant forecast horizons can range from as little as 24 h to several days. This paper analyzes the predictability of two major space weather measures using new time series methods, many of them derived from econometrics. The data sets are the Ap geomagnetic index and the solar radio flux at 10.7 cm. The methods tested include nonlinear regressions, neural networks, frequency domain algorithms, GARCH models (which utilize the residual variance), state transition models, and models that combine elements of several techniques. While combined models are complex, they can be programmed using modern statistical software. The data frequency is daily, and forecasting experiments are run over horizons ranging from 1 to 7 days. Two major conclusions stand out. First, the frequency domain method forecasts the Ap index more accurately than any time domain model, including both regressions and neural networks. This finding is very robust and holds for all forecast horizons. Combining the frequency domain method with other techniques yields a further small improvement in accuracy. Second, the neural network forecasts the solar flux more accurately than any other method, although at short horizons (2 days or less) the regression and the neural net yield similar results. The neural net does best when it includes measures of the long-term component in the data.

  1. Optimizing methods for linking cinematic features to fMRI data.

    PubMed

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than of story-driven films, new methods need to be developed for the analysis of less story-driven contents. To optimize the linkage between our fMRI data, collected during viewing of the deliberately non-narrative silent film 'At Land' by Maya Deren (1944), and its annotated content, we combined the method of elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with the time-series of a total of 36 binary-valued annotations and one real-valued tactile annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both partial least-squares (PLS) regression and un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of the regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. The regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved to be a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method lies in applying a data-driven approach to all content features simultaneously, in contrast to hypothesis-driven manual pre-selection of individual regressors, which is biased by choice. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
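
    A minimal sketch of the regularized regression step, fitting one synthetic IC (or ROI) time course on a stand-in annotation design matrix with cross-validated elastic-net penalties; the feature counts echo the study, but the data are simulated.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

rng = np.random.default_rng(3)
n_vols = 400   # fMRI volumes; regressors = 36 binary + 1 real-valued annotation

# Synthetic stand-ins: a design matrix of time-locked annotations and one
# IC (or ROI) time course driven by a few of them plus noise.
X = np.column_stack([rng.integers(0, 2, size=(n_vols, 36)),
                     rng.normal(size=(n_vols, 1))]).astype(float)
y = 1.5 * X[:, 2] - 1.0 * X[:, 10] + 0.8 * X[:, 36] + rng.normal(size=n_vols)

# Elastic-net regularization with cross-validated penalties guards against
# over-fitting when the annotation regressors are collinear.
enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(X, y)
print("selected features:", np.flatnonzero(enet.coef_))
print("R^2 on the fitted series:", enet.score(X, y))
```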

  2. Global and system-specific resting-state fMRI fluctuations are uncorrelated: principal component analysis reveals anti-correlated networks.

    PubMed

    Carbonell, Felix; Bellec, Pierre; Shmuel, Amir

    2011-01-01

    The influence of the global average signal (GAS) on functional-magnetic resonance imaging (fMRI)-based resting-state functional connectivity is a matter of ongoing debate. The global average fluctuations increase the correlation between functional systems beyond the correlation that reflects their specific functional connectivity. Hence, removal of the GAS is a common practice for facilitating the observation of network-specific functional connectivity. This strategy relies on the implicit assumption of a linear-additive model according to which global fluctuations, irrespective of their origin, and network-specific fluctuations are super-positioned. However, removal of the GAS introduces spurious negative correlations between functional systems, bringing into question the validity of previous findings of negative correlations between fluctuations in the default-mode and the task-positive networks. Here we present an alternative method for estimating global fluctuations, immune to the complications associated with the GAS. Principal components analysis was applied to resting-state fMRI time-series. A global-signal effect estimator was defined as the principal component (PC) that correlated best with the GAS. The mean correlation coefficient between our proposed PC-based global effect estimator and the GAS was 0.97±0.05, demonstrating that our estimator successfully approximated the GAS. In 66 out of 68 runs, the PC that showed the highest correlation with the GAS was the first PC. Since PCs are orthogonal, our method provides an estimator of the global fluctuations, which is uncorrelated to the remaining, network-specific fluctuations. Moreover, unlike the regression of the GAS, the regression of the PC-based global effect estimator does not introduce spurious anti-correlations beyond the decrease in seed-based correlation values allowed by the assumed additive model. After regressing this PC-based estimator out of the original time-series, we observed robust anti-correlations between resting-state fluctuations in the default-mode and the task-positive networks. We conclude that resting-state global fluctuations and network-specific fluctuations are uncorrelated, supporting a Resting-State Linear-Additive Model. In addition, we conclude that the network-specific resting-state fluctuations of the default-mode and task-positive networks show artifact-free anti-correlations.
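
    A minimal sketch of the PC-based global effect estimator on simulated data: compute the GAS, take the temporal principal component that correlates best with it, and regress that component (rather than the GAS itself) out of every voxel time series.

```python
import numpy as np

rng = np.random.default_rng(4)
n_vox, n_t = 2000, 300
data = rng.normal(size=(n_vox, n_t))        # stand-in resting-state run
data += 0.5 * rng.normal(size=(1, n_t))     # add a shared global fluctuation

gas = data.mean(axis=0)                     # global average signal

# PCA over time: rows of Vt are temporal principal components.
centered = data - data.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)
corrs = [abs(np.corrcoef(pc, gas)[0, 1]) for pc in Vt[:10]]
global_pc = Vt[int(np.argmax(corrs))]       # PC best matching the GAS

# Regress the PC-based global estimator out of every voxel time series.
beta = centered @ global_pc / (global_pc @ global_pc)
cleaned = centered - np.outer(beta, global_pc)
print("max |corr(PC, GAS)| =", max(corrs))
```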

  3. Estimating the susceptibility of surface water in Texas to nonpoint-source contamination by use of logistic regression modeling

    USGS Publications Warehouse

    Battaglin, William A.; Ulery, Randy L.; Winterstein, Thomas; Welborn, Toby

    2003-01-01

    In the State of Texas, surface water (streams, canals, and reservoirs) and ground water are used as sources of public water supply. Surface-water sources of public water supply are susceptible to contamination from point and nonpoint sources. To help protect sources of drinking water and to aid water managers in designing protective yet cost-effective and risk-mitigated monitoring strategies, the Texas Commission on Environmental Quality and the U.S. Geological Survey developed procedures to assess the susceptibility of public water-supply source waters in Texas to the occurrence of 227 contaminants. One component of the assessments is the determination of susceptibility of surface-water sources to nonpoint-source contamination. To accomplish this, water-quality data at 323 monitoring sites were matched with geographic information system-derived watershed-characteristic data for the watersheds upstream from the sites. Logistic regression models then were developed to estimate the probability that a particular contaminant will exceed a threshold concentration specified by the Texas Commission on Environmental Quality. Logistic regression models were developed for 63 of the 227 contaminants. Of the remaining contaminants, 106 were not modeled because monitoring data were available at less than 10 percent of the monitoring sites; 29 were not modeled because there were less than 15 percent detections of the contaminant in the monitoring data; 27 were not modeled because of the lack of any monitoring data; and 2 were not modeled because threshold values were not specified.
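
    A minimal sketch of the exceedance-probability modeling step, with hypothetical watershed characteristics and simulated exceedance outcomes (the variable names and effect sizes are illustrative, not those of the Texas assessment):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n_sites = 323  # monitoring sites, as in the study; the data here are simulated

# Hypothetical watershed characteristics derived from GIS layers.
pct_agriculture = rng.uniform(0, 90, n_sites)
pct_urban = rng.uniform(0, 60, n_sites)
mean_precip = rng.uniform(40, 140, n_sites)
X = np.column_stack([pct_agriculture, pct_urban, mean_precip])

# Binary outcome: did the contaminant exceed its threshold concentration?
logit = -4 + 0.04 * pct_agriculture + 0.05 * pct_urban + 0.01 * mean_precip
exceeds = rng.random(n_sites) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X, exceeds)
new_site = [[55.0, 20.0, 95.0]]   # hypothetical watershed characteristics
print("P(exceedance) at new site:", model.predict_proba(new_site)[0, 1])
```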

  4. Sufficient Forecasting Using Factor Models

    PubMed Central

    Fan, Jianqing; Xue, Lingzhou; Yao, Jiawei

    2017-01-01

    We consider forecasting a single time series when there is a large number of predictors and a possible nonlinear effect. The dimensionality is first reduced via a high-dimensional (approximate) factor model implemented by principal component analysis. Using the extracted factors, we develop a novel forecasting method called sufficient forecasting, which provides a set of sufficient predictive indices, inferred from high-dimensional predictors, to deliver additional predictive power. Projected principal component analysis is employed to enhance the accuracy of inferred factors when a semi-parametric (approximate) factor model is assumed. Our method is also applicable to cross-sectional sufficient regression using extracted factors. The connection between sufficient forecasting and the deep learning architecture is explicitly stated. The sufficient forecasting correctly estimates projection indices of the underlying factors even in the presence of a nonparametric forecasting function. The proposed method extends sufficient dimension reduction to high-dimensional regimes by condensing the cross-sectional information through factor models. We derive asymptotic properties for the estimate of the central subspace spanned by these projection directions, as well as for the estimates of the sufficient predictive indices. We further show that the natural method of running multiple regression of the target on estimated factors yields a linear estimate that actually falls into this central subspace. Our method and theory allow the number of predictors to be larger than the number of observations. We finally demonstrate that the sufficient forecasting improves upon linear forecasting in both simulation studies and an empirical study of forecasting macroeconomic variables. PMID:29731537
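
    A much-simplified sketch of the two-step idea, on simulated data: condense a large panel of predictors into a few estimated factors by PCA, then run a forecasting regression of the target on the lagged factors (the sufficient forecasting machinery generalizes this second step to nonlinear and multi-index settings).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)
T, p, k = 240, 120, 3   # time points, predictors, latent factors

# Approximate factor model: many predictors driven by a few common factors.
factors = rng.normal(size=(T, k))
loadings = rng.normal(size=(k, p))
X = factors @ loadings + rng.normal(scale=0.5, size=(T, p))
y = np.r_[0.0, factors[:-1] @ [1.0, -0.5, 0.3]] + rng.normal(scale=0.2, size=T)

# Step 1: condense the cross-section into estimated factors via PCA.
f_hat = PCA(n_components=k).fit_transform(X)

# Step 2: linear forecasting regression of the target on lagged factors.
reg = LinearRegression().fit(f_hat[:-1], y[1:])
print("one-step-ahead forecast:", reg.predict(f_hat[-1:]))
```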

  5. The International Classification of Functioning as an explanatory model of health after distal radius fracture: A cohort study

    PubMed Central

    Harris, Jocelyn E; MacDermid, Joy C; Roth, James

    2005-01-01

    Background Distal radius fractures are common injuries that have an increasing impact on health across the lifespan. The purpose of this study was to identify health impacts in body structure/function, activity, and participation at baseline and follow-up, and to determine whether they support the ICF model of health. Methods This is a prospective cohort study of 790 individuals who were assessed at 1 week, 3 months, and 1 year post injury. The Patient Rated Wrist Evaluation (PRWE), the Wrist Outcome Measure (WOM), and the Medical Outcome Survey Short-Form (SF-36) were used to measure impairment, activity, participation, and health. Multiple regression was used to develop explanatory models of health outcome. Results Regression analysis showed that the PRWE explained between 13% (one week) and 33% (three months) of the SF-36 Physical Component Summary scores, with the pain, activities and participation subscales showing dominant effects at different stages of recovery. PRWE scores were less related to Mental Component Summary scores, explaining 10% (three months) and 8% (one year). Wrist impairment scores were less powerful predictors of health status than the PRWE. Conclusion The ICF is an informative model for examining distal radius fracture. Difficulties in the domains of activity and participation explained a significant portion of physical health. Post-fracture rehabilitation and outcome assessments should extend beyond physical impairment to ensure comprehensive treatment of individuals with distal radius fracture. PMID:16288664

  6. Risk prediction for myocardial infarction via generalized functional regression models.

    PubMed

    Ieva, Francesca; Paganoni, Anna M

    2016-08-01

    In this paper, we propose a generalized functional linear regression model for a binary outcome indicating the presence/absence of a cardiac disease, with multivariate functional data among the relevant predictors. In particular, the motivating aim is the analysis of electrocardiographic traces of patients whose pre-hospital electrocardiogram (ECG) has been sent to the 118 Dispatch Center of Milan (the Italian toll-free number for emergencies) by life support personnel of the basic rescue units. The statistical analysis starts with a preprocessing of the ECGs treated as multivariate functional data. The signals are reconstructed from noisy observations. The biological variability is then removed by a nonlinear registration procedure based on landmarks. Then, in order to perform a data-driven dimensional reduction, a multivariate functional principal component analysis is carried out on the variance-covariance matrix of the reconstructed and registered ECGs and their first derivatives. We use the scores of the principal components decomposition as covariates in a generalized linear model to predict the presence of the disease in a new patient. Hence, a new semi-automatic diagnostic procedure is proposed to estimate the risk of infarction (in the case of interest, the probability of being affected by Left Bundle Branch Block). The performance of this classification method is evaluated and compared with other methods proposed in the literature. Finally, the robustness of the procedure is checked via leave-j-out techniques. © The Author(s) 2013.

  7. A regression-adjusted approach can estimate competing biomass

    Treesearch

    James H. Miller

    1983-01-01

    A method is presented for estimating above-ground herbaceous and woody biomass on competition research plots. On a set of destructively-sampled plots, an ocular estimate of biomass by vegetative component is first made, after which vegetation is clipped, dried, and weighed. Linear regressions are then calculated for each component between estimated and actual weights...
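
    A minimal sketch of the regression adjustment on hypothetical double-sampled plots: ocular estimates are calibrated against clipped-and-weighed biomass, and the fitted equation is then applied to plots where only the ocular estimate was recorded.

```python
import numpy as np

# Double-sampled plots: ocular estimates (g/plot) paired with clipped,
# dried, and weighed actual biomass for one vegetative component.
# All numbers are invented for illustration.
estimated = np.array([120., 85., 210., 150., 60., 175., 95., 240.])
actual    = np.array([132., 90., 198., 160., 55., 180., 104., 230.])

# Linear regression of actual on estimated weight (the adjustment equation).
slope, intercept = np.polyfit(estimated, actual, deg=1)

# Plots where only the ocular estimate was recorded are then adjusted.
estimate_only = np.array([110., 205., 70.])
adjusted = intercept + slope * estimate_only
print(slope, intercept, adjusted)
```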

  8. Design and testing of digitally manufactured paraffin Acrylonitrile-butadiene-styrene hybrid rocket motors

    NASA Astrophysics Data System (ADS)

    McCulley, Jonathan M.

    This research investigates the application of additive manufacturing techniques for fabricating hybrid rocket fuel grains composed of porous Acrylonitrile-butadiene-styrene impregnated with paraffin wax. The digitally manufactured ABS substrate provides mechanical support for the paraffin fuel material and serves as an additional fuel component. The embedded paraffin provides an enhanced fuel regression rate while having no detrimental effect on the thermodynamic burn properties of the fuel grain. Multiple fuel grains with various ABS-to-Paraffin mass ratios were fabricated and burned with nitrous oxide. Analytical predictions for end-to-end motor performance and fuel regression are compared against static test results. Baseline fuel grain regression calculations use an enthalpy balance energy analysis with the material and thermodynamic properties based on the mean paraffin/ABS mass fractions within the fuel grain. In support of these analytical comparisons, a novel method for propagating the fuel port burn surface was developed. In this modeling approach the fuel cross section grid is modeled as an image with white pixels representing the fuel and black pixels representing empty or burned grid cells.

  9. State-space decoding of primary afferent neuron firing rates

    NASA Astrophysics Data System (ADS)

    Wagenaar, J. B.; Ventura, V.; Weber, D. J.

    2011-02-01

    Kinematic state feedback is important for neuroprostheses to generate stable and adaptive movements of an extremity. State information, represented in the firing rates of populations of primary afferent (PA) neurons, can be recorded at the level of the dorsal root ganglia (DRG). Previous work in cats showed the feasibility of using DRG recordings to predict the kinematic state of the hind limb using reverse regression. Although accurate decoding results were attained, reverse regression does not make efficient use of the information embedded in the firing rates of the neural population. In this paper, we present decoding results based on state-space modeling, and show that it is a more principled and more efficient method for decoding the firing rates in an ensemble of PA neurons. In particular, we show that we can extract confounded information from neurons that respond to multiple kinematic parameters, and that including velocity components in the firing rate models significantly increases the accuracy of the decoded trajectory. We show that, on average, state-space decoding is twice as efficient as reverse regression for decoding joint and endpoint kinematics.
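
    A minimal sketch of state-space decoding with a linear-Gaussian model and a Kalman filter, on simulated data: the kinematic state (position and velocity) evolves under simple dynamics, firing rates are linearly tuned to the state, and the filter combines the dynamics prediction with each new population observation. This is a generic illustration, not the paper's fitted model.

```python
import numpy as np

rng = np.random.default_rng(8)
T, n_neurons = 200, 30
dt = 0.05

# Linear-Gaussian state-space model: state x = [position, velocity],
# firing rates y = H x + noise (linear tuning with velocity components).
A = np.array([[1.0, dt], [0.0, 1.0]])          # kinematic dynamics
Q = np.diag([1e-4, 1e-2])                       # process noise
H = rng.normal(size=(n_neurons, 2))             # tuning of each PA neuron
R = np.eye(n_neurons) * 0.5                     # firing-rate noise

# Simulate a trajectory and the corresponding population firing rates.
x_true = np.zeros((T, 2))
for t in range(1, T):
    x_true[t] = A @ x_true[t - 1] + rng.multivariate_normal([0, 0], Q)
y = x_true @ H.T + rng.normal(scale=np.sqrt(0.5), size=(T, n_neurons))

# Kalman filter: predict with the dynamics, correct with the observed rates.
x_hat, P = np.zeros(2), np.eye(2)
decoded = np.zeros((T, 2))
for t in range(T):
    x_hat, P = A @ x_hat, A @ P @ A.T + Q                     # predict
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.solve(S, np.eye(n_neurons))       # Kalman gain
    x_hat = x_hat + K @ (y[t] - H @ x_hat)                    # correct
    P = (np.eye(2) - K @ H) @ P
    decoded[t] = x_hat

print("correlation(position, decoded):",
      np.corrcoef(x_true[:, 0], decoded[:, 0])[0, 1])
```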

  10. Linear regression analysis and its application to multivariate chromatographic calibration for the quantitative analysis of two-component mixtures.

    PubMed

    Dinç, Erdal; Ozdemir, Abdil

    2005-01-01

    A multivariate chromatographic calibration technique was developed for the quantitative analysis of binary mixtures of enalapril maleate (EA) and hydrochlorothiazide (HCT) in tablets in the presence of losartan potassium (LST). The mathematical algorithm of the multivariate chromatographic calibration technique is based on the use of linear regression equations constructed from the relationship between concentration and peak area at a set of five wavelengths. The algorithm of this calibration model, which has a simple mathematical form, is briefly described. This approach is a powerful mathematical tool for optimum chromatographic multivariate calibration and for the elimination of fluctuations coming from instrumental and experimental conditions. The multivariate chromatographic calibration involves the reduction of the multivariate linear regression functions to a univariate data set. The validation of the model was carried out by analyzing various synthetic binary mixtures and by using the standard addition technique. The developed calibration technique was applied to the analysis of real pharmaceutical tablets containing EA and HCT. The obtained results were compared with those obtained by the classical HPLC method. It was observed that the proposed multivariate chromatographic calibration gives better results than the classical HPLC method.

  11. Imaging genetics approach to predict progression of Parkinson's diseases.

    PubMed

    Mansu Kim; Seong-Jin Son; Hyunjin Park

    2017-07-01

    Imaging genetics is a tool to extract genetic variants associated with both clinical phenotypes and imaging information. The approach can extract additional genetic variants compared with conventional approaches to better investigate various diseased conditions. Here, we applied imaging genetics to study Parkinson's disease (PD). We aimed to extract significant features derived from imaging genetics and neuroimaging. We built a regression model based on the extracted significant features, combining genetics and neuroimaging, to better predict clinical scores of PD progression (i.e., MDS-UPDRS). Our model yielded a high correlation (r = 0.697, p < 0.001) and a low root mean squared error (8.36) between predicted and actual MDS-UPDRS scores. Neuroimaging predictors of the regression model (from 123I-Ioflupane SPECT) were computed using an independent component analysis approach. Genetic features were computed using an imaging genetics approach based on the identified neuroimaging features as intermediate phenotypes. Joint modeling of neuroimaging and genetics could provide complementary information and thus has the potential to provide further insight into the pathophysiology of PD. Our model included newly found neuroimaging features and genetic variants that need further investigation.

  12. Detection of Butter Adulteration with Lard by Employing (1)H-NMR Spectroscopy and Multivariate Data Analysis.

    PubMed

    Fadzillah, Nurrulhidayah Ahmad; Man, Yaakob bin Che; Rohman, Abdul; Rosman, Arieff Salleh; Ismail, Amin; Mustafa, Shuhaimi; Khatib, Alfi

    2015-01-01

    The authentication of food products with respect to the presence of components not allowed for certain religions, such as lard, is very important. In this study, we used proton nuclear magnetic resonance ((1)H-NMR) spectroscopy for the analysis of butter adulterated with lard by simultaneous quantification of all proton-bearing compounds, and consequently all relevant sample classes. Since the spectra obtained were too complex to be analyzed visually by the naked eye, classification of the spectra was carried out. The multivariate calibration of partial least squares (PLS) regression was used for modelling the relationship between the actual value of lard and the predicted value. The model yielded the highest regression coefficient (R(2)) of 0.998, the lowest root mean square error of calibration (RMSEC) of 0.0091% and a root mean square error of prediction (RMSEP) of 0.0090. Cross-validation testing evaluated the predictive power of the model. The PLS model was shown to be a good model, as the intercepts of R(2)Y and Q(2)Y were 0.0853 and -0.309, respectively.

  13. On approaches to analyze the sensitivity of simulated hydrologic fluxes to model parameters in the community land model

    DOE PAGES

    Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...

    2015-12-04

    Here, effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalized linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.
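
    A minimal sketch of one of the four SA approaches listed above, standardized regression coefficients, on simulated parameter samples and a toy response (the parameter effects are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n_runs, n_params = 256, 10   # model runs, hydrologic parameters

# Sampled parameter values (scaled to [0, 1]) and a scalar response such as
# mean latent heat flux; the response function here is purely illustrative.
params = rng.uniform(size=(n_runs, n_params))
response = (2.0 * params[:, 0] - 1.2 * params[:, 3]
            + 0.4 * params[:, 7] + rng.normal(scale=0.1, size=n_runs))

# Standardized regression coefficients: fit on z-scored inputs and output,
# then rank parameters by the absolute value of their coefficients.
Z = (params - params.mean(0)) / params.std(0)
z_resp = (response - response.mean()) / response.std()
src = LinearRegression().fit(Z, z_resp).coef_

ranking = np.argsort(-np.abs(src))
print("parameter importance ranking:", ranking)
```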

  14. Estimating individual influences of behavioral intentions: an application of random-effects modeling to the theory of reasoned action.

    PubMed

    Hedeker, D; Flay, B R; Petraitis, J

    1996-02-01

    Methods are proposed and described for estimating the degree to which relations among variables vary at the individual level. As an example of the methods, M. Fishbein and I. Ajzen's (1975; I. Ajzen & M. Fishbein, 1980) theory of reasoned action is examined, which posits first that an individual's behavioral intentions are a function of 2 components: the individual's attitudes toward the behavior and the subjective norms as perceived by the individual. A second component of their theory is that individuals may weight these 2 components differently in assessing their behavioral intentions. This article illustrates the use of empirical Bayes methods based on a random-effects regression model to estimate these individual influences, estimating an individual's weighting of both of these components (attitudes toward the behavior and subjective norms) in relation to their behavioral intentions. This method can be used when an individual's behavioral intentions, subjective norms, and attitudes toward the behavior are all repeatedly measured. In this case, the empirical Bayes estimates are derived as a function of the data from the individual, strengthened by the overall sample data.
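
    A minimal sketch of the random-effects regression idea on simulated repeated measures: each individual gets their own random slopes for attitudes and subjective norms, and the fitted random effects are the empirical Bayes estimates of that individual's weighting of the two components. Variable names, effect sizes, and the mixed-model setup are assumptions, not the article's exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(10)
n_subj, n_obs = 100, 6   # individuals, repeated measurements per individual

rows = []
for i in range(n_subj):
    b_att = 0.6 + rng.normal(scale=0.3)    # this person's weight on attitudes
    b_norm = 0.4 + rng.normal(scale=0.3)   # this person's weight on norms
    for _ in range(n_obs):
        att, norm = rng.normal(), rng.normal()
        intent = b_att * att + b_norm * norm + rng.normal(scale=0.5)
        rows.append((i, att, norm, intent))
df = pd.DataFrame(rows, columns=["subject", "attitude", "norm", "intention"])

# Random-effects regression: fixed average weights plus subject-specific
# random slopes; the fitted random effects are empirical Bayes estimates
# of each individual's weighting of the two components.
model = sm.MixedLM.from_formula(
    "intention ~ attitude + norm", groups="subject",
    re_formula="0 + attitude + norm", data=df)
result = model.fit()
print(result.fe_params)           # population-average weights
print(result.random_effects[0])   # person 0's deviations from them
```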

  15. Enhanced index tracking modeling in portfolio optimization with mixed-integer programming approach

    NASA Astrophysics Data System (ADS)

    Siew, Lam Weng; Jaaman, Saiful Hafizah Hj.; Ismail, Hamizun bin

    2014-09-01

    Enhanced index tracking is a popular form of portfolio management in stock market investment. Enhanced index tracking aims to construct an optimal portfolio that generates excess return over the return achieved by the stock market index, without purchasing all of the stocks that make up the index. The objective of this paper is to construct an optimal portfolio using a mixed-integer programming model which adopts a regression approach in order to generate higher portfolio mean return than the stock market index return. In this study, the data consist of 24 component stocks of the Malaysian market index, the FTSE Bursa Malaysia Kuala Lumpur Composite Index, from January 2010 until December 2012. The results of this study show that the optimal portfolio of the mixed-integer programming model is able to generate higher mean return than the FTSE Bursa Malaysia Kuala Lumpur Composite Index return while selecting only 30% of the total stock market index components.

  16. Estimates of Refrigerator Loads in Public Housing Based on Metered Consumption Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miller, JD; Pratt, RG

    1998-09-11

    The New York Power Authority (NYPA), the New York City Housing Authority (NYCHA), and the U.S. Departments of Housing and Urban Development (HUD) and Energy (DOE) have joined in a project to replace refrigerators in New York City public housing with new, highly energy-efficient models. This project laid the groundwork for the Consortium for Energy Efficiency (CEE) and DOE to enable housing authorities throughout the United States to bulk-purchase energy-efficient appliances. DOE helped develop and plan the program through the ENERGY STAR Partnerships program conducted by its Pacific Northwest National Laboratory (PNNL). PNNL was subsequently asked to conduct the savings evaluations for 1996 and 1997. PNNL designed the metering protocol and occupant survey, supplied and calibrated the metering equipment, and managed and analyzed the data. The 1996 metering study of refrigerator energy usage in New York City public housing (Pratt and Miller 1997) established the need and justification for a regression-model-based approach to an energy savings estimate. The need originated in logistical difficulties associated with sampling the population and performing a stratified analysis. Commonly, refrigerators with high representation in the population were missed in the sampling schedule, leaving significant holes in the sample and difficulties for the stratified analysis. The justification was found in the fact that strata (distinct groups of identical refrigerators) were not statistically distinct in terms of their label ratio (ratio of metered consumption to label rating). This finding suggested a general regression model could be used to represent the consumption of all refrigerators in the population. In 1996 a simple two-coefficient regression model, a function of only the refrigerator label rating, was developed and used to represent the existing population of refrigerators. A key concept used in the 1997 study grew from findings in a small number of apartments metered in 1996 with a detailed protocol. Fifteen-minute time-series data of ambient and compartment temperatures and refrigerator power were analyzed and demonstrated the potential for reducing power records into three components. This motivated the development of an analysis process to divide the metered consumption into baseline load, occupant-associated load, and defrosting load. The baseline load is the consumption that would occur if the refrigerator were on but had no occupant usage load (no door-opening events) and the defrosting mechanism was disabled. The motivation behind this component reduction process was the hope that components could be more effectively modeled than the total. We reasoned that the components would lead to a better (more general and more significant) understanding of the relationships between consumption, the characteristics of the refrigerator, and its operating environment.

  17. An augmented classical least squares method for quantitative Raman spectral analysis against component information loss.

    PubMed

    Zhou, Yan; Cao, Hui

    2013-01-01

    We propose an augmented classical least squares (ACLS) calibration method for quantitative Raman spectral analysis against component information loss. The Raman spectral signals with low analyte concentration correlations were selected and used as substitutes for the unknown quantitative component information during the CLS calibration procedure. The number of selected signals was determined by using the leave-one-out root-mean-square error of cross-validation (RMSECV) curve. An ACLS model was built based on the augmented concentration matrix and the reference spectral signal matrix. The proposed method was compared with partial least squares (PLS) and principal component regression (PCR) using one example: a data set recorded from an experiment on analyte concentration determination using Raman spectroscopy. A 2-fold cross-validation with a Venetian blinds strategy was used to evaluate the predictive power of the proposed method. One-way analysis of variance (ANOVA) was used to assess the difference in predictive power between the proposed method and the existing methods. The results indicated that the proposed method is effective at increasing the robust predictive power of the traditional CLS model against component information loss, and that its predictive power is comparable to that of PLS or PCR.

  18. Logistic regression of family data from retrospective study designs.

    PubMed

    Whittemore, Alice S; Halpern, Jerry

    2003-11-01

    We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate of beta(RE) and a consistent estimate of its covariance matrix. Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate of beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Copyright 2003 Wiley-Liss, Inc.

  19. Habitat suitability index model for brook trout in streams of the Southern Blue Ridge Province: Surrogate variables, model evaluation, and suggested improvements

    USGS Publications Warehouse

    Schmitt, C.J.; Lemly, A.D.; Winger, P.V.

    1993-01-01

    Data from several sources were collated and analyzed by correlation, regression, and principal components analysis to define surrogate variables for use in the brook trout (Salvelinus fontinalis) habitat suitability index (HSI) model, and to evaluate the applicability of the model for assessing habitat in high elevation streams of the southern Blue Ridge Province (SBRP). In all data sets examined, pH and alkalinity were highly correlated, and both declined with increasing elevation; however, the magnitude of the decline varied with underlying rock formations and other factors, thereby restricting the utility of elevation as a surrogate for pH. In the data sets that contained biological information, brook trout abundance (as biomass, density, or both) tended to increase with elevation and decrease with the abundance of rainbow trout (Oncorhynchus mykiss), and was not significantly correlated (P > 0.05) with the abundance of most benthic macroinvertebrate taxa normally construed as important in the diet of brook trout. Using multiple linear regression, the authors formulated an alternative HSI model, based on point estimates of gradient, pH, elevation, stream width, and rainbow trout density, which explained 40 to 50 percent of the variance in brook trout density in 256 stream reaches. Although logically developed, the present U.S. Fish and Wildlife Service HSI model, proposed in 1982, seems deficient in several areas, especially when applied to SBRP streams. The authors recommend that the water quality component in the model be updated and reevaluated, focusing on the differential sensitivities of each life stage, the stochastic nature of the water quality variables, and the possible existence of habitat requirements that differ among brook trout strains.

  20. Gender, position of authority, and the risk of depression and post-traumatic stress disorder among a national sample of U.S. Reserve Component Personnel

    PubMed Central

    Cohen, Gregory H.; Sampson, Laura A.; Fink, David S.; Wang, Jing; Russell, Dale; Gifford, Robert; Fullerton, Carol; Ursano, Robert; Galea, Sandro

    2016-01-01

    BACKGROUND Recent United States military operations in Iraq and Afghanistan have seen dramatic increases in the proportion of women serving, and the breadth of their occupational roles. General population studies suggest that women, compared to men, and persons with lower, as compared to higher, social position may be at greater risk of post-traumatic stress disorder (PTSD) and depression. However, these relations remain unclear in military populations. Accordingly, we aimed to estimate the effects of (1) gender, (2) military authority (i.e., rank) and (3) the interaction of gender and military authority upon: (a) risk of most-recent-deployment-related PTSD, and (b) risk of depression since most-recent-deployment. METHODS Using a nationally representative sample of 1024 previously deployed Reserve Component personnel surveyed in 2010, we constructed multivariable logistic regression models to estimate effects of interest. RESULTS Weighted multivariable logistic regression models demonstrated no statistically significant associations between gender or authority, and either PTSD or depression. Interaction models demonstrated multiplicative statistical interaction between gender and authority for PTSD (beta = −2.37; p = 0.01) and depression (beta = −1.21; p = 0.057). Predicted probabilities of PTSD and depression, respectively, were lowest in male officers (0.06, 0.09), followed by male enlisted (0.07, 0.14), female enlisted (0.07, 0.15), and female officers (0.30, 0.25). CONCLUSIONS Female officers in the Reserve Component may be at greatest risk for PTSD and depression following deployment, relative to their male and enlisted counterparts, and this relation is not explained by deployment trauma exposure. Future studies may fruitfully examine whether social support, family responsibilities peri-deployment, or contradictory class status may explain these findings. PMID:26899583

  1. Searching for the main anti-bacterial components in artificial Calculus bovis using UPLC and microcalorimetry coupled with multi-linear regression analysis.

    PubMed

    Zang, Qing-Ce; Wang, Jia-Bo; Kong, Wei-Jun; Jin, Cheng; Ma, Zhi-Jie; Chen, Jing; Gong, Qian-Feng; Xiao, Xiao-He

    2011-12-01

    The fingerprints of artificial Calculus bovis extracts from different solvents were established by ultra-performance liquid chromatography (UPLC) and the anti-bacterial activities of artificial C. bovis extracts on Staphylococcus aureus (S. aureus) growth were studied by microcalorimetry. The UPLC fingerprints were evaluated using hierarchical clustering analysis. Some quantitative parameters obtained from the thermogenic curves of S. aureus growth affected by artificial C. bovis extracts were analyzed using principal component analysis. The spectrum-effect relationships between UPLC fingerprints and anti-bacterial activities were investigated using multi-linear regression analysis. The results showed that peak 1 (taurocholate sodium), peak 3 (unknown compound), peak 4 (cholic acid), and peak 6 (chenodeoxycholic acid) are more significant than the other peaks, with standard parameter estimates of 0.453, -0.166, 0.749, and 0.025, respectively. Thus, cholic acid, taurocholate sodium, and chenodeoxycholic acid might be the major anti-bacterial components in artificial C. bovis. Altogether, this work provides a general model combining UPLC chromatography with anti-bacterial effect measurements to study the spectrum-effect relationships of artificial C. bovis extracts, which can be used to discover the main anti-bacterial components in artificial C. bovis or other Chinese herbal medicines with anti-bacterial effects. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
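
    A minimal sketch of the spectrum-effect idea under simplifying assumptions: an anti-bacterial response is regressed on standardized chromatographic peak areas, and the standardized coefficients play the role of the parameter estimates quoted above. The peak data and weights are simulated placeholders, not the UPLC or microcalorimetric measurements.

        # Hedged sketch: spectrum-effect relationship via multiple linear regression
        # on standardized peak areas. All data below are simulated placeholders.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        n_samples, n_peaks = 30, 6
        peaks = rng.gamma(2.0, 1.0, size=(n_samples, n_peaks))      # UPLC peak areas
        true_w = np.array([0.45, 0.0, -0.17, 0.75, 0.0, 0.03])       # illustrative weights
        activity = peaks @ true_w + rng.normal(0, 0.1, n_samples)    # anti-bacterial index

        X = StandardScaler().fit_transform(peaks)
        y = (activity - activity.mean()) / activity.std()
        model = LinearRegression().fit(X, y)
        for i, coef in enumerate(model.coef_, start=1):
            print(f"peak {i}: standardized coefficient = {coef:+.3f}")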

  2. Quantifying predictive capability of electronic health records for the most harmful breast cancer

    NASA Astrophysics Data System (ADS)

    Wu, Yirong; Fan, Jun; Peissig, Peggy; Berg, Richard; Tafti, Ahmad Pahlavan; Yin, Jie; Yuan, Ming; Page, David; Cox, Jennifer; Burnside, Elizabeth S.

    2018-03-01

    Improved prediction of the "most harmful" breast cancers that cause the most substantive morbidity and mortality would enable physicians to target more intense screening and preventive measures at those women who have the highest risk; however, such prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source that has great research and clinical potential. Our goal was to quantify the value of EHR variables in "most harmful" breast cancer risk prediction. We identified 794 subjects who had breast cancer with primary non-benign tumors with their earliest diagnosis on or after 1/1/2004 from an existing personalized medicine data repository, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprising 6 components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. We observed that the AUCs of the Ridge-LR and Lasso-LR models were 0.818 and 0.839, respectively. For both the Ridge-LR and Lasso-LR models, the predictive performance of the whole set of EHR variables was significantly higher than that of each individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, providing the possibility to personalize care for those women at the highest risk in clinical practice.
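
    A minimal sketch of the two regularized models named above, using scikit-learn logistic regression with L2 (Ridge-LR) and L1 (Lasso-LR) penalties and AUC evaluation; the synthetic feature matrix stands in for the six EHR components, which are not reproduced here.

        # Hedged sketch: Ridge-LR (L2) and Lasso-LR (L1) logistic regression with AUC
        # evaluation on a synthetic stand-in for the EHR feature matrix.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        X, y = make_classification(n_samples=794, n_features=200, n_informative=25,
                                   random_state=0)        # placeholder for EHR variables
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        models = {
            "Ridge-LR": LogisticRegression(penalty="l2", C=1.0, max_iter=5000),
            "Lasso-LR": LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
        }
        for name, clf in models.items():
            clf.fit(X_tr, y_tr)
            auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
            print(f"{name}: AUC = {auc:.3f}")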

  3. Quantifying predictive capability of electronic health records for the most harmful breast cancer.

    PubMed

    Wu, Yirong; Fan, Jun; Peissig, Peggy; Berg, Richard; Tafti, Ahmad Pahlavan; Yin, Jie; Yuan, Ming; Page, David; Cox, Jennifer; Burnside, Elizabeth S

    2018-02-01

    Improved prediction of the "most harmful" breast cancers that cause the most substantive morbidity and mortality would enable physicians to target more intense screening and preventive measures at those women who have the highest risk; however, such prediction models for the "most harmful" breast cancers have rarely been developed. Electronic health records (EHRs) represent an underused data source that has great research and clinical potential. Our goal was to quantify the value of EHR variables in "most harmful" breast cancer risk prediction. We identified 794 subjects who had breast cancer with primary non-benign tumors with their earliest diagnosis on or after 1/1/2004 from an existing personalized medicine data repository, including 395 "most harmful" breast cancer cases and 399 "least harmful" breast cancer cases. For these subjects, we collected EHR data comprising 6 components: demographics, diagnoses, symptoms, procedures, medications, and laboratory results. We developed two regularized prediction models, Ridge Logistic Regression (Ridge-LR) and Lasso Logistic Regression (Lasso-LR), to predict the "most harmful" breast cancer one year in advance. The area under the ROC curve (AUC) was used to assess model performance. We observed that the AUCs of the Ridge-LR and Lasso-LR models were 0.818 and 0.839, respectively. For both the Ridge-LR and Lasso-LR models, the predictive performance of the whole set of EHR variables was significantly higher than that of each individual component (p<0.001). In conclusion, EHR variables can be used to predict the "most harmful" breast cancer, providing the possibility to personalize care for those women at the highest risk in clinical practice.

  4. SkyFACT: high-dimensional modeling of gamma-ray emission with adaptive templates and penalized likelihoods

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Storm, Emma; Weniger, Christoph; Calore, Francesca, E-mail: e.m.storm@uva.nl, E-mail: c.weniger@uva.nl, E-mail: francesca.calore@lapth.cnrs.fr

    We present SkyFACT (Sky Factorization with Adaptive Constrained Templates), a new approach for studying, modeling and decomposing diffuse gamma-ray emission. Like most previous analyses, the approach relies on predictions from cosmic-ray propagation codes like GALPROP and DRAGON. However, in contrast to previous approaches, we account for the fact that models are not perfect and allow for a very large number (≳ 10^5) of nuisance parameters to parameterize these imperfections. We combine methods of image reconstruction and adaptive spatio-spectral template regression in one coherent hybrid approach. To this end, we use penalized Poisson likelihood regression, with regularization functions that are motivated by the maximum entropy method. We introduce methods to efficiently handle the high dimensionality of the convex optimization problem as well as the associated semi-sparse covariance matrix, using the L-BFGS-B algorithm and Cholesky factorization. We test the method both on synthetic data as well as on gamma-ray emission from the inner Galaxy, |ℓ| < 90° and |b| < 20°, as observed by the Fermi Large Area Telescope. We finally define a simple reference model that removes most of the residual emission from the inner Galaxy, based on conventional diffuse emission components as well as components for the Fermi bubbles, the Fermi Galactic center excess, and extended sources along the Galactic disk. Variants of this reference model can serve as basis for future studies of diffuse emission in and outside the Galactic disk.
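
    The central numerical step described above, penalized Poisson likelihood regression solved with L-BFGS-B, can be sketched in a few lines. The single-template model and the quadratic (rather than maximum-entropy-motivated) penalty below are simplifying assumptions, not the SkyFACT implementation.

        # Hedged sketch: penalized Poisson likelihood regression via L-BFGS-B.
        # One spatial template with per-pixel nuisance modulation parameters;
        # a quadratic penalty stands in for the MEM-motivated regularizers.
        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(0)
        n_pix = 500
        template = rng.gamma(2.0, 1.0, n_pix)             # predicted diffuse emission
        true_mod = 1.0 + 0.2 * rng.normal(size=n_pix)     # model imperfections
        counts = rng.poisson(template * np.clip(true_mod, 0.1, None))

        lam = 10.0                                        # penalty strength (assumed)

        def objective(theta):
            mu = template * np.exp(theta)                 # positivity via log-modulation
            nll = np.sum(mu - counts * np.log(mu))        # negative Poisson log-likelihood
            penalty = lam * np.sum(theta ** 2)            # quadratic penalty on deviations
            grad = (mu - counts) + 2.0 * lam * theta      # gradient of nll + penalty
            return nll + penalty, grad

        res = minimize(objective, x0=np.zeros(n_pix), jac=True, method="L-BFGS-B")
        print("converged:", res.success, "| mean modulation:", np.exp(res.x).mean())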

  5. Reverse engineering model structures for soil and ecosystem respiration: the potential of gene expression programming

    NASA Astrophysics Data System (ADS)

    Ilie, Iulia; Dittrich, Peter; Carvalhais, Nuno; Jung, Martin; Heinemeyer, Andreas; Migliavacca, Mirco; Morison, James I. L.; Sippel, Sebastian; Subke, Jens-Arne; Wilkinson, Matthew; Mahecha, Miguel D.

    2017-09-01

    Accurate model representation of land-atmosphere carbon fluxes is essential for climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented by a steadily evolving body of mechanistic theory, provides the main basis for developing such models. The strongly increasing availability of measurements may facilitate new ways of identifying suitable model structures using machine learning. Here, we explore the potential of gene expression programming (GEP) to derive relevant model formulations based solely on the signals present in data by automatically applying various mathematical transformations to potential predictors and repeatedly evolving the resulting model structures. In contrast to most other machine learning regression techniques, the GEP approach generates readable models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (random forests, support vector machines, artificial neural networks, and kernel ridge regressions). Based on real observations we explore the responses of the different components of terrestrial respiration at an oak forest in south-eastern England. We find that the GEP-retrieved models are often better in prediction than some established respiration models. Based on their structures, we find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics. We noticed that the GEP models are only partly portable across respiration components, the identification of a general terrestrial respiration model possibly prevented by equifinality issues. Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing more traditional modelling approaches.

  6. Assessing the accuracy of ANFIS, EEMD-GRNN, PCR, and MLR models in predicting PM2.5

    NASA Astrophysics Data System (ADS)

    Ausati, Shadi; Amanollahi, Jamil

    2016-10-01

    Since Sanandaj is considered one of the polluted cities of Iran, prediction of pollution, especially of suspended PM2.5 particles, which are the cause of many diseases, could contribute to public health through timely announcements before PM2.5 levels increase. To predict the PM2.5 concentration in the Sanandaj air, hybrid models consisting of an ensemble empirical mode decomposition coupled with a general regression neural network (EEMD-GRNN) and an Adaptive Neuro-Fuzzy Inference System (ANFIS), together with principal component regression (PCR) and a linear model, multiple linear regression (MLR), were used. In these models the suspended PM2.5 concentration was the dependent variable, and air quality data including PM2.5, PM10, SO2, NO2, CO, and O3 and meteorological data including average minimum temperature (Min T), average maximum temperature (Max T), average atmospheric pressure (AP), daily total precipitation (TP), daily relative humidity (RH), and daily wind speed (WS) for the year 2014 in Sanandaj were the independent variables. Among the models used, the EEMD-GRNN model, with R2 = 0.90, root mean square error (RMSE) = 4.9218, and mean absolute error (MAE) = 3.4644 in the training phase and R2 = 0.79, RMSE = 5.0324, and MAE = 3.2565 in the testing phase, performed best in predicting this phenomenon. It can be concluded that the hybrid models predict PM2.5 concentration more accurately than the linear model.

  7. Application of principal component regression and partial least squares regression in ultraviolet spectrum water quality detection

    NASA Astrophysics Data System (ADS)

    Li, Jiangtong; Luo, Yongdao; Dai, Honglin

    2018-01-01

    Water is the source of life and the essential foundation of all life. With the development of industrialization, the phenomenon of water pollution is becoming more and more frequent, which directly affects the survival and development of humans. Water quality detection is one of the necessary measures to protect water resources. Ultraviolet (UV) spectral analysis is an important research method in the field of water quality detection, in which the partial least squares regression (PLSR) analysis method is becoming the predominant technology; however, in some special cases PLSR analysis produces considerable errors. In order to solve this problem, the traditional principal component regression (PCR) analysis method was improved by using the principle of PLSR in this paper. The experimental results show that for some special experimental data sets, the improved PCR analysis method performs better than PLSR. The PCR and PLSR methods are the focus of this paper. Firstly, principal component analysis (PCA) is performed with MATLAB to reduce the dimensionality of the spectral data; on the basis of a large number of experiments, the optimized principal components, which carry most of the original data information, are extracted by using the principle of PLSR. Secondly, the linear regression analysis of the principal components is carried out with the Statistical Package for the Social Sciences (SPSS), from which the coefficients and relations of the principal components can be obtained. Finally, the same water spectral data set is calculated by PLSR and the improved PCR, and the two results are analyzed and compared: the improved PCR and PLSR are similar for most data, but the improved PCR is better than PLSR for data near the detection limit. Both PLSR and the improved PCR can be used in ultraviolet spectral analysis of water, but for data near the detection limit the improved PCR's results are better than those of PLSR.
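
    For reference, both methods compared above can be expressed compactly with scikit-learn: PCR as a PCA-plus-linear-regression pipeline and PLSR via PLSRegression. The synthetic spectra and the choice of five components are illustrative assumptions, not the paper's optimized settings.

        # Hedged sketch: principal component regression (PCA + linear regression)
        # versus partial least squares regression on synthetic UV-spectrum-like data.
        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        n_samples, n_wavelengths = 120, 300
        concentration = rng.uniform(0.0, 5.0, n_samples)           # water-quality parameter
        basis = np.exp(-0.5 * ((np.arange(n_wavelengths) - 150) / 30.0) ** 2)
        spectra = np.outer(concentration, basis) + rng.normal(0, 0.05, (n_samples, n_wavelengths))

        pcr = make_pipeline(PCA(n_components=5), LinearRegression())
        plsr = PLSRegression(n_components=5)

        for name, model in [("PCR", pcr), ("PLSR", plsr)]:
            r2 = cross_val_score(model, spectra, concentration, cv=5, scoring="r2")
            print(f"{name}: mean cross-validated R^2 = {r2.mean():.3f}")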

  8. Tree Biomass Allocation and Its Model Additivity for Casuarina equisetifolia in a Tropical Forest of Hainan Island, China.

    PubMed

    Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng

    2016-01-01

    Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (P<0.05). However, the biomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia.
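
    A minimal sketch of fitting component biomass equations jointly as a system of seemingly unrelated regressions, here with the SUR estimator from the third-party linearmodels package (assumed to be installed). The log-linear equations, the variables (dbh, h, cr), and the simulated measurements are illustrative, not the paper's fitted allometric forms.

        # Hedged sketch: trunk and branch biomass equations fitted jointly by SUR,
        # so that cross-equation error correlation is used in estimation.
        import numpy as np
        import pandas as pd
        from linearmodels.system import SUR

        rng = np.random.default_rng(0)
        n = 72
        dbh = rng.uniform(5, 30, n)                                  # diameter (cm)
        h = 1.3 + 0.8 * dbh ** 0.9 * np.exp(rng.normal(0, 0.05, n))  # height (m)
        cr = 0.3 + 0.12 * dbh * np.exp(rng.normal(0, 0.1, n))        # crown radius (m)
        err = rng.multivariate_normal([0, 0], [[0.02, 0.01], [0.01, 0.03]], n)
        trunk = np.exp(-2.0 + 2.3 * np.log(dbh) + 0.8 * np.log(h) + err[:, 0])
        branch = np.exp(-3.0 + 2.0 * np.log(dbh) + 0.6 * np.log(cr) + err[:, 1])

        X_trunk = pd.DataFrame({"const": 1.0, "ln_dbh": np.log(dbh), "ln_h": np.log(h)})
        X_branch = pd.DataFrame({"const": 1.0, "ln_dbh": np.log(dbh), "ln_cr": np.log(cr)})
        equations = {
            "trunk": {"dependent": np.log(trunk), "exog": X_trunk},
            "branch": {"dependent": np.log(branch), "exog": X_branch},
        }
        print(SUR(equations).fit())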

  9. Tree Biomass Allocation and Its Model Additivity for Casuarina equisetifolia in a Tropical Forest of Hainan Island, China

    PubMed Central

    Xue, Yang; Yang, Zhongyang; Wang, Xiaoyan; Lin, Zhipan; Li, Dunxi; Su, Shaofeng

    2016-01-01

    Casuarina equisetifolia is commonly planted and used in the construction of coastal shelterbelt protection in Hainan Island. Thus, it is critical to accurately estimate the tree biomass of Casuarina equisetifolia L. for forest managers to evaluate the biomass stock in Hainan. The data for this work consisted of 72 trees, which were divided into three age groups: young forest, middle-aged forest, and mature forest. The proportion of biomass from the trunk significantly increased with age (P<0.05). However, the biomass of the branch and leaf decreased, and the biomass of the root did not change. To test whether the crown radius (CR) can improve biomass estimates of C. equisetifolia, we introduced CR into the biomass models. Here, six models were used to estimate the biomass of each component, including the trunk, the branch, the leaf, and the root. In each group, we selected one model among these six models for each component. The results showed that including the CR greatly improved the model performance and reduced the error, especially for the young and mature forests. In addition, to ensure biomass additivity, the selected equation for each component was fitted as a system of equations using seemingly unrelated regression (SUR). The SUR method not only gave efficient and accurate estimates but also achieved the logical additivity. The results in this study provide a robust estimation of tree biomass components and total biomass over three groups of C. equisetifolia. PMID:27002822

  10. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    NASA Astrophysics Data System (ADS)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization of statistical models, i.e., automated model scoring and selection via methods such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There, Elm uses the NSGA-2 multiobjective optimization algorithm to optimize the statistical preprocessing of forcing data (i.e., feature engineering) and improve goodness-of-fit for statistical models. This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used to automate selection of soil moisture forecast statistical models for North America.
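
    The automated model scoring and selection described here can be illustrated with a randomized hyperparameter search in scikit-learn; this is a simple stand-in for Elm's evolutionary (NSGA-2) machinery and uses synthetic data rather than NLDAS forcing.

        # Hedged sketch: automated hyperparameter scoring and selection for a
        # soil-moisture-style regression task via randomized search.
        import numpy as np
        from scipy.stats import randint, uniform
        from sklearn.datasets import make_regression
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import RandomizedSearchCV

        X, y = make_regression(n_samples=500, n_features=12, noise=5.0, random_state=0)

        search = RandomizedSearchCV(
            RandomForestRegressor(random_state=0),
            param_distributions={
                "n_estimators": randint(50, 400),
                "max_depth": randint(2, 20),
                "max_features": uniform(0.2, 0.8),
            },
            n_iter=20, cv=5, scoring="neg_mean_absolute_error", random_state=0)
        search.fit(X, y)
        print("best parameters:", search.best_params_)
        print("best CV MAE:", -search.best_score_)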

  11. Portfolio optimization for index tracking modelling in Malaysia stock market

    NASA Astrophysics Data System (ADS)

    Siew, Lam Weng; Jaaman, Saiful Hafizah; Ismail, Hamizun

    2016-06-01

    Index tracking is an investment strategy in portfolio management which aims to construct an optimal portfolio that generates a mean return similar to that of the stock market index without purchasing all of the stocks that make up the index. The objective of this paper is to construct an optimal portfolio using an optimization model that adopts a regression approach to track the benchmark stock market index return. In this study, the data consist of weekly prices of stocks in the Malaysia market index, namely the FTSE Bursa Malaysia Kuala Lumpur Composite Index (FBMKLCI), from January 2010 until December 2013. The results of this study show that the optimal portfolio is able to track the FBMKLCI Index with a minimum tracking error of 1.0027% and a 0.0290% excess mean return over the mean return of the FBMKLCI Index. The significance of this study is the construction of an optimal portfolio using an optimization model that adopts a regression approach to track the stock market index without purchasing all index components.
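
    A minimal sketch of the regression-based tracking idea: regress index returns on a subset of constituent returns with non-negative weights and normalize them to a fully invested portfolio. The simulated returns and the non-negative least squares step are assumptions for illustration, not the paper's exact optimization model.

        # Hedged sketch: index tracking as a regression of index returns on a
        # subset of constituent returns, with non-negative, normalized weights.
        import numpy as np
        from scipy.optimize import nnls

        rng = np.random.default_rng(0)
        n_weeks, n_stocks = 200, 10
        stock_returns = rng.normal(0.001, 0.02, size=(n_weeks, n_stocks))
        index_weights = rng.dirichlet(np.ones(n_stocks))
        index_returns = stock_returns @ index_weights + rng.normal(0, 0.001, n_weeks)

        subset = stock_returns[:, :5]                      # track using only 5 stocks
        w, _ = nnls(subset, index_returns)                 # non-negative least squares
        w = w / w.sum()                                    # fully invested portfolio

        tracking_error = np.std(subset @ w - index_returns)
        excess_return = np.mean(subset @ w) - np.mean(index_returns)
        print(f"weights: {np.round(w, 3)}")
        print(f"weekly tracking error: {tracking_error:.4%}, excess mean return: {excess_return:.4%}")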

  12. Economics of human performance and systems total ownership cost.

    PubMed

    Onkham, Wilawan; Karwowski, Waldemar; Ahram, Tareq Z

    2012-01-01

    The financial costs of investing in people, associated with training, acquisition, recruiting, and resolving human errors, have a significant impact on increased total ownership costs. These costs can also contribute to inflated budgets and delayed schedules. The study of the economic assessment of human performance in the system acquisition process enhances the visibility of hidden cost drivers, which supports informed program management decisions. This paper presents a literature review of human total ownership cost (HTOC) and cost impacts on overall system performance. Economic value assessment models such as cost benefit analysis, risk-cost tradeoff analysis, expected value of utility function analysis (EV), the growth readiness matrix, the multi-attribute utility technique, and multiple-regression models were introduced to reflect the HTOC and human performance-technology tradeoffs in terms of dollar value. A human total ownership cost regression model is introduced to address measurement of the influencing human performance cost components. Results from this study will increase understanding of relevant cost drivers in the system acquisition process over the long term.

  13. Deep supervised dictionary learning for no-reference image quality assessment

    NASA Astrophysics Data System (ADS)

    Huang, Yuge; Liu, Xuesong; Tian, Xiang; Zhou, Fan; Chen, Yaowu; Jiang, Rongxin

    2018-03-01

    We propose a deep convolutional neural network (CNN) for general no-reference image quality assessment (NR-IQA), i.e., accurate prediction of image quality without a reference image. The proposed model consists of three components: a local feature extractor that is a fully convolutional network, an encoding module with an inherent dictionary that aggregates local features to output a fixed-length global quality-aware image representation, and a regression module that maps the representation to an image quality score. Our model can be trained in an end-to-end manner, and all of the parameters, including the weights of the convolutional layers, the dictionary, and the regression weights, are simultaneously learned from the loss function. In addition, the model can predict quality scores for input images of arbitrary sizes in a single step. We tested our method on commonly used image quality databases and showed that its performance is comparable with that of state-of-the-art general-purpose NR-IQA algorithms.

  14. Sugar and acid content of Citrus prediction modeling using FT-IR fingerprinting in combination with multivariate statistical analysis.

    PubMed

    Song, Seung Yeob; Lee, Young Koung; Kim, In-Jung

    2016-01-01

    A high-throughput screening system for Citrus lines with higher sugar and acid contents was established using Fourier transform infrared (FT-IR) spectroscopy in combination with multivariate analysis. FT-IR spectra confirmed typical spectral differences between the frequency regions of 950-1100 cm⁻¹, 1300-1500 cm⁻¹, and 1500-1700 cm⁻¹. Principal component analysis (PCA) and subsequent partial least square-discriminant analysis (PLS-DA) were able to discriminate five Citrus lines into three separate clusters corresponding to their taxonomic relationships. Quantitative predictive modeling of sugar and acid contents of Citrus fruits was established using partial least squares regression algorithms applied to the FT-IR spectra. The coefficients of determination (R²) between predicted and estimated sugar and acid content values were 0.99. These results demonstrate that, by using FT-IR spectra and applying quantitative prediction modeling to Citrus sugar and acid contents, excellent Citrus lines can be detected early with greater accuracy. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Calculation and Analysis of Magnetic Gradient Tensor Components of Global Magnetic Models

    NASA Astrophysics Data System (ADS)

    Schiffler, M.; Queitsch, M.; Schneider, M.; Goepel, A.; Stolz, R.; Krech, W.; Meyer, H. G.; Kukowski, N.

    2014-12-01

    Global Earth's magnetic field models like the International Geomagnetic Reference Field (IGRF), the World Magnetic Model (WMM), or the High Definition Geomagnetic Model (HDGM) are harmonic analysis regressions to available magnetic observations, stored as spherical harmonic coefficients. Input data combine recordings from magnetic observatories, airborne magnetic surveys, and satellite data. The advance of recent magnetic satellite missions like SWARM and its predecessors like CHAMP offers high resolution measurements while providing full global coverage. This warrants expansion of the theoretical framework of harmonic synthesis to magnetic gradient tensor components. Measurement setups for Full Tensor Magnetic Gradiometry equipped with highly sensitive gradiometers, like the JeSSY STAR system, can directly measure the gradient tensor components, which requires precise knowledge of the background regional gradients that can be calculated with this extension. In this study we develop the theoretical framework for calculation of the magnetic gradient tensor components from the harmonic series expansion and apply our approach to the IGRF and HDGM. The gradient tensor component maps for the entire Earth's surface produced for the IGRF show low gradients reflecting the variation from the dipolar character, whereas maps for the HDGM (up to degree N=729) reveal new information about crustal structure, especially across the oceans, and deeply situated ore bodies. From the gradient tensor components, the rotational invariants, the Eigenvalues, and the normalized source strength (NSS) are calculated. The NSS focuses on shallower and stronger anomalies. Euler deconvolution using either the tensor components or the NSS applied to the HDGM yields an estimate of the average source depth for the entire magnetic crust as well as for individual plutons and ore bodies. The NSS reveals the boundaries between the anomalies of major continental provinces like southern Africa or the Eastern European Craton.

  16. In Spite of Indeterminacy Many Common Factor Score Estimates Yield an Identical Reproduced Covariance Matrix

    ERIC Educational Resources Information Center

    Beauducel, Andre

    2007-01-01

    It was investigated whether commonly used factor score estimates lead to the same reproduced covariance matrix of observed variables. This was achieved by means of Schonemann and Steiger's (1976) regression component analysis, since it is possible to compute the reproduced covariance matrices of the regression components corresponding to different…

  17. Evaluating abundance and trends in a Hawaiian avian community using state-space analysis

    USGS Publications Warehouse

    Camp, Richard J.; Brinck, Kevin W.; Gorresen, P.M.; Paxton, Eben H.

    2016-01-01

    Estimating population abundances and patterns of change over time are important in both ecology and conservation. Trend assessment typically entails fitting a regression to a time series of abundances to estimate population trajectory. However, changes in abundance estimates from year-to-year across time are due to both true variation in population size (process variation) and variation due to imperfect sampling and model fit. State-space models are a relatively new method that can be used to partition the error components and quantify trends based only on process variation. We compare a state-space modelling approach with a more traditional linear regression approach to assess trends in uncorrected raw counts and detection-corrected abundance estimates of forest birds at Hakalau Forest National Wildlife Refuge, Hawai‘i. Most species demonstrated similar trends using either method. In general, evidence for trends using state-space models was less strong than for linear regression, as measured by estimates of precision. However, while the state-space models may sacrifice precision, the expectation is that these estimates provide a better representation of the real world biological processes of interest because they are partitioning process variation (environmental and demographic variation) and observation variation (sampling and model variation). The state-space approach also provides annual estimates of abundance which can be used by managers to set conservation strategies, and can be linked to factors that vary by year, such as climate, to better understand processes that drive population trends.
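
    The contrast drawn above can be illustrated with a local-level state-space model, which partitions year-to-year variation into process and observation components, alongside an ordinary linear regression trend on the same series. The simulated abundance index and the statsmodels UnobservedComponents specification are illustrative assumptions, not the refuge data or the authors' model.

        # Hedged sketch: a local-level ("random walk plus noise") state-space model
        # that separates process and observation variance, compared with an OLS trend.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        n_years = 25
        true_state = np.cumsum(rng.normal(0, 5, n_years)) + 200   # process variation
        counts = true_state + rng.normal(0, 10, n_years)          # observation error

        ssm = sm.tsa.UnobservedComponents(counts, level="local level")
        res = ssm.fit(disp=False)
        for name, value in zip(ssm.param_names, res.params):      # sigma2.irregular, sigma2.level
            print(f"{name}: {value:.2f}")

        years = sm.add_constant(np.arange(n_years, dtype=float))
        ols = sm.OLS(counts, years).fit()
        print("OLS trend (per year):", ols.params[1], "+/-", ols.bse[1])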

  18. Adaptive surrogate modeling by ANOVA and sparse polynomial dimensional decomposition for global sensitivity analysis in fluid simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Kunkun, E-mail: ktg@illinois.edu; Inria Bordeaux – Sud-Ouest, Team Cardamom, 200 avenue de la Vieille Tour, 33405 Talence; Congedo, Pietro M.

    The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.

  19. Genetic analysis of milk production traits of Tunisian Holsteins using random regression test-day model with Legendre polynomials

    PubMed Central

    2018-01-01

    Objective The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. Methods A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic and permanent environment effects. Heritabilities and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. Results All variance components tended to be high in the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and were the least variable compared to permanent variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield-traits were low to moderate, 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within-parity genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from 0.78±0.01 to 0.82±0.03 between the first and second parities, from 0.73±0.03 to 0.8±0.04 between the first and third parities, and from 0.82±0.02 to 0.84±0.04 between the second and third parities. Conclusion These results are comparable to previously reported estimates on the same population, indicating that the adoption of a random regression TD model as the official genetic evaluation for production traits in Tunisia, as developed by most Interbull countries, is possible in the Tunisian Holsteins. PMID:28823122
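
    A simplified sketch of the random regression idea: construct normalized third-order Legendre polynomial covariates of days in milk and fit cow-specific lactation curves as random coefficients. The single-trait mixed model below (statsmodels MixedLM) and the simulated test-day records are stand-ins for the full multiple-trait, multiple-lactation animal model estimated by Gibbs sampling.

        # Hedged sketch: Legendre polynomial covariates for a random regression
        # test-day model, fit as a random-coefficient mixed model on simulated data.
        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        from numpy.polynomial.legendre import legval

        def legendre_covariates(dim, order=3, dim_min=5, dim_max=305):
            """Normalized Legendre polynomials of standardized days in milk."""
            x = 2.0 * (dim - dim_min) / (dim_max - dim_min) - 1.0
            cols = []
            for k in range(order + 1):
                coef = np.zeros(k + 1)
                coef[k] = 1.0
                cols.append(np.sqrt((2 * k + 1) / 2.0) * legval(x, coef))
            return np.column_stack(cols)

        rng = np.random.default_rng(1)
        n_cows, n_td = 50, 10
        dim = np.tile(np.linspace(5, 305, n_td), n_cows)
        cow = np.repeat(np.arange(n_cows), n_td)
        Z = legendre_covariates(dim)
        u = rng.normal(0, [2, 1, 0.5, 0.2], size=(n_cows, 4))       # cow-specific curves
        milk = 25 + (Z * u[cow]).sum(axis=1) + rng.normal(0, 1.5, dim.size)

        df = pd.DataFrame(Z, columns=[f"L{k}" for k in range(4)]).assign(milk=milk, cow=cow)
        model = sm.MixedLM(df["milk"], sm.add_constant(df[["L1", "L2", "L3"]]),
                           groups=df["cow"], exog_re=df[["L0", "L1"]])
        print(model.fit().summary())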

  20. Genetic analysis of milk production traits of Tunisian Holsteins using random regression test-day model with Legendre polynomials.

    PubMed

    Ben Zaabza, Hafedh; Ben Gara, Abderrahmen; Rekik, Boulbaba

    2018-05-01

    The objective of this study was to estimate genetic parameters of milk, fat, and protein yields within and across lactations in Tunisian Holsteins using a random regression test-day (TD) model. A random regression multiple trait multiple lactation TD model was used to estimate genetic parameters in the Tunisian dairy cattle population. Data were TD yields of milk, fat, and protein from the first three lactations. Random regressions were modeled with third-order Legendre polynomials for the additive genetic and permanent environment effects. Heritabilities and genetic correlations were estimated by Bayesian techniques using the Gibbs sampler. All variance components tended to be high in the beginning and the end of lactations. Additive genetic variances for milk, fat, and protein yields were the lowest and were the least variable compared to permanent variances. Heritability values tended to increase with parity. Estimates of heritabilities for 305-d yield-traits were low to moderate, 0.14 to 0.2, 0.12 to 0.17, and 0.13 to 0.18 for milk, fat, and protein yields, respectively. Within-parity genetic correlations among traits were up to 0.74. Genetic correlations among lactations for the yield traits were relatively high and ranged from 0.78±0.01 to 0.82±0.03 between the first and second parities, from 0.73±0.03 to 0.8±0.04 between the first and third parities, and from 0.82±0.02 to 0.84±0.04 between the second and third parities. These results are comparable to previously reported estimates on the same population, indicating that the adoption of a random regression TD model as the official genetic evaluation for production traits in Tunisia, as developed by most Interbull countries, is possible in the Tunisian Holsteins.

  1. Influence of landscape-scale factors in limiting brook trout populations in Pennsylvania streams

    USGS Publications Warehouse

    Kocovsky, P.M.; Carline, R.F.

    2006-01-01

    Landscapes influence the capacity of streams to produce trout through their effect on water chemistry and other factors at the reach scale. Trout abundance also fluctuates over time; thus, to thoroughly understand how spatial factors at landscape scales affect trout populations, one must assess the changes in populations over time to provide a context for interpreting the importance of spatial factors. We used data from the Pennsylvania Fish and Boat Commission's fisheries management database to investigate spatial factors that affect the capacity of streams to support brook trout Salvelinus fontinalis and to provide models useful for their management. We assessed the relative importance of spatial and temporal variation by calculating variance components and comparing relative standard errors for spatial and temporal variation. We used binary logistic regression to predict the presence of harvestable-length brook trout and multiple linear regression to assess the mechanistic links between landscapes and trout populations and to predict population density. The variance in trout density among streams was equal to or greater than the temporal variation for several streams, indicating that differences among sites affect population density. Logistic regression models correctly predicted the absence of harvestable-length brook trout in 60% of validation samples. The r²-value for the linear regression model predicting density was 0.3, indicating low predictive ability. Both logistic and linear regression models supported buffering capacity against acid episodes as an important mechanistic link between landscapes and trout populations. Although our models fail to predict trout densities precisely, their success at elucidating the mechanistic links between landscapes and trout populations, in concert with the importance of spatial variation, increases our understanding of factors affecting brook trout abundance and will help managers and private groups to protect and enhance populations of wild brook trout. © Copyright by the American Fisheries Society 2006.

  2. Monthly streamflow forecasting in the Rhine basin

    NASA Astrophysics Data System (ADS)

    Schick, Simon; Rössler, Ole; Weingartner, Rolf

    2017-04-01

    Forecasting seasonal streamflow of the Rhine river is of societal relevance as the Rhine is an important water way and water resource in Western Europe. The present study investigates the predictability of monthly mean streamflow at lead times of zero, one, and two months with the focus on potential benefits from the integration of seasonal climate predictions. Specifically, we use seasonal predictions of precipitation and surface air temperature released by the European Centre for Medium-Range Weather Forecasts (ECMWF) for a regression analysis. In order to disentangle forecast uncertainty, the 'Reverse Ensemble Streamflow Prediction' framework is adapted here to the context of regression: by using appropriate subsets of predictors, the regression model is constrained to either the initial conditions, the meteorological forcing, or both. An operational application is mimicked by equipping the model with the seasonal climate predictions provided by ECMWF. Finally, to mitigate the spatial aggregation of the meteorological fields, the model is also applied at the subcatchment scale, and the resulting predictions are combined afterwards. The hindcast experiment is carried out for the period 1982-2011 in cross validation mode at two gauging stations, namely the Rhine at Lobith and Basel. The results show that monthly forecasts are skillful with respect to climatology only at zero lead time. In addition, at zero lead time the integration of seasonal climate predictions decreases the mean absolute error by 5 to 10 percent compared to forecasts which are solely based on initial conditions. This reduction is most likely induced by the seasonal prediction of precipitation and not air temperature. The study is completed by benchmarking the regression model with runoff simulations from ECMWF's seasonal forecast system. By simply using basin averages followed by a linear bias correction, these runoff simulations translate well to monthly streamflow. Though the regression model is only slightly outperformed, we argue that runoff out of the land surface component of seasonal climate forecasting systems is an interesting option when it comes to seasonal streamflow forecasting in large river basins.

  3. Statistical downscaling of summer precipitation over northwestern South America

    NASA Astrophysics Data System (ADS)

    Palomino Lemus, Reiner; Córdoba Machado, Samir; Raquel Gámiz Fortis, Sonia; Castro Díez, Yolanda; Jesús Esteban Parra, María

    2015-04-01

    In this study a statistical downscaling (SD) model using Principal Component Regression (PCR) has been developed for simulating summer precipitation in Colombia during the period 1950-2005, and climate projections for the 2071-2100 period have been obtained by applying the resulting SD model. To these ends, the Principal Components (PCs) of the SLP reanalysis data from NCEP were used as predictor variables, while the observed gridded summer precipitation was the predictand variable. The period 1950-1993 was used for calibration and 1994-2010 for validation. Bootstrap resampling with replacement was applied to provide estimates of the statistical errors. All models perform reasonably well at regional scales, and the spatial distribution of the correlation coefficients between predicted and observed gridded precipitation values shows high values (between 0.5 and 0.93) along the Andes range and in the north and north Pacific of Colombia. Additionally, the ability of the MIROC5 GCM to simulate summer precipitation in Colombia for the present climate (1971-2005) has been analyzed by calculating the differences between the simulated and observed precipitation values. The simulation obtained by this GCM strongly overestimates the precipitation along a horizontal sector through the center of Colombia, especially at the east and west of the country. However, the SD model applied to the SLP of the GCM shows its ability to faithfully reproduce the rainfall field. Finally, in order to obtain summer precipitation projections for Colombia for the period 1971-2100, the downscaling model, recalibrated for the full period 1950-2010, has been applied to the SLP output from the MIROC5 model under the RCP2.6, RCP4.5, and RCP8.5 scenarios. The changes estimated by the SD models are not significant under the RCP2.6 scenario, while for the RCP4.5 and RCP8.5 scenarios a significant increase of precipitation relative to present values appears in all regions, reaching around 27% in the northern Colombia region under the RCP8.5 scenario. Keywords: Statistical downscaling, precipitation, Principal Component Regression, climate change, Colombia. ACKNOWLEDGEMENTS This work has been financed by the projects P11-RNM-7941 (Junta de Andalucía-Spain) and CGL2013-48539-R (MINECO-Spain, FEDER).

  4. Adaptive surrogate modeling by ANOVA and sparse polynomial dimensional decomposition for global sensitivity analysis in fluid simulation

    NASA Astrophysics Data System (ADS)

    Tang, Kunkun; Congedo, Pietro M.; Abgrall, Rémi

    2016-06-01

    The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.
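
    A rough analogue of the sparse-surrogate idea with off-the-shelf tools: expand the inputs in a polynomial basis and keep only influential terms via an L1 penalty. The Lasso step below replaces the paper's PDD basis and stepwise regression, and the test function is arbitrary.

        # Hedged sketch: a sparse polynomial surrogate of an expensive model, built
        # from a polynomial feature expansion with Lasso to retain influential terms.
        import numpy as np
        from sklearn.preprocessing import PolynomialFeatures
        from sklearn.linear_model import LassoCV
        from sklearn.pipeline import make_pipeline

        def expensive_model(x):
            # Arbitrary test function standing in for a deterministic fluid simulation.
            return np.sin(x[:, 0]) + 0.5 * x[:, 1] ** 2 + 0.1 * x[:, 0] * x[:, 2]

        rng = np.random.default_rng(0)
        n_train, n_dim = 200, 6                           # moderate input dimension
        X = rng.uniform(-1, 1, size=(n_train, n_dim))
        y = expensive_model(X)

        surrogate = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                                  LassoCV(cv=5, max_iter=50000))
        surrogate.fit(X, y)

        lasso = surrogate.named_steps["lassocv"]
        n_kept = int(np.sum(lasso.coef_ != 0))
        print(f"{n_kept} of {lasso.coef_.size} polynomial terms retained")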

  5. Not my fault: blame externalization is the psychopathic feature most associated with pathological delinquency among confined delinquents.

    PubMed

    DeLisi, Matt; Angton, Alexia; Vaughn, Michael G; Trulson, Chad R; Caudill, Jonathan W; Beaver, Kevin M

    2014-12-01

    The association between psychopathy and crime is established, but the specific components of the personality disorders that most contribute to crime are largely unknown. Drawing on data from 723 confined delinquents in Missouri, the present study delved into the eight subscales of the Psychopathic Personality Inventory-Short Form to empirically assess the specific aspects of the disorder that are most responsible for explaining variation in career delinquency. Blame externalization emerged as the strongest predictor of career delinquency in ordinary least squares regression, logistic regression, and t-test models. Fearlessness and carefree nonplanfulness were also significant in all models. Other features of psychopathy, such as stress immunity, social potency, and coldheartedness were weakly and inconsistently predictive of career delinquency. Implications of these findings for the study of psychopathy and delinquent careers are discussed in this article. © The Author(s) 2013.

  6. Development and Validation of the Work-Related Well-Being Index: Analysis of the Federal Employee Viewpoint Survey (FEVS).

    PubMed

    Eaton, Jennifer L; Mohr, David C; Hodgson, Michael J; McPhaul, Kathleen M

    2017-10-11

    To describe development and validation of the Work-Related Well-Being (WRWB) Index. Principal Components Analysis (PCA) was performed using Federal Employee Viewpoint Survey (FEVS) data (N = 392,752) to extract variables representing worker well-being constructs. Confirmatory factor analysis was performed to verify the factor structure. To validate the WRWB index, we used multiple regression analysis to examine relationships with burnout-associated outcomes. PCA identified three positive psychology constructs: "Work Positivity", "Co-worker Relationships", and "Work Mastery". An 11-item index explaining 63.5% of the variance was achieved. The structural equation model provided a very good fit to the data. Higher WRWB scores were positively associated with all 3 employee experience measures examined in regression models. The new WRWB index shows promise as a valid and widely accessible instrument to assess worker well-being.

  7. Combining Mixture Components for Clustering*

    PubMed Central

    Baudry, Jean-Patrick; Raftery, Adrian E.; Celeux, Gilles; Lo, Kenneth; Gottardo, Raphaël

    2010-01-01

    Model-based clustering consists of fitting a mixture model to data and identifying each cluster with one of its components. Multivariate normal distributions are typically used. The number of clusters is usually determined from the data, often using BIC. In practice, however, individual clusters can be poorly fitted by Gaussian distributions, and in that case model-based clustering tends to represent one non-Gaussian cluster by a mixture of two or more Gaussian distributions. If the number of mixture components is interpreted as the number of clusters, this can lead to overestimation of the number of clusters. This is because BIC selects the number of mixture components needed to provide a good approximation to the density, rather than the number of clusters as such. We propose first selecting the total number of Gaussian mixture components, K, using BIC and then combining them hierarchically according to an entropy criterion. This yields a unique soft clustering for each number of clusters less than or equal to K. These clusterings can be compared on substantive grounds, and we also describe an automatic way of selecting the number of clusters via a piecewise linear regression fit to the rescaled entropy plot. We illustrate the method with simulated data and a flow cytometry dataset. Supplemental Materials are available on the journal Web site and described at the end of the paper. PMID:20953302
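
    The first step described above, choosing the number of Gaussian mixture components by BIC, is easy to sketch with scikit-learn; the subsequent entropy-based hierarchical combining of components into clusters is only indicated in a comment, since it is specific to the paper.

        # Hedged sketch: select the number of Gaussian mixture components K by BIC on
        # simulated data containing one non-Gaussian and one Gaussian cluster.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        cluster_a = rng.uniform(-2, 2, size=(300, 2))          # non-Gaussian (uniform) cluster
        cluster_b = rng.normal(loc=[6, 6], scale=0.7, size=(300, 2))
        X = np.vstack([cluster_a, cluster_b])

        bics = {}
        for k in range(1, 8):
            gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
            bics[k] = gmm.bic(X)

        k_best = min(bics, key=bics.get)
        print("BIC-selected number of mixture components:", k_best)
        # BIC may pick more components than there are clusters, because the uniform
        # cluster is approximated by several Gaussians; combining components by an
        # entropy criterion, as proposed above, would then merge them into clusters.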

  8. A study of ground motion attenuation in the Southern Great Basin, Nevada-California, using several techniques for estimates of Qs, log A0, and coda Q

    NASA Astrophysics Data System (ADS)

    Rogers, A. M.; Harmsen, S. C.; Herrmann, R. B.; Meremonte, M. E.

    1987-04-01

    As a first step in the assessment of the earthquake hazard in the southern Great Basin of Nevada-California, this study evaluates the attenuation of peak vertical ground motions using a number of different regression models applied to unfiltered and band-pass-filtered ground motion data. These data are concentrated in the distance range 10-250 km. The regression models include parameters to account for geometric spreading, anelastic attenuation with a power law frequency dependence, source size, and station site effects. We find that the data are most consistent with an essentially frequency-independent Q and a geometric spreading coefficient less than 1.0. Regressions are also performed on vertical component peak amplitudes reexpressed as pseudo-Wood-Anderson peak amplitude estimates (PWA), permitting comparison with earlier work that used Wood-Anderson (WA) data from California. Both of these results show that Q values in this region are high relative to California, having values in the range 700-900 over the frequency band 1-10 Hz. Comparison of ML magnitudes from stations BRK and PAS for earthquakes in the southern Great Basin shows that these two stations report magnitudes with differences that are distance dependent. This bias suggests that the Richter log A0 curve appropriate to California is too steep for earthquakes occurring in southern Nevada, a result implicitly supporting our finding that Q values are higher than those in California. The PWA attenuation functions derived from our data also indicate that local magnitudes reported by California observatories for earthquakes in this region may be overestimated by as much as 0.8 magnitude units in some cases. Both of these results will have an effect on the assessment of the earthquake hazard in this region. The robustness of our regression technique to extract the correct geometric spreading coefficient n and anelastic attenuation Q is tested by applying the technique to simulated data computed with given n and Q values. Using a stochastic modeling technique, we generate suites of seismograms for the distance range 10-200 km and for both WA and short-period vertical component seismometers. Regressions on the peak amplitudes from these records show that our regression model extracts values of n and Q approximately equal to the input values for either low-Q California attenuation or high-Q southern Nevada attenuation. Regressions on stochastically modeled WA and PWA amplitudes also provide a method of evaluating differences in magnitudes from WA and PWA amplitudes due to recording instrument response characteristics alone. These results indicate a difference between ML(WA) and ML(PWA) equal to 0.15 magnitude units, which we term the residual instrument correction. In contrast to the peak amplitude results, coda Q determinations using the single scatterer theory indicate that Qc values are dependent on source type and are proportional to f^p, where p = 0.8 to 1.0. This result suggests that a difference exists between attenuation mechanisms for direct waves and backscattered waves in this region.
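
    A minimal sketch of the kind of peak-amplitude regression described above: log amplitudes modeled with source-size, geometric spreading, and anelastic attenuation terms and fit by least squares on simulated records. The functional form, coefficients, and data are illustrative assumptions only.

        # Hedged sketch: regression of peak ground-motion amplitudes on magnitude,
        # geometric spreading, and anelastic attenuation,
        #   log10(A) = a + b*M - n*log10(r) - k*r,
        # fit by least squares on simulated data. Parameter values are illustrative.
        import numpy as np

        rng = np.random.default_rng(0)
        n_obs = 400
        mag = rng.uniform(2.0, 5.0, n_obs)                 # local magnitude
        r = rng.uniform(10.0, 250.0, n_obs)                # hypocentral distance (km)
        true = dict(a=-1.5, b=1.0, n=0.9, k=0.002)         # high-Q-like anelastic decay
        log_amp = (true["a"] + true["b"] * mag - true["n"] * np.log10(r)
                   - true["k"] * r + rng.normal(0, 0.25, n_obs))

        # Design matrix for [a, b, n, k]; signs folded into the columns.
        X = np.column_stack([np.ones(n_obs), mag, -np.log10(r), -r])
        coef, *_ = np.linalg.lstsq(X, log_amp, rcond=None)
        a, b, n, k = coef
        print(f"geometric spreading n = {n:.2f}, anelastic coefficient k = {k:.4f} /km")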

  9. Using occupancy modeling and logistic regression to assess the distribution of shrimp species in lowland streams, Costa Rica: Does regional groundwater create favorable habitat?

    USGS Publications Warehouse

    Snyder, Marcia; Freeman, Mary C.; Purucker, S. Thomas; Pringle, Catherine M.

    2016-01-01

    Freshwater shrimps are an important biotic component of tropical ecosystems. However, they can have a low probability of detection when abundances are low. We sampled 3 of the most common freshwater shrimp species, Macrobrachium olfersii, Macrobrachium carcinus, and Macrobrachium heterochirus, and used occupancy modeling and logistic regression models to improve our limited knowledge of distribution of these cryptic species by investigating both local- and landscape-scale effects at La Selva Biological Station in Costa Rica. Local-scale factors included substrate type and stream size, and landscape-scale factors included presence or absence of regional groundwater inputs. Capture rates for 2 of the sampled species (M. olfersii and M. carcinus) were sufficient to compare the fit of occupancy models. Occupancy models did not converge for M. heterochirus, but M. heterochirus had high enough occupancy rates that logistic regression could be used to model the relationship between occupancy rates and predictors. The best-supported models for M. olfersii and M. carcinus included conductivity, discharge, and substrate parameters. Stream size was positively correlated with occupancy rates of all 3 species. High stream conductivity, which reflects the quantity of regional groundwater input into the stream, was positively correlated with M. olfersii occupancy rates. Boulder substrates increased occupancy rate of M. carcinus and decreased the detection probability of M. olfersii. Our models suggest that shrimp distribution is driven by factors that function at local (substrate and discharge) and landscape (conductivity) scales.

  10. Self-criticism interacts with the affective component of pain to predict depressive symptoms in female patients.

    PubMed

    Lerman, S F; Shahar, G; Rudich, Z

    2012-01-01

    This longitudinal study examined the role of the trait of self-criticism as a moderator of the relationship between the affective and sensory components of pain, and depression. One hundred and sixty-three chronic pain patients treated at a specialty pain clinic completed self-report questionnaires at two time points assessing affective and sensory components of pain, depression, and self-criticism. Hierarchical linear regression analysis revealed a significant 3-way interaction between self-criticism, affective pain and gender, whereby women with high affective pain and high self-criticism demonstrated elevated levels of depression. Our findings are the first to show, within a broad, comprehensive model, that self-criticism is activated by the affective, but not the sensory, component of pain in leading to depressive symptoms, and they highlight the need to assess patients' personality as part of an effective treatment plan. © 2011 European Federation of International Association for the Study of Pain Chapters.

  11. Application of principal component analysis (PCA) as a sensory assessment tool for fermented food products.

    PubMed

    Ghosh, Debasree; Chattopadhyay, Parimal

    2012-06-01

    The objective of the work was to use the method of quantitative descriptive analysis (QDA) to describe the sensory attributes of the fermented food products prepared with the incorporation of lactic cultures. Panellists were selected and trained to evaluate various attributes, especially color and appearance, body texture, flavor, overall acceptability and acidity of the fermented food products like cow milk curd and soymilk curd, idli, sauerkraut and probiotic ice cream. Principal component analysis (PCA) identified the six significant principal components that accounted for more than 90% of the variance in the sensory attribute data. Overall product quality was modelled as a function of principal components using multiple least squares regression (R² = 0.8). The result from PCA was statistically analyzed by analysis of variance (ANOVA). These findings demonstrate the utility of quantitative descriptive analysis for identifying and measuring the fermented food product attributes that are important for consumer acceptability.
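
    A minimal sketch of principal component regression of this kind, with synthetic data standing in for the sensory attribute scores:

        # Sketch of PCR: project attribute scores onto a few principal components,
        # then regress overall quality on the component scores. Data are synthetic.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(2)
        X = rng.normal(size=(40, 12))          # 40 samples x 12 sensory attributes (synthetic)
        y = X[:, :3] @ np.array([0.8, -0.5, 0.3]) + rng.normal(0, 0.2, 40)  # overall quality

        pcr = make_pipeline(StandardScaler(), PCA(n_components=6), LinearRegression())
        pcr.fit(X, y)
        print("R^2 on training data:", round(pcr.score(X, y), 3))
        print("explained variance:", pcr.named_steps["pca"].explained_variance_ratio_.sum().round(3))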

  12. Failure of Standard Training Sets in the Analysis of Fast-Scan Cyclic Voltammetry Data.

    PubMed

    Johnson, Justin A; Rodeberg, Nathan T; Wightman, R Mark

    2016-03-16

    The use of principal component regression, a multivariate calibration method, in the analysis of in vivo fast-scan cyclic voltammetry data allows for separation of overlapping signal contributions, permitting evaluation of the temporal dynamics of multiple neurotransmitters simultaneously. To accomplish this, the technique relies on information about current-concentration relationships across the scan-potential window gained from analysis of training sets. The ability of the constructed models to resolve analytes depends critically on the quality of these data. Recently, the use of standard training sets obtained under conditions other than those of the experimental data collection (e.g., with different electrodes, animals, or equipment) has been reported. This study evaluates the analyte resolution capabilities of models constructed using this approach from both a theoretical and experimental viewpoint. A detailed discussion of the theory of principal component regression is provided to inform this discussion. The findings demonstrate that the use of standard training sets leads to misassignment of the current-concentration relationships across the scan-potential window. This directly results in poor analyte resolution and, consequently, inaccurate quantitation, which may lead to erroneous conclusions being drawn from experimental data. Thus, it is strongly advocated that training sets be obtained under the experimental conditions to allow for accurate data analysis.

  13. Factors Controlling Sediment Load in The Central Anatolia Region of Turkey: Ankara River Basin.

    PubMed

    Duru, Umit; Wohl, Ellen; Ahmadi, Mehdi

    2017-05-01

    Better understanding of the factors controlling sediment load at a catchment scale can facilitate estimation of soil erosion and sediment transport rates. The research summarized here enhances understanding of correlations between potential control variables on suspended sediment loads. The Soil and Water Assessment Tool was used to simulate flow and sediment at the Ankara River basin. Multivariable regression analysis and principal component analysis were then performed between sediment load and controlling variables. The physical variables were either directly derived from a Digital Elevation Model or from field maps or computed using established equations. Mean observed sediment rate is 6697 ton/year and mean sediment yield is 21 ton/y/km² from the gage. Soil and Water Assessment Tool satisfactorily simulated observed sediment load with Nash-Sutcliffe efficiency, relative error, and coefficient of determination (R²) values of 0.81, -1.55, and 0.93, respectively, in the catchment. Therefore, parameter values from the physically based model were applied to the multivariable regression analysis as well as principal component analysis. The results indicate that stream flow, drainage area, and channel width explain most of the variability in sediment load among the catchments. The results imply that efficient siltation management practices in the catchment should focus on stream flow, drainage area, and channel width.

  14. Factors Controlling Sediment Load in The Central Anatolia Region of Turkey: Ankara River Basin

    NASA Astrophysics Data System (ADS)

    Duru, Umit; Wohl, Ellen; Ahmadi, Mehdi

    2017-05-01

    Better understanding of the factors controlling sediment load at a catchment scale can facilitate estimation of soil erosion and sediment transport rates. The research summarized here enhances understanding of correlations between potential control variables on suspended sediment loads. The Soil and Water Assessment Tool was used to simulate flow and sediment at the Ankara River basin. Multivariable regression analysis and principal component analysis were then performed between sediment load and controlling variables. The physical variables were either directly derived from a Digital Elevation Model or from field maps or computed using established equations. Mean observed sediment rate is 6697 ton/year and mean sediment yield is 21 ton/y/km² from the gage. Soil and Water Assessment Tool satisfactorily simulated observed sediment load with Nash-Sutcliffe efficiency, relative error, and coefficient of determination (R²) values of 0.81, -1.55, and 0.93, respectively, in the catchment. Therefore, parameter values from the physically based model were applied to the multivariable regression analysis as well as principal component analysis. The results indicate that stream flow, drainage area, and channel width explain most of the variability in sediment load among the catchments. The results imply that efficient siltation management practices in the catchment should focus on stream flow, drainage area, and channel width.

  15. Discrimination of orange beverage emulsions with different formulations using multivariate analysis.

    PubMed

    Mirhosseini, Hamed; Tan, Chin Ping

    2010-06-01

    The constituents in a food emulsion interact with each other, either physically or chemically, determining the overall physico-chemical and organoleptic properties of the final product. Thus, the main objective of the present study was to investigate the effect of emulsion components on beverage emulsion properties. In most cases, the second-order polynomial regression models with no significant (P > 0.05) lack of fit and high adjusted coefficient of determination (adjusted R², 0.851-0.996) were significantly fitted to explain the beverage emulsion properties as a function of the main emulsion components. The main effect of gum arabic was found to be significant (P < 0.05) in all response regression models. Orange beverage emulsion containing 222.0 g kg⁻¹ gum arabic, 2.4 g kg⁻¹ xanthan gum and 152.7 g kg⁻¹ orange oil was predicted to provide the desirable emulsion properties. The present study suggests that the concentration of gum arabic should be considered as a primary critical factor for the formulation of orange beverage emulsion. This study also indicated that the interaction effect between xanthan gum and orange oil showed the most significant (P < 0.05) effect among all interaction effects influencing all the physicochemical properties except for density. Copyright (c) 2010 Society of Chemical Industry.

  16. Method optimization for drug impurity profiling in supercritical fluid chromatography: Application to a pharmaceutical mixture.

    PubMed

    Muscat Galea, Charlene; Didion, David; Clicq, David; Mangelings, Debby; Vander Heyden, Yvan

    2017-12-01

    A supercritical chromatographic method for the separation of a drug and its impurities has been developed and optimized applying an experimental design approach and chromatogram simulations. Stationary phase screening was followed by optimization of the modifier and injection solvent composition. A design-of-experiment (DoE) approach was then used to optimize column temperature, back-pressure and the gradient slope simultaneously. Regression models for the retention times and peak widths of all mixture components were built. The factor levels for different grid points were then used to predict the retention times and peak widths of the mixture components using the regression models and the best separation for the worst separated peak pair in the experimental domain was identified. A plot of the minimal resolutions was used to help identifying the factor levels leading to the highest resolution between consecutive peaks. The effects of the DoE factors were visualized in a way that is familiar to the analytical chemist, i.e. by simulating the resulting chromatogram. The mixture of an active ingredient and seven impurities was separated in less than eight minutes. The approach discussed in this paper demonstrates how SFC methods can be developed and optimized efficiently using simple concepts and tools. Copyright © 2017 Elsevier B.V. All rights reserved.
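
    The grid-prediction idea can be sketched as follows; the response-surface coefficients, factor ranges, and resolution formula below are illustrative placeholders, not the optimized method or data of the paper.

        # Toy sketch: use per-component regression models for retention time and peak
        # width to predict chromatograms over a factor grid, then pick the conditions
        # maximizing the resolution of the worst-separated peak pair.
        import itertools
        import numpy as np

        def predict_rt_width(temp, pressure, slope, coefs):
            """Toy quadratic response-surface model for one component (placeholder coefficients)."""
            x = np.array([1.0, temp, pressure, slope, temp * pressure, temp**2])
            return float(coefs["rt"] @ x), float(coefs["width"] @ x)

        components = [   # invented models for three mixture components
            {"rt": np.array([2.0, -0.010, 0.002, 0.5, 1e-4, 1e-4]),
             "width": np.array([0.20, -5e-4, 1e-4, 0.01, 0.0, 0.0])},
            {"rt": np.array([2.6, -0.012, 0.001, 0.6, 1e-4, 1e-4]),
             "width": np.array([0.22, -5e-4, 1e-4, 0.01, 0.0, 0.0])},
            {"rt": np.array([3.4, -0.015, 0.003, 0.4, 1e-4, 1e-4]),
             "width": np.array([0.25, -5e-4, 1e-4, 0.01, 0.0, 0.0])},
        ]

        def min_resolution(temp, pressure, slope):
            peaks = sorted(predict_rt_width(temp, pressure, slope, c) for c in components)
            rs = [2 * (b[0] - a[0]) / (a[1] + b[1]) for a, b in zip(peaks, peaks[1:])]
            return min(rs)

        grid = itertools.product(np.linspace(30, 50, 5),      # column temperature (deg C)
                                 np.linspace(100, 160, 5),    # back-pressure (bar)
                                 np.linspace(1, 4, 4))        # gradient slope
        best = max(grid, key=lambda g: min_resolution(*g))
        print("best (temp, pressure, slope):", best, "min Rs:", round(min_resolution(*best), 2))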

  17. Calibration for the shear strain of 3-component borehole strainmeters in eastern Taiwan through Earth and ocean tidal waveform modeling

    NASA Astrophysics Data System (ADS)

    Canitano, Alexandre; Hsu, Ya-Ju; Lee, Hsin-Ming; Linde, Alan T.; Sacks, Selwyn

    2018-03-01

    We propose an approach for calibrating the horizontal tidal shear components (differential extension γ₁ and engineering shear γ₂) of two Sacks-Evertson (in Pap Meteorol Geophys 22:195-208, 1971) SES-3 borehole strainmeters installed in the Longitudinal Valley in eastern Taiwan. The method is based on the waveform reconstruction of the Earth and ocean tidal shear signals through linear regressions on strain gauge signals, with variable sensor azimuth. This method allows us to derive the orientation of the sensor without any initial constraints and to calibrate the shear strain components γ₁ and γ₂ against the M₂ tidal constituent. The results illustrate the potential of tensor strainmeters for recording horizontal tidal shear strain.
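
    A conceptual sketch of this calibration-by-regression idea: for each candidate azimuth, rotate theoretical tidal shear strains into the instrument frame, fit per-channel gains by least squares, and keep the azimuth with the smallest misfit. All signals and values below are synthetic stand-ins, not the SES-3 data or the authors' code.

        # Toy azimuth search with per-channel least-squares gains on synthetic tides.
        import numpy as np

        rng = np.random.default_rng(3)
        t = np.arange(0, 30 * 24, 0.5)                         # half-hourly samples, 30 days (h)
        w_m2 = 2 * np.pi / 12.4206                             # M2 angular frequency (1/h)
        gamma1 = 20.0 * np.cos(w_m2 * t)                       # theoretical differential extension
        gamma2 = 12.0 * np.sin(w_m2 * t)                       # theoretical engineering shear

        def rotate(g1, g2, theta):                             # shear strains under frame rotation
            return (g1 * np.cos(2 * theta) + g2 * np.sin(2 * theta),
                    -g1 * np.sin(2 * theta) + g2 * np.cos(2 * theta))

        true_az = np.deg2rad(35.0)
        g1_obs, g2_obs = rotate(gamma1, gamma2, true_az)
        g1_obs = 0.8 * g1_obs + rng.normal(0, 1.0, t.size)     # unknown gains + noise
        g2_obs = 1.1 * g2_obs + rng.normal(0, 1.0, t.size)

        def misfit(theta_deg):
            p1, p2 = rotate(gamma1, gamma2, np.deg2rad(theta_deg))
            resid = 0.0
            for obs, pred in ((g1_obs, p1), (g2_obs, p2)):
                gain = np.dot(obs, pred) / np.dot(pred, pred)  # least-squares gain per channel
                resid += np.sum((obs - gain * pred) ** 2)
            return resid

        azimuths = np.arange(0.0, 90.0, 0.5)   # shear components carry a 90-degree ambiguity
        best = azimuths[np.argmin([misfit(a) for a in azimuths])]
        print("recovered azimuth (deg):", best)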

  18. Applying the health promotion model to development of a worksite intervention.

    PubMed

    Lusk, S L; Kerr, M J; Ronis, D L; Eakin, B L

    1999-01-01

    Consistent use of hearing protection devices (HPDs) decreases noise-induced hearing loss; however, many workers do not use them consistently. Past research has supported the need to use a conceptual framework to understand behaviors and guide intervention programs; however, few reports have specified a process to translate a conceptual model into an intervention. The strongest predictors from the Health Promotion Model were used to design a training program to increase HPD use among construction workers. Carpenters (n = 118), operating engineers (n = 109), and plumber/pipefitters (n = 129) in the Midwest were recruited to participate in the study. Written questionnaires including scales measuring the components of the Health Promotion Model were completed in classroom settings at worker trade group meetings. All items from scales predicting HPD use were reviewed to determine the basis for the content of a program to promote the use of HPDs. Three selection criteria were developed: (1) correlation with use of hearing protection (at least .20), (2) amenability to change, and (3) room for improvement (mean score not at ceiling). Linear regression and Pearson's correlation were used to assess the components of the model as predictors of HPD use. Five predictors had statistically significant regression coefficients: perceived noise exposure, self-efficacy, value of use, barriers to use, and modeling of use of hearing protection. Using items meeting the selection criteria, a 20-minute videotape with written handouts was developed as the core of an intervention. A clearly defined practice session was also incorporated in the training intervention. Determining salient factors for worker populations and specific protective equipment prior to designing an intervention is essential. These predictors provided the basis for a training program that addressed the specific needs of construction workers. Results of tests of the effectiveness of the program will be available in the near future.

  19. Changes of choroidal neovascularization in indocyanine green angiography after intravitreal ranibizumab injection.

    PubMed

    Lee, Ji Eun; Kim, Hyun Woong; Lee, Sang Joon; Lee, Joo Eun

    2015-05-01

    To investigate vascular structural changes of choroidal neovascularization (CNV) followed by intravitreal ranibizumab injections using indocyanine green angiography. A total of 31 patients with exudative age-related macular degeneration and CNV whose structures were identifiable in indocyanine green angiography were included. Ranibizumab was injected into the vitreous cavity once a month for 3 months and then as needed for the next 3 months prospectively. Indocyanine green angiography was performed at baseline, 3, and 6 months. Early- to mid-phase images of the indocyanine green angiography in which the details of the vascular structure of the CNV were discerned best were used in the image analysis. Vascular structures of CNV were described as arteriovenular and capillary components, and structural changes were assessed. Arteriovenular components were observed in 29 eyes (94%). Regression of the capillary components was observed in most cases. Although regression of the arteriovenular component was noted in 14 eyes (48%), complete resolution was not observed. The eyes were categorized into 3 groups according to CNV structural changes: the regressed (Group R, 10 eyes, 31%), the matured (Group M, 7 eyes, 23%), and the growing (Group G, 14 eyes, 45%). In Group R, there was no regrowth of CNV found at 6 months. In Group M, distinct vascular structures were observed at 3 months and persisted without apparent changes at 6 months. In Group G, growth or reperfusion of capillary components from the persisting arteriovenular components was noted at 6 months. Both capillary and arteriovenular components regressed during monthly ranibizumab injections. However, CNV regrowth was observed in a group of patients during the as-needed treatment phase.

  20. Identifying maternal and infant factors associated with newborn size in rural Bangladesh by partial least squares (PLS) regression analysis

    PubMed Central

    Rahman, Md. Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D. W.; Labrique, Alain B.; Rashid, Mahbubur; Christian, Parul; West, Keith P.

    2017-01-01

    Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using the partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (standardized β = −0.29 to −0.19; p<0.001). Scatter plots of the scores of the first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than the ordinary least squares regression and provided the effect measure of the covariates with greater accuracy as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when either there are multiple outcome measurements in the same study, or the covariates are correlated, or both situations exist in a dataset. PMID:29261760

  1. Identifying maternal and infant factors associated with newborn size in rural Bangladesh by partial least squares (PLS) regression analysis.

    PubMed

    Kabir, Alamgir; Rahman, Md Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D W; Labrique, Alain B; Rashid, Mahbubur; Christian, Parul; West, Keith P

    2017-01-01

    Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using the partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (standardized β = −0.29 to −0.19; p<0.001). Scatter plots of the scores of the first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than the ordinary least squares regression and provided the effect measure of the covariates with greater accuracy as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when either there are multiple outcome measurements in the same study, or the covariates are correlated, or both situations exist in a dataset.
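
    A minimal sketch of fitting several correlated outcomes at once with PLS regression, using synthetic data in place of the maternal and newborn measurements:

        # Sketch of the PLS idea: correlated covariates regressed against multiple
        # outcomes in one model. Variable names and data are placeholders.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(4)
        n = 500
        X = rng.normal(size=(n, 8))                        # maternal covariates (correlated in practice)
        B = rng.normal(size=(8, 5)) * 0.3
        Y = X @ B + rng.normal(0, 1.0, size=(n, 5))        # e.g. weight, length, 3 circumferences

        pls = PLSRegression(n_components=2, scale=True)
        pls.fit(X, Y)
        scores = pls.transform(X)                          # per-subject scores on the PLS components
        print("average R^2 across outcomes:", round(pls.score(X, Y), 3))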

  2. Human allometry: adult bodies are more nearly geometrically similar than regression analysis has suggested.

    PubMed

    Burton, Richard F

    2010-01-01

    It is almost a matter of dogma that human body mass in adults tends to vary roughly in proportion to the square of height (stature), as Quetelet stated in 1835. As he realised, perfect isometry or geometric similarity requires that body mass varies with height cubed, so there seems to be a trend for tall adults to be relatively much lighter than short ones. Much evidence regarding component tissues and organs seems to accord with this idea. However, the hypothesis is presented that the proportions of the body are actually very much less size-dependent. Past evidence has mostly been obtained by least-squares regression analysis, but this cannot generally give a true picture of the allometric relationships. This is because there is considerable scatter in the data (leading to a low correlation between mass and height) and because neither variable causally determines the other. The relevant regression equations, though often formulated in logarithmic terms, effectively treat the masses as proportional to (body height)^b. Values of b estimated by regression must usually underestimate the true functional values, doing so especially when mass and height are poorly correlated. It is therefore telling support for the hypothesis that published estimates of b both for the whole body (which range between 1.0 and 2.5) and for its component tissues and organs (which vary even more) correlate with the corresponding correlation coefficients for mass and height. There is no simple statistical technique for establishing the true functional relationships, but Monte Carlo modelling has shown that the results obtained for total body mass are compatible with a true height exponent of three. Other data, on relationships between body mass and the girths of various body parts such as the thigh and chest, are also more consistent with isometry than regression analysis has suggested. This too is demonstrated by modelling. It thus seems that much of anthropometry needs to be re-evaluated. It is not suggested that all organs and tissues scale equally with whole body size.
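
    A short Monte Carlo sketch of this regression-dilution argument (all parameter values are assumptions chosen for illustration): when both log-height and log-mass scatter around a latent body-size factor, ordinary least squares recovers an exponent well below the true functional value of three.

        # Regression dilution demo: mass scales with the cube of a latent size factor,
        # yet OLS of log(mass) on noisy log(height) yields a much smaller exponent.
        import numpy as np

        rng = np.random.default_rng(5)
        n = 5000
        size = rng.normal(0.0, 0.05, n)                            # latent log "body size" factor
        log_h = np.log(1.70) + size + rng.normal(0, 0.03, n)       # height scatters around the factor
        log_m = np.log(70.0) + 3 * size + rng.normal(0, 0.10, n)   # mass ~ size^3 plus biological scatter

        b_ols = np.polyfit(log_h, log_m, 1)[0]                     # fitted height exponent
        r = np.corrcoef(log_h, log_m)[0, 1]
        print(f"correlation = {r:.2f}, fitted exponent = {b_ols:.2f} (true functional exponent = 3)")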

  3. Freshwater DOM quantity and quality from a two-component model of UV absorbance

    USGS Publications Warehouse

    Carter, Heather T.; Tipping, Edward; Koprivnjak, Jean-Francois; Miller, Matthew P.; Cookson, Brenda; Hamilton-Taylor, John

    2012-01-01

    We present a model that considers UV-absorbing dissolved organic matter (DOM) to consist of two components (A and B), each with a distinct and constant spectrum. Component A absorbs UV light strongly, and is therefore presumed to possess aromatic chromophores and hydrophobic character, whereas B absorbs weakly and can be assumed hydrophilic. We parameterised the model with dissolved organic carbon concentrations [DOC] and corresponding UV spectra for c. 1700 filtered surface water samples from North America and the United Kingdom, by optimising extinction coefficients for A and B, together with a small constant concentration of non-absorbing DOM (0.80 mg DOC L⁻¹). Good unbiased predictions of [DOC] from absorbance data at 270 and 350 nm were obtained (r² = 0.98), the sum of squared residuals in [DOC] being reduced by 66% compared to a regression model fitted to absorbance at 270 nm alone. The parameterised model can use measured optical absorbance values at any pair of suitable wavelengths to calculate both [DOC] and the relative amounts of A and B in a water sample, i.e. measures of quantity and quality. Blind prediction of [DOC] was satisfactory for 9 of 11 independent data sets (181 of 213 individual samples).
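
    The inversion of such a two-component model can be sketched as a 2x2 linear system; the extinction coefficients and absorbances below are invented for illustration, while the 0.80 mg DOC L⁻¹ non-absorbing fraction is the constant quoted in the abstract.

        # Sketch: absorbances at two wavelengths give a 2x2 system for the two
        # component concentrations, whose sum (plus the non-absorbing fraction) is DOC.
        import numpy as np

        E = np.array([[0.060, 0.012],     # assumed extinction coefficients; rows: 270, 350 nm
                      [0.025, 0.002]])    # columns: component A, component B
        non_absorbing_doc = 0.80          # mg C/L, constant non-absorbing fraction (from the paper)

        def doc_from_absorbance(a270, a350):
            """Return (DOC, [A], [B]) from absorbances at 270 and 350 nm."""
            conc_a, conc_b = np.linalg.solve(E, np.array([a270, a350]))
            return conc_a + conc_b + non_absorbing_doc, conc_a, conc_b

        print(doc_from_absorbance(0.35, 0.12))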

  4. Development of a hybrid model to predict construction and demolition waste: China as a case study.

    PubMed

    Song, Yiliao; Wang, Yong; Liu, Feng; Zhang, Yixin

    2017-01-01

    Construction and demolition waste (C&DW) is currently a worldwide issue, and the situation is the worst in China due to a rapid increase in the construction industry and the short life span of China's buildings. To create an opportunity out of this problem, comprehensive prevention measures and effective management strategies are urgently needed. One major gap in the literature of waste management is a lack of estimates of future C&DW generation. Therefore, this paper presents a forecasting procedure for C&DW in China that can forecast the quantity of each component in such waste. The proposed approach is based on a GM-SVR model that improves the forecasting effectiveness of the gray model (GM), which is achieved by adjusting the residual series by a support vector regression (SVR) method and a transition matrix that aims to estimate the discharge of each component in the C&DW. Through the proposed method, future C&DW volumes are listed and analyzed, including their potential components and their distribution across different provinces in China. In addition, the model testing process provides mathematical evidence that the proposed model is an effective way to give policy makers information on future C&DW. Copyright © 2016 Elsevier Ltd. All rights reserved.

  5. Virtual milk for modelling and simulation of dairy processes.

    PubMed

    Munir, M T; Zhang, Y; Yu, W; Wilson, D I; Young, B R

    2016-05-01

    The modeling of dairy processing using a generic process simulator suffers from shortcomings, given that many simulators do not contain milk components in their component libraries. Recently, pseudo-milk components for a commercial process simulator were proposed for simulation and the current work extends this pseudo-milk concept by studying the effect of both total milk solids and temperature on key physical properties such as thermal conductivity, density, viscosity, and heat capacity. This paper also uses expanded fluid and power law models to predict milk viscosity over the temperature range from 4 to 75°C and develops a succinct regressed model for heat capacity as a function of temperature and fat composition. The pseudo-milk was validated by comparing the simulated and actual values of the physical properties of milk. The milk thermal conductivity, density, viscosity, and heat capacity showed differences of less than 2, 4, 3, and 1.5%, respectively, between the simulated results and actual values. This work extends the capabilities of the previously proposed pseudo-milk and of a process simulator to model dairy processes, processing different types of milk (e.g., whole milk, skim milk, and concentrated milk) with different intrinsic compositions, and to predict correct material and energy balances for dairy processes. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.

  6. The weighted priors approach for combining expert opinions in logistic regression experiments

    DOE PAGES

    Quinlan, Kevin R.; Anderson-Cook, Christine M.; Myers, Kary L.

    2017-04-24

    When modeling the reliability of a system or component, it is not uncommon for more than one expert to provide very different prior estimates of the expected reliability as a function of an explanatory variable such as age or temperature. Our goal in this paper is to incorporate all information from the experts when choosing a design about which units to test. Bayesian design of experiments has been shown to be very successful for generalized linear models, including logistic regression models. We use this approach to develop methodology for the case where there are several potentially non-overlapping priors under consideration. While multiple priors have been used for analysis in the past, they have never been used in a design context. The Weighted Priors method performs well for a broad range of true underlying model parameter choices and is more robust when compared to other reasonable design choices. Finally, we illustrate the method through multiple scenarios and a motivating example. Additional figures for this article are available in the online supplementary information.

  7. Face aging effect simulation model based on multilayer representation and shearlet transform

    NASA Astrophysics Data System (ADS)

    Li, Yuancheng; Li, Yan

    2017-09-01

    In order to extract detailed facial features, we build a face aging effect simulation model based on multilayer representation and shearlet transform. The face is divided into three layers: the global layer of the face, the local features layer, and the texture layer, and an aging model is established separately for each layer. First, the training samples are classified according to different age groups, and we use the active appearance model (AAM) at the global level to obtain facial features. The regression equations of shape and texture with age are obtained by fitting the support vector machine regression, which is based on the radial basis function. We use AAM to simulate the aging of facial organs. Then, for the texture detail layer, we acquire the significant high-frequency characteristic components of the face by using the multiscale shearlet transform. Finally, we obtain the final simulated aging images of the face by the fusion algorithm. Experiments are carried out on the FG-NET dataset, and the experimental results show that the simulated face images differ little from the original images and achieve a good face aging simulation effect.

  8. Kernel analysis of partial least squares (PLS) regression models.

    PubMed

    Shinzawa, Hideyuki; Ritthiruangdej, Pitiporn; Ozaki, Yukihiro

    2011-05-01

    An analytical technique based on kernel matrix representation is demonstrated to provide further chemically meaningful insight into partial least squares (PLS) regression models. The kernel matrix condenses essential information about scores derived from PLS or principal component analysis (PCA). Thus, it becomes possible to establish the proper interpretation of the scores. A PLS model for the total nitrogen (TN) content in multiple Thai fish sauces is built with a set of near-infrared (NIR) transmittance spectra of the fish sauce samples. The kernel analysis of the scores effectively reveals that the variation of the spectral feature induced by the change in protein content is substantially associated with the total water content and the protein hydration. Kernel analysis is also carried out on a set of time-dependent infrared (IR) spectra representing transient evaporation of ethanol from a binary mixture solution of ethanol and oleic acid. A PLS model to predict the elapsed time is built with the IR spectra and the kernel matrix is derived from the scores. The detailed analysis of the kernel matrix provides penetrating insight into the interaction between the ethanol and the oleic acid.
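
    A minimal sketch of building such a kernel matrix from PLS scores, with synthetic spectra in place of the NIR data:

        # Sketch: derive the sample-by-sample kernel (Gram) matrix from PLS score
        # vectors so relationships among samples can be inspected directly.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(6)
        X = rng.normal(size=(30, 100))                       # e.g. 30 spectra x 100 wavelengths (synthetic)
        y = X[:, :5].sum(axis=1) + rng.normal(0, 0.1, 30)    # e.g. total nitrogen content

        pls = PLSRegression(n_components=3).fit(X, y)
        T = pls.transform(X)                                 # score matrix (samples x latent variables)
        K = T @ T.T                                          # kernel matrix condensing the score information
        print("kernel matrix shape:", K.shape)               # (30, 30); large entries mark similar samples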

  9. Urban change analysis and future growth of Istanbul.

    PubMed

    Akın, Anıl; Sunar, Filiz; Berberoğlu, Süha

    2015-08-01

    This study is aimed at analyzing urban change within Istanbul and assessing the city's future growth potential for the year 2040 using appropriate modeling approaches. Urban growth is a major driving force of land-use change, and spatial and temporal components of urbanization can be identified through accurate spatial modeling. In this context, widely used urban modeling approaches, such as the Markov chain and logistic regression based on cellular automata (CA), were used to simulate urban growth within Istanbul. The distance from each pixel to the urban and road classes, elevation, and slope, together with municipality and land use maps (as an excluded layer), were identified as factors. Calibration data were obtained from remotely sensed data recorded in 1972, 1986, and 2013. Validation was performed by overlaying the simulated and actual 2013 urban maps, and a kappa index of agreement was derived. The results indicate that urban expansion will influence mainly forest areas during the time period of 2013-2040. The urban expansion was predicted as 429 and 327 km² with the Markov chain and logistic regression models, respectively.

  10. The weighted priors approach for combining expert opinions in logistic regression experiments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Quinlan, Kevin R.; Anderson-Cook, Christine M.; Myers, Kary L.

    When modeling the reliability of a system or component, it is not uncommon for more than one expert to provide very different prior estimates of the expected reliability as a function of an explanatory variable such as age or temperature. Our goal in this paper is to incorporate all information from the experts when choosing a design about which units to test. Bayesian design of experiments has been shown to be very successful for generalized linear models, including logistic regression models. We use this approach to develop methodology for the case where there are several potentially non-overlapping priors under consideration. While multiple priors have been used for analysis in the past, they have never been used in a design context. The Weighted Priors method performs well for a broad range of true underlying model parameter choices and is more robust when compared to other reasonable design choices. Finally, we illustrate the method through multiple scenarios and a motivating example. Additional figures for this article are available in the online supplementary information.

  11. Quantifying components of the hydrologic cycle in Virginia using chemical hydrograph separation and multiple regression analysis

    USGS Publications Warehouse

    Sanford, Ward E.; Nelms, David L.; Pope, Jason P.; Selnick, David L.

    2012-01-01

    This study by the U.S. Geological Survey, prepared in cooperation with the Virginia Department of Environmental Quality, quantifies the components of the hydrologic cycle across the Commonwealth of Virginia. Long-term, mean fluxes were calculated for precipitation, surface runoff, infiltration, total evapotranspiration (ET), riparian ET, recharge, base flow (or groundwater discharge) and net total outflow. Fluxes of these components were first estimated on a number of real-time-gaged watersheds across Virginia. Specific conductance was used to distinguish and separate surface runoff from base flow. Specific-conductance data were collected every 15 minutes at 75 real-time gages for approximately 18 months between March 2007 and August 2008. Precipitation was estimated for 1971–2000 using PRISM climate data. Precipitation and temperature from the PRISM data were used to develop a regression-based relation to estimate total ET. The proportion of watershed precipitation that becomes surface runoff was related to physiographic province and rock type in a runoff regression equation. Component flux estimates from the watersheds were transferred to flux estimates for counties and independent cities using the ET and runoff regression equations. Only 48 of the 75 watersheds yielded sufficient data, and data from these 48 were used in the final runoff regression equation. The base-flow proportion for the 48 watersheds averaged 72 percent using specific conductance, a value that was substantially higher than the 61 percent average calculated using a graphical-separation technique (the USGS program PART). Final results for the study are presented as component flux estimates for all counties and independent cities in Virginia.
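
    The chemical (conductance-based) separation step can be sketched as a two-component mixing calculation; the end-member conductances and the short example record below are assumptions for illustration only.

        # Sketch: two-component hydrograph separation from specific conductance.
        import numpy as np

        def baseflow_fraction(sc_stream, sc_baseflow=250.0, sc_runoff=40.0):
            """Base-flow fraction from a two-component specific-conductance mass balance."""
            frac = (sc_stream - sc_runoff) / (sc_baseflow - sc_runoff)
            return np.clip(frac, 0.0, 1.0)

        sc_15min = np.array([230.0, 180.0, 120.0, 160.0, 210.0])   # hypothetical uS/cm record
        q_15min = np.array([1.2, 3.5, 8.0, 4.0, 1.8])              # hypothetical discharge (m3/s)

        bf = baseflow_fraction(sc_15min) * q_15min
        print("base-flow share of total flow:", round(bf.sum() / q_15min.sum(), 2))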

  12. 77 FR 3121 - Program Integrity: Gainful Employment-Debt Measures; Correction

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-01-23

    ...On June 13, 2011, the Secretary of Education (Secretary) published a notice of final regulations in the Federal Register for Program Integrity: Gainful Employment--Debt Measures (Gainful Employment--Debt Measures) (76 FR 34386). In the preamble of the final regulations, we used the wrong data to calculate the percent of total variance in institutions' repayment rates that may be explained by race/ethnicity. Our intent was to use the data that included all minority students per institution. However, we mistakenly used the data for a subset of minority students per institution. We have now recalculated the total variance using the data that includes all minority students. Through this document, we correct, in the preamble of the Gainful Employment--Debt Measures final regulations, the errors resulting from this misapplication. We do not change the regression analysis model itself; we are using the same model with the appropriate data. Through this notice we also correct, in the preamble of the Gainful Employment--Debt Measures final regulations, our description of one component of the regression analysis. The preamble referred to use of an institutional variable measuring acceptance rates. This description was incorrect; in fact we used an institutional variable measuring retention rates. Correcting this language does not change the regression analysis model itself or the variance explained by the model. The text of the final regulations remains unchanged.

  13. Modified locally weighted--partial least squares regression improving clinical predictions from infrared spectra of human serum samples.

    PubMed

    Perez-Guaita, David; Kuligowski, Julia; Quintás, Guillermo; Garrigues, Salvador; Guardia, Miguel de la

    2013-03-30

    Locally weighted partial least squares regression (LW-PLSR) has been applied to the determination of four clinical parameters in human serum samples (total protein, triglyceride, glucose and urea contents) by Fourier transform infrared (FTIR) spectroscopy. Classical LW-PLSR models were constructed using different spectral regions. For the selection of parameters by LW-PLSR modeling, a multi-parametric study was carried out employing the minimum root-mean-square error of cross validation (RMSCV) as the objective function. In order to overcome the effect of strong matrix interferences on the predictive accuracy of LW-PLSR models, this work focuses on sample selection. Accordingly, a novel strategy for the development of local models is proposed. It was based on the use of: (i) principal component analysis (PCA) performed on an analyte-specific spectral region for identifying the most similar sample spectra and (ii) partial least squares regression (PLSR) constructed using the whole spectrum. Results found by using this strategy were compared to those provided by PLSR using the same spectral intervals as for LW-PLSR. Prediction errors found by both classical and modified LW-PLSR improved on those obtained by PLSR. Hence, both proposed approaches were useful for the determination of analytes present in a complex matrix, as in the case of human serum samples. Copyright © 2013 Elsevier B.V. All rights reserved.
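
    A rough sketch of the proposed local-modeling strategy, with synthetic spectra and an assumed analyte-specific window in place of the FTIR data:

        # Sketch: pick calibration spectra closest to the query sample in a PCA score
        # space built on an analyte-specific window, then fit PLS on those neighbours.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(7)
        X = rng.normal(size=(120, 400))                    # calibration spectra (synthetic)
        y = X[:, 50:60].mean(axis=1) * 5 + rng.normal(0, 0.05, 120)   # reference analyte values
        x_query = X[0] + rng.normal(0, 0.01, 400)          # spectrum of an unknown sample

        window = slice(40, 80)                             # assumed analyte-specific region
        pca = PCA(n_components=3).fit(X[:, window])
        d = np.linalg.norm(pca.transform(X[:, window]) - pca.transform(x_query[None, window]), axis=1)
        neighbours = np.argsort(d)[:30]                    # the 30 most similar calibration samples

        local_pls = PLSRegression(n_components=4).fit(X[neighbours], y[neighbours])
        print("local prediction:", local_pls.predict(x_query[None, :]).item())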

  14. Effect of noise in principal component analysis with an application to ozone pollution

    NASA Astrophysics Data System (ADS)

    Tsakiri, Katerina G.

    This thesis analyzes the effect of independent noise in principal components of k normally distributed random variables defined by a covariance matrix. We prove that the principal components as well as the canonical variate pairs determined from the joint distribution of the original sample affected by noise can be essentially different in comparison with those determined from the original sample. However, when the differences between the eigenvalues of the original covariance matrix are sufficiently large compared to the level of the noise, the effect of noise in the principal components and canonical variate pairs proved to be negligible. The theoretical results are supported by a simulation study and examples. Moreover, we compare our results about the eigenvalues and eigenvectors in the two-dimensional case with other models examined before. This theory can be applied in any field for the decomposition of the components in multivariate analysis. One application is the detection and prediction of the main atmospheric factor of ozone concentrations, using the example of Albany, New York. Using daily ozone, solar radiation, temperature, wind speed and precipitation data, we determine the main atmospheric factor for the explanation and prediction of ozone concentrations. A methodology is described for the decomposition of the time series of ozone and other atmospheric variables into the global term component, which describes the long term trend and the seasonal variations, and the synoptic scale component, which describes the short term variations. By using Canonical Correlation Analysis, we show that solar radiation is the only main factor among the atmospheric variables considered here for the explanation and prediction of the global and synoptic scale components of ozone. The global term components are modeled by a linear regression model, while the synoptic scale components are modeled by a vector autoregressive model and the Kalman filter. The coefficient of determination, R², for the prediction of the synoptic scale ozone component was found to be the highest when we consider the synoptic scale components of the time series for solar radiation and temperature. KEY WORDS: multivariate analysis; principal component; canonical variate pairs; eigenvalue; eigenvector; ozone; solar radiation; spectral decomposition; Kalman filter; time series prediction

  15. Neighborhood social capital and crime victimization: comparison of spatial regression analysis and hierarchical regression analysis.

    PubMed

    Takagi, Daisuke; Ikeda, Ken'ichi; Kawachi, Ichiro

    2012-11-01

    Crime is an important determinant of public health outcomes, including quality of life, mental well-being, and health behavior. A body of research has documented the association between community social capital and crime victimization. The association between social capital and crime victimization has been examined at multiple levels of spatial aggregation, ranging from entire countries, to states, metropolitan areas, counties, and neighborhoods. In multilevel analysis, the spatial boundaries at level 2 are most often drawn from administrative boundaries (e.g., Census tracts in the U.S.). One problem with adopting administrative definitions of neighborhoods is that it ignores spatial spillover. We conducted a study of social capital and crime victimization in one ward of Tokyo city, using a spatial Durbin model with an inverse-distance weighting matrix that assigned each respondent a unique level of "exposure" to social capital based on all other residents' perceptions. The study is based on a postal questionnaire sent to 20- to 69-year-old residents of Arakawa Ward, Tokyo. The response rate was 43.7%. We examined the contextual influence of generalized trust, perceptions of reciprocity, two types of social network variables, as well as two principal components of social capital (constructed from the above four variables). Our outcome measure was self-reported crime victimization in the last five years. In the spatial Durbin model, we found that neighborhood generalized trust, reciprocity, supportive networks and two principal components of social capital were each inversely associated with crime victimization. By contrast, a multilevel regression performed with the same data (using administrative neighborhood boundaries) found generally null associations between neighborhood social capital and crime. Spatial regression methods may be more appropriate for investigating the contextual influence of social capital in homogeneous cultural settings such as Japan. Copyright © 2012 Elsevier Ltd. All rights reserved.
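
    A simplified sketch of the inverse-distance exposure idea (not the full spatial Durbin estimation): each respondent's neighbourhood social capital is the distance-weighted average of all other respondents' trust scores, which then enters a victimization regression. Coordinates, trust scores and outcomes are synthetic placeholders.

        # Sketch: inverse-distance "exposure" to neighbours' trust, used as a regressor.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(8)
        n = 300
        xy = rng.uniform(0, 5000, size=(n, 2))             # respondent locations (m)
        trust = rng.normal(3.0, 0.8, n)                    # individual generalized trust

        d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
        w = 1.0 / np.maximum(d, 1.0)                       # inverse-distance weights
        np.fill_diagonal(w, 0.0)
        w /= w.sum(axis=1, keepdims=True)                  # row-standardize
        exposure = w @ trust                               # unique exposure for each respondent

        logit_p = -1.0 - 0.8 * (exposure - 3.0) + rng.normal(0, 0.3, n)
        victim = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

        fit = sm.Logit(victim, sm.add_constant(np.column_stack([trust, exposure]))).fit(disp=0)
        print(fit.params)   # constant, own trust, neighbourhood exposure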

  16. The Separation of Between-person and Within-person Components of Individual Change Over Time: A Latent Curve Model with Structured Residuals

    PubMed Central

    Curran, Patrick J.; Howard, Andrea L.; Bainter, Sierra; Lane, Stephanie T.; McGinley, James S.

    2014-01-01

    Objective Although recent statistical and computational developments allow for the empirical testing of psychological theories in ways not previously possible, one particularly vexing challenge remains: how to optimally model the prospective, reciprocal relations between two constructs as they developmentally unfold over time. Several analytic methods currently exist that attempt to model these types of relations, and each approach is successful to varying degrees. However, none provide the unambiguous separation of between-person and within-person components of stability and change over time, components that are often hypothesized to exist in the psychological sciences. The goal of our paper is to propose and demonstrate a novel extension of the multivariate latent curve model to allow for the disaggregation of these effects. Method We begin with a review of the standard latent curve models and describe how these primarily capture between-person differences in change. We then extend this model to allow for regression structures among the time-specific residuals to capture within-person differences in change. Results We demonstrate this model using an artificial data set generated to mimic the developmental relation between alcohol use and depressive symptomatology spanning five repeated measures. Conclusions We obtain a specificity of results from the proposed analytic strategy that are not available from other existing methodologies. We conclude with potential limitations of our approach and directions for future research. PMID:24364798

  17. Prevention and treatment of atherosclerosis with flaxseed-derived compound secoisolariciresinol diglucoside.

    PubMed

    Prasad, Kailash; Jadhav, Ashok

    2016-01-01

    Atherosclerosis is the primary cause of coronary artery disease, heart attack, strokes, and peripheral vascular disease. Alternative/complementary medicines, although not accepted by the medical community, may be of great help in suppression, slowing of progression and regression of atherosclerosis. Numerous natural products are in use for therapy in spite of lack of evidence. This paper discusses the basic mechanism of atherosclerosis, risk factors for atherosclerosis, and prevention, slowing of progression and regression of atherosclerosis with flaxseed-derived secoisolariciresinol diglucoside (SDG). SDG content of flaxseed varies from 6 mg/g to 18 mg/g. Flaxseed is the richest source of SDG. SDG possesses antioxidant, antihypertensive, antidiabetic, hypolipidemic, anti-inflammatory and antiatherogenic activities. SDG content of some commonly used foods has been described. SDG in very low dose (15 mg/kg) suppressed the development of hypercholesterolemic atherosclerosis by 73% and this effect was associated with reduction in serum total cholesterol, LDL-C, and oxidative stress, and an increase in the levels of HDL-C. A summary of the effects of flaxseed and its components on hypercholesterolemic atherosclerosis has been provided. Reductions in hypercholesterolemic atherosclerosis by flaxseed, CDC-flaxseed, flaxseed oil, flax lignan complex and SDG are 46%, 69%, 0%, 34% and 73%, respectively, in the dietary cholesterol-induced rabbit model of atherosclerosis. SDG slows the progression of atherosclerosis in animal models. Long-term use of SDG regresses hypercholesterolemic atherosclerosis. It is interesting that a regular diet following a high-cholesterol diet accelerates regression in this animal model of atherosclerosis. In conclusion, SDG suppresses, slows the progression of, and regresses atherosclerosis. It could serve as an alternative medicine for the prevention, slowing of progression and regression of atherosclerosis and hence for the treatment of coronary artery disease, stroke and peripheral arterial vascular diseases.

  18. Spectroscopic and Chemometric Analysis of Binary and Ternary Edible Oil Mixtures: Qualitative and Quantitative Study.

    PubMed

    Jović, Ozren; Smolić, Tomislav; Primožič, Ines; Hrenar, Tomica

    2016-04-19

    The aim of this study was to investigate the feasibility of FTIR-ATR spectroscopy coupled with the multivariate numerical methodology for qualitative and quantitative analysis of binary and ternary edible oil mixtures. Four pure oils (extra virgin olive oil, high oleic sunflower oil, rapeseed oil, and sunflower oil), as well as their 54 binary and 108 ternary mixtures, were analyzed using FTIR-ATR spectroscopy in combination with principal component and discriminant analysis, partial least-squares, and principal component regression. It was found that the composition of all 166 samples can be excellently represented using only the first three principal components describing 98.29% of total variance in the selected spectral range (3035-2989, 1170-1140, 1120-1100, 1093-1047, and 930-890 cm⁻¹). Factor scores in 3D space spanned by these three principal components form a tetrahedral-like arrangement: pure oils being at the vertices, binary mixtures at the edges, and ternary mixtures on the faces of a tetrahedron. To confirm the validity of results, we applied several cross-validation methods. Quantitative analysis was performed by minimization of root-mean-square error of cross-validation values regarding the spectral range, derivative order, and choice of method (partial least-squares or principal component regression), which resulted in excellent predictions for test sets (R² > 0.99 in all cases). Additionally, experimentally more demanding gas chromatography analysis of fatty acid content was carried out for all specimens, confirming the results obtained by FTIR-ATR coupled with principal component analysis. However, FTIR-ATR provided a considerably better model for prediction of mixture composition than gas chromatography, especially for high oleic sunflower oil.

  19. The effects of components of fine particulate air pollution on mortality in california: results from CALFINE.

    PubMed

    Ostro, Bart; Feng, Wen-Ying; Broadwin, Rachel; Green, Shelley; Lipsett, Michael

    2007-01-01

    Several epidemiologic studies provide evidence of an association between daily mortality and particulate matter < 2.5 μm in diameter (PM2.5). Little is known, however, about the relative effects of PM2.5 constituents. We examined associations between 19 PM2.5 components and daily mortality in six California counties. We obtained daily data from 2000 to 2003 on mortality and PM2.5 mass and components, including elemental and organic carbon (EC and OC), nitrates, sulfates, and various metals. We examined associations of PM2.5 and its constituents with daily counts of several mortality categories: all-cause, cardiovascular, respiratory, and mortality age > 65 years. Poisson regressions incorporating natural splines were used to control for time-varying covariates. Effect estimates were determined for each component in each county and then combined using a random-effects model. PM2.5 mass and several constituents were associated with multiple mortality categories, especially cardiovascular deaths. For example, for a 3-day lag, the latter increased by 1.6, 2.1, 1.6, and 1.5% for PM2.5, EC, OC, and nitrates based on interquartile ranges of 14.6, 0.8, 4.6, and 5.5 μg/m³, respectively. Stronger associations were observed between mortality and additional pollutants, including sulfates and several metals, during the cool season. This multicounty analysis adds to the growing body of evidence linking PM2.5 with mortality and indicates that excess risks may vary among specific PM2.5 components. Therefore, the use of regression coefficients based on PM2.5 mass may underestimate associations with some PM2.5 components. Also, our findings support the hypothesis that combustion-associated pollutants are particularly important in California.
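
    A minimal sketch of this kind of time-series Poisson regression, using simulated daily data and natural-spline terms for time and temperature; column names and effect sizes are placeholders, not the study's data.

        # Sketch: Poisson regression of daily deaths on a pollutant with spline
        # control for smooth time and temperature confounding (simulated data).
        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(9)
        days = 730
        df = pd.DataFrame({
            "t": np.arange(days),
            "temp": 15 + 10 * np.sin(2 * np.pi * np.arange(days) / 365) + rng.normal(0, 2, days),
            "pm25": rng.gamma(4.0, 4.0, days),
        })
        log_mu = 3.0 + 0.002 * df.pm25 + 0.05 * np.sin(2 * np.pi * df.t / 365)
        df["deaths"] = rng.poisson(np.exp(log_mu))

        fit = smf.glm("deaths ~ pm25 + cr(t, df=8) + cr(temp, df=4)",
                      data=df, family=sm.families.Poisson()).fit()
        iqr = df.pm25.quantile(0.75) - df.pm25.quantile(0.25)
        print("percent change per IQR of PM2.5:", round(100 * (np.exp(fit.params['pm25'] * iqr) - 1), 2))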

  20. Bayesian nonparametric regression with varying residual density

    PubMed Central

    Pati, Debdeep; Dunson, David B.

    2013-01-01

    We consider the problem of robust Bayesian inference on the mean regression function allowing the residual density to change flexibly with predictors. The proposed class of models is based on a Gaussian process prior for the mean regression function and mixtures of Gaussians for the collection of residual densities indexed by predictors. Initially considering the homoscedastic case, we propose priors for the residual density based on probit stick-breaking (PSB) scale mixtures and symmetrized PSB (sPSB) location-scale mixtures. Both priors restrict the residual density to be symmetric about zero, with the sPSB prior more flexible in allowing multimodal densities. We provide sufficient conditions to ensure strong posterior consistency in estimating the regression function under the sPSB prior, generalizing existing theory focused on parametric residual distributions. The PSB and sPSB priors are generalized to allow residual densities to change nonparametrically with predictors through incorporating Gaussian processes in the stick-breaking components. This leads to a robust Bayesian regression procedure that automatically down-weights outliers and influential observations in a locally-adaptive manner. Posterior computation relies on an efficient data augmentation exact block Gibbs sampler. The methods are illustrated using simulated and real data applications. PMID:24465053

  1. Forecasting hourly PM(10) concentration in Cyprus through artificial neural networks and multiple regression models: implications to local environmental management.

    PubMed

    Paschalidou, Anastasia K; Karakitsios, Spyridon; Kleanthous, Savvas; Kassomenos, Pavlos A

    2011-02-01

    In the present work, two types of artificial neural network (NN) models using the multilayer perceptron (MLP) and the radial basis function (RBF) techniques, as well as a model based on principal component regression analysis (PCRA), are employed to forecast hourly PM10 concentrations in four urban areas (Larnaca, Limassol, Nicosia and Paphos) in Cyprus. The model development is based on a variety of meteorological and pollutant parameters corresponding to the 2-year period between July 2006 and June 2008, and the model evaluation is achieved through the use of a series of well-established evaluation instruments and methodologies. The evaluation reveals that the MLP NN models display the best forecasting performance with R² values ranging between 0.65 and 0.76, whereas the RBF NNs and the PCRA models reveal a rather weak performance with R² values between 0.37-0.43 and 0.33-0.38, respectively. The derived MLP models are also used to forecast Saharan dust episodes with remarkable success (probability of detection ranging between 0.68 and 0.71). On the whole, the analysis shows that the models introduced here could provide local authorities with reliable and precise predictions and alarms about air quality if used on an operational basis.

  2. Comparison of adsorption equilibrium models for the study of Cl⁻, NO₃⁻ and SO₄²⁻ removal from aqueous solutions by an anion exchange resin.

    PubMed

    Dron, Julien; Dodi, Alain

    2011-06-15

    The removal of chloride, nitrate and sulfate ions from aqueous solutions by a macroporous resin is studied through the ion exchange systems OH⁻/Cl⁻, OH⁻/NO₃⁻, OH⁻/SO₄²⁻, and HCO₃⁻/Cl⁻, Cl⁻/NO₃⁻, Cl⁻/SO₄²⁻. They are investigated by means of Langmuir, Freundlich, Dubinin-Radushkevitch (D-R) and Dubinin-Astakhov (D-A) single-component adsorption isotherms. The sorption parameters and the fitting of the models are determined by nonlinear regression and discussed. The Langmuir model provides a fair estimation of the sorption capacity whatever the system under study, in contrast to the Freundlich and D-R models. The adsorption energies deduced from Dubinin and Langmuir isotherms are in good agreement, and the surface parameter of the D-A isotherm appears consistent. All models agree on the order of affinity OH⁻
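
    A minimal sketch of fitting single-component Langmuir and Freundlich isotherms by nonlinear regression; the equilibrium data and initial guesses are invented for illustration.

        # Sketch: nonlinear least-squares fits of two adsorption isotherms.
        import numpy as np
        from scipy.optimize import curve_fit

        def langmuir(c_eq, q_max, k_l):
            return q_max * k_l * c_eq / (1.0 + k_l * c_eq)

        def freundlich(c_eq, k_f, n_inv):
            return k_f * c_eq**n_inv

        c_eq = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0])      # meq/L (hypothetical)
        q = np.array([0.30, 0.52, 0.80, 1.20, 1.45, 1.62, 1.75])     # meq/g sorbed (hypothetical)

        (q_max, k_l), _ = curve_fit(langmuir, c_eq, q, p0=[2.0, 0.5])
        (k_f, n_inv), _ = curve_fit(freundlich, c_eq, q, p0=[0.5, 0.5])
        print(f"Langmuir: q_max={q_max:.2f}, K_L={k_l:.2f};  Freundlich: K_F={k_f:.2f}, 1/n={n_inv:.2f}")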

  3. Cenesthopathy and Subjective Cognitive Complaints: An Exploratory Study in Schizophrenia.

    PubMed

    Jimeno, Natalia; Vargas, Martin L

    2018-01-01

    Cenesthopathy is mainly associated with schizophrenia; however, its neurobiological basis is nowadays unclear. The general objective was to explore clinical correlates of cenesthopathy and subjective cognitive complaints in schizophrenia. Participants (n = 30) meeting DSM-IV criteria for psychotic disorder were recruited from a psychiatry unit and assessed with: Association for Methodology and Documentation in Psychiatry (AMDP) system, Positive and Negative Syndrome Scale, Frankfurt Complaint Questionnaire (FCQ), and the Bonn Scale for the Assessment of Basic Symptoms (BSABS). For quantitative variables, means and Spearman correlation coefficients were calculated. Linear regression following the backward method and principal component analysis with varimax rotation were used. 83.3% of subjects (73.3% male; mean age, 31.5 years) presented some type of cenesthopathy; all types of cenesthetic basic symptoms were found. Cenesthetic basic symptoms significantly correlated with the AMDP category "fear and anancasm," FCQ total score, and BSABS cognitive thought disturbances. In the regression analysis, only one predictor, cognitive thought disturbances, entered the model. In the principal component analysis, a main component which accounted for 22.69% of the variance was found. Cenesthopathy, as assessed with the Bonn Scale (BSABS), is mainly associated with cognitive abnormalities including disturbances of thought initiative and mental intentionality, of receptive speech, and subjective retardation or pressure of thoughts. © 2018 S. Karger AG, Basel.

  4. A novel approach combining self-organizing map and parallel factor analysis for monitoring water quality of watersheds under non-point source pollution

    PubMed Central

    Zhang, Yixiang; Liang, Xinqiang; Wang, Zhibo; Xu, Lixian

    2015-01-01

    High content of organic matter in the downstream of watersheds underscored the severity of non-point source (NPS) pollution. The major objectives of this study were to characterize and quantify dissolved organic matter (DOM) in watersheds affected by NPS pollution, and to apply self-organizing map (SOM) and parallel factor analysis (PARAFAC) to assess fluorescence properties as proxy indicators for NPS pollution and labor-intensive routine water quality indicators. Water from upstreams and downstreams was sampled to measure dissolved organic carbon (DOC) concentrations and excitation-emission matrix (EEM). Five fluorescence components were modeled with PARAFAC. The regression analysis between PARAFAC intensities (Fmax) and raw EEM measurements indicated that several raw fluorescence measurements at target excitation-emission wavelength region could provide similar DOM information to massive EEM measurements combined with PARAFAC. Regression analysis between DOC concentration and raw EEM measurements suggested that some regions in raw EEM could be used as surrogates for labor-intensive routine indicators. SOM can be used to visualize the occurrence of pollution. Relationship between DOC concentration and PARAFAC components analyzed with SOM suggested that PARAFAC component 2 might be the major part of bulk DOC and could be recognized as a proxy indicator to predict the DOC concentration. PMID:26526140

  5. Quantifying Parkinson's disease finger-tapping severity by extracting and synthesizing finger motion properties.

    PubMed

    Sano, Yuko; Kandori, Akihiko; Shima, Keisuke; Yamaguchi, Yuki; Tsuji, Toshio; Noda, Masafumi; Higashikawa, Fumiko; Yokoe, Masaru; Sakoda, Saburo

    2016-06-01

    We propose a novel index of Parkinson's disease (PD) finger-tapping severity, called "PDFTsi," for quantifying the severity of symptoms related to the finger tapping of PD patients with high accuracy. To validate the efficacy of PDFTsi, the finger-tapping movements of normal controls and PD patients were measured using magnetic sensors, and 21 characteristics were extracted from the finger-tapping waveforms. To distinguish motor deterioration due to PD from that due to aging, the aging effect on finger tapping was removed from these characteristics. Principal component analysis (PCA) was applied to the age-normalized characteristics, and principal components that represent the motion properties of finger tapping were calculated. Multiple linear regression (MLR) with stepwise variable selection was applied to the principal components, and PDFTsi was calculated. The results indicate that PDFTsi has high estimation accuracy, with a mean square error of 0.45. This is better than the alternative method, MLR with stepwise variable selection but without PCA, which yields a mean square error of 1.30. This result suggests that PDFTsi can quantify PD finger-tapping severity accurately. Furthermore, interpretation of the model for calculating PDFTsi indicated that motion wideness and rhythm disorder are important for estimating PD finger-tapping severity.
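
    The PCA-plus-MLR pipeline described in this record can be sketched as follows; the tapping characteristics and severity target are simulated, and scikit-learn's sequential feature selection stands in for the stepwise variable selection used by the authors.

    ```python
    # Sketch: PCA on tapping characteristics followed by multiple linear regression
    # with sequential (stepwise-like) variable selection. Data are synthetic; the
    # severity target stands in for the clinical score used by the authors.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.feature_selection import SequentialFeatureSelector

    rng = np.random.default_rng(0)
    X = rng.normal(size=(120, 21))          # 21 age-normalized tapping characteristics
    severity = X[:, 0] - 0.5 * X[:, 3] + rng.normal(scale=0.5, size=120)

    pca = PCA(n_components=10).fit(X)
    scores = pca.transform(X)               # principal-component scores

    # Backward selection over the PC scores, then an ordinary least-squares fit.
    selector = SequentialFeatureSelector(LinearRegression(), n_features_to_select=4,
                                         direction="backward", cv=5)
    selector.fit(scores, severity)
    kept = selector.get_support(indices=True)
    model = LinearRegression().fit(scores[:, kept], severity)

    pred = model.predict(scores[:, kept])
    print("kept PCs:", kept, " MSE:", round(np.mean((pred - severity) ** 2), 3))
    ```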

  6. Web document ranking via active learning and kernel principal component analysis

    NASA Astrophysics Data System (ADS)

    Cai, Fei; Chen, Honghui; Shu, Zhen

    2015-09-01

    Web document ranking arises in many information retrieval (IR) applications, such as search engines, recommendation systems and online advertising. A challenging issue is how to select representative query-document pairs and informative features for learning, and how to explore new ranking models that produce an acceptable ranking list of candidate documents for each query. In this study, we propose an active sampling (AS) plus kernel principal component analysis (KPCA) based ranking model, viz. AS-KPCA Regression, to study document ranking for a retrieval system, i.e. how to choose representative query-document pairs and features for learning. More precisely, documents are added gradually to the training set by AS, selecting at each step the document that would incur the highest expected DCG loss if left unselected. Then, KPCA is performed by projecting the selected query-document pairs onto p principal components in the feature space to complete the regression. Hence, we can cut down the computational overhead and reduce the impact of noise simultaneously. To the best of our knowledge, we are the first to perform document ranking via dimension reduction along two dimensions, namely the number of documents and the number of features, simultaneously. Our experiments demonstrate that the performance of our approach is better than that of the baseline methods on the public LETOR 4.0 datasets. Our approach improves on RankBoost and other baselines by nearly 20% in terms of MAP, with smaller improvements in P@K and NDCG@K. Moreover, our approach is particularly suitable for document ranking on noisy datasets in practice.
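
    A minimal sketch of the KPCA-plus-regression stage follows (the active-sampling step is omitted); the query-document features, relevance labels, and the ridge regressor are illustrative assumptions, not the AS-KPCA implementation or LETOR data.

    ```python
    # Sketch: kernel PCA projection of query-document feature vectors followed by a
    # regression on relevance labels (the active-sampling step of AS-KPCA is omitted).
    # Features and labels are synthetic placeholders, not LETOR data.
    import numpy as np
    from sklearn.decomposition import KernelPCA
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 46))                 # 46 ranking features per query-document pair
    relevance = (X[:, :3].sum(axis=1) > 0).astype(float) + rng.normal(scale=0.1, size=500)

    kpca = KernelPCA(n_components=10, kernel="rbf", gamma=0.05)
    Z = kpca.fit_transform(X)                      # p principal components in the feature space

    reg = Ridge(alpha=1.0).fit(Z, relevance)
    scores = reg.predict(Z)                        # scores used to order candidate documents
    print("top-5 document indices:", np.argsort(-scores)[:5])
    ```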

  7. A comparison of large-scale climate signals and the North American Multi-Model Ensemble (NMME) for drought prediction in China

    NASA Astrophysics Data System (ADS)

    Xu, Lei; Chen, Nengcheng; Zhang, Xiang

    2018-02-01

    Drought is an extreme natural disaster that can lead to huge socioeconomic losses. Drought prediction months ahead is helpful for early drought warning and preparations. In this study, we developed a statistical model, two weighted dynamic models and a statistical-dynamic (hybrid) model for 1-6 month lead drought prediction in China. Specifically, the statistical component weights large-scale climate signals by support vector regression (SVR), the dynamic components consist of the ensemble mean (EM) and Bayesian model averaging (BMA) of the North American Multi-Model Ensemble (NMME) climate models, and the hybrid part combines the statistical and dynamic components by assigning weights based on their historical performance. The results indicate that the statistical and hybrid models give better rainfall predictions than the NMME-EM and NMME-BMA models, which have good predictability only in southern China. In the 2011 China winter-spring drought event, the statistical model predicted the nationwide spatial extent and severity of drought well, although the severity was underestimated in the mid-lower reaches of Yangtze River (MLRYR) region. The NMME-EM and NMME-BMA models largely overestimated rainfall in northern and western China in the 2011 drought. In the 2013 China summer drought, the NMME-EM model forecasted the drought extent and severity in eastern China well, while the statistical and hybrid models falsely detected negative precipitation anomalies (NPA) in some areas. Model ensembles, such as multiple statistical approaches, multiple dynamic models or multiple hybrid models, are highlighted for drought prediction. These conclusions may be helpful for drought prediction and early drought warning in China.
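
    A minimal sketch of the statistical component, assuming SVR of a rainfall series on lagged climate indices; the index series, lead time, and hyperparameters are placeholders, not the signals or settings used in the study.

    ```python
    # Sketch: support vector regression of a rainfall (or drought-index) series on
    # lagged large-scale climate signals. The index series here are random
    # placeholders, not the predictors used in the study.
    import numpy as np
    from sklearn.svm import SVR
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(2)
    n_months = 240
    signals = rng.normal(size=(n_months, 4))       # e.g. four climate indices (hypothetical)
    lead = 3                                       # predict rainfall 3 months ahead
    rain = 0.8 * signals[:, 0] - 0.4 * signals[:, 1] + rng.normal(scale=0.5, size=n_months)

    X, y = signals[:-lead], rain[lead:]            # predictors lead the target by 3 months
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
    model.fit(X[:200], y[:200])

    print("predicted anomalies, next four lead-3 months:", np.round(model.predict(X[200:204]), 2))
    ```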

  8. Gross regional domestic product estimation: Application of two-way unbalanced panel data models to economic growth in East Nusa Tenggara province

    NASA Astrophysics Data System (ADS)

    Wibowo, Wahyu; Sinu, Elisabeth B.; Setiawan

    2017-03-01

    East Nusa Tenggara Province has recently been subdivided into new districts, so the amount of information or data collected per district has become unbalanced. One consequence of ignoring this incompleteness is that the estimators become invalid. Therefore, the analysis of unbalanced panel data is crucial. The aim of this paper is to estimate Gross Regional Domestic Product in East Nusa Tenggara Province using an unbalanced panel data regression model with a two-way error component, assuming a random effects model (REM). In this research, we employ Feasible Generalized Least Squares (FGLS) to estimate the regression coefficients. Since the variance components of the model are unknown, the ANOVA method is used to obtain them and to construct the variance-covariance matrix. The data used in this research are secondary data taken from the Central Bureau of Statistics of East Nusa Tenggara Province for 21 districts over the period 2004-2013. The predictors are the number of workers over 15 years old (X1), the electrification ratio (X2), and local revenues (X3), while Gross Regional Domestic Product at constant 2000 prices is the response (Y). The FGLS estimation shows that the value of R2 is 80.539% and that all the chosen predictors significantly affect (α = 5%) Gross Regional Domestic Product in all districts of East Nusa Tenggara Province. Those variables are the number of workers over 15 years old (X1), the electrification ratio (X2), and local revenues (X3), with elasticities of 0.22986, 0.090476, and 0.14749, respectively.
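
    As a rough, simplified stand-in for the two-way FGLS random effects model, the sketch below fits a one-way random-intercept panel regression on simulated district data in log-log form, so the coefficients read as elasticities; it is not the FGLS/ANOVA estimator of the paper.

    ```python
    # Sketch: random-intercept (one-way error-component) panel regression as a
    # simplified stand-in for the study's two-way FGLS random effects model.
    # District data are simulated; coefficients on log predictors read as elasticities.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    rows = []
    for district in range(21):
        u = rng.normal(scale=0.2)                       # district-level random effect
        for year in range(2004, 2014):
            log_labor, log_elec, log_rev = rng.normal(size=3)
            log_grdp = (0.23 * log_labor + 0.09 * log_elec + 0.15 * log_rev
                        + u + rng.normal(scale=0.1))
            rows.append(dict(district=district, year=year, log_grdp=log_grdp,
                             log_labor=log_labor, log_elec=log_elec, log_rev=log_rev))
    panel = pd.DataFrame(rows)

    # Random district intercepts account for the unobserved district component.
    model = smf.mixedlm("log_grdp ~ log_labor + log_elec + log_rev",
                        panel, groups=panel["district"]).fit()
    print(model.summary())
    ```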

  9. Assimilation of GOES-Derived Cloud Fields Into MM5

    NASA Astrophysics Data System (ADS)

    Biazar, A. P.; Doty, K. G.; McNider, R.

    2007-12-01

    The assimilation of GOES-derived cloud data into an atmospheric model (the Fifth-Generation Pennsylvania State University-National Center for Atmospheric Research Mesoscale Model, or MM5) was performed in two steps. In the first step, multiple linear regression equations were developed from a control MM5 simulation to establish relationships for several dependent variables in model columns that had one or more layers of clouds. In the second step, the regression equations were applied during an MM5 simulation with assimilation in which the hourly GOES satellite data were used to determine the cloud locations and some of the cloud properties, but with all the other variables being determined by the model data. The satellite-derived fields used were shortwave cloud albedo and cloud top pressure. Ten multiple linear regression equations were developed for the following dependent variables: total cloud depth, number of cloud layers, depth of the layer that contains the maximum vertical velocity, the maximum vertical velocity, the height of the maximum vertical velocity, the estimated 1-h stable (i.e., grid scale) precipitation rate, the estimated 1-h convective precipitation rate, the height of the level with the maximum positive diabatic heating, the magnitude of the maximum positive diabatic heating, and the largest continuous layer of upward motion. The horizontal components of the divergent wind were adjusted to be consistent with the regression estimate of the maximum vertical velocity. The new total horizontal wind field with these new divergent components was then used to nudge an ongoing MM5 model simulation towards the target vertical velocity. Other adjustments included diabatic heating and moistening at specified levels. Where the model simulation had clouds but the satellite data indicated clear conditions, procedures were taken to remove or diminish the errant clouds. The results for the period 0000 UTC 28 June - 0000 UTC 16 July 1999, for both a continental 32-km grid and an 8-km grid over the Southeastern United States, indicate a significant improvement in the cloud bias statistics. The main improvement was the reduction of high bias values that indicated times and locations in the control run when there were model clouds but the satellite indicated clear conditions. The importance of this technique is that it has been able to assimilate the observed clouds into the model in a dynamically sustainable manner. Acknowledgments. This work was partially funded by the following grants: a GEWEX grant from NASA, the Cooperative Agreement between the University of Alabama in Huntsville and the Minerals Management Service on Gulf of Mexico Issues, a NASA applications grant, and an NSF grant.

  10. Probabilistic Modeling of the Renal Stone Formation Module

    NASA Technical Reports Server (NTRS)

    Best, Lauren M.; Myers, Jerry G.; Goodenow, Debra A.; McRae, Michael P.; Jackson, Travis C.

    2013-01-01

    The Integrated Medical Model (IMM) is a probabilistic tool used in mission planning decision making and medical systems risk assessments. The IMM project maintains a database of over 80 medical conditions that could occur during a spaceflight, documenting an incidence rate and end case scenarios for each. In some cases, where observational data are insufficient to adequately define the inflight medical risk, the IMM utilizes external probabilistic modules to model and estimate the event likelihoods. One such medical event of interest is an unpassed renal stone. Due to a high-salt diet and high concentrations of calcium in the blood (a result of bone depletion caused by unloading in the microgravity environment), astronauts are at a considerably elevated risk for developing renal calculi (nephrolithiasis) while in space. Lack of observed incidences of nephrolithiasis has led HRP to initiate the development of the Renal Stone Formation Module (RSFM) to create a probabilistic simulator capable of estimating the likelihood of symptomatic renal stone presentation in astronauts on exploration missions. The model consists of two major parts. The first is the probabilistic component, which utilizes probability distributions to assess the range of urine electrolyte parameters and a multivariate regression to transform estimated crystal density and size distributions to the likelihood of the presentation of nephrolithiasis symptoms. The second is a deterministic physical and chemical model of renal stone growth in the kidney developed by Kassemi et al. The probabilistic component of the renal stone model couples the input probability distributions describing the urine chemistry, astronaut physiology, and system parameters with the physical and chemical outputs and inputs to the deterministic stone growth model. These two parts of the model are necessary to capture the uncertainty in the likelihood estimate. The model will be driven by Monte Carlo simulations, repeatedly sampling at random from the probability distributions of the electrolyte concentrations and system parameters that are inputs to the deterministic model. The total urine chemistry concentrations are used to determine the urine chemistry activity using the Joint Expert Speciation System (JESS), a biochemistry model. Information from JESS is then fed into the deterministic growth model. Outputs from JESS and the deterministic model are passed back to the probabilistic model, where a multivariate regression is used to assess the likelihood of a stone forming and the likelihood of a stone requiring clinical intervention. The parameters used to quantify these risks include: relative supersaturation (RS) of calcium oxalate, citrate/calcium ratio, crystal number density, total urine volume, pH, magnesium excretion, maximum stone width, and ureteral location. Methods and Validation: The RSFM is designed to perform a Monte Carlo simulation to generate probability distributions of clinically significant renal stones, as well as provide an associated uncertainty in the estimate. Initially, early versions will be used to test integration of the components and assess component validation and verification (V&V), with later versions used to address questions regarding design reference mission scenarios. Once integrated with the deterministic component, the credibility assessment of the integrated model will follow NASA STD 7009 requirements.

  11. Geospatial modeling of plant stable isotope ratios - the development of isoscapes

    NASA Astrophysics Data System (ADS)

    West, J. B.; Ehleringer, J. R.; Hurley, J. M.; Cerling, T. E.

    2007-12-01

    Large-scale spatial variation in stable isotope ratios can yield critical insights into the spatio-temporal dynamics of biogeochemical cycles, animal movements, and shifts in climate, as well as anthropogenic activities such as commerce, resource utilization, and forensic investigation. Interpreting these signals requires that we understand and model the variation. We report progress in our development of plant stable isotope ratio landscapes (isoscapes). Our approach utilizes a GIS, gridded datasets, a range of modeling approaches, and spatially distributed observations. We synthesize findings from four studies to illustrate the general utility of the approach, its ability to represent observed spatio-temporal variability in plant stable isotope ratios, and also outline some specific areas of uncertainty. We also address two basic but critical questions central to our ability to model plant stable isotope ratios using this approach: (1) do the continuous precipitation isotope ratio grids represent reasonable proxies for plant source water, and (2) do continuous climate grids (as is or modified) represent a reasonable proxy for the climate experienced by plants? Plant components modeled include leaf water, grape water (extracted from wine), bulk leaf material (Cannabis sativa; marijuana), and seed oil (Ricinus communis; castor bean). Our approaches to modeling the isotope ratios of these components varied from highly sophisticated process models to simple one-step fractionation models to regression approaches. The leaf water isoscapes were produced using steady-state models of enrichment and continuous grids of annual average precipitation isotope ratios and climate. These were compared to other modeling efforts, as well as a relatively sparse but geographically distributed dataset from the literature. The latitudinal distributions and global averages compared favorably to other modeling efforts and the observational data compared well to model predictions. These results yield confidence in the precipitation isoscapes used to represent plant source water, the modified climate grids used to represent leaf climate, and the efficacy of this approach to modeling. Further work confirmed these observations. The seed oil isoscape was produced using a simple model of lipid fractionation driven with the precipitation grid, and compared well to widely distributed observations of castor bean oil, again suggesting that the precipitation grids were reasonable proxies for plant source water. The marijuana leaf δ2H observations distributed across the continental United States were regressed against the precipitation δ2H grids and yielded a strong relationship between them, again suggesting that plant source water was reasonably well represented by the precipitation grid. Finally, the wine water δ18O isoscape was developed from regressions that related precipitation isotope ratios and climate to observations from a single vintage. Favorable comparisons between year-specific wine water isoscapes and inter-annual variations in previous vintages yielded confidence in the climate grids. Clearly, significant residual variability remains to be explained in all of these cases and uncertainties vary depending on the component modeled, but we conclude from this synthesis that isoscapes are capable of representing real spatial and temporal variability in plant stable isotope ratios.

  12. Feature Extraction of Event-Related Potentials Using Wavelets: An Application to Human Performance Monitoring

    NASA Technical Reports Server (NTRS)

    Trejo, Leonard J.; Shensa, Mark J.; Remington, Roger W. (Technical Monitor)

    1998-01-01

    This report describes the development and evaluation of mathematical models for predicting human performance from discrete wavelet transforms (DWT) of event-related potentials (ERP) elicited by task-relevant stimuli. The DWT was compared to principal components analysis (PCA) for representation of ERPs in linear regression and neural network models developed to predict a composite measure of human signal detection performance. Linear regression models based on coefficients of the decimated DWT predicted signal detection performance with half as many free parameters as comparable models based on PCA scores. In addition, the DWT-based models were more resistant to model degradation due to over-fitting than PCA-based models. Feed-forward neural networks were trained using the backpropagation algorithm to predict signal detection performance based on raw ERPs, PCA scores, or high-power coefficients of the DWT. Neural networks based on high-power DWT coefficients trained with fewer iterations, generalized to new data better, and were more resistant to overfitting than networks based on raw ERPs. Networks based on PCA scores did not generalize to new data as well as either the DWT network or the raw ERP network. The results show that wavelet expansions represent the ERP efficiently and extract behaviorally important features for use in linear regression or neural network models of human performance. The efficiency of the DWT is discussed in terms of its decorrelation and energy compaction properties. In addition, the DWT models provided evidence that a pattern of low-frequency activity (1 to 3.5 Hz) occurring at specific times and scalp locations is a reliable correlate of human signal detection performance.
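
    A minimal sketch of the DWT-based regression idea: decompose simulated ERP epochs with a decimated wavelet transform, keep the highest-power coefficients, and regress a performance measure on them; the wavelet choice, decomposition level, and data are assumptions, not the study's settings.

    ```python
    # Sketch: discrete wavelet transform of ERP epochs, selection of high-power
    # coefficients, and a linear regression predicting a performance measure.
    # ERP epochs and performance scores are simulated, not the study's data.
    import numpy as np
    import pywt
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(4)
    n_trials, n_samples = 200, 256
    erps = rng.normal(size=(n_trials, n_samples))            # simulated single-trial ERPs
    performance = erps[:, 100:120].mean(axis=1) + rng.normal(scale=0.05, size=n_trials)

    # Decimated DWT of each epoch; concatenate coefficients across levels.
    coeffs = np.array([np.concatenate(pywt.wavedec(trial, "db4", level=4)) for trial in erps])

    # Keep the coefficients with the highest average power across trials.
    power = (coeffs ** 2).mean(axis=0)
    top = np.argsort(power)[-20:]

    model = LinearRegression().fit(coeffs[:, top], performance)
    print("R^2 on training data:", round(model.score(coeffs[:, top], performance), 3))
    ```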

  13. Feature extraction of event-related potentials using wavelets: an application to human performance monitoring

    NASA Technical Reports Server (NTRS)

    Trejo, L. J.; Shensa, M. J.

    1999-01-01

    This report describes the development and evaluation of mathematical models for predicting human performance from discrete wavelet transforms (DWT) of event-related potentials (ERP) elicited by task-relevant stimuli. The DWT was compared to principal components analysis (PCA) for representation of ERPs in linear regression and neural network models developed to predict a composite measure of human signal detection performance. Linear regression models based on coefficients of the decimated DWT predicted signal detection performance with half as many free parameters as comparable models based on PCA scores. In addition, the DWT-based models were more resistant to model degradation due to over-fitting than PCA-based models. Feed-forward neural networks were trained using the backpropagation algorithm to predict signal detection performance based on raw ERPs, PCA scores, or high-power coefficients of the DWT. Neural networks based on high-power DWT coefficients trained with fewer iterations, generalized to new data better, and were more resistant to overfitting than networks based on raw ERPs. Networks based on PCA scores did not generalize to new data as well as either the DWT network or the raw ERP network. The results show that wavelet expansions represent the ERP efficiently and extract behaviorally important features for use in linear regression or neural network models of human performance. The efficiency of the DWT is discussed in terms of its decorrelation and energy compaction properties. In addition, the DWT models provided evidence that a pattern of low-frequency activity (1 to 3.5 Hz) occurring at specific times and scalp locations is a reliable correlate of human signal detection performance. Copyright 1999 Academic Press.

  14. Estimating riparian understory vegetation cover with beta regression and copula models

    USGS Publications Warehouse

    Eskelson, Bianca N.I.; Madsen, Lisa; Hagar, Joan C.; Temesgen, Hailemariam

    2011-01-01

    Understory vegetation communities are critical components of forest ecosystems. As a result, the importance of modeling understory vegetation characteristics in forested landscapes has become more apparent. Abundance measures such as shrub cover are bounded between 0 and 1, exhibit heteroscedastic error variance, and are often subject to spatial dependence. These distributional features tend to be ignored when shrub cover data are analyzed. The beta distribution has been used successfully to describe the frequency distribution of vegetation cover. Beta regression models ignoring spatial dependence (BR) and accounting for spatial dependence (BRdep) were used to estimate percent shrub cover as a function of topographic conditions and overstory vegetation structure in riparian zones in western Oregon. The BR models showed poor explanatory power (pseudo-R2 ≤ 0.34) but outperformed ordinary least-squares (OLS) and generalized least-squares (GLS) regression models with logit-transformed response in terms of mean square prediction error and absolute bias. We introduce a copula (COP) model that is based on the beta distribution and accounts for spatial dependence. A simulation study was designed to illustrate the effects of incorrectly assuming normality, equal variance, and spatial independence. It showed that BR, BRdep, and COP models provide unbiased parameter estimates, whereas OLS and GLS models result in slightly biased estimates for two of the three parameters. On the basis of the simulation study, 93–97% of the GLS, BRdep, and COP confidence intervals covered the true parameters, whereas OLS and BR only resulted in 84–88% coverage, which demonstrated the superiority of GLS, BRdep, and COP over OLS and BR models in providing standard errors for the parameter estimates in the presence of spatial dependence.
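
    To make the BR model concrete, here is a minimal maximum-likelihood sketch of beta regression with a logit mean link, ignoring spatial dependence; the simulated cover data and precision parameterization are illustrative assumptions, not the authors' implementation.

    ```python
    # Sketch: beta regression with a logit mean link fitted by maximum likelihood
    # (the spatially independent BR case). Cover data are simulated; exact 0/1
    # observations would need the usual small adjustment before fitting.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import expit
    from scipy.stats import beta as beta_dist

    rng = np.random.default_rng(5)
    n = 300
    X = np.column_stack([np.ones(n), rng.normal(size=n)])        # intercept + one covariate
    mu_true = expit(X @ np.array([-0.5, 1.0]))
    phi_true = 8.0
    cover = rng.beta(mu_true * phi_true, (1 - mu_true) * phi_true)  # shrub cover in (0, 1)

    def negloglik(params):
        b, log_phi = params[:-1], params[-1]
        mu = expit(X @ b)                        # mean via logit link
        phi = np.exp(log_phi)                    # precision kept positive
        return -beta_dist.logpdf(cover, mu * phi, (1 - mu) * phi).sum()

    fit = minimize(negloglik, x0=np.zeros(X.shape[1] + 1), method="BFGS")
    print("coefficients:", np.round(fit.x[:-1], 3), " phi:", round(np.exp(fit.x[-1]), 2))
    ```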

  15. A parametric ribcage geometry model accounting for variations among the adult population.

    PubMed

    Wang, Yulong; Cao, Libo; Bai, Zhonghao; Reed, Matthew P; Rupp, Jonathan D; Hoff, Carrie N; Hu, Jingwen

    2016-09-06

    The objective of this study is to develop a parametric ribcage model that can account for morphological variations among the adult population. Ribcage geometries, including 12 pairs of ribs, the sternum, and the thoracic spine, were collected from CT scans of 101 adult subjects through image segmentation, landmark identification (1016 landmarks for each subject), symmetry adjustment, and template mesh mapping (26,180 elements for each subject). Generalized Procrustes analysis (GPA), principal component analysis (PCA), and regression analysis were used to develop a parametric ribcage model, which can predict nodal locations of the template mesh according to age, sex, height, and body mass index (BMI). Two regression models, a quadratic model for estimating the ribcage size and a linear model for estimating the ribcage shape, were developed. The results showed that the ribcage size was dominated by height (p=0.000) and the age-sex interaction (p=0.007), and that the ribcage shape was significantly affected by age (p=0.0005), sex (p=0.0002), height (p=0.0064) and BMI (p=0.0000). Along with proper assignment of cortical bone thickness, material properties and failure properties, this parametric ribcage model can directly serve as the mesh of finite element ribcage models for quantifying the effects of human characteristics on thoracic injury risks. Copyright © 2016 Elsevier Ltd. All rights reserved.

  16. Axial cervical vertebrae-based multivariate regression model for the estimation of skeletal-maturation status.

    PubMed

    Yang, Y-M; Lee, J; Kim, Y-I; Cho, B-H; Park, S-B

    2014-08-01

    This study aimed to determine the viability of using axial cervical vertebrae (ACV) as biological indicators of skeletal maturation and to build models that estimate ossification level with improved explanatory power over models based only on chronological age. The study population comprised 74 female and 47 male patients with available hand-wrist radiographs and cone-beam computed tomography images. Generalized Procrustes analysis was used to analyze the shape, size, and form of the ACV regions of interest. The variabilities of these factors were analyzed by principal component analysis. Skeletal maturation was then estimated using a multiple regression model. Separate models were developed for male and female participants. For the female estimation model, the adjusted R(2) explained 84.8% of the variability of the Sempé maturation level (SML), representing a 7.9% increase in SML explanatory power over that using chronological age alone (76.9%). For the male estimation model, the adjusted R(2) was over 90%, representing a 1.7% increase relative to the reference model. The simplest possible ACV morphometric information provided a statistically significant explanation of the portion of skeletal-maturation variability not dependent on chronological age. These results verify that ACV is a strong biological indicator of ossification status. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  17. Linear Regression Modeling of Selected Analytes from the Balad Air Sampling Program

    DTIC Science & Technology

    2012-04-05

    groundwater, air and soil contamination with unwanted chemicals as well as attract vectors (Insects, rodents, etc.) for diseases. In deployed...via in-flight jettisoning of fuel and from 31 accidental spills or leaks to soil during use, storage, and transportation. VOC components of JP-8...can be introduced to the atmosphere from the soil through volatilization.46 In addition, the reaction between JP-8 and atmospheric chemicals may

  18. Development of NIRS models to predict protein and amylose content of brown rice and proximate compositions of rice bran.

    PubMed

    Bagchi, Torit Baran; Sharma, Srigopal; Chattopadhyay, Krishnendu

    2016-01-15

    With growing recognition of the economic and nutritional importance of rice grain protein and of the nutritional components of rice bran (RB), NIRS can be an effective tool for high-throughput screening in rice breeding programmes. Optimization of NIRS is a prerequisite for accurate prediction of grain quality parameters. In the present study, 173 brown rice (BR) and 86 RB samples covering a wide range of values were used to compare the calibration models generated by different chemometric methods for grain protein content (GPC) and amylose content (AC) of BR and for the proximate composition (protein, crude oil, moisture, ash and fiber content) of RB. Modified partial least squares (mPLS) models with the best mathematical treatments were identified for all components. Another set of 29 genotypes derived from the breeding programme was employed for the external validation of these calibration models. The high accuracy of all these calibration and prediction models was confirmed through paired t-tests and correlation and regression analysis between reference and predicted values. Copyright © 2015 Elsevier Ltd. All rights reserved.
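
    A minimal NIRS-style calibration sketch using ordinary PLS regression from scikit-learn (the record itself uses modified PLS from a chemometrics package); the spectra, reference protein values, and number of latent variables are simulated placeholders.

    ```python
    # Sketch: an NIRS calibration with standard PLS regression (the record uses
    # modified PLS; plain PLS is shown here instead). Spectra and protein
    # reference values are simulated.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(6)
    n_samples, n_wavelengths = 173, 700
    spectra = rng.normal(size=(n_samples, n_wavelengths))       # pre-treated absorbance spectra
    protein = 7 + spectra[:, 120] - 0.5 * spectra[:, 340] + rng.normal(scale=0.2, size=n_samples)

    X_cal, X_val, y_cal, y_val = train_test_split(spectra, protein, test_size=0.2, random_state=0)

    pls = PLSRegression(n_components=8).fit(X_cal, y_cal)
    y_hat = pls.predict(X_val).ravel()
    rmsep = np.sqrt(np.mean((y_hat - y_val) ** 2))
    print("RMSEP:", round(rmsep, 3), " r:", round(np.corrcoef(y_hat, y_val)[0, 1], 3))
    ```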

  19. Noninvasive diagnostics of skin microphysical parameters based on spatially resolved diffuse reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Lisenko, S. A.; Kugeiko, M. M.

    2013-01-01

    The ability to determine noninvasively microphysical parameters (MPPs) of skin characteristic of malignant melanoma was demonstrated. The MPPs were the melanin content in dermis, saturation of tissue with blood vessels, and concentration and effective size of tissue scatterers. The proposed method was based on spatially resolved spectral measurements of skin diffuse reflectance and multiple regressions between linearly independent measurement components and skin MPPs. The regressions were established by modeling radiation transfer in skin with a wide variation of its MPPs. Errors in the determination of skin MPPs were estimated using fiber-optic measurements of its diffuse reflectance at wavelengths of commercially available semiconductor diode lasers (578, 625, 660, 760, and 806 nm) at source-detector separations of 0.23-1.38 mm.

  20. Associations between different components of fitness and fatness with academic performance in Chilean youths.

    PubMed

    Olivares, Pedro R; García-Rubio, Javier

    2016-01-01

    To analyze the associations between different components of fitness and fatness with academic performance, adjusting the analysis by sex, age, socio-economic status, region and school type in a Chilean sample. Data on fitness, fatness and academic performance were obtained from the Chilean System for the Assessment of Educational Quality test for eighth grade in 2011 and include a sample of 18,746 subjects (49% female). Partial correlations adjusted for confounders were used to explore the associations between fitness and fatness components and the academic scores. Three unadjusted and adjusted linear regression models were fitted to analyze the associations among variables. Fatness has a negative association with academic performance when Body Mass Index (BMI) and Waist to Height Ratio (WHR) are assessed independently. When BMI and WHR are assessed jointly and adjusted for confounders, WHR is more strongly associated with academic performance than BMI, and only the association of WHR is positive. Among the fitness components, strength was the variable most strongly associated with academic performance. Cardiorespiratory capacity was not associated with academic performance when fatness and the other fitness components were included in the model. Fitness and fatness are associated with academic performance. WHR and strength are more strongly related to academic performance than BMI and cardiorespiratory capacity.

  1. Associations between different components of fitness and fatness with academic performance in Chilean youths

    PubMed Central

    2016-01-01

    Objectives To analyze the associations between different components of fitness and fatness with academic performance, adjusting the analysis by sex, age, socio-economic status, region and school type in a Chilean sample. Methods Data on fitness, fatness and academic performance were obtained from the Chilean System for the Assessment of Educational Quality test for eighth grade in 2011 and include a sample of 18,746 subjects (49% female). Partial correlations adjusted for confounders were used to explore the associations between fitness and fatness components and the academic scores. Three unadjusted and adjusted linear regression models were fitted to analyze the associations among variables. Results Fatness has a negative association with academic performance when Body Mass Index (BMI) and Waist to Height Ratio (WHR) are assessed independently. When BMI and WHR are assessed jointly and adjusted for confounders, WHR is more strongly associated with academic performance than BMI, and only the association of WHR is positive. Among the fitness components, strength was the variable most strongly associated with academic performance. Cardiorespiratory capacity was not associated with academic performance when fatness and the other fitness components were included in the model. Conclusions Fitness and fatness are associated with academic performance. WHR and strength are more strongly related to academic performance than BMI and cardiorespiratory capacity. PMID:27761345

  2. Exploring Oral Cancer Patients' Preference in Medical Decision Making and Quality of Life.

    PubMed

    Cheng, Sun-Long; Liao, Hsien-Hua; Shueng, Pei-Wei; Lee, Hsi-Chieh; Cheewakriangkrai, Chalong; Chang, Chi-Chang

    2017-01-01

    Little is known about the clinical effects of shared medical decision making (SMDM) on quality of life in oral cancer. This study aimed to identify factors underlying SMDM in these patients and to explore the interrelated components of quality of life, so that early assessment can support patients' adaptation. All consenting patients completed the SMDM questionnaire and the 36-Item Short Form (SF-36). Regression analyses were conducted to find predictors of quality of life among oral cancer patients. The proposed model predicted 57.4% of the variance in patients' SF-36 Mental Component scores. Patients' mental component summary scores were associated with smoking habit (β=-0.3449, p=0.022), autonomy (β=-0.226, p=0.018) and control preference (β=-0.388, p=0.007). The proposed model predicted 42.6% of the variance in patients' SF-36 Physical Component scores. Patients' physical component summary scores were associated with higher education (β=0.288, p=0.007), employment status (β=-0.225, p=0.033), perceived involvement (β=-0.606, p=0.011) and risk communication (β=-0.558, p=0.019). Future research is necessary to determine whether oral cancer patients would benefit from early screening and intervention to address shared medical decision making.

  3. Genetic parameters and expected responses to selection for components of feed efficiency in a Duroc pig line.

    PubMed

    Sánchez, Juan P; Ragab, Mohamed; Quintanilla, Raquel; Rothschild, Max F; Piles, Miriam

    2017-12-01

    Improving feed efficiency ([Formula: see text]) is a key factor for any pig breeding company. Although this can be achieved by selection on an index of multi-trait best linear unbiased prediction of breeding values with optimal economic weights, considering deviations of feed intake from actual needs ([Formula: see text]) should be of value for further research on biological aspects of [Formula: see text]. Here, we present a random regression model that extends the classical definition of [Formula: see text] by including animal-specific needs in the model. Using this model, we explore the genetic determinism of several [Formula: see text] components: use of feed for growth ([Formula: see text]), use of feed for backfat deposition ([Formula: see text]), use of feed for maintenance ([Formula: see text]), and unspecific efficiency in the use of feed ([Formula: see text]). Expected response to alternative selection indexes involving different components is also studied. Based on goodness-of-fit to the available feed intake ([Formula: see text]) data, the model that assumes individual (genetic and permanent) variation in the use of feed for maintenance, [Formula: see text] and [Formula: see text] showed the best performance. Joint individual variation in feed allocation to maintenance, growth and backfat deposition comprised 37% of the individual variation of [Formula: see text]. The estimated heritabilities of [Formula: see text] using the model that accounts for animal-specific needs and the traditional [Formula: see text] model were 0.12 and 0.18, respectively. The estimated heritabilities for the regression coefficients were 0.44, 0.39 and 0.55 for [Formula: see text], [Formula: see text] and [Formula: see text], respectively. Estimates of genetic correlations of [Formula: see text] were positive with amount of feed used for [Formula: see text] and [Formula: see text] but negative for [Formula: see text]. Expected response in overall efficiency, reducing [Formula: see text] without altering performance, was 2.5% higher when the model assumed animal-specific needs than when the traditional definition of [Formula: see text] was considered. Expected response in overall efficiency, by reducing [Formula: see text] without altering performance, is slightly better with a model that assumes animal-specific needs instead of batch-specific needs to correct [Formula: see text]. The relatively small difference between the traditional [Formula: see text] model and our model is due to random intercepts (unspecific use of feed) accounting for the majority of variability in [Formula: see text]. Overall, a model that accounts for animal-specific needs for [Formula: see text], [Formula: see text] and [Formula: see text] is statistically superior and allows for the possibility to act differentially on [Formula: see text] components.

  4. Bio-inspired adaptive feedback error learning architecture for motor control.

    PubMed

    Tolu, Silvia; Vanegas, Mauricio; Luque, Niceto R; Garrido, Jesús A; Ros, Eduardo

    2012-10-01

    This study proposes an adaptive control architecture based on an accurate regression method called Locally Weighted Projection Regression (LWPR) and on a bio-inspired module, namely a cerebellar-like engine. This hybrid architecture takes full advantage of the machine learning module (LWPR kernel) to abstract an optimized representation of the sensorimotor space, while the cerebellar component integrates this representation to generate corrective terms in the framework of a control task. Furthermore, we illustrate how the use of a simple adaptive error feedback term allows the proposed architecture to be used even in the absence of an accurate analytic reference model. The presented approach achieves accurate control with low-gain corrective terms (suitable for compliant control schemes). We evaluate the contribution of the different components of the proposed scheme by comparing the obtained performance with alternative approaches. Then, we show that the presented architecture can be used for accurate manipulation of different objects when their physical properties are not directly known by the controller. Finally, we evaluate how the scheme scales to simulated plants with a high number of degrees of freedom (7 DOFs).

  5. Concentration determination of collagen and proteoglycan in bovine nasal cartilage by Fourier transform infrared imaging and PLS

    NASA Astrophysics Data System (ADS)

    Zhang, Xuexi; Xiao, Zhi-Yan; Yin, Jianhua; Xia, Yang

    2014-09-01

    Fourier transform infrared imaging (FTIRI) combined with chemometrics can be used to probe the structure of biomacromolecules and to measure the concentrations of tissue components. In this study, FTIRI with partial least squares (PLS) regression was applied to determine the concentrations of the two main components of bovine nasal cartilage (BNC), collagen and proteoglycan. An infrared spectral library was built by mixing collagen and chondroitin 6-sulfate (the main constituent of proteoglycan) at different ratios. Some pretreatments are needed to build the PLS model. FTIR images were collected from BNC sections at 6.25 μm and 25 μm pixel size. The spectra extracted from the BNC FTIR images were imported into the PLS regression program to predict the concentrations of collagen and proteoglycan. The PLS-determined concentrations agree with the results of our previous work and with biochemical analyses. The predictions show that the concentrations of collagen and proteoglycan in BNC are comparable overall, although the concentration of proteoglycan is slightly higher than that of collagen.

  6. Computational simulation of coupled material degradation processes for probabilistic lifetime strength of aerospace materials

    NASA Technical Reports Server (NTRS)

    Boyce, Lola; Bast, Callie C.

    1992-01-01

    The research included ongoing development of methodology that provides probabilistic lifetime strength of aerospace materials via computational simulation. A probabilistic material strength degradation model, in the form of a randomized multifactor interaction equation, is postulated for strength degradation of structural components of aerospace propulsion systems subjected to a number of effects or primitive variables. These primitive variables may include high temperature, fatigue or creep. In most cases, strength is reduced as a result of the action of a variable. This multifactor interaction strength degradation equation has been randomized and is included in the computer program PROMISS. Also included in the research is the development of methodology to calibrate the above-described constitutive equation using actual experimental materials data together with linear regression of those data, thereby predicting values of the empirical material constants for each effect or primitive variable. This regression methodology is included in the computer program PROMISC. Actual experimental materials data were obtained from the open literature for materials typically of interest to those studying aerospace propulsion system components. Material data for Inconel 718 were analyzed using the developed methodology.

  7. Identification of milk origin and process-induced changes in milk by stable isotope ratio mass spectrometry.

    PubMed

    Scampicchio, Matteo; Mimmo, Tanja; Capici, Calogero; Huck, Christian; Innocente, Nadia; Drusch, Stephan; Cesco, Stefano

    2012-11-14

    Stable isotope values were used to develop a new analytical approach enabling the simultaneous identification of milk samples either processed with different heating regimens or from different geographical origins. The samples consisted of raw, pasteurized (HTST), and ultrapasteurized (UHT) milk of different Italian origins. The approach consisted of the analysis of the isotope ratios δ¹³C and δ¹⁵N for the milk samples and their fractions (fat, casein, and whey). The main finding of this work is that, as heat processing affects the composition of the milk fractions, changes in δ¹³C and δ¹⁵N are also observed. These changes were used as markers to develop pattern recognition maps based on principal component analysis and supervised classification models, such as linear discriminant analysis (LDA), multivariate regression (MLR), principal component regression (PCR), and partial least squares (PLS). The results provide proof of concept that isotope ratio mass spectrometry can discriminate simultaneously between milk samples according to their geographical origin and type of processing.

  8. Statistical methods for efficient design of community surveys of response to noise: Random coefficients regression models

    NASA Technical Reports Server (NTRS)

    Tomberlin, T. J.

    1985-01-01

    Research studies of residents' responses to noise consist of interviews with samples of individuals who are drawn from a number of different compact study areas. The statistical techniques developed here provide a basis for the sample design decisions in such studies. These techniques are suitable for a wide range of sample survey applications. A sample may consist of a random sample of residents selected from a sample of compact study areas, or, in a more complex design, of a sample of residents selected from a sample of larger areas (e.g., cities). The techniques may be applied to estimates of the effects on annoyance of noise level, the number of noise events, the time of day of the events, ambient noise levels, or other factors. Methods are provided for determining, in advance, how accurately these effects can be estimated for different sample sizes and study designs. Using a simple cost function, they also provide for optimum allocation of the sample across the stages of the design for estimating these effects. These techniques are developed via a regression model in which the regression coefficients are assumed to be random, with components of variance associated with the various stages of a multi-stage sample design.
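
    A minimal sketch of a random coefficients regression of this kind: annoyance regressed on noise level with the intercept and slope varying randomly across study areas, fitted as a mixed model; the survey data and variable names are invented for illustration.

    ```python
    # Sketch: random coefficients regression - annoyance regressed on noise level
    # with the intercept and slope varying randomly across compact study areas.
    # The survey data are simulated.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    rows = []
    for area in range(25):                          # compact study areas
        a = 2.0 + rng.normal(scale=0.5)             # area-specific intercept
        b = 0.08 + rng.normal(scale=0.02)           # area-specific slope on noise level
        for _ in range(30):                         # residents interviewed per area
            dnl = rng.uniform(55, 75)               # day-night noise level, dB
            annoyance = a + b * dnl + rng.normal(scale=0.8)
            rows.append(dict(area=area, dnl=dnl, annoyance=annoyance))
    survey = pd.DataFrame(rows)

    # Random intercept and slope by area; summary reports the variance components.
    model = smf.mixedlm("annoyance ~ dnl", survey, groups=survey["area"],
                        re_formula="~dnl").fit()
    print(model.summary())
    ```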

  9. Tensile properties of cooked meat sausages and their correlation with texture profile analysis (TPA) parameters and physico-chemical characteristics.

    PubMed

    Herrero, A M; de la Hoz, L; Ordóñez, J A; Herranz, B; Romero de Ávila, M D; Cambero, M I

    2008-11-01

    The possibilities of using breaking strength (BS) and energy to fracture (EF) for monitoring textural properties of some cooked meat sausages (chopped, mortadella and galantines) were studied. Texture profile analysis (TPA), folding test and physico-chemical measurements were also performed. Principal component analysis enabled these meat products to be grouped into three textural profiles which showed significant (p<0.05) differences mainly for BS, hardness, adhesiveness and cohesiveness. Multivariate analysis indicated that BS, EF and TPA parameters were correlated (p<0.05) for every individual meat product (chopped, mortadella and galantines) and all products together. On the basis of these results, TPA parameters could be used for constructing regression models to predict BS. The resulting regression model for all cooked meat products was BS=-0.160+6.600∗cohesiveness-1.255∗adhesiveness+0.048∗hardness-506.31∗springiness (R(2)=0.745, p<0.00005). Simple linear regression analysis showed significant coefficients of determination between BS (R(2)=0.586, p<0.0001) versus folding test grade (FG) and EF versus FG (R(2)=0.564, p<0.0001).
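
    As a worked example, the reported equation can be applied directly to a set of TPA readings; the input values below are hypothetical and the units follow the source study.

    ```python
    # Worked example: applying the reported regression to hypothetical TPA readings
    # to predict breaking strength (BS); coefficients are those quoted in the record.
    def breaking_strength(cohesiveness, adhesiveness, hardness, springiness):
        return (-0.160 + 6.600 * cohesiveness - 1.255 * adhesiveness
                + 0.048 * hardness - 506.31 * springiness)

    # Hypothetical TPA values for one cooked sausage sample (not from the paper).
    print(round(breaking_strength(cohesiveness=0.45, adhesiveness=-0.20,
                                  hardness=30.0, springiness=0.002), 3))
    ```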

  10. Imaging-based biomarkers of cognitive performance in older adults constructed via high-dimensional pattern regression applied to MRI and PET.

    PubMed

    Wang, Ying; Goh, Joshua O; Resnick, Susan M; Davatzikos, Christos

    2013-01-01

    In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM), and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance, with cross-validation to select the most informative brain regions (using recursive feature elimination) as imaging biomarkers and optimize model parameters. Predicted cognitive scores using our regression algorithm based on the resulting brain regions correlated well with actual performance. Also, regression models obtained using combined GM, WM, and PET imaging modalities outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions with LTM showing stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal, and occipito-temporal areas. The PET modality had higher contribution to most cognitive domains except manipulation, which had higher WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings based on machine-learning methods demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms and also their potential usage as biomarkers that predict cognitive status.

  11. A general procedure to generate models for urban environmental-noise pollution using feature selection and machine learning methods.

    PubMed

    Torija, Antonio J; Ruiz, Diego P

    2015-02-01

    The prediction of environmental noise in urban environments requires the solution of a complex and non-linear problem, since there are complex relationships among the multitude of variables involved in the characterization and modelling of environmental noise and environmental-noise magnitudes. Moreover, the inclusion of the great spatial heterogeneity characteristic of urban environments seems to be essential in order to achieve an accurate environmental-noise prediction in cities. This problem is addressed in this paper, where a procedure based on feature-selection techniques and machine-learning regression methods is proposed and applied to this environmental problem. Three machine-learning regression methods, which are considered very robust in solving non-linear problems, are used to estimate the energy-equivalent sound-pressure level descriptor (LAeq). These three methods are: (i) multilayer perceptron (MLP), (ii) sequential minimal optimisation (SMO), and (iii) Gaussian processes for regression (GPR). In addition, because of the high number of input variables involved in environmental-noise modelling and estimation in urban environments, which make LAeq prediction models quite complex and costly in terms of time and resources for application to real situations, three different techniques are used to approach feature selection or data reduction. The feature-selection techniques used are: (i) correlation-based feature-subset selection (CFS), (ii) wrapper for feature-subset selection (WFS), and the data reduction technique is principal-component analysis (PCA). The subsequent analysis leads to a proposal of different schemes, depending on the needs regarding data collection and accuracy. The use of WFS as the feature-selection technique with the implementation of SMO or GPR as regression algorithm provides the best LAeq estimation (R(2)=0.94 and mean absolute error (MAE)=1.14-1.16 dB(A)). Copyright © 2014 Elsevier B.V. All rights reserved.
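
    A minimal sketch combining one of the named regressors (Gaussian process regression) with a wrapper-style feature selection; the urban variables, kernel, and number of selected features are illustrative assumptions rather than the paper's WFS/SMO/GPR configuration.

    ```python
    # Sketch: Gaussian process regression of LAeq on a reduced subset of urban
    # features, with a wrapper-style sequential feature selection. The urban
    # variables are simulated placeholders.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel
    from sklearn.feature_selection import SequentialFeatureSelector

    rng = np.random.default_rng(8)
    X = rng.normal(size=(150, 12))                  # e.g. traffic flow, street width, building height, ...
    laeq = 65 + 3 * X[:, 0] + 1.5 * X[:, 4] + rng.normal(scale=1.0, size=150)

    gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0),
                                   normalize_y=True)

    # Forward (wrapper-style) selection of a small feature subset for the GPR.
    selector = SequentialFeatureSelector(gpr, n_features_to_select=4, direction="forward", cv=3)
    selector.fit(X, laeq)
    kept = selector.get_support(indices=True)

    gpr.fit(X[:, kept], laeq)
    print("selected features:", kept, " first predictions:", np.round(gpr.predict(X[:5, kept]), 1))
    ```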

  12. HT-FRTC: a fast radiative transfer code using kernel regression

    NASA Astrophysics Data System (ADS)

    Thelen, Jean-Claude; Havemann, Stephan; Lewis, Warren

    2016-09-01

    The HT-FRTC is a principal component based fast radiative transfer code that can be used across the electromagnetic spectrum from the microwave through to the ultraviolet to calculate transmittance, radiance and flux spectra. The principal components cover the spectrum at a very high spectral resolution, which allows very fast line-by-line, hyperspectral and broadband simulations for satellite-based, airborne and ground-based sensors. The principal components are derived during a code training phase from line-by-line simulations for a diverse set of atmosphere and surface conditions. The derived principal components are sensor independent, i.e. no extra training is required to include additional sensors. During the training phase we also derive the predictors which are required by the fast radiative transfer code to determine the principal component scores from the monochromatic radiances (or fluxes, transmittances). These predictors are calculated for each training profile at a small number of frequencies, which are selected by a k-means cluster algorithm during the training phase. Until recently the predictors were calculated using a linear regression. However, during a recent rewrite of the code the linear regression was replaced by a Gaussian Process (GP) regression which resulted in a significant increase in accuracy when compared to the linear regression. The HT-FRTC has been trained with a large variety of gases, surface properties and scatterers. Rayleigh scattering as well as scattering by frozen/liquid clouds, hydrometeors and aerosols have all been included. The scattering phase function can be fully accounted for by an integrated line-by-line version of the Edwards-Slingo spherical harmonics radiation code or approximately by a modification to the extinction (Chou scaling).

  13. Relationship between rice yield and climate variables in southwest Nigeria using multiple linear regression and support vector machine analysis

    NASA Astrophysics Data System (ADS)

    Oguntunde, Philip G.; Lischeid, Gunnar; Dietrich, Ottfried

    2018-03-01

    This study examines the variations of climate variables and rice yield and quantifies the relationships among them using multiple linear regression, principal component analysis, and support vector machine (SVM) analysis in southwest Nigeria. The climate and yield data used cover a period of 36 years, from 1980 to 2015. Similar to the observed decrease (P < 0.001) in rice yield, pan evaporation, solar radiation, and wind speed declined significantly. Eight principal components exhibited an eigenvalue > 1 and explained 83.1% of the total variance of the predictor variables. The SVM regression function using the scores of the first principal component explained about 75% of the variance in the rice yield data, and linear regression about 64%. SVM regression between annual solar radiation values and yield explained 67% of the variance. Only the first component of the principal component analysis (PCA) exhibited a clear long-term trend and, at times, short-term variance similar to that of rice yield. Short-term fluctuations of the PC1 scores are closely coupled to those of rice yield during the 1986-1993 and 2006-2013 periods, thereby revealing the inter-annual sensitivity of rice production to climate variability. Solar radiation stands out as the climate variable with the greatest influence on rice yield, and the influence was especially strong during the monsoon and post-monsoon periods, which correspond to the vegetative, booting, flowering, and grain-filling stages in the study area. The outcome is expected to provide a more in-depth, region-specific climate-rice linkage for screening better cultivars that can respond positively to future climate fluctuations, as well as information that may help optimize planting dates for improved radiation use efficiency in the study area.

  14. Modelling Ecuador's rainfall distribution according to geographical characteristics.

    NASA Astrophysics Data System (ADS)

    Tobar, Vladimiro; Wyseure, Guido

    2017-04-01

    It is known that rainfall is affected by terrain characteristics, and some studies have focused on its distribution over complex terrain. Ecuador's temporal and spatial rainfall distribution is affected by its location on the ITCZ, the marine currents in the Pacific, the Amazon rainforest, and the Andes mountain range. Although all these factors are important, we think that the latter may hold the key for modelling the spatial and temporal distribution of rainfall. The study considered 30 years of monthly data from 319 rainfall stations having at least 10 years of data available. The relatively low density of stations and their location in accessible sites near main roads or rivers leave large and important areas ungauged, making it inappropriate to rely on traditional interpolation techniques to estimate regional rainfall for water balances. The aim of this research was to develop a useful model of seasonal rainfall distribution in Ecuador based on geographical characteristics, allowing its spatial generalization. The target for modelling was the seasonal rainfall, characterized by nine percentiles for each of the 12 months of the year, which results in 108 response variables, later reduced to four principal components comprising 94% of the total variability. Predictor variables for the model were: geographic coordinates, elevation, main wind effects from the Amazon and the coast, valley and hill indexes, and average and maximum elevation above the selected rainfall station to the east and to the west for each of 18 directions (50-135°, by 5°), adding up to 79 predictors. A multiple linear regression model was fitted by the Elastic-net algorithm with cross-validation for each of the PCs as response, to select the most important of the 79 predictor variables. The Elastic-net algorithm deals well with collinearity problems while allowing variable selection in a blended approach between Ridge and Lasso regression. The model fitting produced explained variances of 59%, 81%, 49% and 17% for PC1, PC2, PC3 and PC4, respectively, backing up the hypothesis of good correlation between geographical characteristics and seasonal rainfall patterns (comprised in the four principal components). With the coefficients obtained from the regression, the 108 rainfall percentiles for each station were back-estimated, giving very good agreement with the original values, with an overall 60% explained variance.
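
    A minimal sketch of the modelling chain, assuming PCA of the station-by-percentile matrix followed by cross-validated elastic-net regression of each retained component on the geographic predictors; all data below are simulated placeholders.

    ```python
    # Sketch: PCA of the station-by-percentile matrix followed by cross-validated
    # elastic-net regression of each retained component on geographic predictors.
    # Station data and predictors are simulated placeholders.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import ElasticNetCV
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(9)
    n_stations = 319
    predictors = rng.normal(size=(n_stations, 79))           # elevation, wind-effect indexes, etc.
    # Simulated 9 percentiles x 12 months per station, partly driven by the predictors.
    percentiles = predictors[:, :5] @ rng.normal(size=(5, 108)) + rng.normal(size=(n_stations, 108))

    scores = PCA(n_components=4).fit_transform(percentiles)  # four leading components
    Xs = StandardScaler().fit_transform(predictors)

    for k in range(4):
        enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(Xs, scores[:, k])
        n_kept = int(np.sum(enet.coef_ != 0))
        print(f"PC{k + 1}: {n_kept} predictors kept, CV alpha = {enet.alpha_:.3f}")
    ```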

  15. A hybrid PCA-CART-MARS-based prognostic approach of the remaining useful life for aircraft engines.

    PubMed

    Sánchez Lasheras, Fernando; García Nieto, Paulino José; de Cos Juez, Francisco Javier; Mayo Bayón, Ricardo; González Suárez, Victor Manuel

    2015-03-23

    Prognostics is an engineering discipline that predicts the future health of a system. In this research work, a data-driven approach for prognostics is proposed. Indeed, the present paper describes a data-driven hybrid model for the successful prediction of the remaining useful life of aircraft engines. The approach combines the multivariate adaptive regression splines (MARS) technique with the principal component analysis (PCA), dendrograms and classification and regression trees (CARTs). Elements extracted from sensor signals are used to train this hybrid model, representing different levels of health for aircraft engines. In this way, this hybrid algorithm is used to predict the trends of these elements. Based on this fitting, one can determine the future health state of a system and estimate its remaining useful life (RUL) with accuracy. To evaluate the proposed approach, a test was carried out using aircraft engine signals collected from physical sensors (temperature, pressure, speed, fuel flow, etc.). Simulation results show that the PCA-CART-MARS-based approach can forecast faults long before they occur and can predict the RUL. The proposed hybrid model presents as its main advantage the fact that it does not require information about the previous operation states of the input variables of the engine. The performance of this model was compared with those obtained by other benchmark models (multivariate linear regression and artificial neural networks) also applied in recent years for the modeling of remaining useful life. Therefore, the PCA-CART-MARS-based approach is very promising in the field of prognostics of the RUL for aircraft engines.
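
    A rough sketch of the dimensionality-reduction and tree-regression stages of such a hybrid model. The MARS stage would come from a separate package (e.g. py-earth) and is omitted; the sensor features and RUL values are synthetic placeholders, not the engine data used in the paper.

      # PCA followed by a CART regressor as a simplified stand-in for the hybrid model.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(2)
      sensor_features = rng.normal(size=(500, 24))   # engine cycles x sensor features (synthetic)
      rul = rng.uniform(0, 300, size=500)            # remaining useful life in cycles (synthetic)

      model = make_pipeline(StandardScaler(), PCA(n_components=5), DecisionTreeRegressor(max_depth=6))
      model.fit(sensor_features, rul)
      print("In-sample R2:", round(model.score(sensor_features, rul), 3))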

  16. Robust inference in the negative binomial regression model with an application to falls data.

    PubMed

    Aeberhard, William H; Cantoni, Eva; Heritier, Stephane

    2014-12-01

    A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimating methods are well known to be sensitive to model misspecifications, which in such intervention studies can take the form of patients falling much more often than expected when the NB regression model is used. In this article, we extend two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, and this for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease to illustrate the diagnostic use of such robust procedures and the need for reliable inference. © 2014, The International Biometric Society.

  17. A Hybrid PCA-CART-MARS-Based Prognostic Approach of the Remaining Useful Life for Aircraft Engines

    PubMed Central

    Lasheras, Fernando Sánchez; Nieto, Paulino José García; de Cos Juez, Francisco Javier; Bayón, Ricardo Mayo; Suárez, Victor Manuel González

    2015-01-01

    Prognostics is an engineering discipline that predicts the future health of a system. In this research work, a data-driven approach for prognostics is proposed. Indeed, the present paper describes a data-driven hybrid model for the successful prediction of the remaining useful life of aircraft engines. The approach combines the multivariate adaptive regression splines (MARS) technique with the principal component analysis (PCA), dendrograms and classification and regression trees (CARTs). Elements extracted from sensor signals are used to train this hybrid model, representing different levels of health for aircraft engines. In this way, this hybrid algorithm is used to predict the trends of these elements. Based on this fitting, one can determine the future health state of a system and estimate its remaining useful life (RUL) with accuracy. To evaluate the proposed approach, a test was carried out using aircraft engine signals collected from physical sensors (temperature, pressure, speed, fuel flow, etc.). Simulation results show that the PCA-CART-MARS-based approach can forecast faults long before they occur and can predict the RUL. The proposed hybrid model presents as its main advantage the fact that it does not require information about the previous operation states of the input variables of the engine. The performance of this model was compared with those obtained by other benchmark models (multivariate linear regression and artificial neural networks) also applied in recent years for the modeling of remaining useful life. Therefore, the PCA-CART-MARS-based approach is very promising in the field of prognostics of the RUL for aircraft engines. PMID:25806876

  18. Rapid and simultaneous analysis of five alkaloids in four parts of Coptidis Rhizoma by near-infrared spectroscopy

    NASA Astrophysics Data System (ADS)

    Jintao, Xue; Yufei, Liu; Liming, Ye; Chunyan, Li; Quanwei, Yang; Weiying, Wang; Yun, Jing; Minxiang, Zhang; Peng, Li

    2018-01-01

    Near-Infrared Spectroscopy (NIRS) was first used to develop a method for rapid and simultaneous determination of 5 active alkaloids (berberine, coptisine, palmatine, epiberberine and jatrorrhizine) in 4 parts (rhizome, fibrous root, stem and leaf) of Coptidis Rhizoma. A total of 100 samples from 4 main places of origin were collected and studied. With HPLC analysis values as calibration reference, the quantitative analysis of 5 marker components was performed by two different modeling methods, partial least-squares (PLS) regression as linear regression and artificial neural networks (ANN) as non-linear regression. The results indicated that the 2 types of models established were robust, accurate and repeatable for the five active alkaloids; the ANN models were more suitable for the determination of berberine, coptisine and palmatine, while the PLS models were more suitable for the analysis of epiberberine and jatrorrhizine. The performance of the optimal models was as follows: the correlation coefficient (R) for berberine, coptisine, palmatine, epiberberine and jatrorrhizine was 0.9958, 0.9956, 0.9959, 0.9963 and 0.9923, respectively; the root mean square error of prediction (RMSEP) was 0.5093, 0.0578, 0.0443, 0.0563 and 0.0090, respectively. Furthermore, for the comprehensive exploitation and utilization of the plant resource of Coptidis Rhizoma, the established NIR models were used to analyze the content of the 5 active alkaloids in the 4 parts of Coptidis Rhizoma and from the 4 main places of origin. This work demonstrated that NIRS may be a promising method for routine screening, whether for off-line fast analysis or on-line quality assessment of traditional Chinese medicine (TCM).
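
    For readers unfamiliar with the linear-calibration half of this comparison, the sketch below fits a PLS regression of NIR spectra against reference values and reports an RMSEP on a held-out set. The spectra and reference contents are simulated, and the number of latent variables would normally be chosen by cross-validation.

      # PLS calibration of NIR spectra against HPLC reference values (synthetic data).
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.metrics import mean_squared_error
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(3)
      spectra = rng.normal(size=(100, 700))        # samples x NIR wavelengths (synthetic)
      hplc_ref = rng.uniform(0.5, 8.0, size=100)   # reference alkaloid content (synthetic)

      X_cal, X_val, y_cal, y_val = train_test_split(spectra, hplc_ref, test_size=0.3, random_state=0)
      pls = PLSRegression(n_components=8).fit(X_cal, y_cal)
      rmsep = mean_squared_error(y_val, pls.predict(X_val)) ** 0.5
      print("RMSEP:", round(rmsep, 3))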

  19. Use of social adaptability index to explain self-care and diabetes outcomes.

    PubMed

    Campbell, Jennifer A; Walker, Rebekah J; Smalls, Brittany L; Egede, Leonard E

    2017-06-20

    The aim was to examine whether the social adaptability index (SAI) alone or components of the index provide a better explanatory model for self-care and diabetes outcomes. Six hundred fifteen patients were recruited from two primary care settings. A series of multiple linear regression models were run to assess (1) associations between the SAI and diabetes self-care/outcomes, and (2) associations between individual SAI indicator variables and diabetes self-care/outcomes. Separate models were run for each self-care behavior and outcome. Two models were run for each dependent variable to compare associations with the SAI and components of the index. The SAI has a significant association with the mental component of quality of life (0.23, p < 0.01). In adjusted analyses, the SAI score did not have a significant association with any of the self-care behaviors. When individual components of the index were modeled, significant associations were found between self-care behaviors and multiple SAI indicator variables. Significant associations also exist between outcomes and the individual SAI indicators for education and employment. In this population, the SAI has low explanatory power and few significant associations with diabetes self-care/outcomes. While the use of a composite index to predict outcomes within a diabetes population would have high utility, particularly for clinical settings, this SAI lacks statistical and clinical significance in a representative diabetes population. Based on these results, the index does not provide a good model fit and masks the relationship of individual components to diabetes self-care and outcomes. These findings suggest that five items alone are not adequate to explain or predict outcomes for patients with type 2 diabetes.

  20. A guide to understanding meta-analysis.

    PubMed

    Israel, Heidi; Richter, Randy R

    2011-07-01

    With the focus on evidence-based practice in healthcare, a well-conducted systematic review that includes a meta-analysis where indicated represents a high level of evidence for treatment effectiveness. The purpose of this commentary is to assist clinicians in understanding meta-analysis as a statistical tool, using both published articles and explanations of components of the technique. We describe what meta-analysis is, what heterogeneity is and how it affects meta-analysis, effect size, the modeling techniques used in meta-analysis, and the strengths and weaknesses of meta-analysis. Common components such as forest plot interpretation and the software that may be used, special cases for meta-analysis such as subgroup analysis, individual patient data, and meta-regression, and a discussion of criticisms are also included.

  1. [Biodiversity and depressive symptoms in Mexican adults: Exploration of beneficial environmental effects].

    PubMed

    Duarte-Tagles, Héctor; Salinas-Rodríguez, Aarón; Idrovo, Álvaro J; Búrquez, Alberto; Corral-Verdugo, Víctor

    2015-08-01

    Depression is a highly prevalent illness among adults, and it is the second most frequently reported mental disorder in urban settings in México. Exposure to natural environments and its components may improve the mental health of the population. To evaluate the association between biodiversity indicators and the prevalence of depressive symptoms among the adult population (20 to 65 years of age) in México. Information from the Encuesta Nacional de Salud y Nutrición 2006 (ENSANUT 2006) and the Compendio de Estadísticas Ambientales 2008 was analyzed. A biodiversity index was constructed based on the species richness and ecoregions in each state. A multilevel logistic regression model was built with random intercepts and a multiple logistic regression was generated with clustering by state. The factors associated with depressive symptoms were being female, self-perceived as indigenous, lower education level, not living with a partner, lack of steady paid work, having a chronic illness and drinking alcohol. The biodiversity index was found to be inversely associated with the prevalence of depressive symptoms when defined as a continuous variable, and the results from the regression were grouped by state (OR=0.71; 95% CI = 0.59-0.87). Although the design was cross-sectional, this study adds to the evidence of the potential benefits to mental health from contact with nature and its components.

  2. Practical aspects of estimating energy components in rodents

    PubMed Central

    van Klinken, Jan B.; van den Berg, Sjoerd A. A.; van Dijk, Ko Willems

    2013-01-01

    Recently there has been an increasing interest in exploiting computational and statistical techniques for the purpose of component analysis of indirect calorimetry data. Using these methods it becomes possible to dissect daily energy expenditure into its components and to assess the dynamic response of the resting metabolic rate (RMR) to nutritional and pharmacological manipulations. To perform robust component analysis, however, is not straightforward and typically requires the tuning of parameters and the preprocessing of data. Moreover the degree of accuracy that can be attained by these methods depends on the configuration of the system, which must be properly taken into account when setting up experimental studies. Here, we review the methods of Kalman filtering, linear, and penalized spline regression, and minimal energy expenditure estimation in the context of component analysis and discuss their results on high resolution datasets from mice and rats. In addition, we investigate the effect of the sample time, the accuracy of the activity sensor, and the washout time of the chamber on the estimation accuracy. We found that on the high resolution data there was a strong correlation between the results of Kalman filtering and penalized spline (P-spline) regression, except for the activity respiratory quotient (RQ). For low resolution data the basal metabolic rate (BMR) and resting RQ could still be estimated accurately with P-spline regression, having a strong correlation with the high resolution estimate (R2 > 0.997; sample time of 9 min). In contrast, the thermic effect of food (TEF) and activity related energy expenditure (AEE) were more sensitive to a reduction in the sample rate (R2 > 0.97). In conclusion, for component analysis on data generated by single channel systems with continuous data acquisition both Kalman filtering and P-spline regression can be used, while for low resolution data from multichannel systems P-spline regression gives more robust results. PMID:23641217

  3. Estimating methane emissions from landfills based on rainfall, ambient temperature, and waste composition: The CLEEN model.

    PubMed

    Karanjekar, Richa V; Bhatt, Arpita; Altouqui, Said; Jangikhatoonabad, Neda; Durai, Vennila; Sattler, Melanie L; Hossain, M D Sahadat; Chen, Victoria

    2015-12-01

    Accurately estimating landfill methane emissions is important for quantifying a landfill's greenhouse gas emissions and power generation potential. Current models, including LandGEM and IPCC, often greatly simplify treatment of factors like rainfall and ambient temperature, which can substantially impact gas production. The newly developed Capturing Landfill Emissions for Energy Needs (CLEEN) model aims to improve landfill methane generation estimates, but still requires inputs that are fairly easy to obtain: waste composition, annual rainfall, and ambient temperature. To develop the model, methane generation was measured from 27 laboratory scale landfill reactors, with varying waste compositions (ranging from 0% to 100%); average rainfall rates of 2, 6, and 12 mm/day; and temperatures of 20, 30, and 37°C, according to a statistical experimental design. Refuse components considered were the major biodegradable wastes (food, paper, yard/wood, and textile) as well as inert inorganic waste. Based on the data collected, a multiple linear regression equation (R² = 0.75) was developed to predict first-order methane generation rate constant values k as functions of waste composition, annual rainfall, and temperature. Because laboratory methane generation rates exceed field rates, a second scale-up regression equation for k was developed using actual gas-recovery data from 11 landfills in high-income countries with conventional operation. The Capturing Landfill Emissions for Energy Needs (CLEEN) model was developed by incorporating both regression equations into the first-order decay based model for estimating methane generation rates from landfills. CLEEN model values were compared to actual field data from 6 US landfills, and to estimates from LandGEM and IPCC. For 4 of the 6 cases, CLEEN model estimates were the closest to the actual values. Copyright © 2015 Elsevier Ltd. All rights reserved.
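
    To make the final modeling step concrete: once a regression supplies a first-order rate constant k for a given waste mix, rainfall and temperature, methane generation follows a first-order decay of the waste placed each year. The sketch below uses that generic first-order form (similar in shape to LandGEM); the k, L0 and waste tonnages are hypothetical placeholders, not the CLEEN regression output.

      # First-order-decay methane generation from yearly waste placements.
      import numpy as np

      k = 0.08        # 1/yr, would come from the fitted regression equations (placeholder)
      L0 = 100.0      # m^3 CH4 per Mg of waste, methane generation potential (placeholder)
      waste = {2010: 50_000.0, 2011: 55_000.0, 2012: 60_000.0}   # Mg accepted per year

      def methane_rate(year):
          """Methane generation rate (m^3/yr) in a given year from all placed waste."""
          return sum(k * L0 * mass * np.exp(-k * (year - placed))
                     for placed, mass in waste.items() if year >= placed)

      for y in range(2010, 2016):
          print(y, round(methane_rate(y)))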

  4. A Simulation Investigation of Principal Component Regression.

    ERIC Educational Resources Information Center

    Allen, David E.

    Regression analysis is one of the more common analytic tools used by researchers. However, multicollinearity between the predictor variables can cause problems in using the results of regression analyses. Problems associated with multicollinearity include entanglement of relative influences of variables due to reduced precision of estimation,…

  5. Weibull mixture regression for marginal inference in zero-heavy continuous outcomes.

    PubMed

    Gebregziabher, Mulugeta; Voronca, Delia; Teklehaimanot, Abeba; Santa Ana, Elizabeth J

    2017-06-01

    Continuous outcomes with a preponderance of zero values are ubiquitous in data that arise from biomedical studies, for example studies of addictive disorders. This is known to lead to violations of standard assumptions in parametric inference and enhances the risk of misleading conclusions unless managed properly. Two-part models are commonly used to deal with this problem. However, standard two-part models have limitations with respect to obtaining parameter estimates with a marginal interpretation of covariate effects, which is important in many biomedical applications. Marginalized two-part models have recently been proposed, but their development is limited to log-normal and log-skew-normal distributions. Thus, in this paper, we propose a finite mixture approach, with Weibull mixture regression as a special case, to deal with the problem. We use an extensive simulation study to assess the performance of the proposed model in finite samples and to make comparisons with other families of models via statistical information criteria and mean squared error. We demonstrate its application on real data from a randomized controlled trial of addictive disorders. Our results show that a two-component Weibull mixture model is preferred for modeling zero-heavy continuous data when the non-zero part is generated from Weibull or similar distributions such as the Gamma or truncated Gaussian.
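
    As a point of reference for the zero-heavy setting, the sketch below fits a basic two-part model (a point mass at zero plus a Weibull for the positive values) to simulated data. This is a simplification for illustration only, not the marginalized Weibull mixture regression developed in the paper.

      # Two-part model: P(zero) plus a Weibull fit to the positive observations.
      from math import gamma

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)
      y = np.where(rng.random(500) < 0.4, 0.0, rng.weibull(1.5, 500) * 10.0)

      p_zero = float(np.mean(y == 0))
      shape, _, scale = stats.weibull_min.fit(y[y > 0], floc=0)   # location fixed at zero

      positive_mean = scale * gamma(1.0 + 1.0 / shape)
      marginal_mean = (1.0 - p_zero) * positive_mean
      print(f"P(zero) = {p_zero:.2f}, Weibull shape = {shape:.2f}, marginal mean = {marginal_mean:.2f}")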

  6. Analysis of Wind Tunnel Lateral Oscillatory Data of the F-16XL Aircraft

    NASA Technical Reports Server (NTRS)

    Klein, Vladislav; Murphy, Patrick C.; Szyba, Nathan M.

    2004-01-01

    Static and dynamic wind tunnel tests were performed on an 18% scale model of the F-16XL aircraft. These tests were performed over a wide range of angles of attack and sideslip with oscillation amplitudes from 5 deg. to 30 deg. and reduced frequencies from 0.073 to 0.269. Harmonic analysis was used to estimate Fourier coefficients and in-phase and out-of-phase components. For frequency dependent data from rolling oscillations, a two-step regression method was used to obtain unsteady models (indicial functions), and derivatives due to sideslip angle, roll rate and yaw rate from in-phase and out-of-phase components. Frequency dependence was found for angles of attack between 20 deg. and 50 deg. Reduced values of coefficient of determination and increased values of fit error were found for angles of attack between 35 deg. and 45 deg. An attempt to estimate model parameters from yaw oscillations failed, probably due to the low number of test cases at different frequencies.
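
    The in-phase and out-of-phase components referred to above can be recovered by a least-squares harmonic regression of the measured coefficient on sine and cosine terms at the oscillation frequency. The sketch below uses a synthetic signal in place of a measured rolling-oscillation record.

      # Harmonic regression: signal ~ a*sin(wt) + b*cos(wt).
      import numpy as np

      omega = 2.0 * np.pi * 0.5      # oscillation frequency, rad/s (placeholder)
      t = np.linspace(0.0, 10.0, 1000)
      rng = np.random.default_rng(5)
      signal = 0.8 * np.sin(omega * t) + 0.3 * np.cos(omega * t) + 0.05 * rng.normal(size=t.size)

      A = np.column_stack([np.sin(omega * t), np.cos(omega * t)])
      (in_phase, out_of_phase), *_ = np.linalg.lstsq(A, signal, rcond=None)
      print("in-phase:", round(in_phase, 3), "out-of-phase:", round(out_of_phase, 3))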

  7. An Efficient Recommendation Filter Model on Smart Home Big Data Analytics for Enhanced Living Environments.

    PubMed

    Chen, Hao; Xie, Xiaoyun; Shu, Wanneng; Xiong, Naixue

    2016-10-15

    With the rapid growth of wireless sensor applications, the user interfaces and configurations of smart homes have become so complicated and inflexible that users usually have to spend a great amount of time studying them and adapting to their expected operation. In order to improve user experience, a weighted hybrid recommender system based on a Kalman Filter model is proposed to predict what users might want to do next, especially when users are located in a smart home with an enhanced living environment. Specifically, a weight hybridization method was introduced, which combines contextual collaborative filtering and contextual content-based recommendations. This method inherits the advantages of the optimum regression and the stability features of the proposed adaptive Kalman Filter model, and it can predict and revise the weight of each system component dynamically. Experimental results show that the hybrid recommender system can optimize the distribution of weights across components and achieve more reasonable recall and precision rates.

  8. An Efficient Recommendation Filter Model on Smart Home Big Data Analytics for Enhanced Living Environments

    PubMed Central

    Chen, Hao; Xie, Xiaoyun; Shu, Wanneng; Xiong, Naixue

    2016-01-01

    With the rapid growth of wireless sensor applications, the user interfaces and configurations of smart homes have become so complicated and inflexible that users usually have to spend a great amount of time studying them and adapting to their expected operation. In order to improve user experience, a weighted hybrid recommender system based on a Kalman Filter model is proposed to predict what users might want to do next, especially when users are located in a smart home with an enhanced living environment. Specifically, a weight hybridization method was introduced, which combines contextual collaborative filtering and contextual content-based recommendations. This method inherits the advantages of the optimum regression and the stability features of the proposed adaptive Kalman Filter model, and it can predict and revise the weight of each system component dynamically. Experimental results show that the hybrid recommender system can optimize the distribution of weights across components and achieve more reasonable recall and precision rates. PMID:27754456

  9. Exploratory studies into seasonal flow forecasting potential for large lakes

    NASA Astrophysics Data System (ADS)

    Sene, Kevin; Tych, Wlodek; Beven, Keith

    2018-01-01

    In seasonal flow forecasting applications, one factor which can help predictability is a significant hydrological response time between rainfall and flows. On account of storage influences, large lakes therefore provide a useful test case although, due to the spatial scales involved, there are a number of modelling challenges related to data availability and understanding the individual components in the water balance. Here some possible model structures are investigated using a range of stochastic regression and transfer function techniques with additional insights gained from simple analytical approximations. The methods were evaluated using records for two of the largest lakes in the world - Lake Malawi and Lake Victoria - with forecast skill demonstrated several months ahead using water balance models formulated in terms of net inflows. In both cases slight improvements were obtained for lead times up to 4-5 months from including climate indices in the data assimilation component. The paper concludes with a discussion of the relevance of the results to operational flow forecasting systems for other large lakes.

  10. Comparison of three chemometrics methods for near-infrared spectra of glucose in the whole blood

    NASA Astrophysics Data System (ADS)

    Zhang, Hongyan; Ding, Dong; Li, Xin; Chen, Yu; Tang, Yuguo

    2005-01-01

    Principal Component Regression (PCR), Partial Least Squares (PLS) and Artificial Neural Network (ANN) methods are used in the analysis of near-infrared (NIR) spectra of glucose in whole blood. With these methods, the calibration model is built in the spectral band where glucose has much stronger absorption than water, fat, and protein, and the correlation coefficients of the models are reported in this paper. By comparing these results, a suitable method for analyzing the NIR spectrum of glucose in whole blood is identified.
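
    A minimal comparison of the two linear chemometric methods named above, principal component regression (PCA followed by ordinary least squares) and PLS regression, on synthetic spectra standing in for the whole-blood NIR data.

      # PCR (PCA + OLS) versus PLS regression on synthetic spectra.
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(6)
      spectra = rng.normal(size=(80, 300))                       # samples x wavelengths (synthetic)
      glucose = spectra[:, :5].sum(axis=1) + 0.1 * rng.normal(size=80)

      pcr = make_pipeline(PCA(n_components=10), LinearRegression()).fit(spectra, glucose)
      pls = PLSRegression(n_components=10).fit(spectra, glucose)
      print("PCR R2:", round(pcr.score(spectra, glucose), 3))
      print("PLS R2:", round(pls.score(spectra, glucose), 3))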

  11. WALLY 1 ...A large, principal components regression program with varimax rotation of the factor weight matrix

    Treesearch

    James R. Wallis

    1965-01-01

    Written in Fortran IV and MAP, this computer program can handle up to 120 variables, and retain 40 principal components. It can perform simultaneous regression of up to 40 criterion variables upon the varimax rotated factor weight matrix. The columns and rows of all output matrices are labeled by six-character alphanumeric names. Data input can be from punch cards or...

  12. Multiple-trait random regression models for the estimation of genetic parameters for milk, fat, and protein yield in buffaloes.

    PubMed

    Borquis, Rusbel Raul Aspilcueta; Neto, Francisco Ribeiro de Araujo; Baldi, Fernando; Hurtado-Lugo, Naudin; de Camargo, Gregório M F; Muñoz-Berrocal, Milthon; Tonhati, Humberto

    2013-09-01

    In this study, genetic parameters for test-day milk, fat, and protein yield were estimated for the first lactation. The data analyzed consisted of 1,433 first lactations of Murrah buffaloes, daughters of 113 sires from 12 herds in the state of São Paulo, Brazil, with calvings from 1985 to 2007. Ten monthly classes of lactation days were considered for the test-day yields. The (co)variance components for the 3 traits were estimated using random regression analyses by Bayesian inference, applying an animal model via Gibbs sampling. The contemporary groups were defined as herd-year-month of the test day. In the model, the random effects were additive genetic, permanent environment, and residual. The fixed effects were contemporary group and number of milkings (1 or 2), the linear and quadratic effects of the covariable age of the buffalo at calving, as well as the mean lactation curve of the population, which was modeled by orthogonal Legendre polynomials of fourth order. The random effects for the traits studied were modeled by Legendre polynomials of third and fourth order for additive genetic and permanent environment effects, respectively; the residual variances were modeled considering 4 residual classes. The heritability estimates for the traits were moderate (from 0.21 to 0.38), with higher estimates in the intermediate lactation phase. The genetic correlation estimates within and among the traits varied from 0.05 to 0.99. The results indicate that selection for any test-day trait will result in an indirect genetic gain for milk, fat, and protein yield in all periods of the lactation curve. The accuracy associated with estimated breeding values obtained using multi-trait random regression was slightly higher (around 8%) compared with single-trait random regression. This difference may be due to the greater amount of information available per animal. Copyright © 2013 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
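
    The fixed lactation curve referred to above is a regression on orthogonal Legendre polynomials of days in milk. The sketch below builds such a design matrix with days in milk rescaled to [-1, 1]; the genetic analysis itself (Gibbs sampling of the animal model) is beyond a short example, and the test-day ages are hypothetical.

      # Design matrix of Legendre polynomials (order 0..4) over standardized days in milk.
      import numpy as np
      from numpy.polynomial import legendre

      dim = np.array([15, 45, 75, 105, 135, 165, 195, 225, 255, 285], dtype=float)
      x = 2.0 * (dim - dim.min()) / (dim.max() - dim.min()) - 1.0   # rescale to [-1, 1]

      Z = np.column_stack([legendre.legval(x, np.eye(5)[k]) for k in range(5)])
      print(Z.shape)   # (10 test days, 5 regression coefficients)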

  13. Global and System-Specific Resting-State fMRI Fluctuations Are Uncorrelated: Principal Component Analysis Reveals Anti-Correlated Networks

    PubMed Central

    Carbonell, Felix; Bellec, Pierre

    2011-01-01

    The influence of the global average signal (GAS) on functional magnetic resonance imaging (fMRI)–based resting-state functional connectivity is a matter of ongoing debate. The global average fluctuations increase the correlation between functional systems beyond the correlation that reflects their specific functional connectivity. Hence, removal of the GAS is a common practice for facilitating the observation of network-specific functional connectivity. This strategy relies on the implicit assumption of a linear-additive model according to which global fluctuations, irrespective of their origin, and network-specific fluctuations are super-positioned. However, removal of the GAS introduces spurious negative correlations between functional systems, bringing into question the validity of previous findings of negative correlations between fluctuations in the default-mode and the task-positive networks. Here we present an alternative method for estimating global fluctuations, immune to the complications associated with the GAS. Principal components analysis was applied to resting-state fMRI time-series. A global-signal effect estimator was defined as the principal component (PC) that correlated best with the GAS. The mean correlation coefficient between our proposed PC-based global effect estimator and the GAS was 0.97±0.05, demonstrating that our estimator successfully approximated the GAS. In 66 out of 68 runs, the PC that showed the highest correlation with the GAS was the first PC. Since PCs are orthogonal, our method provides an estimator of the global fluctuations which is uncorrelated with the remaining, network-specific fluctuations. Moreover, unlike the regression of the GAS, the regression of the PC-based global effect estimator does not introduce spurious anti-correlations beyond the decrease in seed-based correlation values allowed by the assumed additive model. After regressing this PC-based estimator out of the original time-series, we observed robust anti-correlations between resting-state fluctuations in the default-mode and the task-positive networks. We conclude that resting-state global fluctuations and network-specific fluctuations are uncorrelated, supporting a Resting-State Linear-Additive Model. In addition, we conclude that the network-specific resting-state fluctuations of the default-mode and task-positive networks show artifact-free anti-correlations. PMID:22444074
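
    The estimator described above is straightforward to reproduce in outline: run PCA on the region time series, select the component most correlated with the global average signal, and regress that component out of every time course. The array below is random noise standing in for preprocessed resting-state data.

      # PC-based global-signal estimator and its removal by regression.
      import numpy as np

      rng = np.random.default_rng(7)
      ts = rng.normal(size=(200, 50))          # time points x regions (synthetic)
      gas = ts.mean(axis=1)                    # global average signal

      centered = ts - ts.mean(axis=0)
      _, _, vt = np.linalg.svd(centered, full_matrices=False)
      pcs = centered @ vt.T                    # principal component time courses

      corrs = [abs(np.corrcoef(pcs[:, k], gas)[0, 1]) for k in range(pcs.shape[1])]
      global_pc = pcs[:, int(np.argmax(corrs))]

      beta = centered.T @ global_pc / (global_pc @ global_pc)
      cleaned = centered - np.outer(global_pc, beta)   # global effect regressed out
      print("max |corr| with GAS:", round(max(corrs), 3), "cleaned shape:", cleaned.shape)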

  14. Estimation of test-day model (co)variance components across breeds using New Zealand dairy cattle data.

    PubMed

    Vanderick, S; Harris, B L; Pryce, J E; Gengler, N

    2009-03-01

    In New Zealand, a large proportion of cows are currently crossbreds, mostly Holstein-Friesians (HF) x Jersey (JE). The genetic evaluation system for milk yields is considering the same additive genetic effects for all breeds. The objective was to model different additive effects according to parental breeds to obtain first estimates of correlations among breed-specific effects and to study the usefulness of this type of random regression test-day model. Estimates of (co)variance components for purebred HF and JE cattle in purebred herds were computed by using a single-breed model. This analysis showed differences between the 2 breeds, with a greater variability in the HF breed. (Co)variance components for purebred HF and JE and crossbred HF x JE cattle were then estimated by using a complete multibreed model in which computations of complete across-breed (co)variances were simplified by correlating only eigenvectors for HF and JE random regressions of the same order as obtained from the single-breed analysis. Parameter estimates differed more strongly than expected between the single-breed and multibreed analyses, especially for JE. This could be due to differences between animals and management in purebred and non-purebred herds. In addition, the model used only partially accounted for heterosis. The multibreed analysis showed additive genetic differences between the HF and JE breeds, expressed as genetic correlations of additive effects in both breeds, especially in linear and quadratic Legendre polynomials (respectively, 0.807 and 0.604). The differences were small for overall milk production (0.926). Results showed that permanent environmental lactation curves were highly correlated across breeds; however, intraherd lactation curves were also affected by the breed-environment interaction. This result may indicate the existence of breed-specific competition effects that vary through the different lactation stages. In conclusion, a multibreed model similar to the one presented could optimally use the environmental and genetic parameters and provide breed-dependent additive breeding values. This model could also be a useful tool to evaluate crossbred dairy cattle populations like those in New Zealand. However, a routine evaluation would still require the development of an improved methodology. It would also be computationally very challenging because of the simultaneous presence of a large number of breeds.

  15. Sequential Processing and the Matching-Stimulus Interval Effect in ERP Components: An Exploration of the Mechanism Using Multiple Regression

    PubMed Central

    Steiner, Genevieve Z.; Barry, Robert J.; Gonsalvez, Craig J.

    2016-01-01

    In oddball tasks, increasing the time between stimuli within a particular condition (target-to-target interval, TTI; nontarget-to-nontarget interval, NNI) systematically enhances N1, P2, and P300 event-related potential (ERP) component amplitudes. This study examined the mechanism underpinning these effects in ERP components recorded from 28 adults who completed a conventional three-tone oddball task. Bivariate correlations, partial correlations and multiple regression explored component changes due to preceding ERP component amplitudes and intervals found within the stimulus series, rather than constraining the task with experimentally constructed intervals, which has been adequately explored in prior studies. Multiple regression showed that for targets, N1 and TTI predicted N2, TTI predicted P3a and P3b, and Processing Negativity (PN), P3b, and TTI predicted reaction time. For rare nontargets, P1 predicted N1, NNI predicted N2, and N1 predicted Slow Wave (SW). Findings show that the mechanism is operating on separate stages of stimulus-processing, suggestive of either increased activation within a number of stimulus-specific pathways, or very long component generator recovery cycles. These results demonstrate the extent to which matching-stimulus intervals influence ERP component amplitudes and behavior in a three-tone oddball task, and should be taken into account when designing similar studies. PMID:27445774

  16. Sequential Processing and the Matching-Stimulus Interval Effect in ERP Components: An Exploration of the Mechanism Using Multiple Regression.

    PubMed

    Steiner, Genevieve Z; Barry, Robert J; Gonsalvez, Craig J

    2016-01-01

    In oddball tasks, increasing the time between stimuli within a particular condition (target-to-target interval, TTI; nontarget-to-nontarget interval, NNI) systematically enhances N1, P2, and P300 event-related potential (ERP) component amplitudes. This study examined the mechanism underpinning these effects in ERP components recorded from 28 adults who completed a conventional three-tone oddball task. Bivariate correlations, partial correlations and multiple regression explored component changes due to preceding ERP component amplitudes and intervals found within the stimulus series, rather than constraining the task with experimentally constructed intervals, which has been adequately explored in prior studies. Multiple regression showed that for targets, N1 and TTI predicted N2, TTI predicted P3a and P3b, and Processing Negativity (PN), P3b, and TTI predicted reaction time. For rare nontargets, P1 predicted N1, NNI predicted N2, and N1 predicted Slow Wave (SW). Findings show that the mechanism is operating on separate stages of stimulus-processing, suggestive of either increased activation within a number of stimulus-specific pathways, or very long component generator recovery cycles. These results demonstrate the extent to which matching-stimulus intervals influence ERP component amplitudes and behavior in a three-tone oddball task, and should be taken into account when designing similar studies.

  17. Exposure reconstruction for the TCDD-exposed NIOSH cohort using a concentration- and age-dependent model of elimination.

    PubMed

    Aylward, Lesa L; Brunet, Robert C; Starr, Thomas B; Carrier, Gaétan; Delzell, Elizabeth; Cheng, Hong; Beall, Colleen

    2005-08-01

    Recent studies demonstrating a concentration dependence of elimination of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) suggest that previous estimates of exposure for occupationally exposed cohorts may have underestimated actual exposure, resulting in a potential overestimate of the carcinogenic potency of TCDD in humans based on the mortality data for these cohorts. Using a database on U.S. chemical manufacturing workers potentially exposed to TCDD compiled by the National Institute for Occupational Safety and Health (NIOSH), we evaluated the impact of using a concentration- and age-dependent elimination model (CADM) (Aylward et al., 2005) on estimates of serum lipid area under the curve (AUC) for the NIOSH cohort. These data were used previously by Steenland et al. (2001) in combination with a first-order elimination model with an 8.7-year half-life to estimate cumulative serum lipid concentration (equivalent to AUC) for these workers for use in cancer dose-response assessment. Serum lipid TCDD measurements taken in 1988 for a subset of the cohort were combined with the NIOSH job exposure matrix and work histories to estimate dose rates per unit of exposure score. We evaluated the effect of choices in regression model (regression on untransformed vs. ln-transformed data and inclusion of a nonzero regression intercept) as well as the impact of choices of elimination models and parameters on estimated AUCs for the cohort. Central estimates for dose rate parameters derived from the serum-sampled subcohort were applied with the elimination models to time-specific exposure scores for the entire cohort to generate AUC estimates for all cohort members. Use of the CADM resulted in improved model fits to the serum sampling data compared to the first-order models. Dose rates varied by a factor of 50 among different combinations of elimination model, parameter sets, and regression models. Use of a CADM results in increases of up to five-fold in AUC estimates for the more highly exposed members of the cohort compared to estimates obtained using the first-order model with 8.7-year half-life. This degree of variation in the AUC estimates for this cohort would affect substantially the cancer potency estimates derived from the mortality data from this cohort. Such variability and uncertainty in the reconstructed serum lipid AUC estimates for this cohort, depending on elimination model, parameter set, and regression model, have not been described previously and are critical components in evaluating the dose-response data from the occupationally exposed populations.

  18. Identification of different bacterial species in biofilms using confocal Raman microscopy

    NASA Astrophysics Data System (ADS)

    Beier, Brooke D.; Quivey, Robert G.; Berger, Andrew J.

    2010-11-01

    Confocal Raman microspectroscopy is used to discriminate between different species of bacteria grown in biofilms. Tests are performed using two bacterial species, Streptococcus sanguinis and Streptococcus mutans, which are major components of oral plaque and of particular interest due to their association with healthy and cariogenic plaque, respectively. Dehydrated biofilms of these species are studied as a simplified model of dental plaque. A prediction model based on principal component analysis and logistic regression is calibrated using pure biofilms of each species and validated on pure biofilms grown months later, achieving 96% accuracy in prospective classification. When biofilms of the two species are partially mixed together, Raman-based identifications are achieved within ~2 μm of the boundaries between species with 97% accuracy. This combination of spatial resolution and prediction accuracy should be suitable for forming images of species distributions within intact two-species biofilms.
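
    The prediction model described above, principal component analysis followed by logistic regression, can be expressed as a simple pipeline; the spectra and labels here are random placeholders for the Raman data.

      # PCA + logistic regression pipeline for two-class spectral discrimination.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(8)
      spectra = rng.normal(size=(120, 1000))    # spectra x Raman shifts (synthetic)
      species = rng.integers(0, 2, size=120)    # 0 = S. sanguinis, 1 = S. mutans (labels only)

      clf = make_pipeline(StandardScaler(), PCA(n_components=10), LogisticRegression())
      clf.fit(spectra, species)
      print("Training accuracy:", clf.score(spectra, species))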

  19. A preliminary case-mix classification system for Medicare home health clients.

    PubMed

    Branch, L G; Goldberg, H B

    1993-04-01

    In this study, a hierarchical case-mix model was developed for grouping Medicare home health beneficiaries homogeneously, based on the allowed charges for their home care. Based on information from a two-page form from 2,830 clients from ten states and using the classification and regression trees method, a four-component model was developed that yielded 11 case-mix groups and explained 22% of the variance for the test sample of 1,929 clients. The four components are rehabilitation, special care, skilled-nurse monitoring, and paralysis; each is categorized as present or absent. The range of mean allowed charges for the 11 groups in the total sample was $473 to $2,562 with a mean of $847. Of the six groups with mean charges above $1,000, none exceeded 5.2% of clients; thus, the high-cost groups are relatively rare.

  20. Determination of butter adulteration with margarine using Raman spectroscopy.

    PubMed

    Uysal, Reyhan Selin; Boyaci, Ismail Hakki; Genis, Hüseyin Efe; Tamer, Ugur

    2013-12-15

    In this study, adulteration of butter with margarine was analysed using Raman spectroscopy combined with chemometric methods (principal component analysis (PCA), principal component regression (PCR), partial least squares (PLS)) and artificial neural networks (ANNs). Different butter and margarine samples were mixed at various concentrations ranging from 0% to 100% w/w. PCA analysis was applied for the classification of butters, margarines and mixtures. PCR, PLS and ANN were used for the detection of adulteration ratios of butter. Models were created using a calibration data set and the developed models were evaluated using a validation data set. The coefficient of determination (R²) values between actual and predicted values obtained for PCR, PLS and ANN for the validation data set were 0.968, 0.987 and 0.978, respectively. In conclusion, a combination of Raman spectroscopy with chemometrics and ANN methods can be applied for testing butter adulteration. Copyright © 2013 Elsevier Ltd. All rights reserved.

  1. Common EEG features for behavioral estimation in disparate, real-world tasks.

    PubMed

    Touryan, Jon; Lance, Brent J; Kerick, Scott E; Ries, Anthony J; McDowell, Kaleb

    2016-02-01

    In this study we explored the potential for capturing the behavioral dynamics observed in real-world tasks from concurrent measures of EEG. In doing so, we sought to develop models of behavior that would enable the identification of common cross-participant and cross-task EEG features. To accomplish this, we had participants perform both simulated driving and guard duty tasks while we recorded their EEG. For each participant we developed models to estimate their behavioral performance during both tasks. Sequential forward floating selection was used to identify the montage of independent components for each model. Linear regression was then used on the combined power spectra from these independent components to generate a continuous estimate of behavior. Our results show that oscillatory processes, evidenced in EEG, can be used to successfully capture slow fluctuations in behavior in complex, multi-faceted tasks. The average correlation coefficients between the actual and estimated behavior were 0.548 ± 0.117 and 0.701 ± 0.154 for the driving and guard duty tasks, respectively. Interestingly, through a simple clustering approach we were able to identify a number of common components, both neural and eye-movement related, across participants and tasks. We used these component clusters to quantify the relative influence of common versus participant-specific features in the models of behavior. These findings illustrate the potential for estimating complex behavioral dynamics from concurrent measures of EEG using a finite library of universal features. Published by Elsevier B.V.

  2. Discrimination and prediction of cultivation age and parts of Panax ginseng by Fourier-transform infrared spectroscopy combined with multivariate statistical analysis.

    PubMed

    Lee, Byeong-Ju; Kim, Hye-Youn; Lim, Sa Rang; Huang, Linfang; Choi, Hyung-Kyoon

    2017-01-01

    Panax ginseng C.A. Meyer is a herb used for medicinal purposes, and its discrimination according to cultivation age has been an important and practical issue. This study employed Fourier-transform infrared (FT-IR) spectroscopy with multivariate statistical analysis to obtain a prediction model for discriminating cultivation ages (5 and 6 years) and three different parts (rhizome, tap root, and lateral root) of P. ginseng. The optimal partial-least-squares regression (PLSR) models for discriminating ginseng samples were determined by selecting normalization methods, number of partial-least-squares (PLS) components, and variable influence on projection (VIP) cutoff values. The best prediction model for discriminating 5- and 6-year-old ginseng was developed using tap root, vector normalization applied after the second differentiation, one PLS component, and a VIP cutoff of 1.0 (based on the lowest root-mean-square error of prediction value). In addition, for discriminating among the three parts of P. ginseng, optimized PLSR models were established using data sets obtained from vector normalization, two PLS components, and VIP cutoff values of 1.5 (for 5-year-old ginseng) and 1.3 (for 6-year-old ginseng). To our knowledge, this is the first study to provide a novel strategy for rapidly discriminating the cultivation ages and parts of P. ginseng using FT-IR by selected normalization methods, number of PLS components, and VIP cutoff values.
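
    The VIP screening step mentioned above can be sketched from a fitted scikit-learn PLS model using one common VIP formulation (variance of the response explained per component, weighted by the squared normalized x-weights). The spectra and age labels below are hypothetical, and the exact VIP definition used in the paper may differ.

      # VIP scores from a fitted PLS model, then screening against a cutoff.
      import numpy as np
      from sklearn.cross_decomposition import PLSRegression

      rng = np.random.default_rng(9)
      spectra = rng.normal(size=(60, 400))       # samples x wavenumbers (synthetic)
      age_class = rng.integers(0, 2, size=60)    # 0 = 5-year, 1 = 6-year ginseng (synthetic)

      pls = PLSRegression(n_components=2).fit(spectra, age_class)
      T, W, Q = pls.x_scores_, pls.x_weights_, pls.y_loadings_

      p = W.shape[0]
      ssy = (Q ** 2).ravel() * (T ** 2).sum(axis=0)   # y-variance explained per component
      vip = np.sqrt(p * ((W / np.linalg.norm(W, axis=0)) ** 2 @ ssy) / ssy.sum())
      print("Wavenumbers with VIP > 1.0:", int((vip > 1.0).sum()))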

  3. Discrimination and prediction of cultivation age and parts of Panax ginseng by Fourier-transform infrared spectroscopy combined with multivariate statistical analysis

    PubMed Central

    Lim, Sa Rang; Huang, Linfang

    2017-01-01

    Panax ginseng C.A. Meyer is a herb used for medicinal purposes, and its discrimination according to cultivation age has been an important and practical issue. This study employed Fourier-transform infrared (FT-IR) spectroscopy with multivariate statistical analysis to obtain a prediction model for discriminating cultivation ages (5 and 6 years) and three different parts (rhizome, tap root, and lateral root) of P. ginseng. The optimal partial-least-squares regression (PLSR) models for discriminating ginseng samples were determined by selecting normalization methods, number of partial-least-squares (PLS) components, and variable influence on projection (VIP) cutoff values. The best prediction model for discriminating 5- and 6-year-old ginseng was developed using tap root, vector normalization applied after the second differentiation, one PLS component, and a VIP cutoff of 1.0 (based on the lowest root-mean-square error of prediction value). In addition, for discriminating among the three parts of P. ginseng, optimized PLSR models were established using data sets obtained from vector normalization, two PLS components, and VIP cutoff values of 1.5 (for 5-year-old ginseng) and 1.3 (for 6-year-old ginseng). To our knowledge, this is the first study to provide a novel strategy for rapidly discriminating the cultivation ages and parts of P. ginseng using FT-IR by selected normalization methods, number of PLS components, and VIP cutoff values. PMID:29049369

  4. Statistical optimization of medium components and growth conditions by response surface methodology to enhance phenol degradation by Pseudomonas putida.

    PubMed

    Annadurai, Gurusamy; Ling, Lai Yi; Lee, Jiunn-Fwu

    2008-02-28

    In this work, a four-factor Box-Behnken design was employed in combination with response surface methodology (RSM) to optimize the medium composition for the degradation of phenol by Pseudomonas putida (ATCC 31800). A mathematical model was then developed to show the effect of each medium component and their interactions on the biodegradation of phenol. The response surface method used four factors (glucose, yeast extract, ammonium sulfate, and sodium chloride), which also enabled the identification of significant interaction effects in the batch studies. The biodegradation of phenol by Pseudomonas putida (ATCC 31800) was determined to be pH-dependent, and the maximum degradation capacity of the microorganism was observed at 30 degrees C when the phenol concentration was 0.2 g/L and the pH of the solution was 7.0. A second-order polynomial regression model was used for analysis of the experiment. Cubic and quadratic terms were incorporated into the regression model through variable selection procedures. The experimental values were in good agreement with the predicted values, and the correlation coefficient was found to be 0.9980.
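
    The second-order response-surface fit itself is a polynomial regression; the sketch below fits a quadratic surface to hypothetical coded factor settings and degradation values, not the actual Box-Behnken runs of the study.

      # Quadratic response-surface model fitted by polynomial regression.
      import numpy as np
      from sklearn.linear_model import LinearRegression
      from sklearn.pipeline import make_pipeline
      from sklearn.preprocessing import PolynomialFeatures

      rng = np.random.default_rng(10)
      factors = rng.uniform(-1, 1, size=(27, 4))   # coded levels of 4 medium factors (synthetic)
      degradation = 80 - 5 * (factors ** 2).sum(axis=1) + rng.normal(scale=1.0, size=27)

      rsm = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
      rsm.fit(factors, degradation)
      print("R2 of the quadratic surface:", round(rsm.score(factors, degradation), 3))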

  5. Self-other rating agreement and leader-member exchange (LMX): a quasi-replication.

    PubMed

    Barbuto, John E; Wilmot, Michael P; Singh, Matthew; Story, Joana S P

    2012-04-01

    Data from a sample of 83 elected community leaders and 391 direct-report staff (resulting in 333 useable leader-member dyads) were reanalyzed to test relations between self-other rating agreement of servant leadership and member-reported leader-member exchange (LMX). Polynomial regression analysis indicated that the self-other rating agreement model was not statistically significant. Instead, all of the variance in member-reported LMX was accounted for by the others' ratings component alone.

  6. A model for national outcome audit in vascular surgery.

    PubMed

    Prytherch, D R; Ridler, B M; Beard, J D; Earnshaw, J J

    2001-06-01

    The aim was to model vascular surgical outcome in a national study using POSSUM scoring. One hundred and twenty-one British and Irish surgeons completed data questionnaires on patients undergoing arterial surgery under their care (mean 12 patients, range 1-49) in May/June 1998. A total of 1480 completed data records were available for logistic regression analysis using P-POSSUM methodology. Information collected included all POSSUM data items plus other factors thought to have a significant bearing on patient outcome: "extra items". The main outcome measures were death and major postoperative complications. The data were checked and inconsistent records were excluded. The remaining 1313 were divided into two sets for analysis. The first "training" set was used to obtain logistic regression models that were applied prospectively to the second "test" dataset. Using POSSUM data items alone, it was possible to predict both mortality and morbidity after vascular reconstruction using P-POSSUM analysis. The addition of the "extra items" found to be significant in regression analysis did not significantly improve the accuracy of prediction. It was possible to predict both mortality and morbidity derived from the preoperative physiology components of the POSSUM data items alone. This study has shown that P-POSSUM methodology can be used to predict outcome after arterial surgery across a range of surgeons in different hospitals and could form the basis of a national outcome audit. It was also possible to obtain accurate models for both mortality and major morbidity from the POSSUM physiology scores alone. Copyright 2001 Harcourt Publishers Limited.

  7. Metabolic outcomes of workers according to the International Standard Classification of Occupations in Korea.

    PubMed

    Lee, Wanhyung; Yeom, Hyungseon; Yoon, Jin-Ha; Won, Jong-Uk; Jung, Pil Kyun; Lee, June-Hee; Seok, Hongdeok; Roh, Jaehoon

    2016-08-01

    Occupation influences the risk for developing chronic metabolic diseases. We compared the prevalence of metabolic syndrome (MetS) by International Standard Classification of Occupations category using nationally representative data from Korea (KNHANES). We enrolled 16,763 workers (9,175 males; 7,588 females) who had measurements for the National Cholesterol Education Program Adult Treatment Panel III criteria and other variables. OR and 95%CIs for MetS and its components were estimated according to occupation using multiple logistic regression models. The occupational groups with the highest age-standardized prevalence of MetS were lower skilled white-collar men (31.1 ± 2.4%) and green-collar women (24.2 ± 2.9%). Compared with the unskilled male blue-collar group, which had the lowest prevalence of MetS, the OR (95%CIs) of MetS in men were 1.77 (1.45-2.15) in higher skilled white-collar, 1.82 (1.47-2.26) in lower-skilled white-collar, 1.63 (1.32-2.01) in pink-collar and 1.37 (1.13-1.66) in skilled blue-collar workers in the final logistic regression model. MetS and its components vary by occupational category and gender in ways that may guide health interventions. Am. J. Ind. Med. 59:685-694, 2016. © 2016 Wiley Periodicals, Inc.

  8. Current Pressure Transducer Application of Model-based Prognostics Using Steady State Conditions

    NASA Technical Reports Server (NTRS)

    Teubert, Christopher; Daigle, Matthew J.

    2014-01-01

    Prognostics is the process of predicting a system's future states, health degradation/wear, and remaining useful life (RUL). This information plays an important role in preventing failure, reducing downtime, scheduling maintenance, and improving system utility. Prognostics relies heavily on wear estimation. In some components, the sensors used to estimate wear may not be fast enough to capture brief transient states that are indicative of wear. For this reason it is beneficial to be capable of detecting and estimating the extent of component wear using steady-state measurements. This paper details a method for estimating component wear using steady-state measurements, describes how this is used to predict future states, and presents a case study of a current/pressure (I/P) Transducer. I/P Transducer nominal and off-nominal behaviors are characterized using a physics-based model, and validated against expected and observed component behavior. This model is used to map observed steady-state responses to corresponding fault parameter values in the form of a lookup table. This method was chosen because of its fast, efficient nature, and its ability to be applied to both linear and non-linear systems. Using measurements of the steady state output, and the lookup table, wear is estimated. A regression is used to estimate the wear propagation parameter and characterize the damage progression function, which are used to predict future states and the remaining useful life of the system.

  9. Analysis of Moisture Content in Beetroot using Fourier Transform Infrared Spectroscopy and by Principal Component Analysis.

    PubMed

    Nesakumar, Noel; Baskar, Chanthini; Kesavan, Srinivasan; Rayappan, John Bosco Balaguru; Alwarappan, Subbiah

    2018-05-22

    The moisture content of beetroot varies during long-term cold storage. In this work, we propose a strategy to identify the moisture content and age of beetroot using principal component analysis coupled with Fourier transform infrared spectroscopy (FTIR). Frequent FTIR measurements were recorded directly from the beetroot sample surface over a period of 34 days for analysing its moisture content, employing attenuated total reflectance in the spectral ranges of 2614-4000 and 1465-1853 cm-1 with a spectral resolution of 8 cm-1. In order to estimate the transmittance peak height (Tp) and the area under the transmittance curve over the spectral ranges of 2614-4000 and 1465-1853 cm-1, a Gaussian curve fitting algorithm was applied to the FTIR data. Principal component and nonlinear regression analyses were utilized for FTIR data analysis. Score plots over the ranges of 2614-4000 and 1465-1853 cm-1 allowed beetroot quality discrimination. Beetroot quality predictive models were developed by employing a biphasic dose-response function. Validation experiment results confirmed that the accuracy of the beetroot quality predictive model reached 97.5%. This research work shows that FTIR spectroscopy in combination with principal component analysis and beetroot quality predictive models could serve as an effective tool for discriminating the moisture content of beetroot samples in fresh, half-spoiled, and completely spoiled stages and for providing status alerts.
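
    The Gaussian curve-fitting step can be illustrated with scipy: fit a Gaussian to a band, then read off the peak height and the analytic band area. The spectrum below is synthetic and stands in for an ATR-FTIR water band of a beetroot sample.

      # Gaussian band fit: peak height and area under the transmittance curve.
      import numpy as np
      from scipy.optimize import curve_fit

      def gaussian(x, height, center, width):
          return height * np.exp(-((x - center) ** 2) / (2.0 * width ** 2))

      wavenumber = np.linspace(2614, 4000, 400)
      rng = np.random.default_rng(11)
      spectrum = gaussian(wavenumber, 0.6, 3300.0, 180.0) + 0.01 * rng.normal(size=wavenumber.size)

      (height, center, width), _ = curve_fit(gaussian, wavenumber, spectrum, p0=[0.5, 3300.0, 150.0])
      area = height * width * np.sqrt(2.0 * np.pi)   # analytic area of a Gaussian band
      print(f"peak height = {height:.3f}, band area = {area:.1f}")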

  10. Random regression analyses using B-spline functions to model growth of Nellore cattle.

    PubMed

    Boligon, A A; Mercadante, M E Z; Lôbo, R B; Baldi, F; Albuquerque, L G

    2012-02-01

    The objective of this study was to estimate (co)variance components using random regression on B-spline functions applied to weight records obtained from birth to adulthood. A total of 82,064 weight records of 8,145 females, obtained from the data bank of the Nellore Breeding Program (PMGRN/Nellore Brazil), which started in 1987, were used. The models included direct additive and maternal genetic effects and animal and maternal permanent environmental effects as random. Contemporary group and dam age at calving (linear and quadratic effect) were included as fixed effects, and orthogonal Legendre polynomials of age (cubic regression) were considered as random covariates. The random effects were modeled using B-spline functions considering linear, quadratic and cubic polynomials for each individual segment. Residual variances were grouped in five age classes. Direct additive genetic and animal permanent environmental effects were modeled using up to seven knots (six segments). A single segment with two knots at the end points of the curve was used for the estimation of maternal genetic and maternal permanent environmental effects. A total of 15 models were studied, with the number of parameters ranging from 17 to 81. The models that used B-splines were compared with multi-trait analyses of nine weight traits and with a random regression model that used orthogonal Legendre polynomials. A model fitting quadratic B-splines, with four knots or three segments for direct additive genetic effect and animal permanent environmental effect and two knots for maternal additive genetic effect and maternal permanent environmental effect, was the most appropriate and parsimonious model to describe the covariance structure of the data. Selection for higher weight, such as at young ages, should be performed taking into account an increase in mature cow weight. This is particularly important in most Nellore beef cattle production systems, where the cow herd is maintained on range conditions. There is limited scope for modifying the growth curve of Nellore cattle toward rapid growth at young ages while maintaining constant adult weight.
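
    For readers unfamiliar with B-spline covariates, the Python sketch below builds a quadratic B-spline basis over age with four knots (three segments), the structure identified above as most parsimonious; the age range and knot positions are illustrative rather than taken from the Nellore data.

        # Sketch: quadratic B-spline design matrix over age, used as covariates
        # in a random regression model.  Ages and knots are illustrative.
        import numpy as np
        from scipy.interpolate import BSpline

        degree = 2                                       # quadratic segments
        knots = np.array([1.0, 267.0, 534.0, 800.0])     # 4 knots -> 3 segments (days)
        t = np.concatenate(([knots[0]] * degree, knots, [knots[-1]] * degree))
        n_basis = len(t) - degree - 1                    # 5 basis functions here

        ages = np.linspace(1.0, 800.0, 200)
        design = np.column_stack([
            BSpline(t, np.eye(n_basis)[i], degree)(ages) for i in range(n_basis)
        ])
        print(design.shape)                              # (200, 5): one column per B-spline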

  11. Modeling Array Stations in SIG-VISA

    NASA Astrophysics Data System (ADS)

    Ding, N.; Moore, D.; Russell, S.

    2013-12-01

    We add support for array stations to SIG-VISA, a system for nuclear monitoring using probabilistic inference on seismic signals. Array stations comprise a large portion of the IMS network; they can provide increased sensitivity and more accurate directional information compared to single-component stations. Our existing model assumed that signals were independent at each station, which is false when many stations are located close together, as in an array. The new model removes that assumption by jointly modeling signals across array elements. This is done by extending our existing Gaussian process (GP) regression models, also known as kriging, from a 3-dimensional single-component space of events to a 6-dimensional space of station-event pairs. For each array and each event attribute (including coda decay, coda height, amplitude transfer and travel time), we model the joint distribution across array elements using a Gaussian process that learns the correlation lengthscale across the array, thereby incorporating information from array stations into the probabilistic inference framework. To evaluate the effectiveness of our model, we perform 'probabilistic beamforming' on new events using our GP model, i.e., we compute the event azimuth with the highest posterior probability under the model, conditioned on the signals at array elements. We compare the results from our probabilistic inference model to the beamforming currently performed by IMS station processing.
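
    A toy Python sketch of the kriging step is shown below: a Gaussian process regression of one signal attribute across array-element coordinates, with the correlation lengthscale learned from data. The kernel, coordinates, and observations are placeholders; SIG-VISA's actual 6-dimensional station-event parameterization is described in the paper.

        # Sketch: GP (kriging) regression of an attribute across array elements,
        # so that nearby elements are modeled as correlated, not independent.
        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF, WhiteKernel

        rng = np.random.default_rng(1)
        element_xy = rng.uniform(0.0, 3.0, size=(9, 2))     # element positions (km)
        attribute = np.sin(element_xy[:, 0]) + 0.05 * rng.normal(size=9)   # e.g. residual

        kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
        gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(element_xy, attribute)

        mean, std = gp.predict(np.array([[1.5, 1.5]]), return_std=True)
        print(f"predicted attribute {mean[0]:.3f} +/- {std[0]:.3f}")
        print(gp.kernel_)                                   # learned lengthscale across the array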

  12. Emergency department patient perceptions and preferences on opt-in rapid HIV screening program components

    PubMed Central

    Merchant, Roland C.; Clark, Melissa A.; Seage, George R.; Mayer, Kenneth H.; DeGruttola, Victor G.; Becker, Bruce M.

    2011-01-01

    The aim of this investigation was to assess emergency department (ED) patients’ perceptions and preferences about an opt-in, universal, rapid HIV screening program and identify patient groups who expressed stronger beliefs about components of the testing program. From July 2005 to July 2006, ED patients in the opt-in, universal, rapid HIV screening program were interviewed in person. Multivariable regression models were used to compare participants on their beliefs about the program components. Of the 561 participants, 62.0% had previously been tested for HIV. The majority of participants (58.8%) believed the rapid and standard/conventional HIV tests to be equally accurate, 27.7% believed the rapid test to be less or much less accurate, and 8.7% believed the rapid test to be more or much more accurate. Almost two-thirds (65.1%) favored having a rapid instead of a standard/conventional HIV test, 94.6% wanted the test results within one hour, and 61.3% would be likely or very likely to undergo testing in the ED if it prolonged their ED visit. Almost all (92.5%) believed that their medical care was “not at all” delayed because of being tested, 94.1% believed that testing did “not at all” divert attention from the reason for their ED visit, and 80.9% thought that testing in the ED was “not at all” stressful. In multivariable logistic regression models, males and those with more than 12 years of formal education showed greater concerns about the rapid HIV test’s accuracy. Hispanic/Latinos, participants with governmental insurance, and those previously HIV tested were more apt to be screened for HIV even if testing delayed their ED departure. Overall, participants were highly accepting of the components of this opt-in rapid HIV screening program. However, concerns regarding the accuracy of the rapid HIV test might limit test acceptance and should be addressed during pre-test information procedures. PMID:19283644

  13. Least-Squares Regression and Spectral Residual Augmented Classical Least-Squares Chemometric Models for Stability-Indicating Analysis of Agomelatine and Its Degradation Products: A Comparative Study.

    PubMed

    Naguib, Ibrahim A; Abdelrahman, Maha M; El Ghobashy, Mohamed R; Ali, Nesma A

    2016-01-01

    Two accurate, sensitive, and selective stability-indicating methods are developed and validated for simultaneous quantitative determination of agomelatine (AGM) and its forced degradation products (Deg I and Deg II), whether in pure forms or in pharmaceutical formulations. Partial least-squares regression (PLSR) and spectral residual augmented classical least-squares (SRACLS) are two chemometric models compared here through their handling of UV spectral data in the range of 215-350 nm. For proper analysis, a three-factor, four-level experimental design was established, resulting in a training set consisting of 16 mixtures containing different ratios of interfering species. An independent test set consisting of eight mixtures was used to validate the prediction ability of the suggested models. The results presented indicate the ability of the multivariate calibration models to analyze AGM, Deg I, and Deg II with high selectivity and accuracy. The analysis results of the pharmaceutical formulations were statistically compared to the reference HPLC method, with no significant differences observed regarding accuracy and precision. The SRACLS model gives comparable results to the PLSR model; however, it keeps the qualitative spectral information of the classical least-squares algorithm for analyzed components.

  14. Modeling and Prediction of Solvent Effect on Human Skin Permeability using Support Vector Regression and Random Forest.

    PubMed

    Baba, Hiromi; Takahara, Jun-ichi; Yamashita, Fumiyoshi; Hashida, Mitsuru

    2015-11-01

    The solvent effect on skin permeability is important for assessing the effectiveness and toxicological risk of new dermatological formulations in pharmaceuticals and cosmetics development. The solvent effect occurs by diverse mechanisms, which could be elucidated by efficient and reliable prediction models. However, such prediction models have been hampered by the small variety of permeants and mixture components archived in databases and by low predictive performance. Here, we propose a solution to both problems. We first compiled a novel large database of 412 samples from 261 structurally diverse permeants and 31 solvents reported in the literature. The data were carefully screened to ensure their collection under consistent experimental conditions. To construct a high-performance predictive model, we then applied support vector regression (SVR) and random forest (RF) with greedy stepwise descriptor selection to our database. The models were internally and externally validated. The SVR achieved higher performance statistics than RF. The (externally validated) determination coefficient, root mean square error, and mean absolute error of SVR were 0.899, 0.351, and 0.268, respectively. Moreover, because all descriptors are fully computational, our method can predict as-yet unsynthesized compounds. Our high-performance prediction model offers an attractive alternative to permeability experiments for pharmaceutical and cosmetic candidate screening and optimizing skin-permeable topical formulations.
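
    A compact Python sketch of SVR with greedy stepwise descriptor selection, in the spirit of the approach above, is given below; the descriptor matrix and response are random stand-ins for the compiled permeability database, and the SVR hyperparameters are assumptions.

        # Sketch: greedy forward descriptor selection around support vector
        # regression, scored by cross-validated R^2 on stand-in data.
        import numpy as np
        from sklearn.svm import SVR
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(120, 10))                    # 10 candidate descriptors
        y = X[:, 0] - 0.5 * X[:, 3] + 0.1 * rng.normal(size=120)   # assumed log kp

        selected, remaining, best_score = [], list(range(X.shape[1])), -np.inf
        while remaining:
            scores = {}
            for j in remaining:
                model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
                scores[j] = cross_val_score(model, X[:, selected + [j]], y, cv=5, scoring="r2").mean()
            j_best = max(scores, key=scores.get)
            if scores[j_best] <= best_score:              # stop when no descriptor helps
                break
            selected.append(j_best)
            remaining.remove(j_best)
            best_score = scores[j_best]

        print(f"selected descriptors {selected}, cross-validated R^2 = {best_score:.3f}")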

  15. Bayesian regression model for recurrent event data with event-varying covariate effects and event effect.

    PubMed

    Lin, Li-An; Luo, Sheng; Davis, Barry R

    2018-01-01

    In the course of hypertension, cardiovascular disease events (e.g., stroke, heart failure) occur frequently and recurrently. The scientific interest in such a study may lie in the estimation of treatment effect while accounting for the correlation among event times. The correlation among recurrent event times comes from two sources: subject-specific heterogeneity (e.g., varied lifestyles, genetic variations, and other unmeasurable effects) and event dependence (i.e., event incidences may change the risk of future recurrent events). Moreover, event incidences may change the disease progression so that there may exist event-varying covariate effects (the covariate effects may change after each event) and event effect (the effect of prior events on future events). In this article, we propose a Bayesian regression model that not only accommodates correlation among recurrent events from both sources, but also explicitly characterizes the event-varying covariate effects and event effect. This model is especially useful in quantifying how the incidences of events change the effects of covariates and risk of future events. We compare the proposed model with several commonly used recurrent event models and apply our model to the motivating lipid-lowering trial (LLT) component of the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) (ALLHAT-LLT).

  16. Bayesian regression model for recurrent event data with event-varying covariate effects and event effect

    PubMed Central

    Lin, Li-An; Luo, Sheng; Davis, Barry R.

    2017-01-01

    In the course of hypertension, cardiovascular disease events (e.g., stroke, heart failure) occur frequently and recurrently. The scientific interest in such a study may lie in the estimation of treatment effect while accounting for the correlation among event times. The correlation among recurrent event times comes from two sources: subject-specific heterogeneity (e.g., varied lifestyles, genetic variations, and other unmeasurable effects) and event dependence (i.e., event incidences may change the risk of future recurrent events). Moreover, event incidences may change the disease progression so that there may exist event-varying covariate effects (the covariate effects may change after each event) and event effect (the effect of prior events on future events). In this article, we propose a Bayesian regression model that not only accommodates correlation among recurrent events from both sources, but also explicitly characterizes the event-varying covariate effects and event effect. This model is especially useful in quantifying how the incidences of events change the effects of covariates and risk of future events. We compare the proposed model with several commonly used recurrent event models and apply our model to the motivating lipid-lowering trial (LLT) component of the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) (ALLHAT-LLT). PMID:29755162

  17. A basin-scale approach to estimating stream temperatures of tributaries to the lower Klamath River, California

    USGS Publications Warehouse

    Flint, L.E.; Flint, A.L.

    2008-01-01

    Stream temperature is an important component of salmonid habitat and is often above levels suitable for fish survival in the Lower Klamath River in northern California. The objective of this study was to provide boundary conditions for models that are assessing stream temperature on the main stem for the purpose of developing strategies to manage stream conditions using Total Maximum Daily Loads. For model input, hourly stream temperatures for 36 tributaries were estimated for 1 Jan. 2001 through 31 Oct. 2004. A basin-scale approach incorporating spatially distributed energy balance data was used to estimate the stream temperatures with measured air temperature and relative humidity data and simulated solar radiation, including topographic shading and corrections for cloudiness. Regression models were developed on the basis of available stream temperature data to predict temperatures for unmeasured periods of time and for unmeasured streams. The most significant factor in matching measured minimum and maximum stream temperatures was the seasonality of the estimate. Adding minimum and maximum air temperature to the regression model improved the estimate, and air temperature data over the region are available and easily distributed spatially. The addition of simulated solar radiation and vapor saturation deficit to the regression model significantly improved predictions of maximum stream temperature but was not required to predict minimum stream temperature. The average SE in estimated maximum daily stream temperature for the individual basins was 0.9 ± 0.6°C at the 95% confidence interval. Copyright © 2008 by the American Society of Agronomy, Crop Science Society of America, and Soil Science Society of America. All rights reserved.

  18. Characterization of the spatial variability of soil available zinc at various sampling densities using grouped soil type information.

    PubMed

    Song, Xiao-Dong; Zhang, Gan-Lin; Liu, Feng; Li, De-Cheng; Zhao, Yu-Guo

    2016-11-01

    The influence of anthropogenic activities and natural processes introduces high uncertainty into the spatial variation modeling of soil available zinc (AZn) in plain river network regions. Four datasets with different sampling densities were split over the Qiaocheng district of Bozhou City, China. Differences in AZn concentrations among soil types were analyzed by principal component analysis (PCA). Since stationarity was not indicated and the effective ranges of the four datasets were larger than the sampling extent (about 400 m), two investigation tools, the F3 test and the stationarity index (SI), were employed to test for local non-stationarity. The geographically weighted regression (GWR) technique was used to describe the spatial heterogeneity of AZn concentrations under the non-stationarity assumption. GWR based on grouped soil type information (GWRG for short) was proposed to benefit the local modeling of soil AZn within each soil-landscape unit. For reference, the multiple linear regression (MLR) model, a global regression technique, was also employed and incorporated the same predictors as the GWR models. Validation results based on 100 realizations demonstrated that GWRG outperformed MLR and produced similar or better accuracy than the GWR approach. Moreover, GWRG can generate better soil maps than GWR when soil data are limited. A two-sample t test of the produced soil maps also confirmed significantly different means. Variogram analysis of the model residuals exhibited weak spatial correlation, ruling out the use of hybrid kriging techniques. As a heuristic statistical method, GWRG was beneficial in this study and is potentially useful for other soil properties.
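
    At its core, GWR fits a separate weighted least-squares regression at each location, with weights that decay with distance. The Python sketch below shows one such local fit with a Gaussian kernel; coordinates, covariates, and bandwidth are synthetic, and the study's GWRG variant additionally groups observations by soil type.

        # Sketch: one local GWR fit at a target location using Gaussian
        # distance-decay weights and weighted least squares.
        import numpy as np

        rng = np.random.default_rng(2)
        coords = rng.uniform(0, 10_000, size=(200, 2))                    # locations (m)
        X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])    # intercept + covariates
        y = X @ np.array([1.0, 0.5, -0.3]) + 0.1 * rng.normal(size=200)   # e.g. soil AZn

        def gwr_at(target_xy, bandwidth=1_500.0):
            d = np.linalg.norm(coords - target_xy, axis=1)
            w = np.exp(-0.5 * (d / bandwidth) ** 2)                       # Gaussian kernel weights
            W = np.diag(w)
            # Local weighted least squares: beta = (X'WX)^-1 X'Wy.
            return np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

        print(gwr_at(np.array([5_000.0, 5_000.0])))                       # local coefficients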

  19. Sensitivity of ALOS/PALSAR imagery to forest degradation by fire in northern Amazon

    NASA Astrophysics Data System (ADS)

    Martins, Flora da Silva Ramos Vieira; dos Santos, João Roberto; Galvão, Lênio Soares; Xaud, Haron Abrahim Magalhães

    2016-07-01

    We evaluated the sensitivity of the full polarimetric Phased Array type L-band Synthetic Aperture Radar (PALSAR), onboard the Advanced Land Observing Satellite (ALOS), to forest degradation caused by fires in northern Amazon, Brazil. We searched for changes in PALSAR signal and tri-dimensional polarimetric responses for different classes of fire disturbance defined by fire frequency and severity. Since the aboveground biomass (AGB) is affected by fire, multiple regression models to estimate AGB were obtained for the whole set of coherent and incoherent attributes (general model) and for each set separately (specific models). The results showed that the polarimetric L-band PALSAR attributes were sensitive to variations in canopy structure and AGB caused by forest fire. However, except for the unburned and thrice burned classes, no single PALSAR attribute was able to discriminate between the intermediate classes of forest degradation by fire. Both the coherent and incoherent polarimetric attributes were important to explain AGB variations in tropical forests affected by fire. The HV backscattering coefficient, anisotropy, double-bounce component, orientation angle, volume index and HH-VV phase difference were PALSAR attributes selected from multiple regression analysis to estimate AGB. The general regression model, combining phase and power radar metrics, presented better results than specific models using coherent or incoherent attributes. The polarimetric responses indicated the dominance of VV-oriented backscattering in primary forest and lightly burned forests. The HH-oriented backscattering predominated in heavily and frequently burned forests. The results suggested a greater contribution of horizontally arranged constituents such as fallen trunks or branches in areas severely affected by fire.

  20. Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush–steppe ecosystem

    USGS Publications Warehouse

    Wylie, Bruce K.; Johnson, Douglas A.; Laca, Emilio; Saliendra, Nicanor Z.; Gilmanov, Tagir G.; Reed, Bradley C.; Tieszen, Larry L.; Worstell, Bruce B.

    2003-01-01

    The net ecosystem exchange (NEE) of carbon flux can be partitioned into gross primary productivity (GPP) and respiration (R). The contribution of remote sensing and modeling holds the potential to predict these components and map them spatially and temporally. This has obvious utility to quantify carbon sink and source relationships and to identify improved land management strategies for optimizing carbon sequestration. The objective of our study was to evaluate prediction of 14-day average daytime CO2 fluxes (Fday) and nighttime CO2 fluxes (Rn) using remote sensing and other data. Fday and Rn were measured with a Bowen ratio–energy balance (BREB) technique in a sagebrush (Artemisia spp.)–steppe ecosystem in northeast Idaho, USA, during 1996–1999. Micrometeorological variables aggregated across 14-day periods and time-integrated Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (iNDVI) were determined during four growing seasons (1996–1999) and used to predict Fday and Rn. We found that iNDVI was a strong predictor of Fday (R2=0.79, n=66, P<0.0001). Inclusion of evapotranspiration in the predictive equation led to improved predictions of Fday (R2=0.82, n=66, P<0.0001). Crossvalidation indicated that regression tree predictions of Fday were prone to overfitting and that linear regression models were more robust. Multiple regression and regression tree models predicted Rn quite well (R2=0.75–0.77, n=66) with the regression tree model being slightly more robust in crossvalidation. Temporal mapping of Fday and Rn is possible with these techniques and would allow the assessment of NEE in sagebrush–steppe ecosystems. Simulations of periodic Fday measurements, as might be provided by a mobile flux tower, indicated that such measurements could be used in combination with iNDVI to accurately predict Fday. These periodic measurements could maximize the utility of expensive flux towers for evaluating various carbon management strategies, carbon certification, and validation and calibration of carbon flux models.

  1. Calibration of remotely sensed, coarse resolution NDVI to CO2 fluxes in a sagebrush-steppe ecosystem

    USGS Publications Warehouse

    Wylie, B.K.; Johnson, D.A.; Laca, Emilio; Saliendra, Nicanor Z.; Gilmanov, T.G.; Reed, B.C.; Tieszen, L.L.; Worstell, B.B.

    2003-01-01

    The net ecosystem exchange (NEE) of carbon flux can be partitioned into gross primary productivity (GPP) and respiration (R). The contribution of remote sensing and modeling holds the potential to predict these components and map them spatially and temporally. This has obvious utility to quantify carbon sink and source relationships and to identify improved land management strategies for optimizing carbon sequestration. The objective of our study was to evaluate prediction of 14-day average daytime CO2 fluxes (Fday) and nighttime CO2 fluxes (Rn) using remote sensing and other data. Fday and Rn were measured with a Bowen ratio-energy balance (BREB) technique in a sagebrush (Artemisia spp.)-steppe ecosystem in northeast Idaho, USA, during 1996-1999. Micrometeorological variables aggregated across 14-day periods and time-integrated Advanced Very High Resolution Radiometer (AVHRR) Normalized Difference Vegetation Index (iNDVI) were determined during four growing seasons (1996-1999) and used to predict Fday and Rn. We found that iNDVI was a strong predictor of Fday (R2 = 0.79, n = 66, P < 0.0001). Inclusion of evapotranspiration in the predictive equation led to improved predictions of Fday (R2 = 0.82, n = 66, P < 0.0001). Crossvalidation indicated that regression tree predictions of Fday were prone to overfitting and that linear regression models were more robust. Multiple regression and regression tree models predicted Rn quite well (R2 = 0.75-0.77, n = 66) with the regression tree model being slightly more robust in crossvalidation. Temporal mapping of Fday and Rn is possible with these techniques and would allow the assessment of NEE in sagebrush-steppe ecosystems. Simulations of periodic Fday measurements, as might be provided by a mobile flux tower, indicated that such measurements could be used in combination with iNDVI to accurately predict Fday. These periodic measurements could maximize the utility of expensive flux towers for evaluating various carbon management strategies, carbon certification, and validation and calibration of carbon flux models. © 2003 Elsevier Science Inc. All rights reserved.

  2. An improved model for the combustion of AP composite propellants

    NASA Technical Reports Server (NTRS)

    Cohen, N. S.; Strand, L. D.

    1981-01-01

    This paper presents several improvements to the BDP model of steady-state burning of AP composite solid propellants. The Price-Boggs-Derr model of AP monopropellant burning is incorporated to represent the AP. A separate energy equation is written for the binder to permit a different surface temperature from the AP; this includes an analysis of the sharing of primary diffusion flame energy, and correction of a BDP model inconsistency in treating the binder regression rate. A method for assembling component contributions to calculate the burning rates of multimodal propellants is also presented. Results are shown in the form of representative burning rate curves, comparisons with data, and calculated internal details of interest. Ideas for future work are discussed in an Appendix.

  3. A church-based diet and physical activity intervention for rural, lower Mississippi Delta African American adults: Delta Body and Soul effectiveness study, 2010-2011.

    PubMed

    Tussing-Humphreys, Lisa; Thomson, Jessica L; Mayo, Tanyatta; Edmond, Emanuel

    2013-06-06

    Obesity, diabetes, and hypertension have reached epidemic levels in the largely rural Lower Mississippi Delta (LMD) region. We assessed the effectiveness of a 6-month, church-based diet and physical activity intervention, conducted during 2010 through 2011, for improving diet quality (measured by the Healthy Eating Index-2005) and increasing physical activity of African American adults in the LMD region. We used a quasi-experimental design in which 8 self-selected eligible churches were assigned to intervention or control. Assessments included dietary, physical activity, anthropometric, and clinical measures. Statistical tests for group comparisons included χ², Fisher's exact, and McNemar's tests for categorical variables, and mixed-model regression analysis for continuous variables and modeling intervention effects. Retention rates were 85% (176 of 208) for control and 84% (163 of 195) for intervention churches. Diet quality components, including total fruit, total vegetables, and total quality improved significantly in both control (mean [standard deviation], 0.3 [1.8], 0.2 [1.1], and 3.4 [9.6], respectively) and intervention (0.6 [1.7], 0.3 [1.2], and 3.2 [9.7], respectively) groups, while significant increases in aerobic (22%) and strength/flexibility (24%) physical activity indicators were apparent in the intervention group only. Regression analysis indicated that intervention participation level and vehicle ownership were significant positive predictors of change for several diet quality components. This church-based diet and physical activity intervention may be effective in improving diet quality and increasing physical activity of LMD African American adults. Components key to the success of such programs are participant engagement in educational sessions and vehicle access.

  4. Indirect Measurement Of Nitrogen In A Multi-Component Gas By Measuring The Speed Of Sound At Two States Of The Gas.

    DOEpatents

    Morrow, Thomas B.; Behring, II, Kendricks A.

    2004-10-12

    A method for indirectly measuring the nitrogen concentration in a gas mixture. The molecular weight of the gas is modeled as a function of the speed of sound in the gas, the diluent concentrations in the gas, and constant values, resulting in a model equation. Regression analysis is used to calculate the constant values, which can then be substituted into the model equation. If the speed of sound in the gas is measured at two states and diluent concentrations other than nitrogen (typically carbon dioxide) are known, two equations for molecular weight can be equated and solved for the nitrogen concentration in the gas mixture.
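
    The two-state solve can be illustrated with a hedged Python sketch: molecular weight is modeled at each state as a linear function of the measured speed of sound, the known CO2 fraction, and the unknown N2 fraction, and equating the two expressions isolates the N2 fraction. The linear form and every coefficient below are assumed stand-ins, not the patent's regression constants.

        # Sketch: equate two assumed linear molecular-weight models (one per
        # measurement state) and solve the resulting equation for the N2 fraction.
        def nitrogen_fraction(s1, s2, x_co2, c1, c2):
            """c1, c2: (b0, b_speed, b_co2, b_n2) assumed constants for states 1 and 2."""
            b0_1, bs_1, bc_1, bn_1 = c1
            b0_2, bs_2, bc_2, bn_2 = c2
            # MW1(x_n2) = MW2(x_n2)  ->  a single linear equation in x_n2.
            numerator = (b0_2 - b0_1) + bs_2 * s2 - bs_1 * s1 + (bc_2 - bc_1) * x_co2
            return numerator / (bn_1 - bn_2)

        x_n2 = nitrogen_fraction(
            s1=430.0, s2=452.0, x_co2=0.01,              # made-up speeds of sound (m/s)
            c1=(30.0, -0.020, 12.0, 9.0),
            c2=(28.5, -0.015, 11.0, 6.5),
        )
        print(f"estimated N2 mole fraction: {x_n2:.3f}")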

  5. WAVELET-DOMAIN REGRESSION AND PREDICTIVE INFERENCE IN PSYCHIATRIC NEUROIMAGING

    PubMed Central

    Reiss, Philip T.; Huo, Lan; Zhao, Yihong; Kelly, Clare; Ogden, R. Todd

    2016-01-01

    An increasingly important goal of psychiatry is the use of brain imaging data to develop predictive models. Here we present two contributions to statistical methodology for this purpose. First, we propose and compare a set of wavelet-domain procedures for fitting generalized linear models with scalar responses and image predictors: sparse variants of principal component regression and of partial least squares, and the elastic net. Second, we consider assessing the contribution of image predictors over and above available scalar predictors, in particular via permutation tests and an extension of the idea of confounding to the case of functional or image predictors. Using the proposed methods, we assess whether maps of a spontaneous brain activity measure, derived from functional magnetic resonance imaging, can meaningfully predict presence or absence of attention deficit/hyperactivity disorder (ADHD). Our results shed light on the role of confounding in the surprising outcome of the recent ADHD-200 Global Competition, which challenged researchers to develop algorithms for automated image-based diagnosis of the disorder. PMID:27330652

  6. Feasibility of using near infrared spectroscopy to detect and quantify an adulterant in high quality sandalwood oil.

    PubMed

    Kuriakose, Saji; Joe, I Hubert

    2013-11-01

    Determination of the authenticity of essential oils has become more significant in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least squares regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC=0.00009% v/v). The lowest root mean square error of prediction (RMSEP=0.00016% v/v) in the test set and the highest coefficient of determination (R² = 0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model. Copyright © 2013 Elsevier B.V. All rights reserved.

  7. Assessing the Liquidity of Firms: Robust Neural Network Regression as an Alternative to the Current Ratio

    NASA Astrophysics Data System (ADS)

    de Andrés, Javier; Landajo, Manuel; Lorca, Pedro; Labra, Jose; Ordóñez, Patricia

    Artificial neural networks have proven to be useful tools for solving financial analysis problems such as financial distress prediction and audit risk assessment. In this paper we focus on the performance of robust (least absolute deviation-based) neural networks on measuring the liquidity of firms. The problem of learning the bivariate relationship between the components (namely, current liabilities and current assets) of the so-called current ratio is analyzed, and the predictive performance of several modelling paradigms (namely, linear and log-linear regressions, classical ratios and neural networks) is compared. An empirical analysis is conducted on a representative database from the Spanish economy. Results indicate that classical ratio models are largely inadequate as a realistic description of the studied relationship, especially when used for predictive purposes. In a number of cases, especially when the analyzed firms are microenterprises, the linear specification is improved by considering the flexible non-linear structures provided by neural networks.

  8. How to predict the sugariness and hardness of melons: A near-infrared hyperspectral imaging method.

    PubMed

    Sun, Meijun; Zhang, Dong; Liu, Li; Wang, Zheng

    2017-03-01

    Hyperspectral imaging (HSI) in the near-infrared (NIR) region (900-1700 nm) was used for non-intrusive quality measurements (of sweetness and texture) in melons. First, HSI data from melon samples were acquired to extract the spectral signatures. The corresponding sample sweetness and hardness values were recorded using traditional intrusive methods. Partial least squares regression (PLSR), principal component analysis (PCA), support vector machine (SVM), and artificial neural network (ANN) models were created to predict melon sweetness and hardness values from the hyperspectral data. Experimental results for the three types of melons show that PLSR produces the most accurate results. To reduce the high dimensionality of the hyperspectral data, the weighted regression coefficients of the resulting PLSR models were used to identify the most important wavelengths. On the basis of these wavelengths, each image pixel was used to visualize the sweetness and hardness in all the portions of each sample. Copyright © 2016 Elsevier Ltd. All rights reserved.
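
    The PLSR step and the coefficient-based wavelength selection can be sketched in Python as below; the spectra and sweetness values are random stand-ins, and the number of latent variables is an assumption.

        # Sketch: PLS regression of sweetness on NIR spectra, then ranking of
        # wavelengths by the magnitude of the PLS regression coefficients.
        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(3)
        wavelengths = np.linspace(900, 1700, 256)               # nm
        X = rng.normal(size=(80, 256))                          # mean sample spectra
        y = X[:, 60] - 0.4 * X[:, 180] + 0.1 * rng.normal(size=80)   # assumed sweetness

        pls = PLSRegression(n_components=8)
        y_cv = cross_val_predict(pls, X, y, cv=5).ravel()
        r2_cv = 1 - np.sum((y - y_cv) ** 2) / np.sum((y - y.mean()) ** 2)
        print(f"cross-validated R^2 = {r2_cv:.2f}")

        pls.fit(X, y)
        coef = np.abs(pls.coef_).ravel()                        # |regression coefficients|
        top = np.sort(wavelengths[np.argsort(coef)[-5:]])
        print("most informative wavelengths (nm):", top.round(1))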

  9. Atmospheric concentrations, sources and gas-particle partitioning of PAHs in Beijing after the 29th Olympic Games.

    PubMed

    Ma, Wan-Li; Sun, De-Zhi; Shen, Wei-Guo; Yang, Meng; Qi, Hong; Liu, Li-Yan; Shen, Ji-Min; Li, Yi-Fan

    2011-07-01

    A comprehensive sampling campaign was carried out to study atmospheric concentrations of polycyclic aromatic hydrocarbons (PAHs) in Beijing and to evaluate the effectiveness of source control strategies in reducing PAHs pollution after the 29th Olympic Games. The sub-cooled liquid vapor pressure (log PL°)-based model and the octanol-air partition coefficient (Koa)-based model were applied to each seasonal dataset. Regression analysis among log KP, log PL°, and log Koa exhibited highly significant correlations for all four seasons. Source factors were identified by principal component analysis and contributions were further estimated by multiple linear regression. Pyrogenic sources and coke oven emission were identified as major sources for both the non-heating and heating seasons. Compared with the literature, the mean PAH concentrations before and after the 29th Olympic Games were reduced by more than 60%, indicating that the source control measures were effective for reducing PAHs pollution in Beijing. Copyright © 2011 Elsevier Ltd. All rights reserved.

  10. Intelligent Optimization of the Film-to-Fiber Ratio of a Degradable Braided Bicomponent Ureteral Stent

    PubMed Central

    Liu, Xiaoyan; Li, Feng; Ding, Yongsheng; Zou, Ting; Wang, Lu; Hao, Kuangrong

    2015-01-01

    A hierarchical support vector regression (SVR) model (HSVRM) was employed to correlate the compositions and mechanical properties of bicomponent stents composed of poly(lactic-co-glycolic acid) (PGLA) film and poly(glycolic acid) (PGA) fibers for ureteral repair for the first time. PGLA film and PGA fibers could provide ureteral stents with good compressive and tensile properties, respectively. In bicomponent stents, high film content led to high stiffness, while high fiber content resulted in poor compressional properties. To simplify the procedures to optimize the ratio of PGLA film and PGA fiber in the stents, a hierarchical support vector regression model (HSVRM) and particle swarm optimization (PSO) algorithm were used to construct relationships between the film-to-fiber weight ratio and the measured compressional/tensile properties of the stents. The experimental data and simulated data fit well, proving that the HSVRM could closely reflect the relationship between the component ratio and performance properties of the ureteral stents.

  11. Feasibility of using near infrared spectroscopy to detect and quantify an adulterant in high quality sandalwood oil

    NASA Astrophysics Data System (ADS)

    Kuriakose, Saji; Joe, I. Hubert

    2013-11-01

    Determination of the authenticity of essential oils has become more significant in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least squares regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC = 0.00009% v/v). The lowest root mean square error of prediction (RMSEP = 0.00016% v/v) in the test set and the highest coefficient of determination (R2 = 0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model.

  12. Relationship between the Decomposition Process of Coarse Woody Debris and Fungal Community Structure as Detected by High-Throughput Sequencing in a Deciduous Broad-Leaved Forest in Japan

    PubMed Central

    Yamashita, Satoshi; Masuya, Hayato; Abe, Shin; Masaki, Takashi; Okabe, Kimiko

    2015-01-01

    We examined the relationship between the community structure of wood-decaying fungi, detected by high-throughput sequencing, and the decomposition rate using 13 years of data from a forest dynamics plot. For molecular analysis and wood density measurements, drill dust samples were collected from logs and stumps of Fagus and Quercus in the plot. Regression using a negative exponential model between wood density and time since death revealed that the decomposition rate of Fagus was greater than that of Quercus. The residual between the expected value obtained from the regression curve and the observed wood density was used as a decomposition rate index. Principal component analysis showed that the fungal community compositions of both Fagus and Quercus changed with time since death. Principal component analysis axis scores were used as an index of fungal community composition. A structural equation model for each wood genus was used to assess the effect of fungal community structure traits on the decomposition rate and how the fungal community structure was determined by the traits of coarse woody debris. Results of the structural equation model suggested that the decomposition rate of Fagus was affected by two fungal community composition components: one that was affected by time since death and another that was not affected by the traits of coarse woody debris. In contrast, the decomposition rate of Quercus was not affected by coarse woody debris traits or fungal community structure. These findings suggest that, in the case of Fagus coarse woody debris, the fungal community structure is related to the decomposition process of its host substrate. Because fungal community structure is affected partly by the decay stage and wood density of its substrate, these factors influence each other. Further research on interactive effects is needed to improve our understanding of the relationship between fungal community structure and the woody debris decomposition process. PMID:26110605
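
    The negative exponential regression and the residual-based decomposition rate index can be sketched in Python as follows; the density-versus-time data are synthetic placeholders, not the plot measurements.

        # Sketch: fit rho(t) = rho0 * exp(-k t) to wood density vs. time since
        # death, and use the residual from the curve as a decomposition index.
        import numpy as np
        from scipy.optimize import curve_fit

        def decay(t, rho0, k):
            return rho0 * np.exp(-k * t)

        years = np.array([1, 2, 4, 6, 8, 10, 13], dtype=float)
        density = np.array([0.55, 0.52, 0.44, 0.38, 0.33, 0.30, 0.24])   # g/cm^3, assumed

        (rho0, k), _ = curve_fit(decay, years, density, p0=[0.6, 0.05])
        residual_index = density - decay(years, rho0, k)   # > 0: slower decay than expected
        print(f"rho0 = {rho0:.2f} g/cm^3, k = {k:.3f} per year")
        print("decomposition rate index:", residual_index.round(3))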

  13. Gender, Position of Authority, and the Risk of Depression and Posttraumatic Stress Disorder among a National Sample of U.S. Reserve Component Personnel.

    PubMed

    Cohen, Gregory H; Sampson, Laura A; Fink, David S; Wang, Jing; Russell, Dale; Gifford, Robert; Fullerton, Carol; Ursano, Robert; Galea, Sandro

    2016-01-01

    Recent U.S. military operations in Iraq and Afghanistan have seen dramatic increases in the proportion of women serving and the breadth of their occupational roles. General population studies suggest that women, compared with men, and persons with lower, as compared with higher, social position may be at greater risk of posttraumatic stress disorder (PTSD) and depression. However, these relations remain unclear in military populations. Accordingly, we aimed to estimate the effects of 1) gender, 2) military authority (i.e., rank), and 3) the interaction of gender and military authority on a) risk of most recent deployment-related PTSD and b) risk of depression since most recent deployment. Using a nationally representative sample of 1,024 previously deployed Reserve Component personnel surveyed in 2010, we constructed multivariable logistic regression models to estimate effects of interest. Weighted multivariable logistic regression models demonstrated no statistically significant associations between gender or authority, and either PTSD or depression. Interaction models demonstrated multiplicative statistical interaction between gender and authority for PTSD (beta = -2.37; p = .01), and depression (beta = -1.21; p = .057). Predicted probabilities of PTSD and depression, respectively, were lowest in male officers (0.06, 0.09), followed by male enlisted (0.07, 0.14), female enlisted (0.07, 0.15), and female officers (0.30, 0.25). Female officers in the Reserve Component may be at greatest risk for PTSD and depression after deployment, relative to their male and enlisted counterparts, and this relation is not explained by deployment trauma exposure. Future studies may fruitfully examine whether social support, family responsibilities peri-deployment, or contradictory class status may explain these findings. Copyright © 2016 Jacobs Institute of Women's Health. Published by Elsevier Inc. All rights reserved.
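
    The gender-by-rank interaction and the cell-wise predicted probabilities can be reproduced in outline with a standard logistic model; the Python sketch below uses simulated data, omits the study's survey weights, and its coefficients are not the published estimates.

        # Sketch: logistic regression of PTSD on gender, officer rank, and their
        # interaction, with predicted probabilities for the four cells.
        import numpy as np
        import pandas as pd
        import statsmodels.formula.api as smf

        rng = np.random.default_rng(4)
        n = 1000
        df = pd.DataFrame({"female": rng.integers(0, 2, n), "officer": rng.integers(0, 2, n)})
        logit_p = -2.5 + 0.1 * df["female"] + 0.2 * df["officer"] + 1.2 * df["female"] * df["officer"]
        df["ptsd"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

        model = smf.logit("ptsd ~ female * officer", data=df).fit(disp=False)
        print(model.params)                                    # includes the female:officer term

        cells = pd.DataFrame({"female": [0, 0, 1, 1], "officer": [1, 0, 0, 1]})
        cells["predicted_prob"] = model.predict(cells)
        print(cells)                                           # male officer ... female officer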

  14. Sequential Prediction of Literacy Achievement for Specific Learning Disabilities Contrasting in Impaired Levels of Language in Grades 4 to 9

    PubMed Central

    Sanders, Elizabeth A.; Berninger, Virginia W.; Abbott, Robert D.

    2017-01-01

    Sequential regression was used to evaluate whether language-related working memory components uniquely predict reading and writing achievement beyond cognitive-linguistic translation for students in grades 4–9 (N=103) with specific learning disabilities (SLDs) in subword handwriting (dysgraphia, n=25), word reading and spelling (dyslexia, n=60), or oral and written language (OWL LD, n=18). That is, SLDs are defined on the basis of cascading levels of language impairment (subword, word, and syntax/text). A 5-block regression model sequentially predicted literacy achievement from cognitive-linguistic translation (Block 1); working memory components for word form coding (Block 2), phonological and orthographic loops (Block 3), and supervisory focused or switching attention (Block 4); and SLD groups (Block 5). Results showed that cognitive-linguistic translation explained an average of 27% and 15% of the variance in reading and writing achievement, respectively, but working memory components explained an additional 39% and 27% of the variance. Orthographic word form coding uniquely predicted nearly every measure, whereas attention switching only uniquely predicted reading. Finally, differences in reading and writing persisted between dyslexia and dysgraphia, with dysgraphia higher, even after controlling for Block 1 to 4 predictors. Differences in literacy achievement between students with dyslexia and OWL LD were largely explained by the Block 1 predictors. Applications to identifying and teaching students with these SLDs are discussed.
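
    The block-wise structure can be mirrored with an ordinary sequential regression that reports the increment in R^2 as each block of predictors is added; in the Python sketch below the predictors, outcome, and block sizes are random stand-ins for the cognitive-linguistic and working memory measures.

        # Sketch: sequential (hierarchical) regression with delta-R^2 per block.
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(5)
        n = 103
        blocks = {
            "translation": rng.normal(size=(n, 2)),
            "word form":   rng.normal(size=(n, 3)),
            "loops":       rng.normal(size=(n, 2)),
            "attention":   rng.normal(size=(n, 2)),
        }
        y = blocks["translation"][:, 0] + 0.8 * blocks["word form"][:, 1] + rng.normal(size=n)

        X, prev_r2 = np.empty((n, 0)), 0.0
        for name, Xb in blocks.items():
            X = np.hstack([X, Xb])
            r2 = LinearRegression().fit(X, y).score(X, y)
            print(f"{name:12s} delta R^2 = {r2 - prev_r2:.3f}")
            prev_r2 = r2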

  15. Optimal weighted combinatorial forecasting model of QT dispersion of ECGs in Chinese adults.

    PubMed

    Wen, Zhang; Miao, Ge; Xinlei, Liu; Minyi, Cen

    2016-07-01

    This study aims to provide a scientific basis for unifying the reference value standard of QT dispersion of ECGs in Chinese adults. Three predictive models, a regression model, a principal component model, and an artificial neural network model, are combined to establish an optimal weighted combination model. The optimal weighted combination model and the single models are validated and compared. The optimal weighted combination model reduces the prediction risk of any single model and improves prediction precision. The geographical distribution of the reference value of QT dispersion in Chinese adults was mapped precisely using kriging methods. Once the geographical factors of a particular area are known, the reference value of QT dispersion for Chinese adults in that area can be estimated with the optimal weighted combination model, and the reference value anywhere in China can be read from the geographical distribution map.
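
    One common way to form such a combination is to choose non-negative weights summing to one that minimize the combined squared error; the Python sketch below uses this criterion with synthetic predictions, and the paper's exact weighting scheme may differ.

        # Sketch: optimal weighted combination of three model predictions.
        import numpy as np
        from scipy.optimize import minimize

        rng = np.random.default_rng(6)
        observed = rng.normal(40.0, 5.0, size=60)                  # e.g. QT dispersion (ms)
        preds = np.column_stack([
            observed + rng.normal(0.0, 2.0, 60),                   # regression model
            observed + rng.normal(1.0, 3.0, 60),                   # principal component model
            observed + rng.normal(-0.5, 2.5, 60),                  # neural network model
        ])

        def loss(w):
            return np.mean((preds @ w - observed) ** 2)

        res = minimize(loss, x0=np.full(3, 1 / 3), bounds=[(0, 1)] * 3,
                       constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
        print("combination weights:", res.x.round(3))
        print("combined RMSE:", round(float(np.sqrt(loss(res.x))), 2))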

  16. Impacts of land use and population density on seasonal surface water quality using a modified geographically weighted regression.

    PubMed

    Chen, Qiang; Mei, Kun; Dahlgren, Randy A; Wang, Ting; Gong, Jian; Zhang, Minghua

    2016-12-01

    As an important regulator of pollutants in overland flow and interflow, land use has become an essential research component for determining the relationships between surface water quality and pollution sources. This study investigated the use of ordinary least squares (OLS) and geographically weighted regression (GWR) models to identify the impact of land use and population density on surface water quality in the Wen-Rui Tang River watershed of eastern China. A manual variable excluding-selecting method was explored to resolve multicollinearity issues. Standard regression coefficient analysis coupled with cluster analysis was introduced to determine which variable had the greatest influence on water quality. Results showed that: (1) Impact of land use on water quality varied with spatial and seasonal scales. Both positive and negative effects for certain land-use indicators were found in different subcatchments. (2) Urban land was the dominant factor influencing N, P and chemical oxygen demand (COD) in highly urbanized regions, but the relationship was weak as the pollutants were mainly from point sources. Agricultural land was the primary factor influencing N and P in suburban and rural areas; the relationship was strong as the pollutants were mainly from agricultural surface runoff. Subcatchments located in suburban areas were identified with urban land as the primary influencing factor during the wet season while agricultural land was identified as a more prevalent influencing factor during the dry season. (3) Adjusted R² values in OLS models using the manual variable excluding-selecting method averaged 14.3% higher than using stepwise multiple linear regressions. However, the corresponding GWR models had adjusted R² ~59.2% higher than the optimal OLS models, confirming that GWR models demonstrated better prediction accuracy. Based on our findings, water resource protection policies should consider site-specific land-use conditions within each watershed to optimize mitigation strategies for contrasting land-use characteristics and seasonal variations. Copyright © 2016 Elsevier B.V. All rights reserved.

  17. Modeling pollen time series using seasonal-trend decomposition procedure based on LOESS smoothing

    NASA Astrophysics Data System (ADS)

    Rojo, Jesús; Rivero, Rosario; Romero-Morte, Jorge; Fernández-González, Federico; Pérez-Badia, Rosa

    2017-02-01

    Analysis of airborne pollen concentrations provides valuable information on plant phenology and is thus a useful tool in agriculture—for predicting harvests in crops such as the olive and for deciding when to apply phytosanitary treatments—as well as in medicine and the environmental sciences. Variations in airborne pollen concentrations, moreover, are indicators of changing plant life cycles. By modeling pollen time series, we can not only identify the variables influencing pollen levels but also predict future pollen concentrations. In this study, airborne pollen time series were modeled using a seasonal-trend decomposition procedure based on LOcally wEighted Scatterplot Smoothing (LOESS) smoothing (STL). The data series—daily Poaceae pollen concentrations over the period 2006-2014—was broken up into seasonal and residual (stochastic) components. The seasonal component was compared with data on Poaceae flowering phenology obtained by field sampling. Residuals were fitted to a model generated from daily temperature and rainfall values, and daily pollen concentrations, using partial least squares regression (PLSR). This method was then applied to predict daily pollen concentrations for 2014 (independent validation data) using results for the seasonal component of the time series and estimates of the residual component for the period 2006-2013. Correlation between predicted and observed values was r = 0.79 (correlation coefficient) for the pre-peak period (i.e., the period prior to the peak pollen concentration) and r = 0.63 for the post-peak period. Separate analysis of each of the components of the pollen data series enables the sources of variability to be identified more accurately than by analysis of the original non-decomposed data series, and for this reason, this procedure has proved to be a suitable technique for analyzing the main environmental factors influencing airborne pollen concentrations.
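
    The two-step scheme (STL decomposition, then regression of the residual on meteorology) can be sketched in Python as below; the daily pollen, temperature, and rainfall series are synthetic, and the STL settings are assumptions rather than the study's configuration.

        # Sketch: STL decomposition of a daily pollen series, then PLS regression
        # of the residual (stochastic) component on meteorological covariates.
        import numpy as np
        import pandas as pd
        from statsmodels.tsa.seasonal import STL
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(7)
        days = pd.date_range("2006-01-01", "2013-12-31", freq="D")
        doy = days.dayofyear.to_numpy()
        seasonal = 120.0 * np.exp(-0.5 * ((doy - 150) / 20.0) ** 2)          # grass pollen peak
        temp = 10 + 12 * np.sin(2 * np.pi * (doy - 110) / 365) + rng.normal(0, 2, len(days))
        rain = rng.gamma(0.4, 3.0, len(days))
        pollen = seasonal + 2.0 * (temp - temp.mean()) - 1.5 * rain + rng.normal(0, 5, len(days))

        stl = STL(pd.Series(pollen, index=days), period=365, robust=True).fit()
        residual = stl.resid.to_numpy()

        X = np.column_stack([temp, rain])
        pls = PLSRegression(n_components=2).fit(X, residual)
        print("R^2 of the residual model:", round(pls.score(X, residual), 2))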

  18. Modeling pollen time series using seasonal-trend decomposition procedure based on LOESS smoothing.

    PubMed

    Rojo, Jesús; Rivero, Rosario; Romero-Morte, Jorge; Fernández-González, Federico; Pérez-Badia, Rosa

    2017-02-01

    Analysis of airborne pollen concentrations provides valuable information on plant phenology and is thus a useful tool in agriculture-for predicting harvests in crops such as the olive and for deciding when to apply phytosanitary treatments-as well as in medicine and the environmental sciences. Variations in airborne pollen concentrations, moreover, are indicators of changing plant life cycles. By modeling pollen time series, we can not only identify the variables influencing pollen levels but also predict future pollen concentrations. In this study, airborne pollen time series were modeled using a seasonal-trend decomposition procedure based on LOcally wEighted Scatterplot Smoothing (LOESS) smoothing (STL). The data series-daily Poaceae pollen concentrations over the period 2006-2014-was broken up into seasonal and residual (stochastic) components. The seasonal component was compared with data on Poaceae flowering phenology obtained by field sampling. Residuals were fitted to a model generated from daily temperature and rainfall values, and daily pollen concentrations, using partial least squares regression (PLSR). This method was then applied to predict daily pollen concentrations for 2014 (independent validation data) using results for the seasonal component of the time series and estimates of the residual component for the period 2006-2013. Correlation between predicted and observed values was r = 0.79 (correlation coefficient) for the pre-peak period (i.e., the period prior to the peak pollen concentration) and r = 0.63 for the post-peak period. Separate analysis of each of the components of the pollen data series enables the sources of variability to be identified more accurately than by analysis of the original non-decomposed data series, and for this reason, this procedure has proved to be a suitable technique for analyzing the main environmental factors influencing airborne pollen concentrations.

  19. Comparison of Regression Methods to Compute Atmospheric Pressure and Earth Tidal Coefficients in Water Level Associated with Wenchuan Earthquake of 12 May 2008

    NASA Astrophysics Data System (ADS)

    He, Anhua; Singh, Ramesh P.; Sun, Zhaohua; Ye, Qing; Zhao, Gang

    2016-07-01

    Earth tides, atmospheric pressure, precipitation, and earthquakes all influence water well levels; earthquakes in particular produce pronounced fluctuations, and anomalous co-seismic changes in groundwater levels have been observed. In this paper, we use four different models, simple linear regression (SLR), multiple linear regression (MLR), principal component analysis (PCA), and partial least squares (PLS), to compute the atmospheric pressure and earth tidal effects on water level. Furthermore, we use the Akaike information criterion (AIC) to study the performance of the models. Based on the lowest AIC and sum of squares for error, the best estimate of the effects of atmospheric pressure and earth tide on water level is obtained with the MLR model. However, the MLR model does not address multicollinearity among the inputs, and as a result the atmospheric pressure and earth tidal response coefficients fail to reflect the mechanisms associated with groundwater level fluctuations. Among the models that resolve the serious multicollinearity of the inputs, the PLS model shows the minimum AIC value, and its atmospheric pressure and earth tidal response coefficients agree closely with the observations. The atmospheric pressure and earth tidal response coefficients are found to be sensitive to the stress-strain state in the observed data for the period 1 April-8 June 2008 at the Chuan 03# well. The transient enhancement of porosity of the rock mass around the Chuan 03# well associated with the Wenchuan earthquake (Mw = 7.9, 12 May 2008), which returned to its original pre-seismic level after 13 days, indicates that the co-seismic sharp rise of the water well level could be induced by static stress change rather than by the development of new fractures.
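
    Model comparison by AIC can be sketched in a few lines of Python; the water level, barometric pressure, and tidal series below are synthetic, and the PCA and PLS variants would be scored the same way from their fitted residuals.

        # Sketch: compare SLR (pressure only) and MLR (pressure + earth tide)
        # regressions of water level by AIC; lower AIC is better.
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(8)
        n = 500
        pressure = rng.normal(1000, 5, n)                        # hPa
        tide = np.sin(np.linspace(0, 60 * np.pi, n))             # tidal strain proxy
        level = 20 - 0.04 * (pressure - 1000) + 0.3 * tide + rng.normal(0, 0.05, n)

        slr = sm.OLS(level, sm.add_constant(pressure)).fit()
        mlr = sm.OLS(level, sm.add_constant(np.column_stack([pressure, tide]))).fit()
        print(f"SLR AIC = {slr.aic:.1f},  MLR AIC = {mlr.aic:.1f}")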

  20. Selective adsorption of flavor-active components on hydrophobic resins.

    PubMed

    Saffarionpour, Shima; Sevillano, David Mendez; Van der Wielen, Luuk A M; Noordman, T Reinoud; Brouwer, Eric; Ottens, Marcel

    2016-12-09

    This work aims to propose an optimum resin for use in an industrial adsorption process for tuning flavor-active components or removing ethanol to produce an alcohol-free beer. A procedure is reported for selective adsorption of volatile aroma components from water/ethanol mixtures on synthetic hydrophobic resins. High-throughput 96-well microtiter-plate batch uptake experimentation is applied to screen resins for adsorption of esters (i.e. isoamyl acetate and ethyl acetate), higher alcohols (i.e. isoamyl alcohol and isobutyl alcohol), a diketone (diacetyl) and ethanol. The miniaturized batch uptake method is adapted for adsorption of volatile components, and validated with column breakthrough analysis. The results of single-component adsorption tests on Sepabeads SP20-SS are fitted with single-component Langmuir, Freundlich, and Sips isotherm models, and multi-component versions of the Langmuir and Sips models are applied to the multi-component adsorption results obtained on several tested resins. The adsorption parameters are regressed and the selectivity over ethanol is calculated for each tested component and tested resin. Resin scores for four different scenarios of selective adsorption of esters, higher alcohols, diacetyl, and ethanol are obtained. The optimal resin for adsorption of esters is Sepabeads SP20-SS, with a resin score of 87%; for selective removal of higher alcohols, XAD16N and XAD4 from the Amberlite resin series are proposed, with scores of 80% and 74%, respectively. For adsorption of diacetyl, the XAD16N and XAD4 resins, with a score of 86%, are the optimum choice, and the Sepabeads SP2MGS and XAD761 resins showed the highest affinity towards ethanol. Copyright © 2016 Elsevier B.V. All rights reserved.
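
    As a hedged illustration, the Python sketch below fits the single-component Langmuir isotherm q = qmax*K*c/(1 + K*c) to batch-uptake data for one aroma compound and forms a simple selectivity ratio over ethanol from the initial isotherm slopes; all concentrations, loadings, and the ethanol parameters are made-up placeholders.

        # Sketch: single-component Langmuir fit and a selectivity ratio over ethanol.
        import numpy as np
        from scipy.optimize import curve_fit

        def langmuir(c, qmax, K):
            return qmax * K * c / (1.0 + K * c)

        c_ester = np.array([5, 10, 25, 50, 100, 200], dtype=float)     # mg/L in liquid
        q_ester = np.array([1.8, 3.3, 6.5, 9.6, 12.4, 14.1])           # mg/g on resin

        (qmax_e, K_e), _ = curve_fit(langmuir, c_ester, q_ester, p0=[15.0, 0.02])
        qmax_eth, K_eth = 2.0, 0.001                                   # assumed ethanol fit

        selectivity = (qmax_e * K_e) / (qmax_eth * K_eth)              # ratio of initial slopes
        print(f"qmax = {qmax_e:.1f} mg/g, K = {K_e:.3f} L/mg, selectivity over ethanol ~ {selectivity:.0f}")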

  1. Demonstrated Potential of Ion Mobility Spectrometry for Detection of Adulterated Perfumes and Plant Speciation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Clark, Jared Matthew; Daum, Keith Alvin; Kalival, J. H.

    2003-01-01

    This initial study evaluates the use of ion mobility spectrometry (IMS) as a rapid test procedure for potential detection of adulterated perfumes and speciation of plant life. Sample types measured consist of five genuine perfumes, two species of sagebrush, and four species of flowers. Each sample type is treated as a separate classification problem. It is shown that discrimination using principal component analysis with K-nearest neighbors can distinguish one class from another. Discriminatory models generated using principal component regressions are not as effective. Results from this examination are encouraging and represent an initial phase demonstrating that perfumes and plants possess characteristic chemical signatures that can be used for reliable identification.

  2. Long-term effects of psychosocial work stress in midlife on health functioning after labor market exit--results from the GAZEL study.

    PubMed

    Wahrendorf, Morten; Sembajwe, Grace; Zins, Marie; Berkman, Lisa; Goldberg, Marcel; Siegrist, Johannes

    2012-07-01

    To study long-term effects of psychosocial work stress in mid-life on health functioning after labor market exit using two established work stress models. In the frame of the prospective French Gazel cohort study, data on psychosocial work stress were assessed using the full questionnaires measuring the demand-control-support model (in 1997 and 1999) and the effort-reward imbalance model (in 1998). In 2007, health functioning was assessed, using the Short Form 36 mental and physical component scores. Multivariate regressions were calculated to predict health functioning in 2007, controlling for age, gender, social position, and baseline self-perceived health. Consistent effects of both work stress models and their single components on mental and physical health functioning during retirement were observed. Effects remained significant after adjustment including baseline self-perceived health. Whereas the predictive power of both work stress models was similar in the case of the physical composite score, in the case of the mental health score, values of model fit were slightly higher for the effort-reward imbalance model (R²: 0.13) compared with the demand-control model (R²: 0.11). Findings underline the importance of working conditions in midlife not only for health in midlife but also for health functioning after labor market exit.

  3. Multiplicative Multitask Feature Learning

    PubMed Central

    Wang, Xin; Bi, Jinbo; Yu, Shipeng; Sun, Jiangwen; Song, Minghu

    2016-01-01

    We investigate a general framework of multiplicative multitask feature learning which decomposes each task's model parameters into a multiplication of two components. One of the components is used across all tasks and the other component is task-specific. Several previous methods can be proved to be special cases of our framework. We study the theoretical properties of this framework when different regularization conditions are applied to the two decomposed components. We prove that this framework is mathematically equivalent to the widely used multitask feature learning methods that are based on a joint regularization of all model parameters, but with a more general form of regularizers. Further, an analytical formula is derived for the across-task component as related to the task-specific component for all these regularizers, leading to a better understanding of the shrinkage effects of different regularizers. Study of this framework motivates new multitask learning algorithms. We propose two new learning formulations by varying the parameters in the proposed framework. An efficient blockwise coordinate descent algorithm is developed suitable for solving the entire family of formulations with rigorous convergence analysis. Simulation studies identify the statistical properties of data that favor the new formulations. Extensive empirical studies on various classification and regression benchmark data sets have revealed the relative advantages of the two new formulations by comparing with the state of the art, which provides instructive insights into the feature learning problem with multiple tasks. PMID:28428735
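
    The core idea above, decomposing each task's weight vector into an element-wise product of a shared component and a task-specific component, can be sketched as follows. The regularizers, step size, and synthetic data are assumptions, and the alternating gradient updates shown are only one simple way to optimize such a decomposition (the paper uses a blockwise coordinate descent algorithm).

        import numpy as np

        rng = np.random.default_rng(1)
        T, n, d = 3, 80, 20                     # tasks, samples per task, features
        X = [rng.normal(size=(n, d)) for _ in range(T)]
        w_true = rng.normal(size=d) * (rng.random(d) < 0.4)   # shared sparsity pattern
        Y = [X[t] @ (w_true * (1 + 0.1 * rng.normal(size=d))) + 0.1 * rng.normal(size=n)
             for t in range(T)]

        c = np.ones(d)                           # across-task component
        V = [np.zeros(d) for _ in range(T)]      # task-specific components
        lam_c, lam_v, lr = 0.01, 0.01, 1e-3

        for it in range(2000):
            for t in range(T):
                w_t = c * V[t]                   # multiplicative decomposition of task weights
                r = X[t] @ w_t - Y[t]
                grad_w = X[t].T @ r / n
                # Alternating (sub)gradient steps, each factor with its own regularizer.
                V[t] -= lr * (grad_w * c + lam_v * V[t])
                c    -= lr * (grad_w * V[t] + lam_c * np.sign(c))

        print("learned shared component (first 5 entries):", np.round(c[:5], 3))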

  4. Vestibular schwannomas: Accuracy of tumor volume estimated by ice cream cone formula using thin-sliced MR images.

    PubMed

    Ho, Hsing-Hao; Li, Ya-Hui; Lee, Jih-Chin; Wang, Chih-Wei; Yu, Yi-Lin; Hueng, Dueng-Yuan; Ma, Hsin-I; Hsu, Hsian-He; Juan, Chun-Jung

    2018-01-01

    We estimated the volume of vestibular schwannomas by an ice cream cone formula using thin-sliced magnetic resonance images (MRI) and compared the estimation accuracy among different estimating formulas and between different models. The study was approved by a local institutional review board. A total of 100 patients with vestibular schwannomas examined by MRI between January 2011 and November 2015 were enrolled retrospectively. Informed consent was waived. Volumes of vestibular schwannomas were estimated by cuboidal, ellipsoidal, and spherical formulas based on a one-component model, and cuboidal, ellipsoidal, Linskey's, and ice cream cone formulas based on a two-component model. The estimated volumes were compared to the volumes measured by planimetry. Intraobserver reproducibility and interobserver agreement were tested. Estimation error, including absolute percentage error (APE) and percentage error (PE), was calculated. Statistical analysis included intraclass correlation coefficient (ICC), linear regression analysis, one-way analysis of variance, and paired t-tests with P < 0.05 considered statistically significant. Overall tumor size was 4.80 ± 6.8 mL (mean ± standard deviation). All ICCs were no less than 0.992, suggestive of high intraobserver reproducibility and high interobserver agreement. Cuboidal formulas significantly overestimated the tumor volume by a factor of 1.9 to 2.4 (P ≤ 0.001). The one-component ellipsoidal and spherical formulas overestimated the tumor volume with an APE of 20.3% and 29.2%, respectively. The two-component ice cream cone method, and ellipsoidal and Linskey's formulas significantly reduced the APE to 11.0%, 10.1%, and 12.5%, respectively (all P < 0.001). The ice cream cone method and other two-component formulas including the ellipsoidal and Linskey's formulas allow for estimation of vestibular schwannoma volume more accurately than all one-component formulas.
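
    The one-component formulas named above are standard functions of the three orthogonal tumor diameters, and estimation error can be summarized as percentage error against the planimetric reference. The sketch below uses those standard formulas with made-up diameters and a made-up planimetry volume; the study's two-component ice cream cone and Linskey formulas are not reproduced here.

        import numpy as np

        def cuboidal(a, b, c):
            return a * b * c                 # bounding box of the tumor

        def ellipsoidal(a, b, c):
            return np.pi / 6.0 * a * b * c   # ellipsoid with orthogonal diameters a, b, c

        def spherical(a, b, c):
            d = (a + b + c) / 3.0            # mean diameter
            return np.pi / 6.0 * d ** 3

        # Hypothetical example: diameters in cm and a planimetry reference volume in mL.
        a, b, c, v_ref = 2.1, 1.8, 1.6, 3.4
        for name, f in [("cuboidal", cuboidal), ("ellipsoidal", ellipsoidal), ("spherical", spherical)]:
            v = f(a, b, c)
            pe = 100.0 * (v - v_ref) / v_ref   # percentage error (signed)
            ape = abs(pe)                      # absolute percentage error
            print(f"{name:12s} V = {v:5.2f} mL  PE = {pe:6.1f}%  APE = {ape:5.1f}%")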

  5. Data mining: Potential applications in research on nutrition and health.

    PubMed

    Batterham, Marijka; Neale, Elizabeth; Martin, Allison; Tapsell, Linda

    2017-02-01

    Data mining enables further insights from nutrition-related research, but caution is required. The aim of this analysis was to demonstrate and compare the utility of data mining methods in classifying a categorical outcome derived from a nutrition-related intervention. Baseline data (23 variables, 8 categorical) on participants (n = 295) in an intervention trial were used to classify participants in terms of meeting the criterion of achieving 10 000 steps per day. Results from classification and regression trees (CARTs), random forests, adaptive boosting, logistic regression, support vector machines and neural networks were compared using area under the curve (AUC) and error assessments. The CART produced the best model when considering the AUC (0.703), overall error (18%) and within-class error (28%). Logistic regression also performed reasonably well compared to the other models (AUC 0.675, overall error 23%, within-class error 36%). The methods gave different rankings of variable importance. CART found that body fat, quality of life using the SF-12 Physical Component Summary (PCS) and the cholesterol:HDL ratio were the most important predictors of meeting the 10 000 steps criterion, while logistic regression showed the SF-12 PCS, glucose levels and level of education to be the most significant predictors (P ≤ 0.01). Differing outcomes suggest caution is required with a single data mining method, particularly in a dataset with nonlinear relationships and outliers and when exploring relationships that were not the primary outcomes of the research. © 2017 Dietitians Association of Australia.
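
    A compact sketch of the kind of comparison described above: several classifiers fitted to the same baseline data and ranked by cross-validated AUC. The dataset here is synthetic and the model settings are assumptions; it only illustrates the workflow, not the study's results.

        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.svm import SVC
        from sklearn.neural_network import MLPClassifier

        # Synthetic stand-in for baseline data: 295 participants, 23 predictors, binary outcome.
        X, y = make_classification(n_samples=295, n_features=23, n_informative=6, random_state=0)

        models = {
            "CART": DecisionTreeClassifier(max_depth=4, random_state=0),
            "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
            "adaptive boosting": AdaBoostClassifier(random_state=0),
            "logistic regression": LogisticRegression(max_iter=1000),
            "SVM": SVC(probability=True),
            "neural network": MLPClassifier(max_iter=2000, random_state=0),
        }
        for name, model in models.items():
            auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
            print(f"{name:20s} AUC = {auc:.3f}")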

  6. Predicting fundamental and realized distributions based on thermal niche: A case study of a freshwater turtle

    NASA Astrophysics Data System (ADS)

    Rodrigues, João Fabrício Mota; Coelho, Marco Túlio Pacheco; Ribeiro, Bruno R.

    2018-04-01

    Species distribution models (SDM) have been broadly used in ecology to address theoretical and practical problems. Currently, there are two main approaches to generate SDMs: (i) correlative, which is based on species occurrences and environmental predictor layers and (ii) process-based models, which are constructed based on species' functional traits and physiological tolerances. The distributions estimated by each approach are based on different components of species niche. Predictions of correlative models approximate species' realized niches, while predictions of process-based models are more akin to species' fundamental niches. Here, we integrated the predictions of fundamental and realized distributions of the freshwater turtle Trachemys dorbigni. Fundamental distribution was estimated using data of T. dorbigni's egg incubation temperature, and realized distribution was estimated using species occurrence records. Both types of distributions were estimated using the same regression approaches (logistic regression and support vector machines), both considering macroclimatic and microclimatic temperatures. The realized distribution of T. dorbigni was generally nested in its fundamental distribution, reinforcing theoretical assumptions that the species' realized niche is a subset of its fundamental niche. Both modelling algorithms produced similar results, but microtemperature generated better results than macrotemperature for the incubation model. Finally, our results reinforce the conclusion that species' realized distributions are constrained by factors other than thermal tolerance alone.
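
    A minimal sketch of the two-model idea above: one logistic regression fitted to occurrence records (realized distribution) and one to physiological data such as incubation success versus temperature (fundamental distribution), followed by a check of how much of the realized range is nested in the fundamental one. All data here are synthetic placeholders, and a quadratic temperature term is an assumption used to give each model an interval-shaped response.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(2)

        # Synthetic temperature grid (e.g., mean macro- or microclimatic temperature per cell).
        temp_grid = rng.uniform(5, 35, size=5000).reshape(-1, 1)

        # Process-based side: hypothetical incubation trials (success vs. incubation temperature).
        t_inc = rng.uniform(18, 34, size=200).reshape(-1, 1)
        success = ((t_inc[:, 0] > 24) & (t_inc[:, 0] < 32)).astype(int)

        # Correlative side: hypothetical presence/absence records over the same gradient.
        t_occ = rng.uniform(5, 35, size=400).reshape(-1, 1)
        presence = ((t_occ[:, 0] > 26) & (t_occ[:, 0] < 31)).astype(int)

        fundamental = LogisticRegression(max_iter=1000).fit(np.c_[t_inc, t_inc ** 2], success)
        realized = LogisticRegression(max_iter=1000).fit(np.c_[t_occ, t_occ ** 2], presence)

        Xg = np.c_[temp_grid, temp_grid ** 2]
        fund_suit = fundamental.predict(Xg).astype(bool)
        real_suit = realized.predict(Xg).astype(bool)

        # Fraction of the realized distribution nested within the fundamental distribution.
        nested = (fund_suit & real_suit).sum() / max(real_suit.sum(), 1)
        print(f"share of realized range inside fundamental range: {nested:.2f}")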

  7. Towards molecular design using 2D-molecular contour maps obtained from PLS regression coefficients

    NASA Astrophysics Data System (ADS)

    Borges, Cleber N.; Barigye, Stephen J.; Freitas, Matheus P.

    2017-12-01

    The multivariate image analysis (MIA) descriptors used in quantitative structure-activity relationships (QSAR) are direct representations of chemical structures, as they are simply numerical decodings of the pixels forming the 2D chemical images. These descriptors have found great utility in the modeling of diverse properties of organic molecules. Given the multicollinearity and high dimensionality of the data matrices generated with the MIA-QSAR approach, modeling techniques that project the data space onto orthogonal components, e.g. Partial Least Squares (PLS), have generally been used. However, the chemical interpretation of the PLS-based MIA-QSAR models, in terms of the structural moieties affecting the modeled bioactivity, has not been straightforward. This work describes 2D-contour maps based on the PLS regression coefficients as a means of assessing the relevance of single MIA predictors to the response variable, thus allowing for the structural, electronic and physicochemical interpretation of the MIA-QSAR models. A sample study to demonstrate the utility of the 2D-contour maps to design novel drug-like molecules is performed using a dataset of some anti-HIV-1 2-amino-6-arylsulfonylbenzonitriles and derivatives, and the inferences obtained are consistent with other reports in the literature. In addition, the different schemes for encoding atomic properties in molecules are discussed and evaluated.
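
    The contour-map idea above amounts to fitting PLS on pixel-level descriptors and mapping each regression coefficient back to its pixel position. The sketch below does this with scikit-learn's PLSRegression on a synthetic image stack; image size, number of latent components, and the activity values are assumptions.

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression

        rng = np.random.default_rng(3)
        n_mol, h, w = 40, 32, 32                       # molecules, image height, image width
        images = rng.random(size=(n_mol, h, w))        # synthetic 2D "chemical images"
        X = images.reshape(n_mol, -1)                  # one pixel = one MIA descriptor
        y = rng.normal(size=n_mol)                     # synthetic bioactivity values

        pls = PLSRegression(n_components=4).fit(X, y)

        # Map each PLS regression coefficient back to its pixel position: positive regions
        # are associated with higher activity, negative regions with lower activity.
        coef_map = pls.coef_.reshape(h, w)
        print("most activity-enhancing pixel:", np.unravel_index(coef_map.argmax(), coef_map.shape))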

  8. Spectral multivariate calibration without laboratory prepared or determined reference analyte values.

    PubMed

    Ottaway, Josh; Farrell, Jeremy A; Kalivas, John H

    2013-02-05

    An essential part of calibration is establishing the analyte calibration reference samples. These samples must characterize the sample matrix and measurement conditions (chemical, physical, instrumental, and environmental) of any sample to be predicted. Calibration usually requires measuring spectra for numerous reference samples in addition to determining the corresponding analyte reference values. Both tasks are typically time-consuming and costly. This paper reports on a method named pure component Tikhonov regularization (PCTR) that does not require laboratory prepared or determined reference values. Instead, an analyte pure component spectrum is used in conjunction with nonanalyte spectra for calibration. Nonanalyte spectra can be from different sources including pure component interference samples, blanks, and constant analyte samples. The approach is also applicable to calibration maintenance when the analyte pure component spectrum is measured in one set of conditions and nonanalyte spectra are measured in new conditions. The PCTR method balances the trade-offs between calibration model shrinkage and the degree of orthogonality to the nonanalyte content (model direction) in order to obtain accurate predictions. Using visible and near-infrared (NIR) spectral data sets, the PCTR results are comparable to those obtained using ridge regression (RR) with reference calibration sets. The flexibility of PCTR also allows including reference samples if such samples are available.
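
    For context on the ridge regression (Tikhonov regularization) baseline mentioned above, the sketch below fits a ridge model to synthetic NIR-like spectra with reference analyte values; the wavelength count, noise level, and regularization parameter are assumptions. The PCTR method itself, which replaces the reference set with a pure-component spectrum plus nonanalyte spectra, is not reproduced here.

        import numpy as np
        from sklearn.linear_model import Ridge
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(4)
        n_samples, n_wavelengths = 50, 300
        pure = np.exp(-0.5 * ((np.arange(n_wavelengths) - 150) / 20.0) ** 2)  # synthetic analyte band
        conc = rng.uniform(0, 1, size=n_samples)                              # reference analyte values
        baseline = rng.normal(scale=0.05, size=(n_samples, n_wavelengths)).cumsum(axis=1)
        spectra = np.outer(conc, pure) + baseline + rng.normal(scale=0.01, size=(n_samples, n_wavelengths))

        # Ridge (Tikhonov-regularized) calibration: the penalty controls model shrinkage.
        model = Ridge(alpha=1.0)
        rmsecv = -cross_val_score(model, spectra, conc, cv=5,
                                  scoring="neg_root_mean_squared_error").mean()
        print(f"cross-validated RMSE of prediction: {rmsecv:.3f}")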

  9. Multicollinearity in spatial genetics: separating the wheat from the chaff using commonality analyses.

    PubMed

    Prunier, J G; Colyn, M; Legendre, X; Nimon, K F; Flamand, M C

    2015-01-01

    Direct gradient analyses in spatial genetics provide unique opportunities to describe the inherent complexity of genetic variation in wildlife species and are the object of many methodological developments. However, multicollinearity among explanatory variables is a systemic issue in multivariate regression analyses and is likely to cause serious difficulties in properly interpreting results of direct gradient analyses, with the risk of erroneous conclusions, misdirected research and inefficient or counterproductive conservation measures. Using simulated data sets along with linear and logistic regressions on distance matrices, we illustrate how commonality analysis (CA), a detailed variance-partitioning procedure that was recently introduced in the field of ecology, can be used to deal with nonindependence among spatial predictors. By decomposing model fit indices into unique and common (or shared) variance components, CA allows identifying the location and magnitude of multicollinearity, revealing spurious correlations and thus thoroughly improving the interpretation of multivariate regressions. Despite a few inherent limitations, especially in the case of resistance model optimization, this review highlights the great potential of CA to account for complex multicollinearity patterns in spatial genetics and identifies future applications and lines of research. We strongly urge spatial geneticists to systematically investigate commonalities when performing direct gradient analyses. © 2014 John Wiley & Sons Ltd.
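
    For two correlated predictors, the variance decomposition underlying commonality analysis can be computed directly from the R² values of the full and reduced regressions: each unique component is the full-model R² minus the R² of the model without that predictor, and the common component is what remains. The sketch below illustrates this with synthetic, deliberately collinear predictors; it is a minimal illustration, not the distance-matrix regressions used in the study.

        import numpy as np
        from sklearn.linear_model import LinearRegression

        def r2(X, y):
            return LinearRegression().fit(X, y).score(X, y)

        rng = np.random.default_rng(5)
        n = 300
        x1 = rng.normal(size=n)
        x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)      # deliberately collinear with x1
        y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

        r2_full = r2(np.c_[x1, x2], y)
        r2_x1 = r2(x1.reshape(-1, 1), y)
        r2_x2 = r2(x2.reshape(-1, 1), y)

        unique_x1 = r2_full - r2_x2                    # variance explained only by x1
        unique_x2 = r2_full - r2_x1                    # variance explained only by x2
        common = r2_full - unique_x1 - unique_x2       # variance shared by x1 and x2
        print(f"unique(x1)={unique_x1:.3f}  unique(x2)={unique_x2:.3f}  common={common:.3f}")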

  10. Novel associations between contaminant body burdens and biomarkers of reproductive condition in male Common Carp along multiple gradients of contaminant exposure in Lake Mead National Recreation Area, USA

    USGS Publications Warehouse

    Patino, Reynaldo; VanLandeghem, Matthew M.; Goodbred, Steven L.; Orsak, Erik; Jenkins, Jill A.; Echols, Kathy R.; Rosen, Michael R.; Torres, Leticia

    2015-01-01

    Adult male Common Carp were sampled in 2007/08 over a full reproductive cycle at Lake Mead National Recreation Area. Sites sampled included a stream dominated by treated wastewater effluent, a lake basin receiving the streamflow, an upstream lake basin (reference), and a site below Hoover Dam. Individual body burdens for 252 contaminants were measured, and biological variables assessed included physiological [plasma vitellogenin (VTG), estradiol-17β (E2), 11-ketotestosterone (11KT)] and organ [gonadosomatic index (GSI)] endpoints. Patterns in contaminant composition and biological condition were determined by Principal Component Analysis, and their associations modeled by Principal Component Regression. Three spatially distinct but temporally stable gradients of contaminant distribution were recognized: a contaminant mixture typical of wastewaters (PBDEs, methyl triclosan, galaxolide), PCBs, and DDTs. Two spatiotemporally variable patterns of biological condition were recognized: a primary pattern consisting of reproductive condition variables (11KT, E2, GSI), and a secondary pattern including general condition traits (condition factor, hematocrit, fork length). VTG was low in all fish, indicating low estrogenic activity of water at all sites. Wastewater contaminants associated negatively with GSI, 11KT and E2; PCBs associated negatively with GSI and 11KT; and DDTs associated positively with GSI and 11KT. Regression of GSI on sex steroids revealed a novel, nonlinear association between these variables. Inclusion of sex steroids in the GSI regression on contaminants rendered wastewater contaminants nonsignificant in the model and reduced the influence of PCBs and DDTs. Thus, the influence of contaminants on GSI may have been partially driven by organismal modes-of-action that include changes in sex steroid production. The positive association of DDTs with 11KT and GSI suggests that lifetime, sub-lethal exposures to DDTs have effects on male carp opposite of those reported by studies where exposure concentrations were relatively high. Lastly, this study highlighted advantages of multivariate/multiple regression approaches for exploring associations between complex contaminant mixtures and gradients and reproductive condition in wild fishes.
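
    The analysis pipeline described above, principal component analysis of the contaminant matrix followed by principal component regression against a biological endpoint, can be sketched with scikit-learn as below; the contaminant matrix and endpoint are random placeholders and the number of retained components is an assumption.

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(6)
        X = rng.lognormal(size=(90, 252))      # synthetic body burdens for 252 contaminants
        y = rng.normal(size=90)                # synthetic endpoint, e.g. gonadosomatic index

        # Principal component regression: standardize, reduce to a few components, regress.
        pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
        print("cross-validated R^2:", cross_val_score(pcr, X, y, cv=5).mean())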

  11. Novel associations between contaminant body burdens and biomarkers of reproductive condition in male Common Carp along multiple gradients of contaminant exposure in Lake Mead National Recreation Area, USA.

    PubMed

    Patiño, Reynaldo; VanLandeghem, Matthew M; Goodbred, Steven L; Orsak, Erik; Jenkins, Jill A; Echols, Kathy; Rosen, Michael R; Torres, Leticia

    2015-08-01

    Adult male Common Carp were sampled in 2007/08 over a full reproductive cycle at Lake Mead National Recreation Area. Sites sampled included a stream dominated by treated wastewater effluent, a lake basin receiving the streamflow, an upstream lake basin (reference), and a site below Hoover Dam. Individual body burdens for 252 contaminants were measured, and biological variables assessed included physiological [plasma vitellogenin (VTG), estradiol-17β (E2), 11-ketotestosterone (11KT)] and organ [gonadosomatic index (GSI)] endpoints. Patterns in contaminant composition and biological condition were determined by Principal Component Analysis, and their associations modeled by Principal Component Regression. Three spatially distinct but temporally stable gradients of contaminant distribution were recognized: a contaminant mixture typical of wastewaters (PBDEs, methyl triclosan, galaxolide), PCBs, and DDTs. Two spatiotemporally variable patterns of biological condition were recognized: a primary pattern consisting of reproductive condition variables (11KT, E2, GSI), and a secondary pattern including general condition traits (condition factor, hematocrit, fork length). VTG was low in all fish, indicating low estrogenic activity of water at all sites. Wastewater contaminants associated negatively with GSI, 11KT and E2; PCBs associated negatively with GSI and 11KT; and DDTs associated positively with GSI and 11KT. Regression of GSI on sex steroids revealed a novel, nonlinear association between these variables. Inclusion of sex steroids in the GSI regression on contaminants rendered wastewater contaminants nonsignificant in the model and reduced the influence of PCBs and DDTs. Thus, the influence of contaminants on GSI may have been partially driven by organismal modes-of-action that include changes in sex steroid production. The positive association of DDTs with 11KT and GSI suggests that lifetime, sub-lethal exposures to DDTs have effects on male carp opposite of those reported by studies where exposure concentrations were relatively high. Lastly, this study highlighted advantages of multivariate/multiple regression approaches for exploring associations between complex contaminant mixtures and gradients and reproductive condition in wild fishes. Published by Elsevier Inc.

  12. Sociodemographic Factors Associated With Changes in Successful Aging in Spain: A Follow-Up Study.

    PubMed

    Domènech-Abella, Joan; Perales, Jaime; Lara, Elvira; Moneta, Maria Victoria; Izquierdo, Ana; Rico-Uribe, Laura Alejandra; Mundó, Jordi; Haro, Josep Maria

    2017-06-01

    Successful aging (SA) refers to maintaining well-being in old age. Several definitions or models of SA exist (biomedical, psychosocial, and mixed). We examined the longitudinal association between various SA models and sociodemographic factors, and analyzed the patterns of change within these models. This was a nationally representative follow-up in Spain including 3,625 individuals aged ≥50 years. Some 1,970 individuals were interviewed after 3 years. Linear regression models were used to analyze the survey data. Age, sex, and occupation predicted SA in the biomedical model, while marital status, educational level, and urbanicity predicted SA in the psychosocial model. The remaining models included different sets of these predictors as significant. In the psychosocial model, individuals tended to improve over time but this was not the case in the biomedical model. The biomedical and psychosocial components of SA need to be addressed specifically to achieve the best aging trajectories.

  13. Crop area estimation based on remotely-sensed data with an accurate but costly subsample

    NASA Technical Reports Server (NTRS)

    Gunst, R. F.

    1985-01-01

    Research activities conducted under the auspices of National Aeronautics and Space Administration Cooperative Agreement NCC 9-9 are discussed. During this contract period, research efforts are concentrated in two primary areas. The first area is an investigation of the use of measurement error models as alternatives to least squares regression estimators of crop production or timber biomass. The second primary area of investigation is the estimation of the mixing proportion of two-component mixture models. This report lists publications, technical reports, submitted manuscripts, and oral presentations generated by these research efforts. Possible areas of future research are mentioned.

  14. Machine learning modeling of plant phenology based on coupling satellite and gridded meteorological dataset

    NASA Astrophysics Data System (ADS)

    Czernecki, Bartosz; Nowosad, Jakub; Jabłońska, Katarzyna

    2018-04-01

    Changes in the timing of plant phenological phases are important proxies in contemporary climate research. However, most of the commonly used traditional phenological observations do not give any coherent spatial information. While consistent spatial data can be obtained from airborne sensors and preprocessed gridded meteorological data, not many studies robustly benefit from these data sources. Therefore, the main aim of this study is to create and evaluate different statistical models for reconstructing, predicting, and improving the quality of phenological phase monitoring with the use of satellite and meteorological products. A quality-controlled dataset of the 13 BBCH plant phenophases in Poland was collected for the period 2007-2014. For each phenophase, statistical models were built using the most commonly applied regression-based machine learning techniques, such as multiple linear regression, lasso, principal component regression, generalized boosted models, and random forest. The quality of the models was estimated using k-fold cross-validation. The obtained results showed varying potential for coupling meteorological derived indices with remote sensing products in terms of phenological modeling; however, application of both data sources improves model accuracy by 0.6 to 4.6 days in terms of RMSE. It is shown that a robust prediction of early phenological phases is mostly related to meteorological indices, whereas for autumn phenophases, there is a stronger information signal provided by satellite-derived vegetation metrics. Choosing a specific set of predictors and applying robust preprocessing procedures is more important for the final results than the selection of a particular statistical model. The average RMSE for the best models across all phenophases is 6.3 days, while individual RMSEs vary seasonally from 3.5 to 10 days. Models give a reliable proxy for ground observations, with RMSE below 5 days for early spring and late spring phenophases. For other phenophases, RMSEs are higher, rising to 9-10 days in the case of the earliest spring phenophases.
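
    A minimal sketch of the model-comparison workflow above: several regression learners fitted to the same predictor set (meteorological indices plus satellite-derived vegetation metrics) and scored by k-fold cross-validated RMSE in days. Data and model settings are synthetic placeholders.

        import numpy as np
        from sklearn.datasets import make_regression
        from sklearn.model_selection import cross_val_score
        from sklearn.linear_model import LinearRegression, Lasso
        from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.decomposition import PCA

        # Synthetic stand-in: day-of-year of a phenophase vs. meteorological + satellite predictors.
        X, y = make_regression(n_samples=400, n_features=30, n_informative=10, noise=5.0, random_state=0)

        models = {
            "multiple linear regression": LinearRegression(),
            "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.5)),
            "principal component regression": make_pipeline(StandardScaler(), PCA(n_components=10), LinearRegression()),
            "generalized boosted model": GradientBoostingRegressor(random_state=0),
            "random forest": RandomForestRegressor(n_estimators=300, random_state=0),
        }
        for name, model in models.items():
            rmse = -cross_val_score(model, X, y, cv=10, scoring="neg_root_mean_squared_error").mean()
            print(f"{name:32s} RMSE = {rmse:5.1f} days")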

  15. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    NASA Astrophysics Data System (ADS)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    A credit card is a convenient alternative to cash or cheques and an essential component of electronic and internet commerce. In this study, the researchers attempt to determine the relationship, and identify the significant variables, between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt-to-income ratio and other debt. The data cover information on 850 customers. Three methods are applied to the credit card debt data: multiple linear regression (MLR) models, MLR models with the least quartile difference (LQD) method, and MLR models with the mean absolute deviation method. After comparing the three methods, the MLR model with the LQD method is found to be the best model, with the lowest mean square error (MSE). According to the final model, years with current employer, years at current address, household income in thousands, and debt-to-income ratio are positively associated with the amount of credit card debt, whereas age, level of education, and other debt are negatively associated with it. This study may serve as a reference for banks using robust methods, so that they can better understand the options and choices best aligned with their goals for inference regarding credit card debt.
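
    The comparison above pits ordinary least squares against robust alternatives and ranks them by error. The sketch below illustrates that workflow with statsmodels, using a Huber-type robust M-estimator as a stand-in because the least quartile difference estimator is not available in common libraries; the data are synthetic with injected outliers.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(7)
        n = 850
        X = rng.normal(size=(n, 4))                       # synthetic demographic predictors
        beta = np.array([1.5, 0.8, -0.6, 0.3])
        y_clean = X @ beta
        y = y_clean + rng.normal(scale=1.0, size=n)
        y[:40] += rng.normal(loc=15, scale=5, size=40)    # inject outliers to stress ordinary least squares

        Xc = sm.add_constant(X)
        ols = sm.OLS(y, Xc).fit()
        robust = sm.RLM(y, Xc, M=sm.robust.norms.HuberT()).fit()   # robust M-estimator (Huber)

        # Compare fitted mean responses against the uncontaminated signal: the robust fit
        # is less distorted by the injected outliers than ordinary least squares.
        for name, res in [("OLS", ols), ("robust (Huber)", robust)]:
            mse = np.mean((y_clean - Xc @ res.params) ** 2)
            print(f"{name:15s} MSE vs clean signal = {mse:.3f}")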

  16. What is the role of "nonorganic somatic components" in functional capacity evaluations in patients with chronic nonspecific low back pain undergoing fitness for work evaluation?

    PubMed

    Oesch, Peter; Meyer, Kathrin; Jansen, Beatrice; Mowinckel, Petter; Bachmann, Stefan; Hagen, Kare Birger

    2012-02-15

    Analytical cross-sectional study. To assess the association of "nonorganic somatic components" together with physical and other psychosocial factors on functional capacity evaluation (FCE) in patients with chronic nonspecific low back pain (NSLBP) undergoing fitness-for-work evaluation. Functional capacity evaluation is increasingly used for physical fitness-for-work evaluation in patients with chronic NSLBP, but results seem to be influenced by physical as well as psychosocial factors. The influence of nonorganic somatic components together with physical and other psychosocial factors on FCE performance has not yet been investigated. One hundred twenty-six patients with chronic NSLBP referred for physical fitness-for-work evaluation were included. The 4 FCE tests were lifting from floor to waist, forward bend standing, grip strength, and 6-minute walking. Nonorganic somatic components were assessed with the 8 nonorganic somatic signs as defined by Waddell and were adjusted for age, sex, days off work, salary in the previous occupation, pain intensity, fear avoidance belief, and perceived functional ability in multivariate regression analyses. Between 42% and 58% of the variation in the FCE tests was explained in the final multivariate regression models. Nonorganic somatic components were consistent independent predictors for all tests. Their influence was most important on forward bend standing and walking distance, and less on grip strength and lifting performance. The physical factors of age and/or sex were strongly associated with grip strength and lifting, less with walking distance, and not at all with forward bend standing. The influence of at least 1 other psychosocial factor was observed in all FCE tests, having the highest proportion in the 6-minute walking test. Nonorganic somatic components seem to be consistent independent predictors in FCE testing and should be considered for interpretation of test results.

  17. SU-E-J-244: Development and Validation of a Knowledge Based Planning Model for External Beam Radiation Therapy of Locally Advanced Non-Small Cell Lung Cancer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Liu, Z; Kennedy, A; Larsen, E

    2015-06-15

    Purpose: The study aims to develop and validate a knowledge based planning (KBP) model for external beam radiation therapy of locally advanced non-small cell lung cancer (LA-NSCLC). Methods: RapidPlan™ technology was used to develop a lung KBP model. Plans from 65 patients with LA-NSCLC were used to train the model. 25 patients were treated with VMAT, and the other patients were treated with IMRT. Organs-at-risk (OARs) included right lung, left lung, heart, esophagus, and spinal cord. DVH and geometric distribution DVH were extracted from the treated plans. The model was trained using principal component analysis and step-wise multiple regression. Box plot and regression plot tools were used to identify geometric outliers and dosimetry outliers and help fine-tune the model. The validation was performed by (a) comparing predicted DVH boundaries to actual DVHs of 63 patients and (b) using an independent set of treatment planning data. Results: 63 out of 65 plans were included in the final KBP model with PTV volume ranging from 102.5cc to 1450.2cc. Total treatment dose prescription varied from 50Gy to 70Gy based on institutional guidelines. One patient was excluded due to geometric outlier where 2.18cc of spinal cord was included in PTV. The other patient was excluded due to dosimetric outlier where the dose sparing to spinal cord was heavily enforced in the clinical plan. Target volume, OAR volume, OAR overlap volume percentage to target, and OAR out-of-field volume were included in the trained model. Lungs and heart had two principal component scores of GEDVH, whereas spinal cord and esophagus had three in the final model. Predicted DVH band (mean ± 1 standard deviation) represented 66.2±3.6% of all DVHs. Conclusion: A KBP model was developed and validated for radiotherapy of LA-NSCLC in a commercial treatment planning system. The clinical implementation may improve the consistency of IMRT/VMAT planning.
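
    The training step described above, principal component analysis of DVH curves followed by step-wise multiple regression on geometric features, can be sketched roughly as follows. The DVH curves, geometric features, and the simple forward-selection loop are all synthetic illustrations under stated assumptions, not the RapidPlan implementation.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.linear_model import LinearRegression
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(8)
        n_plans, n_dose_bins, n_geo = 63, 100, 4
        dvh = np.sort(rng.random(size=(n_plans, n_dose_bins)), axis=1)[:, ::-1]  # synthetic cumulative DVHs
        geo = rng.random(size=(n_plans, n_geo))   # e.g. OAR volume, overlap %, out-of-field volume, target volume

        # Reduce each OAR's DVH curves to a few principal component scores.
        pca = PCA(n_components=2)
        scores = pca.fit_transform(dvh)

        # Simple forward (step-wise) selection of geometric predictors for the first PC score.
        y = scores[:, 0]
        selected, remaining, best_cv = [], list(range(n_geo)), -np.inf
        while remaining:
            cand = {j: cross_val_score(LinearRegression(), geo[:, selected + [j]], y, cv=5).mean()
                    for j in remaining}
            j_best = max(cand, key=cand.get)
            if cand[j_best] <= best_cv:
                break
            best_cv = cand[j_best]
            selected.append(j_best)
            remaining.remove(j_best)

        print("selected geometric features:", selected, " cross-validated R^2:", round(best_cv, 3))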

  18. Correcting for population structure and kinship using the linear mixed model: theory and extensions.

    PubMed

    Hoffman, Gabriel E

    2013-01-01

    Population structure and kinship are widespread confounding factors in genome-wide association studies (GWAS). It has been standard practice to include principal components of the genotypes in a regression model in order to account for population structure. More recently, the linear mixed model (LMM) has emerged as a powerful method for simultaneously accounting for population structure and kinship. The statistical theory underlying the differences in empirical performance between modeling principal components as fixed versus random effects has not been thoroughly examined. We undertake an analysis to formalize the relationship between these widely used methods and elucidate the statistical properties of each. Moreover, we introduce a new statistic, effective degrees of freedom, that serves as a metric of model complexity and a novel low rank linear mixed model (LRLMM) to learn the dimensionality of the correction for population structure and kinship, and we assess its performance through simulations. A comparison of the results of LRLMM and a standard LMM analysis applied to GWAS data from the Multi-Ethnic Study of Atherosclerosis (MESA) illustrates how our theoretical results translate into empirical properties of the mixed model. Finally, the analysis demonstrates the ability of the LRLMM to substantially boost the strength of an association for HDL cholesterol in Europeans.
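
    The contrast above is between including genotype principal components as fixed covariates and modeling relatedness through a random effect whose covariance is the kinship matrix. A minimal sketch of the fixed-effect version, with the kinship matrix computed for reference, is shown below on synthetic genotypes; the sample sizes and single-SNP test are illustrative assumptions.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(9)
        n, p = 200, 1000
        G = rng.binomial(2, 0.3, size=(n, p)).astype(float)    # synthetic genotypes (0/1/2)
        Gs = (G - G.mean(0)) / (G.std(0) + 1e-8)                # standardized genotype matrix
        K = Gs @ Gs.T / p                                       # kinship / genetic relationship matrix

        # Principal components of the genotypes (top eigenvectors of K) as fixed covariates.
        eigvals, eigvecs = np.linalg.eigh(K)
        pcs = eigvecs[:, -5:]                                   # top 5 PCs

        y = rng.normal(size=n)                                  # synthetic phenotype
        snp = G[:, 0]
        X = sm.add_constant(np.c_[snp, pcs])                    # PC-adjusted single-SNP regression
        fit = sm.OLS(y, X).fit()
        print("SNP effect p-value, PC-adjusted:", round(fit.pvalues[1], 3))
        # In the LMM alternative, relatedness instead enters as a random effect u ~ N(0, sigma_g^2 * K).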

  19. Dynamic sensitivity analysis of long running landslide models through basis set expansion and meta-modelling

    NASA Astrophysics Data System (ADS)

    Rohmer, Jeremy

    2016-04-01

    Predicting the temporal evolution of landslides is typically supported by numerical modelling. Dynamic sensitivity analysis aims at assessing the influence of the landslide properties on the time-dependent predictions (e.g., time series of landslide displacements). Yet two major difficulties arise: 1. Global sensitivity analysis requires running the landslide model a large number of times (> 1000), which may become impracticable when the landslide model has a high computational cost (> several hours); 2. Landslide model outputs are not scalar, but functions of time, i.e. they are n-dimensional vectors with n usually ranging from 100 to 1000. In this article, I explore the use of a basis set expansion, such as principal component analysis, to reduce the output dimensionality to a few components, each of them being interpreted as a dominant mode of variation in the overall structure of the temporal evolution. The computationally intensive calculation of the Sobol' indices for each of these components is then achieved through meta-modelling, i.e. by replacing the landslide model by a "costless-to-evaluate" approximation (e.g., a projection pursuit regression model). The methodology combining "basis set expansion - meta-model - Sobol' indices" is then applied to the La Frasse landslide to investigate the dynamic sensitivity analysis of the surface horizontal displacements to the slip surface properties during the pore pressure changes. I show how to extract information on the sensitivity of each main mode of temporal behaviour using a limited number (a few tens) of long-running simulations. In particular, I identify the parameters, which trigger the occurrence of a turning point marking a shift between a regime of low values of landslide displacements and one of high values.
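
    A minimal sketch of the "basis set expansion - meta-model - Sobol' indices" chain described above: the functional output is compressed with PCA, a cheap surrogate is fitted per component score, and first-order Sobol' indices are estimated on the surrogate with a standard pick-freeze estimator. The placeholder simulator, the gradient-boosting surrogate, and all parameter ranges are assumptions (the study uses a projection pursuit regression meta-model).

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(10)

        def landslide_model(theta, t):
            """Placeholder for the long-running simulator: a displacement time series from 3 slip-surface parameters."""
            a, b, c = theta
            return a * np.log1p(t) + b * np.sin(0.1 * c * t)

        t = np.linspace(0, 100, 200)                        # 200 output time steps
        theta_train = rng.uniform(0.5, 2.0, size=(40, 3))   # a few tens of "expensive" runs
        Y_train = np.array([landslide_model(th, t) for th in theta_train])

        # 1) Basis set expansion: reduce the functional output to a few principal components.
        pca = PCA(n_components=2)
        scores = pca.fit_transform(Y_train)

        # 2) Meta-model: one cheap regressor per component score.
        surrogates = [GradientBoostingRegressor(random_state=0).fit(theta_train, scores[:, k]) for k in range(2)]

        # 3) First-order Sobol' indices per component via a pick-freeze estimator on the surrogate.
        N = 5000
        A = rng.uniform(0.5, 2.0, size=(N, 3))
        B = rng.uniform(0.5, 2.0, size=(N, 3))
        for k, surr in enumerate(surrogates):
            fA, fB = surr.predict(A), surr.predict(B)
            var = np.var(np.r_[fA, fB])
            S = []
            for i in range(3):
                ABi = A.copy()
                ABi[:, i] = B[:, i]                          # freeze input i at the B values
                S.append(np.mean(fB * (surr.predict(ABi) - fA)) / var)
            print(f"component {k + 1}: first-order Sobol' indices ~ {np.round(S, 2)}")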

  20. Multicollinearity in prognostic factor analyses using the EORTC QLQ-C30: identification and impact on model selection.

    PubMed

    Van Steen, Kristel; Curran, Desmond; Kramer, Jocelyn; Molenberghs, Geert; Van Vreckem, Ann; Bottomley, Andrew; Sylvester, Richard

    2002-12-30

    Clinical and quality of life (QL) variables from an EORTC clinical trial of first line chemotherapy in advanced breast cancer were used in a prognostic factor analysis of survival and response to chemotherapy. For response, different final multivariate models were obtained from forward and backward selection methods, suggesting a disconcerting instability. Quality of life was measured using the EORTC QLQ-C30 questionnaire completed by patients. Subscales on the questionnaire are known to be highly correlated, and therefore it was hypothesized that multicollinearity contributed to model instability. A correlation matrix indicated that global QL was highly correlated with 7 out of 11 variables. In a first attempt to explore multicollinearity, we used global QL as dependent variable in a regression model with other QL subscales as predictors. Afterwards, standard diagnostic tests for multicollinearity were performed. An exploratory principal components analysis and factor analysis of the QL subscales identified at most three important components and indicated that inclusion of global QL made minimal difference to the loadings on each component, suggesting that it is redundant in the model. In a second approach, we advocate a bootstrap technique to assess the stability of the models. Based on these analyses and since global QL exacerbates problems of multicollinearity, we therefore recommend that global QL be excluded from prognostic factor analyses using the QLQ-C30. The prognostic factor analysis was rerun without global QL in the model, and selected the same significant prognostic factors as before. Copyright 2002 John Wiley & Sons, Ltd.
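
    Two of the diagnostics discussed above, standard collinearity statistics and a bootstrap check of selection stability, can be sketched as follows. The data are synthetic "subscales", the selection rule is a simple p-value screen rather than the full forward/backward procedures, and the variance inflation factor stands in for the broader set of multicollinearity diagnostics.

        import numpy as np
        import statsmodels.api as sm
        from statsmodels.stats.outliers_influence import variance_inflation_factor

        rng = np.random.default_rng(11)
        n, p = 200, 6
        base = rng.normal(size=(n, 1))
        X = 0.8 * base + 0.4 * rng.normal(size=(n, p))        # deliberately intercorrelated "subscales"
        y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

        # Variance inflation factors: values well above ~5-10 flag problematic collinearity.
        Xc = sm.add_constant(X)
        vifs = [variance_inflation_factor(Xc, i) for i in range(1, p + 1)]
        print("VIF per subscale:", np.round(vifs, 1))

        # Bootstrap stability of a simple selection rule (keep predictors with p < 0.05).
        counts = np.zeros(p)
        for _ in range(200):
            idx = rng.integers(0, n, n)
            fit = sm.OLS(y[idx], Xc[idx]).fit()
            counts += (fit.pvalues[1:] < 0.05)
        print("selection frequency over 200 bootstrap samples:", counts / 200)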
