Sample records for squares multiple regression

  1. Identifying maternal and infant factors associated with newborn size in rural Bangladesh by partial least squares (PLS) regression analysis

    PubMed Central

    Rahman, Md. Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D. W.; Labrique, Alain B.; Rashid, Mahbubur; Christian, Parul; West, Keith P.

    2017-01-01

    Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using the partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (standardized β = -0.29 to -0.19; p<0.001). Scatter plots of the scores of the first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than ordinary least squares regression and provided the effect measure of the covariates with greater accuracy, as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when there are multiple outcome measurements in the same study, when the covariates are correlated, or when both situations exist in a dataset. PMID:29261760

  2. Identifying maternal and infant factors associated with newborn size in rural Bangladesh by partial least squares (PLS) regression analysis.

    PubMed

    Kabir, Alamgir; Rahman, Md Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D W; Labrique, Alain B; Rashid, Mahbubur; Christian, Parul; West, Keith P

    2017-01-01

    Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using the partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (standardized β = -0.29 to -0.19; p<0.001). Scatter plots of the scores of the first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than ordinary least squares regression and provided the effect measure of the covariates with greater accuracy, as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when there are multiple outcome measurements in the same study, when the covariates are correlated, or when both situations exist in a dataset.

  3. Investigating bias in squared regression structure coefficients

    PubMed Central

    Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce

    2015-01-01

    The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients. PMID:26217273

  4. Using multiple calibration sets to improve the quantitative accuracy of partial least squares (PLS) regression on open-path fourier transform infrared (OP/FT-IR) spectra of ammonia over wide concentration ranges

    USDA-ARS's Scientific Manuscript database

    A technique of using multiple calibration sets in partial least squares regression (PLS) was proposed to improve the quantitative determination of ammonia from open-path Fourier transform infrared spectra. The spectra were measured near animal farms, and the path-integrated concentration of ammonia...

  5. Determining Sample Size for Accurate Estimation of the Squared Multiple Correlation Coefficient.

    ERIC Educational Resources Information Center

    Algina, James; Olejnik, Stephen

    2000-01-01

    Discusses determining sample size for estimation of the squared multiple correlation coefficient and presents regression equations that permit determination of the sample size for estimating this parameter for up to 20 predictor variables. (SLD)

  6. Application of Partial Least Square (PLS) Regression to Determine Landscape-Scale Aquatic Resources Vulnerability in the Ozark Mountains

    EPA Science Inventory

    Partial least squares (PLS) analysis offers a number of advantages over the more traditionally used regression analyses applied in landscape ecology, particularly for determining the associations among multiple constituents of surface water and landscape configuration. Common dat...

  7. Use of Empirical Estimates of Shrinkage in Multiple Regression: A Caution.

    ERIC Educational Resources Information Center

    Kromrey, Jeffrey D.; Hines, Constance V.

    1995-01-01

    The accuracy of four empirical techniques to estimate shrinkage in multiple regression was studied through Monte Carlo simulation. None of the techniques provided unbiased estimates of the population squared multiple correlation coefficient, but the normalized jackknife and bootstrap techniques demonstrated marginally acceptable performance with…

  8. Enhance-Synergism and Suppression Effects in Multiple Regression

    ERIC Educational Resources Information Center

    Lipovetsky, Stan; Conklin, W. Michael

    2004-01-01

    Relations between pairwise correlations and the coefficient of multiple determination in regression analysis are considered. The conditions for the occurrence of enhance-synergism and suppression effects when multiple determination becomes bigger than the total of squared correlations of the dependent variable with the regressors are discussed. It…
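
    As a minimal sketch of the kind of relation this abstract refers to, the standard two-predictor identity R^2 = (r_y1^2 + r_y2^2 - 2 r_y1 r_y2 r_12) / (1 - r_12^2) can be evaluated directly. The function name and the correlation values below are illustrative assumptions, not taken from the article.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 of y on two predictors, computed from the three pairwise correlations."""
    if abs(r_12) >= 1.0:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2.0 * r_y1 * r_y2 * r_12) / (1.0 - r_12 ** 2)


# Hypothetical correlations: the second predictor is unrelated to y
# but correlated with the first predictor (a classic suppressor pattern).
r2 = r_squared_two_predictors(0.5, 0.0, 0.5)
sum_sq = 0.5 ** 2 + 0.0 ** 2
print(r2 > sum_sq)  # multiple determination exceeds the total of squared correlations
```

    With uncorrelated predictors (r_12 = 0) the identity reduces to the plain sum of squared correlations, so any excess comes from the correlation between the regressors.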

  9. Use of Thematic Mapper for water quality assessment

    NASA Technical Reports Server (NTRS)

    Horn, E. M.; Morrissey, L. A.

    1984-01-01

    The evaluation of simulated TM data obtained on an ER-2 aircraft at twenty-five predesignated sample sites for mapping water quality factors such as conductivity, pH, suspended solids, turbidity, temperature, and depth, is discussed. Using a multiple regression for the seven TM bands, an equation is developed for the suspended solids. TM bands 1, 2, 3, 4, and 6 are used with logarithm conductivity in a multiple regression. The assessment of regression equations for a high coefficient of determination (R-squared) and statistical significance is considered. Confidence intervals about the mean regression point are calculated in order to assess the robustness of the regressions used for mapping conductivity, turbidity, and suspended solids, and by regressing random subsamples of sites and comparing the resultant range of R-squared, cross validation is conducted.

  10. Confidence Intervals for Squared Semipartial Correlation Coefficients: The Effect of Nonnormality

    ERIC Educational Resources Information Center

    Algina, James; Keselman, H. J.; Penfield, Randall D.

    2010-01-01

    The increase in the squared multiple correlation coefficient (ΔR²) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. Algina, Keselman, and Penfield found that intervals based on asymptotic principles were typically very inaccurate, even though the sample size…

  11. Ridge: a computer program for calculating ridge regression estimates

    Treesearch

    Donald E. Hilt; Donald W. Seegrist

    1977-01-01

    Least-squares coefficients for multiple-regression models may be unstable when the independent variables are highly correlated. Ridge regression is a biased estimation procedure that produces stable estimates of the coefficients. Ridge regression is discussed, and a computer program for calculating the ridge coefficients is presented.
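
    The ridge estimator described here is commonly written as (X'X + kI)^-1 X'y. The sketch below is a small standard-library illustration of that formula, not the program presented in the report; the helper names and the nearly collinear toy data are assumptions made for the example, and the predictors and response are assumed to be centered already.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge(X, y, k):
    """Ridge coefficients (X'X + kI)^-1 X'y for centered data; k = 0 gives least squares."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy data (hypothetical): two highly correlated, centered predictors.
X = [[-2.0, -1.9], [-1.0, -1.1], [0.0, 0.1], [1.0, 0.9], [2.0, 2.0]]
y = [-4.1, -1.9, 0.2, 1.8, 4.0]
beta_ols = ridge(X, y, 0.0)
beta_ridge = ridge(X, y, 1.0)
print(beta_ols, beta_ridge)
```

    Increasing k shrinks the coefficient vector toward zero, trading some bias for stability when the predictors are highly correlated.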

  12. A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants

    ERIC Educational Resources Information Center

    Cooper, Paul D.

    2010-01-01

    A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…

  13. Weighted regression analysis and interval estimators

    Treesearch

    Donald W. Seegrist

    1974-01-01

    A method is presented for deriving the weighted least squares estimators for the parameters of a multiple regression model. Confidence intervals for expected values and prediction intervals for the means of future samples are given.

  14. The Multivariate Regression Statistics Strategy to Investigate Content-Effect Correlation of Multiple Components in Traditional Chinese Medicine Based on a Partial Least Squares Method.

    PubMed

    Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun

    2018-03-01

    A multivariate regression statistics strategy was developed to clarify the multi-component content-effect correlation of Panax ginseng saponins extract and to predict the pharmacological effect from the component content. In example 1, we first compared the pharmacological effects of Panax ginseng saponins extract and individual saponin combinations. Second, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, 18 common peaks were first identified in ten different batches of Panax ginseng saponins extracts from different origins. Then, we investigated the anti-myocardial ischemia reperfusion injury effects of the ten different Panax ginseng saponins extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples, the relationship between the component content and the pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points marked on the partial least squares regression model. This study provides evidence that multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.

  15. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    NASA Astrophysics Data System (ADS)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference of linear regression model using ordinary least squares (OLS) estimators would be severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods which were reported to be less sensitive to the presence of outliers. In addition, ridge regression technique was employed to tackle multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods was discussed in this study. The superiority of this approach was examined when simultaneous presence of multicollinearity and multiple outliers occurred in multiple linear regression. This study aimed to look at the performance of several well-known robust estimators; M, MM, RIDGE and robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), Ridge MM (RMM), in such a situation. Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both x and y-direction), the RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observation, level of collinearity and percentage of outliers used. However, when outliers occurred in only single direction (y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, by producing the least variance. In conclusion, the robust ridge regression is the best alternative as compared to robust and conventional least squares estimators when dealing with simultaneous presence of multicollinearity and outliers.

  16. Comparing methods of analysing datasets with small clusters: case studies using four paediatric datasets.

    PubMed

    Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil

    2009-07-01

    Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.

  17. A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield

    NASA Astrophysics Data System (ADS)

    Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan

    2018-04-01

    In this paper, we propose a hybrid model that combines a multiple linear regression model with the fuzzy c-means method. This research involved the relationship between 20 topsoil variates, analyzed prior to planting, and paddy yields at standard fertilizer rates. Data were from the multi-location rice trials carried out by MARDI at major paddy granaries in Peninsular Malaysia from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model and a combination of the multiple linear regression model and the fuzzy c-means method. Analyses of normality and multicollinearity indicate that the data are normally distributed without multicollinearity among the independent variables. Fuzzy c-means analysis clusters the paddy yield into two clusters before the multiple linear regression model is applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model alone, with a lower mean square error.

  18. The Overall Odds Ratio as an Intuitive Effect Size Index for Multiple Logistic Regression: Examination of Further Refinements

    ERIC Educational Resources Information Center

    Le, Huy; Marcus, Justin

    2012-01-01

    This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…

  19. Multiple concurrent recursive least squares identification with application to on-line spacecraft mass-property identification

    NASA Technical Reports Server (NTRS)

    Wilson, Edward (Inventor)

    2006-01-01

    The present invention is a method for identifying unknown parameters in a system having a set of governing equations describing its behavior that cannot be put into regression form with the unknown parameters linearly represented. In this method, the vector of unknown parameters is segmented into a plurality of groups where each individual group of unknown parameters may be isolated linearly by manipulation of said equations. Multiple concurrent and independent recursive least squares identification of each said group run, treating other unknown parameters appearing in their regression equation as if they were known perfectly, with said values provided by recursive least squares estimation from the other groups, thereby enabling the use of fast, compact, efficient linear algorithms to solve problems that would otherwise require nonlinear solution approaches. This invention is presented with application to identification of mass and thruster properties for a thruster-controlled spacecraft.
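
    The building block named in this abstract, recursive least squares, can be sketched for the simplest possible case: a single unknown parameter in the model y = phi * theta. This is only an illustration of the standard recursive update, not the multiple concurrent scheme claimed in the patent; the function name, the initial covariance p0, and the sample data are assumptions.

```python
def rls_scalar(phis, ys, theta0=0.0, p0=1e6):
    """Recursive least squares for the one-parameter model y = phi * theta."""
    theta, p = theta0, p0
    for phi, y in zip(phis, ys):
        gain = p * phi / (1.0 + phi * p * phi)  # weight given to the new sample
        theta += gain * (y - phi * theta)       # correct by the prediction error
        p = (1.0 - gain * phi) * p              # reduce the estimate's uncertainty
    return theta


# Hypothetical measurements, roughly following y = 2 * phi.
phis = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.9]
theta_rls = rls_scalar(phis, ys)
theta_batch = sum(f * y for f, y in zip(phis, ys)) / sum(f * f for f in phis)
print(theta_rls, theta_batch)
```

    With a large initial covariance the recursive estimate agrees closely with the batch least-squares solution, while needing only the latest sample at each step.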

  20. Comparison of Different Shrinkage Formulas in Estimating Population Multiple Correlation Coefficients.

    ERIC Educational Resources Information Center

    Carter, David S.

    1979-01-01

    There are a variety of formulas for reducing the positive bias which occurs in estimating R squared in multiple regression or correlation equations. Five different formulas are evaluated in a Monte Carlo study, and recommendations are made. (JKS)
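
    One widely used shrinkage formula of this kind is the familiar adjusted R-squared, 1 - (1 - R^2)(n - 1)/(n - p - 1). The abstract does not say which five formulas were evaluated, so the sketch below is only an example of the general idea; the function name and the inputs are assumptions.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for a sample R^2 from n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical study: sample R^2 of 0.50 with 21 observations and 5 predictors.
print(adjusted_r_squared(0.50, 21, 5))
```

    The correction grows as the number of predictors approaches the sample size, which is where the positive bias of the sample R squared is largest.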

  1. Fast determination of total ginsenosides content in ginseng powder by near infrared reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Chen, Hua-cai; Chen, Xing-dan; Lu, Yong-jun; Cao, Zhi-qiang

    2006-01-01

    Near infrared (NIR) reflectance spectroscopy was used to develop a fast determination method for total ginsenosides in ginseng (Panax ginseng) powder. The spectra were analyzed with the multiplicative signal correction (MSC) correlation method. The spectral regions best correlated with the total ginsenosides content were 1660 nm~1880 nm and 2230 nm~2380 nm. The NIR calibration models of ginsenosides were built with multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS) regression, respectively. The results showed that the calibration model built with PLS combined with MSC and the optimal spectrum region was the best one. The correlation coefficient and the root mean square error of correction validation (RMSEC) of the best calibration model were 0.98 and 0.15%, respectively. The optimal spectrum region for calibration was 1204 nm~2014 nm. The results suggested that using NIR to rapidly determine the total ginsenosides content in ginseng powder was feasible.

  2. Soil Cd, Cr, Cu, Ni, Pb and Zn sorption and retention models using SVM: Variable selection and competitive model.

    PubMed

    González Costa, J J; Reigosa, M J; Matías, J M; Covelo, E F

    2017-09-01

    The aim of this study was to model the sorption and retention of Cd, Cu, Ni, Pb and Zn in soils. To that extent, the sorption and retention of these metals were studied and the soil characterization was performed separately. Multiple stepwise regression was used to produce multivariate models with linear techniques and with support vector machines, all of which included 15 explanatory variables characterizing soils. When the R-squared values are represented, two different groups are noticed. Cr, Cu and Pb sorption and retention show a higher R-squared; the most explanatory variables being humified organic matter, Al oxides and, in some cases, cation-exchange capacity (CEC). The other group of metals (Cd, Ni and Zn) shows a lower R-squared, and clays are the most explanatory variables, including a percentage of vermiculite and slime. In some cases, quartz, plagioclase or hematite percentages also show some explanatory capacity. Support Vector Machine (SVM) regression shows that the different models are not as regular as in multiple regression in terms of number of variables, the regression for nickel adsorption being the one with the highest number of variables in its optimal model. On the other hand, there are cases where the most explanatory variables are the same for two metals, as it happens with Cd and Cr adsorption. A similar adsorption mechanism is thus postulated. These patterns of the introduction of variables in the model allow us to create explainability sequences. Those which are the most similar to the selectivity sequences obtained by Covelo (2005) are Mn oxides in multiple regression and change capacity in SVM. Among all the variables, the only one that is explanatory for all the metals after applying the maximum parsimony principle is the percentage of sand in the retention process. 
    In the competitive model arising from the aforementioned sequences, the most intense competitiveness for the adsorption and retention of different metals appears between Cr and Cd, Cu and Zn in multiple regression; and between Cr and Cd in SVM regression. Copyright © 2017 Elsevier B.V. All rights reserved.

  3. Methods for Improving Information from ’Undesigned’ Human Factors Experiments.

    DTIC Science & Technology

    Human factors engineering, Information processing, Regression analysis, Experimental design, Least squares method, Analysis of variance, Correlation techniques, Matrices (Mathematics), Multiple disciplines, Mathematical prediction

  4. Advanced statistics: linear regression, part I: simple linear regression.

    PubMed

    Marill, Keith A

    2004-01-01

    Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
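
    The method of least squares mentioned above has a closed form in the single-predictor case: the slope is the ratio of the summed cross-products to the summed squares of x about its mean, and the intercept follows from the two means. A minimal sketch, with a hypothetical function name and made-up data:

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept


# Small illustrative dataset (not from the article).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(x, y)
print(slope, intercept)
```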

  5. Variable selection in near-infrared spectroscopy: benchmarking of feature selection methods on biodiesel data.

    PubMed

    Balabin, Roman M; Smirnov, Sergey V

    2011-04-29

    During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred of wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. 
    The results of other spectroscopic techniques application, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopies, can be greatly improved by an appropriate feature selection choice. Copyright © 2011 Elsevier B.V. All rights reserved.

  6. Methods for estimating the magnitude and frequency of peak streamflows at ungaged sites in and near the Oklahoma Panhandle

    USGS Publications Warehouse

    Smith, S. Jerrod; Lewis, Jason M.; Graves, Grant M.

    2015-09-28

    Generalized-least-squares multiple-linear regression analysis was used to formulate regression relations between peak-streamflow frequency statistics and basin characteristics. Contributing drainage area was the only basin characteristic determined to be statistically significant for all percentage of annual exceedance probabilities and was the only basin characteristic used in regional regression equations for estimating peak-streamflow frequency statistics on unregulated streams in and near the Oklahoma Panhandle. The regression model pseudo-coefficient of determination, converted to percent, for the Oklahoma Panhandle regional regression equations ranged from about 38 to 63 percent. The standard errors of prediction and the standard model errors for the Oklahoma Panhandle regional regression equations ranged from about 84 to 148 percent and from about 76 to 138 percent, respectively. These errors were comparable to those reported for regional peak-streamflow frequency regression equations for the High Plains areas of Texas and Colorado. The root mean square errors for the Oklahoma Panhandle regional regression equations (ranging from 3,170 to 92,000 cubic feet per second) were less than the root mean square errors for the Oklahoma statewide regression equations (ranging from 18,900 to 412,000 cubic feet per second); therefore, the Oklahoma Panhandle regional regression equations produce more accurate peak-streamflow statistic estimates for the irrigated period of record in the Oklahoma Panhandle than do the Oklahoma statewide regression equations. The regression equations developed in this report are applicable to streams that are not substantially affected by regulation, impoundment, or surface-water withdrawals. These regression equations are intended for use for stream sites with contributing drainage areas less than or equal to about 2,060 square miles, the maximum value for the independent variable used in the regression analysis.

  7. Comparing least-squares and quantile regression approaches to analyzing median hospital charges.

    PubMed

    Olsen, Cody S; Clark, Amy E; Thomas, Andrea M; Cook, Lawrence J

    2012-07-01

    Emergency department (ED) and hospital charges obtained from administrative data sets are useful descriptors of injury severity and the burden to EDs and the health care system. However, charges are typically positively skewed due to costly procedures, long hospital stays, and complicated or prolonged treatment for few patients. The median is not affected by extreme observations and is useful in describing and comparing distributions of hospital charges. A least-squares analysis employing a log transformation is one approach for estimating median hospital charges, corresponding confidence intervals (CIs), and differences between groups; however, this method requires certain distributional properties. An alternate method is quantile regression, which allows estimation and inference related to the median without making distributional assumptions. The objective was to compare the log-transformation least-squares method to the quantile regression approach for estimating median hospital charges, differences in median charges between groups, and associated CIs. The authors performed simulations using repeated sampling of observed statewide ED and hospital charges and charges randomly generated from a hypothetical lognormal distribution. The median and 95% CI and the multiplicative difference between the median charges of two groups were estimated using both least-squares and quantile regression methods. Performance of the two methods was evaluated. In contrast to least squares, quantile regression produced estimates that were unbiased and had smaller mean square errors in simulations of observed ED and hospital charges. Both methods performed well in simulations of hypothetical charges that met least-squares method assumptions. When the data did not follow the assumed distribution, least-squares estimates were often biased, and the associated CIs had lower than expected coverage as sample size increased. 
Quantile regression analyses of hospital charges provide unbiased estimates even when lognormal and equal variance assumptions are violated. These methods may be particularly useful in describing and analyzing hospital charges from administrative data sets. © 2012 by the Society for Academic Emergency Medicine.
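
    The contrast drawn in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulation: the charge values and the function names `log_ols_median` and `quantile_median` are hypothetical, and both models are reduced to the intercept-only case, where the least-squares fit on log charges back-transforms to the geometric mean and median regression reduces to the sample median.

```python
import math
import statistics


def log_ols_median(charges):
    """Median estimate from an intercept-only least-squares fit on log(charges).

    The least-squares intercept is the mean of the logs; back-transforming it
    estimates the median only when the charges are (close to) lognormal.
    """
    logs = [math.log(c) for c in charges]
    return math.exp(statistics.fmean(logs))


def quantile_median(charges):
    """Intercept-only quantile regression at tau = 0.5.

    Minimizing the sum of absolute deviations yields the sample median,
    with no distributional assumption.
    """
    return statistics.median(charges)


# Hypothetical charges that are symmetric on the log scale: both estimates agree.
symmetric = [100.0, 1000.0, 10000.0]
# Hypothetical charges with one extreme stay that breaks log-symmetry.
skewed = [100.0, 200.0, 300.0, 400.0, 50000.0]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, log_ols_median(data), quantile_median(data))
```

    On the log-symmetric sample the two estimates coincide; on the skewed sample the back-transformed least-squares estimate is pulled above the sample median, which is the kind of bias the abstract reports when the lognormal assumption fails.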

  8. Development of a Multiple Linear Regression Model to Forecast Facility Electrical Consumption at an Air Force Base.

    DTIC Science & Technology

    1981-09-01

    corresponds to the same square footage that consumed the electrical energy. 3. The basic assumptions of multiple linear regression, as enumerated in...7. Data related to the sample of bases is assumed to be representative of bases in the population. Limitations Basic limitations on this research were... Ratemaking --Overview. Rand Report R-5894, Santa Monica CA, May 1977. Chatterjee, Samprit, and Bertram Price. Regression Analysis by Example. New York: John

  9. Examination of Parameters Affecting the House Prices by Multiple Regression Analysis and its Contributions to Earthquake-Based Urban Transformation

    NASA Astrophysics Data System (ADS)

    Denli, H. H.; Durmus, B.

    2016-12-01

    The purpose of this study is to examine the factors that may affect apartment prices using multiple linear regression models and to visualize the results with value maps. The study focuses on a county of Istanbul, Turkey. A total of 390 apartments around the county of Umraniye are evaluated according to their physical and locational conditions. Identifying the factors affecting apartment prices in this county, which has a population of approximately 600k, is expected to contribute significantly to the apartment market. The physical factors selected are the age, number of rooms, size, number of floors of the building, and the floor on which the apartment is located. The positional factors selected are the distances to the nearest hospital, school, park and police station. In total, ten physical and locational parameters are examined by regression analysis. After the regression analysis, value maps are composed from the parameters age, price and price per square meter. The most significant of the composed maps is the price per square meter map. Results show that the location of the apartment has the most influence on its price per square meter. A different practice is developed from the composed maps by exploring the use of the price per square meter map in urban transformation practices. By marking the buildings older than 15 years on the price per square meter map, a new interpretation is made to determine which buildings should be given priority during an urban transformation in the county. This county is very close to the North Anatolian Fault zone and is under the threat of earthquakes. By marking the apartments older than 15 years on the price per square meter map, a list of apartments that are both old and expensive per square meter can be gathered. With the help of this list, priority could be given to the selected higher-valued old apartments to support the economy of the country in the event of earthquake losses. We may call this earthquake-based urban transformation.

  10. Fundamental Analysis of the Linear Multiple Regression Technique for Quantification of Water Quality Parameters from Remote Sensing Data. Ph.D. Thesis - Old Dominion Univ.

    NASA Technical Reports Server (NTRS)

    Whitlock, C. H., III

    1977-01-01

    Constituents with linear radiance gradients with concentration may be quantified from signals which contain nonlinear atmospheric and surface reflection effects for both homogeneous and non-homogeneous water bodies, provided accurate data can be obtained and nonlinearities are constant with wavelength. Statistical parameters must be used which give an indication of bias as well as total squared error to ensure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least-squares fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.

  11. Estimation of Finger Joint Angles Based on Electromechanical Sensing of Wrist Shape.

    PubMed

    Kawaguchi, Junki; Yoshimoto, Shunsuke; Kuroda, Yoshihiro; Oshiro, Osamu

    2017-09-01

    An approach to finger motion capture that places fewer restrictions on the usage environment and actions of the user is an important research topic in biomechanics and human-computer interaction. We proposed a system that electrically detects finger motion from the associated deformation of the wrist and estimates the finger joint angles using multiple regression models. A wrist-mounted sensing device with 16 electrodes detects deformation of the wrist from changes in electrical contact resistance at the skin. In this study, we experimentally investigated the accuracy of finger joint angle estimation, the adequacy of two multiple regression models, and the resolution of the estimation of total finger joint angles. In experiments, both the finger joint angles and the system output voltage were recorded as subjects performed flexion/extension of the fingers. These data were used for calibration using the least-squares method. The system was found to be capable of estimating the total finger joint angle with a root-mean-square error of 29-34 degrees. A multiple regression model with a second-order polynomial basis function was shown to be suitable for the estimation of all total finger joint angles, but not those of the thumb.

  12. Normalization Ridge Regression in Practice II: The Estimation of Multiple Feedback Linkages.

    ERIC Educational Resources Information Center

    Bulcock, J. W.

    The use of the two-stage least squares (2SLS) procedure for estimating nonrecursive social science models is often impractical when multiple feedback linkages are required. This is because 2SLS is extremely sensitive to multicollinearity. The standard statistical solution to the multicollinearity problem is a biased, variance-reduced procedure…

  13. Effects of Verbal and Written Performance Feedback on Treatment Adherence: Practical Application of Two Delivery Formats

    ERIC Educational Resources Information Center

    Kaufman, Dahlia; Codding, Robin S.; Markus, Keith A.; Tryon, Georgiana Shick; Kyse, Eden Nagler

    2013-01-01

    Verbal and written performance feedback for improving preschool and kindergarten teachers' treatment integrity of behavior plans was compared using a combined multiple-baseline and multiple-treatment design across teacher-student dyads with order counterbalanced as within-series conditions. Supplemental generalized least square regression analyses…

  14. Multi-analyte quantification in bioprocesses by Fourier-transform-infrared spectroscopy by partial least squares regression and multivariate curve resolution.

    PubMed

    Koch, Cosima; Posch, Andreas E; Goicoechea, Héctor C; Herwig, Christoph; Lendl, Bernhard

    2014-01-07

    This paper presents the quantification of Penicillin V and phenoxyacetic acid, a precursor, inline during Penicillium chrysogenum fermentations by FTIR spectroscopy and partial least squares (PLS) regression and multivariate curve resolution - alternating least squares (MCR-ALS). First, the applicability of an attenuated total reflection FTIR fiber optic probe was assessed offline by measuring standards of the analytes of interest and investigating matrix effects of the fermentation broth. Then measurements were performed inline during four fed-batch fermentations with online HPLC for the determination of Penicillin V and phenoxyacetic acid as reference analysis. PLS and MCR-ALS models were built using these data and validated by comparison of single analyte spectra with the selectivity ratio of the PLS models and the extracted spectral traces of the MCR-ALS models, respectively. The achieved root mean square errors of cross-validation for the PLS regressions were 0.22 g L⁻¹ for Penicillin V and 0.32 g L⁻¹ for phenoxyacetic acid and the root mean square errors of prediction for MCR-ALS were 0.23 g L⁻¹ for Penicillin V and 0.15 g L⁻¹ for phenoxyacetic acid. A general work-flow for building and assessing chemometric regression models for the quantification of multiple analytes in bioprocesses by FTIR spectroscopy is given. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

  15. Simultaneous spectrophotometric determination of salbutamol and bromhexine in tablets.

    PubMed

    Habib, I H I; Hassouna, M E M; Zaki, G A

    2005-03-01

    Typical anti-mucolytic drugs called salbutamol hydrochloride and bromhexine sulfate encountered in tablets were determined simultaneously either by using linear regression at zero-crossing wavelengths of the first derivative of UV spectra or by application of the multiple linear partial least squares regression method. The results obtained by the two proposed mathematical methods were compared with those obtained by the HPLC technique.

  16. Introductory Statistics in the Garden

    ERIC Educational Resources Information Center

    Wagaman, John C.

    2017-01-01

    This article describes four semesters of introductory statistics courses that incorporate service learning and gardening into the curriculum with applications of the binomial distribution, least squares regression and hypothesis testing. The activities span multiple semesters and are iterative in nature.

  17. Wavelet regression model in forecasting crude oil price

    NASA Astrophysics Data System (ADS)

    Hamid, Mohd Helmie; Shabri, Ani

    2017-05-01

    This study presents the performance of the wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. The WMLR model was developed by integrating the discrete wavelet transform (DWT) and the multiple linear regression (MLR) model. The original time series was decomposed into sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of the WMLR model was also compared with regular multiple linear regression (MLR), Autoregressive Integrated Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting techniques tested in this study.

  18. A method for the selection of a functional form for a thermodynamic equation of state using weighted linear least squares stepwise regression

    NASA Technical Reports Server (NTRS)

    Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.

    1976-01-01

    A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.

  19. The comparison of robust partial least squares regression with robust principal component regression on a real

    NASA Astrophysics Data System (ADS)

    Polat, Esra; Gunay, Suleyman

    2013-10-01

    One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and inflates their variance. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are used. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables, and then the dependent variables are regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the usage of the RPCR and RSIMPLS methods on an econometric data set by comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.

  20. A consensus least squares support vector regression (LS-SVR) for analysis of near-infrared spectra of plant samples.

    PubMed

    Li, Yankun; Shao, Xueguang; Cai, Wensheng

    2007-04-15

    Consensus modeling, which combines the results of multiple independent models to produce a single prediction, avoids the instability of a single model. Based on the principle of consensus modeling, a consensus least squares support vector regression (LS-SVR) method for calibrating near-infrared (NIR) spectra was proposed. In the proposed approach, NIR spectra of plant samples were first preprocessed using the discrete wavelet transform (DWT) to filter the spectral background and noise, and then the consensus LS-SVR technique was used to build the calibration model. With an optimization of the parameters involved in the modeling, a satisfactory model was achieved for predicting the content of reducing sugar in plant samples. The predicted results show that the consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.

  1. Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients

    NASA Astrophysics Data System (ADS)

    Gorgees, Hazim Mansoor; Mahdi, Fatimah Assim

    2018-05-01

    This article compares the performance of different types of ordinary ridge regression estimators that have already been proposed to estimate the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation, we employ data obtained from the tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE).
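
    As a rough illustration of the two ingredients named in this abstract, the sketch below computes a ridge estimate and the condition number of X'X for two nearly collinear predictors. It is not the article's procedure: the data are invented, the function names `ridge_2d` and `condition_number` are hypothetical, the predictors and response are assumed to be centered (so no intercept is fitted), and the ridge constant k is simply fixed rather than chosen by any of the rules the article compares.

```python
import math


def ridge_2d(x1, x2, y, k):
    """Ridge estimate (X'X + kI)^-1 X'y for two centered predictors, no intercept."""
    a = sum(u * u for u in x1) + k           # (X'X)[0][0] + k
    b = sum(u * v for u, v in zip(x1, x2))   # (X'X)[0][1]
    d = sum(v * v for v in x2) + k           # (X'X)[1][1] + k
    g1 = sum(u * t for u, t in zip(x1, y))   # (X'y)[0]
    g2 = sum(v * t for v, t in zip(x2, y))   # (X'y)[1]
    det = a * d - b * b
    # Closed-form inverse of the 2x2 matrix applied to X'y.
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)


def condition_number(x1, x2):
    """Ratio of the largest to the smallest eigenvalue of the 2x2 matrix X'X."""
    a = sum(u * u for u in x1)
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2)
    root = math.sqrt((a - d) ** 2 + 4 * b * b)
    return (a + d + root) / (a + d - root)


# Hypothetical centered data: x2 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.8, 2.1]
y = [-4.2, -1.8, 0.3, 1.6, 4.1]

ols = ridge_2d(x1, x2, y, k=0.0)    # k = 0 reduces to ordinary least squares
ridge = ridge_2d(x1, x2, y, k=1.0)  # k = 1 is an arbitrary illustrative value
print("condition number:", condition_number(x1, x2))
print("OLS:", ols, "ridge:", ridge)
```

    With these numbers the least-squares fit loads almost everything on one predictor, while the ridge fit spreads the effect across both and has a smaller coefficient norm, which is the variance-reducing shrinkage that motivates ridge estimators under multicollinearity.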

  2. A new linear least squares method for T1 estimation from SPGR signals with multiple TRs

    NASA Astrophysics Data System (ADS)

    Chang, Lin-Ching; Koay, Cheng Guan; Basser, Peter J.; Pierpaoli, Carlo

    2009-02-01

    The longitudinal relaxation time, T1, can be estimated from two or more spoiled gradient recalled echo (SPGR) images with two or more flip angles and one or more repetition times (TRs). The function relating signal intensity and the parameters is nonlinear; T1 maps can be computed from SPGR signals using nonlinear least squares regression. A widely-used linear method transforms the nonlinear model by assuming a fixed TR in SPGR images. This constraint is not desirable since multiple TRs are a clinically practical way to reduce the total acquisition time, to satisfy the required resolution, and/or to combine SPGR data acquired at different times. A new linear least squares method is proposed using the first order Taylor expansion. Monte Carlo simulations of SPGR experiments are used to evaluate the accuracy and precision of the estimated T1 from the proposed linear and the nonlinear methods. We show that the new linear least squares method provides T1 estimates comparable in both precision and accuracy to those from the nonlinear method, allowing multiple TRs and reducing computation time significantly.
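
    The conventional fixed-TR linearization that this abstract refers to (not the new Taylor-expansion method it proposes) can be sketched as follows. With E1 = exp(-TR/T1), the SPGR signal S = M0 sin(a) (1 - E1) / (1 - E1 cos(a)) rearranges to S/sin(a) = E1 * S/tan(a) + M0 (1 - E1), a straight line whose slope is E1. The parameter values and the function names `spgr_signal`, `fit_line` and `t1_fixed_tr` are assumptions made for this illustration, and the signals are noise-free.

```python
import math


def spgr_signal(m0, t1, tr, alpha):
    """Ideal SPGR signal for flip angle alpha (radians), repetition time tr and T1."""
    e1 = math.exp(-tr / t1)
    return m0 * math.sin(alpha) * (1 - e1) / (1 - e1 * math.cos(alpha))


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def t1_fixed_tr(signals, alphas, tr):
    """Estimate (T1, M0) from SPGR signals acquired at one TR and several flip angles."""
    xs = [s / math.tan(a) for s, a in zip(signals, alphas)]
    ys = [s / math.sin(a) for s, a in zip(signals, alphas)]
    slope, intercept = fit_line(xs, ys)  # slope = E1, intercept = M0 * (1 - E1)
    return -tr / math.log(slope), intercept / (1 - slope)


# Hypothetical acquisition: TR = 15 ms, true T1 = 1000 ms, M0 = 1000 (arbitrary units).
TR, TRUE_T1, TRUE_M0 = 15.0, 1000.0, 1000.0
alphas = [math.radians(deg) for deg in (3.0, 8.0, 15.0, 25.0)]
signals = [spgr_signal(TRUE_M0, TRUE_T1, TR, a) for a in alphas]

t1_hat, m0_hat = t1_fixed_tr(signals, alphas, TR)
print(t1_hat, m0_hat)
```

    Because the slope is tied to a single TR, this form cannot pool images acquired at different TRs, which is the constraint the abstract identifies as undesirable.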

  3. Estimation of liver T₂ in transfusion-related iron overload in patients with weighted least squares T₂ IDEAL.

    PubMed

    Vasanawala, Shreyas S; Yu, Huanzhou; Shimakawa, Ann; Jeng, Michael; Brittain, Jean H

    2012-01-01

    MRI imaging of hepatic iron overload can be achieved by estimating T₂ values using multiple-echo sequences. The purpose of this work is to develop and clinically evaluate a weighted least squares algorithm based on T₂ Iterative Decomposition of water and fat with Echo Asymmetry and Least-squares estimation (IDEAL) technique for volumetric estimation of hepatic T₂ in the setting of iron overload. The weighted least squares T₂ IDEAL technique improves T₂ estimation by automatically decreasing the impact of later, noise-dominated echoes. The technique was evaluated in 37 patients with iron overload. Each patient underwent (i) a standard 2D multiple-echo gradient echo sequence for T₂ assessment with nonlinear exponential fitting, and (ii) a 3D T₂ IDEAL technique, with and without a weighted least squares fit. Regression and Bland-Altman analysis demonstrated strong correlation between conventional 2D and T₂ IDEAL estimation. In cases of severe iron overload, T₂ IDEAL without weighted least squares reconstruction resulted in a relative overestimation of T₂ compared with weighted least squares. Copyright © 2011 Wiley-Liss, Inc.
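
    The idea of down-weighting late, noise-dominated echoes can be illustrated with a much simpler model than IDEAL: a log-linear fit of a mono-exponential decay, ln S = ln S0 - TE / T, with weights equal to the squared signal. This is a sketch under stated assumptions, not the published algorithm; the echo times, the corrupted last echo, the squared-signal weighting and the function names `weighted_line` and `fit_decay` are all choices made for this example.

```python
import math


def weighted_line(xs, ys, ws):
    """Weighted least-squares slope and intercept for y = slope * x + intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_decay(echo_times, signals, weighted):
    """Fit ln(S) = ln(S0) - TE / T and return (T, S0).

    With weighted=True each echo is weighted by its squared signal, so weak
    late echoes contribute less; with weighted=False all echoes count equally.
    """
    logs = [math.log(s) for s in signals]
    ws = [s * s for s in signals] if weighted else [1.0] * len(signals)
    slope, intercept = weighted_line(echo_times, logs, ws)
    return -1.0 / slope, math.exp(intercept)


# Hypothetical decay with true time constant 10 ms and S0 = 100.
echo_times = [2.0, 6.0, 10.0, 14.0, 18.0, 22.0]
clean = [100.0 * math.exp(-te / 10.0) for te in echo_times]
# Simulate a noise floor: the last echo reads 20 instead of about 11.
noisy = clean[:-1] + [20.0]

t_unweighted, _ = fit_decay(echo_times, noisy, weighted=False)
t_weighted, _ = fit_decay(echo_times, noisy, weighted=True)
print(t_unweighted, t_weighted)
```

    Both fits overestimate the decay time because the corrupted echo sits above the true curve, but the weighted fit stays closer to the true value, in line with the overestimation the abstract reports for the unweighted reconstruction.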

  4. Estimation of perceptible water vapor of atmosphere using artificial neural network, support vector machine and multiple linear regression algorithm and their comparative study

    NASA Astrophysics Data System (ADS)

    Shastri, Niket; Pathak, Kamlesh

    2018-05-01

    The water vapor content of the atmosphere plays a very important role in climate. This paper discusses the application of GPS signals in meteorology, a useful technique for estimating the perceptible water vapor of the atmosphere. Various algorithms, namely artificial neural network, support vector machine and multiple linear regression, are used to predict perceptible water vapor. Comparative studies in terms of root mean square error and mean absolute error are also carried out for all the algorithms.

  5. Normality of Residuals Is a Continuous Variable, and Does Seem to Influence the Trustworthiness of Confidence Intervals: A Response to, and Appreciation of, Williams, Grajales, and Kurkiewicz (2013)

    ERIC Educational Resources Information Center

    Osborne, Jason W.

    2013-01-01

    Osborne and Waters (2002) focused on checking some of the assumptions of multiple linear regression. In a critique of that paper, Williams, Grajales, and Kurkiewicz correctly clarify that regression models estimated using ordinary least squares require the assumption of normally distributed errors, but not the assumption of normally distributed…

  6. Modelling fourier regression for time series data- a case study: modelling inflation in foods sector in Indonesia

    NASA Astrophysics Data System (ADS)

    Prahutama, Alan; Suparti; Wahyu Utami, Tiani

    2018-03-01

    Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression relies on strict assumptions, whereas nonparametric regression does not require an assumed model form. Time series data are observations of a variable recorded over time, so to model a time series by regression, the response and predictor variables must be determined first. The response variable is the value at time t (yt), while the predictor variables are its significant lags. In nonparametric regression modeling, one developing approach is to use Fourier series. One advantage of the Fourier series approach is its ability to handle data with a trigonometric pattern. Modeling with Fourier series requires a parameter K, whose value can be determined by the Generalized Cross Validation method. In inflation modeling for the transportation, communication and financial services sector, the Fourier series yields an optimal K of 120 parameters with an R-square of 99%, whereas multiple linear regression yields an R-square of 90%.
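
    The role of K and of Generalized Cross Validation can be shown on a simplified version of the problem. The sketch below fits a truncated Fourier series to synthetic data on an evenly spaced grid covering one full period, where the basis functions are orthogonal and each coefficient is a simple projection, and then picks K by minimizing GCV(K) = (RSS/n) / (1 - p/n)^2 with p = 2K + 1 fitted coefficients. The data, the search range and the names `fourier_fit` and `gcv` are assumptions for illustration; the lag-based inflation model in the abstract is not reproduced.

```python
import math
import random


def fourier_fit(ts, ys, k_max):
    """Least-squares Fourier series of order k_max; returns a prediction function.

    Assumes ts is an evenly spaced grid over one full period [0, 2*pi) and
    k_max < len(ts) / 2, so the cosine and sine terms are orthogonal.
    """
    n = len(ts)
    a0 = sum(ys) / n
    coefs = []
    for k in range(1, k_max + 1):
        ak = 2.0 / n * sum(y * math.cos(k * t) for t, y in zip(ts, ys))
        bk = 2.0 / n * sum(y * math.sin(k * t) for t, y in zip(ts, ys))
        coefs.append((k, ak, bk))

    def predict(t):
        return a0 + sum(a * math.cos(k * t) + b * math.sin(k * t) for k, a, b in coefs)

    return predict


def gcv(ts, ys, k_max):
    """Generalized cross-validation score for a Fourier fit of order k_max."""
    n = len(ts)
    predict = fourier_fit(ts, ys, k_max)
    mse = sum((y - predict(t)) ** 2 for t, y in zip(ts, ys)) / n
    p = 2 * k_max + 1  # number of fitted coefficients = trace of the hat matrix
    return mse / (1 - p / n) ** 2


# Synthetic series: harmonics 1 and 3 plus small seeded Gaussian noise.
rng = random.Random(0)
n = 64
ts = [2 * math.pi * i / n for i in range(n)]
ys = [1 + 2 * math.cos(t) + 0.5 * math.sin(3 * t) + rng.gauss(0, 0.1) for t in ts]

best_k = min(range(0, 11), key=lambda k: gcv(ts, ys, k))
print("selected K:", best_k)
```

    The penalty in the denominator grows with K, so GCV trades residual error against the number of coefficients instead of always preferring the largest series.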

  7. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

    PubMed Central

    Motulsky, Harvey J; Brown, Ronald E

    2006-01-01

    Background: Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results: We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion: Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949

  8. Exact and Approximate Statistical Inference for Nonlinear Regression and the Estimating Equation Approach.

    PubMed

    Demidenko, Eugene

    2017-09-01

    The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.

  9. Partial Least Squares with Structured Output for Modelling the Metabolomics Data Obtained from Complex Experimental Designs: A Study into the Y-Block Coding.

    PubMed

    Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston

    2016-10-28

    Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.

  10. MERGANSER- Predicting Mercury Levels in Fish and Loons in New England Lakes

    EPA Science Inventory

    MERGANSER (MERcury Geo-spatial AssessmeNtS for the New England Region) is an empirical least squares multiple regression model using atmospheric deposition of mercury (Hg) and readily obtainable lake and watershed features to predict fish and common loon Hg (as methyl mercury) in ...

  11. Partial Least Square Analyses of Landscape and Surface Water Biota Associations in the Savannah River Basin

    EPA Science Inventory

    Ecologists are often faced with the problems of small sample size, a large number of correlated predictors, and high noise-to-signal relationships. This necessitates excluding important variables from the model when applying standard multiple or multivariate regression analyses. In ...

  12. Optimal Wavelengths Selection Using Hierarchical Evolutionary Algorithm for Prediction of Firmness and Soluble Solids Content in Apples

    USDA-ARS's Scientific Manuscript database

    Hyperspectral scattering is a promising technique for rapid and noninvasive measurement of multiple quality attributes of apple fruit. A hierarchical evolutionary algorithm (HEA) approach, in combination with subspace decomposition and partial least squares (PLS) regression, was proposed to select o...

  13. MERGANSER - An Empirical Model to Predict Fish and Loon Mercury in New England Lakes

    EPA Science Inventory

    MERGANSER (MERcury Geo-spatial AssessmeNtS for the New England Region) is an empirical least-squares multiple regression model using mercury (Hg) deposition and readily obtainable lake and watershed features to predict fish (fillet) and common loon (blood) Hg in New England lakes...

  14. Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction.

    PubMed

    He, Dan; Kuhn, David; Parida, Laxmi

    2016-06-15

    Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models. In many cases, for the same set of samples and markers, multiple traits are observed. Some of these traits might be correlated with each other. Therefore, modeling all the multiple traits together may improve the prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether different traits share the same genotype matrix or not. We then adapted multitask learning algorithms and multiple output regression algorithms to solve the multitrait prediction problem. We proposed a few strategies to improve the least square error of the prediction from these algorithms. Our experiments show that modeling multiple traits together could improve the prediction accuracy for correlated traits. The programs we used are either public or directly from the referred authors, such as MALSAR (http://www.public.asu.edu/~jye02/Software/MALSAR/) package. The Avocado data set has not been published yet and is available upon request. Contact: dhe@us.ibm.com. © The Author 2016. Published by Oxford University Press.

  15. Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

    NASA Astrophysics Data System (ADS)

    Gusriani, N.; Firdaniza

    2018-03-01

    The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the Least Squares method is nevertheless applied to such data, it will produce a model that cannot represent most of the data. A regression method that is robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.

  16. Application of near-infrared spectroscopy for the rapid quality assessment of Radix Paeoniae Rubra

    NASA Astrophysics Data System (ADS)

    Zhan, Hao; Fang, Jing; Tang, Liying; Yang, Hongjun; Li, Hua; Wang, Zhuju; Yang, Bin; Wu, Hongwei; Fu, Meihong

    2017-08-01

    Near-infrared (NIR) spectroscopy with multivariate analysis was used to quantify gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra, and the feasibility of classifying the samples originating from different areas was investigated. A new high-performance liquid chromatography method was developed and validated to analyze gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra as the reference. Partial least squares (PLS), principal component regression (PCR), and stepwise multivariate linear regression (SMLR) were performed to calibrate the regression model. Different data pretreatments such as derivatives (1st and 2nd), multiplicative scatter correction, standard normal variate, Savitzky-Golay filter, and Norris derivative filter were applied to remove the systematic errors. The performance of the model was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and correlation coefficient (r). The results show that compared to PCR and SMLR, PLS had a lower RMSEC, RMSECV, and RMSEP and higher r for all the four analytes. PLS coupled with proper pretreatments showed good performance in both the fitting and predicting results. Furthermore, the original areas of Radix Paeoniae Rubra samples were partly distinguished by principal component analysis. This study shows that NIR with PLS is a reliable, inexpensive, and rapid tool for the quality assessment of Radix Paeoniae Rubra.

  17. A regression-kriging model for estimation of rainfall in the Laohahe basin

    NASA Astrophysics Data System (ADS)

    Wang, Hong; Ren, Li L.; Liu, Gao H.

    2009-10-01

    This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors: latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges in the Laohahe basin in northeast China during 1986-2005. Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis with three Principal Components (PCs) extracted. The rainfall data were then fitted using step-wise regression and the residuals interpolated using SK. The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). Because correlated topographic factors are taken into account, RK improves the efficiency of predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.

  18. Patterns of Library Use by Undergraduate Students in a Chilean University

    ERIC Educational Resources Information Center

    Jara, Magdalena; Clasing, Paula; Gonzalez, Carlos; Montenegro, Maximiliano; Kelly, Nick; Alarcón, Rosa; Sandoval, Augusto; Saurina, Elvira

    2017-01-01

    This paper explores the patterns of use of print materials and digital resources in an undergraduate library in a Chilean university, by the students' discipline and year of study. A quantitative analysis was carried out, including descriptive analysis of contingency tables, chi-squared tests, t-tests, and multiple linear regressions. The results…

  19. Prediction of octanol-water partition coefficients of organic compounds by multiple linear regression, partial least squares, and artificial neural network.

    PubMed

    Golmohammadi, Hassan

    2009-11-30

    A quantitative structure-property relationship (QSPR) study was performed to develop models that relate the structure of 141 organic compounds to their octanol-water partition coefficients (log P(o/w)). A genetic algorithm was applied as a variable selection tool. Modeling of log P(o/w) of these compounds as a function of theoretically derived descriptors was established by multiple linear regression (MLR), partial least squares (PLS), and artificial neural network (ANN). The best selected descriptors that appear in the models are: atomic charge weighted partial positively charged surface area (PPSA-3), fractional atomic charge weighted partial positive surface area (FPSA-3), minimum atomic partial charge (Qmin), molecular volume (MV), total dipole moment of molecule (mu), maximum antibonding contribution of a molecular orbital in the molecule (MAC), and maximum free valency of a C atom in the molecule (MFV). The results showed the ability of the developed artificial neural network to predict the partition coefficients of organic compounds. The results also revealed the superiority of ANN over the MLR and PLS models. Copyright 2009 Wiley Periodicals, Inc.

  20. Least squares regression methods for clustered ROC data with discrete covariates.

    PubMed

    Tang, Liansheng Larry; Zhang, Wei; Li, Qizhai; Ye, Xuan; Chan, Leighton

    2016-07-01

    The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests to distinguish the diseased group from the nondiseased group when test results are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used for the estimation of the ROC curve from correlated data, how to develop the least squares methods to estimate the ROC curve from clustered data has not been studied. Also, the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop the least squares ROC methods to allow the baseline and link functions to differ, and more importantly, to accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuous property of the true underlying curve. The least squares methods are shown to be more efficient than the existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  1. Estimating magnitude and frequency of peak discharges for rural, unregulated, streams in West Virginia

    USGS Publications Warehouse

    Wiley, J.B.; Atkins, John T.; Tasker, Gary D.

    2000-01-01

    Multiple and simple least-squares regression models for the log10-transformed 100-year discharge with independent variables describing the basin characteristics (log10-transformed and untransformed) for 267 streamflow-gaging stations were evaluated, and the regression residuals were plotted as areal distributions that defined three regions of the State, designated East, North, and South. Exploratory data analysis procedures identified 31 gaging stations at which discharges are different than would be expected for West Virginia. Regional equations for the 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year peak discharges were determined by generalized least-squares regression using data from 236 gaging stations. Log10-transformed drainage area was the most significant independent variable for all regions. Equations developed in this study are applicable only to rural, unregulated, streams within the boundaries of West Virginia. The accuracy of estimating equations is quantified by measuring the average prediction error (from 27.7 to 44.7 percent) and equivalent years of record (from 1.6 to 20.0 years).
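
    A simple regression on log10-transformed discharge and drainage area, as described above, can be sketched as follows. This is an ordinary least-squares illustration only (the report's regional equations used generalized least squares), and the coefficient 200, the exponent 0.7, and the station values are hypothetical numbers chosen for the example.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical drainage areas (square miles) and 100-year discharges,
# generated from Q = 200 * A**0.7 so the fit can be checked exactly.
areas = [1.0, 10.0, 100.0, 1000.0]
discharges = [200.0 * a ** 0.7 for a in areas]

# Regress log10(Q) on log10(A); the slope is the power-law exponent.
log_a = [math.log10(a) for a in areas]
log_q = [math.log10(q) for q in discharges]
intercept, slope = fit_line(log_a, log_q)
coefficient = 10.0 ** intercept

def predict_discharge(area):
    """Back-transform the fitted log-log line to a power-law estimate."""
    return coefficient * area ** slope

print(round(coefficient, 3), round(slope, 3))
```

    Fitting in log space turns the power law Q = a * A^b into a straight line, which is why drainage area enters these equations in log10-transformed form.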

  2. Comparison of two-concentration with multi-concentration linear regressions: Retrospective data analysis of multiple regulated LC-MS bioanalytical projects.

    PubMed

    Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi

    2013-09-01

    Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). 
Furthermore, examples are given of how to evaluate the linearity over the entire concentration range when only two concentration levels are used for linear regression. To conclude, two-concentration linear regression is accurate and robust enough for routine use in regulated LC-MS bioanalysis and it significantly saves time and cost as well. Copyright © 2013 Elsevier B.V. All rights reserved.
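
    The idea of a two-concentration calibration can be sketched in a few lines. The calibrator concentrations, instrument responses, and QC values below are hypothetical and are not taken from the paper; the function names are illustrative only.

```python
def two_point_calibration(conc_low, resp_low, conc_high, resp_high):
    """Line through the lowest and highest calibrators: response = slope * conc + intercept."""
    slope = (resp_high - resp_low) / (conc_high - conc_low)
    intercept = resp_low - slope * conc_low
    return slope, intercept

def back_calculate(response, slope, intercept):
    """Convert an instrument response back to a concentration."""
    return (response - intercept) / slope

def percent_bias(measured, nominal):
    """Relative deviation of a measured concentration from its nominal value."""
    return 100.0 * (measured - nominal) / nominal

# Hypothetical calibrators at 1 and 1000 ng/mL with peak-area-ratio responses.
slope, intercept = two_point_calibration(1.0, 0.012, 1000.0, 10.002)

# Hypothetical mid-level QC sample with a nominal concentration of 500 ng/mL.
qc_conc = back_calculate(5.1, slope, intercept)
qc_bias = percent_bias(qc_conc, 500.0)
print(round(qc_conc, 1), round(qc_bias, 2))
```

    With only two calibrators the line passes through both points exactly, so any weighting scheme gives the same fit; linearity across the range then has to be checked with intermediate samples such as the QC above.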

  3. The crux of the method: assumptions in ordinary least squares and logistic regression.

    PubMed

    Long, Rebecca G

    2008-10-01

    Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
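
    One reason ordinary least squares fits a binary outcome poorly can be shown with a small sketch: a straight line fitted to 0/1 data predicts values outside the interval [0, 1], whereas a logistic model cannot. The data, learning rate, and iteration count are hypothetical choices for illustration, and the plain gradient-ascent fit stands in for the iterative solvers that statistical packages normally use.

```python
import math

# Hypothetical predictor values and binary outcomes (not perfectly separable).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0, 0, 0, 1, 0, 1, 1, 1]

def fit_ols(x, y):
    """Ordinary least-squares line (a 'linear probability model' for 0/1 data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, learning_rate=0.1, steps=5000):
    """Maximum-likelihood logistic fit by plain gradient ascent on the log-likelihood."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residuals = [yi - sigmoid(a + b * xi) for xi, yi in zip(x, y)]
        a += learning_rate * sum(residuals) / n
        b += learning_rate * sum(r * xi for r, xi in zip(residuals, x)) / n
    return a, b

ols_intercept, ols_slope = fit_ols(x, y)
logit_intercept, logit_slope = fit_logistic(x, y)

ols_at_0 = ols_intercept                      # OLS prediction at x = 0
ols_at_10 = ols_intercept + ols_slope * 10.0  # OLS prediction extrapolated to x = 10
logit_at_10 = sigmoid(logit_intercept + logit_slope * 10.0)
print(round(ols_at_0, 3), round(ols_at_10, 3), round(logit_at_10, 3))
```

    The OLS predictions fall below 0 and above 1 at the ends of the range, while the logistic prediction remains a valid probability.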

  4. Characteristics and Psychosocial Predictors of Adolescent Nonsuicidal Self-Injury in Residential Care

    ERIC Educational Resources Information Center

    Gallant, Jason; Snyder, Gregory S.; von der Embse, Nathaniel P.

    2014-01-01

    This study examined characteristics and biopsychosocial predictors of nonsuicidal self-injury in a sample (N = 753) of youth in residential care admitted between 2005 and 2010. To model the data, the authors used t-tests, chi-square tests, and multiple logistic regressions stratified by gender. Results suggested that 12% of youth engaged in…

  5. Hyperspectral analysis of soil organic matter in coal mining regions using wavelets, correlations, and partial least squares regression.

    PubMed

    Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen

    2016-02-01

    Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation--partial least squares regression (PLSR) method effectively solves the information loss problem of correlation--multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R(2) = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.

  6. Effects of land cover, topography, and built structure on seasonal water quality at multiple spatial scales.

    PubMed

    Pratt, Bethany; Chang, Heejun

    2012-03-30

    The relationship among land cover, topography, built structure and stream water quality in the Portland Metro region of Oregon and Clark County, Washington areas, USA, is analyzed using ordinary least squares (OLS) and geographically weighted (GWR) multiple regression models. Two scales of analysis, a sectional watershed and a buffer, offered a local and a global investigation of the sources of stream pollutants. Model accuracy, measured by R(2) values, fluctuated according to the scale, season, and regression method used. While most wet season water quality parameters are associated with urban land covers, most dry season water quality parameters are related to topographic features such as elevation and slope. GWR models, which take into consideration local relations of spatial autocorrelation, had stronger results than OLS regression models. In the multiple regression models, sectioned watershed results were consistently better than the sectioned buffer results, except for dry season pH and stream temperature parameters. This suggests that while riparian land cover does have an effect on water quality, a wider contributing area needs to be included in order to account for distant sources of pollutants. Copyright © 2012 Elsevier B.V. All rights reserved.

  7. Estimation of Flood-Frequency Discharges for Rural, Unregulated Streams in West Virginia

    USGS Publications Warehouse

    Wiley, Jeffrey B.; Atkins, John T.

    2010-01-01

    Flood-frequency discharges were determined for 290 streamgage stations having a minimum of 9 years of record in West Virginia and surrounding states through the 2006 or 2007 water year. No trend was determined in the annual peaks used to calculate the flood-frequency discharges. Multiple and simple least-squares regression equations for the 100-year (1-percent annual-occurrence probability) flood discharge with independent variables that describe the basin characteristics were developed for 290 streamgage stations in West Virginia and adjacent states. The regression residuals for the models were evaluated and used to define three regions of the State, designated as Eastern Panhandle, Central Mountains, and Western Plateaus. Exploratory data analysis procedures identified 44 streamgage stations that were excluded from the development of regression equations representative of rural, unregulated streams in West Virginia. Regional equations for the 1.1-, 1.5-, 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year flood discharges were determined by generalized least-squares regression using data from the remaining 246 streamgage stations. Drainage area was the only significant independent variable determined for all equations in all regions. Procedures developed to estimate flood-frequency discharges on ungaged streams were based on (1) regional equations and (2) drainage-area ratios between gaged and ungaged locations on the same stream. The procedures are applicable only to rural, unregulated streams within the boundaries of West Virginia that have drainage areas within the limits of the stations used to develop the regional equations (from 0.21 to 1,461 square miles in the Eastern Panhandle, from 0.10 to 1,619 square miles in the Central Mountains, and from 0.13 to 1,516 square miles in the Western Plateaus). 
The accuracy of the equations is quantified by measuring the average prediction error (from 21.7 to 56.3 percent) and equivalent years of record (from 2.0 to 70.9 years).
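
    The second procedure mentioned above, transfer by drainage-area ratio, can be sketched in one commonly used form, Q_ungaged = Q_gaged * (A_ungaged / A_gaged)^b. The exponent values and discharges below are hypothetical; the abstract does not state the report's exponents or the limits within which the ratio may be applied, so this is an assumed form rather than the report's own equation.

```python
def transfer_discharge(q_gaged, area_gaged, area_ungaged, exponent):
    """Scale a gaged flood discharge to an ungaged site on the same stream.

    Assumed form: Q_ungaged = Q_gaged * (A_ungaged / A_gaged) ** exponent.
    The exponent must come from the applicable regional analysis.
    """
    return q_gaged * (area_ungaged / area_gaged) ** exponent

# Hypothetical gaged site: 1000 cubic feet per second at 100 square miles.
# Hypothetical ungaged site upstream with 50 square miles of drainage area.
q_linear = transfer_discharge(1000.0, 100.0, 50.0, exponent=1.0)
q_damped = transfer_discharge(1000.0, 100.0, 50.0, exponent=0.5)
print(q_linear, round(q_damped, 1))
```

    An exponent below 1 makes discharge fall off more slowly than drainage area, which is why the choice of exponent matters when the two sites differ substantially in size.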

  8. Economist intelligence unit democracy index in relation to health services accessibility: a regression analysis.

    PubMed

    Walker, Mary Ellen; Anonson, June; Szafron, Michael

    2015-01-01

    The relationship between political environment and health services accessibility (HSA) has not been the focus of any specific studies. The purpose of this study was to address this gap in the literature by examining the relationship between political environment and HSA. The relationship that HSA indicators (physicians, nurses and hospital beds per 10 000 people) have with political environment was analyzed with multiple least-squares regression using the components of democracy (electoral processes and pluralism, functioning of government, political participation, political culture, and civil liberties). The components of democracy were represented by the 2011 Economist Intelligence Unit Democracy Index (EIUDI) sub-scores. The EIUDI sub-scores and the HSA indicators were evaluated for significant relationships with multiple least-squares regression. While controlling for a country's geographic location and level of democracy, we found that two components of a nation's political environment, functioning of government and political participation, and their interaction had significant relationships with the three HSA indicators. These study findings are of significance to health professionals because they examine the political contexts in which citizens access health services, they come from research that is the first of its kind, and they help explain the effect political environment has on health. © The Author 2014. Published by Oxford University Press on behalf of Royal Society of Tropical Medicine and Hygiene. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

  9. Sparse partial least squares regression for simultaneous dimension reduction and variable selection

    PubMed Central

    Chun, Hyonho; Keleş, Sündüz

    2010-01-01

    Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611

  10. Partial Least Squares Regression Can Aid in Detecting Differential Abundance of Multiple Features in Sets of Metagenomic Samples

    PubMed Central

    Libiger, Ondrej; Schork, Nicholas J.

    2015-01-01

    It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. 
Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061

  11. Modeling the language learning strategies and English language proficiency of pre-university students in UMS: A case study

    NASA Astrophysics Data System (ADS)

    Kiram, J. J.; Sulaiman, J.; Swanto, S.; Din, W. A.

    2015-10-01

    This study aims to construct a mathematical model of the relationship between a student's Language Learning Strategy usage and English Language proficiency. Fifty-six pre-university students of University Malaysia Sabah participated in this study. A self-report questionnaire called the Strategy Inventory for Language Learning was administered to them to measure their language learning strategy preferences before they sat for the Malaysian University English Test (MUET), the results of which were used to measure their English language proficiency. We carried out model assessment specific to Multiple Linear Regression Analysis, with variable selection by Stepwise regression. We conducted various assessments of the model obtained, including the Global F-test, Root Mean Square Error and R-squared. The model obtained suggests that not all language learning strategies should be included in the model in an attempt to predict Language Proficiency.

  12. A non-linear regression analysis program for describing electrophysiological data with multiple functions using Microsoft Excel.

    PubMed

    Brown, Angus M

    2006-04-01

    The objective of the present study was to demonstrate a method for fitting complex electrophysiological data with multiple functions using the SOLVER add-in of the ubiquitous spreadsheet Microsoft Excel. SOLVER minimizes the sum of the squared differences between the data to be fit and the function(s) describing the data, using an iterative generalized reduced gradient method. While it is a straightforward procedure to fit data with linear functions, and we have previously demonstrated a method of non-linear regression analysis of experimental data based upon a single function, it is more complex to fit data with multiple functions, usually requiring specialized expensive computer software. In this paper we describe an easily understood program for fitting experimentally acquired data, in this case the stimulus-evoked compound action potential from the mouse optic nerve, with multiple Gaussian functions. The program is flexible and can be applied to describe data with a wide variety of user-input functions.
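
    The quantity such a fit minimizes can be sketched as follows: a model built from several Gaussian components and the sum of squared differences between it and the data. The parameter values and the sampled curve are hypothetical, and the sketch only evaluates the objective; an iterative minimizer (SOLVER's generalized reduced gradient method in the paper) would be needed to adjust the parameters.

```python
import math

def gaussian(x, amplitude, center, width):
    """Single Gaussian component."""
    return amplitude * math.exp(-((x - center) ** 2) / (2.0 * width ** 2))

def multi_gaussian(x, params):
    """Sum of Gaussian components; params is a list of (amplitude, center, width)."""
    return sum(gaussian(x, a, c, w) for a, c, w in params)

def sum_squared_error(xs, ys, params):
    """Objective to minimize: sum of squared differences between data and model."""
    return sum((y - multi_gaussian(x, params)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noise-free data generated from two Gaussian components.
true_params = [(1.0, 2.0, 0.5), (0.6, 4.0, 0.8)]
xs = [i * 0.1 for i in range(61)]
ys = [multi_gaussian(x, true_params) for x in xs]

# A slightly wrong starting guess gives a larger objective than the true parameters.
guess = [(0.8, 2.2, 0.5), (0.6, 4.0, 0.8)]
sse_true = sum_squared_error(xs, ys, true_params)
sse_guess = sum_squared_error(xs, ys, guess)
print(sse_true, round(sse_guess, 4))
```

    Adding a component is a matter of appending another (amplitude, center, width) tuple, which mirrors the flexibility the abstract describes for user-input functions.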

  13. Economic benefits of reducing fire-related sediment in southwestern fire-prone ecosystems

    Treesearch

    John Loomis; Pete Wohlgemuth; Armando González-Cabán; Don English

    2003-01-01

    A multiple regression analysis of fire interval and resulting sediment yield (controlling for relief ratio, rainfall, etc.) indicates that reducing the fire interval from the current average 22 years to a prescribed fire interval of 5 years would reduce sediment yield by 2 million cubic meters in the 86.2 square kilometer southern California watershed adjacent to and...

  14. Estimating heating times of wood boards, square timbers, and logs in saturated steam by multiple regression

    Treesearch

    William T. Simpson

    2006-01-01

    Heat sterilization is used to kill insects and fungi in wood being traded internationally. Determining the time required to reach the kill temperature is difficult considering the many variables that can affect it, such as heating temperature, target center temperature, initial wood temperature, wood configuration dimensions, specific gravity, and moisture content. In...

  15. Mapping diffuse photosynthetically active radiation from satellite data in Thailand

    NASA Astrophysics Data System (ADS)

    Choosri, P.; Janjai, S.; Nunez, M.; Buntoung, S.; Charuchittipan, D.

    2017-12-01

    In this paper, calculation of monthly average hourly diffuse photosynthetically active radiation (PAR) using satellite data is proposed. Diffuse PAR was analyzed at four stations in Thailand. A radiative transfer model was used for calculating the diffuse PAR for cloudless sky conditions. Differences between the diffuse PAR under all sky conditions obtained from the ground-based measurements and those from the model are representative of cloud effects. Two models are developed, one describing diffuse PAR only as a function of solar zenith angle, and the second one as a multiple linear regression with solar zenith angle and satellite reflectivity acting linearly and aerosol optical depth acting in logarithmic functions. When tested with an independent data set, the multiple regression model performed best with a higher coefficient of determination R2 (0.78 vs. 0.70), lower root mean square difference (RMSD) (12.92% vs. 13.05%) and the same mean bias difference (MBD) of -2.20%. Results from the multiple regression model are used to map diffuse PAR throughout the country as monthly averages of hourly data.

  16. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.

  17. An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology

    DOE PAGES

    Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...

    2017-05-15

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.
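
    The two pattern-generation approaches named above can be sketched for a single grid cell. This is an assumed, simplified reading of each method (regression slope of local on global mean temperature, and an epoch difference ratio for the delta method), not the authors' released code; the time series and the epoch length are hypothetical.

```python
def regression_pattern(global_temp, local_temp):
    """Least-squares slope of local temperature on global mean temperature."""
    n = len(global_temp)
    mg, ml = sum(global_temp) / n, sum(local_temp) / n
    return (sum((g - mg) * (l - ml) for g, l in zip(global_temp, local_temp))
            / sum((g - mg) ** 2 for g in global_temp))

def delta_pattern(global_temp, local_temp, epoch_length):
    """Local change between the first and last epochs per unit of global mean change."""
    def mean(values):
        return sum(values) / len(values)
    local_change = mean(local_temp[-epoch_length:]) - mean(local_temp[:epoch_length])
    global_change = mean(global_temp[-epoch_length:]) - mean(global_temp[:epoch_length])
    return local_change / global_change

# Hypothetical series: the grid cell warms twice as fast as the global mean.
global_temp = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
local_temp = [2.0 * g + 1.0 for g in global_temp]

slope_pattern = regression_pattern(global_temp, local_temp)
ratio_pattern = delta_pattern(global_temp, local_temp, epoch_length=2)

# Scaling the pattern by a new global mean change gives the emulated local change.
emulated_local_change = slope_pattern * 3.0
print(slope_pattern, ratio_pattern, emulated_local_change)
```

    On a perfectly linear series the two methods agree; they diverge when the local response is noisy or nonlinear, which is the kind of difference the abstract evaluates.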

  18. Estimating the magnitude of annual peak discharges with recurrence intervals between 1.1 and 3.0 years for rural, unregulated streams in West Virginia

    USGS Publications Warehouse

    Wiley, Jeffrey B.; Atkins, John T.; Newell, Dawn A.

    2002-01-01

    Multiple and simple least-squares regression models for the log10-transformed 1.5- and 2-year recurrence intervals of peak discharges with independent variables describing the basin characteristics (log10-transformed and untransformed) for 236 streamflow-gaging stations were evaluated, and the regression residuals were plotted as areal distributions that defined three regions in West Virginia designated as East, North, and South. Regional equations for the 1.1-, 1.2-, 1.3-, 1.4-, 1.5-, 1.6-, 1.7-, 1.8-, 1.9-, 2.0-, 2.5-, and 3-year recurrence intervals of peak discharges were determined by generalized least-squares regression. Log10-transformed drainage area was the most significant independent variable for all regions. Equations developed in this study are applicable only to rural, unregulated streams within the boundaries of West Virginia. The accuracies of estimating equations are quantified by measuring the average prediction error (from 27.4 to 52.4 percent) and equivalent years of record (from 1.1 to 3.4 years).

  19. Estimation of Magnitude and Frequency of Floods for Streams on the Island of Oahu, Hawaii

    USGS Publications Warehouse

    Wong, Michael F.

    1994-01-01

    This report describes techniques for estimating the magnitude and frequency of floods for the island of Oahu. The log-Pearson Type III distribution and methodology recommended by the Interagency Committee on Water Data was used to determine the magnitude and frequency of floods at 79 gaging stations that had 11 to 72 years of record. Multiple regression analysis was used to construct regression equations to transfer the magnitude and frequency information from gaged sites to ungaged sites. Oahu was divided into three hydrologic regions to define relations between peak discharge and drainage-basin and climatic characteristics. Regression equations are provided to estimate the 2-, 5-, 10-, 25-, 50-, and 100-year peak discharges at ungaged sites. Significant basin and climatic characteristics included in the regression equations are drainage area, median annual rainfall, and the 2-year, 24-hour rainfall intensity. Drainage areas for sites used in this study ranged from 0.03 to 45.7 square miles. Standard error of prediction for the regression equations ranged from 34 to 62 percent. Peak-discharge data collected through water year 1988, geographic information system (GIS) technology, and generalized least-squares regression were used in the analyses. The use of GIS seems to be a more flexible and consistent means of defining and calculating basin and climatic characteristics than using manual methods. Standard errors of estimate for the regression equations in this report are an average of 8 percent less than those published in previous studies.

  20. Using Weighted Least Squares Regression for Obtaining Langmuir Sorption Constants

    USDA-ARS's Scientific Manuscript database

    One of the most commonly used models for describing phosphorus (P) sorption to soils is the Langmuir model. To obtain model parameters, the Langmuir model is fit to measured sorption data using least squares regression. Least squares regression is based on several assumptions including normally dist...

  1. A Comparative Investigation of the Combined Effects of Pre-Processing, Wavelength Selection, and Regression Methods on Near-Infrared Calibration Model Performance.

    PubMed

    Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N

    2017-07-01

    Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant role in the calibration while wavelength selection plays a marginal role and the combination of certain pre-processing, wavelength selection, and nonlinear regression methods can achieve superior performance over traditional linear regression-based calibration.

  2. Using Remote Sensing Data to Evaluate Surface Soil Properties in Alabama Ultisols

    NASA Technical Reports Server (NTRS)

    Sullivan, Dana G.; Shaw, Joey N.; Rickman, Doug; Mask, Paul L.; Luvall, Jeff

    2005-01-01

    Evaluation of surface soil properties via remote sensing could facilitate soil survey mapping, erosion prediction and allocation of agrochemicals for precision management. The objective of this study was to evaluate the relationship between soil spectral signature and surface soil properties in conventionally managed row crop systems. High-resolution RS data were acquired over bare fields in the Coastal Plain, Appalachian Plateau, and Ridge and Valley provinces of Alabama using the Airborne Terrestrial Applications Sensor multispectral scanner. Soils ranged from sandy Kandiudults to fine textured Rhodudults. Surface soil samples (0-1 cm) were collected from 163 sampling points for soil organic carbon, particle size distribution, and citrate dithionite extractable iron content. Surface roughness, soil water content, and crusting were also measured during sampling. Two methods of analysis were evaluated: 1) multiple linear regression using common spectral band ratios, and 2) partial least squares regression. Our data show that thermal infrared spectra are highly, linearly related to soil organic carbon, sand and clay content. Soil organic carbon content was the most difficult to quantify in these highly weathered systems, where soil organic carbon was generally less than 1.2%. Estimates of sand and clay content were best using partial least squares regression at the Valley site, explaining 42-59% of the variability. In the Coastal Plain, sandy surfaces prone to crusting limited estimates of sand and clay content via partial least squares and regression with common band ratios. Estimates of iron oxide content were a function of mineralogy and best accomplished using specific band ratios, with regression explaining 36-65% of the variability at the Valley and Coastal Plain sites, respectively.

  3. Daily Suspended Sediment Discharge Prediction Using Multiple Linear Regression and Artificial Neural Network

    NASA Astrophysics Data System (ADS)

    Uca; Toriman, Ekhwan; Jaafar, Othman; Maru, Rosmini; Arfan, Amal; Saleh Ahmar, Ansari

    2018-01-01

    Prediction of suspended sediment discharge in a catchment area is very important because it can be used to evaluate the erosion hazard, to manage water resources, water quality and hydrology projects (dams, reservoirs, and irrigation), and to determine the extent of the damage that has occurred in the catchment. Multiple linear regression analysis and artificial neural networks can be used to predict the amount of daily suspended sediment discharge. The regression analysis uses the least squares method, whereas the artificial neural networks use a Radial Basis Function (RBF) and a feedforward multilayer perceptron with three learning algorithms, namely Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG) and Broyden-Fletcher-Goldfarb-Shanno Quasi-Newton (BFGS). The number of neurons in the hidden layer ranges from three to sixteen, while the output layer has only one neuron because there is only one output target. Judged by the mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2) and coefficient of efficiency (CE), the multiple linear regression (MLRg) Model 2 (6 independent input variables) has the lowest MAE and RMSE (0.0000002 and 13.6039) and the highest R2 and CE (0.9971 and 0.9971). Compared with LM, SCG and RBF, the BFGS model with structure 3-7-1 is better and more accurate for predicting suspended sediment discharge in the Jenderam catchment. In the testing process, its MAE and RMSE (13.5769 and 17.9011) are the smallest and its R2 and CE (0.9999 and 0.9998) are the highest when compared with the other BFGS Quasi-Newton models (6-3-1, 9-10-1 and 12-12-1). Based on the performance statistics, MLRg, LM, SCG, BFGS and RBF are suitable for accurately predicting suspended sediment discharge by modeling the non-linear, complex behavior of suspended sediment responses to rainfall, water depth and discharge. In the comparison between the artificial neural network (ANN) and MLRg, MLRg Model 2 accurately predicts suspended sediment discharge (kg/day) in the Jenderam catchment area.

  4. Hyperspectral Imaging for Predicting the Internal Quality of Kiwifruits Based on Variable Selection Algorithms and Chemometric Models.

    PubMed

    Zhu, Hongyan; Chu, Bingquan; Fan, Yangyang; Tao, Xiaoya; Yin, Wenxin; He, Yong

    2017-08-10

    We investigated the feasibility and potentiality of determining firmness, soluble solids content (SSC), and pH in kiwifruits using hyperspectral imaging, combined with variable selection methods and calibration models. The images were acquired by a push-broom hyperspectral reflectance imaging system covering two spectral ranges. Weighted regression coefficients (BW), successive projections algorithm (SPA) and genetic algorithm-partial least square (GAPLS) were compared and evaluated for the selection of effective wavelengths. Moreover, multiple linear regression (MLR), partial least squares regression and least squares support vector machine (LS-SVM) were developed to predict quality attributes quantitatively using effective wavelengths. The established models, particularly SPA-MLR, SPA-LS-SVM and GAPLS-LS-SVM, performed well. The SPA-MLR models for firmness (R_pre = 0.9812, RPD = 5.17) and SSC (R_pre = 0.9523, RPD = 3.26) at 380-1023 nm showed excellent performance, whereas GAPLS-LS-SVM was the optimal model at 874-1734 nm for predicting pH (R_pre = 0.9070, RPD = 2.60). Image processing algorithms were developed to transfer the predictive model in every pixel to generate prediction maps that visualize the spatial distribution of firmness and SSC. Hence, the results clearly demonstrated that hyperspectral imaging has the potential as a fast and non-invasive method to predict the quality attributes of kiwifruits.

  5. HIV-Related Risk Behaviors, Perceptions of Risk, HIV Testing, and Exposure to Prevention Messages and Methods among Urban American Indians and Alaska Natives

    ERIC Educational Resources Information Center

    Lapidus, Jodi A.; Bertolli, Jeanne; McGowan, Karen; Sullivan, Patrick

    2006-01-01

    The goal of this study was to describe HIV risk behaviors, perceptions, testing, and prevention exposure among urban American Indians and Alaska Natives (AI/AN). Interviewers administered a questionnaire to participants recruited through anonymous peer-referral sampling. Chi-square tests and multiple logistic regression were used to compare HIV…

  6. Influences of Disciplinary Classroom Climate on High School Student Self-Efficacy and Mathematics Achievement: A Look at Gender and Racial-Ethnic Differences

    ERIC Educational Resources Information Center

    Cheema, Jehanzeb R.; Kitsantas, Anastasia

    2014-01-01

    The present study investigated the role of disciplinary climate in the classroom and student math self-efficacy on math achievement. The student part of the Program for International Student Assessment (PISA) 2003 survey containing 4,199 U.S. observations was employed in a weighted least squares nested multiple regression framework to predict math…

  7. Testing a single regression coefficient in high dimensional linear models

    PubMed Central

    Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling

    2017-01-01

    In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively. PMID:28663668

  8. Testing a single regression coefficient in high dimensional linear models.

    PubMed

    Lan, Wei; Zhong, Ping-Shou; Li, Runze; Wang, Hansheng; Tsai, Chih-Ling

    2016-11-01

    In linear regression models with high dimensional data, the classical z-test (or t-test) for testing the significance of each single regression coefficient is no longer applicable. This is mainly because the number of covariates exceeds the sample size. In this paper, we propose a simple and novel alternative by introducing the Correlated Predictors Screening (CPS) method to control for predictors that are highly correlated with the target covariate. Accordingly, the classical ordinary least squares approach can be employed to estimate the regression coefficient associated with the target covariate. In addition, we demonstrate that the resulting estimator is consistent and asymptotically normal even if the random errors are heteroscedastic. This enables us to apply the z-test to assess the significance of each covariate. Based on the p-value obtained from testing the significance of each covariate, we further conduct multiple hypothesis testing by controlling the false discovery rate at the nominal level. Then, we show that the multiple hypothesis testing achieves consistent model selection. Simulation studies and empirical examples are presented to illustrate the finite sample performance and the usefulness of the proposed method, respectively.

  9. Two Enhancements of the Logarithmic Least-Squares Method for Analyzing Subjective Comparisons

    DTIC Science & Technology

    1989-03-25

    error term. 1 For this model, the total sum of squares (SSTO), defined as $SSTO = \sum_{i=1}^{n} (y_i - \bar{y})^2$, can be partitioned into error and regression sums...of the regression line around the mean value. Mathematically, for the model given by equation A.4, $SSTO = SSE + SSR$ (A.6), where SSTO is the total...sum of squares (i.e., the variance of the $y_i$'s), SSE is error sum of squares, and SSR is the regression sum of squares. SSTO, SSE, and SSR are given
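
    The partition SSTO = SSE + SSR quoted in this record can be checked numerically. The sketch below is only an illustration: it assumes a simple linear model y = b0 + b1*x fitted by ordinary least squares (the report's equation A.4 is not shown in the excerpt), and the data values and the function name are made up.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by ordinary least squares and return (SSTO, SSE, SSR)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # least-squares slope
    b0 = y_bar - b1 * x_bar        # least-squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    ssto = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # error sum of squares
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)            # regression sum of squares
    return ssto, sse, ssr


# Made-up illustrative data, not taken from the report
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ssto, sse, ssr = sums_of_squares(x, y)
print("partition holds:", abs(ssto - (sse + ssr)) < 1e-9)
```

    The identity holds for any least-squares fit that includes an intercept; it generally fails for a line that was not fitted by least squares.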

  10. Ordinary Least Squares and Quantile Regression: An Inquiry-Based Learning Approach to a Comparison of Regression Methods

    ERIC Educational Resources Information Center

    Helmreich, James E.; Krog, K. Peter

    2018-01-01

    We present a short, inquiry-based learning course on concepts and methods underlying ordinary least squares (OLS), least absolute deviation (LAD), and quantile regression (QR). Students investigate squared, absolute, and weighted absolute distance functions (metrics) as location measures. Using differential calculus and properties of convex…

  11. On Using the Average Intercorrelation Among Predictor Variables and Eigenvector Orientation to Choose a Regression Solution.

    ERIC Educational Resources Information Center

    Mugrage, Beverly; And Others

    Three ridge regression solutions are compared with ordinary least squares regression and with principal components regression using all components. Ridge regression, particularly the Lawless-Wang solution, out-performed ordinary least squares regression and the principal components solution on the criteria of stability of coefficient and closeness…
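
    As a rough illustration of the comparison described in this record, the sketch below computes ridge coefficients for two centered, highly correlated predictors and contrasts them with ordinary least squares (the k = 0 case). The Lawless-Wang rule for choosing k is not reproduced; the value of k, the data, and the function name are assumptions made for the example.

```python
import math


def ridge_two_predictors(x1, x2, y, k=0.0):
    """Coefficients (b1, b2) minimizing SSE + k*(b1^2 + b2^2) on centered data.

    k = 0 gives the ordinary least squares solution.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    # Entries of X'X + k*I and X'y for the 2x2 normal equations
    a11 = sum(u * u for u in c1) + k
    a22 = sum(u * u for u in c2) + k
    a12 = sum(u * v for u, v in zip(c1, c2))
    g1 = sum(u * v for u, v in zip(c1, cy))
    g2 = sum(u * v for u, v in zip(c2, cy))
    det = a11 * a22 - a12 * a12
    # Cramer's rule for the 2x2 system
    return (g1 * a22 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det


# Made-up data with two nearly collinear predictors
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
y = [2.0, 4.1, 6.1, 7.9, 10.2, 12.0]

ols = ridge_two_predictors(x1, x2, y, k=0.0)
ridge = ridge_two_predictors(x1, x2, y, k=1.0)   # k = 1.0 is an arbitrary choice
ols_norm = math.hypot(*ols)
ridge_norm = math.hypot(*ridge)
print("OLS coefficients:  ", ols)
print("ridge coefficients:", ridge)
```

    The ridge coefficient vector is never longer than the OLS one, which is the mechanism behind the coefficient stability the record refers to.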

  12. Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio

    USGS Publications Warehouse

    Koltun, G.F.; Roberts, J.W.

    1990-01-01

    Multiple-regression equations are presented for estimating flood-peak discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years at ungaged sites on rural, unregulated streams in Ohio. The average standard errors of prediction for the equations range from 33.4% to 41.4%. Peak discharge estimates determined by log-Pearson Type III analysis using data collected through the 1987 water year are reported for 275 streamflow-gaging stations. Ordinary least-squares multiple-regression techniques were used to divide the State into three regions and to identify a set of basin characteristics that help explain station-to-station variation in the log-Pearson estimates. Contributing drainage area, main-channel slope, and storage area were identified as suitable explanatory variables. Generalized least-square procedures, which include historical flow data and account for differences in the variance of flows at different gaging stations, spatial correlation among gaging station records, and variable lengths of station record were used to estimate the regression parameters. Weighted peak-discharge estimates computed as a function of the log-Pearson Type III and regression estimates are reported for each station. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site located on the same stream. Limitations and shortcomings cited in an earlier report on the magnitude and frequency of floods in Ohio are addressed in this study. Geographic bias is no longer evident for the Maumee River basin of northwestern Ohio. No bias is found to be associated with the forested-area characteristic for the range used in the regression analysis (0.0 to 99.0%), nor is this characteristic significant in explaining peak discharges. Surface-mined area likewise is not significant in explaining peak discharges, and the regression equations are not biased when applied to basins having approximately 30% or less surface-mined area. Analyses of residuals indicate that the equations tend to overestimate flood-peak discharges for basins having approximately 30% or more surface-mined area. (USGS)

  13. [Gaussian process regression and its application in near-infrared spectroscopy analysis].

    PubMed

    Feng, Ai-Ming; Fang, Li-Min; Lin, Min

    2011-06-01

    Gaussian process (GP) is applied in the present paper as a chemometric method to explore the complicated relationship between the near infrared (NIR) spectra and ingredients. After the outliers were detected by Monte Carlo cross validation (MCCV) method and removed from dataset, different preprocessing methods, such as multiplicative scatter correction (MSC), smoothing and derivative, were tried for the best performance of the models. Furthermore, uninformative variable elimination (UVE) was introduced as a variable selection technique and the characteristic wavelengths obtained were further employed as input for modeling. A public dataset with 80 NIR spectra of corn was introduced as an example for evaluating the new algorithm. The optimal models for oil, starch and protein were obtained by the GP regression method. The performance of the final models was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of cross-validation (RMSECV), root mean square error of prediction (RMSEP) and correlation coefficient (r). The models give good calibration ability with r values above 0.99 and the prediction ability is also satisfactory with r values higher than 0.96. The overall results demonstrate that GP algorithm is an effective chemometric method and is promising for the NIR analysis.

  14. Technique for estimating the 2- to 500-year flood discharges on unregulated streams in rural Missouri

    USGS Publications Warehouse

    Alexander, Terry W.; Wilson, Gary L.

    1995-01-01

    A generalized least-squares regression technique was used to relate the 2- to 500-year flood discharges from 278 selected streamflow-gaging stations to statistically significant basin characteristics. The regression relations (estimating equations) were defined for three hydrologic regions (I, II, and III) in rural Missouri. Ordinary least-squares regression analyses indicate that drainage area (Regions I, II, and III) and main-channel slope (Regions I and II) are the only basin characteristics needed for computing the 2- to 500-year design-flood discharges at gaged or ungaged stream locations. The resulting generalized least-squares regression equations provide a technique for estimating the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year flood discharges on unregulated streams in rural Missouri. The regression equations for Regions I and II were developed from streamflow-gaging stations with drainage areas ranging from 0.13 to 11,500 square miles and 0.13 to 14,000 square miles, and main-channel slopes ranging from 1.35 to 150 feet per mile and 1.20 to 279 feet per mile. The regression equations for Region III were developed from streamflow-gaging stations with drainage areas ranging from 0.48 to 1,040 square miles. Standard errors of estimate for the generalized least-squares regression equations in Regions I, II, and III ranged from 30 to 49 percent.

  15. Predicting recreational water quality advisories: A comparison of statistical methods

    USGS Publications Warehouse

    Brooks, Wesley R.; Corsi, Steven R.; Fienen, Michael N.; Carvin, Rebecca B.

    2016-01-01

    Epidemiological studies indicate that fecal indicator bacteria (FIB) in beach water are associated with illnesses among people having contact with the water. In order to mitigate public health impacts, many beaches are posted with an advisory when the concentration of FIB exceeds a beach action value. The most commonly used method of measuring FIB concentration takes 18–24 h before returning a result. In order to avoid the 24 h lag, it has become common to "nowcast" the FIB concentration using statistical regressions on environmental surrogate variables. Most commonly, nowcast models are estimated using ordinary least squares regression, but other regression methods from the statistical and machine learning literature are sometimes used. This study compares 14 regression methods across 7 Wisconsin beaches to identify which consistently produces the most accurate predictions. A random forest model is identified as the most accurate, followed by multiple regression fit using the adaptive LASSO.

  16. Accounting for measurement error in log regression models with applications to accelerated testing.

    PubMed

    Richardson, Robert; Tolley, H Dennis; Evenson, William E; Lunt, Barry M

    2018-01-01

    In regression settings, parameter estimates will be biased when the explanatory variables are measured with error. This bias can significantly affect modeling goals. In particular, accelerated lifetime testing involves an extrapolation of the fitted model, and a small amount of bias in parameter estimates may result in a significant increase in the bias of the extrapolated predictions. Additionally, bias may arise when the stochastic component of a log regression model is assumed to be multiplicative when the actual underlying stochastic component is additive. To account for these possible sources of bias, a log regression model with measurement error and additive error is approximated by a weighted regression model which can be estimated using Iteratively Re-weighted Least Squares. Using the reduced Eyring equation in an accelerated testing setting, the model is compared to previously accepted approaches to modeling accelerated testing data with both simulations and real data.

  17. Generalized regression neural network (GRNN)-based approach for colored dissolved organic matter (CDOM) retrieval: case study of Connecticut River at Middle Haddam Station, USA.

    PubMed

    Heddam, Salim

    2014-11-01

    The prediction of colored dissolved organic matter (CDOM) using artificial neural network approaches has received little attention in the past few decades. In this study, colored dissolved organic matter (CDOM) was modeled using generalized regression neural network (GRNN) and multiple linear regression (MLR) models as a function of Water temperature (TE), pH, specific conductance (SC), and turbidity (TU). Evaluation of the prediction accuracy of the models is based on the root mean square error (RMSE), mean absolute error (MAE), coefficient of correlation (CC), and Willmott's index of agreement (d). The results indicated that GRNN can be applied successfully for prediction of colored dissolved organic matter (CDOM).

  18. An open-access CMIP5 pattern library for temperature and precipitation: description and methodology

    NASA Astrophysics Data System (ADS)

    Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben

    2017-05-01

    Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
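
    The two pattern-generation approaches named in this record can be sketched for a single grid cell. This is a simplified illustration, not the code published with the dataset: the least squares pattern is taken as the slope of local values regressed on global mean temperature, and the delta pattern as the local change between two epochs divided by the global change. The function names, the epoch length, and the data are assumptions.

```python
def least_squares_pattern(global_t, local_t):
    """Slope of local values regressed on global mean temperature (with intercept)."""
    n = len(global_t)
    g_bar = sum(global_t) / n
    l_bar = sum(local_t) / n
    sgg = sum((g - g_bar) ** 2 for g in global_t)
    sgl = sum((g - g_bar) * (l - l_bar) for g, l in zip(global_t, local_t))
    return sgl / sgg


def delta_pattern(global_t, local_t, epoch=3):
    """Local change between the first and last `epoch` years per unit global change."""
    def mean(values):
        return sum(values) / len(values)

    d_local = mean(local_t[-epoch:]) - mean(local_t[:epoch])
    d_global = mean(global_t[-epoch:]) - mean(global_t[:epoch])
    return d_local / d_global


# Made-up series: the local cell warms exactly 1.5 times as fast as the global mean
global_t = [14.0, 14.1, 14.3, 14.4, 14.6, 14.9, 15.1, 15.4]
local_t = [2.0 + 1.5 * (g - 14.0) for g in global_t]

ls_slope = least_squares_pattern(global_t, local_t)
delta_slope = delta_pattern(global_t, local_t)
print(ls_slope, delta_slope)
```

    With an exactly linear local response the two methods agree; they diverge once the series contains noise or nonlinearity, because the delta method uses only the end epochs while the regression uses every year.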

  19. Real estate value prediction using multivariate regression models

    NASA Astrophysics Data System (ADS)

    Manjula, R.; Jain, Shubham; Srivastava, Sharad; Rajiv Kher, Pranav

    2017-11-01

    The real estate market is one of the most competitive in terms of pricing, and prices tend to vary significantly based on many factors; hence it is one of the prime fields in which to apply the concepts of machine learning to optimize and predict prices with high accuracy. Therefore, in this paper we present various important features to use while predicting housing prices with good accuracy. We describe regression models that use various features to achieve a lower residual sum of squares error. When using features in a regression model, some feature engineering is required for better prediction. Often a set of features (multiple regression) or polynomial regression (applying various powers of the features) is used to obtain a better model fit. Because these models are expected to be susceptible to overfitting, ridge regression is used to reduce it. This paper thus points to the best application of regression models, in addition to other techniques, to optimize the result.

  20. Statistical Tutorial | Center for Cancer Research

    Cancer.gov

    Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data.  ST is designed as a follow up to Statistical Analysis of Research Data (SARD) held in April 2018.  The tutorial will apply the general principles of statistical analysis of research data including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and Chi-Squared distribution.

  1. Techniques for estimating magnitude and frequency of peak flows for Pennsylvania streams

    USGS Publications Warehouse

    Stuckey, Marla H.; Reed, Lloyd A.

    2000-01-01

    Regression equations for estimating the magnitude and frequency of floods on ungaged streams in Pennsylvania with drainage areas less than 2,000 square miles were developed on the basis of peak-flow data collected at 313 streamflow-gaging stations. All streamflow-gaging stations used in the development of the equations had 10 or more years of record and include active and discontinued continuous-record and crest-stage partial-record streamflow-gaging stations. Regional regression equations were developed for flood flows expected every 10, 25, 50, 100, and 500 years by the use of a weighted multiple linear regression model. The State was divided into two regions. The larger region, Region A, encompasses about 78 percent of Pennsylvania. The smaller region, Region B, includes only the northwestern part of the State. Basin characteristics used in the regression equations for Region A are drainage area, percentage of forest cover, percentage of urban development, percentage of basin underlain by carbonate bedrock, and percentage of basin controlled by lakes, swamps, and reservoirs. Basin characteristics used in the regression equations for Region B are drainage area and percentage of basin controlled by lakes, swamps, and reservoirs. The coefficient of determination (R2) values for the five flood-frequency equations for Region A range from 0.93 to 0.82, and for Region B, the range is from 0.96 to 0.89. While the regression equations can be used to predict the magnitude and frequency of peak flows for most streams in the State, they should not be used for streams with drainage areas greater than 2,000 square miles or less than 1.5 square miles, for streams that drain extensively mined areas, or for stream reaches immediately below flood-control reservoirs. In addition, the equations presented for Region B should not be used if the stream drains a basin with more than 5 percent urban development.

  2. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations.

    PubMed

    NeCamp, Timothy; Kilbourne, Amy; Almirall, Daniel

    2017-08-01

    Cluster-level dynamic treatment regimens can be used to guide sequential treatment decision-making at the cluster level in order to improve outcomes at the individual or patient-level. In a cluster-level dynamic treatment regimen, the treatment is potentially adapted and re-adapted over time based on changes in the cluster that could be impacted by prior intervention, including aggregate measures of the individuals or patients that compose it. Cluster-randomized sequential multiple assignment randomized trials can be used to answer multiple open questions preventing scientists from developing high-quality cluster-level dynamic treatment regimens. In a cluster-randomized sequential multiple assignment randomized trial, sequential randomizations occur at the cluster level and outcomes are observed at the individual level. This manuscript makes two contributions to the design and analysis of cluster-randomized sequential multiple assignment randomized trials. First, a weighted least squares regression approach is proposed for comparing the mean of a patient-level outcome between the cluster-level dynamic treatment regimens embedded in a sequential multiple assignment randomized trial. The regression approach facilitates the use of baseline covariates which is often critical in the analysis of cluster-level trials. Second, sample size calculators are derived for two common cluster-randomized sequential multiple assignment randomized trial designs for use when the primary aim is a between-dynamic treatment regimen comparison of the mean of a continuous patient-level outcome. The methods are motivated by the Adaptive Implementation of Effective Programs Trial which is, to our knowledge, the first-ever cluster-randomized sequential multiple assignment randomized trial in psychiatry.

  3. Orthogonal Regression: A Teaching Perspective

    ERIC Educational Resources Information Center

    Carr, James R.

    2012-01-01

    A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
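
    A minimal sketch of the two fits named in this record, assuming two-dimensional data with nonzero covariance. The major axis slope is taken from the leading principal direction of the sample covariance, and the reduced major axis slope is the ratio of standard deviations carrying the sign of the covariance. Function names and data are made up.

```python
import math


def _centered_sums(x, y):
    """Return means and centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def major_axis(x, y):
    """Line minimizing the sum of squared perpendicular distances (needs sxy != 0)."""
    mx, my, sxx, syy, sxy = _centered_sums(x, y)
    slope = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return slope, my - slope * mx


def reduced_major_axis(x, y):
    """Slope is sd(y)/sd(x) with the sign of the covariance."""
    mx, my, sxx, syy, sxy = _centered_sums(x, y)
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, my - slope * mx


# Points lying exactly on y = 2x + 1: both fits should recover that line
x_line = [1.0, 2.0, 3.0, 4.0, 5.0]
y_line = [3.0, 5.0, 7.0, 9.0, 11.0]
ma_slope, ma_intercept = major_axis(x_line, y_line)
rma_slope, rma_intercept = reduced_major_axis(x_line, y_line)

# Made-up noisy data: swapping x and y inverts the major axis slope
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 2.8, 4.5, 4.9, 6.1]
slope_yx, _ = major_axis(xs, ys)
slope_xy, _ = major_axis(ys, xs)
print(ma_slope, rma_slope, slope_yx * slope_xy)
```

    Unlike ordinary least squares, which treats x as error-free, the major axis treats the two variables symmetrically, which is why regressing x on y yields the reciprocal slope.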

  4. Guidelines and Procedures for Computing Time-Series Suspended-Sediment Concentrations and Loads from In-Stream Turbidity-Sensor and Streamflow Data

    USGS Publications Warehouse

    Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.

    2009-01-01

    In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.

  5. A generalized least squares regression approach for computing effect sizes in single-case research: application examples.

    PubMed

    Maggin, Daniel M; Swaminathan, Hariharan; Rogers, Helen J; O'Keeffe, Breda V; Sugai, George; Horner, Robert H

    2011-06-01

    A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of treatment effect from baseline to treatment phases in standard deviation units. In this paper, the method is applied to two published examples using common single case designs (i.e., withdrawal and multiple-baseline). The results from these studies are described, and the method is compared to ten desirable criteria for single-case effect sizes. Based on the results of this application, we conclude with observations about the use of GLS as a support to visual analysis, provide recommendations for future research, and describe implications for practice. Copyright © 2011 Society for the Study of School Psychology. Published by Elsevier Ltd. All rights reserved.

  6. Efficient design of gain-flattened multi-pump Raman fiber amplifiers using least squares support vector regression

    NASA Astrophysics Data System (ADS)

    Chen, Jing; Qiu, Xiaojie; Yin, Cunyi; Jiang, Hao

    2018-02-01

    An efficient method to design the broadband gain-flattened Raman fiber amplifier with multiple pumps is proposed based on least squares support vector regression (LS-SVR). A multi-input multi-output LS-SVR model is introduced to replace the complicated solving process of the nonlinear coupled Raman amplification equation. The proposed approach contains two stages: offline training stage and online optimization stage. During the offline stage, the LS-SVR model is trained. Owing to the good generalization capability of LS-SVR, the net gain spectrum can be directly and accurately obtained when inputting any combination of the pump wavelength and power to the well-trained model. During the online stage, we incorporate the LS-SVR model into the particle swarm optimization algorithm to find the optimal pump configuration. The design results demonstrate that the proposed method greatly shortens the computation time and enhances the efficiency of the pump parameter optimization for Raman fiber amplifier design.

  7. Partial Least Squares Regression Calibration of an Ultraviolet-Visible Spectrophotometer for Measurements of Chemical Oxygen Demand in Dye Wastewater

    NASA Astrophysics Data System (ADS)

    Mai, W.; Zhang, J.-F.; Zhao, X.-M.; Li, Z.; Xu, Z.-W.

    2017-11-01

    Wastewater from the dye industry is typically analyzed using a standard method for measurement of chemical oxygen demand (COD) or by a single-wavelength spectroscopic method. To overcome the disadvantages of these methods, ultraviolet-visible (UV-Vis) spectroscopy was combined with principal component regression (PCR) and partial least squares regression (PLSR) in this study. Unlike the standard method, this method does not require digestion of the samples for preparation. Experiments showed that the PLSR model offered high prediction performance for COD, with a mean relative error of about 5% for two dyes. This error is similar to that obtained with the standard method. In this study, the precision of the PLSR model decreased with the number of dye compounds present. It is likely that multiple models will be required in reality, and the complexity of a COD monitoring system would be greatly reduced if the PLSR model is used because it can include several dyes. UV-Vis spectroscopy with PLSR successfully enhanced the performance of COD prediction for dye wastewater and showed good potential for application in on-line water quality monitoring.

  8. Application of multivariate chemometric techniques for simultaneous determination of five parameters of cottonseed oil by single bounce attenuated total reflectance Fourier transform infrared spectroscopy.

    PubMed

    Talpur, M Younis; Kara, Huseyin; Sherazi, S T H; Ayyildiz, H Filiz; Topkafa, Mustafa; Arslan, Fatma Nur; Naz, Saba; Durmaz, Fatih; Sirajuddin

    2014-11-01

    Single bounce attenuated total reflectance (SB-ATR) Fourier transform infrared (FTIR) spectroscopy in conjunction with chemometrics was used for accurate determination of free fatty acid (FFA), peroxide value (PV), iodine value (IV), conjugated diene (CD) and conjugated triene (CT) of cottonseed oil (CSO) during potato chips frying. Partial least squares (PLS), stepwise multiple linear regression (SMLR), principal component regression (PCR) and simple Beer's law (SBL) were applied to develop the calibrations for simultaneous evaluation of the five stated parameters of CSO during frying of French frozen potato chips at 170°C. Good regression coefficients (R²) were achieved for FFA, PV, IV, CD and CT, with values of >0.992 by PLS, SMLR, PCR, and SBL. Root mean square error of prediction (RMSEP) was found to be less than 1.95% for all determinations. The results of the study indicated that SB-ATR FTIR in combination with multivariate chemometrics could be used for accurate and simultaneous determination of different parameters during the frying process without using any toxic organic solvent. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Spirituality and Resilience Among Mexican American IPV Survivors.

    PubMed

    de la Rosa, Iván A; Barnett-Queen, Timothy; Messick, Madeline; Gurrola, Maria

    2016-12-01

    Women with abusive partners use a variety of coping strategies. This study examined the correlation between spirituality, resilience, and intimate partner violence using a cross-sectional survey of 54 Mexican American women living along the U.S.-Mexico border. The meaning-making coping model provides the conceptual framework to explore how spirituality is used as a coping strategy. Multiple ordinary least squares (OLS) regression results indicate women who score higher on spirituality also report greater resilient characteristics. Poisson regression analyses revealed that an increase in level of spirituality is associated with lower number of types of abuse experienced. Clinical, programmatic, and research implications are discussed. © The Author(s) 2015.

  10. Legitimate Techniques for Improving the R-Square and Related Statistics of a Multiple Regression Model

    DTIC Science & Technology

    1981-01-01

    explanatory variable has been omitted. Ramsey (1974) has developed a rather interesting test for detecting specification errors using estimates of the...Peter. (1979) A Guide to Econometrics, Cambridge, MA: The MIT Press. Ramsey, J.B. (1974), "Classical Model Selection Through Specification Error... Tests," in P. Zarembka, Ed. Frontiers in Econometrics, New York: Academic Press. Theil, Henri. (1971), Principles of Econometrics, New York: John Wiley

  11. Meteorological adjustment of yearly mean values for air pollutant concentration comparison

    NASA Technical Reports Server (NTRS)

    Sidik, S. M.; Neustadter, H. E.

    1976-01-01

    Using multiple linear regression analysis, models which estimate mean concentrations of Total Suspended Particulate (TSP), sulfur dioxide, and nitrogen dioxide as a function of several meteorologic variables, two rough economic indicators, and a simple trend in time are studied. Meteorologic data were obtained and do not include inversion heights. The goodness of fit of the estimated models is partially reflected by the squared coefficient of multiple correlation which indicates that, at the various sampling stations, the models accounted for about 23 to 47 percent of the total variance of the observed TSP concentrations. If the resulting model equations are used in place of simple overall means of the observed concentrations, there is about a 20 percent improvement in either: (1) predicting mean concentrations for specified meteorological conditions; or (2) adjusting successive yearly averages to allow for comparisons devoid of meteorological effects. An application to source identification is presented using regression coefficients of wind velocity predictor variables.

  12. Regression analysis for LED color detection of visual-MIMO system

    NASA Astrophysics Data System (ADS)

    Banik, Partha Pratim; Saha, Rappy; Kim, Ki-Doo

    2018-04-01

    Color detection from a light emitting diode (LED) array using a smartphone camera is very difficult in a visual multiple-input multiple-output (visual-MIMO) system. In this paper, we propose a method to determine the LED color using a smartphone camera by applying regression analysis. We employ a multivariate regression model to identify the LED color. After taking a picture of an LED array, we select the LED array region, and detect the LED using an image processing algorithm. We then apply the k-means clustering algorithm to determine the number of potential colors for feature extraction of each LED. Finally, we apply the multivariate regression model to predict the color of the transmitted LEDs. In this paper, we show our results for three types of environmental light condition: room environmental light, low environmental light (560 lux), and strong environmental light (2450 lux). We compare the results of our proposed algorithm using training and test R-Square (%) values and the percentage of closeness between transmitted and predicted colors, and we also report the number of distorted test data points from the analysis of a distortion bar graph in the CIE1931 color space.

  13. A Simple Introduction to Moving Least Squares and Local Regression Estimation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Garimella, Rao Veerabhadra

    In this brief note, a highly simplified introduction to estimating functions over a set of particles is presented. The note starts from Global Least Squares fitting, going on to Moving Least Squares estimation (MLS) and finally, Local Regression Estimation (LRE).
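
The contrast between a global fit and a moving least squares estimate can be sketched as follows. This is a minimal one-dimensional illustration, assuming a local linear basis and a Gaussian weight function of width `h`. Both choices, along with the function names, are assumptions made here and are not taken from the note itself.

```python
import math


def weighted_line_fit(xs, ys, ws):
    """Weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def global_least_squares(xs, ys, x0):
    """Evaluate one line fitted to all points with equal weights."""
    a, b = weighted_line_fit(xs, ys, [1.0] * len(xs))
    return a + b * x0


def moving_least_squares(xs, ys, x0, h=1.0):
    """Refit the line for each query point x0, weighting nearby points more.

    The Gaussian weight exp(-((x - x0) / h)^2) is an assumed choice.
    """
    ws = [math.exp(-((x - x0) / h) ** 2) for x in xs]
    a, b = weighted_line_fit(xs, ys, ws)
    return a + b * x0


# Hypothetical particle positions and values sampled from y = x^2.
px = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
py = [x * x for x in px]
print(global_least_squares(px, py, 0.0), moving_least_squares(px, py, 0.0))
```

Because the weights are recomputed at every query point, the moving estimate follows local curvature that a single global line cannot capture.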

  14. Analysis and prediction of flow from local source in a river basin using a Neuro-fuzzy modeling tool.

    PubMed

    Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi

    2007-10-01

    Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, multiple linear regression fails to yield an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.

  15. Regional equations for estimation of peak-streamflow frequency for natural basins in Texas

    USGS Publications Warehouse

    Asquith, William H.; Slade, Raymond M.

    1997-01-01

    Peak-streamflow frequency for 559 Texas stations with natural (unregulated and rural or nonurbanized) basins was estimated with annual peak-streamflow data through 1993. The peak-streamflow frequency and drainage-basin characteristics for the Texas stations were used to develop 16 sets of equations to estimate peak-streamflow frequency for ungaged natural stream sites in each of 11 regions in Texas. The relation between peak-streamflow frequency and contributing drainage area for 5 of the 11 regions is curvilinear, requiring that one set of equations be developed for drainage areas less than 32 square miles and another set be developed for drainage areas greater than 32 square miles. These equations, developed through multiple-regression analysis using weighted least squares, are based on the relation between peak-streamflow frequency and basin characteristics for streamflow-gaging stations. The regions represent areas with similar flood characteristics. The use and limitations of the regression equations also are discussed. Additionally, procedures are presented to compute the 50-, 67-, and 90-percent confidence limits for any estimation from the equations. Also, supplemental peak-streamflow frequency and basin characteristics for 105 selected stations bordering Texas are included in the report. This supplemental information will aid in interpretation of flood characteristics for sites near the state borders of Texas.

  16. Normalization Ridge Regression in Practice I: Comparisons Between Ordinary Least Squares, Ridge Regression and Normalization Ridge Regression.

    ERIC Educational Resources Information Center

    Bulcock, J. W.

    The problem of model estimation when the data are collinear was examined. Though the ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…
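
The variance-reducing effect of ridge regression under collinearity can be sketched for the simplest case of two centered predictors and no intercept, where the ridge system (X'X + kI)b = X'y has a 2x2 closed-form solution. This is an illustrative sketch of ordinary ridge regression only. It does not implement the normalization ridge variant named in the title, and the function name, the ridge constant `k`, and the data are hypothetical.

```python
def ridge_two_predictors(x1, x2, y, k=0.0):
    """Solve (X'X + k*I) b = X'y for two centered predictors.

    k = 0 gives the ordinary least squares solution; k > 0 gives ridge.
    """
    s11 = sum(a * a for a in x1) + k
    s22 = sum(b * b for b in x2) + k
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for the 2x2 system.
    b1 = (s22 * t1 - s12 * t2) / det
    b2 = (s11 * t2 - s12 * t1) / det
    return b1, b2


# Hypothetical, nearly collinear centered predictors (x2 is almost equal to x1).
x1_demo = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2_demo = [-1.99, -1.01, 0.0, 1.01, 1.99]
y_demo = [-4.1, -1.9, 0.1, 2.1, 3.8]
print("OLS:  ", ridge_two_predictors(x1_demo, x2_demo, y_demo))
print("ridge:", ridge_two_predictors(x1_demo, x2_demo, y_demo, k=0.5))
```

With nearly collinear columns the determinant at k = 0 is close to zero, which is why the OLS coefficients become unstable; adding k to the diagonal moves the system away from singularity at the cost of some bias.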

  17. An iteratively reweighted least-squares approach to adaptive robust adjustment of parameters in linear regression models with autoregressive and t-distributed deviations

    NASA Astrophysics Data System (ADS)

    Kargoll, Boris; Omidalizarandi, Mohammad; Loth, Ina; Paffenholz, Jens-André; Alkhatib, Hamza

    2018-03-01

    In this paper, we investigate a linear regression time series model of possibly outlier-afflicted observations and autocorrelated random deviations. This colored noise is represented by a covariance-stationary autoregressive (AR) process, in which the independent error components follow a scaled (Student's) t-distribution. This error model allows for the stochastic modeling of multiple outliers and for an adaptive robust maximum likelihood (ML) estimation of the unknown regression and AR coefficients, the scale parameter, and the degree of freedom of the t-distribution. This approach is meant to be an extension of known estimators, which tend to focus only on the regression model, or on the AR error model, or on normally distributed errors. For the purpose of ML estimation, we derive an expectation conditional maximization either algorithm, which leads to an easy-to-implement version of iteratively reweighted least squares. The estimation performance of the algorithm is evaluated via Monte Carlo simulations for a Fourier as well as a spline model in connection with AR colored noise models of different orders and with three different sampling distributions generating the white noise components. We apply the algorithm to a vibration dataset recorded by a high-accuracy, single-axis accelerometer, focusing on the evaluation of the estimated AR colored noise model.
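
The reweighting idea behind this record can be sketched in a much reduced setting: a straight-line model with independent t-distributed errors and a fixed degree of freedom `nu`. The sketch below uses the standard EM weights for that simpler model. It omits the autoregressive error process and the estimation of the degree of freedom treated in the paper, and the function name and defaults are assumptions.

```python
def t_irls_line(xs, ys, nu=4.0, iterations=100):
    """Iteratively reweighted least squares for y = a + b*x with t errors.

    Returns (a, b, sigma2, weights). With iterations=1 the returned
    coefficients are the ordinary least squares estimates.
    """
    n = len(xs)
    ws = [1.0] * n
    a = b = sigma2 = 0.0
    for _ in range(iterations):
        # Weighted least squares step with the current weights.
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        b = sxy / sxx
        a = my - b * mx
        res = [y - a - b * x for x, y in zip(xs, ys)]
        # Scale update using the same weights.
        sigma2 = sum(w * r * r for w, r in zip(ws, res)) / n
        if sigma2 == 0.0:
            break  # exact fit, nothing left to reweight
        # Large standardized residuals receive small weights.
        ws = [(nu + 1.0) / (nu + r * r / sigma2) for r in res]
    return a, b, sigma2, ws


# Hypothetical data: y = 1 + 2x with small noise and one gross outlier.
x_demo = list(range(10))
y_demo = [1 + 2 * x + (0.1 if x % 2 else -0.1) for x in x_demo]
y_demo[9] = 40.0
print(t_irls_line(x_demo, y_demo, iterations=1)[:2])
print(t_irls_line(x_demo, y_demo)[:2])
```

The weight of each observation shrinks as its squared standardized residual grows, so an outlier loses influence on the fit without being discarded outright.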

  18. Liquid detection with InGaAsP semiconductor lasers having multiple short external cavities.

    PubMed

    Zhu, X; Cassidy, D T

    1996-08-20

    A liquid detection system consisting of a diode laser with multiple short external cavities (MSXC's) is reported. The MSXC diode laser operates single mode on one of 18 distinct modes that span a range of 72 nm. We selected the modes by setting the length of one of the external cavities using a piezoelectric positioner. One can measure the transmission through cells by modulating the injection current at audio frequencies and using phase-sensitive detection to reject the ambient light and reduce 1/f noise. A method to determine regions of single-mode operation by the rms of the output of the laser is described. The transmission data were processed by multivariate calibration techniques, i.e., partial least squares and principal component regression. Water concentration in acetone was used to demonstrate the performance of the system. A correlation coefficient of R² = 0.997 and 0.29% root-mean-square error of prediction are found for water concentration over the range of 2-19%.

  19. Soil salt content estimation in the Yellow River delta with satellite hyperspectral data

    USGS Publications Warehouse

    Weng, Yongling; Gong, Peng; Zhu, Zhi-Liang

    2008-01-01

    Soil salinization is one of the most common land degradation processes and is a severe environmental hazard. The primary objective of this study is to investigate the potential of predicting salt content in soils with hyperspectral data acquired with EO-1 Hyperion. Both partial least-squares regression (PLSR) and conventional multiple linear regression (MLR), such as stepwise regression (SWR), were tested as the prediction model. PLSR is commonly used to overcome the problem caused by high-dimensional and correlated predictors. Chemical analysis of 95 samples collected from the top layer of soils in the Yellow River delta area shows that salt content was high on average, and the dominant chemicals in the saline soil were NaCl and MgCl2. Multivariate models were established between soil contents and hyperspectral data. Our results indicate that the PLSR technique with laboratory spectral data has a strong prediction capacity. Spectral bands at 1487-1527, 1971-1991, 2032-2092, and 2163-2355 nm possessed large absolute values of regression coefficients, with the largest coefficient at 2203 nm. We obtained a root mean squared error (RMSE) for calibration (with 61 samples) of RMSEC = 0.753 (R² = 0.893) and a root mean squared error for validation (with 30 samples) of RMSEV = 0.574. The prediction model was applied on a pixel-by-pixel basis to a Hyperion reflectance image to yield a quantitative surface distribution map of soil salt content. The result was validated successfully from 38 sampling points. We obtained an RMSE estimate of 1.037 (R² = 0.784) for the soil salt content map derived by the PLSR model. The salinity map derived from the SWR model shows that the predicted value is higher than the true value. These results demonstrate that the PLSR method is a more suitable technique than stepwise regression for quantitative estimation of soil salt content in a large area. © 2008 CASI.

  20. Addressing the identification problem in age-period-cohort analysis: a tutorial on the use of partial least squares and principal components analysis.

    PubMed

    Tu, Yu-Kang; Krämer, Nicole; Lee, Wen-Chung

    2012-07-01

    In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. As these 3 variables are perfectly collinear by definition, regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem, and how this problem may be tackled using partial least squares and principal components regression analyses. Both methods produce regression coefficients that fulfill the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, this leads to more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how to use partial least squares regression to analyze the trends in body heights with 3 continuous variables for age, period, and cohort. We then use another dataset of hepatocellular carcinoma mortality rates for Taiwanese men to illustrate how to use partial least squares regression to analyze tables with aggregated data. We use the second dataset to show the relation between the intrinsic estimator, a recently proposed method for the age-period-cohort analysis, and partial least squares regression. We also show that the inclusion of all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix.
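
The identification problem stated in this record follows directly from the identity cohort = period - age. The short sketch below, using hypothetical coefficients and a hypothetical grid of ages and periods, shows that two different coefficient vectors give identical fitted values in a linear model with all three variables. It illustrates only the non-uniqueness, not the partial least squares or principal components solutions reviewed in the tutorial.

```python
def apc_predict(coefs, age, period, cohort):
    """Linear predictor b0 + b_age*age + b_period*period + b_cohort*cohort."""
    b0, b_age, b_period, b_cohort = coefs
    return b0 + b_age * age + b_period * period + b_cohort * cohort


def shift_along_null_direction(coefs, t):
    """Add t*(1, -1, 1) to the (age, period, cohort) coefficients.

    Because age - period + cohort = 0 for every record, this shift
    leaves every fitted value unchanged.
    """
    b0, b_age, b_period, b_cohort = coefs
    return (b0, b_age + t, b_period - t, b_cohort + t)


# Hypothetical records: cohort is fully determined by period and age.
rows = [(age, period, period - age)
        for period in (1990, 2000, 2010)
        for age in (30, 40, 50)]

coefs_a = (1.0, 0.5, 0.2, -0.1)          # hypothetical coefficients
coefs_b = shift_along_null_direction(coefs_a, 3.0)

for age, period, cohort in rows:
    print(apc_predict(coefs_a, age, period, cohort),
          apc_predict(coefs_b, age, period, cohort))
```

Any estimation method therefore has to add a constraint to pick one member of this family of equally well-fitting solutions.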

  1. Use of partial least squares regression to impute SNP genotypes in Italian cattle breeds.

    PubMed

    Dimauro, Corrado; Cellesi, Massimo; Gaspa, Giustino; Ajmone-Marsan, Paolo; Steri, Roberto; Marras, Gabriele; Macciotta, Nicolò P P

    2013-06-05

    The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low density single nucleotide polymorphisms (SNP) panels i.e. 3K or 7K to a high density panel with 50K SNP. No pedigree information was used. Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90 and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.

  2. [Application of artificial neural networks on the prediction of surface ozone concentrations].

    PubMed

    Shen, Lu-Lu; Wang, Yu-Xuan; Duan, Lei

    2011-08-01

    Ozone is an important secondary air pollutant in the lower atmosphere. In order to predict the hourly maximum ozone one day in advance based on the meteorological variables for the Wanqingsha site in Guangzhou, Guangdong province, a neural network model (Multi-Layer Perceptron) and a multiple linear regression model were used and compared. Model inputs are meteorological parameters (wind speed, wind direction, air temperature, relative humidity, barometric pressure and solar radiation) of the next day and hourly maximum ozone concentration of the previous day. The OBS (optimal brain surgeon) was adopted to prune the neural network, to reduce its complexity and to improve its generalization ability. We find that the pruned neural network has the capacity to predict the peak ozone, with an agreement index of 92.3%, the root mean square error of 0.0428 mg/m3, the R-square of 0.737 and the success index of threshold exceedance 77.0% (the threshold O3 mixing ratio of 0.20 mg/m3). When the neural classifier was added to the neural network model, the success index of threshold exceedance increased to 83.6%. Through comparison of the performance indices between the multiple linear regression model and the neural network model, we conclude that the neural network is a better choice for predicting peak ozone from meteorological forecasts, which may be applied to practical prediction of ozone concentration.

  3. Estimating standard errors in feature network models.

    PubMed

    Frank, Laurence E; Heiser, Willem J

    2007-05-01

    Feature network models are graphical structures that represent proximity data in a discrete space while using the same formalism that is the basis of least squares methods employed in multidimensional scaling. Existing methods to derive a network model from empirical data only give the best-fitting network and yield no standard errors for the parameter estimates. The additivity properties of networks make it possible to consider the model as a univariate (multiple) linear regression problem with positivity restrictions on the parameters. In the present study, both theoretical and empirical standard errors are obtained for the constrained regression parameters of a network model with known features. The performance of both types of standard error is evaluated using Monte Carlo techniques.

  4. Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verdoolaege, G., E-mail: geert.verdoolaege@ugent.be; Laboratory for Plasma Physics, Royal Military Academy, B-1000 Brussels; Shabbir, A.

    Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.

  5. Model selection with multiple regression on distance matrices leads to incorrect inferences.

    PubMed

    Franckowiak, Ryan P; Panasci, Michael; Jarvis, Karl J; Acuña-Rodriguez, Ian S; Landguth, Erin L; Fortin, Marie-Josée; Wagner, Helene H

    2017-01-01

    In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.

  6. Application of third molar development and eruption models in estimating dental age in Malay sub-adults.

    PubMed

    Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc

    2015-08-01

    The third molar development (TMD) has been widely utilized as one of the radiographic methods for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated into the TMD regression model. This study aims to evaluate the performance of dental age estimation in individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data was divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² yielded an increment in the linear regressions of the combined model as compared to the individual models. An overall decrease in RMSE was detected in the combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, a low adjusted R² and a high RMSE, except in males, were exhibited by the combined model. Dental age estimation is better predicted using the combined model in multiple linear regression models. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.

  7. Kernel analysis of partial least squares (PLS) regression models.

    PubMed

    Shinzawa, Hideyuki; Ritthiruangdej, Pitiporn; Ozaki, Yukihiro

    2011-05-01

    An analytical technique based on kernel matrix representation is demonstrated to provide further chemically meaningful insight into partial least squares (PLS) regression models. The kernel matrix condenses essential information about scores derived from PLS or principal component analysis (PCA). Thus, it becomes possible to establish the proper interpretation of the scores. A PLS model for the total nitrogen (TN) content in multiple Thai fish sauces is built with a set of near-infrared (NIR) transmittance spectra of the fish sauce samples. The kernel analysis of the scores effectively reveals that the variation of the spectral feature induced by the change in protein content is substantially associated with the total water content and the protein hydration. Kernel analysis is also carried out on a set of time-dependent infrared (IR) spectra representing transient evaporation of ethanol from a binary mixture solution of ethanol and oleic acid. A PLS model to predict the elapsed time is built with the IR spectra and the kernel matrix is derived from the scores. The detailed analysis of the kernel matrix provides penetrating insight into the interaction between the ethanol and the oleic acid.

  8. Estimating Dbh of Trees Employing Multiple Linear Regression of the best Lidar-Derived Parameter Combination Automated in Python in a Natural Broadleaf Forest in the Philippines

    NASA Astrophysics Data System (ADS)

    Ibanez, C. A. G.; Carcellar, B. G., III; Paringit, E. C.; Argamosa, R. J. L.; Faelga, R. A. G.; Posilero, M. A. V.; Zaragosa, G. P.; Dimayacyac, N. A.

    2016-06-01

    Diameter-at-Breast-Height (DBH) estimation is a prerequisite in various allometric equations estimating important forestry indices like stem volume, basal area, biomass and carbon stock. LiDAR technology has a means of directly obtaining different forest parameters, except DBH, from the behavior and characteristics of point clouds unique to different forest classes. Extensive tree inventory was done on a two-hectare established sample plot in Mt. Makiling, Laguna for a natural growth forest. Coordinates, height, and canopy cover were measured and types of species were identified to compare to LiDAR derivatives. Multiple linear regression was used to get LiDAR-derived DBH by integrating field-derived DBH and 27 LiDAR-derived parameters at 20 m, 10 m, and 5 m grid resolutions. To find the best combination of parameters for DBH estimation, all possible combinations of parameters were generated and automated using Python scripts, and additional regression-related libraries such as NumPy, SciPy, and scikit-learn were used. The combination that yields the highest r-squared (coefficient of determination) and the lowest AIC (Akaike's Information Criterion) and BIC (Bayesian Information Criterion) was determined to be the best equation. The best equation uses 11 parameters at the 10 m grid size, with an r-squared of 0.604, an AIC of 154.04, and a BIC of 175.08. The combination of parameters may differ among forest classes in further studies. Additional statistical tests, such as the Kaiser-Meyer-Olkin (KMO) coefficient and Bartlett's Test of Sphericity (BTS), can be added to help determine the correlation among parameters.
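
    The exhaustive search over parameter combinations described above can be sketched as follows. This is only an illustration under stated assumptions: it uses the Python standard library instead of the NumPy, SciPy, and scikit-learn stack the authors mention, it ranks subsets by AIC alone, and the predictor names and data are hypothetical.

```python
import math
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols_rss(columns, y):
    """Fit y on an intercept plus the given columns; return the residual SS."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)]
           for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = solve_linear(xtx, xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def best_subset(predictors, y):
    """Try every non-empty subset of predictors; return (names, AIC) of the best."""
    n = len(y)
    best = None
    names = sorted(predictors)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            rss = ols_rss([predictors[name] for name in subset], y)
            aic = n * math.log(rss / n) + 2 * (size + 1)  # slopes + intercept
            if best is None or aic < best[1]:
                best = (subset, aic)
    return best


# Hypothetical data: y depends on x1 only; x2 is unrelated to y.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
}
noise = [0.1, 0.1, -0.1, -0.1, -0.1, -0.1, 0.1, 0.1]
y = [2.0 * x + e for x, e in zip(predictors["x1"], noise)]
best_names, best_aic = best_subset(predictors, y)
```

    With 27 candidate parameters the number of subsets is 2^27 - 1, so a full enumeration like this is expensive; the sketch is meant for a handful of predictors.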

  9. Retargeted Least Squares Regression Algorithm.

    PubMed

    Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin

    2015-09-01

    This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to directly learn the regression targets from data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large margin constraint for the requirement of correct classification for each data point. Compared with the traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model, hence there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure including regression and retargeting as substeps. The experimental evaluation over a range of databases identifies the validity of our method.

  10. Time-resolved flow reconstruction with indirect measurements using regression models and Kalman-filtered POD ROM

    NASA Astrophysics Data System (ADS)

    Leroux, Romain; Chatellier, Ludovic; David, Laurent

    2018-01-01

    This article is devoted to the estimation of time-resolved particle image velocimetry (TR-PIV) flow fields using time-resolved point measurements of a voltage signal obtained by hot-film anemometry. A multiple linear regression model is first defined to map the TR-PIV flow fields onto the voltage signal. Due to the high temporal resolution of the signal acquired by the hot-film sensor, the estimates of the TR-PIV flow fields are obtained with a multiple linear regression method called orthonormalized partial least squares regression (OPLSR). Subsequently, this model is incorporated as the observation equation in an ensemble Kalman filter (EnKF) applied on a proper orthogonal decomposition reduced-order model to stabilize it while reducing the effects of the hot-film sensor noise. This method is assessed for the reconstruction of the flow around a NACA0012 airfoil at a Reynolds number of 1000 and an angle of attack of 20°. Comparisons with multi-time delay-modified linear stochastic estimation show that both the OPLSR and EnKF combined with OPLSR are more accurate as they produce a much lower relative estimation error, and provide a faithful reconstruction of the time evolution of the velocity flow fields.

  11. Geodesic least squares regression for scaling studies in magnetic confinement fusion

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verdoolaege, Geert

    In regression analyses for deriving scaling laws that occur in various scientific disciplines, usually standard regression methods have been applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to scaling laws. We here discuss a new regression method that is robust in the presence of significant uncertainty on both the data and the regression model. The method, which we call geodesic least squares regression (GLS), is based on minimization of the Rao geodesic distance on a probabilistic manifold. We demonstrate the superiority of the method using synthetic data and we present an application to the scaling law for the power threshold for the transition to the high confinement regime in magnetic confinement fusion devices.

  12. The extraction of simple relationships in growth factor-specific multiple-input and multiple-output systems in cell-fate decisions by backward elimination PLS regression.

    PubMed

    Akimoto, Yuki; Yugi, Katsuyuki; Uda, Shinsuke; Kudo, Takamasa; Komori, Yasunori; Kubota, Hiroyuki; Kuroda, Shinya

    2013-01-01

    Cells use common signaling molecules for the selective control of downstream gene expression and cell-fate decisions. The relationship between signaling molecules and downstream gene expression and cellular phenotypes is a multiple-input and multiple-output (MIMO) system and is difficult to understand due to its complexity. For example, it has been reported that, in PC12 cells, different types of growth factors activate MAP kinases (MAPKs) including ERK, JNK, and p38, and CREB, for selective protein expression of immediate early genes (IEGs) such as c-FOS, c-JUN, EGR1, JUNB, and FOSB, leading to cell differentiation, proliferation and cell death; however, how multiple-inputs such as MAPKs and CREB regulate multiple-outputs such as expression of the IEGs and cellular phenotypes remains unclear. To address this issue, we employed a statistical method called partial least squares (PLS) regression, which involves a reduction of the dimensionality of the inputs and outputs into latent variables and a linear regression between these latent variables. We measured 1,200 data points for MAPKs and CREB as the inputs and 1,900 data points for IEGs and cellular phenotypes as the outputs, and we constructed the PLS model from these data. The PLS model highlighted the complexity of the MIMO system and growth factor-specific input-output relationships of cell-fate decisions in PC12 cells. Furthermore, to reduce the complexity, we applied a backward elimination method to the PLS regression, in which 60 input variables were reduced to 5 variables, including the phosphorylation of ERK at 10 min, CREB at 5 min and 60 min, AKT at 5 min and JNK at 30 min. The simple PLS model with only 5 input variables demonstrated a predictive ability comparable to that of the full PLS model. The 5 input variables effectively extracted the growth factor-specific simple relationships within the MIMO system in cell-fate decisions in PC12 cells.

  13. Methods for estimating annual exceedance probability discharges for streams in Arkansas, based on data through water year 2013

    USGS Publications Warehouse

    Wagner, Daniel M.; Krieger, Joshua D.; Veilleux, Andrea G.

    2016-08-04

    In 2013, the U.S. Geological Survey initiated a study to update regional skew, annual exceedance probability discharges, and regional regression equations used to estimate annual exceedance probability discharges for ungaged locations on streams in the study area with the use of recent geospatial data, new analytical methods, and available annual peak-discharge data through the 2013 water year. An analysis of regional skew using Bayesian weighted least-squares/Bayesian generalized-least squares regression was performed for Arkansas, Louisiana, and parts of Missouri and Oklahoma. The newly developed constant regional skew of -0.17 was used in the computation of annual exceedance probability discharges for 281 streamgages used in the regional regression analysis. Based on analysis of covariance, four flood regions were identified for use in the generation of regional regression models. Thirty-nine basin characteristics were considered as potential explanatory variables, and ordinary least-squares regression techniques were used to determine the optimum combinations of basin characteristics for each of the four regions. Basin characteristics in candidate models were evaluated based on multicollinearity with other basin characteristics (variance inflation factor < 2.5) and statistical significance at the 95-percent confidence level (p ≤ 0.05). Generalized least-squares regression was used to develop the final regression models for each flood region. Average standard errors of prediction of the generalized least-squares models ranged from 32.76 to 59.53 percent, with the largest range in flood region D. Pseudo coefficients of determination of the generalized least-squares models ranged from 90.29 to 97.28 percent, with the largest range also in flood region D. 
The regional regression equations apply only to locations on streams in Arkansas where annual peak discharges are not substantially affected by regulation, diversion, channelization, backwater, or urbanization. The applicability and accuracy of the regional regression equations depend on the basin characteristics measured for an ungaged location on a stream being within range of those used to develop the equations.
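
    The multicollinearity screen mentioned above (variance inflation factor < 2.5) can be illustrated for the simplest case of two candidate basin characteristics, where the VIF reduces to 1 / (1 - r²) with r the Pearson correlation between them. The general case regresses each characteristic on all the others; the two-predictor shortcut, the function names, and the numbers below are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when a model has exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical basin characteristics for four sites.
drainage_area = [1.0, 2.0, 3.0, 4.0]
uncorrelated = [1.0, -1.0, -1.0, 1.0]
nearly_collinear = [1.0, 2.0, 3.0, 5.0]

vif_ok = vif_two_predictors(drainage_area, uncorrelated)
vif_bad = vif_two_predictors(drainage_area, nearly_collinear)
passes_screen = vif_ok < 2.5 and not vif_bad < 2.5
```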

  14. Determination of total phenolic compounds in compost by infrared spectroscopy.

    PubMed

    Cascant, M M; Sisouane, M; Tahiri, S; Krati, M El; Cervera, M L; Garrigues, S; de la Guardia, M

    2016-06-01

    Middle and near infrared (MIR and NIR) spectroscopy were applied to determine the total phenolic compounds (TPC) content in compost samples based on models built by using partial least squares (PLS) regression. Multiplicative scatter correction, standard normal variate and first derivative were employed as spectral pretreatments, and the number of latent variables was optimized by leave-one-out cross-validation. The performance of the PLS-ATR-MIR and PLS-DR-NIR models was evaluated according to the root mean square errors of cross validation and prediction (RMSECV and RMSEP), the coefficient of determination for prediction (R²pred) and the residual predictive deviation (RPD); RPD values of 5.83 and 8.26 were obtained for MIR and NIR, respectively. Copyright © 2016 Elsevier B.V. All rights reserved.
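
    For readers unfamiliar with the figures of merit above, the sketch below computes RMSEP and RPD for a validation set. It assumes the common definition of RPD as the standard deviation of the reference values divided by RMSEP; some authors use a bias-corrected error instead, and the abstract does not say which variant was used. The values are hypothetical.

```python
import math
import statistics


def rmsep(reference, predicted):
    """Root mean square error of prediction on a validation set."""
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def rpd(reference, predicted):
    """Residual predictive deviation: SD of reference values over RMSEP."""
    return statistics.stdev(reference) / rmsep(reference, predicted)


# Hypothetical reference and predicted TPC values.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.1, 3.9, 5.1]

error = rmsep(reference, predicted)
ratio = rpd(reference, predicted)
```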

  15. Application of Fourier transform near-infrared spectroscopy combined with high-performance liquid chromatography in rapid and simultaneous determination of essential components in crude Radix Scrophulariae.

    PubMed

    Li, Xiaomeng; Fang, Dansi; Cong, Xiaodong; Cao, Gang; Cai, Hao; Cai, Baochang

    2012-12-01

    A method is described using rapid and sensitive Fourier transform near-infrared spectroscopy combined with high-performance liquid chromatography-diode array detection for the simultaneous identification and determination of four bioactive compounds in crude Radix Scrophulariae samples. Partial least squares regression was selected as the analysis type, and multiplicative scatter correction, second derivative, and a Savitzky-Golay filter were adopted for the spectral pretreatment. The correlation coefficients (R) of the calibration models were above 0.96 and the root mean square errors of prediction were under 0.028. The developed models were applied to unknown samples with satisfactory results. The established method was validated and can be applied to the intrinsic quality control of crude Radix Scrophulariae.

  16. On structure-exploiting trust-region regularized nonlinear least squares algorithms for neural-network learning.

    PubMed

    Mizutani, Eiji; Demmel, James W

    2003-01-01

    This paper briefly introduces our numerical linear algebra approaches for solving structured nonlinear least squares problems arising from 'multiple-output' neural-network (NN) models. Our algorithms feature trust-region regularization, and exploit sparsity of either the 'block-angular' residual Jacobian matrix or the 'block-arrow' Gauss-Newton Hessian (or Fisher information matrix in statistical sense) depending on problem scale so as to render a large class of NN-learning algorithms 'efficient' in both memory and operation costs. Using a relatively large real-world nonlinear regression application, we shall explain algorithmic strengths and weaknesses, analyzing simulation results obtained by both direct and iterative trust-region algorithms with two distinct NN models: 'multilayer perceptrons' (MLP) and 'complementary mixtures of MLP-experts' (or neuro-fuzzy modular networks).

  17. Application of least median of squared orthogonal distance (LMD) and LMD-based reweighted least squares (RLS) methods on the stock-recruitment relationship

    NASA Astrophysics Data System (ADS)

    Wang, Yan-Jun; Liu, Qun

    1999-03-01

    Analysis of stock-recruitment (SR) data is most often done by fitting various SR relationship curves to the data. Fish population dynamics data often have stochastic variations and measurement errors, which usually result in a biased regression analysis. This paper presents a robust regression method, least median of squared orthogonal distance (LMD), which is insensitive to abnormal values in the dependent and independent variables in a regression analysis. Outliers that have significantly different variance from the rest of the data can be identified in a residual analysis. Then, the least squares (LS) method is applied to the SR data with defined outliers being down weighted. The application of LMD and LMD-based Reweighted Least Squares (RLS) method to simulated and real fisheries SR data is explored.

  18. A Weighted Least Squares Approach To Robustify Least Squares Estimates.

    ERIC Educational Resources Information Center

    Lin, Chowhong; Davenport, Ernest C., Jr.

    This study developed a robust linear regression technique based on the idea of weighted least squares. In this technique, a subsample of the full data of interest is drawn, based on a measure of distance, and an initial set of regression coefficients is calculated. The rest of the data points are then taken into the subsample, one after another,…

  19. Validation of Core Temperature Estimation Algorithm

    DTIC Science & Technology

    2016-01-29

    plot of observed versus estimated core temperature with the line of identity (dashed) and the least squares regression line (solid) and line equation...estimated PSI with the line of identity (dashed) and the least squares regression line (solid) and line equation in the top left corner. (b) Bland...for comparison. The root mean squared error (RMSE) was also computed, as given by Equation 2.

  20. Multiple regression and Artificial Neural Network for long-term rainfall forecasting using large scale climate modes

    NASA Astrophysics Data System (ADS)

    Mekanik, F.; Imteaz, M. A.; Gato-Trinidad, S.; Elmahdi, A.

    2013-10-01

    In this study, the application of Artificial Neural Networks (ANN) and Multiple regression analysis (MR) to forecast long-term seasonal spring rainfall in Victoria, Australia was investigated using lagged El Nino Southern Oscillation (ENSO) and Indian Ocean Dipole (IOD) as potential predictors. The use of dual (combined lagged ENSO-IOD) input sets for calibrating and validating ANN and MR Models is proposed to investigate the simultaneous effect of past values of these two major climate modes on long-term spring rainfall prediction. The MR models that did not violate the limits of statistical significance and multicollinearity were selected for future spring rainfall forecast. The ANN was developed in the form of multilayer perceptron using Levenberg-Marquardt algorithm. Both MR and ANN modelling were assessed statistically using mean square error (MSE), mean absolute error (MAE), Pearson correlation (r) and Willmott index of agreement (d). The developed MR and ANN models were tested on out-of-sample test sets; the MR models showed very poor generalisation ability for east Victoria with correlation coefficients of -0.99 to -0.90 compared to ANN with correlation coefficients of 0.42-0.93; ANN models also showed better generalisation ability for central and west Victoria with correlation coefficients of 0.68-0.85 and 0.58-0.97 respectively. The ability of multiple regression models to forecast out-of-sample sets is compatible with ANN for Daylesford in central Victoria and Kaniva in west Victoria (r = 0.92 and 0.67 respectively). The errors of the testing sets for ANN models are generally lower compared to multiple regression models. The statistical analysis suggests the potential of ANN over MR models for rainfall forecasting using large scale climate modes.

  1. Analyzing Multilevel Data: Comparing Findings from Hierarchical Linear Modeling and Ordinary Least Squares Regression

    ERIC Educational Resources Information Center

    Rocconi, Louis M.

    2013-01-01

    This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…

  2. Least median of squares and iteratively re-weighted least squares as robust linear regression methods for fluorimetric determination of α-lipoic acid in capsules in ideal and non-ideal cases of linearity.

    PubMed

    Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F

    2018-06-01

    This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
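
    To make the idea of iteratively re-weighted least squares concrete, here is a minimal sketch for a straight-line calibration. The abstract does not state which weight function was used, so Huber weights with a MAD-based scale estimate are an assumption, as are the function names, the tuning constant, and the toy calibration data.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def irls_huber(x, y, tuning=1.345, max_iter=50, tol=1e-10):
    """IRLS with Huber weights; starts from the ordinary least-squares fit."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual rescaled for normal errors.
        scale = statistics.median([abs(r) for r in residuals]) / 0.6745
        if scale < 1e-12:
            break  # residuals are essentially zero; nothing left to reweight
        cutoff = tuning * scale
        weights = [1.0 if abs(r) <= cutoff else cutoff / abs(r)
                   for r in residuals]
        new_intercept, new_slope = weighted_line_fit(x, y, weights)
        done = (abs(new_slope - slope) < tol
                and abs(new_intercept - intercept) < tol)
        intercept, slope = new_intercept, new_slope
        if done:
            break
    return intercept, slope


# Hypothetical calibration line y = 1 + 2x with one gross outlier.
x = [float(i) for i in range(10)]
y = [1.0 + 2.0 * xi for xi in x]
y[9] += 30.0

ols_intercept, ols_slope = weighted_line_fit(x, y, [1.0] * len(x))
irls_intercept, irls_slope = irls_huber(x, y)
```

    In this toy case the ordinary fit is pulled toward the outlier, while the reweighted fit stays close to the underlying line.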

  3. GIS-based spatial statistical analysis of risk areas for liver flukes in Surin Province of Thailand.

    PubMed

    Rujirakul, Ratana; Ueng-arporn, Naporn; Kaewpitoon, Soraya; Loyd, Ryan J; Kaewthani, Sarochinee; Kaewpitoon, Natthawut

    2015-01-01

    It is urgently necessary to be aware of the distribution and risk areas of liver fluke, Opisthorchis viverrini, for proper allocation of prevention and control measures. This study aimed to investigate the human behavior and environmental factors influencing the distribution in Surin Province of Thailand, and to build a model using stepwise multiple regression analysis with a geographic information system (GIS) on environment and climate data. Human behavior and attitudes (<50%; X111), environmental factors like population density (148-169 pop/km2; X73), and land use as wetland (X64) were correlated with the liver fluke disease distribution at the 0.000, 0.034, and 0.006 levels, respectively. Multiple regression analysis, by the equation OV = -0.599 + 0.005 (population density (148-169 pop/km2); X73) + 0.040 (human attitude (<50%); X111) + 0.022 (land use (wetland); X64), was used to predict the distribution of liver fluke, where OV is the patients with liver fluke infection, R Square = 0.878, and Adjusted R Square = 0.849. By GIS analysis, we found Si Narong, Sangkha, Phanom Dong Rak, Mueang Surin, Non Narai, Samrong Thap, Chumphon Buri, and Rattanaburi to have the highest distributions in Surin province. In conclusion, the combination of GIS and statistical analysis can help simulate the spatial distribution and risk areas of liver fluke, and thus may be an important tool for future planning of prevention and control measures.

  4. Estimation of lung tumor position from multiple anatomical features on 4D-CT using multiple regression analysis.

    PubMed

    Ono, Tomohiro; Nakamura, Mitsuhiro; Hirose, Yoshinori; Kitsuda, Kenji; Ono, Yuka; Ishigaki, Takashi; Hiraoka, Masahiro

    2017-09-01

    To estimate the lung tumor position from multiple anatomical features on four-dimensional computed tomography (4D-CT) data sets using single regression analysis (SRA) and multiple regression analysis (MRA) approaches and to evaluate the impact of the approach on internal target volume (ITV) for stereotactic body radiotherapy (SBRT) of the lung. Eleven consecutive lung cancer patients (12 cases) underwent 4D-CT scanning. The three-dimensional (3D) lung tumor motion exceeded 5 mm. The 3D tumor position and anatomical features, including lung volume, diaphragm, abdominal wall, and chest wall positions, were measured on 4D-CT images. The tumor position was estimated by SRA using each anatomical feature and MRA using all anatomical features. The difference between the actual and estimated tumor positions was defined as the root-mean-square error (RMSE). A standard partial regression coefficient for the MRA was evaluated. The 3D lung tumor position showed a high correlation with the lung volume (R = 0.92 ± 0.10). Additionally, ITVs derived from SRA and MRA approaches were compared with ITV derived from contouring gross tumor volumes on all 10 phases of the 4D-CT (conventional ITV). The RMSE of the SRA was within 3.7 mm in all directions. Also, the RMSE of the MRA was within 1.6 mm in all directions. The standard partial regression coefficient for the lung volume was the largest and had the most influence on the estimated tumor position. Compared with conventional ITV, the average percentage decreases in ITV were 31.9% and 38.3% using the SRA and MRA approaches, respectively. The estimation accuracy of lung tumor position was improved by the MRA approach, which provided smaller ITV than conventional ITV. © 2017 The Authors. Journal of Applied Clinical Medical Physics published by Wiley Periodicals, Inc. on behalf of American Association of Physicists in Medicine.

  5. Pan evaporation modeling using six different heuristic computing methods in different climates of China

    NASA Astrophysics Data System (ADS)

    Wang, Lunche; Kisi, Ozgur; Zounemat-Kermani, Mohammad; Li, Hui

    2017-01-01

    Pan evaporation (Ep) plays important roles in agricultural water resources management. One of the basic challenges is modeling Ep using limited climatic parameters because there are a number of factors affecting the evaporation rate. This study investigated the abilities of six different soft computing methods, multi-layer perceptron (MLP), generalized regression neural network (GRNN), fuzzy genetic (FG), least square support vector machine (LSSVM), multivariate adaptive regression spline (MARS), adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP), and two regression methods, multiple linear regression (MLR) and Stephens and Stewart model (SS) in predicting monthly Ep. Long-term climatic data at various sites crossing a wide range of climates during 1961-2000 are used for model development and validation. The results showed that the models have different accuracies in different climates and the MLP model performed superior to the other models in predicting monthly Ep at most stations using local input combinations (for example, the MAE (mean absolute errors), RMSE (root mean square errors), and determination coefficient (R2) are 0.314 mm/day, 0.405 mm/day and 0.988, respectively for HEB station), while GRNN model performed better in Tibetan Plateau (MAE, RMSE and R2 are 0.459 mm/day, 0.592 mm/day and 0.932, respectively). The accuracies of above models ranked as: MLP, GRNN, LSSVM, FG, ANFIS-GP, MARS and MLR. The overall results indicated that the soft computing techniques generally performed better than the regression methods, but MLR and SS models can be more preferred at some climatic zones instead of complex nonlinear models, for example, the BJ (Beijing), CQ (Chongqing) and HK (Haikou) stations. Therefore, it can be concluded that Ep could be successfully predicted using above models in hydrological modeling studies.

  6. Improving Global Models of Remotely Sensed Ocean Chlorophyll Content Using Partial Least Squares and Geographically Weighted Regression

    NASA Astrophysics Data System (ADS)

    Gholizadeh, H.; Robeson, S. M.

    2015-12-01

    Empirical models have been widely used to estimate global chlorophyll content from remotely sensed data. Here, we focus on the standard NASA empirical models that use blue-green band ratios. These band ratio ocean color (OC) algorithms are in the form of fourth-order polynomials and the parameters of these polynomials (i.e. coefficients) are estimated from the NASA bio-Optical Marine Algorithm Data set (NOMAD). Most of the points in this data set have been sampled from tropical and temperate regions. However, polynomial coefficients obtained from this data set are used to estimate chlorophyll content in all ocean regions with different properties such as sea-surface temperature, salinity, and downwelling/upwelling patterns. Further, the polynomial terms in these models are highly correlated. In sum, the limitations of these empirical models are as follows: 1) the independent variables within the empirical models, in their current form, are correlated (multicollinear), and 2) current algorithms are global approaches and are based on the spatial stationarity assumption, so they are independent of location. Multicollinearity problem is resolved by using partial least squares (PLS). PLS, which transforms the data into a set of independent components, can be considered as a combined form of principal component regression (PCR) and multiple regression. Geographically weighted regression (GWR) is also used to investigate the validity of spatial stationarity assumption. GWR solves a regression model over each sample point by using the observations within its neighbourhood. PLS results show that the empirical method underestimates chlorophyll content in high latitudes, including the Southern Ocean region, when compared to PLS (see Figure 1). Cluster analysis of GWR coefficients also shows that the spatial stationarity assumption in empirical models is not likely a valid assumption.

  7. Near infrared spectral linearisation in quantifying soluble solids content of intact carambola.

    PubMed

    Omar, Ahmad Fairuz; MatJafri, Mohd Zubir

    2013-04-12

    This study presents a novel application of near infrared (NIR) spectral linearisation for measuring the soluble solids content (SSC) of carambola fruits. NIR spectra were measured using reflectance and interactance methods. In this study, only the interactance measurement technique successfully generated a reliable measurement result with a coefficient of determination of (R2) = 0.724 and a root mean square error of prediction for (RMSEP) = 0.461° Brix. The results from this technique produced a highly accurate and stable prediction model compared with multiple linear regression techniques.

  8. Near Infrared Spectral Linearisation in Quantifying Soluble Solids Content of Intact Carambola

    PubMed Central

    Omar, Ahmad Fairuz; MatJafri, Mohd Zubir

    2013-01-01

    This study presents a novel application of near infrared (NIR) spectral linearisation for measuring the soluble solids content (SSC) of carambola fruits. NIR spectra were measured using reflectance and interactance methods. In this study, only the interactance measurement technique successfully generated a reliable measurement result with a coefficient of determination of (R2) = 0.724 and a root mean square error of prediction for (RMSEP) = 0.461° Brix. The results from this technique produced a highly accurate and stable prediction model compared with multiple linear regression techniques. PMID:23584118

  9. Power and instrument strength requirements for Mendelian randomization studies using multiple genetic variants.

    PubMed

    Pierce, Brandon L; Ahsan, Habibul; Vanderweele, Tyler J

    2011-06-01

    Mendelian Randomization (MR) studies assess the causality of an exposure-disease association using genetic determinants [i.e. instrumental variables (IVs)] of the exposure. Power and IV strength requirements for MR studies using multiple genetic variants have not been explored. We simulated cohort data sets consisting of a normally distributed disease trait, a normally distributed exposure, which affects this trait and a biallelic genetic variant that affects the exposure. We estimated power to detect an effect of exposure on disease for varying allele frequencies, effect sizes and sample sizes, using two-stage least squares regression on 10,000 data sets (Stage 1 is a regression of exposure on the variant; Stage 2 is a regression of disease on the fitted exposure). Similar analyses were conducted using multiple genetic variants (5, 10, 20) as independent or combined IVs. We assessed IV strength using the first-stage F statistic. Simulations of realistic scenarios indicate that MR studies will require large (n > 1000), often very large (n > 10,000), sample sizes. In many cases, so-called 'weak IV' problems arise when using multiple variants as independent IVs (even with as few as five), resulting in biased effect estimates. Combining genetic factors into fewer IVs results in modest power decreases, but alleviates weak IV problems. Ideal methods for combining genetic factors depend upon knowledge of the genetic architecture underlying the exposure. The feasibility of well-powered, unbiased MR studies will depend upon the amount of variance in the exposure that can be explained by known genetic factors and the 'strength' of the IV set derived from these genetic factors.
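
    The two-stage procedure described above can be sketched with a single simulated data set and a single biallelic variant. Everything specific here is an assumption chosen for illustration (allele frequency, effect sizes, the unmeasured confounder, sample size, seed); it is not the authors' simulation design, and it does not estimate power.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate(n, maf, variant_effect, true_effect, seed):
    """Simulate variant (0/1/2 alleles), exposure, and disease trait."""
    rng = random.Random(seed)
    variant, exposure, disease = [], [], []
    for _ in range(n):
        g = (rng.random() < maf) + (rng.random() < maf)
        u = rng.gauss(0.0, 1.0)  # unmeasured confounder (assumed)
        x = variant_effect * g + u + rng.gauss(0.0, 1.0)
        y = true_effect * x + u + rng.gauss(0.0, 1.0)
        variant.append(float(g))
        exposure.append(x)
        disease.append(y)
    return variant, exposure, disease


def two_stage_least_squares(variant, exposure, disease):
    """Stage 1: exposure on variant. Stage 2: disease on fitted exposure."""
    n = len(variant)
    a1, b1 = simple_ols(variant, exposure)
    fitted = [a1 + b1 * g for g in variant]
    _, effect = simple_ols(fitted, disease)
    # First-stage F statistic for a single instrument.
    mean_x = sum(exposure) / n
    ss_tot = sum((x - mean_x) ** 2 for x in exposure)
    ss_res = sum((x - f) ** 2 for x, f in zip(exposure, fitted))
    r2 = 1.0 - ss_res / ss_tot
    f_stat = r2 * (n - 2) / (1.0 - r2)
    return effect, f_stat


variant, exposure, disease = simulate(
    n=20000, maf=0.3, variant_effect=0.5, true_effect=0.3, seed=1)
_, naive_effect = simple_ols(exposure, disease)
iv_effect, first_stage_f = two_stage_least_squares(variant, exposure, disease)
```

    In this setup the naive regression of disease on exposure is inflated by the confounder, while the two-stage estimate should land near the simulated effect of 0.3 when the instrument is strong.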

  10. Magnitude and frequency of floods in small drainage basins in Idaho

    USGS Publications Warehouse

    Thomas, C.A.; Harenberg, W.A.; Anderson, J.M.

    1973-01-01

    A method is presented in this report for determining magnitude and frequency of floods on streams with drainage areas between 0.5 and 200 square miles. The method relates basin characteristics, including drainage area, percentage of forest cover, percentage of water area, latitude, and longitude, with peak flow characteristics. Regression equations for each of eight regions are presented for determination of Q10, the peak discharge, which, on the average, will be exceeded once in 10 years. Peak flows, Q25 and Q50, can then be estimated from Q25/Q10 and Q50/Q10 ratios developed for each region. Nomographs are included which solve the equations for basins between 1 and 50 square miles. The regional regression equations were developed using multiple regression techniques. Annual peaks for 303 sites were analyzed in the study. These included all records on unregulated streams with drainage areas less than about 500 square miles with 10 years or more of record or which could readily be extended to 10 years on the basis of nearby streams. The log-Pearson Type III method as modified and a digital computer were employed to estimate magnitude and frequency of floods for each of the 303 gaged sites. A large number of physical and climatic basin characteristics were determined for each of the gaged sites. The multiple regression method was then applied to determine the equations relating the floodflows and the most significant basin characteristics. For convenience of the users, several equations were simplified and some complex characteristics were deleted at the sacrifice of some increase in the standard error. Standard errors of estimate and many other statistical data were computed in the analysis process and are available in the Boise district office files.
The analysis showed that Q10 was the best defined and most practical index flood for determination of the Q25 and Q50 flood estimates. Regression equations are not developed because of poor definition for areas which total about 20,000 square miles, most of which are in southern Idaho. These areas are described in the report to prevent use of regression equations where they do not apply. They include urbanized areas, streams affected by regulation or diversion by works of man, unforested areas, streams with gaining or losing reaches, streams draining alluvial valleys and the Snake Plain, intense thunderstorm areas, and scattered areas where records indicate recurring floods which depart from the regional equations. Maximum flows of record and basin locations are summarized in tables and maps. The analysis indicates deficiencies in data exist. To improve knowledge regarding flood characteristics in poorly defined areas, the following data-collection programs are recommended. Gages should be operated on a few selected small streams for an extended period to define floods at long recurrence intervals. Crest-stage gages should be operated in representative basins in urbanized areas, newly developed irrigated areas and grasslands, and in unforested areas. Unusual floods should continue to be measured at miscellaneous sites on regulated streams and in intense thunderstorm-prone areas. The relationship between channel geometry and floodflow characteristics should be investigated as an alternative or supplement to operation of gaging stations. Documentation of historic flood data from newspapers and other sources would improve the basic flood-data base.

  11. A comparison of model-based imputation methods for handling missing predictor values in a linear regression model: A simulation study

    NASA Astrophysics Data System (ADS)

    Hasan, Haliza; Ahmad, Sanizah; Osman, Balkish Mohd; Sapri, Shamsiah; Othman, Nadirah

    2017-08-01

    In regression analysis, missing covariate data is a common problem. Many researchers use ad hoc methods to overcome this problem because they are easy to implement. However, these methods require assumptions about the data that rarely hold in practice. Model-based methods such as Maximum Likelihood (ML) using the expectation maximization (EM) algorithm and Multiple Imputation (MI) are more promising when dealing with difficulties caused by missing data. Then again, inappropriate methods of missing value imputation can lead to serious bias that severely affects the parameter estimates. The main objective of this study is to provide a better understanding of missing data concepts that can help researchers select appropriate missing data imputation methods. A simulation study was performed to assess the effects of different missing data techniques on the performance of a regression model. The covariate data were generated using an underlying multivariate normal distribution and the dependent variable was generated as a combination of explanatory variables. Missing values in the covariates were simulated using a missing at random (MAR) mechanism. Four levels of missingness (10%, 20%, 30% and 40%) were imposed. ML and MI techniques available within SAS software were investigated. A linear regression model was fitted and the model performance measures, MSE and R-squared, were obtained. Results of the analysis showed that MI is superior in handling missing data, with the highest R-squared and lowest MSE when the percentage of missingness is less than 30%. Both methods are unable to handle levels of missingness larger than 30%.

  12. Computing daily mean streamflow at ungaged locations in Iowa by using the Flow Anywhere and Flow Duration Curve Transfer statistical methods

    USGS Publications Warehouse

    Linhart, S. Mike; Nania, Jon F.; Sanders, Curtis L.; Archfield, Stacey A.

    2012-01-01

    The U.S. Geological Survey (USGS) maintains approximately 148 real-time streamgages in Iowa for which daily mean streamflow information is available, but daily mean streamflow data commonly are needed at locations where no streamgages are present. Therefore, the USGS conducted a study as part of a larger project in cooperation with the Iowa Department of Natural Resources to develop methods to estimate daily mean streamflow at locations in ungaged watersheds in Iowa by using two regression-based statistical methods. The regression equations for the statistical methods were developed from historical daily mean streamflow and basin characteristics from streamgages within the study area, which includes the entire State of Iowa and adjacent areas within a 50-mile buffer of Iowa in neighboring states. Results of this study can be used with other techniques to determine the best method for application in Iowa and can be used to produce a Web-based geographic information system tool to compute streamflow estimates automatically. The Flow Anywhere statistical method is a variation of the drainage-area-ratio method, which transfers same-day streamflow information from a reference streamgage to another location by using the daily mean streamflow at the reference streamgage and the drainage-area ratio of the two locations. The Flow Anywhere method modifies the drainage-area-ratio method in order to regionalize the equations for Iowa and determine the best reference streamgage from which to transfer same-day streamflow information to an ungaged location. Data used for the Flow Anywhere method were retrieved for 123 continuous-record streamgages located in Iowa and within a 50-mile buffer of Iowa. 
The final regression equations were computed by using either left-censored regression techniques with a low limit threshold set at 0.1 cubic feet per second (ft3/s) and the daily mean streamflow for the 15th day of every other month, or by using an ordinary-least-squares multiple linear regression method and the daily mean streamflow for the 15th day of every other month. The Flow Duration Curve Transfer method was used to estimate unregulated daily mean streamflow from the physical and climatic characteristics of gaged basins. For the Flow Duration Curve Transfer method, daily mean streamflow quantiles at the ungaged site were estimated with the parameter-based regression model, which results in a continuous daily flow-duration curve (the relation between exceedance probability and streamflow for each day of observed streamflow) at the ungaged site. By the use of a reference streamgage, the Flow Duration Curve Transfer is converted to a time series. Data used in the Flow Duration Curve Transfer method were retrieved for 113 continuous-record streamgages in Iowa and within a 50-mile buffer of Iowa. The final statewide regression equations for Iowa were computed by using a weighted-least-squares multiple linear regression method and were computed for the 0.01-, 0.05-, 0.10-, 0.15-, 0.20-, 0.30-, 0.40-, 0.50-, 0.60-, 0.70-, 0.80-, 0.85-, 0.90-, and 0.95-exceedance probability statistics determined from the daily mean streamflow with a reporting limit set at 0.1 ft3/s. The final statewide regression equation for Iowa computed by using left-censored regression techniques was computed for the 0.99-exceedance probability statistic determined from the daily mean streamflow with a low limit threshold and a reporting limit set at 0.1 ft3/s. 
For the Flow Anywhere method, results of the validation study conducted by using six streamgages show that differences between the root-mean-square error and the mean absolute error ranged from 1,016 to 138 ft3/s, with the larger value signifying a greater occurrence of outliers between observed and estimated streamflows. Root-mean-square-error values ranged from 1,690 to 237 ft3/s. Values of the percent root-mean-square error ranged from 115 percent to 26.2 percent. The logarithm (base 10) streamflow percent root-mean-square error ranged from 13.0 to 5.3 percent. Root-mean-square-error observations standard-deviation-ratio values ranged from 0.80 to 0.40. Percent-bias values ranged from 25.4 to 4.0 percent. Untransformed streamflow Nash-Sutcliffe efficiency values ranged from 0.84 to 0.35. The logarithm (base 10) streamflow Nash-Sutcliffe efficiency values ranged from 0.86 to 0.56. For the streamgage with the best agreement between observed and estimated streamflow, higher streamflows appear to be underestimated. For the streamgage with the worst agreement between observed and estimated streamflow, low flows appear to be overestimated whereas higher flows seem to be underestimated. Estimated cumulative streamflows for the period October 1, 2004, to September 30, 2009, are underestimated by -25.8 and -7.4 percent for the closest and poorest comparisons, respectively. For the Flow Duration Curve Transfer method, results of the validation study conducted by using the same six streamgages show that differences between the root-mean-square error and the mean absolute error ranged from 437 to 93.9 ft3/s, with the larger value signifying a greater occurrence of outliers between observed and estimated streamflows. Root-mean-square-error values ranged from 906 to 169 ft3/s. Values of the percent root-mean-square-error ranged from 67.0 to 25.6 percent. The logarithm (base 10) streamflow percent root-mean-square error ranged from 12.5 to 4.4 percent. 
Root-mean-square-error observations standard-deviation-ratio values ranged from 0.79 to 0.40. Percent-bias values ranged from 22.7 to 0.94 percent. Untransformed streamflow Nash-Sutcliffe efficiency values ranged from 0.84 to 0.38. The logarithm (base 10) streamflow Nash-Sutcliffe efficiency values ranged from 0.89 to 0.48. For the streamgage with the closest agreement between observed and estimated streamflow, there is relatively good agreement between observed and estimated streamflows. For the streamgage with the poorest agreement between observed and estimated streamflow, streamflows appear to be substantially underestimated for much of the time period. Estimated cumulative streamflow for the period October 1, 2004, to September 30, 2009, are underestimated by -9.3 and -22.7 percent for the closest and poorest comparisons, respectively.
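
    The drainage-area-ratio transfer and the goodness-of-fit statistics quoted in this abstract can be written out as a short sketch. The formulas below are the commonly used textbook definitions and are assumed here rather than taken from the report; in particular, the sign convention for percent bias (positive when flows are underestimated) is an assumption, and the function and key names are hypothetical.

```python
import math


def drainage_area_ratio(q_ref, area_ref, area_ungaged):
    """Transfer same-day flow from a reference streamgage by the drainage-area ratio."""
    return q_ref * (area_ungaged / area_ref)


def validation_metrics(observed, estimated):
    """Compare observed and estimated daily mean streamflows."""
    n = len(observed)
    errors = [e - o for o, e in zip(observed, estimated)]
    sse = sum(err ** 2 for err in errors)            # sum of squared errors
    mean_obs = sum(observed) / n
    ss_obs = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(sse / n)
    mae = sum(abs(err) for err in errors) / n
    return {
        "rmse": rmse,
        "mae": mae,
        # A large gap between RMSE and MAE points to a few large errors (outliers).
        "rmse_minus_mae": rmse - mae,
        # RMSE divided by the standard deviation of the observations.
        "rsr": math.sqrt(sse / ss_obs),
        # Assumed convention: positive percent bias means underestimation.
        "pbias": -100.0 * sum(errors) / sum(observed),
        # Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean.
        "nse": 1.0 - sse / ss_obs,
    }


if __name__ == "__main__":
    obs = [10.0, 20.0, 30.0, 40.0]
    est = [12.0, 18.0, 33.0, 37.0]
    print(validation_metrics(obs, est))
    print(drainage_area_ratio(q_ref=100.0, area_ref=50.0, area_ungaged=25.0))
```

    With these definitions the ratio statistic and the Nash-Sutcliffe efficiency carry the same information, since the squared ratio equals one minus the efficiency.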

  13. Orthogonalizing EM: A design-based least squares algorithm.

    PubMed

    Xiong, Shifeng; Dai, Bin; Huling, Jared; Qian, Peter Z G

    We introduce an efficient iterative algorithm, intended for various least squares problems, based on a design of experiments perspective. The algorithm, called orthogonalizing EM (OEM), works for ordinary least squares and can be easily extended to penalized least squares. The main idea of the procedure is to orthogonalize a design matrix by adding new rows and then solve the original problem by embedding the augmented design in a missing data framework. We establish several attractive theoretical properties concerning OEM. For the ordinary least squares with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For ordinary and penalized least squares with various penalties, it converges to a point having grouping coherence for fully aliased regression matrices. Convergence and the convergence rate of the algorithm are examined. Finally, we demonstrate that OEM is highly efficient for large-scale least squares and penalized least squares problems, and is considerably faster than competing methods when n is much larger than p . Supplementary materials for this article are available online.
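
    As a rough illustration of the iteration described in this abstract, the sketch below assumes the simplest form of the update, in which the augmented design satisfies X'X + Δ'Δ = dI for a scalar d no smaller than the largest eigenvalue of X'X. The trace of X'X is used for d because it is a cheap upper bound, at the cost of slower convergence; the lasso variant via soft thresholding and all names are assumptions for the example, not the authors' implementation.

```python
def soft_threshold(u, lam):
    """Shrink u toward zero by lam (reduces to the identity when lam is 0)."""
    if u > lam:
        return u - lam
    if u < -lam:
        return u + lam
    return 0.0


def oem_least_squares(X, y, lam=0.0, iters=500):
    """Iterate beta <- S(X'y + (d*I - X'X) beta, lam) / d, starting from zero.

    lam = 0 gives ordinary least squares; lam > 0 gives a lasso-penalized fit.
    """
    n, p = len(X), len(X[0])
    gram = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
            for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    d = sum(gram[a][a] for a in range(p))  # trace bounds the largest eigenvalue
    beta = [0.0] * p
    if d == 0:
        return beta  # all-zero design: nothing to estimate
    for _ in range(iters):
        # Build the whole update from the previous beta before overwriting it.
        u = [xty[a] + d * beta[a] - sum(gram[a][b] * beta[b] for b in range(p))
             for a in range(p)]
        beta = [soft_threshold(ua, lam) / d for ua in u]
    return beta


if __name__ == "__main__":
    # Full-rank example: y is fitted exactly by beta = (1, 2).
    print(oem_least_squares([[1, 0], [0, 1], [1, 1]], [1, 2, 3]))
    # Singular design with two identical columns: the iterate stays at the
    # minimum-norm solution (1, 1), in line with the Moore-Penrose claim.
    print(oem_least_squares([[1, 1], [1, 1]], [2, 2]))
```

    Each pass needs only products with X'X and a coordinate-wise update, which is why the approach is attractive when n is much larger than p.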

  14. Quantitative analysis of bayberry juice acidity based on visible and near-infrared spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shao Yongni; He Yong; Mao Jingyuan

    Visible and near-infrared (Vis/NIR) reflectance spectroscopy has been investigated for its ability to nondestructively detect acidity in bayberry juice. What we believe to be a new, better mathematic model is put forward, which we have named principal component analysis-stepwise regression analysis-backpropagation neural network (PCA-SRA-BPNN), to build a correlation between the spectral reflectivity data and the acidity of bayberry juice. In this model, the optimum network parameters, such as the number of input nodes, hidden nodes, learning rate, and momentum, are chosen by the value of root-mean-square (rms) error. The results show that its prediction statistical parameters are correlation coefficient (r) of 0.9451 and root-mean-square error of prediction (RMSEP) of 0.1168. Partial least-squares (PLS) regression is also established to compare with this model. Before doing this, the influences of various spectral pretreatments (standard normal variate, multiplicative scatter correction, S. Golay first derivative, and wavelet package transform) are compared. The PLS approach with wavelet package transform preprocessing spectra is found to provide the best results, and its prediction statistical parameters are correlation coefficient (r) of 0.9061 and RMSEP of 0.1564. Hence, these two models are both desirable to analyze the data from Vis/NIR spectroscopy and to solve the problem of the acidity prediction of bayberry juice. This supplies basal research to ultimately realize the online measurements of the juice's internal quality through this Vis/NIR spectroscopy technique.

  15. RRegrs: an R package for computer-aided model selection with multiple regression models.

    PubMed

    Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L

    2015-01-01

    Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, thereby raising model reproducibility and comparison issues. Cheminformatics and bioinformatics use predictive modelling extensively and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produces unified reports, and offers the option to be integrated into more extensive studies. Additionally, such methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields.
Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance as well as its adaptability in terms of parameter optimization could make RRegrs a popular framework to assist the initial exploration of predictive models, and with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; this is a fully validated procedure with application to QSAR modelling.

  16. The Use of Alternative Regression Methods in Social Sciences and the Comparison of Least Squares and M Estimation Methods in Terms of the Determination of Coefficient

    ERIC Educational Resources Information Center

    Coskuntuncel, Orkun

    2013-01-01

    The purpose of this study is two-fold; the first aim being to show the effect of outliers on the widely used least squares regression estimator in social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the "determination of coefficient" (R²). For this purpose,…

  17. Application of Partial Least Squares (PLS) Regression to Determine Landscape-Scale Aquatic Resource Vulnerability in the Ozark Mountains

    EPA Science Inventory

    Partial least squares (PLS) analysis offers a number of advantages over the more traditionally used regression analyses applied in landscape ecology to study the associations among constituents of surface water and landscapes. Common data problems in ecological studies include: s...

  18. Methods for estimating the magnitude and frequency of floods for urban and small, rural streams in Georgia, South Carolina, and North Carolina, 2011

    USGS Publications Warehouse

    Feaster, Toby D.; Gotvald, Anthony J.; Weaver, J. Curtis

    2014-01-01

    Reliable estimates of the magnitude and frequency of floods are essential for the design of transportation and water-conveyance structures, flood-insurance studies, and flood-plain management. Such estimates are particularly important in densely populated urban areas. In order to increase the number of streamflow-gaging stations (streamgages) available for analysis, expand the geographical coverage that would allow for application of regional regression equations across State boundaries, and build on a previous flood-frequency investigation of rural U.S. Geological Survey streamgages in the Southeast United States, a multistate approach was used to update methods for determining the magnitude and frequency of floods in urban and small, rural streams that are not substantially affected by regulation or tidal fluctuations in Georgia, South Carolina, and North Carolina. The at-site flood-frequency analysis of annual peak-flow data for urban and small, rural streams (through September 30, 2011) included 116 urban streamgages and 32 small, rural streamgages, defined in this report as basins draining less than 1 square mile. The regional regression analysis included annual peak-flow data from an additional 338 rural streamgages previously included in U.S. Geological Survey flood-frequency reports and 2 additional rural streamgages in North Carolina that were not included in the previous Southeast rural flood-frequency investigation for a total of 488 streamgages included in the urban and small, rural regression analysis. The at-site flood-frequency analyses for the urban and small, rural streamgages included the expected moments algorithm, which is a modification of the Bulletin 17B log-Pearson type III method for fitting the statistical distribution to the logarithms of the annual peak flows. Where applicable, the flood-frequency analysis also included low-outlier and historic information.
Additionally, the application of a generalized Grubbs-Beck test allowed for the detection of multiple potentially influential low outliers. Streamgage basin characteristics were determined using geographical information system techniques. Initial ordinary least squares regression simulations reduced the number of basin characteristics on the basis of such factors as statistical significance, coefficient of determination, Mallows' Cp statistic, and ease of measurement of the explanatory variable. Application of generalized least squares regression techniques produced final predictive (regression) equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probability flows for urban and small, rural ungaged basins for three hydrologic regions (HR1, Piedmont–Ridge and Valley; HR3, Sand Hills; and HR4, Coastal Plain), which previously had been defined from exploratory regression analysis in the Southeast rural flood-frequency investigation. Because of the limited availability of urban streamgages in the Coastal Plain of Georgia, South Carolina, and North Carolina, additional urban streamgages in Florida and New Jersey were used in the regression analysis for this region. Including the urban streamgages in New Jersey allowed for the expansion of the applicability of the predictive equations in the Coastal Plain from 3.5 to 53.5 square miles. Average standard error of prediction for the predictive equations, which is a measure of the average accuracy of the regression equations when predicting flood estimates for ungaged sites, ranges from 25.0 percent for the 10-percent annual exceedance probability regression equation for the Piedmont–Ridge and Valley region to 73.3 percent for the 0.2-percent annual exceedance probability regression equation for the Sand Hills region.

  19. Rapid determination of crocins in saffron by near-infrared spectroscopy combined with chemometric techniques

    NASA Astrophysics Data System (ADS)

    Li, Shuailing; Shao, Qingsong; Lu, Zhonghua; Duan, Chengli; Yi, Haojun; Su, Liyang

    2018-02-01

    Saffron is an expensive spice. Its primary effective constituents are crocin I and II, and the contents of these compounds directly affect the quality and commercial value of saffron. In this study, near-infrared spectroscopy was combined with chemometric techniques for the determination of crocin I and II in saffron. Partial least squares regression models were built for the quantification of crocin I and II. By comparing different spectral ranges and spectral pretreatment methods (no pretreatment, vector normalization, subtract a straight line, multiplicative scatter correction, minimum-maximum normalization, eliminate the constant offset, first derivative, and second derivative), optimum models were developed. The root mean square error of cross-validation values of the best partial least squares models for crocin I and II were 1.40 and 0.30, respectively. The coefficients of determination for crocin I and II were 93.40 and 96.30, respectively. These results show that near-infrared spectroscopy can be combined with chemometric techniques to determine the contents of crocin I and II in saffron quickly and efficiently.

  20. A KPI-based process monitoring and fault detection framework for large-scale processes.

    PubMed

    Zhang, Kai; Shardt, Yuri A W; Chen, Zhiwen; Yang, Xu; Ding, Steven X; Peng, Kaixiang

    2017-05-01

    Large-scale processes, consisting of multiple interconnected subprocesses, are commonly encountered in industrial systems, whose performance needs to be determined. A common approach to this problem is to use a key performance indicator (KPI)-based approach. However, the different KPI-based approaches are not developed with a coherent and consistent framework. Thus, this paper proposes a framework for KPI-based process monitoring and fault detection (PM-FD) for large-scale industrial processes, which considers the static and dynamic relationships between process and KPI variables. For the static case, a least squares-based approach is developed that provides an explicit link with least-squares regression, which gives better performance than partial least squares. For the dynamic case, using the kernel representation of each subprocess, an instrument variable is used to reduce the dynamic case to the static case. This framework is applied to the TE benchmark process and the hot strip mill rolling process. The results show that the proposed method can detect faults better than previous methods. Copyright © 2017 ISA. Published by Elsevier Ltd. All rights reserved.

  1. Kernel Partial Least Squares for Nonlinear Regression and Discrimination

    NASA Technical Reports Server (NTRS)

    Rosipal, Roman; Clancy, Daniel (Technical Monitor)

    2002-01-01

    This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.

  2. Superquantile Regression: Theory, Algorithms, and Applications

    DTIC Science & Technology

    2014-12-01

    Example C: Stack loss data scatterplot matrix. ... Table columns: Regression, α, c0, caf, cwt, cac, R̄²α, R̄²α,Adj; row: Least Squares, NA, -39.9197, 0.7156, 1.2953, -0.1521, 0.9136 ... This is due to a small ... Table columns: Model, Regression, α, c0, cwt, cwt2, R̄²α, R̄²α,Adj; row: f2, Least Squares, NA, -41.9109, 2.8174, —, 0.7665, 0.7542; row: Quantile, 0.25, -32.0000 ...

  3. Least squares reverse time migration of controlled order multiples

    NASA Astrophysics Data System (ADS)

    Liu, Y.

    2016-12-01

    Imaging using the reverse time migration of multiples generates inherent crosstalk artifacts due to the interference among different order multiples. Traditionally, least-squares fitting has been used to address this issue by seeking the best objective function to measure the amplitude differences between the predicted and observed data. We have developed an alternative objective function by decomposing multiples into different orders to minimize the difference between Born modeling predicted multiples and specific-order multiples from observational data in order to attenuate the crosstalk. This method is denoted as the least-squares reverse time migration of controlled order multiples (LSRTM-CM). Our numerical examples demonstrated that the LSRTM-CM can significantly improve image quality compared with reverse time migration of multiples and least-squares reverse time migration of multiples. Acknowledgments: This research was funded by the National Natural Science Foundation of China (Grant Nos. 41430321 and 41374138).

  4. Physical Activity and Depressive Symptoms in Four Ethnic Groups of Midlife Women

    PubMed Central

    Im, Eun-Ok; Ham, Ok Kyung; Chee, Eunice; Chee, Wonshik

    2014-01-01

    The purpose of this study was to determine the associations between physical activity and depression and the multiple contextual factors influencing these associations in four major ethnic groups of midlife women in the U.S. This was a secondary analysis of the data from 542 midlife women. The instruments included questions on background characteristics and health and menopausal status; the Depression Index for Midlife Women; and the Kaiser Physical Activity Survey. The data were analyzed using chi-square tests, ANOVA, two-way ANOVA, correlation analyses, and hierarchical multiple regression analyses. The women's depressive symptoms were negatively correlated with active living and sports/exercise physical activities whereas they were positively correlated with occupational physical activities (p < .01). Family income was the strongest predictor of their depressive symptoms. Increasing physical activity may improve midlife women's depressive symptoms, but the types of physical activity and multiple contextual factors need to be considered in intervention development. PMID:24879749

  5. The influence of family adaptability and cohesion on anxiety and depression of terminally ill cancer patients.

    PubMed

    Park, Young-Yoon; Jeong, Young-Jin; Lee, Junyong; Moon, Nayun; Bang, Inho; Kim, Hyunju; Yun, Kyung-Sook; Kim, Yong-I; Jeon, Tae-Hee

    2018-01-01

    This study investigated the effect of family members on terminally ill cancer patients by measuring the relationship of the presence of the family caregivers, visiting time by family and friends, and family adaptability and cohesion with patient's anxiety and depression. From June, 2016 to March, 2017, 100 terminally ill cancer patients who were admitted to a palliative care unit in Seoul, South Korea, were surveyed, and their medical records were reviewed. The Korean version of the Family Adaptability and Cohesion Evaluation Scales III and Hospital Anxiety-Depression Scale was used. Chi-square and multiple logistic regression analyses were used. The results of the chi-square analysis showed that the presence of family caregivers and family visit times did not have statistically significant effects on anxiety and depression in terminally ill cancer patients. In multiple logistic regression, when adjusted for age, sex, ECOG PS, and the monthly average income, the odds ratios (ORs) of the low family adaptability to anxiety and depression were 2.4 (1.03-5.83) and 5.4 (1.10-26.87), respectively. The OR of low family cohesion for depression was 5.4 (1.10-27.20) when adjusted for age, sex, ECOG PS, and monthly average household income. A higher family adaptability resulted in a lower degree of anxiety and depression in terminally ill cancer patients. The higher the family cohesion, the lower the degree of depression in the patient. The presence of the family caregiver and the visiting time by family and friends did not affect the patient's anxiety and depression.

  6. A fast and direct spectrophotometric method for the simultaneous determination of methyl paraben and hydroquinone in cosmetic products using successive projections algorithm.

    PubMed

    Esteki, M; Nouroozi, S; Shahsavari, Z

    2016-02-01

    To develop a simple and efficient spectrophotometric technique combined with chemometrics for the simultaneous determination of methyl paraben (MP) and hydroquinone (HQ) in cosmetic products, and specifically, to: (i) evaluate the potential of applying the successive projections algorithm (SPA) to derivative spectrophotometric data in order to provide sufficient accuracy and model robustness and (ii) determine MP and HQ concentration in cosmetics without tedious pre-treatments such as derivatization or extraction techniques, which are time-consuming and require hazardous solvents. The absorption spectra were measured in the wavelength range of 200-350 nm. Prior to building chemometric models, the original and first-derivative absorption spectra of binary mixtures were used as calibration matrices. Variables selected by the successive projections algorithm were used to obtain multiple linear regression (MLR) models based on a small subset of wavelengths. The number of wavelengths and the starting vector were optimized, and the comparison of the root mean square error of calibration (RMSEC) and cross-validation (RMSECV) was applied to select effective wavelengths with the least collinearity and redundancy. Principal component regression (PCR) and partial least squares (PLS) models were also developed for comparison. The concentrations of the calibration matrix ranged from 0.1 to 20 μg mL⁻¹ for MP, and from 0.1 to 25 μg mL⁻¹ for HQ. The constructed models were tested on an external validation data set and finally on cosmetic samples. The results indicated that successive projections algorithm-multiple linear regression (SPA-MLR), applied to the first-derivative spectra, achieved the optimal performance for the two compounds when compared with the full-spectrum PCR and PLS. The root mean square error of prediction (RMSEP) was 0.083 and 0.314 for MP and HQ, respectively. To verify the accuracy of the proposed method, a recovery study on real cosmetic samples was carried out with satisfactory results (84-112%). The proposed method, which is an environmentally friendly approach using a minimal amount of solvent, is a simple, fast and low-cost analysis method that can provide high accuracy and robust models. The suggested method does not need any complex extraction procedure, which is time-consuming and requires hazardous solvents. © 2015 Society of Cosmetic Scientists and the Société Française de Cosmétologie.
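
    The error and recovery figures in this abstract follow standard definitions. The sketch below shows one common convention, assuming the root mean square error divides by the number of samples (some chemometrics texts subtract degrees of freedom for calibration errors, and the abstract does not say which was used). The function names are illustrative only.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between reference and predicted concentrations."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    return math.sqrt(squared / len(reference))


def recovery_percent(found, added):
    """Recovery of a spiked amount, expressed as a percentage."""
    return 100.0 * found / added


# Hypothetical validation values, for illustration only.
print(rmse([1.0, 5.0, 10.0], [1.1, 4.9, 10.2]))
print(recovery_percent(found=4.2, added=5.0))
```

    The same function gives RMSEC, RMSECV or RMSEP depending on whether the predictions come from the calibration set, cross-validation, or an external validation set.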

  7. Investigation of marital satisfaction and its relationship with job stress and general health of nurses in Qazvin, Iran.

    PubMed

    Azimian, Jalil; Piran, Pegah; Jahanihashemi, Hassan; Dehghankar, Leila

    2017-04-01

    Pressures in nursing can affect family life and marital problems, disrupt common social problems, increase work-family conflicts and endanger people's general health. The aim was to determine marital satisfaction and its relationship with job stress and general health of nurses. This descriptive and cross-sectional study was done in 2015 in medical educational centers of Qazvin by using an ENRICH marital satisfaction scale and General Health and Job Stress questionnaires completed by 123 nurses. Analysis was done by SPSS version 19 using descriptive and analytical statistics (Pearson correlation, t-test, ANOVA, Chi-square, regression line, multiple regression analysis). The findings showed that 64.4% of nurses reported marital satisfaction. There was a significant relationship between age (p=0.03), job experience (p=0.01), age of spouse (p=0.01) and marital satisfaction. The results showed that there was a significant relationship between marital satisfaction and general health (p<0.0001). Multiple regression analysis showed that depression (p=0.012) and anxiety (p=0.001) were significantly related to marital satisfaction. Given the high levels of job stress, the impaired general health and the low marital satisfaction among nurses, running health promotion programs and paying attention to their dimensions can help improve nurses' work and family health.

  8. Moderation analysis using a two-level regression model.

    PubMed

    Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott

    2014-10-01

    Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
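
    For readers unfamiliar with the baseline that this paper argues against, the following is a minimal sketch of moderated multiple regression: an ordinary least-squares fit whose design matrix includes a product term. It does not implement the authors' two-level model or their R package. The helper names and the noiseless synthetic data are assumptions made for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_mmr(x, z, y):
    """Least-squares fit of y = b0 + b1*x + b2*z + b3*x*z via the normal equations."""
    rows = [[1.0, xi, zi, xi * zi] for xi, zi in zip(x, z)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(4)]
    return solve_linear_system(xtx, xty)


def simple_slope(coef, z_value):
    """Slope of y on x at a given moderator value: b1 + b3 * z."""
    return coef[1] + coef[3] * z_value


# Synthetic, noiseless data: predictor x, binary moderator z.
x = [8, 10, 12, 14, 16] * 2
z = [0] * 5 + [1] * 5
y = [20 + 2.0 * xi + 5.0 * zi - 0.5 * xi * zi for xi, zi in zip(x, z)]
coef = fit_mmr(x, z, y)
print(coef, simple_slope(coef, 0), simple_slope(coef, 1))
```

    The coefficient on the product term is the moderation effect: it measures how much the slope of the predictor changes per unit of the moderator.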

  9. Optimizing methods for linking cinematic features to fMRI data.

    PubMed

    Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia

    2015-04-15

    One of the challenges of naturalistic neurosciences using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than story-driven films, new methods need to be developed for analysis of less story-driven content. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with the model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with time-series of a total of 36 binary-valued and one real-valued tactile annotation of film features. The elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both the partial least-squares (PLS) regression and the un-regularized full-model regression. A non-parametric permutation testing scheme was applied to evaluate the statistical significance of regression. We found statistically significant correlation between the annotation model and 9 ICs out of 40 ICs. Regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression to be more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method, in comparison to the hypothesis-driven manual pre-selection and observation of some individual regressors biased by choice, lies in applying a data-driven approach to all content features simultaneously. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
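
    The abstract does not describe how the elastic-net fit was implemented. As a rough illustration of the technique, the sketch below uses cyclic coordinate descent with the objective (1/2n)·||y − Xb||² + λ·(α·||b||₁ + (1 − α)/2·||b||²), the parameterization popularized by glmnet. It assumes centered, non-constant columns and no intercept. All names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, n_iter=500):
    """Cyclic coordinate descent for the elastic-net objective described above."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residuals y - Xb for the starting point b = 0
    col_ms = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the partial residual that excludes b[j].
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * alpha) / (col_ms[j] + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# Toy design with two orthogonal, centered regressors (illustration only).
X_demo = [[1, 0], [-1, 0], [0, 1], [0, -1]]
y_demo = [2.0, -2.0, 0.1, -0.1]
print(elastic_net(X_demo, y_demo, lam=0.0, alpha=0.5))  # no penalty: plain least squares
print(elastic_net(X_demo, y_demo, lam=0.2, alpha=0.5))  # weak regressor is shrunk to zero
```

    In practice the penalty strength would be chosen by cross-validation, as the abstract reports, rather than fixed by hand.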

  10. [Aggression and related factors in elementary school students].

    PubMed

    Ji, Eun Sun; Jang, Mi Heui

    2010-10-01

    This study was done to explore the relationship between aggression and internet over-use, depression-anxiety, and self-esteem, all of which are known to be behavioral and psychological characteristics linked to children "at risk" for aggression. The Korean-Child Behavior Check List (K-CBCL), Korean-Internet Addiction Self-Test Scale, and Self-Esteem Scale by Rosenberg (1965) were used as measurement tools with a sample of 743 students in the 5th and 6th grades from 3 elementary schools in Jecheon city. Chi-square, t-test, ANOVA, Pearson's correlation and stepwise multiple regression with SPSS/Win 13.0 version were used to analyze the collected data. Aggression in the elementary school students was positively correlated with internet over-use and depression-anxiety, whereas self-esteem was negatively correlated with aggression. Stepwise multiple regression analysis showed that 68.4% of the variance in aggression was significantly accounted for by internet over-use, depression-anxiety, and self-esteem. The most significant factor influencing aggression was depression-anxiety. These results suggest that earlier screening and intervention programs for depression-anxiety and internet over-use in elementary school students will be helpful in preventing aggression.

  11. Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate

    NASA Astrophysics Data System (ADS)

    Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno

    2017-03-01

    This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on data from an actual project. The aim of this study is to analyze the influence of rock engineering properties, including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness, on the TBM rate of penetration (ROP). Four statistical regression models (two linear and two nonlinear) are built to predict the ROP of the TBM. Finally, a fuzzy logic model is developed as an alternative method and compared to the four statistical regression models. Results show that the fuzzy logic model provides better estimates and can be applied to predict TBM performance. The fuzzy logic model achieves the highest R-squared value (R²), 0.714, ahead of the runner-up, the multiple-variable nonlinear regression model, at 0.667.
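
    The models here are ranked by R². A minimal sketch of the usual definition, one minus the ratio of residual to total sum of squares, is given below. The abstract does not state which variant of R² was used, and the sample values are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical penetration rates (observed vs. model output), illustration only.
print(r_squared([2.1, 2.8, 3.5, 1.9], [2.0, 3.0, 3.2, 2.1]))
```

    With this definition, predicting the mean of the observations for every case gives R² = 0, and a perfect prediction gives R² = 1.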

  12. A comparative study of the use of powder X-ray diffraction, Raman and near infrared spectroscopy for quantification of binary polymorphic mixtures of piracetam.

    PubMed

    Croker, Denise M; Hennigan, Michelle C; Maher, Anthony; Hu, Yun; Ryder, Alan G; Hodnett, Benjamin K

    2012-04-07

    Diffraction and spectroscopic methods were evaluated for quantitative analysis of binary powder mixtures of FII(6.403) and FIII(6.525) piracetam. The two polymorphs of piracetam could be distinguished using powder X-ray diffraction (PXRD), Raman and near-infrared (NIR) spectroscopy. The results demonstrated that Raman and NIR spectroscopy are most suitable for quantitative analysis of this polymorphic mixture. When the spectra are treated with the combination of multiplicative scatter correction (MSC) and second derivative data pretreatments, the partial least squares (PLS) regression model gave a root mean square error of calibration (RMSEC) of 0.94 and 0.99%, respectively. FIII(6.525) demonstrated some preferred orientation in PXRD analysis, making PXRD the least preferred method of quantification. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Methods for estimating annual exceedance-probability discharges for streams in Iowa, based on data through water year 2010

    USGS Publications Warehouse

    Eash, David A.; Barnes, Kimberlee K.; Veilleux, Andrea G.

    2013-01-01

    A statewide study was performed to develop regional regression equations for estimating selected annual exceedance-probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State’s borders. Annual exceedance-probability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized least-squares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3. 
    The pseudo coefficients of determination for the generalized least-squares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa to obtain the estimates computed for these eight selected statistics at that streamgage.
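
    The equivalence between annual exceedance probabilities and recurrence intervals quoted in this abstract is a reciprocal relationship: an event with annual exceedance probability p has a recurrence interval of 1/p years. The short sketch below reproduces the mapping for the eight probabilities listed. The function name is illustrative.

```python
def recurrence_interval_years(aep_percent):
    """Recurrence interval in years for an annual exceedance probability in percent."""
    if not 0 < aep_percent <= 100:
        raise ValueError("annual exceedance probability must be in (0, 100] percent")
    return 100.0 / aep_percent


# The eight annual exceedance probabilities named in the abstract.
AEP_PERCENT = [50, 20, 10, 4, 2, 1, 0.5, 0.2]
intervals = [recurrence_interval_years(p) for p in AEP_PERCENT]
for p, t in zip(AEP_PERCENT, intervals):
    print(f"{p}-percent AEP -> {t:.0f}-year recurrence interval")
```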

  14. Multilevel Modeling and Ordinary Least Squares Regression: How Comparable Are They?

    ERIC Educational Resources Information Center

    Huang, Francis L.

    2018-01-01

    Studies analyzing clustered data sets using both multilevel models (MLMs) and ordinary least squares (OLS) regression have generally concluded that resulting point estimates, but not the standard errors, are comparable with each other. However, the accuracy of the estimates of OLS models is important to consider, as several alternative techniques…

  15. A Comparison of Mean Phase Difference and Generalized Least Squares for Analyzing Single-Case Data

    ERIC Educational Resources Information Center

    Manolov, Rumen; Solanas, Antonio

    2013-01-01

    The present study focuses on single-case data analysis, specifically on two procedures for quantifying differences between baseline and treatment measurements. The first technique tested is based on generalized least squares regression analysis and is compared to a proposed non-regression technique, which allows obtaining similar information. The…

  16. Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.

    PubMed

    Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W

    1992-03-01

    The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy of estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by the Additive Main effects and Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment recommended AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.

  17. Risk factors for retinal breaks in patients with symptom of floaters.

    PubMed

    Singalavanija, Apichart; Amornrattanapan, Chutiwan; Nitiruangjarus, Kanjanee; Tongsai, Sasima

    2010-06-01

    To identify the risk factors of retinal breaks in patients with the symptom of floaters, and to determine the association between those risk factors and retinal breaks. A retrospective analytic study of 184 patients (55 males and 129 females) that included 220 eyes was conducted. Patient information such as age, symptoms (multiple floaters, flashing), duration of symptom, refractive error, history of cataract surgery, family history of retinal detachment, and complete eye examination were recorded. The patients were divided into two groups, the first group (control group) had symptoms of floaters and no retinal breaks, the second group (retinal breaks group) had symptoms of floaters with retinal breaks. Chi-square test, and the multiple logistic regression were used for statistical analysis. Two hundred twenty eyes, 175 eyes of the control group and 45 eyes of the retinal breaks group were examined and included in this study. The multiple logistic regression analysis revealed that patients with multiple floaters, and floaters and flashing increased the risk of retinal breaks to 5.8 and 4.3 times, respectively, when compared to patients with single floater or floaters alone. Lattice degeneration increased the risk of retinal breaks to 5.9 times when compared to eyes that did not have lattice degeneration. Multiple floaters, flashing and lattice degeneration are risk factors of retinal breaks in patients with symptoms of floaters. Therefore, it is important for the ophthalmologists to be aware of these risk factors and the patients at risk should have follow-up examinations.

  18. Multi-frequency Phase Unwrap from Noisy Data: Adaptive Least Squares Approach

    NASA Astrophysics Data System (ADS)

    Katkovnik, Vladimir; Bioucas-Dias, José

    2010-04-01

    Multiple frequency interferometry is, basically, a phase acquisition strategy aimed at reducing or eliminating the ambiguity of the wrapped phase observations or, equivalently, reducing or eliminating the fringe ambiguity order. In multiple frequency interferometry, the phase measurements are acquired at different frequencies (or wavelengths) and recorded using the corresponding sensors (measurement channels). Assuming that the absolute phase to be reconstructed is piece-wise smooth, we use a nonparametric regression technique for the phase reconstruction. The nonparametric estimates are derived from a local least squares criterion, which, when applied to the multifrequency data, yields denoised (filtered) phase estimates with extended ambiguity (periodized), compared with the phase ambiguities inherent to each measurement frequency. The filtering algorithm is based on local polynomial approximation (LPA) for the design of nonlinear filters (estimators) and the adaptation of these filters to the unknown smoothness of the spatially varying absolute phase [9]. For phase unwrapping from filtered periodized data, we apply the recently introduced robust (in the sense of discontinuity preserving) PUMA unwrapping algorithm [1]. Simulations give evidence that the proposed algorithm yields state-of-the-art performance for continuous as well as for discontinuous phase surfaces, enabling phase unwrapping in extraordinarily difficult situations when all other algorithms fail.

  19. Combined computational-experimental approach to predict blood-brain barrier (BBB) permeation based on "green" salting-out thin layer chromatography supported by simple molecular descriptors.

    PubMed

    Ciura, Krzesimir; Belka, Mariusz; Kawczak, Piotr; Bączek, Tomasz; Markuszewski, Michał J; Nowakowska, Joanna

    2017-09-05

    The objective of this paper is to build QSRR/QSAR models for predicting blood-brain barrier (BBB) permeability. The obtained models are based on salting-out thin layer chromatography (SOTLC) constants and calculated molecular descriptors. Among chromatographic methods, SOTLC was chosen because its mobile phases are free of organic solvent. As a consequence, they are less toxic and have a lower environmental impact than those of classical reversed-phase liquid chromatography (RPLC). During the study, three stationary phases (silica gel, cellulose plates and neutral aluminum oxide) were examined. The model set of solutes presents a wide range of log BB values, containing compounds which cross the BBB readily and molecules poorly distributed to the brain, including drugs acting on the nervous system as well as peripherally acting drugs. Additionally, three regression models were compared: multiple linear regression (MLR), partial least-squares (PLS) and orthogonal partial least squares (OPLS). The designed QSRR/QSAR models could be useful for predicting the BBB permeability of systematically synthesized new compounds in the drug development pipeline and are attractive alternatives to time-consuming and demanding direct methods for log BB measurement. The study also showed that different regression techniques can yield significantly different model performance, as measured by R² and Q²; hence it is strongly suggested to evaluate all available options, such as MLR, PLS and OPLS. Copyright © 2017 Elsevier B.V. All rights reserved.

  20. Orthogonalizing EM: A design-based least squares algorithm

    PubMed Central

    Xiong, Shifeng; Dai, Bin; Huling, Jared; Qian, Peter Z. G.

    2016-01-01

    We introduce an efficient iterative algorithm, intended for various least squares problems, based on a design of experiments perspective. The algorithm, called orthogonalizing EM (OEM), works for ordinary least squares and can be easily extended to penalized least squares. The main idea of the procedure is to orthogonalize a design matrix by adding new rows and then solve the original problem by embedding the augmented design in a missing data framework. We establish several attractive theoretical properties concerning OEM. For the ordinary least squares with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For ordinary and penalized least squares with various penalties, it converges to a point having grouping coherence for fully aliased regression matrices. Convergence and the convergence rate of the algorithm are examined. Finally, we demonstrate that OEM is highly efficient for large-scale least squares and penalized least squares problems, and is considerably faster than competing methods when n is much larger than p. Supplementary materials for this article are available online. PMID:27499558
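
    To make the iteration concrete, here is a minimal sketch of an OEM-style update for ordinary least squares. It assumes the update can be written as β ← β + Xᵀ(y − Xβ)/d, where d is no smaller than the largest eigenvalue of XᵀX. This follows from augmenting X with rows Δ such that XᵀX + ΔᵀΔ = d·I and imputing the missing responses as Δβ. This is a reading of the idea described in the abstract, not the authors' code. Using the trace of XᵀX for d, starting from zero, and the toy data are choices made for illustration.

```python
def oem_ols(X, y, n_iter=1000):
    """Iterate beta <- beta + X'(y - X beta) / d, starting from the zero vector."""
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    # trace(X'X) is a cheap upper bound on the largest eigenvalue of X'X.
    d = sum(xtx[a][a] for a in range(p))
    beta = [0.0] * p
    for _ in range(n_iter):
        step = [xty[a] - sum(xtx[a][b] * beta[b] for b in range(p)) for a in range(p)]
        beta = [beta[a] + step[a] / d for a in range(p)]
    return beta


# Toy design whose first two columns are identical (fully aliased), so X'X is singular.
X_toy = [[1, 1, 0], [2, 2, 1], [3, 3, 0], [0, 0, 2]]
y_toy = [2.0, 7.0, 6.0, 6.0]
beta_hat = oem_ols(X_toy, y_toy)
print(beta_hat)
```

    In this toy case the two aliased columns receive equal coefficients, which is the minimum-norm least-squares solution and matches the grouping behavior the abstract describes for fully aliased regression matrices.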

  1. [Comparison of the factors influencing young adolescents' aggression according to family structure].

    PubMed

    Yun, Eun Kyoung; Shin, Sung Hee

    2013-06-01

    This cross-sectional study was done to compare factors influencing young adolescents' aggression according to family structure. Participants were 680 young adolescents aged 11 to 15 years (113 in single father families, 136 in single mother families, 49 in grandparent families, and 382 in both-parent families). All measures were self-administered. Data were analyzed using SPSS 18.0 program and factors affecting young adolescents' aggression were analyzed by stepwise multiple regression. Levels of young adolescents' aggression and all variables were significantly different among the four family structure groups. Factors influencing young adolescents' aggression were also different according to these 4 groups. For single father families, depression-anxiety and family hardiness significantly predicted the level of young adolescents' aggression (adjusted R square=.37, p<.001). For single mother families, depression-anxiety, gender, and friends' support significantly predicted the level of young adolescents' aggression (adjusted R square=.58, p<.001). For grandparent families, depression-anxiety and family support significantly predicted the level of young adolescents' aggression (adjusted R square=.58, p<.001). For both-parent families, depression-anxiety, family hardiness, and friends' support significantly predicted the level of young adolescents' aggression (adjusted R square=.48, p<.001). Nurses working with young adolescents should consider family structure-specific factors influencing aggression in this population.

  2. Evaluation of the CEAS model for barley yields in North Dakota and Minnesota

    NASA Technical Reports Server (NTRS)

    Barnett, T. L. (Principal Investigator)

    1981-01-01

    The CEAS yield model is based upon multiple regression analysis at the CRD and state levels. For the historical time series, yield is regressed on a set of variables derived from monthly mean temperature and monthly precipitation. Technological trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-79) demonstrated that biases are small and that performance, as indicated by the root mean square errors, is acceptable for the intended application; however, model response for individual years, particularly unusual years, is not very reliable and shows some large errors. The model is objective, adequate, timely, simple and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.

  3. Study relationship between inorganic and organic coal analysis with gross calorific value by multiple regression and ANFIS

    USGS Publications Warehouse

    Chelgani, S.C.; Hart, B.; Grady, W.C.; Hower, J.C.

    2011-01-01

    The relationship between maceral content plus mineral matter and gross calorific value (GCV) for a wide range of West Virginia coal samples (from 6518 to 15330 BTU/lb; 15.16 to 35.66 MJ/kg) has been investigated by multivariable regression and adaptive neuro-fuzzy inference system (ANFIS). The stepwise least square mathematical method comparison between liptinite, vitrinite, plus mineral matter as input data sets with measured GCV reported a nonlinear correlation coefficient (R²) of 0.83. Using the same data set the correlation between the predicted GCV from the ANFIS model and the actual GCV reported an R² value of 0.96. It was determined that the GCV-based prediction methods, as used in this article, can provide a reasonable estimation of GCV. Copyright © Taylor & Francis Group, LLC.

  4. The Relationships between Weather-Related Factors and Daily Outdoor Physical Activity Counts on an Urban Greenway

    PubMed Central

    Wolff, Dana; Fitzhugh, Eugene C.

    2011-01-01

    The purpose of this study was to examine relationships between weather and outdoor physical activity (PA). An online weather source was used to obtain daily max temperature [DMT], precipitation, and wind speed. An infra-red trail counter provided data on daily trail use along a greenway, over a 2-year period. Multiple regression analysis was used to examine associations between PA and weather, while controlling for day of the week and month of the year. The overall regression model explained 77.0% of the variance in daily PA (p < 0.001). DMT (b = 10.5), max temp-squared (b = −4.0), precipitation (b = −70.0), and max wind speed (b = 1.9) contributed significantly. Conclusion: Aggregated daily data can detect relationships between weather and outdoor PA. PMID:21556205

  5. Key factors contributing to accident severity rate in construction industry in Iran: a regression modelling approach.

    PubMed

    Soltanzadeh, Ahmad; Mohammadfam, Iraj; Moghimbeigi, Abbas; Ghiasvand, Reza

    2016-03-01

    The construction industry involves the highest risk of occupational accidents and bodily injuries, which range from mild to very severe. The aim of this cross-sectional study was to identify the factors associated with accident severity rate (ASR) in the largest Iranian construction companies based on data about 500 occupational accidents recorded from 2009 to 2013. We also gathered data on safety and health risk management and training systems. Data were analysed using Pearson's chi-squared coefficient and multiple regression analysis. Median ASR (and the interquartile range) was 107.50 (57.24-381.25). Fourteen of the 24 studied factors stood out as most affecting construction accident severity (p<0.05). These findings can be applied in the design and implementation of a comprehensive safety and health risk management system to reduce ASR.

  6. Geodesic regression on orientation distribution functions with its application to an aging study.

    PubMed

    Du, Jia; Goh, Alvina; Kushnarev, Sergey; Qiu, Anqi

    2014-02-15

    In this paper, we treat orientation distribution functions (ODFs) derived from high angular resolution diffusion imaging (HARDI) as elements of a Riemannian manifold and present a method for geodesic regression on this manifold. In order to find the optimal regression model, we pose this as a least-squares problem involving the sum-of-squared geodesic distances between observed ODFs and their model fitted data. We derive the appropriate gradient terms and employ gradient descent to find the minimizer of this least-squares optimization problem. In addition, we show how to perform statistical testing for determining the significance of the relationship between the manifold-valued regressors and the real-valued regressands. Experiments on both synthetic and real human data are presented. In particular, we examine aging effects on HARDI via geodesic regression of ODFs in normal adults aged 22 years old and above. © 2013 Elsevier Inc. All rights reserved.

  7. Weighted linear regression using D2H and D2 as the independent variables

    Treesearch

    Hans T. Schreuder; Michael S. Williams

    1998-01-01

    Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared ( D2H = diameter squared times height or D...
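
    As an illustration of the error structure named in this abstract (error variance proportional to the squared covariate), the sketch below fits a straight line by weighted least squares with weights 1/x², where x stands for D2H. The linear form of the volume equation, the function name and the noiseless synthetic tree data are assumptions made for the example and are not taken from the study.

```python
def weighted_line_fit(x, y, weights):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / sw
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical diameters and heights; x is diameter squared times height (D2H).
diameters = [6.0, 8.0, 10.0, 12.0, 14.0]
heights = [40.0, 50.0, 60.0, 65.0, 70.0]
d2h = [d * d * h for d, h in zip(diameters, heights)]
volume = [0.5 + 0.002 * v for v in d2h]  # noiseless synthetic volumes

# Variance proportional to x squared implies weights proportional to 1 / x squared.
weights = [1.0 / v ** 2 for v in d2h]
intercept, slope = weighted_line_fit(d2h, volume, weights)
print(intercept, slope)
```

    Weighting by the inverse of the assumed variance keeps the large trees, which have the largest errors, from dominating the fit.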

  8. Using partial least squares regression as a predictive tool in describing equine third metacarpal bone shape.

    PubMed

    Liley, Helen; Zhang, Ju; Firth, Elwyn; Fernandez, Justin; Besier, Thor

    2017-11-01

    Population variance in bone shape is an important consideration when applying the results of subject-specific computational models to a population. In this letter, we demonstrate the ability of partial least squares regression to provide an improved shape prediction of the equine third metacarpal epiphysis, using two easily obtained measurements.

  9. Plant leaf chlorophyll content retrieval based on a field imaging spectroscopy system.

    PubMed

    Liu, Bo; Yue, Yue-Min; Li, Ru; Shen, Wen-Jing; Wang, Ke-Lin

    2014-10-23

    A field imaging spectrometer system (FISS; 380-870 nm and 344 bands) was designed for agriculture applications. In this study, FISS was used to gather spectral information from soybean leaves. The chlorophyll content was retrieved using a multiple linear regression (MLR), partial least squares (PLS) regression and support vector machine (SVM) regression. Our objective was to verify the performance of FISS in a quantitative spectral analysis through the estimation of chlorophyll content and to determine a proper quantitative spectral analysis method for processing FISS data. The results revealed that the derivative reflectance was a more sensitive indicator of chlorophyll content and could extract content information more efficiently than the spectral reflectance, which is more significant for FISS data compared to ASD (analytical spectral devices) data, reducing the corresponding RMSE (root mean squared error) by 3.3%-35.6%. Compared with the spectral features, the regression methods had smaller effects on the retrieval accuracy. A multivariate linear model could be the ideal model to retrieve chlorophyll information with a small number of significant wavelengths used. The smallest RMSE of the chlorophyll content retrieved using FISS data was 0.201 mg/g, a relative reduction of more than 30% compared with the RMSE based on a non-imaging ASD spectrometer, which represents a high estimation accuracy compared with the mean chlorophyll content of the sampled leaves (4.05 mg/g). Our study indicates that FISS could obtain both spectral and spatial detailed information of high quality. Its image-spectrum-in-one merit promotes the good performance of FISS in quantitative spectral analyses, and it can potentially be widely used in the agricultural sector.

  10. Plant Leaf Chlorophyll Content Retrieval Based on a Field Imaging Spectroscopy System

    PubMed Central

    Liu, Bo; Yue, Yue-Min; Li, Ru; Shen, Wen-Jing; Wang, Ke-Lin

    2014-01-01

    A field imaging spectrometer system (FISS; 380–870 nm and 344 bands) was designed for agriculture applications. In this study, FISS was used to gather spectral information from soybean leaves. The chlorophyll content was retrieved using a multiple linear regression (MLR), partial least squares (PLS) regression and support vector machine (SVM) regression. Our objective was to verify the performance of FISS in a quantitative spectral analysis through the estimation of chlorophyll content and to determine a proper quantitative spectral analysis method for processing FISS data. The results revealed that the derivative reflectance was a more sensitive indicator of chlorophyll content and could extract content information more efficiently than the spectral reflectance, which is more significant for FISS data compared to ASD (analytical spectral devices) data, reducing the corresponding RMSE (root mean squared error) by 3.3%–35.6%. Compared with the spectral features, the regression methods had smaller effects on the retrieval accuracy. A multivariate linear model could be the ideal model to retrieve chlorophyll information with a small number of significant wavelengths used. The smallest RMSE of the chlorophyll content retrieved using FISS data was 0.201 mg/g, a relative reduction of more than 30% compared with the RMSE based on a non-imaging ASD spectrometer, which represents a high estimation accuracy compared with the mean chlorophyll content of the sampled leaves (4.05 mg/g). Our study indicates that FISS could obtain both spectral and spatial detailed information of high quality. Its image-spectrum-in-one merit promotes the good performance of FISS in quantitative spectral analyses, and it can potentially be widely used in the agricultural sector. PMID:25341439

  11. Very-short-term wind power prediction by a hybrid model with single- and multi-step approaches

    NASA Astrophysics Data System (ADS)

    Mohammed, E.; Wang, S.; Yu, J.

    2017-05-01

    Very-short-term wind power prediction (VSTWPP) plays an essential role in the operation of electric power systems. This paper aims at improving and applying a hybrid method of VSTWPP based on historical data. The hybrid method combines multiple linear regression and least squares (MLR&LS) and is intended to reduce prediction errors. The predicted values are obtained through two sub-processes: 1) transform the time-series data of actual wind power into the power ratio, and then predict the power ratio; 2) use the predicted power ratio to predict the wind power. In addition, the proposed method can include two prediction approaches: single-step prediction (SSP) and multi-step prediction (MSP). WPP is compared with an auto-regressive moving average (ARMA) model in terms of predicted values and errors. The validity of the proposed hybrid method is confirmed through error analysis using the probability density function (PDF), mean absolute percent error (MAPE) and mean square error (MSE). Meanwhile, a comparison of the correlation coefficients between the actual and predicted values for different prediction times and windows confirms that the MSP approach using the hybrid model is the most accurate compared with the SSP approach and ARMA. The MLR&LS is accurate and promising for solving problems in WPP.

  12. [Study of blending method for the extracts of herbal plants].

    PubMed

    Liu, Yongsuo; Cao, Min; Chen, Yuying; Hu, Yuzhu; Wang, Yiming; Luo, Guoan

    2006-03-01

    The irregularity in herbal plant composition is influenced by multiple factors. For quality control of traditional Chinese medicine, the most critical challenge is to ensure dosage content uniformity. This content uniformity can be improved by blending different batches of the extracts of herbal plants. Nonlinear least-squares regression was used to calculate the blending coefficient, which means that no large absolute differences are allowed for any ingredient. For traditional Chinese medicines, even relatively small differences can be very important for all the ingredients. The auto-scaling pretreatment was used prior to the calculation of the blending coefficients. The pretreatment buffered the characteristics of individual data for the ingredients in different batches, so an improved auto-scaling pretreatment method was proposed. With the improved auto-scaling pretreatment, the relative differences decreased after blending different batches of extracts of herbal plants according to the reference samples. The content uniformity control of the specific ingredients could be achieved by the error control coefficient. In the studies of the extracts of fructus gardeniae, the relative differences of all the ingredients were less than 3% after blending different batches of the extracts. The results showed that nonlinear least-squares regression can be used to calculate the blending coefficient of the herbal plant extracts.

  13. Clarifying the role of mean centring in multicollinearity of interaction effects.

    PubMed

    Shieh, Gwowen

    2011-11-01

    Moderated multiple regression (MMR) is frequently employed to analyse interaction effects between continuous predictor variables. The procedure of mean centring is commonly recommended to mitigate the potential threat of multicollinearity between predictor variables and the constructed cross-product term. Also, centring does typically provide more straightforward interpretation of the lower-order terms. This paper attempts to clarify two methodological issues of potential confusion. First, the positive and negative effects of mean centring on multicollinearity diagnostics are explored. It is illustrated that the mean centring method is, depending on the characteristics of the data, capable of either increasing or decreasing various measures of multicollinearity. Second, the exact reason why mean centring does not affect the detection of interaction effects is given. The explication shows the symmetrical influence of mean centring on the corrected sum of squares and variance inflation factor of the product variable while maintaining the equivalence between the two residual sums of squares for the regression of the product term on the two predictor variables. Thus the resulting test statistic remains unchanged regardless of the obvious modification of multicollinearity with mean centring. These findings provide a clear understanding and demonstration on the diverse impact of mean centring in MMR applications. ©2011 The British Psychological Society.
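
    The invariance described in this abstract can be checked numerically. The sketch below is not taken from the paper: it uses simulated data (the sample size, means and coefficients are all assumptions) and a small hand-written least-squares solver, and it compares the interaction coefficient and the residual sum of squares before and after mean centring.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(columns, y):
    """Fit y on an intercept plus the given columns; return (coefficients, residual SS)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * y[i] for i, row in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    rss = sum((y[i] - sum(b * v for b, v in zip(beta, design[i]))) ** 2 for i in range(n))
    return beta, rss


def corr(u, v):
    """Pearson correlation between two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5


# Simulated data; every number here is an illustrative assumption.
rng = random.Random(0)
n = 60
x = [rng.gauss(5.0, 1.0) for _ in range(n)]
z = [rng.gauss(3.0, 1.0) for _ in range(n)]
y = [1.0 + 0.5 * a + 0.3 * b + 0.2 * a * b + rng.gauss(0.0, 0.5) for a, b in zip(x, z)]

# Mean-centred predictors and the two versions of the product term.
xc = [a - sum(x) / n for a in x]
zc = [b - sum(z) / n for b in z]
xz_raw = [a * b for a, b in zip(x, z)]
xz_cen = [a * b for a, b in zip(xc, zc)]

raw_beta, raw_rss = ols([x, z, xz_raw], y)
cen_beta, cen_rss = ols([xc, zc, xz_cen], y)

print("interaction coefficient:", raw_beta[3], cen_beta[3])
print("residual sum of squares:", raw_rss, cen_rss)
print("corr(x, product term):", corr(x, xz_raw), corr(xc, xz_cen))
```

    Both fits span the same column space, so the interaction coefficient and residual sum of squares agree up to rounding, while the correlation between a predictor and the product term generally changes. The lower-order coefficients are expected to differ between the two fits.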

  14. Quantifying Parkinson's disease finger-tapping severity by extracting and synthesizing finger motion properties.

    PubMed

    Sano, Yuko; Kandori, Akihiko; Shima, Keisuke; Yamaguchi, Yuki; Tsuji, Toshio; Noda, Masafumi; Higashikawa, Fumiko; Yokoe, Masaru; Sakoda, Saburo

    2016-06-01

    We propose a novel index of Parkinson's disease (PD) finger-tapping severity, called "PDFTsi," for quantifying the severity of symptoms related to the finger tapping of PD patients with high accuracy. To validate the efficacy of PDFTsi, the finger-tapping movements of normal controls and PD patients were measured by using magnetic sensors, and 21 characteristics were extracted from the finger-tapping waveforms. To distinguish motor deterioration due to PD from that due to aging, the aging effect on finger tapping was removed from these characteristics. Principal component analysis (PCA) was applied to the age-normalized characteristics, and principal components that represented the motion properties of finger tapping were calculated. Multiple linear regression (MLR) with stepwise variable selection was applied to the principal components, and PDFTsi was calculated. The calculated PDFTsi indicates that PDFTsi has a high estimation ability, namely a mean square error of 0.45. The estimation ability of PDFTsi is higher than that of the alternative method, MLR with stepwise regression selection without PCA, namely a mean square error of 1.30. This result suggests that PDFTsi can quantify PD finger-tapping severity accurately. Furthermore, the result of interpreting a model for calculating PDFTsi indicated that motion wideness and rhythm disorder are important for estimating PD finger-tapping severity.

  15. Bitterness intensity prediction of berberine hydrochloride using an electronic tongue and a GA-BP neural network.

    PubMed

    Liu, Ruixin; Zhang, Xiaodong; Zhang, Lu; Gao, Xiaojie; Li, Huiling; Shi, Junhan; Li, Xuelin

    2014-06-01

    The aim of this study was to predict the bitterness intensity of a drug using an electronic tongue (e-tongue). The model drug of berberine hydrochloride was used to establish a bitterness prediction model (BPM), based on the taste evaluation of bitterness intensity by a taste panel, the data provided by the e-tongue and a genetic algorithm-back-propagation neural network (GA-BP) modeling method. The modeling characteristics of the GA-BP were compared with those of multiple linear regression, partial least square regression and BP methods. The determination coefficient of the BPM was 0.99965±0.00004, the root mean square error of cross-validation was 0.1398±0.0488 and the correlation coefficient of the cross-validation between the true and predicted values was 0.9959±0.0027. The model is superior to the other three models based on these indicators. In conclusion, the model established in this study has a high fitting degree and may be used for the bitterness prediction modeling of berberine hydrochloride of different concentrations. The model also provides a reference for the generation of BPMs of other drugs. Additionally, the algorithm of the study is able to conduct a rapid and accurate quantitative analysis of the data provided by the e-tongue.

  16. Bitterness intensity prediction of berberine hydrochloride using an electronic tongue and a GA-BP neural network

    PubMed Central

    LIU, RUIXIN; ZHANG, XIAODONG; ZHANG, LU; GAO, XIAOJIE; LI, HUILING; SHI, JUNHAN; LI, XUELIN

    2014-01-01

    The aim of this study was to predict the bitterness intensity of a drug using an electronic tongue (e-tongue). The model drug of berberine hydrochloride was used to establish a bitterness prediction model (BPM), based on the taste evaluation of bitterness intensity by a taste panel, the data provided by the e-tongue and a genetic algorithm-back-propagation neural network (GA-BP) modeling method. The modeling characteristics of the GA-BP were compared with those of multiple linear regression, partial least square regression and BP methods. The determination coefficient of the BPM was 0.99965±0.00004, the root mean square error of cross-validation was 0.1398±0.0488 and the correlation coefficient of the cross-validation between the true and predicted values was 0.9959±0.0027. The model is superior to the other three models based on these indicators. In conclusion, the model established in this study has a high fitting degree and may be used for the bitterness prediction modeling of berberine hydrochloride of different concentrations. The model also provides a reference for the generation of BPMs of other drugs. Additionally, the algorithm of the study is able to conduct a rapid and accurate quantitative analysis of the data provided by the e-tongue. PMID:24926369

  17. 40 CFR Appendix C to Subpart Nnn... - Method for the Determination of Product Density

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... insulation. The method is applicable to all cured board and blanket products. 2. Equipment One square foot (12 in. by 12 in.) template, or templates that are multiples of one square foot, for use in cutting... procedure for the designated product. 3.2 Cut samples using one square foot (or multiples of one square foot...

  18. 40 CFR Appendix C to Subpart Nnn... - Method for the Determination of Product Density

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    .... The method is applicable to all cured board and blanket products. 2. Equipment One square foot (12 in. by 12 in.) template, or templates that are multiples of one square foot, for use in cutting... procedure for the designated product. 3.2 Cut samples using one square foot (or multiples of one square foot...

  19. 40 CFR Appendix C to Subpart Nnn... - Method for the Determination of Product Density

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    .... The method is applicable to all cured board and blanket products. 2. Equipment One square foot (12 in. by 12 in.) template, or templates that are multiples of one square foot, for use in cutting... procedure for the designated product. 3.2 Cut samples using one square foot (or multiples of one square foot...

  20. Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method.

    PubMed

    Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

    2016-01-25

    To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and the plain partial least squares regression. Both R² and root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV, which was calculated using the models constructed by other methods, was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS model constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.

  1. A Generalized Least Squares Regression Approach for Computing Effect Sizes in Single-Case Research: Application Examples

    ERIC Educational Resources Information Center

    Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.

    2011-01-01

    A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…

  2. Knowledge, attitudes and practices survey on organ donation among a selected adult population of Pakistan

    PubMed Central

    Saleem, Taimur; Ishaque, Sidra; Habib, Nida; Hussain, Syedda Saadia; Jawed, Areeba; Khan, Aamir Ali; Ahmad, Muhammad Imran; Iftikhar, Mian Omer; Mughal, Hamza Pervez; Jehan, Imtiaz

    2009-01-01

    Background To determine the knowledge, attitudes and practices regarding organ donation in a selected adult population in Pakistan. Methods Convenience sampling was used to generate a sample of 440; 408 interviews were successfully completed and used for analysis. Data collection was carried out via a face to face interview based on a pre-tested questionnaire in selected public areas of Karachi, Pakistan. Data was analyzed using SPSS v.15 and associations were tested using the Pearson's Chi square test. Multiple logistic regression was used to find independent predictors of knowledge status and motivation of organ donation. Results Knowledge about organ donation was significantly associated with education (p = 0.000) and socioeconomic status (p = 0.038). 70/198 (35.3%) people expressed a high motivation to donate. Allowance of organ donation in religion was significantly associated with the motivation to donate (p = 0.000). Multiple logistic regression analysis revealed that higher level of education and higher socioeconomic status were significant (p < 0.05) independent predictors of knowledge status of organ donation. For motivation, multiple logistic regression revealed that higher socioeconomic status, adequate knowledge score and belief that organ donation is allowed in religion were significant (p < 0.05) independent predictors. Television emerged as the major source of information. Only 3.5% had themselves donated an organ; with only one person being an actual kidney donor. Conclusion Better knowledge may ultimately translate into the act of donation. Effective measures should be taken to educate people with relevant information with the involvement of media, doctors and religious scholars. PMID:19534793

  3. Spectral distance decay: Assessing species beta-diversity by quantile regression

    USGS Publications Warehouse

    Rocchini, D.; Nagendra, H.; Ghate, R.; Cade, B.S.

    2009-01-01

    Remotely sensed data represents key information for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance may allow us to quantitatively estimate how beta-diversity in species changes with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological datasets are characterized by a high number of zeroes that can add noise to the regression model. Quantile regression can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this paper, we used ordinary least square (OLS) and quantile regression to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.05) considering both OLS and quantile regression. Nonetheless, OLS regression estimate of mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when spectral distance approaches zero, was very low compared with the intercepts of upper quantiles, which detected high species similarity when habitats are more similar. In this paper we demonstrated the power of using quantile regressions applied to spectral distance decay in order to reveal species diversity patterns otherwise lost or underestimated by ordinary least square regression. © 2009 American Society for Photogrammetry and Remote Sensing.
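
    The contrast between a mean fit and an upper-quantile fit can be illustrated with the loss functions alone. The sketch below is a simplified assumption, not the authors' analysis: it fits only a constant (no spectral-distance predictor) to made-up, zero-heavy values, using squared error for the mean and the pinball (check) loss for the 0.9 quantile.

```python
def squared_loss(c, ys):
    """Sum of squared residuals around the constant c (the OLS criterion)."""
    return sum((y - c) ** 2 for y in ys)


def pinball_loss(c, ys, tau):
    """Quantile-regression (check) loss around the constant c for quantile tau."""
    return sum(tau * (y - c) if y >= c else (1 - tau) * (c - y) for y in ys)


# Hypothetical zero-heavy sample; the values are invented for illustration.
similarity = [0, 0, 0, 0, 1, 2, 3, 10]

# The constant minimizing squared loss is the arithmetic mean.
mean_fit = sum(similarity) / len(similarity)

# A minimizer of the pinball loss is always attained at an observed value,
# so searching the distinct data points is sufficient here.
candidates = sorted(set(similarity))
q90_fit = min(candidates, key=lambda c: pinball_loss(c, similarity, 0.9))

print("mean fit:", mean_fit)
print("0.9-quantile fit:", q90_fit)
```

    With this sample the mean is pulled toward the many zeroes, while the 0.9-quantile fit follows the upper end of the distribution, which is the behaviour the abstract relies on.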

  4. Least Square Regression Method for Estimating Gas Concentration in an Electronic Nose System

    PubMed Central

    Khalaf, Walaa; Pace, Calogero; Gaudioso, Manlio

    2009-01-01

    We describe an Electronic Nose (ENose) system which is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors, five of them being gas sensors (supplied with different heater voltage values), the remainder being a temperature and a humidity sensor, respectively. To identify a new analyte sample and then to estimate its concentration, we use both some machine learning techniques and the least square regression principle. In fact, we apply two different training models; the first one is based on the Support Vector Machine (SVM) approach and is aimed at teaching the system how to discriminate among different gases, while the second one uses the least squares regression approach to predict the concentration of each type of analyte. PMID:22573980

  5. Estimating EQ-5D values from the Neck Disability Index and numeric rating scales for neck and arm pain.

    PubMed

    Carreon, Leah Y; Bratcher, Kelly R; Das, Nandita; Nienhuis, Jacob B; Glassman, Steven D

    2014-09-01

    The Neck Disability Index (NDI) and numeric rating scales (0 to 10) for neck pain and arm pain are widely used cervical spine disease-specific measures. Recent studies have shown that there is a strong relationship between the SF-6D and the NDI such that using a simple linear regression allows for the estimation of an SF-6D value from the NDI alone. Due to ease of administration and scoring, the EQ-5D is increasingly being used as a measure of utility in the clinical setting. The purpose of this study is to determine if the EQ-5D values can be estimated from commonly available cervical spine disease-specific health-related quality of life measures, much like the SF-6D. The EQ-5D, NDI, neck pain score, and arm pain score were prospectively collected in 3732 patients who presented to the authors' clinic with degenerative cervical spine disorders. Correlation coefficients for paired observations from multiple time points between the NDI, neck pain and arm pain scores, and EQ-5D were determined. Regression models were built to estimate the EQ-5D values from the NDI, neck pain, and arm pain scores. The mean age of the 3732 patients was 53.3 ± 12.2 years, and 43% were male. Correlations between the EQ-5D and the NDI, neck pain score, and arm pain score were statistically significant (p < 0.0001), with correlation coefficients of -0.77, -0.62, and -0.50, respectively. The regression equation 0.98947 + (-0.00705 × NDI) + (-0.00875 × arm pain score) + (-0.00877 × neck pain score) to predict EQ-5D had an R-square of 0.62 and a root mean square error (RMSE) of 0.146. The model using NDI alone had an R-square of 0.59 and a RMSE of 0.150. The model using the individual NDI items had an R-square of 0.46 and an RMSE of 0.172. The correlation coefficient between the observed and estimated EQ-5D scores was 0.79. 
    There was no statistically significant difference between the actual EQ-5D score (0.603 ± 0.235) and the estimated EQ-5D score (0.603 ± 0.185) using the NDI, neck pain score, and arm pain score regression model. However, rounding off the coefficients to fewer than 5 decimal places produced less accurate results. The regression model estimating the EQ-5D from the NDI, neck pain score, and arm pain score accounted for 60% of the variability of the EQ-5D with a relatively large RMSE. This regression model may not be sufficient to accurately or reliably estimate actual EQ-5D values.
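
    The three-predictor regression equation quoted in this abstract can be written as a small function. This is only a sketch: the function and argument names are hypothetical, the inputs are assumed to be on the same scales the study used, and no range checking is done. The coefficients are kept at the full five decimal places, since the abstract notes that rounding them reduces accuracy.

```python
def estimate_eq5d(ndi, arm_pain, neck_pain):
    """Estimate EQ-5D from NDI and the arm and neck pain scores.

    Coefficients are copied from the abstract; the input scales are an
    assumption (pain scores are described there as 0 to 10).
    """
    return 0.98947 - 0.00705 * ndi - 0.00875 * arm_pain - 0.00877 * neck_pain


# Hypothetical patient: NDI of 30, arm pain 5, neck pain 5.
print(round(estimate_eq5d(30, 5, 5), 5))
```

    The abstract reports an RMSE of 0.146 for this model and concludes that it may not estimate individual EQ-5D values reliably, so any output should be read with that caveat.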

  6. Do MCAT scores predict USMLE scores? An analysis on 5 years of medical student data.

    PubMed

    Gauer, Jacqueline L; Wolff, Josephine M; Jackson, J Brooks

    2016-01-01

    The purpose of this study was to determine the associations and predictive values of Medical College Admission Test (MCAT) component and composite scores prior to 2015 with U.S. Medical Licensure Exam (USMLE) Step 1 and Step 2 Clinical Knowledge (CK) scores, with a focus on whether students scoring low on the MCAT were particularly likely to continue to score low on the USMLE exams. Multiple linear regression, correlation, and chi-square analyses were performed to determine the relationship between MCAT component and composite scores and USMLE Step 1 and Step 2 CK scores from five graduating classes (2011-2015) at the University of Minnesota Medical School (N=1,065). The multiple linear regression analyses were both significant (p<0.001). The three MCAT component scores together explained 17.7% of the variance in Step 1 scores (p<0.001) and 12.0% of the variance in Step 2 CK scores (p<0.001). In the chi-square analyses, significant, albeit weak associations were observed between almost all MCAT component scores and USMLE scores (Cramer's V ranged from 0.05 to 0.24). Each of the MCAT component scores was significantly associated with USMLE Step 1 and Step 2 CK scores, although the effect size was small. Being in the top or bottom scoring range of the MCAT exam was predictive of being in the top or bottom scoring range of the USMLE exams, although the strengths of the associations were weak to moderate. These results indicate that MCAT scores are predictive of student performance on the USMLE exams, but, given the small effect sizes, should be considered as part of the holistic view of the student.

  7. Do MCAT scores predict USMLE scores? An analysis on 5 years of medical student data

    PubMed Central

    Gauer, Jacqueline L.; Wolff, Josephine M.; Jackson, J. Brooks

    2016-01-01

    Introduction The purpose of this study was to determine the associations and predictive values of Medical College Admission Test (MCAT) component and composite scores prior to 2015 with U.S. Medical Licensure Exam (USMLE) Step 1 and Step 2 Clinical Knowledge (CK) scores, with a focus on whether students scoring low on the MCAT were particularly likely to continue to score low on the USMLE exams. Method Multiple linear regression, correlation, and chi-square analyses were performed to determine the relationship between MCAT component and composite scores and USMLE Step 1 and Step 2 CK scores from five graduating classes (2011–2015) at the University of Minnesota Medical School (N=1,065). Results The multiple linear regression analyses were both significant (p<0.001). The three MCAT component scores together explained 17.7% of the variance in Step 1 scores (p<0.001) and 12.0% of the variance in Step 2 CK scores (p<0.001). In the chi-square analyses, significant, albeit weak associations were observed between almost all MCAT component scores and USMLE scores (Cramer's V ranged from 0.05 to 0.24). Discussion Each of the MCAT component scores was significantly associated with USMLE Step 1 and Step 2 CK scores, although the effect size was small. Being in the top or bottom scoring range of the MCAT exam was predictive of being in the top or bottom scoring range of the USMLE exams, although the strengths of the associations were weak to moderate. These results indicate that MCAT scores are predictive of student performance on the USMLE exams, but, given the small effect sizes, should be considered as part of the holistic view of the student. PMID:27702431

  8. Analysis of medical litigation among patients with medical disputes in cosmetic surgery in Taiwan.

    PubMed

    Lyu, Shu-Yu; Liao, Chuh-Kai; Chang, Kao-Ping; Tsai, Shang-Ta; Lee, Ming-Been; Tsai, Feng-Chou

    2011-10-01

    This study aimed to investigate the key factors in medical disputes (arguments) among female patients after cosmetic surgery in Taiwan and to explore the correlates of medical litigation. A total of 6,888 patients (3,210 patients from two hospitals and 3,678 patients from two clinics) received cosmetic surgery from January 2001 to December 2009. The inclusion criteria specified female patients with a medical dispute. Chi-square testing and multiple logistic regression analysis were used to analyze the data. Of the 43 patients who had a medical dispute (hospitals, 0.53%; clinics, 0.73%), 9 plaintiffs eventually filed suit against their plastic surgeons. Such an outcome exhibited a decreasing annual trend. The hospitals and clinics did not differ significantly in terms of patient profiles. The Chi-square test showed that most patients with a medical dispute (p < 0.05) were older than 30 years, were divorced or married, had received operations under general anesthesia, had no economic stress, had a history of medical litigation, and eventually did not sue the surgeons. The test results also showed that the surgeon's seniority and experience significantly influenced the possibility of medical dispute and nonlitigation. Multiple logistic regression analysis further showed that the patients who did decide to enter into litigation had two main related factors: marital stress (odds ratio [OR], 10.67; 95% confidence interval [CI], 1.20-94.73) and an education level below junior college (OR, 9.33; 95% CI, 1.01-86.36). The study findings suggest that the key characteristics of patients and surgeons should be taken into consideration not only in the search for ways to enhance pre- and postoperative communication but also as useful information for expert testimony in the inquisitorial law system.

  9. Prediction of Maximal Oxygen Uptake by Six-Minute Walk Test and Body Mass Index in Healthy Boys.

    PubMed

    Jalili, Majid; Nazem, Farzad; Sazvar, Akbar; Ranjbar, Kamal

    2018-05-14

    To develop an equation to predict maximal oxygen uptake (VO2max) based on the 6-minute walk test (6MWT) and body composition in healthy boys. Direct VO2max, 6-minute walk distance, and anthropometric characteristics were measured in 349 healthy boys (12.49 ± 2.72 years). Multiple regression analysis was used to generate VO2max prediction equations. Cross-validation of the VO2max prediction equations was assessed with predicted residual sum of squares statistics. Pearson correlation was used to assess the correlation between measured and predicted VO2max. Objectively measured VO2max had a significant correlation with demographic and 6MWT characteristics (R = 0.11-0.723, P < .01). Multiple regression analysis revealed the following VO2max prediction equation: VO2max (mL/kg/min) = 12.701 + (0.06 × 6-minute walk distance [m]) - (0.732 × body mass index [kg/m²]) (R² = 0.79, standard error of the estimate [SEE] = 2.91 mL/kg/min, %SEE = 6.9%). There was strong correlation between measured and predicted VO2max (r = 0.875, P < .001). Cross-validation revealed minimal shrinkage (R²p = 0.78 and predicted residual sum of squares SEE = 2.99 mL/kg/min). This study provides a relatively accurate and convenient VO2max prediction equation based on the 6MWT and body mass index in healthy boys. This model can be used for evaluation of cardiorespiratory fitness of boys in different settings. Copyright © 2018 Elsevier Inc. All rights reserved.
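
    The prediction equation reported in this abstract can be expressed as a short function. This is a sketch only: the function and parameter names are hypothetical, and the example inputs are invented rather than taken from the study.

```python
def predict_vo2max(walk_distance_m, bmi):
    """Predict VO2max (mL/kg/min) from 6-minute walk distance (m) and BMI (kg/m^2).

    Coefficients are copied from the abstract, which derived them in healthy boys.
    """
    return 12.701 + 0.06 * walk_distance_m - 0.732 * bmi


# Hypothetical example: 600 m walked in six minutes, BMI of 18 kg/m^2.
print(round(predict_vo2max(600, 18), 3))
```

    The abstract reports a standard error of the estimate of 2.91 mL/kg/min, so individual predictions carry roughly that much uncertainty, and the equation was fitted only to the population described there.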

  10. Regional regression of flood characteristics employing historical information

    USGS Publications Warehouse

    Tasker, Gary D.; Stedinger, J.R.

    1987-01-01

    Streamflow gauging networks provide hydrologic information for use in estimating the parameters of regional regression models. The regional regression models can be used to estimate flood statistics, such as the 100 yr peak, at ungauged sites as functions of drainage basin characteristics. A recent innovation in regional regression is the use of a generalized least squares (GLS) estimator that accounts for unequal station record lengths and sample cross correlation among the flows. However, this technique does not account for historical flood information. A method is proposed here to adjust this generalized least squares estimator to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the ordinary least squares estimator. Results indicate that using the GLS estimator adjusted for historical information significantly improves the regression model. © 1987.

  11. Computation of nonlinear least squares estimator and maximum likelihood using principles in matrix calculus

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.

    2017-11-01

    This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE) and a linear pseudo model for the nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However, the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure to get a linear pseudo model for the nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for the nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation

  12. 40 CFR Appendix C to Subpart Nnn... - Method for the Determination of Product Density

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... One square foot (12 in. by 12 in.) template, or templates that are multiples of one square foot, for... to the plant's written procedure for the designated product. 3.2 Cut samples using one square foot (or multiples of one square foot) template. 3.3 Weigh product and obtain area weight (lb/ft2). 3.4 Measure sample...

  13. 40 CFR Appendix C to Subpart Nnn... - Method for the Determination of Product Density

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... One square foot (12 in. by 12 in.) template, or templates that are multiples of one square foot, for... to the plant's written procedure for the designated product. 3.2 Cut samples using one square foot (or multiples of one square foot) template. 3.3 Weigh product and obtain area weight (lb/ft2). 3.4 Measure sample...

  14. Advantages of continuous genotype values over genotype classes for GWAS in higher polyploids: a comparative study in hexaploid chrysanthemum.

    PubMed

    Grandke, Fabian; Singh, Priyanka; Heuven, Henri C M; de Haan, Jorn R; Metzler, Dirk

    2016-08-24

    Association studies are an essential part of modern plant breeding, but are limited for polyploid crops. The increased number of possible genotype classes complicates the differentiation between them. Available methods are limited with respect to the ploidy level or data producing technologies. While genotype classification is an established noise reduction step in diploids, it gains complexity with increasing ploidy levels. Eventually, the errors produced by misclassifications exceed the benefits of genotype classes. Alternatively, continuous genotype values can be used for association analysis in higher polyploids. We associated continuous genotypes to three different traits and compared the results to the output of the genotype caller SuperMASSA. Linear, Bayesian and partial least squares regression were applied, to determine if the use of continuous genotypes is limited to a specific method. A disease, a flowering and a growth trait with h2 of 0.51, 0.78 and 0.91 were associated with hexaploid chrysanthemum genotypes. The data set consisted of 55,825 probes and 228 samples. We were able to detect associating probes using continuous genotypes for multiple traits, using different regression methods. The identified probe sets were overlapping, but not identical between the methods. Bayesian regression was the most restrictive method, resulting in ten probes for one trait and none for the others. Linear and partial least squares regression led to numerous associating probes. Association based on genotype classes resulted in similar values, but missed several significant probes. A simulation study was used to successfully validate the number of associating markers. Association of various phenotypic traits with continuous genotypes is successful with both uni- and multivariate regression methods. Genotype calling does not improve the association and shows no advantages in this study. Instead, use of continuous genotypes simplifies the analysis, saves computational time and results in more potential markers.

  15. Inter-class sparsity based discriminative least square regression.

    PubMed

    Wen, Jie; Xu, Yong; Li, Zuoyong; Ma, Zhongli; Xu, Yuanrong

    2018-06-01

    Least square regression is a very popular supervised classification method. However, two main issues greatly limit its performance. The first one is that it only focuses on fitting the input features to the corresponding output labels while ignoring the correlations among samples. The second one is that the used label matrix, i.e., zero-one label matrix is inappropriate for classification. To solve these problems and improve the performance, this paper presents a novel method, i.e., inter-class sparsity based discriminative least square regression (ICS_DLSR), for multi-class classification. Different from other methods, the proposed method pursues that the transformed samples have a common sparsity structure in each class. For this goal, an inter-class sparsity constraint is introduced to the least square regression model such that the margins of samples from the same class can be greatly reduced while those of samples from different classes can be enlarged. In addition, an error term with row-sparsity constraint is introduced to relax the strict zero-one label matrix, which allows the method to be more flexible in learning the discriminative transformation matrix. These factors encourage the method to learn a more compact and discriminative transformation for regression and thus has the potential to perform better than other methods. Extensive experimental results show that the proposed method achieves the best performance in comparison with other methods for multi-class classification. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Least Squares Procedures.

    ERIC Educational Resources Information Center

    Hester, Yvette

    Least squares methods are sophisticated mathematical curve fitting procedures used in all classical parametric methods. The linear least squares approximation is most often associated with finding the "line of best fit" or the regression line. Since all statistical analyses are correlational and all classical parametric methods are least…
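
    As an illustration of the "line of best fit" mentioned above, the following minimal sketch computes the ordinary least squares slope and intercept for one predictor. The function name `fit_line` and the sample points are hypothetical; the formulas are the standard closed-form solution, not something taken from this record.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Return (slope, intercept) minimising the sum of squared vertical residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Sum of squared deviations of x and sum of cross-products of deviations.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical points lying exactly on y = 2x + 1.
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```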

  17. Prediction of erodibility in Oxisols using iron oxides, soil color and diffuse reflectance spectroscopy

    NASA Astrophysics Data System (ADS)

    Arantes Camargo, Livia; Marques, José, Jr.

    2015-04-01

    The prediction of erodibility using indirect methods such as diffuse reflectance spectroscopy could facilitate the characterization of the spatial variability in large areas and optimize implementation of conservation practices. The aim of this study was to evaluate the prediction of interrill erodibility (Ki) and rill erodibility (Kr) by means of iron oxides content and soil color using multiple linear regression, and by means of diffuse reflectance spectroscopy (DRS) using partial least squares regression (PLSR). The soils were collected from three geomorphic surfaces and analyzed for chemical, physical and mineralogical properties, and scanned in the visible and infrared spectral range. Maps of spatial distribution of Ki and Kr were built with the values calculated by the calibrated models that obtained the best accuracy using geostatistics. Interrill and rill erodibility presented negative correlation with iron extracted by dithionite-citrate-bicarbonate, hematite, and chroma, confirming the influence of iron oxides in soil structural stability. Hematite and hue were the attributes that contributed most to the calibration models by multiple linear regression for the prediction of Ki (R2 = 0.55) and Kr (R2 = 0.53). Diffuse reflectance spectroscopy via PLSR allowed interrill and rill erodibility to be predicted with high accuracy (R2adj = 0.76 and 0.81, respectively, and RPD > 2.0) in the range of the visible spectrum (380-800 nm), as well as the characterization of the spatial variability of these attributes by geostatistics.

  18. Investigation of marital satisfaction and its relationship with job stress and general health of nurses in Qazvin, Iran

    PubMed Central

    Azimian, Jalil; Piran, Pegah; Jahanihashemi, Hassan; Dehghankar, Leila

    2017-01-01

    Background Pressures in nursing can affect family life and marital problems, disrupt common social problems, increase work-family conflicts and endanger people’s general health. Aim To determine marital satisfaction and its relationship with job stress and general health of nurses. Methods This descriptive and cross-sectional study was done in 2015 in medical educational centers of Qazvin by using an ENRICH marital satisfaction scale and General Health and Job Stress questionnaires completed by 123 nurses. Analysis was done by SPSS version 19 using descriptive and analytical statistics (Pearson correlation, t-test, ANOVA, Chi-square, regression line, multiple regression analysis). Results The findings showed that 64.4% of nurses had marital satisfaction. There was significant relationship between age (p=0.03), job experience (p=0.01), age of spouse (p=0.01) and marital satisfaction. The results showed that there was a significant relationship between marital satisfaction and general health (p<0.0001). Multiple regression analysis showed that there was a significant relationship between depression (p=0.012) and anxiety (p=0.001) with marital satisfaction. Conclusions Due to high levels of job stress and disorder in general health of nurses and low marital satisfaction by running health promotion programs and paying attention to its dimensions can help work and family health of nurses. PMID:28607660

  19. Simultaneous determination of hydroquinone, resorcinol, phenol, m-cresol and p-cresol in untreated air samples using spectrofluorimetry and a custom multiple linear regression-successive projection algorithm.

    PubMed

    Pistonesi, Marcelo F; Di Nezio, María S; Centurión, María E; Lista, Adriana G; Fragoso, Wallace D; Pontes, Márcio J C; Araújo, Mário C U; Band, Beatriz S Fernández

    2010-12-15

    In this study, a novel, simple, and efficient spectrofluorimetric method to determine directly and simultaneously five phenolic compounds (hydroquinone, resorcinol, phenol, m-cresol and p-cresol) in air samples is presented. For this purpose, variable selection by the successive projections algorithm (SPA) is used in order to obtain simple multiple linear regression (MLR) models based on a small subset of wavelengths. For comparison, partial least square (PLS) regression is also employed in full-spectrum. The concentrations of the calibration matrix ranged from 0.02 to 0.2 mg L(-1) for hydroquinone, from 0.05 to 0.6 mg L(-1) for resorcinol, and from 0.05 to 0.4 mg L(-1) for phenol, m-cresol and p-cresol; incidentally, such ranges are in accordance with the Argentinean environmental legislation. To verify the accuracy of the proposed method a recovery study on real air samples of smoking environment was carried out with satisfactory results (94-104%). The advantage of the proposed method is that it requires only spectrofluorimetric measurements of samples and chemometric modeling for simultaneous determination of five phenols. With it, air is simply sampled and no pre-treatment sample is needed (i.e., separation steps and derivatization reagents are avoided) that means a great saving of time. Copyright © 2010 Elsevier B.V. All rights reserved.

  20. Determination of suitable drying curve model for bread moisture loss during baking

    NASA Astrophysics Data System (ADS)

    Soleimani Pour-Damanab, A. R.; Jafary, A.; Rafiee, S.

    2013-03-01

    This study presents mathematical modelling of bread moisture loss or drying during baking in a conventional bread baking process. In order to estimate and select the appropriate moisture loss curve equation, 11 different models, semi-theoretical and empirical, were applied to the experimental data and compared according to their correlation coefficients, chi-squared test and root mean square error which were predicted by nonlinear regression analysis. Consequently, of all the drying models, a Page model was selected as the best one, according to the correlation coefficients, chi-squared test, and root mean square error values and its simplicity. Mean absolute estimation error of the proposed model by linear regression analysis for natural and forced convection modes was 2.43, 4.74%, respectively.
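
    The abstract does not state the Page equation. In the drying literature it is usually written MR = exp(-k t^n), where MR is the moisture ratio. The sketch below assumes that form and, unlike the nonlinear regression used in the study, fits it through the linearisation ln(-ln MR) = ln k + n ln t. All names and the synthetic data are hypothetical.

```python
import math


def page_moisture_ratio(t, k, n):
    """Page thin-layer model (assumed form): MR = exp(-k * t**n)."""
    return math.exp(-k * t ** n)


def fit_page_linearized(times, ratios):
    """Estimate (k, n) by a straight-line fit of ln(-ln MR) against ln t.

    Requires t > 0 and 0 < MR < 1 for every observation. This is a simple
    substitute for a full nonlinear least squares fit, not the study's method.
    """
    u = [math.log(t) for t in times]
    v = [math.log(-math.log(mr)) for mr in ratios]
    m = len(u)
    u_bar, v_bar = sum(u) / m, sum(v) / m
    suu = sum((a - u_bar) ** 2 for a in u)
    suv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    n = suv / suu                      # slope of the linearised model
    k = math.exp(v_bar - n * u_bar)    # the intercept is ln k
    return k, n


if __name__ == "__main__":
    # Synthetic, noise-free data generated from hypothetical k and n.
    times = [1.0, 2.0, 5.0, 10.0, 20.0]
    ratios = [page_moisture_ratio(t, 0.05, 1.2) for t in times]
    print(fit_page_linearized(times, ratios))
```

    The linearised fit weights the observations differently from a direct nonlinear fit, so with noisy data the two approaches can give somewhat different parameter estimates.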

  1. Bivariate least squares linear regression: Towards a unified analytic formalism. I. Functional models

    NASA Astrophysics Data System (ADS)

    Caimmi, R.

    2011-08-01

    Concerning bivariate least squares linear regression, the classical approach pursued for functional models in earlier attempts ( York, 1966, 1969) is reviewed using a new formalism in terms of deviation (matrix) traces which, for unweighted data, reduce to usual quantities leaving aside an unessential (but dimensional) multiplicative factor. Within the framework of classical error models, the dependent variable relates to the independent variable according to the usual additive model. The classes of linear models considered are regression lines in the general case of correlated errors in X and in Y for weighted data, and in the opposite limiting situations of (i) uncorrelated errors in X and in Y, and (ii) completely correlated errors in X and in Y. The special case of (C) generalized orthogonal regression is considered in detail together with well known subcases, namely: (Y) errors in X negligible (ideally null) with respect to errors in Y; (X) errors in Y negligible (ideally null) with respect to errors in X; (O) genuine orthogonal regression; (R) reduced major-axis regression. In the limit of unweighted data, the results determined for functional models are compared with their counterparts related to extreme structural models i.e. the instrumental scatter is negligible (ideally null) with respect to the intrinsic scatter ( Isobe et al., 1990; Feigelson and Babu, 1992). While regression line slope and intercept estimators for functional and structural models necessarily coincide, the contrary holds for related variance estimators even if the residuals obey a Gaussian distribution, with the exception of Y models. An example of astronomical application is considered, concerning the [O/H]-[Fe/H] empirical relations deduced from five samples related to different stars and/or different methods of oxygen abundance determination. 
    For selected samples and assigned methods, different regression models yield consistent results within the errors (∓ σ) for both heteroscedastic and homoscedastic data. Conversely, samples related to different methods produce discrepant results, due to the presence of (still undetected) systematic errors, which implies no definitive statement can be made at present. A comparison is also made between different expressions of regression line slope and intercept variance estimators, where fractional discrepancies are found to be not exceeding a few percent, which grows up to about 20% in the presence of large dispersion data. An extension of the formalism to structural models is left to a forthcoming paper.

  2. Explaining cross-national differences in marriage, cohabitation, and divorce in Europe, 1990-2000.

    PubMed

    Kalmijn, Matthijs

    2007-11-01

    European countries differ considerably in their marriage patterns. The study presented in this paper describes these differences for the 1990s and attempts to explain them from a macro-level perspective. We find that different indicators of marriage (i.e., marriage rate, age at marriage, divorce rate, and prevalence of unmarried cohabitation) cannot be seen as indicators of an underlying concept such as the 'strength of marriage'. Multivariate ordinary least squares (OLS) regression analyses are estimated with countries as units and panel regression models are estimated in which annual time series for multiple countries are pooled. Using these models, we find that popular explanations of trends in the indicators - explanations that focus on gender roles, secularization, unemployment, and educational expansion - are also important for understanding differences among countries. We also find evidence for the role of historical continuity and societal disintegration in understanding cross-national differences.

  3. Performance of bias-correction methods for exposure measurement error using repeated measurements with and without missing data.

    PubMed

    Batistatou, Evridiki; McNamee, Roseanne

    2012-12-10

    It is known that measurement error leads to bias in assessing exposure effects, which can, however, be corrected if independent replicates are available. For expensive replicates, two-stage (2S) studies that produce data 'missing by design', may be preferred over a single-stage (1S) study, because in the second stage, measurement of replicates is restricted to a sample of first-stage subjects. Motivated by an occupational study on the acute effect of carbon black exposure on respiratory morbidity, we compare the performance of several bias-correction methods for both designs in a simulation study: an instrumental variable method (EVROS IV) based on grouping strategies, which had been recommended especially when measurement error is large, the regression calibration and the simulation extrapolation methods. For the 2S design, either the problem of 'missing' data was ignored or the 'missing' data were imputed using multiple imputations. Both in 1S and 2S designs, in the case of small or moderate measurement error, regression calibration was shown to be the preferred approach in terms of root mean square error. For 2S designs, regression calibration as implemented by Stata software is not recommended in contrast to our implementation of this method; the 'problematic' implementation of regression calibration is, however, substantially improved with use of multiple imputations. The EVROS IV method, under a good/fairly good grouping, outperforms the regression calibration approach in both design scenarios when exposure mismeasurement is severe. Both in 1S and 2S designs with moderate or large measurement error, simulation extrapolation severely failed to correct for bias. Copyright © 2012 John Wiley & Sons, Ltd.

  4. Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.

    ERIC Educational Resources Information Center

    Willett, John B.; Singer, Judith D.

    In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) or techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…
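
    To make the idea of weighting concrete, here is a minimal weighted least squares sketch for a single predictor. It assumes the weights are already available (typically the reciprocal of each observation's error variance); how to estimate them is the subject of the record above and is not shown. The function name, data and weights are hypothetical.

```python
def fit_weighted_line(xs, ys, weights):
    """Return (slope, intercept) minimising sum(w * (y - a - b*x)**2)."""
    if not (len(xs) == len(ys) == len(weights)) or len(xs) < 2:
        raise ValueError("need three equally long sequences with at least 2 points")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    sw = sum(weights)
    # Weighted means of x and y.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / sw
    y_bar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]
    # Hypothetical assumption: error variance grows with x squared,
    # so each weight is 1 / x**2.
    weights = [1.0 / x ** 2 for x in xs]
    print(fit_weighted_line(xs, ys, weights))
```

    With equal weights the function reduces to ordinary least squares, which is a convenient sanity check.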

  5. Non-destructive and rapid prediction of moisture content in red pepper (Capsicum annuum L.) powder using near-infrared spectroscopy and a partial least squares regression model

    USDA-ARS's Scientific Manuscript database

    Purpose: The aim of this study was to develop a technique for the non-destructive and rapid prediction of the moisture content in red pepper powder using near-infrared (NIR) spectroscopy and a partial least squares regression (PLSR) model. Methods: Three red pepper powder products were separated in...

  6. Local Linear Regression for Data with AR Errors.

    PubMed

    Li, Runze; Li, Yan

    2009-07-01

    In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.

  7. Linear regression in astronomy. II

    NASA Technical Reports Server (NTRS)

    Feigelson, Eric D.; Babu, Gutti J.

    1992-01-01

    A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

  8. Optimal Wavelength Selection on Hyperspectral Data with Fused Lasso for Biomass Estimation of Tropical Rain Forest

    NASA Astrophysics Data System (ADS)

    Takayama, T.; Iwasaki, A.

    2016-06-01

    Above-ground biomass prediction of tropical rain forest using remote sensing data is of paramount importance to continuous large-area forest monitoring. Hyperspectral data can provide rich spectral information for the biomass prediction; however, the prediction accuracy is affected by a small-sample-size problem, which widely exists as overfitting in using high dimensional data where the number of training samples is smaller than the dimensionality of the samples due to limitations of required time, cost, and human resources for field surveys. A common approach to addressing this problem is reducing the dimensionality of the dataset. Also, acquired hyperspectral data usually have a low signal-to-noise ratio due to a narrow bandwidth, and local or global shifts of peaks due to instrumental instability or small differences in practical measurement conditions. In this work, we propose a methodology based on fused lasso regression that selects optimal bands for the biomass prediction model by encouraging sparsity and grouping, which solves the small-sample-size problem by the dimensionality reduction from the sparsity and the noise and peak shift problem by the grouping. The prediction model provided higher accuracy, with a root-mean-square error (RMSE) of 66.16 t/ha in the cross-validation, than other methods (multiple linear analysis, partial least squares regression, and lasso regression). Furthermore, fusion of spectral and spatial information derived from texture index increased the prediction accuracy with RMSE of 62.62 t/ha. This analysis proves the efficiency of fused lasso and image texture in biomass estimation of tropical forests.

  9. Statistical procedures for determination and verification of minimum reporting levels for drinking water methods.

    PubMed

    Winslow, Stephen D; Pepich, Barry V; Martin, John J; Hallberg, George R; Munch, David J; Frebis, Christopher P; Hedrick, Elizabeth J; Krop, Richard A

    2006-01-01

    The United States Environmental Protection Agency's Office of Ground Water and Drinking Water has developed a single-laboratory quantitation procedure: the lowest concentration minimum reporting level (LCMRL). The LCMRL is the lowest true concentration for which future recovery is predicted to fall, with high confidence (99%), between 50% and 150%. The procedure takes into account precision and accuracy. Multiple concentration replicates are processed through the entire analytical method and the data are plotted as measured sample concentration (y-axis) versus true concentration (x-axis). If the data support an assumption of constant variance over the concentration range, an ordinary least-squares regression line is drawn; otherwise, a variance-weighted least-squares regression is used. Prediction interval lines of 99% confidence are drawn about the regression. At the points where the prediction interval lines intersect with data quality objective lines of 50% and 150% recovery, lines are dropped to the x-axis. The higher of the two values is the LCMRL. The LCMRL procedure is flexible because the data quality objectives (50-150%) and the prediction interval confidence (99%) can be varied to suit program needs. The LCMRL determination is performed during method development only. A simpler procedure for verification of data quality objectives at a given minimum reporting level (MRL) is also presented. The verification procedure requires a single set of seven samples taken through the entire method procedure. If the calculated prediction interval is contained within data quality recovery limits (50-150%), the laboratory performance at the MRL is verified.
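
    The verification step described at the end of this abstract can be sketched as follows. The abstract does not give the interval formula, so this example assumes a standard normal-theory 99% prediction interval for one future result from seven replicates: mean ± t × s × sqrt(1 + 1/7), with t = 3.707 (two-sided 99%, 6 degrees of freedom). The function name and the sample values are hypothetical.

```python
import math
from statistics import fmean, stdev

# Student's t quantile for a two-sided 99% interval with 6 degrees of freedom.
T_99_DF6 = 3.707


def verify_mrl(measured, fortified_conc, lower_pct=50.0, upper_pct=150.0):
    """Check whether the prediction interval of recovery lies within the limits.

    measured: seven measured concentrations of samples fortified at the MRL.
    fortified_conc: the true (fortified) concentration, in the same units.
    Returns (lower_limit_pct, upper_limit_pct, passed).
    """
    if len(measured) != 7:
        raise ValueError("this sketch assumes exactly seven replicates")
    mean = fmean(measured)
    s = stdev(measured)  # sample standard deviation (n - 1 denominator)
    half_range = T_99_DF6 * s * math.sqrt(1.0 + 1.0 / 7.0)
    # Express the interval limits as percent recovery.
    lower = 100.0 * (mean - half_range) / fortified_conc
    upper = 100.0 * (mean + half_range) / fortified_conc
    return lower, upper, (lower >= lower_pct and upper <= upper_pct)


if __name__ == "__main__":
    # Hypothetical replicate results for a sample fortified at 1.0 (arbitrary units).
    print(verify_mrl([0.95, 1.02, 0.98, 1.05, 0.97, 1.01, 1.03], 1.0))
```

    The recovery limits are parameters because the abstract notes that the data quality objectives can be varied to suit program needs.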

  10. Socio-demographic correlates of breast-feeding in urban slums of Chandigarh.

    PubMed

    Kumar, Dinesh; Agarwal, Neeraj; Swami, H M

    2006-11-01

    Whether socio-demographic factors are associated with initiation of breast-feeding in urban slums of Chandigarh. (1) To study the prevailing breast-feeding practices adopted by mothers, (2) To study the socio-demographic factors associated with initiation of breast-feeding. Cross-sectional. Mothers of infants willing to participate in the study in the selected area. A total of 270 respondents. Social and demographic characteristics like age, socioeconomic status, educational level, birth interval, parity, gender preference, natal care practices, etc.; and variables related to various aspects of breast-feeding practices like prelacteal feed, initiation of feeding, colostrum feeding, reasons of discarding colostrum, etc. Chi-square test and odds ratios along with their respective 95% confidence intervals, multiple logistic regression analysis. Out of all 270 respondents, 159 (58.9%) initiated breast-feeding within 6 h of birth, only 43 (15.9%) discarded colostrum and 108 (40.0%) mothers gave prelacteal feed. Illiterate/just literate mothers who delivered at home were found at significantly higher risk of delay in initiation of breast-feeding on the basis of multiple logistic regression analysis. Promotion of institutional deliveries and imparting health education to mothers for protecting and promoting optimal breast-feeding practices are suggested.

  11. Dental calculus is associated with death from heart infarction.

    PubMed

    Söder, Birgitta; Meurman, Jukka H; Söder, Per-Östen

    2014-01-01

    We studied whether the amount of dental calculus is associated with death from heart infarction in the dental infection-atherosclerosis paradigm. Participants were 1676 healthy young Swedes followed up from 1985 to 2011. At the beginning of the study all subjects underwent oral clinical examination including dental calculus registration scored with calculus index (CI). Outcome measure was cause of death classified according to WHO International Classification of Diseases. Unpaired t-test, Chi-square tests, and multiple logistic regressions were used. Of the 1676 participants, 2.8% had died during follow-up. Women died at a mean age of 61.5 years and men at 61.7 years. The difference in the CI index score between the survivors versus deceased patients was significant by the year 2009 (P < 0.01). In multiple regression analysis of the relationship between death from heart infarction as a dependent variable and CI as independent variable with controlling for age, gender, dental visits, dental plaque, periodontal pockets, education, income, socioeconomic status, and pack-years of smoking, CI score appeared to be associated with 2.3 times the odds ratio for cardiac death. The results confirmed our study hypothesis by showing that dental calculus indeed associated statistically with cardiac death due to infarction.

  12. Fast Quantitative Analysis Of Museum Objects Using Laser-Induced Breakdown Spectroscopy And Multiple Regression Algorithms

    NASA Astrophysics Data System (ADS)

    Lorenzetti, G.; Foresta, A.; Palleschi, V.; Legnaioli, S.

    2009-09-01

    The recent development of mobile instrumentation, specifically devoted to in situ analysis and study of museum objects, allows the acquisition of many LIBS spectra in a very short time. However, such a large amount of data calls for new analytical approaches which would guarantee a prompt analysis of the results obtained. In this communication, we will present and discuss the advantages of statistical analytical methods, such as Partial Least Squares Multiple Regression algorithms vs. the classical calibration curve approach. PLS algorithms allow information on the composition of the objects under study to be obtained in real time; this feature of the method, compared to the traditional off-line analysis of the data, is extremely useful for the optimization of the measurement times and number of points associated with the analysis. In fact, the real time availability of the compositional information gives the possibility of concentrating the attention on the most 'interesting' parts of the object, without over-sampling the zones which would not provide useful information for the scholars or the conservators. Some examples of applications of this method will be presented, including the studies recently performed by the researchers of the Applied Laser Spectroscopy Laboratory on museum bronze objects.

  13. Prevalence of Depressive Symptoms and Related Factors in Korean Employees: The Third Korean Working Conditions Survey (2011).

    PubMed

    Park, Ji Nam; Han, Mi Ah; Park, Jong; Ryu, So Yeon

    2016-04-14

    The aim of this study was to analyze the association between general working conditions and depressive symptoms among Korean employees. The target population of the study was native employees nationwide who were at least 15 years old, and 50,032 such individuals were enrolled in the study. Depressive symptoms were assessed using the WHO-5 wellbeing index. Associations between general characteristics, job-related characteristics, work environment, and depressive symptoms were tested using chi-square tests, t-tests, and multiple logistic regression analysis. The prevalence of depressive symptoms was 39% (40.7% in males and 36.5% in females). Multiple regression analysis revealed that male subjects, older subjects, subjects with higher education status, subjects with lower monthly income, current smokers, and frequent drinkers were more likely to have depressive symptoms. In addition, longer weekly work hours, occupation type (skilled, unskilled, operative, or economic sector), shift work, working to tight deadlines, exposure to stress at work, and hazard exposure were associated with depressive symptoms. This representative study will be a guide to help manage depression among Korean employees. We expect that further research will identify additional causal relationships between general or specific working conditions and depression.

  14. Physical activity and depressive symptoms in four ethnic groups of midlife women.

    PubMed

    Im, Eun-Ok; Ham, Ok Kyung; Chee, Eunice; Chee, Wonshik

    2015-06-01

    The purpose of this study was to determine the associations between physical activity and depression and the multiple contextual factors influencing these associations in four major ethnic groups of midlife women in the United States. This was a secondary analysis of the data from 542 midlife women. The instruments included questions on background characteristics and health and menopausal status; the Depression Index for Midlife Women (DIMW); and the Kaiser Physical Activity Survey (KPAS). The data were analyzed using chi-square tests, the ANOVA, two-way ANOVA, correlation analyses, and hierarchical multiple regression analyses. The women's depressive symptoms were negatively correlated with active living and sports/exercise physical activities whereas they were positively correlated with occupational physical activities (p < .01). Family income was the strongest predictor of their depressive symptoms. Increasing physical activity may improve midlife women's depressive symptoms, but the types of physical activity and multiple contextual factors need to be considered in intervention development. © The Author(s) 2014.

  15. Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method

    PubMed Central

    Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

    2016-01-01

    To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs’ test. Moreover, potential outliers were detected based on both the standardized residual and score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and the plain partial least squares regression. Both R2 and root-mean-squares error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV, which was calculated using the models constructed by other methods, was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS model constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data. PMID:26821026
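
    The RMSECV figure quoted above is a cross-validated prediction error. As a rough illustration of how such a number is computed, the sketch below runs leave-one-out cross-validation on a one-predictor ordinary least-squares line. This is deliberately not RPLS or PLS; the helper names and the sensor and bitterness values are invented for the example and do not come from the study.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmsecv_loo(xs, ys):
    """Root-mean-square error of leave-one-out cross-validation."""
    squared_errors = []
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        prediction = intercept + slope * xs[i]
        squared_errors.append((ys[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical sensor readings and bitterness scores (illustration only).
sensor = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
bitterness = [0.7, 1.4, 2.2, 2.9, 3.9, 4.6]
print(rmsecv_loo(sensor, bitterness))
```

    The same loop applies to any regression method: only the fitting and prediction steps inside it change.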

  16. BrightStat.com: free statistics online.

    PubMed

    Stricker, Daniel

    2008-10-01

    Powerful software for statistical analysis is expensive. Here I present BrightStat, a statistical software package running on the Internet that is free of charge. BrightStat's goals, its main capabilities and functionalities are outlined. Three different sample runs, a Friedman test, a chi-square test, and a step-wise multiple regression are presented. The results obtained by BrightStat are compared with results computed by SPSS, one of the global leaders in providing statistical software, and VassarStats, a collection of scripts for data analysis running on the Internet. Elementary statistics is an inherent part of academic education and BrightStat is an alternative to commercial products.

  17. Estimation of aboveground biomass in Mediterranean forests by statistical modelling of ASTER fraction images

    NASA Astrophysics Data System (ADS)

    Fernández-Manso, O.; Fernández-Manso, A.; Quintano, C.

    2014-09-01

    Aboveground biomass (AGB) estimation from optical satellite data is usually based on regression models of original or synthetic bands. To overcome the poor relation between AGB and spectral bands due to mixed-pixels when a medium spatial resolution sensor is considered, we propose to base the AGB estimation on fraction images from Linear Spectral Mixture Analysis (LSMA). Our study area is a managed Mediterranean pine woodland (Pinus pinaster Ait.) in central Spain. A total of 1033 circular field plots were used to estimate AGB from Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) optical data. We applied Pearson correlation statistics and stepwise multiple regression to identify suitable predictors from the set of variables of original bands, fraction imagery, Normalized Difference Vegetation Index and Tasselled Cap components. Four linear models and one nonlinear model were tested. A linear combination of ASTER band 2 (red, 0.630-0.690 μm), band 8 (short wave infrared 5, 2.295-2.365 μm) and green vegetation fraction (from LSMA) was the best AGB predictor (adjusted R^2 = 0.632; the root-mean-squared error of estimated AGB was 13.3 Mg ha^-1 (or 37.7%), resulting from cross-validation), rather than other combinations of the above cited independent variables. Results indicated that using ASTER fraction images in regression models improves the AGB estimation in Mediterranean pine forests. The spatial distribution of the estimated AGB, based on a multiple linear regression model, may be used as baseline information for forest managers in future studies, such as quantifying the regional carbon budget, fuel accumulation or monitoring of management practices.

  18. Analysis of Point Based Image Registration Errors With Applications in Single Molecule Microscopy

    PubMed Central

    Cohen, E. A. K.; Ober, R. J.

    2014-01-01

    We present an asymptotic treatment of errors involved in point-based image registration where control point (CP) localization is subject to heteroscedastic noise, a suitable model for image registration in fluorescence microscopy. Assuming an affine transform, CPs are used to solve a multivariate regression problem. With measurement errors existing for both sets of CPs this is an errors-in-variables problem and linear least squares is inappropriate; the correct method being generalized least squares. To allow for point dependent errors the equivalence of a generalized maximum likelihood and heteroscedastic generalized least squares model is achieved allowing previously published asymptotic results to be extended to image registration. For a particularly useful model of heteroscedastic noise where covariance matrices are scalar multiples of a known matrix (including the case where covariance matrices are multiples of the identity) we provide closed form solutions to estimators and derive their distribution. We consider the target registration error (TRE) and define a new measure called the localization registration error (LRE) believed to be useful, especially in microscopy registration experiments. Assuming Gaussianity of the CP localization errors, it is shown that the asymptotic distributions for the TRE and LRE are themselves Gaussian and the parameterized distributions are derived. Results are successfully applied to registration in single molecule microscopy to derive the key dependence of the TRE and LRE variance on the number of CPs and their associated photon counts. Simulations show asymptotic results are robust for low CP numbers and non-Gaussianity. The method presented here is shown to outperform GLS on real imaging data. PMID:24634573

  19. Linear mixed-effects models to describe individual tree crown width for China-fir in Fujian Province, southeast China.

    PubMed

    Hao, Xu; Yujun, Sun; Xinjie, Wang; Jin, Wang; Yao, Fu

    2015-01-01

    A multiple linear model was developed for individual tree crown width of Cunninghamia lanceolata (Lamb.) Hook in Fujian province, southeast China. Data were obtained from 55 sample plots of pure China-fir plantation stands. An Ordinary Linear Least Squares (OLS) regression was used to establish the crown width model. To adjust for correlations between observations from the same sample plots, we developed one-level linear mixed-effects (LME) models based on the multiple linear model, which take into account the random effects of plots. The best random effects combinations for the LME models were determined by the Akaike's information criterion, the Bayesian information criterion and the -2 log-likelihood. Heteroscedasticity was reduced by three residual variance functions: the power function, the exponential function and the constant plus power function. The spatial correlation was modeled by three correlation structures: the first-order autoregressive structure [AR(1)], a combination of first-order autoregressive and moving average structures [ARMA(1,1)], and the compound symmetry structure (CS). Then, the LME model was compared to the multiple linear model using the absolute mean residual (AMR), the root mean square error (RMSE), and the adjusted coefficient of determination (adj-R2). For individual tree crown width models, the one-level LME model showed the best performance. An independent dataset was used to test the performance of the models and to demonstrate the advantage of calibrating LME models.
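
    The three selection criteria named above are all simple functions of a fitted model's maximised log-likelihood. The sketch below shows the standard definitions and how a lowest-AIC choice might be made. The candidate names, log-likelihoods, parameter counts and the sample size are hypothetical placeholders, not values from the study.

```python
import math


def minus_two_log_lik(log_lik):
    """-2 log-likelihood: smaller is better, with no complexity penalty."""
    return -2.0 * log_lik


def aic(log_lik, n_params):
    """Akaike's information criterion: -2 log L + 2k."""
    return minus_two_log_lik(log_lik) + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k ln(n)."""
    return minus_two_log_lik(log_lik) + n_params * math.log(n_obs)


# Hypothetical candidates: name -> (maximised log-likelihood, parameter count).
candidates = {
    "random intercept": (-512.4, 6),
    "random intercept + slope": (-505.1, 8),
}
n_obs = 400  # assumed number of observations

for name, (log_lik, k) in candidates.items():
    print(name, aic(log_lik, k), bic(log_lik, k, n_obs))

best_by_aic = min(candidates, key=lambda name: aic(*candidates[name]))
print("lowest AIC:", best_by_aic)
```

    Because BIC penalises each extra parameter by ln(n) rather than 2, it tends to favour the smaller random-effects structure once n exceeds about 7.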

  20. Fourier transform infrared reflectance spectra of latent fingerprints: a biometric gauge for the age of an individual.

    PubMed

    Hemmila, April; McGill, Jim; Ritter, David

    2008-03-01

    To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.

  1. Analysis of Learning Curve Fitting Techniques.

    DTIC Science & Technology

    1987-09-01

    1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et al., Applied

  2. A Simulation-Based Comparison of Several Stochastic Linear Regression Methods in the Presence of Outliers.

    ERIC Educational Resources Information Center

    Rule, David L.

    Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…

  3. Nonparametric methods for drought severity estimation at ungauged sites

    NASA Astrophysics Data System (ADS)

    Sadri, S.; Burn, D. H.

    2012-12-01

    The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.

  4. Near-infrared reflectance spectroscopy predicts protein, starch, and seed weight in intact seeds of common bean ( Phaseolus vulgaris L.).

    PubMed

    Hacisalihoglu, Gokhan; Larbi, Bismark; Settles, A Mark

    2010-01-27

    The objective of this study was to explore the potential of near-infrared reflectance (NIR) spectroscopy to determine individual seed composition in common bean ( Phaseolus vulgaris L.). NIR spectra and analytical measurements of seed weight, protein, and starch were collected from 267 individual bean seeds representing 91 diverse genotypes. Partial least-squares (PLS) regression models were developed with 61 bean accessions randomly assigned to a calibration data set and 30 accessions assigned to an external validation set. Protein gave the most accurate PLS regression, with the external validation set having a standard error of prediction (SEP) = 1.6%. PLS regressions for seed weight and starch had sufficient accuracy for seed sorting applications, with SEP = 41.2 mg and 4.9%, respectively. Seed color had a clear effect on the NIR spectra, with black beans having a distinct spectral type. Seed coat color did not impact the accuracy of PLS predictions. This research demonstrates that NIR is a promising technique for simultaneous sorting of multiple seed traits in single bean seeds with no sample preparation.

  5. The discomfort produced by noise and whole-body vertical vibration presented separately and in combination.

    PubMed

    Huang, Yu; Griffin, Michael J

    2014-01-01

    This study investigated the prediction of the discomfort caused by simultaneous noise and vibration from the discomfort caused by noise and the discomfort caused by vibration when they are presented separately. A total of 24 subjects used absolute magnitude estimation to report their discomfort caused by seven levels of noise (70-88 dBA SEL), seven magnitudes of vibration (0.146-2.318 m s^-1.75) and all 49 possible combinations of these noise and vibration stimuli. Vibration did not significantly influence judgements of noise discomfort, but noise reduced vibration discomfort by an amount that increased with increasing noise level, consistent with a 'masking effect' of noise on judgements of vibration discomfort. A multiple linear regression model or a root-sums-of-squares model predicted the discomfort caused by combined noise and vibration, but the root-sums-of-squares model is more convenient and provided a more accurate prediction of the discomfort produced by combined noise and vibration.
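
    The abstract does not state the formula behind the root-sums-of-squares model. A minimal sketch, assuming it combines the two separately judged discomfort magnitudes (on a common subjective scale) as the square root of the sum of their squares, with no fitted weights, could look like this. The function name and the example magnitudes are hypothetical.

```python
import math


def combined_discomfort_rss(noise_discomfort, vibration_discomfort):
    """Root-sums-of-squares combination of two discomfort magnitudes.

    Assumption: both inputs are already expressed on the same subjective
    magnitude scale; any weighting used in the study is not modelled.
    """
    return math.sqrt(noise_discomfort ** 2 + vibration_discomfort ** 2)


# Hypothetical magnitude estimates for one noise/vibration pairing.
print(combined_discomfort_rss(60.0, 80.0))
```

    Under this form the larger of the two components dominates the combined value, which is why it needs no regression coefficients.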

  6. Ex-vivo UV autofluorescence imaging and fluorescence spectroscopy of atherosclerotic pathology in human aorta

    NASA Astrophysics Data System (ADS)

    Lewis, William; Williams, Maura; Franco, Walfre

    2017-02-01

    The aim of our study was to identify fluorescence excitation-emission pairs correlated with atherosclerotic pathology in ex-vivo human aorta. Wide-field images of atherosclerotic human aorta were captured using UV and visible excitation and emission wavelength pairs of several known fluorophores to investigate correspondence with gross pathologic features. Fluorescence spectroscopy and histology were performed on 21 aortic samples. A matrix of Pearson correlation coefficients was determined for the relationship between relevant histologic features and the intensity of emission for 427 wavelength pairs. A multiple linear regression analysis indicated that elastin (370/460 nm) and tryptophan (290/340 nm) fluorescence predicted 58% of the variance in intima thickness (R-squared = 0.588, F(2,18) = 12.8, p=.0003), and 48% of the variance in media thickness (R-squared = 0.483, F(2,18) = 8.42, p=.002), suggesting that endogenous fluorescence intensity at these wavelengths can be utilized for improved pathologic characterization of atherosclerotic plaques.
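
    The reported F statistics can be cross-checked from the R-squared values alone, since for a least-squares model with an intercept, k predictors and n observations, F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below applies this to the figures in the abstract (k = 2 predictors, n = 21 samples); the function name is ours, and small differences from the reported values are expected because R-squared is quoted to three decimals.

```python
def f_from_r_squared(r_squared, n_predictors, n_obs):
    """Overall F statistic of a least-squares model with an intercept.

    Returns (F, numerator df, denominator df).
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    f_stat = (r_squared / df_model) / ((1.0 - r_squared) / df_resid)
    return f_stat, df_model, df_resid


# Intima thickness: R-squared = 0.588; media thickness: R-squared = 0.483.
for label, r2 in [("intima", 0.588), ("media", 0.483)]:
    f_stat, df1, df2 = f_from_r_squared(r2, n_predictors=2, n_obs=21)
    print(f"{label}: F({df1},{df2}) = {f_stat:.2f}")
```

    Both results land on F(2,18), in agreement with the degrees of freedom quoted in the abstract.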

  7. New methods of testing nonlinear hypothesis using iterative NLLS estimator

    NASA Astrophysics Data System (ADS)

    Mahaboob, B.; Venkateswarlu, B.; Mokeshrayalu, G.; Balasiddamuni, P.

    2017-11-01

    This research paper discusses the method of testing nonlinear hypotheses using the iterative Nonlinear Least Squares (NLLS) estimator. Takeshi Amemiya [1] explained this method. However, in the present research paper, a modified Wald test statistic due to Engle, Robert [6] is proposed to test the nonlinear hypothesis using the iterative NLLS estimator. An alternative method for testing nonlinear hypotheses using the iterative NLLS estimator based on nonlinear studentized residuals has been proposed. In this research article an innovative method of testing nonlinear hypotheses using the iterative restricted NLLS estimator is derived. Pesaran and Deaton [10] explained the methods of testing nonlinear hypotheses. This paper uses asymptotic properties of the nonlinear least squares estimator proposed by Jennrich [8]. The main purpose of this paper is to provide very innovative methods of testing nonlinear hypotheses using the iterative NLLS estimator, the iterative NLLS estimator based on nonlinear studentized residuals and the iterative restricted NLLS estimator. Eakambaram et al. [12] discussed least absolute deviation estimations versus nonlinear regression model with heteroscedastic errors and also they studied the problem of heteroscedasticity with reference to nonlinear regression models with suitable illustration. William Greene [13] examined the interaction effect in nonlinear models discussed by Ai and Norton [14] and suggested ways to examine the effects that do not involve statistical testing. Peter [15] provided guidelines for identifying composite hypothesis and addressing the probability of false rejection for multiple hypotheses.

  8. Soil fungal diversity in natural grasslands of the Tibetan Plateau: associations with plant diversity and productivity.

    PubMed

    Yang, Teng; Adams, Jonathan M; Shi, Yu; He, Jin-Sheng; Jing, Xin; Chen, Litong; Tedersoo, Leho; Chu, Haiyan

    2017-07-01

    Previous studies have revealed inconsistent correlations between fungal diversity and plant diversity from local to global scales, and there is a lack of information about the diversity-diversity and productivity-diversity relationships for fungi in alpine regions. Here we investigated the internal relationships between soil fungal diversity, plant diversity and productivity across 60 grassland sites on the Tibetan Plateau, using Illumina sequencing of the internal transcribed spacer 2 (ITS2) region for fungal identification. Fungal alpha and beta diversities were best explained by plant alpha and beta diversities, respectively, when accounting for environmental drivers and geographic distance. The best ordinary least squares (OLS) multiple regression models, partial least squares regression (PLSR) and variation partitioning analysis (VPA) indicated that plant richness was positively correlated with fungal richness. However, no correlation between plant richness and fungal richness was evident for fungal functional guilds when analyzed individually. Plant productivity showed a weaker relationship to fungal diversity which was intercorrelated with other factors such as plant diversity, and was thus excluded as a main driver. Our study points to a predominant effect of plant diversity, along with other factors such as carbon : nitrogen (C : N) ratio, soil phosphorus and dissolved organic carbon, on soil fungal richness. © 2017 The Authors. New Phytologist © 2017 New Phytologist Trust.

  9. An improved partial least-squares regression method for Raman spectroscopy

    NASA Astrophysics Data System (ADS)

    Momenpour Tehran Monfared, Ali; Anis, Hanan

    2017-10-01

    It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve the BVSPLS based on a novel selection mechanism. The proposed method is based on sorting the weighted regression coefficients, and then the importance of each variable of the sorted list is evaluated using the root mean square error of prediction (RMSEP) criterion in each iteration step. Our Improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin data sets and led to an improvement in the limit of detection of Raman biosensing ranging from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the jack-knifing (simpler) and Genetic Algorithm (more complex) methods. Our method was consistently better than the jack-knifing method and showed either a similar or a better performance compared to the genetic algorithm.

  10. Comparing the index-flood and multiple-regression methods using L-moments

    NASA Astrophysics Data System (ADS)

    Malekinezhad, H.; Nachtnebel, H. P.; Klik, A.

    In arid and semi-arid regions, the length of records is usually too short to ensure reliable quantile estimates. Comparing index-flood and multiple-regression analyses based on L-moments was the main objective of this study. Factor analysis was applied to determine the main variables influencing flood magnitude. Ward’s cluster and L-moments approaches were applied to several sites in the Namak-Lake basin in central Iran to delineate homogeneous regions based on site characteristics. A homogeneity test was performed using L-moments-based measures. Several distributions were fitted to the regional flood data, and the index-flood and multiple-regression methods were compared as two regional flood frequency methods. The results of factor analysis showed that length of main waterway, compactness coefficient, mean annual precipitation, and mean annual temperature were the main variables affecting flood magnitude. The study area was divided into three regions based on Ward’s clustering method. The homogeneity test based on L-moments showed that all three regions were acceptably homogeneous. Five distributions were fitted to the annual peak flood data of the three homogeneous regions. Using the L-moment ratios and the Z-statistic criteria, the generalised extreme value (GEV) distribution was identified as the most robust of the five candidate distributions and as the best-fit distribution for all three regions. The relative root mean square error (RRMSE) measure was applied for evaluating the performance of the index-flood and multiple-regression methods in comparison with the curve fitting (plotting position) method. In general, the index-flood method gives more reliable estimates for various flood magnitudes of different recurrence intervals. Therefore, this method should be adopted as the regional flood frequency method for the study area and the Namak-Lake basin in central Iran. To estimate floods of various return periods for gauged catchments in the study area, the mean annual peak flood of the catchments may be multiplied by corresponding values of the growth factors computed using the GEV distribution.
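
    The final step described above, multiplying a catchment's mean annual peak flood by a GEV growth factor, can be sketched as follows. The quantile function uses the parameterisation common in the L-moments literature, x(F) = ξ + α(1 − (−ln F)^k)/k, with the Gumbel form when k = 0. The regional parameter values and the mean annual flood below are invented for illustration and are not taken from the study.

```python
import math


def gev_quantile(prob, loc, scale, shape):
    """GEV quantile x(F) = loc + scale * (1 - (-ln F)**shape) / shape.

    shape = 0 reduces to the Gumbel quantile loc - scale * ln(-ln F).
    """
    y = -math.log(prob)
    if abs(shape) < 1e-12:
        return loc - scale * math.log(y)
    return loc + scale * (1.0 - y ** shape) / shape


def index_flood_estimate(mean_annual_flood, return_period, loc, scale, shape):
    """T-year flood = index flood (mean annual peak) x regional growth factor."""
    non_exceedance = 1.0 - 1.0 / return_period
    growth_factor = gev_quantile(non_exceedance, loc, scale, shape)
    return mean_annual_flood * growth_factor


# Hypothetical regional growth-curve parameters and index flood (m^3/s).
regional = {"loc": 0.80, "scale": 0.35, "shape": -0.10}
for period in (10, 50, 100):
    print(period, index_flood_estimate(120.0, period, **regional))
```

    Note that some software libraries define the GEV shape parameter with the opposite sign, so fitted parameters should be checked against the convention in use before being passed to a function like this.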

  11. Computer modeling of multiple-channel input signals and intermodulation losses caused by nonlinear traveling wave tube amplifiers

    NASA Technical Reports Server (NTRS)

    Stankiewicz, N.

    1982-01-01

    The multiple channel input signal to a soft limiter amplifier such as a traveling wave tube is represented as a finite, linear sum of Gaussian functions in the frequency domain. Linear regression is used to fit the channel shapes to a least squares residual error. Distortions in the output signal, namely intermodulation products, are produced by the nonlinear gain characteristic of the amplifier and constitute the principal noise analyzed in this study. The signal to noise ratios are calculated for various input powers from saturation to 10 dB below saturation for two specific distributions of channels. A criterion for the truncation of the series expansion of the nonlinear transfer characteristic is given. It is found that the signal to noise ratios are very sensitive to the coefficients used in this expansion. Improper or incorrect truncation of the series leads to ambiguous results in the signal to noise ratios.

  12. A new approach for modeling patient overall radiosensitivity and predicting multiple toxicity endpoints for breast cancer patients.

    PubMed

    Mbah, Chamberlain; De Ruyck, Kim; De Schrijver, Silke; De Sutter, Charlotte; Schiettecatte, Kimberly; Monten, Chris; Paelinck, Leen; De Neve, Wilfried; Thierens, Hubert; West, Catharine; Amorim, Gustavo; Thas, Olivier; Veldeman, Liv

    2018-05-01

    Evaluation of patient characteristics inducing toxicity in breast radiotherapy, using simultaneous modeling of multiple endpoints. In 269 early-stage breast cancer patients treated with whole-breast irradiation (WBI) after breast-conserving surgery, toxicity was scored, based on five dichotomized endpoints. Five logistic regression models were fitted, one for each endpoint, and the effect sizes of all variables were estimated using maximum likelihood (MLE). The MLEs are improved with James-Stein estimates (JSEs). The method combines all the MLEs obtained for the same variable but from different endpoints. Misclassification errors were computed using MLE- and JSE-based prediction models. For associations, p-values from the sum of squares of MLEs were compared with p-values from the Standardized Total Average Toxicity (STAT) Score. With JSEs, the 19 highest ranked variables were predictive of the five different endpoints. Important variables increasing radiation-induced toxicity were chemotherapy, age, SATB2 rs2881208 SNP and nodal irradiation. Treatment position (prone position) was most protective and ranked eighth. Overall, the misclassification errors were 45% and 34% for the MLE- and JSE-based models, respectively. p-Values from the sum of squares of MLEs and p-values from STAT score led to very similar conclusions, except for the variables nodal irradiation and treatment position, for which STAT p-values suggested an association with radiosensitivity, whereas p-values from the sum of squares indicated no association. Breast volume was ranked as the most significant variable in both strategies. The James-Stein estimator was used for selecting variables that are predictive for multiple toxicity endpoints. With this estimator, 19 variables were predictive for all toxicities, of which four were significantly associated with overall radiosensitivity. JSEs led to an almost 25% reduction in the misclassification error rate compared to conventional MLEs. Finally, patient characteristics that are associated with radiosensitivity were identified without explicitly quantifying radiosensitivity.
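
    The abstract does not spell out how the James-Stein estimates are formed from the per-endpoint MLEs. As a rough sketch of the general idea only, the textbook positive-part James-Stein estimator below shrinks a vector of standardized estimates toward zero, assuming a known common variance. The function name and the five example values (one per endpoint) are hypothetical, and the study's actual procedure may differ.

```python
def james_stein_positive_part(estimates, variance=1.0):
    """Positive-part James-Stein shrinkage of a vector toward zero.

    Assumes independent estimates with a known common variance and at
    least three components, the setting in which shrinkage is guaranteed
    to reduce total squared error relative to the raw estimates.
    """
    p = len(estimates)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 components")
    norm_sq = sum(z * z for z in estimates)
    if norm_sq == 0.0:
        return list(estimates)
    shrink = max(0.0, 1.0 - (p - 2) * variance / norm_sq)
    return [shrink * z for z in estimates]


# Hypothetical standardized effect of one variable on five endpoints.
mle_effects = [1.2, 0.4, -0.3, 0.9, 0.1]
print(james_stein_positive_part(mle_effects))
```

    Weak, noisy effects are pulled strongly toward zero while a consistently large set of effects is left nearly unchanged, which is the property that makes the pooled estimates useful for ranking variables across endpoints.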

  13. Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models

    ERIC Educational Resources Information Center

    Shieh, Gwowen

    2009-01-01

    In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…

  14. Quantitative structure-activity relationship of the curcumin-related compounds using various regression methods

    NASA Astrophysics Data System (ADS)

    Khazaei, Ardeshir; Sarmasti, Negin; Seyf, Jaber Yousefi

    2016-03-01

    Quantitative structure-activity relationship (QSAR) models were used to study a series of curcumin-related compounds with inhibitory effect on prostate cancer PC-3 cells, pancreas cancer Panc-1 cells, and colon cancer HT-29 cells. The sphere exclusion method was used to split the data set into training and test sets. Multiple linear regression, principal component regression and partial least squares were used as the regression methods. In addition, to investigate the effect of feature selection methods, stepwise selection, a genetic algorithm, and simulated annealing were used. In two cases (PC-3 cells and Panc-1 cells), the best models were generated by a combination of multiple linear regression and stepwise selection (PC-3 cells: r2 = 0.86, q2 = 0.82, pred_r2 = 0.93, and r2m (test) = 0.43; Panc-1 cells: r2 = 0.85, q2 = 0.80, pred_r2 = 0.71, and r2m (test) = 0.68). For the HT-29 cells, principal component regression with stepwise selection (r2 = 0.69, q2 = 0.62, pred_r2 = 0.54, and r2m (test) = 0.41) is the best method. The QSAR study reveals descriptors that play a crucial role in the inhibitory property of curcumin-like compounds. 6ChainCount, T_C_C_1, and T_O_O_7 are the most important descriptors and have the greatest effect. To design and optimize novel, efficient curcumin-related compounds, it is useful to introduce heteroatoms such as nitrogen, oxygen, and sulfur atoms in the chemical structure (reducing the contribution of the T_C_C_1 descriptor) and to increase the contribution of the 6ChainCount and T_O_O_7 descriptors. The models can be useful in the better design of novel curcumin-related compounds that can be used in the treatment of prostate, pancreas, and colon cancers.

  15. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.

    PubMed

    Chen, Yanguang

    2016-01-01

    In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson's statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 regions of China. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
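
    The order dependence that motivates the paper is easy to see from the definition of the classical Durbin-Watson statistic, d = Σ(e_t − e_{t−1})² / Σ e_t². The sketch below computes d for the same set of residuals in two different orders; the residual values are made up for the illustration, and the paper's new spatial statistics are not reproduced here.

```python
def durbin_watson(residuals):
    """Classical Durbin-Watson statistic for an ordered residual series."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# The same four residuals in two different orders (illustrative values).
alternating = [1, -1, 1, -1]
grouped = [1, 1, -1, -1]
print(durbin_watson(alternating))  # 3.0, suggesting negative serial correlation
print(durbin_watson(grouped))      # 1.0, suggesting positive serial correlation
```

    With randomly sampled spatial data there is no natural ordering, so the value of d is arbitrary, which is the deficiency the weight-matrix-based statistics are meant to address.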

  16. Mortality rates in OECD countries converged during the period 1990-2010.

    PubMed

    Bremberg, Sven G

    2017-06-01

    Since the scientific revolution of the 18th century, human health has gradually improved, but there is no unifying theory that explains this improvement in health. Studies of macrodeterminants have produced conflicting results. Most studies have analysed health at a given point in time as the outcome; however, the rate of improvement in health might be a more appropriate outcome. Twenty-eight OECD member countries were selected for analysis in the period 1990-2010. The main outcomes studied, in six age groups, were the national rates of decrease in mortality in the period 1990-2010. The effects of seven potential determinants on the rates of decrease in mortality were analysed in linear multiple regression models using least squares, controlling for country-specific history constants, which represent the mortality rate in 1990. The multiple regression analyses started with models that only included mortality rates in 1990 as determinants. These models explained 87% of the intercountry variation in the children aged 1-4 years and 51% in adults aged 55-74 years. When added to the regression equations, the seven determinants did not seem to significantly increase the explanatory power of the equations. The analyses indicated a decrease in mortality in all nations and in all age groups. The development of mortality rates in the different nations demonstrated significant catch-up effects. Therefore an important objective of the national public health sector seems to be to reduce the delay between international research findings and the universal implementation of relevant innovations.

  17. Determination of biodiesel content in biodiesel/diesel blends using NIR and visible spectroscopy with variable selection.

    PubMed

    Fernandes, David Douglas Sousa; Gomes, Adriano A; Costa, Gean Bezerra da; Silva, Gildo William B da; Véras, Germano

    2011-12-15

    This work evaluates the use of the visible and near-infrared (NIR) ranges, separately and combined, to determine the biodiesel content in biodiesel/diesel blends using Multiple Linear Regression (MLR) with variable selection by the Successive Projections Algorithm (SPA). Full-spectrum models employing Partial Least Squares (PLS), models with variable selection by Stepwise (SW) regression coupled with MLR, and PLS models with variable selection by Jack-Knife (Jk) were compared with the proposed methodology. Several preprocessing methods were evaluated; the one chosen was the Savitzky-Golay derivative with a second-order polynomial and a 17-point window for the NIR and visible-NIR ranges, with offset correction. A total of 100 blends with biodiesel content between 5 and 50% (v/v) were prepared from ten samples of biodiesel. In the NIR and visible regions the best model was SPA-MLR, using only two and eight wavelengths with RMSEP of 0.6439% (v/v) and 0.5741, respectively, while in the visible-NIR region the best model was SW-MLR, using five wavelengths with an RMSEP of 0.9533% (v/v). Results indicate that both spectral ranges evaluated showed potential for developing a rapid and nondestructive method to quantify biodiesel in blends with mineral diesel. Finally, the improvement in prediction error obtained with the variable selection procedure was significant. Copyright © 2011 Elsevier B.V. All rights reserved.
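
    The Savitzky-Golay preprocessing named in this abstract can be sketched in plain NumPy. The abstract gives the polynomial order (2) and the window (17 points) but not the derivative order, so the first derivative used here is an assumption, as are the function names and the unit sample spacing.

```python
import math
import numpy as np

def savgol_coefficients(window=17, polyorder=2, deriv=1):
    """Filter weights from a least-squares polynomial fit over a centered window."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Columns are 1, t, t^2, ... evaluated at the window offsets.
    A = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of pinv(A) yields the fitted coefficient of t**deriv;
    # multiplying by deriv! turns it into the derivative at the window center.
    return math.factorial(deriv) * np.linalg.pinv(A)[deriv]

def savgol_derivative(y, window=17, polyorder=2, deriv=1):
    """Apply the filter to interior points only (no edge handling in this sketch)."""
    c = savgol_coefficients(window, polyorder, deriv)
    # np.convolve flips its kernel, so reverse c to get a sliding dot product.
    return np.convolve(np.asarray(y, dtype=float), c[::-1], mode="valid")

# Toy "spectrum": a quadratic, whose derivative the order-2 filter reproduces.
x = np.arange(30.0)
dy = savgol_derivative(x ** 2)
print(dy[:3])   # derivative 2*x at the first valid centers x = 8, 9, 10
```

    With `mode="valid"` the output is shorter than the input by `window - 1` points; a production implementation would also define how the spectrum edges are treated.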

  18. Principal Covariates Clusterwise Regression (PCCR): Accounting for Multicollinearity and Population Heterogeneity in Hierarchically Organized Data.

    PubMed

    Wilderjans, Tom Frans; Vande Gaer, Eva; Kiers, Henk A L; Van Mechelen, Iven; Ceulemans, Eva

    2017-03-01

    In the behavioral sciences, many research questions pertain to a regression problem in that one wants to predict a criterion on the basis of a number of predictors. Although in many cases, ordinary least squares regression will suffice, sometimes the prediction problem is more challenging, for three reasons: first, multiple highly collinear predictors can be available, making it difficult to grasp their mutual relations as well as their relations to the criterion. In that case, it may be very useful to reduce the predictors to a few summary variables, on which one regresses the criterion and which at the same time yields insight into the predictor structure. Second, the population under study may consist of a few unknown subgroups that are characterized by different regression models. Third, the obtained data are often hierarchically structured, with for instance, observations being nested into persons or participants within groups or countries. Although some methods have been developed that partially meet these challenges (i.e., principal covariates regression (PCovR), clusterwise regression (CR), and structural equation models), none of these methods adequately deals with all of them simultaneously. To fill this gap, we propose the principal covariates clusterwise regression (PCCR) method, which combines the key ideas behind PCovR (de Jong & Kiers in Chemom Intell Lab Syst 14(1-3):155-164, 1992) and CR (Späth in Computing 22(4):367-373, 1979). The PCCR method is validated by means of a simulation study and by applying it to cross-cultural data regarding satisfaction with life.

  19. Estimating current and future streamflow characteristics at ungaged sites, central and eastern Montana, with application to evaluating effects of climate change on fish populations

    USGS Publications Warehouse

    Sando, Roy; Chase, Katherine J.

    2017-03-23

    A common statistical procedure for estimating streamflow statistics at ungaged locations is to develop a relational model between streamflow and drainage basin characteristics at gaged locations using least squares regression analysis; however, least squares regression methods are parametric and make constraining assumptions about the data distribution. The random forest regression method provides an alternative nonparametric method for estimating streamflow characteristics at ungaged sites and requires that the data meet fewer statistical conditions than least squares regression methods. Random forest regression analysis was used to develop predictive models for 89 streamflow characteristics using Precipitation-Runoff Modeling System simulated streamflow data and drainage basin characteristics at 179 sites in central and eastern Montana. The predictive models were developed from streamflow data simulated for current (baseline, water years 1982–99) conditions and three future periods (water years 2021–38, 2046–63, and 2071–88) under three different climate-change scenarios. These predictive models were then used to predict streamflow characteristics for baseline conditions and three future periods at 1,707 fish sampling sites in central and eastern Montana. The average root mean square error for all predictive models was about 50 percent. When streamflow predictions at 23 fish sampling sites were compared to nearby locations with simulated data, the mean relative percent difference was about 43 percent. When predictions were compared to streamflow data recorded at 21 U.S. Geological Survey streamflow-gaging stations outside of the calibration basins, the average mean absolute percent error was about 73 percent.

  20. Comparison of Logistic Regression and Artificial Neural Network in Low Back Pain Prediction: Second National Health Survey

    PubMed Central

    Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H

    2012-01-01

    Background: The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of a logistic regression in the prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, the artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198

  1. Comparison of logistic regression and artificial neural network in low back pain prediction: second national health survey.

    PubMed

    Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H

    2012-01-01

    The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of a logistic regression in the prediction of low back pain. Data from the second national health survey were considered in this investigation. This data includes the information of low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a set of 17294 data and they were validated in a test set of 17295 data. Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 inputs, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 Loglikelihood criteria. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2Loglikelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Based on these three criteria, the artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant.

  2. Comparison of Regression Methods to Compute Atmospheric Pressure and Earth Tidal Coefficients in Water Level Associated with Wenchuan Earthquake of 12 May 2008

    NASA Astrophysics Data System (ADS)

    He, Anhua; Singh, Ramesh P.; Sun, Zhaohua; Ye, Qing; Zhao, Gang

    2016-07-01

    Earth tides, atmospheric pressure, precipitation and, especially, earthquakes greatly affect water well levels, and anomalous co-seismic changes in groundwater levels have accordingly been observed. In this paper, we have used four different models, simple linear regression (SLR), multiple linear regression (MLR), principal component analysis (PCA) and partial least squares (PLS), to compute the atmospheric pressure and earth tidal effects on water level. Furthermore, we have used the Akaike information criterion (AIC) to study the performance of the various models. Based on the lowest AIC and sum of squares for error values, the best estimate of the effects of atmospheric pressure and earth tide on water level is found using the MLR model. However, the MLR model does not account for multicollinearity between inputs; as a result, the atmospheric pressure and earth tidal response coefficients fail to reflect the mechanisms associated with the groundwater level fluctuations. Once the serious multicollinearity of the inputs is addressed, the PLS model shows the minimum AIC value. The atmospheric pressure and earth tidal response coefficients obtained with the PLS model agree closely with the observations. The atmospheric pressure and earth tidal response coefficients are found to be sensitive to the stress-strain state using the observed data for the period 1 April-8 June 2008 of Chuan 03# well. The transient enhancement of porosity of the rock mass around Chuan 03# well associated with the Wenchuan earthquake (Mw = 7.9 of 12 May 2008), which returned to its original pre-seismic level after 13 days, indicates that the co-seismic sharp rise of the water level in the well could be induced by static stress change, rather than the development of new fractures.
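
    The AIC-based model comparison mentioned in this abstract can be sketched as follows. The abstract does not state which form of AIC was used; the sketch assumes the common Gaussian least-squares form n ln(SSE/n) + 2k (up to an additive constant). The synthetic "pressure" and "tide" predictors and all names are illustrative, and only the SLR and MLR cases are shown.

```python
import numpy as np

def aic_from_sse(sse, n, k):
    """Gaussian least-squares AIC up to a constant: n*ln(SSE/n) + 2k."""
    return n * np.log(sse / n) + 2 * k

def least_squares_aic(X, y):
    """Fit y on X (plus an intercept) by least squares; return (AIC, SSE)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    sse = float(np.sum((y - X1 @ beta) ** 2))
    n, k = X1.shape
    return aic_from_sse(sse, n, k), sse

# Synthetic water-level series driven by two made-up inputs.
rng = np.random.default_rng(0)
pressure = rng.normal(size=100)
tide = rng.normal(size=100)
level = 1.0 + 2.0 * pressure + 3.0 * tide + 0.1 * rng.normal(size=100)

aic_slr, sse_slr = least_squares_aic(pressure[:, None], level)
aic_mlr, sse_mlr = least_squares_aic(np.column_stack([pressure, tide]), level)
print(aic_slr, aic_mlr)   # the model with both inputs has the lower AIC here
```

    The 2k term penalizes extra coefficients, so a larger model is preferred only when it reduces the error sum of squares enough to offset the penalty.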

  3. Random sample consensus combined with partial least squares regression (RANSAC-PLS) for microbial metabolomics data mining and phenotype improvement.

    PubMed

    Teoh, Shao Thing; Kitamura, Miki; Nakayama, Yasumune; Putri, Sastia; Mukai, Yukio; Fukusaki, Eiichiro

    2016-08-01

    In recent years, the advent of high-throughput omics technology has made possible a new class of strain engineering approaches, based on identification of possible gene targets for phenotype improvement from omic-level comparison of different strains or growth conditions. Metabolomics, with its focus on the omic level closest to the phenotype, lends itself naturally to this semi-rational methodology. When a quantitative phenotype such as growth rate under stress is considered, regression modeling using multivariate techniques such as partial least squares (PLS) is often used to identify metabolites correlated with the target phenotype. However, linear modeling techniques such as PLS require a consistent metabolite-phenotype trend across the samples, which may not be the case when outliers or multiple conflicting trends are present in the data. To address this, we proposed a data-mining strategy that utilizes random sample consensus (RANSAC) to select subsets of samples with consistent trends for construction of better regression models. By applying a combination of RANSAC and PLS (RANSAC-PLS) to a dataset from a previous study (gas chromatography/mass spectrometry metabolomics data and 1-butanol tolerance of 19 yeast mutant strains), new metabolites were indicated to be correlated with tolerance within certain subsets of the samples. The relevance of these metabolites to 1-butanol tolerance were then validated from single-deletion strains of corresponding metabolic genes. The results showed that RANSAC-PLS is a promising strategy to identify unique metabolites that provide additional hints for phenotype improvement, which could not be detected by traditional PLS modeling using the entire dataset. Copyright © 2016 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
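
    The random sample consensus idea described in this abstract can be sketched on a toy problem. The paper pairs RANSAC with PLS; for brevity this sketch swaps in an ordinary least squares line fit, and the threshold, iteration count and synthetic data are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def ransac_line(x, y, n_iter=200, sample_size=2, threshold=0.5, seed=0):
    """Keep the largest set of samples consistent with a line fitted to a random subset."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(x), size=sample_size, replace=False)
        b0, b1 = fit_line(x[idx], y[idx])
        consensus = np.abs(y - (b0 + b1 * x)) < threshold
        if consensus.sum() > best.sum():
            best = consensus
    # Refit on the consensus subset only.
    return fit_line(x[best], y[best]), best

# Fifteen samples follow y = 1 + 2x; the last five follow a conflicting trend.
x = np.arange(20.0)
y = 1.0 + 2.0 * x
y[15:] += 10.0

coef, consensus = ransac_line(x, y)
print(coef, consensus.sum())
```

    A fit on all twenty samples would be pulled toward the five conflicting points, whereas the consensus subset recovers the trend shared by the majority.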

  4. Fast detection and visualization of minced lamb meat adulteration using NIR hyperspectral imaging and multivariate image analysis.

    PubMed

    Kamruzzaman, Mohammed; Sun, Da-Wen; ElMasry, Gamal; Allen, Paul

    2013-01-15

    Many studies have been carried out to develop non-destructive technologies for predicting meat adulteration, but there has so far been no attempt at non-destructive detection and quantification of adulteration in minced lamb meat. The main goal of this study was to develop and optimize a rapid analytical technique based on near-infrared (NIR) hyperspectral imaging to detect the level of adulteration in minced lamb. Initial investigation was carried out using principal component analysis (PCA) to identify the most likely adulterant in minced lamb. Minced lamb meat samples were then adulterated with minced pork in the range 2-40% (w/w) at approximately 2% increments. Spectral data were used to develop a partial least squares regression (PLSR) model to predict the level of adulteration in minced lamb. A good prediction model was obtained using the whole spectral range (910-1700 nm), with a cross-validated coefficient of determination (R2cv) of 0.99 and a root-mean-square error estimated by cross validation (RMSECV) of 1.37%. Four important wavelengths (940, 1067, 1144 and 1217 nm) were selected using weighted regression coefficients (Bw), and a multiple linear regression (MLR) model was then established using these wavelengths to predict adulteration. The MLR model resulted in an R2cv of 0.98 and an RMSECV of 1.45%. The developed MLR model was then applied to each pixel in the image to obtain prediction maps that visualize the distribution of adulteration in the tested samples. The results demonstrated that laborious and time-consuming traditional analytical techniques could be replaced by spectral data in order to provide a rapid, low-cost and non-destructive testing technique for adulterant detection in minced lamb meat. Copyright © 2012 Elsevier B.V. All rights reserved.

  5. Estimates of Flow Duration, Mean Flow, and Peak-Discharge Frequency Values for Kansas Stream Locations

    USGS Publications Warehouse

    Perry, Charles A.; Wolock, David M.; Artman, Joshua C.

    2004-01-01

    Streamflow statistics of flow duration and peak-discharge frequency were estimated for 4,771 individual locations on streams listed on the 1999 Kansas Surface Water Register. These statistics included the flow-duration values of 90, 75, 50, 25, and 10 percent, as well as the mean flow value. Peak-discharge frequency values were estimated for the 2-, 5-, 10-, 25-, 50-, and 100-year floods. Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating flow-duration values of 90, 75, 50, 25, and 10 percent and the mean flow for uncontrolled flow stream locations. The contributing-drainage areas of 149 U.S. Geological Survey streamflow-gaging stations in Kansas and parts of surrounding States that had flow uncontrolled by Federal reservoirs and used in the regression analyses ranged from 2.06 to 12,004 square miles. Logarithmic transformations of climatic and basin data were performed to yield the best linear relation for developing equations to compute flow durations and mean flow. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were contributing-drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. The analyses yielded a model standard error of prediction range of 0.43 logarithmic units for the 90-percent duration analysis to 0.15 logarithmic units for the 10-percent duration analysis. The model standard error of prediction was 0.14 logarithmic units for the mean flow. Regression equations used to estimate peak-discharge frequency values were obtained from a previous report, and estimates for the 2-, 5-, 10-, 25-, 50-, and 100-year floods were determined for this report. The regression equations and an interpolation procedure were used to compute flow durations, mean flow, and estimates of peak-discharge frequency for locations along uncontrolled flow streams on the 1999 Kansas Surface Water Register. 
    Flow durations, mean flow, and peak-discharge frequency values determined at available gaging stations were used to interpolate the regression-estimated flows for the stream locations where available. Streamflow statistics for locations that had uncontrolled flow were interpolated using data from gaging stations weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled reaches of Kansas streams, the streamflow statistics were interpolated between gaging stations using only gaged data weighted by drainage area.

  6. A Model Comparison for Count Data with a Positively Skewed Distribution with an Application to the Number of University Mathematics Courses Completed

    ERIC Educational Resources Information Center

    Liou, Pey-Yan

    2009-01-01

    The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…

  7. Multiple tobacco product use among adults in the United States: cigarettes, cigars, electronic cigarettes, hookah, smokeless tobacco, and snus.

    PubMed

    Lee, Youn O; Hebert, Christine J; Nonnemaker, James M; Kim, Annice E

    2014-05-01

    Noncigarette tobacco products are increasingly popular. Researchers need to understand multiple tobacco product use to assess the effects of these products on population health. We estimate national prevalence and examine risk factors for multiple product use. We calculated prevalence estimates of current use patterns involving cigarettes, cigars, electronic cigarettes, hookah, smokeless tobacco, and snus using data from the 2012 RTI National Adult Tobacco Survey (N=3627), a random-digit-dial telephone survey of adults aged 18 and over. Associations between use patterns (exclusive single product and multiple products) and demographic characteristics were examined using Pearson chi-square tests and logistic regression. 32.1% of adults currently use 1 or more tobacco products; 14.9% use cigarettes exclusively, and 6.6% use one noncigarette product exclusively, 6.9% use cigarettes with another product (dual use), 1.3% use two noncigarette products, and 2.4% use three or more products (polytobacco use). Smokers who are young adult, male, never married, reside in the West, and made prior quit attempts were at risk for multiple product use. Over 10% of U.S. adults use multiple tobacco products. A better understanding of multiple product use involving combustible products, like cigars and hookah, is needed. Multiple product use may be associated with past quit attempts. Copyright © 2014 Elsevier Inc. All rights reserved.

  8. Using Quantile and Asymmetric Least Squares Regression for Optimal Risk Adjustment.

    PubMed

    Lorenz, Normann

    2017-06-01

    In this paper, we analyze optimal risk adjustment for direct risk selection (DRS). Integrating insurers' activities for risk selection into a discrete choice model of individuals' health insurance choice shows that DRS has the structure of a contest. For the contest success function (csf) used in most of the contest literature (the Tullock-csf), optimal transfers for a risk adjustment scheme have to be determined by means of a restricted quantile regression, irrespective of whether insurers are primarily engaged in positive DRS (attracting low risks) or negative DRS (repelling high risks). This is at odds with the common practice of determining transfers by means of a least squares regression. However, this common practice can be rationalized for a new csf, but only if positive and negative DRSs are equally important; if they are not, optimal transfers have to be calculated by means of a restricted asymmetric least squares regression. Using data from German and Swiss health insurers, we find considerable differences between the three types of regressions. Optimal transfers therefore critically depend on which csf represents insurers' incentives for DRS and, if it is not the Tullock-csf, whether insurers are primarily engaged in positive or negative DRS. Copyright © 2016 John Wiley & Sons, Ltd.
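
    The three regression types contrasted in this abstract differ in the loss they minimize. The sketch below shows the standard quantile ("pinball") loss and the asymmetric squared loss for the simplest case of fitting a single constant. It leaves out the "restricted" aspect and any covariates, and the cost figures and function names are made up for illustration.

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Quantile-regression loss: weight tau on positive residuals, 1 - tau on negative."""
    r = np.asarray(residuals, dtype=float)
    return float(np.mean(np.where(r >= 0, tau * r, (tau - 1.0) * r)))

def asymmetric_squared_loss(residuals, tau):
    """Asymmetric least squares loss: squared residuals weighted by tau or 1 - tau."""
    r = np.asarray(residuals, dtype=float)
    return float(np.mean(np.where(r >= 0, tau, 1.0 - tau) * r ** 2))

def expectile(y, tau, n_iter=100):
    """Constant minimizing the asymmetric squared loss, by iterated weighted means."""
    y = np.asarray(y, dtype=float)
    m = y.mean()
    for _ in range(n_iter):
        w = np.where(y > m, tau, 1.0 - tau)
        m = np.sum(w * y) / np.sum(w)
    return float(m)

# Hypothetical skewed cost data: one expensive individual.
costs = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
grid = np.linspace(0.0, 10.0, 101)
best_quantile = grid[np.argmin([pinball_loss(costs - c, 0.5) for c in grid])]

print(best_quantile)            # close to the median of the data
print(expectile(costs, 0.5))    # equals the mean, i.e. ordinary least squares
print(expectile(costs, 0.9))    # weights high costs more, so it lies above the mean
```

    With tau = 0.5 the asymmetric squared loss reduces to ordinary least squares, which matches the abstract's point that least squares is the special case in which both directions of error are weighted equally.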

  9. Binding affinity toward human prion protein of some anti-prion compounds - Assessment based on QSAR modeling, molecular docking and non-parametric ranking.

    PubMed

    Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija

    2018-01-01

    The present study is based on the quantitative structure-activity relationship (QSAR) analysis of binding affinity toward human prion protein (huPrPC) of quinacrine, pyridine dicarbonitrile, diphenylthiazole and diphenyloxazole analogs applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor that contributes to the binding affinity. Principal component analysis was used in order to reveal similarities or dissimilarities among the studied compounds. The analysis of in silico absorption, distribution, metabolism, excretion and toxicity (ADMET) parameters was conducted. The ranking of the studied analogs on the basis of their ADMET parameters was done applying the sum of ranking differences, as a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to the changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking were proven to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.

  10. Seasonal forecasting of high wind speeds over Western Europe

    NASA Astrophysics Data System (ADS)

    Palutikof, J. P.; Holt, T.

    2003-04-01

    As financial losses associated with extreme weather events escalate, there is interest from end users, for example in the forestry and insurance industries, in the development of seasonal forecasting models with a long lead time. This study uses exceedances of the 90th, 95th, and 99th percentiles of daily maximum wind speed over the period 1958 to present to derive predictands of winter wind extremes. The source data is the 6-hourly NCEP Reanalysis gridded surface wind field. Predictor variables include principal components of Atlantic sea surface temperature and several indices of climate variability, including the NAO and SOI. Lead times of up to a year are considered, in monthly increments. Three regression techniques are evaluated: multiple linear regression (MLR), principal component regression (PCR), and partial least squares regression (PLS). PCR and PLS proved considerably superior to MLR, with much lower standard errors. PLS was chosen to formulate the predictive model since it offers more flexibility in experimental design and gave slightly better results than PCR. The results indicate that winter windiness can be predicted with considerable skill one year ahead for much of coastal Europe, but that this deteriorates rapidly in the hinterland. The experiment succeeded in highlighting PLS as a very useful method for developing more precise forecasting models, and in identifying areas of high predictability.

  11. Ranking contributing areas of salt and selenium in the Lower Gunnison River Basin, Colorado, using multiple linear regression models

    USGS Publications Warehouse

    Linard, Joshua I.

    2013-01-01

    Mitigating the effects of salt and selenium on water quality in the Grand Valley and lower Gunnison River Basin in western Colorado is a major concern for land managers. Previous modeling indicated means to improve the models by including more detailed geospatial data and a more rigorous method for developing the models. After evaluating all possible combinations of geospatial variables, four multiple linear regression models resulted that could estimate irrigation-season salt yield, nonirrigation-season salt yield, irrigation-season selenium yield, and nonirrigation-season selenium yield. The adjusted r-squared and the residual standard error (in units of log-transformed yield) of the models were, respectively, 0.87 and 2.03 for the irrigation-season salt model, 0.90 and 1.25 for the nonirrigation-season salt model, 0.85 and 2.94 for the irrigation-season selenium model, and 0.93 and 1.75 for the nonirrigation-season selenium model. The four models were used to estimate yields and loads from contributing areas corresponding to 12-digit hydrologic unit codes in the lower Gunnison River Basin study area. Each of the 175 contributing areas was ranked according to its estimated mean seasonal yield of salt and selenium.

  12. Ventilator-associated pneumonia: the influence of bacterial resistance, prescription errors, and de-escalation of antimicrobial therapy on mortality rates.

    PubMed

    Souza-Oliveira, Ana Carolina; Cunha, Thúlio Marquez; Passos, Liliane Barbosa da Silva; Lopes, Gustavo Camargo; Gomes, Fabiola Alves; Röder, Denise Von Dolinger de Brito

    2016-01-01

    Ventilator-associated pneumonia (VAP) is the most prevalent nosocomial infection in intensive care units and is associated with high mortality rates (14-70%). This study evaluated factors influencing mortality of patients with VAP, including bacterial resistance, prescription errors, and de-escalation of antibiotic therapy. This retrospective study included 120 cases of VAP admitted to the adult intensive care unit of the Federal University of Uberlândia. The chi-square test was used to compare qualitative variables. Student's t-test was used for quantitative variables and multiple logistic regression analysis to identify independent predictors of mortality. De-escalation of antibiotic therapy and resistant bacteria did not influence mortality. Mortality was 4 times and 3 times higher, respectively, in patients who received an inappropriate antibiotic loading dose and in patients whose antibiotic dose was not adjusted for renal function. Multiple logistic regression analysis revealed that incorrect adjustment for renal function was the only independent factor associated with increased mortality. Prescription errors influenced mortality of patients with VAP, underscoring the challenge of proper VAP treatment, which requires continuous reevaluation to ensure that clinical response to therapy meets expectations. Copyright © 2016. Published by Elsevier Editora Ltda.

  13. Downscaling Land Surface Temperature in Complex Regions by Using Multiple Scale Factors with Adaptive Thresholds

    PubMed Central

    Yang, Yingbao; Li, Xiaolong; Pan, Xin; Zhang, Yong; Cao, Chen

    2017-01-01

    Many downscaling algorithms have been proposed to address the issue of coarse-resolution land surface temperature (LST) derived from available satellite-borne sensors. However, few studies have focused on improving LST downscaling in urban areas with several mixed surface types. In this study, LST was downscaled by a multiple linear regression model between LST and multiple scale factors in mixed areas with three or four surface types. The correlation coefficients (CCs) between LST and the scale factors were used to assess the importance of the scale factors within a moving window. CC thresholds determined which factors participated in the fitting of the regression equation. The proposed downscaling approach, which involves an adaptive selection of the scale factors, was evaluated using the LST derived from four Landsat 8 thermal images of Nanjing City in different seasons. Results of the visual and quantitative analyses show that the proposed approach achieves relatively satisfactory downscaling results on 11 August, with a coefficient of determination and root-mean-square error of 0.87 and 1.13 °C, respectively. Relative to other approaches, our approach shows similar accuracy and availability in all seasons. The best (worst) availability occurred in the region of vegetation (water). Thus, the approach is an efficient and reliable LST downscaling method. Future tasks include reliable LST downscaling in challenging regions and the application of our model in middle and low spatial resolutions. PMID:28368301

  14. A weighted least squares estimation of the polynomial regression model on paddy production in the area of Kedah and Perlis

    NASA Astrophysics Data System (ADS)

    Musa, Rosliza; Ali, Zalila; Baharum, Adam; Nor, Norlida Mohd

    2017-08-01

    The linear regression model assumes that all random error components are identically and independently distributed with constant variance. Hence, each data point provides equally precise information about the deterministic part of the total variation. In other words, the standard deviations of the error terms are constant over all values of the predictor variables. When the assumption of constant variance is violated, the ordinary least squares estimator of the regression coefficients loses its property of minimum variance in the class of linear and unbiased estimators. Weighted least squares estimation is often used to maximize the efficiency of parameter estimation. A procedure that treats all of the data equally would give less precisely measured points more influence than they should have and would give highly precise points too little influence. Optimizing the weighted fitting criterion to find the parameter estimates allows the weights to determine the contribution of each observation to the final parameter estimates. This study used a polynomial model with weighted least squares estimation to investigate paddy production of different paddy lots based on paddy cultivation characteristics and environmental characteristics in the area of Kedah and Perlis. The results indicated that factors affecting paddy production are mixture fertilizer application cycle, average temperature, the squared effect of average rainfall, the squared effect of pest and disease, the interaction between acreage with amount of mixture fertilizer, the interaction between paddy variety and NPK fertilizer application cycle and the interaction between pest and disease and NPK fertilizer application cycle.
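
    The contrast between equal and unequal weighting can be illustrated with a minimal sketch for a straight-line fit. The data and the error standard deviations below are invented, and the weights are assumed to be the reciprocal variances; the study itself fits a polynomial model with its own weights.

```python
def weighted_line_fit(x, y, w):
    """Return (b0, b1) minimizing sum(w_i * (y_i - b0 - b1 * x_i) ** 2)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 13.0]   # the last observation is imprecise
sd = [0.1, 0.1, 0.1, 0.1, 2.0]   # assumed error standard deviations

ols = weighted_line_fit(x, y, [1.0] * len(x))               # equal weights: OLS
wls = weighted_line_fit(x, y, [1.0 / s ** 2 for s in sd])   # weights = 1 / variance
print("OLS intercept, slope:", ols)
print("WLS intercept, slope:", wls)
```

    With equal weights the imprecise last point pulls the slope upward; with reciprocal-variance weights its influence shrinks and the slope stays close to the trend of the four precise points.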

  15. New equations improve NIR prediction of body fat among high school wrestlers.

    PubMed

    Oppliger, R A; Clark, R R; Nielsen, D H

    2000-09-01

    Methodologic study to derive prediction equations for percent body fat (%BF). To develop valid regression equations using NIR to assess body composition among high school wrestlers. Clinicians need a portable, fast, and simple field method for assessing body composition among wrestlers. Near-infrared photospectrometry (NIR) meets these criteria, but its efficacy has been challenged. Subjects were 150 high school wrestlers from 2 Midwestern states with mean +/- SD age of 16.3 +/- 1.1 yrs, weight of 69.5 +/- 11.7 kg, and height of 174.4 +/- 7.0 cm. Relative body fatness (%BF) determined from hydrostatic weighing was the criterion measure, and NIR optical density (OD) measurements at multiple sites, plus height, weight, and body mass index (BMI) were the predictor variables. Four equations were developed with multiple R² values that varied from .530 to .693, root mean squared errors varied from 2.8% BF to 3.4% BF, and prediction errors varied from 2.9% BF to 3.1% BF. The best equation used OD measurements at the biceps, triceps, and thigh sites, BMI, and age. The root mean squared error and prediction error for all 4 equations were equal to or smaller than for a skinfold equation commonly used with wrestlers. The results substantiate the validity of NIR for predicting % BF among high school wrestlers. Cross-validation of these equations is warranted.
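
    The two fit statistics quoted above, R² and root mean squared error, can be computed as in the following sketch. The observed and predicted %BF values are invented for illustration and are not taken from the study.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical criterion (hydrostatic) and predicted percent body fat.
observed = [10.0, 12.0, 15.0, 18.0, 20.0]
predicted = [11.0, 12.0, 14.0, 17.0, 21.0]

print("RMSE (%BF):", rmse(observed, predicted))
print("R squared:", r_squared(observed, predicted))
```

    Computing the same statistics on a separate sample that was not used to fit the equation is what cross-validation of such equations would involve.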

  16. A Pilot Study of Reasons and Risk Factors for "No-Shows" in a Pediatric Neurology Clinic.

    PubMed

    Guzek, Lindsay M; Fadel, William F; Golomb, Meredith R

    2015-09-01

    Missed clinic appointments lead to decreased patient access, worse patient outcomes, and increased healthcare costs. The goal of this pilot study was to identify reasons for and risk factors associated with missed pediatric neurology outpatient appointments ("no-shows"). This was a prospective cohort study of patients scheduled for 1 week of clinic. Data on patient clinical and demographic information were collected by record review; data on reasons for missed appointments were collected by phone interviews. Univariate and multivariate analyses were conducted using chi-square tests and multiple logistic regression to assess risk factors for missed appointments. Fifty-nine (25%) of 236 scheduled patients were no-shows. Scheduling conflicts (25.9%) and forgetting (20.4%) were the most common reasons for missed appointments. When controlling for confounding factors in the logistic regression, Medicaid (odds ratio 2.36), distance from clinic, and time since appointment was scheduled were associated with missed appointments. Further work in this area is needed. © The Author(s) 2014.

  17. Evaluation of aroma enhancement for "Ecolly" dry white wines by mixed inoculation of selected Rhodotorula mucilaginosa and Saccharomyces cerevisiae.

    PubMed

    Wang, Xing-Chen; Li, Ai-Hua; Dizy, Marta; Ullah, Niamat; Sun, Wei-Xuan; Tao, Yong-Sheng

    2017-08-01

    To improve the aroma profile of Ecolly dry white wine, the simultaneous and sequential inoculations of selected Rhodotorula mucilaginosa and Saccharomyces cerevisiae were performed in wine making of this work. The two yeasts were mixed in various ratios for making the mixed inoculum. The amount of volatiles and aroma characteristics were determined the following year. Mixed fermentation improved both the varietal and fermentative aroma compound composition, especially that of (Z)-3-hexene-1-ol, nerol oxide, certain acetates and ethyls group compounds. Citrus, sweet fruit, acid fruit, berry, and floral aroma traits were enhanced by mixed fermentation; however, an animal note was introduced upon using higher amounts of R. mucilaginosa. Aroma traits were regressed with volatiles as observed by the partial least-square regression method. Analysis of correlation coefficients revealed that the aroma traits were the multiple interactions of volatile compounds, with the fermentative volatiles having more impact on aroma than varietal compounds. Copyright © 2017 Elsevier Ltd. All rights reserved.

  18. An analysis of the magnitude and frequency of floods on Oahu, Hawaii

    USGS Publications Warehouse

    Nakahara, R.H.

    1980-01-01

    An analysis of available peak-flow data for the island of Oahu, Hawaii, was made by using multiple regression techniques which related flood-frequency data to basin and climatic characteristics for 74 gaging stations on Oahu. In the analysis, several different groupings of stations were investigated, including divisions by geographic location and size of drainage area. The grouping consisting of two leeward divisions and one windward division produced the best results. Drainage basins ranged in area from 0.03 to 45.7 square miles. Equations relating flood magnitudes of selected frequencies to basin characteristics were developed for the three divisions of Oahu. These equations can be used to estimate the magnitude and frequency of floods for any site, gaged or ungaged, for any desired recurrence interval from 2 to 100 years. Data on basin characteristics, flood magnitudes for various recurrence intervals from individual station-frequency curves, and computed flood magnitudes by use of the regression equation are tabulated to provide the needed data. (USGS)

  19. Model Estimation Using Ridge Regression with the Variance Normalization Criterion. Interim Report No. 2. The Education and Inequality in Canada Project.

    ERIC Educational Resources Information Center

    Lee, Wan-Fung; Bulcock, Jeffrey Wilson

    The purposes of this study are: (1) to demonstrate the superiority of simple ridge regression over ordinary least squares regression through theoretical argument and empirical example; (2) to modify ridge regression through use of the variance normalization criterion; and (3) to demonstrate the superiority of simple ridge regression based on the…

  20. Functional Relationships and Regression Analysis.

    ERIC Educational Resources Information Center

    Preece, Peter F. W.

    1978-01-01

    Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…

  1. Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression

    PubMed Central

    Chen, Yanguang

    2016-01-01

    In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson’s statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 regions of China. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271
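
    The order dependence mentioned above is easy to see in a small sketch. It computes the standard Durbin-Watson statistic for the same residuals in two different orders, then a classical Moran's I that uses an explicit spatial weight matrix instead of the ordering. The residuals and the chain-shaped neighbor structure are invented, and the classical Moran's I is used here as a stand-in; it is not one of the two new statistics proposed in the paper.

```python
def durbin_watson(e):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2), for residuals in a given order."""
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def moran_index(e, w):
    """Classical Moran's I = (n / S0) * sum_ij(w_ij z_i z_j) / sum_i(z_i^2)."""
    n = len(e)
    mean = sum(e) / n
    z = [v - mean for v in e]                       # centered residuals
    s0 = sum(sum(row) for row in w)                 # sum of all weights
    cross = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(v * v for v in z)

residuals = [1.0, 0.8, 0.5, -0.4, -0.9, -1.0]
shuffled = [1.0, -0.9, 0.5, -1.0, 0.8, -0.4]       # same values, new order

# Hypothetical contiguity: site i neighbors sites i - 1 and i + 1.
n = len(residuals)
w = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]

print("DW, original order:", durbin_watson(residuals))
print("DW, shuffled order:", durbin_watson(shuffled))
print("Moran's I:", moran_index(residuals, w))
```

    Reordering the observations changes the Durbin-Watson value even though the data are identical, whereas an index built on a weight matrix depends only on which sites are defined as neighbors.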

  2. Membrane Introduction Mass Spectrometry Combined with an Orthogonal Partial-Least Squares Calibration Model for Mixture Analysis.

    PubMed

    Li, Min; Zhang, Lu; Yao, Xiaolong; Jiang, Xingyu

    2017-01-01

    The emerging membrane introduction mass spectrometry technique has been successfully used to detect benzene, toluene, ethyl benzene and xylene (BTEX), while overlapped spectra have unfortunately hindered its further application to the analysis of mixtures. Multivariate calibration, an efficient method to analyze mixtures, has been widely applied. In this paper, we compared univariate and multivariate analyses for quantification of the individual components of mixture samples. The results showed that the univariate analysis creates poor models with regression coefficients of 0.912, 0.867, 0.440 and 0.351 for BTEX, respectively. For multivariate analysis, a comparison to the partial-least squares (PLS) model shows that the orthogonal partial-least squares (OPLS) regression exhibits an optimal performance with regression coefficients of 0.995, 0.999, 0.980 and 0.976, favorable calibration parameters (RMSEC and RMSECV) and a favorable validation parameter (RMSEP). Furthermore, the OPLS exhibits a good recovery of 73.86 - 122.20% and relative standard deviation (RSD) of the repeatability of 1.14 - 4.87%. Thus, MIMS coupled with the OPLS regression provides an optimal approach for a quantitative BTEX mixture analysis in monitoring and predicting water pollution.

  3. Incorporation of prior information on parameters into nonlinear regression groundwater flow models: 2. Applications

    USGS Publications Warehouse

    Cooley, Richard L.

    1983-01-01

    This paper investigates factors influencing the degree of improvement in estimates of parameters of a nonlinear regression groundwater flow model by incorporating prior information of unknown reliability. Consideration of expected behavior of the regression solutions and results of a hypothetical modeling problem lead to several general conclusions. First, if the parameters are properly scaled, linearized expressions for the mean square error (MSE) in parameter estimates of a nonlinear model will often behave very nearly as if the model were linear. Second, by using prior information, the MSE in properly scaled parameters can be reduced greatly over the MSE of ordinary least squares estimates of parameters. Third, plots of estimated MSE and the estimated standard deviation of MSE versus an auxiliary parameter (the ridge parameter) specifying the degree of influence of the prior information on regression results can help determine the potential for improvement of parameter estimates. Fourth, proposed criteria can be used to make appropriate choices for the ridge parameter and another parameter expressing degree of overall bias in the prior information. Results of a case study of Truckee Meadows, Reno-Sparks area, Washoe County, Nevada, conform closely to the results of the hypothetical problem. In the Truckee Meadows case, incorporation of prior information did not greatly change the parameter estimates from those obtained by ordinary least squares. However, the analysis showed that both sets of estimates are more reliable than suggested by the standard errors from ordinary least squares.

  4. Modeling thermal sensation in a Mediterranean climate—a comparison of linear and ordinal models

    NASA Astrophysics Data System (ADS)

    Pantavou, Katerina; Lykoudis, Spyridon

    2014-08-01

    A simple thermo-physiological model of outdoor thermal sensation adjusted with psychological factors is developed aiming to predict thermal sensation in Mediterranean climates. Microclimatic measurements simultaneously with interviews on personal and psychological conditions were carried out in a square, a street canyon and a coastal location of the greater urban area of Athens, Greece. Multiple linear and ordinal regression were applied in order to estimate thermal sensation making allowance for all the recorded parameters or specific, empirically selected, subsets producing so-called extensive and empirical models, respectively. Meteorological, thermo-physiological and overall models - considering psychological factors as well - were developed. Predictions were improved when personal and psychological factors were taken into account as compared to meteorological models. The model based on ordinal regression reproduced extreme values of thermal sensation vote more adequately than the linear regression one, while the empirical model produced satisfactory results in relation to the extensive model. The effects of adaptation and expectation on thermal sensation vote were introduced in the models by means of the exposure time, season and preference related to air temperature and irradiation. The assessment of thermal sensation could be a useful criterion in decision making regarding public health, outdoor spaces planning and tourism.

  5. Speckle evolution with multiple steps of least-squares phase removal

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen Mingzhou; Dainty, Chris; Roux, Filippus S.

    2011-08-15

    We study numerically the evolution of speckle fields due to the annihilation of optical vortices after the least-squares phase has been removed. A process with multiple steps of least-squares phase removal is carried out to minimize both vortex density and scintillation index. Statistical results show that almost all the optical vortices can be removed from a speckle field, which finally decays into a quasiplane wave after such an iterative process.

  6. Applying Regression Analysis to Problems in Institutional Research.

    ERIC Educational Resources Information Center

    Bohannon, Tom R.

    1988-01-01

    Regression analysis is one of the most frequently used statistical techniques in institutional research. Principles of least squares, model building, residual analysis, influence statistics, and multi-collinearity are described and illustrated. (Author/MSE)

  7. On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.

    PubMed

    Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C

    2008-07-21

    The purpose of this study was both putting forward a statistically correct model for film calibration and the optimization of this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes, an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve with the dose as a function of OD (inverse regression) or sometimes OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to a lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method to create calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry.

  8. Who Will Win?: Predicting the Presidential Election Using Linear Regression

    ERIC Educational Resources Information Center

    Lamb, John H.

    2007-01-01

    This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus graphing calculators, Microsoft Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…

  9. The Variance Normalization Method of Ridge Regression Analysis.

    ERIC Educational Resources Information Center

    Bulcock, J. W.; And Others

    The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate…

  10. A graphical method to evaluate spectral preprocessing in multivariate regression calibrations: example with Savitzky-Golay filters and partial least squares regression

    USDA-ARS's Scientific Manuscript database

    In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly ...

  11. Combining the genetic algorithm and successive projection algorithm for the selection of feature wavelengths to evaluate exudative characteristics in frozen-thawed fish muscle.

    PubMed

    Cheng, Jun-Hu; Sun, Da-Wen; Pu, Hongbin

    2016-04-15

    The potential use of feature wavelengths for predicting drip loss in grass carp fish, as affected by being frozen at -20°C for 24 h and thawed at 4°C for 1, 2, 4, and 6 days, was investigated. Hyperspectral images of frozen-thawed fish were obtained and their corresponding spectra were extracted. Least-squares support vector machine and multiple linear regression (MLR) models were established using five key wavelengths, selected by combining a genetic algorithm and successive projections algorithm, and this showed satisfactory performance in drip loss prediction. The MLR model with a determination coefficient of prediction (R²P) of 0.9258, and lower root mean square error estimated by a prediction (RMSEP) of 1.12%, was applied to transfer each pixel of the image and generate the distribution maps of exudation changes. The results confirmed that it is feasible to identify the feature wavelengths using variable selection methods and chemometric analysis for developing on-line multispectral imaging. Copyright © 2015 Elsevier Ltd. All rights reserved.

  12. Neither fixed nor random: weighted least squares meta-regression.

    PubMed

    Stanley, T D; Doucouliagos, Hristos

    2017-03-01

    Our study revisits and challenges two core conventional meta-regression estimators: the prevalent use of 'mixed-effects' or random-effects meta-regression analysis and the correction of standard errors that defines fixed-effects meta-regression analysis (FE-MRA). We show how and explain why an unrestricted weighted least squares MRA (WLS-MRA) estimator is superior to conventional random-effects (or mixed-effects) meta-regression when there is publication (or small-sample) bias that is as good as FE-MRA in all cases and better than fixed effects in most practical applications. Simulations and statistical theory show that WLS-MRA provides satisfactory estimates of meta-regression coefficients that are practically equivalent to mixed effects or random effects when there is no publication bias. When there is publication selection bias, WLS-MRA always has smaller bias than mixed effects or random effects. In practical applications, an unrestricted WLS meta-regression is likely to give practically equivalent or superior estimates to fixed-effects, random-effects, and mixed-effects meta-regression approaches. However, random-effects meta-regression remains viable and perhaps somewhat preferable if selection for statistical significance (publication bias) can be ruled out and when random, additive normal heterogeneity is known to directly affect the 'true' regression coefficient. Copyright © 2016 John Wiley & Sons, Ltd.

  13. Peak flow regression equations for small, ungaged streams in Maine: Comparing map-based to field-based variables

    USGS Publications Warehouse

    Lombard, Pamela J.; Hodgkins, Glenn A.

    2015-01-01

    Regression equations to estimate peak streamflows with 1- to 500-year recurrence intervals (annual exceedance probabilities from 99 to 0.2 percent, respectively) were developed for small, ungaged streams in Maine. Equations presented here are the best available equations for estimating peak flows at ungaged basins in Maine with drainage areas from 0.3 to 12 square miles (mi²). Previously developed equations continue to be the best available equations for estimating peak flows for basin areas greater than 12 mi². New equations presented here are based on streamflow records at 40 U.S. Geological Survey streamgages with a minimum of 10 years of recorded peak flows between 1963 and 2012. Ordinary least-squares regression techniques were used to determine the best explanatory variables for the regression equations. Traditional map-based explanatory variables were compared to variables requiring field measurements. Two field-based variables, culvert rust lines and bankfull channel widths, either were not commonly found or did not explain enough of the variability in the peak flows to warrant inclusion in the equations. The best explanatory variables were drainage area and percent basin wetlands; values for these variables were determined with a geographic information system. Generalized least-squares regression was used with these two variables to determine the equation coefficients and estimates of accuracy for the final equations.

  14. Online measurement of urea concentration in spent dialysate during hemodialysis.

    PubMed

    Olesberg, Jonathon T; Arnold, Mark A; Flanigan, Michael J

    2004-01-01

    We describe online optical measurements of urea in the effluent dialysate line during regular hemodialysis treatment of several patients. Monitoring urea removal can provide valuable information about dialysis efficiency. Spectral measurements were performed with a Fourier-transform infrared spectrometer equipped with a flow-through cell. Spectra were recorded across the 5000-4000 cm⁻¹ (2.0-2.5 µm) wavelength range at 1-min intervals. Savitzky-Golay filtering was used to remove baseline variations attributable to the temperature dependence of the water absorption spectrum. Urea concentrations were extracted from the filtered spectra by use of partial least-squares regression and the net analyte signal of urea. Urea concentrations predicted by partial least-squares regression matched concentrations obtained from standard chemical assays with a root mean square error of 0.30 mmol/L (0.84 mg/dL urea nitrogen) over an observed concentration range of 0-11 mmol/L. The root mean square error obtained with the net analyte signal of urea was 0.43 mmol/L with a calibration based only on a set of pure-component spectra. The error decreased to 0.23 mmol/L when a slope and offset correction were used. Urea concentrations can be continuously monitored during hemodialysis by near-infrared spectroscopy. Calibrations based on the net analyte signal of urea are particularly appealing because they do not require a training step, as do statistical multivariate calibration procedures such as partial least-squares regression.

  15. Estimation of genetic effects in the presence of multicollinearity in multibreed beef cattle evaluation.

    PubMed

    Roso, V M; Schenkel, F S; Miller, S P; Schaeffer, L R

    2005-08-01

    Breed additive, dominance, and epistatic loss effects are of concern in the genetic evaluation of a multibreed population. Multiple regression equations used for fitting these effects may show a high degree of multicollinearity among predictor variables. Typically, when strong linear relationships exist, the regression coefficients have large SE and are sensitive to changes in the data file and to the addition or deletion of variables in the model. Generalized ridge regression methods were applied to obtain stable estimates of direct and maternal breed additive, dominance, and epistatic loss effects in the presence of multicollinearity among predictor variables. Preweaning weight gains of beef calves in Ontario, Canada, from 1986 to 1999 were analyzed. The genetic model included fixed direct and maternal breed additive, dominance, and epistatic loss effects, fixed environmental effects of age of the calf, contemporary group, and age of the dam x sex of the calf, random additive direct and maternal genetic effects, and random maternal permanent environment effect. The degree and the nature of the multicollinearity were identified and ridge regression methods were used as an alternative to ordinary least squares (LS). Ridge parameters were obtained using two different objective methods: 1) generalized ridge estimator of Hoerl and Kennard (R1); and 2) bootstrap in combination with cross-validation (R2). Both ridge regression methods outperformed the LS estimator with respect to mean squared error of predictions (MSEP) and variance inflation factors (VIF) computed over 100 bootstrap samples. The MSEP of R1 and R2 were similar, and they were 3% less than the MSEP of LS. The average VIF of LS, R1, and R2 were equal to 26.81, 6.10, and 4.18, respectively. Ridge regression methods were particularly effective in decreasing the multicollinearity involving predictor variables of breed additive effects. 
    Because of a high degree of confounding between estimates of maternal dominance and direct epistatic loss effects, it was not possible to compare the relative importance of these effects with a high level of confidence. The inclusion of epistatic loss effects in the additive-dominance model did not cause noticeable reranking of sires, dams, and calves based on across-breed EBV. More precise estimates of breed effects as a result of this study may result in more stable across-breed estimated breeding values over the years.
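
    The effect of a ridge penalty under multicollinearity can be sketched for the simplest case of two nearly collinear predictors. The example works in correlation form with a fixed, arbitrarily chosen ridge parameter k; it is not the generalized ridge estimator of Hoerl and Kennard or the bootstrap procedure used in the study, and the data are invented.

```python
from math import sqrt

def standardize(v):
    """Center v and scale it to unit length so cross-products are correlations."""
    m = sum(v) / len(v)
    c = [a - m for a in v]
    norm = sqrt(sum(a * a for a in c))
    return [a / norm for a in c]

def ridge_two_predictors(x1, x2, y, k):
    """Solve (R + k*I) b = r for standardized coefficients; k = 0 gives OLS."""
    z1, z2, zy = standardize(x1), standardize(x2), standardize(y)
    r12 = sum(a * b for a, b in zip(z1, z2))
    r1y = sum(a * b for a, b in zip(z1, zy))
    r2y = sum(a * b for a, b in zip(z2, zy))
    det = (1.0 + k) ** 2 - r12 ** 2          # determinant of the 2x2 system
    b1 = ((1.0 + k) * r1y - r12 * r2y) / det
    b2 = ((1.0 + k) * r2y - r12 * r1y) / det
    return b1, b2, r12

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]          # almost a copy of x1
y = [1.2, 1.9, 3.1, 4.2, 4.8, 6.1]

b1_ols, b2_ols, r12 = ridge_two_predictors(x1, x2, y, k=0.0)
b1_ridge, b2_ridge, _ = ridge_two_predictors(x1, x2, y, k=0.1)
vif_ols = 1.0 / (1.0 - r12 ** 2)             # OLS VIF with two predictors

print("OLS coefficients:", b1_ols, b2_ols, "VIF:", vif_ols)
print("Ridge coefficients (k = 0.1):", b1_ridge, b2_ridge)
```

    The variance inflation factor of the unpenalized fit is very large because the two predictors are almost identical; adding k to the diagonal shrinks the coefficient vector and stabilizes it at the cost of some bias.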

  16. Low-flow, base-flow, and mean-flow regression equations for Pennsylvania streams

    USGS Publications Warehouse

    Stuckey, Marla H.

    2006-01-01

    Low-flow, base-flow, and mean-flow characteristics are an important part of assessing water resources in a watershed. These streamflow characteristics can be used by watershed planners and regulators to determine water availability, water-use allocations, assimilative capacities of streams, and aquatic-habitat needs. Streamflow characteristics are commonly predicted by use of regression equations when a nearby streamflow-gaging station is not available. Regression equations for predicting low-flow, base-flow, and mean-flow characteristics for Pennsylvania streams were developed from data collected at 293 continuous- and partial-record streamflow-gaging stations with flow unaffected by upstream regulation, diversion, or mining. Continuous-record stations used in the regression analysis had 9 years or more of data, and partial-record stations used had seven or more measurements collected during base-flow conditions. The state was divided into five low-flow regions and regional regression equations were developed for the 7-day, 10-year; 7-day, 2-year; 30-day, 10-year; 30-day, 2-year; and 90-day, 10-year low flows using generalized least-squares regression. Statewide regression equations were developed for the 10-year, 25-year, and 50-year base flows using generalized least-squares regression. Statewide regression equations were developed for harmonic mean and mean annual flow using weighted least-squares regression. Basin characteristics found to be significant explanatory variables at the 95-percent confidence level for one or more regression equations were drainage area, basin slope, thickness of soil, stream density, mean annual precipitation, mean elevation, and the percentage of glaciation, carbonate bedrock, forested area, and urban area within a basin. Standard errors of prediction ranged from 33 to 66 percent for the n-day, T-year low flows; 21 to 23 percent for the base flows; and 12 to 38 percent for the mean annual flow and harmonic mean, respectively. 
    The regression equations are not valid in watersheds with upstream regulation, diversions, or mining activities. Watersheds with karst features need close examination as to the applicability of the regression-equation results.

  17. Methods for estimating annual exceedance-probability discharges and largest recorded floods for unregulated streams in rural Missouri

    USGS Publications Warehouse

    Southard, Rodney E.; Veilleux, Andrea G.

    2014-01-01

    Regression analysis techniques were used to develop a set of equations for rural ungaged stream sites for estimating discharges with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. Basin and climatic characteristics were computed using geographic information software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses. Annual exceedance-probability discharge estimates were computed for 278 streamgages by using the expected moments algorithm to fit a log-Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data from water year 1844 to 2012. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized multiple Grubbs-Beck test was used to detect potentially influential low floods. Annual peak flows less than a minimum recordable discharge at a streamgage were incorporated into the at-site station analyses. An updated regional skew coefficient was determined for the State of Missouri using Bayesian weighted least-squares/generalized least squares regression analyses. At-site skew estimates for 108 long-term streamgages with 30 or more years of record and the 35 basin characteristics defined for this study were used to estimate the regional variability in skew. However, a constant generalized-skew value of -0.30 and a mean square error of 0.14 were determined in this study. Previous flood studies indicated that the distinct physical features of the three physiographic provinces have a pronounced effect on the magnitude of flood peaks. 
Trends in the magnitudes of the residuals from preliminary statewide regression analyses from previous studies confirmed that regional analyses in this study were similar and related to three primary physiographic provinces. The final regional regression analyses resulted in three sets of equations. For Regions 1 and 2, the basin characteristics of drainage area and basin shape factor were statistically significant. For Region 3, because of the small amount of data from streamgages, only drainage area was statistically significant. Average standard errors of prediction ranged from 28.7 to 38.4 percent for flood region 1, 24.1 to 43.5 percent for flood region 2, and 25.8 to 30.5 percent for region 3. The regional regression equations are only applicable to stream sites in Missouri with flows not significantly affected by regulation, channelization, backwater, diversion, or urbanization. Basins with about 5 percent or less impervious area were considered to be rural. Applicability of the equations is limited to basin characteristic values that range from 0.11 to 8,212.38 square miles (mi²) and basin shape from 2.25 to 26.59 for Region 1, 0.17 to 4,008.92 mi² and basin shape 2.04 to 26.89 for Region 2, and 2.12 to 2,177.58 mi² for Region 3. Annual peak data from streamgages were used to qualitatively assess the largest floods recorded at streamgages in Missouri since the 1915 water year. Based on existing streamgage data, the 1983 flood event was the largest flood event on record since 1915. The next five largest flood events, in descending order, took place in 1993, 1973, 2008, 1994 and 1915. Since 1915, five of six of the largest floods on record occurred from 1973 to 2012.
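
    The correspondence between annual exceedance probabilities and recurrence intervals stated in this abstract follows from T = 100 / p, with p in percent. The short sketch below reproduces the listed pairs; the function name is illustrative and does not come from the report.

```python
# Annual exceedance probabilities (percent) as listed in the abstract.
aep_percent = [50, 20, 10, 4, 2, 1, 0.5, 0.2]


def recurrence_interval_years(aep_pct):
    """Recurrence interval T in years for an annual exceedance
    probability given in percent: T = 100 / p."""
    return 100.0 / aep_pct


# Round to whole years for display.
intervals = [round(recurrence_interval_years(p)) for p in aep_percent]
print(intervals)
```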

  18. The relationship between employment and health and health care among working-age adults with and without disabilities in the United States.

    PubMed

    Reichard, Amanda; Stransky, Michelle; Brucker, Debra; Houtenville, Andrew

    2018-05-20

    To better understand the relationship between employment and health and health care for people with disabilities in the United States (US). We pooled US Medical Expenditure Panel Survey (2004-2010) data to examine health status, and access to health care among working-age adults, comparing people with physical disabilities or multiple disabilities to people without disabilities, based on their employment status. Logistic regression and least squares regression were conducted, controlling for sociodemographics, health insurance (when not the outcome), multiple chronic conditions, and need for assistance. Employment was inversely related to access to care, insurance, and obesity. Yet, people with disabilities employed in the past year reported better general and mental health than their peers with the same disabilities who were not employed. Those who were employed were more likely to have delayed/forgone necessary care, across disability groups. Part-time employment, especially for people with multiple limitations, was associated with better health and health care outcomes than full-time employment. Findings highlight the importance of addressing employment-related causes of delayed or foregone receipt of necessary care (e.g., flex-time for attending appointments) that exist for all workers, especially those with physical or multiple disabilities. Implications for rehabilitation These findings demonstrate that rehabilitation professionals who are seeking to support employment for persons with physical limitations need to ensure that overall health concerns are adequately addressed, both for those seeking employment and for those who are currently employed. Assisting clients in prioritizing health equally with employment can ensure that both areas receive sufficient attention. 
Engaging with employers to develop innovative practices to improve health, health behaviors and access to care for employees with disabilities can decrease turnover, increase productivity, and ensure longer job tenure.

  19. Hyperspectral Remote Sensing of Terrestrial Ecosystem Productivity from ISS

    NASA Astrophysics Data System (ADS)

    Huemmrich, K. F.; Campbell, P. K. E.; Gao, B. C.; Flanagan, L. B.; Goulden, M.

    2017-12-01

    Data from the Hyperspectral Imager for Coastal Ocean (HICO), mounted on the International Space Station (ISS), were used to develop and test algorithms for remotely retrieving ecosystem productivity. The ISS orbit introduces both limitations and opportunities for observing ecosystem dynamics. Twenty-six HICO images were used from four study sites representing different vegetation types: grasslands, shrubland, and forest. Gross ecosystem production (GEP) data from eddy covariance were matched with HICO-derived spectra. Multiple algorithms were successful relating spectral reflectance with GEP, including: Spectral Vegetation Indices (SVI), SVI in a light use efficiency model framework, spectral shape characteristics through spectral derivatives and absorption feature analysis, and statistical models leading to Multiband Hyperspectral Indices (MHI) from stepwise regressions and Partial Least Squares Regression (PLSR). Algorithms were able to achieve r² better than 0.7 for both GEP at the overpass time and daily GEP. These algorithms were successful using a diverse set of observations combining data from multiple years, multiple times during growing season, different times of day, with different view angles, and different vegetation types. The demonstrated robustness of the algorithms presented in this study over these conditions provides some confidence in mapping spatial patterns of GEP, describing variability within fields as well as the regional patterns based only on spectral reflectance information. The ISS orbit provides periods with multiple observations collected at different times of the day within a period of a few days. Diurnal GEP patterns were estimated comparing the half-hourly average GEP from the flux tower against HICO estimates of GEP (r² = 0.87) if morning, midday, and afternoon observations were available for average fluxes in the time period.

  20. On estimating gravity anomalies - A comparison of least squares collocation with conventional least squares techniques

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Lowrey, B.

    1977-01-01

    The least squares collocation algorithm for estimating gravity anomalies from geodetic data is shown to be an application of the well known regression equations which provide the mean and covariance of a random vector (gravity anomalies) given a realization of a correlated random vector (geodetic data). It is also shown that the collocation solution for gravity anomalies is equivalent to the conventional least-squares-Stokes' function solution when the conventional solution utilizes properly weighted zero a priori estimates. The mathematical and physical assumptions underlying the least squares collocation estimator are described.
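
    For orientation, the "well known regression equations" mentioned here have the following standard form for jointly Gaussian random vectors. The notation is assumed for illustration and is not taken from the report: x stands for the gravity anomalies, y for the geodetic data, and the C terms are the corresponding covariance blocks.

```latex
\hat{x} = \mu_x + C_{xy}\, C_{yy}^{-1}\, (y - \mu_y),
\qquad
\operatorname{Cov}(x \mid y) = C_{xx} - C_{xy}\, C_{yy}^{-1}\, C_{yx}
```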

  1. 4D-LQTA-QSAR and docking study on potent Gram-negative specific LpxC inhibitors: a comparison to CoMFA modeling.

    PubMed

    Ghasemi, Jahan B; Safavi-Sohi, Reihaneh; Barbosa, Euzébio G

    2012-02-01

    A quasi 4D-QSAR has been carried out on a series of potent Gram-negative LpxC inhibitors. This approach makes use of the molecular dynamics (MD) trajectories and topology information retrieved from the GROMACS package. This new methodology is based on the generation of a conformational ensemble profile, CEP, for each compound instead of only one conformation, followed by the calculation of intermolecular interaction energies at each grid point considering probes and all aligned conformations resulting from MD simulations. These interaction energies are independent variables employed in a QSAR analysis. The proposed methodology was compared to the comparative molecular field analysis (CoMFA) formalism. This methodology explores jointly the main features of CoMFA and 4D-QSAR models. Step-wise multiple linear regression was used for the selection of the most informative variables. After variable selection, multiple linear regression (MLR) and partial least squares (PLS) methods were used for building the regression models. Leave-N-out cross-validation (LNO) and Y-randomization were performed in order to confirm the robustness of the model, in addition to analysis of the independent test set. Best models provided the following statistics: [Formula in text] (PLS) and [Formula in text] (MLR). A docking study was applied to investigate the major interactions in the protein-ligand complex with the CDOCKER algorithm. Visualization of the descriptors of the best model helps us to interpret the model from the chemical point of view, supporting the applicability of this new approach in rational drug design.

  2. Radon-222 concentrations in ground water and soil gas on Indian reservations in Wisconsin

    USGS Publications Warehouse

    DeWild, John F.; Krohelski, James T.

    1995-01-01

    For sites with wells finished in the sand and gravel aquifer, the coefficient of determination (R2) of the regression of concentration of radon-222 in ground water as a function of well depth is 0.003 and the significance level is 0.32, which indicates that there is not a statistically significant relation between radon-222 concentrations in ground water and well depth. The coefficient of determination of the regression of radon-222 in ground water and soil gas is 0.19 and the root mean square error of the regression line is 271 picocuries per liter. Even though the significance level (0.036) indicates a statistical relation, the root mean square error of the regression is so large that the regression equation would not give reliable predictions. Because of an inadequate number of samples, similar statistical analyses could not be performed for sites with wells finished in the crystalline and sedimentary bedrock aquifers.

  3. Quantum State Tomography via Linear Regression Estimation

    PubMed Central

    Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan

    2013-01-01

    A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d^4) where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519
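
    The idea of turning state reconstruction into a linear regression can be sketched for a single qubit. This is a minimal illustration under stated assumptions, not the authors' implementation: the Pauli parametrization, the six-outcome measurement set, and the function names are all chosen here for the example, and the data are noise-free Born probabilities rather than measured frequencies.

```python
import numpy as np

# Pauli matrices; together with the identity they span 2x2 Hermitian matrices.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [sx, sy, sz]


def projectors():
    """Projectors onto the +1 and -1 eigenvectors of X, Y and Z."""
    return [(np.eye(2) + sign * s) / 2 for s in paulis for sign in (+1, -1)]


def lre_qubit(freqs):
    """Least-squares estimate of a qubit state from six outcome frequencies.

    Model (assumed parametrization): rho = (I + sum_i theta_i sigma_i) / 2,
    so p_n = tr(P_n)/2 + sum_i theta_i tr(sigma_i P_n)/2 is linear in theta.
    """
    projs = projectors()
    X = np.array([[np.trace(s @ P).real / 2 for s in paulis] for P in projs])
    y = np.asarray(freqs) - np.array([np.trace(P).real / 2 for P in projs])
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (np.eye(2) + sum(t * s for t, s in zip(theta, paulis))) / 2


# Noise-free check with a known (hypothetical) state.
rho_true = (np.eye(2) + 0.3 * sx - 0.2 * sy + 0.5 * sz) / 2
freqs = [np.trace(rho_true @ P).real for P in projectors()]
rho_est = lre_qubit(freqs)
```

    With finite-sample frequencies the least-squares estimate need not be a valid density matrix; this sketch makes no attempt to enforce positivity.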

  4. Sampling system for wheat (Triticum aestivum L) area estimation using digital LANDSAT MSS data and aerial photographs. [Brazil]

    NASA Technical Reports Server (NTRS)

    Parada, N. D. J. (Principal Investigator); Moreira, M. A.; Chen, S. C.; Batista, G. T.

    1984-01-01

    A procedure to estimate wheat (Triticum aestivum L) area using sampling technique based on aerial photographs and digital LANDSAT MSS data is developed. Aerial photographs covering 720 square km are visually analyzed. To estimate wheat area, a regression approach is applied using different sample sizes and various sampling units. As the size of sampling unit decreased, the percentage of sampled area required to obtain similar estimation performance also decreased. The lowest percentage of the area sampled for wheat estimation with relatively high precision and accuracy through regression estimation is 13.90% using 10 square km as the sampling unit. Wheat area estimation using only aerial photographs is less precise and accurate than those obtained by regression estimation.

  5. Modeling Group Differences in OLS and Orthogonal Regression: Implications for Differential Validity Studies

    ERIC Educational Resources Information Center

    Kane, Michael T.; Mroch, Andrew A.

    2010-01-01

    In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit…

  6. Tutorial on Using Regression Models with Count Outcomes Using R

    ERIC Educational Resources Information Center

    Beaujean, A. Alexander; Morgan, Grant B.

    2016-01-01

    Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…

  7. Teaching the Concept of Breakdown Point in Simple Linear Regression.

    ERIC Educational Resources Information Center

    Chan, Wai-Sum

    2001-01-01

    Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…

  8. Principles of Quantile Regression and an Application

    ERIC Educational Resources Information Center

    Chen, Fang; Chalhoub-Deville, Micheline

    2014-01-01

    Newer statistical procedures are typically introduced to help address the limitations of those already in practice or to deal with emerging research needs. Quantile regression (QR) is introduced in this paper as a relatively new methodology, which is intended to overcome some of the limitations of least squares mean regression (LMR). QR is more…

  9. The concept of psychological regression: metaphors, mapping, Queen Square, and Tavistock Square.

    PubMed

    Mercer, Jean

    2011-05-01

    The term "regression" refers to events in which an individual changes from his or her present level of maturity and regains mental and behavioral characteristics shown at an earlier point in development. This definition has remained constant for over a century, but the implications of the concept have changed systematically from a perspective in which regression was considered pathological, to a current view in which regression may be seen as a positive step in psychotherapy or as a part of normal development. The concept of regression, famously employed by Sigmund Freud and others in his circle, derived from ideas suggested by Herbert Spencer and by John Hughlings Jackson. By the 1940s and '50s, the regression concept was applied by Winnicott and others in treatment of disturbed children and in adult psychotherapy. In addition, behavioral regression came to be seen as a part of a normal developmental trajectory, with a focus on expectable variability. The present article examines historical changes in the regression concept in terms of mapping to biomedical or other metaphors, in terms of a movement from earlier nativism toward an increased environmentalism in psychology, and with respect to other historical factors such as wartime events. The role of dominant metaphors in shifting perspectives on regression is described.

  10. The Increase of Energy Consumption and Carbon Dioxide (CO2) Emission in Indonesia

    NASA Astrophysics Data System (ADS)

    Sasana, Hadi; Putri, Annisa Eka

    2018-02-01

    In the last decade, rising energy consumption, which has multiplied carbon dioxide emissions, has become a worldwide problem, especially in developing countries such as Indonesia that are industrializing to become developed ones. The aim of this study was to analyze the effect of fossil energy consumption, population growth, and consumption of renewable energy on carbon dioxide emissions. The method used was multiple linear regression analysis with the Ordinary Least Squares approach, using time series data for the period 1990-2014. The results showed that fossil energy consumption and population growth have a positive influence on carbon dioxide emissions in Indonesia. Meanwhile, consumption of renewable energy has a negative effect on the level of carbon dioxide emissions produced.

  11. QSAR study of curcumine derivatives as HIV-1 integrase inhibitors.

    PubMed

    Gupta, Pawan; Sharma, Anju; Garg, Prabha; Roy, Nilanjan

    2013-03-01

    A QSAR study was performed on curcumine derivatives as HIV-1 integrase inhibitors using multiple linear regression. The statistically significant model was developed with squared correlation coefficients (r(2)) 0.891 and cross validated r(2) (r(2) cv) 0.825. The developed model revealed that electronic, shape, size, geometry, substitution's information and hydrophilicity were important atomic properties for determining the inhibitory activity of these molecules. The model was also tested successfully for external validation (r(2) pred = 0.849) as well as Tropsha's test for model predictability. Furthermore, the domain analysis was carried out to evaluate the prediction reliability of external set molecules. The model was statistically robust and had good predictive power which can be successfully utilized for screening of new molecules.

  12. Recursive least squares method of regression coefficients estimation as a special case of Kalman filter

    NASA Astrophysics Data System (ADS)

    Borodachev, S. M.

    2016-06-01

    A simple derivation of the recursive least squares (RLS) method equations is given as a special case of Kalman filter estimation of a constant system state under changing observation conditions. A numerical example illustrates the application of RLS to the multicollinearity problem.
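
    The relation described in this abstract can be illustrated with a short sketch, which is not the paper's own derivation or code. It runs a Kalman-style update for a constant state, assuming unit observation-noise variance and a prior of zero mean with covariance delta * I; the function name, `delta`, and the synthetic data are assumptions made for the example. Under these assumptions the recursion reproduces ridge regression with penalty 1/delta, which approaches ordinary least squares as delta grows.

```python
import numpy as np


def rls_fit(X, y, delta=1e3):
    """Recursive least squares: theta starts at 0 with covariance delta * I."""
    n_features = X.shape[1]
    theta = np.zeros(n_features)
    P = delta * np.eye(n_features)
    for x_t, y_t in zip(X, y):
        # Kalman gain for a constant state observed through the row x_t,
        # with unit observation-noise variance (an assumption of this sketch).
        Px = P @ x_t
        gain = Px / (1.0 + x_t @ Px)
        theta = theta + gain * (y_t - x_t @ theta)
        P = P - np.outer(gain, Px)
    return theta, P


# Synthetic data with hypothetical coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=200)

theta_rls, _ = rls_fit(X, y, delta=1e3)

# Batch equivalent of the recursion: ridge regression with penalty 1/delta.
theta_batch = np.linalg.solve(X.T @ X + np.eye(3) / 1e3, X.T @ y)
```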

  13. Evaluating the performance of different predictor strategies in regression-based downscaling with a focus on glacierized mountain environments

    NASA Astrophysics Data System (ADS)

    Hofer, Marlis; Nemec, Johanna

    2016-04-01

    This study presents first steps towards verifying the hypothesis that uncertainty in global and regional glacier mass simulations can be reduced considerably by reducing the uncertainty in the high-resolution atmospheric input data. To this aim, we systematically explore the potential of different predictor strategies for improving the performance of regression-based downscaling approaches. The investigated local-scale target variables are precipitation, air temperature, wind speed, relative humidity and global radiation, all at a daily time scale. Observations of these target variables are assessed from three sites in geo-environmentally and climatologically very distinct settings, all within highly complex topography and in the close proximity to mountain glaciers: (1) the Vernagtbach station in the Northern European Alps (VERNAGT), (2) the Artesonraju measuring site in the tropical South American Andes (ARTESON), and (3) the Brewster measuring site in the Southern Alps of New Zealand (BREWSTER). As the large-scale predictors, ERA interim reanalysis data are used. In the applied downscaling model training and evaluation procedures, particular emphasis is put on appropriately accounting for the pitfalls of limited and/or patchy observation records that are usually the only (if at all) available data from the glacierized mountain sites. Generalized linear models and beta regression are investigated as alternatives to ordinary least squares regression for the non-Gaussian target variables. 
By analyzing results for the three different sites, five predictands and for different times of the year, we look for systematic improvements in the downscaling models' skill specifically obtained by (i) using predictor data at the optimum scale rather than the minimum scale of the reanalysis data, (ii) identifying the optimum predictor allocation in the vertical, and (iii) considering multiple (variable, level and/or grid point) predictor options combined with state-of-the-art empirical feature selection tools. First results show that, in particular for air temperature, downscaling models based on direct predictor selection show skill comparable to that of models based on multiple predictors. For all other target variables, however, multiple predictor approaches can considerably outperform models based on single predictors. Including multiple variable types emerges as the most promising predictor option (in particular for wind speed at all sites), even if the same predictor set is used across the different cases.

  14. Adaptive Digital Signature Design and Short-Data-Record Adaptive Filtering

    DTIC Science & Technology

    2008-04-01

    rate BPSK binary phase shift keying CA − CFAR cell averaging− constant false alarm rate CDMA code − division multiple − access CFAR constant false...Cotae, “Spreading sequence design for multiple cell synchronous DS-CDMA systems under total weighted squared correlation criterion,” EURASIP Journal...415-428, Mar. 2002. [6] P. Cotae, “Spreading sequence design for multiple cell synchronous DS-CDMA systems under total weighted squared correlation

  15. Parameter estimation of Monod model by the Least-Squares method for microalgae Botryococcus Braunii sp

    NASA Astrophysics Data System (ADS)

    See, J. J.; Jamaian, S. S.; Salleh, R. M.; Nor, M. E.; Aman, F.

    2018-04-01

    This research aims to estimate the parameters of the Monod model of growth of the microalga Botryococcus Braunii sp by the Least-Squares method. The Monod equation is a non-linear equation that can be transformed into a linear form and then solved by the Least-Squares linear regression method. Alternatively, the Gauss-Newton method solves the non-linear Least-Squares problem directly, obtaining the parameter values of the Monod model by minimizing the sum of squared errors (SSE). As a result, the parameters of the Monod model for Botryococcus Braunii sp can be estimated by the Least-Squares method. However, the parameter values obtained by the non-linear Least-Squares method are more accurate than those from the linear Least-Squares method, since the SSE of the non-linear method is lower than that of the linear method.
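
    The two fitting routes compared in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the Monod form mu = mu_max * S / (Ks + S), the double-reciprocal linearization, the step-halving safeguard in Gauss-Newton, and the synthetic data are all chosen here for the example.

```python
import numpy as np


def monod(S, mu_max, Ks):
    """Monod growth rate for substrate concentration S."""
    return mu_max * S / (Ks + S)


def sse(params, S, mu):
    """Sum of squared errors in the original (untransformed) scale."""
    return float(np.sum((mu - monod(S, *params)) ** 2))


def fit_linearized(S, mu):
    """Linear least squares on 1/mu = 1/mu_max + (Ks/mu_max) * (1/S)."""
    slope, intercept = np.polyfit(1.0 / S, 1.0 / mu, 1)
    mu_max = 1.0 / intercept
    return mu_max, slope * mu_max


def fit_gauss_newton(S, mu, start, n_iter=50):
    """Gauss-Newton on the non-linear problem, with simple step halving."""
    params = np.array(start, dtype=float)
    for _ in range(n_iter):
        mu_max, Ks = params
        resid = mu - monod(S, mu_max, Ks)
        # Jacobian of the model with respect to (mu_max, Ks).
        J = np.column_stack([S / (Ks + S), -mu_max * S / (Ks + S) ** 2])
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        current = sse(params, S, mu)
        t = 1.0
        # Halve the step until the SSE does not increase.
        while t > 1e-8 and not (sse(params + t * step, S, mu) <= current):
            t *= 0.5
        if t <= 1e-8:
            break
        params = params + t * step
    return tuple(params)


# Synthetic data: hypothetical true values with a fixed +/-5% perturbation.
S = np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0])
mu = monod(S, 1.2, 2.0) * (1 + 0.05 * np.array([1, -1, 1, -1, 1, -1]))

lin = fit_linearized(S, mu)
nonlin = fit_gauss_newton(S, mu, start=lin)
sse_lin, sse_nonlin = sse(lin, S, mu), sse(nonlin, S, mu)
```

    Starting Gauss-Newton from the linearized estimate and accepting only non-increasing steps means the non-linear SSE cannot exceed the linear one in this sketch, which mirrors the comparison reported in the abstract.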

  16. On squares of representations of compact Lie algebras

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zeier, Robert, E-mail: robert.zeier@ch.tum.de; Zimborás, Zoltán, E-mail: zimboras@gmail.com

    We study how tensor products of representations decompose when restricted from a compact Lie algebra to one of its subalgebras. In particular, we are interested in tensor squares which are tensor products of a representation with itself. We show in a classification-free manner that the sum of multiplicities and the sum of squares of multiplicities in the corresponding decomposition of a tensor square into irreducible representations has to strictly grow when restricted from a compact semisimple Lie algebra to a proper subalgebra. For this purpose, relevant details on tensor products of representations are compiled from the literature. Since the sum of squares of multiplicities is equal to the dimension of the commutant of the tensor-square representation, it can be determined by linear-algebra computations in a scenario where an a priori unknown Lie algebra is given by a set of generators which might not be a linear basis. Hence, our results offer a test to decide if a subalgebra of a compact semisimple Lie algebra is a proper one without calculating the relevant Lie closures, which can be naturally applied in the field of controlled quantum systems.

  17. On estimating gravity anomalies: A comparison of least squares collocation with least squares techniques

    NASA Technical Reports Server (NTRS)

    Argentiero, P.; Lowrey, B.

    1976-01-01

    The least squares collocation algorithm for estimating gravity anomalies from geodetic data is shown to be an application of the well known regression equations which provide the mean and covariance of a random vector (gravity anomalies) given a realization of a correlated random vector (geodetic data). It is also shown that the collocation solution for gravity anomalies is equivalent to the conventional least-squares-Stokes' function solution when the conventional solution utilizes properly weighted zero a priori estimates. The mathematical and physical assumptions underlying the least squares collocation estimator are described, and its numerical properties are compared with the numerical properties of the conventional least squares estimator.

  18. An integrated, ethically driven environmental model of clinical decision making in emergency settings.

    PubMed

    Wolf, Lisa

    2013-02-01

    To explore the relationship between multiple variables within a model of critical thinking and moral reasoning. A quantitative descriptive correlational design using a purposive sample of 200 emergency nurses. Measured variables were accuracy in clinical decision-making, moral reasoning, perceived care environment, and demographics. Analysis was by bivariate correlation using Pearson's product-moment correlation coefficients, chi square and multiple linear regression analysis. The elements as identified in the integrated ethically-driven environmental model of clinical decision-making (IEDEM-CD) correctly depict moral reasoning and environment of care as factors significantly affecting accuracy in decision-making. The integrated, ethically driven environmental model of clinical decision making is a framework useful for predicting clinical decision making accuracy for emergency nurses in practice, with further implications in education, research and policy. A diagnostic and therapeutic framework for identifying and remediating individual and environmental challenges to accurate clinical decision making. © 2012, The Author. International Journal of Nursing Knowledge © 2012, NANDA International.

  19. Cross race comparisons between SES health gradients among African-American and white women at mid-life

    PubMed Central

    Salsberry, Pamela J.

    2014-01-01

    This study explored how multiple indicators of socioeconomic status (SES) inform understanding of race differences in the magnitude of health gains associated with higher SES. The study sample, 1268 African-American women and 2066 white women, was drawn from the National Longitudinal Surveys of Youth 1979. The outcome was the Physical Components Summary from the SF-12 assessed at age 40. Ordinary least squares regressions using education, income and net worth fully interacted with race were conducted. Single measure gradients tended to be steeper for whites than African-Americans, partly because “sheepskin” effects of high school and college graduation were higher for whites and low income and low net worth whites had worse health than comparable African-Americans. Conditioning on multiple measures of SES eliminated race disparities in health benefits of education and net worth, but not income. A discussion of current public policies that affect race disparities in levels of education, income and net wealth is provided. PMID:24632052

  20. Confirmatory factor analysis of the female sexual function index.

    PubMed

    Opperman, Emily A; Benson, Lindsay E; Milhausen, Robin R

    2013-01-01

    The Female Sexual Functioning Index (Rosen et al., 2000) was designed to assess the key dimensions of female sexual functioning using six domains: desire, arousal, lubrication, orgasm, satisfaction, and pain. A full-scale score was proposed to represent women's overall sexual function. The fifth revision to the Diagnostic and Statistical Manual (DSM) is currently underway and includes a proposal to combine desire and arousal problems. The objective of this article was to evaluate and compare four models of the Female Sexual Functioning Index: (a) single-factor model, (b) six-factor model, (c) second-order factor model, and (d) five-factor model combining the desire and arousal subscales. Cross-sectional and observational data from 85 women were used to conduct a confirmatory factor analysis on the Female Sexual Functioning Index. Local and global goodness-of-fit measures, the chi-square test of differences, squared multiple correlations, and regression weights were used. The single-factor model fit was not acceptable. The original six-factor model was confirmed, and good model fit was found for the second-order and five-factor models. Delta chi-square tests of differences supported best fit for the six-factor model validating usage of the six domains. However, when revisions are made to the DSM-5, the Female Sexual Functioning Index can adapt to reflect these changes and remain a valid assessment tool for women's sexual functioning, as the five-factor structure was also supported.

  1. Lack of Association of Estrogen Receptor Alpha Gene Polymorphisms with Cardiorespiratory and Metabolic Variables in Young Women

    PubMed Central

    Rebelo, Ana Cristina; Verlengia, Rozangela; Kunz, Vandeni; Tamburus, Nayara; Cerda, Alvaro; Hirata, Rosario; Hirata, Mario; Silva, Ester

    2012-01-01

    This study examined the association of estrogen receptor alpha gene (ESR1) polymorphisms with cardiorespiratory and metabolic parameters in young women. In total, 354 healthy women were selected for cardiopulmonary exercise testing and short-term heart rate (HR) variability (HRV) evaluation. The HRV analysis was determined by the temporal indices rMSSD (square root of the mean squared differences of successive R–R intervals (RRi) divided by the number of RRi minus one), SDNN (root mean square of differences from mean RRi, divided by the number of RRi) and power spectrum components by low frequency (LF), high frequency (HF) and LF/HF ratio. Blood samples were obtained for serum lipids, estradiol and DNA extraction. ESR1 rs2234693 and rs9340799 polymorphisms were analyzed by PCR and fragment restriction analysis. HR and oxygen uptake (VO2) values did not differ between the ESR1 polymorphisms with respect to autonomic modulation. We did not find a relationship between ESR1 T–A, T–G, C–A and C–G haplotypes and cardiorespiratory and metabolic variables. Multiple linear regression analysis demonstrated that VO2, total cholesterol and triglycerides influence HRV (p < 0.05). The results suggest that ESR1 variants have no effect on cardiorespiratory and metabolic variables, while HRV indices are influenced by aerobic capacity and lipids in healthy women. PMID:23202974

  2. Parameter estimation method and updating of regional prediction equations for ungaged sites in the desert region of California

    USGS Publications Warehouse

    Barth, Nancy A.; Veilleux, Andrea G.

    2012-01-01

    The U.S. Geological Survey (USGS) is currently updating at-site flood frequency estimates for USGS streamflow-gaging stations in the desert region of California. The at-site flood-frequency analysis is complicated by short record lengths (less than 20 years is common) and numerous zero flows/low outliers at many sites. Estimates of the three parameters (mean, standard deviation, and skew) required for fitting the log Pearson Type 3 (LP3) distribution are likely to be highly unreliable based on the limited and heavily censored at-site data. In a generalization of the recommendations in Bulletin 17B, a regional analysis was used to develop regional estimates of all three parameters (mean, standard deviation, and skew) of the LP3 distribution. A regional skew value of zero from a previously published report was used with a new estimated mean squared error (MSE) of 0.20. A weighted least squares (WLS) regression method was used to develop both a regional standard deviation and a mean model based on annual peak-discharge data for 33 USGS stations throughout California’s desert region. At-site standard deviation and mean values were determined by using an expected moments algorithm (EMA) method for fitting the LP3 distribution to the logarithms of annual peak-discharge data. Additionally, a multiple Grubbs-Beck (MGB) test, a generalization of the test recommended in Bulletin 17B, was used for detecting multiple potentially influential low outliers in a flood series. The WLS regression found that no basin characteristics could explain the variability of standard deviation. Consequently, a constant regional standard deviation model was selected, resulting in a log-space value of 0.91 with a MSE of 0.03 log units. Yet drainage area was found to be statistically significant at explaining the site-to-site variability in mean. The linear WLS regional mean model based on drainage area had a Pseudo- 2 R of 51 percent and a MSE of 0.32 log units. 
The regional parameter estimates were then used to develop a set of equations for estimating flows with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities for ungaged basins. The final equations are functions of drainage area. Average standard errors of prediction for these regression equations range from 214.2 to 856.2 percent.
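
    As a rough illustration of the weighted least squares (WLS) step described in the record above, the following minimal sketch regresses a log-space mean on log drainage area. The data values, the function name `wls_fit`, and the use of record length as the weight are assumptions made for illustration only; the report's actual weighting scheme and data are not given in the abstract.

```python
import numpy as np

def wls_fit(X, y, weights):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i . beta)^2."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Scaling each row by sqrt(w_i) turns the weighted problem into an
    # ordinary least squares problem with the same solution.
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical data (NOT from the report): log10 drainage area and at-site
# log-space mean for five stations, with record length used as the weight.
log_area = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
site_mean = np.array([1.1, 1.6, 1.9, 2.6, 2.9])
record_years = np.array([8.0, 12.0, 20.0, 15.0, 30.0])

X = np.column_stack([np.ones_like(log_area), log_area])  # intercept, slope
beta = wls_fit(X, site_mean, record_years)
print("intercept, slope:", beta)
```

    Stations with longer records pull the fitted line more strongly than short-record stations; with equal weights the fit reduces to ordinary least squares.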

  3. Estimating flood magnitude and frequency at gaged and ungaged sites on streams in Alaska and conterminous basins in Canada, based on data through water year 2012

    USGS Publications Warehouse

    Curran, Janet H.; Barth, Nancy A.; Veilleux, Andrea G.; Ourso, Robert T.

    2016-03-16

    Estimates of the magnitude and frequency of floods are needed across Alaska for engineering design of transportation and water-conveyance structures, flood-insurance studies, flood-plain management, and other water-resource purposes. This report updates methods for estimating flood magnitude and frequency in Alaska and conterminous basins in Canada. Annual peak-flow data through water year 2012 were compiled from 387 streamgages on unregulated streams with at least 10 years of record. Flood-frequency estimates were computed for each streamgage using the Expected Moments Algorithm to fit a Pearson Type III distribution to the logarithms of annual peak flows. A multiple Grubbs-Beck test was used to identify potentially influential low floods in the time series of peak flows for censoring in the flood frequency analysis. For two new regional skew areas, flood-frequency estimates using station skew were computed for stations with at least 25 years of record for use in a Bayesian least-squares regression analysis to determine a regional skew value. The consideration of basin characteristics as explanatory variables for regional skew resulted in improvements in precision too small to warrant the additional model complexity, and a constant model was adopted. Regional Skew Area 1 in eastern-central Alaska had a regional skew of 0.54 and an average variance of prediction of 0.45, corresponding to an effective record length of 22 years. Regional Skew Area 2, encompassing coastal areas bordering the Gulf of Alaska, had a regional skew of 0.18 and an average variance of prediction of 0.12, corresponding to an effective record length of 59 years. Station flood-frequency estimates for study sites in regional skew areas were then recomputed using a weighted skew incorporating the station skew and regional skew.
In a new regional skew exclusion area outside the regional skew areas, the density of long-record streamgages was too sparse for regional analysis and station skew was used for all estimates. Final station flood frequency estimates for all study streamgages are presented for the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities. Regional multiple-regression analysis was used to produce equations for estimating flood frequency statistics from explanatory basin characteristics. Basin characteristics, including physical and climatic variables, were updated for all study streamgages using a geographical information system and geospatial source data. Screening for similar-sized nested basins eliminated hydrologically redundant sites, and screening for eligibility for analysis of explanatory variables eliminated regulated peaks, outburst peaks, and sites with indeterminate basin characteristics. An ordinary least-squares regression used flood-frequency statistics and basin characteristics for 341 streamgages (284 in Alaska and 57 in Canada) to determine the most suitable combination of basin characteristics for a flood-frequency regression model and to explore regional grouping of streamgages for explaining variability in flood-frequency statistics across the study area. The most suitable model for explaining flood frequency used drainage area and mean annual precipitation as explanatory variables for the entire study area as a region. Final regression equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probability discharge in Alaska and conterminous basins in Canada were developed using a generalized least-squares regression.
The average standard error of prediction for the regression equations for the various annual exceedance probabilities ranged from 69 to 82 percent, and the pseudo-coefficient of determination (pseudo-R2) ranged from 85 to 91 percent. The regional regression equations from this study were incorporated into the U.S. Geological Survey StreamStats program for a limited area of the State—the Cook Inlet Basin. StreamStats is a national web-based geographic information system application that facilitates retrieval of streamflow statistics and associated information. StreamStats retrieves published data for gaged sites and, for user-selected ungaged sites, delineates drainage areas from topographic and hydrographic data, computes basin characteristics, and computes flood frequency estimates using the regional regression equations.
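
    The record above mentions a weighted skew that combines station skew and regional skew but does not state the weighting. One common choice, assumed here, is the inverse-mean-squared-error weighting used in Bulletin 17B. In the sketch below the function name `weighted_skew` and the station values are hypothetical, and treating the reported average variance of prediction (0.45) as the regional MSE is also an assumption.

```python
def weighted_skew(station_skew, mse_station, regional_skew, mse_regional):
    """Inverse-MSE weighting of station and regional skew.

    Each skew is weighted by the MSE of the *other* estimate, so the more
    precise estimate (smaller MSE) receives the larger weight.
    """
    numerator = mse_regional * station_skew + mse_station * regional_skew
    return numerator / (mse_regional + mse_station)

# Regional skew for Regional Skew Area 1 is taken from the abstract; the
# regional MSE is ASSUMED equal to its average variance of prediction.
regional_skew = 0.54
mse_regional = 0.45

# Hypothetical station values (NOT from the report).
station_skew = 0.10
mse_station = 0.30

g_w = weighted_skew(station_skew, mse_station, regional_skew, mse_regional)
print("weighted skew:", g_w)
```

    The result always lies between the station and regional values, moving toward whichever has the smaller MSE.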

  4. Rapid Detection of Volatile Oil in Mentha haplocalyx by Near-Infrared Spectroscopy and Chemometrics.

    PubMed

    Yan, Hui; Guo, Cheng; Shao, Yang; Ouyang, Zhen

    2017-01-01

    Near-infrared spectroscopy combined with partial least squares regression (PLSR) and support vector machine (SVM) was applied for the rapid determination of volatile oil content in Mentha haplocalyx. The effects of data pre-processing methods on the accuracy of the PLSR calibration models were investigated. The performance of the final model was evaluated according to the correlation coefficient (R) and root mean square error of prediction (RMSEP). For the PLSR model, the best preprocessing combination was first-order derivative, standard normal variate transformation (SNV), and mean centering, which gave correlation coefficients of 0.8805 and 0.8719, with RMSEC of 0.091 and RMSEP of 0.097. Analysis of the loading weights and variable importance in projection (VIP) scores showed that the wavenumber variables linked to volatile oil lie between 5500 and 4000 cm-1. For the SVM model, six LVs (fewer than the seven LVs in the PLSR model) were adopted, and the result was better than that of the PLSR model: the correlation coefficients were 0.9232 and 0.9202, with RMSEC and RMSEP of 0.084 and 0.082, respectively, which indicated that the predicted values were accurate and reliable. This work demonstrated that near-infrared reflectance spectroscopy with chemometrics could be used to rapidly detect the volatile oil content in M. haplocalyx. The quality of medicine is directly linked to clinical efficacy; thus, it is important to control the quality of Mentha haplocalyx.
Abbreviations used: 1st der: First-order derivative; 2nd der: Second-order derivative; LOO: Leave-one-out; LVs: Latent variables; MC: Mean centering; NIR: Near-infrared; NIRS: Near-infrared spectroscopy; PCR: Principal component regression; PLSR: Partial least squares regression; RBF: Radial basis function; RMSECV: Root mean square error of cross validation; RMSEC: Root mean square error of calibration; RMSEP: Root mean square error of prediction; SNV: Standard normal variate transformation; SVM: Support vector machine; VIP: Variable importance in projection.
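
    For readers unfamiliar with the preprocessing and error measures named in the record above, here is a minimal sketch of standard normal variate (SNV) scaling together with a root mean square error and a correlation coefficient. The function names and the two toy spectra are hypothetical and are not taken from the study.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) by its
    own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def correlation(y_true, y_pred):
    """Pearson correlation coefficient R between reference and prediction."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Two hypothetical spectra: the second is the first multiplied by 2 and
# shifted by 0.8, mimicking a scatter-induced scale and offset change.
spectra = np.array([[0.10, 0.20, 0.40, 0.30],
                    [1.00, 1.20, 1.60, 1.40]])
z = snv(spectra)
print(z)
```

    After SNV the two rows coincide, which is why the transform is often used to suppress multiplicative and additive scatter effects before calibration.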

  5. Estimating peak discharges, flood volumes, and hydrograph shapes of small ungaged urban streams in Ohio

    USGS Publications Warehouse

    Sherwood, J.M.

    1986-01-01

    Methods are presented for estimating peak discharges, flood volumes and hydrograph shapes of small (less than 5 sq mi) urban streams in Ohio. Examples of how to use the various regression equations and estimating techniques also are presented. Multiple-regression equations were developed for estimating peak discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years. The significant independent variables affecting peak discharge are drainage area, main-channel slope, average basin-elevation index, and basin-development factor. Standard errors of regression and prediction for the peak discharge equations range from +/-37% to +/-41%. An equation also was developed to estimate the flood volume of a given peak discharge. Peak discharge, drainage area, main-channel slope, and basin-development factor were found to be the significant independent variables affecting flood volumes for given peak discharges. The standard error of regression for the volume equation is +/-52%. A technique is described for estimating the shape of a runoff hydrograph by applying a specific peak discharge and the estimated lagtime to a dimensionless hydrograph. An equation for estimating the lagtime of a basin was developed. Two variables--main-channel length divided by the square root of the main-channel slope and basin-development factor--have a significant effect on basin lagtime. The standard error of regression for the lagtime equation is +/-48%. The data base for the study was established by collecting rainfall-runoff data at 30 basins distributed throughout several metropolitan areas of Ohio. Five to eight years of data were collected at a 5-min record interval. The USGS rainfall-runoff model A634 was calibrated for each site. The calibrated models were used in conjunction with long-term rainfall records to generate a long-term streamflow record for each site. Each annual peak-discharge record was fitted to a Log-Pearson Type III frequency curve. 
Multiple-regression techniques were then used to analyze the peak discharge data as a function of the basin characteristics of the 30 sites. (Author's abstract)

  6. A comparison between the use of Cox regression and the use of partial least squares-Cox regression to predict the survival of kidney-transplant patients

    NASA Astrophysics Data System (ADS)

    Solimun

    2017-05-01

    The aim of this research is to model survival data from kidney-transplant patients using partial least squares (PLS)-Cox regression, which can be applied whether or not the no-multicollinearity assumption is met. The secondary data were obtained from research entitled "Factors affecting the survival of kidney-transplant patients". The research subjects comprised 250 patients. The predictor variables consisted of age (X1); sex (X2; two categories); prior hemodialysis duration (X3); diabetes (X4; two categories); prior transplantation number (X5); number of blood transfusions (X6); discrepancy score (X7); and use of antilymphocyte globulin (ALG) (X8; two categories), while the response variable was patient survival time (in months). Partial least squares regression is a model that connects the predictor variables X and the response variable y, and it initially aims to determine the relationship between them. Results of the above analyses suggest that the survival of kidney transplant recipients ranged from 0 to 55 months, with 62% of the patients surviving until they received treatment that lasted for 55 months. The PLS-Cox regression analysis results revealed that patients' age and the use of ALG significantly affected the survival time of patients. The factor of patients' age (X1) in the PLS-Cox regression model merely affected the failure probability by 1.201. This indicates that the probability of dying for elderly patients with a kidney transplant is 1.152 times higher than that for younger patients.

  7. Determination of total iron-reactive phenolics, anthocyanins and tannins in wine grapes of skins and seeds based on near-infrared hyperspectral imaging.

    PubMed

    Zhang, Ni; Liu, Xu; Jin, Xiaoduo; Li, Chen; Wu, Xuan; Yang, Shuqin; Ning, Jifeng; Yanne, Paul

    2017-12-15

    Phenolics contents in wine grapes are key indicators for assessing ripeness. Near-infrared hyperspectral images during ripening have been explored to achieve an effective method for predicting phenolics contents. Principal component regression (PCR), partial least squares regression (PLSR) and support vector regression (SVR) models were built, respectively. The results show that SVR behaves globally better than PLSR and PCR, except in predicting tannins content of seeds. For the best prediction results, the squared correlation coefficient and root mean square error reached 0.8960 and 0.1069 g/L (+)-catechin equivalents (CE), respectively, for tannins in skins, 0.9065 and 0.1776 (g/L CE) for total iron-reactive phenolics (TIRP) in skins, 0.8789 and 0.1442 (g/L M3G) for anthocyanins in skins, 0.9243 and 0.2401 (g/L CE) for tannins in seeds, and 0.8790 and 0.5190 (g/L CE) for TIRP in seeds. Our results indicated that NIR hyperspectral imaging has good prospects for evaluation of phenolics in wine grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.

  8. The relationship between air pollution, fossil fuel energy consumption, and water resources in the panel of selected Asia-Pacific countries.

    PubMed

    Rafindadi, Abdulkadir Abdulrashid; Yusof, Zarinah; Zaman, Khalid; Kyophilavong, Phouphet; Akhmat, Ghulam

    2014-10-01

    The objective of the study is to examine the relationship between air pollution, fossil fuel energy consumption, water resources, and natural resource rents in a panel of selected Asia-Pacific countries over the period 1975-2012. The study includes a number of variables in the model for robust analysis. The results of cross-sectional analysis show that there is a significant relationship between air pollution, energy consumption, and water productivity in the individual countries of Asia-Pacific. However, the results for each country vary according to the time-invariant shocks. For this purpose, the study employed the panel least squares technique, which includes panel least squares regression, panel fixed effect regression, and panel two-stage least squares regression. In general, all the panel tests indicate that there is a significant and positive relationship between air pollution, energy consumption, and water resources in the region. Fossil fuel energy consumption has a dominant impact on the changes in air pollution in the region.

  9. Retrieving relevant factors with exploratory SEM and principal-covariate regression: A comparison.

    PubMed

    Vervloet, Marlies; Van den Noortgate, Wim; Ceulemans, Eva

    2018-02-12

    Behavioral researchers often linearly regress a criterion on multiple predictors, aiming to gain insight into the relations between the criterion and predictors. Obtaining this insight from the ordinary least squares (OLS) regression solution may be troublesome, because OLS regression weights show only the effect of a predictor on top of the effects of other predictors. Moreover, when the number of predictors grows larger, it becomes likely that the predictors will be highly collinear, which makes the regression weights' estimates unstable (i.e., the "bouncing beta" problem). Among other procedures, dimension-reduction-based methods have been proposed for dealing with these problems. These methods yield insight into the data by reducing the predictors to a smaller number of summarizing variables and regressing the criterion on these summarizing variables. Two promising methods are principal-covariate regression (PCovR) and exploratory structural equation modeling (ESEM). Both simultaneously optimize reduction and prediction, but they are based on different frameworks. The resulting solutions have not yet been compared; it is thus unclear what the strengths and weaknesses are of both methods. In this article, we focus on the extents to which PCovR and ESEM are able to extract the factors that truly underlie the predictor scores and can predict a single criterion. The results of two simulation studies showed that for a typical behavioral dataset, ESEM (using the BIC for model selection) in this regard is successful more often than PCovR. Yet, in 93% of the datasets PCovR performed equally well, and in the case of 48 predictors, 100 observations, and large differences in the strengths of the factors, PCovR even outperformed ESEM.

  10. Universal Algorithms for Plant Phenotyping: Are we there yet?

    NASA Astrophysics Data System (ADS)

    Kakani, V. G.; Kambham, R. R.; Zhao, D.; Foster, A. J.; Gowda, P. H.

    2017-12-01

    Hyperspectral remote sensing offers the ability to capture spectral signatures of plant morpho-physio-biochemical traits at multiple scales (leaf to canopy to aerial). Experimental results on plant phenotype from pot, growth chamber and field studies at multiple locations were used in this study. Pigment, leaf/plant water status, plant nutrient status, plant height, leaf area, and fresh and dry weights of biomass and its components were correlated with hyperspectral reflectance signatures. Leaf reflectance was collected with a spectroradiometer having a light source. Canopy hyperspectral reflectance was collected from 1.5 m above the canopy using a spectroradiometer, while multispectral images were acquired from aerial platforms ( 400m). Several statistical methods including simple ratios, principal component analysis, and partial least squares regression were used to identify hyperspectral reflectance bands that were tightly associated with plant phenotypic traits. Leaf level spectra best described the morpho-physio-biochemical traits (R2 = 0.6-0.9) and canopy reflectance best described plant height (R2 = 0.65), leaf area index (R2 = 0.67-0.74) and biomass (R2 = 0.69-0.78), while aerial spectra improved canopy level regression coefficients for plant height (R2 = 0.93) and leaf area index (R2 = 0.89). The comparison of multi-level spectra and resolution clearly showed the advantage of hyperspectral reflectance data over the multispectral reflectance data, particularly for understanding the basis for spectral reflectance differences among species and traits. In conclusion, high resolution (1-2 cm) spectral imagery can help to bridge the gap across multiple levels of phenotype measurement.

  11. A New Test of Linear Hypotheses in OLS Regression under Heteroscedasticity of Unknown Form

    ERIC Educational Resources Information Center

    Cai, Li; Hayes, Andrew F.

    2008-01-01

    When the errors in an ordinary least squares (OLS) regression model are heteroscedastic, hypothesis tests involving the regression coefficients can have Type I error rates that are far from the nominal significance level. Asymptotically, this problem can be rectified with the use of a heteroscedasticity-consistent covariance matrix (HCCM)…
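
    To make the idea of a heteroscedasticity-consistent covariance matrix concrete, the sketch below computes the simplest variant, White's HC0 sandwich estimator, for an OLS fit. It does not reproduce the new test proposed in the article; the function name `ols_hc0` and the simulated data are assumptions for illustration.

```python
import numpy as np

def ols_hc0(X, y):
    """OLS coefficients with the HC0 (White) covariance estimate.

    cov = (X'X)^-1 X' diag(e_i^2) X (X'X)^-1, where e_i are OLS residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum_i e_i^2 * x_i x_i'
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = xtx_inv @ meat @ xtx_inv
    return beta, cov

# Simulated data (hypothetical): error spread grows with x, so the
# constant-variance assumption of classical OLS inference is violated.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1.0, 10.0, size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3 * x)
X = np.column_stack([np.ones(n), x])

beta, cov_hc0 = ols_hc0(X, y)
robust_se = np.sqrt(np.diag(cov_hc0))
print("coefficients:", beta)
print("HC0 standard errors:", robust_se)
```

    The coefficient estimates are the usual OLS ones; only the standard errors change, which is what alters the hypothesis tests.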

  12. Deriving the Regression Equation without Using Calculus

    ERIC Educational Resources Information Center

    Gordon, Sheldon P.; Gordon, Florence S.

    2004-01-01

    Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…

  13. The Collinearity Free and Bias Reduced Regression Estimation Project: The Theory of Normalization Ridge Regression. Report No. 2.

    ERIC Educational Resources Information Center

    Bulcock, J. W.; And Others

    Multicollinearity refers to the presence of highly intercorrelated independent variables in structural equation models, that is, models estimated by using techniques such as least squares regression and maximum likelihood. There is a problem of multicollinearity in both the natural and social sciences where theory formulation and estimation is in…

  14. Independent contrasts and PGLS regression estimators are equivalent.

    PubMed

    Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary

    2012-05-01

    We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLS) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.

  15. Understanding Scaling Relations in Fracture and Mechanical Deformation of Single Crystal and Polycrystalline Silicon by Performing Atomistic Simulations at Mesoscale

    DTIC Science & Technology

    2009-07-16

    Statistical analysis: Coefficient of correlation. $R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$, where $SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$ is the regression sum of squares, $SSE = \sum_i (Y_i - \hat{Y}_i)^2$ is the error sum of squares, and $SSTO = SSE + SSR$ is the total sum of squares; $\bar{Y}$ is the mean value and $\hat{Y}_i$ is the value from the fitted line ...
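
    The regression, error, and total sums of squares named in the record above can be checked numerically. The following minimal sketch fits a straight line by least squares and confirms that the two expressions for R2 agree; the data points are hypothetical and are not taken from the report.

```python
import numpy as np

# Hypothetical observations (NOT from the report).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1, 5.8])

# Least squares line with intercept; polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
y_bar = y.mean()

ssr = np.sum((y_hat - y_bar) ** 2)   # regression sum of squares
sse = np.sum((y - y_hat) ** 2)       # error sum of squares
ssto = np.sum((y - y_bar) ** 2)      # total sum of squares

r2_from_ssr = ssr / ssto
r2_from_sse = 1.0 - sse / ssto
print("R2 via SSR/SSTO:    ", r2_from_ssr)
print("R2 via 1 - SSE/SSTO:", r2_from_sse)
```

    The identity SSTO = SSE + SSR holds for a least squares fit that includes an intercept, which is why both forms give the same value here.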

  16. The effect of playing tactics and situational variables on achieving score-box possessions in a professional soccer team.

    PubMed

    Lago-Ballesteros, Joaquin; Lago-Peñas, Carlos; Rey, Ezequiel

    2012-01-01

    The aim of this study was to analyse the influence of playing tactics, opponent interaction and situational variables on achieving score-box possessions in professional soccer. The sample was constituted by 908 possessions obtained by a team from the Spanish soccer league in 12 matches played during the 2009-2010 season. Multidimensional qualitative data obtained from 12 ordered categorical variables were used. Sampled matches were registered by the AMISCO PRO system. Data were analysed using chi-square analysis and multiple logistic regression analysis. Of 908 possessions, 303 (33.4%) produced score-box possessions, 477 (52.5%) achieved progression and 128 (14.1%) failed to reach any sort of progression. Multiple logistic regression showed that, for the main variable "team possession type", direct attacks and counterattacks were three times more effective than elaborate attacks for producing a score-box possession (P < 0.05). Team possession originating from the middle zones and playing against less than six defending players (P < 0.001) registered a higher success than those started in the defensive zone with a balanced defence. When the team was drawing or winning, the probability of reaching the score-box decreased by 43 and 53 percent, respectively, compared with the losing situation (P < 0.05). Accounting for opponent interactions and situational variables is critical to evaluate the effectiveness of offensive playing tactics on producing score-box possessions.

  17. Association of dentine hypersensitivity with different risk factors - a cross sectional study.

    PubMed

    Vijaya, V; Sanjay, Venkataraam; Varghese, Rana K; Ravuri, Rajyalakshmi; Agarwal, Anil

    2013-12-01

    This study was done to assess the prevalence of dentine hypersensitivity (DH) and its associated risk factors. This epidemiological study of DH prevalence was done among patients attending a dental college. A self-structured questionnaire along with clinical examination was used for assessment. Descriptive statistics were obtained and frequency distribution was calculated using the Chi square test at p value <0.05. Stepwise multiple linear regression was also done to assess the frequency of DH with different factors. The study population comprised 655 participants of different age groups. Our study showed a prevalence of 55%, and DH was more common among males. Similarly, smokers and those who used a hard toothbrush had more cases of DH. Stepwise multiple linear regression showed that the best predictor for DH was age, followed by habit of smoking and type of toothbrush. The most aggravating factors were cold water (15.4%) and sweet foods (14.7%), whereas only 5% of the patients had it while brushing. A high level of dentine hypersensitivity was found in this study, and it was more common among males. A linear relationship was shown with age, smoking and type of toothbrush. How to cite this article: Vijaya V, Sanjay V, Varghese RK, Ravuri R, Agarwal A. Association of Dentine Hypersensitivity with Different Risk Factors - A Cross Sectional Study. J Int Oral Health 2013;5(6):88-92.

  18. An Application of Robust Method in Multiple Linear Regression Model toward Credit Card Debt

    NASA Astrophysics Data System (ADS)

    Amira Azmi, Nur; Saifullah Rusiman, Mohd; Khalid, Kamil; Roslan, Rozaini; Sufahani, Suliadi; Mohamad, Mahathir; Salleh, Rohayu Mohd; Hamzah, Nur Shamsidah Amir

    2018-04-01

    A credit card is a convenient alternative to cash or cheque, and it is an essential component of electronic and internet commerce. In this study, the researchers attempt to determine the relationship between credit card debt and demographic variables such as age, household income, education level, years with current employer, years at current address, debt to income ratio and other debt, and to identify which variables are significant. The data cover 850 customers. Three methods were applied to the credit card debt data: multiple linear regression (MLR) models, MLR models with the least quartile difference (LQD) method and MLR models with the mean absolute deviation method. After comparing the three methods, it was found that the MLR model with the LQD method was the best model, with the lowest mean square error (MSE). The final model shows that years with current employer, years at current address, household income in thousands and debt to income ratio are positively associated with the amount of credit card debt, while age, level of education and other debt are negatively associated with it. This study may serve as a reference for banks on the use of robust methods, so that they can better understand their options and choose the one best aligned with their goals for inference regarding credit card debt.

  19. Analysis of methods to estimate spring flows in a karst aquifer

    USGS Publications Warehouse

    Sepulveda, N.

    2009-01-01

    Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer. © 2008 National Ground Water Association.

  20. Analysis of methods to estimate spring flows in a karst aquifer.

    PubMed

    Sepúlveda, Nicasio

    2009-01-01

    Hydraulically and statistically based methods were analyzed to identify the most reliable method to predict spring flows in a karst aquifer. Measured water levels at nearby observation wells, measured spring pool altitudes, and the distance between observation wells and the spring pool were the parameters used to match measured spring flows. Measured spring flows at six Upper Floridan aquifer springs in central Florida were used to assess the reliability of these methods to predict spring flows. Hydraulically based methods involved the application of the Theis, Hantush-Jacob, and Darcy-Weisbach equations, whereas the statistically based methods were the multiple linear regressions and the technology of artificial neural networks (ANNs). Root mean square errors between measured and predicted spring flows using the Darcy-Weisbach method ranged between 5% and 15% of the measured flows, lower than the 7% to 27% range for the Theis or Hantush-Jacob methods. Flows at all springs were estimated to be turbulent based on the Reynolds number derived from the Darcy-Weisbach equation for conduit flow. The multiple linear regression and the Darcy-Weisbach methods had similar spring flow prediction capabilities. The ANNs provided the lowest residuals between measured and predicted spring flows, ranging from 1.6% to 5.3% of the measured flows. The model prediction efficiency criteria also indicated that the ANNs were the most accurate method predicting spring flows in a karst aquifer.

  1. Use of partial least squares regression for the multivariate calibration of hazardous air pollutants in open-path FT-IR spectrometry

    NASA Astrophysics Data System (ADS)

    Hart, Brian K.; Griffiths, Peter R.

    1998-06-01

    Partial least squares (PLS) regression has been evaluated as a robust calibration technique for over 100 hazardous air pollutants (HAPs) measured by open path Fourier transform infrared (OP/FT-IR) spectrometry. PLS has the advantage over the current recommended calibration method of classical least squares (CLS), in that it can look at the whole useable spectrum (700-1300 cm-1, 2000-2150 cm-1, and 2400-3000 cm-1), and detect several analytes simultaneously. Up to one hundred HAPs synthetically added to OP/FT-IR backgrounds have been simultaneously calibrated and detected using PLS. PLS also has the advantage in requiring less preprocessing of spectra than that which is required in CLS calibration schemes, allowing PLS to provide user independent real-time analysis of OP/FT-IR spectra.

  2. Intrinsic Raman spectroscopy for quantitative biological spectroscopy Part II

    PubMed Central

    Bechtel, Kate L.; Shih, Wei-Chuan; Feld, Michael S.

    2009-01-01

    We demonstrate the effectiveness of intrinsic Raman spectroscopy (IRS) at reducing errors caused by absorption and scattering. Physical tissue models, solutions of varying absorption and scattering coefficients with known concentrations of Raman scatterers, are studied. We show significant improvement in prediction error by implementing IRS to predict concentrations of Raman scatterers using both ordinary least squares regression (OLS) and partial least squares regression (PLS). In particular, we show that IRS provides a robust calibration model that does not increase in error when applied to samples with optical properties outside the range of calibration. PMID:18711512

  3. Estimation of Ordinary Differential Equation Parameters Using Constrained Local Polynomial Regression.

    PubMed

    Ding, A Adam; Wu, Hulin

    2014-10-01

    We propose a new method to use a constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models with a goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters in the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy with a small price of computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method.

  4. Estimation of Ordinary Differential Equation Parameters Using Constrained Local Polynomial Regression

    PubMed Central

    Ding, A. Adam; Wu, Hulin

    2015-01-01

    We propose a new method to use a constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models with a goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters in the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy with a small price of computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method. PMID:26401093

  5. Modified locally weighted-partial least squares regression improving clinical predictions from infrared spectra of human serum samples.

    PubMed

    Perez-Guaita, David; Kuligowski, Julia; Quintás, Guillermo; Garrigues, Salvador; Guardia, Miguel de la

    2013-03-30

    Locally weighted partial least squares regression (LW-PLSR) has been applied to the determination of four clinical parameters in human serum samples (total protein, triglyceride, glucose and urea contents) by Fourier transform infrared (FTIR) spectroscopy. Classical LW-PLSR models were constructed using different spectral regions. For the selection of parameters by LW-PLSR modeling, a multi-parametric study was carried out employing the minimum root-mean-square error of cross validation (RMSCV) as the objective function. In order to overcome the effect of strong matrix interferences on the predictive accuracy of LW-PLSR models, this work focuses on sample selection. Accordingly, a novel strategy for the development of local models is proposed. It is based on the use of (i) principal component analysis (PCA) performed on an analyte-specific spectral region to identify the most similar sample spectra and (ii) partial least squares regression (PLSR) constructed using the whole spectrum. Results obtained with this strategy were compared with those provided by PLSR using the same spectral intervals as for LW-PLSR. Both classical and modified LW-PLSR gave lower prediction errors than PLSR. Hence, both proposed approaches were useful for the determination of analytes present in a complex matrix such as human serum. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Methods for estimating selected low-flow frequency statistics and harmonic mean flows for streams in Iowa

    USGS Publications Warehouse

    Eash, David A.; Barnes, Kimberlee K.

    2017-01-01

    A statewide study was conducted to develop regression equations for estimating six selected low-flow frequency statistics and harmonic mean flows for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include: the annual 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years, the annual 30-day mean low flow for a recurrence interval of 5 years, and the seasonal (October 1 through December 31) 1- and 7-day mean low flows for a recurrence interval of 10 years. Estimation equations also were developed for the harmonic-mean-flow statistic. Estimates of these seven selected statistics are provided for 208 U.S. Geological Survey continuous-record streamgages using data through September 30, 2006. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Because trend analyses indicated statistically significant positive trends when considering the entire period of record for the majority of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. The median number of years of record used to compute each of these seven selected statistics was 35. Geographic information system software was used to measure 54 selected basin characteristics for each streamgage. Following the removal of two streamgages from the initial data set, data collected for 206 streamgages were compiled to investigate three approaches for regionalization of the seven selected statistics. Regionalization, a process using statistical regression analysis, provides a relation for efficiently transferring information from a group of streamgages in a region to ungaged sites in the region. The three regionalization approaches tested included statewide, regional, and region-of-influence regressions. 
For the regional regression, the study area was divided into three low-flow regions on the basis of hydrologic characteristics, landform regions, and soil regions. A comparison of root mean square errors and average standard errors of prediction for the statewide, regional, and region-of-influence regressions determined that the regional regression provided the best estimates of the seven selected statistics at ungaged sites in Iowa. Because a significant number of streams in Iowa reach zero flow as their minimum flow during low-flow years, four different types of regression analyses were used: left-censored, logistic, generalized-least-squares, and weighted-least-squares regression. A total of 192 streamgages were included in the development of 27 regression equations for the three low-flow regions. For the northeast and northwest regions, a censoring threshold was used to develop 12 left-censored regression equations to estimate the 6 low-flow frequency statistics for each region. For the southern region a total of 12 regression equations were developed; 6 logistic regression equations were developed to estimate the probability of zero flow for the 6 low-flow frequency statistics and 6 generalized least-squares regression equations were developed to estimate the 6 low-flow frequency statistics, if nonzero flow is estimated first by use of the logistic equations. A weighted-least-squares regression equation was developed for each region to estimate the harmonic-mean-flow statistic. Average standard errors of estimate for the left-censored equations for the northeast region range from 64.7 to 88.1 percent and for the northwest region range from 85.8 to 111.8 percent. Misclassification percentages for the logistic equations for the southern region range from 5.6 to 14.0 percent. 
Average standard errors of prediction for generalized least-squares equations for the southern region range from 71.7 to 98.9 percent and pseudo coefficients of determination for the generalized-least-squares equations range from 87.7 to 91.8 percent. Average standard errors of prediction for weighted-least-squares equations developed for estimating the harmonic-mean-flow statistic for each of the three regions range from 66.4 to 80.4 percent. The regression equations are applicable only to stream sites in Iowa with low flows not significantly affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. If the equations are used at ungaged sites on regulated streams, or on streams affected by water-supply and agricultural withdrawals, then the estimates will need to be adjusted by the amount of regulation or withdrawal to estimate the actual flow conditions if that is of interest. Caution is advised when applying the equations for basins with characteristics near the applicable limits of the equations and for basins located in karst topography. A test of two drainage-area ratio methods using 31 pairs of streamgages, for the annual 7-day mean low-flow statistic for a recurrence interval of 10 years, indicates a weighted drainage-area ratio method provides better estimates than regional regression equations for an ungaged site on a gaged stream in Iowa when the drainage-area ratio is between 0.5 and 1.4. These regression equations will be implemented within the U.S. Geological Survey StreamStats web-based geographic-information-system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the seven selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided. 
    StreamStats also allows users to click on any streamgage in Iowa to obtain the estimates of these seven selected statistics computed for that streamgage.
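
    For the southern region, the abstract describes a two-step estimate: a logistic equation gives the probability of zero flow, and a generalized-least-squares equation supplies the low-flow statistic when nonzero flow is estimated. The Python sketch below illustrates that control flow only. The single predictor (log10 of drainage area), every coefficient, and the 0.5 probability cutoff are hypothetical placeholders, not values from the report.

```python
import math

def two_step_low_flow(drainage_area, logistic_coef, regression_coef, cutoff=0.5):
    """Two-step low-flow estimate (illustrative sketch).

    Step 1: logistic equation for the probability of zero flow.
    Step 2: log-linear regression equation, used only if nonzero flow is estimated.
    All coefficients and the cutoff are hypothetical.
    """
    log_area = math.log10(drainage_area)
    a0, a1 = logistic_coef
    p_zero = 1.0 / (1.0 + math.exp(-(a0 + a1 * log_area)))
    if p_zero >= cutoff:
        return 0.0
    b0, b1 = regression_coef
    return 10.0 ** (b0 + b1 * log_area)

# Hypothetical coefficients chosen only so that small basins tend toward zero flow.
logistic_coef = (5.0, -3.0)
regression_coef = (-2.0, 1.0)
print(two_step_low_flow(10.0, logistic_coef, regression_coef))    # small basin
print(two_step_low_flow(1000.0, logistic_coef, regression_coef))  # large basin
```

    The actual equations use basin characteristics selected in the study and should be taken from the report or from StreamStats rather than from this sketch.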

  7. Renal Protective Role of Xiexin Decoction with Multiple Active Ingredients Involves Inhibition of Inflammation through Downregulation of the Nuclear Factor-κB Pathway in Diabetic Rats

    PubMed Central

    Wu, Jia-sheng; Shi, Rong; Zhong, Jie; Lu, Xiong; Ma, Bing-liang; Wang, Tian-ming; Zan, Bin; Ma, Yue-ming; Cheng, Neng-neng; Qiu, Fu-rong

    2013-01-01

    In Chinese medicine, Xiexin decoction (XXD) has been used for the clinical treatment of diabetes for at least 1700 years. The present study was conducted to investigate the effective ingredients of XXD and their molecular mechanisms of antidiabetic nephropathy in rats. Rats with diabetes induced by high-fat diet and streptozotocin were treated with XXD extract for 12 weeks. XXD significantly improved the glucolipid metabolism disorder, attenuated albuminuria and renal pathological changes, reduced renal advanced glycation end-products, inhibited receptor for advanced glycation end-product and inflammation factors expression, suppressed renal nuclear factor-κB pathway activity, and downregulated renal transforming growth factor-β1. The concentrations of multiple components in plasma from XXD were determined by liquid chromatography and tandem mass spectrometry. Pharmacokinetic/pharmacodynamic analysis using partial least square regression revealed that 8 ingredients of XXD were responsible for renal protective effects via actions on multiple molecular targets. Our study suggests that the renal protective role of XXD with multiple effective ingredients involves inhibition of inflammation through downregulation of the nuclear factor-κB pathway, reducing renal advanced glycation end-products and receptor for advanced glycation end-product in diabetic rats. PMID:23935673

  8. Sample Size Calculation for Estimating or Testing a Nonzero Squared Multiple Correlation Coefficient

    ERIC Educational Resources Information Center

    Krishnamoorthy, K.; Xia, Yanping

    2008-01-01

    The problems of hypothesis testing and interval estimation of the squared multiple correlation coefficient of a multivariate normal distribution are considered. It is shown that available one-sided tests are uniformly most powerful, and the one-sided confidence intervals are uniformly most accurate. An exact method of calculating sample size to…

  9. The study on the near infrared spectrum technology of sauce component analysis

    NASA Astrophysics Data System (ADS)

    Li, Shangyu; Zhang, Jun; Chen, Xingdan; Liang, Jingqiu; Wang, Ce

    2006-01-01

    The author, Shangyu Li, works in product quality supervision and inspection. In soy sauce manufacturing, quality control of intermediate and final products requires measuring many components, such as total nitrogen, saltless soluble solids, nitrogen of amino acids and total acid. Wet chemistry analytical methods need much labor and time for these analyses. To address this problem, we used near infrared spectroscopy to measure the chemical composition of soy sauce. In the course of the work, a certain amount of soy sauce was collected and analyzed by wet chemistry analytical methods. The soy sauce was scanned with two kinds of spectrometer: a Fourier transform near infrared (FT-NIR) spectrometer and a filter near infrared spectroscopy analyzer. The near infrared spectra of soy sauce were calibrated against the component values from the wet chemistry methods by partial least squares regression and stepwise multiple linear regression. The contents of saltless soluble solids, total nitrogen, total acid and nitrogen of amino acids were predicted by cross validation. The results were compared with those of the wet chemistry analytical methods. The correlation coefficient and root-mean-square error of prediction (RMSEP) in the better prediction run were 0.961 and 0.206 for total nitrogen, 0.913 and 1.215 for saltless soluble solids, 0.855 and 0.199 for nitrogen of amino acids, and 0.966 and 0.231 for total acid, respectively. The results presented here demonstrate that NIR spectroscopy is promising for fast and reliable determination of the major components of soy sauce.

  10. Hierarchical cluster-based partial least squares regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models.

    PubMed

    Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald

    2011-06-01

    Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. 
HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.

  11. Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) is an efficient tool for metamodelling of nonlinear dynamic models

    PubMed Central

    2011-01-01

    Background Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. 
Conclusions HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems. PMID:21627852

  12. Modeling Aboveground Biomass in Hulunber Grassland Ecosystem by Using Unmanned Aerial Vehicle Discrete Lidar

    PubMed Central

    Wang, Dongliang; Xin, Xiaoping; Shao, Quanqin; Brolly, Matthew; Zhu, Zhiliang; Chen, Jin

    2017-01-01

    Accurate canopy structure datasets, including canopy height and fractional cover, are required to monitor aboveground biomass as well as to provide validation data for satellite remote sensing products. In this study, the ability of an unmanned aerial vehicle (UAV) discrete light detection and ranging (lidar) system was investigated for modeling both the canopy height and fractional cover in the Hulunber grassland ecosystem. The extracted mean canopy height, maximum canopy height, and fractional cover were used to estimate the aboveground biomass. The influences of flight height on lidar estimates were also analyzed. The main findings are: (1) the lidar-derived mean canopy height is the most reasonable predictor of aboveground biomass (R² = 0.340, root-mean-square error (RMSE) = 81.89 g·m⁻², and relative error of 14.1%). Adding fractional cover in a multiple regression yields no obvious improvement in the R² and RMSE values, since the correlation between mean canopy height and fractional cover is high; (2) flight height has a pronounced effect on the derived fractional cover and details of the lidar data, but the effect is insignificant on the derived canopy height when the flight height is within the range (<100 m). These findings are helpful for modeling stable regressions to estimate grassland biomass using lidar returns. PMID:28106819

  13. Modeling Aboveground Biomass in Hulunber Grassland Ecosystem by Using Unmanned Aerial Vehicle Discrete Lidar.

    PubMed

    Wang, Dongliang; Xin, Xiaoping; Shao, Quanqin; Brolly, Matthew; Zhu, Zhiliang; Chen, Jin

    2017-01-19

    Accurate canopy structure datasets, including canopy height and fractional cover, are required to monitor aboveground biomass as well as to provide validation data for satellite remote sensing products. In this study, the ability of an unmanned aerial vehicle (UAV) discrete light detection and ranging (lidar) system was investigated for modeling both the canopy height and fractional cover in the Hulunber grassland ecosystem. The extracted mean canopy height, maximum canopy height, and fractional cover were used to estimate the aboveground biomass. The influences of flight height on lidar estimates were also analyzed. The main findings are: (1) the lidar-derived mean canopy height is the most reasonable predictor of aboveground biomass (R² = 0.340, root-mean-square error (RMSE) = 81.89 g·m⁻², and relative error of 14.1%). Adding fractional cover in a multiple regression yields no obvious improvement in the R² and RMSE values, since the correlation between mean canopy height and fractional cover is high; (2) flight height has a pronounced effect on the derived fractional cover and details of the lidar data, but the effect is insignificant on the derived canopy height when the flight height is within the range (<100 m). These findings are helpful for modeling stable regressions to estimate grassland biomass using lidar returns.

  14. Applicability of Monte Carlo cross validation technique for model development and validation using generalised least squares regression

    NASA Astrophysics Data System (ADS)

    Haddad, Khaled; Rahman, Ataur; A Zaman, Mohammad; Shrestha, Surendra

    2013-03-01

    In regional hydrologic regression analysis, model selection and validation are regarded as important steps. Here, the model selection is usually based on some measurements of goodness-of-fit between the model prediction and observed data. In Regional Flood Frequency Analysis (RFFA), leave-one-out (LOO) validation or a fixed percentage leave out validation (e.g., 10%) is commonly adopted to assess the predictive ability of regression-based prediction equations. This paper develops a Monte Carlo Cross Validation (MCCV) technique (which has widely been adopted in Chemometrics and Econometrics) in RFFA using Generalised Least Squares Regression (GLSR) and compares it with the most commonly adopted LOO validation approach. The study uses simulated and regional flood data from the state of New South Wales in Australia. It is found that when developing hydrologic regression models, application of the MCCV is likely to result in a more parsimonious model than the LOO. It has also been found that the MCCV can provide a more realistic estimate of a model's predictive ability when compared with the LOO.
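
    The difference between the two validation schemes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's procedure: a one-predictor ordinary least squares line stands in for the generalised least squares model, and the function names, the 30% hold-out fraction, the 200 random splits, and the data are all hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares line y = a + b*x; returns (a, b)."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    sxx = sum((xi - xb) ** 2 for xi in x)
    b = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / sxx
    return yb - b * xb, b

def mse_on(train, test, x, y):
    """Fit on the training indices, return mean squared error on the test indices."""
    a, b = fit_line([x[i] for i in train], [y[i] for i in train])
    return sum((y[i] - (a + b * x[i])) ** 2 for i in test) / len(test)

def loo_cv(x, y):
    """Leave-one-out: each observation is held out exactly once."""
    n = len(x)
    return sum(mse_on([j for j in range(n) if j != i], [i], x, y)
               for i in range(n)) / n

def mccv(x, y, n_splits=200, test_fraction=0.3, seed=0):
    """Monte Carlo cross validation: many random train/test splits."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(1, round(test_fraction * n))
    total = 0.0
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        total += mse_on(idx[n_test:], idx[:n_test], x, y)
    return total / n_splits

# Made-up data: a noisy linear relation.
noise = random.Random(1)
x = [float(i) for i in range(12)]
y = [2.0 + 0.5 * xi + noise.gauss(0.0, 0.3) for xi in x]
print(loo_cv(x, y), mccv(x, y))
```

    Because MCCV holds out a larger share of the data in each split, each fit is tested on observations it has seen less of, which is one reason it can penalize over-parameterized models more strongly than LOO.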

  15. Assessment of parametric uncertainty for groundwater reactive transport modeling

    USGS Publications Warehouse

    Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun

    2014-01-01

    The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. 
The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.

  16. REX FORTRAN 4 system for combinatorial screening or conventional analysis of multivariate regressions

    Treesearch

    L.R. Grosenbaugh

    1967-01-01

    Describes an expansible computerized system that provides data needed in regression or covariance analysis of as many as 50 variables, 8 of which may be dependent. Alternatively, it can screen variously generated combinations of independent variables to find the regression with the smallest mean-squared-residual, which will be fitted if desired. The user can easily...

  17. Pattern variation of fish fingerling abundance in the Na Thap Tidal river of Southern Thailand: 2005-2015

    NASA Astrophysics Data System (ADS)

    Donroman, T.; Chesoh, S.; Lim, A.

    2018-04-01

    This study aimed to investigate the variation patterns of fish fingerling abundance by month, year and sampling site. Monthly data for the Na Thap tidal river of southern Thailand were collected from June 2005 to October 2015. A square root transformation was applied to bring the fingerling data closer to normality. Factor analysis was applied to group the fingerling species, and multiple linear regression was used to examine the association between fingerling density and year, month and site. Factor analysis classified the fingerlings into 3 factors based on salinity preference: saline water, freshwater and ubiquitous species. The results showed a highly statistically significant relation between fingerling density and month, year and site. The densities of saline water and ubiquitous fingerlings showed similar patterns. The downstream site had the highest fingerling density, whereas almost all freshwater fingerlings occurred upstream. This finding confirmed that factor analysis and the general linear regression method can be used as effective tools for predicting and monitoring wild fingerling density in order to sustain fish stock management.

  18. On the interannual oscillations in the northern temperate total ozone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krzyscin, J.W.

    1994-07-01

    The interannual variations in total ozone are studied using revised Dobson total ozone records (1961-1990) from 17 stations located within the latitude band 30 deg N - 60 deg N. To obtain the quasi-biennial oscillation (QBO), El Nino-Southern Oscillation (ENSO), and 11-year solar cycle manifestation in the 'northern temperate' total ozone data, various multiple regression models are constructed by least squares fitting to the observed ozone. The statistical relationships between the selected indices of the atmospheric variabilities and total ozone are described in the linear and nonlinear regression models. Nonlinear relationships to the predictor variables are found. That is, the total ozone variations are statistically modeled by nonlinear terms accounting for the coupling between QBO and ENSO, QBO and solar activity, and ENSO and solar activity. It is suggested that a large reduction of total ozone values over the 'northern temperate' region occurs in the cold season when a strong ENSO warm event meets the west phase of the QBO during the period of high solar activity.
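
    A regression with coupling terms of the kind described can be sketched as an ordinary least squares fit whose design matrix contains pairwise products of the indices. The Python below is an illustration only: the product form of the coupling terms is an assumption (the abstract does not give the functional form), the index series are synthetic random numbers, and the coefficients are invented so that the fit can be checked against known values.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_least_squares(rows, y):
    """Least squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def design_row(qbo, enso, solar):
    """Intercept, three linear terms, and three pairwise coupling terms."""
    return [1.0, qbo, enso, solar, qbo * enso, qbo * solar, enso * solar]

# Synthetic indices and invented coefficients, for illustration only.
rng = random.Random(0)
true_beta = [300.0, -4.0, 2.0, 3.0, -1.5, 0.8, -0.6]
indices = [(rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(-1, 1))
           for _ in range(60)]
rows = [design_row(*idx) for idx in indices]
ozone = [sum(b * v for b, v in zip(true_beta, r)) for r in rows]
beta_hat = fit_least_squares(rows, ozone)
print([round(b, 6) for b in beta_hat])
```

    The model stays linear in its coefficients, so least squares fitting applies unchanged; only the dependence on the indices is nonlinear.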

  19. Peak-flow characteristics of Wyoming streams

    USGS Publications Warehouse

    Miller, Kirk A.

    2003-01-01

    Peak-flow characteristics for unregulated streams in Wyoming are described in this report. Frequency relations for annual peak flows through water year 2000 at 364 streamflow-gaging stations in and near Wyoming were evaluated and revised or updated as needed. Analyses of historical floods, temporal trends, and generalized skew were included in the evaluation. Physical and climatic basin characteristics were determined for each gaging station using a geographic information system. Gaging stations with similar peak-flow and basin characteristics were grouped into six hydrologic regions. Regional statistical relations between peak-flow and basin characteristics were explored using multiple-regression techniques. Generalized least squares regression equations for estimating magnitudes of annual peak flows with selected recurrence intervals from 1.5 to 500 years were developed for each region. Average standard errors of estimate range from 34 to 131 percent. Average standard errors of prediction range from 35 to 135 percent. Several statistics for evaluating and comparing the errors in these estimates are described. Limitations of the equations are described. Methods for applying the regional equations for various circumstances are listed and examples are given.

  20. Evaluation of three statistical prediction models for forensic age prediction based on DNA methylation.

    PubMed

    Smeers, Inge; Decorte, Ronny; Van de Voorde, Wim; Bekaert, Bram

    2018-05-01

    DNA methylation is a promising biomarker for forensic age prediction. A challenge that has emerged in recent studies is the fact that prediction errors become larger with increasing age due to interindividual differences in epigenetic ageing rates. This phenomenon of non-constant variance or heteroscedasticity violates an assumption of the often used method of ordinary least squares (OLS) regression. The aim of this study was to evaluate alternative statistical methods that do take heteroscedasticity into account in order to provide more accurate, age-dependent prediction intervals. A weighted least squares (WLS) regression is proposed as well as a quantile regression model. Their performances were compared against an OLS regression model based on the same dataset. Both models provided age-dependent prediction intervals which account for the increasing variance with age, but WLS regression performed better in terms of success rate in the current dataset. However, quantile regression might be a preferred method when dealing with a variance that is not only non-constant, but also not normally distributed. Ultimately the choice of which model to use should depend on the observed characteristics of the data. Copyright © 2018 Elsevier B.V. All rights reserved.
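
    The contrast between OLS and WLS under heteroscedasticity can be sketched for a single predictor. The Python below is a simplified illustration, not the study's model: the function names, the marker values, the ages, and the assumed error standard deviations are all made up, and the interval helper ignores uncertainty in the fitted coefficients.

```python
def wls_line(x, y, weights):
    """Fit y = a + b*x by weighted least squares; returns (a, b).

    With all weights equal this reduces to ordinary least squares.
    """
    sw = sum(weights)
    xb = sum(w * xi for w, xi in zip(weights, x)) / sw
    yb = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxy = sum(w * (xi - xb) * (yi - yb) for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - xb) ** 2 for w, xi in zip(weights, x))
    b = sxy / sxx
    return yb - b * xb, b

def prediction_interval(x0, a, b, sd_at_x0, z=1.96):
    """Approximate interval a + b*x0 +/- z*sd; the sd is supplied by the caller."""
    centre = a + b * x0
    return centre - z * sd_at_x0, centre + z * sd_at_x0

# Made-up data: one methylation-like marker, with error SD assumed to grow with age.
marker = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
age = [18.0, 31.0, 38.0, 52.0, 57.0, 75.0]
assumed_sd = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

a_ols, b_ols = wls_line(marker, age, [1.0] * len(age))
a_wls, b_wls = wls_line(marker, age, [1.0 / s ** 2 for s in assumed_sd])
print(prediction_interval(0.15, a_wls, b_wls, 2.5))
print(prediction_interval(0.55, a_wls, b_wls, 6.5))
```

    Weighting each observation by the inverse of its error variance down-weights the noisier older samples, and supplying an age-dependent standard deviation yields intervals that widen with age, which is the behaviour the abstract describes.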

  1. Paleotemperature reconstruction from mammalian phosphate δ18O records - an alternative view on data processing

    NASA Astrophysics Data System (ADS)

    Skrzypek, Grzegorz; Sadler, Rohan; Wiśniewski, Andrzej

    2017-04-01

    The stable oxygen isotope composition of phosphates (δ18O) extracted from mammalian bone and teeth material is commonly used as a proxy for paleotemperature. Historically, several different analytical and statistical procedures for determining air paleotemperatures from the measured δ18O of phosphates have been applied. This inconsistency in both stable isotope data processing and the application of statistical procedures has led to large and unwanted differences between calculated results. This study presents the uncertainty associated with two of the most commonly used regression methods: least squares inverted fit and transposed fit. We assessed the performance of these methods by designing and applying calculation experiments to multiple real-life data sets, calculating in reverse temperatures, and comparing them with true recorded values. Our calculations clearly show that the mean absolute errors are always substantially higher for the inverted fit (a causal model), with the transposed fit (a predictive model) returning mean values closer to the measured values (Skrzypek et al. 2015). The predictive models always performed better than causal models, with 12-65% lower mean absolute errors. Moreover, the least-squares regression (LSM) model is more appropriate than Reduced Major Axis (RMA) regression for calculating the environmental water stable oxygen isotope composition from phosphate signatures, as well as for calculating air temperature from the δ18O value of environmental water. The transposed fit introduces a lower overall error than the inverted fit for both the δ18O of environmental water and Tair calculations; therefore, the predictive models are more statistically efficient than the causal models in this instance. The direct comparison of paleotemperature results from different laboratories and studies may only be achieved if a single method of calculation is applied. Reference Skrzypek G., Sadler R., Wiśniewski A., 2016. 
Reassessment of recommendations for processing mammal phosphate δ18O data for paleotemperature reconstruction. Palaeogeography, Palaeoclimatology, Palaeoecology 446, 162-167.

  2. Prediction of Baseflow Index of Catchments using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Yadav, B.; Hatfield, K.

    2017-12-01

    We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from the daily streamflow hydrograph using the HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, land use, climate data, and other publicly available ancillary and geospatial data. 80% of the catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. 
The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
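
    The hyperparameter tuning step described above (k-fold cross-validation with an exhaustive grid search) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes a single predictor, a ridge model without intercept, and a hypothetical penalty grid; the helper names are made up for the example.

```python
def ridge_slope(x, y, lam):
    """1-D ridge without intercept: minimizes sum((y - b*x)^2) + lam * b^2."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)


def kfold_indices(n, k):
    """Assign index i to fold i % k, so every point is held out exactly once."""
    return [[i for i in range(n) if i % k == f] for f in range(k)]


def cv_mse(x, y, lam, k):
    """Mean squared error on held-out folds for one penalty value."""
    total = 0.0
    for test_idx in kfold_indices(len(x), k):
        held = set(test_idx)
        x_tr = [x[i] for i in range(len(x)) if i not in held]
        y_tr = [y[i] for i in range(len(y)) if i not in held]
        b = ridge_slope(x_tr, y_tr, lam)
        total += sum((y[i] - b * x[i]) ** 2 for i in test_idx)
    return total / len(x)


def grid_search(x, y, grid, k=5):
    """Exhaustive search: return the penalty with the lowest cross-validated MSE."""
    return min(grid, key=lambda lam: cv_mse(x, y, lam, k))


# Hypothetical toy data with a fixed noise pattern.
x = list(range(1, 11))
y = [3 * xi + (0.5 if i % 2 else -0.5) for i, xi in enumerate(x)]
best = grid_search(x, y, grid=[0.0, 0.1, 1.0, 10.0, 100.0])
print("selected penalty:", best)
```

    In practice the same search would be run separately for each model family, and the held-out 20% test set would be touched only after the penalty has been chosen.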

  3. Near-infrared Spectroscopy as a Process Analytical Technology Tool for Monitoring the Parching Process of Traditional Chinese Medicine Based on Two Kinds of Chemical Indicators.

    PubMed

    Li, Kaiyue; Wang, Weiying; Liu, Yanping; Jiang, Su; Huang, Guo; Ye, Liming

    2017-01-01

    The active ingredients, and thus the pharmacological efficacy, of traditional Chinese medicine (TCM) vary greatly with the degree of parching. Near-infrared spectroscopy (NIR) was used to develop a new method for rapid online analysis of the TCM parching process, using two kinds of chemical indicators (5-(hydroxymethyl) furfural [5-HMF] content and 420 nm absorbance) as reference values; both change observably in most TCM parching processes. Three representative TCMs, Areca (Areca catechu L.), Malt (Hordeum vulgare L.), and Hawthorn (Crataegus pinnatifida Bge.), were used in this study. With partial least squares regression, calibration models of NIR were generated based on two kinds of reference values, i.e. 5-HMF contents measured by high-performance liquid chromatography (HPLC) and 420 nm absorbance measured by ultraviolet-visible spectroscopy (UV/Vis), respectively. In the optimized models for 5-HMF, the root mean square errors of prediction (RMSEP) for Areca, Malt, and Hawthorn were 0.0192, 0.0301, and 0.2600 and the correlation coefficients (R cal) were 99.86%, 99.88%, and 99.88%, respectively. Moreover, in the optimized models using 420 nm absorbance as reference values, the RMSEP values for Areca, Malt, and Hawthorn were 0.0229, 0.0096, and 0.0409 and R cal were 99.69%, 99.81%, and 99.62%, respectively. NIR models with 5-HMF content and 420 nm absorbance as reference values can rapidly and effectively identify three kinds of TCM in different parching processes. This method has great promise to replace current subjective color judgment and time-consuming HPLC or UV/Vis methods and is suitable for rapid online analysis and quality control in TCM industrial manufacturing processes. 
Near-infrared spectroscopy (NIR) was used to develop a new method for online analysis of the traditional Chinese medicine (TCM) parching process. Calibration and validation models of Areca, Malt, and Hawthorn were generated by partial least squares regression using 5-(hydroxymethyl) furfural contents and 420 nm absorbance as reference values, respectively, which were the main indicator components during the parching process of most TCM. The established NIR models of three TCMs had low root mean square errors of prediction and high correlation coefficients. The NIR method has great promise for use in TCM industrial manufacturing processes for rapid online analysis and quality control. Abbreviations used: NIR: Near-infrared Spectroscopy; TCM: Traditional Chinese medicine; Areca: Areca catechu L.; Hawthorn: Crataegus pinnatifida Bge.; Malt: Hordeum vulgare L.; 5-HMF: 5-(hydroxymethyl) furfural; PLS: Partial least squares; D: Dimension faction; SLS: Straight line subtraction; MSC: Multiplicative scatter correction; VN: Vector normalization; RMSECV: Root mean square errors of cross-validation; RMSEP: Root mean square errors of validation; R cal: Correlation coefficients; RPD: Residual predictive deviation; PAT: Process analytical technology; FDA: Food and Drug Administration; ICH: International Conference on Harmonization of Technical Requirements for Registration of Pharmaceuticals for Human Use.

  4. A statistical methodology for estimating transport parameters: Theory and applications to one-dimensional advective-dispersive systems

    USGS Publications Warehouse

    Wagner, Brian J.; Gorelick, Steven M.

    1986-01-01

    A simulation nonlinear multiple-regression methodology for estimating parameters that characterize the transport of contaminants is developed and demonstrated. Finite difference contaminant transport simulation is combined with a nonlinear weighted least squares multiple-regression procedure. The technique provides optimal parameter estimates and gives statistics for assessing the reliability of these estimates under certain general assumptions about the distributions of the random measurement errors. Monte Carlo analysis is used to estimate parameter reliability for a hypothetical homogeneous soil column for which concentration data contain large random measurement errors. The value of data collected spatially versus data collected temporally was investigated for estimation of velocity, dispersion coefficient, effective porosity, first-order decay rate, and zero-order production. The use of spatial data gave estimates that were 2–3 times more reliable than estimates based on temporal data for all parameters except velocity. Comparison of estimated linear and nonlinear confidence intervals based upon Monte Carlo analysis showed that the linear approximation is poor for dispersion coefficient and zero-order production coefficient when data are collected over time. In addition, examples demonstrate transport parameter estimation for two real one-dimensional systems. First, the longitudinal dispersivity and effective porosity of an unsaturated soil are estimated using laboratory column data. We compare the reliability of estimates based upon data from individual laboratory experiments versus estimates based upon pooled data from several experiments. Second, the simulation nonlinear regression procedure is extended to include an additional governing equation that describes delayed storage during contaminant transport. The model is applied to analyze the trends, variability, and interrelationship of parameters in a mountain stream in northern California.

  5. Brain networks of temporal preparation: A multiple regression analysis of neuropsychological data.

    PubMed

    Triviño, Mónica; Correa, Ángel; Lupiáñez, Juan; Funes, María Jesús; Catena, Andrés; He, Xun; Humphreys, Glyn W

    2016-11-15

    There are only a few studies on the brain networks involved in the ability to prepare in time, and most of them followed a correlational rather than a neuropsychological approach. The present neuropsychological study performed multiple regression analysis to address the relationship between both grey and white matter (measured by magnetic resonance imaging in patients with brain lesion) and different effects in temporal preparation (Temporal orienting, Foreperiod and Sequential effects). Two versions of a temporal preparation task were administered to a group of 23 patients with acquired brain injury. In one task, the cue presented (a red versus green square) to inform participants about the time of appearance (early versus late) of a target stimulus was blocked, while in the other task the cue was manipulated on a trial-by-trial basis. The duration of the cue-target time intervals (400 versus 1400 ms) was always manipulated within blocks in both tasks. Regression analyses were conducted between either the grey matter lesion size or the white matter tracts disconnection and the three temporal preparation effects separately. The main finding was that each temporal preparation effect was predicted by a different network of structures, depending on cue expectancy. Specifically, the Temporal orienting effect was related to both prefrontal and temporal brain areas. The Foreperiod effect was related to right and left prefrontal structures. Sequential effects were predicted by both parietal cortex and left subcortical structures. These findings show a clear dissociation of brain circuits involved in the different ways to prepare in time, showing for the first time the involvement of temporal areas in the Temporal orienting effect, as well as the parietal cortex in the Sequential effects. Copyright © 2016 Elsevier Inc. All rights reserved.

  6. Production of deerbrush and mountain whitethorn related to shrub volume and overstory crown closure

    Treesearch

    John G. Kie

    1985-01-01

    Annual production by deerbrush (Ceanothus integerrimus) and mountain whitethorn shrubs (C. cordulatus) in the south-central Sierra Nevada of California was related to shrub volume, volume squared, and overstory crown closure by regression models. Production increased as shrub volume and volume squared increased, and decreased as...

  7. Relationship between serum bilirubin concentrations and diabetic nephropathy in Shanghai Han's patients with type 1 diabetes mellitus.

    PubMed

    Li, Xu; Zhang, Lei; Chen, Haibing; Guo, Kaifeng; Yu, Haoyong; Zhou, Jian; Li, Ming; Li, Qing; Li, Lianxi; Yin, Jun; Liu, Fang; Bao, Yuqian; Han, Junfeng; Jia, Weiping

    2017-03-31

    Recent studies highlight a negative association between total bilirubin concentrations and albuminuria in patients with type 2 diabetes mellitus. Our study evaluated the relationship between bilirubin concentrations and the prevalence of diabetic nephropathy (DN) in Chinese patients with type 1 diabetes mellitus (T1DM). A total of 258 patients with T1DM were recruited and bilirubin concentrations were compared between patients with or without diabetic nephropathy. Multiple stepwise regression analysis was used to examine the relationship between bilirubin concentrations and 24 h urinary microalbumin. Binary logistic regression analysis was performed to assess independent risk factors for diabetic nephropathy. Participants were divided into four groups according to the quartile of total bilirubin concentrations (Q1, 0.20-0.60; Q2, 0.60-0.80; Q3, 0.80-1.00; Q4, 1.00-1.90 mg/dL) and the chi-square test was used to compare the prevalence of DN in patients with T1DM. The median bilirubin level was 0.56 (interquartile: 0.43-0.68 mg/dL) in the DN group, significantly lower than in the non-DN group (0.70 [interquartile: 0.58-0.89 mg/dL], P < 0.001). Spearman's correlational analysis showed bilirubin concentrations were inversely correlated with 24 h urinary microalbumin (r = -0.13, P < 0.05) and multiple stepwise regression analysis showed bilirubin concentrations were independently associated with 24 h urinary microalbumin. In logistic regression analysis, bilirubin concentrations were significantly inversely associated with nephropathy. In addition, in stratified analysis, from the first to the fourth quartile group, increased bilirubin concentrations were associated with decreased prevalence of DN from 21.90% to 2.00%. High bilirubin concentrations are independently and negatively associated with albuminuria and the prevalence of DN in patients with T1DM.

  8. Post-processing through linear regression

    NASA Astrophysics Data System (ADS)

    van Schaeybroeck, B.; Vannitsem, S.

    2011-03-01

    Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
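
    The "optimal variability" criterion can be illustrated with the classic one-predictor case. The sketch below is an assumption-laden toy, not the paper's scheme: it uses the textbook single-predictor geometric-mean (reduced major axis) slope, sign(r)·sd(obs)/sd(forecast), rather than the new multi-predictor GM method, and the forecast and observation values are made up. An OLS-corrected forecast has variance r² times the observed variance, whereas the geometric-mean slope reproduces the observed variance.

```python
from math import copysign, sqrt


def _moments(x, y):
    """Means, variances and covariance (population form) of two series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def fit(x, y, method="ols"):
    """Return (intercept, slope) of y on x for OLS or geometric-mean regression."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    if method == "ols":
        slope = sxy / sxx
    elif method == "gm":
        slope = copysign(sqrt(syy / sxx), sxy)  # sign of the correlation, ratio of sds
    else:
        raise ValueError("method must be 'ols' or 'gm'")
    return my - slope * mx, slope


def variance(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)


# Hypothetical forecast / observation pairs.
forecast = [1.0, 2.0, 3.0, 4.0, 5.0]
obs = [1.2, 1.9, 3.4, 3.8, 5.1]

for method in ("ols", "gm"):
    a, b = fit(forecast, obs, method)
    corrected = [a + b * f for f in forecast]
    print(method, "variance of corrected forecast:", variance(corrected))
print("variance of observations:", variance(obs))
```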

  9. Synchronization of tunable asymmetric square-wave pulses in delay-coupled optoelectronic oscillators.

    PubMed

    Martínez-Llinàs, Jade; Colet, Pere; Erneux, Thomas

    2015-03-01

    We consider a model for two delay-coupled optoelectronic oscillators under positive delayed feedback as prototypical to study the conditions for synchronization of asymmetric square-wave oscillations, for which the duty cycle is not half of the period. We show that the scenario arising for positive feedback is much richer than with negative feedback. First, it allows for the coexistence of multiple in- and out-of-phase asymmetric periodic square waves for the same parameter values. Second, it is tunable: The period of all the square-wave periodic pulses can be tuned with the ratio of the delays, and the duty cycle of the asymmetric square waves can be changed with the offset phase while the total period remains constant. Finally, in addition to the multiple in- and out-of-phase periodic square waves, low-frequency periodic asymmetric solutions oscillating in phase may coexist for the same values of the parameters. Our analytical results are in agreement with numerical simulations and bifurcation diagrams obtained by using continuation techniques.

  10. Psychological effects of dog ownership: role strain, role enhancement, and depression.

    PubMed

    Cline, Krista Marie Clark

    2010-01-01

    The purpose of this study is to examine the link between multiple roles and depression and to attempt to provide a clearer answer to the question of what effect, if any, the role of dog ownership plays. Role strain and role enhancement theories are drawn upon to study this relationship. Ordinary least squares regression is used to examine a national sample of 201 adults in the United States. Findings revealed sex and marital status differences in the relationship between dog ownership and well-being, with women and single adults more likely to benefit from dog ownership. The findings presented here suggest that inattention to variations in marital status and sex may have been one factor in the inconsistency in the literature on pets and well-being.

  11. In situ Raman spectroscopy for simultaneous monitoring of multiple process parameters in mammalian cell culture bioreactors.

    PubMed

    Whelan, Jessica; Craven, Stephen; Glennon, Brian

    2012-01-01

    In this study, the application of Raman spectroscopy to the simultaneous quantitative determination of glucose, glutamine, lactate, ammonia, glutamate, total cell density (TCD), and viable cell density (VCD) in a CHO fed-batch process was demonstrated in situ in 3 L and 15 L bioreactors. Spectral preprocessing and partial least squares (PLS) regression were used to correlate spectral data with off-line reference data. Separate PLS calibration models were developed for each analyte at the 3 L laboratory bioreactor scale before assessing its transferability to the same bioprocess conducted at the 15 L pilot scale. PLS calibration models were successfully developed for all analytes bar VCD and transferred to the 15 L scale. Copyright © 2012 American Institute of Chemical Engineers (AIChE).

  12. More caregiving, less working: caregiving roles and gender difference.

    PubMed

    Lee, Yeonjung; Tang, Fengyan

    2015-06-01

    This study examined the relationship of caregiving roles to labor force participation using the nationally representative data from the Health and Retirement Study. The sample was composed of men and women aged 50 to 61 years (N = 5,119). Caregiving roles included caregiving for spouse, parents, and grandchildren; a summary of three caregiving roles was used to indicate multiple caregiving roles. Bivariate analysis using chi-square and t tests and binary logistic regression models were applied. Results show that women caregivers for parents and/or grandchildren were less likely to be in the labor force than non-caregivers and that caregiving responsibility was not related to labor force participation for the sample of men. Findings have implication for supporting family caregivers, especially women, to balance work and caregiving commitments. © The Author(s) 2013.

  13. Regression Analysis: Instructional Resource for Cost/Managerial Accounting

    ERIC Educational Resources Information Center

    Stout, David E.

    2015-01-01

    This paper describes a classroom-tested instructional resource, grounded in principles of active learning and constructivism, that embraces two primary objectives: "demystify" for accounting students technical material from statistics regarding ordinary least-squares (OLS) regression analysis--material that students may find obscure or…

  14. An index of effluent aquatic toxicity designed by partial least squares regression, using acute and chronic tests and expert judgements.

    PubMed

    Vindimian, Éric; Garric, Jeanne; Flammarion, Patrick; Thybaud, Éric; Babut, Marc

    1999-10-01

    The evaluation of the ecotoxicity of effluents requires a battery of biological tests on several species. In order to derive a summary parameter from such a battery, a single endpoint was calculated for all the tests: the EC10, obtained by nonlinear regression, with bootstrap evaluation of the confidence intervals. Principal component analysis was used to characterize and visualize the correlation between the tests. The table of the toxicity of the effluents was then submitted to a panel of experts, who classified the effluents according to the test results. Partial least squares (PLS) regression was used to fit the average value of the experts' judgements to the toxicity data, using a simple equation. Furthermore, PLS regression on partial data sets and other considerations resulted in an optimum battery, with two chronic tests and one acute test. The index is intended to be used for the classification of effluents based on their toxicity to aquatic species. Copyright © 1999 SETAC.

  15. An index of effluent aquatic toxicity designed by partial least squares regression, using acute and chronic tests and expert judgments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vindimian, E.; Garric, J.; Flammarion, P.

    1999-10-01

    The evaluation of the ecotoxicity of effluents requires a battery of biological tests on several species. In order to derive a summary parameter from such a battery, a single endpoint was calculated for all the tests: the EC10, obtained by nonlinear regression, with bootstrap evaluation of the confidence intervals. Principal component analysis was used to characterize and visualize the correlation between the tests. The table of the toxicity of the effluents was then submitted to a panel of experts, who classified the effluents according to the test results. Partial least squares (PLS) regression was used to fit the average value of the experts' judgments to the toxicity data, using a simple equation. Furthermore, PLS regression on partial data sets and other considerations resulted in an optimum battery, with two chronic tests and one acute test. The index is intended to be used for the classification of effluents based on their toxicity to aquatic species.

  16. Electricity Consumption in the Industrial Sector of Jordan: Application of Multivariate Linear Regression and Adaptive Neuro-Fuzzy Techniques

    NASA Astrophysics Data System (ADS)

    Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.

    2009-08-01

    In this study, two techniques for modeling electricity consumption of the Jordanian industrial sector are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as a function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables, with a significant effect on future electrical power demand. The results showed that the multivariate linear regression and neuro-fuzzy models are generally comparable and can both be used adequately to simulate industrial electricity consumption. However, a comparison based on the root mean squared error of the data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.

  17. Quantile regression applied to spectral distance decay

    USGS Publications Warehouse

    Rocchini, D.; Cade, B.S.

    2008-01-01

    Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.01), considering both OLS and quantile regressions. Nonetheless, the OLS regression estimate of the mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when the spectral distance approaches zero, was very low compared with the intercepts of the upper quantiles, which detected high species similarity when habitats are more similar. In this letter, we demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression. © 2008 IEEE.
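
    The difference between a mean-based fit and an upper-quantile fit can be seen even without a predictor. The sketch below is a simplified illustration on hypothetical zero-heavy data, not the analysis in the letter: it fits a constant only. Least squares then returns the mean, while minimizing the quantile ("pinball") loss returns a sample quantile; the helper names are made up for the example.

```python
def pinball_loss(y, pred, tau):
    """Average quantile loss of a constant prediction at quantile level tau."""
    total = 0.0
    for yi in y:
        if yi >= pred:
            total += tau * (yi - pred)          # under-prediction, weight tau
        else:
            total += (1 - tau) * (pred - yi)    # over-prediction, weight 1 - tau
    return total / len(y)


def best_constant(y, tau):
    """The loss is piecewise linear and convex, so a minimizer lies at a data value."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, c, tau))


# Hypothetical zero-heavy response, loosely mimicking sparse ecological data.
y = [0, 0, 0, 0, 0, 0, 1, 2, 3, 10]

mean = sum(y) / len(y)              # least squares estimate of a constant
median = best_constant(y, 0.5)      # dominated by the zeroes
upper = best_constant(y, 0.85)      # tracks the upper part of the distribution

print("mean:", mean, "median:", median, "0.85 quantile:", upper)
```

    A full quantile regression replaces the constant with a linear function of the predictor and minimizes the same loss, typically by linear programming.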

  18. [Study on the early detection of Sclerotinia of Brassica napus based on combinational-stimulated bands].

    PubMed

    Liu, Fei; Feng, Lei; Lou, Bing-gan; Sun, Guang-ming; Wang, Lian-ping; He, Yong

    2010-07-01

    The combinational-stimulated bands were used to develop linear and nonlinear calibrations for the early detection of sclerotinia of oilseed rape (Brassica napus L.). Eighty healthy and 100 Sclerotinia leaf samples were scanned, and different preprocessing methods combined with successive projections algorithm (SPA) were applied to develop partial least squares (PLS) discriminant models, multiple linear regression (MLR) and least squares-support vector machine (LS-SVM) models. The results indicated that the optimal full-spectrum PLS model was achieved by direct orthogonal signal correction (DOSC), then De-trending and Raw spectra with correct recognition ratio of 100%, 95.7% and 95.7%, respectively. When using combinational-stimulated bands, the optimal linear models were SPA-MLR (DOSC) and SPA-PLS (DOSC) with correct recognition ratio of 100%. All SPA-LSSVM models using DOSC, De-trending and Raw spectra achieved perfect results with recognition of 100%. The overall results demonstrated that it was feasible to use combinational-stimulated bands for the early detection of Sclerotinia of oilseed rape, and DOSC-SPA was a powerful way for informative wavelength selection. This method supplied a new approach to the early detection and portable monitoring instrument of sclerotinia.

  19. Error Covariance Penalized Regression: A novel multivariate model combining penalized regression with multivariate error structure.

    PubMed

    Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C

    2018-06-29

    A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Bias due to two-stage residual-outcome regression analysis in genetic association studies.

    PubMed

    Demissie, Serkalem; Cupples, L Adrienne

    2011-11-01

    Association studies of risk factors and complex diseases require careful assessment of potential confounding factors. Two-stage regression analysis, sometimes referred to as residual- or adjusted-outcome analysis, has been increasingly used in association studies of single nucleotide polymorphisms (SNPs) and quantitative traits. In this analysis, first, a residual-outcome is calculated from a regression of the outcome variable on covariates and then the relationship between the adjusted-outcome and the SNP is evaluated by a simple linear regression of the adjusted-outcome on the SNP. In this article, we examine the performance of this two-stage analysis as compared with multiple linear regression (MLR) analysis. Our findings show that when a SNP and a covariate are correlated, the two-stage approach results in biased genotypic effect and loss of power. Bias is always toward the null and increases with the squared correlation between the SNP and the covariate (r²). For example, for r² = 0, 0.1, and 0.5, two-stage analysis results in, respectively, 0, 10, and 50% attenuation in the SNP effect. As expected, MLR was always unbiased. Since individual SNPs often show little or no correlation with covariates, a two-stage analysis is expected to perform as well as MLR in many genetic studies; however, it produces considerably different results from MLR and may lead to incorrect conclusions when independent variables are highly correlated. While a useful alternative to MLR under r² = 0, the two-stage approach has serious limitations. Its use as a simple substitute for MLR should be avoided. © 2011 Wiley Periodicals, Inc.
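
    The attenuation reported above can be reproduced on a small constructed data set. The sketch below is an illustration under assumptions, not the authors' simulation: the variables are noise-free and hypothetical, `x` stands in for the SNP, `z` for the covariate, and `r2` for their squared correlation. For an outcome y = beta·x + gamma·z, regressing the stage-one residual on x gives a slope of beta·(1 − r2), while multiple regression recovers beta.

```python
from math import sqrt


def cov(u, v):
    """Population covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def slope(y, x):
    """Simple linear regression slope of y on x."""
    return cov(x, y) / cov(x, x)


def two_stage_slope(y, x, z):
    """Stage 1: residual of y on z. Stage 2: regress that residual on x."""
    b_z = slope(y, z)
    resid = [yi - b_z * zi for yi, zi in zip(y, z)]
    return slope(resid, x)


def mlr_slope(y, x, z):
    """Coefficient of x in the multiple regression of y on x and z."""
    num = cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)
    den = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return num / den


def make_data(r2, beta=2.0, gamma=1.5):
    """Build x and z with squared correlation r2, and y = beta*x + gamma*z."""
    x = [-1.0, -1.0, 1.0, 1.0]
    w = [-1.0, 1.0, -1.0, 1.0]        # orthogonal to x, same variance
    z = [sqrt(r2) * xi + sqrt(1 - r2) * wi for xi, wi in zip(x, w)]
    y = [beta * xi + gamma * zi for xi, zi in zip(x, z)]
    return x, z, y


for r2 in (0.0, 0.1, 0.5):
    x, z, y = make_data(r2)
    print(r2, "two-stage:", two_stage_slope(y, x, z), "MLR:", mlr_slope(y, x, z))
```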

  1. Diagnostic and psychosocial differences in psychiatrically hospitalized military service members with single versus multiple suicide attempts.

    PubMed

    Kochanski-Ruscio, Kristen M; Carreno-Ponce, Jaime T; DeYoung, Kathryn; Grammer, Geoffrey; Ghahramanlou-Holloway, Marjan

    2014-04-01

    Individuals with multiple versus single suicide attempts present a more severe clinical picture and may be at greater risk for suicide. Yet group differences within military samples have been vastly understudied. The objective is to determine demographic, diagnostic, and psychosocial differences, based on suicide attempt status, among military inpatients admitted for suicide-related events. A retrospective chart review design was used with a total of 423 randomly selected medical records of psychiatric admissions to a military hospital from 2001 to 2006. Chi-square analyses indicated that individuals with multiple versus single suicide attempts were significantly more likely to have documented childhood sexual abuse (p = .025); problem substance use (p = .001); mood disorder diagnosis (p = .005); substance disorder diagnosis (p = .050); personality disorder not otherwise specified diagnosis (p = .018); and Axis II traits or diagnosis (p = .038) when compared to those with a single attempt history. Logistic regression analyses showed that males with multiple suicide attempts were more likely to have problem substance use (p = .005) and a mood disorder diagnosis (p = .002), while females with a multiple attempt history were more likely to have a history of childhood sexual abuse (p = .027). Clinically meaningful differences among military inpatients with single versus multiple suicide attempts exist. Targeted Department of Defense suicide prevention and intervention efforts that address the unique needs of these two specific at-risk subgroups are additionally needed. Published by Elsevier Inc.

  2. Principal components and iterative regression analysis of geophysical series: Application to Sunspot number (1750-2004)

    NASA Astrophysics Data System (ADS)

    Nordemann, D. J. R.; Rigozo, N. R.; de Souza Echer, M. P.; Echer, E.

    2008-11-01

    We present here an implementation of a least squares iterative regression method applied to the sine functions embedded in the principal components extracted from geophysical time series. This method seems to represent a useful improvement for the non-stationary time series periodicity quantitative analysis. The principal components determination followed by the least squares iterative regression method was implemented in an algorithm written in the Scilab (2006) language. The main result of the method is to obtain the set of sine functions embedded in the series analyzed in decreasing order of significance, from the most important ones, likely to represent the physical processes involved in the generation of the series, to the less important ones that represent noise components. Taking into account the need of a deeper knowledge of the Sun's past history and its implication to global climate change, the method was applied to the Sunspot Number series (1750-2004). With the threshold and parameter values used here, the application of the method leads to a total of 441 explicit sine functions, among which 65 were considered as being significant and were used for a reconstruction that gave a normalized mean squared error of 0.146.
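
    The idea of pulling sine functions out of a series one at a time, in decreasing order of importance, can be sketched as follows. This is a simplified stand-in, not the authors' Scilab algorithm: it skips the principal-component step, searches a fixed frequency grid, and fits amplitude and phase by linear least squares on a sine/cosine pair. The function names, the grid, and the synthetic series are all assumptions made for illustration.

```python
import numpy as np


def sine_design(t, freq):
    """Two-column design matrix [sin, cos] for one frequency."""
    return np.column_stack([np.sin(2 * np.pi * freq * t),
                            np.cos(2 * np.pi * freq * t)])


def extract_sines(t, y, freqs, n_sines):
    """Greedy extraction: fit the best single sine, subtract it, repeat.

    Returns a list of (frequency, amplitude) in extraction order and the
    final residual.
    """
    resid = y - y.mean()
    found = []
    for _ in range(n_sines):
        best = None
        for f in freqs:
            A = sine_design(t, f)
            coef = np.linalg.lstsq(A, resid, rcond=None)[0]
            sse = float(np.sum((resid - A @ coef) ** 2))
            if best is None or sse < best[0]:
                best = (sse, f, coef)
        _, f, coef = best
        resid = resid - sine_design(t, f) @ coef
        found.append((float(f), float(np.hypot(coef[0], coef[1]))))
    return found, resid


# Synthetic series with two known components (illustrative only).
t = np.arange(256.0)
y = 3.0 * np.sin(2 * np.pi * 0.10 * t) + 1.0 * np.sin(2 * np.pi * 0.25 * t + 0.5)
freqs = np.linspace(0.01, 0.49, 49)
found, resid = extract_sines(t, y, freqs, n_sines=2)
for f, amp in found:
    print(f"frequency={f:.2f}  amplitude={amp:.2f}")
```

    On this synthetic input the larger component is extracted first, which mirrors the ordering by significance that the abstract describes.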

  3. On Quantile Regression in Reproducing Kernel Hilbert Spaces with Data Sparsity Constraint

    PubMed Central

    Zhang, Chong; Liu, Yufeng; Wu, Yichao

    2015-01-01

    For spline regressions, it is well known that the choice of knots is crucial for the performance of the estimator. As a general learning framework covering the smoothing splines, learning in a Reproducing Kernel Hilbert Space (RKHS) has a similar issue. However, the selection of training data points for kernel functions in the RKHS representation has not been carefully studied in the literature. In this paper we study quantile regression as an example of learning in a RKHS. In this case, the regular squared norm penalty does not perform training data selection. We propose a data sparsity constraint that imposes thresholding on the kernel function coefficients to achieve a sparse kernel function representation. We demonstrate that the proposed data sparsity method can have competitive prediction performance for certain situations, and have comparable performance in other cases compared to that of the traditional squared norm penalty. Therefore, the data sparsity method can serve as a competitive alternative to the squared norm penalty method. Some theoretical properties of our proposed method using the data sparsity constraint are obtained. Both simulated and real data sets are used to demonstrate the usefulness of our data sparsity constraint. PMID:27134575
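
    The abstract does not state the loss function, but quantile regression is conventionally defined through the check (pinball) loss. The sketch below shows only that building block, not the RKHS estimator or the data sparsity constraint: minimizing the pinball loss over a constant recovers the sample quantile. The sample, the level `tau`, and the function name are illustrative assumptions.

```python
import numpy as np


def pinball_loss(y, pred, tau):
    """Mean check loss at quantile level tau."""
    diff = y - pred
    # tau * diff when diff >= 0, (tau - 1) * diff when diff < 0.
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))


rng = np.random.default_rng(0)
y = rng.exponential(scale=1.0, size=2001)
tau = 0.9

# Search over the observed values for the constant with the lowest loss.
candidates = np.sort(y)
losses = [pinball_loss(y, c, tau) for c in candidates]
best = candidates[int(np.argmin(losses))]

print(f"pinball minimizer: {best:.3f}")
print(f"sample quantile:   {np.quantile(y, tau):.3f}")
```

    In a kernel setting the constant is replaced by a function expressed through kernel coefficients on the training points, which is where a penalty or a sparsity constraint on those coefficients enters.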

  4. Mixed geographically weighted regression (MGWR) model with weighted adaptive bi-square for case of dengue hemorrhagic fever (DHF) in Surakarta

    NASA Astrophysics Data System (ADS)

    Astuti, H. N.; Saputro, D. R. S.; Susanti, Y.

    2017-06-01

    The MGWR model combines the linear regression model and the geographically weighted regression (GWR) model; it therefore produces global parameter estimates for some variables and local parameter estimates, specific to each observation location, for others. The linkage between observation locations is expressed through a specific weighting, here the adaptive bi-square. In this research, we applied the MGWR model with adaptive bi-square weighting to cases of DHF in Surakarta, based on 10 factors (variables) that are supposed to influence the number of people with DHF. The observation units are 51 urban villages, and the variables are number of inhabitants, number of houses, house index, number of public places, number of healthy homes, number of Posyandu, area width, population density level, family welfare, and high-region. Based on this research, we obtained 51 MGWR models. The MGWR models were divided into 4 groups, with house index significant as a global variable, area width significant as a local variable, and the remaining significant variables varying between groups. Global variables are variables that significantly affect all locations, while local variables are variables that significantly affect a specific location.
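
    The adaptive bi-square weighting can be illustrated with a short sketch. It assumes the common form of the kernel, w = (1 - (d/h)²)² for d < h and 0 otherwise, with the bandwidth h set to the distance to the k-th nearest neighbour of the focal location. The coordinates, the value of k, and the helper names are hypothetical, and the sketch covers only the local (GWR) part, not the mixed model.

```python
import numpy as np


def adaptive_bisquare_weights(coords, i, k):
    """Bi-square weights around location i with a k-nearest-neighbour bandwidth.

    Assumes distinct locations, so the bandwidth is strictly positive.
    """
    d = np.linalg.norm(coords - coords[i], axis=1)
    h = np.sort(d)[k]  # index 0 is the focal point itself (distance 0)
    return np.where(d < h, (1.0 - (d / h) ** 2) ** 2, 0.0)


def local_fit(X, y, w):
    """Weighted least squares for one location: minimizes sum w * (y - X b)^2."""
    sw = np.sqrt(w)
    return np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]


rng = np.random.default_rng(0)
coords = rng.uniform(size=(51, 2))  # 51 made-up locations
x = rng.normal(size=51)
X = np.column_stack([np.ones(51), x])
y = 1.0 + 2.0 * x                   # noise-free toy response

w = adaptive_bisquare_weights(coords, i=0, k=10)
print("non-zero weights:", np.count_nonzero(w))
print("local coefficients:", local_fit(X, y, w))
```

    Because the bandwidth adapts to the local density of locations, each local regression uses the same number of neighbours regardless of how tightly the observation units are clustered.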

  5. Ordinary least squares regression is indicated for studies of allometry.

    PubMed

    Kilmer, J T; Rodríguez, R L

    2017-01-01

    When it comes to fitting simple allometric slopes through measurement data, evolutionary biologists have been torn between regression methods. On the one hand, there is the ordinary least squares (OLS) regression, which is commonly used across many disciplines of biology to fit lines through data, but which has a reputation for underestimating slopes when measurement error is present. On the other hand, there is the reduced major axis (RMA) regression, which is often recommended as a substitute for OLS regression in studies of allometry, but which has several weaknesses of its own. Here, we review statistical theory as it applies to evolutionary biology and studies of allometry. We point out that the concerns that arise from measurement error for OLS regression are small and straightforward to deal with, whereas RMA has several key properties that make it unfit for use in the field of allometry. The recommended approach for researchers interested in allometry is to use OLS regression on measurements taken with low (but realistically achievable) measurement error. If measurement error is unavoidable and relatively large, it is preferable to correct for slope attenuation rather than to turn to RMA regression, or to take the expected amount of attenuation into account when interpreting the data. © 2016 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2016 European Society For Evolutionary Biology.
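
    The slope attenuation mentioned here, and the correction the authors prefer, can be illustrated on simulated data. The sketch assumes classical additive measurement error in the predictor only and assumes the error variance is known, so that the reliability ratio var(true x) / var(observed x) can be used to rescale the OLS slope. All numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
true_slope = 0.8
error_sd = 0.5  # assumed known measurement-error SD for x

x_true = rng.normal(0.0, 1.0, n)
y = true_slope * x_true + rng.normal(0.0, 0.3, n)
x_obs = x_true + rng.normal(0.0, error_sd, n)

cov_xy = np.cov(x_obs, y)[0, 1]
ols = cov_xy / np.var(x_obs, ddof=1)

# Reliability ratio: var(true x) / var(observed x) = 1 / (1 + error_sd^2) here.
reliability = 1.0 / (1.0 + error_sd ** 2)
ols_corrected = ols / reliability

# Reduced major axis slope for comparison: sign(r) * sd(y) / sd(x).
rma = np.sign(cov_xy) * np.std(y, ddof=1) / np.std(x_obs, ddof=1)

print(f"true slope     {true_slope:.3f}")
print(f"OLS            {ols:.3f}")
print(f"OLS corrected  {ols_corrected:.3f}")
print(f"RMA            {rma:.3f}")
```

    The RMA slope depends only on the two standard deviations and the sign of the correlation, so it does not respond to how strongly the variables are actually related.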

  6. Least-Squares Regression and Spectral Residual Augmented Classical Least-Squares Chemometric Models for Stability-Indicating Analysis of Agomelatine and Its Degradation Products: A Comparative Study.

    PubMed

    Naguib, Ibrahim A; Abdelrahman, Maha M; El Ghobashy, Mohamed R; Ali, Nesma A

    2016-01-01

    Two accurate, sensitive, and selective stability-indicating methods are developed and validated for simultaneous quantitative determination of agomelatine (AGM) and its forced degradation products (Deg I and Deg II), whether in pure forms or in pharmaceutical formulations. Partial least-squares regression (PLSR) and spectral residual augmented classical least-squares (SRACLS) are two chemometric models that are being subjected to a comparative study through handling UV spectral data in range (215-350 nm). For proper analysis, a three-factor, four-level experimental design was established, resulting in a training set consisting of 16 mixtures containing different ratios of interfering species. An independent test set consisting of eight mixtures was used to validate the prediction ability of the suggested models. The results presented indicate the ability of mentioned multivariate calibration models to analyze AGM, Deg I, and Deg II with high selectivity and accuracy. The analysis results of the pharmaceutical formulations were statistically compared to the reference HPLC method, with no significant differences observed regarding accuracy and precision. The SRACLS model gives comparable results to the PLSR model; however, it keeps the qualitative spectral information of the classical least-squares algorithm for analyzed components.

  7. Quality of semen: a 6-year single experience study on 5680 patients.

    PubMed

    Cozzolino, Mauro; Coccia, Maria E; Picone, Rita

    2018-02-08

    The aim of our study was to evaluate the quality of semen of a large sample from the general healthy population living in Italy, in order to identify possible variables that could influence several parameters of the spermiogram. We conducted a cross-sectional study from February 2010 to March 2015, collecting semen samples from the general population. Semen analysis was performed according to the WHO guidelines. The collected data were entered in a database and processed using the software Stata 12. The Mann-Whitney test was used to assess the relationship of dichotomous variables with the parameters of the spermiogram, and the Kruskal-Wallis test was used for variables with more than two categories. We also used robust regression and Spearman correlation to analyze the relationship between age and the parameters. We collected 5680 samples of semen. The mean age of our patients was 41.4 years. The Mann-Whitney test showed that citizenship (coded as "Italian/Foreign") influences some parameters: pH, vitality, number of spermatozoa, and sperm concentration, with worse results for the Italian group. The Kruskal-Wallis test showed that the single nationality influences pH, volume, sperm motility A-B-C-D, vitality, morphology, number of spermatozoa, and sperm concentration. Robust regression showed a relationship between age and several parameters: volume (p = 0.04; R squared = 0.0007; β = -0.06); sperm motility A (p < 0.01; R squared = 0.0051; β = 0.02); sperm motility B (p < 0.01; R squared = 0.02; β = -0.35); sperm motility C (p < 0.01; R squared = 0.01; β = 0.12); sperm motility D (p < 0.01; R squared = 0.006; β = 0.2); vitality (p < 0.01; R squared = 0.01; β = -0.32); sperm concentration (p = 0.01; R squared = 0.001; β = 0.19). Our patients' spermiogram results were somewhat better than the standard guideline values. Our study showed that the country of origin could be a factor influencing several parameters of the spermiogram in a healthy population and, through robust regression, confirmed a strict correlation between age and these parameters.

  8. Method for nonlinear exponential regression analysis

    NASA Technical Reports Server (NTRS)

    Junkin, B. G.

    1972-01-01

    Two computer programs developed according to two general types of exponential models for conducting nonlinear exponential regression analysis are described. Least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. Program is written in FORTRAN 5 for the Univac 1108 computer.
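
    The linearization described here, expanding the model in a first-order Taylor series and solving a linear least-squares problem for the parameter update, is the Gauss-Newton iteration. The sketch below applies it to one assumed model form, y = a * exp(b * x); the report's two model types and its FORTRAN programs are not reproduced, and the starting values and data are invented.

```python
import numpy as np


def fit_exponential(x, y, a0, b0, n_iter=50, tol=1e-12):
    """Gauss-Newton fit of y = a * exp(b * x)."""
    a, b = a0, b0
    for _ in range(n_iter):
        e = np.exp(b * x)
        resid = y - a * e
        # Jacobian of the model with respect to (a, b).
        J = np.column_stack([e, a * x * e])
        # Linearized problem: find the update that best explains the residual.
        delta = np.linalg.lstsq(J, resid, rcond=None)[0]
        a, b = a + delta[0], b + delta[1]
        if np.linalg.norm(delta) < tol:
            break
    return a, b


x = np.linspace(0.0, 2.0, 30)
y = 2.0 * np.exp(-1.3 * x)  # noise-free toy data
a_hat, b_hat = fit_exponential(x, y, a0=1.0, b0=-1.0)
print(f"a={a_hat:.4f}  b={b_hat:.4f}")
```

    Plain Gauss-Newton can diverge from a poor starting point; a step-size control or a log-linear initial fit is a common safeguard when the data allow it.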

  9. Estimates of Median Flows for Streams on the 1999 Kansas Surface Water Register

    USGS Publications Warehouse

    Perry, Charles A.; Wolock, David M.; Artman, Joshua C.

    2004-01-01

    The Kansas State Legislature, by enacting Kansas Statute KSA 82a-2001 et seq., mandated the criteria for determining which Kansas stream segments would be subject to classification by the State. One criterion for the selection as a classified stream segment is based on the statistic of median flow being equal to or greater than 1 cubic foot per second. As specified by KSA 82a-2001 et seq., median flows were determined from U.S. Geological Survey streamflow-gaging-station data by using the most-recent 10 years of gaged data (KSA) for each streamflow-gaging station. Median flows also were determined by using gaged data from the entire period of record (all-available hydrology, AAH). Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating median flows for uncontrolled stream segments. The drainage area of the gaging stations on uncontrolled stream segments used in the regression analyses ranged from 2.06 to 12,004 square miles. A logarithmic transformation of the data was needed to develop the best linear relation for computing median flows. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. Tobit analyses of KSA data yielded a model standard error of prediction of 0.285 logarithmic units, and the best equations using Tobit analyses of AAH data had a model standard error of prediction of 0.250 logarithmic units. These regression equations and an interpolation procedure were used to compute median flows for the uncontrolled stream segments on the 1999 Kansas Surface Water Register. Measured median flows from gaging stations were incorporated into the regression-estimated median flows along the stream segments where available. The segments that were uncontrolled were interpolated using gaged data weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled segments of Kansas streams, the median flow information was interpolated between gaging stations using only gaged data weighted by drainage area. Of the 2,232 total stream segments on the Kansas Surface Water Register, 34.5 percent of the segments had an estimated median streamflow of less than 1 cubic foot per second when the KSA analysis was used. When the AAH analysis was used, 36.2 percent of the segments had an estimated median streamflow of less than 1 cubic foot per second. This report supersedes U.S. Geological Survey Water-Resources Investigations Report 02-4292.

  10. Inverse models: A necessary next step in ground-water modeling

    USGS Publications Warehouse

    Poeter, E.P.; Hill, M.C.

    1997-01-01

    Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares regression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.

  11. The Impact of School Socioeconomic Status on Student-Generated Teacher Ratings

    ERIC Educational Resources Information Center

    Agnew, Steve

    2011-01-01

    This paper uses ordinary least squares, logit and probit regressions, along with chi-square analysis applied to nationwide data from the New Zealand ratemyteacher website to establish if there is any correlation between student ratings of their teachers and the socioeconomic status of the school the students attend. The results show that students…

  12. Maritime Adaptive Optics Beam Control

    DTIC Science & Technology

    2010-09-01

    Liquid Crystal LMS Least Mean Square MIMO Multiple-Input Multiple-Output MMDM Micromachined Membrane Deformable Mirror MSE Mean Square Error ... determine how the beam is distorted, a control computer to calculate the correction to be applied, and a corrective element, usually a deformable mirror ... during this research, an overview of the system modification is provided here. Using additional mirrors and reflecting the beam to and from an

  13. Assessment of blood donation intention among medical students in Pakistan--An application of theory of planned behavior.

    PubMed

    Faqah, Anadil; Moiz, Bushra; Shahid, Fatima; Ibrahim, Mariam; Raheem, Ahmed

    2015-12-01

    Theory of Planned Behavior proposes a model which can measure how human actions are guided. It has been successfully utilized in the context of blood donation. We employed a decision-making framework to determine the intention of blood donation among medical students who have never donated blood before the study. Survey responses were collected from 391 medical students from four various universities on a defined questionnaire. The tool composed of 20 questions that were formulated to explain donation intention based on theory of planned behavior. The construct included questions related to attitude, subjective norm and perceived behavior control, descriptive norm, moral norm, anticipated regret, donation anxiety and religious norm. Pearson's correlational relationships were measured between independent and dependent variables of intention to donate blood. ANOVA was applied to observe the model fit; a value of 0.000 was considered statistically significant. A multiple regression analysis was conducted to explore the relative importance of the main independent variables in the prediction of intention. Multi-collinearity was also evaluated to determine that various independent variables determine the intention. The reliability of measures composed of two items was assessed using inter-item correlations. Three hundred and ninety-one medical students (M:F; 1:2.2) with mean age of 21.96 years ± 1.95 participated in this study. Mean item score was 3.8 ± 0.83. Multiple regression analysis suggested that perceived behavioral control, anticipated regret and attitude were the most influential factors in determining intention of blood donation. Donation anxiety was least correlated and in fact bore a negative correlation with intention. ANOVA computed an F value of 199.082 with a p-value of 0.000 indicating fitness of model. 
The value of R square and adjusted R square was 0.811 and 0.807 respectively indicating strong correlation between various independent and dependent variables. Medical students as novice blood donors showed a positive attitude toward blood donation. Theory of planned behavior can be successfully utilized in determining the antecedents toward blood donation behavior. Copyright © 2015 Elsevier Ltd. All rights reserved.
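
    The relation between the two reported fit statistics can be checked with the standard adjusted R square formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The abstract gives n = 391 and R² = 0.811 but not the number of predictors; the value p = 8 used below is an assumption obtained by counting the eight constructs listed (attitude, subjective norm, perceived behavior control, descriptive norm, moral norm, anticipated regret, donation anxiety, religious norm).

```python
def adjusted_r2(r2, n, p):
    """Adjusted R squared for a model with p predictors fitted to n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# n and r2 are from the abstract; p = 8 is an assumption (see the note above).
print(round(adjusted_r2(0.811, n=391, p=8), 3))  # prints 0.807
```

    Under that assumption the result agrees with the adjusted value of 0.807 reported in the abstract.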

  14. Least-squares sequential parameter and state estimation for large space structures

    NASA Technical Reports Server (NTRS)

    Thau, F. E.; Eliazov, T.; Montgomery, R. C.

    1982-01-01

    This paper presents the formulation of simultaneous state and parameter estimation problems for flexible structures in terms of least-squares minimization problems. The approach combines an on-line order determination algorithm, with least-squares algorithms for finding estimates of modal approximation functions, modal amplitudes, and modal parameters. The approach combines previous results on separable nonlinear least squares estimation with a regression analysis formulation of the state estimation problem. The technique makes use of sequential Householder transformations. This allows for sequential accumulation of matrices required during the identification process. The technique is used to identify the modal parameters of a flexible beam.
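
    Sequential accumulation with orthogonal transformations can be sketched for an ordinary linear least-squares problem. The code below is an illustration of the general idea only, not the paper's estimator: each new batch of rows is stacked under the current triangular factor and re-triangularized with a QR factorization (NumPy delegates to LAPACK, whose QR routine is built on Householder reflections), so the full data matrix never has to be stored. The batch layout and function name are assumptions.

```python
import numpy as np


def sequential_lstsq(batches, n_params):
    """Solve min ||A x - b|| by folding in (A_k, b_k) batches one at a time."""
    R = np.zeros((0, n_params))  # running triangular factor
    d = np.zeros(0)              # running transformed right-hand side
    for A, b in batches:
        stacked = np.vstack([R, A])
        rhs = np.concatenate([d, b])
        Q, R = np.linalg.qr(stacked, mode="reduced")
        d = Q.T @ rhs
    # Requires at least n_params rows in total and full column rank.
    return np.linalg.solve(R, d)


rng = np.random.default_rng(0)
A = rng.normal(size=(40, 3))
b = rng.normal(size=40)
batches = [(A[i:i + 10], b[i:i + 10]) for i in range(0, 40, 10)]

x_seq = sequential_lstsq(batches, n_params=3)
x_all = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_seq, x_all))
```

    Only an n_params-by-n_params factor and a vector of the same length are carried between batches, which is what makes the scheme attractive for on-line identification.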

  15. Linearized inversion of multiple scattering seismic energy

    NASA Astrophysics Data System (ADS)

    Aldawood, Ali; Hoteit, Ibrahim; Zuberi, Mohammad

    2014-05-01

    Internal multiples deteriorate the quality of the migrated image obtained conventionally by imaging single scattering energy. So, imaging seismic data with the single-scattering assumption does not locate multiple-bounce events in their actual subsurface positions. However, imaging internal multiples properly has the potential to enhance the migrated image because they illuminate zones in the subsurface that are poorly illuminated by single scattering energy such as nearly vertical faults. Standard migration of these multiples provides subsurface reflectivity distributions with low spatial resolution and migration artifacts due to the limited recording aperture, coarse sources and receivers sampling, and the band-limited nature of the source wavelet. The resultant image obtained by the adjoint operator is a smoothed depiction of the true subsurface reflectivity model and is heavily masked by migration artifacts and the source wavelet fingerprint that needs to be properly deconvolved. Hence, we proposed a linearized least-square inversion scheme to mitigate the effect of the migration artifacts, enhance the spatial resolution, and provide more accurate amplitude information when imaging internal multiples. The proposed algorithm uses the least-square image based on single-scattering assumption as a constraint to invert for the part of the image that is illuminated by internal scattering energy. Then, we posed the problem of imaging double-scattering energy as a least-square minimization problem that requires solving the normal equation of the following form: G^T G v = G^T d, (1) where G is a linearized forward modeling operator that predicts double-scattered seismic data. Also, G^T is a linearized adjoint operator that images double-scattered seismic data. Gradient-based optimization algorithms solve this linear system. Hence, we used a quasi-Newton optimization technique to find the least-square minimizer. In this approach, an estimate of the Hessian matrix that contains curvature information is modified at every iteration by a low-rank update based on gradient changes at every step. At each iteration, the data residual is imaged using G^T to determine the model update. Application of the linearized inversion to synthetic data to image a vertical fault plane demonstrates the effectiveness of this methodology to properly delineate the vertical fault plane and give better amplitude information than the standard migrated image using the adjoint operator that takes into account internal multiples. Thus, least-square imaging of multiple scattering enhances the spatial resolution of the events illuminated by internal scattering energy. It also deconvolves the source signature and helps remove the fingerprint of the acquisition geometry. The final image is obtained by the superposition of the least-square solution based on single scattering assumption and the least-square solution based on double scattering assumption.
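
    The normal equation in this abstract can be solved iteratively using only applications of the forward operator and its adjoint. The sketch below uses conjugate gradients on the normal equations (CGLS) rather than the quasi-Newton method the authors report, and a small random matrix stands in for the forward modeling operator G; both choices are assumptions made to keep the example self-contained.

```python
import numpy as np


def cgls(G, d, n_iter=100, rel_tol=1e-20):
    """Conjugate gradients on the normal equations G^T G v = G^T d."""
    v = np.zeros(G.shape[1])
    r = d - G @ v              # data residual
    s = G.T @ r                # adjoint applied to the residual
    p = s.copy()
    gamma = gamma0 = s @ s
    if gamma0 == 0.0:
        return v
    for _ in range(n_iter):
        q = G @ p
        alpha = gamma / (q @ q)
        v += alpha * p         # model update
        r -= alpha * q
        s = G.T @ r
        gamma_new = s @ s
        if gamma_new <= rel_tol * gamma0:
            break
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return v


rng = np.random.default_rng(0)
G = rng.normal(size=(30, 5))   # toy stand-in for the forward operator
d = rng.normal(size=30)        # toy data vector

v_cg = cgls(G, d)
v_ref = np.linalg.lstsq(G, d, rcond=None)[0]
print(np.allclose(v_cg, v_ref, atol=1e-8))
```

    As in the abstract, each iteration applies the adjoint to the current data residual to decide how the model should change; the matrix G^T G is never formed explicitly.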

  16. Quantification of brain lipids by FTIR spectroscopy and partial least squares regression

    NASA Astrophysics Data System (ADS)

    Dreissig, Isabell; Machill, Susanne; Salzer, Reiner; Krafft, Christoph

    2009-01-01

    Brain tissue is characterized by high lipid content. Its content decreases and the lipid composition changes during transformation from normal brain tissue to tumors. Therefore, the analysis of brain lipids might complement the existing diagnostic tools to determine the tumor type and tumor grade. Objective of this work is to extract lipids from gray matter and white matter of porcine brain tissue, record infrared (IR) spectra of these extracts and develop a quantification model for the main lipids based on partial least squares (PLS) regression. IR spectra of the pure lipids cholesterol, cholesterol ester, phosphatidic acid, phosphatidylcholine, phosphatidylethanolamine, phosphatidylserine, phosphatidylinositol, sphingomyelin, galactocerebroside and sulfatide were used as references. Two lipid mixtures were prepared for training and validation of the quantification model. The composition of lipid extracts that were predicted by the PLS regression of IR spectra was compared with lipid quantification by thin layer chromatography.

  17. Hypothesis Testing Using Factor Score Regression

    PubMed Central

    Devlieger, Ines; Mayer, Axel; Rosseel, Yves

    2015-01-01

    In this article, an overview is given of four methods to perform factor score regression (FSR), namely regression FSR, Bartlett FSR, the bias avoiding method of Skrondal and Laake, and the bias correcting method of Croon. The bias correcting method is extended to include a reliable standard error. The four methods are compared with each other and with structural equation modeling (SEM) by using analytic calculations and two Monte Carlo simulation studies to examine their finite sample characteristics. Several performance criteria are used, such as the bias using the unstandardized and standardized parameterization, efficiency, mean square error, standard error bias, type I error rate, and power. The results show that the bias correcting method, with the newly developed standard error, is the only suitable alternative for SEM. While it has a higher standard error bias than SEM, it has a comparable bias, efficiency, mean square error, power, and type I error rate. PMID:29795886

  18. Prediction of clinical depression scores and detection of changes in whole-brain using resting-state functional MRI data with partial least squares regression

    PubMed Central

    Shimizu, Yu; Yoshimoto, Junichiro; Takamura, Masahiro; Okada, Go; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji

    2017-01-01

    In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area. PMID:28700672
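
    For a single response, the PLS regression mentioned here can be written in a few lines. The sketch below is a textbook PLS1 (NIPALS-style) fit with a scalar response, not the authors' pipeline and not a kernel variant; the function name and the toy data are assumptions. Each component is a direction in predictor space chosen for its covariance with the response, and the predictors are deflated after each component is extracted.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression: returns (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)       # weight vector
        t = Xk @ w                   # scores (latent variable)
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        qk = (yk @ t) / tt           # y loading
        Xk = Xk - np.outer(t, p)     # deflate predictors
        yk = yk - qk * t             # deflate response
        W.append(w)
        P.append(p)
        q.append(qk)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)

coef_full, _ = pls1_fit(X, y, n_components=4)
coef_ols = np.linalg.lstsq(X - X.mean(axis=0), y - y.mean(), rcond=None)[0]
print(np.allclose(coef_full, coef_ols))
```

    With as many components as predictors, PLS reproduces the ordinary least-squares fit; the regularizing effect for collinear or high-dimensional data comes from keeping fewer components.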

  19. Raman spectroscopy compared against traditional predictors of shear force in lamb m. longissimus lumborum.

    PubMed

    Fowler, Stephanie M; Schmidt, Heinar; van de Ven, Remy; Wynn, Peter; Hopkins, David L

    2014-12-01

    A Raman spectroscopic hand-held device was used to predict shear force (SF) of 80 fresh lamb m. longissimus lumborum (LL) at 1 and 5 days post mortem (PM). Traditional predictors of SF including sarcomere length (SL), particle size (PS), cooking loss (CL), percentage myofibrillar breaks and pH were also measured. SF values were regressed against Raman spectra using partial least squares regression and against the traditional predictors using linear regression. The best prediction of shear force values used spectra at 1 day PM to predict shear force at 1 day, which gave a root mean square error of prediction (RMSEP) of 13.6 (Null=14.0), and the R(2) between observed and cross validated predicted values was 0.06 (R(2)cv). Overall, for fresh LL, the predictability of SF by either the Raman hand-held probe or traditional predictors was low. Copyright © 2014 Elsevier Ltd. All rights reserved.

  20. Robust regression on noisy data for fusion scaling laws

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be; Laboratoire de Physique des Plasmas de l'ERM - Laboratorium voor Plasmafysica van de KMS

    2014-11-15

    We introduce the method of geodesic least squares (GLS) regression for estimating fusion scaling laws. Based on straightforward principles, the method is easily implemented, yet it clearly outperforms established regression techniques, particularly in cases of significant uncertainty on both the response and predictor variables. We apply GLS for estimating the scaling of the L-H power threshold, resulting in estimates for ITER that are somewhat higher than predicted earlier.

  1. Application of nonlinear least-squares regression to ground-water flow modeling, west-central Florida

    USGS Publications Warehouse

    Yobbi, D.K.

    2000-01-01

    A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest are estimated by nonlinear regression. Optimal estimates of parameter values are about 140 times greater than and about 0.01 times less than reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters, and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.

  2. Hyperspectral remote sensing of plant biochemistry using Bayesian model averaging with variable and band selection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhao, Kaiguang; Valle, Denis; Popescu, Sorin

    2013-05-15

    Model specification remains challenging in spectroscopy of plant biochemistry, as exemplified by the availability of various spectral indices or band combinations for estimating the same biochemical. This lack of consensus in model choice across applications argues for a paradigm shift in hyperspectral methods to address model uncertainty and misspecification. We demonstrated one such method using Bayesian model averaging (BMA), which performs variable/band selection and quantifies the relative merits of many candidate models to synthesize a weighted average model with improved predictive performances. The utility of BMA was examined using a portfolio of 27 foliage spectral–chemical datasets representing over 80 species across the globe to estimate multiple biochemical properties, including nitrogen, hydrogen, carbon, cellulose, lignin, chlorophyll (a or b), carotenoid, polar and nonpolar extractives, leaf mass per area, and equivalent water thickness. We also compared BMA with partial least squares (PLS) and stepwise multiple regression (SMR). Results showed that all the biochemicals except carotenoid were accurately estimated from hyperspectral data with R2 values > 0.80.
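
    The weighting of candidate models can be illustrated with a deliberately small example. The sketch below is not the BMA procedure used in the study: it enumerates every subset of a handful of predictors, fits each by ordinary least squares, and approximates posterior model probabilities with BIC weights. Exhaustive enumeration is only feasible for a few predictors, and the data, names, and BIC approximation are all assumptions.

```python
import itertools

import numpy as np


def bic_model_average(X, y):
    """BIC-weighted averaging over all predictor subsets.

    Returns (model weights, inclusion probabilities, averaged coefficients).
    """
    n, p = X.shape
    models, bics, coefs = [], [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            A = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta = np.linalg.lstsq(A, y, rcond=None)[0]
            rss = float(np.sum((y - A @ beta) ** 2))
            bics.append(n * np.log(rss / n) + A.shape[1] * np.log(n))
            full = np.zeros(p)
            full[np.array(subset, dtype=int)] = beta[1:]
            models.append(subset)
            coefs.append(full)
    bics = np.array(bics)
    weights = np.exp(-0.5 * (bics - bics.min()))
    weights /= weights.sum()
    inclusion = np.array([
        sum(w for w, m in zip(weights, models) if j in m) for j in range(p)
    ])
    averaged = weights @ np.array(coefs)
    return weights, inclusion, averaged


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.5 * rng.normal(size=200)

weights, inclusion, averaged = bic_model_average(X, y)
print("inclusion probabilities:", np.round(inclusion, 3))
print("averaged coefficients:  ", np.round(averaged, 3))
```

    The inclusion probabilities play the role of variable or band selection, while the averaged coefficients give a single predictive model that reflects uncertainty about which subset is right.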

  3. Factors Infuencing Women in Pap Smear Uptake

    NASA Astrophysics Data System (ADS)

    Wijayanti, K. E.; Alam, I. G.

    2017-03-01

    Objective: The Pap smear has been proven to decrease deaths caused by cervical cancer. However, in Indonesia, only a few women have had a Pap smear. The aim of this study was to investigate women's knowledge about the Pap smear and cervical cancer, and to investigate the factors that influence women to take a Pap smear test. Methods: Quantitative data were collected through a questionnaire administered to 31 women who had had a Pap smear and 55 women who had not. The questionnaire was built using the Health Belief Model as a guideline to examine perceived susceptibility, perceived seriousness, perceived benefits and perceived barriers. Chi-square and multiple logistic regression were used to investigate differences in knowledge and to identify the factor that most influences women to take a Pap smear test. Results: There was a significant difference in knowledge between women who had and had not had a Pap smear. However, the multiple logistic regression test showed that knowledge was apparently not a strong predictor of Pap smear uptake (coefficient β = -0.164). Conclusion: Perceived barriers were the factors that affected Pap smear uptake among women in Indonesia. A few respondents had received wrong information about the Pap smear, cervical cancer and its symptoms.

  4. Adult correlates of early behavioral maladjustment: a study of injured drivers.

    PubMed

    Ryb, Gabriel; Dischinger, Patricia; Smith, Gordon; Soderstrom, Carl

    2008-10-01

    To establish whether a history of school suspension (HSS) predicts adult driver behavior. 323 injured drivers were interviewed as part of a study of psychoactive substance use disorders (PSUD) and injury. Drivers with a HSS were compared to those without HSS in relation to demographics, SES, PSUD, risky behaviors, trauma history and driving history using Student's t test and chi-square. Multiple logistic regression models were constructed to adjust for demographics, SES and PSUD. HSS drivers represented 31% of the population and were younger, more likely to be male and had higher rates of alcohol and drug dependence than drivers without HSS. Educational achievement was worse for drivers with HSS. Drivers with HSS were more likely to have a history of prior vehicular trauma and assault. Seat-belt non-use, drinking and driving, riding with a drunk driver, binge drinking, driving fast for the thrill, license suspension and drinking and driving convictions were more common among drivers with HSS. In multiple logistic regression models adjusting for demographics and SES, HSS revealed higher odds ratios for the same outcomes. After adding PSUD to the models, HSS remained significant only for seat-belt non-use, binge drinking and previous assault history. HSS is associated with risky behaviors, repeated vehicular injury, and poor driver history. The association with driver history, however, disappears when PSUD are included in the models. The association of HSS (a marker of early behavioral maladjustment) with behavioral risks suggests that undiagnosed psychopathology may be linked to injury recidivism.

  5. Climatological Modeling of Monthly Air Temperature and Precipitation in Egypt through GIS Techniques

    NASA Astrophysics Data System (ADS)

    El Kenawy, A.

    2009-09-01

    This paper describes a method for modeling and mapping four climatic variables (maximum temperature, minimum temperature, mean temperature and total precipitation) in Egypt using a multiple regression approach implemented in a GIS environment. In this model, a set of variables including latitude, longitude, elevation within a distance of 5, 10 and 15 km, slope, aspect, distance to the Mediterranean Sea, distance to the Red Sea, distance to the Nile, ratio between land and water masses within a radius of 5, 10 and 15 km, the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Water Index (NDWI), the Normalized Difference Temperature Index (NDTI) and reflectance are included as independent variables. These variables were integrated as raster layers in MiraMon software at a spatial resolution of 1 km. Climatic variables were considered as dependent variables and averaged from 39 quality-controlled and homogenized series distributed across the entire country during the period 1957-2006. For each climatic variable, digital and objective maps were finally obtained using the multiple regression coefficients at monthly, seasonal and annual timescales. The accuracy of these maps was assessed through cross-validation between predicted and observed values using a set of statistics including the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), mean bias error (MBE) and Willmott's D statistic. These maps are valuable in terms of spatial resolution as well as the number of observatories involved in the current analysis.
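
    The validation statistics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `validation_stats` is hypothetical, R2 is assumed to be the squared Pearson correlation between observed and predicted values, and the agreement statistic is assumed to be Willmott's original index of agreement d.

```python
import numpy as np

def validation_stats(observed, predicted):
    """Cross-validation summary statistics for predicted vs. observed values."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs                      # signed prediction errors
    obs_mean = obs.mean()
    # Denominator of Willmott's index of agreement (assumed 1981 form).
    potential = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return {
        "R2": np.corrcoef(obs, pred)[0, 1] ** 2,   # squared correlation (assumption)
        "RMSE": np.sqrt(np.mean(err ** 2)),        # root mean square error
        "MAE": np.mean(np.abs(err)),               # mean absolute error
        "MBE": np.mean(err),                       # mean bias error (sign shows over/underprediction)
        "d": 1.0 - np.sum(err ** 2) / potential,   # Willmott index of agreement
    }
```

    A constant offset between predictions and observations leaves R2 at 1 while MBE, MAE and RMSE all report the offset, which is one reason such studies report several statistics together.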

  6. Association of Dentine Hypersensitivity with Different Risk Factors – A Cross Sectional Study

    PubMed Central

    Vijaya, V; Sanjay, Venkataraam; Varghese, Rana K; Ravuri, Rajyalakshmi; Agarwal, Anil

    2013-01-01

    Background: This study was done to assess the prevalence of dentine hypersensitivity (DH) and its associated risk factors. Materials & Methods: This epidemiological study of the prevalence of DH was done among patients attending a dental college. A self-structured questionnaire along with a clinical examination was used for assessment. Descriptive statistics were obtained and frequency distributions were compared using the Chi-square test at p value <0.05. Stepwise multiple linear regression was also done to assess the frequency of DH with different factors. Results: The study population comprised 655 participants of different age groups. Our study showed a prevalence of 55%, and DH was more common among males. Similarly, smokers and those who used a hard toothbrush had more cases of DH. Stepwise multiple linear regression showed that the best predictor for DH was age, followed by the habit of smoking and the type of toothbrush. The most aggravating factors were cold water (15.4%) and sweet foods (14.7%), whereas only 5% of the patients had it while brushing. Conclusion: A high level of dentine hypersensitivity was found in this study, and it was more common among males. A linear relationship was shown with age, smoking and type of toothbrush. How to cite this article: Vijaya V, Sanjay V, Varghese RK, Ravuri R, Agarwal A. Association of Dentine Hypersensitivity with Different Risk Factors – A Cross Sectional Study. J Int Oral Health 2013;5(6):88-92. PMID:24453451

  7. A framework for evaluating student perceptions of health policy training in medical school.

    PubMed

    Patel, Mitesh S; Lypson, Monica L; Miller, D Douglas; Davis, Matthew M

    2014-10-01

    Nearly half of graduating medical students in the United States report that medical school provides inadequate instruction in topics related to health policy. Although most medical schools report some form of policy education, there is no standard for teaching core concepts and evaluating student satisfaction. Responses to the Association of American Medical Colleges' Medical School Graduation Questionnaire were obtained for the years 2007-2008 and 2011-2012 and mapped to four domains of training in health policy curricula: systems and principles; value and equity; quality and safety; and politics and law. Chi-square tests were used to test differences among unadjusted temporal trends. Multiple logistic regression models were fit to the outcome variables and adjusted for student characteristics, student preferences, and medical school characteristics. Compared with 2007-2008, students' perceptions of training in 2011-2012 increased on a relative basis by 11.7% for components within systems and principles, 2.8% for quality and safety, and 6.8% for value and equity. Components within politics and law had a composite decline of 4.8%. Multiple logistic regression models found higher odds of reporting satisfaction with training over time for all components within the domains of systems and principles, quality and safety, and value and equity (P < .01), with the exception of medical economics. Medical student perceptions of training in health policy improved over time. Causal factors for these trends require further study. Despite improvement, nearly 40% of graduating medical students still report inadequate instruction in health policy.

  8. Risk factors for repetitive strain injuries among school teachers in Thailand.

    PubMed

    Chaiklieng, Sunisa; Suggaravetsiri, Pornnapa

    2012-01-01

    Prolonged posture, static work and repetition have previously been reported as causes of repetitive strain injuries (RSIs) among workers, including teachers. This cross-sectional analytic study aimed to investigate the prevalence and risk factors of RSIs among school teachers. Participants were 452 full-time school teachers in Thailand. Data were collected by structured questionnaires, illuminance measurements and physical fitness tests. Descriptive statistics and inferential statistics (Chi-square test and multiple logistic regression analysis) were used. Most teachers in this study were female (57.3%), and the mean work experience was 22.6 ± 10.4 years. The six-month prevalence of RSIs was 73.7%. The univariate analysis identified the following risk factors for RSIs: chronic disease (OR=1.8; 95% CI = 1.16-2.73), history of trauma (OR=2.0; 95% CI = 1.02-4.01), a family member with RSIs (OR=2.0; 95% CI = 1.02-4.01), stretching to write on the board (OR=1.7; 95% CI = 1.06-1.70) and high-heeled shoes >2 inches (OR=1.6; 95% CI = 1.03-2.51). Multiple logistic regression analysis showed that chronic disease and high-heeled shoes >2 inches were significantly related to the development of RSIs. Poor grip strength and back muscle flexibility significantly affected RSIs among teachers. In conclusion, RSIs were highly prevalent among school teachers, who should be made aware of health promotion measures to prevent RSIs.

  9. Experimental Investigations of Non-Stationary Properties In Radiometer Receivers Using Measurements of Multiple Calibration References

    NASA Technical Reports Server (NTRS)

    Racette, Paul; Lang, Roger; Zhang, Zhao-Nan; Zacharias, David; Krebs, Carolyn A. (Technical Monitor)

    2002-01-01

    Radiometers must be periodically calibrated because the receiver response fluctuates. Many techniques exist to correct for the time varying response of a radiometer receiver. An analytical technique has been developed that uses generalized least squares regression (LSR) to predict the performance of a wide variety of calibration algorithms. The total measurement uncertainty including the uncertainty of the calibration can be computed using LSR. The uncertainties of the calibration samples used in the regression are based upon treating the receiver fluctuations as non-stationary processes. Signals originating from the different sources of emission are treated as simultaneously existing random processes. Thus, the radiometer output is a series of samples obtained from these random processes. The samples are treated as random variables but because the underlying processes are non-stationary the statistics of the samples are treated as non-stationary. The statistics of the calibration samples depend upon the time for which the samples are to be applied. The statistics of the random variables are equated to the mean statistics of the non-stationary processes over the interval defined by the time of calibration sample and when it is applied. This analysis opens the opportunity for experimental investigation into the underlying properties of receiver non stationarity through the use of multiple calibration references. In this presentation we will discuss the application of LSR to the analysis of various calibration algorithms, requirements for experimental verification of the theory, and preliminary results from analyzing experiment measurements.

  10. Gingival Inflammation Associates with Stroke – A Role for Oral Health Personnel in Prevention: A Database Study

    PubMed Central

    2015-01-01

    Objectives Gingival inflammation is the physiological response to poor oral hygiene. If gingivitis is not resolved the response will become an established lesion. We studied whether gingivitis is associated with elevated risk for stroke. The hypothesis was based on the periodontitis–atherosclerosis paradigm. Methods In our prospective cohort study from Sweden 1676 randomly selected subjects were followed up from 1985 to 2012. All subjects underwent clinical oral examination and answered a questionnaire assessing background variables such as socio-economic status and pack-years of smoking. Cases with stroke were recorded from the Center of Epidemiology, Swedish National Board of Health and Welfare, Sweden, and classified according to the WHO International Classification of Diseases. Unpaired t-test, chi-square tests, and multiple logistic regression analyses were used. Results Of the 1676 participants, 39 subjects (2.3%) had been diagnosed with stroke. There were significant differences between the patients with stroke and subjects without in pack-years of smoking (p = 0.01), prevalence of gingival inflammation (GI) (p = 0.03), and dental calculus (p = 0.017). In a multiple regression analysis of the association between GI, confounders and stroke, GI showed an odds ratio of 2.20 (95% confidence interval 1.02–4.74) for stroke. Conclusion Our present findings showed that gingival inflammation was clearly associated with stroke in this 26-year cohort study. The results emphasize the role of oral health personnel in prevention. PMID:26405803

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.

    Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.

  12. Two biased estimation techniques in linear regression: Application to aircraft

    NASA Technical Reports Server (NTRS)

    Klein, Vladislav

    1988-01-01

    Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques which can limit a damaging effect of collinearity are presented. These two techniques, the principal components regression and mixed estimation, belong to a class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. The eigensystem analysis and parameter variance decomposition appeared to be a promising tool for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
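
    Principal components regression, one of the two biased estimators mentioned above, can be sketched in a few lines. This is a generic textbook formulation rather than the procedure from the report: the function name `pcr_fit` and the choice to centre but not scale the regressors are assumptions made for illustration.

```python
import numpy as np

def pcr_fit(X, y, n_components):
    """Principal components regression via the SVD of the centred regressors.

    Returns (beta, intercept) expressed in the original regressor units.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # Xc = U @ diag(s) @ Vt; the principal component scores are U * s.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = n_components
    # Least squares on the first k scores: gamma = diag(s)^-1 @ U^T @ yc.
    gamma = (U[:, :k].T @ yc) / s[:k]
    # Map the score coefficients back to the original regressors.
    beta = Vt[:k].T @ gamma
    intercept = y_mean - x_mean @ beta
    return beta, intercept
```

    Dropping the components with the smallest singular values removes the directions in which collinear data carry almost no information, which trades a small bias for a large reduction in variance. Keeping all components reproduces the ordinary least squares fit.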

  13. Partial Least Squares Regression Models for the Analysis of Kinase Signaling.

    PubMed

    Bourgeois, Danielle L; Kreeger, Pamela K

    2017-01-01

    Partial least squares regression (PLSR) is a data-driven modeling approach that can be used to analyze multivariate relationships between kinase networks and cellular decisions or patient outcomes. In PLSR, a linear model relating an X matrix of independent variables and a Y matrix of dependent variables is generated by extracting the factors with the strongest covariation. While the identified relationship is correlative, PLSR models can be used to generate quantitative predictions for new conditions or perturbations to the network, allowing for mechanisms to be identified. This chapter will provide a brief explanation of PLSR and provide an instructive example to demonstrate the use of PLSR to analyze kinase signaling.
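
    The factor extraction described here can be illustrated with a short NIPALS-style sketch for a single response. It is not taken from the chapter: the function name `pls1_fit` is hypothetical, X is treated as the predictor matrix and y as one response column, and the data are centred but not scaled.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS regression for one response (PLS1) using NIPALS-style deflation.

    Returns (beta, intercept) expressed in the original predictor units.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores of the current factor
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X
        yk = yk - c * t               # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients: beta = W (P^T W)^-1 q.
    beta = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ beta
    return beta, intercept
```

    Predictions for new conditions are then `X_new @ beta + intercept`. In practice the number of factors is usually chosen by cross-validation, and with as many factors as there are linearly independent predictors the fit coincides with ordinary least squares.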

  14. Diabetic Prevalence in Bangladesh: The Role of Some Associated Demographic and Socioeconomic Characteristics

    NASA Astrophysics Data System (ADS)

    Imam, Tasneem

    2012-12-01

    The study attempts to examine the association of a few selected socio-economic and demographic characteristics with diabetic prevalence. Nationally representative data from BIRDEM 2000 have been used to meet the objectives of the study. Cross tabulation, Chi-square and logistic regression analysis have been used to portray the necessary associations. Chi-square reveals a significant relationship between diabetic prevalence and all the selected demographic and socio-economic variables except "education", while logistic regression analysis shows no significant contribution of "age" and "education" to diabetic prevalence. It should be noted that this paper dealt with all three types of diabetes: Type 1, Type 2 and Gestational.

  15. Multiple-Coil, Pulse-Induction Metal Detector

    NASA Technical Reports Server (NTRS)

    Lesky, Edward S.; Reid, Alan M.; Bushong, Wilton E.; Dickey, Duane P.

    1988-01-01

    Multiple-head, pulse-induction metal detector scans area of 72 feet squared with combination of eight detector heads, each 3 ft. square. Head includes large primary coil inducing current in smaller secondary coils. Array of eight heads enables searcher to cover large area quickly. Pulses applied to primary coil, induced in secondary coils measured to determine whether metal present within range of detector head. Detector designed for recovery of Space Shuttle debris.

  16. Using the Criterion-Predictor Factor Model to Compute the Probability of Detecting Prediction Bias with Ordinary Least Squares Regression

    ERIC Educational Resources Information Center

    Culpepper, Steven Andrew

    2012-01-01

    The study of prediction bias is important and the last five decades include research studies that examined whether test scores differentially predict academic or employment performance. Previous studies used ordinary least squares (OLS) to assess whether groups differ in intercepts and slopes. This study shows that OLS yields inaccurate inferences…

  17. A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections

    NASA Technical Reports Server (NTRS)

    Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.

    2014-01-01

    A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.

  18. Modelling of the batch biosorption system: study on exchange of protons with cell wall-bound mineral ions.

    PubMed

    Mishra, Vishal

    2015-01-01

    The interchange of the protons with the cell wall-bound calcium and magnesium ions at the interface of solution/bacterial cell surface in the biosorption system at various concentrations of protons has been studied in the present work. A mathematical model for establishing the correlation between concentration of protons and active sites was developed and optimized. The sporadic limited residence time reactor was used to titrate the calcium and magnesium ions at the individual data point. The accuracy of the proposed mathematical model was estimated using error functions such as nonlinear regression, adjusted nonlinear regression coefficient, the chi-square test, P-test and F-test. The values of the chi-square test (0.042-0.017), P-test (<0.001-0.04), sum of square errors (0.061-0.016), root mean square error (0.01-0.04) and F-test (2.22-19.92) reported in the present research indicated the suitability of the model over a wide range of proton concentrations. The zeta potential of the bacterium surface at various concentrations of protons was observed to validate the denaturation of active sites.

  19. Support vector machine regression (SVR/LS-SVM)--an alternative to neural networks (ANN) for analytical chemistry? Comparison of nonlinear methods on near infrared (NIR) spectroscopy data.

    PubMed

    Balabin, Roman M; Lomakina, Ekaterina I

    2011-04-21

    In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.

  20. Impact of multicollinearity on small sample hydrologic regression models

    NASA Astrophysics Data System (ADS)

    Kroll, Charles N.; Song, Peter

    2013-06-01

    Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if the interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
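
    The variance inflation factor used for screening can be computed as sketched below. This is a generic illustration, not the study's code: the function name `variance_inflation_factors` is hypothetical, and any screening threshold (values such as 5 or 10 are common rules of thumb) is an assumption rather than a figure from the abstract.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R2_j), where R2_j comes from regressing
    explanatory variable j on all the other explanatory variables."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression: intercept plus every other column.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```

    Uncorrelated explanatory variables give a VIF of 1, and the value grows without bound as a variable approaches an exact linear combination of the others.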

  1. Multiple Correlation versus Multiple Regression.

    ERIC Educational Resources Information Center

    Huberty, Carl J.

    2003-01-01

    Describes differences between multiple correlation analysis (MCA) and multiple regression analysis (MRA), showing how these approaches involve different research questions and study designs, different inferential approaches, different analysis strategies, and different reported information. (SLD)

  2. Linear Least Squares for Correlated Data

    NASA Technical Reports Server (NTRS)

    Dean, Edwin B.

    1988-01-01

    Throughout the literature authors have consistently discussed the suspicion that regression results were less than satisfactory when the independent variables were correlated. Camm, Gulledge, and Womer, and Womer and Marcotte provide excellent applied examples of these concerns. Many authors have obtained partial solutions for this problem as discussed by Womer and Marcotte and Wonnacott and Wonnacott, which result in generalized least squares algorithms to solve restrictive cases. This paper presents a simple but relatively general multivariate method for obtaining linear least squares coefficients which are free of the statistical distortion created by correlated independent variables.

  3. Identification of molecular descriptors for design of novel Isoalloxazine derivatives as potential Acetylcholinesterase inhibitors against Alzheimer's disease.

    PubMed

    Gurung, Arun Bahadur; Aguan, Kripamoy; Mitra, Sivaprasad; Bhattacharjee, Atanu

    2017-06-01

    In Alzheimer's disease (AD), the level of the neurotransmitter acetylcholine (ACh) is reduced. Since acetylcholinesterase (AChE) cleaves ACh, inhibitors of AChE are very much sought after for AD treatment. The side effects of current inhibitors necessitate the development of newer AChE inhibitors. Isoalloxazine derivatives have proved to be promising AChE inhibitors. However, their structure-activity relationship studies have not been reported to date. In the present work, various quantitative structure-activity relationship (QSAR) building methods such as multiple linear regression (MLR), partial least squares, and principal component regression were employed to derive 3D-QSAR models using steric and electrostatic field descriptors. A statistically significant model was obtained using MLR coupled with the stepwise selection method, having r2 = .9405, cross-validated r2 (q2) = .6683, and a high predictability (pred_r2 = .6206 and standard error, pred_r2se = .2491). The steric and electrostatic contribution plot revealed three electrostatic fields (E_496, E_386 and E_577) and one steric field (S_60) contributing towards biological activity. A ligand-based 3D-pharmacophore model was generated consisting of eight pharmacophore features. Isoalloxazine derivatives were docked against human AChE, which revealed critical residues implicated in hydrogen bonds as well as hydrophobic interactions. The binding modes of docked complexes (AChE_IA1 and AChE_IA14) were validated by molecular dynamics simulation, which showed stable trajectories in terms of root mean square deviation, and molecular mechanics/Poisson-Boltzmann surface area binding free energy analysis revealed key residues contributing significantly to the overall binding energy. The present study may be useful in the design of more potent isoalloxazine derivatives as AChE inhibitors.

  4. Antiretroviral drug costs and prescription patterns in British Columbia, Canada: 1996-2011.

    PubMed

    Nosyk, Bohdan; Montaner, Julio S G; Yip, Benita; Lima, Viviane D; Hogg, Robert S

    2014-04-01

    Treatment options and therapeutic guidelines have evolved substantially since highly active antiretroviral treatment (HAART) became the standard of HIV care in 1996. We conducted the present population-based analysis to characterize the determinants of direct costs of HAART over time in British Columbia, Canada. We considered individuals ever receiving HAART in British Columbia from 1996 to 2011. Linear mixed-effects regression models were constructed to determine the effects of demographic indicators, clinical stage, and treatment characteristics on quarterly costs of HAART (in 2010$CDN) among individuals initiating in different temporal periods. The least-square mean values were estimated by CD4 category and over time for each temporal cohort. Longitudinal data on HAART recipients (N = 9601, 17.6% female, mean age at initiation = 40.5) were analyzed. Multiple regression analyses identified demographics, treatment adherence, and pharmacological class to be independently associated with quarterly HAART costs. Higher CD4 cell counts were associated with modestly lower costs among pre-HAART initiators [least-square means (95% confidence interval), CD4 > 500: 4674 (4632-4716); CD4 350-499: 4765 (4721-4809); CD4 200-349: 4826 (4780-4871); CD4 < 200: 4809 (4759-4859)]; however, these differences were not significant among post-2003 HAART initiators. Population-level mean costs increased through 2006 and stabilized; post-2003 HAART initiators incurred quarterly costs up to 23% lower than pre-2000 HAART initiators in 2010. Our results highlight the magnitude of the temporal changes in HAART costs, and disparities between recent and pre-HAART initiators. This methodology can improve the precision of economic modeling efforts by using detailed cost functions for annual, population-level medication costs according to the distribution of clients by clinical stage and era of treatment initiation.

  5. Compatible Models of Carbon Content of Individual Trees on a Cunninghamia lanceolata Plantation in Fujian Province, China

    PubMed Central

    Zhuo, Lin; Tao, Hong; Wei, Hong; Chengzhen, Wu

    2016-01-01

    We tried to establish compatible carbon content models of individual trees for a Chinese fir (Cunninghamia lanceolata (Lamb.) Hook.) plantation from Fujian province in southeast China. In general, compatibility requires that the sum of components equal the whole tree, meaning that the sum of percentages calculated from component equations should equal 100%. Thus, we used multiple approaches to simulate carbon content in boles, branches, foliage, roots and whole individual trees. The approaches included (i) single optimal fitting (SOF), (ii) nonlinear adjustment in proportion (NAP) and (iii) nonlinear seemingly unrelated regression (NSUR). These approaches were used in combination with variables relating diameter at breast height (D) and tree height (H), such as D, D2H, DH and D&H (where D&H means two separate variables in a bivariate model). Power, exponential and polynomial functions were tested, and a new general function model was proposed by this study. Weighted least squares regression models were employed to eliminate heteroscedasticity. Model performance was evaluated using mean residuals, residual variance, mean square error and the determination coefficient. The results indicated that models with two-dimensional variables (DH, D2H and D&H) were always superior to those with a single variable (D). The D&H variable combination was found to be the most useful predictor. Of all the approaches, SOF could establish a single optimal model for each component separately, but its estimates deviated because of incompatibilities, while NAP and NSUR could ensure compatible predictions. We also found that the new general model had better accuracy than the others. In conclusion, we recommend that the new general model be used to estimate carbon content for Chinese fir and considered for other vegetation types as well. PMID:26982054

  6. New Insights into Handling Missing Values in Environmental Epidemiological Studies

    PubMed Central

    Roda, Célina; Nicolis, Ioannis; Momas, Isabelle; Guihenneuc, Chantal

    2014-01-01

    Missing data are unavoidable in environmental epidemiologic surveys. The aim of this study was to compare methods for handling large amounts of missing values: omission of missing values, single and multiple imputations (through linear regression or partial least squares regression), and a fully Bayesian approach. These methods were applied to the PARIS birth cohort, where indoor domestic pollutant measurements were performed in a random sample of babies' dwellings. A simulation study was conducted to assess performances of different approaches with a high proportion of missing values (from 50% to 95%). Different simulation scenarios were carried out, controlling the true value of the association (odds ratio of 1.0, 1.2, and 1.4), and varying the health outcome prevalence. When a large amount of data is missing, omitting these missing data reduced statistical power and inflated standard errors, which affected the significance of the association. Single imputation underestimated the variability, and considerably increased risk of type I error. All approaches were conservative, except the Bayesian joint model. In the case of a common health outcome, the fully Bayesian approach is the most efficient approach (low root mean square error, reasonable type I error, and high statistical power). Nevertheless for a less prevalent event, the type I error is increased and the statistical power is reduced. The estimated posterior distribution of the OR is useful to refine the conclusion. Among the methods handling missing values, no approach is absolutely the best but when usual approaches (e.g. single imputation) are not sufficient, joint modelling approach of missing process and health association is more efficient when large amounts of data are missing. PMID:25226278
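
    The remark that single imputation underestimates variability can be seen in a small sketch. The study imputed through linear or partial least squares regression; plain mean imputation is swapped in here only because it is the simplest single-imputation scheme, and the measurements are invented.

```python
import statistics


def mean_impute(values):
    """Replace every missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Invented pollutant measurements with missing entries (illustration only).
measurements = [12.0, None, 15.5, None, 9.0, 20.5, None, 13.0]
observed = [v for v in measurements if v is not None]
completed = mean_impute(measurements)

sd_observed = statistics.stdev(observed)
sd_completed = statistics.stdev(completed)
```

    Every imputed value sits exactly at the mean, so the sum of squared deviations is unchanged while the sample size grows; `sd_completed` is therefore smaller than `sd_observed`, and any standard error computed from the completed data looks more precise than it should.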

  7. Simulation of flood hydrographs for Georgia streams

    USGS Publications Warehouse

    Inman, Ernest J.

    1987-01-01

    Flood hydrographs are needed for the design of many highway drainage structures and embankments. A method for simulating these flood hydrographs at ungaged sites in Georgia is presented in this report. The O'Donnell method was used to compute unit hydrographs and lagtimes for 355 floods at 80 gaging stations. An average unit hydrograph and an average lagtime were computed for each station. These average unit hydrographs were transformed to unit hydrographs having durations of one-fourth, one-third, one-half, and three-fourths lagtime, then reduced to dimensionless terms by dividing the time by lagtime and the discharge by peak discharge. Hydrographs were simulated for these 355 floods and their widths were compared with the widths of the observed hydrographs at 50 and 75 percent of peak flow. The dimensionless hydrograph based on one-half lagtime duration provided the best fit of the observed data. Multiple regression analysis was then used to define relations between lagtime and certain physical basin characteristics; of these characteristics, drainage area and slope were found to be significant for the rural-stream equations and drainage area, slope, and impervious area were found to be significant for the Atlanta urban-stream equation. A hydrograph can be simulated from the dimensionless hydrograph, the peak discharge of a specific recurrence interval, and the lagtime obtained from regression equations for any site in Georgia having a drainage area of less than 500 square miles. For simulating hydrographs at sites having basins larger than 500 square miles, the U.S. Geological Survey computer model CONROUT can be used. This model routes streamflow from an upstream channel location to a user-defined location downstream. The product of CONROUT is a simulated discharge hydrograph for the downstream site that has a peak discharge of a specific recurrence interval.
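
    The scaling steps described above (time divided by lagtime, discharge divided by peak discharge, widths compared at 50 and 75 percent of peak flow) can be sketched as follows. The hydrograph ordinates, lagtimes and peak discharge are invented, and the function names are hypothetical; the report's regression equations for lagtime are not reproduced.

```python
def to_dimensionless(times, discharges, lagtime):
    """Divide time by lagtime and discharge by peak discharge."""
    peak = max(discharges)
    return [t / lagtime for t in times], [q / peak for q in discharges]


def from_dimensionless(time_ratios, discharge_ratios, lagtime, peak_discharge):
    """Rebuild a hydrograph for a site from its lagtime and design peak discharge."""
    return ([t * lagtime for t in time_ratios],
            [q * peak_discharge for q in discharge_ratios])


def width_at_fraction(times, discharges, fraction):
    """Hydrograph width at a fraction of peak flow, using linear interpolation."""
    level = fraction * max(discharges)
    crossings = []
    for i in range(len(times) - 1):
        q0, q1 = discharges[i], discharges[i + 1]
        if (q0 < level) != (q1 < level):  # the hydrograph crosses the level here
            t0, t1 = times[i], times[i + 1]
            crossings.append(t0 + (level - q0) * (t1 - t0) / (q1 - q0))
    return crossings[-1] - crossings[0]


# Invented observed hydrograph (hours, cubic feet per second).
hours = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
flow = [0.0, 20.0, 60.0, 100.0, 80.0, 60.0, 40.0, 20.0, 0.0]

t_ratio, q_ratio = to_dimensionless(hours, flow, lagtime=4.0)
width_50 = width_at_fraction(hours, flow, 0.50)
width_75 = width_at_fraction(hours, flow, 0.75)

# Simulate a hydrograph at another (hypothetical) site.
sim_hours, sim_flow = from_dimensionless(t_ratio, q_ratio, lagtime=6.0, peak_discharge=250.0)
sim_width_50 = width_at_fraction(sim_hours, sim_flow, 0.50)
```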

  8. The Detection and Interpretation of Interaction Effects between Continuous Variables in Multiple Regression.

    ERIC Educational Resources Information Center

    Jaccard, James; And Others

    1990-01-01

    Issues in the detection and interpretation of interaction effects between quantitative variables in multiple regression analysis are discussed. Recent discussions associated with problems of multicollinearity are reviewed in the context of the conditional nature of multiple regression with product terms. (TJH)
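
    A small sketch may help with the multicollinearity point. In a model y = b0 + b1·x + b2·z + b3·x·z, the coefficient b1 is conditional: it is the slope of x where z = 0. The raw product term is usually highly correlated with its components; mean-centering the predictors, a common remedy, removes much of that correlation and moves the conditional slope to the mean of z. The data below are invented and the helper `pearson` is written out only to keep the example self-contained.

```python
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (ss_a * ss_b) ** 0.5


# Invented predictor scores (illustration only).
x = [10, 11, 12, 13, 14, 15, 16, 17]
z = [5, 7, 6, 8, 7, 9, 8, 10]

# Correlation between x and the raw product term x*z.
raw_product = [xi * zi for xi, zi in zip(x, z)]
r_raw = pearson(x, raw_product)

# Same correlation after mean-centering both predictors.
x_c = [xi - statistics.mean(x) for xi in x]
z_c = [zi - statistics.mean(z) for zi in z]
centered_product = [xi * zi for xi, zi in zip(x_c, z_c)]
r_centered = pearson(x_c, centered_product)
```

    Centering does not change the interaction coefficient b3 or the fit of the model; it changes only what the lower-order coefficients refer to.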

  9. One-step global parameter estimation of kinetic inactivation parameters for Bacillus sporothermodurans spores under static and dynamic thermal processes.

    PubMed

    Cattani, F; Dolan, K D; Oliveira, S D; Mishra, D K; Ferreira, C A S; Periago, P M; Aznar, A; Fernandez, P S; Valdramidis, V P

    2016-11-01

    Bacillus sporothermodurans produces highly heat-resistant endospores that can survive under ultra-high temperature. Highly heat-resistant spore-forming bacteria are one of the main causes of spoilage and safety problems in low-acid foods. They can be used as indicators or surrogates to establish the minimum requirements for heat processes, but it is necessary to understand their thermal inactivation kinetics. The aim of the present work was to study the inactivation kinetics under both static and dynamic conditions in a vegetable soup. Ordinary least squares one-step regression and sequential procedures were applied for estimating these parameters. Results showed that multiple dynamic heating profiles, when analyzed simultaneously, can be used to accurately estimate the kinetic parameters while significantly reducing estimation errors and data collection. Copyright © 2016 Elsevier Ltd. All rights reserved.

  10. Evaluation of the Williams-type model for barley yields in North Dakota and Minnesota

    NASA Technical Reports Server (NTRS)

    Barnett, T. L. (Principal Investigator)

    1981-01-01

    The Williams-type yield model is based on multiple regression analysis of historical time series data at CRD level pooled to regional level (groups of similar CRDs). Basic variables considered in the analysis include USDA yield, monthly mean temperature, monthly precipitation, soil texture and topographic information, and variables derived from these. Technologic trend is represented by piecewise linear and/or quadratic functions of year. Indicators of yield reliability obtained from a ten-year bootstrap test (1970-1979) demonstrate that biases are small and performance based on root mean square appears to be acceptable for the intended AgRISTARS large area applications. The model is objective, adequate, timely, simple, and not costly. It considers scientific knowledge on a broad scale but not in detail, and does not provide a good current measure of modeled yield reliability.

  11. Wind Tunnel Strain-Gage Balance Calibration Data Analysis Using a Weighted Least Squares Approach

    NASA Technical Reports Server (NTRS)

    Ulbrich, N.; Volden, T.

    2017-01-01

    A new approach is presented that uses a weighted least squares fit to analyze wind tunnel strain-gage balance calibration data. The weighted least squares fit is specifically designed to increase the influence of single-component loadings during the regression analysis. The weighted least squares fit also reduces the impact of calibration load schedule asymmetries on the predicted primary sensitivities of the balance gages. A weighting factor between zero and one is assigned to each calibration data point that depends on a simple count of its intentionally loaded load components or gages. The greater the number of a data point's intentionally loaded load components or gages is, the smaller its weighting factor becomes. The proposed approach is applicable to both the Iterative and Non-Iterative Methods that are used for the analysis of strain-gage balance calibration data in the aerospace testing community. The Iterative Method uses a reasonable estimate of the tare corrected load set as input for the determination of the weighting factors. The Non-Iterative Method, on the other hand, uses gage output differences relative to the natural zeros as input for the determination of the weighting factors. Machine calibration data of a six-component force balance is used to illustrate benefits of the proposed weighted least squares fit. In addition, a detailed derivation of the PRESS residuals associated with a weighted least squares fit is given in the appendices of the paper as this information could not be found in the literature. These PRESS residuals may be needed to evaluate the predictive capabilities of the final regression models that result from a weighted least squares fit of the balance calibration data.
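
    The weighting idea above can be sketched in a few lines. The abstract states only that each weight lies between zero and one and shrinks as the count of intentionally loaded components grows; the specific choice 1/count, the straight-line model, the function names and the calibration values below are all assumptions for illustration, not the authors' formulation.

```python
def weighting_factor(loads, tolerance=0.0):
    """Weight in (0, 1] that shrinks with the number of loaded components.
    The 1/count rule is an assumption; the paper may use a different mapping."""
    loaded = sum(1 for load in loads if abs(load) > tolerance)
    return 1.0 / loaded if loaded else 1.0


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Invented calibration points: applied loads on three components and one gage output.
applied_loads = [(100, 0, 0), (200, 0, 0), (300, 0, 0), (100, 50, 0), (200, 50, 25)]
gage_output = [1.5, 2.5, 3.5, 1.8, 2.9]

weights = [weighting_factor(loads) for loads in applied_loads]
primary_load = [loads[0] for loads in applied_loads]
intercept, sensitivity = weighted_line_fit(primary_load, gage_output, weights)
```

    Single-component loadings keep a weight of 1, while the combined loadings are down-weighted, so the fitted primary sensitivity is pulled toward the single-component data.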

  12. America's Democracy Colleges: The Civic Engagement of Community College Students

    ERIC Educational Resources Information Center

    Angeli Newell, Mallory

    2014-01-01

    This study explored the civic engagement of current two- and four-year students to explore whether differences exist between the groups and what may explain the differences. Using binary logistic regression and Ordinary Least Squares regression it was found that community-based engagement was lower for two- than four-year students, though…

  13. Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method

    ERIC Educational Resources Information Center

    Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Sülzle, Detlev

    2018-01-01

    The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…

  14. Robust Regression for Slope Estimation in Curriculum-Based Measurement Progress Monitoring

    ERIC Educational Resources Information Center

    Mercer, Sterett H.; Lyons, Alina F.; Johnston, Lauren E.; Millhoff, Courtney L.

    2015-01-01

    Although ordinary least-squares (OLS) regression has been identified as a preferred method to calculate rates of improvement for individual students during curriculum-based measurement (CBM) progress monitoring, OLS slope estimates are sensitive to the presence of extreme values. Robust estimators have been developed that are less biased by…
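
    The sensitivity of OLS slopes to extreme values can be shown with a short comparison. The truncated abstract does not name the robust estimators it evaluates, so the Theil-Sen estimator (median of all pairwise slopes) is used here purely as a familiar example of one; the weekly scores are invented.

```python
import statistics
from itertools import combinations


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def theil_sen_slope(x, y):
    """Median of the slopes between every pair of points."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
              if x2 != x1]
    return statistics.median(slopes)


# Invented progress-monitoring data: steady gain of 2 points per week,
# with one extreme score in the final week.
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
scores = [20, 22, 24, 26, 28, 30, 32, 60]

slope_ols = ols_slope(weeks, scores)
slope_robust = theil_sen_slope(weeks, scores)
```

    With these values the single extreme score roughly doubles the OLS rate of improvement, while the median-based slope stays at the underlying 2 points per week.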

  15. Pick Your Poisson: A Tutorial on Analyzing Counts of Student Victimization Data

    ERIC Educational Resources Information Center

    Huang, Francis L.; Cornell, Dewey G.

    2012-01-01

    School violence research is often concerned with infrequently occurring events such as counts of the number of bullying incidents or fights a student may experience. Analyzing count data using ordinary least squares regression may produce improbable predicted values, and as a result of regression assumption violations, result in higher Type I…
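
    The "improbable predicted values" mentioned above are easy to reproduce: a straight line fitted to counts can predict a negative number of incidents, whereas a Poisson model with a log link cannot. The sketch below fits both to invented data; the Newton-Raphson routine is a bare-bones stand-in for what a statistics package would do and is not taken from the tutorial.

```python
import math


def fit_ols(x, y):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope


def fit_poisson(x, y, iterations=25):
    """Poisson regression log(mu) = b0 + b1 * x, fitted by Newton-Raphson."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries.
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1


# Invented data: a predictor score and the number of incidents reported.
score = [0, 1, 2, 3, 4, 5]
incidents = [8, 5, 3, 1, 1, 0]

ols_b0, ols_b1 = fit_ols(score, incidents)
pois_b0, pois_b1 = fit_poisson(score, incidents)

ols_prediction = ols_b0 + ols_b1 * 5                 # falls below zero
poisson_prediction = math.exp(pois_b0 + pois_b1 * 5)  # always positive
```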

  16. Causal Models with Unmeasured Variables: An Introduction to LISREL.

    ERIC Educational Resources Information Center

    Wolfle, Lee M.

    Whenever one uses ordinary least squares regression, one is making an implicit assumption that all of the independent variables have been measured without error. Such an assumption is obviously unrealistic for most social data. One approach for estimating such regression models is to measure implied coefficients between latent variables for which…

  17. Early Home Activities and Oral Language Skills in Middle Childhood: A Quantile Analysis

    ERIC Educational Resources Information Center

    Law, James; Rush, Robert; King, Tom; Westrupp, Elizabeth; Reilly, Sheena

    2018-01-01

    Oral language development is a key outcome of elementary school, and it is important to identify factors that predict it most effectively. Commonly researchers use ordinary least squares regression with conclusions restricted to average performance conditional on relevant covariates. Quantile regression offers a more sophisticated alternative.…

  18. On the null distribution of Bayes factors in linear regression

    USDA-ARS's Scientific Manuscript database

    We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...

  19. Partial least squares for efficient models of fecal indicator bacteria on Great Lakes beaches

    USGS Publications Warehouse

    Brooks, Wesley R.; Fienen, Michael N.; Corsi, Steven R.

    2013-01-01

    At public beaches, it is now common to mitigate the impact of water-borne pathogens by posting a swimmer's advisory when the concentration of fecal indicator bacteria (FIB) exceeds an action threshold. Since culturing the bacteria delays public notification when dangerous conditions exist, regression models are sometimes used to predict the FIB concentration based on readily-available environmental measurements. It is hard to know which environmental parameters are relevant to predicting FIB concentration, and the parameters are usually correlated, which can hurt the predictive power of a regression model. Here the method of partial least squares (PLS) is introduced to automate the regression modeling process. Model selection is reduced to the process of setting a tuning parameter to control the decision threshold that separates predicted exceedances of the standard from predicted non-exceedances. The method is validated by application to four Great Lakes beaches during the summer of 2010. Performance of the PLS models compares favorably to that of the existing state-of-the-art regression models at these four sites.

  20. Spectroscopic Determination of Aboveground Biomass in Grasslands Using Spectral Transformations, Support Vector Machine and Partial Least Squares Regression

    PubMed Central

    Marabel, Miguel; Alvarez-Taboada, Flor

    2013-01-01

    Aboveground biomass (AGB) is one of the strategic biophysical variables of interest in vegetation studies. The main objective of this study was to evaluate the Support Vector Machine (SVM) and Partial Least Squares Regression (PLSR) for estimating the AGB of grasslands from field spectrometer data and to find out which data pre-processing approach was the most suitable. The most accurate model to predict the total AGB involved PLSR and the Maximum Band Depth index derived from the continuum removed reflectance in the absorption features between 916–1,120 nm and 1,079–1,297 nm (R2 = 0.939, RMSE = 7.120 g/m2). Regarding the green fraction of the AGB, the Area Over the Minimum index derived from the continuum removed spectra provided the most accurate model overall (R2 = 0.939, RMSE = 3.172 g/m2). Identifying the appropriate absorption features was proved to be crucial to improve the performance of PLSR to estimate the total and green aboveground biomass, by using the indices derived from those spectral regions. Ordinary Least Square Regression could be used as a surrogate for the PLSR approach with the Area Over the Minimum index as the independent variable, although the resulting model would not be as accurate. PMID:23925082

  1. Near-infrared hyperspectral imaging and partial least squares regression for rapid and reagentless determination of Enterobacteriaceae on chicken fillets.

    PubMed

    Feng, Yao-Ze; Elmasry, Gamal; Sun, Da-Wen; Scannell, Amalia G M; Walsh, Des; Morcy, Noha

    2013-06-01

    Bacterial pathogens are the main culprits for outbreaks of food-borne illnesses. This study aimed to use the hyperspectral imaging technique as a non-destructive tool for quantitative and direct determination of Enterobacteriaceae loads on chicken fillets. Partial least squares regression (PLSR) models were established and the best model using full wavelengths was obtained in the spectral range 930-1450 nm with coefficients of determination R² ≥ 0.82 and root mean squared errors (RMSEs) ≤ 0.47 log₁₀ CFU g⁻¹. In further development of simplified models, second derivative spectra and weighted PLS regression coefficients (BW) were utilised to select important wavelengths. However, the three wavelengths (930, 1121 and 1345 nm) selected from BW were competent and more preferred for predicting Enterobacteriaceae loads with R² of 0.89, 0.86 and 0.87 and RMSEs of 0.33, 0.40 and 0.45 log₁₀ CFU g⁻¹ for calibration, cross-validation and prediction, respectively. Besides, the constructed prediction map provided the distribution of Enterobacteriaceae bacteria on chicken fillets, which cannot be achieved by conventional methods. It was demonstrated that hyperspectral imaging is a potential tool for determining food sanitation and detecting bacterial pathogens on food matrix without using complicated laboratory regimes. Copyright © 2012 Elsevier Ltd. All rights reserved.

  2. Analysis of the low-flow characteristics of streams in Louisiana

    USGS Publications Warehouse

    Lee, Fred N.

    1985-01-01

    The U.S. Geological Survey, in cooperation with the Louisiana Department of Transportation and Development, Office of Public Works, used geologic maps, soils maps, precipitation data, and low-flow data to define four hydrographic regions in Louisiana having distinct low-flow characteristics. Equations were derived, using regression analyses, to estimate the 7Q2, 7Q10, and 7Q20 flow rates for basically unaltered stream basins smaller than 525 square miles. Independent variables in the equations include drainage area (square miles), mean annual precipitation index (inches), and main channel slope (feet per mile). Average standard errors of regression ranged from +44 to +61 percent. Graphs are given for estimating the 7Q2, 7Q10, and 7Q20 for stream basins for which the drainage area of the most downstream data-collection site is larger than 525 square miles. Detailed examples are given in this report for the use of the equations and graphs.

  3. Feasibility of using near infrared spectroscopy to detect and quantify an adulterant in high quality sandalwood oil.

    PubMed

    Kuriakose, Saji; Joe, I Hubert

    2013-11-01

    Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC = 0.00009% v/v). The lowest root mean square error of prediction (RMSEP = 0.00016% v/v) in the test set and the highest coefficient of determination (R² = 0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model. Copyright © 2013 Elsevier B.V. All rights reserved.

  4. Nonlinear least-squares data fitting in Excel spreadsheets.

    PubMed

    Kemmer, Gerdi; Keller, Sandro

    2010-02-01

    We describe an intuitive and rapid procedure for analyzing experimental data by nonlinear least-squares fitting (NLSF) in the most widely used spreadsheet program. Experimental data in x/y form and data calculated from a regression equation are inputted and plotted in a Microsoft Excel worksheet, and the sum of squared residuals is computed and minimized using the Solver add-in to obtain the set of parameter values that best describes the experimental data. The confidence of best-fit values is then visualized and assessed in a generally applicable and easily comprehensible way. Every user familiar with the most basic functions of Excel will be able to implement this protocol, without previous experience in data fitting or programming and without additional costs for specialist software. The application of this tool is exemplified using the well-known Michaelis-Menten equation characterizing simple enzyme kinetics. Only slight modifications are required to adapt the protocol to virtually any other kind of dataset or regression equation. The entire protocol takes approximately 1 h.
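
    The same idea (compute the sum of squared residuals between data and the regression equation, then let an optimizer minimize it) can be sketched outside a spreadsheet. The example below fits the Michaelis-Menten equation v = Vmax·S/(Km + S). It is not the published Excel protocol: a small damped Gauss-Newton loop stands in for the Solver add-in, the data are synthetic and noise-free, and all names and starting values are assumptions.

```python
def michaelis_menten(s, vmax, km):
    """Reaction rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def sum_squared_residuals(vmax, km, s_data, v_data):
    return sum((v - michaelis_menten(s, vmax, km)) ** 2
               for s, v in zip(s_data, v_data))


def fit_michaelis_menten(s_data, v_data, vmax, km, max_iter=200):
    """Minimize the SSR with damped Gauss-Newton steps (Levenberg-Marquardt style)."""
    damping = 1e-3
    best = sum_squared_residuals(vmax, km, s_data, v_data)
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for s, v in zip(s_data, v_data):
            residual = v - michaelis_menten(s, vmax, km)
            j1 = s / (km + s)                # d(model)/d(Vmax)
            j2 = -vmax * s / (km + s) ** 2   # d(model)/d(Km)
            a11 += j1 * j1
            a12 += j1 * j2
            a22 += j2 * j2
            g1 += j1 * residual
            g2 += j2 * residual
        d11, d22 = a11 * (1 + damping), a22 * (1 + damping)
        det = d11 * d22 - a12 * a12
        if det == 0.0:
            break
        new_vmax = vmax + (d22 * g1 - a12 * g2) / det
        new_km = km + (d11 * g2 - a12 * g1) / det
        # Accept the step only if Km stays positive and the SSR decreases.
        if new_km > 0 and sum_squared_residuals(new_vmax, new_km, s_data, v_data) < best:
            vmax, km = new_vmax, new_km
            best = sum_squared_residuals(vmax, km, s_data, v_data)
            damping /= 10
        else:
            damping *= 10
            if damping > 1e10:
                break
    return vmax, km, best


# Synthetic, noise-free data generated from Vmax = 10 and Km = 2 (illustration only).
substrate = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
rate = [michaelis_menten(s, 10.0, 2.0) for s in substrate]

vmax_fit, km_fit, ssr = fit_michaelis_menten(substrate, rate, vmax=5.0, km=1.0)
```

    As in the spreadsheet approach, only `michaelis_menten` and its two partial derivatives need to change to fit a different regression equation.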

  5. Extension of the Haseman-Elston regression model to longitudinal data.

    PubMed

    Won, Sungho; Elston, Robert C; Park, Taesung

    2006-01-01

    We propose an extension to longitudinal data of the Haseman and Elston regression method for linkage analysis. The proposed model is a mixed model having several random effects. As response variable, we investigate the sibship sample mean corrected cross-product (smHE) and the BLUP-mean corrected cross product (pmHE), comparing them with the original squared difference (oHE), the overall mean corrected cross-product (rHE), and the weighted average of the squared difference and the squared mean-corrected sum (wHE). The proposed model allows for the correlation structure of longitudinal data. Also, the model can test for gene x time interaction to discover genetic variation over time. The model was applied in an analysis of the Genetic Analysis Workshop 13 (GAW13) simulated dataset for a quantitative trait simulating systolic blood pressure. Independence models did not preserve the test sizes, while the mixed models with both family and sibpair random effects tended to preserve size well. Copyright 2006 S. Karger AG, Basel.

  6. Development of Jet Noise Power Spectral Laws Using SHJAR Data

    NASA Technical Reports Server (NTRS)

    Khavaran, Abbas; Bridges, James

    2009-01-01

    High quality jet noise spectral data measured at the Aeroacoustic Propulsion Laboratory at the NASA Glenn Research Center is used to examine a number of jet noise scaling laws. Configurations considered in the present study consist of convergent and convergent-divergent axisymmetric nozzles. Following the work of Viswanathan, velocity power factors are estimated using a least squares fit on spectral power density as a function of jet temperature and observer angle. The regression parameters are scrutinized for their uncertainty within the desired confidence margins. As an immediate application of the velocity power laws, spectral density in supersonic jets are decomposed into their respective components attributed to the jet mixing noise and broadband shock associated noise. Subsequent application of the least squares method on the shock power intensity shows that the latter also scales with some power of the shock parameter. A modified shock parameter is defined in order to reduce the dependency of the regression factors on the nozzle design point within the uncertainty margins of the least squares method.

  7. Least Square Fast Learning Network for modeling the combustion efficiency of a 300WM coal-fired boiler.

    PubMed

    Li, Guoqiang; Niu, Peifeng; Wang, Huaibao; Liu, Yongchao

    2014-03-01

    This paper presents a novel artificial neural network with a very fast learning speed, all of whose weights and biases are determined by applying the Least Square method twice, so it is called Least Square Fast Learning Network (LSFLN). In addition, there is another difference from conventional neural networks, which is that the output neurons of LSFLN not only receive the information from the hidden layer neurons, but also receive the external information itself directly from the input neurons. In order to test the validity of LSFLN, it is applied to 6 classical regression applications, and also employed to build the functional relation between the combustion efficiency and operating parameters of a 300WM coal-fired boiler. Experimental results show that, compared with other methods, LSFLN with far fewer hidden neurons could achieve much better regression precision and generalization ability at a much faster learning speed. Copyright © 2013 Elsevier Ltd. All rights reserved.

  8. Feasibility of using near infrared spectroscopy to detect and quantify an adulterant in high quality sandalwood oil

    NASA Astrophysics Data System (ADS)

    Kuriakose, Saji; Joe, I. Hubert

    2013-11-01

    Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC = 0.00009% v/v). The lowest root mean square error of prediction (RMSEP = 0.00016% v/v) in the test set and the highest coefficient of determination (R2 = 0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model.

  9. Role of supraglacial lakes in recession of Himalayan glaciers: A case study of Dudh Koshi basin, Nepal

    NASA Astrophysics Data System (ADS)

    Tuladhar, Florencia Matina; KC, Diwakar

    2018-07-01

    Climate change has been adversely affecting glaciers, causing them to advance and recede worldwide. Existing studies have primarily identified temperature as the leading factor causing glacier recession. However, detailed studies that investigate the effect of other factors like presence of debris cover, slope, and contact with water bodies are still scarce. This research thus investigated the role of supraglacial lakes in recession of debris-covered glaciers (DCG). Such glaciers were studied since these lakes are found in debris-covered glaciers only. For this purpose the interannual variation in area of supraglacial lakes of Dudh Koshi basin was computed to test the hypothesis that these lakes play a significant role in glacier recession. Supraglacial lakes were delineated using Google Earth Pro at five year intervals to assess interannual variation in lake area. Slope, elevation and change in supraglacial lake area were the predictors influencing average decadal change in area of glaciers. Two models prepared using multiple linear regression in Excel were compared. The first model used elevation and slope as predictors while the second model used change in supraglacial lake area as the additional predictor. The second model had a higher coefficient of determination (R square) and Adjusted R-square values of 99% and 96% compared to the first model. Further, test statistics from Analysis of Variance (ANOVA) results were compared to test the hypothesis. Moreover the Root mean square error (RMSE) of the second model was also less than that of the first one. Hence both the regression statistics and RMSE confirmed that change in area of supraglacial lakes was an important factor that influences overall recession of debris-covered glaciers. Nevertheless, use of high spatial and temporal resolution imagery along with an increase in the number of glaciers sampled should be incorporated in future studies to ensure robust outcomes. Thus this research can bolster the overall understanding between glacier and glacial lake dynamics which will improve the resilience of downstream inhabitants from climate induced hazards, such as glacial lake outburst floods (GLOFs).
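
    When a predictor is added to a multiple linear regression, R square can never decrease, which is why the comparison above also reports Adjusted R-square. A minimal sketch of the standard adjustment follows; the sample size and R square values are hypothetical and are not the study's figures.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R-square: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r_squared) * (n_observations - 1) / (n_observations - n_predictors - 1)


# Hypothetical comparison: a third predictor raises R-square only slightly.
n = 20
r2_two_predictors = 0.80
r2_three_predictors = 0.81

adj_two = adjusted_r_squared(r2_two_predictors, n, 2)
adj_three = adjusted_r_squared(r2_three_predictors, n, 3)
```

    In this made-up case the three-predictor model has the higher R square but the lower Adjusted R-square, so the extra predictor would not be judged worthwhile. An added predictor is supported only when the adjusted value also rises, as the abstract reports for change in supraglacial lake area.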

  10. Inferential modeling and predictive feedback control in real-time motion compensation using the treatment couch during radiotherapy

    NASA Astrophysics Data System (ADS)

    Qiu, Peng; D'Souza, Warren D.; McAvoy, Thomas J.; Liu, K. J. Ray

    2007-09-01

    Tumor motion induced by respiration presents a challenge to the reliable delivery of conformal radiation treatments. Real-time motion compensation represents the technologically most challenging clinical solution but has the potential to overcome the limitations of existing methods. The performance of a real-time couch-based motion compensation system is mainly dependent on two aspects: the ability to infer the internal anatomical position and the performance of the feedback control system. In this paper, we propose two novel methods for the two aspects respectively, and then combine the proposed methods into one system. To accurately estimate the internal tumor position, we present partial least squares (PLS) regression to predict the position of the diaphragm using skin-based motion surrogates. Four radio-opaque markers were placed on the abdomen of patients who underwent fluoroscopic imaging of the diaphragm. The coordinates of the markers served as input variables and the position of the diaphragm served as the output variable. PLS resulted in lower prediction errors compared with standard multiple linear regression (MLR). The performance of the feedback control system depends on the system dynamics and dead time (delay between the initiation and execution of the control action). While the dynamics of the system can be inverted in a feedback control system, the dead time cannot be inverted. To overcome the dead time of the system, we propose a predictive feedback control system by incorporating forward prediction using least-mean-square (LMS) and recursive least square (RLS) filtering into the couch-based control system. Motion data were obtained using a skin-based marker. The proposed predictive feedback control system was benchmarked against pure feedback control (no forward prediction) and resulted in a significant performance gain. Finally, we combined the PLS inference model and the predictive feedback control to evaluate the overall performance of the feedback control system. Our results show that, with the tumor motion unknown but inferred by skin-based markers through the PLS model, the predictive feedback control system was able to effectively compensate intra-fraction motion.
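
    The forward-prediction step can be illustrated with a minimal LMS filter that predicts a motion trace a fixed number of samples ahead, which is how a controller could offset its dead time. This is a sketch under stated assumptions, not the authors' implementation: the sinusoidal trace, the tap count, horizon and step size are invented, and the RLS variant is not shown.

```python
import math


def lms_forward_predictor(signal, taps=4, horizon=5, step_size=0.1):
    """Predict signal[n + horizon] from the latest `taps` samples with an LMS filter.
    Weights are updated only with samples that have already been observed."""
    weights = [0.0] * taps
    predictions = [None] * len(signal)
    for n in range(len(signal)):
        m = n - horizon
        if m >= taps - 1:
            # signal[n] has just arrived: adapt using the window that ended at m.
            past = [signal[m - k] for k in range(taps)]
            error = signal[n] - sum(w * x for w, x in zip(weights, past))
            weights = [w + step_size * error * x for w, x in zip(weights, past)]
        if n >= taps - 1 and n + horizon < len(signal):
            recent = [signal[n - k] for k in range(taps)]
            predictions[n + horizon] = sum(w * x for w, x in zip(weights, recent))
    return predictions


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


# Synthetic breathing-like trace: 4 s period sampled at 10 Hz (illustration only).
trace = [math.sin(2 * math.pi * n / 40) for n in range(2000)]
horizon = 5  # samples of dead time to look ahead (0.5 s here)
predicted = lms_forward_predictor(trace, horizon=horizon)

tail = range(1800, 2000)
error_with_prediction = rms([trace[n] - predicted[n] for n in tail])
error_without_prediction = rms([trace[n] - trace[n - horizon] for n in tail])
```

    Here `error_without_prediction` is the error from acting on the stale sample, which is what pure feedback with dead time amounts to; on this idealized periodic trace the adapted filter removes almost all of it. Real respiratory traces are irregular, so the gain in practice would be smaller.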

  11. Beyond Multiple Regression: Using Commonality Analysis to Better Understand R² Results

    ERIC Educational Resources Information Center

    Warne, Russell T.

    2011-01-01

    Multiple regression is one of the most common statistical methods used in quantitative educational research. Despite the versatility and easy interpretability of multiple regression, it has some shortcomings in the detection of suppressor variables and for somewhat arbitrarily assigning values to the structure coefficients of correlated…

  12. Multiplication factor versus regression analysis in stature estimation from hand and foot dimensions.

    PubMed

    Krishan, Kewal; Kanchan, Tanuj; Sharma, Abhilasha

    2012-05-01

    Estimation of stature is an important parameter in identification of human remains in forensic examinations. The present study aimed to compare the reliability and accuracy of stature estimation and to demonstrate the variability in estimated stature and actual stature using multiplication factor and regression analysis methods. The study is based on a sample of 246 subjects (123 males and 123 females) from North India aged between 17 and 20 years. Four anthropometric measurements (hand length, hand breadth, foot length and foot breadth) taken on the left side in each subject were included in the study. Stature was measured using standard anthropometric techniques. Multiplication factors were calculated and linear regression models were derived for estimation of stature from hand and foot dimensions. Derived multiplication factors and regression formulae were applied to the hand and foot measurements in the study sample. The estimated stature from the multiplication factors and regression analysis was compared with the actual stature to find the error in estimated stature. The results indicate that the range of error in estimation of stature from the regression analysis method is less than that of the multiplication factor method, thus confirming that the regression analysis method is better than multiplication factor analysis in stature estimation. Copyright © 2012 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
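
    The two estimation methods compared above can be put side by side in a short sketch. A multiplication factor is taken here as the mean ratio of stature to the measured dimension, and the regression is an ordinary least-squares line; the foot lengths and statures are invented values, not data from the study.

```python
def multiplication_factor(statures, dimensions):
    """Mean ratio of stature to the body dimension."""
    return sum(s / d for s, d in zip(statures, dimensions)) / len(statures)


def linear_regression(dimensions, statures):
    """Least-squares line: stature = intercept + slope * dimension."""
    n = len(dimensions)
    d_bar, s_bar = sum(dimensions) / n, sum(statures) / n
    slope = (sum((d - d_bar) * (s - s_bar) for d, s in zip(dimensions, statures))
             / sum((d - d_bar) ** 2 for d in dimensions))
    return s_bar - slope * d_bar, slope


# Invented measurements in centimetres (illustration only).
foot_length = [22.5, 23.0, 24.1, 25.0, 25.8, 26.5]
stature = [155.0, 158.5, 163.0, 168.0, 171.5, 176.0]

mf = multiplication_factor(stature, foot_length)
intercept, slope = linear_regression(foot_length, stature)

errors_mf = [s - mf * d for s, d in zip(stature, foot_length)]
errors_regression = [s - (intercept + slope * d) for s, d in zip(stature, foot_length)]

sse_mf = sum(e * e for e in errors_mf)
sse_regression = sum(e * e for e in errors_regression)
```

    The multiplication factor forces the estimate through the origin, while the regression line is free to take an intercept, so within the fitted sample the regression can never have the larger sum of squared errors. This is consistent with the direction of the result reported in the abstract, although the study itself compares ranges of error.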

  13. Quantitative monitoring of sucrose, reducing sugar and total sugar dynamics for phenotyping of water-deficit stress tolerance in rice through spectroscopy and chemometrics

    NASA Astrophysics Data System (ADS)

    Das, Bappa; Sahoo, Rabi N.; Pargal, Sourabh; Krishna, Gopal; Verma, Rakesh; Chinnusamy, Viswanathan; Sehgal, Vinay K.; Gupta, Vinod K.; Dash, Sushanta K.; Swain, Padmini

    2018-03-01

    In the present investigation, the changes in sucrose, reducing and total sugar content due to water-deficit stress in rice leaves were modeled using visible, near infrared (VNIR) and shortwave infrared (SWIR) spectroscopy. The objectives of the study were to identify the best vegetation indices and suitable multivariate technique based on precise analysis of hyperspectral data (350 to 2500 nm) and sucrose, reducing sugar and total sugar content measured at different stress levels from 16 different rice genotypes. Spectral data analysis was done to identify suitable spectral indices and models for sucrose estimation. Novel spectral indices in near infrared (NIR) range viz. ratio spectral index (RSI) and normalised difference spectral indices (NDSI) sensitive to sucrose, reducing sugar and total sugar content were identified which were subsequently calibrated and validated. The RSI and NDSI models had R2 values of 0.65, 0.71 and 0.67; RPD values of 1.68, 1.95 and 1.66 for sucrose, reducing sugar and total sugar, respectively for validation dataset. Different multivariate spectral models such as artificial neural network (ANN), multivariate adaptive regression splines (MARS), multiple linear regression (MLR), partial least square regression (PLSR), random forest regression (RFR) and support vector machine regression (SVMR) were also evaluated. The best performing multivariate models for sucrose, reducing sugars and total sugars were found to be, MARS, ANN and MARS, respectively with respect to RPD values of 2.08, 2.44, and 1.93. Results indicated that VNIR and SWIR spectroscopy combined with multivariate calibration can be used as a reliable alternative to conventional methods for measurement of sucrose, reducing sugars and total sugars of rice under water-deficit stress as this technique is fast, economic, and noninvasive.

  14. Estimating EQ-5D values from the Oswestry Disability Index and numeric rating scales for back and leg pain.

    PubMed

    Carreon, Leah Y; Bratcher, Kelly R; Das, Nandita; Nienhuis, Jacob B; Glassman, Steven D

    2014-04-15

    Cross-sectional cohort. The purpose of this study is to determine whether the EuroQOL-5D (EQ-5D) can be derived from commonly available low back disease-specific health-related quality of life measures. The Oswestry Disability Index (ODI) and numeric rating scales (0-10) for back pain (BP) and leg pain (LP) are widely used disease-specific measures in patients with lumbar degenerative disorders. Increasingly, the EQ-5D is being used as a measure of utility due to ease of administration and scoring. The EQ-5D, ODI, BP, and LP were prospectively collected in 14,544 patients seen in clinic for lumbar degenerative disorders. Pearson correlation coefficients for paired observations from multiple time points between ODI, BP, LP, and EQ-5D were determined. Regression modeling was done to compute the EQ-5D score from the ODI, BP, and LP. The mean age was 53.3 ± 16.4 years and 41% were male. Correlations between the EQ-5D and the ODI, BP, and LP were statistically significant (P < 0.0001) with correlation coefficients of -0.77, -0.50, and -0.57, respectively. The regression equation: [0.97711 + (-0.00687 × ODI) + (-0.01488 × LP) + (-0.01008 × BP)] to predict EQ-5D, had an R2 of 0.61 and a root mean square error of 0.149. The model using ODI alone had an R2 of 0.57 and a root mean square error of 0.156. The model using the individual ODI items had an R2 of 0.64 and a root mean square error of 0.143. The correlation coefficient between the observed and estimated EQ-5D score was 0.78. There was no statistically significant difference between the actual EQ-5D (0.553 ± 0.238) and the estimated EQ-5D score (0.553 ± 0.186) using the ODI, BP, and LP regression model. However, rounding off the coefficients to less than 5 decimal places produced less accurate results. Unlike previous studies showing a robust relationship between low back-specific measures and the Short Form-6D, a similar relationship was not seen between the ODI, BP, LP, and the EQ-5D. 
    Thus, the EQ-5D cannot be accurately estimated from the ODI, BP, and LP.
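
    For readers who want to see the reported equation applied, a minimal sketch follows. The coefficients are those quoted in the abstract; the function name and the input range checks (ODI on 0-100, pain scales on 0-10) are assumptions added here. Note that the abstract itself concludes the estimate is not accurate enough to replace a measured EQ-5D, so this only shows the arithmetic.

```python
def estimate_eq5d(odi, leg_pain, back_pain):
    """Apply the regression equation quoted in the abstract.

    odi: Oswestry Disability Index score, assumed to lie on 0-100.
    leg_pain, back_pain: numeric rating scale scores on 0-10.
    """
    if not 0 <= odi <= 100:
        raise ValueError("ODI is assumed to lie between 0 and 100")
    for score in (leg_pain, back_pain):
        if not 0 <= score <= 10:
            raise ValueError("pain scores are rated from 0 to 10")
    # Coefficients kept at full reported precision; the abstract notes that
    # rounding them to fewer than 5 decimal places degrades accuracy.
    return 0.97711 - 0.00687 * odi - 0.01488 * leg_pain - 0.01008 * back_pain


# Hypothetical patient: ODI 40, leg pain 5, back pain 5.
print(round(estimate_eq5d(40, 5, 5), 5))
```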

  15. Modeling a historical mountain pine beetle outbreak using Landsat MSS and multiple lines of evidence

    USGS Publications Warehouse

    Assal, Timothy J.; Sibold, Jason; Reich, Robin M.

    2014-01-01

    Mountain pine beetles are significant forest disturbance agents, capable of inducing widespread mortality in coniferous forests in western North America. Various remote sensing approaches have assessed the impacts of beetle outbreaks over the last two decades. However, few studies have addressed the impacts of historical mountain pine beetle outbreaks, including the 1970s event that impacted Glacier National Park. The lack of spatially explicit data on this disturbance represents both a major data gap and a critical research challenge in that wildfire has removed some of the evidence from the landscape. We utilized multiple lines of evidence to model forest canopy mortality as a proxy for outbreak severity. We incorporate historical aerial and landscape photos, aerial detection survey data, a nine-year collection of satellite imagery and abiotic data. This study presents a remote sensing based framework to (1) relate measurements of canopy mortality from fine-scale aerial photography to coarse-scale multispectral imagery and (2) classify the severity of mountain pine beetle affected areas using a temporal sequence of Landsat data and other landscape variables. We sampled canopy mortality in 261 plots from aerial photos and found that insect effects on mortality were evident in changes to the Normalized Difference Vegetation Index (NDVI) over time. We tested multiple spectral indices and found that a combination of NDVI and the green band resulted in the strongest model. We report a two-step process where we utilize a generalized least squares model to account for the large-scale variability in the data and a binary regression tree to describe the small-scale variability. The final model had a root mean square error estimate of 9.8% canopy mortality, a mean absolute error of 7.6% and an R2 of 0.82. 
    The results demonstrate that a model of percent canopy mortality as a continuous variable can be developed to identify a gradient of mountain pine beetle severity on the landscape.
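
    The three fit statistics quoted for the final model (root mean square error, mean absolute error and R2) can be computed as in the sketch below. The abstract does not say how its R2 was defined; this sketch assumes the common form 1 - SS_res / SS_tot. The function name and the sample values are hypothetical.

```python
from math import sqrt


def fit_metrics(observed, predicted):
    """Return (RMSE, MAE, R2) for paired observed and predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    rmse = sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot  # assumed definition of R2
    return rmse, mae, r2


# Hypothetical percent canopy mortality values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
rmse, mae, r2 = fit_metrics(observed, predicted)
print(rmse, mae, r2)
```

    RMSE penalizes large misses more heavily than MAE, which is why the two are often reported together, as in this record.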

  16. On Insensitivity of the Chi-Square Model Test to Nonlinear Misspecification in Structural Equation Models

    ERIC Educational Resources Information Center

    Mooijaart, Ab; Satorra, Albert

    2009-01-01

    In this paper, we show that for some structural equation models (SEM), the classical chi-square goodness-of-fit test is unable to detect the presence of nonlinear terms in the model. As an example, we consider a regression model with latent variables and interactions terms. Not only the model test has zero power against that type of…

  17. An Analysis of Advertising Effectiveness for U.S. Navy Recruiting

    DTIC Science & Technology

    1997-09-01

    This thesis estimates the effect of Navy television advertising on enlistment rates of high quality male recruits (Armed Forces Qualification Test...Joint advertising is for all Armed Forces), Joint journal, and Joint direct mail advertising are explored. Enlistments are modeled as a function of...several factors including advertising, recruiters, and economic. Regression analyses (Ordinary Least Squares and Two Stage Least Squares) explore the

  18. Education level as a predictor of condom use in jail-incarcerated women, with fundamental cause analysis.

    PubMed

    Emerson, Amanda M; Carroll, Hsiang-Feng; Ramaswamy, Megha

    2018-05-27

    To model condom usage by women incarcerated in US local jails and understand results in terms of fundamental cause theory. We surveyed 102 women in an urban jail in the Midwest United States. Chi-square tests and generalized linear modeling were used to identify factors of significance for women who used condoms during last sex compared with women who did not. Stepwise multiple logistic regression was conducted to estimate the relation between the outcome variable and variables linked to condom use in the literature. Logistic regression showed that for women who completed high school, the odds of reporting condom use during last sex were 2.78 times higher (p = .043) than the odds for women with less than a high school education. Among women who responded no to ever having had a sexually transmitted infection, the odds of using a condom during last sex were 2.597 times (p = .03) higher than the odds for women who responded that they had had a sexually transmitted infection. Education is a fundamental cause of reproductive health risk among incarcerated women. We recommend interventions that creatively target distal over proximal factors. © 2018 Wiley Periodicals, Inc.

  19. Novel spectrophotometric determination of chloramphenicol and dexamethasone in the presence of non labeled interfering substances using univariate methods and multivariate regression model updating

    NASA Astrophysics Data System (ADS)

    Hegazy, Maha A.; Lotfy, Hayam M.; Rezk, Mamdouh R.; Omran, Yasmin Rostom

    2015-04-01

    Smart and novel spectrophotometric and chemometric methods have been developed and validated for the simultaneous determination of a binary mixture of chloramphenicol (CPL) and dexamethasone sodium phosphate (DSP) in the presence of interfering substances without prior separation. The first method depends upon derivative subtraction coupled with constant multiplication. The second is a ratio difference method at optimum wavelengths, which were selected after applying a derivative transformation method via multiplying by a decoding spectrum in order to cancel the contribution of non labeled interfering substances. The third method relies on partial least squares with regression model updating. They are so simple that they do not require any preliminary separation steps. Accuracy, precision and linearity ranges of these methods were determined. Moreover, specificity was assessed by analyzing synthetic mixtures of both drugs. The proposed methods were successfully applied for analysis of both drugs in their pharmaceutical formulation. The obtained results were statistically compared with those of an official spectrophotometric method, and no significant difference was found between the proposed methods and the official one with respect to accuracy and precision.

  20. Is obesity associated with global warming?

    PubMed

    Squalli, J

    2014-12-01

    Obesity is a national epidemic that imposes direct medical and indirect economic costs on society. Recent scholarly inquiries contend that obesity also contributes to global warming. The paper investigates the relationship between greenhouse gas emissions and obesity. Cross-sectional state-level data for the year 2010. Multiple regression analysis using least squares with bootstrapped standard errors and quantile regression. States with higher rates of obesity are associated with higher CO2 and CH4 emissions (p < 0.05) and marginally associated with higher N2O emissions (p < 0.10), net of other factors. Reverting to the obesity rates of the year 2000 across the entire United States could decrease greenhouse gas emissions by about two percent, representing more than 136 million metric tons of CO2 equivalent. Future studies should establish clear causality between obesity and emissions by using longitudinal data while controlling for other relevant factors. They should also consider identifying means to net out the potential effects of carbon sinks, conversion of CH4 to energy, cross-state diversion, disposal, and transfer of municipal solid waste, and potentially lower energy consumption from increased sedentariness. Copyright © 2014 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.

  1. Standing on the shoulders of apes: Analyzing the form and function of the hominoid scapula using geometric morphometrics and finite element analysis.

    PubMed

    Püschel, Thomas A; Sellers, William I

    2016-02-01

    The aim was to analyze the relationship between scapular form and function in hominoids by using geometric morphometrics (GM) and finite element analysis (FEA). FEA was used to analyze the biomechanical performance of different hominoid scapulae by simulating static postural scenarios. GM was used to quantify scapular shape differences and the relationship between form and function was analyzed by applying both multivariate-multiple regressions and phylogenetic generalized least-squares regressions (PGLS). Although it has been suggested that primate scapular morphology is mainly a product of function rather than phylogeny, our results showed that shape has a significant phylogenetic signal. There was a significant relationship between scapular shape and its biomechanical performance; hence at least part of the scapular shape variation is due to non-phylogenetic factors, probably related to functional demands. This study has shown that a combined approach using GM and FEA was able to cast some light regarding the functional and phylogenetic contributions in hominoid scapular morphology, thus contributing to a better insight of the association between scapular form and function. © 2015 Wiley Periodicals, Inc.

  2. Feasibility of using a miniature NIR spectrometer to measure volumic mass during alcoholic fermentation.

    PubMed

    Fernández-Novales, Juan; López, María-Isabel; González-Caballero, Virginia; Ramírez, Pilar; Sánchez, María-Teresa

    2011-06-01

    Volumic mass, a key component of must quality control tests during alcoholic fermentation, is of great interest to the winemaking industry. Transmittance near-infrared (NIR) spectra of 124 must samples over the range of 200-1,100 nm were obtained using a miniature spectrometer. The performance of this instrument to predict volumic mass was evaluated using partial least squares (PLS) regression and multiple linear regression (MLR). The validation statistics coefficient of determination (r(2)) and the standard error of prediction (SEP) were r(2) = 0.98, n = 31 and r(2) = 0.96, n = 31, and SEP = 5.85 and 7.49 g/dm(3) for PLS and MLR equations developed to fit reference data for volumic mass and spectral data. Comparison of results from MLR and PLS demonstrates that an MLR model with six significant wavelengths (P < 0.05) fit volumic mass data to transmittance (1/T) data slightly worse than a more sophisticated PLS model using the full scanning range. The results suggest that NIR spectroscopy is a suitable technique for predicting volumic mass during alcoholic fermentation, and that a low-cost NIR instrument can be used for this purpose.

  3. Estimation of magnitude and frequency of floods for streams in Puerto Rico : new empirical models

    USGS Publications Warehouse

    Ramos-Gines, Orlando

    1999-01-01

    Flood-peak discharges and frequencies are presented for 57 gaged sites in Puerto Rico for recurrence intervals ranging from 2 to 500 years. The log-Pearson Type III distribution, the methodology recommended by the United States Interagency Committee on Water Data, was used to determine the magnitude and frequency of floods at the gaged sites having 10 to 43 years of record. A technique is presented for estimating flood-peak discharges at recurrence intervals ranging from 2 to 500 years for unregulated streams in Puerto Rico with contributing drainage areas ranging from 0.83 to 208 square miles. Loglinear multiple regression analyses, using climatic and basin characteristics and peak-discharge data from the 57 gaged sites, were used to construct regression equations to transfer the magnitude and frequency information from gaged to ungaged sites. The equations have contributing drainage area, depth-to-rock, and mean annual rainfall as the basin and climatic characteristics in estimating flood peak discharges. Examples are given to show a step-by-step procedure in calculating a 100-year flood at a gaged site, an ungaged site, a site near a gaged location, and a site between two gaged sites.

  4. Lagtime relations for urban streams in Georgia

    USGS Publications Warehouse

    Inman, Ernest J.

    2000-01-01

    Urban flood hydrographs are needed for the design of many highway drainage structures, embankments, and entrances to detention ponds. The three components that are needed to simulate urban flood hydrographs at ungaged sites are the design flood, the dimensionless hydrograph, and lagtime. The design flood and the dimensionless hydrograph have been presented in earlier studies for urban streams in Georgia. The objective of this study was to develop equations for estimating lagtime for urban streams in Georgia. Lagtimes were computed for 329 floods at 69 urban gaging stations in 11 cities in Georgia. These data were used to compute an average lagtime for each gaging station. Multiple regression analysis was then used to define relations between lagtime and certain physical basin characteristics, of which drainage area, slope, and impervious area were found to be significant. A qualitative variable was used to account for a geographical bias in flood-frequency region 4, a small area of southwestern Georgia. Information from this report can be used to simulate a flood hydrograph using a dimensionless hydrograph, the design flood, and the lagtime obtained from regression equations for any urban site with less than a 25-square-mile drainage area in Georgia.

  5. Methods for estimating the magnitude and frequency of peak streamflows for unregulated streams in Oklahoma

    USGS Publications Warehouse

    Lewis, Jason M.

    2010-01-01

    Peak-streamflow regression equations were determined for estimating flows with exceedance probabilities from 50 to 0.2 percent for the state of Oklahoma. These regression equations incorporate basin characteristics to estimate peak-streamflow magnitude and frequency throughout the state by use of a generalized least squares regression analysis. The most statistically significant independent variables required to estimate peak-streamflow magnitude and frequency for unregulated streams in Oklahoma are contributing drainage area, mean-annual precipitation, and main-channel slope. The regression equations are applicable for watershed basins with drainage areas less than 2,510 square miles that are not affected by regulation. The resulting regression equations had a standard model error ranging from 31 to 46 percent. Annual-maximum peak flows observed at 231 streamflow-gaging stations through water year 2008 were used for the regression analysis. Gage peak-streamflow estimates were used from previous work unless 2008 gaging-station data were available, in which case new peak-streamflow estimates were calculated. The U.S. Geological Survey StreamStats web application was used to obtain the independent variables required for the peak-streamflow regression equations. Limitations on the use of the regression equations and the reliability of regression estimates for natural unregulated streams are described. Log-Pearson Type III analysis information, basin and climate characteristics, and the peak-streamflow frequency estimates for the 231 gaging stations in and near Oklahoma are listed. Methodologies are presented to estimate peak streamflows at ungaged sites by using estimates from gaging stations on unregulated streams. For ungaged sites on urban streams and streams regulated by small floodwater retarding structures, an adjustment of the statewide regression equations for natural unregulated streams can be used to estimate peak-streamflow magnitude and frequency.
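
    The exceedance probabilities quoted here (50 to 0.2 percent) correspond to the 2- to 500-year recurrence intervals used in neighboring records, since the recurrence interval is the reciprocal of the annual exceedance probability. A minimal sketch of that conversion follows; the function name is hypothetical and the report itself may tabulate these values differently.

```python
def recurrence_interval_years(exceedance_percent):
    """Convert an annual exceedance probability (percent) to a recurrence interval (years)."""
    if not 0 < exceedance_percent <= 100:
        raise ValueError("exceedance probability must be in (0, 100] percent")
    return 100.0 / exceedance_percent


# The endpoints mentioned in the abstract, plus the familiar 1-percent flood.
for percent in (50, 1, 0.2):
    print(percent, "percent ->", round(recurrence_interval_years(percent)), "years")
```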

  6. Regression Equations for Estimating Flood Flows at Selected Recurrence Intervals for Ungaged Streams in Pennsylvania

    USGS Publications Warehouse

    Roland, Mark A.; Stuckey, Marla H.

    2008-01-01

    Regression equations were developed for estimating flood flows at selected recurrence intervals for ungaged streams in Pennsylvania with drainage areas less than 2,000 square miles. These equations were developed utilizing peak-flow data from 322 streamflow-gaging stations within Pennsylvania and surrounding states. All stations used in the development of the equations had 10 or more years of record and included active and discontinued continuous-record as well as crest-stage partial-record stations. The state was divided into four regions, and regional regression equations were developed to estimate the 2-, 5-, 10-, 50-, 100-, and 500-year recurrence-interval flood flows. The equations were developed by means of a regression analysis that utilized basin characteristics and flow data associated with the stations. Significant explanatory variables at the 95-percent confidence level for one or more regression equations included the following basin characteristics: drainage area; mean basin elevation; and the percentages of carbonate bedrock, urban area, and storage within a basin. The regression equations can be used to predict the magnitude of flood flows for specified recurrence intervals for most streams in the state; however, they are not valid for streams with drainage areas generally greater than 2,000 square miles or with substantial regulation, diversion, or mining activity within the basin. Estimates of flood-flow magnitude and frequency for streamflow-gaging stations substantially affected by upstream regulation are also presented.

  7. Estimating the Extreme Behaviors of Students Performance Using Quantile Regression--Evidences from Taiwan

    ERIC Educational Resources Information Center

    Chen, Sheng-Tung; Kuo, Hsiao-I.; Chen, Chi-Chung

    2012-01-01

    The two-stage least squares approach together with quantile regression analysis is adopted here to estimate the educational production function. Such a methodology is able to capture the extreme behaviors of the two tails of students' performance and the estimation outcomes have important policy implications. Our empirical study is applied to the…

  8. Determination of cellulose I crystallinity by FT-Raman spectroscopy

    Treesearch

    Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph

    2009-01-01

    Two new methods based on FT-Raman spectroscopy, one simple, based on band intensity ratio, and the other, using a partial least-squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in semicrystalline cellulose I samples was determined based on univariate regression that was first developed using the...

  9. Testing the Hypothesis of a Homoscedastic Error Term in Simple, Nonparametric Regression

    ERIC Educational Resources Information Center

    Wilcox, Rand R.

    2006-01-01

    Consider the nonparametric regression model Y = m(X)+ [tau](X)[epsilon], where X and [epsilon] are independent random variables, [epsilon] has a median of zero and variance [sigma][squared], [tau] is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated…

  10. CONTRIBUTION OF NUTRIENTS AND E. COLI TO SURFACE WATER CONDITION IN THE OZARKS I. USING PARTIAL LEAST SQUARES PREDICTIONS WHEN STANDARD REGRESSION ASSUMPTIONS ARE VIOLATED

    EPA Science Inventory

    We present here the application of PLS regression to predicting surface water total phosphorous, total ammonia and Escherichia coli from landscape metrics. The amount of variability in surface water constituents explained by each model reflects the composition of the contributi...

  11. Analyzing Multilevel Data: An Empirical Comparison of Parameter Estimates of Hierarchical Linear Modeling and Ordinary Least Squares Regression

    ERIC Educational Resources Information Center

    Rocconi, Louis M.

    2011-01-01

    Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…

  12. Least-Squares Analysis of Data with Uncertainty in "y" and "x": Algorithms in Excel and KaleidaGraph

    ERIC Educational Resources Information Center

    Tellinghuisen, Joel

    2018-01-01

    For the least-squares analysis of data having multiple uncertain variables, the generally accepted best solution comes from minimizing the sum of weighted squared residuals over all uncertain variables, with, for example, weights in x[subscript i] taken as inversely proportional to the variance [delta][subscript xi][superscript 2]. A complication…

  13. Methamphetamine abuse during pregnancy and its health impact on neonates born at Siriraj Hospital, Bangkok, Thailand.

    PubMed

    Chomchai, Chulathida; Na Manorom, Natawadee; Watanarungsan, Pornchai; Yossuck, Panitan; Chomchai, Summon

    2004-03-01

    To ascertain the impact of intrauterine methamphetamine exposure on the overall health of newborn infants at Siriraj Hospital, Bangkok, Thailand, birth records of somatic growth parameters and neonatal withdrawal symptoms of 47 infants born to methamphetamine-abusing women during January 2001 to December 2001 were compared to 49 newborns whose mothers did not use methamphetamines during pregnancy. The data on somatic growth were analyzed using linear regression and multiple linear regression. The association between methamphetamine use and withdrawal symptoms was analyzed using the chi-square test. Home visitation and maternal interview records were reviewed in order to assess child-rearing attitudes and psychosocial parameters. Infants of methamphetamine-abusing mothers were found to have a significantly smaller gestational age-adjusted head circumference (regression coefficient = -1.458, p < 0.001) and birth weight (regression coefficient = -217.9, p ≤ 0.001) measurements. Methamphetamine exposure was also associated with symptoms of agitation (5/47), vomiting (11/47) and tachypnea (12/47) when compared to the non-exposed group (p ≤ 0.001). Maternal interviews were conducted in 23 cases and showed that: 96% of the cases had inadequate prenatal care (<5 visits), 48% had at least one parent involved in prostitution, 39% of the mothers were unwilling to take their children home, and government or non-government support was provided in only 30% of the cases. In-utero methamphetamine exposure has been shown to adversely affect somatic growth of newborns and cause a variety of withdrawal-like symptoms. These infants are also psychosocially disadvantaged and are at greater risk for abuse and neglect.

  14. Modulation of the relationship between external knee adduction moments and medial joint contact forces across subjects and activities.

    PubMed

    Trepczynski, Adam; Kutzner, Ines; Bergmann, Georg; Taylor, William R; Heller, Markus O

    2014-05-01

    The external knee adduction moment (EAM) is often considered a surrogate measure of the distribution of loads across the tibiofemoral joint during walking. This study was undertaken to quantify the relationship between the EAM and directly measured medial tibiofemoral contact forces (Fmed) in a sample of subjects across a spectrum of activities. The EAM for 9 patients who underwent total knee replacement was calculated using inverse dynamics analysis, while telemetric implants provided Fmed for multiple repetitions of 10 activities, including walking, stair negotiation, sit-to-stand activities, and squatting. The effects of the factors "subject" and "activity" on the relationships between Fmed and EAM were quantified using mixed-effects regression analyses in terms of the root mean square error (RMSE) and the slope of the regression. Across subjects and activities a good correlation between peak EAM and Fmed values was observed, with an overall R(2) value of 0.88. However, the slope of the linear regressions varied between subjects by up to a factor of 2. At peak EAM and Fmed, the RMSE of the regression across all subjects was 35% body weight (%BW), while the maximum error was 127 %BW. The relationship between EAM and Fmed is generally good but varies considerably across subjects and activities. These findings emphasize the limitation of relying solely on the EAM to infer medial joint loading when excessive directed cocontraction of muscles exists and call for further investigations into the soft tissue-related mechanisms that modulate the internal forces at the knee. Copyright © 2014 by the American College of Rheumatology.

  15. Improving the chi-squared approximation for bivariate normal tolerance regions

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.

    1993-01-01

    Let X be a two-dimensional random variable distributed according to $N_2(\mu, \Sigma)$ and let $\bar{X}$ and $S$ be the respective sample mean and covariance matrix calculated from N observations of X. Given a containment probability $\beta$ and a level of confidence $\gamma$, we seek a number c, depending only on N, $\beta$, and $\gamma$, such that the ellipsoid $R = \{x : (x - \bar{X})' S^{-1} (x - \bar{X}) \le c\}$ is a tolerance region of content $\beta$ and level $\gamma$; i.e., R has probability $\gamma$ of containing at least $100\beta$ percent of the distribution of X. Various approximations for c exist in the literature, but one of the simplest to compute (a multiple of the ratio of certain chi-squared percentage points) is badly biased for small N. For the bivariate normal case, most of the bias can be removed by simple adjustment using a factor A which depends on $\beta$ and $\gamma$. This paper provides values of A for various $\beta$ and $\gamma$ so that the simple approximation for c can be made viable for any reasonable sample size. The methodology provides an illustrative example of how a combination of Monte-Carlo simulation and simple regression modelling can be used to improve an existing approximation.

  16. The prediction of speed and incline in outdoor running in humans using accelerometry.

    PubMed

    Herren, R; Sparti, A; Aminian, K; Schutz, Y

    1999-07-01

    To explore whether triaxial accelerometric measurements can be utilized to accurately assess speed and incline of running in free-living conditions. Body accelerations during running were recorded at the lower back and at the heel by a portable data logger in 20 human subjects, 10 men and 10 women. After parameterizing body accelerations, two neural networks were designed to recognize each running pattern and calculate speed and incline. Each subject ran 18 times on outdoor roads at various speeds and inclines; 12 runs were used to calibrate the neural networks whereas the 6 other runs were used to validate the model. A small difference between the estimated and the actual values was observed: the square root of the mean square error (RMSE) was 0.12 m·s⁻¹ for speed and 0.014 radian (rad) (or 1.4% in absolute value) for incline. Multiple regression analysis allowed accurate prediction of speed (RMSE = 0.14 m·s⁻¹) but not of incline (RMSE = 0.026 rad or 2.6% slope). Triaxial accelerometric measurements allow an accurate estimation of speed of running and incline of terrain (the latter with more uncertainty). This will permit the validation of the energetic results generated on the treadmill as applied to more physiological unconstrained running conditions.

  17. Texture-preserved penalized weighted least-squares reconstruction of low-dose CT image via image segmentation and high-order MRF modeling

    NASA Astrophysics Data System (ADS)

    Han, Hao; Zhang, Hao; Wei, Xinzhou; Moore, William; Liang, Zhengrong

    2016-03-01

    In this paper, we proposed a low-dose computed tomography (LdCT) image reconstruction method that uses prior knowledge learned from previous high-quality or normal-dose CT (NdCT) scans. The well-established statistical penalized weighted least squares (PWLS) algorithm was adopted for image reconstruction, where the penalty term was formulated by a texture-based Gaussian Markov random field (gMRF) model. The NdCT scan was first segmented into different tissue types by a feature vector quantization (FVQ) approach. Then, for each tissue type, a set of tissue-specific coefficients for the gMRF penalty was statistically learnt from the NdCT image via multiple linear regression analysis. We also proposed a scheme to adaptively select the order of the gMRF model for coefficient prediction. The tissue-specific gMRF patterns learnt from the NdCT image were finally used to form an adaptive MRF penalty for the PWLS reconstruction of the LdCT image. The proposed texture-adaptive PWLS image reconstruction algorithm was shown to be more effective at preserving image textures than the conventional PWLS image reconstruction algorithm, and we further demonstrated the gain of high-order MRF modeling for texture-preserved LdCT PWLS image reconstruction.

  18. Toxicity of ionic liquids: database and prediction via quantitative structure-activity relationship method.

    PubMed

    Zhao, Yongsheng; Zhao, Jihong; Huang, Ying; Zhou, Qing; Zhang, Xiangping; Zhang, Suojiang

    2014-08-15

    A comprehensive database on the toxicity of ionic liquids (ILs) is established. The database includes over 4000 pieces of data. Based on the database, the relationship between an IL's structure and its toxicity has been analyzed qualitatively. Furthermore, a quantitative structure-activity relationship (QSAR) model is built to predict the toxicities (EC50 values) of various ILs toward the leukemia rat cell line IPC-81. Four parameters selected by the heuristic method (HM) are used in the multiple linear regression (MLR) and support vector machine (SVM) studies. For the training sets, the squared correlation coefficient (R²) is 0.918 for MLR and 0.959 for SVM, and the root mean square error (RMSE) is 0.258 and 0.179, respectively. For the QSAR test sets, the prediction R² and RMSE are 0.892 and 0.329 for the MLR model, and 0.958 and 0.234 for the SVM model. The nonlinear model developed with the SVM algorithm substantially outperformed MLR, which indicates that the SVM model is more reliable in the prediction of toxicity of ILs. This study shows that increasing the relative number of O atoms in the molecules leads to a decrease in the toxicity of ILs. Copyright © 2014 Elsevier B.V. All rights reserved.

  19. Mortality risk score prediction in an elderly population using machine learning.

    PubMed

    Rose, Sherri

    2013-03-01

    Standard practice for prediction often relies on parametric regression methods. Interesting new methods from the machine learning literature have been introduced in epidemiologic studies, such as random forest and neural networks. However, a priori, an investigator will not know which algorithm to select and may wish to try several. Here I apply the super learner, an ensembling machine learning approach that combines multiple algorithms into a single algorithm and returns a prediction function with the best cross-validated mean squared error. Super learning is a generalization of stacking methods. I used super learning in the Study of Physical Performance and Age-Related Changes in Sonomans (SPPARCS) to predict death among 2,066 residents of Sonoma, California, aged 54 years or more during the period 1993-1999. The super learner for predicting death (risk score) improved upon all single algorithms in the collection of algorithms, although its performance was similar to that of several algorithms. Super learner outperformed the worst algorithm (neural networks) by 44% with respect to estimated cross-validated mean squared error and had an R2 value of 0.201. The improvement of super learner over random forest with respect to R2 was approximately 2-fold. Alternatives for risk score prediction include the super learner, which can provide improved performance.

  20. A comparative study of kinetic and connectionist modeling for shelf-life prediction of Basundi mix.

    PubMed

    Ruhil, A P; Singh, R R B; Jain, D K; Patel, A A; Patil, G R

    2011-04-01

    A ready-to-reconstitute formulation of Basundi, a popular Indian dairy dessert, was subjected to storage at various temperatures (10, 25 and 40 °C), and deteriorative changes in the Basundi mix were monitored using quality indices such as pH, hydroxyl methyl furfural (HMF), bulk density (BD) and insolubility index (II). The multiple regression equations and the Arrhenius functions that describe the parameters' dependence on temperature for the four physico-chemical parameters were integrated to develop mathematical models for predicting the sensory quality of Basundi mix. A connectionist model using a multilayer feed-forward neural network with the back-propagation algorithm was also developed for predicting the storage life of the product, employing the artificial neural network (ANN) toolbox of MATLAB software. The quality indices served as the input parameters, whereas the output parameters were the sensorily evaluated flavour and total sensory score. A total of 140 observations were used and the prediction performance was judged on the basis of per cent root mean square error. The results obtained from the two approaches were compared. Relatively lower magnitudes of per cent root mean square error for both the sensory parameters indicated that the connectionist models were better fitted than kinetic models for predicting storage life.

  1. Testing concordance of instrumental variable effects in generalized linear models with application to Mendelian randomization

    PubMed Central

    Dai, James Y.; Chan, Kwun Chuen Gary; Hsu, Li

    2014-01-01

    Instrumental variable regression is one way to overcome unmeasured confounding and estimate causal effects in observational studies. Building on structural mean models, considerable work has recently been developed for consistent estimation of the causal relative risk and causal odds ratio. Such models can sometimes suffer from identification issues for weak instruments. This has hampered the applicability of Mendelian randomization analysis in genetic epidemiology. When there are multiple genetic variants available as instrumental variables, and the causal effect is defined in a generalized linear model in the presence of unmeasured confounders, we propose to test concordance between instrumental variable effects on the intermediate exposure and instrumental variable effects on the disease outcome, as a means to test the causal effect. We show that a class of generalized least squares estimators provide valid and consistent tests of causality. For the causal effect of a continuous exposure on a dichotomous outcome in logistic models, the proposed estimators are shown to be asymptotically conservative. When the disease outcome is rare, such estimators are consistent due to the log-linear approximation of the logistic function. Optimality of such estimators relative to the well-known two-stage least squares estimator and the double-logistic structural mean model is further discussed. PMID:24863158

  2. M-estimation for robust sparse unmixing of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Toomik, Maria; Lu, Shijian; Nelson, James D. B.

    2016-10-01

    Hyperspectral unmixing methods often use a conventional least squares based lasso which assumes that the data follows the Gaussian distribution. The normality assumption is an approximation which is generally invalid for real imagery data. We consider a robust (non-Gaussian) approach to sparse spectral unmixing of remotely sensed imagery which reduces the sensitivity of the estimator to outliers and relaxes the linearity assumption. The method consists of several appropriate penalties. We propose to use an lp norm with 0 < p < 1 in the sparse regression problem, which induces more sparsity in the results, but makes the problem non-convex. On the other hand, the problem, though non-convex, can be solved quite straightforwardly with an extensible algorithm based on iteratively reweighted least squares. To deal with the huge size of modern spectral libraries we introduce a library reduction step, similar to the multiple signal classification (MUSIC) array processing algorithm, which not only speeds up unmixing but also yields superior results. In the hyperspectral setting we extend the traditional least squares method to the robust heavy-tailed case and propose a generalised M-lasso solution. M-estimation replaces the Gaussian likelihood with a fixed function ρ(e) that restrains outliers. The M-estimate function reduces the effect of errors with large amplitudes or even assigns the outliers zero weights. Our experimental results on real hyperspectral data show that noise with large amplitudes (outliers) often exists in the data. This ability to mitigate the influence of such outliers can therefore offer greater robustness. Qualitative hyperspectral unmixing results on real hyperspectral image data corroborate the efficacy of the proposed method.
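
    The reweighting idea mentioned in this abstract can be illustrated on a much simpler problem than sparse unmixing. The sketch below is not the authors' M-lasso; it assumes a plain straight-line fit, Huber weights, and a fixed tuning constant k expressed in data units (no scale estimate), all of which are illustrative choices. The function names are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def huber_irls(x, y, k=1.345, iterations=50):
    """Iteratively reweighted least squares with Huber weights.

    Residuals no larger than k keep weight 1; larger residuals get
    weight k / |r|, so their influence on the fit is bounded.
    Assumption: k is in the units of y (no robust scale estimate).
    """
    w = [1.0] * len(x)
    for _ in range(iterations):
        slope, intercept = weighted_line_fit(x, y, w)
        w = []
        for xi, yi in zip(x, y):
            r = abs(yi - (intercept + slope * xi))
            w.append(1.0 if r <= k else k / r)
    return weighted_line_fit(x, y, w)


# Points on y = 2x + 1 with one gross outlier at the last position.
xs = list(range(10))
ys = [2 * xi + 1 for xi in xs]
ys[-1] = 100.0

ols_slope, ols_intercept = weighted_line_fit(xs, ys, [1.0] * len(xs))
robust_slope, robust_intercept = huber_irls(xs, ys)
```

    In this toy data set the ordinary fit is pulled strongly towards the outlier, while the reweighted fit stays much closer to the slope of the clean points.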

  3. Prediction of hearing outcomes by multiple regression analysis in patients with idiopathic sudden sensorineural hearing loss.

    PubMed

    Suzuki, Hideaki; Tabata, Takahisa; Koizumi, Hiroki; Hohchi, Nobusuke; Takeuchi, Shoko; Kitamura, Takuro; Fujino, Yoshihisa; Ohbuchi, Toyoaki

    2014-12-01

    This study aimed to create a multiple regression model for predicting hearing outcomes of idiopathic sudden sensorineural hearing loss (ISSNHL). The participants were 205 consecutive patients (205 ears) with ISSNHL (hearing level ≥ 40 dB, interval between onset and treatment ≤ 30 days). They received systemic steroid administration combined with intratympanic steroid injection. Data were examined by simple and multiple regression analyses. Three hearing indices (percentage hearing improvement, hearing gain, and posttreatment hearing level [HLpost]) and 7 prognostic factors (age, days from onset to treatment, initial hearing level, initial hearing level at low frequencies, initial hearing level at high frequencies, presence of vertigo, and contralateral hearing level) were included in the multiple regression analysis as dependent and explanatory variables, respectively. In the simple regression analysis, the percentage hearing improvement, hearing gain, and HLpost showed significant correlation with 2, 5, and 6 of the 7 prognostic factors, respectively. The multiple correlation coefficients were 0.396, 0.503, and 0.714 for the percentage hearing improvement, hearing gain, and HLpost, respectively. Predicted values of HLpost calculated by the multiple regression equation were reliable with 70% probability with a 40-dB-width prediction interval. Prediction of HLpost by the multiple regression model may be useful to estimate the hearing prognosis of ISSNHL. © The Author(s) 2014.

  4. Linear regression analysis for comparing two measurers or methods of measurement: but which regression?

    PubMed

    Ludbrook, John

    2010-07-01

    1. There are two reasons for wanting to compare measurers or methods of measurement. One is to calibrate one method or measurer against another; the other is to detect bias. Fixed bias is present when one method gives higher (or lower) values across the whole range of measurement. Proportional bias is present when one method gives values that diverge progressively from those of the other. 2. Linear regression analysis is a popular method for comparing methods of measurement, but the familiar ordinary least squares (OLS) method is rarely acceptable. The OLS method requires that the x values are fixed by the design of the study, whereas it is usual that both y and x values are free to vary and are subject to error. In this case, special regression techniques must be used. 3. Clinical chemists favour techniques such as major axis regression ('Deming's method'), the Passing-Bablok method or the bivariate least median squares method. Other disciplines, such as allometry, astronomy, biology, econometrics, fisheries research, genetics, geology, physics and sports science, have their own preferences. 4. Many Monte Carlo simulations have been performed to try to decide which technique is best, but the results are almost uninterpretable. 5. I suggest that pharmacologists and physiologists should use ordinary least products regression analysis (geometric mean regression, reduced major axis regression): it is versatile, can be used for calibration or to detect bias and can be executed by hand-held calculator or by using the loss function in popular, general-purpose, statistical software.
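
    The least products (geometric mean) fit recommended in point 5 is simple enough to compute by hand, which a short sketch makes concrete. It assumes the usual textbook form, slope = sign(r) · sd(y)/sd(x) and intercept = mean(y) − slope · mean(x); the function name is hypothetical and nothing here is taken from the paper's own software.

```python
import math


def least_products_fit(x, y):
    """Ordinary least products (geometric mean) regression of y on x.

    slope     = sign(r) * sd(y) / sd(x)
    intercept = mean(y) - slope * mean(x)
    Unlike OLS, the fitted line is the same whichever variable is
    called x, because both are treated as subject to error.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both samples must vary")
    sign = 1.0 if sxy >= 0 else -1.0
    slope = sign * math.sqrt(syy / sxx)
    return slope, my - slope * mx


# Two hypothetical methods measuring the same five specimens.
method_a = [1.0, 2.0, 3.0, 4.0, 5.0]
method_b = [1.2, 1.9, 3.2, 3.8, 5.1]
slope_ba, intercept_ba = least_products_fit(method_a, method_b)
slope_ab, intercept_ab = least_products_fit(method_b, method_a)
```

    Read against the definitions in point 1, an intercept clearly different from 0 would suggest fixed bias and a slope clearly different from 1 would suggest proportional bias; judging "clearly" needs confidence intervals, which this sketch does not compute.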

  5. False Positives in Multiple Regression: Unanticipated Consequences of Measurement Error in the Predictor Variables

    ERIC Educational Resources Information Center

    Shear, Benjamin R.; Zumbo, Bruno D.

    2013-01-01

    Type I error rates in multiple regression, and hence the chance for false positive research findings, can be drastically inflated when multiple regression models are used to analyze data that contain random measurement error. This article shows the potential for inflated Type I error rates in commonly encountered scenarios and provides new…

  6. Using Robust Standard Errors to Combine Multiple Regression Estimates with Meta-Analysis

    ERIC Educational Resources Information Center

    Williams, Ryan T.

    2012-01-01

    Combining multiple regression estimates with meta-analysis has continued to be a difficult task. A variety of methods have been proposed and used to combine multiple regression slope estimates with meta-analysis, however, most of these methods have serious methodological and practical limitations. The purpose of this study was to explore the use…

  7. Use of Multiple Regression and Use-Availability Analyses in Determining Habitat Selection by Gray Squirrels (Sciurus Carolinensis)

    Treesearch

    John W. Edwards; Susan C. Loeb; David C. Guynn

    1994-01-01

    Multiple regression and use-availability analyses are two methods for examining habitat selection. Use-availability analysis is commonly used to evaluate macrohabitat selection whereas multiple regression analysis can be used to determine microhabitat selection. We compared these techniques using behavioral observations (n = 5534) and telemetry locations (n = 2089) of...

  8. An Application of Interactive Computer Graphics to the Study of Inferential Statistics and the General Linear Model

    DTIC Science & Technology

    1991-09-01

    matrix, the Regression Sum of Squares (SSR) and Error Sum of Squares (SSE) are also displayed as a percentage of the Total Sum of Squares (SSTO)...vector when the student compares the SSR to the SSE. In addition to the plot, the actual values of SSR, SSE, and SSTO are also provided. Figure 3 gives the...
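
    The quantities named in this excerpt satisfy SSTO = SSR + SSE for a least squares fit with an intercept, which is easy to check numerically. The sketch below assumes a simple one-predictor regression and made-up data; the function name is hypothetical and is not part of the software the report describes.

```python
def sums_of_squares(x, y):
    """Return (SSR, SSE, SSTO) for the OLS line of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    fitted = [intercept + slope * xi for xi in x]
    ssr = sum((fi - my) ** 2 for fi in fitted)             # explained by the fit
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left in the residuals
    ssto = sum((yi - my) ** 2 for yi in y)                  # total about the mean
    return ssr, sse, ssto


ssr, sse, ssto = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
share_explained = 100.0 * ssr / ssto  # SSR as a percentage of SSTO
```

    Geometrically, SSR and SSE are the squared lengths of the projections of the centered response onto the model space and onto its orthogonal complement, which is why they add up to SSTO.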

  9. Building Regression Models: The Importance of Graphics.

    ERIC Educational Resources Information Center

    Dunn, Richard

    1989-01-01

    Points out reasons for using graphical methods to teach simple and multiple regression analysis. Argues that a graphically oriented approach has considerable pedagogic advantages in the exposition of simple and multiple regression. Shows that graphical methods may play a central role in the process of building regression models. (Author/LS)

  10. Testing Different Model Building Procedures Using Multiple Regression.

    ERIC Educational Resources Information Center

    Thayer, Jerome D.

    The stepwise regression method of selecting predictors for computer assisted multiple regression analysis was compared with forward, backward, and best subsets regression, using 16 data sets. The results indicated the stepwise method was preferred because of its practical nature, when the models chosen by different selection methods were similar…

  11. Compound Identification Using Penalized Linear Regression on Metabolomics

    PubMed Central

    Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho

    2014-01-01

    Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values, the data become high-dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
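
    The singularity problem described here, and the way a ridge penalty removes it, can be shown on a tiny system with more columns than rows. This is a minimal sketch with made-up numbers, not the authors' pipeline: it assumes no intercept, penalizes every coefficient equally, and uses a small hand-written Gaussian elimination routine; both function names are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge(X, y, lam):
    """Ridge coefficients: solve (X'X + lam * I) beta = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Two observations, three predictors: X'X is 3x3 but has rank at most 2.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]

beta = ridge(X, y, lam=1.0)   # well defined thanks to the penalty
try:
    ridge(X, y, lam=0.0)      # plain normal equations
    ols_failed = False
except ValueError:
    ols_failed = True         # X'X alone cannot be inverted
```

    With lam = 0 the normal equations have no unique solution, which is the singularity the abstract refers to; any positive lam makes the matrix invertible at the cost of shrinking the coefficients.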

  12. Decreasing Multicollinearity: A Method for Models with Multiplicative Functions.

    ERIC Educational Resources Information Center

    Smith, Kent W.; Sasaki, M. S.

    1979-01-01

    A method is proposed for overcoming the problem of multicollinearity in multiple regression equations where multiplicative independent terms are entered. The method is not a ridge regression solution. (JKS)

  13. Fast function-on-scalar regression with penalized basis expansions.

    PubMed

    Reiss, Philip T; Huang, Lei; Mennes, Maarten

    2010-01-01

    Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.

  14. Durbin-Watson partial least-squares regression applied to MIR data on adulteration with edible oils of different origins.

    PubMed

    Jović, Ozren

    2016-12-15

    A novel method for quantitative prediction and variable selection on spectroscopic data, called Durbin-Watson partial least-squares regression (dwPLS), is proposed in this paper. The idea is to inspect serial correlation in infrared data that is known to consist of highly correlated neighbouring variables. The method selects only those variables whose intervals have a lower Durbin-Watson statistic (dw) than a certain optimal cutoff. For each interval, dw is calculated on a vector of regression coefficients. Adulteration of cold-pressed linseed oil (L), a well-known nutrient beneficial to health, is studied in this work by mixing it with cheaper oils: rapeseed oil (R), sesame oil (Se) and sunflower oil (Su). The samples for each botanical origin of oil vary with respect to producer, content and geographic origin. The results obtained indicate that MIR-ATR combined with dwPLS could be applied to the quantitative determination of edible-oil adulteration. Copyright © 2016 Elsevier Ltd. All rights reserved.

  15. A quasi-Monte-Carlo comparison of parametric and semiparametric regression methods for heavy-tailed and non-normal data: an application to healthcare costs.

    PubMed

    Jones, Andrew M; Lomas, James; Moore, Peter T; Rice, Nigel

    2016-10-01

    We conduct a quasi-Monte-Carlo comparison of the recent developments in parametric and semiparametric regression methods for healthcare costs, both against each other and against standard practice. The population of English National Health Service hospital in-patient episodes for the financial year 2007-2008 (summed for each patient) is randomly divided into two equally sized subpopulations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and among the best four for bias and goodness of fit. The best performing model for bias is linear regression with square-root-transformed dependent variables, whereas a generalized linear model with square-root link function and Poisson distribution performs best in terms of goodness of fit. Commonly used models utilizing a log-link are shown to perform badly relative to other models considered in our comparison.

  16. An interactive website for analytical method comparison and bias estimation.

    PubMed

    Bahar, Burak; Tuncel, Ayse F; Holmes, Earle W; Holmes, Daniel T

    2017-12-01

    Regulatory standards mandate laboratories to perform studies to ensure accuracy and reliability of their test results. Method comparison and bias estimation are important components of these studies. We developed an interactive website for evaluating the relative performance of two analytical methods using R programming language tools. The website can be accessed at https://bahar.shinyapps.io/method_compare/. The site has an easy-to-use interface that allows both copy-pasting and manual entry of data. It also allows selection of a regression model and creation of regression and difference plots. Available regression models include Ordinary Least Squares, Weighted-Ordinary Least Squares, Deming, Weighted-Deming, Passing-Bablok and Passing-Bablok for large datasets. The server processes the data and generates downloadable reports in PDF or HTML format. Our website provides clinical laboratories a practical way to assess the relative performance of two analytical methods. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  17. Discriminative least squares regression for multiclass classification and feature selection.

    PubMed

    Xiang, Shiming; Nie, Feiping; Meng, Gaofeng; Pan, Chunhong; Zhang, Changshui

    2012-11-01

    This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes to move in opposite directions, such that the distances between classes can be enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, where there is no need to train two-class machines that are independent of each other. With its compact form, this model can be naturally extended for feature selection. This goal is achieved by means of the L2,1 norm of a matrix, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.

  18. Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation

    NASA Astrophysics Data System (ADS)

    Reis, D. S.; Stedinger, J. R.; Martins, E. S.

    2005-10-01

    This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.

  19. Interpretation of the Coefficients in the Fit y = at + bx + c

    ERIC Educational Resources Information Center

    Farnsworth, David L.

    2006-01-01

    The goals of this note are to derive formulas for the coefficients a and b in the least-squares regression plane y = at + bx + c for observations (t_i, x_i, y_i), i = 1, 2, ..., n, and to present meanings for the coefficients a and b. In this note, formulas for the coefficients a and b in the least-squares fit are…
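
    The record is truncated before it gives the note's own formulas, so the sketch below uses the standard closed form obtained from the centered normal equations rather than anything quoted from the note. The function name and the sample data are hypothetical.

```python
def fit_plane(t, x, y):
    """Least-squares plane y = a*t + b*x + c.

    With centered sums S_uv = sum((u_i - mean(u)) * (v_i - mean(v))):
        a = (S_xx * S_ty - S_tx * S_xy) / (S_tt * S_xx - S_tx**2)
        b = (S_tt * S_xy - S_tx * S_ty) / (S_tt * S_xx - S_tx**2)
        c = mean(y) - a * mean(t) - b * mean(x)
    """
    n = len(y)
    mt, mx, my = sum(t) / n, sum(x) / n, sum(y) / n

    def s(u, mu, v, mv):
        return sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))

    s_tt, s_xx, s_tx = s(t, mt, t, mt), s(x, mx, x, mx), s(t, mt, x, mx)
    s_ty, s_xy = s(t, mt, y, my), s(x, mx, y, my)
    det = s_tt * s_xx - s_tx ** 2
    if det == 0:
        raise ValueError("t and x are collinear; the plane is not unique")
    a = (s_xx * s_ty - s_tx * s_xy) / det
    b = (s_tt * s_xy - s_tx * s_ty) / det
    return a, b, my - a * mt - b * mx


# Noise-free points generated from y = 2t - 3x + 1.
t_obs = [0.0, 1.0, 2.0, 3.0, 4.0]
x_obs = [1.0, 0.0, 2.0, 1.0, 3.0]
y_obs = [2 * ti - 3 * xi + 1 for ti, xi in zip(t_obs, x_obs)]
a, b, c = fit_plane(t_obs, x_obs, y_obs)
```

    In this form a is the change in y per unit change in t with x held fixed, and b plays the same role for x, which is one common reading of the coefficients.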

  20. Prediction of pH of cola beverage using Vis/NIR spectroscopy and least squares-support vector machine

    NASA Astrophysics Data System (ADS)

    Liu, Fei; He, Yong

    2008-02-01

    Visible and near infrared (Vis/NIR) transmission spectroscopy and chemometric methods were utilized to predict the pH values of cola beverages. Five varieties of cola were prepared; 225 samples (45 for each variety) were selected for the calibration set and 75 samples (15 for each variety) for the validation set. Savitzky-Golay smoothing and standard normal variate (SNV) followed by first-derivative were used as the pre-processing methods. Partial least squares (PLS) analysis was employed to extract the principal components (PCs), which were used as the inputs of the least squares-support vector machine (LS-SVM) model according to their accumulative reliabilities. Then LS-SVM with a radial basis function (RBF) kernel and a two-step grid search technique was applied to build the regression model, for comparison with PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias were 0.961, 0.040 and 0.012 for PLS, and 0.975, 0.031 and 4.697 × 10⁻³ for LS-SVM, respectively. Both methods obtained a satisfying precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be applied as an alternative way for the prediction of pH of cola beverages.
