Tu, Yu-Kang; Krämer, Nicole; Lee, Wen-Chung
2012-07-01
In the analysis of trends in health outcomes, an ongoing issue is how to separate and estimate the effects of age, period, and cohort. As these 3 variables are perfectly collinear by definition, regression coefficients in a general linear model are not unique. In this tutorial, we review why identification is a problem, and how this problem may be tackled using partial least squares and principal components regression analyses. Both methods produce regression coefficients that fulfill the same collinearity constraint as the variables age, period, and cohort. We show that, because the constraint imposed by partial least squares and principal components regression is inherent in the mathematical relation among the 3 variables, this leads to more interpretable results. We use one dataset from a Taiwanese health-screening program to illustrate how to use partial least squares regression to analyze the trends in body heights with 3 continuous variables for age, period, and cohort. We then use another dataset of hepatocellular carcinoma mortality rates for Taiwanese men to illustrate how to use partial least squares regression to analyze tables with aggregated data. We use the second dataset to show the relation between the intrinsic estimator, a recently proposed method for the age-period-cohort analysis, and partial least squares regression. We also show that the inclusion of all indicator variables provides a more consistent approach. R code for our analyses is provided in the eAppendix.
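A minimal sketch of the idea in this abstract, assuming Python with numpy and scikit-learn rather than the authors' R eAppendix code: PLS regression applied to simulated, perfectly collinear age, period, and cohort predictors, with a check that the fitted coefficients satisfy the same linear constraint as the variables. The simulated outcome and parameter values are invented for illustration.

```python
# Minimal sketch (Python/scikit-learn, not the authors' R eAppendix code):
# PLS regression on simulated, perfectly collinear age/period/cohort predictors.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n = 500
age = rng.integers(20, 70, size=n).astype(float)
period = rng.integers(1990, 2011, size=n).astype(float)
cohort = period - age                       # exact collinearity by definition

# Hypothetical outcome (e.g., body height) with weak cohort and age trends
height = 165 + 0.05 * (cohort - 1950) - 0.02 * (age - 45) + rng.normal(0, 3, n)

X = np.column_stack([age, period, cohort])

# The OLS normal equations are singular, so OLS coefficients are not unique.
# PLS (which centres X internally) returns unique coefficients that satisfy the
# same linear constraint as the predictors: b_period - b_age - b_cohort = 0.
pls = PLSRegression(n_components=2, scale=False).fit(X, height)
b_age, b_period, b_cohort = np.ravel(pls.coef_)
print("PLS coefficients:", b_age, b_period, b_cohort)
print("constraint check (should be ~0):", b_period - b_age - b_cohort)
```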
Partial least squares (PLS) analysis offers a number of advantages over the more traditionally used regression analyses applied in landscape ecology to study the associations among constituents of surface water and landscapes. Common data problems in ecological studies include: s...
ERIC Educational Resources Information Center
Hester, Yvette
Least squares methods are sophisticated mathematical curve fitting procedures used in all classical parametric methods. The linear least squares approximation is most often associated with finding the "line of best fit" or the regression line. Since all statistical analyses are correlational and all classical parametric methods are least…
Alexander, Terry W.; Wilson, Gary L.
1995-01-01
A generalized least-squares regression technique was used to relate the 2- to 500-year flood discharges from 278 selected streamflow-gaging stations to statistically significant basin characteristics. The regression relations (estimating equations) were defined for three hydrologic regions (I, II, and III) in rural Missouri. Ordinary least-squares regression analyses indicate that drainage area (Regions I, II, and III) and main-channel slope (Regions I and II) are the only basin characteristics needed for computing the 2- to 500-year design-flood discharges at gaged or ungaged stream locations. The resulting generalized least-squares regression equations provide a technique for estimating the 2-, 5-, 10-, 25-, 50-, 100-, and 500-year flood discharges on unregulated streams in rural Missouri. The regression equations for Regions I and II were developed from streamflow-gaging stations with drainage areas ranging from 0.13 to 11,500 square miles and 0.13 to 14,000 square miles, and main-channel slopes ranging from 1.35 to 150 feet per mile and 1.20 to 279 feet per mile. The regression equations for Region III were developed from streamflow-gaging stations with drainage areas ranging from 0.48 to 1,040 square miles. Standard errors of estimate for the generalized least-squares regression equations in Regions I, II, and III ranged from 30 to 49 percent.
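A hedged sketch of the regression step described above: ordinary least squares on log-transformed discharge, drainage area, and main-channel slope, back-transformed to the usual power-law estimating equation. The numbers are simulated, not the Missouri gaging-station data.

```python
# Hedged sketch of the regression step described above: OLS on log-transformed
# discharge, drainage area, and main-channel slope, back-transformed to the
# usual power-law estimating equation. Values are simulated, not Missouri data.
import numpy as np

rng = np.random.default_rng(1)
n = 40
area = rng.uniform(0.2, 10000, n)           # drainage area, square miles
slope = rng.uniform(1.5, 250, n)            # main-channel slope, feet per mile
logQ100 = 2.0 + 0.7 * np.log10(area) + 0.3 * np.log10(slope) + rng.normal(0, 0.1, n)

X = np.column_stack([np.ones(n), np.log10(area), np.log10(slope)])
beta, *_ = np.linalg.lstsq(X, logQ100, rcond=None)
a, b_area, b_slope = beta

# Back-transform: Q100 = 10^a * A^b_area * S^b_slope
print(f"Q100 estimate: {10**a:.1f} * A^{b_area:.2f} * S^{b_slope:.2f}")
```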
Geodesic least squares regression for scaling studies in magnetic confinement fusion
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert
In regression analyses for deriving scaling laws that occur in various scientific disciplines, usually standard regression methods have been applied, of which ordinary least squares (OLS) is the most popular. However, concerns have been raised with respect to several assumptions underlying OLS in its application to scaling laws. We here discuss a new regression method that is robust in the presence of significant uncertainty on both the data and the regression model. The method, which we call geodesic least squares regression (GLS), is based on minimization of the Rao geodesic distance on a probabilistic manifold. We demonstrate the superiority of the method using synthetic data and we present an application to the scaling law for the power threshold for the transition to the high confinement regime in magnetic confinement fusion devices.
A parameter estimation subroutine package
NASA Technical Reports Server (NTRS)
Bierman, G. J.; Nead, M. W.
1978-01-01
Linear least squares estimation and regression analyses continue to play a major role in orbit determination and related areas. A library of FORTRAN subroutines was developed to facilitate analyses of a variety of estimation problems. An easy to use, multi-purpose set of algorithms that are reasonably efficient and which use a minimal amount of computer storage is presented. Subroutine inputs, outputs, usage and listings are given, along with examples of how these routines can be used. The routines are compact and efficient and are far superior to the normal equation and Kalman filter data processing algorithms that are often used for least squares analyses.
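The abstract contrasts these routines with normal-equation processing; the sketch below is a generic numpy illustration (not the FORTRAN library itself) of why an orthogonal-factorization least-squares solve is preferred over forming the normal equations when columns are nearly collinear.

```python
# Generic numpy illustration (not the FORTRAN library itself): with nearly
# collinear columns, solving the normal equations X'X b = X'y squares the
# condition number, while a QR (orthogonal) factorization works with X directly.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 200)
X = np.vander(t, 12, increasing=True)       # ill-conditioned polynomial design
beta_true = rng.normal(size=12)
y = X @ beta_true                           # noiseless, so rounding is the only error

b_normal = np.linalg.solve(X.T @ X, X.T @ y)   # normal equations
Q, R = np.linalg.qr(X)
b_qr = np.linalg.solve(R, Q.T @ y)             # QR-based solve

print("cond(X):", np.linalg.cond(X))
print("normal-equation coefficient error:", np.linalg.norm(b_normal - beta_true))
print("QR coefficient error:             ", np.linalg.norm(b_qr - beta_true))
```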
Comparing least-squares and quantile regression approaches to analyzing median hospital charges.
Olsen, Cody S; Clark, Amy E; Thomas, Andrea M; Cook, Lawrence J
2012-07-01
Emergency department (ED) and hospital charges obtained from administrative data sets are useful descriptors of injury severity and the burden to EDs and the health care system. However, charges are typically positively skewed due to costly procedures, long hospital stays, and complicated or prolonged treatment for few patients. The median is not affected by extreme observations and is useful in describing and comparing distributions of hospital charges. A least-squares analysis employing a log transformation is one approach for estimating median hospital charges, corresponding confidence intervals (CIs), and differences between groups; however, this method requires certain distributional properties. An alternate method is quantile regression, which allows estimation and inference related to the median without making distributional assumptions. The objective was to compare the log-transformation least-squares method to the quantile regression approach for estimating median hospital charges, differences in median charges between groups, and associated CIs. The authors performed simulations using repeated sampling of observed statewide ED and hospital charges and charges randomly generated from a hypothetical lognormal distribution. The median and 95% CI and the multiplicative difference between the median charges of two groups were estimated using both least-squares and quantile regression methods. Performance of the two methods was evaluated. In contrast to least squares, quantile regression produced estimates that were unbiased and had smaller mean square errors in simulations of observed ED and hospital charges. Both methods performed well in simulations of hypothetical charges that met least-squares method assumptions. When the data did not follow the assumed distribution, least-squares estimates were often biased, and the associated CIs had lower than expected coverage as sample size increased. Quantile regression analyses of hospital charges provide unbiased estimates even when lognormal and equal variance assumptions are violated. These methods may be particularly useful in describing and analyzing hospital charges from administrative data sets. © 2012 by the Society for Academic Emergency Medicine.
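A small sketch of the two estimators being compared, using simulated lognormal "charges" and statsmodels; the group effect and distribution parameters are invented and do not correspond to the statewide data in the study.

```python
# Sketch of the two estimators compared above, on simulated lognormal "charges".
# The group effect and distribution parameters are invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
group = rng.integers(0, 2, n).astype(float)      # hypothetical two-group indicator
charges = np.exp(7.5 + 0.4 * group + rng.normal(0, 1.2, n))

X = sm.add_constant(group)

# 1) Least squares on log(charges): exp(slope) estimates the ratio of medians
#    only if the lognormal / equal-variance assumptions hold.
ols = sm.OLS(np.log(charges), X).fit()

# 2) Median (0.5-quantile) regression: no such distributional assumption needed.
qreg = sm.QuantReg(np.log(charges), X).fit(q=0.5)

print("multiplicative difference in medians, log-OLS:     ", np.exp(ols.params[1]))
print("multiplicative difference in medians, quantile reg:", np.exp(qreg.params[1]))
```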
Ecologists are often faced with the problems of small sample sizes, large numbers of correlated predictors, and high noise-to-signal relationships. This necessitates excluding important variables from the model when applying standard multiple or multivariate regression analyses. In ...
Development of Super-Ensemble techniques for ocean analyses: the Mediterranean Sea case
NASA Astrophysics Data System (ADS)
Pistoia, Jenny; Pinardi, Nadia; Oddo, Paolo; Collins, Matthew; Korres, Gerasimos; Drillet, Yann
2017-04-01
Short-term ocean analyses for Sea Surface Temperature (SST) in the Mediterranean Sea can be improved by a statistical post-processing technique called super-ensemble. This technique consists of a multi-linear regression algorithm applied to a Multi-Physics Multi-Model Super-Ensemble (MMSE) dataset, a collection of different operational forecasting analyses together with ad-hoc simulations produced by modifying selected numerical model parameterizations. A new linear regression algorithm based on Empirical Orthogonal Function filtering techniques is capable of preventing overfitting, although the best performance is achieved when correlation is added to the super-ensemble structure using a simple spatial filter applied after the linear regression. Our results show that super-ensemble performance depends on the selection of an unbiased operator and the length of the learning period, but the quality of the generating MMSE dataset has the largest impact on the root mean square error (RMSE) of the MMSE analysis evaluated with respect to observed satellite SST. Lower RMSE analysis estimates result from the following choices: a 15-day training period, an overconfident MMSE dataset (a subset with the higher quality ensemble members), and the least squares algorithm filtered a posteriori.
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of Durbin-Watson's statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 China's regions. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
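A hedged numpy sketch of the kind of statistic described above: a Moran-type autocorrelation coefficient computed from standardized OLS residuals and a row-normalized spatial weight matrix. The inverse-distance weights and simulated data are illustrative choices, not the paper's exact statistics.

```python
# Hedged sketch of a Moran-type autocorrelation coefficient for OLS residuals
# with a row-normalized spatial weight matrix (inverse-distance weights and the
# simulated data are illustrative choices, not the paper's exact statistics).
import numpy as np

rng = np.random.default_rng(4)
n = 29                                         # e.g., 29 regions
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)

# OLS fit and residuals
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta

# Inverse-distance weights, zero diagonal, rows normalized to sum to one
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
W = np.zeros_like(d)
W[d > 0] = 1.0 / d[d > 0]
W /= W.sum(axis=1, keepdims=True)

# Autocorrelation coefficient of the standardized residual vector
z = (e - e.mean()) / e.std()
I = (z @ W @ z) / (z @ z)
print("Moran-type residual autocorrelation:", I)
```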
An Analysis of Advertising Effectiveness for U.S. Navy Recruiting
1997-09-01
This thesis estimates the effect of Navy television advertising on enlistment rates of high quality male recruits (Armed Forces Qualification Test...Joint advertising is for all Armed Forces), Joint journal, and Joint direct mail advertising are explored. Enlistments are modeled as a function of...several factors including advertising, recruiters, and economic. Regression analyses (Ordinary Least Squares and Two Stage Least Squares) explore the
Marston, Louise; Peacock, Janet L; Yu, Keming; Brocklehurst, Peter; Calvert, Sandra A; Greenough, Anne; Marlow, Neil
2009-07-01
Studies of prematurely born infants contain a relatively large percentage of multiple births, so the resulting data have a hierarchical structure with small clusters of size 1, 2 or 3. Ignoring the clustering may lead to incorrect inferences. The aim of this study was to compare statistical methods which can be used to analyse such data: generalised estimating equations, multilevel models, multiple linear regression and logistic regression. Four datasets which differed in total size and in percentage of multiple births (n = 254, multiple 18%; n = 176, multiple 9%; n = 10 098, multiple 3%; n = 1585, multiple 8%) were analysed. With the continuous outcome, two-level models produced similar results in the larger dataset, while generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) produced divergent estimates using the smaller dataset. For the dichotomous outcome, most methods, except generalised least squares multilevel modelling (ML GH 'xtlogit' in Stata) gave similar odds ratios and 95% confidence intervals within datasets. For the continuous outcome, our results suggest using multilevel modelling. We conclude that generalised least squares multilevel modelling (ML GLS 'xtreg' in Stata) and maximum likelihood multilevel modelling (ML MLE 'xtmixed' in Stata) should be used with caution when the dataset is small. Where the outcome is dichotomous and there is a relatively large percentage of non-independent data, it is recommended that these are accounted for in analyses using logistic regression with adjusted standard errors or multilevel modelling. If, however, the dataset has a small percentage of clusters greater than size 1 (e.g. a population dataset of children where there are few multiples) there appears to be less need to adjust for clustering.
ERIC Educational Resources Information Center
Wu, Dane W.
2002-01-01
The year 2000 US presidential election between Al Gore and George Bush has been the most intriguing and controversial one in American history. The state of Florida was the trigger for the controversy, mainly, due to the use of the misleading "butterfly ballot". Using prediction (or confidence) intervals for least squares regression lines…
The Relationship between Retention and College Counseling for High-risk Students
ERIC Educational Resources Information Center
Bishop, Kyle K.
2016-01-01
The author used an archival study to explore the relationship between college counseling and retention. The cohort for this study was a college's 2006 class of full-time, 1st-year students (N = 429). The results of chi-square analyses and regression analyses indicated (a) a significant difference in retention between high-risk and low-risk…
Radon-222 concentrations in ground water and soil gas on Indian reservations in Wisconsin
DeWild, John F.; Krohelski, James T.
1995-01-01
For sites with wells finished in the sand and gravel aquifer, the coefficient of determination (R2) of the regression of concentration of radon-222 in ground water as a function of well depth is 0.003 and the significance level is 0.32, which indicates that there is not a statistically significant relation between radon-222 concentrations in ground water and well depth. The coefficient of determination of the regression of radon-222 in ground water and soil gas is 0.19 and the root mean square error of the regression line is 271 picocuries per liter. Even though the significance level (0.036) indicates a statistical relation, the root mean square error of the regression is so large that the regression equation would not give reliable predictions. Because of an inadequate number of samples, similar statistical analyses could not be performed for sites with wells finished in the crystalline and sedimentary bedrock aquifers.
Li, Min; Zhang, Lu; Yao, Xiaolong; Jiang, Xingyu
2017-01-01
The emerging membrane introduction mass spectrometry (MIMS) technique has been successfully used to detect benzene, toluene, ethyl benzene and xylene (BTEX), but overlapped spectra have hindered its further application to the analysis of mixtures. Multivariate calibration, an efficient method to analyze mixtures, has been widely applied. In this paper, we compared univariate and multivariate analyses for quantification of the individual components of mixture samples. The results showed that the univariate analysis produces poor models, with regression coefficients of 0.912, 0.867, 0.440 and 0.351 for BTEX, respectively. For the multivariate analysis, a comparison with the partial least squares (PLS) model shows that orthogonal partial least squares (OPLS) regression performs best, with regression coefficients of 0.995, 0.999, 0.980 and 0.976, favorable calibration parameters (RMSEC and RMSECV) and a favorable validation parameter (RMSEP). Furthermore, the OPLS model exhibits a good recovery of 73.86-122.20% and a relative standard deviation (RSD) of repeatability of 1.14-4.87%. Thus, MIMS coupled with OPLS regression provides an optimal approach for quantitative BTEX mixture analysis in monitoring and predicting water pollution.
Southard, Rodney E.; Veilleux, Andrea G.
2014-01-01
Regression analysis techniques were used to develop a set of equations for rural ungaged stream sites for estimating discharges with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. Basin and climatic characteristics were computed using geographic information software and digital geospatial data. A total of 35 characteristics were computed for use in preliminary statewide and regional regression analyses. Annual exceedance-probability discharge estimates were computed for 278 streamgages by using the expected moments algorithm to fit a log-Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data from water year 1844 to 2012. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized multiple Grubbs-Beck test was used to detect potentially influential low floods. Annual peak flows less than a minimum recordable discharge at a streamgage were incorporated into the at-site station analyses. An updated regional skew coefficient was determined for the State of Missouri using Bayesian weighted least-squares/generalized least squares regression analyses. At-site skew estimates for 108 long-term streamgages with 30 or more years of record and the 35 basin characteristics defined for this study were used to estimate the regional variability in skew. However, a constant generalized-skew value of -0.30 and a mean square error of 0.14 were determined in this study. Previous flood studies indicated that the distinct physical features of the three physiographic provinces have a pronounced effect on the magnitude of flood peaks. Trends in the magnitudes of the residuals from preliminary statewide regression analyses from previous studies confirmed that regional analyses in this study were similar and related to three primary physiographic provinces. The final regional regression analyses resulted in three sets of equations. For Regions 1 and 2, the basin characteristics of drainage area and basin shape factor were statistically significant. For Region 3, because of the small amount of data from streamgages, only drainage area was statistically significant. Average standard errors of prediction ranged from 28.7 to 38.4 percent for flood region 1, 24.1 to 43.5 percent for flood region 2, and 25.8 to 30.5 percent for region 3. The regional regression equations are only applicable to stream sites in Missouri with flows not significantly affected by regulation, channelization, backwater, diversion, or urbanization. Basins with about 5 percent or less impervious area were considered to be rural. Applicability of the equations is limited to basin characteristic values that range from 0.11 to 8,212.38 square miles (mi2) and basin shape from 2.25 to 26.59 for Region 1, 0.17 to 4,008.92 mi2 and basin shape 2.04 to 26.89 for Region 2, and 2.12 to 2,177.58 mi2 for Region 3. Annual peak data from streamgages were used to qualitatively assess the largest floods recorded at streamgages in Missouri since the 1915 water year. Based on existing streamgage data, the 1983 flood event was the largest flood event on record since 1915. The next five largest flood events, in descending order, took place in 1993, 1973, 2008, 1994 and 1915. Since 1915, five of six of the largest floods on record occurred from 1973 to 2012.
Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston
2016-10-28
Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
Estimating the magnitude and frequency of floods in urban basins in Missouri
Southard, Rodney E.
2010-01-01
Streamgage flood-frequency analyses were done for 35 streamgages on urban streams in and adjacent to Missouri for estimation of the magnitude and frequency of floods in urban areas of Missouri. A log-Pearson Type-III distribution was fitted to the annual series of peak flow data retrieved from the U.S. Geological Survey National Water Information System. For this report, the flood frequency estimates are expressed in terms of percent annual exceedance probabilities of 50, 20, 10, 4, 2, 1, and 0.2. Of the 35 streamgages, 30 are located in Missouri. The remaining five non-Missouri streamgages were added to the dataset to improve the range and applicability of the regression analyses from the streamgage frequency analyses. Ordinary least-squares was used to determine the best set of independent variables for the regression equations. Basin characteristics selected for independent variables into the ordinary least-squares regression analyses were based on theoretical relation to flood flows, literature review of possible basin characteristics, and the ability to measure the basin characteristics using digital datasets and geographic information system technology. Results of the ordinary least-squares were evaluated on the basis of Mallow's Cp statistic, the adjusted coefficient of determination, and the statistical significance of the independent variables. The independent variables of drainage area and percent impervious area were determined to be statistically significant and readily determined from existing digital datasets. The drainage area variable was computed using the best elevation data available, either from a statewide 10-meter grid or high-resolution elevation data from urban areas. The impervious area variable was computed from the National Land Cover Dataset 2001 impervious area dataset. The National Land Cover Dataset 2001 impervious area data for each basin was compared to historical imagery and 7.5-minute topographic maps to verify the national dataset represented the urbanization of the basin at the time streamgage data were collected. Eight streamgages had less urbanization during the period of time streamflow data were collected than was shown on the 2001 dataset. The impervious area values for these eight urban basins were adjusted downward as much as 23 percent to account for the additional urbanization since the streamflow data were collected. Weighted least-squares regression techniques were used to determine the final regression equations for the statewide urban flood-frequency equations. Weighted least-squares techniques improve regression equations by adjusting for different and varying lengths in streamflow records. The final flood-frequency equations for the 50-, 20-, 10-, 4-, 2-, 1-, and 0.2-percent annual exceedance probability floods for Missouri provide a technique for estimating peak flows on urban streams at gaged and ungaged sites. The applicability of the equations is limited by the range in basin characteristics used to develop the regression equations. The range in drainage area is 0.28 to 189 square miles; range in impervious area is 2.3 to 46.0 percent. Seven of the 35 selected streamgages were used to compare the results of the existing rural and urban equations to the urban equations presented in this report for the 1-percent annual exceedance probability. Results of the comparison indicate that the estimated peak flows for the urban equation in this report ranged from 3 to 52 percent higher than the results from the rural equations. 
Comparing the estimated urban peak flows from this report to the existing urban equation developed in 1986 indicated the range was 255 percent lower to 10 percent higher. The overall comparison between the current (2010) and 1986 urban equations indicates a reduction in estimated peak flow values for the 1-percent annual exceedance probability flood.
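A minimal sketch of the weighted least-squares step described above, with streamgages weighted by record length so long-record gages influence the fit more; the data and the simple proportional weights are illustrative assumptions, not the report's actual scheme.

```python
# Sketch of the weighted least-squares step described above: gages are weighted
# by record length, so long-record gages influence the fit more. The data and
# the simple proportional weights are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(5)
n = 35
log_area = rng.uniform(-0.5, 2.3, n)            # log10 drainage area
log_imperv = rng.uniform(0.4, 1.7, n)           # log10 percent impervious area
years = rng.integers(10, 80, n)                 # streamflow-record length
logQ1pct = 1.5 + 0.75 * log_area + 0.4 * log_imperv + rng.normal(0, 0.15, n)

X = np.column_stack([np.ones(n), log_area, log_imperv])
w = years / years.sum()

# WLS: minimize sum_i w_i (y_i - x_i'b)^2 via square-root weighting
sw = np.sqrt(w)
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], logQ1pct * sw, rcond=None)
print("WLS coefficients (intercept, log area, log impervious):", beta_wls)
```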
Investigating bias in squared regression structure coefficients
Nimon, Kim F.; Zientek, Linda R.; Thompson, Bruce
2015-01-01
The importance of structure coefficients and analogs of regression weights for analysis within the general linear model (GLM) has been well-documented. The purpose of this study was to investigate bias in squared structure coefficients in the context of multiple regression and to determine if a formula that had been shown to correct for bias in squared Pearson correlation coefficients and coefficients of determination could be used to correct for bias in squared regression structure coefficients. Using data from a Monte Carlo simulation, this study found that squared regression structure coefficients corrected with Pratt's formula produced less biased estimates and might be more accurate and stable estimates of population squared regression structure coefficients than estimates with no such corrections. While our findings are in line with prior literature that identified multicollinearity as a predictor of bias in squared regression structure coefficients but not coefficients of determination, the findings from this study are unique in that the level of predictive power, number of predictors, and sample size were also observed to contribute bias in squared regression structure coefficients. PMID:26217273
Independent contrasts and PGLS regression estimators are equivalent.
Blomberg, Simon P; Lefevre, James G; Wells, Jessie A; Waterhouse, Mary
2012-05-01
We prove that the slope parameter of the ordinary least squares regression of phylogenetically independent contrasts (PICs) conducted through the origin is identical to the slope parameter of the method of generalized least squares (GLSs) regression under a Brownian motion model of evolution. This equivalence has several implications: 1. Understanding the structure of the linear model for GLS regression provides insight into when and why phylogeny is important in comparative studies. 2. The limitations of the PIC regression analysis are the same as the limitations of the GLS model. In particular, phylogenetic covariance applies only to the response variable in the regression and the explanatory variable should be regarded as fixed. Calculation of PICs for explanatory variables should be treated as a mathematical idiosyncrasy of the PIC regression algorithm. 3. Since the GLS estimator is the best linear unbiased estimator (BLUE), the slope parameter estimated using PICs is also BLUE. 4. If the slope is estimated using different branch lengths for the explanatory and response variables in the PIC algorithm, the estimator is no longer the BLUE, so this is not recommended. Finally, we discuss whether or not and how to accommodate phylogenetic covariance in regression analyses, particularly in relation to the problem of phylogenetic uncertainty. This discussion is from both frequentist and Bayesian perspectives.
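A numpy sketch of the GLS estimator under a Brownian-motion covariance matrix C, the quantity the abstract proves equal to the through-origin OLS slope on independent contrasts; the small tree covariance and trait values below are invented for illustration.

```python
# Sketch of the GLS slope under a Brownian-motion covariance matrix C; the
# 4-species tree covariance and trait values below are invented for illustration.
import numpy as np

# Shared-branch-length covariance for a small hypothetical ultrametric tree
C = np.array([[1.0, 0.6, 0.2, 0.2],
              [0.6, 1.0, 0.2, 0.2],
              [0.2, 0.2, 1.0, 0.7],
              [0.2, 0.2, 0.7, 1.0]])

x = np.array([1.2, 1.5, 2.8, 3.1])      # explanatory trait (treated as fixed)
y = np.array([0.9, 1.4, 2.5, 3.3])      # response trait

X = np.column_stack([np.ones(4), x])
Ci = np.linalg.inv(C)
beta_gls = np.linalg.solve(X.T @ Ci @ X, X.T @ Ci @ y)   # (X'C^-1 X)^-1 X'C^-1 y
# Per the paper, this slope equals the through-origin OLS slope on the PICs.
print("GLS intercept and slope:", beta_gls)
```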
A parameter estimation subroutine package
NASA Technical Reports Server (NTRS)
Bierman, G. J.; Nead, W. M.
1977-01-01
Linear least squares estimation and regression analyses continue to play a major role in orbit determination and related areas. FORTRAN subroutines have been developed to facilitate analyses of a variety of parameter estimation problems. Easy to use multipurpose sets of algorithms are reported that are reasonably efficient and which use a minimal amount of computer storage. Subroutine inputs, outputs, usage and listings are given, along with examples of how these routines can be used.
Estelles-Lopez, Lucia; Ropodi, Athina; Pavlidis, Dimitris; Fotopoulou, Jenny; Gkousari, Christina; Peyrodie, Audrey; Panagou, Efstathios; Nychas, George-John; Mohareb, Fady
2017-09-01
Over the past decade, analytical approaches based on vibrational spectroscopy, hyperspectral/multispectral imaging and biomimetic sensors have gained popularity as rapid and efficient methods for assessing food quality, safety and authentication, and as a sensible alternative to expensive and time-consuming conventional microbiological techniques. Due to the multi-dimensional nature of the data generated from such analyses, the output needs to be coupled with a suitable statistical approach or machine-learning algorithm before the results can be interpreted. Choosing the optimum pattern recognition or machine learning approach for a given analytical platform is often challenging and involves a comparative analysis between various algorithms in order to achieve the best possible prediction accuracy. In this work, "MeatReg", a web-based application, is presented that is able to automate the procedure of identifying the best machine learning method for comparing data from several analytical techniques, to predict the counts of microorganisms responsible for meat spoilage regardless of the packaging system applied. In particular, up to 7 regression methods were applied: ordinary least squares regression, stepwise linear regression, partial least squares regression, principal component regression, support vector regression, random forest and k-nearest neighbours. "MeatReg" was tested with minced beef samples stored under aerobic and modified atmosphere packaging and analysed with electronic nose, HPLC, FT-IR, GC-MS and multispectral imaging instruments. Populations of total viable count, lactic acid bacteria, pseudomonads, Enterobacteriaceae and B. thermosphacta were predicted. As a result, recommendations were obtained on which analytical platforms are suitable for predicting each type of bacteria and which machine learning methods to use in each case. The developed system is accessible via the link: www.sorfml.com. Copyright © 2017 Elsevier Ltd. All rights reserved.
Analysis of the low-flow characteristics of streams in Louisiana
Lee, Fred N.
1985-01-01
The U.S. Geological Survey, in cooperation with the Louisiana Department of Transportation and Development, Office of Public Works, used geologic maps, soils maps, precipitation data, and low-flow data to define four hydrographic regions in Louisiana having distinct low-flow characteristics. Equations were derived, using regression analyses, to estimate the 7Q2, 7Q10, and 7Q20 flow rates for basically unaltered stream basins smaller than 525 square miles. Independent variables in the equations include drainage area (square miles), mean annual precipitation index (inches), and main channel slope (feet per mile). Average standard errors of regression ranged from +44 to +61 percent. Graphs are given for estimating the 7Q2, 7Q10, and 7Q20 for stream basins for which the drainage area of the most downstream data-collection site is larger than 525 square miles. Detailed examples are given in this report for the use of the equations and graphs.
The crux of the method: assumptions in ordinary least squares and logistic regression.
Long, Rebecca G
2008-10-01
Logistic regression has increasingly become the tool of choice when analyzing data with a binary dependent variable. While resources relating to the technique are widely available, clear discussions of why logistic regression should be used in place of ordinary least squares regression are difficult to find. The current paper compares and contrasts the assumptions of ordinary least squares with those of logistic regression and explains why logistic regression's looser assumptions make it adept at handling violations of the more important assumptions in ordinary least squares.
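A minimal contrast of the two models discussed above: an OLS (linear probability) fit and a logistic regression fit to the same simulated binary outcome, showing one practical consequence of the looser assumptions; the data are invented.

```python
# Minimal contrast of OLS (a linear probability fit) and logistic regression on
# the same simulated binary outcome; note the OLS fitted values can leave [0, 1].
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * x)))
y = rng.binomial(1, p)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()              # linearity/homoscedasticity are violated here
logit = sm.Logit(y, X).fit(disp=0)    # models the log-odds instead

print("OLS fitted 'probabilities':   ", ols.fittedvalues.min(), "to", ols.fittedvalues.max())
print("logistic fitted probabilities:", logit.predict(X).min(), "to", logit.predict(X).max())
```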
NASA Astrophysics Data System (ADS)
Solimun
2017-05-01
The aim of this research is to model survival data from kidney-transplant patients using partial least squares (PLS)-Cox regression, which can be applied whether or not the no-multicollinearity assumption is met. The secondary data were obtained from research entitled "Factors affecting the survival of kidney-transplant patients". The research subjects comprised 250 patients. The predictor variables consisted of: age (X1), sex (X2; two categories), prior hemodialysis duration (X3), diabetes (X4; two categories), prior transplantation number (X5), number of blood transfusions (X6), discrepancy score (X7), and use of antilymphocyte globulin (ALG) (X8; two categories), while the response variable was patient survival time (in months). Partial least squares regression is a model that connects the predictor variables X and the response variable y, and it initially aims to determine the relationship between them. Results of the above analyses suggest that the survival of kidney-transplant recipients ranged from 0 to 55 months, with 62% of the patients surviving until they received treatment that lasted for 55 months. The PLS-Cox regression analysis revealed that patients' age and the use of ALG significantly affected the survival time of patients. The factor of patients' age (X1) in the PLS-Cox regression model affected the failure probability by 1.201. This indicates that the probability of dying for elderly patients with a kidney transplant is 1.152 times higher than that for younger patients.
What Is the Relationship between Teacher Quality and Student Achievement? An Exploratory Study
ERIC Educational Resources Information Center
Stronge, James H.; Ward, Thomas J.; Tucker, Pamela D.; Hindman, Jennifer L.
2007-01-01
The major purpose of the study was to examine what constitutes effective teaching as defined by measured increases in student learning with a focus on the instructional behaviors and practices. Ordinary least squares (OLS) regression analyses and hierarchical linear modeling (HLM) were used to identify teacher effectiveness levels while…
Single Parenthood and Children's Reading Performance in Asia
ERIC Educational Resources Information Center
Park, Hyunjoon
2007-01-01
Using the data from the Program for International Student Assessment, I examine the gap in reading performance between 15-year-old students in single-parent and intact families in 5 Asian countries in comparison to the United States. The ordinary least squares regression analyses show negligible disadvantages of students with a single parent in Hong…
A New Global Regression Analysis Method for the Prediction of Wind Tunnel Model Weight Corrections
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Bridge, Thomas M.; Amaya, Max A.
2014-01-01
A new global regression analysis method is discussed that predicts wind tunnel model weight corrections for strain-gage balance loads during a wind tunnel test. The method determines corrections by combining "wind-on" model attitude measurements with least squares estimates of the model weight and center of gravity coordinates that are obtained from "wind-off" data points. The method treats the least squares fit of the model weight separately from the fit of the center of gravity coordinates. Therefore, it performs two fits of "wind-off" data points and uses the least squares estimator of the model weight as an input for the fit of the center of gravity coordinates. Explicit equations for the least squares estimators of the weight and center of gravity coordinates are derived that simplify the implementation of the method in the data system software of a wind tunnel. In addition, recommendations for sets of "wind-off" data points are made that take typical model support system constraints into account. Explicit equations of the confidence intervals on the model weight and center of gravity coordinates and two different error analyses of the model weight prediction are also discussed in the appendices of the paper.
NASA Astrophysics Data System (ADS)
Reis, D. S.; Stedinger, J. R.; Martins, E. S.
2005-10-01
This paper develops a Bayesian approach to analysis of a generalized least squares (GLS) regression model for regional analyses of hydrologic data. The new approach allows computation of the posterior distributions of the parameters and the model error variance using a quasi-analytic approach. Two regional skew estimation studies illustrate the value of the Bayesian GLS approach for regional statistical analysis of a shape parameter and demonstrate that regional skew models can be relatively precise with effective record lengths in excess of 60 years. With Bayesian GLS the marginal posterior distribution of the model error variance and the corresponding mean and variance of the parameters can be computed directly, thereby providing a simple but important extension of the regional GLS regression procedures popularized by Tasker and Stedinger (1989), which is sensitive to the likely values of the model error variance when it is small relative to the sampling error in the at-site estimator.
Shrinkage regression-based methods for microarray missing value imputation.
Wang, Hsiuying; Chiu, Chia-Chun; Wu, Yi-Ching; Wu, Wei-Sheng
2013-01-01
Missing values commonly occur in the microarray data, which usually contain more than 5% missing values with up to 90% of genes affected. Inaccurate missing value estimation results in reducing the power of downstream microarray data analyses. Many types of methods have been developed to estimate missing values. Among them, the regression-based methods are very popular and have been shown to perform better than the other types of methods in many testing microarray datasets. To further improve the performances of the regression-based methods, we propose shrinkage regression-based methods. Our methods take advantage of the correlation structure in the microarray data and select similar genes for the target gene by Pearson correlation coefficients. Besides, our methods incorporate the least squares principle, utilize a shrinkage estimation approach to adjust the coefficients of the regression model, and then use the new coefficients to estimate missing values. Simulation results show that the proposed methods provide more accurate missing value estimation in six testing microarray datasets than the existing regression-based methods do. Imputation of missing values is a very important aspect of microarray data analyses because most of the downstream analyses require a complete dataset. Therefore, exploring accurate and efficient methods for estimating missing values has become an essential issue. Since our proposed shrinkage regression-based methods can provide accurate missing value estimation, they are competitive alternatives to the existing regression-based methods.
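A hedged sketch of the general recipe described above: select the genes most correlated with the target gene, fit least squares on the complete samples, shrink the coefficients, and predict the missing entry. The fixed shrinkage factor is a placeholder, not the authors' estimator, and the expression matrix is random noise for illustration only.

```python
# Hedged sketch of the general recipe described above: select the k genes most
# correlated with the target gene, fit least squares on the complete samples,
# shrink the coefficients, and predict the missing entry. The fixed shrinkage
# factor is a placeholder, not the authors' estimator.
import numpy as np

rng = np.random.default_rng(7)
genes, samples = 200, 30
expr = rng.normal(size=(genes, samples))
target, missing_col, k, shrink = 0, 5, 10, 0.9

observed = np.delete(np.arange(samples), missing_col)

# 1) similar genes by absolute Pearson correlation over the observed samples
corr = np.array([np.corrcoef(expr[target, observed], expr[g, observed])[0, 1]
                 for g in range(genes)])
corr[target] = 0.0
similar = np.argsort(np.abs(corr))[::-1][:k]

# 2) least squares fit of the target gene on the similar genes, then shrinkage
A = np.column_stack([np.ones(observed.size), expr[similar][:, observed].T])
b, *_ = np.linalg.lstsq(A, expr[target, observed], rcond=None)
b[1:] *= shrink                                  # pull slopes toward zero

# 3) impute the missing value from the similar genes' values in that sample
x_new = np.concatenate([[1.0], expr[similar, missing_col]])
print("imputed value:", x_new @ b)
```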
A parameter estimation subroutine package
NASA Technical Reports Server (NTRS)
Bierman, G. J.; Nead, M. W.
1978-01-01
Linear least squares estimation and regression analyses continue to play a major role in orbit determination and related areas. In this report we document a library of FORTRAN subroutines that have been developed to facilitate analyses of a variety of estimation problems. Our purpose is to present an easy to use, multi-purpose set of algorithms that are reasonably efficient and which use a minimal amount of computer storage. Subroutine inputs, outputs, usage and listings are given along with examples of how these routines can be used. The following outline indicates the scope of this report: Section (1) introduction with reference to background material; Section (2) examples and applications; Section (3) subroutine directory summary; Section (4) the subroutine directory user description with input, output, and usage explained; and Section (5) subroutine FORTRAN listings. The routines are compact and efficient and are far superior to the normal equation and Kalman filter data processing algorithms that are often used for least squares analyses.
Publication bias in obesity treatment trials?
Allison, D B; Faith, M S; Gorman, B S
1996-10-01
The present investigation examined the extent of publication bias (namely the tendency to publish significant findings and file away non-significant findings) within the obesity treatment literature. Quantitative literature synthesis of four published meta-analyses from the obesity treatment literature. Interventions in these studies included pharmacological, educational, child, and couples treatments. To assess publication bias, several regression procedures (for example weighted least-squares, random-effects multi-level modeling, and robust regression methods) were used to regress effect sizes onto their standard errors, or proxies thereof, within each of the four meta-analyses. A significant positive beta weight in these analyses signified publication bias. There was evidence for publication bias within two of the four published meta-analyses, such that reviews of published studies were likely to overestimate clinical efficacy. The lack of evidence for publication bias within the two other meta-analyses might have been due to insufficient statistical power rather than the absence of selection bias. As in other disciplines, publication bias appears to exist in the obesity treatment literature. Suggestions are offered for managing publication bias once identified or reducing its likelihood in the first place.
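A sketch of the core regression diagnostic described above: effect sizes regressed on their standard errors with inverse-variance weights, where a clearly positive slope is read as a sign of publication bias. The simulated effects are illustrative, not data from the four meta-analyses.

```python
# Sketch of the core diagnostic described above: regress effect sizes on their
# standard errors with inverse-variance weights; a clearly positive slope
# suggests smaller, noisier studies report larger effects. The simulated
# effects are illustrative, not data from the four meta-analyses.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
k = 40
se = rng.uniform(0.05, 0.5, k)                    # per-study standard errors
effect = 0.3 + 1.2 * se + rng.normal(0, se)       # built-in small-study effect

X = sm.add_constant(se)
fit = sm.WLS(effect, X, weights=1.0 / se**2).fit()
print("slope of effect size on SE:", fit.params[1], " p-value:", fit.pvalues[1])
```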
Sparse partial least squares regression for simultaneous dimension reduction and variable selection
Chun, Hyonho; Keleş, Sündüz
2010-01-01
Partial least squares regression has been an alternative to ordinary least squares for handling multicollinearity in several areas of scientific research since the 1960s. It has recently gained much attention in the analysis of high dimensional genomic data. We show that known asymptotic consistency of the partial least squares estimator for a univariate response does not hold with the very large p and small n paradigm. We derive a similar result for a multivariate response regression with partial least squares. We then propose a sparse partial least squares formulation which aims simultaneously to achieve good predictive performance and variable selection by producing sparse linear combinations of the original predictors. We provide an efficient implementation of sparse partial least squares regression and compare it with well-known variable selection and dimension reduction approaches via simulation experiments. We illustrate the practical utility of sparse partial least squares regression in a joint analysis of gene expression and genomewide binding data. PMID:20107611
Perry, Charles A.; Wolock, David M.; Artman, Joshua C.
2004-01-01
Streamflow statistics of flow duration and peak-discharge frequency were estimated for 4,771 individual locations on streams listed on the 1999 Kansas Surface Water Register. These statistics included the flow-duration values of 90, 75, 50, 25, and 10 percent, as well as the mean flow value. Peak-discharge frequency values were estimated for the 2-, 5-, 10-, 25-, 50-, and 100-year floods. Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating flow-duration values of 90, 75, 50, 25, and 10 percent and the mean flow for uncontrolled flow stream locations. The contributing-drainage areas of 149 U.S. Geological Survey streamflow-gaging stations in Kansas and parts of surrounding States that had flow uncontrolled by Federal reservoirs and used in the regression analyses ranged from 2.06 to 12,004 square miles. Logarithmic transformations of climatic and basin data were performed to yield the best linear relation for developing equations to compute flow durations and mean flow. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were contributing-drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. The analyses yielded a model standard error of prediction range of 0.43 logarithmic units for the 90-percent duration analysis to 0.15 logarithmic units for the 10-percent duration analysis. The model standard error of prediction was 0.14 logarithmic units for the mean flow. Regression equations used to estimate peak-discharge frequency values were obtained from a previous report, and estimates for the 2-, 5-, 10-, 25-, 50-, and 100-year floods were determined for this report. The regression equations and an interpolation procedure were used to compute flow durations, mean flow, and estimates of peak-discharge frequency for locations along uncontrolled flow streams on the 1999 Kansas Surface Water Register. Flow durations, mean flow, and peak-discharge frequency values determined at available gaging stations were used to interpolate the regression-estimated flows for the stream locations where available. Streamflow statistics for locations that had uncontrolled flow were interpolated using data from gaging stations weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled reaches of Kansas streams, the streamflow statistics were interpolated between gaging stations using only gaged data weighted by drainage area.
Random regression analyses using B-splines to model growth of Australian Angus cattle
Meyer, Karin
2005-01-01
Regression on the basis function of B-splines has been advocated as an alternative to orthogonal polynomials in random regression analyses. Basic theory of splines in mixed model analyses is reviewed, and estimates from analyses of weights of Australian Angus cattle from birth to 820 days of age are presented. Data comprised 84 533 records on 20 731 animals in 43 herds, with a high proportion of animals with 4 or more weights recorded. Changes in weights with age were modelled through B-splines of age at recording. A total of thirteen analyses, considering different combinations of linear, quadratic and cubic B-splines and up to six knots, were carried out. Results showed good agreement for all ages with many records, but fluctuated where data were sparse. On the whole, analyses using B-splines appeared more robust against "end-of-range" problems and yielded more consistent and accurate estimates of the first eigenfunctions than previous, polynomial analyses. A model fitting quadratic B-splines, with knots at 0, 200, 400, 600 and 821 days and a total of 91 covariance components, appeared to be a good compromise between detailedness of the model, number of parameters to be estimated, plausibility of results, and fit, measured as residual mean square error. PMID:16093011
Guelpa, Anina; Bevilacqua, Marta; Marini, Federico; O'Kennedy, Kim; Geladi, Paul; Manley, Marena
2015-04-15
It has been established in this study that the Rapid Visco Analyser (RVA) can describe maize hardness, irrespective of the RVA profile, when used in association with appropriate multivariate data analysis techniques. Therefore, the RVA can complement or replace current and/or conventional methods as a hardness descriptor. Hardness modelling based on RVA viscograms was carried out using seven conventional hardness methods (hectoliter mass (HLM), hundred kernel mass (HKM), particle size index (PSI), percentage vitreous endosperm (%VE), protein content, percentage chop (%chop) and near infrared (NIR) spectroscopy) as references and three different RVA profiles (hard, soft and standard) as predictors. An approach using locally weighted partial least squares (LW-PLS) was followed to build the regression models. The resulting prediction errors (root mean square error of cross-validation (RMSECV) and root mean square error of prediction (RMSEP)) for the quantification of hardness values were always lower than, or of the same order as, the laboratory error of the reference method. Copyright © 2014 Elsevier Ltd. All rights reserved.
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen
2016-02-01
Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation-partial least squares regression (PLSR) method effectively solves the information loss problem of correlation-multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R² = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.
Why Do They Stay? Job Tenure among Certified Nursing Assistants in Nursing Homes
ERIC Educational Resources Information Center
Wiener, Joshua M.; Squillace, Marie R.; Anderson, Wayne L.; Khatutsky, Galina
2009-01-01
Purpose: This study identifies factors related to job tenure among certified nursing assistants (CNAs) working in nursing homes. Design and Methods: The study uses 2004 data from the National Nursing Home Survey, the National Nursing Assistant Survey, and the Area Resource File. Ordinary least squares regression analyses were conducted with length…
ERIC Educational Resources Information Center
Kaur, Berinderjeet; Areepattamannil, Shaljan
2012-01-01
This study, drawing on data from the Programme for International Student Assessment (PISA) 2009, explored the influences of metacognitive and self-regulated learning strategies for reading on mathematical literacy of adolescents in Australia and Singapore. Ordinary least squares (OLS) regression analyses revealed the positive influences of…
ERIC Educational Resources Information Center
Areepattamannil, Shaljan; Kaur, Berinderjeet
2012-01-01
This study, drawing on data from the Trends in International Mathematics and Science Study (TIMSS) 2007, examined the influences of self-perceived competence in mathematics and positive affect toward mathematics on mathematics achievement of adolescents in Singapore. Ordinary least squares (OLS) regression analyses revealed the positive influences…
Libiger, Ondrej; Schork, Nicholas J.
2015-01-01
It is now feasible to examine the composition and diversity of microbial communities (i.e., “microbiomes”) that populate different human organs and orifices using DNA sequencing and related technologies. To explore the potential links between changes in microbial communities and various diseases in the human body, it is essential to test associations involving different species within and across microbiomes, environmental settings and disease states. Although a number of statistical techniques exist for carrying out relevant analyses, it is unclear which of these techniques exhibit the greatest statistical power to detect associations given the complexity of most microbiome datasets. We compared the statistical power of principal component regression, partial least squares regression, regularized regression, distance-based regression, Hill's diversity measures, and a modified test implemented in the popular and widely used microbiome analysis methodology “Metastats” across a wide range of simulated scenarios involving changes in feature abundance between two sets of metagenomic samples. For this purpose, simulation studies were used to change the abundance of microbial species in a real dataset from a published study examining human hands. Each technique was applied to the same data, and its ability to detect the simulated change in abundance was assessed. We hypothesized that a small subset of methods would outperform the rest in terms of the statistical power. Indeed, we found that the Metastats technique modified to accommodate multivariate analysis and partial least squares regression yielded high power under the models and data sets we studied. The statistical power of diversity measure-based tests, distance-based regression and regularized regression was significantly lower. Our results provide insight into powerful analysis strategies that utilize information on species counts from large microbiome data sets exhibiting skewed frequency distributions obtained on a small to moderate number of samples. PMID:26734061
Using Weighted Least Squares Regression for Obtaining Langmuir Sorption Constants
USDA-ARS?s Scientific Manuscript database
One of the most commonly used models for describing phosphorus (P) sorption to soils is the Langmuir model. To obtain model parameters, the Langmuir model is fit to measured sorption data using least squares regression. Least squares regression is based on several assumptions including normally dist...
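A hedged sketch of one way to do this: fit the linearized Langmuir isotherm C/S = C/Smax + 1/(K·Smax) by weighted least squares on simulated sorption data. The delta-method inverse-variance weights are a reasonable generic choice, not necessarily the manuscript's.

```python
# Hedged sketch: fit the linearized Langmuir isotherm C/S = C/Smax + 1/(K*Smax)
# by weighted least squares on simulated sorption data. The delta-method
# inverse-variance weights are one reasonable choice, not necessarily the
# manuscript's.
import numpy as np

rng = np.random.default_rng(9)
Smax_true, K_true = 120.0, 0.08
C = np.linspace(2, 100, 12)                       # equilibrium concentration
S = Smax_true * K_true * C / (1 + K_true * C) + rng.normal(0, 2, C.size)

yv = C / S
X = np.column_stack([np.ones(C.size), C])

# Var(C/S) ~ (C/S^2)^2 Var(S) by the delta method, so weight by S^4 / C^2
w = S**4 / C**2
sw = np.sqrt(w)
beta, *_ = np.linalg.lstsq(X * sw[:, None], yv * sw, rcond=None)
intercept, slope = beta

print("estimated Smax:", 1.0 / slope, "  estimated K:", slope / intercept)
```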
Two Enhancements of the Logarithmic Least-Squares Method for Analyzing Subjective Comparisons
1989-03-25
error term. For this model, the total sum of squares (SSTO), defined as $\mathrm{SSTO} = \sum_{i=1}^{n} (y_i - \bar{y})^2$, can be partitioned into error and regression sums...of the regression line around the mean value. Mathematically, for the model given by equation A.4, $\mathrm{SSTO} = \mathrm{SSE} + \mathrm{SSR}$ (A.6), where SSTO is the total...sum of squares (i.e., the variance of the $y_i$'s), SSE is the error sum of squares, and SSR is the regression sum of squares. SSTO, SSE, and SSR are given
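A quick numerical check of the SSTO = SSE + SSR decomposition quoted in the snippet, for a simple least-squares line fit on simulated data.

```python
# Quick numerical check of SSTO = SSE + SSR for a least-squares line fit.
import numpy as np

rng = np.random.default_rng(10)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.8 * x + rng.normal(0, 1, 50)

X = np.column_stack([np.ones(50), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ beta

SSTO = np.sum((y - y.mean())**2)
SSE = np.sum((y - yhat)**2)
SSR = np.sum((yhat - y.mean())**2)
print(SSTO, SSE + SSR)   # equal up to floating-point rounding
```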
Wang, L; Qin, X C; Lin, H C; Deng, K F; Luo, Y W; Sun, Q R; Du, Q X; Wang, Z Y; Tuo, Y; Sun, J H
2018-02-01
To analyse the relationship between the Fourier transform infrared (FTIR) spectrum of rat spleen tissue and the postmortem interval (PMI) for PMI estimation, using FTIR spectroscopy combined with a data mining method. Rats were sacrificed by cervical dislocation, and the cadavers were placed at 20 ℃. The FTIR spectra of the rats' spleen tissues were measured at different time points. After pretreatment, the data were analysed by data mining methods. The absorption peak intensity of the rat spleen tissue spectrum changed with the PMI, while the absorption peak position was unchanged. The results of principal component analysis (PCA) showed that the cumulative contribution rate of the first three principal components was 96%. There was an obvious clustering tendency for the spectrum samples at each time point. The methods of partial least squares discriminant analysis (PLS-DA) and support vector machine classification (SVMC) effectively divided the spectrum samples with different PMIs into four categories (0-24 h, 48-72 h, 96-120 h and 144-168 h). The determination coefficient (R²) of the PMI estimation model established by PLS regression analysis was 0.96, and the root mean square error of calibration (RMSEC) and root mean square error of cross validation (RMSECV) were 9.90 h and 11.39 h, respectively. In the prediction set, the R² was 0.97, and the root mean square error of prediction (RMSEP) was 10.49 h. The FTIR spectrum of rat spleen tissue can be effectively analysed qualitatively and quantitatively by combining FTIR spectroscopy with data mining methods, and classification and PLS regression models can be established for PMI estimation. Copyright© by the Editorial Department of Journal of Forensic Medicine.
ERIC Educational Resources Information Center
Helmreich, James E.; Krog, K. Peter
2018-01-01
We present a short, inquiry-based learning course on concepts and methods underlying ordinary least squares (OLS), least absolute deviation (LAD), and quantile regression (QR). Students investigate squared, absolute, and weighted absolute distance functions (metrics) as location measures. Using differential calculus and properties of convex…
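A hedged sketch of the three distance functions such a course investigates, fitting a line by direct minimization of squared, absolute, and quantile (pinball) losses on simulated heavy-tailed data:

```python
# Sketch: fit a line by minimizing squared, absolute, and "pinball"
# (quantile) distance functions.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 80)
y = 1.0 + 2.0 * x + rng.standard_t(df=2, size=80)   # heavy-tailed noise

def fit(loss):
    res = minimize(lambda b: loss(y - (b[0] + b[1] * x)), x0=[0.0, 1.0],
                   method="Nelder-Mead")
    return res.x

ols = fit(lambda r: np.sum(r ** 2))                      # least squares
lad = fit(lambda r: np.sum(np.abs(r)))                   # least absolute deviation
q90 = fit(lambda r: np.sum(np.maximum(0.9 * r, (0.9 - 1) * r)))  # 0.9 quantile loss

print("OLS:", ols, "LAD:", lad, "Q(0.9):", q90)
```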
Second Chance Education Matters! Income Trajectories of Poorly Educated Non-Nordics in Sweden
ERIC Educational Resources Information Center
Nordlund, Madelene; Bonfanti, Sara; Strandh, Mattias
2015-01-01
In this study we examine the long-term impact of second chance education (SCE) on incomes of poorly educated individuals who live in Sweden but were not born in a Nordic country, using data on income changes from 1992 to 2003 compiled by Statistics Sweden. Ordinary Least Squares regression analyses show that participation in SCE increased the work…
ERIC Educational Resources Information Center
Mugrage, Beverly; And Others
Three ridge regression solutions are compared with ordinary least squares regression and with principal components regression using all components. Ridge regression, particularly the Lawless-Wang solution, out-performed ordinary least squares regression and the principal components solution on the criteria of stability of coefficient and closeness…
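A minimal comparison in the same spirit, assuming simulated collinear predictors; the ridge penalty here is a fixed value rather than the Lawless-Wang solution discussed in the abstract, and the principal components regression retains all components.

```python
# Sketch comparing ordinary least squares, ridge regression, and principal
# components regression (all components) on deliberately collinear predictors.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 100
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n) for _ in range(5)])  # collinear
y = X @ np.array([1.0, 0.5, -0.5, 0.2, 0.1]) + rng.normal(size=n)

models = {
    "OLS": LinearRegression(),
    "Ridge (alpha=1.0)": Ridge(alpha=1.0),
    "PCR (all components)": make_pipeline(PCA(), LinearRegression()),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {score:.3f}")
```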
Estimates of Median Flows for Streams on the 1999 Kansas Surface Water Register
Perry, Charles A.; Wolock, David M.; Artman, Joshua C.
2004-01-01
The Kansas State Legislature, by enacting Kansas Statute KSA 82a-2001 et seq., mandated the criteria for determining which Kansas stream segments would be subject to classification by the State. One criterion for selection as a classified stream segment is based on the statistic of median flow being equal to or greater than 1 cubic foot per second. As specified by KSA 82a-2001 et seq., median flows were determined from U.S. Geological Survey streamflow-gaging-station data by using the most-recent 10 years of gaged data (KSA) for each streamflow-gaging station. Median flows also were determined by using gaged data from the entire period of record (all-available hydrology, AAH). Least-squares multiple regression techniques were used, along with Tobit analyses, to develop equations for estimating median flows for uncontrolled stream segments. The drainage area of the gaging stations on uncontrolled stream segments used in the regression analyses ranged from 2.06 to 12,004 square miles. A logarithmic transformation of the data was needed to develop the best linear relation for computing median flows. In the regression analyses, the significant climatic and basin characteristics, in order of importance, were drainage area, mean annual precipitation, mean basin permeability, and mean basin slope. Tobit analyses of KSA data yielded a model standard error of prediction of 0.285 logarithmic units, and the best equations using Tobit analyses of AAH data had a model standard error of prediction of 0.250 logarithmic units. These regression equations and an interpolation procedure were used to compute median flows for the uncontrolled stream segments on the 1999 Kansas Surface Water Register. Measured median flows from gaging stations were incorporated into the regression-estimated median flows along the stream segments where available. The segments that were uncontrolled were interpolated using gaged data weighted according to the drainage area and the bias between the regression-estimated and gaged flow information. On controlled segments of Kansas streams, the median flow information was interpolated between gaging stations using only gaged data weighted by drainage area. Of the 2,232 total stream segments on the Kansas Surface Water Register, 34.5 percent of the segments had an estimated median streamflow of less than 1 cubic foot per second when the KSA analysis was used. When the AAH analysis was used, 36.2 percent of the segments had an estimated median streamflow of less than 1 cubic foot per second. This report supersedes U.S. Geological Survey Water-Resources Investigations Report 02-4292.
Estimation of Magnitude and Frequency of Floods for Streams on the Island of Oahu, Hawaii
Wong, Michael F.
1994-01-01
This report describes techniques for estimating the magnitude and frequency of floods for the island of Oahu. The log-Pearson Type III distribution and methodology recommended by the Interagency Committee on Water Data was used to determine the magnitude and frequency of floods at 79 gaging stations that had 11 to 72 years of record. Multiple regression analysis was used to construct regression equations to transfer the magnitude and frequency information from gaged sites to ungaged sites. Oahu was divided into three hydrologic regions to define relations between peak discharge and drainage-basin and climatic characteristics. Regression equations are provided to estimate the 2-, 5-, 10-, 25-, 50-, and 100-year peak discharges at ungaged sites. Significant basin and climatic characteristics included in the regression equations are drainage area, median annual rainfall, and the 2-year, 24-hour rainfall intensity. Drainage areas for sites used in this study ranged from 0.03 to 45.7 square miles. Standard error of prediction for the regression equations ranged from 34 to 62 percent. Peak-discharge data collected through water year 1988, geographic information system (GIS) technology, and generalized least-squares regression were used in the analyses. The use of GIS seems to be a more flexible and consistent means of defining and calculating basin and climatic characteristics than using manual methods. Standard errors of estimate for the regression equations in this report are an average of 8 percent less than those published in previous studies.
Yu, Peigen; Low, Mei Yin; Zhou, Weibiao
2018-01-01
In order to develop products that would be preferred by consumers, the effects of the chemical compositions of ready-to-drink green tea beverages on consumer liking were studied through regression analyses. Green tea model systems were prepared by dosing solutions of 0.1% green tea extract with differing concentrations of eight flavour keys deemed to be important for green tea aroma and taste, based on a D-optimal experimental design, before undergoing commercial sterilisation. Sensory evaluation of the green tea model system was carried out using an untrained consumer panel to obtain hedonic liking scores of the samples. Regression models were subsequently trained to objectively predict the consumer liking scores of the green tea model systems. A linear partial least squares (PLS) regression model was developed to describe the effects of the eight flavour keys on consumer liking, with a coefficient of determination (R 2 ) of 0.733, and a root-mean-square error (RMSE) of 3.53%. The PLS model was further augmented with an artificial neural network (ANN) to establish a PLS-ANN hybrid model. The established hybrid model was found to give a better prediction of consumer liking scores, based on its R 2 (0.875) and RMSE (2.41%). Copyright © 2017 Elsevier Ltd. All rights reserved.
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
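The sketch below contrasts the OLS slope with the major-axis slope obtained from the leading eigenvector of the sample covariance matrix, which minimizes squared orthogonal distances; the data are simulated for illustration.

```python
# Sketch: ordinary least squares slope versus the major-axis (orthogonal
# regression) slope obtained from the first principal component.
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(scale=0.8, size=200)

# OLS slope (minimizes vertical squared distances)
ols_slope = np.polyfit(x, y, 1)[0]

# Major axis: direction of the leading eigenvector of the covariance matrix,
# which minimizes squared orthogonal distances to the fitted line.
cov = np.cov(x, y)
eigvals, eigvecs = np.linalg.eigh(cov)
major = eigvecs[:, np.argmax(eigvals)]
ma_slope = major[1] / major[0]

print(f"OLS slope: {ols_slope:.3f}, major-axis slope: {ma_slope:.3f}")
```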
Magnitude and frequency of floods in Arkansas
Hodge, Scott A.; Tasker, Gary D.
1995-01-01
Methods are presented for estimating the magnitude and frequency of peak discharges of streams in Arkansas. Regression analyses were developed in which a stream's physical and flood characteristics were related. Four sets of regional regression equations were derived to predict peak discharges with selected recurrence intervals of 2, 5, 10, 25, 50, 100, and 500 years on streams draining less than 7,770 square kilometers. The regression analyses indicate that size of drainage area, main channel slope, mean basin elevation, and the basin shape factor were the most significant basin characteristics that affect magnitude and frequency of floods. The region of influence method is included in this report. This method is still being improved and is to be considered only as a second alternative to the standard method of producing regional regression equations. This method estimates unique regression equations for each recurrence interval for each ungaged site. The regression analyses indicate that size of drainage area, main channel slope, mean annual precipitation, mean basin elevation, and the basin shape factor were the most significant basin and climatic characteristics that affect magnitude and frequency of floods for this method. Certain recommendations on the use of this method are provided. A method is described for estimating the magnitude and frequency of peak discharges of streams for urban areas in Arkansas. The method is from a nationwide U.S. Geological Survey flood frequency report which uses urban basin characteristics combined with rural discharges to estimate urban discharges. Annual peak discharges from 204 gaging stations, with drainage areas less than 7,770 square kilometers and at least 10 years of unregulated record, were used in the analysis. These data provide the basis for this analysis and are published in the Appendix of this report as supplemental data. Large rivers such as the Red, Arkansas, White, Black, St. Francis, Mississippi, and Ouachita Rivers have floodflow characteristics that differ from those of smaller tributary streams and were treated individually. Regional regression equations are not applicable to these large rivers. The magnitude and frequency of floods along these rivers are based on specific station data. This section is provided in the Appendix and has not been updated since the last Arkansas flood frequency report (1987b), but is included at the request of the cooperator.
Eash, David A.; Barnes, Kimberlee K.
2017-01-01
A statewide study was conducted to develop regression equations for estimating six selected low-flow frequency statistics and harmonic mean flows for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include: the annual 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years, the annual 30-day mean low flow for a recurrence interval of 5 years, and the seasonal (October 1 through December 31) 1- and 7-day mean low flows for a recurrence interval of 10 years. Estimation equations also were developed for the harmonic-mean-flow statistic. Estimates of these seven selected statistics are provided for 208 U.S. Geological Survey continuous-record streamgages using data through September 30, 2006. The study area comprises streamgages located within Iowa and 50 miles beyond the State's borders. Because trend analyses indicated statistically significant positive trends when considering the entire period of record for the majority of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. The median number of years of record used to compute each of these seven selected statistics was 35. Geographic information system software was used to measure 54 selected basin characteristics for each streamgage. Following the removal of two streamgages from the initial data set, data collected for 206 streamgages were compiled to investigate three approaches for regionalization of the seven selected statistics. Regionalization, a process using statistical regression analysis, provides a relation for efficiently transferring information from a group of streamgages in a region to ungaged sites in the region. The three regionalization approaches tested included statewide, regional, and region-of-influence regressions. For the regional regression, the study area was divided into three low-flow regions on the basis of hydrologic characteristics, landform regions, and soil regions. A comparison of root mean square errors and average standard errors of prediction for the statewide, regional, and region-of-influence regressions determined that the regional regression provided the best estimates of the seven selected statistics at ungaged sites in Iowa. Because a significant number of streams in Iowa reach zero flow as their minimum flow during low-flow years, four different types of regression analyses were used: left-censored, logistic, generalized-least-squares, and weighted-least-squares regression. A total of 192 streamgages were included in the development of 27 regression equations for the three low-flow regions. For the northeast and northwest regions, a censoring threshold was used to develop 12 left-censored regression equations to estimate the 6 low-flow frequency statistics for each region. For the southern region a total of 12 regression equations were developed; 6 logistic regression equations were developed to estimate the probability of zero flow for the 6 low-flow frequency statistics and 6 generalized least-squares regression equations were developed to estimate the 6 low-flow frequency statistics, if nonzero flow is estimated first by use of the logistic equations. A weighted-least-squares regression equation was developed for each region to estimate the harmonic-mean-flow statistic. 
Average standard errors of estimate for the left-censored equations for the northeast region range from 64.7 to 88.1 percent and for the northwest region range from 85.8 to 111.8 percent. Misclassification percentages for the logistic equations for the southern region range from 5.6 to 14.0 percent. Average standard errors of prediction for generalized least-squares equations for the southern region range from 71.7 to 98.9 percent and pseudo coefficients of determination for the generalized-least-squares equations range from 87.7 to 91.8 percent. Average standard errors of prediction for weighted-least-squares equations developed for estimating the harmonic-mean-flow statistic for each of the three regions range from 66.4 to 80.4 percent. The regression equations are applicable only to stream sites in Iowa with low flows not significantly affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. If the equations are used at ungaged sites on regulated streams, or on streams affected by water-supply and agricultural withdrawals, then the estimates will need to be adjusted by the amount of regulation or withdrawal to estimate the actual flow conditions if that is of interest. Caution is advised when applying the equations for basins with characteristics near the applicable limits of the equations and for basins located in karst topography. A test of two drainage-area ratio methods using 31 pairs of streamgages, for the annual 7-day mean low-flow statistic for a recurrence interval of 10 years, indicates a weighted drainage-area ratio method provides better estimates than regional regression equations for an ungaged site on a gaged stream in Iowa when the drainage-area ratio is between 0.5 and 1.4. These regression equations will be implemented within the U.S. Geological Survey StreamStats web-based geographic-information-system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the seven selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these seven selected statistics are provided for the streamgage.
ERIC Educational Resources Information Center
Kaljee, Linda M.; Green, Mackenzie S.; Zhan, Min; Riel, Rosemary; Lerdboon, Porntip; Lostutter, Ty W.; Tho, Le Huu; Luong, Vo Van; Minh, Truong Tan
2011-01-01
A randomly selected cross-sectional survey was conducted with 880 youth (16 to 24 years) in Nha Trang City to assess relationships between alcohol consumption and sexual behaviors. A timeline followback method was employed. Chi-square, generalized logit modeling and logistic regression analyses were performed. Of the sample, 78.2% male and 56.1%…
Plant selection for ethnobotanical uses on the Amalfi Coast (Southern Italy).
Savo, V; Joy, R; Caneva, G; McClatchey, W C
2015-07-15
Many ethnobotanical studies have investigated selection criteria for medicinal and non-medicinal plants. In this paper we test several statistical methods using different ethnobotanical datasets in order to 1) define to which extent the nature of the datasets can affect the interpretation of results; 2) determine if the selection for different plant uses is based on phylogeny, or other selection criteria. We considered three different ethnobotanical datasets: two datasets of medicinal plants and a dataset of non-medicinal plants (handicraft production, domestic and agro-pastoral practices) and two floras of the Amalfi Coast. We performed residual analysis from linear regression, the binomial test and the Bayesian approach for calculating under-used and over-used plant families within ethnobotanical datasets. Percentages of agreement were calculated to compare the results of the analyses. We also analyzed the relationship between plant selection and phylogeny, chorology, life form and habitat using the chi-square test. Pearson's residuals for each of the significant chi-square analyses were examined for investigating alternative hypotheses of plant selection criteria. The three statistical analysis methods differed within the same dataset, and between different datasets and floras, but with some similarities. In the two medicinal datasets, only Lamiaceae was identified in both floras as an over-used family by all three statistical methods. All statistical methods in one flora agreed that Malvaceae was over-used and Poaceae under-used, but this was not found to be consistent with results of the second flora in which one statistical result was non-significant. All other families had some discrepancy in significance across methods, or floras. Significant over- or under-use was observed in only a minority of cases. The chi-square analyses were significant for phylogeny, life form and habitat. Pearson's residuals indicated a non-random selection of woody species for non-medicinal uses and an under-use of plants of temperate forests for medicinal uses. Our study showed that selection criteria for plant uses (including medicinal) are not always based on phylogeny. The comparison of different statistical methods (regression, binomial and Bayesian) under different conditions led to the conclusion that the most conservative results are obtained using regression analysis.
Food and Drug Administration tobacco regulation and product judgments.
Kaufman, Annette R; Finney Rutten, Lila J; Parascandola, Mark; Blake, Kelly D; Augustson, Erik M
2015-04-01
The Family Smoking Prevention and Tobacco Control Act granted the Food and Drug Administration (FDA) the authority to regulate tobacco products in the U.S. However, little is known about how regulation may be related to judgments about tobacco product-related risks. To understand how FDA tobacco regulation beliefs are associated with judgments about tobacco product-related risks. The Health Information National Trends Survey is a national survey of the U.S. adult population. Data used in this analysis were collected from October 2012 through January 2013 (N=3,630) by mailed questionnaire and analyzed in 2013. Weighted bivariate chi-square analyses were used to assess associations among FDA regulation belief, tobacco harm judgments, sociodemographics, and smoking status. A weighted multinomial logistic regression was conducted where FDA regulation belief was regressed on tobacco product judgments, controlling for sociodemographic variables and smoking status. About 41% believed that the FDA regulates tobacco products in the U.S., 23.6% reported the FDA does not, and 35.3% did not know. Chi-square analyses showed that smoking status was significantly related to harm judgments about electronic cigarettes (p<0.0001). The multinomial logistic regression revealed that uncertainty about FDA regulation was associated with tobacco product harm judgment uncertainty. Tobacco product harm perceptions are associated with beliefs about tobacco product regulation by the FDA. These findings suggest the need for increased public awareness and understanding of the role of tobacco product regulation in protecting public health. Copyright © 2015. Published by Elsevier Inc.
Katsarov, Plamen; Gergov, Georgi; Alin, Aylin; Pilicheva, Bissera; Al-Degs, Yahya; Simeonov, Vasil; Kassarova, Margarita
2018-03-01
The prediction power of partial least squares (PLS) and multivariate curve resolution-alternating least squares (MCR-ALS) methods have been studied for simultaneous quantitative analysis of the binary drug combination - doxylamine succinate and pyridoxine hydrochloride. Analysis of first-order UV overlapped spectra was performed using different PLS models - classical PLS1 and PLS2 as well as partial robust M-regression (PRM). These linear models were compared to MCR-ALS with equality and correlation constraints (MCR-ALS-CC). All techniques operated within the full spectral region and extracted maximum information for the drugs analysed. The developed chemometric methods were validated on external sample sets and were applied to the analyses of pharmaceutical formulations. The obtained statistical parameters were satisfactory for calibration and validation sets. All developed methods can be successfully applied for simultaneous spectrophotometric determination of doxylamine and pyridoxine both in laboratory-prepared mixtures and commercial dosage forms.
Carpani, Irene; Conti, Paolo; Lanteri, Silvia; Legnani, Pier Paolo; Leoni, Erica; Tonelli, Domenica
2008-02-28
A home-made microelectrode array, based on reticulated vitreous carbon, was used as working electrode in square wave voltammetry experiments to quantify the bacterial load of Escherichia coli ATCC 13706 and Pseudomonas aeruginosa ATCC 27853, chosen as test microorganisms, in synthetic samples similar to drinking water (phosphate buffer). Raw electrochemical signals were analysed with partial least squares regression coupled to variable selection in order to correlate these values with the bacterial load estimated by aerobic plate counting. The results demonstrated the ability of the method to detect even low loads of microorganisms in synthetic water samples. In particular, the model detects the bacterial load in the range 3-2,020 CFU ml(-1) for E. coli and in the range 76-155,556 CFU ml(-1) for P. aeruginosa.
A Simple Introduction to Moving Least Squares and Local Regression Estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garimella, Rao Veerabhadra
In this brief note, a highly simplified introduction to estimating functions over a set of particles is presented. The note starts from Global Least Squares fitting, going on to Moving Least Squares estimation (MLS) and, finally, Local Regression Estimation (LRE).
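A minimal moving least squares (local linear regression) sketch, assuming a Gaussian weight kernel and simulated one-dimensional data; the bandwidth and basis are illustrative choices, not those of the note.

```python
# Sketch of a moving least squares (local linear regression) estimate:
# at each query point, fit a weighted linear polynomial using a Gaussian
# kernel and evaluate it at that point.
import numpy as np

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 2 * np.pi, 120))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

def mls_estimate(x_query, x, y, bandwidth=0.5):
    estimates = []
    for x0 in np.atleast_1d(x_query):
        w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)   # kernel weights
        A = np.column_stack([np.ones_like(x), x - x0])    # local linear basis
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        estimates.append(coef[0])                         # fitted value at x0
    return np.array(estimates)

xq = np.linspace(0, 2 * np.pi, 5)
print(np.round(mls_estimate(xq, x, y), 3))
print(np.round(np.sin(xq), 3))  # reference values for comparison
```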
Stamey, Timothy C.
1998-01-01
Simple and reliable methods for estimating hourly streamflow are needed for the calibration and verification of a Chattahoochee River basin model between Buford Dam and Franklin, Ga. The river basin model is being developed by Georgia Department of Natural Resources, Environmental Protection Division, as part of their Chattahoochee River Modeling Project. Concurrent streamflow data collected at 19 continuous-record, and 31 partial-record streamflow stations, were used in ordinary least-squares linear regression analyses to define estimating equations, and in verifying drainage-area prorations. The resulting regression or drainage-area ratio estimating equations were used to compute hourly streamflow at the partial-record stations. The coefficients of determination (r-squared values) for the regression estimating equations ranged from 0.90 to 0.99. Observed and estimated hourly and daily streamflow data were computed for May 1, 1995, through October 31, 1995. Comparisons of observed and estimated daily streamflow data for 12 continuous-record tributary stations, that had available streamflow data for all or part of the period from May 1, 1995, to October 31, 1995, indicate that the mean error of estimate for the daily streamflow was about 25 percent.
Monitoring Building Deformation with InSAR: Experiments and Validation.
Yang, Kui; Yan, Li; Huang, Guoman; Chen, Chu; Wu, Zhengpeng
2016-12-20
Synthetic Aperture Radar Interferometry (InSAR) techniques are increasingly applied for monitoring land subsidence. The advantages of InSAR include high accuracy and the ability to cover large areas; nevertheless, research validating the use of InSAR on building deformation is limited. In this paper, we test the monitoring capability of InSAR in experiments using two landmark buildings: the Bohai Building and the China Theater, located in Tianjin, China. They were selected as real examples to compare InSAR and leveling approaches for building deformation. Ten TerraSAR-X images spanning half a year were used in Permanent Scatterer InSAR processing. These extracted InSAR results were processed considering the diversity in both direction and spatial distribution, and were compared with true leveling values in both Ordinary Least Squares (OLS) regression and measurement of error analyses. The detailed experimental results for the Bohai Building and the China Theater showed a high correlation between InSAR results and the leveling values. At the same time, the two Root Mean Square Error (RMSE) indexes had values of approximately 1 mm. These analyses show that a millimeter level of accuracy can be achieved by means of the InSAR technique when measuring building deformation. We discuss the differences in accuracy between OLS regression and measurement of error analyses, and compare the accuracy index of leveling in order to propose InSAR accuracy levels appropriate for monitoring building deformation. After assessing the advantages and limitations of InSAR techniques in monitoring buildings, further applications are evaluated.
ERIC Educational Resources Information Center
Bulcock, J. W.
The problem of model estimation when the data are collinear was examined. Though ridge regression (RR) outperforms ordinary least squares (OLS) regression in the presence of acute multicollinearity, it is not a problem-free technique for reducing the variance of the estimates. It is a stochastic procedure when it should be nonstochastic and it…
Spirituality and Resilience Among Mexican American IPV Survivors.
de la Rosa, Iván A; Barnett-Queen, Timothy; Messick, Madeline; Gurrola, Maria
2016-12-01
Women with abusive partners use a variety of coping strategies. This study examined the correlation between spirituality, resilience, and intimate partner violence using a cross-sectional survey of 54 Mexican American women living along the U.S.-Mexico border. The meaning-making coping model provides the conceptual framework to explore how spirituality is used as a coping strategy. Multiple ordinary least squares (OLS) regression results indicate women who score higher on spirituality also report greater resilient characteristics. Poisson regression analyses revealed that an increase in the level of spirituality is associated with a lower number of types of abuse experienced. Clinical, programmatic, and research implications are discussed. © The Author(s) 2015.
Use of partial least squares regression to impute SNP genotypes in Italian cattle breeds.
Dimauro, Corrado; Cellesi, Massimo; Gaspa, Giustino; Ajmone-Marsan, Paolo; Steri, Roberto; Marras, Gabriele; Macciotta, Nicolò P P
2013-06-05
The objective of the present study was to test the ability of the partial least squares regression technique to impute genotypes from low density single nucleotide polymorphism (SNP) panels, i.e. 3K or 7K, to a high density panel with 50K SNP. No pedigree information was used. Data consisted of 2093 Holstein, 749 Brown Swiss and 479 Simmental bulls genotyped with the Illumina 50K Beadchip. First, a single-breed approach was applied by using only data from Holstein animals. Then, to enlarge the training population, data from the three breeds were combined and a multi-breed analysis was performed. Accuracies of genotypes imputed using the partial least squares regression method were compared with those obtained by using the Beagle software. The impact of genotype imputation on breeding value prediction was evaluated for milk yield, fat content and protein content. In the single-breed approach, the accuracy of imputation using partial least squares regression was around 90% and 94% for the 3K and 7K platforms, respectively; corresponding accuracies obtained with Beagle were around 85% and 90%. Moreover, computing time required by the partial least squares regression method was on average around 10 times lower than computing time required by Beagle. Using the partial least squares regression method in the multi-breed approach resulted in lower imputation accuracies than using single-breed data. The impact of the SNP-genotype imputation on the accuracy of direct genomic breeding values was small. The correlation between estimates of genetic merit obtained by using imputed versus actual genotypes was around 0.96 for the 7K chip. Results of the present work suggested that the partial least squares regression imputation method could be useful to impute SNP genotypes when pedigree information is not available.
Shiota, Makoto; Iwasawa, Ai; Suzuki-Iwashima, Ai; Iida, Fumiko
2015-12-01
The impact of flavor composition, texture, and other factors on the desirability of different commercial sources of Gouda-type cheese was investigated using multivariate analyses on the basis of sensory and instrumental analyses. Volatile aroma compounds were measured using headspace solid-phase microextraction gas chromatography/mass spectrometry (GC/MS) and steam distillation extraction (SDE)-GC/MS, and fatty acid composition, low-molecular-weight compounds (including amino acids and organic acids), as well as pH, texture, and color were measured to determine their relationship with sensory perception. Orthogonal partial least squares-discriminant analysis (OPLS-DA) was performed to discriminate between 2 different ripening periods in 7 sample sets, revealing that ethanol, ethyl acetate, hexanoic acid, and octanoic acid increased with increasing sensory attribute scores for sweetness, fruity, and sulfurous. A partial least squares (PLS) regression model was constructed to predict the desirability of cheese using these parameters. We showed that texture and buttery flavors are important factors affecting the desirability of Gouda-type cheeses for Japanese consumers using these multivariate analyses. © 2015 Institute of Food Technologists®
Robust analysis of trends in noisy tokamak confinement data using geodesic least squares regression
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, G., E-mail: geert.verdoolaege@ugent.be; Laboratory for Plasma Physics, Royal Military Academy, B-1000 Brussels; Shabbir, A.
Regression analysis is a very common activity in fusion science for unveiling trends and parametric dependencies, but it can be a difficult matter. We have recently developed the method of geodesic least squares (GLS) regression that is able to handle errors in all variables, is robust against data outliers and uncertainty in the regression model, and can be used with arbitrary distribution models and regression functions. We here report on first results of application of GLS to estimation of the multi-machine scaling law for the energy confinement time in tokamaks, demonstrating improved consistency of the GLS results compared to standard least squares.
NASA Astrophysics Data System (ADS)
Muller, Sybrand Jacobus; van Niekerk, Adriaan
2016-07-01
Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electrical conductivity measurements was investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (R² < 0.4). Better results were achieved using the supervised classifiers, but the algorithms tend to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crop types at different growing stages, coupled with their individual tolerances to saline conditions.
Retargeted Least Squares Regression Algorithm.
Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin
2015-09-01
This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to directly learn the regression targets from the data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large margin constraint for the requirement of correct classification of each data point. Compared with traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model, hence there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure including regression and retargeting as substeps. The experimental evaluation over a range of databases demonstrates the validity of our method.
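For context, the sketch below implements only the conventional baseline that ReLSR improves on: least squares regression onto a zero-one (one-hot) target matrix with prediction by argmax, with a small ridge term added for numerical stability. The retargeting step itself is not reproduced here.

```python
# Baseline least squares classification with a zero-one target matrix
# (the conventional LSR setup that ReLSR refines by learning the targets).
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
X = np.column_stack([np.ones(len(X)), X])          # add bias column
T = np.eye(y.max() + 1)[y]                          # zero-one target matrix

lam = 1e-3                                          # small ridge term for stability
W = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ T)

pred = np.argmax(X @ W, axis=1)                     # classify by largest regression output
print(f"training accuracy: {(pred == y).mean():.3f}")
```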
Wagner, Daniel M.; Krieger, Joshua D.; Veilleux, Andrea G.
2016-08-04
In 2013, the U.S. Geological Survey initiated a study to update regional skew, annual exceedance probability discharges, and regional regression equations used to estimate annual exceedance probability discharges for ungaged locations on streams in the study area with the use of recent geospatial data, new analytical methods, and available annual peak-discharge data through the 2013 water year. An analysis of regional skew using Bayesian weighted least-squares/Bayesian generalized-least squares regression was performed for Arkansas, Louisiana, and parts of Missouri and Oklahoma. The newly developed constant regional skew of -0.17 was used in the computation of annual exceedance probability discharges for 281 streamgages used in the regional regression analysis. Based on analysis of covariance, four flood regions were identified for use in the generation of regional regression models. Thirty-nine basin characteristics were considered as potential explanatory variables, and ordinary least-squares regression techniques were used to determine the optimum combinations of basin characteristics for each of the four regions. Basin characteristics in candidate models were evaluated based on multicollinearity with other basin characteristics (variance inflation factor < 2.5) and statistical significance at the 95-percent confidence level (p ≤ 0.05). Generalized least-squares regression was used to develop the final regression models for each flood region. Average standard errors of prediction of the generalized least-squares models ranged from 32.76 to 59.53 percent, with the largest range in flood region D. Pseudo coefficients of determination of the generalized least-squares models ranged from 90.29 to 97.28 percent, with the largest range also in flood region D. The regional regression equations apply only to locations on streams in Arkansas where annual peak discharges are not substantially affected by regulation, diversion, channelization, backwater, or urbanization. The applicability and accuracy of the regional regression equations depend on the basin characteristics measured for an ungaged location on a stream being within range of those used to develop the equations.
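A hedged sketch of the variance inflation factor screen mentioned above (VIF_j = 1/(1 − R_j²), with candidates above 2.5 flagged); the basin characteristics and their correlations below are simulated, not taken from the study.

```python
# Sketch of a VIF screen for multicollinearity among candidate basin characteristics.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n = 200
area = rng.lognormal(mean=3, sigma=1, size=n)        # hypothetical basin characteristics
slope = rng.lognormal(mean=1, sigma=0.5, size=n)
precip = 0.7 * np.log(area) + rng.normal(scale=0.3, size=n)  # correlated with area
X = np.column_stack([np.log(area), np.log(slope), precip])
names = ["log drainage area", "log channel slope", "mean precipitation"]

for j, name in enumerate(names):
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    vif = 1.0 / (1.0 - r2)
    flag = "  <-- exceeds 2.5" if vif >= 2.5 else ""
    print(f"{name}: VIF = {vif:.2f}{flag}")
```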
An Epidemiology Primer: Bridging the Gap between Epidemiology and Psychology.
1981-07-01
… to the methods traditionally used in the field of psychology. The intent of this report is to describe some of these methods and explain them in a … Hypotheses are formulated and tested in much the same manner, and chi-square, regression, correlation, and analyses of variance are commonly employed in … studies of morbidity and mortality. It was also found that epidemiologic studies employ rates and measures which, although seldom seen in psychology, are …
NASA Astrophysics Data System (ADS)
Wang, Yan-Jun; Liu, Qun
1999-03-01
Analysis of stock-recruitment (SR) data is most often done by fitting various SR relationship curves to the data. Fish population dynamics data often have stochastic variations and measurement errors, which usually result in a biased regression analysis. This paper presents a robust regression method, least median of squared orthogonal distance (LMD), which is insensitive to abnormal values in the dependent and independent variables in a regression analysis. Outliers that have significantly different variance from the rest of the data can be identified in a residual analysis. Then, the least squares (LS) method is applied to the SR data with defined outliers being down-weighted. The application of the LMD and LMD-based Reweighted Least Squares (RLS) methods to simulated and real fisheries SR data is explored.
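A rough sketch of the least-median-of-squares idea, using random point pairs as candidate lines and the median squared (vertical) residual as the criterion; the published LMD method uses orthogonal distances, so this is only an approximation on simulated data.

```python
# Least-median-of-squares style line fit by random subsampling: candidate lines
# through random point pairs are scored by the median squared residual, making
# the fit resistant to a handful of gross outliers.
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(0, 10, 60)
y = 0.8 * x + 1.0 + rng.normal(scale=0.5, size=60)
y[:6] += 15.0                                   # a few gross outliers

best = None
for _ in range(2000):
    i, j = rng.choice(len(x), size=2, replace=False)
    if x[i] == x[j]:
        continue
    slope = (y[j] - y[i]) / (x[j] - x[i])
    intercept = y[i] - slope * x[i]
    crit = np.median((y - (intercept + slope * x)) ** 2)
    if best is None or crit < best[0]:
        best = (crit, slope, intercept)

print(f"robust slope {best[1]:.3f}, intercept {best[2]:.3f}")
print("OLS slope/intercept:", np.round(np.polyfit(x, y, 1), 3))
```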
A Weighted Least Squares Approach To Robustify Least Squares Estimates.
ERIC Educational Resources Information Center
Lin, Chowhong; Davenport, Ernest C., Jr.
This study developed a robust linear regression technique based on the idea of weighted least squares. In this technique, a subsample of the full data of interest is drawn, based on a measure of distance, and an initial set of regression coefficients is calculated. The rest of the data points are then taken into the subsample, one after another,…
Validation of Core Temperature Estimation Algorithm
2016-01-29
Figure captions (from the report): (a) observed versus estimated core temperature, and observed versus estimated PSI, plotted with the line of identity (dashed), the least squares regression line (solid), and the regression line equation; (b) Bland-Altman plots for comparison. The root mean squared error (RMSE) was also computed, as given by Equation 2.
Hordge, LaQuana N; McDaniel, Kiara L; Jones, Derick D; Fakayode, Sayo O
2016-05-15
The endocrine disruption property of estrogens necessitates the immediate need for effective monitoring and development of analytical protocols for their analyses in biological and human specimens. This study explores the first combined utility of steady-state fluorescence spectroscopy and multivariate partial-least-squares (PLS) regression analysis for the simultaneous determination of the concentrations of two estrogens (17α-ethinylestradiol (EE) and norgestimate (NOR)) in bovine serum albumin (BSA) and human serum albumin (HSA) samples. The influence of EE and NOR concentrations and temperature on the emission spectra of EE-HSA, EE-BSA, NOR-HSA, and NOR-BSA complexes was also investigated. The binding of EE with HSA and BSA resulted in an increase in the emission characteristics of HSA and BSA and a significant blue spectral shift. In contrast, the interaction of NOR with HSA and BSA quenched the emission characteristics of HSA and BSA. The observed emission spectral shifts preclude the effective use of traditional univariate regression analysis of fluorescent data for the determination of EE and NOR concentrations in HSA and BSA samples. Multivariate partial-least-squares (PLS) regression analysis was utilized to correlate the changes in emission spectra with EE and NOR concentrations in HSA and BSA samples. The figures-of-merit of the developed PLS regression models were excellent, with limits of detection as low as 1.6×10(-8) M for EE and 2.4×10(-7) M for NOR and good linearity (R(2)>0.994985). The PLS models correctly predicted EE and NOR concentrations in independent validation HSA and BSA samples with a root-mean-square-percent-relative-error (RMS%RE) of less than 6.0% at physiological conditions. On the contrary, the use of univariate regression resulted in poor predictions of EE and NOR in HSA and BSA samples, with RMS%RE larger than 40% at physiological conditions. High accuracy, high sensitivity, simplicity, and low cost, with no prior analyte extraction or separation required, make this method a promising, compelling, and attractive alternative for the rapid determination of estrogen concentrations in biomedical and biological specimens, pharmaceuticals, or environmental samples. Published by Elsevier B.V.
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F
2018-06-01
This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
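A minimal iteratively re-weighted least squares (IRLS) sketch with Huber weights on a simulated calibration line containing one outlier; the weight function and tuning constant are common defaults, not necessarily those used in the paper.

```python
# Iteratively re-weighted least squares (IRLS) with Huber weights for a
# robust calibration line fit in the presence of an outlying point.
import numpy as np

rng = np.random.default_rng(9)
conc = np.linspace(1, 10, 12)                       # hypothetical calibration levels
signal = 5.0 + 3.0 * conc + rng.normal(scale=0.3, size=12)
signal[-1] += 8.0                                   # one outlying calibration point

X = np.column_stack([np.ones_like(conc), conc])
beta = np.linalg.lstsq(X, signal, rcond=None)[0]    # OLS starting values

for _ in range(20):
    resid = signal - X @ beta
    scale = 1.4826 * np.median(np.abs(resid - np.median(resid))) + 1e-12  # robust scale (MAD)
    u = np.abs(resid) / scale
    w = np.where(u <= 1.345, 1.0, 1.345 / u)        # Huber weights
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(X * sw[:, None], signal * sw, rcond=None)[0]

print(f"IRLS intercept {beta[0]:.3f}, slope {beta[1]:.3f}")
```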
Smith, S. Jerrod; Lewis, Jason M.; Graves, Grant M.
2015-09-28
Generalized-least-squares multiple-linear regression analysis was used to formulate regression relations between peak-streamflow frequency statistics and basin characteristics. Contributing drainage area was the only basin characteristic determined to be statistically significant for all percentage of annual exceedance probabilities and was the only basin characteristic used in regional regression equations for estimating peak-streamflow frequency statistics on unregulated streams in and near the Oklahoma Panhandle. The regression model pseudo-coefficient of determination, converted to percent, for the Oklahoma Panhandle regional regression equations ranged from about 38 to 63 percent. The standard errors of prediction and the standard model errors for the Oklahoma Panhandle regional regression equations ranged from about 84 to 148 percent and from about 76 to 138 percent, respectively. These errors were comparable to those reported for regional peak-streamflow frequency regression equations for the High Plains areas of Texas and Colorado. The root mean square errors for the Oklahoma Panhandle regional regression equations (ranging from 3,170 to 92,000 cubic feet per second) were less than the root mean square errors for the Oklahoma statewide regression equations (ranging from 18,900 to 412,000 cubic feet per second); therefore, the Oklahoma Panhandle regional regression equations produce more accurate peak-streamflow statistic estimates for the irrigated period of record in the Oklahoma Panhandle than do the Oklahoma statewide regression equations. The regression equations developed in this report are applicable to streams that are not substantially affected by regulation, impoundment, or surface-water withdrawals. These regression equations are intended for use for stream sites with contributing drainage areas less than or equal to about 2,060 square miles, the maximum value for the independent variable used in the regression analysis.
Hydrologic and Hydraulic Analyses of Selected Streams in Lorain County, Ohio, 2003
Jackson, K. Scott; Ostheimer, Chad J.; Whitehead, Matthew T.
2003-01-01
Hydrologic and hydraulic analyses were done for selected reaches of nine streams in Lorain County, Ohio. To assess the alternatives for flood-damage mitigation, the Lorain County Engineer and the U.S. Geological Survey (USGS) initiated a cooperative study to investigate aspects of the hydrology and hydraulics of the nine streams. Historical streamflow data and regional regression equations were used to estimate instantaneous peak discharges for floods having recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Explanatory variables used in the regression equations were drainage area, main-channel slope, and storage area. Drainage areas of the nine stream reaches studied ranged from 1.80 to 19.3 square miles. The step-backwater model HEC-RAS was used to determine water-surface-elevation profiles for the 10-year-recurrence-interval (10-year) flood along a selected reach of each stream. The water-surface-profile information was then used to generate digital mapping of flood-plain boundaries. The analyses indicate that at the 10-year flood elevation, road overflow results at numerous hydraulic structures along the nine streams.
Rohwedder, J J R; Pasquini, C; Fortes, P R; Raimundo, I M; Wilk, A; Mizaikoff, B
2014-07-21
A miniaturised gas analyser is described and evaluated based on the use of a substrate-integrated hollow waveguide (iHWG) coupled to a microsized near-infrared spectrophotometer comprising a linear variable filter and an array of InGaAs detectors. This gas sensing system was applied to analyse surrogate samples of natural fuel gas containing methane, ethane, propane and butane, quantified by using multivariate regression models based on partial least square (PLS) algorithms and Savitzky-Golay 1(st) derivative data preprocessing. The external validation of the obtained models reveals root mean square errors of prediction of 0.37, 0.36, 0.67 and 0.37% (v/v), for methane, ethane, propane and butane, respectively. The developed sensing system provides particularly rapid response times upon composition changes of the gaseous sample (approximately 2 s) due the minute volume of the iHWG-based measurement cell. The sensing system developed in this study is fully portable with a hand-held sized analyser footprint, and thus ideally suited for field analysis. Last but not least, the obtained results corroborate the potential of NIR-iHWG analysers for monitoring the quality of natural gas and petrochemical gaseous products.
Brudvig, Jean M; Swenson, Cheryl L
2015-12-01
Rapid and precise measurement of total and differential nucleated cell counts is a crucial diagnostic component of cavitary and synovial fluid analyses. The objectives of this study included (1) evaluation of reliability and precision of canine and equine fluid total nucleated cell count (TNCC) determined by the benchtop Abaxis VetScan HM5, in comparison with the automated reference instruments ADVIA 120 and the scil Vet abc, respectively, and (2) comparison of automated with manual canine differential nucleated cell counts. The TNCC and differential counts in canine pleural and peritoneal, and equine synovial fluids were determined on the Abaxis VetScan HM5 and compared with the ADVIA 120 and Vet abc analyzer, respectively. Statistical analyses included correlation, least squares fit linear regression, Passing-Bablok regression, and Bland-Altman difference plots. In addition, precision of the total cell count generated by the VetScan HM5 was determined. Agreement was excellent without significant constant or proportional bias for canine cavitary fluid TNCC. Automated and manual differential counts had R(2) < .5 for individual cell types (least squares fit linear regression). Equine synovial fluid TNCC agreed but with some bias due to the VetScan HM5 overestimating TNCC compared to the Vet abc. Intra-assay precision of the VetScan HM5 in 3 fluid samples was 2-31%. The Abaxis VetScan HM5 provided rapid, reliable TNCC for canine and equine fluid samples. The differential nucleated cell count should be verified microscopically as counts from the VetScan HM5 and also from the ADVIA 120 were often incorrect in canine fluid samples. © 2015 American Society for Veterinary Clinical Pathology.
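A small sketch of the Bland-Altman difference-plot statistics used in such method-comparison studies (mean bias and 95% limits of agreement); the paired counts below are simulated, not VetScan or ADVIA data.

```python
# Bland-Altman agreement statistics for paired measurements from two analyzers:
# mean bias and 95% limits of agreement.
import numpy as np

rng = np.random.default_rng(10)
reference = rng.lognormal(mean=7, sigma=1, size=40)            # hypothetical TNCC values
test = reference * rng.normal(loc=1.02, scale=0.08, size=40)   # second analyzer with mild bias

diff = test - reference
bias = diff.mean()
loa = 1.96 * diff.std(ddof=1)
print(f"mean bias = {bias:.1f} cells/uL (units assumed for illustration)")
print(f"95% limits of agreement: {bias - loa:.1f} to {bias + loa:.1f}")
```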
Quantification of trace metals in infant formula premixes using laser-induced breakdown spectroscopy
NASA Astrophysics Data System (ADS)
Cama-Moncunill, Raquel; Casado-Gavalda, Maria P.; Cama-Moncunill, Xavier; Markiewicz-Keszycka, Maria; Dixit, Yash; Cullen, Patrick J.; Sullivan, Carl
2017-09-01
Infant formula is a human milk substitute generally based upon fortified cow milk components. In order to mimic the composition of breast milk, trace elements such as copper, iron and zinc are usually added in a single operation using a premix. The correct addition of premixes must be verified to ensure that the target levels in infant formulae are achieved. In this study, a laser-induced breakdown spectroscopy (LIBS) system was assessed as a fast validation tool for trace element premixes. LIBS is a promising emission spectroscopic technique for elemental analysis, which offers real-time analyses, little to no sample preparation and ease of use. LIBS was employed for copper and iron determinations of premix samples ranging approximately from 0 to 120 mg/kg Cu/1640 mg/kg Fe. LIBS spectra are affected by several parameters, hindering subsequent quantitative analyses. This work aimed at testing three matrix-matched calibration approaches (simple-linear regression, multi-linear regression and partial least squares regression (PLS)) as means for precision and accuracy enhancement of LIBS quantitative analysis. All calibration models were first developed using a training set and then validated with an independent test set. PLS yielded the best results. For instance, the PLS model for copper provided a coefficient of determination (R2) of 0.995 and a root mean square error of prediction (RMSEP) of 14 mg/kg. Furthermore, LIBS was employed to penetrate through the samples by repetitively measuring the same spot. Consequently, LIBS spectra can be obtained as a function of sample layers. This information was used to explore whether measuring deeper into the sample could reduce possible surface-contaminant effects and provide better quantifications.
Greene, LaVana; Elzey, Brianda; Franklin, Mariah; Fakayode, Sayo O
2017-03-05
The negative health impact of polycyclic aromatic hydrocarbons (PAHs) and differences in pharmacological activity of enantiomers of chiral molecules in humans highlights the need for analysis of PAHs and their chiral analogue molecules in humans. Herein, the first use of cyclodextrin guest-host inclusion complexation, fluorescence spectrophotometry, and chemometric approach to PAH (anthracene) and chiral-PAH analogue derivatives (1-(9-anthryl)-2,2,2-triflouroethanol (TFE)) analyses are reported. The binding constants (K b ), stoichiometry (n), and thermodynamic properties (Gibbs free energy (ΔG), enthalpy (ΔH), and entropy (ΔS)) of anthracene and enantiomers of TFE-methyl-β-cyclodextrin (Me-β-CD) guest-host complexes were also determined. Chemometric partial-least-square (PLS) regression analysis of emission spectra data of Me-β-CD-guest-host inclusion complexes was used for the determination of anthracene and TFE enantiomer concentrations in Me-β-CD-guest-host inclusion complex samples. The values of calculated K b and negative ΔG suggest the thermodynamic favorability of anthracene-Me-β-CD and enantiomeric of TFE-Me-β-CD inclusion complexation reactions. However, anthracene-Me-β-CD and enantiomer TFE-Me-β-CD inclusion complexations showed notable differences in the binding affinity behaviors and thermodynamic properties. The PLS regression analysis resulted in square-correlation-coefficients of 0.997530 or better and a low LOD of 3.81×10 -7 M for anthracene and 3.48×10 -8 M for TFE enantiomers at physiological conditions. Most importantly, PLS regression accurately determined the anthracene and TFE enantiomer concentrations with an average low error of 2.31% for anthracene, 4.44% for R-TFE and 3.60% for S-TFE. The results of the study are highly significant because of its high sensitivity and accuracy for analysis of PAH and chiral PAH analogue derivatives without the need of an expensive chiral column, enantiomeric resolution, or use of a polarized light. Published by Elsevier B.V.
Selective Weighted Least Squares Method for Fourier Transform Infrared Quantitative Analysis.
Wang, Xin; Li, Yan; Wei, Haoyun; Chen, Xia
2017-06-01
Classical least squares (CLS) regression is a popular multivariate statistical method used frequently for quantitative analysis using Fourier transform infrared (FT-IR) spectrometry. Classical least squares provides the best unbiased estimator for uncorrelated residual errors with zero mean and equal variance. However, the noise in FT-IR spectra, which accounts for a large portion of the residual errors, is heteroscedastic. Thus, if this zero-mean noise dominates the residual errors, the weighted least squares (WLS) regression method described in this paper is a better estimator than CLS. However, if bias errors, such as the residual baseline error, are significant, WLS may perform worse than CLS. In this paper, we compare the effects of noise and bias error on quantitative analysis with CLS and WLS. Results indicated that for wavenumbers with low absorbance, the bias error dominated the residual error, so the performance of CLS was better than that of WLS. However, for wavenumbers with high absorbance, the noise dominated, and WLS proved to be better than CLS. Thus, we propose a selective weighted least squares (SWLS) regression that processes data at different wavenumbers using either CLS or WLS based on a selection criterion, i.e., whether the absorbance is lower or higher than a threshold. The effects of various factors on the optimal threshold value (OTV) for SWLS have been studied through numerical simulations. These simulations showed that: (1) the concentration and the analyte type had minimal effect on the OTV; and (2) the major factor that influences the OTV is the ratio between the bias error and the standard deviation of the noise. The last part of this paper is dedicated to quantitative analysis of methane gas spectra and methane/toluene gas-mixture spectra measured using FT-IR spectrometry with CLS, WLS, and SWLS. The standard error of prediction (SEP), bias of prediction (bias), and the residual sum of squares of the errors (RSS) from the three quantitative analyses were compared. In the methane gas analysis, SWLS yielded the lowest SEP and RSS among the three methods. In the methane/toluene mixture analysis, a modification of SWLS is presented to tackle the bias error arising from other components. SWLS without modification gave the lowest SEP in all cases, but not the lowest bias and RSS. The modified SWLS reduced the bias and showed a lower RSS than CLS, especially for minor components.
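To make the selection rule concrete, the following hedged sketch applies a CLS-style model a = Kc + e with uniform weights at low-absorbance wavenumbers and inverse-variance weights at high-absorbance ones; the two-component matrix K, the noise model, and the median threshold are illustrative assumptions rather than the paper's exact recipe.

```python
# Hedged sketch of a selective weighting scheme for a CLS-style model
# a = K @ c + e: uniform (CLS-like) weights where absorbance is low,
# inverse-variance (WLS-like) weights where absorbance is high.
import numpy as np

def swls_estimate(a, K, noise_var, threshold):
    """Estimate component concentrations c from one measured spectrum a."""
    w = np.where(a < threshold, 1.0, 1.0 / noise_var)   # per-wavenumber weights
    KtW = K.T * w                                        # K^T W without forming diag(w)
    return np.linalg.solve(KtW @ K, KtW @ a)

rng = np.random.default_rng(0)
n_wavenumbers = 300
K = np.abs(rng.normal(size=(n_wavenumbers, 2)))          # two made-up pure-component spectra
c_true = np.array([0.7, 0.3])
clean = K @ c_true
noise_var = 0.01 + 0.05 * clean                          # heteroscedastic noise variance
a = clean + rng.normal(0.0, np.sqrt(noise_var))
print(swls_estimate(a, K, noise_var, threshold=np.median(clean)))
```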
Howard, Réka; Carriquiry, Alicia L.; Beavis, William D.
2014-01-01
Parametric and nonparametric methods have been developed for purposes of predicting phenotypes. These methods are based on retrospective analyses of empirical data consisting of genotypic and phenotypic scores. Recent reports have indicated that parametric methods are unable to predict phenotypes of traits with known epistatic genetic architectures. Herein, we review parametric methods including least squares regression, ridge regression, Bayesian ridge regression, least absolute shrinkage and selection operator (LASSO), Bayesian LASSO, best linear unbiased prediction (BLUP), Bayes A, Bayes B, Bayes C, and Bayes Cπ. We also review nonparametric methods including Nadaraya-Watson estimator, reproducing kernel Hilbert space, support vector machine regression, and neural networks. We assess the relative merits of these 14 methods in terms of accuracy and mean squared error (MSE) using simulated genetic architectures consisting of completely additive or two-way epistatic interactions in an F2 population derived from crosses of inbred lines. Each simulated genetic architecture explained either 30% or 70% of the phenotypic variability. The greatest impact on estimates of accuracy and MSE was due to genetic architecture. Parametric methods were unable to predict phenotypic values when the underlying genetic architecture was based entirely on epistasis. Parametric methods were slightly better than nonparametric methods for additive genetic architectures. Distinctions among parametric methods for additive genetic architectures were incremental. Heritability, i.e., proportion of phenotypic variability, had the second greatest impact on estimates of accuracy and MSE. PMID:24727289
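The core comparison can be mimicked in a toy simulation such as the one below, which contrasts a parametric learner (ridge regression) with a nonparametric one (RBF kernel ridge, used here as a simple stand-in for the RKHS method) on purely additive versus purely epistatic architectures; the marker coding, effect sizes, and heritability split are assumptions for illustration only.

```python
# Toy sketch: parametric (ridge) vs. nonparametric (RBF kernel ridge) prediction
# accuracy under simulated additive and two-way epistatic genetic architectures.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n, p = 300, 100
geno = rng.integers(0, 3, size=(n, p)).astype(float) - 1.0   # centered 0/1/2 marker codes

def phenotype(epistatic):
    if epistatic:
        signal = geno[:, 0] * geno[:, 1] + geno[:, 2] * geno[:, 3]    # interactions only
    else:
        signal = geno[:, :4].sum(axis=1)                               # purely additive
    noise = rng.normal(0.0, np.sqrt(np.var(signal) * 30.0 / 70.0), n)  # ~70% genetic variance
    return signal + noise

for label, epi in [("additive", False), ("epistatic", True)]:
    y = phenotype(epi)
    for name, model in [("ridge", Ridge(alpha=1.0)),
                        ("kernel", KernelRidge(kernel="rbf", alpha=1.0, gamma=0.01))]:
        pred = cross_val_predict(model, geno, y, cv=5)
        print(label, name, "accuracy (r):", round(np.corrcoef(y, pred)[0, 1], 2))
```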
Ahmad, Iftikhar; Ahmad, Manzoor; Khan, Karim; Ikram, Masroor
2016-06-01
Optical polarimetry was employed for assessment of ex vivo healthy and basal cell carcinoma (BCC) tissue samples from human skin. Polarimetric analyses revealed that depolarization and retardance for healthy tissue group were significantly higher (p<0.001) compared to BCC tissue group. Histopathology indicated that these differences partially arise from BCC-related characteristic changes in tissue morphology. Wilks lambda statistics demonstrated the potential of all investigated polarimetric properties for computer assisted classification of the two tissue groups. Based on differences in polarimetric properties, partial least square (PLS) regression classified the samples with 100% accuracy, sensitivity and specificity. These findings indicate that optical polarimetry together with PLS statistics hold promise for automated pathology classification. Copyright © 2016 Elsevier B.V. All rights reserved.
Comparative Analysis of River Flow Modelling by Using Supervised Learning Technique
NASA Astrophysics Data System (ADS)
Ismail, Shuhaida; Mohamad Pandiahi, Siraj; Shabri, Ani; Mustapha, Aida
2018-04-01
The goal of this research is to investigate the efficiency of three supervised learning algorithms for forecasting monthly river flow of the Indus River in Pakistan, spread over 550 square miles or 1800 square kilometres. The algorithms include the Least Squares Support Vector Machine (LSSVM), Artificial Neural Network (ANN) and Wavelet Regression (WR). Monthly river flow was forecasted with each of the three models individually, and the accuracy of all models was then compared. The obtained results were compared and statistically analysed. This analytical comparison showed that the LSSVM model is the most precise for monthly river flow forecasting: LSSVM had the highest r, with a value of 0.934, compared to the other models. This indicates that LSSVM is more accurate and efficient than the ANN and WR models.
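A minimal sketch of this forecasting setup is shown below, using scikit-learn's epsilon-SVR as a stand-in for LSSVM (which scikit-learn does not provide) on a synthetic seasonal flow series; the lag order, kernel, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# One-step-ahead monthly flow forecasting from lagged flows, with SVR
# standing in for LSSVM and a synthetic seasonal series standing in for
# the Indus River record.
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
months = np.arange(360)
flow = 100 + 40 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 10, months.size)

lags = 3
X = np.column_stack([flow[i:len(flow) - lags + i] for i in range(lags)])  # lagged flows
y = flow[lags:]                                                            # next month's flow

split = 300
model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=1.0))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
print(f"test correlation r = {np.corrcoef(y[split:], pred)[0, 1]:.3f}")
```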
Mortality rates in OECD countries converged during the period 1990-2010.
Bremberg, Sven G
2017-06-01
Since the scientific revolution of the 18th century, human health has gradually improved, but there is no unifying theory that explains this improvement in health. Studies of macrodeterminants have produced conflicting results. Most studies have analysed health at a given point in time as the outcome; however, the rate of improvement in health might be a more appropriate outcome. Twenty-eight OECD member countries were selected for analysis in the period 1990-2010. The main outcomes studied, in six age groups, were the national rates of decrease in mortality in the period 1990-2010. The effects of seven potential determinants on the rates of decrease in mortality were analysed in linear multiple regression models using least squares, controlling for country-specific history constants, which represent the mortality rate in 1990. The multiple regression analyses started with models that only included mortality rates in 1990 as determinants. These models explained 87% of the intercountry variation in the children aged 1-4 years and 51% in adults aged 55-74 years. When added to the regression equations, the seven determinants did not seem to significantly increase the explanatory power of the equations. The analyses indicated a decrease in mortality in all nations and in all age groups. The development of mortality rates in the different nations demonstrated significant catch-up effects. Therefore an important objective of the national public health sector seems to be to reduce the delay between international research findings and the universal implementation of relevant innovations.
Prediction models for Arabica coffee beverage quality based on aroma analyses and chemometrics.
Ribeiro, J S; Augusto, F; Salva, T J G; Ferreira, M M C
2012-11-15
In this work, soft modeling based on chemometric analyses of coffee beverage sensory data and the chromatographic profiles of volatile roasted coffee compounds is proposed to predict the scores of acidity, bitterness, flavor, cleanliness, body, and overall quality of the coffee beverage. A partial least squares (PLS) regression method was used to construct the models. The ordered predictor selection (OPS) algorithm was applied to select the compounds for the regression model of each sensory attribute in order to take only significant chromatographic peaks into account. The prediction errors of these models, using 4 or 5 latent variables, were equal to 0.28, 0.33, 0.35, 0.33, 0.34 and 0.41, for each of the attributes and compatible with the errors of the mean scores of the experts. Thus, the results proved the feasibility of using a similar methodology in on-line or routine applications to predict the sensory quality of Brazilian Arabica coffee. Copyright © 2012 Elsevier B.V. All rights reserved.
Do MCAT scores predict USMLE scores? An analysis on 5 years of medical student data.
Gauer, Jacqueline L; Wolff, Josephine M; Jackson, J Brooks
2016-01-01
The purpose of this study was to determine the associations and predictive values of Medical College Admission Test (MCAT) component and composite scores prior to 2015 with U.S. Medical Licensure Exam (USMLE) Step 1 and Step 2 Clinical Knowledge (CK) scores, with a focus on whether students scoring low on the MCAT were particularly likely to continue to score low on the USMLE exams. Multiple linear regression, correlation, and chi-square analyses were performed to determine the relationship between MCAT component and composite scores and USMLE Step 1 and Step 2 CK scores from five graduating classes (2011-2015) at the University of Minnesota Medical School (N = 1,065). The multiple linear regression analyses were both significant (p < 0.001). The three MCAT component scores together explained 17.7% of the variance in Step 1 scores (p < 0.001) and 12.0% of the variance in Step 2 CK scores (p < 0.001). In the chi-square analyses, significant, albeit weak associations were observed between almost all MCAT component scores and USMLE scores (Cramer's V ranged from 0.05 to 0.24). Each of the MCAT component scores was significantly associated with USMLE Step 1 and Step 2 CK scores, although the effect size was small. Being in the top or bottom scoring range of the MCAT exam was predictive of being in the top or bottom scoring range of the USMLE exams, although the strengths of the associations were weak to moderate. These results indicate that MCAT scores are predictive of student performance on the USMLE exams, but, given the small effect sizes, should be considered as part of the holistic view of the student.
Newman, J; Egan, T; Harbourne, N; O'Riordan, D; Jacquier, J C; O'Sullivan, M
2014-08-01
Sensory evaluation can be problematic for ingredients with a bitter taste during the research and development phase of new food products. In this study, 19 dairy protein hydrolysates (DPH) were analysed by an electronic tongue and characterized physicochemically; the data obtained from these methods were correlated with bitterness intensity as scored by a trained sensory panel, and each model was also assessed for its predictive capability. The physicochemical characteristics of the DPHs investigated were degree of hydrolysis (DH%) and data relating to peptide size and relative hydrophobicity from size exclusion chromatography (SEC) and reverse-phase (RP) HPLC. Partial least squares regression (PLS) was used to construct the prediction models. All PLS regressions had good correlations (0.78 to 0.93), with the strongest being the combination of data obtained from SEC and RP HPLC. However, the PLS model with the strongest predictive power was based on the e-tongue, which had the lowest root mean predicted residual error sum of squares (PRESS) in the study. The results show that the PLS models constructed with the e-tongue and with the combination of SEC and RP-HPLC have potential to be used for prediction of bitterness, thus reducing the reliance on sensory analysis of DPHs in future food research. Copyright © 2014 Elsevier B.V. All rights reserved.
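The PRESS criterion used above to rank models can be computed from cross-validated predictions, as in the hedged sketch below; the 19 x 7 data matrix is a synthetic placeholder for the e-tongue measurements, not the study's data.

```python
# Sketch of ranking PLS models by cross-validated PRESS.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict, KFold

rng = np.random.default_rng(0)
n, p = 19, 7                       # e.g. 19 hydrolysates, 7 e-tongue sensors (assumed)
X = rng.normal(size=(n, p))
bitterness = X @ rng.normal(size=p) + rng.normal(0, 0.5, n)

cv = KFold(n_splits=5, shuffle=True, random_state=1)
for k in (1, 2, 3):
    pred = cross_val_predict(PLSRegression(n_components=k), X, bitterness, cv=cv)
    press = np.sum((bitterness - pred.ravel()) ** 2)   # predicted residual error sum of squares
    print(f"{k} latent variables: PRESS = {press:.2f}")
```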
NASA Astrophysics Data System (ADS)
Yehia, Ali M.; Arafa, Reham M.; Abbas, Samah S.; Amer, Sawsan M.
2016-01-01
Spectral resolution of cefquinome sulfate (CFQ) in the presence of its degradation products was studied. Three selective, accurate and rapid spectrophotometric methods were performed for the determination of CFQ in the presence of either its hydrolytic, oxidative or photo-degradation products. The proposed ratio difference, derivative ratio and mean centering are ratio-manipulating spectrophotometric methods that were satisfactorily applied for selective determination of CFQ within a linear range of 5.0-40.0 μg mL⁻¹. Concentration Residuals Augmented Classical Least Squares was applied and evaluated for the determination of the cited drug in the presence of all of its degradation products. Traditional Partial Least Squares regression was also applied and benchmarked against the proposed advanced multivariate calibration. Twenty-five experimentally designed synthetic mixtures (three factors at five levels) were used to calibrate and validate the multivariate models. Advanced chemometrics succeeded in quantitative and qualitative analyses of CFQ along with its hydrolytic, oxidative and photo-degradation products. The proposed methods were applied successfully to the analysis of different pharmaceutical formulations. These developed methods were simple and cost-effective compared with the manufacturer's RP-HPLC method.
ERIC Educational Resources Information Center
Coskuntuncel, Orkun
2013-01-01
The purpose of this study is two-fold: the first aim is to show the effect of outliers on the widely used least squares regression estimator in the social sciences. The second aim is to compare the classical method of least squares with the robust M-estimator using the coefficient of determination (R²). For this purpose,…
Developing a dengue forecast model using machine learning: A case study in China.
Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-10-01
In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.
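The model-comparison step can be sketched as below, fitting candidate learners on lagged cases plus covariates and scoring them on a hold-out period; the synthetic series, lag structure, and hyperparameters are assumptions, not the study's surveillance data or tuning protocol.

```python
# Compare two candidate forecasters (SVR and LASSO) by hold-out RMSE and R2,
# using simulated weekly case counts with a climate covariate and a search index.
import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
weeks = 200
temp = 20 + 8 * np.sin(2 * np.pi * np.arange(weeks) / 52)
search_index = rng.gamma(2.0, 1.0, weeks)
cases = 5 + 0.8 * temp + 3.0 * search_index + rng.normal(0, 2, weeks)

X = np.column_stack([temp[:-1], search_index[:-1], cases[:-1]])  # lag-1 predictors
y = cases[1:]
split = 150
for name, model in [("SVR", SVR(C=10.0)), ("LASSO", LassoCV(cv=5))]:
    model.fit(X[:split], y[:split])
    pred = model.predict(X[split:])
    rmse = mean_squared_error(y[split:], pred) ** 0.5
    print(f"{name}: RMSE = {rmse:.2f}, R2 = {r2_score(y[split:], pred):.2f}")
```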
Kabir, Alamgir; Rahman, Md. Jahanur; Shamim, Abu Ahmed; Klemm, Rolf D. W.; Labrique, Alain B.; Rashid, Mahbubur; Christian, Parul; West, Keith P.
2017-01-01
Birth weight, length and circumferences of the head, chest and arm are key measures of newborn size and health in developing countries. We assessed maternal socio-demographic factors associated with multiple measures of newborn size in a large rural population in Bangladesh using partial least squares (PLS) regression method. PLS regression, combining features from principal component analysis and multiple linear regression, is a multivariate technique with an ability to handle multicollinearity while simultaneously handling multiple dependent variables. We analyzed maternal and infant data from singletons (n = 14,506) born during a double-masked, cluster-randomized, placebo-controlled maternal vitamin A or β-carotene supplementation trial in rural northwest Bangladesh. PLS regression results identified numerous maternal factors (parity, age, early pregnancy MUAC, living standard index, years of education, number of antenatal care visits, preterm delivery and infant sex) significantly (p<0.001) associated with newborn size. Among them, preterm delivery had the largest negative influence on newborn size (Standardized β = -0.29 − -0.19; p<0.001). Scatter plots of the scores of first two PLS components also revealed an interaction between newborn sex and preterm delivery on birth size. PLS regression was found to be more parsimonious than both ordinary least squares regression and principal component regression. It also provided more stable estimates than the ordinary least squares regression and provided the effect measure of the covariates with greater accuracy as it accounts for the correlation among the covariates and outcomes. Therefore, PLS regression is recommended when either there are multiple outcome measurements in the same study, or the covariates are correlated, or both situations exist in a dataset. PMID:29261760
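The feature emphasized above, fitting several correlated outcomes in one PLS model (PLS2), looks roughly like the following sketch; the covariates and newborn measures are simulated stand-ins, not the trial data.

```python
# Sketch of PLS with multiple correlated outcomes fitted jointly (PLS2).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n = 500
age = rng.normal(22, 4, n)
parity = rng.poisson(1.5, n).astype(float)
muac = 22 + 0.1 * age + rng.normal(0, 1.5, n)              # covariate correlated with age
X = np.column_stack([age, parity, muac])

weight = 2600 + 60 * (muac - muac.mean()) - 40 * parity + rng.normal(0, 300, n)
length = 46 + 0.002 * (weight - weight.mean()) + rng.normal(0, 1.5, n)
Y = np.column_stack([weight, length])                       # two outcomes modeled at once

pls = PLSRegression(n_components=2).fit(X, Y)
print("coefficients relating covariates to both outcomes:")
print(pls.coef_)
print("predicted (weight, length) for the first record:", pls.predict(X[:1]))
```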
Kernel Partial Least Squares for Nonlinear Regression and Discrimination
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Clancy, Daniel (Technical Monitor)
2002-01-01
This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.
Superquantile Regression: Theory, Algorithms, and Applications
2014-12-01
Example C: Stack loss data scatterplot matrix. [Coefficient tables from this excerpt compare least-squares and quantile regression fits of the stack loss data (intercept c0 and covariates caf, cwt, cac, cwt2, with R̄²α and adjusted R̄²α); the full tables are not recoverable from this extract.]
Waltemeyer, Scott D.
2008-01-01
Estimates of the magnitude and frequency of peak discharges are necessary for the reliable design of bridges, culverts, and open-channel hydraulic analysis, and for flood-hazard mapping in New Mexico and surrounding areas. The U.S. Geological Survey, in cooperation with the New Mexico Department of Transportation, updated estimates of peak-discharge magnitude for gaging stations in the region and updated regional equations for estimation of peak discharge and frequency at ungaged sites. Equations were developed for estimating the magnitude of peak discharges for recurrence intervals of 2, 5, 10, 25, 50, 100, and 500 years at ungaged sites by use of data collected through 2004 for 293 gaging stations on unregulated streams that have 10 or more years of record. Peak discharges for selected recurrence intervals were determined at gaging stations by fitting observed data to a log-Pearson Type III distribution with adjustments for a low-discharge threshold and a zero skew coefficient. A low-discharge threshold was applied to frequency analysis of 140 of the 293 gaging stations. This application provides an improved fit of the log-Pearson Type III frequency distribution. Use of the low-discharge threshold generally eliminated the peak discharge by having a recurrence interval of less than 1.4 years in the probability-density function. Within each of the nine regions, logarithms of the maximum peak discharges for selected recurrence intervals were related to logarithms of basin and climatic characteristics by using stepwise ordinary least-squares regression techniques for exploratory data analysis. Generalized least-squares regression techniques, an improved regression procedure that accounts for time and spatial sampling errors, then were applied to the same data used in the ordinary least-squares regression analyses. The average standard error of prediction, which includes average sampling error and average standard error of regression, ranged from 38 to 93 percent (mean value is 62, and median value is 59) for the 100-year flood. The 1996 investigation standard error of prediction for the flood regions ranged from 41 to 96 percent (mean value is 67, and median value is 68) for the 100-year flood that was analyzed by using generalized least-squares regression analysis. Overall, the equations based on generalized least-squares regression techniques are more reliable than those in the 1996 report because of the increased length of record and improved geographic information system (GIS) method to determine basin and climatic characteristics. Flood-frequency estimates can be made for ungaged sites upstream or downstream from gaging stations by using a method that transfers flood-frequency data at the gaging station to the ungaged site by using a drainage-area ratio adjustment equation. The peak discharge for a given recurrence interval at the gaging station, drainage-area ratio, and the drainage-area exponent from the regional regression equation of the respective region is used to transfer the peak discharge for the recurrence interval to the ungaged site. Maximum observed peak discharge as related to drainage area was determined for New Mexico. Extreme events are commonly used in the design and appraisal of bridge crossings and other structures. Bridge-scour evaluations are commonly made by using the 500-year peak discharge for these appraisals. 
Peak-discharge data collected at 293 gaging stations and 367 miscellaneous sites were used to develop a maximum peak-discharge relation as an alternative method of estimating peak discharge of an extreme event such as a maximum probable flood.
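The regional-regression idea underlying this report can be sketched as a log-space power-law fit, as below; the station data are synthetic, the fit shown is ordinary least squares, and the report's final equations instead use generalized least squares, which additionally weights stations for record length and cross-correlation.

```python
# Sketch of a regional regression: log peak discharge vs. log basin characteristics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_stations = 120
drainage_area = 10 ** rng.uniform(-0.5, 3.5, n_stations)    # square miles (simulated)
channel_slope = 10 ** rng.uniform(0.2, 2.2, n_stations)     # feet per mile (simulated)
log_q100 = (2.0 + 0.75 * np.log10(drainage_area)
            + 0.3 * np.log10(channel_slope)
            + rng.normal(0, 0.15, n_stations))

X = sm.add_constant(np.column_stack([np.log10(drainage_area), np.log10(channel_slope)]))
fit = sm.OLS(log_q100, X).fit()
print(fit.params)                                            # intercept and power-law exponents
print("standard error of estimate (log units):", fit.mse_resid ** 0.5)
```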
NASA Astrophysics Data System (ADS)
Althuwaynee, Omar F.; Pradhan, Biswajeet; Ahmad, Noordin
2014-06-01
This article uses methodology based on chi-squared automatic interaction detection (CHAID), a multivariate method with an automatic classification capacity, to analyse large numbers of landslide conditioning factors. This approach was developed to overcome the subjectivity of the manual categorization of scale data for landslide conditioning factors and to predict a rainfall-induced landslide susceptibility map for Kuala Lumpur city and surrounding areas using a geographic information system (GIS). The main objective is to use the CHAID method to find the best classification fit for each conditioning factor and then to combine it with logistic regression (LR); the LR model was used to find the coefficients of the best-fitting function that assesses the optimal terminal nodes. A cluster pattern of landslide locations was extracted in a previous study using the nearest neighbor index (NNI), which was then used to identify the range of clustered landslide locations. Clustered locations were used as model training data together with 14 landslide conditioning factors, such as topographically derived parameters, lithology, NDVI, and land-use and land-cover maps. The Pearson chi-squared value was used to find the best classification fit between the dependent variable and the conditioning factors. Finally, the relationships among the conditioning factors were assessed and the landslide susceptibility map (LSM) was produced. The area under the curve (AUC) was used to test model reliability and prediction capability with the training and validation landslide locations, respectively. This study demonstrated the efficiency and reliability of the decision tree (DT) model in landslide susceptibility mapping and provided a valuable scientific basis for spatial decision making in planning and urban management studies.
Socioeconomic factors affecting infant sleep-related deaths in St. Louis.
Hogan, Cathy
2014-01-01
Though the Back to Sleep Campaign that began in 1994 caused an overall decrease in sudden infant death syndrome (SIDS) rates, racial disparity has continued to increase in St. Louis. Though researchers have analyzed and described various sociodemographic characteristics of SIDS and infant deaths by unintentional suffocation in St. Louis, they have not simultaneously controlled for contributory risk factors for racial disparity such as race, poverty, maternal education, and number of children born to each mother (parity). The objective was to determine whether there is a relationship between maternal socioeconomic factors and sleep-related infant death. This quantitative case-control study used secondary data collected by the Missouri Department of Health and Senior Services between 2005 and 2009. The sample includes matched birth/death certificates and birth certificates of living infants born or dying within that time frame. Analyses consisted of descriptive statistics, chi-square tests, and logistic regression; the controls were birth records of infants who lived more than 1 year. Chi-square and logistic regression analyses confirmed that race and poverty have significant relationships with infant sleep-related deaths. The social significance of this study is that the results may lead to population-specific modifications of prevention messages that will reduce infant sleep-related deaths. © 2013 Wiley Periodicals, Inc.
Confidence Intervals for Squared Semipartial Correlation Coefficients: The Effect of Nonnormality
ERIC Educational Resources Information Center
Algina, James; Keselman, H. J.; Penfield, Randall D.
2010-01-01
The increase in the squared multiple correlation coefficient (ΔR²) associated with a variable in a regression equation is a commonly used measure of importance in regression analysis. Algina, Keselman, and Penfield found that intervals based on asymptotic principles were typically very inaccurate, even though the sample size…
Multilevel Modeling and Ordinary Least Squares Regression: How Comparable Are They?
ERIC Educational Resources Information Center
Huang, Francis L.
2018-01-01
Studies analyzing clustered data sets using both multilevel models (MLMs) and ordinary least squares (OLS) regression have generally concluded that resulting point estimates, but not the standard errors, are comparable with each other. However, the accuracy of the estimates of OLS models is important to consider, as several alternative techniques…
A Comparison of Mean Phase Difference and Generalized Least Squares for Analyzing Single-Case Data
ERIC Educational Resources Information Center
Manolov, Rumen; Solanas, Antonio
2013-01-01
The present study focuses on single-case data analysis, specifically on two procedures for quantifying differences between baseline and treatment measurements. The first technique tested is based on generalized least squares regression analysis and is compared to a proposed non-regression technique, which allows obtaining similar information. The…
Immigration Reporting Laws: Ethical Dilemmas in Pediatric Practice
Geltman, Paul L.; Meyers, Alan F.
1998-01-01
Objectives. This study assessed the potential impact of immigration reporting requirements on pediatricians' referrals to child protective services. Methods. A random sample of 200 Massachusetts pediatricians were surveyed. Chi-square and logistic regression analyses were performed. Results. Asked whether potential deportation of the family would cause them to question or alter a decision to refer, 50% of the respondents said yes. Conclusions. Pediatricians, as mandated reporters of child abuse, will face ethical dilemmas if laws requiring reporting of immigration status are enacted. (Am J Public Health. 1998;88:967-968) PMID:9618632
Feaster, Toby D.; Gotvald, Anthony J.; Weaver, J. Curtis
2014-01-01
Reliable estimates of the magnitude and frequency of floods are essential for the design of transportation and water-conveyance structures, flood-insurance studies, and flood-plain management. Such estimates are particularly important in densely populated urban areas. In order to increase the number of streamflow-gaging stations (streamgages) available for analysis, expand the geographical coverage that would allow for application of regional regression equations across State boundaries, and build on a previous flood-frequency investigation of rural U.S Geological Survey streamgages in the Southeast United States, a multistate approach was used to update methods for determining the magnitude and frequency of floods in urban and small, rural streams that are not substantially affected by regulation or tidal fluctuations in Georgia, South Carolina, and North Carolina. The at-site flood-frequency analysis of annual peak-flow data for urban and small, rural streams (through September 30, 2011) included 116 urban streamgages and 32 small, rural streamgages, defined in this report as basins draining less than 1 square mile. The regional regression analysis included annual peak-flow data from an additional 338 rural streamgages previously included in U.S. Geological Survey flood-frequency reports and 2 additional rural streamgages in North Carolina that were not included in the previous Southeast rural flood-frequency investigation for a total of 488 streamgages included in the urban and small, rural regression analysis. The at-site flood-frequency analyses for the urban and small, rural streamgages included the expected moments algorithm, which is a modification of the Bulletin 17B log-Pearson type III method for fitting the statistical distribution to the logarithms of the annual peak flows. Where applicable, the flood-frequency analysis also included low-outlier and historic information. Additionally, the application of a generalized Grubbs-Becks test allowed for the detection of multiple potentially influential low outliers. Streamgage basin characteristics were determined using geographical information system techniques. Initial ordinary least squares regression simulations reduced the number of basin characteristics on the basis of such factors as statistical significance, coefficient of determination, Mallow’s Cp statistic, and ease of measurement of the explanatory variable. Application of generalized least squares regression techniques produced final predictive (regression) equations for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probability flows for urban and small, rural ungaged basins for three hydrologic regions (HR1, Piedmont–Ridge and Valley; HR3, Sand Hills; and HR4, Coastal Plain), which previously had been defined from exploratory regression analysis in the Southeast rural flood-frequency investigation. Because of the limited availability of urban streamgages in the Coastal Plain of Georgia, South Carolina, and North Carolina, additional urban streamgages in Florida and New Jersey were used in the regression analysis for this region. Including the urban streamgages in New Jersey allowed for the expansion of the applicability of the predictive equations in the Coastal Plain from 3.5 to 53.5 square miles. 
Average standard error of prediction for the predictive equations, which is a measure of the average accuracy of the regression equations when predicting flood estimates for ungaged sites, range from 25.0 percent for the 10-percent annual exceedance probability regression equation for the Piedmont–Ridge and Valley region to 73.3 percent for the 0.2-percent annual exceedance probability regression equation for the Sand Hills region.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bernad-Beltrán, D.; Simó, A.; Bovea, M.D., E-mail: bovea@uji.es
Highlights: • Attitude towards incorporating biowaste selective collection is analysed. • Willingness to participate and to pay in biowaste selective collection is obtained. • Socioeconomic aspects affecting WtParticipate and WtPay are identified. - Abstract: European waste legislation has for years been encouraging the incorporation of selective collection systems for the biowaste fraction. European countries are therefore incorporating it into their current municipal solid waste management (MSWM) systems. However, this incorporation involves changes in the current waste management habits of households. In this paper, the attitude of the public towards the incorporation of selective collection of biowaste into an existing MSWM system in a Spanish municipality is analysed. A semi-structured telephone interview was used to obtain information regarding aspects such as: level of participation in current waste collection systems, willingness to participate in selective collection of biowaste, reasons and barriers that affect participation, willingness to pay for the incorporation of the selective collection of biowaste and the socioeconomic characteristics of citizens who are willing to participate and pay for selective collection of biowaste. The results showed that approximately 81% of the respondents were willing to participate in selective collection of biowaste. This percentage would increase to 89% if the Town Council provided specific waste bins and bags, since the main barrier to participation in the new selective collection system is the need to use a specific waste bin and bags for the separation of biowaste. A logit response model was applied to estimate the average willingness to pay, obtaining an estimated mean of 7.5% on top of the current waste management annual tax. The relationship of willingness to participate and willingness to pay for the implementation of this new selective collection with the socioeconomic variables (age, gender, size of the household, work, education and income) was analysed. Chi-square independence tests and binary logistic regression were used for willingness to participate, and no significant relationship was found. Chi-square independence tests, ordinal logistic regression and ordinary linear regression were applied for willingness to pay, obtaining statistically significant relationships for most of the socioeconomic variables.
Orthogonalizing EM: A design-based least squares algorithm
Xiong, Shifeng; Dai, Bin; Huling, Jared; Qian, Peter Z. G.
2016-01-01
We introduce an efficient iterative algorithm, intended for various least squares problems, based on a design of experiments perspective. The algorithm, called orthogonalizing EM (OEM), works for ordinary least squares and can be easily extended to penalized least squares. The main idea of the procedure is to orthogonalize a design matrix by adding new rows and then solve the original problem by embedding the augmented design in a missing data framework. We establish several attractive theoretical properties concerning OEM. For the ordinary least squares with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For ordinary and penalized least squares with various penalties, it converges to a point having grouping coherence for fully aliased regression matrices. Convergence and the convergence rate of the algorithm are examined. Finally, we demonstrate that OEM is highly efficient for large-scale least squares and penalized least squares problems, and is considerably faster than competing methods when n is much larger than p. Supplementary materials for this article are available online. PMID:27499558
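Our reading of the core update, reduced to the ordinary-least-squares case, is sketched below: choosing d at least as large as the largest eigenvalue of XᵀX makes the augmented Gram matrix proportional to the identity, so each EM step collapses to a simple closed-form recursion. This is an illustrative reduction, not the authors' full algorithm (which also handles penalized least squares and fully aliased designs).

```python
# Minimal sketch of an OEM-style iteration for ordinary least squares,
# as we read the algorithm; step size 1/d with d >= largest eigenvalue
# of X'X guarantees convergence to the least squares solution.
import numpy as np

def oem_ols(X, y, n_iter=500):
    d = np.linalg.eigvalsh(X.T @ X).max()          # any d >= largest eigenvalue works
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta = beta + (X.T @ (y - X @ beta)) / d   # closed-form update per EM step
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.1, 200)
print(np.allclose(oem_ols(X, y), np.linalg.lstsq(X, y, rcond=None)[0], atol=1e-4))
```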
Optimizing methods for linking cinematic features to fMRI data.
Kauttonen, Janne; Hlushchuk, Yevhen; Tikka, Pia
2015-04-15
One of the challenges of naturalistic neuroscience using movie-viewing experiments is how to interpret observed brain activations in relation to the multiplicity of time-locked stimulus features. As previous studies have shown less inter-subject synchronization across viewers of random video footage than of story-driven films, new methods need to be developed for analysis of less story-driven content. To optimize the linkage between our fMRI data collected during viewing of a deliberately non-narrative silent film 'At Land' by Maya Deren (1944) and its annotated content, we combined the method of elastic-net regularization with model-driven linear regression and the well-established data-driven independent component analysis (ICA) and inter-subject correlation (ISC) methods. In the linear regression analysis, both IC and region-of-interest (ROI) time-series were fitted with the time-series of 36 binary-valued annotations and one real-valued (tactile) annotation of film features. Elastic-net regularization and cross-validation were applied in the ordinary least-squares linear regression in order to avoid over-fitting due to the multicollinearity of regressors; the results were compared against both partial least-squares (PLS) regression and un-regularized full-model regression. A non-parametric permutation-testing scheme was applied to evaluate the statistical significance of the regressions. We found a statistically significant correlation between the annotation model and 9 of 40 ICs. The regression analysis was also repeated for a large set of cubic ROIs covering the grey matter. Both IC- and ROI-based regression analyses revealed activations in parietal and occipital regions, with additional smaller clusters in the frontal lobe. Furthermore, we found elastic-net based regression more sensitive than PLS and un-regularized regression, since it detected a larger number of significant ICs and ROIs. Along with the ISC ranking methods, our regression analysis proved to be a feasible method for ordering the ICs based on their functional relevance to the annotated cinematic features. The novelty of our method, in comparison to hypothesis-driven manual pre-selection and observation of individual regressors biased by choice, lies in applying a data-driven approach to all content features simultaneously. We found the combination of regularized regression and ICA especially useful when analyzing fMRI data obtained using a non-narrative movie stimulus with a large set of complex and correlated features. Copyright © 2015. Published by Elsevier Inc.
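A hedged sketch of the regularized-regression step follows, fitting an elastic net with cross-validated penalties to correlated annotation regressors and assessing the fit with a simple permutation test; the time series, feature counts, and number of permutations are placeholders, not the study's data or protocol.

```python
# Elastic-net regression of one response time-series on correlated annotation
# regressors, with a permutation test of the resulting in-sample fit.
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n_tr, n_feat = 400, 37                      # time points x annotated features (assumed sizes)
base = rng.normal(size=(n_tr, 6))
annotations = base @ rng.normal(size=(6, n_feat)) + rng.normal(0, 0.5, (n_tr, n_feat))
roi_signal = annotations[:, :3] @ np.array([1.0, -0.5, 0.8]) + rng.normal(0, 1, n_tr)

model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(annotations, roi_signal)
r2_obs = r2_score(roi_signal, model.predict(annotations))

# Null distribution from refits on permuted responses.
null = []
for _ in range(100):
    y_perm = rng.permutation(roi_signal)
    m = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9], cv=5).fit(annotations, y_perm)
    null.append(r2_score(y_perm, m.predict(annotations)))
p_value = (1 + np.sum(np.array(null) >= r2_obs)) / (1 + len(null))
print(f"R2 = {r2_obs:.2f}, permutation p = {p_value:.3f}")
```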
Geodesic regression on orientation distribution functions with its application to an aging study.
Du, Jia; Goh, Alvina; Kushnarev, Sergey; Qiu, Anqi
2014-02-15
In this paper, we treat orientation distribution functions (ODFs) derived from high angular resolution diffusion imaging (HARDI) as elements of a Riemannian manifold and present a method for geodesic regression on this manifold. In order to find the optimal regression model, we pose this as a least-squares problem involving the sum-of-squared geodesic distances between observed ODFs and their model fitted data. We derive the appropriate gradient terms and employ gradient descent to find the minimizer of this least-squares optimization problem. In addition, we show how to perform statistical testing for determining the significance of the relationship between the manifold-valued regressors and the real-valued regressands. Experiments on both synthetic and real human data are presented. In particular, we examine aging effects on HARDI via geodesic regression of ODFs in normal adults aged 22 years old and above. © 2013 Elsevier Inc. All rights reserved.
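Written out, the least-squares criterion described above takes the following form (our transcription of the standard geodesic-regression objective; the symbols are generic rather than the paper's notation):

```latex
\begin{equation}
  E(p, v) \;=\; \frac{1}{2} \sum_{i=1}^{N}
      d^{2}\!\left( \operatorname{Exp}_{p}(x_i v),\, \psi_i \right)
\end{equation}
```

where ψ_i is the observed ODF at covariate value x_i (for example, age), Exp_p is the Riemannian exponential map at base point p applied to the tangent vector x_i v, and d(·,·) is the geodesic distance on the manifold; gradient descent over (p, v) then yields the fitted geodesic, with significance assessed by the statistical testing described above.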
Eash, David A.; Barnes, Kimberlee K.; Veilleux, Andrea G.
2013-01-01
A statewide study was performed to develop regional regression equations for estimating selected annual exceedance-probability statistics for ungaged stream sites in Iowa. The study area comprises streamgages located within Iowa and 50 miles beyond the State’s borders. Annual exceedance-probability estimates were computed for 518 streamgages by using the expected moments algorithm to fit a Pearson Type III distribution to the logarithms of annual peak discharges for each streamgage using annual peak-discharge data through 2010. The estimation of the selected statistics included a Bayesian weighted least-squares/generalized least-squares regression analysis to update regional skew coefficients for the 518 streamgages. Low-outlier and historic information were incorporated into the annual exceedance-probability analyses, and a generalized Grubbs-Beck test was used to detect multiple potentially influential low flows. Also, geographic information system software was used to measure 59 selected basin characteristics for each streamgage. Regional regression analysis, using generalized least-squares regression, was used to develop a set of equations for each flood region in Iowa for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent annual exceedance probabilities, which are equivalent to annual flood-frequency recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years, respectively. A total of 394 streamgages were included in the development of regional regression equations for three flood regions (regions 1, 2, and 3) that were defined for Iowa based on landform regions and soil regions. Average standard errors of prediction range from 31.8 to 45.2 percent for flood region 1, 19.4 to 46.8 percent for flood region 2, and 26.5 to 43.1 percent for flood region 3. The pseudo coefficients of determination for the generalized least-squares equations range from 90.8 to 96.2 percent for flood region 1, 91.5 to 97.9 percent for flood region 2, and 92.4 to 96.0 percent for flood region 3. The regression equations are applicable only to stream sites in Iowa with flows not significantly affected by regulation, diversion, channelization, backwater, or urbanization and with basin characteristics within the range of those used to develop the equations. These regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system tool. StreamStats allows users to click on any ungaged site on a river and compute estimates of the eight selected statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged sites also are provided by the Web-based tool. StreamStats also allows users to click on any streamgage in Iowa and estimates computed for these eight selected statistics are provided for the streamgage.
Weighted linear regression using D2H and D2 as the independent variables
Hans T. Schreuder; Michael S. Williams
1998-01-01
Several error structures for weighted regression equations used for predicting volume were examined for 2 large data sets of felled and standing loblolly pine trees (Pinus taeda L.). The generally accepted model with variance of error proportional to the value of the covariate squared ( D2H = diameter squared times height or D...
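The error structure named in the title translates into weights of 1/(D2H)² in a weighted fit, as in the hedged sketch below; the tree measurements are simulated rather than the loblolly pine data sets, and the coefficient values are arbitrary.

```python
# Weighted regression of volume on D^2*H with error variance assumed
# proportional to (D^2*H)^2, i.e. WLS weights 1/(D^2*H)^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
D = rng.uniform(10, 60, n)                       # diameter (simulated)
H = rng.uniform(8, 35, n)                        # height (simulated)
d2h = D ** 2 * H
volume = 3e-5 * d2h * (1 + rng.normal(0, 0.05, n))   # noise sd grows with D^2*H

X = sm.add_constant(d2h)
wls = sm.WLS(volume, X, weights=1.0 / d2h ** 2).fit()
ols = sm.OLS(volume, X).fit()
print("WLS slope:", wls.params[1], " OLS slope:", ols.params[1])
```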
Liley, Helen; Zhang, Ju; Firth, Elwyn; Fernandez, Justin; Besier, Thor
2017-11-01
Population variance in bone shape is an important consideration when applying the results of subject-specific computational models to a population. In this letter, we demonstrate the ability of partial least squares regression to provide an improved shape prediction of the equine third metacarpal epiphysis, using two easily obtained measurements.
Can Emotional and Behavioral Dysregulation in Youth Be Decoded from Functional Neuroimaging?
Portugal, Liana C L; Rosa, Maria João; Rao, Anil; Bebko, Genna; Bertocci, Michele A; Hinze, Amanda K; Bonar, Lisa; Almeida, Jorge R C; Perlman, Susan B; Versace, Amelia; Schirda, Claudiu; Travis, Michael; Gill, Mary Kay; Demeter, Christine; Diwadkar, Vaibhav A; Ciuffetelli, Gary; Rodriguez, Eric; Forbes, Erika E; Sunshine, Jeffrey L; Holland, Scott K; Kowatch, Robert A; Birmaher, Boris; Axelson, David; Horwitz, Sarah M; Arnold, Eugene L; Fristad, Mary A; Youngstrom, Eric A; Findling, Robert L; Pereira, Mirtes; Oliveira, Leticia; Phillips, Mary L; Mourao-Miranda, Janaina
2016-01-01
High comorbidity among pediatric disorders characterized by behavioral and emotional dysregulation poses problems for diagnosis and treatment, and suggests that these disorders may be better conceptualized as dimensions of abnormal behaviors. Furthermore, identifying neuroimaging biomarkers related to dimensional measures of behavior may provide targets to guide individualized treatment. We aimed to use functional neuroimaging and pattern regression techniques to determine whether patterns of brain activity could accurately decode individual-level severity on a dimensional scale measuring behavioural and emotional dysregulation at two different time points. A sample of fifty-seven youth (mean age: 14.5 years; 32 males) was selected from a multi-site study of youth with parent-reported behavioral and emotional dysregulation. Participants performed a block-design reward paradigm during functional Magnetic Resonance Imaging (fMRI). Pattern regression analyses consisted of Relevance Vector Regression (RVR) and two cross-validation strategies implemented in the Pattern Recognition for Neuroimaging toolbox (PRoNTo). Medication was treated as a binary confounding variable. Decoded and actual clinical scores were compared using Pearson's correlation coefficient (r) and mean squared error (MSE) to evaluate the models. Permutation test was applied to estimate significance levels. Relevance Vector Regression identified patterns of neural activity associated with symptoms of behavioral and emotional dysregulation at the initial study screen and close to the fMRI scanning session. The correlation and the mean squared error between actual and decoded symptoms were significant at the initial study screen and close to the fMRI scanning session. However, after controlling for potential medication effects, results remained significant only for decoding symptoms at the initial study screen. Neural regions with the highest contribution to the pattern regression model included cerebellum, sensory-motor and fronto-limbic areas. The combination of pattern regression models and neuroimaging can help to determine the severity of behavioral and emotional dysregulation in youth at different time points.
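A simplified version of this decoding pipeline is sketched below, using kernel ridge regression as a stand-in for Relevance Vector Regression (which scikit-learn does not ship), with leave-one-out cross-validation and the same Pearson r and MSE summaries; the voxel patterns and symptom scores are simulated.

```python
# Pattern-regression decoding of a continuous symptom score from voxel patterns,
# evaluated by cross-validated correlation and mean squared error.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_subj, n_voxels = 57, 2000
X = rng.normal(size=(n_subj, n_voxels))
severity = X[:, :50] @ rng.normal(0, 0.1, 50) + rng.normal(0, 1, n_subj)

model = KernelRidge(kernel="linear", alpha=1.0)
pred = cross_val_predict(model, X, severity, cv=LeaveOneOut())
r = np.corrcoef(severity, pred)[0, 1]
mse = mean_squared_error(severity, pred)
print(f"decoded vs actual: r = {r:.2f}, MSE = {mse:.2f}")
# Significance would be assessed by repeating the procedure with permuted scores.
```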
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
NASA Astrophysics Data System (ADS)
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of a tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four…
Methods for estimating selected low-flow frequency statistics for unregulated streams in Kentucky
Martin, Gary R.; Arihood, Leslie D.
2010-01-01
This report provides estimates of, and presents methods for estimating, selected low-flow frequency statistics for unregulated streams in Kentucky including the 30-day mean low flows for recurrence intervals of 2 and 5 years (30Q2 and 30Q5) and the 7-day mean low flows for recurrence intervals of 5, 10, and 20 years (7Q2, 7Q10, and 7Q20). Estimates of these statistics are provided for 121 U.S. Geological Survey streamflow-gaging stations with data through the 2006 climate year, which is the 12-month period ending March 31 of each year. Data were screened to identify the periods of homogeneous, unregulated flows for use in the analyses. Logistic-regression equations are presented for estimating the annual probability of the selected low-flow frequency statistics being equal to zero. Weighted-least-squares regression equations were developed for estimating the magnitude of the nonzero 30Q2, 30Q5, 7Q2, 7Q10, and 7Q20 low flows. Three low-flow regions were defined for estimating the 7-day low-flow frequency statistics. The explicit explanatory variables in the regression equations include total drainage area and the mapped streamflow-variability index measured from a revised statewide coverage of this characteristic. The percentage of the station low-flow statistics correctly classified as zero or nonzero by use of the logistic-regression equations ranged from 87.5 to 93.8 percent. The average standard errors of prediction of the weighted-least-squares regression equations ranged from 108 to 226 percent. The 30Q2 regression equations have the smallest standard errors of prediction, and the 7Q20 regression equations have the largest standard errors of prediction. The regression equations are applicable only to stream sites with low flows unaffected by regulation from reservoirs and local diversions of flow and to drainage basins in specified ranges of basin characteristics. Caution is advised when applying the equations for basins with characteristics near the applicable limits and for basins with karst drainage features.
Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio
Koltun, G.F.; Roberts, J.W.
1990-01-01
Multiple-regression equations are presented for estimating flood-peak discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years at ungaged sites on rural, unregulated streams in Ohio. The average standard errors of prediction for the equations range from 33.4% to 41.4%. Peak discharge estimates determined by log-Pearson Type III analysis using data collected through the 1987 water year are reported for 275 streamflow-gaging stations. Ordinary least-squares multiple-regression techniques were used to divide the State into three regions and to identify a set of basin characteristics that help explain station-to- station variation in the log-Pearson estimates. Contributing drainage area, main-channel slope, and storage area were identified as suitable explanatory variables. Generalized least-square procedures, which include historical flow data and account for differences in the variance of flows at different gaging stations, spatial correlation among gaging station records, and variable lengths of station record were used to estimate the regression parameters. Weighted peak-discharge estimates computed as a function of the log-Pearson Type III and regression estimates are reported for each station. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site located on the same stream. Limitations and shortcomings cited in an earlier report on the magnitude and frequency of floods in Ohio are addressed in this study. Geographic bias is no longer evident for the Maumee River basin of northwestern Ohio. No bias is found to be associated with the forested-area characteristic for the range used in the regression analysis (0.0 to 99.0%), nor is this characteristic significant in explaining peak discharges. Surface-mined area likewise is not significant in explaining peak discharges, and the regression equations are not biased when applied to basins having approximately 30% or less surface-mined area. Analyses of residuals indicate that the equations tend to overestimate flood-peak discharges for basins having approximately 30% or more surface-mined area. (USGS)
Peng, Ying; Li, Su-Ning; Pei, Xuexue; Hao, Kun
2018-03-01
A multivariate regression statistical strategy was developed to clarify the multi-component content-effect correlation of panax ginseng saponins extract and to predict the pharmacological effect from component contents. In example 1, we first compared pharmacological effects between panax ginseng saponins extract and individual saponin combinations. Second, we examined the anti-platelet aggregation effect in seven different saponin combinations of ginsenoside Rb1, Rg1, Rh, Rd, Ra3 and notoginsenoside R1. Finally, the correlation between anti-platelet aggregation and the content of multiple components was analyzed by a partial least squares algorithm. In example 2, 18 common peaks were first identified in ten different batches of panax ginseng saponins extracts from different origins. We then investigated the anti-myocardial ischemia reperfusion injury effects of the ten different panax ginseng saponins extracts. Finally, the correlation between the fingerprints and the cardioprotective effects was analyzed by a partial least squares algorithm. In both examples, the relationship between component content and pharmacological effect was modeled well by the partial least squares regression equations. Importantly, the predicted effect curve was close to the observed data points marked on the partial least squares regression model. This study provides evidence that multi-component content is promising information for predicting the pharmacological effects of traditional Chinese medicine.
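A minimal content-effect sketch of the kind of partial least squares model described above, with entirely synthetic component contents and a synthetic effect (the component names and weights are illustrative, not the study's data):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(2)
n_samples, n_constituents = 30, 6          # e.g., contents of six saponins (illustrative)
contents = rng.uniform(0, 5, (n_samples, n_constituents))
effect = contents @ np.array([0.8, 0.5, 0.1, 0.3, 0.0, 0.2]) + rng.normal(0, 0.3, n_samples)

pls = PLSRegression(n_components=3)
predicted = cross_val_predict(pls, contents, effect, cv=5).ravel()
rmse = np.sqrt(np.mean((predicted - effect) ** 2))
pls.fit(contents, effect)
print("cross-validated RMSE:", round(rmse, 3))
print("PLS coefficients:", pls.coef_.ravel().round(2))
```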
Koch, Cosima; Posch, Andreas E; Goicoechea, Héctor C; Herwig, Christoph; Lendl, Bernhard
2014-01-07
This paper presents the quantification of Penicillin V and phenoxyacetic acid, a precursor, inline during Penicillium chrysogenum fermentations by FTIR spectroscopy with partial least squares (PLS) regression and multivariate curve resolution-alternating least squares (MCR-ALS). First, the applicability of an attenuated total reflection FTIR fiber-optic probe was assessed offline by measuring standards of the analytes of interest and investigating matrix effects of the fermentation broth. Measurements were then performed inline during four fed-batch fermentations, with online HPLC determination of Penicillin V and phenoxyacetic acid as the reference analysis. PLS and MCR-ALS models were built using these data and validated by comparing single-analyte spectra with the selectivity ratio of the PLS models and the extracted spectral traces of the MCR-ALS models, respectively. The root mean square errors of cross-validation for the PLS regressions were 0.22 g/L for Penicillin V and 0.32 g/L for phenoxyacetic acid, and the root mean square errors of prediction for MCR-ALS were 0.23 g/L for Penicillin V and 0.15 g/L for phenoxyacetic acid. A general workflow for building and assessing chemometric regression models for the quantification of multiple analytes in bioprocesses by FTIR spectroscopy is given. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.
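A bare-bones MCR-ALS sketch, not the paper's implementation: a matrix of synthetic mixture spectra D is decomposed into concentration profiles C and pure-component spectra S by alternating least squares, with non-negativity enforced simply by clipping at each step.

```python
import numpy as np

rng = np.random.default_rng(3)
wl = np.linspace(0, 1, 200)
S_true = np.vstack([np.exp(-((wl - c) / 0.05) ** 2) for c in (0.3, 0.6)])   # 2 pure spectra
C_true = rng.uniform(0, 2, (40, 2))                                         # 40 time points
D = C_true @ S_true + rng.normal(0, 0.01, (40, 200))                        # mixture spectra

S = rng.uniform(0, 1, S_true.shape)              # random non-negative initial spectra
for _ in range(200):
    C = np.clip(D @ np.linalg.pinv(S), 0, None)  # solve D ~ C S for C, enforce C >= 0
    S = np.clip(np.linalg.pinv(C) @ D, 0, None)  # solve D ~ C S for S, enforce S >= 0

residual = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
print("relative residual:", round(residual, 4))
```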
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-25
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method, based on data obtained from an electronic tongue (e-tongue) system. Data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared with other state-of-the-art methods, including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and the root-mean-square error of cross-validation (RMSECV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV calculated from the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.
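A simplified version of the outlier screening described above, not the RPLS algorithm itself: fit an ordinary PLS model, flag samples with large standardized residuals or large score distances, and refit without them. Data, thresholds, and the number of latent variables are illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
X = rng.normal(0, 1, (60, 7))                        # e-tongue sensor responses (synthetic)
y = X @ rng.normal(0, 1, 7) + rng.normal(0, 0.2, 60)
y[:3] += 5                                           # inject a few gross outliers

pls = PLSRegression(n_components=4).fit(X, y)
resid = y - pls.predict(X).ravel()
std_resid = resid / resid.std(ddof=1)                # standardized residuals

scores = pls.transform(X)                            # sample scores in the latent space
score_dist = np.sqrt(((scores / scores.std(axis=0, ddof=1)) ** 2).sum(axis=1))

outliers = (np.abs(std_resid) > 2.5) | (score_dist > np.percentile(score_dist, 97.5))
pls_clean = PLSRegression(n_components=4).fit(X[~outliers], y[~outliers])
print("flagged samples:", np.where(outliers)[0])
```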
Waltemeyer, Scott D.
2006-01-01
Estimates of the magnitude and frequency of peak discharges are necessary for reliable flood-hazard mapping in the Navajo Nation in Arizona, Utah, Colorado, and New Mexico. The Bureau of Indian Affairs, U.S. Army Corps of Engineers, and Navajo Nation requested that the U.S. Geological Survey update estimates of peak-discharge magnitude for gaging stations in the region and update regional equations for estimating peak discharge and frequency at ungaged sites. Equations were developed for estimating the magnitude of peak discharges for recurrence intervals of 2, 5, 10, 25, 50, 100, and 500 years at ungaged sites using data collected through 1999 at 146 gaging stations, an additional 13 years of peak-discharge data since a 1997 investigation that used gaging-station data through 1986. The equations for estimating peak discharges at ungaged sites were developed for flood regions 8, 11, high elevation, and 6, delineated on the basis of the hydrologic codes from the 1997 investigation. Peak discharges for selected recurrence intervals were determined at gaging stations by fitting observed data to a log-Pearson Type III distribution with adjustments for a low-discharge threshold and a zero skew coefficient. A low-discharge threshold was applied to the frequency analysis of 82 of the 146 gaging stations; this application provides an improved fit of the log-Pearson Type III frequency distribution. Use of the low-discharge threshold generally eliminated peak discharges having a recurrence interval of less than 1.4 years from the probability-density function. Within each region, logarithms of the peak discharges for selected recurrence intervals were related to logarithms of basin and climatic characteristics using stepwise ordinary least-squares regression techniques for exploratory data analysis. Generalized least-squares regression techniques, an improved regression procedure that accounts for temporal and spatial sampling errors, then were applied to the same data used in the ordinary least-squares regression analyses. The average standard error of prediction for the 100-year peak discharge in region 8 was 53 percent. The average standard error of prediction, which includes average sampling error and average standard error of regression, ranged from 45 to 83 percent for the 100-year flood. The estimated standard error of prediction for a hybrid method for region 11 was large in the 1997 investigation, and no distinction of floods produced from a high-elevation region was presented in that investigation. Overall, the equations based on generalized least-squares regression techniques are considered more reliable than those in the 1997 report because of the increased length of record and improved GIS methods. Flood-frequency relations can be transferred to ungaged sites by direct application of the regional regression equation, or, for an ungaged site on a stream with a gaging station upstream or downstream, by using the drainage-area ratio and the drainage-area exponent from the regional regression equation of the respective region.
The arcsine is asinine: the analysis of proportions in ecology.
Warton, David I; Hui, Francis K C
2011-01-01
The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
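A quick numerical illustration of the comparison made above, on synthetic binomial data: an arcsine-square-root-transformed linear model versus logistic regression on the raw counts. In practice the data should also be checked for overdispersion, as the abstract notes.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n_trials = 30
x = rng.uniform(0, 1, 80)
p = 1 / (1 + np.exp(-(-1.5 + 3 * x)))
successes = rng.binomial(n_trials, p)
X = sm.add_constant(x)

# Arcsine-square-root-transformed linear model on the observed proportions
arcsine = np.arcsin(np.sqrt(successes / n_trials))
ols_fit = sm.OLS(arcsine, X).fit()

# Logistic regression on the raw counts (successes, failures)
glm_fit = sm.GLM(np.column_stack([successes, n_trials - successes]), X,
                 family=sm.families.Binomial()).fit()
print("arcsine OLS slope p-value:", round(ols_fit.pvalues[1], 4))
print("logistic slope p-value:   ", round(glm_fit.pvalues[1], 4))
```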
ERIC Educational Resources Information Center
Maggin, Daniel M.; Swaminathan, Hariharan; Rogers, Helen J.; O'Keeffe, Breda V.; Sugai, George; Horner, Robert H.
2011-01-01
A new method for deriving effect sizes from single-case designs is proposed. The strategy is applicable to small-sample time-series data with autoregressive errors. The method uses Generalized Least Squares (GLS) to model the autocorrelation of the data and estimate regression parameters to produce an effect size that represents the magnitude of…
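A hedged sketch of the general idea, not the authors' exact estimator: model a baseline-versus-intervention level shift in a short autocorrelated series with a feasible GLS fit for AR(1) errors, then scale the shift by the residual standard deviation to get an effect-size-like quantity. All data are simulated.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 40
phase = np.r_[np.zeros(20), np.ones(20)]            # baseline vs. intervention indicator
e = np.zeros(n)
for t in range(1, n):                               # AR(1) errors with rho = 0.5
    e[t] = 0.5 * e[t - 1] + rng.normal(0, 1)
y = 2.0 + 1.5 * phase + e

X = sm.add_constant(phase)
fit = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
effect_size = fit.params[1] / np.sqrt(fit.scale)    # level shift scaled by residual SD
print("estimated rho:", np.round(fit.model.rho, 2), "effect size:", round(effect_size, 2))
```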
Use of Thematic Mapper for water quality assessment
NASA Technical Reports Server (NTRS)
Horn, E. M.; Morrissey, L. A.
1984-01-01
The evaluation of simulated TM data obtained on an ER-2 aircraft at twenty-five predesignated sample sites for mapping water-quality factors such as conductivity, pH, suspended solids, turbidity, temperature, and depth is discussed. Using a multiple regression on the seven TM bands, an equation is developed for suspended solids. TM bands 1, 2, 3, 4, and 6 are used with the logarithm of conductivity in a multiple regression. The regression equations are assessed for a high coefficient of determination (R-squared) and statistical significance. Confidence intervals about the mean regression point are calculated to assess the robustness of the regressions used for mapping conductivity, turbidity, and suspended solids, and cross-validation is conducted by regressing random subsamples of sites and comparing the resulting range of R-squared values.
Gene set analysis using variance component tests.
Huang, Yen-Tsung; Lin, Xihong
2013-06-28
Gene set analyses have become increasingly important in genomic research, as many complex diseases are driven jointly by alterations of numerous genes. Genes often coordinate together as a functional repertoire, e.g., a biological pathway/network, and are highly correlated. However, most existing gene set analysis methods do not fully account for the correlation among the genes. Here we propose to exploit this important feature of a gene set to improve statistical power in gene set analyses. We propose to model the effects of an independent variable, e.g., exposure/biological status (yes/no), on multiple gene expression values in a gene set using a multivariate linear regression model, where the correlation among the genes is explicitly modeled using a working covariance matrix. We develop TEGS (Test for the Effect of a Gene Set), a variance component test for the gene set effects obtained by assuming a common distribution for the regression coefficients in the multivariate linear regression model, and calculate the p-values using permutation and a scaled chi-square approximation. We show using simulations that the type I error is protected under different choices of working covariance matrices and that power improves as the working covariance approaches the true covariance. The global test is a special case of TEGS when correlation among genes in a gene set is ignored. Using both simulated data and a published diabetes dataset, we show that our test outperforms the commonly used approaches, the global test and gene set enrichment analysis (GSEA). We develop a gene set analysis method (TEGS) under the multivariate regression framework, which directly models the interdependence of the expression values in a gene set using a working covariance. TEGS outperforms two widely used methods, GSEA and the global test, in both simulations and a diabetes microarray dataset.
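A simplified permutation test in the spirit of the variance-component approach described above (this is not the TEGS implementation and omits the working covariance): the statistic aggregates squared gene-wise associations with a binary exposure, and its null distribution comes from permuting the exposure labels. Data are simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 50, 20                                  # samples, genes in the set
expr = rng.normal(0, 1, (n, p))
exposure = rng.binomial(1, 0.5, n)
expr[exposure == 1, :5] += 0.7                 # 5 genes truly associated with exposure

def stat(y, x):
    yc, xc = y - y.mean(0), x - x.mean()
    return float(((xc @ yc) ** 2).sum())       # sum of squared gene-wise covariances

observed = stat(expr, exposure)
null = np.array([stat(expr, rng.permutation(exposure)) for _ in range(2000)])
print("permutation p-value:", (1 + (null >= observed).sum()) / (1 + len(null)))
```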
Spectral distance decay: Assessing species beta-diversity by quantile regression
Rocchini, D.; Nagendra, H.; Ghate, R.; Cade, B.S.
2009-01-01
Remotely sensed data represent key information for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species-composition variability. Regression analysis of species similarity versus spectral distance may allow us to quantitatively estimate how beta-diversity in species changes with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological datasets are characterized by a high number of zeroes that can add noise to the regression model. Quantile regression can be used to evaluate the trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this paper, we used ordinary least squares (OLS) and quantile regression to estimate the decay of species similarity versus spectral distance. The estimated decay rates were statistically nonzero (p < 0.05) for both OLS and quantile regression. Nonetheless, the OLS estimate of the mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when spectral distance approaches zero, was very low compared with the intercepts of the upper quantiles, which detected high species similarity when habitats are more similar. In this paper we demonstrate the power of applying quantile regression to spectral distance decay in order to reveal species diversity patterns otherwise lost or underestimated by ordinary least squares regression. © 2009 American Society for Photogrammetry and Remote Sensing.
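An illustrative OLS-versus-quantile-regression comparison for similarity decay with spectral distance, on synthetic data constructed to have many near-zero similarities, as in the situation the abstract describes:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
dist = rng.uniform(0, 1, 300)
upper = np.clip(0.9 - 0.8 * dist, 0, 1)                 # upper envelope of similarity
sim = upper * rng.beta(0.4, 1.0, 300)                   # many low or near-zero values
df = pd.DataFrame({"sim": sim, "dist": dist})

ols = smf.ols("sim ~ dist", df).fit()                   # mean decay rate
q90 = smf.quantreg("sim ~ dist", df).fit(q=0.90)        # decay rate of the 90th quantile
print("OLS slope:", round(ols.params["dist"], 3),
      " 90th-quantile slope:", round(q90.params["dist"], 3))
```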
Least Square Regression Method for Estimating Gas Concentration in an Electronic Nose System
Khalaf, Walaa; Pace, Calogero; Gaudioso, Manlio
2009-01-01
We describe an Electronic Nose (ENose) system that is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors, five of them gas sensors (supplied with different heater voltage values), the remaining two being a temperature and a humidity sensor, respectively. To identify a new analyte sample and then estimate its concentration, we use machine learning techniques together with the least squares regression principle. In fact, we apply two different training models; the first is based on the Support Vector Machine (SVM) approach and is aimed at teaching the system how to discriminate among different gases, while the second uses the least squares regression approach to predict the concentration of each type of analyte. PMID:22573980
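A compact sketch of the two-stage idea above, with entirely synthetic sensor data: an SVM classifies the analyte type, and a per-analyte least squares fit estimates the concentration. The sensor response patterns and gas labels are made up for illustration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n_per, n_sensors = 60, 7
conc = rng.uniform(1, 10, 2 * n_per)
labels = np.r_[np.zeros(n_per, int), np.ones(n_per, int)]        # two gas types
patterns = np.array([[1.0, 0.8, 0.5, 0.2, 0.9, 0.1, 0.3],        # response pattern, gas 0
                     [0.2, 0.4, 1.0, 0.7, 0.1, 0.9, 0.5]])       # response pattern, gas 1
X = conc[:, None] * patterns[labels] + rng.normal(0, 0.2, (2 * n_per, n_sensors))

clf = SVC(kernel="rbf").fit(X, labels)                           # stage 1: identify the gas
regs = {g: LinearRegression().fit(X[labels == g], conc[labels == g]) for g in (0, 1)}

x_new = X[0:1]
gas = int(clf.predict(x_new)[0])                                 # stage 2: per-gas regression
print("predicted gas:", gas,
      "estimated concentration:", round(float(regs[gas].predict(x_new)[0]), 2))
```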
Regional regression of flood characteristics employing historical information
Tasker, Gary D.; Stedinger, J.R.
1987-01-01
Streamflow gauging networks provide hydrologic information for use in estimating the parameters of regional regression models. The regional regression models can be used to estimate flood statistics, such as the 100-year peak, at ungauged sites as functions of drainage-basin characteristics. A recent innovation in regional regression is the use of a generalized least squares (GLS) estimator that accounts for unequal station record lengths and sample cross-correlation among the flows. However, this technique does not account for historical flood information. A method is proposed here to adjust the generalized least squares estimator to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the ordinary least squares estimator. Results indicate that using the GLS estimator adjusted for historical information significantly improves the regression model. © 1987.
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain the Nonlinear Least Squares Estimator (NLSE), the Maximum Likelihood Estimator (MLE), and a linear pseudo-model for the nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE; the present paper introduces an alternative method to compute the NLSE using principles of multivariate calculus. This study is concerned with new optimization techniques used to compute the MLE and NLSE. Anh [2] derived the NLSE and MLE of a heteroscedastic regression model. Lemcoff [3] discussed a procedure for obtaining a linear pseudo-model for a nonlinear regression model. In this article a new technique is developed to obtain the linear pseudo-model for the nonlinear regression model using multivariate calculus. The linear pseudo-model of Edmond Malinvaud [4] is explained in a different way in this paper. David Pollard et al. used empirical process techniques to study the asymptotics of the least squares estimator (LSE) for the fitting of nonlinear regression functions in 2006. Jae Myung [13] provided a conceptual guide to maximum likelihood estimation in his work "Tutorial on maximum likelihood estimation".
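A worked Gauss-Newton sketch of nonlinear least squares in matrix form, for the illustrative model y = a·exp(b·x): starting values come from a linearized (pseudo) model of log y on x, then the update beta <- beta + (JᵀJ)⁻¹Jᵀr is iterated, with J the Jacobian of the model in the parameters. This is a generic textbook illustration, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(10)
x = np.linspace(0, 2, 50)
y = 2.0 * np.exp(1.3 * x) + rng.normal(0, 0.2, x.size)

# Starting values from a linearized (pseudo) model: log y ~ log a + b x
A = np.column_stack([np.ones_like(x), x])
loga, b0 = np.linalg.lstsq(A, np.log(y), rcond=None)[0]
beta = np.array([np.exp(loga), b0])

# Gauss-Newton iterations on the nonlinear least squares problem
for _ in range(10):
    a, b = beta
    f = a * np.exp(b * x)
    r = y - f                                                      # residual vector
    J = np.column_stack([np.exp(b * x), a * x * np.exp(b * x)])    # Jacobian of f in (a, b)
    beta = beta + np.linalg.solve(J.T @ J, J.T @ r)

print("NLSE estimate (a, b):", np.round(beta, 3))
```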
Plant leaf chlorophyll content retrieval based on a field imaging spectroscopy system.
Liu, Bo; Yue, Yue-Min; Li, Ru; Shen, Wen-Jing; Wang, Ke-Lin
2014-10-23
A field imaging spectrometer system (FISS; 380-870 nm and 344 bands) was designed for agriculture applications. In this study, FISS was used to gather spectral information from soybean leaves. The chlorophyll content was retrieved using a multiple linear regression (MLR), partial least squares (PLS) regression and support vector machine (SVM) regression. Our objective was to verify the performance of FISS in a quantitative spectral analysis through the estimation of chlorophyll content and to determine a proper quantitative spectral analysis method for processing FISS data. The results revealed that the derivative reflectance was a more sensitive indicator of chlorophyll content and could extract content information more efficiently than the spectral reflectance, which is more significant for FISS data compared to ASD (analytical spectral devices) data, reducing the corresponding RMSE (root mean squared error) by 3.3%-35.6%. Compared with the spectral features, the regression methods had smaller effects on the retrieval accuracy. A multivariate linear model could be the ideal model to retrieve chlorophyll information with a small number of significant wavelengths used. The smallest RMSE of the chlorophyll content retrieved using FISS data was 0.201 mg/g, a relative reduction of more than 30% compared with the RMSE based on a non-imaging ASD spectrometer, which represents a high estimation accuracy compared with the mean chlorophyll content of the sampled leaves (4.05 mg/g). Our study indicates that FISS could obtain both spectral and spatial detailed information of high quality. Its image-spectrum-in-one merit promotes the good performance of FISS in quantitative spectral analyses, and it can potentially be widely used in the agricultural sector.
Button, Deeanna M; Tewksbury, Richard; Mustaine, Elizabeth E; Payne, Brian K
2013-01-01
The purpose of this article is to explore factors contributing to perceptions about electronic monitoring policies governing sex offenders. Guided by Tannenbaum's theory of attribution and Shaw and McKay's theory of social disorganization, the authors examine the influence of demographic characteristics, victimization experiences, and neighborhood characteristics on perceptions about policies regarding the electronic monitoring of sex offenders. Ordinary least squares regression and logistic regression analyses of stratified telephone survey data reveal that factors associated with favorable views on the use of global positioning satellite monitoring for registered sex offenders appear to stem primarily from individuals' demographic characteristics. Experiential and neighborhood factors do provide some influence over individuals' views of electronic monitoring policies for sex offenders. Theoretical and policy implications are discussed.
Soltanzadeh, Ahmad; Mohammadfam, Iraj; Moghimbeigi, Abbas; Ghiasvand, Reza
2016-03-01
The construction industry involves the highest risk of occupational accidents and bodily injuries, which range from mild to very severe. The aim of this cross-sectional study was to identify the factors associated with the accident severity rate (ASR) in the largest Iranian construction companies, based on data about 500 occupational accidents recorded from 2009 to 2013. We also gathered data on safety and health risk management and training systems. Data were analysed using Pearson's chi-squared coefficient and multiple regression analysis. Median ASR (and the interquartile range) was 107.50 (57.24-381.25). Fourteen of the 24 studied factors stood out as most affecting construction accident severity (p<0.05). These findings can be applied in the design and implementation of a comprehensive safety and health risk management system to reduce ASR.
Yehia, Ali M; Arafa, Reham M; Abbas, Samah S; Amer, Sawsan M
2016-01-15
Spectral resolution of cefquinome sulfate (CFQ) in the presence of its degradation products was studied. Three selective, accurate and rapid spectrophotometric methods were developed for the determination of CFQ in the presence of either its hydrolytic, oxidative or photo-degradation products. The proposed ratio difference, derivative ratio and mean centering methods are ratio-manipulating spectrophotometric methods that were satisfactorily applied for the selective determination of CFQ within a linear range of 5.0-40.0 μg/mL. Concentration Residuals Augmented Classical Least Squares was applied and evaluated for the determination of the cited drug in the presence of all its degradation products. Traditional Partial Least Squares regression was also applied and benchmarked against the proposed advanced multivariate calibration. Twenty-five experimentally designed synthetic mixtures of three factors at five levels were used to calibrate and validate the multivariate models. Advanced chemometrics succeeded in the quantitative and qualitative analyses of CFQ along with its hydrolytic, oxidative and photo-degradation products. The proposed methods were applied successfully to the analyses of different pharmaceutical formulations. These methods were simple and cost-effective compared with the manufacturer's RP-HPLC method. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Szuvandzsiev, Péter; Helyes, Lajos; Lugasi, Andrea; Szántó, Csongor; Baranowski, Piotr; Pék, Zoltán
2014-10-01
Processing tomato production represents an important part of the total production of processed vegetables in the world. The quality characteristics of processing tomato that are important for the food industry are the soluble solids content and the antioxidant content (such as lycopene and polyphenols) of the fruit. Analytical quantification of these components is destructive, time-consuming and labour-intensive, which is why researchers try to develop a non-destructive and rapid method to assess those quality parameters. The present study reports the suitability of a portable handheld visible-near-infrared spectrometer to predict the soluble solids, lycopene and polyphenol content of tomato fruit puree. Spectra in the 500-1000 nm range were acquired directly on fruit puree of five different tomato varieties using a FieldSpec HandHeld 2™ Portable Spectroradiometer. Immediately after spectral measurement, each fruit sample was analysed to determine soluble solids, lycopene and polyphenol content. Partial least squares regressions were carried out to create prediction models between the spectral data and the values obtained from the analytical results. The accuracy of the predictions was assessed by the coefficient of determination (R2) and the root mean square error of calibration and cross-validation.
Inter-class sparsity based discriminative least square regression.
Wen, Jie; Xu, Yong; Li, Zuoyong; Ma, Zhongli; Xu, Yuanrong
2018-06-01
Least square regression is a very popular supervised classification method. However, two main issues greatly limit its performance. The first one is that it only focuses on fitting the input features to the corresponding output labels while ignoring the correlations among samples. The second one is that the used label matrix, i.e., zero-one label matrix is inappropriate for classification. To solve these problems and improve the performance, this paper presents a novel method, i.e., inter-class sparsity based discriminative least square regression (ICS_DLSR), for multi-class classification. Different from other methods, the proposed method pursues that the transformed samples have a common sparsity structure in each class. For this goal, an inter-class sparsity constraint is introduced to the least square regression model such that the margins of samples from the same class can be greatly reduced while those of samples from different classes can be enlarged. In addition, an error term with row-sparsity constraint is introduced to relax the strict zero-one label matrix, which allows the method to be more flexible in learning the discriminative transformation matrix. These factors encourage the method to learn a more compact and discriminative transformation for regression and thus has the potential to perform better than other methods. Extensive experimental results show that the proposed method achieves the best performance in comparison with other methods for multi-class classification. Copyright © 2018 Elsevier Ltd. All rights reserved.
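For reference, a plain least squares regression classifier of the kind the paper builds on, with the inter-class sparsity and error-relaxation terms of ICS_DLSR deliberately omitted: regress a zero-one label matrix on the features (with a small ridge term for numerical stability) and classify by the largest fitted value.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

Y = np.eye(3)[ytr]                                     # zero-one label matrix
Xtr1 = np.c_[Xtr, np.ones(len(Xtr))]                   # add a bias column
lam = 1e-2                                             # small ridge term for stability
W = np.linalg.solve(Xtr1.T @ Xtr1 + lam * np.eye(Xtr1.shape[1]), Xtr1.T @ Y)

pred = (np.c_[Xte, np.ones(len(Xte))] @ W).argmax(axis=1)
print("test accuracy:", round((pred == yte).mean(), 3))
```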
Determination of suitable drying curve model for bread moisture loss during baking
NASA Astrophysics Data System (ADS)
Soleimani Pour-Damanab, A. R.; Jafary, A.; Rafiee, S.
2013-03-01
This study presents mathematical modelling of bread moisture loss, or drying, during baking in a conventional bread-baking process. In order to estimate and select the appropriate moisture-loss curve equation, 11 different models, semi-theoretical and empirical, were fitted to the experimental data and compared according to their correlation coefficients, chi-squared test values and root mean square errors, obtained by nonlinear regression analysis. Of all the drying models, the Page model was selected as the best one, according to the correlation coefficient, chi-squared and root mean square error values and its simplicity. The mean absolute estimation error of the proposed model, assessed by linear regression analysis for the natural and forced convection modes, was 2.43 and 4.74%, respectively.
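A small sketch of this kind of model-selection step on synthetic moisture-ratio data: fit the Page model MR = exp(-k·t^n) and a simple single-parameter exponential model by nonlinear regression, then compare RMSE and a reduced chi-square. The time scale and parameter values are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(11)
t = np.linspace(0, 30, 25)                               # baking time (synthetic units)
mr = np.exp(-0.05 * t ** 1.3) + rng.normal(0, 0.01, t.size)

page = lambda t, k, n: np.exp(-k * t ** n)               # Page model
lewis = lambda t, k: np.exp(-k * t)                      # simple exponential model

for name, model, p0 in [("Page", page, (0.1, 1.0)), ("Exponential", lewis, (0.1,))]:
    popt, _ = curve_fit(model, t, mr, p0=p0)
    resid = mr - model(t, *popt)
    rmse = np.sqrt(np.mean(resid ** 2))
    chi2 = np.sum(resid ** 2) / (t.size - len(popt))     # reduced chi-square
    print(f"{name}: params={np.round(popt, 3)}, RMSE={rmse:.4f}, chi2={chi2:.5f}")
```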
A stream-gaging network analysis for the 7-day, 10-year annual low flow in New Hampshire streams
Flynn, Robert H.
2003-01-01
The 7-day, 10-year (7Q10) low-flow-frequency statistic is a widely used measure of surface-water availability in New Hampshire. Regression equations and basin-characteristic digital data sets were developed to help water-resource managers determine surface-water resources during periods of low flow in New Hampshire streams. These regression equations and data sets were developed to estimate streamflow statistics for the annual and seasonal low-flow frequency, and for period-of-record and seasonal period-of-record flow durations. Generalized-least-squares (GLS) regression methods were used to develop the annual 7Q10 low-flow-frequency regression equation from 60 continuous-record stream-gaging stations in New Hampshire and in neighboring States. In the regression equation, the dependent variables were the annual 7Q10 flows at the 60 stream-gaging stations. The independent (or predictor) variables were objectively selected characteristics of the drainage basins that contribute flow to those stations. In contrast to ordinary-least-squares (OLS) regression analysis, GLS-developed estimating equations account for differences in length of record and spatial correlations among the flow-frequency statistics at the various stations. A total of 93 measurable drainage-basin characteristics were candidate independent variables. On the basis of several statistical parameters that were used to evaluate which combination of basin characteristics contributes the most to the predictive power of the equations, three drainage-basin characteristics were determined to be statistically significant predictors of the annual 7Q10: (1) total drainage area, (2) mean summer stream-gaging station precipitation from 1961 to 1990, and (3) average mean annual basinwide temperature from 1961 to 1990. To evaluate the effectiveness of the stream-gaging network in providing regional streamflow data for the annual 7Q10, the computer program GLSNET (generalized-least-squares NETwork) was used to analyze the network by application of GLS regression between streamflow and the climatic and basin characteristics of the drainage basin upstream from each stream-gaging station. Improvement to the predictive ability of the regression equations developed for the network analyses is measured by the reduction in the average sampling-error variance and can be achieved by collecting additional streamflow data at existing stations. The predictive ability of the regression equations is enhanced even further with the addition of new stations to the network. Continued data collection at unregulated stream-gaging stations with less than 14 years of record resulted in the greatest cost-weighted reduction in the average sampling-error variance of the annual 7Q10 regional regression equation. The addition of new stations in basins with underrepresented values of the independent variables (total drainage area, average mean annual basinwide temperature, or mean summer stream-gaging station precipitation) in the annual 7Q10 regression equation yielded a much greater cost-weighted reduction in the average sampling-error variance than collecting more data at existing unregulated stations. To maximize the regional information obtained from the stream-gaging network for the annual 7Q10, ranking of the streamflow data can be used to determine whether an active station should be continued or whether a new or discontinued station should be activated for streamflow data collection.
Thus, this network analysis can help determine the costs and benefits of continuing the operation of a particular station or activating a new station at another location to predict the 7Q10 at ungaged stream reaches. The decision to discontinue an existing station or activate a new station, however, must also consider its contribution to other water-resource analyses such as flood management, water quality, or trends in land use or climatic change.
Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping
2013-10-01
Coffee is the most heavily consumed beverage in the world after water, and quality is a key consideration in its commercial trade. Caffeine content, which has a significant effect on the final quality of coffee products, therefore needs to be determined rapidly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near-infrared spectroscopy (NIRS) and chemometrics for the quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples covering a wide range of roast levels were analyzed by NIR; their caffeine contents were quantitatively determined by the most commonly used HPLC-UV method to provide the reference values. Calibration models were then developed from chemometric analyses of the NIR spectral data and the reference concentrations of the coffee samples. Partial least squares (PLS) regression was used to construct the models. Furthermore, diverse spectral pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Comparing the respective quality of the different models, the application of second-derivative pretreatment and stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with a root mean square error of cross-validation (RMSECV) of 0.375 mg/g and a correlation coefficient (R) of 0.918 at a PLS factor of 7. An independent test set was used to assess the model, giving a root mean square error of prediction (RMSEP) of 0.378 mg/g, a mean relative error of 1.976% and a mean relative standard deviation (RSD) of 1.707%. Thus, the results provided by the high-quality calibration model revealed the feasibility of NIR spectroscopy for at-line prediction of the caffeine content of unknown roasted coffee samples, thanks to an analysis time of a few seconds and the non-destructive advantages of NIRS. Copyright © 2013 Elsevier B.V. All rights reserved.
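An illustrative pretreatment-plus-PLS pipeline on synthetic spectra, assuming a made-up absorption band and reference values (the SCARS variable-selection step is not reproduced here): apply a second-derivative Savitzky-Golay filter to the spectra, then compute a cross-validated RMSE for a PLS model.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(12)
wl = np.linspace(1100, 2500, 400)                       # illustrative NIR wavelength grid, nm
caffeine = rng.uniform(8, 15, 60)                       # synthetic reference values, mg/g
band = np.exp(-((wl - 1700) / 40) ** 2)                 # invented caffeine-related band
spectra = caffeine[:, None] * band + rng.normal(0, 0.05, (60, 400)) \
          + rng.normal(0, 1, (60, 1))                   # additive baseline shifts

# Second-derivative pretreatment removes the baseline shifts before PLS modelling
X = savgol_filter(spectra, window_length=15, polyorder=3, deriv=2, axis=1)
pred = cross_val_predict(PLSRegression(n_components=7), X, caffeine, cv=10).ravel()
rmsecv = np.sqrt(np.mean((pred - caffeine) ** 2))
print("RMSECV:", round(rmsecv, 3), "mg/g")
```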
Catak, Binali; Oner, Can; Sutlu, Sevinc; Kilinc, Selcuk
2016-01-01
To determine the sociocultural factors that affect spontaneous abortion in Burdur, Turkey, a case-control study was designed. The case group consisted of 257 women whose pregnancies ended in spontaneous abortion. The control group consisted of 514 women whose pregnancies had continued for 22 weeks or more during the study. Chi-square tests and backward LR logistic regression were used in the analyses. In the multifactorial analyses, four factors (women's educational status, women's employment status, exposure to physical violence, and non-receipt of ANC) were found to be independent risk factors for spontaneous abortion. Pregnant women with these risk factors should be followed up more frequently, and in a more qualified way, in primary, secondary and tertiary health institutions.
Interpreting the Results of Weighted Least-Squares Regression: Caveats for the Statistical Consumer.
ERIC Educational Resources Information Center
Willett, John B.; Singer, Judith D.
In research, data sets often occur in which the variance of the distribution of the dependent variable at given levels of the predictors is a function of the values of the predictors. In this situation, the use of weighted least-squares (WLS) techniques is required. Weights suitable for use in a WLS regression analysis must be estimated. A…
USDA-ARS?s Scientific Manuscript database
Purpose: The aim of this study was to develop a technique for the non-destructive and rapid prediction of the moisture content in red pepper powder using near-infrared (NIR) spectroscopy and a partial least squares regression (PLSR) model. Methods: Three red pepper powder products were separated in...
USDA-ARS?s Scientific Manuscript database
A technique of using multiple calibration sets in partial least squares regression (PLS) was proposed to improve the quantitative determination of ammonia from open-path Fourier transform infrared spectra. The spectra were measured near animal farms, and the path-integrated concentration of ammonia...
Local Linear Regression for Data with AR Errors.
Li, Runze; Li, Yan
2009-07-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into local linear regression. Under the assumption that the error process is an autoregressive process, a new estimation procedure is proposed for the nonparametric regression by using the local linear regression method and profile least squares techniques. We further propose the SCAD-penalized profile least squares method to determine the order of the autoregressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed procedure and to compare it with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with a working-independence error structure. We illustrate the proposed methodology by an analysis of a real data set.
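A minimal local linear regression sketch on simulated data, ignoring the AR-error and SCAD steps of the paper: at each target point, solve a kernel-weighted least squares problem with an intercept and a local slope, and return the local intercept as the fitted value.

```python
import numpy as np

rng = np.random.default_rng(13)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)

def local_linear(x0, x, y, h=0.1):
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)              # Gaussian kernel weights
    X = np.column_stack([np.ones_like(x), x - x0])      # local intercept and slope
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta[0]                                      # fitted value at x0

grid = np.linspace(0, 1, 9)
print(np.round([local_linear(x0, x, y) for x0 in grid], 2))
```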
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Clarifying the role of mean centring in multicollinearity of interaction effects.
Shieh, Gwowen
2011-11-01
Moderated multiple regression (MMR) is frequently employed to analyse interaction effects between continuous predictor variables. The procedure of mean centring is commonly recommended to mitigate the potential threat of multicollinearity between predictor variables and the constructed cross-product term. Also, centring does typically provide more straightforward interpretation of the lower-order terms. This paper attempts to clarify two methodological issues of potential confusion. First, the positive and negative effects of mean centring on multicollinearity diagnostics are explored. It is illustrated that the mean centring method is, depending on the characteristics of the data, capable of either increasing or decreasing various measures of multicollinearity. Second, the exact reason why mean centring does not affect the detection of interaction effects is given. The explication shows the symmetrical influence of mean centring on the corrected sum of squares and variance inflation factor of the product variable while maintaining the equivalence between the two residual sums of squares for the regression of the product term on the two predictor variables. Thus the resulting test statistic remains unchanged regardless of the obvious modification of multicollinearity with mean centring. These findings provide a clear understanding and demonstration on the diverse impact of mean centring in MMR applications. ©2011 The British Psychological Society.
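A numerical illustration of the two points made above, with synthetic predictors: mean centring changes the variance inflation factor of the cross-product term, yet the t statistic of the interaction effect is unchanged.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(14)
n = 200
x1 = rng.normal(5, 1, n)
x2 = rng.normal(3, 1, n)
y = 1 + 0.5 * x1 + 0.4 * x2 + 0.3 * x1 * x2 + rng.normal(0, 1, n)

def fit(x1, x2):
    X = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
    res = sm.OLS(y, X).fit()
    vif_product = variance_inflation_factor(X, 3)       # VIF of the cross-product term
    return vif_product, res.tvalues[3]                  # t statistic of the interaction

print("raw:     VIF=%.1f  t(interaction)=%.3f" % fit(x1, x2))
print("centred: VIF=%.1f  t(interaction)=%.3f" % fit(x1 - x1.mean(), x2 - x2.mean()))
```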
Wang, Bingqian; Peng, Bangzhu
2017-02-01
This work investigates the potential of fiber-optic Fourier transform near-infrared (FT-NIR) spectrometry combined with chemometric analysis for monitoring time-related changes in residual sugar and alcohol strength during kiwi wine fermentation. NIR calibration models for residual sugar and alcohol strength during kiwi wine fermentation were established from the FT-NIR spectra of 98 samples scanned with a fiber-optic FT-NIR spectrometer, using the partial least squares regression method. The results showed that R2 and the root mean square error of cross-validation reached 0.982 and 3.81 g/L for residual sugar, and 0.984 and 0.34% for alcohol strength, respectively. Furthermore, crucial process information on kiwi must and wine fermentations provided by fiber-optic FT-NIR spectrometry was found to agree with that obtained from traditional chemical methods, and this fiber-optic FT-NIR spectrometry can therefore be applied as an effective and suitable alternative for the analysis and monitoring of those processes. The overall results suggest that fiber-optic FT-NIR spectrometry is a promising tool for monitoring and controlling the kiwi wine fermentation process. © 2017 Institute of Food Technologists®.
Neden, Catherine A; Parkin, Claire; Blow, Carol; Siriwardena, Aloysius Niroshan
2018-05-08
The aim of this study was to assess whether the absolute standard of candidates sitting the MRCGP Applied Knowledge Test (AKT) between 2011 and 2016 had changed. It is a descriptive study comparing the performance on marker questions of a reference group of UK graduates taking the AKT for the first time between 2011 and 2016. Using aggregated examination data, the performance of individual 'marker' questions was compared using Pearson's chi-squared tests and trend-line analysis. Binary logistic regression was used to analyse changes in performance over the study period. Changes in performance of individual marker questions using Pearson's chi-squared test showed statistically significant differences in 32 of the 49 questions included in the study. Trend line analysis showed a positive trend in 29 questions and a negative trend in the remaining 23. The magnitude of change was small. Logistic regression did not demonstrate any evidence for a change in the performance of the question set over the study period. However, candidates were more likely to get items on administration wrong compared with clinical medicine or research. There was no evidence of a change in performance of the question set as a whole.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes the overestimation of the regression parameters and increases the variance of these parameters. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to show the use of the RPCR and RSIMPLS methods on an econometric data set and hence to compare the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit by using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R2 value and the Robust Component Selection (RCS) statistic.
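A non-robust baseline comparison of the two families of methods discussed above, on synthetic collinear data (the robust RPCR/RSIMPLS variants are not implemented here): principal components regression as a PCA-plus-OLS pipeline versus PLS regression, compared by cross-validated RMSE.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(15)
latent = rng.normal(0, 1, (100, 3))
X = latent @ rng.normal(0, 1, (3, 20)) + rng.normal(0, 0.1, (100, 20))  # collinear predictors
y = latent @ np.array([1.0, -0.5, 0.3]) + rng.normal(0, 0.2, 100)

pcr = make_pipeline(PCA(n_components=3), LinearRegression())            # PCR
pls = PLSRegression(n_components=3)                                     # PLS
for name, model in [("PCR", pcr), ("PLS", pls)]:
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(name, "cross-validated RMSE:", round(rmse, 3))
```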
Straub, D.E.
1998-01-01
The streamflow-gaging station network in Ohio was evaluated for its effectiveness in providing regional streamflow information. The analysis involved application of the principles of generalized least squares regression between streamflow and climatic and basin characteristics. Regression equations were developed for three flow characteristics: (1) the instantaneous peak flow with a 100-year recurrence interval (P100), (2) the mean annual flow (Qa), and (3) the 7-day, 10-year low flow (7Q10). All active and discontinued gaging stations with 5 or more years of unregulated-streamflow data with respect to each flow characteristic were used to develop the regression equations. The gaging-station network was evaluated for the current (1996) condition of the network and estimated conditions of various network strategies if an additional 5 and 20 years of streamflow data were collected. Any active or discontinued gaging station with (1) less than 5 years of unregulated-streamflow record, (2) previously defined basin and climatic characteristics, and (3) the potential for collection of more unregulated-streamflow record was included in the network strategies involving the additional 5 and 20 years of data. The network analysis involved use of the regression equations, in combination with location, period of record, and cost of operation, to determine the contribution of the data for each gaging station to regional streamflow information. The contribution of each gaging station was based on a cost-weighted reduction of the mean square error (average sampling-error variance) associated with each regional estimating equation. All gaging stations included in the network analysis were then ranked according to their contribution to the regional information for each flow characteristic. The predictive ability of the regression equations developed from the gaging-station network could be improved for all three flow characteristics with the collection of additional streamflow data. The addition of new gaging stations to the network would result in an even greater improvement in the accuracy of the regional regression equations. Typically, continued data collection at stations with unregulated streamflow for all flow conditions, less than 11 years of record, and drainage areas smaller than 200 square miles contributed the largest cost-weighted reduction to the average sampling-error variance of the regional estimating equations. The results of the network analyses can be used to prioritize the continued operation of active gaging stations or the reactivation of discontinued gaging stations if the objective is to maximize the regional information content in the streamflow-gaging station network.
Developing a dengue forecast model using machine learning: A case study in China
Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-01-01
Background: In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Methodology/Principal findings: Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011–2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. Conclusion and significance: The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics. PMID:29036169
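A toy support vector regression forecast in the spirit of the study above, using a synthetic weekly series and simple lagged predictors (this is not the authors' feature set, tuning, or validation scheme):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(16)
weeks = np.arange(208)                                  # four synthetic years of weekly counts
cases = 50 + 40 * np.sin(2 * np.pi * weeks / 52) + rng.normal(0, 5, weeks.size)

lags = 4
X = np.column_stack([cases[i:i + len(cases) - lags] for i in range(lags)])  # lagged cases
y = cases[lags:]
split = 156                                             # first three years for training

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=1.0))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))
print("hold-out RMSE:", round(rmse, 2))
```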
Hemmila, April; McGill, Jim; Ritter, David
2008-03-01
To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
Analysis of Learning Curve Fitting Techniques.
1987-09-01
Random errors are assumed to be normally distributed when using ordinary least-squares regression (Johnston). For a more detailed explanation of the ordinary least-squares technique, see Neter et al., Applied Linear Regression Models (Homewood, IL: Irwin), and the SAS User's Guide: Basics, Version 5 Edition.
ERIC Educational Resources Information Center
Rule, David L.
Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…
Doble, Brett; Lorgelly, Paula
2016-04-01
To determine the external validity of existing mapping algorithms for predicting EQ-5D-3L utility values from EORTC QLQ-C30 responses and to establish their generalizability in different types of cancer. A main analysis (pooled) sample of 3560 observations (1727 patients) and two disease severity patient samples (496 and 93 patients) with repeated observations over time from Cancer 2015 were used to validate the existing algorithms. Errors were calculated between observed and predicted EQ-5D-3L utility values using a single pooled sample and ten pooled tumour type-specific samples. Predictive accuracy was assessed using mean absolute error (MAE) and standardized root-mean-squared error (RMSE). The association between observed and predicted EQ-5D utility values and other covariates across the distribution was tested using quantile regression. Quality-adjusted life years (QALYs) were calculated using observed and predicted values to test responsiveness. Ten 'preferred' mapping algorithms were identified. Two algorithms estimated via response mapping and ordinary least-squares regression using dummy variables performed well on number of validation criteria, including accurate prediction of the best and worst QLQ-C30 health states, predicted values within the EQ-5D tariff range, relatively small MAEs and RMSEs, and minimal differences between estimated QALYs. Comparison of predictive accuracy across ten tumour type-specific samples highlighted that algorithms are relatively insensitive to grouping by tumour type and affected more by differences in disease severity. Two of the 'preferred' mapping algorithms suggest more accurate predictions, but limitations exist. We recommend extensive scenario analyses if mapped utilities are used in cost-utility analyses.
First Trimester Prenatal Care Initiation Among Hispanic Women Along the U.S.-Mexico Border.
Selchau, Katherine; Babuca, Maricela; Bower, Kara; Castro, Yara; Coakley, Eugenie; Flores, Araceli; Garcia, Jonah O; Reyes, Maria Lourdes F; Rojas, Yvonne; Rubin, Jason; Samuels, Deanne; Shattuck, Laura
2017-12-01
Background First trimester prenatal care (FTPNC) is associated with improved birth outcomes. U.S.-Mexico border Hispanic women have lower FTPNC than non-border or non-Hispanic women. This study aimed to identify (1) what demographic, knowledge and care-seeking factors influence FTPNC among Hispanic women in border counties served by five Healthy Start sites, and (2) what FTPNC barriers may be unique to this target population. Healthy Starts work to eliminate disparities in perinatal health in areas with high poverty and poor birth outcomes. Methods 403 Hispanic women of reproductive age in border communities of California, Arizona, New Mexico and Texas were surveyed on knowledge and behaviors related to prenatal care (PNC) and basic demographic information. Chi square analyses and logistic regressions were used to identify important relationships. Results Chi square analyses revealed that primiparous women were significantly less likely to start FTPNC than multiparous women (χ² = 6.8372, p = 0.0089). Women with accurate knowledge about FTPNC were more likely to obtain FTPNC (χ² = 29.280, p < .001) and more likely to have seen a doctor within the past year (χ² = 5.550, p = .018). Logistic regression confirmed that multiparity was associated with FTPNC and also that living in Texas was negatively associated with FTPNC (R² = 0.066, F(9,340) = 2.662, p = .005). Among 27 women with non-FTPNC, barriers included late pregnancy recognition (n = 19) and no medical insurance (n = 5). Conclusions This study supports research that first time pregnancies have lower FTPNC, and demonstrated a strong association between delayed PNC and late pregnancy recognition. Strengthened investments in preconception planning could improve FTPNC in this population.
Motulsky, Harvey J; Brown, Ronald E
2006-01-01
Background Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949
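A rough sketch of the ROUT idea under stated assumptions (not the authors' implementation): a robust fit with a Cauchy/Lorentzian-type loss, outlier flagging with a Benjamini-Hochberg false discovery rate rule applied to the robust residuals, and an ordinary least-squares refit of the cleaned data; the example curve and noise model are hypothetical.

    import numpy as np
    from scipy.optimize import least_squares
    from scipy.stats import norm
    from statsmodels.stats.multitest import multipletests

    def model(p, x):                      # example curve: one-phase exponential decay
        a, k, c = p
        return a * np.exp(-k * x) + c

    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 60)
    y = model([5.0, 0.6, 1.0], x) + rng.normal(0, 0.2, x.size)
    y[[7, 30]] += 3.0                     # two gross outliers

    resid = lambda p: model(p, x) - y
    p_robust = least_squares(resid, x0=[4, 1, 0], loss="cauchy", f_scale=0.2).x

    r = resid(p_robust)
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))   # robust residual scale (MAD)
    pvals = 2 * norm.sf(np.abs(r) / scale)                 # approximate two-sided p-values
    keep = ~multipletests(pvals, alpha=0.01, method="fdr_bh")[0]

    p_ols = least_squares(lambda p: model(p, x[keep]) - y[keep], x0=p_robust).x
    print("flagged outliers:", np.where(~keep)[0], "OLS fit:", np.round(p_ols, 3))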
Ridge: a computer program for calculating ridge regression estimates
Donald E. Hilt; Donald W. Seegrist
1977-01-01
Least-squares coefficients for multiple-regression models may be unstable when the independent variables are highly correlated. Ridge regression is a biased estimation procedure that produces stable estimates of the coefficients. Ridge regression is discussed, and a computer program for calculating the ridge coefficients is presented.
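The ridge trace produced by such a program can be approximated in a few lines; the sketch below, on simulated collinear predictors, computes coefficients over a grid of ridge constants k so that their stabilization can be inspected (a small k approximates ordinary least squares).

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(2)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(scale=0.05, size=200)     # nearly collinear predictor
    X = StandardScaler().fit_transform(np.column_stack([x1, x2, rng.normal(size=200)]))
    y = 2 * x1 - 1 * x2 + rng.normal(size=200)

    for k in [0.001, 0.01, 0.1, 1.0, 10.0]:        # grid of ridge constants
        coefs = Ridge(alpha=k).fit(X, y).coef_
        print(f"k = {k:>6}: ", np.round(coefs, 3))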
Moradi, Ali; Rahmani, Khaled; Kavousi, Amir; Eshghabadi, Farshid; Nematollahi, Shahrzad; Zainni, Slahedyn; Soori, Hamid
2018-02-20
The aim of this study was to geographically analyse pedestrian traffic casualties in downtown Tehran City. The study population consisted of pedestrians involved in traffic injury accidents from April 2014 to March 2015 in Tehran City. Data were extracted from the offices of the traffic police and the municipality. For analysis of environmental factors and accident sites, Ordinary Least Squares (OLS) regression models and Geographically Weighted Regression (GWR) were used. A total of 514 pedestrian accidents were assessed; the accident sites were arterial streets in 370 cases (71.9%), collector streets in 133 cases (25.2%), and highways in 11 cases (2.1%). The geographical units of pedestrian traffic accidents showed a statistically significant relationship with the number of bus stations, the number of crossroads, and recreational areas. Neighbourhoods close to markets were identified as the most dangerous places for traffic injuries.
Prediction of elemental creep. [steady state and cyclic data from regression analysis
NASA Technical Reports Server (NTRS)
Davis, J. W.; Rummler, D. R.
1975-01-01
Cyclic and steady-state creep tests were performed to provide data which were used to develop predictive equations. These equations, describing creep as a function of stress, temperature, and time, were developed through the use of a least squares regression analysis computer program for both the steady-state and cyclic data sets. Comparison of the data from the two types of tests revealed that there was no significant difference between the cyclic and steady-state creep strains for the L-605 sheet under the experimental conditions investigated (for the same total time at load). Attempts to develop a single linear equation describing the combined steady-state and cyclic creep data resulted in standard errors of estimate higher than those obtained for the individual data sets. A proposed approach to predict elemental creep in metals uses the cyclic creep equation and a computer program which applies strain- and time-hardening theories of creep accumulation.
Explaining cross-national differences in marriage, cohabitation, and divorce in Europe, 1990-2000.
Kalmijn, Matthijs
2007-11-01
European countries differ considerably in their marriage patterns. The study presented in this paper describes these differences for the 1990s and attempts to explain them from a macro-level perspective. We find that different indicators of marriage (i.e., marriage rate, age at marriage, divorce rate, and prevalence of unmarried cohabitation) cannot be seen as indicators of an underlying concept such as the 'strength of marriage'. Multivariate ordinary least squares (OLS) regression analyses are estimated with countries as units and panel regression models are estimated in which annual time series for multiple countries are pooled. Using these models, we find that popular explanations of trends in the indicators - explanations that focus on gender roles, secularization, unemployment, and educational expansion - are also important for understanding differences among countries. We also find evidence for the role of historical continuity and societal disintegration in understanding cross-national differences.
Miller, Arthur L.; Weakley, Andrew Todd; Griffiths, Peter R.; Cauda, Emanuele G.; Bayman, Sean
2017-01-01
In order to help reduce silicosis in miners, the National Institute for Occupational Health and Safety (NIOSH) is developing field-portable methods for measuring airborne respirable crystalline silica (RCS), specifically the polymorph α-quartz, in mine dusts. In this study we demonstrate the feasibility of end-of-shift measurement of α-quartz using a direct-on-filter (DoF) method to analyze coal mine dust samples deposited onto polyvinyl chloride filters. The DoF method is potentially amenable for on-site analyses, but deviates from the current regulatory determination of RCS for coal mines by eliminating two sample preparation steps: ashing the sampling filter and redepositing the ash prior to quantification by Fourier transform infrared (FT-IR) spectrometry. In this study, the FT-IR spectra of 66 coal dust samples from active mines were used, and the RCS was quantified by using: (1) an ordinary least squares (OLS) calibration approach that utilizes standard silica material as done in the Mine Safety and Health Administration's P7 method; and (2) a partial least squares (PLS) regression approach. Both were capable of accounting for kaolinite, which can confound the IR analysis of silica. The OLS method utilized analytical standards for silica calibration and kaolin correction, resulting in a good linear correlation with P7 results and minimal bias but with the accuracy limited by the presence of kaolinite. The PLS approach also produced predictions well-correlated to the P7 method, as well as better accuracy in RCS prediction, and no bias due to variable kaolinite mass. Besides decreased sensitivity to mineral or substrate confounders, PLS has the advantage that the analyst is not required to correct for the presence of kaolinite or background interferences related to the substrate, making the method potentially viable for automated RCS prediction in the field. This study demonstrated the efficacy of FT-IR transmission spectrometry for silica determination in coal mine dusts, using both OLS and PLS analyses, when kaolinite was present. PMID:27645724
Hage, Camilla; Michaëlsson, Erik; Linde, Cecilia; Donal, Erwan; Daubert, Jean-Claude; Gan, Li-Ming; Lund, Lars H
2017-02-01
Underlying mechanisms in heart failure (HF) with preserved ejection fraction remain unknown. We investigated cardiovascular plasma biomarkers in HF with preserved ejection fraction and their correlation to diastolic dysfunction, functional class, pathophysiological processes, and prognosis. In 86 stable patients with HF and EF ≥45% in the Karolinska Rennes (KaRen) biomarker substudy, biomarkers were quantified by a multiplex immunoassay. Orthogonal projection to latent structures by partial least square analysis was performed on 87 biomarkers and 240 clinical variables, ranking biomarkers associated with New York Heart Association (NYHA) Functional class and the composite outcome (all-cause mortality and HF hospitalization). Biomarkers significantly correlated with outcome were analyzed by multivariable Cox regression and correlations with echocardiographic measurements performed. The orthogonal partial least square outcome-predicting biomarker pattern was run against the Ingenuity Pathway Analysis (IPA) database, containing annotated data from the public domain. The orthogonal partial least square analyses identified 32 biomarkers correlated with NYHA class and 28 predicting outcomes. Among outcome-predicting biomarkers, growth/differentiation factor-15 was the strongest and an additional 7 were also significant in Cox regression analyses when adjusted for age, sex, and N-terminal probrain natriuretic peptide: adrenomedullin (hazard ratio per log increase 2.53), agouti-related protein; (1.48), chitinase-3-like protein 1 (1.35), C-C motif chemokine 20 (1.35), fatty acid-binding protein (1.33), tumor necrosis factor receptor 1 (2.29), and TNF-related apoptosis-inducing ligand (0.34). Twenty-three of them correlated with diastolic dysfunction (E/e') and 5 with left atrial volume index. The IPA suggested that increased inflammation, immune activation with decreased necrosis and apoptosis preceded poor outcome. In HF with preserved ejection fraction, novel biomarkers of inflammation predict HF severity and prognosis that may complement or even outperform traditional markers, such as N-terminal probrain natriuretic peptide. These findings lend support to a hypothesis implicating global systemic inflammation in HF with preserved ejection fraction. URL: http://www.clinicaltrials.gov; Unique identifier: NCT00774709. © 2017 American Heart Association, Inc.
An improved partial least-squares regression method for Raman spectroscopy
NASA Astrophysics Data System (ADS)
Momenpour Tehran Monfared, Ali; Anis, Hanan
2017-10-01
It is known that the performance of partial least-squares (PLS) regression analysis can be improved using the backward variable selection method (BVSPLS). In this paper, we further improve the BVSPLS based on a novel selection mechanism. The proposed method is based on sorting the weighted regression coefficients, and then the importance of each variable in the sorted list is evaluated using the root mean square error of prediction (RMSEP) criterion at each iteration step. Our Improved BVSPLS (IBVSPLS) method has been applied to leukemia and heparin data sets and led to an improvement in the limit of detection of Raman biosensing ranging from 10% to 43% compared to PLS. Our IBVSPLS was also compared to the (simpler) jack-knifing and (more complex) genetic algorithm methods. Our method was consistently better than the jack-knifing method and showed either a similar or a better performance compared to the genetic algorithm.
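A simplified sketch of coefficient-guided backward elimination for PLS on simulated spectra: the variable with the smallest absolute PLS coefficient is dropped at each step and the subset with the lowest cross-validated RMSEP is kept. This is a plain illustration of the general BVSPLS idea, not the authors' IBVSPLS weighting and sorting scheme.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(3)
    X = rng.normal(size=(120, 50))                       # hypothetical spectra
    y = X[:, :5] @ np.array([3.0, -2.0, 1.5, 1.0, -1.0]) + rng.normal(scale=0.5, size=120)

    def rmsep(X_sub, y, ncomp=5):
        pls = PLSRegression(n_components=min(ncomp, X_sub.shape[1]))
        pred = cross_val_predict(pls, X_sub, y, cv=10).ravel()
        return np.sqrt(np.mean((pred - y) ** 2))

    selected = list(range(X.shape[1]))
    best_subset, best_err = list(selected), rmsep(X, y)
    while len(selected) > 5:
        pls = PLSRegression(n_components=min(5, len(selected))).fit(X[:, selected], y)
        weakest = np.argmin(np.abs(np.ravel(pls.coef_)))  # least informative variable
        selected.pop(int(weakest))
        err = rmsep(X[:, selected], y)
        if err < best_err:
            best_subset, best_err = list(selected), err
    print(f"{len(best_subset)} variables kept, RMSEP = {best_err:.3f}")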
Buchner, Florian; Wasem, Jürgen; Schillo, Sonja
2017-01-01
Risk equalization formulas have been refined since their introduction about two decades ago. Because of the complexity and the abundance of possible interactions between the variables used, hardly any interactions are considered. A regression tree is used to systematically search for interactions, a methodologically new approach in risk equalization. Analyses are based on a data set of nearly 2.9 million individuals from a major German social health insurer. A two-step approach is applied: in the first step a regression tree is built on the basis of the learning data set. Terminal nodes characterized by more than one morbidity-group split represent interaction effects of different morbidity groups. In the second step the 'traditional' weighted least squares regression equation is expanded by adding interaction terms for all interactions detected by the tree, and regression coefficients are recalculated. The resulting risk adjustment formula shows an improvement in the adjusted R² from 25.43% to 25.81% on the evaluation data set. Predictive ratios are calculated for subgroups affected by the interactions. The R² improvement detected is only marginal. According to the sample-level performance measures used, omitting a considerable number of morbidity interactions entails no relevant loss in accuracy. Copyright © 2015 John Wiley & Sons, Ltd.
2013-01-01
Background Malnutrition is one of the principal causes of child mortality in developing countries including Bangladesh. To our knowledge, most of the available studies that addressed the issue of malnutrition among under-five children considered categorical (dichotomous/polychotomous) outcome variables and applied logistic regression (binary/multinomial) to find their predictors. In this study the malnutrition variable (i.e. the outcome) is defined as the number of under-five malnourished children in a family, which is a non-negative count variable. The purposes of the study are (i) to demonstrate the applicability of the generalized Poisson regression (GPR) model as an alternative to other statistical methods and (ii) to find some predictors of this outcome variable. Methods The data are extracted from the Bangladesh Demographic and Health Survey (BDHS) 2007. Briefly, this survey employs a nationally representative sample based on a two-stage stratified sample of households. A total of 4,460 under-five children are analysed using various statistical techniques, namely the Chi-square test and the GPR model. Results The GPR model (as compared to the standard Poisson regression and negative binomial regression) is found to be justified for the above-mentioned outcome variable because of its under-dispersion (variance < mean) property. Our study also identifies several significant predictors of the outcome variable, namely mother's education, father's education, wealth index, sanitation status, source of drinking water, and total number of children ever born to a woman. Conclusions The consistency of our findings with many other studies suggests that the GPR model is an ideal alternative to other statistical models for analysing the number of under-five malnourished children in a family. Strategies based on the significant predictors may improve the nutritional status of children in Bangladesh. PMID:23297699
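A hedged sketch of a generalized Poisson fit for an under-dispersed count outcome, assuming the GeneralizedPoisson class available in recent statsmodels releases; the covariate names and counts below are simulated stand-ins, not BDHS 2007 data.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.discrete.discrete_model import GeneralizedPoisson  # recent statsmodels

    rng = np.random.default_rng(4)
    n = 2000
    df = pd.DataFrame({
        "mother_edu_years": rng.integers(0, 13, n),   # illustrative covariates
        "wealth_index": rng.normal(size=n),
    })
    mu = np.exp(0.2 - 0.05 * df["mother_edu_years"] - 0.3 * df["wealth_index"])
    df["n_malnourished"] = rng.poisson(mu)            # toy counts per family

    X = sm.add_constant(df[["mother_edu_years", "wealth_index"]])
    gp = GeneralizedPoisson(df["n_malnourished"], X).fit(disp=0)
    print(gp.summary())   # a negative dispersion parameter is consistent with under-dispersion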
Exact Analysis of Squared Cross-Validity Coefficient in Predictive Regression Models
ERIC Educational Resources Information Center
Shieh, Gwowen
2009-01-01
In regression analysis, the notion of population validity is of theoretical interest for describing the usefulness of the underlying regression model, whereas the presumably more important concept of population cross-validity represents the predictive effectiveness for the regression equation in future research. It appears that the inference…
Lien, Tonje G; Borgan, Ørnulf; Reppe, Sjur; Gautvik, Kaare; Glad, Ingrid Kristine
2018-03-07
Using high-dimensional penalized regression we studied genome-wide DNA-methylation in bone biopsies of 80 postmenopausal women in relation to their bone mineral density (BMD). The women showed BMD varying from severely osteoporotic to normal. Global gene expression data from the same individuals were available, and since DNA-methylation often affects gene expression, the overall aim of this paper was to include both of these omics data sets into an integrated analysis. The classical penalized regression uses one penalty, but we incorporated individual penalties for each of the DNA-methylation sites. These individual penalties were guided by the strength of association between DNA-methylations and gene transcript levels. DNA-methylations that were highly associated with one or more transcripts got lower penalties and were therefore favored compared to DNA-methylations showing less association with expression. Because of the complex pathways and interactions among genes, we investigated both the association between DNA-methylations and their corresponding cis gene and the association between DNA-methylations and trans-located genes. Two integrating penalized methods were used: first, an adaptive group-regularized ridge regression, and secondly, variable selection was performed through a modified version of the weighted lasso. When information from gene expressions was integrated, predictive performance was considerably improved, in terms of predictive mean square error, compared to classical penalized regression without data integration. We found a 14.7% improvement in the ridge regression case and a 17% improvement for the lasso case. Our version of the weighted lasso with data integration found a list of 22 interesting methylation sites. Several corresponded to genes that are known to be important in bone formation. Using BMD as response and these 22 methylation sites as covariates, least squares regression analyses resulted in R² = 0.726, comparable to an average R² = 0.438 for 10,000 randomly selected groups of DNA-methylations with group size 22. Two recent types of penalized regression methods were adapted to integrate DNA-methylation and its association with gene expression in the analysis of bone mineral density. In both cases predictions clearly benefit from including the additional information on gene expressions.
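A conceptual sketch of a weighted lasso with site-specific penalties, not the authors' adaptive group-regularized method: penalty weights are derived from the strength of the methylation-expression association, and feature-specific penalties are implemented by rescaling columns before an ordinary lasso fit. All matrices below are simulated.

    import numpy as np
    from sklearn.linear_model import LassoCV

    rng = np.random.default_rng(5)
    n, p = 80, 300
    M = rng.normal(size=(n, p))                      # methylation matrix (toy)
    expr = M[:, :20] @ rng.normal(size=20) + rng.normal(size=n)   # one expression trait
    bmd = M[:, :10] @ rng.normal(size=10) + rng.normal(size=n)    # outcome (BMD)

    assoc = np.abs(np.corrcoef(M.T, expr)[-1, :p])   # methylation-expression association
    w = 1.0 / (0.1 + assoc)                          # smaller penalty for associated sites
    X_tilde = M / w                                  # column rescaling encodes the weights

    fit = LassoCV(cv=10).fit(X_tilde, bmd)
    beta = fit.coef_ / w                             # map back to the original scale
    print("sites selected:", np.flatnonzero(beta).size)

The rescaling trick works because the lasso penalty applied to the rescaled coefficients is equivalent to a penalty of lambda times the weight on each original coefficient.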
Lithium in drinking water and suicide mortality.
Kapusta, Nestor D; Mossaheb, Nilufar; Etzersdorfer, Elmar; Hlavin, Gerald; Thau, Kenneth; Willeit, Matthäus; Praschak-Rieder, Nicole; Sonneck, Gernot; Leithner-Dziubas, Katharina
2011-05-01
There is some evidence that natural levels of lithium in drinking water may have a protective effect on suicide mortality. To evaluate the association between local lithium levels in drinking water and suicide mortality at district level in Austria. A nationwide sample of 6460 lithium measurements was examined for association with suicide rates per 100,000 population and suicide standardised mortality ratios across all 99 Austrian districts. Multivariate regression models were adjusted for well-known socioeconomic factors known to influence suicide mortality in Austria (population density, per capita income, proportion of Roman Catholics, as well as the availability of mental health service providers). Sensitivity analyses and weighted least squares regression were used to challenge the robustness of the results. The overall suicide rate (R(2) = 0.15, β = -0.39, t = -4.14, P = 0.000073) as well as the suicide mortality ratio (R(2) = 0.17, β = -0.41, t = -4.38, P = 0.000030) were inversely associated with lithium levels in drinking water and remained significant after sensitivity analyses and adjustment for socioeconomic factors. In replicating and extending previous results, this study provides strong evidence that geographic regions with higher natural lithium concentrations in drinking water are associated with lower suicide mortality rates.
Kulis, Stephen; Hodge, David R; Ayers, Stephanie L; Brown, Eddie F; Marsiglia, Flavio F
2012-09-01
This article explores the aspects of spirituality and religious involvement that may be the protective factors against substance use among urban American Indian (AI) youth. Data come from AI youth (N = 123) in five urban middle schools in a southwestern metropolis. Ordinary least squares regression analyses indicated that following Christian beliefs and belonging to the Native American Church were associated with lower levels of substance use. Following AI traditional spiritual beliefs was associated with antidrug attitudes, norms, and expectancies. Having a sense of belonging to traditions from both AI cultures and Christianity may foster integration of the two worlds in which urban AI youth live.
NASA Astrophysics Data System (ADS)
Abunama, Taher; Othman, Faridah
2017-06-01
Analysing the fluctuations of wastewater inflow rates in sewage treatment plants (STPs) is essential to guarantee a sufficient treatment of wastewater before discharging it to the environment. The main objectives of this study are to statistically analyze and forecast the wastewater inflow rates into the Bandar Tun Razak STP in Kuala Lumpur, Malaysia. A time series analysis of three years of weekly influent data (156 weeks) was conducted using the Auto-Regressive Integrated Moving Average (ARIMA) model. Various combinations of ARIMA orders (p, d, q) were tried to select the best-fitting model, which was then used to forecast the wastewater inflow rates. Linear regression analysis was applied to test the correlation between the observed and predicted influents. The ARIMA (3, 1, 3) model was selected on the basis of the highest R-square and the lowest normalized Bayesian Information Criterion (BIC) value, and accordingly the wastewater inflow rates were forecast for an additional 52 weeks. The linear regression analysis between the observed and predicted values of the wastewater inflow rates showed a positive linear correlation with a coefficient of 0.831.
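The order-selection step can be sketched as follows, assuming a simulated weekly inflow series: candidate ARIMA(p, d, q) models are compared by BIC and the selected model is used to forecast 52 additional weeks.

    import itertools
    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(6)
    inflow = pd.Series(50 + np.cumsum(rng.normal(scale=2, size=156)))  # 156 weeks (toy)

    best = None
    for p, d, q in itertools.product(range(4), [1], range(4)):
        try:
            res = ARIMA(inflow, order=(p, d, q)).fit()
        except Exception:
            continue
        if best is None or res.bic < best[0]:
            best = (res.bic, (p, d, q), res)

    bic, order, res = best
    forecast = res.forecast(steps=52)      # additional 52 weeks of inflow
    print("selected order:", order, "BIC:", round(bic, 1))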
Bhamidipati, Ravi Kanth; Syed, Muzeeb; Mullangi, Ramesh; Srinivas, Nuggehally
2018-02-01
1. Dalbavancin, a lipoglycopeptide, is approved for treating gram-positive bacterial infections. The area under the plasma concentration versus time curve (AUCinf) of dalbavancin is a key parameter, and the AUCinf/MIC ratio is a critical pharmacodynamic marker. 2. Using the end-of-intravenous-infusion concentration (i.e. Cmax), the Cmax versus AUCinf relationship for dalbavancin was established by regression analyses (i.e. linear, log-log, log-linear and power models) using 21 pairs of subject data. 3. Predictions of AUCinf were performed using published Cmax data by application of the regression equations. The quotient of observed/predicted values gave the fold difference. The mean absolute error (MAE)/root mean square error (RMSE) and correlation coefficient (r) were used in the assessment. 4. MAE and RMSE values for the various models were comparable. Cmax versus AUCinf exhibited excellent correlation (r > 0.9488). The internal data evaluation showed narrow confinement (0.84-1.14-fold difference) with an RMSE < 10.3%. The external data evaluation showed that the models predicted AUCinf with an RMSE of 3.02-27.46%, with fold differences largely contained within 0.64-1.48. 5. Regardless of the regression model, a single-time-point strategy using Cmax (i.e. end of the 30-min infusion) is amenable as a prospective tool for predicting the AUCinf of dalbavancin in patients.
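An illustrative sketch of the model forms mentioned, fitted to simulated Cmax-AUCinf pairs rather than the study data; the power model is the back-transformed log-log fit.

    import numpy as np

    rng = np.random.default_rng(7)
    cmax = rng.uniform(200, 400, 21)                      # 21 hypothetical subjects
    auc = 30 * cmax ** 1.05 * np.exp(rng.normal(0, 0.05, 21))

    b1, b0 = np.polyfit(cmax, auc, 1)                     # linear: AUC = b0 + b1*Cmax
    g1, g0 = np.polyfit(cmax, np.log(auc), 1)             # log-linear: ln AUC = g0 + g1*Cmax
    h1, h0 = np.polyfit(np.log(cmax), np.log(auc), 1)     # log-log / power: AUC = e^h0 * Cmax^h1

    pred = {"linear": b0 + b1 * cmax,
            "log-linear": np.exp(g0 + g1 * cmax),
            "power": np.exp(h0) * cmax ** h1}
    for name, p in pred.items():
        fold = auc / p                                    # observed/predicted fold difference
        rmse_pct = 100 * np.sqrt(np.mean((p - auc) ** 2)) / auc.mean()
        print(f"{name:10s} fold range {fold.min():.2f}-{fold.max():.2f}, RMSE {rmse_pct:.1f}%")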
Sando, Roy; Chase, Katherine J.
2017-03-23
A common statistical procedure for estimating streamflow statistics at ungaged locations is to develop a relational model between streamflow and drainage basin characteristics at gaged locations using least squares regression analysis; however, least squares regression methods are parametric and make constraining assumptions about the data distribution. The random forest regression method provides an alternative nonparametric method for estimating streamflow characteristics at ungaged sites and requires that the data meet fewer statistical conditions than least squares regression methods. Random forest regression analysis was used to develop predictive models for 89 streamflow characteristics using Precipitation-Runoff Modeling System simulated streamflow data and drainage basin characteristics at 179 sites in central and eastern Montana. The predictive models were developed from streamflow data simulated for current (baseline, water years 1982–99) conditions and three future periods (water years 2021–38, 2046–63, and 2071–88) under three different climate-change scenarios. These predictive models were then used to predict streamflow characteristics for baseline conditions and three future periods at 1,707 fish sampling sites in central and eastern Montana. The average root mean square error for all predictive models was about 50 percent. When streamflow predictions at 23 fish sampling sites were compared to nearby locations with simulated data, the mean relative percent difference was about 43 percent. When predictions were compared to streamflow data recorded at 21 U.S. Geological Survey streamflow-gaging stations outside of the calibration basins, the average mean absolute percent error was about 73 percent.
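A minimal sketch of the nonparametric alternative described above: a random forest regression of one streamflow characteristic on a few basin characteristics, evaluated by root mean square error on held-out sites. The predictors and their relationship to streamflow are invented for illustration.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(8)
    n_sites = 179
    basin = np.column_stack([
        rng.lognormal(3, 1, n_sites),        # drainage area (toy)
        rng.uniform(300, 2000, n_sites),     # mean basin elevation (toy)
        rng.uniform(250, 600, n_sites),      # mean annual precipitation (toy)
    ])
    mean_flow = 0.05 * basin[:, 0] * (basin[:, 2] / 400) + rng.normal(scale=2, size=n_sites)

    Xtr, Xte, ytr, yte = train_test_split(basin, mean_flow, test_size=0.25, random_state=0)
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(Xtr, ytr)
    rmse = np.sqrt(np.mean((rf.predict(Xte) - yte) ** 2))
    print(f"hold-out RMSE: {rmse:.2f} (same units as the streamflow characteristic)")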
Parsaeian, M; Mohammad, K; Mahmoudi, M; Zeraati, H
2012-01-01
Background: The purpose of this investigation was to empirically compare the predictive ability of an artificial neural network with that of a logistic regression in the prediction of low back pain. Methods: Data from the second national health survey were considered in this investigation. These data include information on low back pain and its associated risk factors among Iranian people aged 15 years and older. Artificial neural network and logistic regression models were developed using a training set of 17,294 records and validated on a test set of 17,295 records. The Hosmer and Lemeshow recommendation for model selection was used in fitting the logistic regression. A three-layer perceptron with 9 input, 3 hidden and 1 output neurons was employed. The efficiency of the two models was compared by receiver operating characteristic analysis, root mean square and -2 log-likelihood criteria. Results: The area under the ROC curve (SE), root mean square and -2 log-likelihood of the logistic regression were 0.752 (0.004), 0.3832 and 14769.2, respectively. The area under the ROC curve (SE), root mean square and -2 log-likelihood of the artificial neural network were 0.754 (0.004), 0.3770 and 14757.6, respectively. Conclusions: Based on these three criteria, the artificial neural network would give better performance than logistic regression. Although the difference is statistically significant, it does not seem to be clinically significant. PMID:23113198
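A hedged sketch of the comparison, on simulated data rather than the survey records: a logistic regression and a small multilayer perceptron with 3 hidden neurons are compared by area under the ROC curve on a held-out set.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(9)
    n = 4000
    X = np.column_stack([rng.normal(size=n), rng.integers(0, 2, n), rng.normal(size=n)])
    logit_p = 1 / (1 + np.exp(-(-1.0 + 0.8 * X[:, 0] + 0.5 * X[:, 1])))
    y = rng.binomial(1, logit_p)

    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)
    lr = LogisticRegression(max_iter=1000).fit(Xtr, ytr)
    nn = MLPClassifier(hidden_layer_sizes=(3,), max_iter=2000, random_state=0).fit(Xtr, ytr)
    print("AUC logistic:", round(roc_auc_score(yte, lr.predict_proba(Xte)[:, 1]), 3))
    print("AUC neural net:", round(roc_auc_score(yte, nn.predict_proba(Xte)[:, 1]), 3))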
Gaskin, Cadeyrn J; Happell, Brenda
2014-05-01
To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. The use, reporting, and interpretation of inferential statistics in nursing research need substantial improvement. Most importantly, researchers should abandon the misleading practice of interpreting the results from inferential tests based solely on whether they are statistically significant (or not) and, instead, focus on reporting and interpreting effect sizes, confidence intervals, and significance levels. Nursing researchers also need to conduct and report a priori power analyses, and to address the issue of Type I experiment-wise error inflation in their studies. Crown Copyright © 2013. Published by Elsevier Ltd. All rights reserved.
ERIC Educational Resources Information Center
Liou, Pey-Yan
2009-01-01
The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
Mapping health outcome measures from a stroke registry to EQ-5D weights.
Ghatnekar, Ola; Eriksson, Marie; Glader, Eva-Lotta
2013-03-07
To map health outcome related variables from a national register, not part of any validated instrument, with EQ-5D weights among stroke patients. We used two cross-sectional data sets including patient characteristics, outcome variables and EQ-5D weights from the national Swedish stroke register. Three regression techniques were used on the estimation set (n=272): ordinary least squares (OLS), Tobit, and censored least absolute deviation (CLAD). The regression coefficients for "dressing", "toileting", "mobility", "mood", "general health" and "proxy-responders" were applied to the validation set (n=272), and the performance was analysed with mean absolute error (MAE) and mean square error (MSE). The number of statistically significant coefficients varied by model, but all models generated consistent coefficients in terms of sign. Mean utility was underestimated in all models (least in OLS) and with lower variation (least in OLS) compared to the observed. The maximum attainable EQ-5D weight ranged from 0.90 (OLS) to 1.00 (Tobit and CLAD). Health states with utility weights <0.5 had greater errors than those with weights ≥ 0.5 (P<0.01). This study indicates that it is possible to map non-validated health outcome measures from a stroke register into preference-based utilities to study the development of stroke care over time, and to compare with other conditions in terms of utility.
Balabin, Roman M; Smirnov, Sergey V
2011-04-29
During the past several years, near-infrared (near-IR/NIR) spectroscopy has increasingly been adopted as an analytical tool in various fields from petroleum to biomedical sectors. The NIR spectrum (above 4000 cm(-1)) of a sample is typically measured by modern instruments at a few hundred wavelengths. Recently, considerable effort has been directed towards developing procedures to identify variables (wavelengths) that contribute useful information. Variable selection (VS) or feature selection, also called frequency selection or wavelength selection, is a critical step in data analysis for vibrational spectroscopy (infrared, Raman, or NIRS). In this paper, we compare the performance of 16 different feature selection methods for the prediction of properties of biodiesel fuel, including density, viscosity, methanol content, and water concentration. The feature selection algorithms tested include stepwise multiple linear regression (MLR-step), interval partial least squares regression (iPLS), backward iPLS (BiPLS), forward iPLS (FiPLS), moving window partial least squares regression (MWPLS), (modified) changeable size moving window partial least squares (CSMWPLS/MCSMWPLSR), searching combination moving window partial least squares (SCMWPLS), successive projections algorithm (SPA), uninformative variable elimination (UVE, including UVE-SPA), simulated annealing (SA), back-propagation artificial neural networks (BP-ANN), Kohonen artificial neural network (K-ANN), and genetic algorithms (GAs, including GA-iPLS). Two linear techniques for calibration model building, namely multiple linear regression (MLR) and partial least squares regression/projection to latent structures (PLS/PLSR), are used for the evaluation of biofuel properties. A comparison with a non-linear calibration model, artificial neural networks (ANN-MLP), is also provided. Discussion of gasoline, ethanol-gasoline (bioethanol), and diesel fuel data is presented. The results of applying other spectroscopic techniques, such as Raman, ultraviolet-visible (UV-vis), or nuclear magnetic resonance (NMR) spectroscopy, can also be greatly improved by an appropriate choice of feature selection. Copyright © 2011 Elsevier B.V. All rights reserved.
Using Quantile and Asymmetric Least Squares Regression for Optimal Risk Adjustment.
Lorenz, Normann
2017-06-01
In this paper, we analyze optimal risk adjustment for direct risk selection (DRS). Integrating insurers' activities for risk selection into a discrete choice model of individuals' health insurance choice shows that DRS has the structure of a contest. For the contest success function (csf) used in most of the contest literature (the Tullock-csf), optimal transfers for a risk adjustment scheme have to be determined by means of a restricted quantile regression, irrespective of whether insurers are primarily engaged in positive DRS (attracting low risks) or negative DRS (repelling high risks). This is at odds with the common practice of determining transfers by means of a least squares regression. However, this common practice can be rationalized for a new csf, but only if positive and negative DRSs are equally important; if they are not, optimal transfers have to be calculated by means of a restricted asymmetric least squares regression. Using data from German and Swiss health insurers, we find considerable differences between the three types of regressions. Optimal transfers therefore critically depend on which csf represents insurers' incentives for DRS and, if it is not the Tullock-csf, whether insurers are primarily engaged in positive or negative DRS. Copyright © 2016 John Wiley & Sons, Ltd.
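A sketch of the two estimators contrasted above, on simulated cost data: a median (quantile) regression via statsmodels QuantReg, and an asymmetric least squares (expectile) regression fitted by iteratively reweighted least squares. The risk adjusters and the asymmetry parameter tau are illustrative assumptions, not the German or Swiss insurer data.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.regression.quantile_regression import QuantReg

    rng = np.random.default_rng(10)
    n = 3000
    risk = np.column_stack([rng.integers(0, 2, n), rng.normal(size=n)])   # toy risk adjusters
    cost = 1000 + 2500 * risk[:, 0] + 300 * risk[:, 1] + rng.gamma(2, 400, n)
    X = sm.add_constant(risk)

    q_fit = QuantReg(cost, X).fit(q=0.5)           # median (quantile) regression

    def expectile_fit(y, X, tau=0.8, n_iter=50):
        """Asymmetric least squares (expectile) regression via IRLS."""
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        for _ in range(n_iter):
            w = np.where(y - X @ beta > 0, tau, 1 - tau)   # asymmetric weights
            sw = np.sqrt(w)
            beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        return beta

    e_fit = expectile_fit(cost, X, tau=0.8)
    print("median regression:  ", np.round(q_fit.params, 1))
    print("0.8-expectile (ALS):", np.round(e_fit, 1))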
Weaver, J. Curtis; Feaster, Toby D.; Gotvald, Anthony J.
2009-01-01
Reliable estimates of the magnitude and frequency of floods are required for the economical and safe design of transportation and water-conveyance structures. A multistate approach was used to update methods for estimating the magnitude and frequency of floods in rural, ungaged basins in North Carolina, South Carolina, and Georgia that are not substantially affected by regulation, tidal fluctuations, or urban development. In North Carolina, annual peak-flow data available through September 2006 were available for 584 sites; 402 of these sites had a total of 10 or more years of systematic record that is required for at-site, flood-frequency analysis. Following data reviews and the computation of 20 physical and climatic basin characteristics for each station as well as at-site flood-frequency statistics, annual peak-flow data were identified for 363 sites in North Carolina suitable for use in this analysis. Among these 363 sites, 19 sites had records that could be divided into unregulated and regulated/ channelized annual peak discharges, which means peak-flow records were identified for a total of 382 cases in North Carolina. Considering the 382 cases, at-site flood-frequency statistics are provided for 333 unregulated cases (also used for the regression database) and 49 regulated/channelized cases. The flood-frequency statistics for the 333 unregulated sites were combined with data for sites from South Carolina, Georgia, and adjacent parts of Alabama, Florida, Tennessee, and Virginia to create a database of 943 sites considered for use in the regional regression analysis. Flood-frequency statistics were computed by fitting logarithms (base 10) of the annual peak flows to a log-Pearson Type III distribution. As part of the computation process, a new generalized skew coefficient was developed by using a Bayesian generalized least-squares regression model. Exploratory regression analyses using ordinary least-squares regression completed on the initial database of 943 sites resulted in defining five hydrologic regions for North Carolina, South Carolina, and Georgia. Stations with drainage areas less than 1 square mile were removed from the database, and a procedure to examine for basin redundancy (based on drainage area and periods of record) also resulted in the removal of some stations from the regression database. Flood-frequency estimates and basin characteristics for 828 gaged stations were combined to form the final database that was used in the regional regression analysis. Regional regression analysis, using generalized least-squares regression, was used to develop a set of predictive equations that can be used for estimating the 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent chance exceedance flows for rural ungaged, basins in North Carolina, South Carolina, and Georgia. The final predictive equations are all functions of drainage area and the percentage of drainage basin within each of the five hydrologic regions. Average errors of prediction for these regression equations range from 34.0 to 47.7 percent. Discharge estimates determined from the systematic records for the current study are, on average, larger in magnitude than those from a previous study for the highest percent chance exceedances (50 and 20 percent) and tend to be smaller than those from the previous study for the lower percent chance exceedances when all sites are considered as a group. 
For example, mean differences for sites in the Piedmont hydrologic region range from positive 0.5 percent for the 50-percent chance exceedance flow to negative 4.6 percent for the 0.2-percent chance exceedance flow when stations are grouped by hydrologic region. Similarly for the same hydrologic region, median differences range from positive 0.9 percent for the 50-percent chance exceedance flow to negative 7.1 percent for the 0.2-percent chance exceedance flow. However, mean and median percentage differences between the estimates from the previous and curre
NASA Astrophysics Data System (ADS)
Denli, H. H.; Durmus, B.
2016-12-01
The purpose of this study is to examine the factors that may affect apartment prices using multiple linear regression models and to visualize the results with value maps. The study focuses on a county of Istanbul, Turkey. In total, 390 apartments around the county of Umraniye are evaluated according to their physical and locational conditions. Identifying the factors affecting apartment prices in this county, which has a population of approximately 600,000, is expected to make a significant contribution to the apartment market. The physical factors selected are the age, number of rooms, size, and number of floors of the building, and the floor on which the apartment is located. The locational factors selected are the distances to the nearest hospital, school, park, and police station. In total, ten physical and locational parameters are examined by regression analysis. After the regression analysis is performed, value maps are composed for age, price, and price per square meter. The most informative of these maps is the price-per-square-meter map. The results show that the location of the apartment has the greatest influence on its price per square meter. A further application is developed from the composed maps by exploring the use of the price-per-square-meter map in urban transformation planning. By marking buildings older than 15 years on the price-per-square-meter map, a new interpretation can be made to determine which buildings should be given priority during an urban transformation in the county. This county is very close to the North Anatolian Fault zone and is under threat of earthquakes. By marking apartments older than 15 years on the price-per-square-meter map, a list of apartments that are both older and have expensive square-meter prices can be compiled. With the help of this list, priority could be given to the selected higher-valued old apartments to support the economy of the country in the event of earthquake losses. We may call this earthquake-based urban transformation.
NASA Astrophysics Data System (ADS)
Musa, Rosliza; Ali, Zalila; Baharum, Adam; Nor, Norlida Mohd
2017-08-01
The linear regression model assumes that all random error components are identically and independently distributed with constant variance. Hence, each data point provides equally precise information about the deterministic part of the total variation. In other words, the standard deviations of the error terms are constant over all values of the predictor variables. When the assumption of constant variance is violated, the ordinary least squares estimator of the regression coefficients loses its minimum-variance property in the class of linear unbiased estimators. Weighted least squares estimation is often used to maximize the efficiency of parameter estimation. A procedure that treats all of the data equally would give less precisely measured points more influence than they should have and would give highly precise points too little influence. Optimizing the weighted fitting criterion to find the parameter estimates allows the weights to determine the contribution of each observation to the final parameter estimates. This study used a polynomial model with weighted least squares estimation to investigate the paddy production of different paddy lots based on paddy cultivation characteristics and environmental characteristics in the area of Kedah and Perlis. The results indicated that the factors affecting paddy production are mixture fertilizer application cycle, average temperature, the squared effect of average rainfall, the squared effect of pest and disease, the interaction between acreage and amount of mixture fertilizer, the interaction between paddy variety and NPK fertilizer application cycle, and the interaction between pest and disease and NPK fertilizer application cycle.
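A minimal sketch of weighted least squares estimation for a polynomial response model with heteroscedastic errors; the variables are illustrative, not the Kedah/Perlis survey variables, and the weights are taken as the inverse error variances, which are assumed known here.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    rainfall = rng.uniform(100, 400, 150)
    temp = rng.uniform(24, 32, 150)
    sd = 0.5 + 0.01 * rainfall                          # error spread grows with rainfall
    yield_t = 3 + 0.02 * rainfall - 0.00003 * rainfall**2 + 0.1 * temp + rng.normal(0, sd)

    X = sm.add_constant(np.column_stack([rainfall, rainfall**2, temp]))
    ols = sm.OLS(yield_t, X).fit()
    w = 1.0 / sd**2                                     # weights = inverse error variance
    wls = sm.WLS(yield_t, X, weights=w).fit()
    print(ols.params, wls.params, sep="\n")             # WLS downweights the noisy points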
ERIC Educational Resources Information Center
Lee, Wan-Fung; Bulcock, Jeffrey Wilson
The purposes of this study are: (1) to demonstrate the superiority of simple ridge regression over ordinary least squares regression through theoretical argument and empirical example; (2) to modify ridge regression through use of the variance normalization criterion; and (3) to demonstrate the superiority of simple ridge regression based on the…
Determining Sample Size for Accurate Estimation of the Squared Multiple Correlation Coefficient.
ERIC Educational Resources Information Center
Algina, James; Olejnik, Stephen
2000-01-01
Discusses determining sample size for estimation of the squared multiple correlation coefficient and presents regression equations that permit determination of the sample size for estimating this parameter for up to 20 predictor variables. (SLD)
Analyzing industrial energy use through ordinary least squares regression models
NASA Astrophysics Data System (ADS)
Golden, Allyson Katherine
Extensive research has been performed using regression analysis and calibrated simulations to create baseline energy consumption models for residential buildings and commercial institutions. However, few attempts have been made to discuss the applicability of these methodologies to establish baseline energy consumption models for industrial manufacturing facilities. In the few studies of industrial facilities, the presented linear change-point and degree-day regression analyses illustrate ideal cases. It follows that there is a need in the established literature to discuss the methodologies and to determine their applicability for establishing baseline energy consumption models of industrial manufacturing facilities. The thesis determines the effectiveness of simple inverse linear statistical regression models when establishing baseline energy consumption models for industrial manufacturing facilities. Ordinary least squares change-point and degree-day regression methods are used to create baseline energy consumption models for nine different case studies of industrial manufacturing facilities located in the southeastern United States. The influence of ambient dry-bulb temperature and production on total facility energy consumption is observed. The energy consumption behavior of industrial manufacturing facilities is only sometimes sufficiently explained by temperature, production, or a combination of the two variables. This thesis also provides methods for generating baseline energy models that are straightforward and accessible to anyone in the industrial manufacturing community. The methods outlined in this thesis may be easily replicated by anyone that possesses basic spreadsheet software and general knowledge of the relationship between energy consumption and weather, production, or other influential variables. With the help of simple inverse linear regression models, industrial manufacturing facilities may better understand their energy consumption and production behavior, and identify opportunities for energy and cost savings. This thesis study also utilizes change-point and degree-day baseline energy models to disaggregate facility annual energy consumption into separate industrial end-user categories. The baseline energy model provides a suitable and economical alternative to sub-metering individual manufacturing equipment. One case study describes the conjoined use of baseline energy models and facility information gathered during a one-day onsite visit to perform an end-point energy analysis of an injection molding facility conducted by the Alabama Industrial Assessment Center. Applying baseline regression model results to the end-point energy analysis allowed the AIAC to better approximate the annual energy consumption of the facility's HVAC system.
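A brief sketch of a three-parameter cooling change-point model of the kind used in such baseline analyses, E = b0 + b1 * max(T - Tcp, 0), with the change point found by a grid search over candidate values; the monthly data are simulated.

    import numpy as np

    rng = np.random.default_rng(12)
    temp = rng.uniform(0, 35, 36)                       # average dry-bulb temperature
    energy = 120 + 6.0 * np.maximum(temp - 18, 0) + rng.normal(0, 5, 36)

    best = None
    for tcp in np.arange(5, 30, 0.25):                  # candidate change points
        x = np.maximum(temp - tcp, 0)
        X = np.column_stack([np.ones_like(x), x])
        beta = np.linalg.lstsq(X, energy, rcond=None)[0]
        sse = np.sum((energy - X @ beta) ** 2)
        if best is None or sse < best[0]:
            best = (sse, tcp, beta)

    sse, tcp, (b0, b1) = best
    print(f"change point ~ {tcp:.1f} deg, base load {b0:.1f}, cooling slope {b1:.2f}")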
Functional Relationships and Regression Analysis.
ERIC Educational Resources Information Center
Preece, Peter F. W.
1978-01-01
Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
Weighted regression analysis and interval estimators
Donald W. Seegrist
1974-01-01
A method is presented for deriving the weighted least squares estimators of the parameters of a multiple regression model. Confidence intervals for expected values and prediction intervals for the means of future samples are given.
Griffiths, Robert I; Gleeson, Michelle L; Danese, Mark D; O'Hagan, Anthony
2012-01-01
To assess the accuracy and precision of inverse probability weighted (IPW) least squares regression analysis for censored cost data. By using Surveillance, Epidemiology, and End Results-Medicare, we identified 1500 breast cancer patients who died and had complete cost information within the database. Patients were followed for up to 48 months (partitions) after diagnosis, and their actual total cost was calculated in each partition. We then simulated patterns of administrative and dropout censoring and also added censoring to patients receiving chemotherapy to simulate comparing a newer to older intervention. For each censoring simulation, we performed 1000 IPW regression analyses (bootstrap, sampling with replacement), calculated the average value of each coefficient in each partition, and summed the coefficients for each regression parameter to obtain the cumulative values from 1 to 48 months. The cumulative, 48-month, average cost was $67,796 (95% confidence interval [CI] $58,454-$78,291) with no censoring, $66,313 (95% CI $54,975-$80,074) with administrative censoring, and $66,765 (95% CI $54,510-$81,843) with administrative plus dropout censoring. In multivariate analysis, chemotherapy was associated with increased cost of $25,325 (95% CI $17,549-$32,827) compared with $28,937 (95% CI $20,510-$37,088) with administrative censoring and $29,593 ($20,564-$39,399) with administrative plus dropout censoring. Adding censoring to the chemotherapy group resulted in less accurate IPW estimates. This was ameliorated, however, by applying IPW within treatment groups. IPW is a consistent estimator of population mean costs if the weight is correctly specified. If the censoring distribution depends on some covariates, a model that accommodates this dependency must be correctly specified in IPW to obtain accurate estimates. Copyright © 2012 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
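A conceptual sketch of the IPW estimator on simulated data (not SEER-Medicare): the censoring distribution is estimated with a simple Kaplan-Meier routine, uncensored subjects are weighted by the inverse of the estimated probability of remaining uncensored at their follow-up time, and a weighted regression of cost on treatment is fitted.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(13)
    n = 1500
    chemo = rng.integers(0, 2, n)
    death = rng.exponential(24, n)                       # months to death (toy)
    censor = rng.uniform(12, 48, n)                      # administrative censoring
    time = np.minimum(death, censor)
    observed = death <= censor                           # True if cost fully observed
    cost = 40000 + 25000 * chemo + 800 * death + rng.normal(0, 5000, n)

    def km_survival(times, events):
        """Kaplan-Meier survival evaluated at each subject's own time."""
        m = len(times)
        order = np.argsort(times)
        e_sorted = events[order]
        at_risk = m - np.arange(m)
        factors = np.where(e_sorted, 1 - 1 / at_risk, 1.0)
        surv = np.empty(m)
        surv[order] = np.cumprod(factors)
        return surv

    G = km_survival(time, ~observed)                     # survival of the censoring time
    w = 1.0 / np.clip(G[observed], 0.05, None)           # weights for uncensored subjects

    X = sm.add_constant(chemo[observed])
    ipw_fit = sm.WLS(cost[observed], X, weights=w).fit()
    print(ipw_fit.params)                                # [intercept, incremental chemo cost]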
imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel.
Grapov, Dmitry; Newman, John W
2012-09-01
Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet-embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large data sets by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010).
Physical Activity and Depressive Symptoms in Four Ethnic Groups of Midlife Women
Im, Eun-Ok; Ham, Ok Kyung; Chee, Eunice; Chee, Wonshik
2014-01-01
The purpose of this study was to determine the associations between physical activity and depression and the multiple contextual factors influencing these associations in four major ethnic groups of midlife women in the U.S. This was a secondary analysis of data from 542 midlife women. The instruments included questions on background characteristics and health and menopausal status; the Depression Index for Midlife Women; and the Kaiser Physical Activity Survey. The data were analyzed using chi-square tests, ANOVA, two-way ANOVA, correlation analyses, and hierarchical multiple regression analyses. The women's depressive symptoms were negatively correlated with active living and sports/exercise physical activities, whereas they were positively correlated with occupational physical activities (p < .01). Family income was the strongest predictor of their depressive symptoms. Increasing physical activity may improve midlife women's depressive symptoms, but the types of physical activity and multiple contextual factors need to be considered in intervention development. PMID:24879749
A chemometric approach to the characterisation of historical mortars
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rampazzi, L.; Pozzi, A.; Sansonetti, A.
2006-06-15
The compositional knowledge of historical mortars is of great concern in provenance and dating investigations and in conservation work, since the nature of the raw materials suggests the most compatible conservation products. The classic characterisation usually goes through various analytical determinations, while conservation laboratories call for simple and quick analyses able to reveal the nature of mortars, usually in terms of the binder fraction. A chemometric approach to the matter is here undertaken. Specimens of mortars were prepared with calcitic and dolomitic binders and analysed by atomic spectroscopy. Principal Components Analysis (PCA) was used to investigate the features of specimens and samples. A Partial Least Squares (PLS1) regression was performed in order to predict the binder/aggregate ratio. The model was applied to historical mortars from the churches of St. Lorenzo (Milan) and St. Abbondio (Como). The agreement between the predictive model and the real samples is discussed.
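A calibration of this kind can be reproduced in outline with the `pls` package in R. The sketch below uses simulated elemental concentrations rather than the mortar data of the study; the number of components and all variable names are arbitrary illustrations.

```r
# Hedged PLS1 calibration sketch on simulated data (pls package).
library(pls)

set.seed(42)
n <- 60; p <- 8
ratio <- runif(n, 0.1, 0.6)                                   # binder/aggregate ratio
X <- outer(ratio, runif(p, 0.5, 2)) + matrix(rnorm(n * p, sd = 0.05), n, p)
colnames(X) <- paste0("elem", seq_len(p))
d <- data.frame(ratio = ratio, X = I(X))

fit <- plsr(ratio ~ X, ncomp = 4, data = d, validation = "CV")
summary(fit)                                  # RMSEP by number of latent components
drop(predict(fit, ncomp = 2, newdata = d))[1:5]
```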
Predicting consumer preferences for mineral composition of bottled and tap water.
Platikanov, Stefan; Hernández, Alejandra; González, Susana; Luis Cortina, Jose; Tauler, Roma; Devesa, Ricard
2017-01-01
The overall liking for the taste of water was correlated with the mineral composition of selected bottled and tap waters. Sixty-nine untrained volunteers assessed and rated twenty-five different commercial bottled and tap waters. The water samples were physicochemically characterised by analysing conductivity, pH, total dissolved solids (TDS) and the major anions and cations: HCO3−, SO42−, Cl−, NO3−, Ca2+, Mg2+, Na+, and K+. Residual chlorine levels were also analysed in the tap water samples. Globally, volunteers preferred waters rich in calcium bicarbonate and sulfate rather than in sodium chloride. This study also demonstrated that the overall liking could be accurately predicted by a Partial Least Squares regression using either all measured physicochemical parameters or a reduced number of them. These results were in agreement with previously published results using trained panellists. Copyright © 2016 Elsevier B.V. All rights reserved.
Regression analysis on the variation in efficiency frontiers for prevention stage of HIV/AIDS.
Kamae, Maki S; Kamae, Isao; Cohen, Joshua T; Neumann, Peter J
2011-01-01
To investigate how the cost effectiveness of preventing HIV/AIDS varies across possible efficiency frontiers (EFs) by taking into account potentially relevant external factors, such as prevention stage, and how the EFs can be characterized using regression analysis given the uncertainty of the QALY-cost estimates. We reviewed cost-effectiveness estimates for the prevention and treatment of HIV/AIDS published from 2002-2007 and catalogued in the Tufts Medical Center Cost-Effectiveness Analysis (CEA) Registry. We constructed EF curves by plotting QALYs against costs, following the methods used by the Institute for Quality and Efficiency in Health Care (IQWiG) in Germany. We stratified the QALY-cost ratios by prevention stage, country of study, and payer perspective, and estimated EF equations using log and square-root models. A total of 53 QALY-cost ratios were identified for HIV/AIDS in the Tufts CEA Registry. Plotted ratios stratified by prevention stage were visually grouped into a cluster consisting of primary/secondary prevention measures and a cluster consisting of tertiary measures. Correlation coefficients for each cluster were statistically significant. For each cluster, we derived two EF equations - one based on the log model, and one based on the square-root model. Our findings indicate that stratification of HIV/AIDS interventions by prevention stage can yield distinct EFs, and that the correlation and regression analyses are useful for parametrically characterizing EF equations. Our study has certain limitations, such as the small number of included articles and the potential for study populations to be non-representative of countries of interest. Nonetheless, our approach could help develop a deeper appreciation of cost effectiveness beyond the deterministic approach developed by IQWiG.
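As a rough illustration of the curve-fitting step, the two candidate frontier shapes can be compared with ordinary linear models on transformed cost. The data below are simulated QALY-cost pairs, not the Tufts CEA Registry ratios, so the fit statistics are purely illustrative.

```r
# Sketch of log and square-root efficiency-frontier fits on simulated data.
set.seed(15)
cost <- runif(40, 1000, 50000)
qaly <- 0.8 * log(cost) + rnorm(40, sd = 0.5)

fit_log  <- lm(qaly ~ log(cost))
fit_sqrt <- lm(qaly ~ sqrt(cost))
c(log_R2 = summary(fit_log)$r.squared, sqrt_R2 = summary(fit_sqrt)$r.squared)
```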
Shen, Chan; Sambamoorthi, Usha; Rust, George
2008-06-01
The objectives of the study were to compare health care expenditures between adults with and without mental illness among individuals with obesity and chronic physical illness. We performed a cross-sectional analysis of 2440 adults (older than age 21) with obesity using a nationally representative survey of households, the Medical Expenditure Panel Survey. Chronic physical illness consisted of self-reported asthma, diabetes, heart disease, hypertension, or osteoarthritis. Mental illness included affective disorders; anxiety, somatoform, dissociative, personality disorders; and schizophrenia. Utilization and expenditures by type of service (total, inpatient, outpatient, emergency room, pharmacy, and other) were the dependent variables. Chi-square tests, logistic regression on likelihood of use, and ordinary least squares regression on logged expenditures among users were performed. All regressions controlled for gender, race/ethnicity, age, marital status, region, education, employment, poverty status, health insurance, smoking, and exercise. All analyses accounted for the complex design of the survey. We found that 25% of adults with obesity and physical illness had a mental illness. The average total expenditures for obese adults with physical illness and mental illness were $9897; average expenditures were $6584 for those with physical illness only. Mean pharmacy expenditures for obese adults with physical illness and mental illness and for those with physical illness only were $3343 and $1756, respectively. After controlling for all independent variables, among adults with obesity and physical illness, those with mental illness were more likely to use emergency services and had higher total, outpatient, and pharmaceutical expenditures than those without mental illness. Among individuals with obesity and chronic physical illness, expenditures increased when mental illness was also present. Our study findings suggest cost-savings efforts should examine the reasons for high utilization and expenditures for those with obesity, chronic physical illness, and mental illness.
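The modelling strategy described above (a logistic model for any use, then OLS on logged expenditures among users) is a standard two-part approach. A minimal sketch on simulated data follows, with hypothetical variable names and without the survey-design adjustments used in the study.

```r
# Two-part model sketch: logistic regression for any use, OLS on log(expenditure) among users.
set.seed(16)
n      <- 1000
mental <- rbinom(n, 1, 0.25)                                # mental illness indicator
use    <- rbinom(n, 1, plogis(-0.5 + 0.6 * mental))         # any health care use
expend <- ifelse(use == 1, exp(8 + 0.4 * mental + rnorm(n, sd = 0.8)), 0)
d <- data.frame(mental, use, expend)

part1 <- glm(use ~ mental, family = binomial, data = d)     # likelihood of any use
part2 <- lm(log(expend) ~ mental, data = subset(d, use == 1))
c(odds_ratio     = unname(exp(coef(part1)["mental"])),
  log_cost_ratio = unname(coef(part2)["mental"]))
```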
Vilar-Compte, Mireya; Sandoval-Olascoaga, Sebastian; Bernal-Stuart, Ana; Shimoga, Sandhya; Vargas-Bustamante, Arturo
2015-11-01
The present paper investigated the impact of the 2008 financial crisis on food security in Mexico and how it disproportionally affected vulnerable households. A generalized ordered logistic regression was estimated to assess the impact of the crisis on households' food security status. An ordinary least squares and a quantile regression were estimated to evaluate the effect of the financial crisis on a continuous proxy measure of food security, defined as the share of a household's current income devoted to food expenditures. Both analyses were performed using pooled cross-sectional data from the Mexican National Household Income and Expenditure Survey 2008 and 2010. The analytical sample included 29,468 households in 2008 and 27,654 in 2010. The generalized ordered logistic model showed that the financial crisis significantly (P<0·05) decreased the probability of being food secure, mildly or moderately food insecure, compared with being severely food insecure (OR=0·74). A similar but smaller effect was found when comparing severely and moderately food-insecure households with mildly food-insecure and food-secure households (OR=0·81). The ordinary least squares model showed that the crisis significantly (P<0·05) increased the share of total income spent on food (β coefficient of 0·02). The quantile regression confirmed the findings suggested by the generalized ordered logistic model, showing that the effects of the crisis were more profound among poorer households. The results suggest that households that were more vulnerable before the financial crisis saw a worsened effect in terms of food insecurity with the crisis. Findings were consistent with both measures of food security: one based on self-reported experience and the other based on food spending.
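The contrast between a mean effect and effects across the distribution can be illustrated with the `quantreg` package. The sketch below uses simulated household data with a hypothetical crisis indicator, not the Mexican survey itself.

```r
# OLS versus quantile regression on a simulated food-share outcome (quantreg package).
library(quantreg)

set.seed(7)
n           <- 2000
post_crisis <- rbinom(n, 1, 0.5)                      # 0 = 2008, 1 = 2010 (hypothetical)
log_income  <- rnorm(n, 9, 0.6)
food_share  <- plogis(-0.5 - 0.3 * (log_income - 9) +
                      0.03 * post_crisis * (9.5 - log_income) + rnorm(n, sd = 0.3))

ols  <- lm(food_share ~ post_crisis + log_income)
qreg <- rq(food_share ~ post_crisis + log_income, tau = c(0.25, 0.50, 0.75))
coef(ols)["post_crisis"]        # average effect of the crisis
coef(qreg)["post_crisis", ]     # effect at the 25th, 50th and 75th percentiles
```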
Cooley, Richard L.
1983-01-01
This paper investigates factors influencing the degree of improvement in estimates of parameters of a nonlinear regression groundwater flow model by incorporating prior information of unknown reliability. Consideration of expected behavior of the regression solutions and results of a hypothetical modeling problem lead to several general conclusions. First, if the parameters are properly scaled, linearized expressions for the mean square error (MSE) in parameter estimates of a nonlinear model will often behave very nearly as if the model were linear. Second, by using prior information, the MSE in properly scaled parameters can be reduced greatly over the MSE of ordinary least squares estimates of parameters. Third, plots of estimated MSE and the estimated standard deviation of MSE versus an auxiliary parameter (the ridge parameter) specifying the degree of influence of the prior information on regression results can help determine the potential for improvement of parameter estimates. Fourth, proposed criteria can be used to make appropriate choices for the ridge parameter and another parameter expressing degree of overall bias in the prior information. Results of a case study of Truckee Meadows, Reno-Sparks area, Washoe County, Nevada, conform closely to the results of the hypothetical problem. In the Truckee Meadows case, incorporation of prior information did not greatly change the parameter estimates from those obtained by ordinary least squares. However, the analysis showed that both sets of estimates are more reliable than suggested by the standard errors from ordinary least squares.
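The role of the ridge parameter can be illustrated with a ridge trace on a small simulated example of nearly collinear predictors (MASS::lm.ridge). This is generic ridge regression on made-up data, not the groundwater-model estimator with prior information developed in the paper.

```r
# Ridge trace sketch for nearly collinear predictors (MASS::lm.ridge).
library(MASS)

set.seed(3)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)               # nearly collinear with x1
y  <- 2 * x1 - x2 + rnorm(n)

fit <- lm.ridge(y ~ x1 + x2, lambda = seq(0, 10, by = 0.1))
matplot(fit$lambda, coef(fit)[, -1], type = "l",
        xlab = "ridge parameter", ylab = "coefficient")   # ridge trace
MASS::select(fit)                            # HKB, L-W and GCV suggestions for lambda
```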
Applying Regression Analysis to Problems in Institutional Research.
ERIC Educational Resources Information Center
Bohannon, Tom R.
1988-01-01
Regression analysis is one of the most frequently used statistical techniques in institutional research. Principles of least squares, model building, residual analysis, influence statistics, and multi-collinearity are described and illustrated. (Author/MSE)
On the calibration process of film dosimetry: OLS inverse regression versus WLS inverse prediction.
Crop, F; Van Rompaye, B; Paelinck, L; Vakaet, L; Thierens, H; De Wagter, C
2008-07-21
The purpose of this study was both to put forward a statistically correct model for film calibration and to optimize this process. A reliable calibration is needed in order to perform accurate reference dosimetry with radiographic (Gafchromic) film. Sometimes, an ordinary least squares simple linear (in the parameters) regression is applied to the dose-optical-density (OD) curve with the dose as a function of OD (inverse regression) or sometimes OD as a function of dose (inverse prediction). The application of a simple linear regression fit is an invalid method because heteroscedasticity of the data is not taken into account. This could lead to erroneous results originating from the calibration process itself and thus to a lower accuracy. In this work, we compare the ordinary least squares (OLS) inverse regression method with the correct weighted least squares (WLS) inverse prediction method to create calibration curves. We found that the OLS inverse regression method could lead to a prediction bias of up to 7.3 cGy at 300 cGy and total prediction errors of 3% or more for Gafchromic EBT film. Application of the WLS inverse prediction method resulted in a maximum prediction bias of 1.4 cGy and total prediction errors below 2% in a 0-400 cGy range. We developed a Monte-Carlo-based process to optimize calibrations, depending on the needs of the experiment. This type of thorough analysis can lead to a higher accuracy for film dosimetry.
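The heteroscedasticity issue can be made concrete with a toy calibration in which the response noise grows with dose, so that weighted least squares (weights proportional to the inverse variance) is the appropriate fit. The sketch only illustrates the weighting; it is not the full inverse-prediction and Monte Carlo procedure of the paper, and the dose-response coefficients are invented.

```r
# Unweighted versus weighted least squares when response variance grows with dose.
set.seed(11)
dose <- rep(seq(0, 400, by = 50), each = 5)                    # cGy
sd_d <- 0.01 + 0.0002 * dose                                   # noise grows with dose
od   <- 0.002 * dose + 1.5e-6 * dose^2 + rnorm(length(dose), sd = sd_d)

ols <- lm(od ~ dose + I(dose^2))                               # ignores heteroscedasticity
wls <- lm(od ~ dose + I(dose^2), weights = 1 / sd_d^2)         # weights = 1 / variance
rbind(OLS = coef(ols), WLS = coef(wls))
```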
Spatial Autocorrelation of Cancer Incidence in Saudi Arabia
Al-Ahmadi, Khalid; Al-Zahrani, Ali
2013-01-01
Little is known about the geographic distribution of common cancers in Saudi Arabia. We explored the spatial incidence patterns of common cancers in Saudi Arabia using spatial autocorrelation analyses, employing the global Moran’s I and Anselin’s local Moran’s I statistics to detect nonrandom incidence patterns. Global ordinary least squares (OLS) regression and local geographically-weighted regression (GWR) were applied to examine the spatial correlation of cancer incidences at the city level. Population-based records of cancers diagnosed between 1998 and 2004 were used. Male lung cancer and female breast cancer exhibited positive statistically significant global Moran’s I index values, indicating a tendency toward clustering. The Anselin’s local Moran’s I analyses revealed small significant clusters of lung cancer, prostate cancer and Hodgkin’s disease among males in the Eastern region and significant clusters of thyroid cancers in females in the Eastern and Riyadh regions. Additionally, both regression methods found significant associations among various cancers. For example, OLS and GWR revealed significant spatial associations among NHL, leukemia and Hodgkin’s disease (r² = 0.49–0.67 using OLS and r² = 0.52–0.68 using GWR) and between breast and prostate cancer (r² = 0.53 OLS and 0.57 GWR) in Saudi Arabian cities. These findings may help to generate etiologic hypotheses of cancer causation and identify spatial anomalies in cancer incidence in Saudi Arabia. Our findings should stimulate further research on the possible causes underlying these clusters and associations. PMID:24351742
Who Will Win?: Predicting the Presidential Election Using Linear Regression
ERIC Educational Resources Information Center
Lamb, John H.
2007-01-01
This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
The Variance Normalization Method of Ridge Regression Analysis.
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
The testing of contemporary sociological theory often calls for the application of structural-equation models to data which are inherently collinear. It is shown that simple ridge regression, which is commonly used for controlling the instability of ordinary least squares regression estimates in ill-conditioned data sets, is not a legitimate…
USDA-ARS?s Scientific Manuscript database
In multivariate regression analysis of spectroscopy data, spectral preprocessing is often performed to reduce unwanted background information (offsets, sloped baselines) or accentuate absorption features in intrinsically overlapping bands. These procedures, also known as pretreatments, are commonly ...
Order-constrained linear optimization.
Tidwell, Joe W; Dougherty, Michael R; Chrabaszcz, Jeffrey S; Thomas, Rick P
2017-11-01
Despite the fact that data and theories in the social, behavioural, and health sciences are often represented on an ordinal scale, there has been relatively little emphasis on modelling ordinal properties. The most common analytic framework used in psychological science is the general linear model, whose variants include ANOVA, MANOVA, and ordinary linear regression. While these methods are designed to provide the best fit to the metric properties of the data, they are not designed to maximally model ordinal properties. In this paper, we develop an order-constrained linear least-squares (OCLO) optimization algorithm that maximizes the linear least-squares fit to the data conditional on maximizing the ordinal fit based on Kendall's τ. The algorithm builds on the maximum rank correlation estimator (Han, 1987, Journal of Econometrics, 35, 303) and the general monotone model (Dougherty & Thomas, 2012, Psychological Review, 119, 321). Analyses of simulated data indicate that when modelling data that adhere to the assumptions of ordinary least squares, OCLO shows minimal bias, little increase in variance, and almost no loss in out-of-sample predictive accuracy. In contrast, under conditions in which data include a small number of extreme scores (fat-tailed distributions), OCLO shows less bias and variance, and substantially better out-of-sample predictive accuracy, even when the outliers are removed. We show that the advantages of OCLO over ordinary least squares in predicting new observations hold across a variety of scenarios in which researchers must decide to retain or eliminate extreme scores when fitting data. © 2017 The British Psychological Society.
Pragmatic estimation of a spatio-temporal air quality model with irregular monitoring data
NASA Astrophysics Data System (ADS)
Sampson, Paul D.; Szpiro, Adam A.; Sheppard, Lianne; Lindström, Johan; Kaufman, Joel D.
2011-11-01
Statistical analyses of health effects of air pollution have increasingly used GIS-based covariates for prediction of ambient air quality in "land use" regression models. More recently these spatial regression models have accounted for spatial correlation structure in combining monitoring data with land use covariates. We present a flexible spatio-temporal modeling framework and pragmatic, multi-step estimation procedure that accommodates essentially arbitrary patterns of missing data with respect to an ideally complete space by time matrix of observations on a network of monitoring sites. The methodology incorporates a model for smooth temporal trends with coefficients varying in space according to Partial Least Squares regressions on a large set of geographic covariates and nonstationary modeling of spatio-temporal residuals from these regressions. This work was developed to provide spatial point predictions of PM 2.5 concentrations for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air) using irregular monitoring data derived from the AQS regulatory monitoring network and supplemental short-time scale monitoring campaigns conducted to better predict intra-urban variation in air quality. We demonstrate the interpretation and accuracy of this methodology in modeling data from 2000 through 2006 in six U.S. metropolitan areas and establish a basis for likelihood-based estimation.
Cui, Zaixu; Gong, Gaolang
2018-06-02
Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
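Several of the penalized algorithms compared in the study are available in the `glmnet` package; the sketch below contrasts ridge, elastic net and LASSO by cross-validated error on simulated features. It is a generic illustration, not the rsFC/rsFCS prediction pipeline of the study.

```r
# Cross-validated comparison of ridge, elastic net and LASSO (glmnet package).
library(glmnet)

set.seed(13)
n <- 200; p <- 150
X <- matrix(rnorm(n * p), n, p)
y <- as.numeric(X[, 1:5] %*% rep(0.5, 5) + rnorm(n))   # 5 informative features

alphas <- c(ridge = 0, elastic_net = 0.5, lasso = 1)
cv_mse <- sapply(alphas, function(a) min(cv.glmnet(X, y, alpha = a)$cvm))
cv_mse                                                 # cross-validated MSE by algorithm
```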
Neither fixed nor random: weighted least squares meta-regression.
Stanley, T D; Doucouliagos, Hristos
2017-03-01
Our study revisits and challenges two core conventional meta-regression estimators: the prevalent use of 'mixed-effects' or random-effects meta-regression analysis, and the correction of standard errors that defines fixed-effects meta-regression analysis (FE-MRA). We show how, and explain why, an unrestricted weighted least squares MRA (WLS-MRA) estimator is superior to conventional random-effects (or mixed-effects) meta-regression when there is publication (or small-sample) bias, is as good as FE-MRA in all cases, and is better than fixed effects in most practical applications. Simulations and statistical theory show that WLS-MRA provides satisfactory estimates of meta-regression coefficients that are practically equivalent to mixed effects or random effects when there is no publication bias. When there is publication selection bias, WLS-MRA always has smaller bias than mixed effects or random effects. In practical applications, an unrestricted WLS meta-regression is likely to give practically equivalent or superior estimates to fixed-effects, random-effects, and mixed-effects meta-regression approaches. However, random-effects meta-regression remains viable and perhaps somewhat preferable if selection for statistical significance (publication bias) can be ruled out and when random, additive normal heterogeneity is known to directly affect the 'true' regression coefficient. Copyright © 2016 John Wiley & Sons, Ltd.
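Mechanically, the unrestricted WLS meta-regression the authors advocate is a weighted linear model with inverse-variance weights and no restriction on the residual (dispersion) variance. A minimal sketch on simulated study-level data follows; the moderator and effect sizes are invented.

```r
# Unrestricted WLS meta-regression sketch: inverse-variance weights in lm().
set.seed(5)
k     <- 40
se    <- runif(k, 0.05, 0.4)                       # study standard errors
mod   <- rbinom(k, 1, 0.5)                         # study-level moderator
theta <- 0.2 + 0.3 * mod + rnorm(k, sd = se)       # observed effect sizes

wls_mra <- lm(theta ~ mod, weights = 1 / se^2)
summary(wls_mra)$coefficients                      # estimates with the multiplicative dispersion
```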
NASA Astrophysics Data System (ADS)
Prahutama, Alan; Suparti; Wahyu Utami, Tiani
2018-03-01
Regression analysis models the relationship between response variables and predictor variables. The parametric approach to regression is very strict about its assumptions, whereas a nonparametric regression model does not require assumptions about the form of the model. Time series data are observations of a variable recorded over time, so to model a time series with regression the response and predictor variables must first be determined. The response variable is the observation at time t (yt), while the predictors are its significant lags. In nonparametric regression modeling, one developing approach is the Fourier series approach. One of the advantages of nonparametric regression using a Fourier series is its ability to handle data with a trigonometric (periodic) pattern. Modeling with a Fourier series requires choosing the number of basis terms K, which can be determined using the Generalized Cross Validation method. In modeling inflation for the transportation, communication and financial services sector, the Fourier series approach yields an optimal K of 120 parameters with an R-squared of 99%, whereas multiple linear regression yields an R-squared of 90%.
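The Fourier-series approach amounts to regressing the series on sine and cosine terms of increasing frequency. In practice K would be chosen by generalized cross-validation; the sketch below simply fixes a small K on simulated periodic data with an assumed period of 12.

```r
# Fourier-series regression sketch on a simulated monthly series (period 12).
set.seed(9)
t <- 1:200
y <- 5 + 2 * sin(2 * pi * t / 12) + cos(2 * pi * t / 6) + rnorm(200, sd = 0.5)

K <- 3                                                  # number of harmonics (illustrative)
B <- do.call(cbind, lapply(1:K, function(k)
       cbind(sin(2 * pi * k * t / 12), cos(2 * pi * k * t / 12))))
colnames(B) <- paste0(rep(c("sin", "cos"), K), rep(1:K, each = 2))

fit <- lm(y ~ B)                                        # least-squares fit on the Fourier basis
summary(fit)$r.squared
```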
NASA Astrophysics Data System (ADS)
de Oliveira, Isadora R. N.; Roque, Jussara V.; Maia, Mariza P.; Stringheta, Paulo C.; Teófilo, Reinaldo F.
2018-04-01
A new method was developed to determine the antioxidant properties of red cabbage extract (Brassica oleracea) by mid (MID) and near (NIR) infrared spectroscopies and partial least squares (PLS) regression. A 70% (v/v) ethanolic extract of red cabbage was concentrated to 9° Brix and further diluted (12 to 100%) in water. The dilutions were used as external standards for the building of PLS models. For the first time, this strategy was applied for building multivariate regression models. Reference analyses and spectral data were obtained from diluted extracts. The determinate properties were total and monomeric anthocyanins, total polyphenols and antioxidant capacity by ABTS (2,2-azino-bis(3-ethyl-benzothiazoline-6-sulfonate)) and DPPH (2,2-diphenyl-1-picrylhydrazyl) methods. Ordered predictors selection (OPS) and genetic algorithm (GA) were used for feature selection before PLS regression (PLS-1). In addition, a PLS-2 regression was applied to all properties simultaneously. PLS-1 models provided more predictive models than did PLS-2 regression. PLS-OPS and PLS-GA models presented excellent prediction results with a correlation coefficient higher than 0.98. However, the best models were obtained using PLS and variable selection with the OPS algorithm and the models based on NIR spectra were considered more predictive for all properties. Then, these models provided a simple, rapid and accurate method for determination of red cabbage extract antioxidant properties and its suitability for use in the food industry.
Christensen, A L; Lundbye-Christensen, S; Dethlefsen, C
2011-12-01
Several statistical methods of assessing seasonal variation are available. Brookhart and Rothman [3] proposed a second-order moment-based estimator based on the geometrical model derived by Edwards [1], and reported that this estimator is superior in estimating the peak-to-trough ratio of seasonal variation compared with Edwards' estimator with respect to bias and mean squared error. Alternatively, seasonal variation may be modelled using a Poisson regression model, which provides flexibility in modelling the pattern of seasonal variation and adjustments for covariates. Based on a Monte Carlo simulation study, three estimators, one based on the geometrical model and two based on log-linear Poisson regression models, were evaluated with regard to bias and standard deviation (SD). We evaluated the estimators on data simulated according to schemes varying in seasonal variation and presence of a secular trend. All methods and analyses in this paper are available in the R package Peak2Trough [13]. Applying a Poisson regression model resulted in lower absolute bias and SD for data simulated according to the corresponding model assumptions. Poisson regression models had lower bias and SD for data simulated to deviate from the corresponding model assumptions than the geometrical model. This simulation study encourages the use of Poisson regression models in estimating the peak-to-trough ratio of seasonal variation as opposed to the geometrical model. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
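A log-linear Poisson model with sine and cosine terms gives the peak-to-trough ratio directly from the amplitude of the fitted seasonal component. The study's own implementation is in the Peak2Trough package; the sketch below is a simplified simulated-data version of the same idea.

```r
# Peak-to-trough ratio from a log-linear Poisson model with harmonic terms.
set.seed(2)
month  <- rep(1:12, times = 10)                        # 10 years of monthly counts
lambda <- exp(3 + 0.3 * cos(2 * pi * (month - 1) / 12))
counts <- rpois(length(month), lambda)

fit <- glm(counts ~ sin(2 * pi * month / 12) + cos(2 * pi * month / 12),
           family = poisson)
amp <- sqrt(sum(coef(fit)[-1]^2))                      # amplitude on the log scale
exp(2 * amp)                                           # estimated peak-to-trough ratio
```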
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
Lombard, Pamela J.; Hodgkins, Glenn A.
2015-01-01
Regression equations to estimate peak streamflows with 1- to 500-year recurrence intervals (annual exceedance probabilities from 99 to 0.2 percent, respectively) were developed for small, ungaged streams in Maine. Equations presented here are the best available equations for estimating peak flows at ungaged basins in Maine with drainage areas from 0.3 to 12 square miles (mi2). Previously developed equations continue to be the best available equations for estimating peak flows for basin areas greater than 12 mi2. New equations presented here are based on streamflow records at 40 U.S. Geological Survey streamgages with a minimum of 10 years of recorded peak flows between 1963 and 2012. Ordinary least-squares regression techniques were used to determine the best explanatory variables for the regression equations. Traditional map-based explanatory variables were compared to variables requiring field measurements. Two field-based variables—culvert rust lines and bankfull channel widths—either were not commonly found or did not explain enough of the variability in the peak flows to warrant inclusion in the equations. The best explanatory variables were drainage area and percent basin wetlands; values for these variables were determined with a geographic information system. Generalized least-squares regression was used with these two variables to determine the equation coefficients and estimates of accuracy for the final equations.
Online measurement of urea concentration in spent dialysate during hemodialysis.
Olesberg, Jonathon T; Arnold, Mark A; Flanigan, Michael J
2004-01-01
We describe online optical measurements of urea in the effluent dialysate line during regular hemodialysis treatment of several patients. Monitoring urea removal can provide valuable information about dialysis efficiency. Spectral measurements were performed with a Fourier-transform infrared spectrometer equipped with a flow-through cell. Spectra were recorded across the 5000-4000 cm⁻¹ (2.0-2.5 μm) wavelength range at 1-min intervals. Savitzky-Golay filtering was used to remove baseline variations attributable to the temperature dependence of the water absorption spectrum. Urea concentrations were extracted from the filtered spectra by use of partial least-squares regression and the net analyte signal of urea. Urea concentrations predicted by partial least-squares regression matched concentrations obtained from standard chemical assays with a root mean square error of 0.30 mmol/L (0.84 mg/dL urea nitrogen) over an observed concentration range of 0-11 mmol/L. The root mean square error obtained with the net analyte signal of urea was 0.43 mmol/L with a calibration based only on a set of pure-component spectra. The error decreased to 0.23 mmol/L when a slope and offset correction were used. Urea concentrations can be continuously monitored during hemodialysis by near-infrared spectroscopy. Calibrations based on the net analyte signal of urea are particularly appealing because they do not require a training step, as do statistical multivariate calibration procedures such as partial least-squares regression.
Treatment of singularities in a middle-crack tension specimen
NASA Technical Reports Server (NTRS)
Shivakumar, K. N.; Raju, I. S.
1990-01-01
A three-dimensional finite-element analysis of a middle-crack tension specimen subjected to mode I loading was performed to study the stress singularity along the crack front. The specimen was modeled using 20-node isoparametric elements with collapsed nonsingular elements at the crack front. The displacements and stresses from the analysis were used to estimate the power of singularities, by a log-log regression analysis, along the crack front. Analyses showed that finite-sized cracked bodies have two singular stress fields. Because of two singular stress fields near the free surface and the classical square root singularity elsewhere, the strain energy release rate appears to be an appropriate parameter all along the crack front.
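The log-log regression step can be shown with synthetic near-field stresses that follow an r^(-1/2) law: the slope of log(stress) on log(distance) recovers the power of the singularity. The numbers below are simulated, not finite-element output.

```r
# Estimating the power of a stress singularity by log-log regression (simulated data).
set.seed(4)
r     <- 10^seq(-4, -1, length.out = 30)               # distance from the crack front
sigma <- 3 * r^(-0.5) * exp(rnorm(30, sd = 0.02))      # stress ~ r^(-1/2) with noise

fit <- lm(log(sigma) ~ log(r))
-unname(coef(fit)["log(r)"])                           # estimated singularity power (about 0.5)
```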
Validation of Nimbus-7 temperature-humidity infrared radiometer estimates of cloud type and amount
NASA Technical Reports Server (NTRS)
Stowe, L. L.
1982-01-01
Estimates of clear and low, middle and high cloud amount in fixed geographical regions of approximately 160 km × 160 km are being made routinely from 11.5 micron radiance measurements of the Nimbus-7 Temperature-Humidity Infrared Radiometer (THIR). The purpose of validation is to determine the accuracy of the THIR cloud estimates. Validation requires that a comparison be made between the THIR estimates of cloudiness and the 'true' cloudiness. The validation results reported in this paper use human analysis of concurrent but independent satellite images with surface meteorological and radiosonde observations to approximate the 'true' cloudiness. Regression and error analyses are used to estimate the systematic and random errors of THIR derived clear amount.
Low-flow, base-flow, and mean-flow regression equations for Pennsylvania streams
Stuckey, Marla H.
2006-01-01
Low-flow, base-flow, and mean-flow characteristics are an important part of assessing water resources in a watershed. These streamflow characteristics can be used by watershed planners and regulators to determine water availability, water-use allocations, assimilative capacities of streams, and aquatic-habitat needs. Streamflow characteristics are commonly predicted by use of regression equations when a nearby streamflow-gaging station is not available. Regression equations for predicting low-flow, base-flow, and mean-flow characteristics for Pennsylvania streams were developed from data collected at 293 continuous- and partial-record streamflow-gaging stations with flow unaffected by upstream regulation, diversion, or mining. Continuous-record stations used in the regression analysis had 9 years or more of data, and partial-record stations used had seven or more measurements collected during base-flow conditions. The state was divided into five low-flow regions and regional regression equations were developed for the 7-day, 10-year; 7-day, 2-year; 30-day, 10-year; 30-day, 2-year; and 90-day, 10-year low flows using generalized least-squares regression. Statewide regression equations were developed for the 10-year, 25-year, and 50-year base flows using generalized least-squares regression. Statewide regression equations were developed for harmonic mean and mean annual flow using weighted least-squares regression. Basin characteristics found to be significant explanatory variables at the 95-percent confidence level for one or more regression equations were drainage area, basin slope, thickness of soil, stream density, mean annual precipitation, mean elevation, and the percentage of glaciation, carbonate bedrock, forested area, and urban area within a basin. Standard errors of prediction ranged from 33 to 66 percent for the n-day, T-year low flows; 21 to 23 percent for the base flows; and 12 to 38 percent for the mean annual flow and harmonic mean, respectively. The regression equations are not valid in watersheds with upstream regulation, diversions, or mining activities. Watersheds with karst features need close examination as to the applicability of the regression-equation results.
NASA Technical Reports Server (NTRS)
Argentiero, P.; Lowrey, B.
1977-01-01
The least squares collocation algorithm for estimating gravity anomalies from geodetic data is shown to be an application of the well known regression equations which provide the mean and covariance of a random vector (gravity anomalies) given a realization of a correlated random vector (geodetic data). It is also shown that the collocation solution for gravity anomalies is equivalent to the conventional least-squares-Stokes' function solution when the conventional solution utilizes properly weighted zero a priori estimates. The mathematical and physical assumptions underlying the least squares collocation estimator are described.
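The regression equations referred to above are just the conditional mean and covariance of a jointly Gaussian vector. A two-observation numerical sketch is given below with illustrative covariances rather than a real gravity-anomaly covariance model.

```r
# Conditional-mean (collocation) sketch: E[g | d] = C_gd %*% solve(C_dd) %*% d.
C_dd <- matrix(c(1.0, 0.6,
                 0.6, 1.0), 2, 2)          # covariance of the geodetic data (with noise)
C_gd <- matrix(c(0.8, 0.3), 1, 2)          # cross-covariance of anomaly and data
d    <- c(0.5, -0.2)                       # observed data vector

g_hat <- C_gd %*% solve(C_dd) %*% d                     # estimated gravity anomaly
g_var <- 1.0 - C_gd %*% solve(C_dd) %*% t(C_gd)         # posterior variance (prior variance = 1)
c(estimate = g_hat, variance = g_var)
```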
Quantum State Tomography via Linear Regression Estimation
Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan
2013-01-01
A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d⁴), where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519
NASA Technical Reports Server (NTRS)
Parada, N. D. J. (Principal Investigator); Moreira, M. A.; Chen, S. C.; Batista, G. T.
1984-01-01
A procedure to estimate wheat (Triticum aestivum L) area using sampling technique based on aerial photographs and digital LANDSAT MSS data is developed. Aerial photographs covering 720 square km are visually analyzed. To estimate wheat area, a regression approach is applied using different sample sizes and various sampling units. As the size of sampling unit decreased, the percentage of sampled area required to obtain similar estimation performance also decreased. The lowest percentage of the area sampled for wheat estimation with relatively high precision and accuracy through regression estimation is 13.90% using 10 square km as the sampling unit. Wheat area estimation using only aerial photographs is less precise and accurate than those obtained by regression estimation.
Martelo-Vidal, M J; Vázquez, M
2014-09-01
Spectral analysis is a quick and non-destructive method to analyse wine. In this work, trans-resveratrol, oenin, malvin, catechin, epicatechin, quercetin and syringic acid were determined in commercial red wines from DO Rías Baixas and DO Ribeira Sacra (Spain) by UV-VIS-NIR spectroscopy. Calibration models were developed using principal component regression (PCR) or partial least squares (PLS) regression. HPLC was used as the reference method. The results showed that reliable PLS models were obtained to quantify all polyphenols for Rías Baixas wines. For Ribeira Sacra, feasible models were obtained to determine quercetin, epicatechin, oenin and syringic acid. PCR calibration models showed lower prediction reliability than PLS models. For red wines from mencía grapes, feasible models were obtained for catechin and oenin, regardless of the geographical origin. The results obtained demonstrate that UV-VIS-NIR spectroscopy can be used to determine individual polyphenolic compounds in red wines. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Pilot Study of Reasons and Risk Factors for "No-Shows" in a Pediatric Neurology Clinic.
Guzek, Lindsay M; Fadel, William F; Golomb, Meredith R
2015-09-01
Missed clinic appointments lead to decreased patient access, worse patient outcomes, and increased healthcare costs. The goal of this pilot study was to identify reasons for and risk factors associated with missed pediatric neurology outpatient appointments ("no-shows"). This was a prospective cohort study of patients scheduled for 1 week of clinic. Data on patient clinical and demographic information were collected by record review; data on reasons for missed appointments were collected by phone interviews. Univariate and multivariate analyses were conducted using chi-square tests and multiple logistic regression to assess risk factors for missed appointments. Fifty-nine (25%) of 236 scheduled patients were no-shows. Scheduling conflicts (25.9%) and forgetting (20.4%) were the most common reasons for missed appointments. When controlling for confounding factors in the logistic regression, Medicaid (odds ratio 2.36), distance from clinic, and time since appointment was scheduled were associated with missed appointments. Further work in this area is needed. © The Author(s) 2014.
Male-initiated partner abuse during marital separation prior to divorce.
Toews, Michelle L; McKenry, Patrick C; Catlett, Beth S
2003-08-01
The purpose of this study was to assess predictors of male-initiated psychological and physical partner abuse during the separation process prior to divorce among a sample of 80 divorced fathers who reported no physical violence during their marriages. The predictor variables examined were male gender-role identity, female-initiated divorces, dependence on one's former wife, depression, anxiety, and coparental conflict. Through ordinary least squares (OLS) regression techniques, it was found that male gender-role identity was positively related to male-initiated psychological abuse during separation. Logistic regression analyses revealed that male-initiated psychological abuse, anxiety level, coparental conflict, and dependence on one's former spouse increased the odds of a man engaging in physical abuse. However, depression decreased the odds of separation physical abuse. The models predicting both male-initiated psychological abuse (F = 2.20, p < .05, R² = .15) and physical violence during the separation process were significant (Model χ² = 35.00, df = 7, p < .001).
ERIC Educational Resources Information Center
Kane, Michael T.; Mroch, Andrew A.
2010-01-01
In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit…
Tutorial on Using Regression Models with Count Outcomes Using R
ERIC Educational Resources Information Center
Beaujean, A. Alexander; Morgan, Grant B.
2016-01-01
Education researchers often study count variables, such as times a student reached a goal, discipline referrals, and absences. Most researchers that study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can…
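For count outcomes of the kind mentioned above, Poisson and negative binomial models are the usual alternatives to ordinary least squares in R. A short simulated example follows; the predictor and outcome names are hypothetical.

```r
# Poisson and negative binomial regression for a simulated count outcome.
library(MASS)

set.seed(8)
n         <- 300
hours     <- runif(n, 0, 10)                                  # hypothetical predictor
referrals <- rnbinom(n, mu = exp(0.2 + 0.15 * hours), size = 1.5)

pois <- glm(referrals ~ hours, family = poisson)
nb   <- glm.nb(referrals ~ hours)
AIC(pois, nb)                                                 # overdispersion favours the NB model
```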
Teaching the Concept of Breakdown Point in Simple Linear Regression.
ERIC Educational Resources Information Center
Chan, Wai-Sum
2001-01-01
Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…
Principles of Quantile Regression and an Application
ERIC Educational Resources Information Center
Chen, Fang; Chalhoub-Deville, Micheline
2014-01-01
Newer statistical procedures are typically introduced to help address the limitations of those already in practice or to deal with emerging research needs. Quantile regression (QR) is introduced in this paper as a relatively new methodology, which is intended to overcome some of the limitations of least squares mean regression (LMR). QR is more…
The concept of psychological regression: metaphors, mapping, Queen Square, and Tavistock Square.
Mercer, Jean
2011-05-01
The term "regression" refers to events in which an individual changes from his or her present level of maturity and regains mental and behavioral characteristics shown at an earlier point in development. This definition has remained constant for over a century, but the implications of the concept have changed systematically from a perspective in which regression was considered pathological, to a current view in which regression may be seen as a positive step in psychotherapy or as a part of normal development. The concept of regression, famously employed by Sigmund Freud and others in his circle, derived from ideas suggested by Herbert Spencer and by John Hughlings Jackson. By the 1940s and '50s, the regression concept was applied by Winnicott and others in treatment of disturbed children and in adult psychotherapy. In addition, behavioral regression came to be seen as a part of a normal developmental trajectory, with a focus on expectable variability. The present article examines historical changes in the regression concept in terms of mapping to biomedical or other metaphors, in terms of a movement from earlier nativism toward an increased environmentalism in psychology, and with respect to other historical factors such as wartime events. The role of dominant metaphors in shifting perspectives on regression is described.
NASA Astrophysics Data System (ADS)
Borodachev, S. M.
2016-06-01
The simple derivation of the recursive least squares (RLS) method equations is given as a special case of Kalman filter estimation of a constant system state under changing observation conditions. A numerical example illustrates the application of RLS to a multicollinearity problem.
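Written out, the recursive update keeps a running estimate and covariance and refreshes them one observation at a time. The sketch below implements the standard RLS recursion on simulated data and checks it against the batch OLS solution; it is a generic illustration, not the paper's derivation.

```r
# Recursive least squares as a constant-state Kalman filter (simulated data).
set.seed(10)
n <- 200; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))
beta_true <- c(1, -2, 0.5)
y <- as.numeric(X %*% beta_true + rnorm(n, sd = 0.3))

beta <- matrix(0, p, 1)                # state estimate
P    <- diag(1e3, p)                   # large initial uncertainty
for (i in seq_len(n)) {
  x    <- matrix(X[i, ], p, 1)
  K    <- P %*% x / as.numeric(1 + t(x) %*% P %*% x)          # gain
  beta <- beta + K %*% (y[i] - t(x) %*% beta)                 # state update
  P    <- P - K %*% t(x) %*% P                                # covariance update
}
cbind(RLS = as.numeric(beta), OLS = coef(lm(y ~ X - 1)))
```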
NASA Astrophysics Data System (ADS)
See, J. J.; Jamaian, S. S.; Salleh, R. M.; Nor, M. E.; Aman, F.
2018-04-01
This research aims to estimate the parameters of the Monod model for the growth of the microalga Botryococcus braunii sp. by the least-squares method. The Monod equation is a non-linear equation that can be transformed into linear form and solved by linear least-squares regression. Alternatively, the Gauss-Newton method solves the non-linear least-squares problem directly, obtaining the Monod model parameters by minimizing the sum of squared errors (SSE). As a result, the parameters of the Monod model for the microalga Botryococcus braunii sp. can be estimated by the least-squares method; however, the parameter estimates obtained by the non-linear least-squares method are more accurate than those from the linear least-squares method, since the SSE of the non-linear method is smaller.
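The two estimation routes described above can be compared directly in R: a linearized (double-reciprocal) least-squares fit supplies starting values for a nonlinear least-squares fit with nls(), whose default algorithm is Gauss-Newton. The substrate and growth-rate values below are simulated, not the study's measurements.

```r
# Linearized versus nonlinear least-squares estimation of the Monod model
# mu = mu_max * S / (Ks + S), on simulated data.
set.seed(12)
S  <- c(0.5, 1, 2, 4, 8, 16, 32)
mu <- 0.9 * S / (3 + S) * exp(rnorm(length(S), sd = 0.03))

# Linearized fit: 1/mu = (Ks/mu_max) * (1/S) + 1/mu_max
lin        <- lm(I(1 / mu) ~ I(1 / S))
mu_max_lin <- unname(1 / coef(lin)[1])
Ks_lin     <- unname(coef(lin)[2] * mu_max_lin)

# Nonlinear least squares (Gauss-Newton), started from the linearized estimates
nl <- nls(mu ~ mu_max * S / (Ks + S), start = list(mu_max = mu_max_lin, Ks = Ks_lin))
rbind(linearized = c(mu_max = mu_max_lin, Ks = Ks_lin), nonlinear = coef(nl))
```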
Wei, Chang-Na; Zhou, Qing-He; Wang, Li-Zhong
2017-01-01
Currently, there is no consensus on how to determine the optimal dose of intrathecal bupivacaine for an individual undergoing an elective cesarean section. In this study, we developed a regression equation between intrathecal 0.5% hyperbaric bupivacaine volume and abdominal girth and vertebral column length, to determine a suitable block level (T5) for elective cesarean section patients. In phase I, we analyzed 374 parturients undergoing an elective cesarean section that received a suitable dose of intrathecal 0.5% hyperbaric bupivacaine after a combined spinal-epidural (CSE) was performed at the L3/4 interspace. Parturients with T5 blockade to pinprick were selected for establishing the regression equation between 0.5% hyperbaric bupivacaine volume and vertebral column length and abdominal girth. Six parturient and neonatal variables, intrathecal 0.5% hyperbaric bupivacaine volume, and spinal anesthesia spread were recorded. Bivariate linear correlation analyses, multiple linear regression analyses, and 2-tailed t tests or chi-square tests were performed, as appropriate. In phase II, another 200 parturients with CSE for elective cesarean section were enrolled to verify the accuracy of the regression equation. In phase I, a total of 143 parturients were selected to establish the following regression equation: YT5 = 0.074X1 − 0.022X2 − 0.017 (YT5 = 0.5% hyperbaric bupivacaine volume for T5 block level; X1 = vertebral column length; and X2 = abdominal girth). In phase II, a total of 189 participants were enrolled in the study to verify the accuracy of the regression equation, and 155 parturients with T5 blockade were deemed eligible, which accounted for 82.01% of all participants. This study evaluated parturients with T5 blockade to pinprick after a CSE for elective cesarean section to establish a regression equation between parturient vertebral column length and abdominal girth and 0.5% hyperbaric intrathecal bupivacaine volume. This equation can accurately predict the suitable intrathecal hyperbaric bupivacaine dose for elective cesarean section. PMID:28834913
Comparison of random regression test-day models for Polish Black and White cattle.
Strabel, T; Szyda, J; Ptak, E; Jamrozik, J
2005-10-01
Test-day milk yields of first-lactation Black and White cows were used to select the model for routine genetic evaluation of dairy cattle in Poland. The population of Polish Black and White cows is characterized by small herd size, low level of production, and relatively early peak of lactation. Several random regression models for first-lactation milk yield were initially compared using the "percentage of squared bias" criterion and the correlations between true and predicted breeding values. Models with random herd-test-date effects, fixed age-season and herd-year curves, and random additive genetic and permanent environmental curves (Legendre polynomials of different orders were used for all regressions) were chosen for further studies. Additional comparisons included analyses of the residuals and shapes of variance curves in days in milk. The low production level and early peak of lactation of the breed required the use of Legendre polynomials of order 5 to describe age-season lactation curves. For the other curves, Legendre polynomials of order 3 satisfactorily described daily milk yield variation. Fitting third-order polynomials for the permanent environmental effect made it possible to adequately account for heterogeneous residual variance at different stages of lactation.
Spatial regression models of park and land-use impacts on the urban heat island in central Beijing.
Dai, Zhaoxin; Guldmann, Jean-Michel; Hu, Yunfeng
2018-06-01
Understanding the relationship between urban land structure and land surface temperatures (LST) is important for mitigating the urban heat island (UHI). This paper explores this relationship within central Beijing, an area located within the 2nd Ring Road. The urban variables include the Normalized Difference Vegetation Index (NDVI), the Normalized Difference Build-up Index (NDBI), the area of building footprints, the area of main roads, the area of water bodies and a gravity index for parks that account for both park size and distance. The data are captured over 8 grids of square cells (30 m, 60 m, 90 m, 120 m, 150 m, 180 m, 210 m, 240 m). The research involves: (1) estimating land surface temperatures using Landsat 8 satellite imagery, (2) building the database of urban variables, and (3) conducting regression analyses. The results show that (1) all the variables impact surface temperatures, (2) spatial regressions are necessary to capture neighboring effects, and (3) higher-order polynomial functions are more suitable for capturing the effects of NDVI and NDBI. Copyright © 2018 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Argentiero, P.; Lowrey, B.
1976-01-01
The least squares collocation algorithm for estimating gravity anomalies from geodetic data is shown to be an application of the well known regression equations which provide the mean and covariance of a random vector (gravity anomalies) given a realization of a correlated random vector (geodetic data). It is also shown that the collocation solution for gravity anomalies is equivalent to the conventional least-squares-Stokes' function solution when the conventional solution utilizes properly weighted zero a priori estimates. The mathematical and physical assumptions underlying the least squares collocation estimator are described, and its numerical properties are compared with the numerical properties of the conventional least squares estimator.
Giovenzana, Valentina; Civelli, Raffaele; Beghi, Roberto; Oberti, Roberto; Guidetti, Riccardo
2015-11-01
The aim of this work was to test a simplified optical prototype for rapid estimation of the ripening parameters of white grapes for Franciacorta wine directly in the field. Spectral acquisition based on reflectance at four wavelengths (630, 690, 750 and 850 nm) was proposed. The integration of a simple processing algorithm in the microcontroller software would allow real-time values of spectral reflectance to be visualized. Non-destructive analyses were carried out on 95 grape bunches for a total of 475 berries. Samplings were performed weekly during the last ripening stages. Optical measurements were carried out both with the simplified system and with a portable commercial vis/NIR spectrophotometer, used as the reference instrument for performance comparison. Chemometric analyses were performed in order to extract the maximum useful information from the optical data. Principal component analysis (PCA) was performed for a preliminary evaluation of the data. Correlations between the optical data matrix and ripening parameters (total soluble solids content, SSC; titratable acidity, TA) were carried out using partial least squares (PLS) regression for the spectra and multiple linear regression (MLR) for data from the simplified device. Classification analyses were also performed with the aim of discriminating ripe and unripe samples. PCA, MLR and classification analyses show the effectiveness of the simplified system in separating samples among different sampling dates and in discriminating ripe from unripe samples. Finally, simple equations for SSC and TA prediction were calculated. Copyright © 2015 Elsevier B.V. All rights reserved.
de Freitas, Brunnella Alcantara Chagas; Sant'Ana, Luciana Ferreira da Rocha; Longo, Giana Zarbato; Siqueira-Batista, Rodrigo; Priore, Silvia Eloiza; Franceschin, Sylvia do Carmo Castro
2012-01-01
Objective To analyze the process of care provided to premature infants in a neonatal intensive care unit and the factors associated with their mortality. Methods Cross-sectional retrospective study of premature infants in an intensive care unit between 2008 and 2010. The characteristics of the mothers and premature infants were described, and a bivariate analysis was performed on the following characteristics: the study period and the "death" outcome (hospital, neonatal and early) using Pearson's chi-square test, Fisher's exact test or a chi-square test for linear trends. Bivariate and multivariable logistic regression analyses were performed using a stepwise backward logistic regression method between the variables with p<0.20 and the "death" outcome. A p value <0.05 was considered to be significant. Results In total, 293 preterm infants were studied. Increased access to complementary tests (transfontanellar ultrasound and Doppler echocardiogram) and breastfeeding rates were indicators of improving care. Mortality was concentrated in the neonatal period, especially in the early neonatal period, and was associated with extreme prematurity, small size for gestational age and an Apgar score <7 at 5 minutes after birth. Late-onset sepsis was also associated with a greater chance of neonatal death, and antenatal corticosteroids were protective against neonatal and early deaths. Conclusions Although these results are comparable to previous findings regarding mortality among premature infants in Brazil, the study emphasizes the need to implement strategies that promote breastfeeding and reduce neonatal mortality and its early component. PMID:23917938
Rein, Thomas R
2011-11-01
Phalanges are considered to be highly informative in the reconstruction of extinct primate locomotor behavior since these skeletal elements directly interact with the substrate during locomotion. Variation in shaft curvature and relative phalangeal length has been linked to differences in the degree of suspension and overall arboreal locomotor activities. Building on previous work, this study investigated these two skeletal characters in a comparative context to analyze function, while taking evolutionary relationships into account. This study examined the correspondence between proportions of suspension and overall substrate usage observed in 17 extant taxa and included angle of curvature and relative phalangeal length. Predictive models based on these traits are reported. Published proportions of different locomotor behaviors were regressed against each phalangeal measurement and a size proxy. The relationship between each behavior and skeletal trait was investigated using ordinary least-squares, phylogenetic generalized least-squares (pGLS), and two pGLS transformation methods to determine the model of best-fit. Phalangeal curvature and relative length had significant positive relationships with both suspension and overall arboreal locomotion. Cross-validation analyses demonstrated that relative length and curvature provide accurate predictions of relative suspensory behavior and substrate usage in a range of extant species when used together in predictive models. These regression equations provide a refined method to assess the amount of suspensory and overall arboreal locomotion characterizing species in the catarrhine fossil record. Copyright © 2011 Wiley-Liss, Inc.
Mapping health outcome measures from a stroke registry to EQ-5D weights
2013-01-01
Purpose To map health outcome-related variables from a national register, not part of any validated instrument, with EQ-5D weights among stroke patients. Methods We used two cross-sectional data sets including patient characteristics, outcome variables and EQ-5D weights from the national Swedish stroke register. Three regression techniques were used on the estimation set (n = 272): ordinary least squares (OLS), Tobit, and censored least absolute deviation (CLAD). The regression coefficients for "dressing", "toileting", "mobility", "mood", "general health" and "proxy-responders" were applied to the validation set (n = 272), and the performance was analysed with mean absolute error (MAE) and mean square error (MSE). Results The number of statistically significant coefficients varied by model, but all models generated consistent coefficients in terms of sign. Mean utility was underestimated in all models (least in OLS) and with lower variation (least in OLS) compared to the observed. The maximum attainable EQ-5D weight ranged from 0.90 (OLS) to 1.00 (Tobit and CLAD). Health states with utility weights <0.5 had greater errors than those with weights ≥0.5 (P < 0.01). Conclusion This study indicates that it is possible to map non-validated health outcome measures from a stroke register into preference-based utilities to study the development of stroke care over time, and to compare with other conditions in terms of utility. PMID:23496957
Radioecological modelling of Polonium-210 and Caesium-137 in lichen-reindeer-man and top predators.
Persson, Bertil R R; Gjelsvik, Runhild; Holm, Elis
2018-06-01
This work deals with the analysis and modelling of the radionuclides 210Pb and 210Po in the food-chain lichen-reindeer-man, in addition to 210Po and 137Cs in top predators. Using partial least squares regression (PLSR), the atmospheric deposition of 210Pb and 210Po is predicted at the sample locations. Dynamic modelling of the activity concentration with differential equations is fitted to the sample data. Reindeer lichen consumption, gastrointestinal absorption, organ distribution and elimination are derived from information in the literature. Dynamic modelling of the transfer of 210Pb and 210Po to reindeer meat, liver and bone from lichen consumption fitted well with data from Sweden and Finland from 1966 to 1971. The activity concentration of 210Pb in the human skeleton is modelled using the results of studies of the kinetics of lead in the skeleton and blood of lead-workers after the end of occupational exposure. The modelled 210Pb and 210Po activity in the skeleton matched well with concentrations of 210Pb and 210Po in teeth from reindeer-breeders and in autopsy bone samples from Finland. The previously published results for 210Po and 137Cs in different tissues of wolf, wolverine and lynx are analysed with multivariate data processing methods such as principal component analysis (PCA) and modelled with projection to latent structures (PLS), also known as partial least squares regression (PLSR). Copyright © 2017 Elsevier Ltd. All rights reserved.
Rapid Detection of Volatile Oil in Mentha haplocalyx by Near-Infrared Spectroscopy and Chemometrics.
Yan, Hui; Guo, Cheng; Shao, Yang; Ouyang, Zhen
2017-01-01
Near-infrared spectroscopy combined with partial least squares regression (PLSR) and support vector machine (SVM) modelling was applied for the rapid determination of the volatile oil content in Mentha haplocalyx. The quality of a medicine is directly linked to its clinical efficacy; thus, it is important to control the quality of Mentha haplocalyx. The effects of data pre-processing methods on the accuracy of the PLSR calibration models were investigated. The performance of the final models was evaluated according to the correlation coefficient (R) and the root mean square error of prediction (RMSEP). For the PLSR model, the best pre-processing combination was first-order derivative, standard normal variate transformation (SNV), and mean centering, which gave calibration and prediction correlation coefficients of 0.8805 and 0.8719, an RMSEC of 0.091, and an RMSEP of 0.097. The wavenumber variables linked to the volatile oil lie between 5500 and 4000 cm-1, as identified by analyzing the loading weights and variable importance in projection (VIP) scores. For the SVM model, six LVs (fewer than the seven LVs in the PLSR model) were adopted, and the results were better than those of the PLSR model: the calibration and prediction correlation coefficients were 0.9232 and 0.9202, with an RMSEC of 0.084 and an RMSEP of 0.082, indicating that the predicted values were accurate and reliable. This work demonstrated that near-infrared spectroscopy with chemometrics can be used to rapidly determine the main volatile oil content in M. haplocalyx. Abbreviations used: 1st der: first-order derivative; 2nd der: second-order derivative; LOO: leave-one-out; LVs: latent variables; MC: mean centering; NIR: near-infrared; NIRS: near-infrared spectroscopy; PCR: principal component regression; PLSR: partial least squares regression; RBF: radial basis function; RMSECV: root mean square error of cross-validation; RMSEC: root mean square error of calibration; RMSEP: root mean square error of prediction; SNV: standard normal variate transformation; SVM: support vector machine; VIP: variable importance in projection.
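The preprocessing chain named above (first derivative, SNV, mean centering) followed by PLSR can be sketched as follows; the spectra are synthetic and the Savitzky-Golay window settings are assumptions rather than the study's choices.

```python
# Illustrative sketch: first-derivative + SNV + mean-centering preprocessing,
# then a PLS regression with seven latent variables, on synthetic NIR-like data.
import numpy as np
from scipy.signal import savgol_filter
from sklearn.cross_decomposition import PLSRegression

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum individually."""
    return (spectra - spectra.mean(axis=1, keepdims=True)) / spectra.std(axis=1, keepdims=True)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 300)).cumsum(axis=1)                   # synthetic spectra
y = 0.01 * X[:, 150] + rng.normal(scale=0.05, size=60)          # synthetic volatile-oil content

X_d1 = savgol_filter(X, window_length=11, polyorder=2, deriv=1, axis=1)  # first derivative
X_snv = snv(X_d1)                                                        # SNV
X_mc = X_snv - X_snv.mean(axis=0)                                        # mean centering

pls = PLSRegression(n_components=7).fit(X_mc, y)
rmsec = np.sqrt(np.mean((y - pls.predict(X_mc).ravel()) ** 2))
print("RMSEC:", round(rmsec, 3))
```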
Olson, Scott A.
2003-01-01
The stream-gaging network in New Hampshire was analyzed for its effectiveness in providing regional information on peak-flood flow, mean-flow, and low-flow frequency. The data available for analysis were from stream-gaging stations in New Hampshire and selected stations in adjacent States. The principles of generalized-least-squares regression analysis were applied to develop regional regression equations that relate streamflow-frequency characteristics to watershed characteristics. Regression equations were developed for (1) the instantaneous peak flow with a 100-year recurrence interval, (2) the mean-annual flow, and (3) the 7-day, 10-year low flow. Active and discontinued stream-gaging stations with 10 or more years of flow data were used to develop the regression equations. Each stream-gaging station in the network was evaluated and ranked on the basis of how much the data from that station contributed to the cost-weighted sampling-error component of the regression equation. The potential effect of data from proposed and new stream-gaging stations on the sampling error also was evaluated. The stream-gaging network was evaluated for conditions in water year 2000 and for estimated conditions under various network strategies if an additional 5 years and 20 years of streamflow data were collected. The effectiveness of the stream-gaging network in providing regional streamflow information could be improved for all three flow characteristics with the collection of additional flow data, both temporally and spatially. With additional years of data collection, the greatest reduction in the average sampling error of the regional regression equations was found for the peak- and low-flow characteristics. In general, additional data collection at stream-gaging stations with unregulated flow, relatively short-term record (less than 20 years), and drainage areas smaller than 45 square miles contributed the largest cost-weighted reduction to the average sampling error of the regional estimating equations. The results of the network analyses can be used to prioritize the continued operation of active stations, the reactivation of discontinued stations, or the activation of new stations to maximize the regional information content provided by the stream-gaging network. Final decisions regarding altering the New Hampshire stream-gaging network would require the consideration of the many uses of the streamflow data serving local, State, and Federal interests.
Zhang, Ni; Liu, Xu; Jin, Xiaoduo; Li, Chen; Wu, Xuan; Yang, Shuqin; Ning, Jifeng; Yanne, Paul
2017-12-15
Phenolics contents in wine grapes are key indicators for assessing ripeness. Near-infrared hyperspectral images acquired during ripening were explored to develop an effective method for predicting phenolics contents. Principal component regression (PCR), partial least squares regression (PLSR) and support vector regression (SVR) models were built, respectively. The results show that SVR performs better overall than PLSR and PCR, except in predicting the tannin content of seeds. For the best prediction results, the squared correlation coefficient and root mean square error reached 0.8960 and 0.1069 g/L (+)-catechin equivalents (CE), respectively, for tannins in skins, 0.9065 and 0.1776 (g/L CE) for total iron-reactive phenolics (TIRP) in skins, 0.8789 and 0.1442 (g/L M3G) for anthocyanins in skins, 0.9243 and 0.2401 (g/L CE) for tannins in seeds, and 0.8790 and 0.5190 (g/L CE) for TIRP in seeds. Our results indicate that NIR hyperspectral imaging has good prospects for the evaluation of phenolics in wine grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Rafindadi, Abdulkadir Abdulrashid; Yusof, Zarinah; Zaman, Khalid; Kyophilavong, Phouphet; Akhmat, Ghulam
2014-10-01
The objective of the study is to examine the relationship between air pollution, fossil fuel energy consumption, water resources, and natural resource rents in a panel of selected Asia-Pacific countries over the period 1975-2012. The study includes a number of variables in the model for robust analysis. The results of the cross-sectional analysis show that there is a significant relationship between air pollution, energy consumption, and water productivity in the individual countries of Asia-Pacific. However, the results for each country vary according to time-invariant shocks. For this purpose, the study employed panel least squares techniques, namely panel least squares regression, panel fixed-effects regression, and panel two-stage least squares regression. In general, all the panel tests indicate that there is a significant and positive relationship between air pollution, energy consumption, and water resources in the region. Fossil fuel energy consumption has a major dominating impact on changes in air pollution in the region.
On-line milk spectrometry: analysis of bovine milk composition
NASA Astrophysics Data System (ADS)
Spitzer, Kyle; Kuennemeyer, Rainer; Woolford, Murray; Claycomb, Rod
2005-04-01
We present partial least squares (PLS) regressions to predict the composition of raw, unhomogenised milk using visible to near-infrared spectroscopy. A total of 370 milk samples from individual quarters were collected and analysed on-line by two low-cost spectrometers in the wavelength ranges 380-1100 nm and 900-1700 nm. Samples were collected from 22 Friesian, 17 Jersey, 2 Ayrshire and 3 Friesian-Jersey crossbred cows over a period of 7 consecutive days. Transmission spectra were recorded in an in-line flow cell through a 0.5 mm thick milk sample. PLS models, with wavelength selection performed using iterative PLS, were developed for fat, protein, lactose, and somatic cell content. The root mean square errors of prediction (and correlation coefficients) for the NIR and visible spectrometers, respectively, were 0.70% (0.93) and 0.91% (0.91) for fat, 0.65% (0.50) and 0.47% (0.79) for protein, 0.36% (0.49) and 0.45% (0.43) for lactose, and 0.50 (0.54) and 0.48 (0.51) for log10 somatic cells.
Azeez, Adeboye; Obaromi, Davies; Odeyemi, Akinwumi; Ndege, James; Muntabayi, Ruffin
2016-07-26
Tuberculosis (TB) is a deadly infectious disease caused by Mycobacterium tuberculosis. As a chronic and highly infectious disease, TB is prevalent in almost every part of the globe. More than 95% of TB mortality occurs in low/middle-income countries. In 2014, approximately 10 million people were diagnosed with active TB and two million died from the disease. In this study, our aim is to compare the predictive power of a seasonal autoregressive integrated moving average (SARIMA) model and a combined SARIMA-neural network autoregression (SARIMA-NNAR) model for TB incidence, and to analyse its seasonality in South Africa. TB incidence case data from January 2010 to December 2015 were extracted from the Eastern Cape Health facility report of the electronic Tuberculosis Register (ERT.Net). A SARIMA model and a combined SARIMA-NNAR model were used to analyse and predict the TB data from 2010 to 2015. Simulation performance measures, namely the mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), mean percent error (MPE), mean absolute scaled error (MASE) and mean absolute percentage error (MAPE), were applied to assess which model predicted better. Although in practice both models could predict TB incidence, the combined model displayed better performance. For the combined model, the Akaike information criterion (AIC), second-order AIC (AICc) and Bayesian information criterion (BIC) were 288.56, 308.31 and 299.09, respectively, lower than the corresponding SARIMA values of 329.02, 327.20 and 341.99. The SARIMA-NNAR model forecast a slightly higher seasonal TB incidence trend than the single model. The combined model gave a better TB incidence forecast, with a lower AICc. The model also indicates the need for resolute intervention to reduce infectious disease transmission, including co-infection with HIV and other concomitant diseases, and at festival peak periods.
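A minimal sketch of the SARIMA component, assuming simulated monthly counts and placeholder (p,d,q)(P,D,Q)12 orders; the neural network autoregression part of the combined model is not reproduced here.

```python
# Hedged SARIMA sketch with statsmodels on simulated monthly TB-like counts.
# Orders, seasonal amplitude and noise are illustrative assumptions.
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(2)
months = pd.date_range("2010-01", periods=72, freq="MS")          # 2010-2015, monthly
season = 50 * np.sin(2 * np.pi * np.arange(72) / 12)
cases = 400 + season + rng.normal(scale=20, size=72)
y = pd.Series(cases, index=months)

fit = SARIMAX(y, order=(1, 0, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
print("AIC:", round(fit.aic, 2), " BIC:", round(fit.bic, 2))
print(fit.forecast(steps=12).round(1))                            # 12-month-ahead forecast
```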
Johansen, Mette; Bahrt, Henriette; Altman, Roy D; Bartels, Else M; Juhl, Carsten B; Bliddal, Henning; Lund, Hans; Christensen, Robin
2016-08-01
The aim was to identify factors explaining inconsistent observations concerning the efficacy of intra-articular hyaluronic acid compared to intra-articular sham/control, or non-intervention control, in patients with symptomatic osteoarthritis, based on randomized clinical trials (RCTs). A systematic review and meta-regression analyses of available randomized trials were conducted. The outcome, pain, was assessed according to a pre-specified hierarchy of potentially available outcomes. Hedges' standardized mean difference [SMD (95% CI)] served as the effect size. REstricted Maximum Likelihood (REML) mixed-effects models were used to combine study results, and heterogeneity was calculated and interpreted as Tau-squared and I-squared, respectively. Overall, 99 studies (14,804 patients) met the inclusion criteria. Of these, only 71 studies (72%), including 85 comparisons (11,216 patients), had adequate data available for inclusion in the primary meta-analysis. Overall, compared with placebo, intra-articular hyaluronic acid reduced pain with an effect size of -0.39 [-0.47 to -0.31; P < 0.001], combining very heterogeneous trial findings (I(2) = 73%). The three most important covariates in reducing heterogeneity were overall risk of bias, blinding of personnel, and trial size, reducing heterogeneity by 26%, 26%, and 25%, respectively (interaction: P ≤ 0.001). Adjusting for publication/selective outcome reporting bias (by imputing "null effects") in 24 of the comparisons with no data available reduced the combined estimate to -0.30 [-0.36 to -0.23; P < 0.001], still in favor of hyaluronic acid. Based on available trial data, intra-articular hyaluronic acid showed a better effect than intra-articular saline on pain reduction in osteoarthritis. Publication bias and the risk of selective outcome reporting suggest only a small clinical effect compared to saline. Copyright © 2016 Elsevier Inc. All rights reserved.
Estimation of Flood-Frequency Discharges for Rural, Unregulated Streams in West Virginia
Wiley, Jeffrey B.; Atkins, John T.
2010-01-01
Flood-frequency discharges were determined for 290 streamgage stations having a minimum of 9 years of record in West Virginia and surrounding states through the 2006 or 2007 water year. No trend was determined in the annual peaks used to calculate the flood-frequency discharges. Multiple and simple least-squares regression equations for the 100-year (1-percent annual-occurrence probability) flood discharge with independent variables that describe the basin characteristics were developed for 290 streamgage stations in West Virginia and adjacent states. The regression residuals for the models were evaluated and used to define three regions of the State, designated as Eastern Panhandle, Central Mountains, and Western Plateaus. Exploratory data analysis procedures identified 44 streamgage stations that were excluded from the development of regression equations representative of rural, unregulated streams in West Virginia. Regional equations for the 1.1-, 1.5-, 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year flood discharges were determined by generalized least-squares regression using data from the remaining 246 streamgage stations. Drainage area was the only significant independent variable determined for all equations in all regions. Procedures developed to estimate flood-frequency discharges on ungaged streams were based on (1) regional equations and (2) drainage-area ratios between gaged and ungaged locations on the same stream. The procedures are applicable only to rural, unregulated streams within the boundaries of West Virginia that have drainage areas within the limits of the stations used to develop the regional equations (from 0.21 to 1,461 square miles in the Eastern Panhandle, from 0.10 to 1,619 square miles in the Central Mountains, and from 0.13 to 1,516 square miles in the Western Plateaus). The accuracy of the equations is quantified by measuring the average prediction error (from 21.7 to 56.3 percent) and equivalent years of record (from 2.0 to 70.9 years).
A New Test of Linear Hypotheses in OLS Regression under Heteroscedasticity of Unknown Form
ERIC Educational Resources Information Center
Cai, Li; Hayes, Andrew F.
2008-01-01
When the errors in an ordinary least squares (OLS) regression model are heteroscedastic, hypothesis tests involving the regression coefficients can have Type I error rates that are far from the nominal significance level. Asymptotically, this problem can be rectified with the use of a heteroscedasticity-consistent covariance matrix (HCCM)…
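A small synthetic illustration of the point made above: when the error variance grows with the predictor, classical OLS standard errors differ noticeably from heteroscedasticity-consistent ones (HC3 in statsmodels is used here as one member of the HCCM family).

```python
# Synthetic demonstration: OLS with classical vs heteroscedasticity-consistent
# (HC3) covariance. Sample size and error structure are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(0, 10, n)
e = rng.normal(scale=0.5 + 0.5 * x, size=n)      # error variance grows with x
y = 1.0 + 0.3 * x + e
X = sm.add_constant(x)

classical = sm.OLS(y, X).fit()                   # classical covariance
robust = sm.OLS(y, X).fit(cov_type="HC3")        # heteroscedasticity-consistent covariance

print("classical SE:", classical.bse.round(3))
print("HC3 SE:      ", robust.bse.round(3))
```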
Deriving the Regression Equation without Using Calculus
ERIC Educational Resources Information Center
Gordon, Sheldon P.; Gordon, Florence S.
2004-01-01
Probably the one "new" mathematical topic that is most responsible for modernizing courses in college algebra and precalculus over the last few years is the idea of fitting a function to a set of data in the sense of a least squares fit. Whether it be simple linear regression or nonlinear regression, this topic opens the door to applying the…
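For a concrete instance of the least squares "line of best fit", the closed-form slope and intercept can be checked against a library fit; the data points below are arbitrary.

```python
# Quick numerical check of the familiar closed-form least-squares line
# (slope = Sxy / Sxx, intercept = ybar - slope * xbar) against numpy's polyfit.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print("closed form:", slope, intercept)
print("np.polyfit :", np.polyfit(x, y, 1))       # returns [slope, intercept]
```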
ERIC Educational Resources Information Center
Bulcock, J. W.; And Others
Multicollinearity refers to the presence of highly intercorrelated independent variables in structural equation models, that is, models estimated by using techniques such as least squares regression and maximum likelihood. There is a problem of multicollinearity in both the natural and social sciences where theory formulation and estimation is in…
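A common way to see multicollinearity numerically is the variance inflation factor; the sketch below, on synthetic predictors, is only an illustration of the diagnostic, not of the structural equation models discussed in the abstract.

```python
# Variance inflation factors (VIF) on synthetic predictors, two of which are
# nearly collinear. The cutoff of 10 is a conventional rule of thumb.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.1, size=100)        # nearly collinear with x1
x3 = rng.normal(size=100)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    print(name, round(variance_inflation_factor(X.values, i), 1))  # VIF > 10 flags trouble
```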
2009-07-16
R^2 = SSR/SSTO = 1 − SSE/SSTO, where SSR = Σ(Ŷ_i − Ȳ)^2 is the regression sum of squares, SSE = Σ(Y_i − Ŷ_i)^2 is the error sum of squares, and SSTO = SSE + SSR is the total sum of squares; Ȳ denotes the mean value and Ŷ_i the value from the fitted line. Statistical analysis: coefficient of correlation.
A regression-kriging model for estimation of rainfall in the Laohahe basin
NASA Astrophysics Data System (ADS)
Wang, Hong; Ren, Li L.; Liu, Gao H.
2009-10-01
This paper presents a multivariate geostatistical algorithm called regression-kriging (RK) for predicting the spatial distribution of rainfall by incorporating five topographic/geographic factors: latitude, longitude, altitude, slope and aspect. The technique is illustrated using rainfall data collected at 52 rain gauges in the Laohahe basin in northeast China during 1986-2005. Rainfall data from 44 stations were selected for modeling and the remaining 8 stations were used for model validation. To eliminate multicollinearity, the five explanatory factors were first transformed using factor analysis, with three principal components (PCs) extracted. The rainfall data were then fitted using step-wise regression and the residuals interpolated using simple kriging (SK). The regression coefficients were estimated by generalized least squares (GLS), which takes the spatial heteroskedasticity between rainfall and the PCs into account. Finally, the rainfall prediction based on RK was compared with that predicted from ordinary kriging (OK) and ordinary least squares (OLS) multiple regression (MR). Because correlated topographic factors are taken into account, RK improves the efficiency of the predictions. RK achieved a lower relative root mean square error (RMSE) (44.67%) than MR (49.23%) and OK (73.60%) and a lower bias than MR and OK (23.82 versus 30.89 and 32.15 mm) for annual rainfall. It is much more effective for the wet season than for the dry season. RK is suitable for estimation of rainfall in areas where there are no stations nearby and where topography has a major influence on rainfall.
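A simplified stand-in for the regression-kriging workflow, on synthetic gauge data: principal components of the topographic factors feed a trend regression and the residuals are then spatially interpolated. A faithful RK implementation would fit a variogram and krige the residuals; scipy's griddata is used here purely as a placeholder interpolator.

```python
# Hedged, simplified regression-kriging-style sketch on synthetic gauge data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from scipy.interpolate import griddata

rng = np.random.default_rng(5)
n_gauges = 52
lonlat = rng.uniform(0, 100, size=(n_gauges, 2))                     # gauge coordinates
factors = np.column_stack([lonlat, rng.normal(size=(n_gauges, 3))])  # lat, lon, alt, slope, aspect
rain = 500 + 2 * factors[:, 2] + rng.normal(scale=30, size=n_gauges)

pcs = PCA(n_components=3).fit_transform(factors)                     # reduce multicollinearity
trend = LinearRegression().fit(pcs, rain)                            # trend part of RK
residuals = rain - trend.predict(pcs)

target = np.array([[50.0, 50.0]])                                    # an ungauged location
res_interp = griddata(lonlat, residuals, target, method="linear")
# A full prediction would add the trend evaluated at the target's own factor PCs.
print("interpolated residual at target:", res_interp)
```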
NASA Astrophysics Data System (ADS)
Hart, Brian K.; Griffiths, Peter R.
1998-06-01
Partial least squares (PLS) regression has been evaluated as a robust calibration technique for over 100 hazardous air pollutants (HAPs) measured by open path Fourier transform infrared (OP/FT-IR) spectrometry. PLS has the advantage over the currently recommended calibration method of classical least squares (CLS) in that it can use the whole usable spectrum (700-1300 cm-1, 2000-2150 cm-1, and 2400-3000 cm-1) and detect several analytes simultaneously. Up to one hundred HAPs synthetically added to OP/FT-IR backgrounds have been simultaneously calibrated and detected using PLS. PLS also has the advantage of requiring less preprocessing of spectra than is required in CLS calibration schemes, allowing PLS to provide user-independent, real-time analysis of OP/FT-IR spectra.
Li, Yankun; Shao, Xueguang; Cai, Wensheng
2007-04-15
Consensus modeling, which combines the results of multiple independent models to produce a single prediction, avoids the instability of a single model. Based on this principle, a consensus least squares support vector regression (LS-SVR) method for calibrating near-infrared (NIR) spectra is proposed. In the proposed approach, NIR spectra of plant samples were first preprocessed using the discrete wavelet transform (DWT) to filter the spectral background and noise; consensus LS-SVR was then used to build the calibration model. With optimization of the parameters involved in the modeling, a satisfactory model was achieved for predicting the content of reducing sugar in plant samples. The predicted results show that the consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.
Intrinsic Raman spectroscopy for quantitative biological spectroscopy Part II
Bechtel, Kate L.; Shih, Wei-Chuan; Feld, Michael S.
2009-01-01
We demonstrate the effectiveness of intrinsic Raman spectroscopy (IRS) at reducing errors caused by absorption and scattering. Physical tissue models, solutions of varying absorption and scattering coefficients with known concentrations of Raman scatterers, are studied. We show significant improvement in prediction error by implementing IRS to predict concentrations of Raman scatterers using both ordinary least squares regression (OLS) and partial least squares regression (PLS). In particular, we show that IRS provides a robust calibration model that does not increase in error when applied to samples with optical properties outside the range of calibration. PMID:18711512
Ding, A Adam; Wu, Hulin
2014-10-01
We propose a new method that uses constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models, with the goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters in the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy, at a small computational cost. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method.
NASA Technical Reports Server (NTRS)
Jacobsen, R. T.; Stewart, R. B.; Crain, R. W., Jr.; Rose, G. L.; Myers, A. F.
1976-01-01
A method was developed for establishing a rational choice of the terms to be included in an equation of state with a large number of adjustable coefficients. The methods presented were developed for use in the determination of an equation of state for oxygen and nitrogen. However, a general application of the methods is possible in studies involving the determination of an optimum polynomial equation for fitting a large number of data points. The data considered in the least squares problem are experimental thermodynamic pressure-density-temperature data. Attention is given to a description of stepwise multiple regression and the use of stepwise regression in the determination of an equation of state for oxygen and nitrogen.
Perez-Guaita, David; Kuligowski, Julia; Quintás, Guillermo; Garrigues, Salvador; Guardia, Miguel de la
2013-03-30
Locally weighted partial least squares regression (LW-PLSR) has been applied to the determination of four clinical parameters in human serum samples (total protein, triglyceride, glucose and urea contents) by Fourier transform infrared (FTIR) spectroscopy. Classical LW-PLSR models were constructed using different spectral regions. For the selection of parameters in LW-PLSR modeling, a multi-parametric study was carried out employing the minimum root-mean-square error of cross-validation (RMSCV) as the objective function. In order to overcome the effect of strong matrix interferences on the predictive accuracy of LW-PLSR models, this work focuses on sample selection. Accordingly, a novel strategy for the development of local models is proposed. It was based on the use of (i) principal component analysis (PCA) performed on an analyte-specific spectral region for identifying the most similar sample spectra and (ii) partial least squares regression (PLSR) constructed using the whole spectrum. Results found using this strategy were compared to those provided by PLSR using the same spectral intervals as for LW-PLSR. Prediction errors found by both the classical and the modified LW-PLSR improved on those obtained by PLSR. Hence, both proposed approaches were useful for the determination of analytes present in a complex matrix such as human serum samples. Copyright © 2013 Elsevier B.V. All rights reserved.
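The locally weighted idea can be sketched as follows, under illustrative assumptions (synthetic spectra, neighbour counts and component numbers): for each prediction sample, the most similar calibration spectra are located in PCA score space and a local PLS model is fitted to them.

```python
# Hedged locally weighted PLS sketch: per-sample neighbourhood selection in PCA
# score space, then a local PLS fit. Data and settings are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(6)
X_cal = rng.normal(size=(120, 400))                    # calibration FTIR-like spectra
y_cal = X_cal[:, 100] + 0.5 * X_cal[:, 250] + rng.normal(scale=0.05, size=120)
X_new = rng.normal(size=(5, 400))                      # samples to predict

pca = PCA(n_components=10).fit(X_cal)
nn = NearestNeighbors(n_neighbors=30).fit(pca.transform(X_cal))

preds = []
for xi in X_new:
    _, idx = nn.kneighbors(pca.transform(xi[None, :]))                   # most similar spectra
    local = PLSRegression(n_components=5).fit(X_cal[idx[0]], y_cal[idx[0]])
    preds.append(local.predict(xi[None, :]).item())
print(np.round(preds, 3))
```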
Application of near-infrared spectroscopy for the rapid quality assessment of Radix Paeoniae Rubra
NASA Astrophysics Data System (ADS)
Zhan, Hao; Fang, Jing; Tang, Liying; Yang, Hongjun; Li, Hua; Wang, Zhuju; Yang, Bin; Wu, Hongwei; Fu, Meihong
2017-08-01
Near-infrared (NIR) spectroscopy with multivariate analysis was used to quantify gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra, and the feasibility of classifying samples originating from different areas was investigated. A new high-performance liquid chromatography method was developed and validated to analyze gallic acid, catechin, albiflorin, and paeoniflorin in Radix Paeoniae Rubra as the reference. Partial least squares (PLS), principal component regression (PCR), and stepwise multivariate linear regression (SMLR) were performed to calibrate the regression models. Different data pretreatments such as derivatives (1st and 2nd), multiplicative scatter correction, standard normal variate, Savitzky-Golay filter, and Norris derivative filter were applied to remove systematic errors. The performance of the models was evaluated according to the root mean square error of calibration (RMSEC), root mean square error of prediction (RMSEP), root mean square error of cross-validation (RMSECV), and correlation coefficient (r). The results show that, compared to PCR and SMLR, PLS had lower RMSEC, RMSECV, and RMSEP and higher r for all four analytes. PLS coupled with proper pretreatments showed good performance in both fitting and prediction. Furthermore, the areas of origin of the Radix Paeoniae Rubra samples were partly distinguished by principal component analysis. This study shows that NIR with PLS is a reliable, inexpensive, and rapid tool for the quality assessment of Radix Paeoniae Rubra.
Escobar-Rodriguez, Tomas; Bartual-Sopena, Lourdes
Enterprise resources planning (ERP) systems enable central and integrative control over all processes throughout an organisation by ensuring one data entry point and the use of a common database. This paper analyses the attitude of healthcare personnel towards the use of an ERP system in a Spanish public hospital, identifying influencing factors. This research is based on a regression analysis of latent variables using the optimisation technique of partial least squares. We propose a research model including possible relationships among different constructs using the technology acceptance model. Our results show that the personal characteristics of potential users are key factors in explaining attitude towards using ERP systems.
Impact of gender, age and experience of pilots on general aviation accidents.
Bazargan, Massoud; Guzhva, Vitaly S
2011-05-01
General aviation (GA) accounts for more than 82% of all air transport-related accidents and air transport-related fatalities in the U.S. In this study, we conduct a series of statistical analyses to investigate the significance of a pilot's gender, age and experience in influencing the risk for pilot errors and fatalities in GA accidents. There is no evidence from the Chi-square tests and logistic regression models that support the likelihood of an accident caused by pilot error to be related to pilot gender. However, evidence is found that male pilots, those older than 60 years of age, and with more experience, are more likely to be involved in a fatal accident. Copyright © 2010 Elsevier Ltd. All rights reserved.
Performance degradation and cleaning of photovoltaic arrays
NASA Technical Reports Server (NTRS)
Sheskin, T. J.; Chang, G. C.; Cull, R. C.; Knapp, W. D.
1982-01-01
NASA test results from an 18-month program of cleaning silicone-encapsulated and glass-fronted solar cell panels in urban and desert environments, conducted to examine the effects of cleaning on module performance, are reported. The panels were cleaned on a weekly, monthly, quarterly, or semi-annual basis, while other panels of the same construction were not cleaned and served as controls. Commercially available detergents and city water were employed for the tests, and measurements of the modules' short-circuit current output were maintained throughout. The decay of the output was determined by least-squares regression analyses. Performance degradation was noticeably less in glass-covered modules than in silicone-encapsulated modules, which decayed faster in urban than in desert environments. Lower-frequency cleanings are recommended where labor costs are high.
Ergöçmen, Banu Akadli; Yüksel-Kaptanoğlu, İlknur; Jansen, Henrica A F M Henriette
2013-09-01
This study explores the severity and frequency of physical violence from an intimate partner experienced by 15- to 59-year-old women and their help-seeking behavior by using data from the "National Research on Domestic Violence Against Women in Turkey." Chi-square tests and logistic regression analyses were conducted to compare the relationship between severity and frequency of violence and women's characteristics. Of all ever-partnered women, 36% have been exposed to partner violence; almost half of these experienced severe types of violence. Women used informal strategies to manage the violence instead of seeking help from formal institutions. Help-seeking behavior increases with increased severity and frequency of violence.
Kulis, Stephen; Hodge, David R.; Ayers, Stephanie L.; Brown, Eddie F.; Marsiglia, Flavio F.
2012-01-01
Background and objective This article explores the aspects of spirituality and religious involvement that may be the protective factors against substance use among urban American Indian (AI) youth. Methods Data come from AI youth (N = 123) in five urban middle schools in a southwestern metropolis. Results Ordinary least squares regression analyses indicated that following Christian beliefs and belonging to the Native American Church were associated with lower levels of substance use. Conclusions and Scientific Significance Following AI traditional spiritual beliefs was associated with antidrug attitudes, norms, and expectancies. Having a sense of belonging to traditions from both AI cultures and Christianity may foster integration of the two worlds in which urban AI youth live. PMID:22554065
Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald
2011-06-01
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.
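A rough approximation of the hierarchical idea, with plain k-means standing in for fuzzy C-means and synthetic, non-monotone input-output data: the input space is partitioned and a separate PLS regression is fitted per cluster.

```python
# Hedged cluster-then-local-PLS sketch (k-means used instead of fuzzy C-means).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
inputs = rng.uniform(-3, 3, size=(600, 4))                     # model parameters
outputs = np.sin(inputs[:, 0]) + inputs[:, 1] ** 2 + 0.05 * rng.normal(size=600)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(inputs)

local_models = {c: PLSRegression(n_components=2).fit(inputs[labels == c], outputs[labels == c])
                for c in np.unique(labels)}

# Each point is routed through its cluster's local PLS model
fitted = np.concatenate(
    [local_models[c].predict(inputs[labels == c]).ravel() for c in np.unique(labels)])
observed = np.concatenate([outputs[labels == c] for c in np.unique(labels)])
print("explained variance:", round(1 - np.var(observed - fitted) / np.var(observed), 3))
```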
Using heart rate to predict energy expenditure in large domestic dogs.
Gerth, N; Ruoß, C; Dobenecker, B; Reese, S; Starck, J M
2016-06-01
The aim of this study was to establish heart rate as a measure of energy expenditure in large active kennel dogs (28 ± 3 kg bw). To this end, the heart rate (HR)-oxygen consumption (V̇O2) relationship was analysed in Foxhound-Boxer-Ingelheim-Labrador cross-breds (FBI dogs) at rest and at graded levels of exercise on a treadmill up to 60-65% of maximal aerobic capacity. To test for effects of training, HR and V̇O2 were measured in female dogs before and after a training period, and after an adjacent training pause to test for reversibility of potential effects. Least squares regression was applied to describe the relationship between HR and V̇O2. The applied training had no statistically significant effect on the HR-V̇O2 regression. A general regression line from all data collected was prepared to establish a general predictive equation for energy expenditure from HR in FBI dogs. The regression equation established in this study enables fast estimation of the energy requirement for running activity. The equation is valid for large dogs weighing around 30 kg that run at ground level up to 15 km/h with a heart rate maximum of 190 bpm, irrespective of the training level. Journal of Animal Physiology and Animal Nutrition © 2015 Blackwell Verlag GmbH.
NASA Astrophysics Data System (ADS)
Haddad, Khaled; Rahman, Ataur; A Zaman, Mohammad; Shrestha, Surendra
2013-03-01
In regional hydrologic regression analysis, model selection and validation are regarded as important steps. Here, model selection is usually based on some measure of goodness-of-fit between the model prediction and the observed data. In Regional Flood Frequency Analysis (RFFA), leave-one-out (LOO) validation or a fixed-percentage leave-out validation (e.g., 10%) is commonly adopted to assess the predictive ability of regression-based prediction equations. This paper develops a Monte Carlo Cross Validation (MCCV) technique (which has been widely adopted in chemometrics and econometrics) in RFFA using Generalised Least Squares Regression (GLSR) and compares it with the most commonly adopted LOO validation approach. The study uses simulated and regional flood data from the state of New South Wales in Australia. It is found that when developing hydrologic regression models, application of the MCCV is likely to result in a more parsimonious model than the LOO. It has also been found that the MCCV can provide a more realistic estimate of a model's predictive ability than the LOO.
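The two validation schemes being compared can be illustrated on a toy regression problem; the split fraction and number of repeats below are arbitrary choices, not those of the study.

```python
# Monte Carlo cross-validation (repeated random splits) versus leave-one-out
# validation of a simple linear regression on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, LeaveOneOut, cross_val_score

rng = np.random.default_rng(8)
X = rng.normal(size=(80, 3))                  # e.g. log-transformed catchment characteristics
y = 2.0 + X @ np.array([0.8, -0.3, 0.1]) + rng.normal(scale=0.4, size=80)

model = LinearRegression()
mccv = ShuffleSplit(n_splits=200, test_size=0.2, random_state=0)   # Monte Carlo CV
loo = LeaveOneOut()

mccv_mse = -cross_val_score(model, X, y, cv=mccv, scoring="neg_mean_squared_error").mean()
loo_mse = -cross_val_score(model, X, y, cv=loo, scoring="neg_mean_squared_error").mean()
print("MCCV MSE:", round(mccv_mse, 3), " LOO MSE:", round(loo_mse, 3))
```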
Least squares regression methods for clustered ROC data with discrete covariates.
Tang, Liansheng Larry; Zhang, Wei; Li, Qizhai; Ye, Xuan; Chan, Leighton
2016-07-01
The receiver operating characteristic (ROC) curve is a popular tool to evaluate and compare the accuracy of diagnostic tests in distinguishing the diseased group from the nondiseased group when test results are continuous or ordinal. A complicated data setting occurs when multiple tests are measured on abnormal and normal locations from the same subject and the measurements are clustered within the subject. Although least squares regression methods can be used to estimate the ROC curve from correlated data, how to extend these methods to estimate the ROC curve from clustered data has not been studied. Also, the statistical properties of the least squares methods under the clustering setting are unknown. In this article, we develop least squares ROC methods that allow the baseline and link functions to differ and, more importantly, accommodate clustered data with discrete covariates. The methods can generate smooth ROC curves that satisfy the inherent continuity of the true underlying curve. The least squares methods are shown to be more efficient than existing nonparametric ROC methods under appropriate model assumptions in simulation studies. We apply the methods to a real example in the detection of glaucomatous deterioration. We also derive the asymptotic properties of the proposed methods. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Assessment of parametric uncertainty for groundwater reactive transport modeling
Shi, Xiaoqing; Ye, Ming; Curtis, Gary P.; Miller, Geoffery L.; Meyer, Philip D.; Kohler, Matthias; Yabusaki, Steve; Wu, Jichun
2014-01-01
The validity of using Gaussian assumptions for model residuals in uncertainty quantification of a groundwater reactive transport model was evaluated in this study. Least squares regression methods explicitly assume Gaussian residuals, and the assumption leads to Gaussian likelihood functions, model parameters, and model predictions. While the Bayesian methods do not explicitly require the Gaussian assumption, Gaussian residuals are widely used. This paper shows that the residuals of the reactive transport model are non-Gaussian, heteroscedastic, and correlated in time; characterizing them requires using a generalized likelihood function such as the formal generalized likelihood function developed by Schoups and Vrugt (2010). For the surface complexation model considered in this study for simulating uranium reactive transport in groundwater, parametric uncertainty is quantified using the least squares regression methods and Bayesian methods with both Gaussian and formal generalized likelihood functions. While the least squares methods and Bayesian methods with Gaussian likelihood function produce similar Gaussian parameter distributions, the parameter distributions of Bayesian uncertainty quantification using the formal generalized likelihood function are non-Gaussian. In addition, predictive performance of formal generalized likelihood function is superior to that of least squares regression and Bayesian methods with Gaussian likelihood function. The Bayesian uncertainty quantification is conducted using the differential evolution adaptive metropolis (DREAM(zs)) algorithm; as a Markov chain Monte Carlo (MCMC) method, it is a robust tool for quantifying uncertainty in groundwater reactive transport models. For the surface complexation model, the regression-based local sensitivity analysis and Morris- and DREAM(ZS)-based global sensitivity analysis yield almost identical ranking of parameter importance. The uncertainty analysis may help select appropriate likelihood functions, improve model calibration, and reduce predictive uncertainty in other groundwater reactive transport and environmental modeling.
L.R. Grosenbaugh
1967-01-01
Describes an expansible computerized system that provides data needed in regression or covariance analysis of as many as 50 variables, 8 of which may be dependent. Alternatively, it can screen variously generated combinations of independent variables to find the regression with the smallest mean-squared-residual, which will be fitted if desired. The user can easily...
Smeers, Inge; Decorte, Ronny; Van de Voorde, Wim; Bekaert, Bram
2018-05-01
DNA methylation is a promising biomarker for forensic age prediction. A challenge that has emerged in recent studies is the fact that prediction errors become larger with increasing age due to interindividual differences in epigenetic ageing rates. This phenomenon of non-constant variance or heteroscedasticity violates an assumption of the often used method of ordinary least squares (OLS) regression. The aim of this study was to evaluate alternative statistical methods that do take heteroscedasticity into account in order to provide more accurate, age-dependent prediction intervals. A weighted least squares (WLS) regression is proposed as well as a quantile regression model. Their performances were compared against an OLS regression model based on the same dataset. Both models provided age-dependent prediction intervals which account for the increasing variance with age, but WLS regression performed better in terms of success rate in the current dataset. However, quantile regression might be a preferred method when dealing with a variance that is not only non-constant, but also not normally distributed. Ultimately the choice of which model to use should depend on the observed characteristics of the data. Copyright © 2018 Elsevier B.V. All rights reserved.
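A hedged sketch contrasting OLS, WLS and quantile regression on synthetic age-versus-marker data whose error variance grows with age; the weighting scheme (inverse of a fitted variance trend) and the 5th/95th quantiles are illustrative choices, not the study's exact procedure.

```python
# OLS, weighted least squares and quantile regression on heteroscedastic
# synthetic data, to give age-dependent prediction bounds.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
age = rng.uniform(18, 80, 300)
marker = 0.02 * age + rng.normal(scale=0.002 * age, size=300)   # noise grows with age
X = sm.add_constant(marker)

ols = sm.OLS(age, X).fit()
abs_resid = np.abs(ols.resid)
w = 1.0 / sm.OLS(abs_resid, X).fit().fittedvalues ** 2          # weights ~ 1 / estimated variance
wls = sm.WLS(age, X, weights=w).fit()

q05 = sm.QuantReg(age, X).fit(q=0.05)                           # lower prediction bound
q95 = sm.QuantReg(age, X).fit(q=0.95)                           # upper prediction bound
print("OLS slope:", round(ols.params[1], 2), " WLS slope:", round(wls.params[1], 2))
print("5th/95th percentile slopes:", round(q05.params[1], 2), round(q95.params[1], 2))
```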
Production of deerbrush and mountain whitethorn related to shrub volume and overstory crown closure
John G. Kie
1985-01-01
Annual production by deerbrush (Ceanothus integerrimus) and mountain whitethorn (C. cordulatus) shrubs in the south-central Sierra Nevada of California was related to shrub volume, volume squared, and overstory crown closure by regression models. Production increased as shrub volume and volume squared increased, and decreased as...
Post-processing through linear regression
NASA Astrophysics Data System (ADS)
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast, and multicollinearity. The regression schemes under consideration include the ordinary least-squares (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-squares method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the Lorenz 63 system, whose model version is affected by both initial-condition and model errors. For short forecast lead times, the number and choice of predictors play an important role. Contrary to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best-member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR), which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
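Of the schemes listed, the textbook total least-squares solution is compact enough to sketch: with noise in both the forecast and the observation, the OLS slope is attenuated while the TLS slope, obtained from the SVD of the centred data, is not. This is the generic TLS estimator, not the paper's EVMOS or TDTR formulations.

```python
# OLS versus total least squares (TLS) on simulated forecast/observation pairs,
# both measured with error. The TLS slope comes from the right singular vector
# of the smallest singular value of the centred data matrix.
import numpy as np

rng = np.random.default_rng(10)
truth = rng.normal(size=500)
forecast = truth + rng.normal(scale=0.3, size=500)     # predictor measured with error
obs = 0.9 * truth + rng.normal(scale=0.3, size=500)    # observation with error

fc_c = forecast - forecast.mean()
ob_c = obs - obs.mean()

slope_ols = np.sum(fc_c * ob_c) / np.sum(fc_c ** 2)    # attenuated by predictor error

Z = np.column_stack([fc_c, ob_c])
_, _, Vt = np.linalg.svd(Z)
v = Vt[-1]                                             # direction of smallest singular value
slope_tls = -v[0] / v[1]

print("OLS slope:", round(slope_ols, 3), " TLS slope:", round(slope_tls, 3))
```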
NASA Technical Reports Server (NTRS)
Wilson, Edward (Inventor)
2006-01-01
The present invention is a method for identifying unknown parameters in a system having a set of governing equations describing its behavior that cannot be put into regression form with the unknown parameters linearly represented. In this method, the vector of unknown parameters is segmented into a plurality of groups where each individual group of unknown parameters may be isolated linearly by manipulation of said equations. Multiple concurrent and independent recursive least squares identifications, one for each said group, are run, treating the other unknown parameters appearing in their regression equations as if they were known perfectly, with those values provided by recursive least squares estimation from the other groups, thereby enabling the use of fast, compact, efficient linear algorithms to solve problems that would otherwise require nonlinear solution approaches. This invention is presented with application to the identification of mass and thruster properties for a thruster-controlled spacecraft.
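A generic recursive least squares update loop, shown only to make the "fast, compact linear algorithm" point concrete; the grouped, iterative structure of the patented method is not reproduced.

```python
# Minimal recursive least squares (RLS) on a synthetic linear regression.
import numpy as np

rng = np.random.default_rng(11)
true_theta = np.array([2.0, -1.0, 0.5])
n, p = 400, 3

theta = np.zeros(p)                    # running parameter estimate
P = np.eye(p) * 1e3                    # inverse information (large = uninformative prior)
lam = 1.0                              # forgetting factor (1.0 = ordinary RLS)

for _ in range(n):
    x = rng.normal(size=p)
    y = x @ true_theta + rng.normal(scale=0.1)
    k = P @ x / (lam + x @ P @ x)      # gain vector
    theta = theta + k * (y - x @ theta)
    P = (P - np.outer(k, x @ P)) / lam

print("recovered parameters:", theta.round(3))
```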
Regression Analysis: Instructional Resource for Cost/Managerial Accounting
ERIC Educational Resources Information Center
Stout, David E.
2015-01-01
This paper describes a classroom-tested instructional resource, grounded in principles of active learning and constructivism, that embraces two primary objectives: "demystify" for accounting students technical material from statistics regarding ordinary least-squares (OLS) regression analysis--material that students may find obscure or…
Demidenko, Eugene
2017-09-01
The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike other approximations previously suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.
Vindimian, Éric; Garric, Jeanne; Flammarion, Patrick; Thybaud, Éric; Babut, Marc
1999-10-01
The evaluation of the ecotoxicity of effluents requires a battery of biological tests on several species. In order to derive a summary parameter from such a battery, a single endpoint was calculated for all the tests: the EC10, obtained by nonlinear regression, with bootstrap evaluation of the confidence intervals. Principal component analysis was used to characterize and visualize the correlation between the tests. The table of the toxicity of the effluents was then submitted to a panel of experts, who classified the effluents according to the test results. Partial least squares (PLS) regression was used to fit the average value of the experts' judgements to the toxicity data, using a simple equation. Furthermore, PLS regression on partial data sets and other considerations resulted in an optimum battery, with two chronic tests and one acute test. The index is intended to be used for the classification of effluents based on their toxicity to aquatic species. Copyright © 1999 SETAC.
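A hedged sketch of estimating an EC10 by nonlinear regression with bootstrap confidence intervals, using a three-parameter log-logistic curve on simulated dose-response data; the model form and resampling scheme are illustrative, not those used to build the index.

```python
# EC10 by nonlinear least squares with a nonparametric bootstrap CI, on
# simulated dose-response data (illustrative model and settings).
import numpy as np
from scipy.optimize import curve_fit

def loglogistic(conc, top, ec50, slope):
    return top / (1.0 + (conc / ec50) ** slope)

def ec_x(ec50, slope, x=10):
    # concentration giving an x% reduction from the top of the curve
    return ec50 * (x / (100.0 - x)) ** (1.0 / slope)

rng = np.random.default_rng(12)
conc = np.repeat([0.5, 1, 2, 4, 8, 16, 32], 4).astype(float)
resp = loglogistic(conc, 100, 6.0, 1.5) + rng.normal(scale=4, size=conc.size)

popt, _ = curve_fit(loglogistic, conc, resp, p0=[100, 5, 1], bounds=(1e-6, np.inf))
ec10_hat = ec_x(popt[1], popt[2])

boot = []
for _ in range(500):                               # bootstrap over observations
    idx = rng.integers(0, conc.size, conc.size)
    try:
        p, _ = curve_fit(loglogistic, conc[idx], resp[idx], p0=popt, bounds=(1e-6, np.inf))
        boot.append(ec_x(p[1], p[2]))
    except RuntimeError:
        continue                                   # skip non-converged resamples
print("EC10:", round(ec10_hat, 2), " 95% CI:", np.percentile(boot, [2.5, 97.5]).round(2))
```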
NASA Astrophysics Data System (ADS)
Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.
2009-08-01
In this study, two techniques for modeling electricity consumption of the Jordanian industrial sector are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as a function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables with a significant effect on future electrical power demand. The results showed that the multivariate linear regression and neuro-fuzzy models are generally comparable and can both be used adequately to simulate industrial electricity consumption. However, a comparison based on the square root of the average squared error suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
Quantile regression applied to spectral distance decay
Rocchini, D.; Cade, B.S.
2008-01-01
Remotely sensed imagery has long been recognized as a powerful support for characterizing and estimating biodiversity. Spectral distance among sites has proven to be a powerful approach for detecting species composition variability. Regression analysis of species similarity versus spectral distance allows us to quantitatively estimate the amount of turnover in species composition with respect to spectral and ecological variability. In classical regression analysis, the residual sum of squares is minimized for the mean of the dependent variable distribution. However, many ecological data sets are characterized by a high number of zeroes that add noise to the regression model. Quantile regressions can be used to evaluate trend in the upper quantiles rather than a mean trend across the whole distribution of the dependent variable. In this letter, we used ordinary least squares (OLS) and quantile regressions to estimate the decay of species similarity versus spectral distance. The achieved decay rates were statistically nonzero (p < 0.01), considering both OLS and quantile regressions. Nonetheless, the OLS regression estimate of the mean decay rate was only half the decay rate indicated by the upper quantiles. Moreover, the intercept value, representing the similarity reached when the spectral distance approaches zero, was very low compared with the intercepts of the upper quantiles, which detected high species similarity when habitats are more similar. In this letter, we demonstrated the power of using quantile regressions applied to spectral distance decay to reveal species diversity patterns otherwise lost or underestimated by OLS regression. © 2008 IEEE.
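To make the OLS-versus-upper-quantile contrast concrete, here is a hedged statsmodels sketch on simulated similarity/distance data with many near-zero similarities; the data, model form and quantile level are invented for illustration and are not the authors' remote-sensing dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated "distance decay": similarity falls with spectral distance,
# with many near-zero similarities adding noise below the upper quantiles.
rng = np.random.default_rng(1)
n = 300
dist = rng.uniform(0, 1, n)
ceiling = 0.9 - 0.6 * dist                      # upper envelope of similarity
sim = ceiling * rng.beta(0.7, 1.5, size=n)      # heavy mass near zero
df = pd.DataFrame({"similarity": sim, "distance": dist})

ols = smf.ols("similarity ~ distance", df).fit()
q90 = smf.quantreg("similarity ~ distance", df).fit(q=0.90)

print("OLS slope:      ", ols.params["distance"])
print("90th pct slope: ", q90.params["distance"])   # typically a steeper decay at the upper quantile
```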
imDEV: a graphical user interface to R multivariate analysis tools in Microsoft Excel
Grapov, Dmitry; Newman, John W.
2012-01-01
Summary: Interactive modules for Data Exploration and Visualization (imDEV) is a Microsoft Excel spreadsheet embedded application providing an integrated environment for the analysis of omics data through a user-friendly interface. Individual modules enable interactive and dynamic analyses of large datasets by interfacing R's multivariate statistics and highly customizable visualizations with the spreadsheet environment, aiding robust inferences and generating information-rich data visualizations. This tool provides access to multiple comparisons with false discovery correction, hierarchical clustering, principal and independent component analyses, partial least squares regression and discriminant analysis, through an intuitive interface for creating high-quality two- and three-dimensional visualizations including scatter plot matrices, distribution plots, dendrograms, heat maps, biplots, trellis biplots and correlation networks. Availability and implementation: Freely available for download at http://sourceforge.net/projects/imdev/. Implemented in R and VBA and supported by Microsoft Excel (2003, 2007 and 2010). Contact: John.Newman@ars.usda.gov Supplementary Information: Installation instructions, tutorials and users manual are available at http://sourceforge.net/projects/imdev/. PMID:22815358
Smith, Alan D
2008-01-01
The marketability and viability of biometric technologies by companies marketing their own versions of pre-approved registered travel programmes have generated a number of controversies. Data were collected and analysed to formulate graphs, run regression and correlation analyses, and use Chi-square to formally test basic research propositions on a sample of 241 professionals in the Pittsburgh area. It was found that there was a significant relationship between the respondents' familiarity with new technology (namely web-enabled and internet sophistication) and knowledge of biometrics, in particular iris scans. Participants who frequently use the internet are more comfortable with innovative technology; although individuals with higher income levels have less trust in the government, it appeared that virtually everyone is concerned about trusting the government with their personal information. Healthcare professionals need to document the safety, CRM-related factors, and provide leadership in the international collaboration of biometric-related personal identification technologies, since they will be one of the main beneficiaries of the implementation of such technologies.
Correlation of track irregularities and vehicle responses based on measured data
NASA Astrophysics Data System (ADS)
Karis, Tomas; Berg, Mats; Stichel, Sebastian; Li, Martin; Thomas, Dirk; Dirks, Babette
2018-06-01
Track geometry quality and dynamic vehicle response are closely related, but do not always correspond with each other in terms of maximum values and standard deviations. This can often be seen to give poor results in analyses with correlation coefficients or regression analysis. Measured data from both the EU project DynoTRAIN and the Swedish Green Train (Gröna Tåget) research programme is used in this paper to evaluate track-vehicle response for three vehicles. A single degree of freedom model is used as an inspiration to divide track-vehicle interaction into three parts, which are analysed in terms of correlation. One part, the vertical axle box acceleration divided by vehicle speed squared and the second spatial derivative of the vertical track irregularities, is shown to be the weak link with lower correlation coefficients than the other parts. Future efforts should therefore be directed towards investigating the relation between axle box accelerations and track irregularity second derivatives.
Physical activity and depressive symptoms in four ethnic groups of midlife women.
Im, Eun-Ok; Ham, Ok Kyung; Chee, Eunice; Chee, Wonshik
2015-06-01
The purpose of this study was to determine the associations between physical activity and depression and the multiple contextual factors influencing these associations in four major ethnic groups of midlife women in the United States. This was a secondary analysis of the data from 542 midlife women. The instruments included questions on background characteristics and health and menopausal status; the Depression Index for Midlife Women (DIMW); and the Kaiser Physical Activity Survey (KPAS). The data were analyzed using chi-square tests, the ANOVA, two-way ANOVA, correlation analyses, and hierarchical multiple regression analyses. The women's depressive symptoms were negatively correlated with active living and sports/exercise physical activities whereas they were positively correlated with occupational physical activities (p < .01). Family income was the strongest predictor of their depressive symptoms. Increasing physical activity may improve midlife women's depressive symptoms, but the types of physical activity and multiple contextual factors need to be considered in intervention development. © The Author(s) 2014.
Allegrini, Franco; Braga, Jez W B; Moreira, Alessandro C O; Olivieri, Alejandro C
2018-06-29
A new multivariate regression model, named Error Covariance Penalized Regression (ECPR) is presented. Following a penalized regression strategy, the proposed model incorporates information about the measurement error structure of the system, using the error covariance matrix (ECM) as a penalization term. Results are reported from both simulations and experimental data based on replicate mid and near infrared (MIR and NIR) spectral measurements. The results for ECPR are better under non-iid conditions when compared with traditional first-order multivariate methods such as ridge regression (RR), principal component regression (PCR) and partial least-squares regression (PLS). Copyright © 2018 Elsevier B.V. All rights reserved.
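The exact ECPR formulation is given in the paper; purely as a sketch of the general idea of penalizing a regression fit with an error covariance matrix, the snippet below uses a generalized ridge (Tikhonov) estimator whose penalty matrix is an assumed error covariance. This is not the published ECPR algorithm, only an illustration of a covariance-shaped penalty.

```python
import numpy as np

def covariance_penalized_regression(X, y, Sigma, lam):
    """Generalized ridge: beta = argmin ||y - X b||^2 + lam * b' Sigma b.
    Treating Sigma as a measurement-error covariance is an illustrative
    assumption, not the published ECPR method."""
    return np.linalg.solve(X.T @ X + lam * Sigma, X.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
beta_true = np.array([1.0, 0.0, -2.0, 0.5, 0.0])
y = X @ beta_true + rng.normal(scale=0.3, size=100)
Sigma = 0.5 * np.eye(5) + 0.5 * np.ones((5, 5))   # assumed (non-iid) error covariance
print(covariance_penalized_regression(X, y, Sigma, lam=1.0))
```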
NASA Astrophysics Data System (ADS)
Nordemann, D. J. R.; Rigozo, N. R.; de Souza Echer, M. P.; Echer, E.
2008-11-01
We present here an implementation of a least squares iterative regression method applied to the sine functions embedded in the principal components extracted from geophysical time series. This method appears to be a useful improvement for the quantitative analysis of periodicities in non-stationary time series. The principal components determination followed by the least squares iterative regression method was implemented in an algorithm written in the Scilab (2006) language. The main result of the method is to obtain the set of sine functions embedded in the series analyzed in decreasing order of significance, from the most important ones, likely to represent the physical processes involved in the generation of the series, to the less important ones that represent noise components. Taking into account the need for deeper knowledge of the Sun's past history and its implications for global climate change, the method was applied to the Sunspot Number series (1750-2004). With the threshold and parameter values used here, the application of the method leads to a total of 441 explicit sine functions, among which 65 were considered as being significant and were used for a reconstruction that gave a normalized mean squared error of 0.146.
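The Scilab implementation is not reproduced here; the following Python sketch shows the generic idea of extracting embedded sine functions one at a time by iterative nonlinear least squares on the residual series. The toy series and initial frequencies are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import curve_fit

def sine(t, amp, freq, phase):
    return amp * np.sin(2 * np.pi * freq * t + phase)

def iterative_sine_fit(t, y, freqs_init):
    """Fit sine functions one at a time by nonlinear least squares,
    subtracting each fitted component from the running residual."""
    residual = y - y.mean()
    components = []
    for f0 in freqs_init:
        params, _ = curve_fit(sine, t, residual, p0=[residual.std(), f0, 0.0])
        components.append(params)
        residual = residual - sine(t, *params)
    return components, residual

t = np.arange(0.0, 200.0)
y = 3.0 * np.sin(2 * np.pi * t / 11) + 1.5 * np.sin(2 * np.pi * t / 80) + \
    0.5 * np.random.default_rng(3).normal(size=t.size)

for amp, freq, phase in iterative_sine_fit(t, y, freqs_init=[1 / 11, 1 / 80])[0]:
    print(f"period ~ {1 / freq:.1f}, amplitude ~ {abs(amp):.2f}")
```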
On Quantile Regression in Reproducing Kernel Hilbert Spaces with Data Sparsity Constraint
Zhang, Chong; Liu, Yufeng; Wu, Yichao
2015-01-01
For spline regressions, it is well known that the choice of knots is crucial for the performance of the estimator. As a general learning framework covering the smoothing splines, learning in a Reproducing Kernel Hilbert Space (RKHS) has a similar issue. However, the selection of training data points for kernel functions in the RKHS representation has not been carefully studied in the literature. In this paper we study quantile regression as an example of learning in a RKHS. In this case, the regular squared norm penalty does not perform training data selection. We propose a data sparsity constraint that imposes thresholding on the kernel function coefficients to achieve a sparse kernel function representation. We demonstrate that the proposed data sparsity method can have competitive prediction performance for certain situations, and have comparable performance in other cases compared to that of the traditional squared norm penalty. Therefore, the data sparsity method can serve as a competitive alternative to the squared norm penalty method. Some theoretical properties of our proposed method using the data sparsity constraint are obtained. Both simulated and real data sets are used to demonstrate the usefulness of our data sparsity constraint. PMID:27134575
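Leaving the RKHS machinery aside, the asymmetric "check" (pinball) loss that any quantile regression minimizes is easy to state; a minimal numpy version follows, with the quantile level tau chosen arbitrarily for illustration.

```python
import numpy as np

def pinball_loss(residual, tau):
    """Check (pinball) loss minimized in quantile regression:
    tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return np.where(residual >= 0, tau * residual, (tau - 1) * residual)

r = np.linspace(-2, 2, 5)
print(pinball_loss(r, tau=0.9))   # under-predictions are penalized 9x more than over-predictions
```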
NASA Astrophysics Data System (ADS)
Astuti, H. N.; Saputro, D. R. S.; Susanti, Y.
2017-06-01
The MGWR model is a combination of a linear regression model and a geographically weighted regression (GWR) model; it can therefore produce both global parameter estimates and local parameter estimates that vary with the observation location. The linkage between observation locations is expressed through a specific weighting, here the adaptive bi-square kernel. In this research, we applied the MGWR model with adaptive bi-square weighting to the case of DHF in Surakarta, based on 10 factors (variables) that are supposed to influence the number of people with DHF. The observation units are 51 urban villages, and the variables are number of inhabitants, number of houses, house index, number of public places, number of healthy homes, number of Posyandu, area width, population density level, family welfare, and high-region. Based on this research, we obtained 51 MGWR models. The models fall into 4 groups, with house index significant as a global variable, area width significant as a local variable, and the remaining significant variables varying from location to location. Global variables are variables that significantly affect all locations, while local variables are variables that significantly affect only specific locations.
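For readers unfamiliar with the weighting, a small numpy sketch of the adaptive bi-square kernel follows; the distances, the choice of the 20th nearest neighbour as the adaptive bandwidth, and the 51 locations are illustrative assumptions, not the Surakarta data.

```python
import numpy as np

def adaptive_bisquare_weights(d, bandwidth):
    """Bi-square kernel: w = (1 - (d/b)^2)^2 for d < b, else 0.
    With an adaptive bandwidth, b is set per regression location,
    e.g. as the distance to the k-th nearest neighbouring observation."""
    u = d / bandwidth
    w = (1 - u ** 2) ** 2
    w[d >= bandwidth] = 0.0
    return w

# distances from one regression location to all 51 observation units (toy values)
d = np.abs(np.random.default_rng(4).normal(scale=3.0, size=51))
b = np.sort(d)[19]                  # adaptive bandwidth = distance to the 20th nearest neighbour
print(adaptive_bisquare_weights(d, b))
```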
Ordinary least squares regression is indicated for studies of allometry.
Kilmer, J T; Rodríguez, R L
2017-01-01
When it comes to fitting simple allometric slopes through measurement data, evolutionary biologists have been torn between regression methods. On the one hand, there is the ordinary least squares (OLS) regression, which is commonly used across many disciplines of biology to fit lines through data, but which has a reputation for underestimating slopes when measurement error is present. On the other hand, there is the reduced major axis (RMA) regression, which is often recommended as a substitute for OLS regression in studies of allometry, but which has several weaknesses of its own. Here, we review statistical theory as it applies to evolutionary biology and studies of allometry. We point out that the concerns that arise from measurement error for OLS regression are small and straightforward to deal with, whereas RMA has several key properties that make it unfit for use in the field of allometry. The recommended approach for researchers interested in allometry is to use OLS regression on measurements taken with low (but realistically achievable) measurement error. If measurement error is unavoidable and relatively large, it is preferable to correct for slope attenuation rather than to turn to RMA regression, or to take the expected amount of attenuation into account when interpreting the data. © 2016 European Society For Evolutionary Biology. Journal of Evolutionary Biology © 2016 European Society For Evolutionary Biology.
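A minimal simulation of the attenuation argument, assuming classical measurement error in the predictor: the OLS slope shrinks by the reliability ratio, and dividing by that ratio recovers the underlying slope. In practice the reliability is not known by construction and has to be estimated from repeated measurements; the numbers below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
x_true = rng.normal(scale=1.0, size=n)
y = 0.75 * x_true + rng.normal(scale=0.3, size=n)      # true slope 0.75 (e.g. on a log scale)
x_obs = x_true + rng.normal(scale=0.4, size=n)         # measurement error in the predictor

slope_ols = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
reliability = 1.0 / (1.0 + 0.4 ** 2)                   # var(x_true) / (var(x_true) + var(error))
print(slope_ols)                   # attenuated, roughly 0.75 * reliability
print(slope_ols / reliability)     # attenuation-corrected slope, close to 0.75
```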
Naguib, Ibrahim A; Abdelrahman, Maha M; El Ghobashy, Mohamed R; Ali, Nesma A
2016-01-01
Two accurate, sensitive, and selective stability-indicating methods are developed and validated for the simultaneous quantitative determination of agomelatine (AGM) and its forced degradation products (Deg I and Deg II), whether in pure form or in pharmaceutical formulations. Partial least-squares regression (PLSR) and spectral residual augmented classical least-squares (SRACLS) are the two chemometric models compared, both handling UV spectral data in the range 215-350 nm. For proper analysis, a three-factor, four-level experimental design was established, resulting in a training set of 16 mixtures containing different ratios of the interfering species. An independent test set of eight mixtures was used to validate the prediction ability of the suggested models. The results indicate the ability of these multivariate calibration models to analyze AGM, Deg I, and Deg II with high selectivity and accuracy. The analysis results for the pharmaceutical formulations were statistically compared to the reference HPLC method, with no significant differences observed regarding accuracy and precision. The SRACLS model gives results comparable to the PLSR model; however, it retains the qualitative spectral information of the classical least-squares algorithm for the analyzed components.
Quality of semen: a 6-year single experience study on 5680 patients.
Cozzolino, Mauro; Coccia, Maria E; Picone, Rita
2018-02-08
The aim of our study was to evaluate the quality of semen in a large sample of the general healthy population living in Italy, in order to identify possible variables that could influence several parameters of the spermiogram. We conducted a cross-sectional study from February 2010 to March 2015, collecting semen samples from the general population. Semen analysis was performed according to the WHO guidelines. The collected data were inserted in a database and processed using the software Stata 12. The Mann-Whitney test was used to assess the relationship of dichotomous variables with the parameters of the spermiogram, and the Kruskal-Wallis test for variables with more than two categories. We also used robust regression and Spearman correlation to analyze the relationship between age and the parameters. We collected 5680 samples of semen. The mean age of our patients was 41.4 years. The Mann-Whitney test showed that citizenship (codified as "Italian/Foreign") influences some parameters: pH, vitality, number of spermatozoa, and sperm concentration, with worse results for the Italian group. The Kruskal-Wallis test showed that the single nationality influences pH, volume, sperm motility A-B-C-D, vitality, morphology, number of spermatozoa, and sperm concentration. Robust regression showed a relationship between age and several parameters: volume (p = 0.04, R squared = 0.0007, β = -0.06); sperm motility A (p < 0.01, R squared = 0.0051, β = 0.02); sperm motility B (p < 0.01, R squared = 0.02, β = -0.35); sperm motility C (p < 0.01, R squared = 0.01, β = 0.12); sperm motility D (p < 0.01, R squared = 0.006, β = 0.2); vitality (p < 0.01, R squared = 0.01, β = -0.32); sperm concentration (p = 0.01, R squared = 0.001, β = 0.19). Our patients' spermiogram results were somewhat better than the standard guideline values. Our study showed that the country of origin could be a factor influencing several parameters of the spermiogram in the healthy population, and robust regression confirmed a close correlation between age and these parameters.
Use of Empirical Estimates of Shrinkage in Multiple Regression: A Caution.
ERIC Educational Resources Information Center
Kromrey, Jeffrey D.; Hines, Constance V.
1995-01-01
The accuracy of four empirical techniques to estimate shrinkage in multiple regression was studied through Monte Carlo simulation. None of the techniques provided unbiased estimates of the population squared multiple correlation coefficient, but the normalized jackknife and bootstrap techniques demonstrated marginally acceptable performance with…
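As a hedged illustration of what "shrinkage" estimation looks like in practice, the sketch below computes the apparent R-squared of an over-fitted linear model, the formula-based adjusted R-squared, and a simple bootstrap optimism correction. These are generic estimators on invented data, not the four specific techniques evaluated in the Monte Carlo study.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(6)
n, p = 60, 8
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))                      # apparent (optimistic) R^2
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)           # formula-based shrinkage estimate

# simple bootstrap estimate of optimism (one of several published variants)
optimism = []
for _ in range(500):
    idx = rng.integers(0, n, n)
    m = LinearRegression().fit(X[idx], y[idx])
    optimism.append(r2_score(y[idx], m.predict(X[idx])) - r2_score(y, m.predict(X)))

print(r2, r2_adj, r2 - np.mean(optimism))
```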
Enhance-Synergism and Suppression Effects in Multiple Regression
ERIC Educational Resources Information Center
Lipovetsky, Stan; Conklin, W. Michael
2004-01-01
Relations between pairwise correlations and the coefficient of multiple determination in regression analysis are considered. The conditions for the occurrence of enhance-synergism and suppression effects when multiple determination becomes bigger than the total of squared correlations of the dependent variable with the regressors are discussed. It…
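A small numpy construction of classical suppression, in which the multiple coefficient of determination exceeds the sum of the squared bivariate correlations; the variables are simulated for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
signal = rng.normal(size=n)
noise = rng.normal(size=n)
x1 = signal + noise        # contaminated predictor
x2 = noise                 # classical suppressor: unrelated to y, correlated with x1
y = signal

r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
R2 = 1 - (y - X @ beta).var() / y.var()
print(r1 ** 2 + r2 ** 2, R2)   # multiple R^2 exceeds the sum of squared bivariate correlations
```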
Method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1972-01-01
Two computer programs, developed according to two general types of exponential models for conducting nonlinear exponential regression analysis, are described. A least squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. The program is written in FORTRAN 5 for the Univac 1108 computer.
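The FORTRAN programs themselves are not available here; the following Python sketch shows the same idea, linearizing an exponential model with a first-order Taylor expansion and iterating least squares (Gauss-Newton), on invented data.

```python
import numpy as np

def gauss_newton_exponential(t, y, a0, b0, n_iter=20):
    """Fit y = a * exp(b * t) by linearizing with a first-order Taylor
    expansion and iterating linear least squares (Gauss-Newton)."""
    a, b = a0, b0
    for _ in range(n_iter):
        f = a * np.exp(b * t)
        J = np.column_stack([np.exp(b * t), a * t * np.exp(b * t)])  # df/da, df/db
        delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)
        a, b = a + delta[0], b + delta[1]
    return a, b

t = np.linspace(0, 4, 50)
y = 2.0 * np.exp(-0.7 * t) + 0.02 * np.random.default_rng(8).normal(size=t.size)
print(gauss_newton_exponential(t, y, a0=1.0, b0=-0.1))   # approaches (2.0, -0.7) for this well-behaved example
```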
Gearing, Robin E; Schwalbe, Craig S J; Lee, RaeHyuck; Hoagwood, Kimberly E
2013-09-01
To investigate the effects of booster sessions in cognitive behavioral therapy (CBT) for children and adolescents with mood or anxiety disorders, while controlling for youth demographics (e.g., gender, age), primary diagnosis, and intervention characteristics (e.g., treatment modality, number of sessions). Electronic databases were searched for CBT interventions for youth with mood and anxiety disorders. Fifty-three (k = 53) studies investigating 1,937 youth met criteria for inclusion. Booster sessions were examined using two case-controlled effect sizes, pre-post and pre-follow-up (6 months), and employing weighted least squares (WLS) regressions. Meta-analyses found pre-post studies with booster sessions had a larger effect size r = .58 (k = 15; 95% CI = 0.52-0.65; P < .01) than those without booster sessions r = .45 (k = 38; 95% CI = 0.41-0.49; P < .001). In the WLS regression analyses, controlling for demographic factors, primary diagnosis, and intervention characteristics, studies with booster sessions showed larger pre-post effect sizes than those without booster sessions (B = 0.13, P < .10). Similarly, pre-follow-up studies with booster sessions showed a larger effect size r = .64 (k = 10; 95% CI = 0.57-0.70; P < .10) than those without booster sessions r = .48 (k = 20; 95% CI = 0.42-0.53; P < .01). Also, in the WLS regression analyses, pre-follow-up studies with booster sessions showed larger effect sizes than those without booster sessions (B = 0.08, P < .01) after accounting for all control variables. Results suggest that CBT interventions with booster sessions are more effective, and their effect more sustainable, for youth managing mood or anxiety disorders than CBT interventions without booster sessions. © 2013 Wiley Periodicals, Inc.
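A toy weighted least squares moderator analysis in the spirit of this meta-analysis is sketched below with statsmodels; the inverse-variance-style weights (n - 3), the simulated effect sizes, and the single moderator are assumptions for illustration, not the authors' data or model.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
k = 53                                        # number of studies
n_i = rng.integers(20, 120, size=k)           # per-study sample sizes
booster = rng.integers(0, 2, size=k)          # moderator: booster sessions (1) or not (0)
effect = 0.45 + 0.13 * booster + rng.normal(scale=1 / np.sqrt(n_i - 3))  # simulated effect sizes

X = sm.add_constant(booster)
wls = sm.WLS(effect, X, weights=n_i - 3).fit()   # weights roughly proportional to inverse variance
print(wls.params, wls.pvalues)
```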
NASA Astrophysics Data System (ADS)
Vance, Leisha Ann
The Campus Demotechnic Index (CDI) is a normalized metric developed to provide universities with a method for tracking progress toward or retreat from sustainability in their energy consumption. The CDI is modified after the Demotechnic Index of Mata et al. (1994). CDI values assess the total campus energy consumed against the total energy required to meet the campus population's basal metabolism. Like the D-Index, the CDI is thus a measure of the scalar quantity of energy consumed in excess of the quantity of energy required for simple survival on a per capita basis. For this research, data were collected from an on-line survey designed for U.S. colleges and universities, which requested information related to campus demographics and campus built and mobile environmental energy consumption. Data were requested for the years 2000 to 2005. Wilcoxon signed rank test analyses were conducted to determine if CDI values significantly increased over time. ANOVAs, GLMs, correlations and regressions were conducted to determine if climate and campus size significantly influenced CDI. ANOVAs, correlations and regressions were conducted to determine the effect of acreage on mobile fuel consumption and to ascertain whether differing proportions between the built and mobile environments significantly influenced CDI values. Correlations and regressions were carried out to determine which variables best predicted CDI, and cluster analyses were conducted to find out if any significant groups existed based on CDI values, fossil fuel consumption and population per square foot. The knowledge gained from the results of these analyses not only provides a depiction of campus energy consumption, but also puts campus energy consumption into context in that CDI scores allow peer institutional comparisons. Awareness of factors that contribute to campus energy use (and CDI ranks) could also facilitate prioritization of sustainability-related issues, as well as the design and establishment of sustainable management systems.
Chen, Yi; Pouillot, Régis; S Burall, Laurel; Strain, Errol A; Van Doren, Jane M; De Jesus, Antonio J; Laasri, Anna; Wang, Hua; Ali, Laila; Tatavarthy, Aparna; Zhang, Guodong; Hu, Lijun; Day, James; Sheth, Ishani; Kang, Jihun; Sahu, Surasri; Srinivasan, Devayani; Brown, Eric W; Parish, Mickey; Zink, Donald L; Datta, Atin R; Hammack, Thomas S; Macarisin, Dumitru
2017-01-16
A precise and accurate method for enumeration of low level of Listeria monocytogenes in foods is critical to a variety of studies. In this study, paired comparison of most probable number (MPN) and direct plating enumeration of L. monocytogenes was conducted on a total of 1730 outbreak-associated ice cream samples that were naturally contaminated with low level of L. monocytogenes. MPN was performed on all 1730 samples. Direct plating was performed on all samples using the RAPID'L.mono (RLM) agar (1600 samples) and agar Listeria Ottaviani and Agosti (ALOA; 130 samples). Probabilistic analysis with Bayesian inference model was used to compare paired direct plating and MPN estimates of L. monocytogenes in ice cream samples because assumptions implicit in ordinary least squares (OLS) linear regression analyses were not met for such a comparison. The probabilistic analysis revealed good agreement between the MPN and direct plating estimates, and this agreement showed that the MPN schemes and direct plating schemes using ALOA or RLM evaluated in the present study were suitable for enumerating low levels of L. monocytogenes in these ice cream samples. The statistical analysis further revealed that OLS linear regression analyses of direct plating and MPN data did introduce bias that incorrectly characterized systematic differences between estimates from the two methods. Published by Elsevier B.V.
Lago-Ballesteros, Joaquin; Lago-Peñas, Carlos; Rey, Ezequiel
2012-01-01
The aim of this study was to analyse the influence of playing tactics, opponent interaction and situational variables on achieving score-box possessions in professional soccer. The sample was constituted by 908 possessions obtained by a team from the Spanish soccer league in 12 matches played during the 2009-2010 season. Multidimensional qualitative data obtained from 12 ordered categorical variables were used. Sampled matches were registered by the AMISCO PRO system. Data were analysed using chi-square analysis and multiple logistic regression analysis. Of 908 possessions, 303 (33.4%) produced score-box possessions, 477 (52.5%) achieved progression and 128 (14.1%) failed to reach any sort of progression. Multiple logistic regression showed that, for the main variable "team possession type", direct attacks and counterattacks were three times more effective than elaborate attacks for producing a score-box possession (P < 0.05). Team possession originating from the middle zones and playing against less than six defending players (P < 0.001) registered a higher success than those started in the defensive zone with a balanced defence. When the team was drawing or winning, the probability of reaching the score-box decreased by 43 and 53 percent, respectively, compared with the losing situation (P < 0.05). Accounting for opponent interactions and situational variables is critical to evaluate the effectiveness of offensive playing tactics on producing score-box possessions.
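For readers who want to see the mechanics, here is a hedged statsmodels sketch of a multiple logistic regression with odds ratios obtained by exponentiating the coefficients; the simulated binary variables only stand in for the tactical and situational categories and are not the AMISCO data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
n = 908
df = pd.DataFrame({
    "counterattack": rng.integers(0, 2, n),    # 1 = counterattack / direct attack
    "middle_start": rng.integers(0, 2, n),     # possession started in a middle zone
    "winning": rng.integers(0, 2, n),          # match status when possession began
})
logit_p = -1.2 + 1.1 * df.counterattack + 0.6 * df.middle_start - 0.7 * df.winning
df["score_box"] = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

fit = smf.logit("score_box ~ counterattack + middle_start + winning", df).fit(disp=0)
print(np.exp(fit.params))       # odds ratios for each tactical / situational variable
```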
NASA Astrophysics Data System (ADS)
Deng, Chengbin; Wu, Changshan
2013-12-01
Urban impervious surface information is essential for urban and environmental applications at the regional/national scales. As a popular image processing technique, spectral mixture analysis (SMA) has rarely been applied to coarse-resolution imagery due to the difficulty of deriving endmember spectra using traditional endmember selection methods, particularly within heterogeneous urban environments. To address this problem, we derived endmember signatures through a least squares solution (LSS) technique with known abundances of sample pixels, and integrated these endmember signatures into SMA for mapping large-scale impervious surface fraction. In addition, with the same sample set, we carried out objective comparative analyses among SMA (i.e. fully constrained and unconstrained SMA) and machine learning (i.e. Cubist regression tree and Random Forests) techniques. Analysis of results suggests three major conclusions. First, with the extrapolated endmember spectra from stratified random training samples, the SMA approaches performed relatively well, as indicated by small MAE values. Second, Random Forests yields more reliable results than Cubist regression tree, and its accuracy is improved with increased sample sizes. Finally, comparative analyses suggest a tentative guide for selecting an optimal approach for large-scale fractional imperviousness estimation: unconstrained SMA might be a favorable option with a small number of samples, while Random Forests might be preferred if a large number of samples are available.
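A compact numpy sketch of the two least squares steps described above, on synthetic linear mixtures: endmember spectra are first solved from sample pixels with known abundances, then a new pixel is unmixed with the derived endmembers (unconstrained SMA). All dimensions and values are illustrative, not the paper's imagery.

```python
import numpy as np

rng = np.random.default_rng(11)
n_samples, n_end, n_bands = 200, 3, 6
A_train = rng.dirichlet(np.ones(n_end), size=n_samples)      # known abundances of sample pixels
E_true = rng.uniform(0.05, 0.6, size=(n_end, n_bands))        # unknown endmember spectra
X_train = A_train @ E_true + 0.005 * rng.normal(size=(n_samples, n_bands))

# Least squares solution for the endmember signatures given known abundances
E_hat, *_ = np.linalg.lstsq(A_train, X_train, rcond=None)

# Unconstrained spectral mixture analysis of a new pixel using the derived endmembers
x_new = np.array([0.5, 0.3, 0.2]) @ E_true
abund_hat, *_ = np.linalg.lstsq(E_hat.T, x_new, rcond=None)
print(abund_hat)                                              # close to [0.5, 0.3, 0.2]
```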
Gonçalves, Iara; Linhares, Marcelo; Bordin, Jose; Matos, Delcio
2009-01-01
Identification of risk factors for requiring transfusions during surgery for colorectal cancer may lead to preventive actions or alternative measures, towards decreasing the use of blood components in these procedures, as well as rationalization of resource use in hemotherapy services. This was a retrospective case-control study using data from 383 patients who were treated surgically for colorectal adenocarcinoma at 'Fundação Pio XII', in Barretos-SP, Brazil, between 1999 and 2003. The aim was to recognize significant risk factors for requiring intraoperative blood transfusion in colorectal cancer surgical procedures. Univariate analyses were performed using Fisher's exact test or the chi-squared test for dichotomous variables and Student's t test for continuous variables, followed by multivariate analysis using multiple logistic regression. In the univariate analyses, height (P = 0.06), glycemia (P = 0.05), previous abdominal or pelvic surgery (P = 0.031), abdominoperineal surgery (P<0.001), extended surgery (P<0.001) and intervention with radical intent (P<0.001) were considered significant. In the multivariate analysis using logistic regression, intervention with radical intent (OR = 10.249, P<0.001, 95% CI = 3.071-34.212) and abdominoperineal amputation (OR = 3.096, P = 0.04, 95% CI = 1.445-6.623) were considered to be independently significant. This investigation allows the conclusion that radical intervention and the abdominoperineal procedure in the surgical treatment of colorectal adenocarcinoma are risk factors for requiring intraoperative blood transfusion.
Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N
2017-07-01
Near-infrared (NIR) spectroscopy is being widely used in various fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others, selection of those wavelengths that contribute useful information, and identification of suitable calibration models using linear/nonlinear regression . Several methods have been developed for each of these three aspects and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies for the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect together to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR). The comparative study indicates that, in general, pre-processing of spectral data can play a significant role in the calibration while wavelength selection plays a marginal role and the combination of certain pre-processing, wavelength selection, and nonlinear regression methods can achieve superior performance over traditional linear regression-based calibration.
The study on the near infrared spectrum technology of sauce component analysis
NASA Astrophysics Data System (ADS)
Li, Shangyu; Zhang, Jun; Chen, Xingdan; Liang, Jingqiu; Wang, Ce
2006-01-01
The author, Shangyu Li, engages in supervising and inspecting the quality of products. In soy sauce manufacturing, quality control of intermediate and final products by many components such as total nitrogen, saltless soluble solids, nitrogen of amino acids and total acid is demanded. Wet chemistry analytical methods need much labor and time for these analyses. In order to compensate for this problem, we used near infrared spectroscopy technology to measure the chemical composition of soy sauce. In the course of the work, a certain amount of soy sauce was collected and analyzed by wet chemistry analytical methods. The soy sauce was scanned by two kinds of spectrometer, a Fourier transform near infrared spectrometer (FT-NIR spectrometer) and a filter near infrared spectroscopy analyzer. The near infrared spectra of soy sauce were calibrated against the component values from the wet chemistry methods by partial least squares regression and stepwise multiple linear regression. The contents of saltless soluble solids, total nitrogen, total acid and nitrogen of amino acids were predicted by cross validation. The results were compared with the wet chemistry analytical methods. The correlation coefficient and root-mean-square error of prediction (RMSEP) in the better prediction run were found to be 0.961 and 0.206 for total nitrogen, 0.913 and 1.215 for saltless soluble solids, 0.855 and 0.199 for nitrogen of amino acids, and 0.966 and 0.231 for total acid, respectively. The results presented here demonstrate that NIR spectroscopy technology is promising for fast and reliable determination of the major components of soy sauce.
Park, Young-Yoon; Jeong, Young-Jin; Lee, Junyong; Moon, Nayun; Bang, Inho; Kim, Hyunju; Yun, Kyung-Sook; Kim, Yong-I; Jeon, Tae-Hee
2018-01-01
This study investigated the effect of family members on terminally ill cancer patients by measuring the relationship of the presence of the family caregivers, visiting time by family and friends, and family adaptability and cohesion with patient's anxiety and depression. From June, 2016 to March, 2017, 100 terminally ill cancer patients who were admitted to a palliative care unit in Seoul, South Korea, were surveyed, and their medical records were reviewed. The Korean version of the Family Adaptability and Cohesion Evaluation Scales III and Hospital Anxiety-Depression Scale was used. Chi-square and multiple logistic regression analyses were used. The results of the chi-square analysis showed that the presence of family caregivers and family visit times did not have statistically significant effects on anxiety and depression in terminally ill cancer patients. In multiple logistic regression, when adjusted for age, sex, ECOG PS, and the monthly average income, the odds ratios (ORs) of the low family adaptability to anxiety and depression were 2.4 (1.03-5.83) and 5.4 (1.10-26.87), respectively. The OR of low family cohesion for depression was 5.4 (1.10-27.20) when adjusted for age, sex, ECOG PS, and monthly average household income. A higher family adaptability resulted in a lower degree of anxiety and depression in terminally ill cancer patients. The higher the family cohesion, the lower the degree of depression in the patient. The presence of the family caregiver and the visiting time by family and friends did not affect the patient's anxiety and depression.
Inverse models: A necessary next step in ground-water modeling
Poeter, E.P.; Hill, M.C.
1997-01-01
Inverse models using, for example, nonlinear least-squares regression, provide capabilities that help modelers take full advantage of the insight available from ground-water models. However, lack of information about the requirements and benefits of inverse models is an obstacle to their widespread use. This paper presents a simple ground-water flow problem to illustrate the requirements and benefits of the nonlinear least-squares regression method of inverse modeling and discusses how these attributes apply to field problems. The benefits of inverse modeling include: (1) expedited determination of best fit parameter values; (2) quantification of the (a) quality of calibration, (b) data shortcomings and needs, and (c) confidence limits on parameter estimates and predictions; and (3) identification of issues that are easily overlooked during nonautomated calibration.
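A hedged scipy sketch of nonlinear least-squares calibration with the kind of by-products the paper emphasizes follows: best-fit values, approximate confidence limits from the linearized parameter covariance, and the parameter correlation matrix. The two-parameter forward model and the data are toy stand-ins, not a real ground-water model.

```python
import numpy as np
from scipy.optimize import least_squares

def forward(params, r, t):
    """Toy forward model with two parameters (stand-ins for aquifer properties)."""
    a, b = params
    return a * np.log(t / (b * r ** 2))

r = np.full(30, 50.0)
t = np.linspace(1.0, 30.0, 30)
obs = forward([2.0, 0.01], r, t) + 0.05 * np.random.default_rng(12).normal(size=t.size)

res = least_squares(lambda p: forward(p, r, t) - obs, x0=[1.0, 0.1],
                    bounds=([1e-6, 1e-6], [np.inf, np.inf]))
J = res.jac
s2 = np.sum(res.fun ** 2) / (len(obs) - len(res.x))
cov = s2 * np.linalg.inv(J.T @ J)          # linearized parameter covariance
se = np.sqrt(np.diag(cov))
print(res.x)                               # best-fit parameter values
print(res.x - 2 * se, res.x + 2 * se)      # approximate 95% confidence limits
print(cov / np.outer(se, se))              # parameter correlation matrix
```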
The Impact of School Socioeconomic Status on Student-Generated Teacher Ratings
ERIC Educational Resources Information Center
Agnew, Steve
2011-01-01
This paper uses ordinary least squares, logit and probit regressions, along with chi-square analysis applied to nationwide data from the New Zealand ratemyteacher website to establish if there is any correlation between student ratings of their teachers and the socioeconomic status of the school the students attend. The results show that students…
Jiménez-González, Marco A; Álvarez, Ana M; Carral, Pilar; González-Vila, Francisco J; Almendros, Gonzalo
2017-07-28
The variable extent to which environmental factors are involved in soil carbon storage is currently a subject of controversy. In fact, justifying why some soils accumulate more organic matter than others is not trivial. Some abiotic factors such as organo-mineral associations have classically been invoked as the main drivers for soil C stabilization. However, in this research indirect evidences based on correlations between soil C storage and compositional descriptors of the soil organic matter are presented. It is assumed that the intrinsic structure of soil organic matter should have a bearing in the soil carbon storage. This is examined here by focusing on the methoxyphenols released by direct pyrolysis from a wide variety of topsoil samples from continental Mediterranean ecosystems from Spain with different properties and carbon content. Methoxyphenols are typical signature compounds presumptively informing on the occurrence and degree of alteration of lignin in soils. The methoxyphenol assemblages (12 major guaiacyl- and syringyl-type compounds) were analyzed by pyrolysis-gas chromatography-mass spectrometry. The Shannon-Wiener diversity index was chosen to describe the complexity of this phenolic signature. A series of exploratory statistical analyses (simple regression, partial least squares regression, multidimensional scaling) were applied to analyze the relationships existing between chemical and spectroscopic characteristics and the carbon content in the soils. These treatments coincided in pointing out that significant correlations exist between the progressive molecular diversity of the methoxyphenol assemblages and the concentration of organic carbon stored in the corresponding soils. This potential of the diversity in the phenolic signature as a surrogate index of the carbon storage in soils is tentatively interpreted as the accumulation of plant macromolecules altered into microbially reworked structures not readily recognized by soil enzymes. From a quantitative viewpoint, the partial least squares regression models exclusively based on total abundances of the 12 major methoxyphenols were especially successful in forecasting soil carbon storage. Copyright © 2017 Elsevier B.V. All rights reserved.
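The diversity descriptor itself is simple to compute; below is a minimal numpy version of the Shannon-Wiener index applied to an invented assemblage of 12 pyrolysis peak abundances (not the paper's measurements). In the study, this kind of index is then related to soil carbon storage by simple and partial least squares regression.

```python
import numpy as np

def shannon_wiener(abundances):
    """Shannon-Wiener diversity H' = -sum(p_i * ln p_i) of a compound assemblage."""
    p = np.asarray(abundances, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log(p))

# relative abundances of 12 guaiacyl/syringyl pyrolysis products (illustrative values)
peaks = [8, 12, 5, 3, 9, 7, 2, 6, 4, 10, 1, 3]
print(shannon_wiener(peaks))
```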
Vilar-Compte, Mireya; Sandoval-Olascoaga, Sebastian; Bernal-Stuart, Ana; Shimoga, Sandhya; Vargas-Bustamante, Arturo
2015-01-01
Objective The present paper investigated the impact of the 2008 financial crisis on food security in Mexico and how it disproportionally affected vulnerable households. Design A generalized ordered logistic regression was estimated to assess the impact of the crisis on households’ food security status. An ordinary least squares and a quantile regression were estimated to evaluate the effect of the financial crisis on a continuous proxy measure of food security defined as the share of a household’s current income devoted to food expenditures. Setting Both analyses were performed using pooled cross-sectional data from the Mexican National Household Income and Expenditure Survey 2008 and 2010. Subjects The analytical sample included 29 468 households in 2008 and 27 654 in 2010. Results The generalized ordered logistic model showed that the financial crisis significantly (P < 0·05) decreased the probability of being food secure, mildly or moderately food insecure, compared with being severely food insecure (OR = 0·74). A similar but smaller effect was found when comparing severely and moderately food-insecure households with mildly food-insecure and food-secure households (OR = 0·81). The ordinary least squares model showed that the crisis significantly (P < 0·05) increased the share of total income spent on food (β coefficient of 0·02). The quantile regression confirmed the findings suggested by the generalized ordered logistic model, showing that the effects of the crisis were more profound among poorer households. Conclusions The results suggest that households that were more vulnerable before the financial crisis saw a worsened effect in terms of food insecurity with the crisis. Findings were consistent with both measures of food security – one based on self-reported experience and the other based on food spending. PMID:25428800
ERIC Educational Resources Information Center
Seah, Rebecca; Horne, Marj; Berenger, Adrian
2016-01-01
This study surveyed and analysed four secondary school students' writing about a square. Sfard's discursive approach to understanding mathematical discourse was used to analyse the responses collected from 214 Australian secondary school students. The results showed that geometric knowledge was developed experientially and not developmentally.…
Aryee, Samuel; Walumbwa, Fred O; Seidu, Emmanuel Y M; Otaye, Lilian E
2012-03-01
We proposed and tested a multilevel model, underpinned by empowerment theory, that examines the processes linking high-performance work systems (HPWS) and performance outcomes at the individual and organizational levels of analyses. Data were obtained from 37 branches of 2 banking institutions in Ghana. Results of hierarchical regression analysis revealed that branch-level HPWS relates to empowerment climate. Additionally, results of hierarchical linear modeling that examined the hypothesized cross-level relationships revealed 3 salient findings. First, experienced HPWS and empowerment climate partially mediate the influence of branch-level HPWS on psychological empowerment. Second, psychological empowerment partially mediates the influence of empowerment climate and experienced HPWS on service performance. Third, service orientation moderates the psychological empowerment-service performance relationship such that the relationship is stronger for those high rather than low in service orientation. Last, ordinary least squares regression results revealed that branch-level HPWS influences branch-level market performance through cross-level and individual-level influences on service performance that emerges at the branch level as aggregated service performance.
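As a hedged sketch of the multilevel (employees-within-branches) part of such an analysis, here is a random-intercept model in statsmodels on simulated data; the variable names, the number of branches and the effect sizes are assumptions, not the Ghanaian bank data or the authors' full cross-level specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(13)
n_branches, n_per = 37, 20
branch = np.repeat(np.arange(n_branches), n_per)
branch_effect = rng.normal(scale=0.4, size=n_branches)[branch]     # branch-level variation
empowerment = rng.normal(size=branch.size)
service = 0.5 * empowerment + branch_effect + rng.normal(scale=0.7, size=branch.size)
df = pd.DataFrame({"service": service, "empowerment": empowerment, "branch": branch})

# Random-intercept model: employees nested within branches
md = smf.mixedlm("service ~ empowerment", df, groups=df["branch"]).fit()
print(md.summary())
```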
Peak-flow characteristics of Wyoming streams
Miller, Kirk A.
2003-01-01
Peak-flow characteristics for unregulated streams in Wyoming are described in this report. Frequency relations for annual peak flows through water year 2000 at 364 streamflow-gaging stations in and near Wyoming were evaluated and revised or updated as needed. Analyses of historical floods, temporal trends, and generalized skew were included in the evaluation. Physical and climatic basin characteristics were determined for each gaging station using a geographic information system. Gaging stations with similar peak-flow and basin characteristics were grouped into six hydrologic regions. Regional statistical relations between peak-flow and basin characteristics were explored using multiple-regression techniques. Generalized least squares regression equations for estimating magnitudes of annual peak flows with selected recurrence intervals from 1.5 to 500 years were developed for each region. Average standard errors of estimate range from 34 to 131 percent. Average standard errors of prediction range from 35 to 135 percent. Several statistics for evaluating and comparing the errors in these estimates are described. Limitations of the equations are described. Methods for applying the regional equations for various circumstances are listed and examples are given.
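Regional estimating equations of this kind typically take the form Q_T = a * A^b * S^c and are fitted in log space. The sketch below uses ordinary least squares on simulated stations purely for illustration; the report's actual equations were developed with generalized least squares, which additionally accounts for unequal record lengths and cross-correlation between stations, and all values here are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
n = 60
area = 10 ** rng.uniform(0, 3, n)                 # drainage area, square miles
slope = 10 ** rng.uniform(0.3, 2, n)              # main-channel slope, feet per mile
q100 = 80 * area ** 0.7 * slope ** 0.3 * 10 ** rng.normal(scale=0.15, size=n)

X = sm.add_constant(np.column_stack([np.log10(area), np.log10(slope)]))
fit = sm.OLS(np.log10(q100), X).fit()
a, b, c = 10 ** fit.params[0], fit.params[1], fit.params[2]
print(f"Q100 ~ {a:.1f} * A^{b:.2f} * S^{c:.2f}")   # regional estimating equation (toy)
```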
McNamara, Tay K; Pitt-Catsouphes, Marcie; Matz-Costa, Christina; Brown, Melissa; Valcour, Monique
2013-03-01
This study investigated the association between hours worked per week and satisfaction with work-family balance, using data from a 2007-2008 survey of employees nested within organizations. We tested hypotheses informed by the resource drain and resources-and-demands perspectives using quantile regression. We found that the negative association between hours worked per week and satisfaction with work-family balance was significantly stronger at the 25th percentile, as compared to at the 75th percentile, of satisfaction with work-family balance. Further, there was some evidence that perceived flexibility-fit (i.e., the fit between worker needs and flexible work options available) and supportive work-family culture attenuated the relationship between hours worked and satisfaction with work-family balance. The results suggest that analyses focusing on the average relationship between long work hours (such as those using ordinary least squares regression) and satisfaction with work-family balance may underestimate the importance of long work hours for workers with lower satisfaction levels. Copyright © 2012 Elsevier Inc. All rights reserved.
Least-squares sequential parameter and state estimation for large space structures
NASA Technical Reports Server (NTRS)
Thau, F. E.; Eliazov, T.; Montgomery, R. C.
1982-01-01
This paper presents the formulation of simultaneous state and parameter estimation problems for flexible structures in terms of least-squares minimization problems. The approach combines an on-line order determination algorithm with least-squares algorithms for finding estimates of modal approximation functions, modal amplitudes, and modal parameters. The approach combines previous results on separable nonlinear least squares estimation with a regression analysis formulation of the state estimation problem. The technique makes use of sequential Householder transformations, which allows for sequential accumulation of the matrices required during the identification process. The technique is used to identify the modal parameters of a flexible beam.
Quantification of brain lipids by FTIR spectroscopy and partial least squares regression
NASA Astrophysics Data System (ADS)
Dreissig, Isabell; Machill, Susanne; Salzer, Reiner; Krafft, Christoph
2009-01-01
Brain tissue is characterized by high lipid content. Its content decreases and the lipid composition changes during transformation from normal brain tissue to tumors. Therefore, the analysis of brain lipids might complement the existing diagnostic tools to determine the tumor type and tumor grade. Objective of this work is to extract lipids from gray matter and white matter of porcine brain tissue, record infrared (IR) spectra of these extracts and develop a quantification model for the main lipids based on partial least squares (PLS) regression. IR spectra of the pure lipids cholesterol, cholesterol ester, phosphatidic acid, phosphatidylcholine, phosphatidylethanolamine, phosphatidylserine, phosphatidylinositol, sphingomyelin, galactocerebroside and sulfatide were used as references. Two lipid mixtures were prepared for training and validation of the quantification model. The composition of lipid extracts that were predicted by the PLS regression of IR spectra was compared with lipid quantification by thin layer chromatography.
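A hedged sklearn sketch of this kind of PLS calibration on synthetic linear mixtures follows: the reference spectra, mixture compositions and number of latent variables are invented stand-ins for the FTIR extracts, and the validation RMSEP is computed on a held-out mixture set.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(15)
n_train, n_val, n_bands, n_lipids = 40, 10, 200, 10
pure_spectra = np.abs(rng.normal(size=(n_lipids, n_bands)))       # stand-ins for reference IR spectra

def mixtures(n):
    conc = rng.dirichlet(np.ones(n_lipids), size=n)                # lipid fractions per mixture
    spectra = conc @ pure_spectra + 0.01 * rng.normal(size=(n, n_bands))
    return spectra, conc

X_train, Y_train = mixtures(n_train)
X_val, Y_val = mixtures(n_val)

pls = PLSRegression(n_components=8).fit(X_train, Y_train)
rmsep = np.sqrt(mean_squared_error(Y_val, pls.predict(X_val)))
print(rmsep)                                                       # validation error of the calibration
```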
NASA Astrophysics Data System (ADS)
Chen, Hua-cai; Chen, Xing-dan; Lu, Yong-jun; Cao, Zhi-qiang
2006-01-01
Near infrared (NIR) reflectance spectroscopy was used to develop a fast determination method for total ginsenosides in ginseng (Panax ginseng) powder. The spectra were analyzed with the multiplicative signal correction (MSC) correlation method. The spectral regions best correlated with the total ginsenosides content were 1660-1880 nm and 2230-2380 nm. NIR calibration models for ginsenosides were built with multiple linear regression (MLR), principal component regression (PCR) and partial least squares (PLS) regression. The results showed that the calibration model built with PLS combined with MSC and the optimal spectral region was the best one. The correlation coefficient and the root mean square error of calibration (RMSEC) of the best calibration model were 0.98 and 0.15%, respectively. The optimal spectral region for calibration was 1204-2014 nm. The results suggest that using NIR to rapidly determine the total ginsenosides content in ginseng powder is feasible.
Hypothesis Testing Using Factor Score Regression
Devlieger, Ines; Mayer, Axel; Rosseel, Yves
2015-01-01
In this article, an overview is given of four methods to perform factor score regression (FSR), namely regression FSR, Bartlett FSR, the bias avoiding method of Skrondal and Laake, and the bias correcting method of Croon. The bias correcting method is extended to include a reliable standard error. The four methods are compared with each other and with structural equation modeling (SEM) by using analytic calculations and two Monte Carlo simulation studies to examine their finite sample characteristics. Several performance criteria are used, such as the bias using the unstandardized and standardized parameterization, efficiency, mean square error, standard error bias, type I error rate, and power. The results show that the bias correcting method, with the newly developed standard error, is the only suitable alternative for SEM. While it has a higher standard error bias than SEM, it has a comparable bias, efficiency, mean square error, power, and type I error rate. PMID:29795886
Shimizu, Yu; Yoshimoto, Junichiro; Takamura, Masahiro; Okada, Go; Okamoto, Yasumasa; Yamawaki, Shigeto; Doya, Kenji
2017-01-01
In diagnostic applications of statistical machine learning methods to brain imaging data, common problems include data high-dimensionality and co-linearity, which often cause over-fitting and instability. To overcome these problems, we applied partial least squares (PLS) regression to resting-state functional magnetic resonance imaging (rs-fMRI) data, creating a low-dimensional representation that relates symptoms to brain activity and that predicts clinical measures. Our experimental results, based upon data from clinically depressed patients and healthy controls, demonstrated that PLS and its kernel variants provided significantly better prediction of clinical measures than ordinary linear regression. Subsequent classification using predicted clinical scores distinguished depressed patients from healthy controls with 80% accuracy. Moreover, loading vectors for latent variables enabled us to identify brain regions relevant to depression, including the default mode network, the right superior frontal gyrus, and the superior motor area. PMID:28700672
Fowler, Stephanie M; Schmidt, Heinar; van de Ven, Remy; Wynn, Peter; Hopkins, David L
2014-12-01
A Raman spectroscopic hand-held device was used to predict the shear force (SF) of 80 fresh lamb m. longissimus lumborum (LL) at 1 and 5 days post mortem (PM). Traditional predictors of SF including sarcomere length (SL), particle size (PS), cooking loss (CL), percentage myofibrillar breaks and pH were also measured. SF values were regressed against Raman spectra using partial least squares regression and against the traditional predictors using linear regression. The best prediction of shear force used spectra at 1 day PM to predict shear force at 1 day, which gave a root mean square error of prediction (RMSEP) of 13.6 (Null = 14.0); the R(2) between observed and cross-validated predicted values (R(2)cv) was 0.06. Overall, for fresh LL, the predictability of SF by either the Raman hand-held probe or the traditional predictors was low. Copyright © 2014 Elsevier Ltd. All rights reserved.
Rippin, H L; Hutchinson, J; Ocke, M; Jewell, J; Breda, J J; Cade, J E
2017-01-01
Trans fatty acids (TFA) increase the risk of mortality and chronic diseases. TFA intakes have fallen since reformulation, but may still be high in certain vulnerable groups. This paper investigates the socio-economic and food consumption characteristics of high TFA consumers after voluntary reformulation in the Netherlands and the UK. Post-reformulation data for adults aged 19-64 were analysed from two national surveys: the Dutch National Food Consumption Survey (DNFCS), collected 2007-2010 using 2 x 24-hour recalls (N = 1933), and the UK National Diet and Nutrition Survey (NDNS) years 3 & 4, collected 2010/11 and 2011/12 using 4-day food diaries (N = 848). The socio-economic and food consumption characteristics of the top 10% and remaining 90% of TFA consumers were compared. Means of continuous data were compared using t-tests, and categorical data were compared using chi-squared tests. Multivariate logistic regression models indicated which socio-demographic variables were associated with high TFA consumption. In the Dutch analyses, women and those born outside the Netherlands were more likely to be top 10% TFA consumers than men and the Dutch-born. In the UK unadjusted analyses there was no significant trend in socio-economic characteristics between high and lower TFA consumers, but there were regional differences in the multivariate logistic regression analyses. In the Netherlands, high TFA consumers were more likely than the remaining 90% to be consumers of cakes, buns and pastries; cream; and fried potato. In the UK, high TFA consumers were more likely to be consumers of lamb, cheese, and dairy desserts, and less likely to be consumers of crisps and savoury snacks. Some socio-demographic differences between high and lower TFA consumers were evident post-reformulation. High TFA consumers in the Dutch 2007-10 survey appeared more likely to obtain TFA from artificial sources than those in the UK survey. Further analyses using more up-to-date food composition databases may be needed.
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include models commonly used in undergraduate statistics courses, such as linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as the t-test in the parametric category, and the Wilcoxon rank-sum test, Kruskal-Wallis test, and Friedman's test in the non-parametric category. SOCR Analyses also includes several hypothesis-test models, such as contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), with the aim of contributing to the efforts of the statistical computing community. The code includes functionality for each specific analysis model as well as general utilities that can be applied in various statistical computing tasks. For example, concrete methods with an API (Application Programming Interface) have been implemented for statistical summaries, least-squares solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is ongoing and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for the most up-to-date information and newly added models. PMID:21546994
Robust regression on noisy data for fusion scaling laws
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be; Laboratoire de Physique des Plasmas de l'ERM - Laboratorium voor Plasmafysica van de KMS
2014-11-15
We introduce the method of geodesic least squares (GLS) regression for estimating fusion scaling laws. Based on straightforward principles, the method is easily implemented, yet it clearly outperforms established regression techniques, particularly in cases of significant uncertainty on both the response and predictor variables. We apply GLS for estimating the scaling of the L-H power threshold, resulting in estimates for ITER that are somewhat higher than predicted earlier.
Yobbi, D.K.
2000-01-01
A nonlinear least-squares regression technique for estimation of ground-water flow model parameters was applied to an existing model of the regional aquifer system underlying west-central Florida. The regression technique minimizes the differences between measured and simulated water levels. Regression statistics, including parameter sensitivities and correlations, were calculated for reported parameter values in the existing model. Optimal parameter values for selected hydrologic variables of interest were estimated by nonlinear regression. Optimal estimates of parameter values range from about 140 times greater than to about 0.01 times the reported values. Independently estimating all parameters by nonlinear regression was impossible, given the existing zonation structure and number of observations, because of parameter insensitivity and correlation. Although the model yields parameter values similar to those estimated by other methods and reproduces the measured water levels reasonably accurately, a simpler parameter structure should be considered. Some possible ways of improving model calibration are to: (1) modify the defined parameter-zonation structure by omitting and/or combining parameters to be estimated; (2) carefully eliminate observation data based on evidence that they are likely to be biased; (3) collect additional water-level data; (4) assign values to insensitive parameters; and (5) estimate the most sensitive parameters first, then, using the optimized values for these parameters, estimate the entire data set.
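As a rough illustration of the nonlinear least-squares idea described above (minimizing the misfit between measured and simulated water levels), the following Python sketch uses scipy.optimize.least_squares on a purely synthetic head model; simulate_heads is a hypothetical stand-in for a run of the groundwater-flow model, and every name and value is illustrative rather than taken from the study.

```python
# Minimal sketch of nonlinear least-squares parameter estimation against water-level data.
# "simulate_heads" is a hypothetical stand-in for a run of the groundwater-flow model.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
x_obs = np.linspace(0.0, 10.0, 25)              # observation locations (arbitrary units)
true_params = np.array([50.0, 0.8])             # "true" parameters used to make synthetic data

def simulate_heads(params, x):
    h0, k = params
    return h0 * np.exp(-k * x)                  # toy head response, not a real flow model

h_obs = simulate_heads(true_params, x_obs) + rng.normal(0.0, 0.5, x_obs.size)

def residuals(params):
    return simulate_heads(params, x_obs) - h_obs  # simulated minus measured water levels

fit = least_squares(residuals, x0=[30.0, 0.5])
print("estimated parameters:", np.round(fit.x, 3))
```

In an actual calibration the residual vector would span all observation wells and simulate_heads would invoke the flow model, but the estimation machinery is the same.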
Traulsen, I; Breitenberger, S; Auer, W; Stamer, E; Müller, K; Krieter, J
2016-06-01
Lameness is an important issue in group-housed sows. Automatic detection systems are a beneficial diagnostic tool to support management. The aim of the present study was to evaluate data of a positioning system including acceleration measurements to detect lameness in group-housed sows. Data were acquired at the Futterkamp research farm from May 2012 until April 2013. In the gestation unit, 212 group-housed sows were equipped with an ear sensor to sample position and acceleration per sow and second. Three activity indices were calculated per sow and day: path length walked by a sow during the day (Path), number of squares (25×25 cm) visited during the day (Square) and variance of the acceleration measurement during the day (Acc). In addition, data on lameness treatments of the sows and a weekly lameness score were used as reference systems. To determine the influence of a lameness event, all indices were analysed in a linear random regression model. Test day, parity class and day before treatment had a significant influence on all activity indices (P<0.05). In healthy sows, the indices Path and Square increased with increasing parity, whereas the acceleration variance (Acc) slightly decreased. The indices Path and Square showed a decreasing trend in a 14-day period before a lameness treatment and to a smaller extent before a lameness score of 2 (severe lameness). For the index Acc, there was no obvious difference between the lame and non-lame periods. In conclusion, positioning and acceleration measurements with ear sensors can be used to describe the activity pattern of sows. However, improvements in sampling rate and analysis techniques should be made for a practical application as an automatic lameness detection system.
Eash, David A.; Barnes, Kimberlee K.; O'Shea, Padraic S.
2016-09-19
A statewide study was conducted to develop regression equations for estimating three selected spring and three selected fall low-flow frequency statistics for ungaged stream sites in Iowa. The estimation equations developed for the six low-flow frequency statistics include spring (April through June) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years and fall (October through December) 1-, 7-, and 30-day mean low flows for a recurrence interval of 10 years. Estimates of the three selected spring statistics are provided for 241 U.S. Geological Survey continuous-record streamgages, and estimates of the three selected fall statistics are provided for 238 of these streamgages, using data through June 2014. Because only 9 years of fall streamflow record were available, three streamgages included in the development of the spring regression equations were not included in the development of the fall regression equations. Because of regulation, diversion, or urbanization, 30 of the 241 streamgages were not included in the development of the regression equations. The study area includes Iowa and adjacent areas within 50 miles of the Iowa border. Because trend analyses indicated statistically significant positive trends when considering the period of record for most of the streamgages, the longest, most recent period of record without a significant trend was determined for each streamgage for use in the study. Geographic information system software was used to measure 63 selected basin characteristics for each of the 211 streamgages used to develop the regional regression equations. The study area was divided into three low-flow regions that were defined in a previous study for the development of regional regression equations. Because several streamgages included in the development of regional regression equations have estimates of zero flow calculated from observed streamflow for selected spring and fall low-flow frequency statistics, the final equations for the three low-flow regions were developed using two types of regression analyses—left-censored and generalized-least-squares regression analyses. A total of 211 streamgages were included in the development of nine spring regression equations—three equations for each of the three low-flow regions. A total of 208 streamgages were included in the development of nine fall regression equations—three equations for each of the three low-flow regions. A censoring threshold was used to develop 15 left-censored regression equations to estimate the three fall low-flow frequency statistics for each of the three low-flow regions and to estimate the three spring low-flow frequency statistics for the southern and northwest regions. For the northeast region, generalized-least-squares regression was used to develop three equations to estimate the three spring low-flow frequency statistics. For the northeast region, average standard errors of prediction range from 32.4 to 48.4 percent for the spring equations and average standard errors of estimate range from 56.4 to 73.8 percent for the fall equations. For the northwest region, average standard errors of estimate range from 58.9 to 62.1 percent for the spring equations and from 83.2 to 109.4 percent for the fall equations. For the southern region, average standard errors of estimate range from 43.2 to 64.0 percent for the spring equations and from 78.1 to 78.7 percent for the fall equations. The regression equations are applicable only to stream sites in Iowa with low flows not substantially affected by regulation, diversion, or urbanization and with basin characteristics within the range of those used to develop the equations. The regression equations will be implemented within the U.S. Geological Survey StreamStats Web-based geographic information system application. StreamStats allows users to click on any ungaged stream site and compute estimates of the six selected spring and fall low-flow statistics; in addition, 90-percent prediction intervals and the measured basin characteristics for the ungaged site are provided. StreamStats also allows users to click on any Iowa streamgage to obtain computed estimates for the six selected spring and fall low-flow statistics.
Loch, T P; Scribner, K; Tempelman, R; Whelan, G; Faisal, M
2012-01-01
Herein, we describe the prevalence of bacterial infections in Chinook salmon, Oncorhynchus tshawytscha (Walbaum), returning to spawn in two tributaries within the Lake Michigan watershed. Ten bacterial genera, including Renibacterium, Aeromonas, Carnobacterium, Serratia, Proteus, Pseudomonas, Hafnia, Salmonella, Shewanella and Morganella, were detected in the kidneys of Chinook salmon (n = 480) using culture, serological and molecular analyses. Among these, Aeromonas salmonicida was detected at a prevalence of ∼15%. Analyses revealed significant interactions between location/time of collection and gender for these infections, whereby overall infection prevalence increased greatly later in the spawning run and was significantly higher in females. Renibacterium salmoninarum was detected in fish kidneys at an overall prevalence of >25%. Logistic regression analyses revealed that R. salmoninarum prevalence differed significantly by location/time of collection and gender, with a higher likelihood of infection later in the spawning season and in females vs. males. Chi-square analyses quantifying non-independence of infection by multiple pathogens revealed a significant association between R. salmoninarum and motile aeromonad infections. Additionally, greater numbers of fish were found to be co-infected by multiple bacterial species than would be expected by chance alone. The findings of this study suggest a potential synergism between bacteria infecting spawning Chinook salmon. © 2011 Blackwell Publishing Ltd.
NASA Astrophysics Data System (ADS)
Yan, Ling; Liu, Changhong; Qu, Hao; Liu, Wei; Zhang, Yan; Yang, Jianbo; Zheng, Lei
2018-03-01
The terahertz (THz) technique, a recently developed spectral method, has been investigated and used for the rapid discrimination and measurement of food compositions owing to its low-energy and non-ionizing characteristics. In this study, THz spectroscopy combined with chemometrics has been utilized for qualitative and quantitative analysis of myricetin, quercetin, and kaempferol at concentrations of 0.025, 0.05, and 0.1 mg/mL. The qualitative discrimination was achieved by KNN, ELM, and RF models with spectral pre-treatments. An excellent discrimination (100% CCR in the prediction set) could be achieved using the RF model. Furthermore, the quantitative analyses were performed by partial least squares regression (PLSR) and least squares support vector machine (LS-SVM). Compared to the PLSR models, the LS-SVM yielded better results, with lower RMSEP (0.0044, 0.0039, and 0.0048), higher Rp (0.9601, 0.9688, and 0.9359), and higher RPD (8.6272, 9.6333, and 7.9083) for myricetin, quercetin, and kaempferol, respectively. Our results demonstrate that THz spectroscopy is a powerful tool for identifying three flavonols with similar chemical structures and for quantitatively determining their concentrations.
Stein, Koen W. H.; Werner, Jan
2013-01-01
Osteocytes harbour much potential for paleobiological studies. Synchrotron radiation and spectroscopic analyses are providing fascinating data on osteocyte density, size and orientation in fossil taxa. However, such studies may be costly and time consuming. Here we describe an uncomplicated and inexpensive method to measure osteocyte lacunar densities in bone thin sections. We report on cell lacunar densities in the long bones of various extant and extinct tetrapods, with a focus on sauropodomorph dinosaurs, and how lacunar densities can help us understand bone formation rates in the iconic sauropod dinosaurs. Ordinary least square and phylogenetic generalized least square regressions suggest that sauropodomorphs have lacunar densities higher than scaled up or comparably sized mammals. We also found normal mammalian-like osteocyte densities for the extinct bovid Myotragus, questioning its crocodilian-like physiology. When accounting for body mass effects and phylogeny, growth rates are a main factor determining the density of the lacunocanalicular network. However, functional aspects most likely play an important role as well. Observed differences in cell strategies between mammals and dinosaurs likely illustrate the convergent nature of fast growing bone tissues in these groups. PMID:24204748
Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data.
Wei, Runmin; Wang, Jingye; Su, Mingming; Jia, Erik; Chen, Shaoqiu; Chen, Tianlu; Ni, Yan
2018-01-12
Missing values exist widely in mass-spectrometry (MS) based metabolomics data. Various methods have been applied for handling missing values, but the choice of method can significantly affect subsequent data analyses. Typically, there are three types of missing values: missing not at random (MNAR), missing at random (MAR), and missing completely at random (MCAR). Our study comprehensively compared eight imputation methods (zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC)) for the different types of missing values using four metabolomics datasets. Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate imputation accuracy. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes analysis were used to evaluate the overall sample distribution. Student's t-test followed by correlation analysis was conducted to evaluate the effects on univariate statistics. Our findings demonstrated that RF performed best for MCAR/MAR and QRILC was the favored method for left-censored MNAR. Finally, we proposed a comprehensive strategy and developed a publicly accessible web-tool for the application of missing value imputation in metabolomics ( https://metabolomics.cc.hawaii.edu/software/MetImp/ ).
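As a minimal sketch of one of the compared approaches, the snippet below imputes values removed completely at random with scikit-learn's KNNImputer and scores the result with an NRMSE of the kind used above; the data are synthetic and this is not the authors' code.

```python
# Illustrative kNN imputation of MCAR-style missing values, scored by normalized RMSE.
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(1)
X_true = rng.lognormal(mean=0.0, sigma=1.0, size=(100, 20))   # toy "metabolite intensities"

X_missing = X_true.copy()
mask = rng.random(X_true.shape) < 0.1                         # remove 10% of entries at random
X_missing[mask] = np.nan

X_imputed = KNNImputer(n_neighbors=5).fit_transform(X_missing)

# NRMSE over the imputed entries, normalized by the spread of the true values
nrmse = np.sqrt(np.mean((X_imputed[mask] - X_true[mask]) ** 2)) / np.std(X_true[mask])
print(f"kNN imputation NRMSE: {nrmse:.3f}")
```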
Nurse staffing levels and Medicaid reimbursement rates in nursing facilities.
Harrington, Charlene; Swan, James H; Carrillo, Helen
2007-06-01
To examine the relationship between nurse staffing levels in U.S. nursing homes and state Medicaid reimbursement rates. Facility staffing, characteristics, and case-mix data were from the federal On-Line Survey Certification and Reporting (OSCAR) system, and other data were from public sources. Ordinary least squares and two-stage least squares regression analyses were used to separately examine the relationship between registered nurse (RN) and total nursing hours in all U.S. nursing homes in 2002, with two endogenous variables: Medicaid reimbursement rates and resident case mix. RN hours and total nursing hours were endogenous with Medicaid reimbursement rates and resident case mix. As expected, Medicaid nursing home reimbursement rates were positively related to both RN and total nursing hours. Resident case mix was a positive predictor of RN hours and a negative predictor of total nursing hours. Higher state minimum RN staffing standards were a positive predictor of RN and total nursing hours, while for-profit status and the percent of Medicaid residents were negative predictors. To increase staffing levels, average Medicaid reimbursement rates would need to be increased substantially, while higher state minimum RN staffing standards are a stronger positive predictor of RN and total nursing hours.
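For readers unfamiliar with two-stage least squares, the sketch below shows the mechanics on synthetic data with plain NumPy: an instrument predicts the endogenous regressor in the first stage, and the fitted values replace it in the second. The variable names merely echo the staffing setting; the instrument and data-generating process are assumptions for illustration, not the study's specification.

```python
# Two-stage least squares with one endogenous regressor, done by hand with NumPy.
import numpy as np

rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=n)                                   # instrument (illustrative)
u = rng.normal(size=n)                                   # unobserved confounder
reimbursement = 1.0 + 0.8 * z + u + rng.normal(size=n)   # endogenous regressor
staffing = 2.0 + 0.5 * reimbursement + u + rng.normal(size=n)  # outcome (true slope 0.5)

def ols(y, x):
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

# Stage 1: project the endogenous regressor on the instrument
_, reimbursement_hat = ols(reimbursement, z)
# Stage 2: regress the outcome on the fitted (exogenous) part of the regressor
beta_2sls, _ = ols(staffing, reimbursement_hat)
beta_naive, _ = ols(staffing, reimbursement)
print("2SLS slope:", round(beta_2sls[1], 3), " naive OLS slope:", round(beta_naive[1], 3))
```

The naive OLS slope is pulled away from 0.5 by the shared confounder, while the two-stage estimate is not; proper 2SLS standard errors need a further correction that is omitted here.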
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kwon, Deukwoo; Little, Mark P.; Miller, Donald L.
Purpose: To determine more accurate regression formulas for estimating peak skin dose (PSD) from reference air kerma (RAK) or kerma-area product (KAP). Methods: After grouping of the data from 21 procedures into 13 clinically similar groups, assessments were made of optimal clustering using the Bayesian information criterion to obtain the optimal linear regressions of (log-transformed) PSD vs RAK, PSD vs KAP, and PSD vs RAK and KAP. Results: Three clusters of clinical groups were optimal in regression of PSD vs RAK, seven clusters of clinical groups were optimal in regression of PSD vs KAP, and six clusters of clinical groups were optimal in regression of PSD vs RAK and KAP. Prediction of PSD using both RAK and KAP is significantly better than prediction of PSD with either RAK or KAP alone. The regression of PSD vs RAK provided better predictions of PSD than the regression of PSD vs KAP. The partial-pooling (clustered) method yields smaller mean squared errors compared with the complete-pooling method. Conclusion: PSD distributions for interventional radiology procedures are log-normal. Estimates of PSD derived from RAK and KAP jointly are most accurate, followed closely by estimates derived from RAK alone. Estimates of PSD derived from KAP alone are the least accurate. Using a stochastic search approach, it is possible to cluster together certain dissimilar types of procedures to minimize the total error sum of squares.
Tri-axial tactile sensing element
NASA Astrophysics Data System (ADS)
Castellanos-Ramos, Julián; Navas-González, Rafael; Vidal-Verdú, F.
2013-05-01
A 13 x 13 square millimetre tri-axial taxel is presented that is suitable for some medical applications, for instance in assistive robotics involving contact with humans or in prosthetics. Finite element analysis is carried out to determine which structure best achieves a uniform distribution of pressure on the sensing areas underneath. This structure has been fabricated in plastic with a 3D printer, and a commercial tactile sensor has been used to implement the sensing areas. A three-axis motorized linear translation stage with a tri-axial precision force sensor is used to find the parameters of the linear regression model and to characterize the proposed taxel. The results are analysed to see to what extent the goal has been reached in this specific implementation.
Suicidal ideation among Italian and Spanish young adults: the role of sexual orientation.
Baiocco, Roberto; Ioverno, Salvatore; Lonigro, Antonia; Baumgartner, Emma; Laghi, Fiorenzo
2015-01-01
The purpose of the current study was to identify demographic, social, and psychological variables associated with suicidal ideation in an Italian sample and a Spanish sample, taking into account the relevance of sexual orientation as a risk factor for suicide. Three hundred twenty gay and bisexual men, 396 heterosexual men, 281 lesbians and bisexual women, and 835 heterosexual women were recruited. In chi-square and multivariable logistic regression analyses we identified several consistent cross-national risk factors for suicidal ideation: having lower education, not being religious, being homosexual or bisexual, not being engaged in a stable relationship, having lower level of peer and parental attachment, and having depressive symptoms. Interestingly, the strongest risk factor in both samples, after depression symptoms, was sexual orientation.
Global image registration using a symmetric block-matching approach
Modat, Marc; Cash, David M.; Daga, Pankaj; Winston, Gavin P.; Duncan, John S.; Ourselin, Sébastien
2014-01-01
Most medical image registration algorithms suffer from a directionality bias that has been shown to substantially affect subsequent analyses. Several approaches have been proposed in the literature to address this bias in the context of nonlinear registration, but little work has been done for global registration. We propose a symmetric approach based on a block-matching technique and least-trimmed squares regression. The proposed method is suitable for multimodal registration and is robust to outliers in the input images. The symmetric framework is compared with the original asymmetric block-matching technique and is shown to outperform it in terms of accuracy and robustness. The methodology presented in this article has been made available to the community as part of the NiftyReg open-source package. PMID:26158035
Two biased estimation techniques in linear regression: Application to aircraft
NASA Technical Reports Server (NTRS)
Klein, Vladislav
1988-01-01
Several ways for detection and assessment of collinearity in measured data are discussed. Because data collinearity usually results in poor least squares estimates, two estimation techniques that can limit the damaging effects of collinearity are presented. These two techniques, principal components regression and mixed estimation, belong to the class of biased estimation techniques. Detection and assessment of data collinearity and the two biased estimation techniques are demonstrated in two examples using flight test data from longitudinal maneuvers of an experimental aircraft. Eigensystem analysis and parameter variance decomposition appeared to be promising tools for collinearity evaluation. The biased estimators had far better accuracy than the results from the ordinary least squares technique.
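As a generic illustration of the first of the two biased estimators, the sketch below fits a principal components regression on synthetic collinear data with scikit-learn; it shows the mechanics only and is unrelated to the flight-test analysis.

```python
# Principal components regression: project collinear predictors onto a few components,
# then regress on those components instead of the raw variables.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 1.5 * x1 - 0.5 * x3 + rng.normal(scale=0.1, size=n)

pcr = make_pipeline(StandardScaler(), PCA(n_components=2), LinearRegression()).fit(X, y)
print("R^2 of PCR with 2 components:", round(pcr.score(X, y), 3))
```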
Partial Least Squares Regression Models for the Analysis of Kinase Signaling.
Bourgeois, Danielle L; Kreeger, Pamela K
2017-01-01
Partial least squares regression (PLSR) is a data-driven modeling approach that can be used to analyze multivariate relationships between kinase networks and cellular decisions or patient outcomes. In PLSR, a linear model relating an X matrix of independent variables and a Y matrix of dependent variables is generated by extracting the factors with the strongest covariation. While the identified relationship is correlative, PLSR models can be used to generate quantitative predictions for new conditions or perturbations to the network, allowing mechanisms to be identified. This chapter provides a brief explanation of PLSR and an instructive example to demonstrate the use of PLSR to analyze kinase signaling.
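The sketch below walks through that workflow on synthetic data with scikit-learn's PLSRegression: fit a two-factor model relating X (e.g., kinase measurements) to Y (e.g., cellular responses), inspect the X loadings, and predict a new condition. The matrix sizes and the two-factor structure are arbitrary choices for illustration, not the chapter's example.

```python
# Illustrative PLSR fit on synthetic data: X and Y share two latent factors.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
n_samples, n_kinases, n_outputs = 30, 12, 2
latent = rng.normal(size=(n_samples, 2))                       # two underlying signaling axes
X = latent @ rng.normal(size=(2, n_kinases)) + 0.1 * rng.normal(size=(n_samples, n_kinases))
Y = latent @ rng.normal(size=(2, n_outputs)) + 0.1 * rng.normal(size=(n_samples, n_outputs))

pls = PLSRegression(n_components=2).fit(X, Y)
print("R^2 of the two-factor model:", round(pls.score(X, Y), 3))
print("X loadings shape (kinases x factors):", pls.x_loadings_.shape)
print("prediction for a new condition:", np.round(pls.predict(X[:1]), 2))
```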
NASA Astrophysics Data System (ADS)
Imam, Tasneem
2012-12-01
The study examines the association of a few selected socio-economic and demographic characteristics with diabetic prevalence. Nationally representative data from BIRDEM 2000 have been used to meet the objectives of the study. Cross tabulation, chi-square, and logistic regression analyses have been used to portray the necessary associations. Chi-square analysis reveals significant relationships between diabetic prevalence and all the selected demographic and socio-economic variables except "education", while logistic regression analysis shows no significant contribution of "age" and "education" to diabetic prevalence. It should be noted that this paper dealt with all three types of diabetes: Type 1, Type 2, and gestational.
ERIC Educational Resources Information Center
Culpepper, Steven Andrew
2012-01-01
The study of prediction bias is important and the last five decades include research studies that examined whether test scores differentially predict academic or employment performance. Previous studies used ordinary least squares (OLS) to assess whether groups differ in intercepts and slopes. This study shows that OLS yields inaccurate inferences…
Mishra, Vishal
2015-01-01
The interchange of protons with cell wall-bound calcium and magnesium ions at the solution/bacterial cell surface interface of the biosorption system has been studied at various proton concentrations in the present work. A mathematical model for establishing the correlation between the concentration of protons and active sites was developed and optimized. The sporadic limited residence time reactor was used to titrate the calcium and magnesium ions at each individual data point. The accuracy of the proposed mathematical model was estimated using error functions such as nonlinear regression, adjusted nonlinear regression coefficient, the chi-square test, P-test and F-test. The values of the chi-square test (0.042-0.017), P-test (<0.001-0.04), sum of square errors (0.061-0.016), root mean square error (0.01-0.04) and F-test (2.22-19.92) reported in the present research indicated the suitability of the model over a wide range of proton concentrations. The zeta potential of the bacterium surface at various concentrations of protons was observed to validate the denaturation of active sites.
Balabin, Roman M; Lomakina, Ekaterina I
2011-04-21
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
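A toy version of this style of comparison is sketched below with scikit-learn, where epsilon-SVR stands in for LS-SVM and random numbers stand in for NIR spectra; it illustrates only the cross-validated benchmarking pattern, not the paper's datasets or tuning.

```python
# Cross-validated comparison of PLS and SVR on synthetic "spectra".
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_samples, n_channels = 120, 200
spectra = rng.normal(size=(n_samples, n_channels))
prop = spectra[:, :10].sum(axis=1) + 0.5 * rng.normal(size=n_samples)  # toy property value

for name, model in [("PLS", PLSRegression(n_components=5)),
                    ("SVR", SVR(kernel="rbf", C=10.0))]:
    r2 = cross_val_score(model, spectra, prop, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.2f}")
```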
Using Remote Sensing Data to Evaluate Surface Soil Properties in Alabama Ultisols
NASA Technical Reports Server (NTRS)
Sullivan, Dana G.; Shaw, Joey N.; Rickman, Doug; Mask, Paul L.; Luvall, Jeff
2005-01-01
Evaluation of surface soil properties via remote sensing could facilitate soil survey mapping, erosion prediction and allocation of agrochemicals for precision management. The objective of this study was to evaluate the relationship between soil spectral signature and surface soil properties in conventionally managed row crop systems. High-resolution RS data were acquired over bare fields in the Coastal Plain, Appalachian Plateau, and Ridge and Valley provinces of Alabama using the Airborne Terrestrial Applications Sensor multispectral scanner. Soils ranged from sandy Kandiudults to fine textured Rhodudults. Surface soil samples (0-1 cm) were collected from 163 sampling points for soil organic carbon, particle size distribution, and citrate dithionite extractable iron content. Surface roughness, soil water content, and crusting were also measured during sampling. Two methods of analysis were evaluated: 1) multiple linear regression using common spectral band ratios, and 2) partial least squares regression. Our data show that thermal infrared spectra are highly, linearly related to soil organic carbon, sand and clay content. Soil organic carbon content was the most difficult to quantify in these highly weathered systems, where soil organic carbon was generally less than 1.2%. Estimates of sand and clay content were best using partial least squares regression at the Valley site, explaining 42-59% of the variability. In the Coastal Plain, sandy surfaces prone to crusting limited estimates of sand and clay content via partial least squares and regression with common band ratios. Estimates of iron oxide content were a function of mineralogy and best accomplished using specific band ratios, with regression explaining 36-65% of the variability at the Valley and Coastal Plain sites, respectively.
Impact of multicollinearity on small sample hydrologic regression models
NASA Astrophysics Data System (ADS)
Kroll, Charles N.; Song, Peter
2013-06-01
Often hydrologic regression models are developed with ordinary least squares (OLS) procedures. The use of OLS with highly correlated explanatory variables produces multicollinearity, which creates highly sensitive parameter estimators with inflated variances and improper model selection. It is not clear how to best address multicollinearity in hydrologic regression models. Here a Monte Carlo simulation is developed to compare four techniques to address multicollinearity: OLS, OLS with variance inflation factor screening (VIF), principal component regression (PCR), and partial least squares regression (PLS). The performance of these four techniques was observed for varying sample sizes, correlation coefficients between the explanatory variables, and model error variances consistent with hydrologic regional regression models. The negative effects of multicollinearity are magnified at smaller sample sizes, higher correlations between the variables, and larger model error variances (smaller R2). The Monte Carlo simulation indicates that if the true model is known, multicollinearity is present, and the estimation and statistical testing of regression parameters are of interest, then PCR or PLS should be employed. If the model is unknown, or if interest is solely in model predictions, it is recommended that OLS be employed, since using more complicated techniques did not produce any improvement in model performance. A leave-one-out cross-validation case study was also performed using low-streamflow data sets from the eastern United States. Results indicate that OLS with stepwise selection generally produces models across study regions with varying levels of multicollinearity that are as good as biased regression techniques such as PCR and PLS.
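As a concrete illustration of one of the four techniques, the sketch below screens explanatory variables by variance inflation factor before an OLS fit, assuming statsmodels; the basin characteristics are synthetic stand-ins and the VIF < 10 cutoff is a common rule of thumb, not a value from the simulation study.

```python
# Minimal sketch of VIF screening before an OLS fit.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(6)
n = 80
area = rng.lognormal(size=n)                       # e.g. drainage area (synthetic)
precip = 0.9 * area + 0.1 * rng.normal(size=n)     # strongly correlated with area
slope = rng.normal(size=n)                         # roughly independent predictor
y = 2.0 * area + rng.normal(size=n)

X = sm.add_constant(np.column_stack([area, precip, slope]))
cols = [1, 2, 3]                                   # candidate predictor columns (0 is the constant)
while True:
    sub = X[:, [0] + cols]
    vifs = {c: variance_inflation_factor(sub, j + 1) for j, c in enumerate(cols)}
    worst = max(vifs, key=vifs.get)
    if vifs[worst] < 10:                           # all remaining predictors pass the screen
        break
    cols.remove(worst)                             # drop the most collinear predictor and repeat

fit = sm.OLS(y, X[:, [0] + cols]).fit()
print("kept columns:", cols, " coefficients:", np.round(fit.params, 2))
```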
Linear Least Squares for Correlated Data
NASA Technical Reports Server (NTRS)
Dean, Edwin B.
1988-01-01
Throughout the literature authors have consistently discussed the suspicion that regression results were less than satisfactory when the independent variables were correlated. Camm, Gulledge, and Womer, and Womer and Marcotte provide excellent applied examples of these concerns. Many authors have obtained partial solutions for this problem as discussed by Womer and Marcotte and Wonnacott and Wonnacott, which result in generalized least squares algorithms to solve restrictive cases. This paper presents a simple but relatively general multivariate method for obtaining linear least squares coefficients which are free of the statistical distortion created by correlated independent variables.
Yilmaz, Banu; Aras, Egemen; Nacar, Sinan; Kankal, Murat
2018-05-23
The functional life of a dam is often determined by the rate of sediment delivery to its reservoir. Therefore, an accurate estimate of the sediment load in rivers with dams is essential for designing and predicting a dam's useful lifespan. The most credible method is direct measurement of sediment input, but this can be very costly and cannot always be implemented at all gauging stations. In this study, we tested various models for estimating suspended sediment load (SSL) at two gauging stations on the Çoruh River in Turkey, including the artificial bee colony (ABC) and teaching-learning-based optimization (TLBO) algorithms and multivariate adaptive regression splines (MARS). These models were also compared with one another and with classical regression analyses (CRA). Streamflow values and previously collected SSL data were used as model inputs, with predicted SSL as the output. Two different training and testing dataset configurations were used to reinforce model accuracy. For the MARS method, the root mean square error was found to range between 35% and 39% for the two gauging stations in the test set, lower than the errors of the other models. Error values were even lower (7% to 15%) using another dataset. Our results indicate that simultaneous measurements of streamflow with SSL provide the most effective parameter for obtaining accurate predictive models and that MARS is the most accurate model for predicting SSL. Copyright © 2017 Elsevier B.V. All rights reserved.
Wind Tunnel Strain-Gage Balance Calibration Data Analysis Using a Weighted Least Squares Approach
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Volden, T.
2017-01-01
A new approach is presented that uses a weighted least squares fit to analyze wind tunnel strain-gage balance calibration data. The weighted least squares fit is specifically designed to increase the influence of single-component loadings during the regression analysis. The weighted least squares fit also reduces the impact of calibration load schedule asymmetries on the predicted primary sensitivities of the balance gages. A weighting factor between zero and one is assigned to each calibration data point that depends on a simple count of its intentionally loaded load components or gages. The greater the number of a data point's intentionally loaded load components or gages is, the smaller its weighting factor becomes. The proposed approach is applicable to both the Iterative and Non-Iterative Methods that are used for the analysis of strain-gage balance calibration data in the aerospace testing community. The Iterative Method uses a reasonable estimate of the tare corrected load set as input for the determination of the weighting factors. The Non-Iterative Method, on the other hand, uses gage output differences relative to the natural zeros as input for the determination of the weighting factors. Machine calibration data of a six-component force balance is used to illustrate benefits of the proposed weighted least squares fit. In addition, a detailed derivation of the PRESS residuals associated with a weighted least squares fit is given in the appendices of the paper as this information could not be found in the literature. These PRESS residuals may be needed to evaluate the predictive capabilities of the final regression models that result from a weighted least squares fit of the balance calibration data.
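A generic weighted least squares sketch with statsmodels is given below; the 1/(number of intentionally loaded components) weighting is one simple choice that echoes the weighting idea described above, not the paper's actual factors, and the calibration data are synthetic.

```python
# Weighted least squares where points with fewer loaded components receive larger weight.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 40
load = rng.uniform(0.0, 100.0, n)                        # applied calibration load (synthetic)
n_loaded = rng.integers(1, 4, n)                         # number of intentionally loaded components
gage_output = 0.05 * load + rng.normal(scale=0.1 * n_loaded, size=n)

weights = 1.0 / n_loaded                                 # single-component points get weight 1
X = sm.add_constant(load)
wls_fit = sm.WLS(gage_output, X, weights=weights).fit()
ols_fit = sm.OLS(gage_output, X).fit()
print("WLS sensitivity:", round(wls_fit.params[1], 4),
      " OLS sensitivity:", round(ols_fit.params[1], 4))
```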
America's Democracy Colleges: The Civic Engagement of Community College Students
ERIC Educational Resources Information Center
Angeli Newell, Mallory
2014-01-01
This study explored the civic engagement of current two- and four-year students to explore whether differences exist between the groups and what may explain the differences. Using binary logistic regression and Ordinary Least Squares regression it was found that community-based engagement was lower for two- than four-year students, though…
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method
ERIC Educational Resources Information Center
Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Sülzle, Detlev
2018-01-01
The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
Robust Regression for Slope Estimation in Curriculum-Based Measurement Progress Monitoring
ERIC Educational Resources Information Center
Mercer, Sterett H.; Lyons, Alina F.; Johnston, Lauren E.; Millhoff, Courtney L.
2015-01-01
Although ordinary least-squares (OLS) regression has been identified as a preferred method to calculate rates of improvement for individual students during curriculum-based measurement (CBM) progress monitoring, OLS slope estimates are sensitive to the presence of extreme values. Robust estimators have been developed that are less biased by…
Pick Your Poisson: A Tutorial on Analyzing Counts of Student Victimization Data
ERIC Educational Resources Information Center
Huang, Francis L.; Cornell, Dewey G.
2012-01-01
School violence research is often concerned with infrequently occurring events such as counts of the number of bullying incidents or fights a student may experience. Analyzing count data using ordinary least squares regression may produce improbable predicted values, and as a result of regression assumption violations, result in higher Type I…
Causal Models with Unmeasured Variables: An Introduction to LISREL.
ERIC Educational Resources Information Center
Wolfle, Lee M.
Whenever one uses ordinary least squares regression, one is making an implicit assumption that all of the independent variables have been measured without error. Such an assumption is obviously unrealistic for most social data. One approach for estimating such regression models is to measure implied coefficients between latent variables for which…
Early Home Activities and Oral Language Skills in Middle Childhood: A Quantile Analysis
ERIC Educational Resources Information Center
Law, James; Rush, Robert; King, Tom; Westrupp, Elizabeth; Reilly, Sheena
2018-01-01
Oral language development is a key outcome of elementary school, and it is important to identify factors that predict it most effectively. Commonly researchers use ordinary least squares regression with conclusions restricted to average performance conditional on relevant covariates. Quantile regression offers a more sophisticated alternative.…
On the null distribution of Bayes factors in linear regression
USDA-ARS?s Scientific Manuscript database
We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...
Partial least squares for efficient models of fecal indicator bacteria on Great Lakes beaches
Brooks, Wesley R.; Fienen, Michael N.; Corsi, Steven R.
2013-01-01
At public beaches, it is now common to mitigate the impact of water-borne pathogens by posting a swimmer's advisory when the concentration of fecal indicator bacteria (FIB) exceeds an action threshold. Since culturing the bacteria delays public notification when dangerous conditions exist, regression models are sometimes used to predict the FIB concentration based on readily-available environmental measurements. It is hard to know which environmental parameters are relevant to predicting FIB concentration, and the parameters are usually correlated, which can hurt the predictive power of a regression model. Here the method of partial least squares (PLS) is introduced to automate the regression modeling process. Model selection is reduced to the process of setting a tuning parameter to control the decision threshold that separates predicted exceedances of the standard from predicted non-exceedances. The method is validated by application to four Great Lakes beaches during the summer of 2010. Performance of the PLS models compares favorably to that of the existing state-of-the-art regression models at these four sites.
Marabel, Miguel; Alvarez-Taboada, Flor
2013-01-01
Aboveground biomass (AGB) is one of the strategic biophysical variables of interest in vegetation studies. The main objective of this study was to evaluate the Support Vector Machine (SVM) and Partial Least Squares Regression (PLSR) for estimating the AGB of grasslands from field spectrometer data and to find out which data pre-processing approach was the most suitable. The most accurate model to predict the total AGB involved PLSR and the Maximum Band Depth index derived from the continuum removed reflectance in the absorption features between 916–1,120 nm and 1,079–1,297 nm (R2 = 0.939, RMSE = 7.120 g/m2). Regarding the green fraction of the AGB, the Area Over the Minimum index derived from the continuum removed spectra provided the most accurate model overall (R2 = 0.939, RMSE = 3.172 g/m2). Identifying the appropriate absorption features was proved to be crucial to improve the performance of PLSR to estimate the total and green aboveground biomass, by using the indices derived from those spectral regions. Ordinary Least Square Regression could be used as a surrogate for the PLSR approach with the Area Over the Minimum index as the independent variable, although the resulting model would not be as accurate. PMID:23925082
Feng, Yao-Ze; Elmasry, Gamal; Sun, Da-Wen; Scannell, Amalia G M; Walsh, Des; Morcy, Noha
2013-06-01
Bacterial pathogens are the main culprits for outbreaks of food-borne illnesses. This study aimed to use the hyperspectral imaging technique as a non-destructive tool for quantitative and direct determination of Enterobacteriaceae loads on chicken fillets. Partial least squares regression (PLSR) models were established, and the best model using full wavelengths was obtained in the spectral range 930-1450 nm, with coefficients of determination R(2) ≥ 0.82 and root mean squared errors (RMSEs) ≤ 0.47 log(10)CFUg(-1). In further development of simplified models, second derivative spectra and weighted PLS regression coefficients (BW) were utilised to select important wavelengths. The three wavelengths (930, 1121 and 1345 nm) selected from BW were adequate and preferred for predicting Enterobacteriaceae loads, with R(2) of 0.89, 0.86 and 0.87 and RMSEs of 0.33, 0.40 and 0.45 log(10)CFUg(-1) for calibration, cross-validation and prediction, respectively. In addition, the constructed prediction map showed the distribution of Enterobacteriaceae on chicken fillets, which cannot be achieved by conventional methods. It was demonstrated that hyperspectral imaging is a potential tool for determining food sanitation and detecting bacterial pathogens on a food matrix without complicated laboratory regimes. Copyright © 2012 Elsevier Ltd. All rights reserved.
Kuriakose, Saji; Joe, I Hubert
2013-11-01
Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC=0.00009% v/v). The lowest root mean square error of prediction (RMSEP=0.00016% v/v) in the test set and the highest coefficient of determination (R(2)=0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model. Copyright © 2013 Elsevier B.V. All rights reserved.
Nonlinear least-squares data fitting in Excel spreadsheets.
Kemmer, Gerdi; Keller, Sandro
2010-02-01
We describe an intuitive and rapid procedure for analyzing experimental data by nonlinear least-squares fitting (NLSF) in the most widely used spreadsheet program. Experimental data in x/y form and data calculated from a regression equation are inputted and plotted in a Microsoft Excel worksheet, and the sum of squared residuals is computed and minimized using the Solver add-in to obtain the set of parameter values that best describes the experimental data. The confidence of best-fit values is then visualized and assessed in a generally applicable and easily comprehensible way. Every user familiar with the most basic functions of Excel will be able to implement this protocol, without previous experience in data fitting or programming and without additional costs for specialist software. The application of this tool is exemplified using the well-known Michaelis-Menten equation characterizing simple enzyme kinetics. Only slight modifications are required to adapt the protocol to virtually any other kind of dataset or regression equation. The entire protocol takes approximately 1 h.
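The same Michaelis-Menten example can also be fitted outside the spreadsheet; the sketch below uses scipy.optimize.curve_fit on synthetic data and is offered only as a cross-check of the Solver protocol, not as part of it.

```python
# Nonlinear least-squares fit of the Michaelis-Menten equation with curve_fit.
import numpy as np
from scipy.optimize import curve_fit

def michaelis_menten(s, vmax, km):
    return vmax * s / (km + s)

rng = np.random.default_rng(10)
substrate = np.array([0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0])   # substrate concentrations
rate = michaelis_menten(substrate, 2.0, 3.0) + rng.normal(scale=0.05, size=substrate.size)

params, cov = curve_fit(michaelis_menten, substrate, rate, p0=[1.0, 1.0])
vmax, km = params
stderr = np.sqrt(np.diag(cov))                                  # rough parameter uncertainties
print(f"Vmax = {vmax:.2f} +/- {stderr[0]:.2f},  Km = {km:.2f} +/- {stderr[1]:.2f}")
```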
Extension of the Haseman-Elston regression model to longitudinal data.
Won, Sungho; Elston, Robert C; Park, Taesung
2006-01-01
We propose an extension to longitudinal data of the Haseman and Elston regression method for linkage analysis. The proposed model is a mixed model having several random effects. As response variable, we investigate the sibship sample mean corrected cross-product (smHE) and the BLUP-mean corrected cross product (pmHE), comparing them with the original squared difference (oHE), the overall mean corrected cross-product (rHE), and the weighted average of the squared difference and the squared mean-corrected sum (wHE). The proposed model allows for the correlation structure of longitudinal data. Also, the model can test for gene x time interaction to discover genetic variation over time. The model was applied in an analysis of the Genetic Analysis Workshop 13 (GAW13) simulated dataset for a quantitative trait simulating systolic blood pressure. Independence models did not preserve the test sizes, while the mixed models with both family and sibpair random effects tended to preserve size well. Copyright 2006 S. Karger AG, Basel.
Development of Jet Noise Power Spectral Laws Using SHJAR Data
NASA Technical Reports Server (NTRS)
Khavaran, Abbas; Bridges, James
2009-01-01
High quality jet noise spectral data measured at the Aeroacoustic Propulsion Laboratory at the NASA Glenn Research Center are used to examine a number of jet noise scaling laws. Configurations considered in the present study consist of convergent and convergent-divergent axisymmetric nozzles. Following the work of Viswanathan, velocity power factors are estimated using a least squares fit on spectral power density as a function of jet temperature and observer angle. The regression parameters are scrutinized for their uncertainty within the desired confidence margins. As an immediate application of the velocity power laws, spectral densities in supersonic jets are decomposed into their respective components attributed to jet mixing noise and broadband shock-associated noise. Subsequent application of the least squares method to the shock power intensity shows that the latter also scales with some power of the shock parameter. A modified shock parameter is defined in order to reduce the dependency of the regression factors on the nozzle design point within the uncertainty margins of the least squares method.
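To show the basic mechanics of estimating a velocity power factor, the sketch below performs a least squares fit in log space on synthetic data; the classical U^8 value is used only as a reference exponent, and nothing here reproduces the SHJAR analysis.

```python
# Power-law exponent by linear least squares on log-transformed data: P ~ U^n.
import numpy as np

rng = np.random.default_rng(8)
velocity = np.linspace(100.0, 500.0, 30)                  # jet velocity, m/s (synthetic)
reference_exponent = 8.0                                  # classical U^8 scaling, for reference
power = 1e-12 * velocity ** reference_exponent * rng.lognormal(sigma=0.2, size=velocity.size)

# log P = n log U + c, so a straight-line fit in log space recovers the exponent n
n_hat, c_hat = np.polyfit(np.log(velocity), np.log(power), deg=1)
print(f"estimated velocity power factor: {n_hat:.2f}")
```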
Li, Guoqiang; Niu, Peifeng; Wang, Huaibao; Liu, Yongchao
2014-03-01
This paper presents a novel artificial neural network with a very fast learning speed, all of whose weights and biases are determined by applying the least squares method twice, hence the name Least Square Fast Learning Network (LSFLN). A further difference from conventional neural networks is that the output neurons of the LSFLN not only receive information from the hidden-layer neurons but also receive the external input directly from the input neurons. In order to test the validity of the LSFLN, it is applied to 6 classical regression applications and also employed to build the functional relation between the combustion efficiency and operating parameters of a 300 MW coal-fired boiler. Experimental results show that, compared with other methods, the LSFLN achieves much better regression precision and generalization ability with far fewer hidden neurons and at a much faster learning speed. Copyright © 2013 Elsevier Ltd. All rights reserved.
Techniques for estimating magnitude and frequency of peak flows for Pennsylvania streams
Stuckey, Marla H.; Reed, Lloyd A.
2000-01-01
Regression equations for estimating the magnitude and frequency of floods on ungaged streams in Pennsylvania with drainage areas less than 2,000 square miles were developed on the basis of peak-flow data collected at 313 streamflow-gaging stations. All streamflow-gaging stations used in the development of the equations had 10 or more years of record and include active and discontinued continuous-record and crest-stage partial-record streamflow-gaging stations. Regional regression equations were developed for flood flows expected every 10, 25, 50, 100, and 500 years by the use of a weighted multiple linear regression model. The State was divided into two regions. The largest region, Region A, encompasses about 78 percent of Pennsylvania. The smaller region, Region B, includes only the northwestern part of the State. Basin characteristics used in the regression equations for Region A are drainage area, percentage of forest cover, percentage of urban development, percentage of basin underlain by carbonate bedrock, and percentage of basin controlled by lakes, swamps, and reservoirs. Basin characteristics used in the regression equations for Region B are drainage area and percentage of basin controlled by lakes, swamps, and reservoirs. The coefficient of determination (R2) values for the five flood-frequency equations for Region A range from 0.93 to 0.82, and for Region B, the range is from 0.96 to 0.89. While the regression equations can be used to predict the magnitude and frequency of peak flows for most streams in the State, they should not be used for streams with drainage areas greater than 2,000 square miles or less than 1.5 square miles, for streams that drain extensively mined areas, or for stream reaches immediately below flood-control reservoirs. In addition, the equations presented for Region B should not be used if the stream drains a basin with more than 5 percent urban development.
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients
NASA Astrophysics Data System (ADS)
Gorgees, Hazim Mansoor; Mahdi, Fatimah Assim
2018-05-01
This article is concerned with comparing the performance of different types of ordinary ridge regression estimators that have already been proposed for estimating the regression parameters when near-exact linear relationships among the explanatory variables are present. For this situation we employ data obtained from the Tagi gas filling company during the period 2008-2010. The main result is that the method based on the condition number performs better than the other methods, since it has a smaller mean square error (MSE) than the other stated methods.
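A minimal ridge regression sketch on synthetic, nearly collinear data follows, assuming scikit-learn; here the ridge parameter is chosen by cross-validated mean squared error, which is a different criterion from the condition-number rule favored above but illustrates the same bias-variance trade-off.

```python
# Ridge versus OLS on near-collinear predictors, compared by cross-validated MSE.
import numpy as np
from sklearn.linear_model import RidgeCV, LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(11)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)                      # near-exact linear relationship
X = np.column_stack([x1, x2])
y = 3.0 * x1 + 1.0 * x2 + rng.normal(size=n)

ridge = RidgeCV(alphas=np.logspace(-3, 3, 25)).fit(X, y)
mse_ridge = -cross_val_score(ridge, X, y, cv=5, scoring="neg_mean_squared_error").mean()
mse_ols = -cross_val_score(LinearRegression(), X, y, cv=5, scoring="neg_mean_squared_error").mean()
print(f"chosen alpha = {ridge.alpha_:.3g},  ridge MSE = {mse_ridge:.3f},  OLS MSE = {mse_ols:.3f}")
```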
Early Separation Incentives: An Analysis of Survey Data and Reenlistment Decision-Making Models
1993-05-01
[Extraction residue from report tables: Tables 8-10 present analyses of variance for COL by race and gender (enlisted personnel and commissioned officers), with columns SOURCE, DF, Type III SS, Mean Square, F Value, and Pr > F; in the recoverable rows neither SEX (F = 1.04, Pr > F = 0.3074) nor RACE (F = 0.64, Pr > F = 0.6362) was significant (R-Square = 0.004141).]
ERIC Educational Resources Information Center
Mooijaart, Ab; Satorra, Albert
2009-01-01
In this paper, we show that for some structural equation models (SEM), the classical chi-square goodness-of-fit test is unable to detect the presence of nonlinear terms in the model. As an example, we consider a regression model with latent variables and interactions terms. Not only the model test has zero power against that type of…
Lewis, Jason M.
2010-01-01
Peak-streamflow regression equations were determined for estimating flows with exceedance probabilities from 50 to 0.2 percent for the state of Oklahoma. These regression equations incorporate basin characteristics to estimate peak-streamflow magnitude and frequency throughout the state by use of a generalized least squares regression analysis. The most statistically significant independent variables required to estimate peak-streamflow magnitude and frequency for unregulated streams in Oklahoma are contributing drainage area, mean-annual precipitation, and main-channel slope. The regression equations are applicable for watersheds with drainage areas less than 2,510 square miles that are not affected by regulation. The resulting regression equations had a standard model error ranging from 31 to 46 percent. Annual-maximum peak flows observed at 231 streamflow-gaging stations through water year 2008 were used for the regression analysis. Gage peak-streamflow estimates were used from previous work unless 2008 gaging-station data were available, in which case new peak-streamflow estimates were calculated. The U.S. Geological Survey StreamStats web application was used to obtain the independent variables required for the peak-streamflow regression equations. Limitations on the use of the regression equations and the reliability of regression estimates for natural unregulated streams are described. Log-Pearson Type III analysis information, basin and climate characteristics, and the peak-streamflow frequency estimates for the 231 gaging stations in and near Oklahoma are listed. Methodologies are presented to estimate peak streamflows at ungaged sites by using estimates from gaging stations on unregulated streams. For ungaged sites on urban streams and streams regulated by small floodwater retarding structures, an adjustment of the statewide regression equations for natural unregulated streams can be used to estimate peak-streamflow magnitude and frequency.
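The following Python sketch illustrates, with invented station values, the log10-transform-and-regress pattern described above; it uses weighted least squares with record length as the weight as a crude stand-in for the generalized least squares procedure, so it is a schematic rather than the study's actual model:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Tiny, purely hypothetical gaging-station data set.
data = pd.DataFrame({
    "peak_100yr_cfs": [12000.0, 850.0, 43000.0, 3100.0, 190.0],
    "drainage_area_sqmi": [310.0, 12.5, 1480.0, 88.0, 2.1],
    "precip_in": [38.0, 30.0, 44.0, 35.0, 28.0],
    "slope_ft_per_mi": [9.5, 42.0, 3.8, 18.0, 95.0],
    "years_of_record": [62, 18, 75, 34, 12],
})

# Log10-transform the response and the basin characteristics.
y = np.log10(data["peak_100yr_cfs"])
X = sm.add_constant(np.log10(
    data[["drainage_area_sqmi", "precip_in", "slope_ft_per_mi"]]))

# Weighted least squares, weighting stations by record length as a rough
# surrogate for the generalized least squares weighting used in such studies.
fit = sm.WLS(y, X, weights=data["years_of_record"]).fit()
print(fit.params)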
Patterson, P Daniel; Moore, Charity G; Sanddal, Nels D; Wingrove, Gary; LaCroix, Brian
2009-01-01
The primary purpose of this study was to characterize job satisfaction with opportunities for advancement, job satisfaction with pay and benefits, and intent to leave the EMS profession among Nationally Registered EMT-Basics and EMT-Paramedics. A secondary data analysis was performed on the National Registry of EMTs Longitudinal Emergency Medical Technician Attributes and Demographic Study Project (LEADS) 2005 core survey. We used chi-square and multiple logistic regression analyses to test for differences in job satisfaction with opportunities for advancement, job satisfaction with pay and benefits, and intent to leave the EMS profession across years of experience and work location. Among 11 measures of job satisfaction, NREMT-Basics and NREMT-Paramedics were least satisfied with opportunities for advancement and with pay and benefits (67.8% and 55.2%, respectively). Nearly 6% of respondents reported intentions of leaving the profession within 12 months. In univariate analyses, job satisfaction with advancement opportunities varied across years of experience and work location, as did job satisfaction with pay and benefits. The proportion reporting intentions of leaving the profession did not vary across the two independent variables of interest. In multivariable logistic regression, the statistical differences observed in univariate analyses were attenuated to non-significance across all outcome models. Income, personal health, level of EMS certification, and type of EMS work were significant in several outcome models. EMS workforce research is in its infancy, so our study adds to a limited but growing body of knowledge. Future and replication research will need to consider different personal and organizational variables in predicting different measures of job satisfaction among EMS personnel.
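A minimal Python sketch of the two analysis steps named above, a chi-square test on a cross-tabulation followed by a multivariable logistic regression, run on invented respondent-level data; the variable names are hypothetical and do not come from the LEADS survey:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "intent_to_leave": rng.integers(0, 2, n),
    "years_experience": rng.choice(["0-4", "5-9", "10+"], n),
    "rural": rng.integers(0, 2, n),
    "income": rng.normal(45, 10, n),
})

# Univariate step: does intent to leave vary across experience categories?
table = pd.crosstab(df["years_experience"], df["intent_to_leave"])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")

# Multivariable step: logistic regression adjusting for work location and income.
model = smf.logit("intent_to_leave ~ C(years_experience) + rural + income",
                  data=df).fit(disp=0)
print(model.summary())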
Liu, Bin; Geng, Huizhen; Yang, Juan; Zhang, Ying; Deng, Langhui; Chen, Weiqing; Wang, Zilian
2016-03-17
Hyperlipidemia and high fasting plasma glucose levels at the first prenatal visit (First Visit FPG) are both related to gestational diabetes mellitus, maternal obesity/overweight, and fetal overgrowth. The purpose of the present study is to investigate the correlation between First Visit FPG and lipid concentrations, and their potential association with offspring size at delivery. Pregnant women who received regular prenatal care and delivered in our center in 2013 were recruited for the study. Fasting plasma glucose levels were tested at the first prenatal visit (First Visit FPG) and prior to delivery (Before Delivery FPG). HbA1c and lipid profiles were examined at the time of the OGTT. Maternal and neonatal clinical data were collected for analysis. Data were analyzed by independent-sample t test, Pearson correlation, and chi-square test, followed by partial correlation and multiple linear regression analyses to confirm associations. The statistical significance level was α = 0.05. Analyses were based on 1546 mother-baby pairs. First Visit FPG was not correlated with any lipid parameters after adjusting for maternal pregravid BMI, maternal age, and gestational age at First Visit FPG. HbA1c was positively correlated with triglyceride and Apolipoprotein B in the whole cohort and in the NGT group after adjusting for maternal age and maternal BMI at the OGTT. Multiple linear regression analyses showed that neonatal birth weight, head circumference, and shoulder circumference were all associated with First Visit FPG and triglyceride levels. Fasting plasma glucose at the first prenatal visit is not associated with lipid concentrations in mid-pregnancy, but may influence fetal growth together with triglyceride concentration.
Mechanisms behind the estimation of photosynthesis traits from leaf reflectance observations
NASA Astrophysics Data System (ADS)
Dechant, Benjamin; Cuntz, Matthias; Doktor, Daniel; Vohland, Michael
2016-04-01
Many studies have investigated the reflectance-based estimation of leaf chlorophyll, water, and dry matter contents of plants. However, only a few studies have focused on photosynthesis traits. The maximum potential uptake of carbon dioxide under given environmental conditions is determined mainly by RuBisCO activity, which limits carboxylation, or by the speed of photosynthetic electron transport. These two main limitations are represented by the maximum carboxylation capacity, Vcmax,25, and the maximum electron transport rate, Jmax,25. These traits have been estimated from leaf reflectance before, but the mechanisms underlying the estimation remain rather speculative. The aim of this study was therefore to reveal the mechanisms behind reflectance-based estimation of Vcmax,25 and Jmax,25. Leaf reflectance, photosynthetic response curves, nitrogen content per area, Narea, and leaf mass per area, LMA, were measured on 37 deciduous tree species. Vcmax,25 and Jmax,25 were determined from the response curves. Partial least squares (PLS) regression models for the two photosynthesis traits Vcmax,25 and Jmax,25 as well as for Narea and LMA were studied using a cross-validation approach. Analyses of linear regression models based on Narea and other leaf traits estimated via PROSPECT inversion, PLS regression coefficients, and model residuals were conducted in order to reveal the mechanisms behind the reflectance-based estimation. We found that Vcmax,25 and Jmax,25 can be estimated from leaf reflectance with good to moderate accuracy for a large number of species and different light conditions. The dominant mechanism behind the estimations was the strong relationship between photosynthesis traits and leaf nitrogen content. This was concluded from very strong relationships between the PLS regression coefficients, the model residuals, and the prediction performance of Narea-based linear regression models compared with PLS regression models. While the PLS regression model for Vcmax,25 was fully based on the correlation to Narea, the PLS regression model for Jmax,25 was not entirely based on it. Analyses of the contributions of different parts of the reflectance spectrum revealed that the information contributing to the Jmax,25 PLS regression model in addition to the main source of information, Narea, was located mainly in the visible part of the spectrum (500-900 nm). Estimated chlorophyll content could be excluded as a potential source of this extra information. The PLS regression coefficients of the Jmax,25 model indicated possible contributions from chlorophyll fluorescence and cytochrome f content. In summary, we found that the main mechanism behind the estimation of Vcmax,25 and Jmax,25 from leaf reflectance observations is the correlation to Narea, but that there is additional information related to Jmax,25, mainly in the visible part of the spectrum.
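A minimal Python sketch of a PLS regression with cross-validated predictions, in the spirit of the approach described above; the reflectance matrix and trait values are randomly generated stand-ins rather than measured data:

import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import r2_score

# Hypothetical leaf reflectance (rows = leaves, columns = wavelengths) and a
# photosynthesis trait such as Vcmax,25; real values would be measured.
rng = np.random.default_rng(42)
n_leaves, n_bands = 120, 400
reflectance = rng.normal(size=(n_leaves, n_bands))
vcmax25 = reflectance[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=n_leaves)

# PLS regression with 10-fold cross-validated predictions to gauge accuracy.
pls = PLSRegression(n_components=8)
pred = cross_val_predict(pls, reflectance, vcmax25, cv=10)
print("cross-validated R^2:", r2_score(vcmax25, pred))

# The fitted coefficient vector shows which spectral regions drive the model.
pls.fit(reflectance, vcmax25)
print("coefficient array shape:", pls.coef_.shape)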
Roland, Mark A.; Stuckey, Marla H.
2008-01-01
Regression equations were developed for estimating flood flows at selected recurrence intervals for ungaged streams in Pennsylvania with drainage areas less than 2,000 square miles. These equations were developed utilizing peak-flow data from 322 streamflow-gaging stations within Pennsylvania and surrounding states. All stations used in the development of the equations had 10 or more years of record and included active and discontinued continuous-record as well as crest-stage partial-record stations. The state was divided into four regions, and regional regression equations were developed to estimate the 2-, 5-, 10-, 50-, 100-, and 500-year recurrence-interval flood flows. The equations were developed by means of a regression analysis that utilized basin characteristics and flow data associated with the stations. Significant explanatory variables at the 95-percent confidence level for one or more regression equations included the following basin characteristics: drainage area; mean basin elevation; and the percentages of carbonate bedrock, urban area, and storage within a basin. The regression equations can be used to predict the magnitude of flood flows for specified recurrence intervals for most streams in the state; however, they are not valid for streams with drainage areas generally greater than 2,000 square miles or with substantial regulation, diversion, or mining activity within the basin. Estimates of flood-flow magnitude and frequency for streamflow-gaging stations substantially affected by upstream regulation are also presented.
ERIC Educational Resources Information Center
Chen, Sheng-Tung; Kuo, Hsiao-I.; Chen, Chi-Chung
2012-01-01
The two-stage least squares approach together with quantile regression analysis is adopted here to estimate the educational production function. Such a methodology is able to capture the extreme behaviors of the two tails of students' performance and the estimation outcomes have important policy implications. Our empirical study is applied to the…
Determination of cellulose I crystallinity by FT-Raman spectroscopy
Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph
2009-01-01
Two new methods based on FT-Raman spectroscopy, one simple, based on band intensity ratio, and the other, using a partial least-squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in semicrystalline cellulose I samples was determined based on univariate regression that was first developed using the...
Testing the Hypothesis of a Homoscedastic Error Term in Simple, Nonparametric Regression
ERIC Educational Resources Information Center
Wilcox, Rand R.
2006-01-01
Consider the nonparametric regression model Y = m(X)+ [tau](X)[epsilon], where X and [epsilon] are independent random variables, [epsilon] has a median of zero and variance [sigma][squared], [tau] is some unknown function used to model heteroscedasticity, and m(X) is an unknown function reflecting some conditional measure of location associated…
We present here the application of PLS regression to predicting surface water total phosphorous, total ammonia and Escherichia coli from landscape metrics. The amount of variability in surface water constituents explained by each model reflects the composition of the contributi...
ERIC Educational Resources Information Center
Rocconi, Louis M.
2011-01-01
Hierarchical linear models (HLM) solve the problems associated with the choice of the unit of analysis, such as misestimated standard errors, heterogeneity of regression, and aggregation bias, by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…
Antiretroviral drug costs and prescription patterns in British Columbia, Canada: 1996-2011.
Nosyk, Bohdan; Montaner, Julio S G; Yip, Benita; Lima, Viviane D; Hogg, Robert S
2014-04-01
Treatment options and therapeutic guidelines have evolved substantially since highly active antiretroviral treatment (HAART) became the standard of HIV care in 1996. We conducted the present population-based analysis to characterize the determinants of the direct costs of HAART over time in British Columbia, Canada. We considered individuals ever receiving HAART in British Columbia from 1996 to 2011. Linear mixed-effects regression models were constructed to determine the effects of demographic indicators, clinical stage, and treatment characteristics on quarterly costs of HAART (in 2010 $CDN) among individuals initiating in different temporal periods. The least-squares mean values were estimated by CD4 category and over time for each temporal cohort. Longitudinal data on HAART recipients (N = 9601, 17.6% female, mean age at initiation = 40.5) were analyzed. Multiple regression analyses identified demographics, treatment adherence, and pharmacological class to be independently associated with quarterly HAART costs. Higher CD4 cell counts were associated with modestly lower costs among pre-2000 HAART initiators [least-squares means (95% confidence interval), CD4 > 500: 4674 (4632-4716); CD4 350-499: 4765 (4721-4809); CD4 200-349: 4826 (4780-4871); CD4 < 200: 4809 (4759-4859)]; however, these differences were not significant among post-2003 HAART initiators. Population-level mean costs increased through 2006 and stabilized thereafter; post-2003 HAART initiators incurred quarterly costs up to 23% lower than pre-2000 HAART initiators in 2010. Our results highlight the magnitude of the temporal changes in HAART costs and the disparities between recent and pre-2000 HAART initiators. This methodology can improve the precision of economic modeling efforts by using detailed cost functions for annual, population-level medication costs according to the distribution of clients by clinical stage and era of treatment initiation.
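An illustrative Python sketch of a linear mixed-effects model with a random intercept per individual, the general model class named above; the quarterly cost data, covariates, and effect sizes are simulated placeholders, not the British Columbia data:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n_people, n_quarters = 200, 8
df = pd.DataFrame({
    "person_id": np.repeat(np.arange(n_people), n_quarters),
    "quarter": np.tile(np.arange(n_quarters), n_people),
    "cd4_under_200": rng.integers(0, 2, n_people * n_quarters),
    "adherent": rng.integers(0, 2, n_people * n_quarters),
})
person_effect = rng.normal(scale=300, size=n_people)
df["cost"] = (4700 + 100 * df["cd4_under_200"] - 150 * df["adherent"]
              + person_effect[df["person_id"]]
              + rng.normal(scale=200, size=len(df)))

# Linear mixed-effects model: fixed effects for clinical and treatment
# indicators plus time, and a random intercept for each individual.
model = smf.mixedlm("cost ~ cd4_under_200 + adherent + quarter",
                    data=df, groups=df["person_id"])
print(model.fit().summary())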
Megalamanegowdru, Jayachandra; Ankola, Anil V; Vathar, Jagadishchandra; Vishwakarma, Prashanthkumar; Dhanappa, Kirankumar B; Balappanavar, Aswini Y
2012-01-01
To assess and compare the periodontal health status among permanent residents of low, optimum, and high fluoride areas in Kolar District, India, a house-to-house survey was conducted in a population consisting of 925 permanent residents aged 35 to 44 years in three villages having different levels of fluoride concentration in the drinking water. The fluoride concentrations in the selected villages were 0.48 ppm (low), 1.03 ppm (optimum), and 3.21 ppm (high). The ion-selective electrode method was used to estimate the fluoride concentration in the drinking water. Periodontal status was assessed using the Community Periodontal Index (CPI) and loss of attachment (LOA). Results were analysed using the chi-square test, to find group differences, and logistic regression, to find associations between the variables. The overall prevalence of periodontitis was 72.9%; specifically, prevalences were 95.4%, 76.3%, and 45.7% in the low, optimum, and high fluoride areas, respectively. The number of sextants with shallow or deep pockets decreased (shallow pockets: 525, 438, 217; deep pockets: 183, 81, 34) from low to high fluoride areas (odds ratio: 71.3). The low fluoride area had a 7.9-fold higher risk of periodontitis than the optimum fluoride area and a 30-fold higher risk than the high fluoride area, which was highly significant (χ2 = 53.5, P < 0.0001 and χ2 = 192.8, P < 0.001, respectively). The severity of periodontal disease is inversely associated with the fluoride concentration in drinking water. This relation can inform approaches to fluoride treatment aimed at reducing the prevalence or incidence of the disease.
Avelino, Jacques; Cabut, Sandrine; Barboza, Bernardo; Barquero, Miguel; Alfaro, Ronny; Esquivel, César; Durand, Jean-François; Cilas, Christian
2007-12-01
ABSTRACT We monitored the development of American leaf spot of coffee, a disease caused by the gemmiferous fungus Mycena citricolor, in 57 plots in Costa Rica for 1 or 2 years in order to gain a clearer understanding of conditions conducive to the disease and improve its control. During the investigation, characteristics of the coffee trees, crop management, and the environment were recorded. For the analyses, we used partial least-squares regression via the spline functions (PLSS), which is a nonlinear extension to partial least-squares regression (PLS). The fungus developed well in areas located between approximately 1,100 and 1,550 m above sea level. Slopes were conducive to its development, but eastern-facing slopes were less affected than the others, probably because they were more exposed to sunlight, especially in the rainy season. The distance between planting rows, the shade percentage, coffee tree height, the type of shade, and the pruning system explained disease intensity due to their effects on coffee tree shading and, possibly, on the humidity conditions in the plot. Forest trees and fruit trees intercropped with coffee provided particularly propitious conditions. Apparently, fertilization was unfavorable for the disease, probably due to dilution phenomena associated with faster coffee tree growth. Finally, series of wet spells interspersed with dry spells, which were frequent in the middle of the rainy season, were critical for the disease, probably because they affected the production and release of gemmae and their viability. These results could be used to draw up a map of epidemic risks taking topographical factors into account. To reduce those risks and improve chemical control, our results suggested that farmers should space planting rows further apart, maintain light shading in the plantation, and prune their coffee trees.
NASA Astrophysics Data System (ADS)
Salawu, Emmanuel Oluwatobi; Hesse, Evelyn; Stopford, Chris; Davey, Neil; Sun, Yi
2017-11-01
Better understanding and characterization of cloud particles, whose properties and distributions affect climate and weather, are essential for the understanding of present climate and climate change. Since imaging cloud probes have limitations of optical resolution, especially for small particles (with diameter < 25 μm), instruments like the Small Ice Detector (SID) probes, which capture high-resolution spatial light scattering patterns from individual particles down to 1 μm in size, have been developed. In this work, we propose a method using machine learning techniques to estimate simulated particles' orientation-averaged projected sizes (PAD) and aspect ratios from their 2D scattering patterns. The two-dimensional light scattering patterns (2DLSP) of hexagonal prisms are computed using the Ray Tracing with Diffraction on Facets (RTDF) model. The 2DLSP cover the same angular range as the SID probes. We generated 2DLSP for 162 hexagonal prisms at 133 orientations each. In a first step, the 2DLSP were transformed into rotation-invariant Zernike moments (ZMs), which are particularly suitable for analyses of pattern symmetry. We then used ZMs, summed intensities, and root mean square contrast as inputs to the machine learning methods. We created one random forests classifier for predicting prism orientation, 133 orientation-specific (OS) support vector classification models for predicting the prism aspect ratios, 133 OS support vector regression models for estimating prism sizes, and another 133 OS support vector regression (SVR) models for estimating the size PADs. We achieved a high accuracy of 0.99 in predicting prism aspect ratios and a low normalized mean square error of 0.004 for estimating particle size and size PADs.
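A minimal Python sketch of the classification-plus-regression pattern described above, using support vector models on a precomputed feature matrix; the features stand in for Zernike moments, summed intensities, and root mean square contrast and are randomly generated, so this is an illustration rather than the study's pipeline:

import numpy as np
from sklearn.svm import SVC, SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n_patterns, n_features = 1000, 50
features = rng.normal(size=(n_patterns, n_features))    # stand-in descriptors
aspect_ratio_class = rng.integers(0, 3, n_patterns)     # discretised aspect ratio
particle_size = features[:, :5].sum(axis=1) + rng.normal(scale=0.2, size=n_patterns)

X_tr, X_te, s_tr, s_te, c_tr, c_te = train_test_split(
    features, particle_size, aspect_ratio_class, random_state=0)

# Support vector classification for the aspect-ratio class and support vector
# regression for particle size, each on standardized features.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10)).fit(X_tr, c_tr)
reg = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10)).fit(X_tr, s_tr)
print("classification accuracy:", clf.score(X_te, c_te))
print("regression R^2:", reg.score(X_te, s_te))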
Sobouti, Farhad; Rakhshan, Vahid; Saravi, Mahdi Gholamrezaei; Zamanian, Ali; Shariati, Mahsa
2016-03-01
Traditional retainers (both metal and fiber-reinforced composite [FRC]) have limitations, and a retainer made from more flexible ligature wires might be advantageous. We aimed to compare an experimental design with two traditional retainers. In this prospective preliminary clinical trial, 150 post-treatment patients were enrolled and randomly divided into three groups of 50 patients each to receive mandibular canine-to-canine retainers made of FRC, flexible spiral wire (FSW), and twisted wire (TW). The patients were monitored monthly. The time at which the first signs of breakage/debonding were detected was recorded. The success rates of the retainers were compared using chi-squared, Kaplan-Meier, and Cox proportional-hazard regression analyses (α = 0.05). In total, 42 patients in the FRC group, 41 in the FSW group, and 45 in the TW group completed the study. The 2-year failure rates were 35.7% in the FRC group, 26.8% in the FSW group, and 17.8% in the TW group. These rates differed insignificantly (chi-squared p = 0.167). According to the Kaplan-Meier analysis, failure occurred at 19.95 months in the FRC group, 21.37 months in the FSW group, and 22.36 months in the TW group. The differences between the survival rates in the three groups were not significant (Cox regression p = 0.146). Although the failure rate of the experimental retainer was two times lower than that of the FRC retainer, the difference was not statistically significant. The experimental TW retainer was successful, and larger studies are warranted to verify these results.
A degree-day model of sheep grazing influence on alfalfa weevil and crop characteristics.
Goosey, Hayes B
2012-02-01
Domestic sheep (Ovis spp.) grazing is emerging as an integrated pest management tactic for alfalfa weevil, Hypera postica (Gyllenhal), management, and a degree-day model is needed as a decision-support tool. In response to this need, grazing exclosures with unique degree-days and stocking rates were established at weekly intervals in a central Montana alfalfa field during 2008 and 2009. Analyses indicate that increased stocking rates and grazing degree-days were associated with decreased crop levels of weevil larvae. Larval data collected from grazing treatments were regressed against on-site and near-site temperatures, which produced the same accuracy. The near-site model was chosen to encourage producer acceptance. The regression slope differed from zero, with an r2 of 0.83 and a root mean square error of 0.2. Crop data were collected to achieve optimal weevil management with forage quality and yield. Differences were recorded in crude protein, acid and neutral detergent fibers, total digestible nutrients, and mean stage by weight. Stem heights differed, with the higher stocking rates and degree-days producing the shortest alfalfa canopy height at harvest. The degree-day model was validated at four sites during 2010 with a mean square prediction error of 0.74. The recommendation from this research is to stock alfalfa fields in the spring before 63 DD with rates between 251 and 583 sheep days per hectare (d/ha). Sheep should be allowed to graze to a minimum of 106 and a maximum of 150 DD before removal. This model gives field entomologists a new method for implementing grazing in an integrated pest management program.
Haiduc, Adrian Marius; van Duynhoven, John
2005-02-01
The porous properties of food materials are known to determine important macroscopic parameters such as water-holding capacity and texture. In conventional approaches, understanding is built from a long process of establishing macrostructure-property relations in a rational manner. Only recently have multivariate approaches been introduced for the same purpose. The model systems used here are oil-in-water emulsions, stabilised by protein, which form complex structures consisting of fat droplets dispersed in a porous protein phase. NMR time-domain decay curves were recorded for emulsions with varied levels of fat, protein, and water. Hardness, dry matter content, and water drainage were determined by classical means and analysed for correlation with the NMR data using multivariate techniques. Partial least squares can calibrate and predict these properties directly from the continuous NMR exponential decays and yields regression coefficients higher than 82%. However, the calibration coefficients themselves belong to the continuous exponential domain and do little to explain the connection between NMR data and emulsion properties. Transformation of the NMR decays into a discrete domain with non-negative least squares permits the use of multilinear regression (MLR) on the resulting amplitudes as predictors and hardness or water drainage as responses. The MLR coefficients show that hardness is highly correlated with the components that have T2 distributions of about 20 and 200 ms, whereas water drainage is correlated with components that have T2 distributions around 400 and 1800 ms. These T2 distributions very likely correlate with water populations present in pores with different sizes and/or wall mobility. The results for the emulsions studied demonstrate that NMR time-domain decays can be employed to predict properties and to provide insight into the underlying microstructural features.
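The following Python sketch illustrates the general two-step idea described above: non-negative least squares fits each decay curve against a dictionary of exponentials to obtain a discrete T2 amplitude distribution, and a multilinear regression is then run on those amplitudes. The decay curves, T2 grid, and response values are invented, so this is a schematic rather than the study's calibration:

import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
t = np.linspace(0.001, 3.0, 300)                 # acquisition times, seconds
T2_grid = np.logspace(-3, 1, 40)                 # candidate T2 values, 1 ms to 10 s
kernel = np.exp(-t[:, None] / T2_grid[None, :])  # dictionary of exponentials

# Simulate decays for 15 hypothetical emulsions with two relaxation pools each.
n_samples = 15
pool_fast = rng.uniform(0.3, 0.7, n_samples)     # amplitude of the ~20 ms component
pool_slow = 1.0 - pool_fast                      # amplitude of the ~400 ms component
decays = (pool_fast[:, None] * np.exp(-t / 0.02)
          + pool_slow[:, None] * np.exp(-t / 0.4)
          + rng.normal(scale=0.002, size=(n_samples, t.size)))

# Step 1: non-negative least squares gives a discrete amplitude spectrum per sample.
amplitudes = np.array([nnls(kernel, d)[0] for d in decays])

# Step 2: multilinear regression of a property (toy "hardness") on the amplitudes.
hardness = 10 + 25 * pool_fast + rng.normal(scale=0.5, size=n_samples)
mlr = LinearRegression().fit(amplitudes, hardness)
print("R^2 of MLR on NNLS amplitudes:", mlr.score(amplitudes, hardness))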
González Costa, J J; Reigosa, M J; Matías, J M; Covelo, E F
2017-09-01
The aim of this study was to model the sorption and retention of Cd, Cu, Ni, Pb, and Zn in soils. To that end, the sorption and retention of these metals were studied and the soil characterization was performed separately. Multiple stepwise regression was used to produce multivariate models with linear techniques and with support vector machines, all of which included 15 explanatory variables characterizing the soils. When the R-squared values are compared, two different groups are noticed. Cr, Cu, and Pb sorption and retention show a higher R-squared, the most explanatory variables being humified organic matter, Al oxides and, in some cases, cation-exchange capacity (CEC). The other group of metals (Cd, Ni, and Zn) shows a lower R-squared, and clays are the most explanatory variables, including the percentages of vermiculite and slime. In some cases, quartz, plagioclase, or hematite percentages also show some explanatory capacity. Support vector machine (SVM) regression shows that the different models are not as regular as in multiple regression in terms of the number of variables, the regression for nickel adsorption being the one with the highest number of variables in its optimal model. On the other hand, there are cases where the most explanatory variables are the same for two metals, as happens with Cd and Cr adsorption; a similar adsorption mechanism is thus postulated. These patterns of the introduction of variables into the models allow us to create explainability sequences. Those most similar to the selectivity sequences obtained by Covelo (2005) are Mn oxides in multiple regression and exchange capacity in SVM. Among all the variables, the only one that is explanatory for all the metals after applying the maximum parsimony principle is the percentage of sand in the retention process. In the competitive model arising from the aforementioned sequences, the most intense competitiveness for the adsorption and retention of the different metals appears between Cr and Cd, Cu and Zn in multiple regression, and between Cr and Cd in SVM regression. Copyright © 2017 Elsevier B.V. All rights reserved.
Asquith, William H.; Roussel, Meghan C.
2007-01-01
Estimation of representative hydrographs from design storms, which are known as design hydrographs, provides for cost-effective, risk-mitigated design of drainage structures such as bridges, culverts, roadways, and other infrastructure. During 2001-07, the U.S. Geological Survey (USGS), in cooperation with the Texas Department of Transportation, investigated runoff hydrographs, design storms, unit hydrographs, and watershed-loss models to enhance design hydrograph estimation in Texas. Design hydrographs ideally should mimic the general volume, peak, and shape of observed runoff hydrographs. Design hydrographs commonly are estimated in part by unit hydrographs. A unit hydrograph is defined as the runoff hydrograph that results from a unit pulse of excess rainfall uniformly distributed over the watershed at a constant rate for a specific duration. A time-distributed, watershed-loss model is required for modeling by unit hydrographs. This report develops a specific time-distributed, watershed-loss model known as an initial-abstraction, constant-loss model. For this watershed-loss model, a watershed is conceptualized to have the capacity to store or abstract an absolute depth of rainfall at and near the beginning of a storm. Depths of total rainfall less than this initial abstraction do not produce runoff. The watershed also is conceptualized to have the capacity to remove rainfall at a constant rate (loss) after the initial abstraction is satisfied. Additional rainfall inputs after the initial abstraction is satisfied contribute to runoff if the rainfall rate (intensity) is larger than the constant loss. The initial-abstraction, constant-loss model thus is a two-parameter model. The initial-abstraction, constant-loss model is investigated through detailed computational and statistical analysis of observed rainfall and runoff data for 92 USGS streamflow-gaging stations (watersheds) in Texas with contributing drainage areas from 0.26 to 166 square miles. The analysis is limited to a previously described, watershed-specific, gamma distribution model of the unit hydrograph. In particular, the initial-abstraction, constant-loss model is tuned to the gamma distribution model of the unit hydrograph. A complex computational analysis of observed rainfall and runoff for the 92 watersheds was done to determine, by storm, optimal values of initial abstraction and constant loss. Optimal parameter values for a given storm were defined as those values that produced a modeled runoff hydrograph with volume equal to the observed runoff hydrograph and also minimized the residual sum of squares of the two hydrographs. Subsequently, the means of the optimal parameters were computed on a watershed-specific basis. These means for each watershed are considered the most representative, are tabulated, and are used in further statistical analyses. Statistical analyses of watershed-specific initial abstraction and constant loss include documentation of the distribution of each parameter using the generalized lambda distribution. The analyses show that watershed development has substantial influence on initial abstraction and limited influence on constant loss. The means and medians of the 92 watershed-specific parameters are tabulated with respect to watershed development; although they have considerable uncertainty, these parameters can be used for parameter prediction for ungaged watersheds.
The statistical analyses of watershed-specific, initial abstraction and constant loss also include development of predictive procedures for estimation of each parameter for ungaged watersheds. Both regression equations and regression trees for estimation of initial abstraction and constant loss are provided. The watershed characteristics included in the regression analyses are (1) main-channel length, (2) a binary factor representing watershed development, (3) a binary factor representing watersheds with an abundance of rocky and thin-soiled terrain, and (4) curve numb
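As an editorial illustration of the two-parameter watershed-loss model described above, the following Python sketch converts a hypothetical rainfall hyetograph into excess rainfall; the initial abstraction and constant loss values are placeholders, not the watershed-specific means tabulated in the report:

import numpy as np

def excess_rainfall(rain_depths_in, ia_in=1.0, constant_loss_in_per_step=0.25):
    # rain_depths_in: incremental rainfall depths per time step (inches)
    # ia_in: initial abstraction satisfied before any runoff occurs (inches)
    # constant_loss_in_per_step: loss removed each step once the abstraction is met
    remaining_ia = ia_in
    excess = np.zeros(len(rain_depths_in))
    for i, depth in enumerate(rain_depths_in):
        abstracted = min(depth, remaining_ia)   # fill the initial abstraction first
        remaining_ia -= abstracted
        after_ia = depth - abstracted
        # Runoff is produced only when the remaining intensity exceeds the constant loss.
        excess[i] = max(after_ia - constant_loss_in_per_step, 0.0)
    return excess

storm = np.array([0.1, 0.4, 0.9, 0.6, 0.2, 0.05])  # inches per time step (hypothetical)
print(excess_rainfall(storm))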
Visentin, G; Penasa, M; Gottardo, P; Cassandro, M; De Marchi, M
2016-10-01
Milk minerals and coagulation properties are important for both consumers and processors, and they can help increase the added value of milk. However, large-scale monitoring of these traits is hampered by expensive and time-consuming reference analyses. The objective of the present study was to develop prediction models for major mineral contents (Ca, K, Mg, Na, and P) and milk coagulation properties (MCP: rennet coagulation time, curd-firming time, and curd firmness) using mid-infrared spectroscopy. Individual milk samples (n=923) of Holstein-Friesian, Brown Swiss, Alpine Grey, and Simmental cows were collected from single-breed herds between January and December 2014. Reference analysis for the determination of both mineral contents and MCP was undertaken with standardized methods. For each milk sample, the mid-infrared spectrum in the range from 900 to 5,000 cm(-1) was stored. Prediction models were calibrated using partial least squares regression coupled with a wavenumber selection technique called uninformative variable elimination, to improve model accuracy, and were validated both internally and externally. The average reduction in the number of wavenumbers used in partial least squares regression was 80%, accompanied by an average increase of 20% in the explained variance in external validation. The proportion of explained variance in external validation was about 70% for P, K, Ca, and Mg, and it was lower (40%) for Na. Milk coagulation property prediction models explained between 54% (rennet coagulation time) and 56% (curd-firming time) of the total variance in external validation. The ratio of the standard deviation of each trait to the respective root mean square error of prediction, which is an indicator of the predictive ability of an equation, suggested that the developed models might be effective for screening and collection of milk minerals and coagulation properties at the population level. Although the prediction equations were not accurate enough to be proposed for analytic purposes, mid-infrared spectroscopy predictions could be evaluated as phenotypic information to genetically improve milk minerals and MCP on a large scale. Copyright © 2016 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
Ludbrook, John
2010-07-01
1. There are two reasons for wanting to compare measurers or methods of measurement. One is to calibrate one method or measurer against another; the other is to detect bias. Fixed bias is present when one method gives higher (or lower) values across the whole range of measurement. Proportional bias is present when one method gives values that diverge progressively from those of the other. 2. Linear regression analysis is a popular method for comparing methods of measurement, but the familiar ordinary least squares (OLS) method is rarely acceptable. The OLS method requires that the x values are fixed by the design of the study, whereas it is usual that both y and x values are free to vary and are subject to error. In this case, special regression techniques must be used. 3. Clinical chemists favour techniques such as major axis regression ('Deming's method'), the Passing-Bablok method or the bivariate least median squares method. Other disciplines, such as allometry, astronomy, biology, econometrics, fisheries research, genetics, geology, physics and sports science, have their own preferences. 4. Many Monte Carlo simulations have been performed to try to decide which technique is best, but the results are almost uninterpretable. 5. I suggest that pharmacologists and physiologists should use ordinary least products regression analysis (geometric mean regression, reduced major axis regression): it is versatile, can be used for calibration or to detect bias and can be executed by hand-held calculator or by using the loss function in popular, general-purpose, statistical software.
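A minimal Python sketch of ordinary least products (geometric mean, reduced major axis) regression as characterized above: the slope is sign(r) * SD(y)/SD(x) and the line passes through the means. The paired measurements are simulated, and in practice confidence intervals (for example by bootstrap) would be needed to judge fixed and proportional bias:

import numpy as np

def least_products_regression(x, y):
    # Ordinary least products (geometric mean / reduced major axis) regression.
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)
    intercept = np.mean(y) - slope * np.mean(x)
    return slope, intercept

rng = np.random.default_rng(5)
true = rng.uniform(50, 150, 40)
method_a = true + rng.normal(scale=5, size=40)
method_b = 1.05 * true + 2.0 + rng.normal(scale=5, size=40)  # proportional + fixed bias

slope, intercept = least_products_regression(method_a, method_b)
print(f"slope = {slope:.3f}, intercept = {intercept:.2f}")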
NASA Astrophysics Data System (ADS)
Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said
2014-09-01
In the presence of multicollinearity and multiple outliers, statistical inference for the linear regression model using ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated, including robust methods, which are reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach was examined when multicollinearity and multiple outliers occurred simultaneously in multiple linear regression. This study aimed to examine the performance, in such a situation, of several well-known robust estimators: M, MM, RIDGE, and the robust ridge regression estimators, namely the Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM), and Ridge MM (RMM). Results of the study showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-directions), RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity, and percentage of outliers used. However, when outliers occurred in only a single direction (the y-direction), the WRMM estimator was the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.
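As an editorial illustration only (the WRM, WRMM, and RMM estimators above are not available in scikit-learn), the following Python sketch contrasts a plain ridge fit with a Huber M-type estimator carrying an L2 penalty, one simple way to combine shrinkage with robustness to y-direction outliers; the data are simulated:

import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.default_rng(11)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)           # multicollinearity
X = np.column_stack([x1, x2])
y = 3.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)
y[:10] += 25.0                                      # outliers in the y-direction

ridge = Ridge(alpha=1.0).fit(X, y)
# HuberRegressor's alpha is an L2 (ridge-type) penalty on a robust M-type loss.
robust_ridge = HuberRegressor(alpha=1.0, epsilon=1.35).fit(X, y)
print("ridge coefficients:       ", ridge.coef_)
print("robust ridge coefficients:", robust_ridge.coef_)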
1991-09-01
matrix, the regression sum of squares (SSR) and error sum of squares (SSE) are also displayed as a percentage of the total sum of squares (SSTO) … when the student compares the SSR to the SSE. In addition to the plot, the actual values of SSR, SSE, and SSTO are also provided. [Figure 3 (garbled in the source): projection of Y onto the estimation space and the error space, illustrating SSR, SSE, and SSTO.]
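For reference, a short Python sketch of the decomposition being displayed, SSTO = SSR + SSE for a simple straight-line fit; the data are arbitrary:

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 50)
y = 1.0 + 0.8 * x + rng.normal(scale=1.0, size=50)

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

ssto = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # regression sum of squares
sse = np.sum((y - y_hat) ** 2)          # error sum of squares

print(f"SSR/SSTO = {ssr / ssto:.1%}, SSE/SSTO = {sse / ssto:.1%}")
print("SSR + SSE equals SSTO:", np.isclose(ssr + sse, ssto))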
Determination of fat and total protein content in milk using conventional digital imaging.
Kucheryavskiy, Sergey; Melenteva, Anastasiia; Bogomolov, Andrey
2014-04-01
The applicability of conventional digital imaging to the quantitative determination of fat and total protein in cow's milk, based on the phenomenon of light scatter, has been demonstrated. A new algorithm for extracting features from digital images of milk samples has been developed. The algorithm takes into account the spatial distribution of light diffusely transmitted through a sample. The proposed method has been tested on two sample sets prepared from industrial raw milk standards with variable fat and protein content. Partial least squares (PLS) regression on the features calculated from images of monochromatically illuminated milk samples resulted in models with high prediction performance when the sets were analysed separately (best models with cross-validated R(2)=0.974 for protein and R(2)=0.973 for fat content). However, when the sets were analysed jointly, the results were significantly worse (best models with cross-validated R(2)=0.890 for fat content and R(2)=0.720 for protein content). The results have been compared with a previously published Vis/SW-NIR spectroscopic study of similar samples. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Kusumo, B. H.; Sukartono, S.; Bustan, B.
2018-02-01
Measuring soil organic carbon (C) by conventional analysis is a tedious procedure that is time consuming and expensive; a simple procedure that is cheap and saves time is needed. Near-infrared technology offers a rapid procedure, as it works from the soil spectral reflectance and without any chemicals. The aim of this research is to test whether this technology is able to rapidly measure soil organic C in rice paddy fields. Soil samples were collected from rice paddy fields of Lombok Island, Indonesia, and the coordinates of the samples were recorded. Parts of the samples were analysed using conventional analysis (Walkley and Black) and other parts were scanned using near-infrared spectroscopy (NIRS) to collect soil spectra. Partial least squares regression (PLSR) models were developed using the soil C data from the conventional analysis and the soil spectral reflectance data. The models were moderately successful in measuring soil C in the rice paddy fields of Lombok Island. This shows that NIR technology can be further used to monitor carbon change in rice paddy soils.
NASA Astrophysics Data System (ADS)
Sanchez, Gerardo
A flipped laboratory model involves significant preparation by the students on lab material prior to entering the laboratory. This allows laboratory time to be focused on active learning through experiments. The aim of this study was to observe changes in student performance through the transition from a traditional laboratory format to a flipped format. The data showed that for both Anatomy and Physiology (I and II) laboratories, a more normal distribution of grades was observed once labs were flipped, and lecture grade averages increased. Chi-square and analysis of variance tests showed the grade changes to be statistically significant, with p values of less than 0.05 in both analyses. Regression analyses gave decreasing values after the flipped labs were introduced, with an r2 value of 0.485 for A&P I and 0.564 for A&P II. Results indicate improved scores for the lecture part of the A&P course, fewer outlying scores above 100, and score distributions that approached a more normal distribution.
Rudi, Knut; Zimonja, Monika; Kvenshagen, Bente; Rugtveit, Jarle; Midtvedt, Tore; Eggesbø, Merete
2007-01-01
We present a novel approach for comparing 16S rRNA gene clone libraries that is independent of both DNA sequence alignment and definition of bacterial phylogroups. These steps are the major bottlenecks in current microbial comparative analyses. We used direct comparisons of taxon density distributions in an absolute evolutionary coordinate space. The coordinate space was generated by using alignment-independent bilinear multivariate modeling. Statistical analyses for clone library comparisons were based on multivariate analysis of variance, partial least-squares regression, and permutations. Clone libraries from both adult and infant gastrointestinal tract microbial communities were used as biological models. We reanalyzed a library consisting of 11,831 clones covering complete colons from three healthy adults in addition to a smaller 390-clone library from infant feces. We show that it is possible to extract detailed information about microbial community structures using our alignment-independent method. Our density distribution analysis is also very efficient with respect to computer operation time, meeting the future requirements of large-scale screenings to understand the diversity and dynamics of microbial communities. PMID:17337554
Eves, Frank F
2015-02-01
The paper by Shaffer, McManama, Swank, Williams & Durgin (2014) uses correlations between palm-board and verbal estimates of geographical slant to argue against dissociation of the two measures. This paper reports the correlations between the verbal, visual and palm-board measures of geographical slant used by Proffitt and co-workers as a counterpoint to the analyses presented by Shaffer and colleagues. The data are for slant perception of staircases in a station (N=269), a shopping mall (N=229) and a civic square (N=109). In all three studies, modest correlations between the palm-board matches and the verbal reports were obtained. Multiple-regression analyses of potential contributors to verbal reports, however, indicated no unique association between verbal and palm-board measures. Data from three further studies (combined N=528) also show no evidence of any relationship. Shared method variance between visual and palm-board matches could account for the modest association between palm-boards and verbal reports. Copyright © 2015. Published by Elsevier B.V.
Brown, Lezah P.; Rospenda, Kathleen M.; Sokas, Rosemary K.; Conroy, Lorraine; Freels, Sally; Swanson, Naomi G.
2014-01-01
Objective: This research project characterizes occupational injuries, illnesses, and assaults (OIIAs) as a negative outcome associated with worker exposure to generalized workplace abuse/harassment, sexual harassment, and job threat and pressure. Methods: Data were collected in a nationwide random-digit-dial telephone survey conducted during 2003–2004. There were 2,151 study interviews conducted in English and Spanish. Analyses included cross-tabulation with Pearson's chi-square test and logistic regression. Results: Three hundred fifty-one (351) study participants reported having an OIIA during the 12 months preceding the study. Occurrences of generalized workplace harassment (O.R. = 1.53; CI = 1.33–1.75, p ≤ 0.05), sexual harassment (O.R. = 1.18; CI = 1.04–1.34, p ≤ 0.05), and job pressure and threat (O.R. = 1.26; CI = 1.10–1.45, p ≤ 0.05) were significantly associated with reporting an OIIA. Conclusions: The psychosocial environment is significantly associated with an increased risk of OIIA. Further research is needed to understand causal pathways and to explore potential interventions. PMID:21154106
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library on the basis of mass spectral similarity. Because the number of compounds in the reference library is much larger than the number of mass-to-charge ratio (m/z) values, the data are high dimensional and suffer from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson's correlation along with the penalized linear regression are proposed in this study. PMID:27212894
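A minimal Python sketch of the penalized-regression idea described above: a query spectrum is regressed on a library whose columns are reference spectra, and ridge or lasso penalties handle the singular, high-dimensional design. The library, m/z grid, and penalty values are synthetic placeholders, and the two-step dot-product/correlation refinement is not shown:

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(8)
n_mz, n_compounds = 300, 2000                  # far more compounds than m/z bins
library = rng.random((n_mz, n_compounds))      # each column is a reference spectrum
true_weights = np.zeros(n_compounds)
true_weights[[17, 512]] = [0.7, 0.3]           # pretend two compounds are present
query = library @ true_weights + rng.normal(scale=0.01, size=n_mz)

ridge_fit = Ridge(alpha=1.0, fit_intercept=False).fit(library, query)
lasso_fit = Lasso(alpha=0.001, fit_intercept=False, max_iter=5000).fit(library, query)

# The largest coefficients point to the best-matching library compounds.
print("top ridge matches:", np.argsort(ridge_fit.coef_)[-3:][::-1])
print("top lasso matches:", np.argsort(lasso_fit.coef_)[-3:][::-1])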
ERIC Educational Resources Information Center
Ding, Cody S.; Davison, Mark L.
2010-01-01
Akaike's information criterion is suggested as a tool for evaluating fit and dimensionality in metric multidimensional scaling that uses least squares methods of estimation. This criterion combines the least squares loss function with the number of estimated parameters. Numerical examples are presented. The results from analyses of both simulation…
Bonilla, M.G.; Mark, R.K.; Lienkaemper, J.J.
1984-01-01
In order to refine correlations of surface-wave magnitude, fault rupture length at the ground surface, and fault displacement at the surface by including the uncertainties in these variables, the existing data were critically reviewed and a new data base was compiled. Earthquake magnitudes were redetermined as necessary to make them as consistent as possible with the Gutenberg methods and results, which necessarily make up much of the data base. Measurement errors were estimated for the three variables for 58 moderate to large shallow-focus earthquakes. Regression analyses were then made utilizing the estimated measurement errors. The regression analysis demonstrates that the relations among the variables magnitude, length, and displacement are stochastic in nature. The stochastic variance, introduced in part by incomplete surface expression of seismogenic faulting, variation in shear modulus, and regional factors, dominates the estimated measurement errors. Thus, it is appropriate to use ordinary least squares for the regression models, rather than regression models based upon an underlying deterministic relation with the variance resulting from measurement errors. Significant differences exist in correlations of certain combinations of length, displacement, and magnitude when events are grouped by fault type or by region, including attenuation regions delineated by Evernden and others. Subdivision of the data results in too few data for some fault types and regions, and for these only regressions using all of the data as a group are reported. Estimates of the magnitude and the standard deviation of the magnitude of a prehistoric or future earthquake associated with a fault can be made by correlating M with the logarithms of rupture length, fault displacement, or the product of length and displacement. Fault rupture area could be reliably estimated for about 20 of the events in the data set. Regression of MS on rupture area did not result in a marked improvement over regressions that did not involve rupture area. Because no subduction-zone earthquakes are included in this study, the reported results do not apply to such zones.
Impact of a comprehensive population health management program on health care costs.
Grossmeier, Jessica; Seaverson, Erin L D; Mangen, David J; Wright, Steven; Dalal, Karl; Phalen, Chris; Gold, Daniel B
2013-06-01
The aim was to assess the influence of participation in a population health management (PHM) program on health care costs. A quasi-experimental study relied on logistic and ordinary least squares regression models to compare the costs of program participants with those of nonparticipants, while controlling for differences in health care costs and utilization, demographics, and health status. Propensity score models were developed, and analyses were weighted by inverse propensity scores to control for selection bias. Study models yielded estimated savings of $60.65 per wellness participant per month and $214.66 per disease management participant per month. Program savings were combined to yield an integrated return on investment of $3 in savings for every dollar invested. A PHM program yielded a positive return on investment after 2 years of the wellness program and 1 year of the integrated disease management program.
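A schematic Python sketch of the weighting strategy described above: a logistic regression estimates propensity scores for program participation, and the outcome regression is then weighted by inverse propensity scores. All variable names, coefficients, and data are invented for illustration:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 2000
df = pd.DataFrame({
    "age": rng.normal(45, 10, n),
    "baseline_cost": rng.gamma(2.0, 250.0, n),
    "chronic": rng.integers(0, 2, n),
})
logit_p = -2.0 + 0.02 * df["age"] + 0.5 * df["chronic"]
df["participant"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)
df["cost_pppm"] = (400 + 0.5 * df["baseline_cost"] + 80 * df["chronic"]
                   - 60 * df["participant"] + rng.normal(0, 50, n))

# Step 1: propensity score model for program participation.
ps = smf.logit("participant ~ age + baseline_cost + chronic",
               data=df).fit(disp=0).predict(df)

# Step 2: inverse-propensity weights, then a weighted least squares outcome model.
w = np.where(df["participant"] == 1, 1 / ps, 1 / (1 - ps))
outcome = smf.wls("cost_pppm ~ participant + age + baseline_cost + chronic",
                  data=df, weights=w).fit()
print("estimated participation effect:", outcome.params["participant"])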
For public service or money: understanding geographical imbalances in the health workforce.
Serneels, Pieter; Lindelow, Magnus; Montalvo, Jose G; Barr, Abigail
2007-05-01
Geographical imbalances in the health workforce have been a consistent feature of nearly all health systems, and especially in developing countries. In this paper we investigate the willingness to work in a rural area among final year nursing and medical students in Ethiopia. Analysing data obtained from contingent valuation questions for final year students from three medical schools and eight nursing schools, we find that there is substantial heterogeneity in the willingness to serve in rural areas. Using both ordinary least squares and maximum likelihood regression analysis, we find that household consumption and the student's motivation to help the poor are the main determinants of willingness to work in a rural area. We carry out a simulation on how much it would cost to get a target proportion of health workers to take up a rural post.
Wang, Jun; Yang, Dong-Lin; Chen, Zhong-Zhu; Gou, Ben-Fu
2016-06-01
In order to further reveal differences in the association between body mass index (BMI) and cancer incidence across populations, genders, and menopausal status, we performed a comprehensive meta-analysis of eligible citations. The risk ratios (RR) of incidence at 10 different cancer sites (per 5 kg/m(2) increase in BMI) were quantified separately by employing generalized least squares to estimate trends, and combined by meta-analyses. We observed a significantly stronger association between increased BMI and breast cancer incidence in the Asia-Pacific group (RR 1.18: 1.11-1.26) than in the European-Australian (1.05: 1.00-1.09) and North-American groups (1.06: 1.03-1.08) (meta-regression p<0.05). No association between increased BMI and pancreatic cancer incidence (0.94: 0.71-1.24) was shown in the Asia-Pacific group (meta-regression p<0.05), whereas positive associations were found in the other two groups. A significantly higher RR in men than in women was found for colorectal cancer (meta-regression p<0.05). Compared with postmenopausal women, premenopausal women displayed a significantly higher RR for ovarian cancer (pre- vs. post-menopausal: 1.10 vs. 1.01, meta-regression p<0.05), but a lower RR for breast cancer (pre- vs. post-menopausal: 0.99 vs. 1.11, meta-regression p<0.0001). Our results indicate that overweight or obesity is a strong risk factor for cancer incidence at several cancer sites. Gender, population, and menopausal status are important factors affecting the association between obesity and cancer incidence for certain cancer types. Copyright © 2016 Elsevier Ltd. All rights reserved.
Fast function-on-scalar regression with penalized basis expansions.
Reiss, Philip T; Huang, Lei; Mennes, Maarten
2010-01-01
Regression models for functional responses and scalar predictors are often fitted by means of basis functions, with quadratic roughness penalties applied to avoid overfitting. The fitting approach described by Ramsay and Silverman in the 1990s amounts to a penalized ordinary least squares (P-OLS) estimator of the coefficient functions. We recast this estimator as a generalized ridge regression estimator, and present a penalized generalized least squares (P-GLS) alternative. We describe algorithms by which both estimators can be implemented, with automatic selection of optimal smoothing parameters, in a more computationally efficient manner than has heretofore been available. We discuss pointwise confidence intervals for the coefficient functions, simultaneous inference by permutation tests, and model selection, including a novel notion of pointwise model selection. P-OLS and P-GLS are compared in a simulation study. Our methods are illustrated with an analysis of age effects in a functional magnetic resonance imaging data set, as well as a reanalysis of a now-classic Canadian weather data set. An R package implementing the methods is publicly available.
Jović, Ozren
2016-12-15
A novel method for quantitative prediction and variable selection on spectroscopic data, called Durbin-Watson partial least squares regression (dwPLS), is proposed in this paper. The idea is to inspect serial correlation in infrared data, which are known to consist of highly correlated neighbouring variables. The method selects only those variables whose intervals have a lower Durbin-Watson statistic (dw) than a certain optimal cutoff. For each interval, dw is calculated on a vector of regression coefficients. Adulteration of cold-pressed linseed oil (L), a well-known nutrient beneficial to health, is studied in this work by mixing it with cheaper oils: rapeseed oil (R), sesame oil (Se), and sunflower oil (Su). The samples for each botanical origin of oil vary with respect to producer, content, and geographic origin. The results obtained indicate that MIR-ATR combined with dwPLS could be applied to the quantitative determination of edible-oil adulteration. Copyright © 2016 Elsevier Ltd. All rights reserved.
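A minimal Python sketch of the interval-scoring idea described above: the Durbin-Watson statistic of a fitted PLS coefficient vector is computed over fixed-width intervals, and intervals falling below a cutoff are retained. The spectra, interval width, and cutoff are placeholders, and this is a simplification rather than the paper's full dwPLS algorithm:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

def durbin_watson(v):
    # Durbin-Watson statistic of a vector: sum of squared successive
    # differences divided by the sum of squares.
    v = np.asarray(v, dtype=float)
    return np.sum(np.diff(v) ** 2) / np.sum(v ** 2)

# Hypothetical MIR-style spectra and an adulterant-level response.
rng = np.random.default_rng(9)
X = rng.normal(size=(80, 600))
y = X[:, 100:110].mean(axis=1) + rng.normal(scale=0.05, size=80)

coefs = PLSRegression(n_components=5).fit(X, y).coef_.ravel()

# Score each interval of neighbouring variables by the dw of its coefficients;
# smoother (serially correlated) intervals give lower values and are kept.
interval_width, cutoff = 20, 1.0          # both values are illustrative only
selected = [(start, start + interval_width)
            for start in range(0, coefs.size, interval_width)
            if durbin_watson(coefs[start:start + interval_width]) < cutoff]
print("retained intervals:", selected)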
Jones, Andrew M; Lomas, James; Moore, Peter T; Rice, Nigel
2016-10-01
We conduct a quasi-Monte-Carlo comparison of the recent developments in parametric and semiparametric regression methods for healthcare costs, both against each other and against standard practice. The population of English National Health Service hospital in-patient episodes for the financial year 2007-2008 (summed for each patient) is randomly divided into two equally sized subpopulations to form an estimation set and a validation set. Evaluating out-of-sample using the validation set, a conditional density approximation estimator shows considerable promise in forecasting conditional means, performing best for accuracy of forecasting and among the best four for bias and goodness of fit. The best performing model for bias is linear regression with square-root-transformed dependent variables, whereas a generalized linear model with square-root link function and Poisson distribution performs best in terms of goodness of fit. Commonly used models utilizing a log-link are shown to perform badly relative to other models considered in our comparison.
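For orientation, the two model families singled out above (OLS on square-root-transformed costs for bias, and a GLM with square-root link and Poisson distribution for goodness of fit) can be sketched with statsmodels; the covariates and cost data below are simulated placeholders, not the NHS episode data, and the estimators in the paper differ in detail.

    # Illustrative sketch (not the paper's code): the two models singled out above, fitted
    # with statsmodels on simulated covariates and costs.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 1000
    X = sm.add_constant(rng.normal(size=(n, 3)))               # hypothetical patient covariates
    cost = rng.gamma(shape=2.0, scale=np.exp(X @ [7.0, 0.3, 0.2, 0.1]) / 2.0)

    ols_sqrt = sm.OLS(np.sqrt(cost), X).fit()                  # OLS on square-root-transformed costs
    glm_sqrt_pois = sm.GLM(cost, X, family=sm.families.Poisson(
        link=sm.families.links.Sqrt())).fit()                  # older statsmodels uses links.sqrt()

    pred_ols = ols_sqrt.predict(X) ** 2                        # naive back-transform (ignores smearing)
    pred_glm = glm_sqrt_pois.predict(X)                        # predictions already on the cost scale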
Wiley, J.B.; Atkins, John T.; Tasker, Gary D.
2000-01-01
Multiple and simple least-squares regression models for the log10-transformed 100-year discharge with independent variables describing the basin characteristics (log10-transformed and untransformed) for 267 streamflow-gaging stations were evaluated, and the regression residuals were plotted as areal distributions that defined three regions of the State, designated East, North, and South. Exploratory data analysis procedures identified 31 gaging stations at which discharges are different than would be expected for West Virginia. Regional equations for the 2-, 5-, 10-, 25-, 50-, 100-, 200-, and 500-year peak discharges were determined by generalized least-squares regression using data from 236 gaging stations. Log10-transformed drainage area was the most significant independent variable for all regions. Equations developed in this study are applicable only to rural, unregulated streams within the boundaries of West Virginia. The accuracy of the estimating equations is quantified by measuring the average prediction error (from 27.7 to 44.7 percent) and equivalent years of record (from 1.6 to 20.0 years).
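The regional estimating equations described above take the general form log10(Q_T) = b0 + b1*log10(drainage area), with additional log10-transformed basin characteristics where significant. The sketch below fits such an equation by ordinary least squares as a simplified stand-in for the generalized least-squares estimation used in the study; the station data are invented.

    # Simplified stand-in for the regional-equation fitting described above: ordinary least
    # squares on log10-transformed values; the study itself used generalized least squares.
    # The five stations below are invented for illustration only.
    import numpy as np
    import statsmodels.api as sm

    drainage_area = np.array([12.0, 55.0, 130.0, 410.0, 980.0])    # square miles (hypothetical)
    q100 = np.array([2300.0, 7200.0, 13800.0, 33000.0, 61000.0])   # 100-year discharge, ft3/s (hypothetical)

    X = sm.add_constant(np.log10(drainage_area))
    fit = sm.OLS(np.log10(q100), X).fit()
    b0, b1 = fit.params                                            # regional equation: Q100 = 10**b0 * A**b1

    def estimate_q100(area_sq_mi):
        return 10 ** b0 * area_sq_mi ** b1                         # apply to an ungaged site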
An interactive website for analytical method comparison and bias estimation.
Bahar, Burak; Tuncel, Ayse F; Holmes, Earle W; Holmes, Daniel T
2017-12-01
Regulatory standards require laboratories to perform studies to ensure the accuracy and reliability of their test results. Method comparison and bias estimation are important components of these studies. We developed an interactive website for evaluating the relative performance of two analytical methods using R programming language tools. The website can be accessed at https://bahar.shinyapps.io/method_compare/. The site has an easy-to-use interface that allows both copy-pasting and manual entry of data. It also allows selection of a regression model and creation of regression and difference plots. Available regression models include Ordinary Least Squares, Weighted-Ordinary Least Squares, Deming, Weighted-Deming, Passing-Bablok and Passing-Bablok for large datasets. The server processes the data and generates downloadable reports in PDF or HTML format. Our website provides clinical laboratories with a practical way to assess the relative performance of two analytical methods. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
Discriminative least squares regression for multiclass classification and feature selection.
Xiang, Shiming; Nie, Feiping; Meng, Gaofeng; Pan, Chunhong; Zhang, Changshui
2012-11-01
This paper presents a framework of discriminative least squares regression (LSR) for multiclass classification and feature selection. The core idea is to enlarge the distance between different classes under the conceptual framework of LSR. First, a technique called ε-dragging is introduced to force the regression targets of different classes to move along opposite directions such that the distances between classes can be enlarged. Then, the ε-draggings are integrated into the LSR model for multiclass classification. Our learning framework, referred to as discriminative LSR, has a compact model form, in which there is no need to train two-class machines that are independent of each other. With its compact form, the model can be naturally extended for feature selection. This goal is achieved via the L2,1 norm of a matrix, generating a sparse learning model for feature selection. The model for multiclass classification and its extension for feature selection are finally solved elegantly and efficiently. Experimental evaluation over a range of benchmark datasets indicates the validity of our method.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
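As a compact numeric companion to the method of least squares described above, the slope and intercept of the regression line can be computed directly from their closed-form expressions; the five (x, y) pairs below are made up for illustration.

    # Compact numeric companion to the method of least squares: slope and intercept from
    # their closed-form expressions. The five (x, y) pairs are made up.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)      # the least-squares line minimizes the sum of these squared values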
Interpretation of the Coefficients in the Fit y = at + bx + c
ERIC Educational Resources Information Center
Farnsworth, David L.
2006-01-01
The goals of this note are to derive formulas for the coefficients a and b in the least-squares regression plane y = at + bx + c for observations (t_i, x_i, y_i), i = 1, 2, ..., n, and to present meanings for the coefficients a and b. In this note, formulas for the coefficients a and b in the least-squares fit are…
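A minimal sketch of the least-squares plane fit y = at + bx + c discussed in the note, solved with numpy's least-squares routine rather than the note's closed-form coefficient formulas; the observations are invented.

    # Minimal sketch of the least-squares plane y = a*t + b*x + c, solved numerically with
    # numpy rather than closed-form coefficient formulas; observations are invented.
    import numpy as np

    t = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    x = np.array([1.0, 0.5, 2.0, 1.5, 3.0, 2.5])
    y = 2.0 * t - 1.0 * x + 4.0 + np.array([0.1, -0.2, 0.05, 0.0, -0.1, 0.15])

    A = np.column_stack([t, x, np.ones_like(t)])        # design matrix for (a, b, c)
    (a, b, c), *_ = np.linalg.lstsq(A, y, rcond=None)   # minimizes the sum of squared residuals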
NASA Astrophysics Data System (ADS)
Liu, Fei; He, Yong
2008-02-01
Visible and near infrared (Vis/NIR) transmission spectroscopy and chemometric methods were utilized to predict the pH values of cola beverages. Five varieties of cola were prepared, and 225 samples (45 samples for each variety) were selected for the calibration set, while 75 samples (15 for each variety) were used for the validation set. Savitzky-Golay smoothing and standard normal variate (SNV) followed by first derivative were used as the pre-processing methods. Partial least squares (PLS) analysis was employed to extract the principal components (PCs), which were used as the inputs of the least squares-support vector machine (LS-SVM) model according to their accumulative reliabilities. Then LS-SVM with a radial basis function (RBF) kernel and a two-step grid search technique was applied to build the regression model, with a comparison against PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias were 0.961, 0.040 and 0.012 for PLS, and 0.975, 0.031 and 4.697×10⁻³ for LS-SVM, respectively. Both methods achieved satisfactory precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be applied as an alternative way to predict the pH of cola beverages.
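The modelling chain above, PLS components feeding a kernel regression tuned by a grid search, can be approximated in Python as follows. scikit-learn's epsilon-SVR with an RBF kernel is used here as a stand-in for LS-SVM, which scikit-learn does not provide, and the spectra and pH values are simulated.

    # Hedged sketch: PLS scores feeding a kernel regression tuned by grid search. scikit-learn's
    # epsilon-SVR stands in for LS-SVM, and the spectra and pH values are simulated.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV

    rng = np.random.default_rng(3)
    spectra = rng.normal(size=(225, 600))                     # placeholder for Vis/NIR spectra
    ph = 2.3 + 0.4 * spectra[:, :50].mean(axis=1) + rng.normal(0, 0.02, 225)

    pls = PLSRegression(n_components=8).fit(spectra, ph)
    scores = pls.transform(spectra)                           # latent components used as model inputs

    grid = GridSearchCV(SVR(kernel="rbf"),
                        {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]},
                        cv=5)
    grid.fit(scores, ph)                                      # two-parameter grid search (C, gamma)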
Methods for Improving Information from 'Undesigned' Human Factors Experiments.
Human factors engineering, Information processing, Regression analysis, Experimental design, Least squares method, Analysis of variance, Correlation techniques, Matrices (Mathematics), Multiple disciplines, Mathematical prediction
Delamou, Alexandre; Delvaux, Therese; Beavogui, Abdoul Habib; Levêque, Alain; Zhang, Wei-Hong; De Brouwere, Vincent
2016-10-10
Obstetric fistula is a serious medical condition which affects women in low-income countries. Despite the progress of research on fistula, there are few data on long-term follow-up after surgical repair. The objective of this study is to analyse the factors associated with the recurrence of fistula and the outcomes of pregnancy following fistula repair in Guinea. A descriptive longitudinal study design will be used. The study will include women who underwent fistula repair between 2012 and 2015 at 3 fistula repair sites supported by the Fistula Care Project in Guinea (Kissidougou Prefectoral Hospital, Labé Regional Hospital and Jean Paul II Hospital of Conakry). Participants giving informed consent after a home visit by the Fistula Counsellors will be interviewed for enrolment at least 3 months after hospital discharge. The study enrolment period is January 1, 2012 - June 30, 2015. Participants will be followed up until June 30, 2016, for a maximum follow-up period of 48 months. The sample size is estimated at 364 women. The cumulative incidence rates of fistula recurrence and pregnancy post-repair will be calculated using Kaplan-Meier methods, and the risk factor analyses will be performed using adjusted Cox regression. The outcomes of pregnancy will be analysed using proportions, Pearson's Chi-square (χ2) test and logistic regression, with associations reported as risk ratios with 95% confidence intervals. All analyses will be done using STATA version 13 (STATA Corporation, College Station, TX, USA) with the level of significance set at P < 0.05. This study will contribute to improving the prevention and management of obstetric fistula within the community and support advocacy efforts for the social reintegration of fistula patients into their communities. It will also guide policy makers and strategic planning for fistula programs. ClinicalTrials.gov Identifier: NCT02686957. Registered 12 February 2016 (Retrospectively registered).
[Association of XRCC1 genetic polymorphism with susceptibility to non-Hodgkin's lymphoma].
Li, Su-Xia; Zhu, Hong-Li; Guo, Bo; Yang, Yang; Wang, Hong-Yan; Sun, Jing-Fen; Cao, Yong-Bin
2014-08-01
The purpose of this study was to explore the association between X-ray repair cross-complementing group 1 (XRCC1) gene polymorphisms and non-Hodgkin's lymphoma risk. A total of 282 non-Hodgkin's lymphoma (NHL) patients and 231 normal controls were used to investigate the effect of three XRCC1 gene polymorphisms (rs25487, rs25489, rs1799782) on susceptibility to non-Hodgkin's lymphoma. Genotyping was performed by using the SNaPshot method. All statistical analyses were done with R software. Genotype and allele frequencies of XRCC1 were compared between the patients and controls by using the chi-square test. Crude and adjusted odds ratios and 95% confidence intervals were calculated by using logistic regression on the basis of different genetic models. For four kinds of NHL, subgroup analyses were also conducted. Combined genotype analyses of the three XRCC1 polymorphisms were also done by using logistic regression. The results showed that the variant genotype frequencies were not significantly different between the controls and NHL or NHL subtype cases. Combined genotype analyses of XRCC1 399-280-194 showed that the combined genotype was not associated with risk of NHL overall, but the VT-WT-WT combined genotype was associated with a decreased risk of T-NHL (OR: 0.21; 95%CI: 0.06-0.8; P = 0.022), and the WT-VT-WT combined genotype was associated with an increased risk of FL (OR: 15.23; 95%CI: 1.69-137.39; P = 0.015). It is concluded that none of the studied polymorphisms (rs25487, rs25489, rs1799782) alone was related to the risk of NHL or of any histologic subtype of NHL, and the combined genotypes with mutations in the three SNPs of XRCC1 were not related to the risk of NHL overall. However, further large-scale studies are needed to confirm the decreased or increased risk of T-NHL and FL associated with the 3 combined SNP mutants of the XRCC1 polymorphisms.
Modified Regression Correlation Coefficient for Poisson Regression Model
NASA Astrophysics Data System (ADS)
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study gives attention to indicators of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to some restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model, in which the dependent variable is Poisson distributed. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables with multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient based on bias and the root mean square error (RMSE).
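For reference, the unmodified regression correlation coefficient that the study starts from, corr(Y, E(Y|X)) for a fitted Poisson regression, can be computed as in the sketch below; the proposed modification itself is not reproduced here, and the data are simulated.

    # Reference sketch of the unmodified regression correlation coefficient, corr(Y, E(Y|X)),
    # for a Poisson regression fitted with statsmodels; the paper's modification is not shown.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    X = sm.add_constant(rng.normal(size=(300, 2)))
    y = rng.poisson(np.exp(X @ [0.5, 0.8, -0.4]))             # Poisson-distributed dependent variable

    fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
    r = np.corrcoef(y, fit.fittedvalues)[0, 1]                # regression correlation coefficient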
Zhu, Hongyan; Chu, Bingquan; Fan, Yangyang; Tao, Xiaoya; Yin, Wenxin; He, Yong
2017-08-10
We investigated the feasibility and potential of determining firmness, soluble solids content (SSC), and pH in kiwifruits using hyperspectral imaging combined with variable selection methods and calibration models. The images were acquired by a push-broom hyperspectral reflectance imaging system covering two spectral ranges. Weighted regression coefficients (BW), the successive projections algorithm (SPA) and genetic algorithm-partial least squares (GAPLS) were compared and evaluated for the selection of effective wavelengths. Moreover, multiple linear regression (MLR), partial least squares regression and least squares support vector machine (LS-SVM) models were developed to predict the quality attributes quantitatively using the effective wavelengths. The established models, particularly SPA-MLR, SPA-LS-SVM and GAPLS-LS-SVM, performed well. The SPA-MLR models for firmness (Rpre = 0.9812, RPD = 5.17) and SSC (Rpre = 0.9523, RPD = 3.26) at 380-1023 nm showed excellent performance, whereas GAPLS-LS-SVM was the optimal model at 874-1734 nm for predicting pH (Rpre = 0.9070, RPD = 2.60). Image processing algorithms were developed to transfer the predictive model to every pixel to generate prediction maps that visualize the spatial distribution of firmness and SSC. Hence, the results clearly demonstrated that hyperspectral imaging has potential as a fast and non-invasive method to predict the quality attributes of kiwifruits.
Lee, Soo Yee; Mediani, Ahmed; Maulidiani, Maulidiani; Khatib, Alfi; Ismail, Intan Safinar; Zawawi, Norhasnida; Abas, Faridah
2018-01-01
Neptunia oleracea is a plant consumed as a vegetable and which has been used as a folk remedy for several diseases. Herein, two regression models (partial least squares, PLS; and random forest, RF) in a metabolomics approach were compared and applied to the evaluation of the relationship between phenolics and bioactivities of N. oleracea. In addition, the effects of different extraction conditions on the phenolic constituents were assessed by pattern recognition analysis. Comparison of the PLS and RF models showed that RF exhibited poorer generalization and hence poorer predictive performance. Both the regression coefficients of PLS and the variable importance of RF revealed that quercetin and kaempferol derivatives, caffeic acid and vitexin-2-O-rhamnoside were significant for the tested bioactivities. Furthermore, principal component analysis (PCA) and partial least squares-discriminant analysis (PLS-DA) results showed that sonication and absolute ethanol are the preferable extraction method and ethanol ratio, respectively, to produce N. oleracea extracts with high phenolic levels and therefore high DPPH scavenging and α-glucosidase inhibitory activities. Both PLS and RF are useful regression models in metabolomics studies. This work provides insight into the performance of different multivariate data analysis tools and the effects of different extraction conditions on the extraction of desired phenolics from plants. © 2017 Society of Chemical Industry.
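A generic sketch of the PLS-versus-random-forest comparison made above, using cross-validated R² on synthetic data; the actual metabolomics matrices, preprocessing, and validation scheme of the paper are not reproduced.

    # Generic PLS-versus-random-forest comparison by cross-validated R^2 on synthetic data;
    # the paper's metabolomics matrices and preprocessing are not reproduced.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(5)
    X = rng.normal(size=(80, 200))                            # stand-in for phenolic/NMR variables
    y = X[:, :10].sum(axis=1) + rng.normal(0, 0.5, 80)        # bioactivity driven by a few variables

    pls_r2 = cross_val_score(PLSRegression(n_components=5), X, y, cv=5, scoring="r2").mean()
    rf_r2 = cross_val_score(RandomForestRegressor(n_estimators=300, random_state=0),
                            X, y, cv=5, scoring="r2").mean()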
Eash, David A.; Barnes, Kimberlee K.; O'Shea, Padraic S.; Gelder, Brian K.
2018-02-14
Basin-characteristic measurements related to stream length, stream slope, stream density, and stream order have been identified as significant variables for estimation of flood, flow-duration, and low-flow discharges in Iowa. The placement of channel initiation points, however, has always been a matter of individual interpretation, leading to differences in stream definitions between analysts. This study investigated five different methods to define stream initiation using 3-meter light detection and ranging (lidar) digital elevation model (DEM) data for 17 streamgages with drainage areas less than 50 square miles within the Des Moines Lobe landform region in north-central Iowa. Each DEM was hydrologically enforced, and the five stream initiation methods were used to define channel initiation points and the downstream flow paths. The five different methods to define stream initiation were tested side-by-side for three watershed delineations: (1) the total drainage-area delineation, (2) an effective drainage-area delineation of basins based on a 2-percent annual exceedance probability (AEP) 12-hour rainfall, and (3) an effective drainage-area delineation based on a 20-percent AEP 12-hour rainfall. Generalized least squares regression analysis was used to develop a set of equations for sites in the Des Moines Lobe landform region for estimating discharges for ungaged stream sites with 50-, 20-, 10-, 4-, 2-, 1-, 0.5-, and 0.2-percent AEPs. A total of 17 streamgages were included in the development of the regression equations. In addition, geographic information system software was used to measure 58 selected basin characteristics for each streamgage. Results of the regression analyses of the 15 lidar datasets indicate that the datasets that produce regional regression equations (RREs) with the best overall predictive accuracy are the National Hydrographic Dataset, Iowa Department of Natural Resources, and profile curvature of 0.5 stream initiation methods combined with the 20-percent AEP 12-hour rainfall watershed delineation method. These RREs have a mean average standard error of prediction (SEP) for 4-, 2-, and 1-percent AEP discharges of 53.9 percent and a mean SEP for all eight AEPs of 55.5 percent. Compared to the RREs developed in this study using the basin characteristics from the U.S. Geological Survey StreamStats application, the lidar basin characteristics provide better overall predictive accuracy.
Using ridge regression in systematic pointing error corrections
NASA Technical Reports Server (NTRS)
Guiar, C. N.
1988-01-01
A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
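The effect ridge regression is used for here, stabilizing coefficient estimates when regressors are nearly collinear, can be seen in a few lines of Python; the synthetic columns below merely mimic multicollinearity and are unrelated to the pointing-model variables.

    # Minimal illustration of biased estimation under multicollinearity: OLS coefficients for
    # two nearly identical regressors are unstable, while ridge shrinks them. Synthetic data only.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(6)
    z = rng.normal(size=200)
    X = np.column_stack([z, z + 1e-3 * rng.normal(size=200), rng.normal(size=200)])  # columns 0 and 1 nearly collinear
    y = 1.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.1, 200)

    ols_coef = LinearRegression().fit(X, y).coef_     # large, offsetting estimates for the collinear pair
    ridge_coef = Ridge(alpha=1.0).fit(X, y).coef_     # biased but far more stable estimates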
2006-03-01
identify if an explanatory variable may have been omitted due to model misspecification (Ramsey, 1979). The RESET test resulted in failure to... Prob > F 0.0094. This model was also regressed using Huber-White estimators. Again, the Ramsey RESET test was done to ensure relevant... Aircraft. Annapolis, MD: Naval Institute Press, 2004. Ramsey, J. B., "Tests for Specification Errors in Classical Least-Squares Regression Analysis"
Umesh P. Agarwal; Richard S. Reiner; Sally A. Ralph
2010-01-01
Two new methods based on FT-Raman spectroscopy, one simple method based on a band intensity ratio and the other using a partial least squares (PLS) regression model, are proposed to determine cellulose I crystallinity. In the simple method, crystallinity in cellulose I samples was determined based on a univariate regression that was first developed using the Raman band...
Avoiding and Correcting Bias in Score-Based Latent Variable Regression with Discrete Manifest Items
ERIC Educational Resources Information Center
Lu, Irene R. R.; Thomas, D. Roland
2008-01-01
This article considers models involving a single structural equation with latent explanatory and/or latent dependent variables where discrete items are used to measure the latent variables. Our primary focus is the use of scores as proxies for the latent variables and carrying out ordinary least squares (OLS) regression on such scores to estimate…
Simultaneous spectrophotometric determination of salbutamol and bromhexine in tablets.
Habib, I H I; Hassouna, M E M; Zaki, G A
2005-03-01
The drugs salbutamol hydrochloride (a bronchodilator) and bromhexine sulfate (a mucolytic), which are encountered together in tablets, were determined simultaneously either by linear regression at the zero-crossing wavelengths of the first-derivative UV spectra or by application of a multiple linear partial least squares regression method. The results obtained by the two proposed mathematical methods were compared with those obtained by the HPLC technique.
NASA Technical Reports Server (NTRS)
Rogers, David
1991-01-01
G/SPLINES are a hybrid of Friedman's Multivariate Adaptive Regression Splines (MARS) algorithm with Holland's Genetic Algorithm. In this hybrid, the incremental search is replaced by a genetic search. The G/SPLINE algorithm exhibits performance comparable to that of the MARS algorithm, requires fewer least squares computations, and allows significantly larger problems to be considered.
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models, 5th ed., McGraw-Hill/Irwin, 2005. 5. Mood, Graybill and Boes, Introduction... curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model: There... linear. For example, ... is a linear model; ... is not. 2. Uniform variance (homoscedasticity
Guo, Canyong; Luo, Xuefang; Zhou, Xiaohua; Shi, Beijia; Wang, Juanjuan; Zhao, Jinqi; Zhang, Xiaoxia
2017-06-05
Vibrational spectroscopic techniques such as infrared, near-infrared and Raman spectroscopy have become popular for detecting and quantifying polymorphism of pharmaceutics since they are fast and non-destructive. This study assessed the ability of three vibrational spectroscopic techniques combined with multivariate analysis to quantify a low-content undesired polymorph within a binary polymorphic mixture. Partial least squares (PLS) regression and support vector machine (SVM) regression were employed to build quantitative models. Fusidic acid, a steroidal antibiotic, was used as the model compound. It was found that PLS regression performed slightly better than SVM regression for all three spectroscopic techniques. Root mean square errors of prediction (RMSEP) ranged from 0.48% to 1.17% for diffuse reflectance FTIR spectroscopy, from 1.60% to 1.93% for diffuse reflectance FT-NIR spectroscopy, and from 1.62% to 2.31% for Raman spectroscopy. The results indicate that diffuse reflectance FTIR spectroscopy offers significant advantages in providing accurate measurement of polymorphic content in the fusidic acid binary mixtures, whereas Raman spectroscopy is the least accurate technique for quantitative analysis of polymorphs. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Nazeer, Majid; Bilal, Muhammad
2018-04-01
A Landsat-5 Thematic Mapper (TM) dataset was used to estimate salinity in the coastal area of Hong Kong. Four adjacent Landsat TM images were used in this study, which were atmospherically corrected using the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) radiative transfer code. The atmospherically corrected images were further used to develop models for salinity using Ordinary Least Squares (OLS) regression and Geographically Weighted Regression (GWR) based on in situ data of October 2009. Results show that the coefficient of determination (R²) of 0.42 between the OLS-estimated and in situ measured salinity is much lower than that of the GWR model, which is two times higher (R² = 0.86). This indicates that the GWR model has more ability than the OLS regression model to predict salinity and better represents its spatial heterogeneity. It was observed that the salinity was high in Deep Bay (north-western part of Hong Kong), which might be due to industrial waste disposal, whereas the salinity was estimated to be constant (32 practical salinity units) towards the open sea.
Shao, Limin; Griffiths, Peter R; Leytem, April B
2010-10-01
The automated quantification of three greenhouse gases, ammonia, methane, and nitrous oxide, in the vicinity of a large dairy farm by open-path Fourier transform infrared (OP/FT-IR) spectrometry at intervals of 5 min is demonstrated. Spectral pretreatment, including the automated detection and correction of the effect of interruption of the infrared beam by a moving object and the automated correction for the nonlinear detector response, is applied to the measured interferograms. Two ways of obtaining quantitative data from OP/FT-IR spectra are described. The first, which is installed in a recently acquired commercial OP/FT-IR spectrometer, is based on classical least-squares (CLS) regression, and the second is based on partial least-squares (PLS) regression. It is shown that CLS regression only gives accurate results if the absorption features of the analytes are located in very short spectral intervals where lines due to atmospheric water vapor are absent or very weak; of the three analytes examined, only ammonia fell into this category. On the other hand, PLS regression allowed what appeared to be accurate results to be obtained for all three analytes.
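The CLS step described above amounts to a linear least-squares solve of the measured spectrum against reference spectra of the analytes, as in the sketch below; the reference spectra here are random placeholders rather than real ammonia, methane, and nitrous oxide absorptivities.

    # Sketch of the CLS step: solve the measured spectrum against a matrix of reference spectra
    # by linear least squares. The reference spectra here are random placeholders, not real
    # NH3, CH4, and N2O absorptivities.
    import numpy as np

    rng = np.random.default_rng(7)
    n_points = 400
    K = np.abs(rng.normal(size=(n_points, 3)))                  # columns: placeholder reference spectra
    true_conc = np.array([2.0, 1.5, 0.3])
    measured = K @ true_conc + rng.normal(0, 0.01, n_points)    # simulated path-averaged spectrum

    conc_est, *_ = np.linalg.lstsq(K, measured, rcond=None)     # CLS concentration estimates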
Mitigating Errors in External Respiratory Surrogate-Based Models of Tumor Position
DOE Office of Scientific and Technical Information (OSTI.GOV)
Malinowski, Kathleen T.; Fischell Department of Bioengineering, University of Maryland, College Park, MD; McAvoy, Thomas J.
2012-04-01
Purpose: To investigate the effect of tumor site, measurement precision, tumor-surrogate correlation, training data selection, model design, and interpatient and interfraction variations on the accuracy of external marker-based models of tumor position. Methods and Materials: Cyberknife Synchrony system log files comprising synchronously acquired positions of external markers and the tumor from 167 treatment fractions were analyzed. The accuracy of Synchrony, ordinary-least-squares regression, and partial-least-squares regression models for predicting the tumor position from the external markers was evaluated. The quantity and timing of the data used to build the predictive model were varied. The effects of tumor-surrogate correlation and the precision in both the tumor and the external surrogate position measurements were explored by adding noise to the data. Results: The tumor position prediction errors increased during the duration of a fraction. Increasing the training data quantities did not always lead to more accurate models. Adding uncorrelated noise to the external marker-based inputs degraded the tumor-surrogate correlation models by 16% for partial-least-squares and 57% for ordinary-least-squares. External marker and tumor position measurement errors led to tumor position prediction changes 0.3-3.6 times the magnitude of the measurement errors, varying widely with model algorithm. The tumor position prediction errors were significantly associated with the patient index but not with the fraction index or tumor site. Partial-least-squares was as accurate as Synchrony and more accurate than ordinary-least-squares. Conclusions: The accuracy of surrogate-based inferential models of tumor position was affected by all the investigated factors, except for the tumor site and fraction index.
Spatial analysis of alcohol-related motor vehicle crash injuries in southeastern Michigan.
Meliker, Jaymie R; Maio, Ronald F; Zimmerman, Marc A; Kim, Hyungjin Myra; Smith, Sarah C; Wilson, Mark L
2004-11-01
Temporal, behavioral and social risk factors that affect injuries resulting from alcohol-related motor vehicle crashes have been characterized in previous research. Much less is known about spatial patterns and environmental associations of alcohol-related motor vehicle crashes. The aim of this study was to evaluate geographic patterns of alcohol-related motor vehicle crashes and to determine if locations of alcohol outlets are associated with those crashes. In addition, we sought to demonstrate the value of integrating spatial and traditional statistical techniques in the analysis of this preventable public health risk. The study design was a cross-sectional analysis of individual-level blood alcohol content, traffic report information, census block group data, and alcohol distribution outlets. Besag and Newell's spatial analysis and traditional logistic regression both indicated that areas of low population density had more alcohol-related motor vehicle crashes than expected (P < 0.05). There was no significant association between alcohol outlets and alcohol-related motor vehicle crashes using distance analyses, logistic regression, and Chi-square. Differences in environmental or behavioral factors characteristic of areas of low population density may be responsible for the higher proportion of alcohol-related crashes occurring in these areas.
Factors Impacting Online Ratings for Otolaryngologists.
Calixto, Nathaniel E; Chiao, Whitney; Durr, Megan L; Jiang, Nancy
2018-06-01
To identify factors associated with online patient ratings and comments for a nationwide sample of otolaryngologists. Ratings, demographic information, and written comments were obtained for a random sample of otolaryngologists from HealthGrades.com and Vitals.com. Online Presence Score (OPS) was based on 10 criteria, including professional website and social media profiles. Regression analyses identified factors associated with increased rating. We evaluated correlations of OPS and other attributes with star rating and used chi-square tests to evaluate content differences between positive and negative comments. On linear regression, increased OPS was associated with higher ratings on HealthGrades and Vitals; higher ratings were also associated with younger age on Vitals and less experience on HealthGrades. However, detailed correlation studies showed weak correlation between OPS and rating; age and graduation year also showed low correlation with ratings. Negative comments more likely focused on surgeon-independent factors or poor bedside manner. Though younger otolaryngologists with greater online presence tend to have higher ratings, weak correlations suggest that age and online presence have only a small impact on the content found on ratings websites. While most written comments are positive, deficiencies in bedside manner or other physician-independent factors tend to elicit negative comments.
Islam, Rakibul M
2017-01-01
Despite startling developments in maternal health care services, use of these services has been disproportionately distributed among different minority groups in Bangladesh. This study aimed to explore the factors associated with the use of these services among the Mru indigenous women in Bangladesh. A total of 374 currently married Mru women were interviewed using convenience sampling from three administrative sub-districts of the Bandarban district from June to August of 2009. Associations were assessed using Chi-square tests, and a binary logistic regression model was employed to explore factors associated with the use of maternal health care services. Among the women surveyed, 30% had ever visited maternal health care services in the Mru community, a very low proportion compared with mainstream society. Multivariable logistic regression analyses revealed that place of residence, religion, school attendance, place of service provided, distance to the service center, and exposure to mass media were factors significantly associated with the use of maternal health care services among Mru women. Considering indigenous socio-cultural beliefs and practices, comprehensive community-based outreach health programs are recommended in the community with a special emphasis on awareness through maternal health education and training packages for the Mru adolescents.
Estimation of magnitude and frequency of floods for streams in Puerto Rico : new empirical models
Ramos-Gines, Orlando
1999-01-01
Flood-peak discharges and frequencies are presented for 57 gaged sites in Puerto Rico for recurrence intervals ranging from 2 to 500 years. The log-Pearson Type III distribution, the methodology recommended by the United States Interagency Committee on Water Data, was used to determine the magnitude and frequency of floods at the gaged sites having 10 to 43 years of record. A technique is presented for estimating flood-peak discharges at recurrence intervals ranging from 2 to 500 years for unregulated streams in Puerto Rico with contributing drainage areas ranging from 0.83 to 208 square miles. Log-linear multiple regression analyses, using climatic and basin characteristics and peak-discharge data from the 57 gaged sites, were used to construct regression equations to transfer the magnitude and frequency information from gaged to ungaged sites. The equations use contributing drainage area, depth to rock, and mean annual rainfall as the basin and climatic characteristics for estimating flood-peak discharges. Examples are given to show a step-by-step procedure for calculating a 100-year flood at a gaged site, an ungaged site, a site near a gaged location, and a site between two gaged sites.
Li, Jing; Wang, Ying; Han, Fang; Wang, Zhu; Xu, Lichun; Tong, Jiandong
2016-12-13
Marital status correlates with health. Our goal was to examine the impact of marital status on the survival outcomes of patients with colorectal neuroendocrine neoplasms (NENs). The Surveillance, Epidemiology and End Results program was used to identify 1,289 eligible patients diagnosed between 2004 and 2010 with colorectal NENs. Statistical analyses were performed using Chi-square, Kaplan-Meier, and Cox proportional hazards regression methods. Patients in the widowed group had the highest proportion of larger tumors (>2 cm), a higher proportion of poor-grade tumors (Grade III and IV), and more tumors at an advanced stage (P<0.05). The 5-year cause-specific survival (CSS) was 76% in the married group, 51% in the widowed group, 73% in the single group, and 72% in the divorced/separated group, a statistically significant difference in both the univariate log-rank test and the Cox regression model (P<0.05). Furthermore, marital status was an independent prognostic factor only in the Distant stage (P<0.001). In conclusion, patients in the widowed group were at greater risk of cancer-specific mortality from colorectal NENs, and social support may lead to improved outcomes for patients with NENs.
Indrehus, Oddny; Aralt, Tor Tybring
2005-04-01
Aerosol, NO and CO concentrations, temperature, air humidity, air flow and the number of running ventilation fans were measured by continuous analysers every minute during six different one-week periods spread over ten months in 2001 and 2002 at measuring stations in the 7860 m long tunnel. The ventilation control system was mainly based on aerosol measurements taken by optical scatter sensors. The ventilation turned out to be satisfactory according to Norwegian air quality standards for road tunnels; however, there was some uncertainty concerning the NO2 levels. The air humidity and temperature inside the tunnel were highly influenced by the outside meteorological conditions. Statistical models for NO concentration were developed and tested; correlations between predicted and measured NO were 0.81 for a partial least squares regression (PLS1) model based on CO and aerosol, and 0.77 for a linear regression model based only on aerosol. Hence, the ventilation control system should not be based solely on aerosol measurements. Since NO2 is the hazardous pollutant, modelling NO2 concentration rather than NO should be preferred in any further optimisation of the ventilation control.
Adams, Temitope F; Wongchai, Chatchawal; Chaidee, Anchalee; Pfeiffer, Wolfgang
2016-01-01
Plant essential oils have been suggested as a promising alternative to the established mosquito repellent DEET (N,N-diethyl-meta-toluamide). Searching for an assay with generally available equipment, we designed a new audiovisual assay of repellent activity against mosquitoes, "Singing in the Tube," testing single mosquitoes in Drosophila cultivation tubes. Statistics with regression analysis should compensate for the limitations of simple hardware. The assay was established with female Culex pipiens mosquitoes in 60 experiments, 120 h of audio recording, and 2580 estimations of the distance between mosquito sitting position and the chemical. Correlations between parameters of sitting position, flight activity pattern, and flight tone spectrum were analyzed. Regression analysis of psycho-acoustic data from the audio files (dB[A]) used a squared and modified sine function to determine the wing beat frequency, WBF ± SD (357 ± 47 Hz). Application of logistic regression defined the repelling velocity constant. The repelling velocity constant showed a decreasing order of efficiency of plant essential oils: rosemary (Rosmarinus officinalis), eucalyptus (Eucalyptus globulus), lavender (Lavandula angustifolia), citronella (Cymbopogon nardus), tea tree (Melaleuca alternifolia), clove (Syzygium aromaticum), lemon (Citrus limon), patchouli (Pogostemon cablin), DEET, cedar wood (Cedrus atlantica). In conclusion, we suggest (1) disease vector control (e.g., impregnation of bed nets) by the eight plant essential oils with repelling velocity superior to DEET, (2) simple mosquito repellency testing in Drosophila cultivation tubes, (3) automated approaches and room surveillance by generally available audio equipment (dB[A]: ISO standard 226), and (4) quantification of repellent activity by parameters of the audiovisual assay defined by correlation and regression analyses.
Trepczynski, Adam; Kutzner, Ines; Bergmann, Georg; Taylor, William R; Heller, Markus O
2014-05-01
The external knee adduction moment (EAM) is often considered a surrogate measure of the distribution of loads across the tibiofemoral joint during walking. This study was undertaken to quantify the relationship between the EAM and directly measured medial tibiofemoral contact forces (Fmed ) in a sample of subjects across a spectrum of activities. The EAM for 9 patients who underwent total knee replacement was calculated using inverse dynamics analysis, while telemetric implants provided Fmed for multiple repetitions of 10 activities, including walking, stair negotiation, sit-to-stand activities, and squatting. The effects of the factors "subject" and "activity" on the relationships between Fmed and EAM were quantified using mixed-effects regression analyses in terms of the root mean square error (RMSE) and the slope of the regression. Across subjects and activities a good correlation between peak EAM and Fmed values was observed, with an overall R(2) value of 0.88. However, the slope of the linear regressions varied between subjects by up to a factor of 2. At peak EAM and Fmed , the RMSE of the regression across all subjects was 35% body weight (%BW), while the maximum error was 127 %BW. The relationship between EAM and Fmed is generally good but varies considerably across subjects and activities. These findings emphasize the limitation of relying solely on the EAM to infer medial joint loading when excessive directed cocontraction of muscles exists and call for further investigations into the soft tissue-related mechanisms that modulate the internal forces at the knee. Copyright © 2014 by the American College of Rheumatology.
Hao, Z Q; Li, C M; Shen, M; Yang, X Y; Li, K H; Guo, L B; Li, X Y; Lu, Y F; Zeng, X Y
2015-03-23
Laser-induced breakdown spectroscopy (LIBS) with partial least squares regression (PLSR) has been applied to measuring the acidity of iron ore, which can be defined by the concentrations of the oxides CaO, MgO, Al₂O₃, and SiO₂. With conventional internal standard calibration, it is difficult to establish calibration curves for CaO, MgO, Al₂O₃, and SiO₂ in iron ore due to serious matrix effects. PLSR is effective in addressing this problem because of its excellent performance in compensating for matrix effects. In this work, fifty samples were used to construct the PLSR calibration models for the above-mentioned oxides. These calibration models were validated by the 10-fold cross-validation method with the minimum root-mean-square errors (RMSE). Another ten samples were used as a test set. The acidities were calculated according to the estimated concentrations of CaO, MgO, Al₂O₃, and SiO₂ using the PLSR models. The average relative error (ARE) and RMSE of the acidity were 3.65% and 0.0048, respectively, for the test samples.
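A hedged sketch of the calibration step described above: the number of PLS components is chosen by 10-fold cross-validation to minimize RMSE. The LIBS spectra and oxide concentrations are replaced here by synthetic arrays.

    # Hedged sketch of choosing the number of PLS components by 10-fold cross-validation with
    # minimum RMSE; LIBS spectra and oxide concentrations are replaced by synthetic arrays.
    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(8)
    X = rng.normal(size=(50, 300))                         # 50 calibration samples, 300 spectral variables
    y = X[:, :5].sum(axis=1) + rng.normal(0, 0.2, 50)      # e.g. CaO concentration (synthetic)

    rmse = []
    for k in range(1, 11):
        scores = cross_val_score(PLSRegression(n_components=k), X, y,
                                 cv=10, scoring="neg_root_mean_squared_error")
        rmse.append(-scores.mean())
    best_k = int(np.argmin(rmse)) + 1                      # component count with minimum CV RMSE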
Fadzillah, Nurrulhidayah Ahmad; Man, Yaakob bin Che; Rohman, Abdul; Rosman, Arieff Salleh; Ismail, Amin; Mustafa, Shuhaimi; Khatib, Alfi
2015-01-01
The authentication of food products with respect to the presence of components not allowed for certain religions, such as lard, is very important. In this study, we used proton nuclear magnetic resonance (¹H-NMR) spectroscopy for the analysis of butter adulterated with lard by simultaneous quantification of all proton-bearing compounds, and consequently all relevant sample classes. Since the spectra obtained were too complex to be analyzed visually by the naked eye, classification of the spectra was carried out. Multivariate calibration with partial least squares (PLS) regression was used for modelling the relationship between the actual and predicted lard content. The model yielded the highest coefficient of determination (R²) of 0.998 and the lowest root mean square error of calibration (RMSEC) of 0.0091% and root mean square error of prediction (RMSEP) of 0.0090, respectively. Cross-validation testing evaluated the predictive power of the model. The PLS model was shown to be a good model, as the intercepts of R²Y and Q²Y were 0.0853 and -0.309, respectively.
The influencing factors of CO2 emission intensity of Chinese agriculture from 1997 to 2014.
Long, Xingle; Luo, Yusen; Wu, Chao; Zhang, Jijian
2018-05-01
In China, agriculture produces the greatest chemical oxygen demand (COD) emissions in wastewater and the most methane (CH₄) emissions. It is imperative that agricultural pollution in China be reduced. This study investigated the influencing factors of the CO₂ emission intensity of Chinese agriculture from 1997 to 2014. We analyzed the influencing factors of the CO₂ emission intensity through a first-stage least-squares regression. We also analyzed the determinants of innovation through a second-stage least-squares regression. We found that innovation negatively affected the CO₂ emission intensity in the national model. FDI positively affected innovation in China. It is important to enhance indigenous innovation for green agriculture through labor training and collaboration between agriculture and academia.
Moreira, Luiz Felipe Pompeu Prado; Ferrari, Adriana Cristina; Moraes, Tiago Bueno; Reis, Ricardo Andrade; Colnago, Luiz Alberto; Pereira, Fabíola Manhas Verbi
2016-05-19
Time-domain nuclear magnetic resonance and chemometrics were used to predict color parameters, such as lightness (L*), redness (a*), and yellowness (b*), of beef (Longissimus dorsi muscle) samples. By analyzing the relaxation decays with multivariate models built with partial least-squares regression, the color quality parameters were predicted. The partial least-squares models showed low errors independent of the sample size, indicating the potential of the method. Mincing and weighing were not necessary to improve the predictive performance of the models. The reduction of the transverse relaxation time (T₂), measured by the Carr-Purcell-Meiboom-Gill pulse sequence, in darker beef compared with lighter beef can be explained by the lower relaxivity of the Fe²⁺ present in deoxymyoglobin and oxymyoglobin (red beef) and the higher relaxivity of the Fe³⁺ present in metmyoglobin (brown beef). These results indicate that time-domain nuclear magnetic resonance spectroscopy can become a useful tool for quality assessment of beef cattle on the bulk of the sample and through packages, because this technique is also widely applied to measure sensorial parameters, such as flavor, juiciness and tenderness, and physicochemical parameters, such as cooking loss, fat and moisture content, and instrumental tenderness using Warner-Bratzler shear force. Copyright © 2016 John Wiley & Sons, Ltd.
Farrés, Mireia; Piña, Benjamí; Tauler, Romà
2016-08-01
Copper-containing fungicides are used to protect vineyards from fungal infections. Copper residues in grapes at high concentrations are potentially toxic and affect the microorganisms living in vineyards, such as Saccharomyces cerevisiae. In this study, the response of the metabolic profiles of S. cerevisiae to different concentrations of copper sulphate (control, 1 mM, 3 mM and 6 mM) was analysed by liquid chromatography coupled to mass spectrometry (LC-MS) and multivariate curve resolution-alternating least squares (MCR-ALS) using an untargeted metabolomics approach. Peak areas of the MCR-ALS resolved elution profiles in control and Cu(II)-treated samples were compared using partial least squares regression (PLSR) and PLS-discriminant analysis (PLS-DA), and the intracellular metabolites contributing most to sample discrimination were selected and identified. Fourteen metabolites showed significant concentration changes upon Cu(II) exposure, following a dose-response effect. The observed changes were consistent with the expected effects of Cu(II) toxicity, including oxidative stress and DNA damage. This research confirmed that LC-MS based metabolomics coupled to chemometric methods is a powerful approach for discerning metabolomic changes in S. cerevisiae and for elucidating modes of toxicity of environmental stressors, including heavy metals like Cu(II).
McBurnett, Keith; Clemow, David; Williams, David; Villodas, Miguel; Wietecha, Linda; Barkley, Russell
2017-02-01
To evaluate effects of atomoxetine versus placebo on sluggish cognitive tempo (SCT) and determine factors affecting improvement of SCT in children with attention-deficit/hyperactivity disorder (ADHD) with dyslexia (ADHD+D) or dyslexia only. This is a post hoc analysis of a 16-week placebo-controlled, double-blind randomized phase of a previously reported atomoxetine study in children aged 10-16 years with ADHD+D, Dyslexia-only, or ADHD-only (no placebo arm). Least squares mean changes from baseline to endpoint for atomoxetine versus placebo on the Kiddie-Sluggish Cognitive Tempo Interview (K-SCT) (Parent, Teacher, and Youth) were analyzed using analysis of covariance and multiple regression (partial R²) analyses to test contributions of ADHD and dyslexia to improvements in K-SCT scores. Results were examined for the three informants within the three diagnostic groups (nine outcomes). Atomoxetine treatment was associated with significant reductions from baseline in seven of the nine outcomes using the p = 0.05 significance level, appropriate for exploratory analysis. When change in ADHD symptom severity was controlled, all of the seven SCT outcomes remained significant; changes in effect sizes were minimal. Regression analyses using SCT change as the criterion found a significant contribution by inattention change only for parent report, whereas baseline SCT severity was a significant predictor in the randomized groups with the exception of teacher report in the Dyslexia-only group. Given that controlling for change in ADHD symptoms had little effect on change in SCT scores, findings suggest that change in SCT is substantially independent of change in ADHD. By inference, SCT and its response to treatment is a partially distinct phenomenon from ADHD response. Regression analyses did not reveal global effects of inattention change on SCT change; instead, baseline SCT severity was the strongest predictor of placebo-controlled treatment effect on SCT. Atomoxetine effects on SCT appear to be best predicted by how much room for improvement exists for SCT rather than by severity or improvement in inattention. NCT00607919, www.clinicaltrials.gov.
Hasbrouck, W.P.
1983-01-01
Processing of data taken with the U.S. Geological Survey's coal-seismic system is done with a desktop, stand-alone computer. Programs for this computer are written in the extended BASIC language used by the Tektronix 4051 Graphic System. This report presents computer programs to perform X-square/T-square analyses and to plot normal moveout lines on a seismogram overlay.
Yang, Yongbin; Buys, David R.; Judd, Suzanne E.; Gower, Barbara A.; Locher, Julie L.
2012-01-01
The purpose of this study was to examine food preferences of older adults living in the Black Belt Region of the Southeastern United States and the extent to which food preferences vary according to ethnicity, gender, and educational level. 270 older adults who were receiving home health services were interviewed in their homes and were queried regarding their favorite foods. Descriptive statistics were used to characterize the sample. Chi-square analysis or one-way analysis of variance was used, where appropriate, in bivariate analyses, and logistic regression models were used in multivariate analyses. A total of 1,857 favorite foods were reported (mean per person = 6.88). The top ten favorite foods reported included: 1) chicken (of any kind), 2) collard greens, 3) cornbread, 4) green or string beans, 5) fish (fried catfish is implied), 6) turnip greens, 7) potatoes, 8) apples, 9) tomatoes, fried chicken, and eggs tied, and 10) steak and ice cream tied. African Americans and those with lower levels of education were more likely to report traditional Southern foods among their favorite foods and had a more limited repertoire of favorite foods. Findings have implications for understanding health disparities that may be associated with diet and for the development of culturally-appropriate nutrition interventions. PMID:23262296
Linhart, S. Mike; Nania, Jon F.; Sanders, Curtis L.; Archfield, Stacey A.
2012-01-01
The U.S. Geological Survey (USGS) maintains approximately 148 real-time streamgages in Iowa for which daily mean streamflow information is available, but daily mean streamflow data commonly are needed at locations where no streamgages are present. Therefore, the USGS conducted a study as part of a larger project in cooperation with the Iowa Department of Natural Resources to develop methods to estimate daily mean streamflow at locations in ungaged watersheds in Iowa by using two regression-based statistical methods. The regression equations for the statistical methods were developed from historical daily mean streamflow and basin characteristics from streamgages within the study area, which includes the entire State of Iowa and adjacent areas within a 50-mile buffer of Iowa in neighboring states. Results of this study can be used with other techniques to determine the best method for application in Iowa and can be used to produce a Web-based geographic information system tool to compute streamflow estimates automatically. The Flow Anywhere statistical method is a variation of the drainage-area-ratio method, which transfers same-day streamflow information from a reference streamgage to another location by using the daily mean streamflow at the reference streamgage and the drainage-area ratio of the two locations. The Flow Anywhere method modifies the drainage-area-ratio method in order to regionalize the equations for Iowa and determine the best reference streamgage from which to transfer same-day streamflow information to an ungaged location. Data used for the Flow Anywhere method were retrieved for 123 continuous-record streamgages located in Iowa and within a 50-mile buffer of Iowa. The final regression equations were computed by using either left-censored regression techniques with a low limit threshold set at 0.1 cubic feet per second (ft3/s) and the daily mean streamflow for the 15th day of every other month, or by using an ordinary-least-squares multiple linear regression method and the daily mean streamflow for the 15th day of every other month. The Flow Duration Curve Transfer method was used to estimate unregulated daily mean streamflow from the physical and climatic characteristics of gaged basins. For the Flow Duration Curve Transfer method, daily mean streamflow quantiles at the ungaged site were estimated with the parameter-based regression model, which results in a continuous daily flow-duration curve (the relation between exceedance probability and streamflow for each day of observed streamflow) at the ungaged site. By the use of a reference streamgage, the Flow Duration Curve Transfer is converted to a time series. Data used in the Flow Duration Curve Transfer method were retrieved for 113 continuous-record streamgages in Iowa and within a 50-mile buffer of Iowa. The final statewide regression equations for Iowa were computed by using a weighted-least-squares multiple linear regression method and were computed for the 0.01-, 0.05-, 0.10-, 0.15-, 0.20-, 0.30-, 0.40-, 0.50-, 0.60-, 0.70-, 0.80-, 0.85-, 0.90-, and 0.95-exceedance probability statistics determined from the daily mean streamflow with a reporting limit set at 0.1 ft3/s. The final statewide regression equation for Iowa computed by using left-censored regression techniques was computed for the 0.99-exceedance probability statistic determined from the daily mean streamflow with a low limit threshold and a reporting limit set at 0.1 ft3/s. 
For the Flow Anywhere method, results of the validation study conducted by using six streamgages show that differences between the root-mean-square error and the mean absolute error ranged from 1,016 to 138 ft3/s, with the larger value signifying a greater occurrence of outliers between observed and estimated streamflows. Root-mean-square-error values ranged from 1,690 to 237 ft3/s. Values of the percent root-mean-square error ranged from 115 percent to 26.2 percent. The logarithm (base 10) streamflow percent root-mean-square error ranged from 13.0 to 5.3 percent. Root-mean-square-error observations standard-deviation-ratio values ranged from 0.80 to 0.40. Percent-bias values ranged from 25.4 to 4.0 percent. Untransformed streamflow Nash-Sutcliffe efficiency values ranged from 0.84 to 0.35. The logarithm (base 10) streamflow Nash-Sutcliffe efficiency values ranged from 0.86 to 0.56. For the streamgage with the best agreement between observed and estimated streamflow, higher streamflows appear to be underestimated. For the streamgage with the worst agreement between observed and estimated streamflow, low flows appear to be overestimated whereas higher flows seem to be underestimated. Estimated cumulative streamflows for the period October 1, 2004, to September 30, 2009, are underestimated by -25.8 and -7.4 percent for the closest and poorest comparisons, respectively. For the Flow Duration Curve Transfer method, results of the validation study conducted by using the same six streamgages show that differences between the root-mean-square error and the mean absolute error ranged from 437 to 93.9 ft3/s, with the larger value signifying a greater occurrence of outliers between observed and estimated streamflows. Root-mean-square-error values ranged from 906 to 169 ft3/s. Values of the percent root-mean-square-error ranged from 67.0 to 25.6 percent. The logarithm (base 10) streamflow percent root-mean-square error ranged from 12.5 to 4.4 percent. Root-mean-square-error observations standard-deviation-ratio values ranged from 0.79 to 0.40. Percent-bias values ranged from 22.7 to 0.94 percent. Untransformed streamflow Nash-Sutcliffe efficiency values ranged from 0.84 to 0.38. The logarithm (base 10) streamflow Nash-Sutcliffe efficiency values ranged from 0.89 to 0.48. For the streamgage with the closest agreement between observed and estimated streamflow, there is relatively good agreement between observed and estimated streamflows. For the streamgage with the poorest agreement between observed and estimated streamflow, streamflows appear to be substantially underestimated for much of the time period. Estimated cumulative streamflow for the period October 1, 2004, to September 30, 2009, are underestimated by -9.3 and -22.7 percent for the closest and poorest comparisons, respectively.
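For orientation, the plain drainage-area-ratio transfer that the Flow Anywhere method builds on is sketched below; the report's regionalized regression equations, censoring, and reference-streamgage selection are not reproduced, and the numbers are hypothetical.

    # The plain drainage-area-ratio transfer underlying the Flow Anywhere method; the report's
    # regionalized equations and reference-streamgage selection are not reproduced.
    def drainage_area_ratio_transfer(q_reference, area_reference, area_ungaged):
        """Estimate same-day daily mean streamflow at an ungaged site (all inputs hypothetical)."""
        return q_reference * (area_ungaged / area_reference)

    estimate = drainage_area_ratio_transfer(q_reference=850.0,    # ft3/s at the reference streamgage
                                            area_reference=420.0, # square miles
                                            area_ungaged=150.0)   # square miles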
Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello
2016-05-01
Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite the advantages of turbidimetric bioassays compared with other methods, they have limitations concerning the linearity and range of the dose-response curve determination. Here, we propose using partial least squares (PLS) regression to overcome these limitations and to improve the prediction of the relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramycin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growth was measured as absorbance up to 180 and 300 min for the apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbance or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression; however, there was significant deviation from linearity. Thus, they could not be used for relative potency estimation. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting, and it improved the linear range of the turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from official agar diffusion methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics, with significant advantages when compared to conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.
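A hedged sketch of this kind of PLS calibration, using scikit-learn on synthetic kinetic absorbance curves rather than the authors' data; the dose levels, growth model, and number of latent variables are assumptions for illustration only.

```python
# Sketch: PLS regression mapping kinetic absorbance readings to log
# antibiotic concentration, with cross-validated error. Synthetic data.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.default_rng(0)
log_conc = np.repeat(np.log10([2.0, 4.0, 8.0, 16.0, 32.0]), 6)    # 5 doses x 6 replicate wells
t = np.linspace(0.0, 1.0, 60)                                      # 60 kinetic absorbance readings
growth_rate = 1.0 / (1.0 + log_conc)[:, None]                      # more antibiotic -> slower growth
X = 1.0 - np.exp(-3.0 * growth_rate * t) + rng.normal(0, 0.01, (log_conc.size, t.size))
y = log_conc

pls = PLSRegression(n_components=3)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
y_cv = cross_val_predict(pls, X, y, cv=cv).ravel()
rmsecv = np.sqrt(np.mean((y_cv - y) ** 2))
print(f"cross-validated RMSE in log10 concentration: {rmsecv:.3f}")
```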
Land area change in coastal Louisiana (1932 to 2016)
Couvillion, Brady R.; Beck, Holly; Schoolmaster, Donald; Fischer, Michelle
2017-07-12
Coastal Louisiana wetlands are one of the most critically threatened environments in the United States. These wetlands are in peril because Louisiana currently experiences greater coastal wetland loss than all other States in the contiguous United States combined. The analyses of landscape change presented here have utilized historical surveys and aerial and satellite data to quantify landscape changes from 1932 to 2016. Analyses show that coastal Louisiana has experienced a net change in land area of approximately -4,833 square kilometers (modeled estimate: -5,197 +/- 443 square kilometers) from 1932 to 2016. This net change in land area amounts to a decrease of approximately 25 percent of the 1932 land area. Previous studies have presented linear rates of change over multidecadal time periods, which unintentionally suggest that wetland change occurs at a constant rate, although in many cases, wetland change rates vary with time. A penalized regression spline technique was used to determine the model that best fit the data, rather than fitting the data with linear trends. Trend analyses from model fits indicate that coastwide rates of wetland change have varied from -83.5 +/- 11.8 square kilometers per year to -28.01 +/- 16.37 square kilometers per year. To put these numbers into perspective, this equates to long-term average loss rates of approximately one American football field's worth of coastal wetlands every 34 minutes when losses were most rapid, and one every 100 minutes at more recent, slower rates. Of note is the slowing of the rate of wetland change since its peak in the mid-1970s. Not only have rates of wetland loss been decreasing since that time, but a further rate reduction has been observed since 2010. Possible reasons for this reduction include recovery from the lows that followed the hurricanes of 2005 and 2008, the lack of major storms in the past 8 years, a possible slowing of subsidence rates, the reduction in and relocation of oil and gas extraction and infrastructure since the peak of such activities in the late 1960s, and restoration activities. In addition, many wetlands in more exposed positions in the landscape have already been lost. Most notable of the factors listed above is the lack of major storms over the past 8 years. The coastwide net “stability” in land area observed over the past 6–8 years does not imply that loss has ceased. Future disturbance events such as a major hurricane impact could change the trajectory of the rates. Sea-level rise is projected to increase at an exponential rate, and that would also expedite the rate of wetland loss.
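The survey fits a penalized regression spline to the land-area time series and reads rates of change from the fitted curve; the sketch below uses SciPy's smoothing spline as a stand-in for that model, with invented land-area values rather than the published estimates.

```python
# Hedged sketch: a smoothing spline fitted to a land-area time series,
# with rates of change taken from the derivative of the fitted curve.
# The land areas below are invented placeholders, not the survey data.
import numpy as np
from scipy.interpolate import UnivariateSpline

years = np.array([1932, 1956, 1973, 1985, 1998, 2004, 2010, 2016], float)
land_km2 = np.array([19300, 18200, 17200, 16300, 15600, 15200, 14700, 14500], float)

spline = UnivariateSpline(years, land_km2, k=3, s=2.0e4)   # s controls the smoothing penalty
rate = spline.derivative()                                  # km2 per year along the fitted curve
for yr in (1940, 1975, 2015):
    print(yr, round(float(rate(yr)), 1), "km2/yr")
```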
Soil salt content estimation in the Yellow River Delta with satellite hyperspectral data
Weng, Yongling; Gong, Peng; Zhu, Zhi-Liang
2008-01-01
Soil salinization is one of the most common land degradation processes and is a severe environmental hazard. The primary objective of this study is to investigate the potential of predicting salt content in soils with hyperspectral data acquired with EO-1 Hyperion. Both partial least-squares regression (PLSR) and conventional multiple linear regression (MLR) methods, such as stepwise regression (SWR), were tested as prediction models. PLSR is commonly used to overcome the problem caused by high-dimensional and correlated predictors. Chemical analysis of 95 samples collected from the top layer of soils in the Yellow River delta area shows that salt content was high on average, and the dominant chemicals in the saline soil were NaCl and MgCl2. Multivariate models were established between soil salt content and hyperspectral data. Our results indicate that the PLSR technique with laboratory spectral data has a strong prediction capacity. Spectral bands at 1487-1527, 1971-1991, 2032-2092, and 2163-2355 nm possessed large absolute values of regression coefficients, with the largest coefficient at 2203 nm. We obtained a root mean squared error (RMSE) for calibration (with 61 samples) of RMSEC = 0.753 (R2 = 0.893) and a root mean squared error for validation (with 30 samples) of RMSEV = 0.574. The prediction model was applied on a pixel-by-pixel basis to a Hyperion reflectance image to yield a quantitative surface distribution map of soil salt content. The result was validated successfully against 38 sampling points. We obtained an RMSE estimate of 1.037 (R2 = 0.784) for the soil salt content map derived by the PLSR model. The salinity map derived from the SWR model shows that predicted values are higher than the true values. These results demonstrate that the PLSR method is a more suitable technique than stepwise regression for quantitative estimation of soil salt content in a large area. © 2008 CASI.
Geodesic least squares regression on information manifolds
DOE Office of Scientific and Technical Information (OSTI.GOV)
Verdoolaege, Geert, E-mail: geert.verdoolaege@ugent.be
We present a novel regression method targeted at situations with significant uncertainty on both the dependent and independent variables or with non-Gaussian distribution models. Unlike the classic regression model, the conditional distribution of the response variable suggested by the data need not be the same as the modeled distribution. Instead, they are matched by minimizing the Rao geodesic distance between them. This yields a more flexible regression method that is less constrained by the assumptions imposed through the regression model. As an example, we demonstrate the improved resistance of our method against some flawed model assumptions and we apply this to scaling laws in magnetic confinement fusion.
Least Squares Moving-Window Spectral Analysis.
Lee, Young Jong
2017-08-01
Least squares regression is proposed as a moving-window method for analysis of a series of spectra acquired as a function of external perturbation. The least squares moving-window (LSMW) method can be considered an extended form of Savitzky-Golay differentiation for nonuniform perturbation spacing. LSMW is characterized in terms of moving-window size, perturbation spacing type, and intensity noise. Simulation results from LSMW are compared with results from other numerical differentiation methods, such as single-interval differentiation, autocorrelation moving-window, and perturbation correlation moving-window methods. It is demonstrated that this simple LSMW method can be useful for quantitative analysis of nonuniformly spaced spectral data with high-frequency noise.
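The core idea can be sketched as fitting a low-order polynomial by least squares inside each window of a (possibly nonuniformly spaced) perturbation axis and reporting the first derivative of the local fit at the window centre; the code below is a generic illustration of that idea, not the authors' implementation.

```python
# Generic moving-window least-squares derivative for nonuniform spacing.
import numpy as np

def lsmw_first_derivative(x, y, window=5, degree=2):
    x, y = np.asarray(x, float), np.asarray(y, float)
    half = window // 2
    deriv = np.full(x.size, np.nan)
    for i in range(half, x.size - half):
        xs = x[i - half:i + half + 1] - x[i]        # centre the window to stabilise the fit
        ys = y[i - half:i + half + 1]
        coef = np.polyfit(xs, ys, degree)           # local least-squares polynomial
        deriv[i] = coef[-2]                         # d/dx of the local fit at the window centre
    return deriv

x = np.sort(np.random.default_rng(1).uniform(0, 10, 60))   # nonuniform perturbation spacing
y = np.sin(x) + np.random.default_rng(2).normal(0, 0.02, x.size)
print(lsmw_first_derivative(x, y)[5:8])            # should track cos(x) near those points
```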
Discordance between net analyte signal theory and practical multivariate calibration.
Brown, Christopher D
2004-08-01
Lorber's concept of net analyte signal is reviewed in the context of classical and inverse least-squares approaches to multivariate calibration. It is shown that, in the presence of device measurement error, the classical and inverse calibration procedures have radically different theoretical prediction objectives, and the assertion that the popular inverse least-squares procedures (including partial least squares, principal components regression) approximate Lorber's net analyte signal vector in the limit is disproved. Exact theoretical expressions for the prediction error bias, variance, and mean-squared error are given under general measurement error conditions, which reinforce the very discrepant behavior between these two predictive approaches, and Lorber's net analyte signal theory. Implications for multivariate figures of merit and numerous recently proposed preprocessing treatments involving orthogonal projections are also discussed.
SUPPLEMENTARY COMPARISON: EUROMET.L-S10 Comparison of squareness measurements
NASA Astrophysics Data System (ADS)
Mokros, Jiri
2005-01-01
The idea of performing a comparison of squareness resulted from the need to review the MRA Appendix C, Category 90° square. At its meeting in October 1999 (in Prague), it was decided to carry out a first comparison of squareness measurements within the framework of EUROMET, numbered #570, starting in 2000, with the Slovak Institute of Metrology (SMU) as the pilot laboratory. During the preparation stage of the project, it was agreed that it should be submitted as a EUROMET supplementary comparison in the framework of the Mutual Recognition Arrangement (MRA) of the Metre Convention and would boost confidence in calibration and measurement certificates issued by the participating national metrology institutes. The aim of the comparison of squareness measurement was to compare and verify the declared calibration and measurement capabilities of participating laboratories and to investigate the effect of systematic influences in the measurement process and their elimination. Eleven NMIs from the EUROMET region carried out this project. Two standards were calibrated: a granite squareness standard of rectangular shape and a cylindrical steel squareness standard with marked positions for the profile lines. The following parameters had to be calibrated: for the granite squareness standard, the interior angle γB between two lines AB and AC (envelope - LS regression) fitted through the measured profiles and/or the interior angle γLS between two LS regression lines AB and AC fitted through the measured profiles; for the cylindrical squareness standard, the interior angles γ0°, γ90°, γ180°, γ270° between the LS regression lines fitted through the measurement profiles at 0°, 90°, 180°, 270° and the envelope plane of the basis (resting on a surface plate); and the local LS straightness deviation for all measured profiles (2 and 4) of both standards. The results of the comparison are the deviations of profiles and angles measured by the individual NMIs from the reference values. The reference values were computed as the weighted mean of data from participating laboratories, with some laboratories excluded on the basis of statistical evaluation. Graphical interpretations of all deviations are contained in the Final Report. In order to compare the individual deviations mutually (25 profiles for the granite square and 44 profiles for the cylinder), graphical illustrations of 'standard deviations' and both extreme values (maximum and minimum) of the deviations were created. This regional supplementary comparison has provided independent information about the metrological properties of the measuring equipment and methods used by the participating NMIs. The Final Report does not contain En values. Participants could not estimate some contributions in the uncertainty budget on the basis of previous comparisons, since no comparison of this kind had ever been organized; therefore the En value cannot reflect the actual state of the given NMI. Instead of En, an analysis was performed by means of the Grubbs test according to ISO 5725-2. This comparison provided information about the state of provision of metrological services in the field of large-square measurement. The Final Report appears in Appendix B of the BIPM key comparison database (kcdb.bipm.org) and has been peer-reviewed and approved for publication by EUROMET, according to the provisions of the Mutual Recognition Arrangement (MRA).
Douglas, R K; Nawar, S; Alamar, M C; Mouazen, A M; Coulon, F
2018-03-01
Visible and near infrared spectrometry (vis-NIRS) coupled with data mining techniques can offer fast and cost-effective quantitative measurement of total petroleum hydrocarbons (TPH) in contaminated soils. The literature, however, shows significant differences in vis-NIRS performance between linear and non-linear calibration methods. This study compared the performance of linear partial least squares regression (PLSR) with nonlinear random forest (RF) regression for the calibration of vis-NIRS when analysing TPH in soils. Eighty-eight soil samples (3 uncontaminated and 85 contaminated) collected from three sites located in the Niger Delta were scanned using an analytical spectral device (ASD) spectrophotometer (350-2500 nm) in diffuse reflectance mode. Sequential ultrasonic solvent extraction-gas chromatography (SUSE-GC) was used as the reference quantification method for TPH, which equals the sum of the aliphatic and aromatic fractions ranging between C10 and C35. Prior to model development, spectra were subjected to pre-processing including noise cut, maximum normalization, first derivative, and smoothing. Then 65 samples were selected as the calibration set and the remaining 20 samples as the validation set. Both vis-NIR spectrometry and gas chromatography profiles of the 85 soil samples were subjected to RF and PLSR with leave-one-out cross-validation (LOOCV) for the calibration models. Results showed that the RF calibration model, with a coefficient of determination (R2) of 0.85, a root mean square error of prediction (RMSEP) of 68.43 mg kg-1, and a residual prediction deviation (RPD) of 2.61, outperformed PLSR (R2=0.63, RMSEP=107.54 mg kg-1, and RPD=2.55) in cross-validation. These results indicate that the RF modelling approach accounts for the nonlinearity of the soil spectral responses, hence providing significantly higher prediction accuracy compared with the linear PLSR. It is recommended to adopt vis-NIRS coupled with the RF modelling approach as a portable and cost-effective method for the rapid quantification of TPH in soils. Copyright © 2017 Elsevier B.V. All rights reserved.
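A hedged sketch of the model comparison described above, pitting PLSR against random forest regression under leave-one-out cross-validation with scikit-learn; the synthetic "spectra", target construction, and model settings are placeholders, not the soil data or the authors' tuning.

```python
# Sketch: PLSR versus random forest with leave-one-out cross-validation.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict

rng = np.random.default_rng(0)
n_samples, n_bands = 65, 200
X = rng.normal(0, 1, (n_samples, n_bands)).cumsum(axis=1)        # smooth, correlated "spectra"
tph = 50 + 5 * X[:, 60] + 0.5 * X[:, 120] ** 2 + rng.normal(0, 5, n_samples)  # nonlinear target

loo = LeaveOneOut()
for name, model in [("PLSR", PLSRegression(n_components=8)),
                    ("RF", RandomForestRegressor(n_estimators=100, random_state=0))]:
    pred = np.asarray(cross_val_predict(model, X, tph, cv=loo)).ravel()
    rmse = np.sqrt(np.mean((pred - tph) ** 2))
    r2 = 1 - np.sum((pred - tph) ** 2) / np.sum((tph - tph.mean()) ** 2)
    print(f"{name}: R2={r2:.2f}, LOOCV RMSE={rmse:.1f} (synthetic units)")
```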
Regression Simulation of Turbine Engine Performance - Accuracy Improvement (TASK IV)
1978-09-30
33 21 Generalized Form of the Regression Equation for the Optimized Polynomial Exponent M ethod...altitude, Mach number and power setting combinations were generated during the ARES evaluation. The orthogonal Latin Square selection procedure...pattern. In data generation , the low (L), mid (M), and high (H) values of a variable are not always the same. At some of the corner points where
ERIC Educational Resources Information Center
Le, Huy; Marcus, Justin
2012-01-01
This study used Monte Carlo simulation to examine the properties of the overall odds ratio (OOR), which was recently introduced as an index for overall effect size in multiple logistic regression. It was found that the OOR was relatively independent of study base rate and performed better than most commonly used R-square analogs in indexing model…
USDA-ARS?s Scientific Manuscript database
Illegal use of nitrogen-rich melamine (C3H6N6) to boost perceived protein content of food products such as milk, infant formula, frozen yogurt, pet food, biscuits, and coffee drinks has caused serious food safety problems. Conventional methods to detect melamine in foods, such as Enzyme-linked immun...
Image interpolation via regularized local linear regression.
Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang
2011-12-01
The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even a small number of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting from the linear regression model, replacing the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l2-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance compared with state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
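At the heart of an RLLR-style scheme is a ridge-regularised, locally weighted least-squares fit, beta = (X^T W X + lambda I)^{-1} X^T W y; the sketch below shows only that step on an invented pixel neighbourhood and omits the manifold-regularisation term of the full method.

```python
# Ridge-regularised weighted least squares on a local pixel neighbourhood.
import numpy as np

def regularized_local_fit(X, y, weights, lam=1e-2):
    """Weighted least squares with an l2 (ridge) penalty on the coefficients."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, float)])   # intercept + local coordinates
    W = np.diag(np.asarray(weights, float))
    A = X.T @ W @ X + lam * np.eye(X.shape[1])
    beta = np.linalg.solve(A, X.T @ W @ y)
    return beta                                                     # [intercept, slopes...]

# Local neighbourhood of 6 known pixels (coordinates relative to the pixel
# being interpolated), their intensities, and MLS-style distance weights.
coords = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, -2], [0, 2]], float)
vals = np.array([102.0, 110.0, 98.0, 115.0, 96.0, 118.0])
w = np.exp(-np.sum(coords ** 2, axis=1) / 4.0)
beta = regularized_local_fit(coords, vals, w)
print("interpolated value at the centre pixel:", round(beta[0], 2))
```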
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Bader, Jon B.
2009-01-01
Calibration data of a wind tunnel sting balance was processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm's two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can directly be derived from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
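The two recommended models can be reproduced in outline with ordinary least squares: the gage-output difference regressed on an intercept and normal force, and the gage-output sum on an intercept, the pitching moment, and its square; the simulated loads and outputs below are placeholders, not the calibration data.

```python
# Ordinary least-squares fits of the two recommended regression models.
import numpy as np

rng = np.random.default_rng(0)
NF = rng.uniform(-100, 100, 40)                 # normal force (simulated)
PM = rng.uniform(-50, 50, 40)                   # pitching moment (simulated)
diff = 0.02 + 0.011 * NF + rng.normal(0, 1e-3, 40)
summ = 0.05 + 0.004 * PM + 2e-5 * PM ** 2 + rng.normal(0, 1e-3, 40)

X_diff = np.column_stack([np.ones(NF.size), NF])
X_sum = np.column_stack([np.ones(PM.size), PM, PM ** 2])
coef_diff, *_ = np.linalg.lstsq(X_diff, diff, rcond=None)
coef_sum, *_ = np.linalg.lstsq(X_sum, summ, rcond=None)
print("difference model [intercept, NF]:", np.round(coef_diff, 4))
print("sum model [intercept, PM, PM^2]:", np.round(coef_sum, 6))
```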
From direct-space discrepancy functions to crystallographic least squares.
Giacovazzo, Carmelo
2015-01-01
Crystallographic least squares are a fundamental tool for crystal structure analysis. In this paper their properties are derived from functions estimating the degree of similarity between two electron-density maps. The new approach leads also to modifications of the standard least-squares procedures, potentially able to improve their efficiency. The role of the scaling factor between observed and model amplitudes is analysed: the concept of unlocated model is discussed and its scattering contribution is combined with that arising from the located model. Also, the possible use of an ancillary parameter, to be associated with the classical weight related to the variance of the observed amplitudes, is studied. The crystallographic discrepancy factors, basic tools often combined with least-squares procedures in phasing approaches, are analysed. The mathematical approach here described includes, as a special case, the so-called vector refinement, used when accurate estimates of the target phases are available.
Sex segregation in undergraduate engineering majors
NASA Astrophysics Data System (ADS)
Litzler, Elizabeth
Gender inequality in engineering persists in spite of women reaching parity in college enrollments and degrees granted. To date, no analyses of educational sex segregation have comprehensively examined segregation within one discipline. To move beyond traditional methods of studying the long-standing stratification by field of study in higher education, I explore gender stratification within one field: engineering. This dissertation investigates why some engineering disciplines have a greater representation of women than other engineering disciplines. I assess the individual and institutional factors and conditions associated with women's representation in certain engineering departments and compare the mechanisms affecting women's and men's choice of majors. I use national data from the Engineering Workforce Commission, survey data from 21 schools in the Project to Assess Climate in Engineering study, and Carnegie Foundation classification information to study sex segregation in engineering majors from multiple perspectives: the individual, major, institution, and country. I utilize correlations, t-tests, cross-tabulations, log-linear modeling, multilevel logistic regression and weighted least squares regression to test the relative utility of alternative explanations for women's disproportionate representation across engineering majors. As a whole, the analyses illustrate the importance of context and environment for women's representation in engineering majors. Hypotheses regarding hostile climate and discrimination find wide support across different analyses, suggesting that women's under-representation in certain engineering majors is not a question of choice or ability. However, individual level factors such as having engineering coursework prior to college show an especially strong association with student choice of major. Overall, the analyses indicate that institutions matter, albeit less for women, and women's under-representation in engineering is not reducible to individual choice. This dissertation provides a broad, descriptive view of the state of sex segregation in engineering as well as a careful analysis of how individual and institutional factors inhibit or encourage sex segregation. This study contributes to the research literature through the use of novel data, testing of occupational segregation theories, and the use of multiple levels of analysis. The analyses provide new insight into an enduring phenomenon, and suggest new avenues for understanding sex segregation in higher education.
Least-Squares Data Adjustment with Rank-Deficient Data Covariance Matrices
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, J.G.
2011-07-01
A derivation of the linear least-squares adjustment formulae is required that avoids the assumption that the covariance matrix of prior parameters can be inverted. Possible proofs are of several kinds, including: (i) extension of standard results for the linear regression formulae, and (ii) minimization by differentiation of a quadratic form of the deviations in parameters and responses. In this paper, the least-squares adjustment equations are derived in both these ways, while explicitly assuming that the covariance matrix of prior parameters is singular. It will be proved that the solutions are unique and that, contrary to statements that have appeared in the literature, the least-squares adjustment problem is not ill-posed. No modification is required to the adjustment formulae that have been used in the past in the case of a singular covariance matrix for the priors. In conclusion: the linear least-squares adjustment formula that has been used in the past is valid in the case of a singular covariance matrix of prior parameters. Furthermore, it provides a unique solution. Statements in the literature to the effect that the problem is ill-posed are wrong, and no regularization of the problem is required. This has been proved in the present paper by two methods, while explicitly assuming that the covariance matrix of prior parameters is singular: (i) extension of standard results for the linear regression formulae, and (ii) minimization by differentiation of a quadratic form of the deviations in parameters and responses. No modification is needed to the adjustment formulae that have been used in the past. (author)
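A small numerical check of the point being made: one standard form of the linear least-squares adjustment, x' = x0 + C S^T (S C S^T + V)^(-1) (m - S x0), inverts only S C S^T + V and therefore remains well defined when the prior parameter covariance C is singular. The matrices below are invented for illustration.

```python
# Least-squares adjustment with a rank-deficient prior covariance matrix.
import numpy as np

x0 = np.array([1.0, 2.0, 3.0])                 # prior parameters
C = np.array([[0.04, 0.04, 0.0],               # rank-deficient prior covariance (rank 2)
              [0.04, 0.04, 0.0],
              [0.0,  0.0,  0.09]])
S = np.array([[1.0, 1.0, 0.0],                 # sensitivity of responses to parameters
              [0.0, 1.0, 1.0]])
V = np.diag([0.01, 0.01])                      # response (measurement) covariance
m = np.array([3.4, 4.8])                       # measured responses

K = C @ S.T @ np.linalg.inv(S @ C @ S.T + V)   # gain; no inverse of C needed
x_adj = x0 + K @ (m - S @ x0)
C_adj = C - K @ S @ C
print("rank of prior covariance:", np.linalg.matrix_rank(C))
print("adjusted parameters:", np.round(x_adj, 3))
```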
Techniques for estimating flood-peak discharges of rural, unregulated streams in Ohio
Koltun, G.F.
2003-01-01
Regional equations for estimating 2-, 5-, 10-, 25-, 50-, 100-, and 500-year flood-peak discharges at ungaged sites on rural, unregulated streams in Ohio were developed by means of ordinary and generalized least-squares (GLS) regression techniques. One-variable, simple equations and three-variable, full-model equations were developed on the basis of selected basin characteristics and flood-frequency estimates determined for 305 streamflow-gaging stations in Ohio and adjacent states. The average standard errors of prediction ranged from about 39 to 49 percent for the simple equations, and from about 34 to 41 percent for the full-model equations. Flood-frequency estimates determined by means of log-Pearson Type III analyses are reported along with weighted flood-frequency estimates, computed as a function of the log-Pearson Type III estimates and the regression estimates. Values of explanatory variables used in the regression models were determined from digital spatial data sets by means of a geographic information system (GIS), with the exception of drainage area, which was determined by digitizing the area within basin boundaries manually delineated on topographic maps. Use of GIS-based explanatory variables represents a major departure in methodology from that described in previous reports on estimating flood-frequency characteristics of Ohio streams. Examples are presented illustrating application of the regression equations to ungaged sites on ungaged and gaged streams. A method is provided to adjust regression estimates for ungaged sites by use of weighted and regression estimates for a gaged site on the same stream. A region-of-influence method, which employs a computer program to estimate flood-frequency characteristics for ungaged sites based on data from gaged sites with similar characteristics, was also tested and compared to the GLS full-model equations. For all recurrence intervals, the GLS full-model equations had superior prediction accuracy relative to the simple equations and therefore are recommended for use.
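For orientation, regional flood-frequency equations of this kind are typically applied at an ungaged site as a power law in the basin characteristics; the coefficients and site values in the sketch below are invented placeholders, not the Ohio equations.

```python
# Hedged illustration of applying a regional flood-frequency regression
# equation at an ungaged site; coefficients are invented, not Ohio's.

def flood_peak_estimate(a, b, c, drainage_area_sqmi, main_channel_slope_ft_per_mi):
    """Q_T = a * A**b * S**c, the usual power-law shape of such equations."""
    return a * drainage_area_sqmi ** b * main_channel_slope_ft_per_mi ** c

q100 = flood_peak_estimate(a=120.0, b=0.75, c=0.35,
                           drainage_area_sqmi=42.0, main_channel_slope_ft_per_mi=18.5)
print(f"illustrative 100-year peak estimate: {q100:.0f} ft3/s")
```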
Padula, William V; Mishra, Manish K; Weaver, Christopher D; Yilmaz, Taygan; Splaine, Mark E
2012-06-01
To demonstrate complementary results of regression and statistical process control (SPC) chart analyses for hospital-acquired pressure ulcers (HAPUs), and identify possible links between changes and opportunities for improvement between hospital microsystems and macrosystems. Ordinary least squares and panel data regression of retrospective hospital billing data, and SPC charts of prospective patient records for a US tertiary-care facility (2004-2007). A prospective cohort of hospital inpatients at risk for HAPUs was the study population. There were 337 HAPU incidences hospital-wide among 43,844 inpatients. A probit regression model estimated the association of age, gender, and length of stay with HAPU incidence (pseudo R(2)=0.096). Panel data analysis determined that for each additional day in the hospital, there was a 0.28% increase in the likelihood of HAPU incidence. A p-chart of HAPU incidence showed a mean incidence rate of 1.17% remaining in statistical control. A t-chart showed the average time between events for the last 25 HAPUs was 13.25 days. There was one 57-day period between two incidences during the observation period. A p-chart addressing Braden scale assessments showed that 40.5% of all patients were risk stratified for HAPUs upon admission. SPC charts complement standard regression analysis. SPC amplifies patient outcomes at the microsystem level and is useful for guiding quality improvement. Macrosystems should monitor effective quality improvement initiatives in microsystems and aid the spread of successful initiatives to other microsystems, followed by system-wide analysis with regression. Although HAPU incidence in this study is below the national mean, there is still room to improve HAPU incidence in this hospital setting since 0% incidence is theoretically achievable. Further assessment of pressure ulcer incidence could illustrate improvement in the quality of care and prevent HAPUs.
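The p-chart logic referred to above can be sketched as plotting period-by-period HAPU proportions against 3-sigma binomial control limits around the overall mean incidence; the monthly counts below are invented, not the study data.

```python
# Sketch of a p-chart: proportion of HAPUs per month against 3-sigma limits.
import numpy as np

events = np.array([6, 4, 7, 3, 5, 8, 2, 6, 5, 4])        # HAPUs per month (illustrative)
at_risk = np.array([480, 455, 510, 430, 500, 525, 410, 470, 490, 450])

p_bar = events.sum() / at_risk.sum()                      # centre line (overall incidence)
p = events / at_risk
sigma = np.sqrt(p_bar * (1 - p_bar) / at_risk)            # per-month binomial standard error
ucl, lcl = p_bar + 3 * sigma, np.clip(p_bar - 3 * sigma, 0, None)

for month, (pi, lo, hi) in enumerate(zip(p, lcl, ucl), start=1):
    flag = "out of control" if (pi > hi or pi < lo) else "in control"
    print(f"month {month:2d}: p={pi:.3%}  limits=({lo:.3%}, {hi:.3%})  {flag}")
```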
The Application of Censored Regression Models in Low Streamflow Analyses
NASA Astrophysics Data System (ADS)
Kroll, C.; Luz, J.
2003-12-01
Estimation of low streamflow statistics at gauged and ungauged river sites is often a daunting task. This process is further confounded by the presence of intermittent streamflows within a region, where streamflow is sometimes reported as zero. Streamflows recorded as zero may truly be zero, or may be less than the measurement detection limit. Such data are often referred to as censored data. Numerous methods have been developed to characterize intermittent streamflow series. Logit regression has been proposed to develop regional models of the probability that annual lowflow series (such as 7-day lowflows) are zero. In addition, Tobit regression, a regression method that allows for censored dependent variables, has been proposed for regional lowflow regression models in regions where the lowflow statistic of interest is estimated as zero at some sites. While these methods have been proposed, their use in practice has been limited. Here a delete-one jackknife simulation is presented to examine the performance of Logit and Tobit models of 7-day annual minimum flows in 6 USGS water resource regions in the United States. For the Logit model, an assessment is made of whether sites are correctly classified as having at least 10% of 7-day annual lowflows equal to zero. In such a situation, the 7-day, 10-year lowflow (Q710), a commonly employed low streamflow statistic, would be reported as zero. For the Tobit model, a comparison is made between results from the Tobit model and results from performing either ordinary least squares (OLS) or principal component regression (PCR) after the zero sites are dropped from the analysis. Initial results for the Logit model indicate that this method has a high probability of correctly classifying sites into groups with Q710 values of zero and non-zero. Initial results also indicate that the Tobit model produces better results than PCR and OLS when more than 5% of the sites in the region have Q710 values calculated as zero.
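As a generic illustration of the Tobit idea (not the authors' regional models), the sketch below fits a left-censored-at-zero regression by maximum likelihood with SciPy, writing out the censored log-likelihood directly; the simulated "basin characteristics" and coefficients are placeholders.

```python
# Generic Tobit (left-censored at zero) regression by maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(0, 1, (n, 2))])   # intercept + 2 basin characteristics
beta_true = np.array([0.3, 1.0, -0.8])
y_star = X @ beta_true + rng.normal(0, 1.0, n)
y = np.clip(y_star, 0, None)                                   # zero-recorded lowflows are censored

def neg_loglik(params):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    mu = X @ beta
    ll = np.where(y <= 0,
                  norm.logcdf(-mu / sigma),                    # censored contribution
                  norm.logpdf(y, loc=mu, scale=sigma))         # uncensored contribution
    return -ll.sum()

start = np.r_[np.zeros(X.shape[1]), 0.0]
fit = minimize(neg_loglik, start, method="BFGS")
print("estimated coefficients:", np.round(fit.x[:-1], 2), "sigma:", round(np.exp(fit.x[-1]), 2))
```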
Bonilla, Manuel G.; Mark, Robert K.; Lienkaemper, James J.
1984-01-01
In order to refine correlations of surface-wave magnitude, fault rupture length at the ground surface, and fault displacement at the surface by including the uncertainties in these variables, the existing data were critically reviewed and a new data base was compiled. Earthquake magnitudes were redetermined as necessary to make them as consistent as possible with the Gutenberg methods and results, which make up much of the data base. Measurement errors were estimated for the three variables for 58 moderate to large shallow-focus earthquakes. Regression analyses were then made utilizing the estimated measurement errors. The regression analysis demonstrates that the relations among the variables magnitude, length, and displacement are stochastic in nature. The stochastic variance, introduced in part by incomplete surface expression of seismogenic faulting, variation in shear modulus, and regional factors, dominates the estimated measurement errors. Thus, it is appropriate to use ordinary least squares for the regression models, rather than regression models based upon an underlying deterministic relation in which the variance results primarily from measurement errors. Significant differences exist in correlations of certain combinations of length, displacement, and magnitude when events are grouped by fault type or by region, including attenuation regions delineated by Evernden and others. Estimates of the magnitude and the standard deviation of the magnitude of a prehistoric or future earthquake associated with a fault can be made by correlating Ms with the logarithms of rupture length, fault displacement, or the product of length and displacement. Fault rupture area could be reliably estimated for about 20 of the events in the data set. Regression of Ms on rupture area did not result in a marked improvement over regressions that did not involve rupture area. Because no subduction-zone earthquakes are included in this study, the reported results do not apply to such zones.
Shah, Drishti; Shah, Anuj; Tan, Xi; Sambamoorthi, Usha
2017-08-01
In 2009, the FDA required a black box warning (BBW) on bupropion and varenicline, two commonly prescribed smoking cessation agents, due to reports of adverse neuropsychiatric events. We investigated whether there was a decline in the use of bupropion and varenicline after the BBW by comparing the percentage of smokers using these medications before and after the BBW. We conducted a retrospective observational study using data from the Medical Expenditure Panel Survey from 2007 to 2014. The study sample consisted of adult smokers who were advised by their physicians to quit smoking. We divided the time period into "pre-warning," "post-warning: immediate," and "post-warning: late." Unadjusted analyses using chi-square tests and adjusted analyses using logistic regressions were conducted to evaluate the change in bupropion and varenicline use before and after the BBW. Secondary analyses using piecewise regression were also conducted. On average, 49.04% of smokers were advised by their physicians to quit smoking. We observed a statistically significant decline in varenicline use, from 22.1% in 2007 to 9.23% in 2014 (p value<0.001). In the logistic (adjusted odds ratio=0.36, 95% CI=0.22-0.58) and piecewise regressions (odds ratio=0.64, 95% CI=0.41-0.99), smokers who were advised to quit smoking by their physicians were less likely to use varenicline in the immediate post-BBW period than in the pre-BBW period. While the use of varenicline continued to be significantly lower in the late post-BBW period (AOR=0.45, 95% CI=0.31-0.64) than in the pre-BBW period, the trend in use seen in the piecewise regression remained stable (OR=0.90, 95% CI=0.75-1.06). We did not observe significant differences in bupropion use between the pre- and post-BBW periods. The passage of the FDA boxed warning was associated with a significant decline in the use of varenicline, but not in the use of bupropion. Copyright © 2017 Elsevier B.V. All rights reserved.
da Silva, Fabiana E B; Flores, Érico M M; Parisotto, Graciele; Müller, Edson I; Ferrão, Marco F
2016-03-01
An alternative method for the quantification of sulphamethoxazole (SMZ) and trimethoprim (TMP) using diffuse reflectance infrared Fourier-transform spectroscopy (DRIFTS) and partial least squares (PLS) regression was developed. Interval partial least squares (iPLS) and synergy interval partial least squares (siPLS) were applied to select a spectral range that provided the lowest prediction error in comparison with the full-spectrum model. Fifteen commercial tablet formulations and forty-nine synthetic samples were used. The concentration ranges considered were 400 to 900 mg g-1 SMZ and 80 to 240 mg g-1 TMP. Spectral data were recorded between 600 and 4000 cm-1 with a 4 cm-1 resolution by DRIFTS. The proposed procedure was compared with high performance liquid chromatography (HPLC). The results obtained from the root mean square error of prediction (RMSEP), during the validation of the models for samples of sulphamethoxazole (SMZ) and trimethoprim (TMP) using siPLS, demonstrate that this approach is a valid technique for use in quantitative analysis of pharmaceutical formulations. The selected-interval algorithm allowed building regression models with smaller errors than the full-spectrum PLS model. An RMSEP of 13.03 mg g-1 for SMZ and 4.88 mg g-1 for TMP was obtained after selection of the best spectral regions by siPLS.
Kaneko, Hiromasa; Funatsu, Kimito
2013-09-23
We propose predictive performance criteria for nonlinear regression models without cross-validation. The proposed criteria are the determination coefficient and the root-mean-square error for the midpoints between k-nearest-neighbor data points. These criteria can be used to evaluate predictive ability after the regression models are updated, whereas cross-validation cannot be performed in such a situation. The proposed method is effective and helpful in handling big data when cross-validation cannot be applied. By analyzing data from numerical simulations and quantitative structural relationships, we confirm that the proposed criteria enable the predictive ability of the nonlinear regression models to be appropriately quantified.
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Development and validation of PRISM: a survey tool to identify diabetes self-management barriers.
Cox, Elizabeth D; Fritz, Katie A; Hansen, Kristofer W; Brown, Roger L; Rajamanickam, Victoria; Wiles, Kaelyn E; Fate, Bryan H; Young, Henry N; Moreno, Megan A
2014-04-01
Although most children with type 1 diabetes do not achieve optimal glycemic control, no systematic method exists to identify and address self-management barriers. This study develops and validates PRISM (Problem Recognition in Illness Self-Management), a survey-based tool for efficiently identifying self-management barriers experienced by children/adolescents with diabetes and their parents. Adolescents 13 years and older and parents of children 8 years and older visiting for routine diabetes management (n=425) were surveyed about self-management barriers. HbA1c was abstracted from the electronic health record. To develop PRISM, exploratory and confirmatory factor analyses were used. To assess validity, the association of PRISM scores with HbA1c was examined using linear regression. Factor analyses of adolescent and parent data yielded well-fitting models of self-management barriers, reflecting the following domains: (1) Understanding and Organizing Care, (2) Regimen Pain and Bother, (3) Denial of Disease and Consequences, and (4) Healthcare Team, (5) Family, or (6) Peer Interactions. All models exhibited good fit, with χ(2) ratios<2.21, root mean square errors of approximation<0.09, Confirmatory Fit Indices and Tucker-Lewis Indices both >0.92, and weighted root mean square residuals<1.71. Greater PRISM barrier scores were significantly associated with higher HbA1cs. Our findings suggest at least six different domains exist within self-management barriers, nearly all of which are significantly related to HbA1c. PRISM could be used in clinical practice to identify each child and family's unique self-management barriers, allowing existing self-management resources to be tailored to the family's barriers, ultimately improving effectiveness of such services. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Salivary markers of oxidative stress are related to age and oral health in adult non-smokers.
Celecová, Viera; Kamodyová, Natália; Tóthová, Lubomíra; Kúdela, Matúš; Celec, Peter
2013-03-01
Salivary concentrations of thiobarbituric acid-reacting substances (TBARS) are associated with the periodontal status assessed as papillary bleeding index (PBI). Whether this association is age independent is currently unclear. Salivary concentrations of advanced oxidation protein products (AOPPs) and advanced glycation end products (AGEs) have not been assessed in relation to age or oral health yet. The aim of our study was to analyse salivary markers of oxidative stress in dental patients in relation to age, gender and oral health. Consecutive adult non-smoking dental patients were enrolled (n = 204; aged 19-83 years). PBI and the caries index (CI) were assessed. Markers of oxidative stress, such as TBARS, AOPPs and AGEs, and the total antioxidant capacity (TAC) were measured in saliva samples taken before clinical examination. Linear regression showed that salivary TBARS, AGEs and TAC significantly increase with age (r squared = 5.3%, 2.1% and 5%, respectively). PBI is an independent predictor of salivary TBARS (r squared = 5.5%), and the CI negatively affected AOPPs (r = 3.2%) and positively affected TBARS (r = 2.5%). Gender did not affect any of the analysed parameters. Age as a significant contributor to the variance should be taken into account in studies focusing on salivary markers of oxidative stress. The relationship between PBI and salivary TBARS confirms results from previous studies. In addition, our results show that the association is age independent. Negative association between the CI and AOPPs might be related to recent findings that AOPP might be actually a marker of non-enzymatic antioxidant status. © 2012 John Wiley & Sons A/S. All rights reserved.
Miloudi, Lynda; Bonnier, Franck; Bertrand, Dominique; Byrne, Hugh J; Perse, Xavier; Chourpa, Igor; Munnier, Emilie
2017-07-01
Core-shell nanocarriers are increasingly being adopted in the cosmetic and dermatological fields, aiming to provide increased penetration of active pharmaceutical or cosmetic ingredients (API and ACI) through the skin. In the final form, the nanocarriers (NC) are usually prepared in hydrogels, conferring the desired viscous properties for topical application. Combined with the high chemical complexity of the encapsulating system itself, which involves numerous ingredients to form a stable core, quantifying the NC and/or the encapsulated active without labor-intensive and destructive methods remains challenging. In this respect, the specific molecular fingerprint obtained from vibrational spectroscopy analysis could unambiguously overcome current obstacles in the development of fast and cost-effective quality control tools for NC-based products. The present study demonstrates the feasibility of delivering accurate quantification of the concentrations of curcumin (ACI)-loaded alginate nanocarriers in hydrogel matrices, coupling partial least squares regression (PLSR) to infrared (IR) absorption and Raman spectroscopic analyses. With respective root mean square errors of 0.1469 ± 0.0175% w/w and 0.4462 ± 0.0631% w/w, both approaches offer acceptable precision. Further investigation of the PLSR results allowed us to highlight the different selectivity of each approach, indicating that only IR analysis delivers direct monitoring of the NC, through quantification of Labrafac®, the main NC ingredient. Raman analyses are instead dominated by the contribution of the ACI, which opens numerous perspectives for quantifying the active molecules without interference from the complex core-shell encapsulating systems, thus positioning the technique as a powerful analytical tool for industrial screening of cosmetic and pharmaceutical products. Graphical abstract: Quantitative analysis of encapsulated active molecules in hydrogel-based samples by means of infrared and Raman spectroscopy.
Mediation analysis: a retrospective snapshot of practice and more recent directions.
Gelfand, Lois A; Mensinger, Janell L; Tenhave, Thomas
2009-04-01
R. Baron and D. A. Kenny's (1986) paper introducing mediation analysis has been cited over 9,000 times, but concerns have been expressed about how this method is used. The authors review past and recent methodological literature and make recommendations for how to address 3 main issues: association, temporal order, and the no omitted variables assumption. The authors briefly visit the topics of reliability and the confirmatory-exploratory distinction. In addition, to provide a sense of the extent to which the earlier literature had been absorbed into practice, the authors examined a sample of 50 articles from 2002 citing R. Baron and D. A. Kenny and containing at least 1 mediation analysis via ordinary least squares regression. A substantial proportion of these articles included problematic reporting; as of 2002, there appeared to be room for improvement in conducting such mediation analyses. Future literature reviews will demonstrate the extent to which the situation has improved.
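For readers unfamiliar with the regression-based mediation steps being discussed, the sketch below runs Baron and Kenny-style OLS models with statsmodels on simulated data; the path coefficients are invented, and modern practice would add a bootstrap or other inference step for the indirect effect.

```python
# Sketch of Baron & Kenny-style mediation via ordinary least squares.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)                       # predictor
m = 0.5 * x + rng.normal(size=n)             # mediator (a path = 0.5)
y = 0.4 * m + 0.2 * x + rng.normal(size=n)   # outcome (b path = 0.4, direct c' = 0.2)

a = sm.OLS(m, sm.add_constant(x)).fit().params[1]                     # a path
model_y = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()
c_prime, b = model_y.params[1], model_y.params[2]                     # direct and b paths
c_total = sm.OLS(y, sm.add_constant(x)).fit().params[1]               # total effect

print(f"a*b (indirect) = {a*b:.3f}, c' (direct) = {c_prime:.3f}, c (total) = {c_total:.3f}")
```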
Gardening as a potential activity to reduce falls in older adults.
Chen, Tuo-Yu; Janke, Megan C
2012-01-01
This study examines whether participation in gardening predicts reduced fall risk and performance on balance and gait-speed measures in older adults. Data on adults age 65 and older (N = 3,237) from the Health and Retirement Study and Consumption and Activities Mail Survey were analyzed. Participants who spent 1 hr or more gardening in the past week were defined as gardeners, resulting in a total of 1,585 gardeners and 1,652 nongardeners. Independent t tests, chi square, and regression analyses were conducted to examine the relationship between gardening and health outcomes. Findings indicate that gardeners reported significantly better balance and gait speed and had fewer chronic conditions and functional limitations than nongardeners. Significantly fewer gardeners than nongardeners reported a fall in the past 2 yr. The findings suggest that gardening may be a potential activity to incorporate into future fall-prevention programs.
XAP, a program for deconvolution and analysis of complex X-ray spectra
Quick, James E.; Haleby, Abdul Malik
1989-01-01
The X-ray analysis program (XAP) is a spectral-deconvolution program written in BASIC and specifically designed to analyze complex spectra produced by energy-dispersive X-ray analytical systems (EDS). XAP compensates for spectrometer drift, utilizes digital filtering to remove background from spectra, and solves for element abundances by least-squares, multiple-regression analysis. Rather than base analyses on only a few channels, broad spectral regions of a sample are reconstructed from standard reference spectra. The effects of this approach are (1) elimination of tedious spectrometer adjustments, (2) removal of background independent of sample composition, and (3) automatic correction for peak overlaps. Although the program was written specifically to operate a KEVEX 7000 X-ray fluorescence analytical system, it could be adapted (with minor modifications) to analyze spectra produced by scanning electron microscopes and electron microprobes, and X-ray diffractometer patterns obtained from whole-rock powders.
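The core XAP idea as described, reconstructing a sample spectrum as a least-squares combination of reference spectra over a broad region, can be illustrated as below with synthetic Gaussian peaks and nonnegative least squares; this is a sketch of the general technique, not the BASIC program itself.

```python
# Least-squares spectral deconvolution against reference spectra.
import numpy as np
from scipy.optimize import nnls

channels = np.arange(400)

def peak(center, width=8.0):
    """Synthetic Gaussian reference peak over the channel axis."""
    return np.exp(-0.5 * ((channels - center) / width) ** 2)

refs = np.column_stack([peak(120), peak(180), peak(260)])      # reference spectra for 3 elements
true_abund = np.array([2.0, 0.5, 1.2])
sample = refs @ true_abund + np.random.default_rng(0).normal(0, 0.02, channels.size)

abund, residual_norm = nnls(refs, sample)                      # nonnegative least-squares abundances
print("estimated abundances:", np.round(abund, 2))
```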
TLE uncertainty estimation using robust weighted differencing
NASA Astrophysics Data System (ADS)
Geul, Jacco; Mooij, Erwin; Noomen, Ron
2017-05-01
Accurate knowledge of satellite orbit errors is essential for many types of analyses. Unfortunately, for two-line elements (TLEs) this is not available. This paper presents a weighted differencing method using robust least-squares regression for estimating many important error characteristics. The method is applied to both classic and enhanced TLEs, compared to previous implementations, and validated using Global Positioning System (GPS) solutions for the GOCE satellite in Low-Earth Orbit (LEO), prior to its re-entry. The method is found to be more accurate than previous TLE differencing efforts in estimating initial uncertainty, as well as error growth. The method also proves more reliable and requires no data filtering (such as outlier removal). Sensitivity analysis shows a strong relationship between argument of latitude and covariance (standard deviations and correlations), which the method is able to approximate. Overall, the method proves accurate, computationally fast, and robust, and is applicable to any object in the satellite catalogue (SATCAT).
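A generic sketch of the robust regression step described above: fitting the growth of TLE position differences with propagation time using a Huber M-estimator (here via statsmodels), which downweights gross outliers without explicit filtering; the data and error-growth model are simulated.

```python
# Robust (Huber) versus ordinary least-squares fit of error growth.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
dt_days = rng.uniform(0, 5, 200)                          # propagation interval
err_km = 0.3 + 1.8 * dt_days + rng.normal(0, 0.2, 200)    # nominal error growth
err_km[::25] += rng.uniform(5, 15, 8)                     # occasional gross outliers

X = sm.add_constant(dt_days)
ols = sm.OLS(err_km, X).fit()
rlm = sm.RLM(err_km, X, M=sm.robust.norms.HuberT()).fit()
print("OLS growth rate   (km/day):", round(ols.params[1], 3))
print("Huber growth rate (km/day):", round(rlm.params[1], 3))
```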
Determination of butter adulteration with margarine using Raman spectroscopy.
Uysal, Reyhan Selin; Boyaci, Ismail Hakki; Genis, Hüseyin Efe; Tamer, Ugur
2013-12-15
In this study, adulteration of butter with margarine was analysed using Raman spectroscopy combined with chemometric methods (principal component analysis (PCA), principal component regression (PCR), partial least squares (PLS)) and artificial neural networks (ANNs). Different butter and margarine samples were mixed at various concentrations ranging from 0% to 100% w/w. PCA analysis was applied for the classification of butters, margarines and mixtures. PCR, PLS and ANN were used for the detection of adulteration ratios of butter. Models were created using a calibration data set and developed models were evaluated using a validation data set. The coefficient of determination (R(2)) values between actual and predicted values obtained for PCR, PLS and ANN for the validation data set were 0.968, 0.987 and 0.978, respectively. In conclusion, a combination of Raman spectroscopy with chemometrics and ANN methods can be applied for testing butter adulteration. Copyright © 2013 Elsevier Ltd. All rights reserved.
Trend Analyses of Users of a Syringe Exchange Program in Philadelphia, Pennsylvania: 1999-2014.
Maurer, Laurie A; Bass, Sarah Bauerle; Ye, Du; Benitez, José; Mazzella, Silvana; Krafty, Robert
2016-12-01
This study examines trends in injection drug users' (IDUs) use of a Philadelphia, Pennsylvania, syringe exchange program (SEP) from 1999 to 2014, including changes in demographics, drug use, substance abuse treatment, geographic indicators, and SEP use. Prevention Point Philadelphia's SEP registration data were analyzed using linear regression, Pearson's chi-square, and t-tests. Over time, new SEP registrants have become younger, more racially diverse, and geographically more concentrated in specific areas of the city, corresponding to urban demographic shifts. The number of new registrants per year has decreased; however, the number of syringes exchanged has increased. Gentrification, cultural norms, and changes in risk perception are believed to have contributed to the changes in SEP registration. Demographic changes indicate that outreach strategies for IDUs may need adjusting to address unique barriers for younger, more racially diverse users. Implications for SEPs are discussed, including policy and the continued ability to address current public health threats.
Sieve estimation in semiparametric modeling of longitudinal data with informative observation times.
Zhao, Xingqiu; Deng, Shirong; Liu, Li; Liu, Lei
2014-01-01
Analyzing irregularly spaced longitudinal data often involves modeling possibly correlated response and observation processes. In this article, we propose a new class of semiparametric mean models that allows for the interaction between the observation history and covariates, leaving patterns of the observation process to be arbitrary. For inference on the regression parameters and the baseline mean function, a spline-based least squares estimation approach is proposed. The consistency, rate of convergence, and asymptotic normality of the proposed estimators are established. Our new approach is different from the usual approaches relying on the model specification of the observation scheme, and it can be easily used for predicting the longitudinal response. Simulation studies demonstrate that the proposed inference procedure performs well and is more robust. The analyses of bladder tumor data and medical cost data are presented to illustrate the proposed method.
40 CFR 86.331-79 - Hydrocarbon analyzer calibration.
Code of Federal Regulations, 2012 CFR
2012-07-01
... concentration. It is permitted to use additional concentrations. (5) Perform a linear least square regression on... least one-half hour after the oven has reached temperature for the system to equilibrate. (c) Initial...
Graphical Evaluation of the Ridge-Type Robust Regression Estimators in Mixture Experiments
Erkoc, Ali; Emiroglu, Esra
2014-01-01
In mixture experiments, estimation of the parameters is generally based on ordinary least squares (OLS). However, in the presence of multicollinearity and outliers, OLS can result in very poor estimates. In this case, effects due to the combined outlier-multicollinearity problem can be reduced to certain extent by using alternative approaches. One of these approaches is to use biased-robust regression techniques for the estimation of parameters. In this paper, we evaluate various ridge-type robust estimators in the cases where there are multicollinearity and outliers during the analysis of mixture experiments. Also, for selection of biasing parameter, we use fraction of design space plots for evaluating the effect of the ridge-type robust estimators with respect to the scaled mean squared error of prediction. The suggested graphical approach is illustrated on Hald cement data set. PMID:25202738
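In the spirit of the ridge-type robust estimators evaluated here, scikit-learn's HuberRegressor combines a robust loss with an l2 (ridge) penalty through its alpha parameter; the sketch below contrasts it with OLS on simulated collinear data containing outliers, and is not the authors' estimator or the Hald cement data.

```python
# Ridge-type robust estimation versus OLS under collinearity and outliers.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
n = 80
x1 = rng.normal(0, 1, n)
x2 = x1 + rng.normal(0, 0.05, n)                  # nearly collinear mixture components
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + rng.normal(0, 0.5, n)
y[:5] += 25                                       # a few gross outliers

ols = LinearRegression().fit(X, y)
ridge_robust = HuberRegressor(alpha=1.0).fit(X, y)   # alpha is the l2 (ridge) penalty
print("OLS coefficients:          ", np.round(ols.coef_, 2))
print("ridge-type robust (Huber): ", np.round(ridge_robust.coef_, 2))
```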
NASA Technical Reports Server (NTRS)
Whitlock, C. H., III
1977-01-01
Constituents with linear radiance gradients with concentration may be quantified from signals that contain nonlinear atmospheric and surface reflection effects, for both homogeneous and non-homogeneous water bodies, provided accurate data can be obtained and the nonlinearities are constant with wavelength. Statistical parameters must be used that give an indication of bias as well as total squared error to ensure that an equation with an optimum combination of bands is selected. It is concluded that the effect of error in upwelled radiance measurements is to reduce the accuracy of the least-squares fitting process and to increase the number of points required to obtain a satisfactory fit. The problem of obtaining a multiple regression equation that is extremely sensitive to error is discussed.
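A small Monte Carlo sketch can illustrate the qualitative conclusion above: noise in the radiance measurements degrades the least-squares fit, and more calibration points are needed to recover the same coefficient accuracy. All values below are made up for illustration and do not come from the report.

import numpy as np

rng = np.random.default_rng(3)
true_slope = 0.8

def mean_slope_error(n_points, noise_sd, n_trials=2000):
    """Average absolute error of the fitted slope over repeated noisy calibrations."""
    errors = []
    for _ in range(n_trials):
        conc = rng.uniform(0, 50, n_points)                        # constituent concentration
        radiance = true_slope * conc + rng.normal(0, noise_sd, n_points)
        slope = np.polyfit(conc, radiance, 1)[0]
        errors.append(abs(slope - true_slope))
    return np.mean(errors)

for noise in (1.0, 5.0):
    for n in (10, 40):
        print(f"noise sd = {noise:>4}, points = {n:>3}: "
              f"mean |slope error| = {mean_slope_error(n, noise):.3f}")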
Linge, Annett; Schötz, Ulrike; Löck, Steffen; Lohaus, Fabian; von Neubeck, Cläre; Gudziol, Volker; Nowak, Alexander; Tinhofer, Inge; Budach, Volker; Sak, Ali; Stuschke, Martin; Balermpas, Panagiotis; Rödel, Claus; Bunea, Hatice; Grosu, Anca-Ligia; Abdollahi, Amir; Debus, Jürgen; Ganswindt, Ute; Lauber, Kirsten; Pigorsch, Steffi; Combs, Stephanie E; Mönnich, David; Zips, Daniel; Baretton, Gustavo B; Buchholz, Frank; Krause, Mechthild; Belka, Claus; Baumann, Michael
2018-04-01
To compare six HPV detection methods in pre-treatment FFPE tumour samples from patients with locally advanced head and neck squamous cell carcinoma (HNSCC) who received postoperative (N = 175) or primary (N = 90) radiochemotherapy. HPV analyses included detection of (i) HPV16 E6/E7 RNA, (ii) HPV16 DNA (PCR-based arrays, A-PCR), (iii) HPV DNA (GP5+/GP6+ qPCR, GP-PCR), (iv) p16 (immunohistochemistry, p16 IHC), (v) combining p16 IHC and the A-PCR result and (vi) combining p16 IHC and the GP-PCR result. Differences between HPV positive and negative subgroups were evaluated for the primary endpoint loco-regional control (LRC) using Cox regression. Correlation between the HPV detection methods was high (chi-squared test, p < 0.001). While p16 IHC analysis resulted in several false positive classifications, A-PCR, GP-PCR and the combination of p16 IHC and A-PCR or GP-PCR led to results comparable to RNA analysis. In both cohorts, Cox regression analyses revealed significantly prolonged LRC for patients with HPV positive tumours irrespective of the detection method. The most stringent classification was obtained by detection of HPV16 RNA, or combining p16 IHC with A-PCR or GP-PCR. This approach revealed the lowest rate of recurrence in patients with tumours classified as HPV positive and therefore appears most suited for patient stratification in HPV-based clinical studies. Copyright © 2017 Elsevier B.V. All rights reserved.
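For readers unfamiliar with the endpoint analysis named above, a minimal Cox proportional hazards comparison of HPV-positive versus HPV-negative tumours on loco-regional control can be sketched with the lifelines library. The library choice and the simulated follow-up data are assumptions; this does not reproduce the cohort analyses in the paper.

import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(4)
n = 265
hpv_positive = rng.integers(0, 2, n)

# Simulated loco-regional recurrence times: lower hazard for HPV-positive tumours.
baseline_hazard = 0.03
time = rng.exponential(1 / (baseline_hazard * np.where(hpv_positive, 0.4, 1.0)))
event = (time < 60).astype(int)            # recurrence observed within follow-up
time = np.minimum(time, 60)                # administrative censoring at 60 months

df = pd.DataFrame({"time": time, "event": event, "hpv_positive": hpv_positive})
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "exp(coef)", "p"]])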
Hom, Melanie A; Stanley, Ian H; Gutierrez, Peter M; Joiner, Thomas E
2017-01-01
Past research suggests that suicide has a profound impact on surviving family members and friends; yet, little is known about experiences with suicide bereavement among military populations. This study aimed to characterize experiences with suicide exposure and their associations with lifetime and current psychiatric symptoms among military service members and veterans. A sample of 1753 United States military service members and veterans completed self-report questionnaires assessing experiences with suicide exposure, lifetime history of suicidal thoughts and behaviors, current suicidal symptoms, and perceived likelihood of making a future suicide attempt. The majority of participants (57.3%) reported knowing someone who had died by suicide, and of these individuals, most (53.1%) reported having lost a friend to suicide. Chi-square tests, one-way ANOVAs, and logistic regression analyses revealed that those who reported knowing a suicide decedent were more likely to report more severe current suicidal symptoms and a history of suicidal thoughts and behaviors compared to those who did not know a suicide decedent. Hierarchical linear regression analyses indicated that greater self-reported interpersonal closeness to a suicide decedent predicted greater self-reported likelihood of a future suicide attempt, even after controlling for current suicidal symptoms and prior suicidal thoughts and behaviors. This study utilized cross-sectional data, and information regarding degree of exposure to suicide was not collected. Military personnel and veterans who have been bereaved by suicide may themselves be at elevated risk for suicidal thoughts and behaviors. Additional work is needed to delineate the relationship between these experiences. Copyright © 2016 Elsevier B.V. All rights reserved.
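The hierarchical (blockwise) linear regression described above can be sketched as follows: enter the control variables first, then add closeness to the decedent and examine the change in R-squared. The simulated data and variable names below are assumptions for illustration, not the study's measures or results.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 600
current_symptoms = rng.normal(size=n)
prior_history = rng.integers(0, 2, n).astype(float)
closeness = rng.normal(size=n)
future_attempt_likelihood = (0.5 * current_symptoms + 0.4 * prior_history
                             + 0.3 * closeness + rng.normal(0, 1, n))

# Block 1: controls only; block 2: controls plus interpersonal closeness.
block1 = sm.add_constant(np.column_stack([current_symptoms, prior_history]))
block2 = sm.add_constant(np.column_stack([current_symptoms, prior_history, closeness]))

fit1 = sm.OLS(future_attempt_likelihood, block1).fit()
fit2 = sm.OLS(future_attempt_likelihood, block2).fit()
print(f"R2 block 1 = {fit1.rsquared:.3f}, R2 block 2 = {fit2.rsquared:.3f}, "
      f"increment = {fit2.rsquared - fit1.rsquared:.3f}")
print(f"closeness coefficient p-value: {fit2.pvalues[-1]:.3g}")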
Comorbid depression/anxiety and teeth removed: Behavioral Risk Factor Surveillance System 2010.
Wiener, R Constance; Wiener, Michael A; McNeil, Daniel W
2015-10-01
The purpose of this study was to examine the association between participants (i) who reported having had clinical diagnoses of depression and anxiety with 6+ teeth removed and (ii) who reported having had clinical diagnoses of depression and anxiety with edentulism. The Behavioral Risk Factor Surveillance System (BRFSS) Survey 2010 was used for the study. Analyses involved using SAS 9.3® to determine variable frequencies, Rao-Scott chi-square bivariate analyses, and Proc Surveylogistic for the logistic regressions on complex survey designs. Participant eligibility criteria included being 18 years or older and having complete data on depression, anxiety, and number of teeth removed. There were 76 292 eligible participants; 13.4% reported an anxiety diagnosis, 16.7% reported a depression diagnosis, and 8.6% reported comorbid depression and anxiety. The adjusted logistic regression models were significant for anxiety and depression alone and in combination for 6+ teeth removed (AOR: anxiety 1.23; 95% CI: 1.10, 1.38; P = 0.0773; AOR: depression 1.23; 95% CI: 1.10, 1.37; P = 0.0275; P < 0.0001; and AOR: comorbid depression and anxiety 1.30; 95% CI: 1.14, 1.49; P = 0.0001). However, the adjusted models with edentulism as the outcome failed to reach significance. Comorbid depression and anxiety are independently associated with 6+ teeth removed, compared with 0-5 teeth removed, in a national study conducted in the United States. Comorbid depression and anxiety were not shown to be associated with edentulism as compared with any teeth present. © 2015 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
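SAS Proc Surveylogistic accounts for strata and clustering in the BRFSS design; there is no single exact Python equivalent, but the general shape of a survey-weighted logistic regression can be sketched with statsmodels using only the sampling weights (freq_weights), ignoring the design-based variance adjustment. The data and coefficients below are simulated assumptions, not BRFSS results.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5000
depression = rng.integers(0, 2, n)
anxiety = rng.integers(0, 2, n)
comorbid = depression * anxiety
logit = -2.0 + 0.2 * depression + 0.2 * anxiety + 0.26 * comorbid
six_plus_teeth_removed = rng.binomial(1, 1 / (1 + np.exp(-logit)))
weights = rng.integers(1, 4, n)            # crude stand-in for survey weights

X = sm.add_constant(np.column_stack([depression, anxiety, comorbid]))
model = sm.GLM(six_plus_teeth_removed, X,
               family=sm.families.Binomial(), freq_weights=weights)
result = model.fit()
print(np.round(np.exp(result.params[1:]), 2))   # odds ratios for the three terms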
Predictors of prison-based treatment outcomes: a comparison of men and women participants.
Messina, Nena; Burdon, William; Hagopian, Garo; Prendergast, Michael
2006-01-01
The purpose of this study was to examine differences between men and women entering prison-based therapeutic community (TC) treatment and to explore the relationship of those differences to posttreatment outcomes (i.e., aftercare participation and reincarceration rates). Extensive treatment-intake interview data for 4,386 women and 4,164 men from 16 prison-based TCs in California were compared using chi-square analyses and t-tests. Logistic regression analyses were then conducted separately for men and women to identify gender-specific factors associated with posttreatment outcomes. Prison intake data and treatment participation data came from a 5-year process and outcome evaluation of the California Department of Corrections' (CDC) Prison Treatment Expansion Initiative. The return-to-custody data came from the CDC's Offender Based Information System. Bivariate results showed that women were at a substantial disadvantage compared with their male counterparts with regard to histories of employment, substance abuse, psychological functioning, and sexual and physical abuse prior to incarceration. In contrast, men had more serious criminal justice involvement than women prior to incarceration. After controlling for these and other factors related to outcomes, regression findings showed that there were both similarities and differences with regard to gender-specific predictors of posttreatment outcomes. Time in treatment and motivation for treatment were similar predictors of aftercare participation for men and women. Psychological impairment was the strongest predictor of recidivism for both men and women. Substantial differences in background characteristics, and the limited number of predictors related to posttreatment outcomes for women, suggest the plausibility of gender-specific paths in the recovery process.
Open access to journal articles in dentistry: Prevalence and citation impact.
Hua, Fang; Sun, Heyuan; Walsh, Tanya; Worthington, Helen; Glenny, Anne-Marie
2016-04-01
To investigate the current prevalence of open access (OA) in the field of dentistry, the means used to provide OA, and the association between OA and citation counts. PubMed was searched for dental articles published in 2013. The OA status of each article was determined by manually checking Google, Google Scholar, PubMed and ResearchGate. Citation data were extracted from Google Scholar, Scopus and Web of Science. Chi-square tests were used to compare OA prevalence across subjects, study types, and continents of origin. The association between OA and citation count was studied with multivariable logistic regression analyses. A random sample of 908 articles was deemed eligible and therefore included. Among these, 416 were found freely available online, indicating an overall OA rate of 45.8%. Significant differences in OA rates were detected among articles on different subjects (P<0.001) and among those from different continents (P<0.001). Of the articles that were OA, 74.2% were available via self-archiving ('Green road' OA) and 53.3% were available from publishers ('Gold road' OA). According to the multivariable logistic regression analyses, OA status was not significantly associated with either the existence of citations (P=0.37) or the level of citation (P=0.52). In the field of dentistry, 54% of recent journal articles are behind a paywall (non-OA) one year after their publication dates. The 'Green road' of providing OA was more common than the 'Gold road'. No evidence suggested that OA articles received significantly more citations than non-OA articles. Copyright © 2016 Elsevier Ltd. All rights reserved.
Tsai, Feng-Jen
2016-01-01
This study aims to determine the impact of grandchild care provision on elders' mental health using self-comparison and a longitudinal study design. Information from 2930 grandparents in the Study of Health and Living Status of the Middle-Aged and Elderly in Taiwan was analysed. Elders' mental health was evaluated with the Center for Epidemiologic Studies Depression Scale (CES-D) in both 2003 and 2007. Participants were divided into 4 groups based on changes in their grandchild-care behaviour from 2003 to 2007. Chi-square tests were used to compare changes in elders' individual characteristics and total CES-D scores between and within groups. ANOVA was used to compare mean depressive symptoms between groups, while paired t-tests were used to compare changes in elders' depressive symptoms from 2003 to 2007. Logistic regression was performed to determine the associations between changes in elders' grandchild-care behaviour and changes in depressive symptoms. Elders who continuously cared for grandchildren, or who started to take care of grandchildren, felt significantly happier and enjoyed life more than before, and more than elders who did not provide grandchild care. Logistic regression analyses exploring the impact of grandchild care provision found that elders who provided no grandchild care had the worst mental health of all groups. Elders who stopped providing grandchild care had a significantly higher risk of developing depressive symptoms (OR=1.40) than elders who provided no grandchild care at any time. Through self-comparison, this study illustrates how taking care of grandchildren helps maintain elders' mental health, particularly by protecting them against loneliness and depression. Copyright © 2016. Published by Elsevier Ireland Ltd.
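The within-person comparison and the group contrast described above can be sketched as a paired t-test of CES-D scores across the two waves plus a logistic regression of depressive-symptom onset on caregiving group. The data, the symptom cut-off, and the group coding below are simulated assumptions, not the study's dataset or results.

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
n = 800
cesd_2003 = rng.normal(8, 4, n).clip(0)
group_stopped_care = rng.integers(0, 2, n)                  # 1 = stopped providing grandchild care
cesd_2007 = cesd_2003 + 1.5 * group_stopped_care + rng.normal(0, 3, n)

# Within-person change in depressive symptoms, 2003 vs 2007.
t, p = stats.ttest_rel(cesd_2003, cesd_2007)
print(f"paired t = {t:.2f}, p = {p:.3g}")

# Logistic regression of new depressive symptoms (assumed cut-off of 10) on caregiving group.
developed_symptoms = ((cesd_2007 >= 10) & (cesd_2003 < 10)).astype(int)
X = sm.add_constant(group_stopped_care.astype(float))
fit = sm.Logit(developed_symptoms, X).fit(disp=0)
print(f"odds ratio for stopping care: {np.exp(fit.params[1]):.2f}")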