Local Linear Regression for Data with AR Errors.
Li, Runze; Li, Yan
2009-07-01
In many statistical applications, data are collected over time, and they are likely correlated. In this paper, we investigate how to incorporate the correlation information into the local linear regression. Under the assumption that the error process is an auto-regressive process, a new estimation procedure is proposed for the nonparametric regression by using local linear regression method and the profile least squares techniques. We further propose the SCAD penalized profile least squares method to determine the order of auto-regressive process. Extensive Monte Carlo simulation studies are conducted to examine the finite sample performance of the proposed procedure, and to compare the performance of the proposed procedures with the existing one. From our empirical studies, the newly proposed procedures can dramatically improve the accuracy of naive local linear regression with working-independent error structure. We illustrate the proposed methodology by an analysis of real data set.
Locally linear regression for pose-invariant face recognition.
Chai, Xiujuan; Shan, Shiguang; Chen, Xilin; Gao, Wen
2007-07-01
The variation of facial appearance due to the viewpoint (/pose) degrades face recognition systems considerably, which is one of the bottlenecks in face recognition. One of the possible solutions is generating virtual frontal view from any given nonfrontal view to obtain a virtual gallery/probe face. Following this idea, this paper proposes a simple, but efficient, novel locally linear regression (LLR) method, which generates the virtual frontal view from a given nonfrontal face image. We first justify the basic assumption of the paper that there exists an approximate linear mapping between a nonfrontal face image and its frontal counterpart. Then, by formulating the estimation of the linear mapping as a prediction problem, we present the regression-based solution, i.e., globally linear regression. To improve the prediction accuracy in the case of coarse alignment, LLR is further proposed. In LLR, we first perform dense sampling in the nonfrontal face image to obtain many overlapped local patches. Then, the linear regression technique is applied to each small patch for the prediction of its virtual frontal patch. Through the combination of all these patches, the virtual frontal view is generated. The experimental results on the CMU PIE database show distinct advantage of the proposed method over Eigen light-field method.
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION.
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2014-06-01
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression.
STRONG ORACLE OPTIMALITY OF FOLDED CONCAVE PENALIZED ESTIMATION
Fan, Jianqing; Xue, Lingzhou; Zou, Hui
2014-01-01
Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, i.e., sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression. PMID:25598560
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, the local polynomial fitting is applied to estimate heteroscedastic function, then the coefficients of regression model are obtained by using generalized least squares method. One noteworthy feature of our approach is that we avoid the testing for heteroscedasticity by improving the traditional two-stage method. Due to non-parametric technique of local polynomial estimation, it is unnecessary to know the form of heteroscedastic function. Therefore, we can improve the estimation precision, when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients is asymptotic normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is surely effective in finite-sample situations.
Estimating monotonic rates from biological data using local linear regression.
Olito, Colin; White, Craig R; Marshall, Dustin J; Barneche, Diego R
2017-03-01
Accessing many fundamental questions in biology begins with empirical estimation of simple monotonic rates of underlying biological processes. Across a variety of disciplines, ranging from physiology to biogeochemistry, these rates are routinely estimated from non-linear and noisy time series data using linear regression and ad hoc manual truncation of non-linearities. Here, we introduce the R package LoLinR, a flexible toolkit to implement local linear regression techniques to objectively and reproducibly estimate monotonic biological rates from non-linear time series data, and demonstrate possible applications using metabolic rate data. LoLinR provides methods to easily and reliably estimate monotonic rates from time series data in a way that is statistically robust, facilitates reproducible research and is applicable to a wide variety of research disciplines in the biological sciences. © 2017. Published by The Company of Biologists Ltd.
Adaptive local linear regression with application to printer color management.
Gupta, Maya R; Garcia, Eric K; Chin, Erika
2008-06-01
Local learning methods, such as local linear regression and nearest neighbor classifiers, base estimates on nearby training samples, neighbors. Usually, the number of neighbors used in estimation is fixed to be a global "optimal" value, chosen by cross validation. This paper proposes adapting the number of neighbors used for estimation to the local geometry of the data, without need for cross validation. The term enclosing neighborhood is introduced to describe a set of neighbors whose convex hull contains the test point when possible. It is proven that enclosing neighborhoods yield bounded estimation variance under some assumptions. Three such enclosing neighborhood definitions are presented: natural neighbors, natural neighbors inclusive, and enclosing k-NN. The effectiveness of these neighborhood definitions with local linear regression is tested for estimating lookup tables for color management. Significant improvements in error metrics are shown, indicating that enclosing neighborhoods may be a promising adaptive neighborhood definition for other local learning tasks as well, depending on the density of training samples.
Jiang, Feng; Han, Ji-zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot evaluate effectively the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted as the weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid underfitting or overfitting problem occurring in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring the useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods.
Scoring and staging systems using cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
Image interpolation via regularized local linear regression.
Liu, Xianming; Zhao, Debin; Xiong, Ruiqin; Ma, Siwei; Gao, Wen; Sun, Huifang
2011-12-01
The linear regression model is a very attractive tool to design effective image interpolation schemes. Some regression-based image interpolation algorithms have been proposed in the literature, in which the objective functions are optimized by ordinary least squares (OLS). However, it is shown that interpolation with OLS may have some undesirable properties from a robustness point of view: even small amounts of outliers can dramatically affect the estimates. To address these issues, in this paper we propose a novel image interpolation algorithm based on regularized local linear regression (RLLR). Starting with the linear regression model where we replace the OLS error norm with the moving least squares (MLS) error norm leads to a robust estimator of local image structure. To keep the solution stable and avoid overfitting, we incorporate the l(2)-norm as the estimator complexity penalty. Moreover, motivated by recent progress on manifold-based semi-supervised learning, we explicitly consider the intrinsic manifold structure by making use of both measured and unmeasured data points. Specifically, our framework incorporates the geometric structure of the marginal probability distribution induced by unmeasured samples as an additional local smoothness preserving constraint. The optimal model parameters can be obtained with a closed-form solution by solving a convex optimization problem. Experimental results on benchmark test images demonstrate that the proposed method achieves very competitive performance with the state-of-the-art interpolation algorithms, especially in image edge structure preservation. © 2011 IEEE
Automating approximate Bayesian computation by local linear regression.
Thornton, Kevin R
2009-07-07
In several biological contexts, parameter inference often relies on computationally-intensive techniques. "Approximate Bayesian Computation", or ABC, methods based on summary statistics have become increasingly popular. A particular flavor of ABC based on using a linear regression to approximate the posterior distribution of the parameters, conditional on the summary statistics, is computationally appealing, yet no standalone tool exists to automate the procedure. Here, I describe a program to implement the method. The software package ABCreg implements the local linear-regression approach to ABC. The advantages are: 1. The code is standalone, and fully-documented. 2. The program will automatically process multiple data sets, and create unique output files for each (which may be processed immediately in R), facilitating the testing of inference procedures on simulated data, or the analysis of multiple data sets. 3. The program implements two different transformation methods for the regression step. 4. Analysis options are controlled on the command line by the user, and the program is designed to output warnings for cases where the regression fails. 5. The program does not depend on any particular simulation machinery (coalescent, forward-time, etc.), and therefore is a general tool for processing the results from any simulation. 6. The code is open-source, and modular.Examples of applying the software to empirical data from Drosophila melanogaster, and testing the procedure on simulated data, are shown. In practice, the ABCreg simplifies implementing ABC based on local-linear regression.
Local linear regression for function learning: an analysis based on sample discrepancy.
Cervellera, Cristiano; Macciò, Danilo
2014-11-01
Local linear regression models, a kind of nonparametric structures that locally perform a linear estimation of the target function, are analyzed in the context of empirical risk minimization (ERM) for function learning. The analysis is carried out with emphasis on geometric properties of the available data. In particular, the discrepancy of the observation points used both to build the local regression models and compute the empirical risk is considered. This allows to treat indifferently the case in which the samples come from a random external source and the one in which the input space can be freely explored. Both consistency of the ERM procedure and approximating capabilities of the estimator are analyzed, proving conditions to ensure convergence. Since the theoretical analysis shows that the estimation improves as the discrepancy of the observation points becomes smaller, low-discrepancy sequences, a family of sampling methods commonly employed for efficient numerical integration, are also analyzed. Simulation results involving two different examples of function learning are provided.
Incremental online learning in high dimensions.
Vijayakumar, Sethu; D'Souza, Aaron; Schaal, Stefan
2005-12-01
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high-dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high-dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it (1) learns rapidly with second-order learning methods based on incremental training, (2) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, (3) adjusts its weighting kernels based on only local information in order to minimize the danger of negative interference of incremental learning, (4) has a computational complexity that is linear in the number of inputs, and (5) can deal with a large number of-possibly redundant-inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high-dimensional spaces.
Single Image Super-Resolution Using Global Regression Based on Multiple Local Linear Mappings.
Choi, Jae-Seok; Kim, Munchurl
2017-03-01
Super-resolution (SR) has become more vital, because of its capability to generate high-quality ultra-high definition (UHD) high-resolution (HR) images from low-resolution (LR) input images. Conventional SR methods entail high computational complexity, which makes them difficult to be implemented for up-scaling of full-high-definition input images into UHD-resolution images. Nevertheless, our previous super-interpolation (SI) method showed a good compromise between Peak-Signal-to-Noise Ratio (PSNR) performances and computational complexity. However, since SI only utilizes simple linear mappings, it may fail to precisely reconstruct HR patches with complex texture. In this paper, we present a novel SR method, which inherits the large-to-small patch conversion scheme from SI but uses global regression based on local linear mappings (GLM). Thus, our new SR method is called GLM-SI. In GLM-SI, each LR input patch is divided into 25 overlapped subpatches. Next, based on the local properties of these subpatches, 25 different local linear mappings are applied to the current LR input patch to generate 25 HR patch candidates, which are then regressed into one final HR patch using a global regressor. The local linear mappings are learned cluster-wise in our off-line training phase. The main contribution of this paper is as follows: Previously, linear-mapping-based conventional SR methods, including SI only used one simple yet coarse linear mapping to each patch to reconstruct its HR version. On the contrary, for each LR input patch, our GLM-SI is the first to apply a combination of multiple local linear mappings, where each local linear mapping is found according to local properties of the current LR patch. Therefore, it can better approximate nonlinear LR-to-HR mappings for HR patches with complex texture. Experiment results show that the proposed GLM-SI method outperforms most of the state-of-the-art methods, and shows comparable PSNR performance with much lower computational complexity when compared with a super-resolution method based on convolutional neural nets (SRCNN15). Compared with the previous SI method that is limited with a scale factor of 2, GLM-SI shows superior performance with average 0.79 dB higher in PSNR, and can be used for scale factors of 3 or higher.
NASA Astrophysics Data System (ADS)
Martínez-Fernández, J.; Chuvieco, E.; Koutsias, N.
2013-02-01
Humans are responsible for most forest fires in Europe, but anthropogenic factors behind these events are still poorly understood. We tried to identify the driving factors of human-caused fire occurrence in Spain by applying two different statistical approaches. Firstly, assuming stationary processes for the whole country, we created models based on multiple linear regression and binary logistic regression to find factors associated with fire density and fire presence, respectively. Secondly, we used geographically weighted regression (GWR) to better understand and explore the local and regional variations of those factors behind human-caused fire occurrence. The number of human-caused fires occurring within a 25-yr period (1983-2007) was computed for each of the 7638 Spanish mainland municipalities, creating a binary variable (fire/no fire) to develop logistic models, and a continuous variable (fire density) to build standard linear regression models. A total of 383 657 fires were registered in the study dataset. The binary logistic model, which estimates the probability of having/not having a fire, successfully classified 76.4% of the total observations, while the ordinary least squares (OLS) regression model explained 53% of the variation of the fire density patterns (adjusted R2 = 0.53). Both approaches confirmed, in addition to forest and climatic variables, the importance of variables related with agrarian activities, land abandonment, rural population exodus and developmental processes as underlying factors of fire occurrence. For the GWR approach, the explanatory power of the GW linear model for fire density using an adaptive bandwidth increased from 53% to 67%, while for the GW logistic model the correctly classified observations improved only slightly, from 76.4% to 78.4%, but significantly according to the corrected Akaike Information Criterion (AICc), from 3451.19 to 3321.19. The results from GWR indicated a significant spatial variation in the local parameter estimates for all the variables and an important reduction of the autocorrelation in the residuals of the GW linear model. Despite the fitting improvement of local models, GW regression, more than an alternative to "global" or traditional regression modelling, seems to be a valuable complement to explore the non-stationary relationships between the response variable and the explanatory variables. The synergy of global and local modelling provides insights into fire management and policy and helps further our understanding of the fire problem over large areas while at the same time recognizing its local character.
Reasons for Hierarchical Linear Modeling: A Reminder.
ERIC Educational Resources Information Center
Wang, Jianjun
1999-01-01
Uses examples of hierarchical linear modeling (HLM) at local and national levels to illustrate proper applications of HLM and dummy variable regression. Raises cautions about the circumstances under which hierarchical data do not need HLM. (SLD)
Parameterizing sorption isotherms using a hybrid global-local fitting procedure.
Matott, L Shawn; Singh, Anshuman; Rabideau, Alan J
2017-05-01
Predictive modeling of the transport and remediation of groundwater contaminants requires an accurate description of the sorption process, which is usually provided by fitting an isotherm model to site-specific laboratory data. Commonly used calibration procedures, listed in order of increasing sophistication, include: trial-and-error, linearization, non-linear regression, global search, and hybrid global-local search. Given the considerable variability in fitting procedures applied in published isotherm studies, we investigated the importance of algorithm selection through a series of numerical experiments involving 13 previously published sorption datasets. These datasets, considered representative of state-of-the-art for isotherm experiments, had been previously analyzed using trial-and-error, linearization, or non-linear regression methods. The isotherm expressions were re-fit using a 3-stage hybrid global-local search procedure (i.e. global search using particle swarm optimization followed by Powell's derivative free local search method and Gauss-Marquardt-Levenberg non-linear regression). The re-fitted expressions were then compared to previously published fits in terms of the optimized weighted sum of squared residuals (WSSR) fitness function, the final estimated parameters, and the influence on contaminant transport predictions - where easily computed concentration-dependent contaminant retardation factors served as a surrogate measure of likely transport behavior. Results suggest that many of the previously published calibrated isotherm parameter sets were local minima. In some cases, the updated hybrid global-local search yielded order-of-magnitude reductions in the fitness function. In particular, of the candidate isotherms, the Polanyi-type models were most likely to benefit from the use of the hybrid fitting procedure. In some cases, improvements in fitness function were associated with slight (<10%) changes in parameter values, but in other cases significant (>50%) changes in parameter values were noted. Despite these differences, the influence of isotherm misspecification on contaminant transport predictions was quite variable and difficult to predict from inspection of the isotherms. Copyright © 2017 Elsevier B.V. All rights reserved.
Partitioning sources of variation in vertebrate species richness
Boone, R.B.; Krohn, W.B.
2000-01-01
Aim: To explore biogeographic patterns of terrestrial vertebrates in Maine, USA using techniques that would describe local and spatial correlations with the environment. Location: Maine, USA. Methods: We delineated the ranges within Maine (86,156 km2) of 275 species using literature and expert review. Ranges were combined into species richness maps, and compared to geomorphology, climate, and woody plant distributions. Methods were adapted that compared richness of all vertebrate classes to each environmental correlate, rather than assessing a single explanatory theory. We partitioned variation in species richness into components using tree and multiple linear regression. Methods were used that allowed for useful comparisons between tree and linear regression results. For both methods we partitioned variation into broad-scale (spatially autocorrelated) and fine-scale (spatially uncorrelated) explained and unexplained components. By partitioning variance, and using both tree and linear regression in analyses, we explored the degree of variation in species richness for each vertebrate group that Could be explained by the relative contribution of each environmental variable. Results: In tree regression, climate variation explained richness better (92% of mean deviance explained for all species) than woody plant variation (87%) and geomorphology (86%). Reptiles were highly correlated with environmental variation (93%), followed by mammals, amphibians, and birds (each with 84-82% deviance explained). In multiple linear regression, climate was most closely associated with total vertebrate richness (78%), followed by woody plants (67%) and geomorphology (56%). Again, reptiles were closely correlated with the environment (95%), followed by mammals (73%), amphibians (63%) and birds (57%). Main conclusions: Comparing variation explained using tree and multiple linear regression quantified the importance of nonlinear relationships and local interactions between species richness and environmental variation, identifying the importance of linear relationships between reptiles and the environment, and nonlinear relationships between birds and woody plants, for example. Conservation planners should capture climatic variation in broad-scale designs; temperatures may shift during climate change, but the underlying correlations between the environment and species richness will presumably remain.
Acoustic-articulatory mapping in vowels by locally weighted regression
McGowan, Richard S.; Berger, Michael A.
2009-01-01
A method for mapping between simultaneously measured articulatory and acoustic data is proposed. The method uses principal components analysis on the articulatory and acoustic variables, and mapping between the domains by locally weighted linear regression, or loess [Cleveland, W. S. (1979). J. Am. Stat. Assoc. 74, 829–836]. The latter method permits local variation in the slopes of the linear regression, assuming that the function being approximated is smooth. The methodology is applied to vowels of four speakers in the Wisconsin X-ray Microbeam Speech Production Database, with formant analysis. Results are examined in terms of (1) examples of forward (articulation-to-acoustics) mappings and inverse mappings, (2) distributions of local slopes and constants, (3) examples of correlations among slopes and constants, (4) root-mean-square error, and (5) sensitivity of formant frequencies to articulatory change. It is shown that the results are qualitatively correct and that loess performs better than global regression. The forward mappings show different root-mean-square error properties than the inverse mappings indicating that this method is better suited for the forward mappings than the inverse mappings, at least for the data chosen for the current study. Some preliminary results on sensitivity of the first two formant frequencies to the two most important articulatory principal components are presented. PMID:19813812
Aqil, Muhammad; Kita, Ichiro; Yano, Akira; Nishiyama, Soichi
2007-10-01
Traditionally, the multiple linear regression technique has been one of the most widely used models in simulating hydrological time series. However, when the nonlinear phenomenon is significant, the multiple linear will fail to develop an appropriate predictive model. Recently, neuro-fuzzy systems have gained much popularity for calibrating the nonlinear relationships. This study evaluated the potential of a neuro-fuzzy system as an alternative to the traditional statistical regression technique for the purpose of predicting flow from a local source in a river basin. The effectiveness of the proposed identification technique was demonstrated through a simulation study of the river flow time series of the Citarum River in Indonesia. Furthermore, in order to provide the uncertainty associated with the estimation of river flow, a Monte Carlo simulation was performed. As a comparison, a multiple linear regression analysis that was being used by the Citarum River Authority was also examined using various statistical indices. The simulation results using 95% confidence intervals indicated that the neuro-fuzzy model consistently underestimated the magnitude of high flow while the low and medium flow magnitudes were estimated closer to the observed data. The comparison of the prediction accuracy of the neuro-fuzzy and linear regression methods indicated that the neuro-fuzzy approach was more accurate in predicting river flow dynamics. The neuro-fuzzy model was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively. Considering its simplicity and efficiency, the neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area.
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
Chaurasia, Ashok; Harel, Ofer
2015-02-10
Tests for regression coefficients such as global, local, and partial F-tests are common in applied research. In the framework of multiple imputation, there are several papers addressing tests for regression coefficients. However, for simultaneous hypothesis testing, the existing methods are computationally intensive because they involve calculation with vectors and (inversion of) matrices. In this paper, we propose a simple method based on the scalar entity, coefficient of determination, to perform (global, local, and partial) F-tests with multiply imputed data. The proposed method is evaluated using simulated data and applied to suicide prevention data. Copyright © 2014 John Wiley & Sons, Ltd.
Khalil, Mohamed H.; Shebl, Mostafa K.; Kosba, Mohamed A.; El-Sabrout, Karim; Zaki, Nesma
2016-01-01
Aim: This research was conducted to determine the most affecting parameters on hatchability of indigenous and improved local chickens’ eggs. Materials and Methods: Five parameters were studied (fertility, early and late embryonic mortalities, shape index, egg weight, and egg weight loss) on four strains, namely Fayoumi, Alexandria, Matrouh, and Montazah. Multiple linear regression was performed on the studied parameters to determine the most influencing one on hatchability. Results: The results showed significant differences in commercial and scientific hatchability among strains. Alexandria strain has the highest significant commercial hatchability (80.70%). Regarding the studied strains, highly significant differences in hatching chick weight among strains were observed. Using multiple linear regression analysis, fertility made the greatest percent contribution (71.31%) to hatchability, and the lowest percent contributions were made by shape index and egg weight loss. Conclusion: A prediction of hatchability using multiple regression analysis could be a good tool to improve hatchability percentage in chickens. PMID:27651666
Wheeler, David C.; Hickson, DeMarc A.; Waller, Lance A.
2010-01-01
Many diagnostic tools and goodness-of-fit measures, such as the Akaike information criterion (AIC) and the Bayesian deviance information criterion (DIC), are available to evaluate the overall adequacy of linear regression models. In addition, visually assessing adequacy in models has become an essential part of any regression analysis. In this paper, we focus on a spatial consideration of the local DIC measure for model selection and goodness-of-fit evaluation. We use a partitioning of the DIC into the local DIC, leverage, and deviance residuals to assess local model fit and influence for both individual observations and groups of observations in a Bayesian framework. We use visualization of the local DIC and differences in local DIC between models to assist in model selection and to visualize the global and local impacts of adding covariates or model parameters. We demonstrate the utility of the local DIC in assessing model adequacy using HIV prevalence data from pregnant women in the Butare province of Rwanda during 1989-1993 using a range of linear model specifications, from global effects only to spatially varying coefficient models, and a set of covariates related to sexual behavior. Results of applying the diagnostic visualization approach include more refined model selection and greater understanding of the models as applied to the data. PMID:21243121
42 CFR 433.68 - Permissible health care-related taxes.
Code of Federal Regulations, 2011 CFR
2011-10-01
...-related tax is imposed by a unit of local government, the tax must extend to all items or services or... least squares, the slope (designated as (B) (that is. the value of the x coefficient) of two linear... linear regression, as described in paragraph (e)(2)(i) of this section, for the State's tax program, if...
42 CFR 433.68 - Permissible health care-related taxes.
Code of Federal Regulations, 2013 CFR
2013-10-01
...-related tax is imposed by a unit of local government, the tax must extend to all items or services or... least squares, the slope (designated as (B) (that is. the value of the x coefficient) of two linear... linear regression, as described in paragraph (e)(2)(i) of this section, for the State's tax program, if...
NASA Astrophysics Data System (ADS)
Kumar, Vandhna; Meyssignac, Benoit; Melet, Angélique; Ganachaud, Alexandre
2017-04-01
Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years is up to 3 times the global average. In this study, we attempt to reconstruct sea levels at selected sites in the region (Suva, Lautoka, Noumea - Fiji and New Caledonia) as a mutiple-linear regression of atmospheric and oceanic variables. We focus on interannual-to-decadal scale variability, and lower (including the global mean sea level rise) over the 1979-2014 period. Sea levels are taken from tide gauge records and the ORAS4 reanalysis dataset, and are expressed as a sum of steric and mass changes as a preliminary step. The key development in our methodology is using leading wind stress curl as a proxy for the thermosteric component. This is based on the knowledge that wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. The analysis is primarily based on correlation between local sea level and selected predictors, the dominant one being wind stress curl. In the first step, proxy boxes for wind stress curl are determined via regions of highest correlation. The proportion of sea level explained via linear regression is then removed, leaving a residual. This residual is then correlated with other locally acting potential predictors: halosteric sea level, the zonal and meridional wind stress components, and sea surface temperature. The statistically significant predictors are used in a multi-linear regression function to simulate the observed sea level. The method is able to reproduce between 40 to 80% of the variance in observed sea level. Based on the skill of the model, it has high potential in sea level projection and downscaling studies.
Kim, Dae-Hee; Choi, Jae-Hun; Lim, Myung-Eun; Park, Soo-Jun
2008-01-01
This paper suggests the method of correcting distance between an ambient intelligence display and a user based on linear regression and smoothing method, by which distance information of a user who approaches to the display can he accurately output even in an unanticipated condition using a passive infrared VIR) sensor and an ultrasonic device. The developed system consists of an ambient intelligence display and an ultrasonic transmitter, and a sensor gateway. Each module communicates with each other through RF (Radio frequency) communication. The ambient intelligence display includes an ultrasonic receiver and a PIR sensor for motion detection. In particular, this system selects and processes algorithms such as smoothing or linear regression for current input data processing dynamically through judgment process that is determined using the previous reliable data stored in a queue. In addition, we implemented GUI software with JAVA for real time location tracking and an ambient intelligence display.
Jaber, Abobaker M; Ismail, Mohd Tahir; Altaher, Alsaidi M
2014-01-01
This paper mainly forecasts the daily closing price of stock markets. We propose a two-stage technique that combines the empirical mode decomposition (EMD) with nonparametric methods of local linear quantile (LLQ). We use the proposed technique, EMD-LLQ, to forecast two stock index time series. Detailed experiments are implemented for the proposed method, in which EMD-LPQ, EMD, and Holt-Winter methods are compared. The proposed EMD-LPQ model is determined to be superior to the EMD and Holt-Winter methods in predicting the stock closing prices.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R(2)=0.58 vs. 0.55) or a cross-validation procedure (R(2)=0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure. Crown Copyright © 2015. Published by Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Gelman, Andrew; Imbens, Guido
2014-01-01
It is common in regression discontinuity analysis to control for high order (third, fourth, or higher) polynomials of the forcing variable. We argue that estimators for causal effects based on such methods can be misleading, and we recommend researchers do not use them, and instead use estimators based on local linear or quadratic polynomials or…
Spreco, A; Eriksson, O; Dahlström, Ö; Timpka, T
2017-07-01
Methods for the detection of influenza epidemics and prediction of their progress have seldom been comparatively evaluated using prospective designs. This study aimed to perform a prospective comparative trial of algorithms for the detection and prediction of increased local influenza activity. Data on clinical influenza diagnoses recorded by physicians and syndromic data from a telenursing service were used. Five detection and three prediction algorithms previously evaluated in public health settings were calibrated and then evaluated over 3 years. When applied on diagnostic data, only detection using the Serfling regression method and prediction using the non-adaptive log-linear regression method showed acceptable performances during winter influenza seasons. For the syndromic data, none of the detection algorithms displayed a satisfactory performance, while non-adaptive log-linear regression was the best performing prediction method. We conclude that evidence was found for that available algorithms for influenza detection and prediction display satisfactory performance when applied on local diagnostic data during winter influenza seasons. When applied on local syndromic data, the evaluated algorithms did not display consistent performance. Further evaluations and research on combination of methods of these types in public health information infrastructures for 'nowcasting' (integrated detection and prediction) of influenza activity are warranted.
Qian, Jianjun; Yang, Jian; Xu, Yong
2013-09-01
This paper presents a robust but simple image feature extraction method, called image decomposition based on local structure (IDLS). It is assumed that in the local window of an image, the macro-pixel (patch) of the central pixel, and those of its neighbors, are locally linear. IDLS captures the local structural information by describing the relationship between the central macro-pixel and its neighbors. This relationship is represented with the linear representation coefficients determined using ridge regression. One image is actually decomposed into a series of sub-images (also called structure images) according to a local structure feature vector. All the structure images, after being down-sampled for dimensionality reduction, are concatenated into one super-vector. Fisher linear discriminant analysis is then used to provide a low-dimensional, compact, and discriminative representation for each super-vector. The proposed method is applied to face recognition and examined using our real-world face image database, NUST-RWFR, and five popular, publicly available, benchmark face image databases (AR, Extended Yale B, PIE, FERET, and LFW). Experimental results show the performance advantages of IDLS over state-of-the-art algorithms.
Robust neural network with applications to credit portfolio data analysis.
Feng, Yijia; Li, Runze; Sudjianto, Agus; Zhang, Yiyun
2010-01-01
In this article, we study nonparametric conditional quantile estimation via neural network structure. We proposed an estimation method that combines quantile regression and neural network (robust neural network, RNN). It provides good smoothing performance in the presence of outliers and can be used to construct prediction bands. A Majorization-Minimization (MM) algorithm was developed for optimization. Monte Carlo simulation study is conducted to assess the performance of RNN. Comparison with other nonparametric regression methods (e.g., local linear regression and regression splines) in real data application demonstrate the advantage of the newly proposed procedure.
Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald
2011-06-01
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems.
2011-01-01
Background Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Results Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. Conclusions HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems. PMID:21627852
Face Alignment via Regressing Local Binary Features.
Ren, Shaoqing; Cao, Xudong; Wei, Yichen; Sun, Jian
2016-03-01
This paper presents a highly efficient and accurate regression approach for face alignment. Our approach has two novel components: 1) a set of local binary features and 2) a locality principle for learning those features. The locality principle guides us to learn a set of highly discriminative local binary features for each facial landmark independently. The obtained local binary features are used to jointly learn a linear regression for the final output. This approach achieves the state-of-the-art results when tested on the most challenging benchmarks to date. Furthermore, because extracting and regressing local binary features are computationally very cheap, our system is much faster than previous methods. It achieves over 3000 frames per second (FPS) on a desktop or 300 FPS on a mobile phone for locating a few dozens of landmarks. We also study a key issue that is important but has received little attention in the previous research, which is the face detector used to initialize alignment. We investigate several face detectors and perform quantitative evaluation on how they affect alignment accuracy. We find that an alignment friendly detector can further greatly boost the accuracy of our alignment method, reducing the error up to 16% relatively. To facilitate practical usage of face detection/alignment methods, we also propose a convenient metric to measure how good a detector is for alignment initialization.
Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C
2015-01-01
We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.
SU-F-R-20: Image Texture Features Correlate with Time to Local Failure in Lung SBRT Patients
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrews, M; Abazeed, M; Woody, N
Purpose: To explore possible correlation between CT image-based texture and histogram features and time-to-local-failure in early stage non-small cell lung cancer (NSCLC) patients treated with stereotactic body radiotherapy (SBRT).Methods and Materials: From an IRB-approved lung SBRT registry for patients treated between 2009–2013 we selected 48 (20 male, 28 female) patients with local failure. Median patient age was 72.3±10.3 years. Mean time to local failure was 15 ± 7.1 months. Physician-contoured gross tumor volumes (GTV) on the planning CT images were processed and 3D gray-level co-occurrence matrix (GLCM) based texture and histogram features were calculated in Matlab. Data were exported tomore » R and a multiple linear regression model was used to examine the relationship between texture features and time-to-local-failure. Results: Multiple linear regression revealed that entropy (p=0.0233, multiple R2=0.60) from GLCM-based texture analysis and the standard deviation (p=0.0194, multiple R2=0.60) from the histogram-based features were statistically significantly correlated with the time-to-local-failure. Conclusion: Image-based texture analysis can be used to predict certain aspects of treatment outcomes of NSCLC patients treated with SBRT. We found entropy and standard deviation calculated for the GTV on the CT images displayed a statistically significant correlation with and time-to-local-failure in lung SBRT patients.« less
Sun, Yanqing; Sun, Liuquan; Zhou, Jie
2013-07-01
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically model such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criteria for selecting the link function is proposed to provide better fit of the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performances of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
Majorization Minimization by Coordinate Descent for Concave Penalized Generalized Linear Models
Jiang, Dingfeng; Huang, Jian
2013-01-01
Recent studies have demonstrated theoretical attractiveness of a class of concave penalties in variable selection, including the smoothly clipped absolute deviation and minimax concave penalties. The computation of the concave penalized solutions in high-dimensional models, however, is a difficult task. We propose a majorization minimization by coordinate descent (MMCD) algorithm for computing the concave penalized solutions in generalized linear models. In contrast to the existing algorithms that use local quadratic or local linear approximation to the penalty function, the MMCD seeks to majorize the negative log-likelihood by a quadratic loss, but does not use any approximation to the penalty. This strategy makes it possible to avoid the computation of a scaling factor in each update of the solutions, which improves the efficiency of coordinate descent. Under certain regularity conditions, we establish theoretical convergence property of the MMCD. We implement this algorithm for a penalized logistic regression model using the SCAD and MCP penalties. Simulation studies and a data example demonstrate that the MMCD works sufficiently fast for the penalized logistic regression in high-dimensional settings where the number of covariates is much larger than the sample size. PMID:25309048
Inferring gene regression networks with model trees
2010-01-01
Background Novel strategies are required in order to handle the huge amount of data produced by microarray technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between genes building the so-called gene co-expression networks. They are typically generated using correlation statistics as pairwise similarity measures. Correlation-based methods are very useful in order to determine whether two genes have a strong global similarity but do not detect local similarities. Results We propose model trees as a method to identify gene interaction networks. While correlation-based methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two well-known data sets: Saccharomyces Cerevisiae and E.coli data set. First, the biological coherence of the results are tested. Second the E.coli transcriptional network (in the Regulon database) is used as control to compare the results to that of a correlation-based method. This experiment shows that REGNET performs more accurately at detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods. Conclusions REGNET generates gene association networks from gene expression data, and differs from correlation-based methods in that the relationship between one gene and others is calculated simultaneously. Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression functions. They are very often more precise than linear regression models because they can add just different linear regressions to separate areas of the search space favoring to infer localized similarities over a more global similarity. Furthermore, experimental results show the good performance of REGNET. PMID:20950452
Hao, Yong; Sun, Xu-Dong; Yang, Qiang
2012-12-01
Variables selection strategy combined with local linear embedding (LLE) was introduced for the analysis of complex samples by using near infrared spectroscopy (NIRS). Three methods include Monte Carlo uninformation variable elimination (MCUVE), successive projections algorithm (SPA) and MCUVE connected with SPA were used for eliminating redundancy spectral variables. Partial least squares regression (PLSR) and LLE-PLSR were used for modeling complex samples. The results shown that MCUVE can both extract effective informative variables and improve the precision of models. Compared with PLSR models, LLE-PLSR models can achieve more accurate analysis results. MCUVE combined with LLE-PLSR is an effective modeling method for NIRS quantitative analysis.
Distributed Monitoring of the R(sup 2) Statistic for Linear Regression
NASA Technical Reports Server (NTRS)
Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.
2011-01-01
The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.
Local Intrinsic Dimension Estimation by Generalized Linear Modeling.
Hino, Hideitsu; Fujiki, Jun; Akaho, Shotaro; Murata, Noboru
2017-07-01
We propose a method for intrinsic dimension estimation. By fitting the power of distance from an inspection point and the number of samples included inside a ball with a radius equal to the distance, to a regression model, we estimate the goodness of fit. Then, by using the maximum likelihood method, we estimate the local intrinsic dimension around the inspection point. The proposed method is shown to be comparable to conventional methods in global intrinsic dimension estimation experiments. Furthermore, we experimentally show that the proposed method outperforms a conventional local dimension estimation method.
Extraction of object skeletons in multispectral imagery by the orthogonal regression fitting
NASA Astrophysics Data System (ADS)
Palenichka, Roman M.; Zaremba, Marek B.
2003-03-01
Accurate and automatic extraction of skeletal shape of objects of interest from satellite images provides an efficient solution to such image analysis tasks as object detection, object identification, and shape description. The problem of skeletal shape extraction can be effectively solved in three basic steps: intensity clustering (i.e. segmentation) of objects, extraction of a structural graph of the object shape, and refinement of structural graph by the orthogonal regression fitting. The objects of interest are segmented from the background by a clustering transformation of primary features (spectral components) with respect to each pixel. The structural graph is composed of connected skeleton vertices and represents the topology of the skeleton. In the general case, it is a quite rough piecewise-linear representation of object skeletons. The positions of skeleton vertices on the image plane are adjusted by means of the orthogonal regression fitting. It consists of changing positions of existing vertices according to the minimum of the mean orthogonal distances and, eventually, adding new vertices in-between if a given accuracy if not yet satisfied. Vertices of initial piecewise-linear skeletons are extracted by using a multi-scale image relevance function. The relevance function is an image local operator that has local maximums at the centers of the objects of interest.
Event detection and localization for small mobile robots using reservoir computing.
Antonelo, E A; Schrauwen, B; Stroobandt, D
2008-08-01
Reservoir Computing (RC) techniques use a fixed (usually randomly created) recurrent neural network, or more generally any dynamic system, which operates at the edge of stability, where only a linear static readout output layer is trained by standard linear regression methods. In this work, RC is used for detecting complex events in autonomous robot navigation. This can be extended to robot localization tasks which are solely based on a few low-range, high-noise sensory data. The robot thus builds an implicit map of the environment (after learning) that is used for efficient localization by simply processing the input stream of distance sensors. These techniques are demonstrated in both a simple simulation environment and in the physically realistic Webots simulation of the commercially available e-puck robot, using several complex and even dynamic environments.
Reliable two-dimensional phase unwrapping method using region growing and local linear estimation.
Zhou, Kun; Zaitsev, Maxim; Bao, Shanglian
2009-10-01
In MRI, phase maps can provide useful information about parameters such as field inhomogeneity, velocity of blood flow, and the chemical shift between water and fat. As phase is defined in the (-pi,pi] range, however, phase wraps often occur, which complicates image analysis and interpretation. This work presents a two-dimensional phase unwrapping algorithm that uses quality-guided region growing and local linear estimation. The quality map employs the variance of the second-order partial derivatives of the phase as the quality criterion. Phase information from unwrapped neighboring pixels is used to predict the correct phase of the current pixel using a linear regression method. The algorithm was tested on both simulated and real data, and is shown to successfully unwrap phase images that are corrupted by noise and have rapidly changing phase. (c) 2009 Wiley-Liss, Inc.
Conditional parametric models for storm sewer runoff
NASA Astrophysics Data System (ADS)
Jonsdottir, H.; Nielsen, H. Aa; Madsen, H.; Eliasson, J.; Palsson, O. P.; Nielsen, M. K.
2007-05-01
The method of conditional parametric modeling is introduced for flow prediction in a sewage system. It is a well-known fact that in hydrological modeling the response (runoff) to input (precipitation) varies depending on soil moisture and several other factors. Consequently, nonlinear input-output models are needed. The model formulation described in this paper is similar to the traditional linear models like final impulse response (FIR) and autoregressive exogenous (ARX) except that the parameters vary as a function of some external variables. The parameter variation is modeled by local lines, using kernels for local linear regression. As such, the method might be referred to as a nearest neighbor method. The results achieved in this study were compared to results from the conventional linear methods, FIR and ARX. The increase in the coefficient of determination is substantial. Furthermore, the new approach conserves the mass balance better. Hence this new approach looks promising for various hydrological models and analysis.
Local Fiscal Allocation for Public Health Departments.
McCullough, J Mac; Leider, Jonathon P; Riley, William J
2015-12-01
We examined the percentage of local government taxes ("fiscal allocation") dedicated to local health departments on a national level, as well as correlates of local investment in public health. Using the most recent data available--the 2008 National Association of City and County Health Officials Profile survey and the 2007 U.S. Census Bureau Census of Local Governments-generalized linear regression models examined associations between fiscal allocation and local health department setting, governance, finance, and service provision. Models were stratified by the extent of long-term debt for the jurisdiction. Analyses were performed in 2014. Average fiscal allocation for public health was 3.31% of total local taxes. In multivariate regressions, per capita expenditures, having a local board of health and public health service provision were associated with higher fiscal allocation. Stratified models showed that local board of health and local health department taxing authority were associated with fiscal allocation in low and high long-term debt areas, respectively. The proportion of all local taxes allocated to local public health is related to local health department expenditures, service provision, and governance. These relationships depend upon the extent of long-term debt in the jurisdiction. Copyright © 2015 American Journal of Preventive Medicine. Published by Elsevier Inc. All rights reserved.
Quantifying and Reducing Curve-Fitting Uncertainty in Isc
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campanelli, Mark; Duck, Benjamin; Emery, Keith
2015-06-14
Current-voltage (I-V) curve measurements of photovoltaic (PV) devices are used to determine performance parameters and to establish traceable calibration chains. Measurement standards specify localized curve fitting methods, e.g., straight-line interpolation/extrapolation of the I-V curve points near short-circuit current, Isc. By considering such fits as statistical linear regressions, uncertainties in the performance parameters are readily quantified. However, the legitimacy of such a computed uncertainty requires that the model be a valid (local) representation of the I-V curve and that the noise be sufficiently well characterized. Using more data points often has the advantage of lowering the uncertainty. However, more data pointsmore » can make the uncertainty in the fit arbitrarily small, and this fit uncertainty misses the dominant residual uncertainty due to so-called model discrepancy. Using objective Bayesian linear regression for straight-line fits for Isc, we investigate an evidence-based method to automatically choose data windows of I-V points with reduced model discrepancy. We also investigate noise effects. Uncertainties, aligned with the Guide to the Expression of Uncertainty in Measurement (GUM), are quantified throughout.« less
Self-organising mixture autoregressive model for non-stationary time series modelling.
Ni, He; Yin, Hujun
2008-12-01
Modelling non-stationary time series has been a difficult task for both parametric and nonparametric methods. One promising solution is to combine the flexibility of nonparametric models with the simplicity of parametric models. In this paper, the self-organising mixture autoregressive (SOMAR) network is adopted as a such mixture model. It breaks time series into underlying segments and at the same time fits local linear regressive models to the clusters of segments. In such a way, a global non-stationary time series is represented by a dynamic set of local linear regressive models. Neural gas is used for a more flexible structure of the mixture model. Furthermore, a new similarity measure has been introduced in the self-organising network to better quantify the similarity of time series segments. The network can be used naturally in modelling and forecasting non-stationary time series. Experiments on artificial, benchmark time series (e.g. Mackey-Glass) and real-world data (e.g. numbers of sunspots and Forex rates) are presented and the results show that the proposed SOMAR network is effective and superior to other similar approaches.
Quantifying and Reducing Curve-Fitting Uncertainty in Isc: Preprint
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campanelli, Mark; Duck, Benjamin; Emery, Keith
Current-voltage (I-V) curve measurements of photovoltaic (PV) devices are used to determine performance parameters and to establish traceable calibration chains. Measurement standards specify localized curve fitting methods, e.g., straight-line interpolation/extrapolation of the I-V curve points near short-circuit current, Isc. By considering such fits as statistical linear regressions, uncertainties in the performance parameters are readily quantified. However, the legitimacy of such a computed uncertainty requires that the model be a valid (local) representation of the I-V curve and that the noise be sufficiently well characterized. Using more data points often has the advantage of lowering the uncertainty. However, more data pointsmore » can make the uncertainty in the fit arbitrarily small, and this fit uncertainty misses the dominant residual uncertainty due to so-called model discrepancy. Using objective Bayesian linear regression for straight-line fits for Isc, we investigate an evidence-based method to automatically choose data windows of I-V points with reduced model discrepancy. We also investigate noise effects. Uncertainties, aligned with the Guide to the Expression of Uncertainty in Measurement (GUM), are quantified throughout.« less
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
Rosenblum, Michael; van der Laan, Mark J.
2010-01-01
Models, such as logistic regression and Poisson regression models, are often used to estimate treatment effects in randomized trials. These models leverage information in variables collected before randomization, in order to obtain more precise estimates of treatment effects. However, there is the danger that model misspecification will lead to bias. We show that certain easy to compute, model-based estimators are asymptotically unbiased even when the working model used is arbitrarily misspecified. Furthermore, these estimators are locally efficient. As a special case of our main result, we consider a simple Poisson working model containing only main terms; in this case, we prove the maximum likelihood estimate of the coefficient corresponding to the treatment variable is an asymptotically unbiased estimator of the marginal log rate ratio, even when the working model is arbitrarily misspecified. This is the log-linear analog of ANCOVA for linear models. Our results demonstrate one application of targeted maximum likelihood estimation. PMID:20628636
Adaptive Nonparametric Kinematic Modeling of Concentric Tube Robots.
Fagogenis, Georgios; Bergeles, Christos; Dupont, Pierre E
2016-10-01
Concentric tube robots comprise telescopic precurved elastic tubes. The robot's tip and shape are controlled via relative tube motions, i.e. tube rotations and translations. Non-linear interactions between the tubes, e.g. friction and torsion, as well as uncertainty in the physical properties of the tubes themselves, e.g. the Young's modulus, curvature, or stiffness, hinder accurate kinematic modelling. In this paper, we present a machine-learning-based methodology for kinematic modelling of concentric tube robots and in situ model adaptation. Our approach is based on Locally Weighted Projection Regression (LWPR). The model comprises an ensemble of linear models, each of which locally approximates the original complex kinematic relation. LWPR can accommodate for model deviations by adjusting the respective local models at run-time, resulting in an adaptive kinematics framework. We evaluated our approach on data gathered from a three-tube robot, and report high accuracy across the robot's configuration space.
Developing a predictive tropospheric ozone model for Tabriz
NASA Astrophysics Data System (ADS)
Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi
2013-04-01
Predictive ozone models are becoming indispensable tools by providing a capability for pollution alerts to serve people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, by using the following five modeling strategies: three regression-type methods: Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models: Nonlinear Local Prediction (NLP) to implement chaos theory and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type modeling strategies explain the data in terms of: temperature, solar radiation, dew point temperature, and wind speed, by regressing present ozone values to their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for MLR, ANN and GEP models are not overly good but those produced by NLP and ARIMA are promising for the establishing a forecasting capability.
Ecological and Topographic Features of Volcanic Ash-Influenced Forest Soils
Mark Kimsey; Brian Gardner; Alan Busacca
2007-01-01
Volcanic ash distribution and thickness were determined for a forested region of north-central Idaho. Mean ash thickness and multiple linear regression analyses were used to model the effect of environmental variables on ash thickness. Slope and slope curvature relationships with volcanic ash thickness varied on a local spatial scale across the study area. Ash...
The development of global motion discrimination in school aged children
Bogfjellmo, Lotte-Guri; Bex, Peter J.; Falkenberg, Helle K.
2014-01-01
Global motion perception matures during childhood and involves the detection of local directional signals that are integrated across space. We examine the maturation of local directional selectivity and global motion integration with an equivalent noise paradigm applied to direction discrimination. One hundred and three observers (6–17 years) identified the global direction of motion in a 2AFC task. The 8° central stimuli consisted of 100 dots of 10% Michelson contrast moving 2.8°/s or 9.8°/s. Local directional selectivity and global sampling efficiency were estimated from direction discrimination thresholds as a function of external directional noise, speed, and age. Direction discrimination thresholds improved gradually until the age of 14 years (linear regression, p < 0.05) for both speeds. This improvement was associated with a gradual increase in sampling efficiency (linear regression, p < 0.05), with no significant change in internal noise. Direction sensitivity was lower for dots moving at 2.8°/s than at 9.8°/s for all ages (paired t test, p < 0.05) and is mainly due to lower sampling efficiency. Global motion perception improves gradually during development and matures by age 14. There was no change in internal noise after the age of 6, suggesting that local direction selectivity is mature by that age. The improvement in global motion perception is underpinned by a steady increase in the efficiency with which direction signals are pooled, suggesting that global motion pooling processes mature for longer and later than local motion processing. PMID:24569985
An Analysis of San Diego's Housing Market Using a Geographically Weighted Regression Approach
NASA Astrophysics Data System (ADS)
Grant, Christina P.
San Diego County real estate transaction data was evaluated with a set of linear models calibrated by ordinary least squares and geographically weighted regression (GWR). The goal of the analysis was to determine whether the spatial effects assumed to be in the data are best studied globally with no spatial terms, globally with a fixed effects submarket variable, or locally with GWR. 18,050 single-family residential sales which closed in the six months between April 2014 and September 2014 were used in the analysis. Diagnostic statistics including AICc, R2, Global Moran's I, and visual inspection of diagnostic plots and maps indicate superior model performance by GWR as compared to both global regressions.
Yousefi, Siamak; Balasubramanian, Madhusudhanan; Goldbaum, Michael H; Medeiros, Felipe A; Zangwill, Linda M; Weinreb, Robert N; Liebmann, Jeffrey M; Girkin, Christopher A; Bowd, Christopher
2016-05-01
To validate Gaussian mixture-model with expectation maximization (GEM) and variational Bayesian independent component analysis mixture-models (VIM) for detecting glaucomatous progression along visual field (VF) defect patterns (GEM-progression of patterns (POP) and VIM-POP). To compare GEM-POP and VIM-POP with other methods. GEM and VIM models separated cross-sectional abnormal VFs from 859 eyes and normal VFs from 1117 eyes into abnormal and normal clusters. Clusters were decomposed into independent axes. The confidence limit (CL) of stability was established for each axis with a set of 84 stable eyes. Sensitivity for detecting progression was assessed in a sample of 83 eyes with known progressive glaucomatous optic neuropathy (PGON). Eyes were classified as progressed if any defect pattern progressed beyond the CL of stability. Performance of GEM-POP and VIM-POP was compared to point-wise linear regression (PLR), permutation analysis of PLR (PoPLR), and linear regression (LR) of mean deviation (MD), and visual field index (VFI). Sensitivity and specificity for detecting glaucomatous VFs were 89.9% and 93.8%, respectively, for GEM and 93.0% and 97.0%, respectively, for VIM. Receiver operating characteristic (ROC) curve areas for classifying progressed eyes were 0.82 for VIM-POP, 0.86 for GEM-POP, 0.81 for PoPLR, 0.69 for LR of MD, and 0.76 for LR of VFI. GEM-POP was significantly more sensitive to PGON than PoPLR and linear regression of MD and VFI in our sample, while providing localized progression information. Detection of glaucomatous progression can be improved by assessing longitudinal changes in localized patterns of glaucomatous defect identified by unsupervised machine learning.
Zhang, Xu; Wang, Dongqing; Yu, Zaiyang; Chen, Xiang; Li, Sheng; Zhou, Ping
2017-11-01
This study examines the electromyogram (EMG)-torque relation for chronic stroke survivors using a novel EMG complexity representation. Ten stroke subjects performed a series of submaximal isometric elbow flexion tasks using their affected and contralateral arms, respectively, while a 20-channel linear electrode array was used to record surface EMG from the biceps brachii muscles. The sample entropy (SampEn) of surface EMG signals was calculated with both global and local tolerance schemes. A regression analysis was performed between SampEn of each channel's surface EMG and elbow flexion torque. It was found that a linear regression can be used to well describe the relation between surface EMG SampEn and the torque. Each channel's root mean square (RMS) amplitude of surface EMG signal in the different torque level was computed to determine the channel with the highest EMG amplitude. The slope of the regression (observed from the channel with the highest EMG amplitude) was smaller on the impaired side than on the nonimpaired side in 8 of the 10 subjects, regardless of the tolerance scheme (global or local) and the range of torques (full or matched range) used for comparison. The surface EMG signals from the channels above the estimated muscle innervation zones demonstrated significantly lower levels of complexity compared with other channels between innervation zones and muscle tendons. The study provides a novel point of view of the EMG-torque relation in the complexity domain, and reveals its alterations post stroke, which are associated with complex neural and muscular changes post stroke. The slope difference between channels with regard to innervation zones also confirms the relevance of electrode position in surface EMG analysis.
NASA Astrophysics Data System (ADS)
Rahmani, Elham; Friederichs, Petra; Keller, Jan; Hense, Andreas
2016-05-01
The main purpose of this study is to develop an easy-to-use weather generator (WG) for the downscaling of gridded data to point measurements at regional scale. The WG is applied to daily averaged temperatures and annual growing degree days (GDD) of wheat. This particular choice of variables is motivated by future investigations on temperature impacts as the most important climate variable for wheat cultivation under irrigation in Iran. The proposed statistical downscaling relates large-scale ERA-40 reanalysis to local daily temperature and annual GDD. Long-term local observations in Iran are used at 16 synoptic stations from 1961 to 2001, which is the common period with ERA-40 data. We perform downscaling using two approaches: the first is a linear regression model that uses the ERA-40 fingerprints (FP) defined by the squared correlation with local variability, and the second employs a linear multiple regression (MR) analysis to relate the large-scale information at the neighboring grid points to the station data. Extending the usual downscaling, we implement a WG providing uncertainty information and realizations of the local temperatures and GDD by adding a Gaussian random noise. ERA-40 reanalysis well represents the local daily temperature as well as the annual GDD variability. For 2-m temperature, the FPs are more localized during the warm compared with the cold season. While MR is slightly superior for daily temperature time series, FP seems to perform best for annual GDD. We further assess the quality of the WGs applying probabilistic verification scores like the continuous ranked probability score (CRPS) and the respective skill score. They clearly demonstrate the superiority of WGs compared with a deterministic downscaling.
An approach of traffic signal control based on NLRSQP algorithm
NASA Astrophysics Data System (ADS)
Zou, Yuan-Yang; Hu, Yu
2017-11-01
This paper presents a linear program model with linear complementarity constraints (LPLCC) to solve traffic signal optimization problem. The objective function of the model is to obtain the minimization of total queue length with weight factors at the end of each cycle. Then, a combination algorithm based on the nonlinear least regression and sequence quadratic program (NLRSQP) is proposed, by which the local optimal solution can be obtained. Furthermore, four numerical experiments are proposed to study how to set the initial solution of the algorithm that can get a better local optimal solution more quickly. In particular, the results of numerical experiments show that: The model is effective for different arrival rates and weight factors; and the lower bound of the initial solution is, the better optimal solution can be obtained.
Statistical downscaling of precipitation using long short-term memory recurrent neural networks
NASA Astrophysics Data System (ADS)
Misra, Saptarshi; Sarkar, Sudeshna; Mitra, Pabitra
2017-11-01
Hydrological impacts of global climate change on regional scale are generally assessed by downscaling large-scale climatic variables, simulated by General Circulation Models (GCMs), to regional, small-scale hydrometeorological variables like precipitation, temperature, etc. In this study, we propose a new statistical downscaling model based on Recurrent Neural Network with Long Short-Term Memory which captures the spatio-temporal dependencies in local rainfall. The previous studies have used several other methods such as linear regression, quantile regression, kernel regression, beta regression, and artificial neural networks. Deep neural networks and recurrent neural networks have been shown to be highly promising in modeling complex and highly non-linear relationships between input and output variables in different domains and hence we investigated their performance in the task of statistical downscaling. We have tested this model on two datasets—one on precipitation in Mahanadi basin in India and the second on precipitation in Campbell River basin in Canada. Our autoencoder coupled long short-term memory recurrent neural network model performs the best compared to other existing methods on both the datasets with respect to temporal cross-correlation, mean squared error, and capturing the extremes.
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
Robust estimation for partially linear models with large-dimensional covariates
Zhu, LiPing; Li, RunZe; Cui, HengJian
2014-01-01
We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a noncon-cave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of o(n), where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures. PMID:24955087
Robust estimation for partially linear models with large-dimensional covariates.
Zhu, LiPing; Li, RunZe; Cui, HengJian
2013-10-01
We are concerned with robust estimation procedures to estimate the parameters in partially linear models with large-dimensional covariates. To enhance the interpretability, we suggest implementing a noncon-cave regularization method in the robust estimation procedure to select important covariates from the linear component. We establish the consistency for both the linear and the nonlinear components when the covariate dimension diverges at the rate of [Formula: see text], where n is the sample size. We show that the robust estimate of linear component performs asymptotically as well as its oracle counterpart which assumes the baseline function and the unimportant covariates were known a priori. With a consistent estimator of the linear component, we estimate the nonparametric component by a robust local linear regression. It is proved that the robust estimate of nonlinear component performs asymptotically as well as if the linear component were known in advance. Comprehensive simulation studies are carried out and an application is presented to examine the finite-sample performance of the proposed procedures.
Paul G. Schaberg; Brynne E. Lazarus; Gary J. Hawley; Joshua M. Halman; Catherine H. Borer; Christopher F. Hansen
2011-01-01
Despite considerable study, it remains uncertain what environmental factors contribute to red spruce (Picea rubens Sarg.) foliar winter injury and how much this injury influences tree C stores. We used a long-term record of winter injury in a plantation in New Hampshire and conducted stepwise linear regression analyses with local weather and regional...
Outsourcing primary health care services--how politicians explain the grounds for their decisions.
Laamanen, Ritva; Simonsen-Rehn, Nina; Suominen, Sakari; Øvretveit, John; Brommels, Mats
2008-12-01
To explore outsourcing of primary health care (PHC) services in four municipalities in Finland with varying amounts and types of outsourcing: a Southern municipality (SM) which contracted all PHC services to a not-for-profit voluntary organization, and Eastern (EM), South-Western (SWM) and Western (WM) municipalities which had contracted out only a few services to profit or public organizations. A mail survey to all municipality politicians (response rate 52%, N=101) in 2004. Data were analyzed using cross-tabulations, Spearman correlation and linear regression analyses. Politicians were willing to outsource PHC services only partially, and many problems relating to outsourcing were reported. Politicians in all municipalities were least likely to outsource preventive services. A multiple linear regression model showed that reported preference to outsource in EM and in SWM was lower than in SM, and also lower among politicians from "leftist" political parties than "rightist" political parties. Perceived difficulties in local health policy issues were related to reduced preference to outsource. The model explained 27% of the variance of the inclination to outsource PHC services. The findings highlight how important it is to take into account local health policy issues when assessing service-provision models.
NASA Astrophysics Data System (ADS)
Astuti, H. N.; Saputro, D. R. S.; Susanti, Y.
2017-06-01
MGWR model is combination of linear regression model and geographically weighted regression (GWR) model, therefore, MGWR model could produce parameter estimation that had global parameter estimation, and other parameter that had local parameter in accordance with its observation location. The linkage between locations of the observations expressed in specific weighting that is adaptive bi-square. In this research, we applied MGWR model with weighted adaptive bi-square for case of DHF in Surakarta based on 10 factors (variables) that is supposed to influence the number of people with DHF. The observation unit in the research is 51 urban villages and the variables are number of inhabitants, number of houses, house index, many public places, number of healthy homes, number of Posyandu, area width, level population density, welfare of the family, and high-region. Based on this research, we obtained 51 MGWR models. The MGWR model were divided into 4 groups with significant variable is house index as a global variable, an area width as a local variable and the remaining variables vary in each. Global variables are variables that significantly affect all locations, while local variables are variables that significantly affect a specific location.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-12-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models.
Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing
2014-06-16
Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas.
Chen, Wen-Yuan; Wang, Mei; Fu, Zhou-Xing
2014-01-01
Most railway accidents happen at railway crossings. Therefore, how to detect humans or objects present in the risk area of a railway crossing and thus prevent accidents are important tasks. In this paper, three strategies are used to detect the risk area of a railway crossing: (1) we use a terrain drop compensation (TDC) technique to solve the problem of the concavity of railway crossings; (2) we use a linear regression technique to predict the position and length of an object from image processing; (3) we have developed a novel strategy called calculating local maximum Y-coordinate object points (CLMYOP) to obtain the ground points of the object. In addition, image preprocessing is also applied to filter out the noise and successfully improve the object detection. From the experimental results, it is demonstrated that our scheme is an effective and corrective method for the detection of railway crossing risk areas. PMID:24936948
2010-01-01
Background There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Methods Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Results Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p < 0.001). Local meteorology did not significantly affect associations with principal GIS variables, and the regression model structure was robust to various model-building approaches. Conclusion Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except the smallest roads that predominate in residential areas, as well as proximity to the airport terminal. PMID:21083910
Adamkiewicz, Gary; Hsu, Hsiao-Hsien; Vallarino, Jose; Melly, Steven J; Spengler, John D; Levy, Jonathan I
2010-11-17
There is growing concern in communities surrounding airports regarding the contribution of various emission sources (such as aircraft and ground support equipment) to nearby ambient concentrations. We used extensive monitoring of nitrogen dioxide (NO2) in neighborhoods surrounding T.F. Green Airport in Warwick, RI, and land-use regression (LUR) modeling techniques to determine the impact of proximity to the airport and local traffic on these concentrations. Palmes diffusion tube samplers were deployed along the airport's fence line and within surrounding neighborhoods for one to two weeks. In total, 644 measurements were collected over three sampling campaigns (October 2007, March 2008 and June 2008) and each sampling location was geocoded. GIS-based variables were created as proxies for local traffic and airport activity. A forward stepwise regression methodology was employed to create general linear models (GLMs) of NO2 variability near the airport. The effect of local meteorology on associations with GIS-based variables was also explored. Higher concentrations of NO2 were seen near the airport terminal, entrance roads to the terminal, and near major roads, with qualitatively consistent spatial patterns between seasons. In our final multivariate model (R2 = 0.32), the local influences of highways and arterial/collector roads were statistically significant, as were local traffic density and distance to the airport terminal (all p < 0.001). Local meteorology did not significantly affect associations with principal GIS variables, and the regression model structure was robust to various model-building approaches. Our study has shown that there are clear local variations in NO2 in the neighborhoods that surround an urban airport, which are spatially consistent across seasons. LUR modeling demonstrated a strong influence of local traffic, except the smallest roads that predominate in residential areas, as well as proximity to the airport terminal.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Callister, Stephen J.; Barry, Richard C.; Adkins, Joshua N.
2006-02-01
Central tendency, linear regression, locally weighted regression, and quantile techniques were investigated for normalization of peptide abundance measurements obtained from high-throughput liquid chromatography-Fourier transform ion cyclotron resonance mass spectrometry (LC-FTICR MS). Arbitrary abundances of peptides were obtained from three sample sets, including a standard protein sample, two Deinococcus radiodurans samples taken from different growth phases, and two mouse striatum samples from control and methamphetamine-stressed mice (strain C57BL/6). The selected normalization techniques were evaluated in both the absence and presence of biological variability by estimating extraneous variability prior to and following normalization. Prior to normalization, replicate runs from each sample setmore » were observed to be statistically different, while following normalization replicate runs were no longer statistically different. Although all techniques reduced systematic bias, assigned ranks among the techniques revealed significant trends. For most LC-FTICR MS analyses, linear regression normalization ranked either first or second among the four techniques, suggesting that this technique was more generally suitable for reducing systematic biases.« less
Heat and mass transfer rates during flow of dissociated hydrogen gas over graphite surface
NASA Technical Reports Server (NTRS)
Nema, V. K.; Sharma, O. P.
1986-01-01
To improve upon the performance of chemical rockets, the nuclear reactor has been applied to a rocket propulsion system using hydrogen gas as working fluid and a graphite-composite forming a part of the structure. Under the boundary layer approximation, theoretical predictions of skin friction coefficient, surface heat transfer rate and surface regression rate have been made for laminar/turbulent dissociated hydrogen gas flowing over a flat graphite surface. The external stream is assumed to be frozen. The analysis is restricted to Mach numbers low enough to deal with the situation of only surface-reaction between hydrogen and graphite. Empirical correlations of displacement thickness, local skin friction coefficient, local Nusselt number and local non-dimensional heat transfer rate have been obtained. The magnitude of the surface regression rate is found low enough to ensure the use of graphite as a linear or a component of the system over an extended period without loss of performance.
Marrero-Ponce, Yovani; Medina-Marrero, Ricardo; Castillo-Garit, Juan A; Romero-Zaldivar, Vicente; Torrens, Francisco; Castro, Eduardo A
2005-04-15
A novel approach to bio-macromolecular design from a linear algebra point of view is introduced. A protein's total (whole protein) and local (one or more amino acid) linear indices are a new set of bio-macromolecular descriptors of relevance to protein QSAR/QSPR studies. These amino-acid level biochemical descriptors are based on the calculation of linear maps on Rn[f k(xmi):Rn-->Rn] in canonical basis. These bio-macromolecular indices are calculated from the kth power of the macromolecular pseudograph alpha-carbon atom adjacency matrix. Total linear indices are linear functional on Rn. That is, the kth total linear indices are linear maps from Rn to the scalar R[f k(xm):Rn-->R]. Thus, the kth total linear indices are calculated by summing the amino-acid linear indices of all amino acids in the protein molecule. A study of the protein stability effects for a complete set of alanine substitutions in the Arc repressor illustrates this approach. A quantitative model that discriminates near wild-type stability alanine mutants from the reduced-stability ones in a training series was obtained. This model permitted the correct classification of 97.56% (40/41) and 91.67% (11/12) of proteins in the training and test set, respectively. It shows a high Matthews correlation coefficient (MCC=0.952) for the training set and an MCC=0.837 for the external prediction set. Additionally, canonical regression analysis corroborated the statistical quality of the classification model (Rcanc=0.824). This analysis was also used to compute biological stability canonical scores for each Arc alanine mutant. On the other hand, the linear piecewise regression model compared favorably with respect to the linear regression one on predicting the melting temperature (tm) of the Arc alanine mutants. The linear model explains almost 81% of the variance of the experimental tm (R=0.90 and s=4.29) and the LOO press statistics evidenced its predictive ability (q2=0.72 and scv=4.79). Moreover, the TOMOCOMD-CAMPS method produced a linear piecewise regression (R=0.97) between protein backbone descriptors and tm values for alanine mutants of the Arc repressor. A break-point value of 51.87 degrees C characterized two mutant clusters and coincided perfectly with the experimental scale. For this reason, we can use the linear discriminant analysis and piecewise models in combination to classify and predict the stability of the mutant Arc homodimers. These models also permitted the interpretation of the driving forces of such folding process, indicating that topologic/topographic protein backbone interactions control the stability profile of wild-type Arc and its alanine mutants.
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning
Fu, QiMing
2016-01-01
To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ 2-regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency. PMID:27795704
Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning.
Zhong, Shan; Liu, Quan; Fu, QiMing
2016-01-01
To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ 2 -regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the policy, are approximated by local linear regression (LLR) and linear function approximation (LFA), respectively. Both the local model and the global model are applied to generate samples for planning; the former is used only if the state-prediction error does not surpass the threshold at each time step, while the latter is utilized at the end of each episode. The purpose of taking both models is to improve the sample efficiency and accelerate the convergence rate of the whole algorithm through fully utilizing the local and global information. Experimentally, AC-HMLP and RAC-HMLP are compared with three representative algorithms on two Reinforcement Learning (RL) benchmark problems. The results demonstrate that they perform best in terms of convergence rate and sample efficiency.
NASA Astrophysics Data System (ADS)
Zhao, Wei; Fan, Shaojia; Guo, Hai; Gao, Bo; Sun, Jiaren; Chen, Laiguo
2016-11-01
The quantile regression (QR) method has been increasingly introduced to atmospheric environmental studies to explore the non-linear relationship between local meteorological conditions and ozone mixing ratios. In this study, we applied QR for the first time, together with multiple linear regression (MLR), to analyze the dominant meteorological parameters influencing the mean, 10th percentile, 90th percentile and 99th percentile of maximum daily 8-h average (MDA8) ozone concentrations in 2000-2015 in Hong Kong. The dominance analysis (DA) was used to assess the relative importance of meteorological variables in the regression models. Results showed that the MLR models worked better at suburban and rural sites than at urban sites, and worked better in winter than in summer. QR models performed better in summer for 99th and 90th percentiles and performed better in autumn and winter for 10th percentile. And QR models also performed better in suburban and rural areas for 10th percentile. The top 3 dominant variables associated with MDA8 ozone concentrations, changing with seasons and regions, were frequently associated with the six meteorological parameters: boundary layer height, humidity, wind direction, surface solar radiation, total cloud cover and sea level pressure. Temperature rarely became a significant variable in any season, which could partly explain the peak of monthly average ozone concentrations in October in Hong Kong. And we found the effect of solar radiation would be enhanced during extremely ozone pollution episodes (i.e., the 99th percentile). Finally, meteorological effects on MDA8 ozone had no significant changes before and after the 2010 Asian Games.
Kumar, K Vasanth
2007-04-02
Kinetic experiments were carried out for the sorption of safranin onto activated carbon particles. The kinetic data were fitted to pseudo-second order model of Ho, Sobkowsk and Czerwinski, Blanchard et al. and Ritchie by linear and non-linear regression methods. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo-second order models were the same. Non-linear regression analysis showed that both Blanchard et al. and Ho have similar ideas on the pseudo-second order model but with different assumptions. The best fit of experimental data in Ho's pseudo-second order expression by linear and non-linear regression method showed that Ho pseudo-second order model was a better kinetic expression when compared to other pseudo-second order kinetic expressions.
Geographical variation of cerebrovascular disease in New York State: the correlation with income
Han, Daikwon; Carrow, Shannon S; Rogerson, Peter A; Munschauer, Frederick E
2005-01-01
Background Income is known to be associated with cerebrovascular disease; however, little is known about the more detailed relationship between cerebrovascular disease and income. We examined the hypothesis that the geographical distribution of cerebrovascular disease in New York State may be predicted by a nonlinear model using income as a surrogate socioeconomic risk factor. Results We used spatial clustering methods to identify areas with high and low prevalence of cerebrovascular disease at the ZIP code level after smoothing rates and correcting for edge effects; geographic locations of high and low clusters of cerebrovascular disease in New York State were identified with and without income adjustment. To examine effects of income, we calculated the excess number of cases using a non-linear regression with cerebrovascular disease rates taken as the dependent variable and income and income squared taken as independent variables. The resulting regression equation was: excess rate = 32.075 - 1.22*10-4(income) + 8.068*10-10(income2), and both income and income squared variables were significant at the 0.01 level. When income was included as a covariate in the non-linear regression, the number and size of clusters of high cerebrovascular disease prevalence decreased. Some 87 ZIP codes exceeded the critical value of the local statistic yielding a relative risk of 1.2. The majority of low cerebrovascular disease prevalence geographic clusters disappeared when the non-linear income effect was included. For linear regression, the excess rate of cerebrovascular disease falls with income; each $10,000 increase in median income of each ZIP code resulted in an average reduction of 3.83 observed cases. The significant nonlinear effect indicates a lessening of this income effect with increasing income. Conclusion Income is a non-linear predictor of excess cerebrovascular disease rates, with both low and high observed cerebrovascular disease rate areas associated with higher income. Income alone explains a significant amount of the geographical variance in cerebrovascular disease across New York State since both high and low clusters of cerebrovascular disease dissipate or disappear with income adjustment. Geographical modeling, including non-linear effects of income, may allow for better identification of other non-traditional risk factors. PMID:16242043
NASA Astrophysics Data System (ADS)
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2018-01-01
Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation for instance and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the value of the corresponding regression coefficients can vary from one prediction day to another. The model allows thus for a day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model which is a combination of multiple linear regression model and fuzzy c-means method. This research involved a relationship between 20 variates of the top soil that are analyzed prior to planting of paddy yields at standard fertilizer rates. Data used were from the multi-location trials for rice carried out by MARDI at major paddy granary in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using multiple linear regression model and a combination of multiple linear regression model and fuzzy c-means method. Analysis of normality and multicollinearity indicate that the data is normally scattered without multicollinearity among independent variables. Analysis of fuzzy c-means cluster the yield of paddy into two clusters before the multiple linear regression model can be used. The comparison between two method indicate that the hybrid of multiple linear regression model and fuzzy c-means method outperform the multiple linear regression model with lower value of mean square error.
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
Steinmann, Zoran J N; Venkatesh, Aranya; Hauck, Mara; Schipper, Aafke M; Karuppiah, Ramkumar; Laurenzi, Ian J; Huijbregts, Mark A J
2014-05-06
One of the major challenges in life cycle assessment (LCA) is the availability and quality of data used to develop models and to make appropriate recommendations. Approximations and assumptions are often made if appropriate data are not readily available. However, these proxies may introduce uncertainty into the results. A regression model framework may be employed to assess missing data in LCAs of products and processes. In this study, we develop such a regression-based framework to estimate CO2 emission factors associated with coal power plants in the absence of reported data. Our framework hypothesizes that emissions from coal power plants can be explained by plant-specific factors (predictors) that include steam pressure, total capacity, plant age, fuel type, and gross domestic product (GDP) per capita of the resident nations of those plants. Using reported emission data for 444 plants worldwide, plant level CO2 emission factors were fitted to the selected predictors by a multiple linear regression model and a local linear regression model. The validated models were then applied to 764 coal power plants worldwide, for which no reported data were available. Cumulatively, available reported data and our predictions together account for 74% of the total world's coal-fired power generation capacity.
Extending local canonical correlation analysis to handle general linear contrasts for FMRI data.
Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar
2012-01-01
Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic.
Extending Local Canonical Correlation Analysis to Handle General Linear Contrasts for fMRI Data
Jin, Mingwu; Nandy, Rajesh; Curran, Tim; Cordes, Dietmar
2012-01-01
Local canonical correlation analysis (CCA) is a multivariate method that has been proposed to more accurately determine activation patterns in fMRI data. In its conventional formulation, CCA has several drawbacks that limit its usefulness in fMRI. A major drawback is that, unlike the general linear model (GLM), a test of general linear contrasts of the temporal regressors has not been incorporated into the CCA formalism. To overcome this drawback, a novel directional test statistic was derived using the equivalence of multivariate multiple regression (MVMR) and CCA. This extension will allow CCA to be used for inference of general linear contrasts in more complicated fMRI designs without reparameterization of the design matrix and without reestimating the CCA solutions for each particular contrast of interest. With the proper constraints on the spatial coefficients of CCA, this test statistic can yield a more powerful test on the inference of evoked brain regional activations from noisy fMRI data than the conventional t-test in the GLM. The quantitative results from simulated and pseudoreal data and activation maps from fMRI data were used to demonstrate the advantage of this novel test statistic. PMID:22461786
Linear regression crash prediction models : issues and proposed solutions.
DOT National Transportation Integrated Search
2010-05-01
The paper develops a linear regression model approach that can be applied to : crash data to predict vehicle crashes. The proposed approach involves novice data aggregation : to satisfy linear regression assumptions; namely error structure normality ...
Comparison between Linear and Nonlinear Regression in a Laboratory Heat Transfer Experiment
ERIC Educational Resources Information Center
Gonçalves, Carine Messias; Schwaab, Marcio; Pinto, José Carlos
2013-01-01
In order to interpret laboratory experimental data, undergraduate students are used to perform linear regression through linearized versions of nonlinear models. However, the use of linearized models can lead to statistically biased parameter estimates. Even so, it is not an easy task to introduce nonlinear regression and show for the students…
Ding, Aidong Adam; Hsieh, Jin-Jian; Wang, Weijing
2015-01-01
Bivariate survival analysis has wide applications. In the presence of covariates, most literature focuses on studying their effects on the marginal distributions. However covariates can also affect the association between the two variables. In this article we consider the latter issue by proposing a nonstandard local linear estimator for the concordance probability as a function of covariates. Under the Clayton copula, the conditional concordance probability has a simple one-to-one correspondence with the copula parameter for different data structures including those subject to independent or dependent censoring and dependent truncation. The proposed method can be used to study how covariates affect the Clayton association parameter without specifying marginal regression models. Asymptotic properties of the proposed estimators are derived and their finite-sample performances are examined via simulations. Finally, for illustration, we apply the proposed method to analyze a bone marrow transplant data set.
The Application of the Cumulative Logistic Regression Model to Automated Essay Scoring
ERIC Educational Resources Information Center
Haberman, Shelby J.; Sinharay, Sandip
2010-01-01
Most automated essay scoring programs use a linear regression model to predict an essay score from several essay features. This article applied a cumulative logit model instead of the linear regression model to automated essay scoring. Comparison of the performances of the linear regression model and the cumulative logit model was performed on a…
NASA Astrophysics Data System (ADS)
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
Gao, Xiangyun; An, Haizhong; Fang, Wei; Huang, Xuan; Li, Huajiao; Zhong, Weiqiong; Ding, Yinghui
2014-07-01
The linear regression parameters between two time series can be different under different lengths of observation period. If we study the whole period by the sliding window of a short period, the change of the linear regression parameters is a process of dynamic transmission over time. We tackle fundamental research that presents a simple and efficient computational scheme: a linear regression patterns transmission algorithm, which transforms linear regression patterns into directed and weighted networks. The linear regression patterns (nodes) are defined by the combination of intervals of the linear regression parameters and the results of the significance testing under different sizes of the sliding window. The transmissions between adjacent patterns are defined as edges, and the weights of the edges are the frequency of the transmissions. The major patterns, the distance, and the medium in the process of the transmission can be captured. The statistical results of weighted out-degree and betweenness centrality are mapped on timelines, which shows the features of the distribution of the results. Many measurements in different areas that involve two related time series variables could take advantage of this algorithm to characterize the dynamic relationships between the time series from a new perspective.
Korany, Mohamed A; Gazy, Azza A; Khamis, Essam F; Ragab, Marwa A A; Kamal, Miranda F
2018-06-01
This study outlines two robust regression approaches, namely least median of squares (LMS) and iteratively re-weighted least squares (IRLS) to investigate their application in instrument analysis of nutraceuticals (that is, fluorescence quenching of merbromin reagent upon lipoic acid addition). These robust regression methods were used to calculate calibration data from the fluorescence quenching reaction (∆F and F-ratio) under ideal or non-ideal linearity conditions. For each condition, data were treated using three regression fittings: Ordinary Least Squares (OLS), LMS and IRLS. Assessment of linearity, limits of detection (LOD) and quantitation (LOQ), accuracy and precision were carefully studied for each condition. LMS and IRLS regression line fittings showed significant improvement in correlation coefficients and all regression parameters for both methods and both conditions. In the ideal linearity condition, the intercept and slope changed insignificantly, but a dramatic change was observed for the non-ideal condition and linearity intercept. Under both linearity conditions, LOD and LOQ values after the robust regression line fitting of data were lower than those obtained before data treatment. The results obtained after statistical treatment indicated that the linearity ranges for drug determination could be expanded to lower limits of quantitation by enhancing the regression equation parameters after data treatment. Analysis results for lipoic acid in capsules, using both fluorimetric methods, treated by parametric OLS and after treatment by robust LMS and IRLS were compared for both linearity conditions. Copyright © 2018 John Wiley & Sons, Ltd.
Structured penalties for functional linear models-partially empirical eigenvectors for regression.
Randolph, Timothy W; Harezlak, Jaroslaw; Feng, Ziding
2012-01-01
One of the challenges with functional data is incorporating geometric structure, or local correlation, into the analysis. This structure is inherent in the output from an increasing number of biomedical technologies, and a functional linear model is often used to estimate the relationship between the predictor functions and scalar responses. Common approaches to the problem of estimating a coefficient function typically involve two stages: regularization and estimation. Regularization is usually done via dimension reduction, projecting onto a predefined span of basis functions or a reduced set of eigenvectors (principal components). In contrast, we present a unified approach that directly incorporates geometric structure into the estimation process by exploiting the joint eigenproperties of the predictors and a linear penalty operator. In this sense, the components in the regression are 'partially empirical' and the framework is provided by the generalized singular value decomposition (GSVD). The form of the penalized estimation is not new, but the GSVD clarifies the process and informs the choice of penalty by making explicit the joint influence of the penalty and predictors on the bias, variance and performance of the estimated coefficient function. Laboratory spectroscopy data and simulations are used to illustrate the concepts.
Liu, Weijian; Wang, Yilong; Chen, Yuanchen; Tao, Shu; Liu, Wenxin
2017-07-01
The total concentrations and component profiles of polycyclic aromatic hydrocarbons (PAHs) in ambient air, surface soil and wheat grain collected from wheat fields near a large steel-smelting manufacturer in Northern China were determined. Based on the specific isomeric ratios of paired species in ambient air, principle component analysis and multivariate linear regression, the main emission source of local PAHs was identified as a mixture of industrial and domestic coal combustion, biomass burning and traffic exhaust. The total organic carbon (TOC) fraction was considerably correlated with the total and individual PAH concentrations in surface soil. The total concentrations of PAHs in wheat grain were relatively low, with dominant low molecular weight constituents, and the compositional profile was more similar to that in ambient air than in topsoil. Combined with more significant results from partial correlation and linear regression models, the contribution from air PAHs to grain PAHs may be greater than that from soil PAHs. Copyright © 2016. Published by Elsevier B.V.
Godin, Bruno; Mayer, Frédéric; Agneessens, Richard; Gerin, Patrick; Dardenne, Pierre; Delfosse, Philippe; Delcarte, Jérôme
2015-01-01
The reliability of different models to predict the biochemical methane potential (BMP) of various plant biomasses using a multispecies dataset was compared. The most reliable prediction models of the BMP were those based on the near infrared (NIR) spectrum compared to those based on the chemical composition. The NIR predictions of local (specific regression and non-linear) models were able to estimate quantitatively, rapidly, cheaply and easily the BMP. Such a model could be further used for biomethanation plant management and optimization. The predictions of non-linear models were more reliable compared to those of linear models. The presentation form (green-dried, silage-dried and silage-wet form) of biomasses to the NIR spectrometer did not influence the performances of the NIR prediction models. The accuracy of the BMP method should be improved to enhance further the BMP prediction models. Copyright © 2014 Elsevier Ltd. All rights reserved.
Hyper-Spectral Image Analysis With Partially Latent Regression and Spatial Markov Dependencies
NASA Astrophysics Data System (ADS)
Deleforge, Antoine; Forbes, Florence; Ba, Sileye; Horaud, Radu
2015-09-01
Hyper-spectral data can be analyzed to recover physical properties at large planetary scales. This involves resolving inverse problems which can be addressed within machine learning, with the advantage that, once a relationship between physical parameters and spectra has been established in a data-driven fashion, the learned relationship can be used to estimate physical parameters for new hyper-spectral observations. Within this framework, we propose a spatially-constrained and partially-latent regression method which maps high-dimensional inputs (hyper-spectral images) onto low-dimensional responses (physical parameters such as the local chemical composition of the soil). The proposed regression model comprises two key features. Firstly, it combines a Gaussian mixture of locally-linear mappings (GLLiM) with a partially-latent response model. While the former makes high-dimensional regression tractable, the latter enables to deal with physical parameters that cannot be observed or, more generally, with data contaminated by experimental artifacts that cannot be explained with noise models. Secondly, spatial constraints are introduced in the model through a Markov random field (MRF) prior which provides a spatial structure to the Gaussian-mixture hidden variables. Experiments conducted on a database composed of remotely sensed observations collected from the Mars planet by the Mars Express orbiter demonstrate the effectiveness of the proposed model.
1974-01-01
REGRESSION MODEL - THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January 1974 Nelson Delfino d’Avila Mascarenha;? Image...Report 520 DIGITAL IMAGE RESTORATION UNDER A REGRESSION MODEL THE UNCONSTRAINED, LINEAR EQUALITY AND INEQUALITY CONSTRAINED APPROACHES January...a two- dimensional form adequately describes the linear model . A dis- cretization is performed by using quadrature methods. By trans
Source localization in an ocean waveguide using supervised machine learning.
Niu, Haiqiang; Reeves, Emma; Gerstoft, Peter
2017-09-01
Source localization in ocean acoustics is posed as a machine learning problem in which data-driven methods learn source ranges directly from observed acoustic data. The pressure received by a vertical linear array is preprocessed by constructing a normalized sample covariance matrix and used as the input for three machine learning methods: feed-forward neural networks (FNN), support vector machines (SVM), and random forests (RF). The range estimation problem is solved both as a classification problem and as a regression problem by these three machine learning algorithms. The results of range estimation for the Noise09 experiment are compared for FNN, SVM, RF, and conventional matched-field processing and demonstrate the potential of machine learning for underwater source localization.
Element enrichment factor calculation using grain-size distribution and functional data regression.
Sierra, C; Ordóñez, C; Saavedra, A; Gallego, J R
2015-01-01
In environmental geochemistry studies it is common practice to normalize element concentrations in order to remove the effect of grain size. Linear regression with respect to a particular grain size or conservative element is a widely used method of normalization. In this paper, the utility of functional linear regression, in which the grain-size curve is the independent variable and the concentration of pollutant the dependent variable, is analyzed and applied to detrital sediment. After implementing functional linear regression and classical linear regression models to normalize and calculate enrichment factors, we concluded that the former regression technique has some advantages over the latter. First, functional linear regression directly considers the grain-size distribution of the samples as the explanatory variable. Second, as the regression coefficients are not constant values but functions depending on the grain size, it is easier to comprehend the relationship between grain size and pollutant concentration. Third, regularization can be introduced into the model in order to establish equilibrium between reliability of the data and smoothness of the solutions. Copyright © 2014 Elsevier Ltd. All rights reserved.
Who Will Win?: Predicting the Presidential Election Using Linear Regression
ERIC Educational Resources Information Center
Lamb, John H.
2007-01-01
This article outlines a linear regression activity that engages learners, uses technology, and fosters cooperation. Students generated least-squares linear regression equations using TI-83 Plus[TM] graphing calculators, Microsoft[C] Excel, and paper-and-pencil calculations using derived normal equations to predict the 2004 presidential election.…
Marginal regression analysis of recurrent events with coarsened censoring times.
Hu, X Joan; Rosychuk, Rhonda J
2016-12-01
Motivated by an ongoing pediatric mental health care (PMHC) study, this article presents weakly structured methods for analyzing doubly censored recurrent event data where only coarsened information on censoring is available. The study extracted administrative records of emergency department visits from provincial health administrative databases. The available information of each individual subject is limited to a subject-specific time window determined up to concealed data. To evaluate time-dependent effect of exposures, we adapt the local linear estimation with right censored survival times under the Cox regression model with time-varying coefficients (cf. Cai and Sun, Scandinavian Journal of Statistics 2003, 30, 93-111). We establish the pointwise consistency and asymptotic normality of the regression parameter estimator, and examine its performance by simulation. The PMHC study illustrates the proposed approach throughout the article. © 2016, The International Biometric Society.
The microcomputer scientific software series 2: general linear model--regression.
Harold M. Rauscher
1983-01-01
The general linear model regression (GLMR) program provides the microcomputer user with a sophisticated regression analysis capability. The output provides a regression ANOVA table, estimators of the regression model coefficients, their confidence intervals, confidence intervals around the predicted Y-values, residuals for plotting, a check for multicollinearity, a...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Viani, Gustavo Arruda; Stefano, Eduardo Jose; Afonso, Sergio Luis
2009-08-01
Purpose: To determine in a meta-analysis whether the outcomes in men with localized prostate cancer treated with high-dose radiotherapy (HDRT) are better than those in men treated with conventional-dose radiotherapy (CDRT), by quantifying the effect of the total dose of radiotherapy on biochemical control (BC). Methods and Materials: The MEDLINE, EMBASE, CANCERLIT, and Cochrane Library databases, as well as the proceedings of annual meetings, were systematically searched to identify randomized, controlled studies comparing HDRT with CDRT for localized prostate cancer. To evaluate the dose-response relationship, we conducted a meta-regression analysis of BC ratios by means of weighted linear regression. Results:more » Seven RCTs with a total patient population of 2812 were identified that met the study criteria. Pooled results from these RCTs showed a significant reduction in the incidence of biochemical failure in those patients with prostate cancer treated with HDRT (p < 0.0001). However, there was no difference in the mortality rate (p = 0.38) and specific prostate cancer mortality rates (p = 0.45) between the groups receiving HDRT and CDRT. However, there were more cases of late Grade >2 gastrointestinal toxicity after HDRT than after CDRT. In the subgroup analysis, patients classified as being at low (p = 0.007), intermediate (p < 0.0001), and high risk (p < 0.0001) of biochemical failure all showed a benefit from HDRT. The meta-regression analysis also detected a linear correlation between the total dose of radiotherapy and biochemical failure (BC = -67.3 + [1.8 x radiotherapy total dose in Gy]; p = 0.04). Conclusions: Our meta-analysis showed that HDRT is superior to CDRT in preventing biochemical failure in low-, intermediate-, and high-risk prostate cancer patients, suggesting that this should be offered as a treatment for all patients, regardless of their risk status.« less
Geodesic regression for image time-series.
Niethammer, Marc; Huang, Yang; Vialard, François-Xavier
2011-01-01
Registration of image-time series has so far been accomplished (i) by concatenating registrations between image pairs, (ii) by solving a joint estimation problem resulting in piecewise geodesic paths between image pairs, (iii) by kernel based local averaging or (iv) by augmenting the joint estimation with additional temporal irregularity penalties. Here, we propose a generative model extending least squares linear regression to the space of images by using a second-order dynamic formulation for image registration. Unlike previous approaches, the formulation allows for a compact representation of an approximation to the full spatio-temporal trajectory through its initial values. The method also opens up possibilities to design image-based approximation algorithms. The resulting optimization problem is solved using an adjoint method.
The allometry of coarse root biomass: log-transformed linear regression or nonlinear regression?
Lai, Jiangshan; Yang, Bo; Lin, Dunmei; Kerkhoff, Andrew J; Ma, Keping
2013-01-01
Precise estimation of root biomass is important for understanding carbon stocks and dynamics in forests. Traditionally, biomass estimates are based on allometric scaling relationships between stem diameter and coarse root biomass calculated using linear regression (LR) on log-transformed data. Recently, it has been suggested that nonlinear regression (NLR) is a preferable fitting method for scaling relationships. But while this claim has been contested on both theoretical and empirical grounds, and statistical methods have been developed to aid in choosing between the two methods in particular cases, few studies have examined the ramifications of erroneously applying NLR. Here, we use direct measurements of 159 trees belonging to three locally dominant species in east China to compare the LR and NLR models of diameter-root biomass allometry. We then contrast model predictions by estimating stand coarse root biomass based on census data from the nearby 24-ha Gutianshan forest plot and by testing the ability of the models to predict known root biomass values measured on multiple tropical species at the Pasoh Forest Reserve in Malaysia. Based on likelihood estimates for model error distributions, as well as the accuracy of extrapolative predictions, we find that LR on log-transformed data is superior to NLR for fitting diameter-root biomass scaling models. More importantly, inappropriately using NLR leads to grossly inaccurate stand biomass estimates, especially for stands dominated by smaller trees.
The Relationship Between Surface Curvature and Abdominal Aortic Aneurysm Wall Stress.
de Galarreta, Sergio Ruiz; Cazón, Aitor; Antón, Raúl; Finol, Ender A
2017-08-01
The maximum diameter (MD) criterion is the most important factor when predicting risk of rupture of abdominal aortic aneurysms (AAAs). An elevated wall stress has also been linked to a high risk of aneurysm rupture, yet is an uncommon clinical practice to compute AAA wall stress. The purpose of this study is to assess whether other characteristics of the AAA geometry are statistically correlated with wall stress. Using in-house segmentation and meshing algorithms, 30 patient-specific AAA models were generated for finite element analysis (FEA). These models were subsequently used to estimate wall stress and maximum diameter and to evaluate the spatial distributions of wall thickness, cross-sectional diameter, mean curvature, and Gaussian curvature. Data analysis consisted of statistical correlations of the aforementioned geometry metrics with wall stress for the 30 AAA inner and outer wall surfaces. In addition, a linear regression analysis was performed with all the AAA wall surfaces to quantify the relationship of the geometric indices with wall stress. These analyses indicated that while all the geometry metrics have statistically significant correlations with wall stress, the local mean curvature (LMC) exhibits the highest average Pearson's correlation coefficient for both inner and outer wall surfaces. The linear regression analysis revealed coefficients of determination for the outer and inner wall surfaces of 0.712 and 0.516, respectively, with LMC having the largest effect on the linear regression equation with wall stress. This work underscores the importance of evaluating AAA mean wall curvature as a potential surrogate for wall stress.
Woo, John H; Wang, Sumei; Melhem, Elias R; Gee, James C; Cucchiara, Andrew; McCluskey, Leo; Elman, Lauren
2014-01-01
To assess the relationship between clinically assessed Upper Motor Neuron (UMN) disease in Amyotrophic Lateral Sclerosis (ALS) and local diffusion alterations measured in the brain corticospinal tract (CST) by a tractography-driven template-space region-of-interest (ROI) analysis of Diffusion Tensor Imaging (DTI). This cross-sectional study included 34 patients with ALS, on whom DTI was performed. Clinical measures were separately obtained including the Penn UMN Score, a summary metric based upon standard clinical methods. After normalizing all DTI data to a population-specific template, tractography was performed to determine a region-of-interest (ROI) outlining the CST, in which average Mean Diffusivity (MD) and Fractional Anisotropy (FA) were estimated. Linear regression analyses were used to investigate associations of DTI metrics (MD, FA) with clinical measures (Penn UMN Score, ALSFRS-R, duration-of-disease), along with age, sex, handedness, and El Escorial category as covariates. For MD, the regression model was significant (p = 0.02), and the only significant predictors were the Penn UMN Score (p = 0.005) and age (p = 0.03). The FA regression model was also significant (p = 0.02); the only significant predictor was the Penn UMN Score (p = 0.003). Measured by the template-space ROI method, both MD and FA were linearly associated with the Penn UMN Score, supporting the hypothesis that DTI alterations reflect UMN pathology as assessed by the clinical examination.
Itoh, Taihei; Kimura, Masaomi; Sasaki, Shingo; Owada, Shingen; Horiuchi, Daisuke; Sasaki, Kenichi; Ishida, Yuji; Takahiko, Kinjo; Okumura, Ken
2014-04-01
Low conduction velocity (CV) in the area showing low electrogram amplitude (EA) is characteristic of reentry circuit of atypical atrial flutter (AFL). The quantitative relationship between CV and EA remains unclear. We characterized AFL reentry circuit in the right atrium (RA), focusing on the relationship between local CV and bipolar EA on the circuit. We investigated 26 RA AFL (10 with typical AFL; 10 atypical incisional AFL; 6 atypical nonincisional AFL) using CARTO system. By referring to isochronal and propagation maps delineated during AFL, points activated faster on the circuit were selected (median, 7 per circuit). At the 196 selected points obtained from all patients, local CV measured between the adjacent points and bipolar EA were analyzed. There was a highly significant correlation between local CV and natural logarithm of EA (lnEA) (R(2) = 0.809, P < 0.001). Among 26 AFL, linear regression analysis of mean CV, calculated by dividing circuit length (152.3 ± 41.7 mm) by tachycardia cycle length (TCL) (median 246 msec), and mean lnEA, calculated by dividing area under curve of lnEA during one tachycardia cycle by TCL, showed y = 0.695 + 0.191x (where: y = mean CV, x = lnEA; R(2) = 0.993, P < 0.001). Local CV estimated from EA with the use of this formula showed a highly significant linear correlation with that measured by the map (R(2) = 0.809, P < 0.001). The lnEA and estimated local CV show a highly positive linear correlation. CV is possibly estimated by EA measured by CARTO mapping. © 2013 Wiley Periodicals, Inc.
Locally Linear Embedding of Local Orthogonal Least Squares Images for Face Recognition
NASA Astrophysics Data System (ADS)
Hafizhelmi Kamaru Zaman, Fadhlan
2018-03-01
Dimensionality reduction is very important in face recognition since it ensures that high-dimensionality data can be mapped to lower dimensional space without losing salient and integral facial information. Locally Linear Embedding (LLE) has been previously used to serve this purpose, however, the process of acquiring LLE features requires high computation and resources. To overcome this limitation, we propose a locally-applied Local Orthogonal Least Squares (LOLS) model can be used as initial feature extraction before the application of LLE. By construction of least squares regression under orthogonal constraints we can preserve more discriminant information in the local subspace of facial features while reducing the overall features into a more compact form that we called LOLS images. LLE can then be applied on the LOLS images to maps its representation into a global coordinate system of much lower dimensionality. Several experiments carried out using publicly available face datasets such as AR, ORL, YaleB, and FERET under Single Sample Per Person (SSPP) constraint demonstrates that our proposed method can reduce the time required to compute LLE features while delivering better accuracy when compared to when either LLE or OLS alone is used. Comparison against several other feature extraction methods and more recent feature-learning method such as state-of-the-art Convolutional Neural Networks (CNN) also reveal the superiority of the proposed method under SSPP constraint.
NASA Astrophysics Data System (ADS)
Choudhury, Anustup; Farrell, Suzanne; Atkins, Robin; Daly, Scott
2017-09-01
We present an approach to predict overall HDR display quality as a function of key HDR display parameters. We first performed subjective experiments on a high quality HDR display that explored five key HDR display parameters: maximum luminance, minimum luminance, color gamut, bit-depth and local contrast. Subjects rated overall quality for different combinations of these display parameters. We explored two models | a physical model solely based on physically measured display characteristics and a perceptual model that transforms physical parameters using human vision system models. For the perceptual model, we use a family of metrics based on a recently published color volume model (ICT-CP), which consists of the PQ luminance non-linearity (ST2084) and LMS-based opponent color, as well as an estimate of the display point spread function. To predict overall visual quality, we apply linear regression and machine learning techniques such as Multilayer Perceptron, RBF and SVM networks. We use RMSE and Pearson/Spearman correlation coefficients to quantify performance. We found that the perceptual model is better at predicting subjective quality than the physical model and that SVM is better at prediction than linear regression. The significance and contribution of each display parameter was investigated. In addition, we found that combined parameters such as contrast do not improve prediction. Traditional perceptual models were also evaluated and we found that models based on the PQ non-linearity performed better.
Wang, D Z; Wang, C; Shen, C F; Zhang, Y; Zhang, H; Song, G D; Xue, X D; Xu, Z L; Zhang, S; Jiang, G H
2017-05-10
We described the time trend of acute myocardial infarction (AMI) from 1999 to 2013 in Tianjin incidence rate with Cochran-Armitage trend (CAT) test and linear regression analysis, and the results were compared. Based on actual population, CAT test had much stronger statistical power than linear regression analysis for both overall incidence trend and age specific incidence trend (Cochran-Armitage trend P value
Yang, Ruiqi; Wang, Fei; Zhang, Jialing; Zhu, Chonglei; Fan, Limei
2015-05-19
To establish the reference values of thalamus, caudate nucleus and lenticular nucleus diameters through fetal thalamic transverse section. A total of 265 fetuses at our hospital were randomly selected from November 2012 to August 2014. And the transverse and length diameters of thalamus, caudate nucleus and lenticular nucleus were measured. SPSS 19.0 statistical software was used to calculate the regression curve of fetal diameter changes and gestational weeks of pregnancy. P < 0.05 was considered as having statistical significance. The linear regression equation of fetal thalamic length diameter and gestational week was: Y = 0.051X+0.201, R = 0.876, linear regression equation of thalamic transverse diameter and fetal gestational week was: Y = 0.031X+0.229, R = 0.817, linear regression equation of fetal head of caudate nucleus length diameter and gestational age was: Y = 0.033X+0.101, R = 0.722, linear regression equation of fetal head of caudate nucleus transverse diameter and gestational week was: R = 0.025 - 0.046, R = 0.711, linear regression equation of fetal lentiform nucleus length diameter and gestational week was: Y = 0.046+0.229, R = 0.765, linear regression equation of fetal lentiform nucleus diameter and gestational week was: Y = 0.025 - 0.05, R = 0.772. Ultrasonic measurement of diameter of fetal thalamus caudate nucleus, and lenticular nucleus through thalamic transverse section is simple and convenient. And measurements increase with fetal gestational weeks and there is linear regression relationship between them.
Orthogonal Regression: A Teaching Perspective
ERIC Educational Resources Information Center
Carr, James R.
2012-01-01
A well-known approach to linear least squares regression is that which involves minimizing the sum of squared orthogonal projections of data points onto the best fit line. This form of regression is known as orthogonal regression, and the linear model that it yields is known as the major axis. A similar method, reduced major axis regression, is…
Liu, Hongcheng; Yao, Tao; Li, Runze; Ye, Yinyu
2017-11-01
This paper concerns the folded concave penalized sparse linear regression (FCPSLR), a class of popular sparse recovery methods. Although FCPSLR yields desirable recovery performance when solved globally, computing a global solution is NP-complete. Despite some existing statistical performance analyses on local minimizers or on specific FCPSLR-based learning algorithms, it still remains open questions whether local solutions that are known to admit fully polynomial-time approximation schemes (FPTAS) may already be sufficient to ensure the statistical performance, and whether that statistical performance can be non-contingent on the specific designs of computing procedures. To address the questions, this paper presents the following threefold results: (i) Any local solution (stationary point) is a sparse estimator, under some conditions on the parameters of the folded concave penalties. (ii) Perhaps more importantly, any local solution satisfying a significant subspace second-order necessary condition (S 3 ONC), which is weaker than the second-order KKT condition, yields a bounded error in approximating the true parameter with high probability. In addition, if the minimal signal strength is sufficient, the S 3 ONC solution likely recovers the oracle solution. This result also explicates that the goal of improving the statistical performance is consistent with the optimization criteria of minimizing the suboptimality gap in solving the non-convex programming formulation of FCPSLR. (iii) We apply (ii) to the special case of FCPSLR with minimax concave penalty (MCP) and show that under the restricted eigenvalue condition, any S 3 ONC solution with a better objective value than the Lasso solution entails the strong oracle property. In addition, such a solution generates a model error (ME) comparable to the optimal but exponential-time sparse estimator given a sufficient sample size, while the worst-case ME is comparable to the Lasso in general. Furthermore, to guarantee the S 3 ONC admits FPTAS.
Practical Session: Simple Linear Regression
NASA Astrophysics Data System (ADS)
Clausel, M.; Grégoire, G.
2014-12-01
Two exercises are proposed to illustrate the simple linear regression. The first one is based on the famous Galton's data set on heredity. We use the lm R command and get coefficients estimates, standard error of the error, R2, residuals …In the second example, devoted to data related to the vapor tension of mercury, we fit a simple linear regression, predict values, and anticipate on multiple linear regression. This pratical session is an excerpt from practical exercises proposed by A. Dalalyan at EPNC (see Exercises 1 and 2 of http://certis.enpc.fr/~dalalyan/Download/TP_ENPC_4.pdf).
Chen, M; Jiang, G L; Fu, X L; Wang, L J; Qian, H; Chen, G Y; Zhao, S; Liu, T F
2000-04-01
A retrospective study was carried out to evaluate the impact of overall treatment time (OTT) on the results of radiation therapy for non-small cell lung cancer (NSCLC). From Jan. 1990 to Dec. 1996, 256 patients with stages I-IIIb NSCLC entered this analysis. All patients received definitive radiotherapy. Biologically effective dose (BED) was used to standardize the irradiation effects. The correlation between OTT and local progression-free survival was analyzed by linear-regression and Cox proportional hazard models. The prognostic variables for survival and distant metastasis were also briefly studied. OTT had been shortened in 64 patients because of an accelerated hyperfractioned irradiation, while OTT was prolonged i n 114 patients due to interruptions of irradiation courses. The main ca uses of interruption were machine breakdown or delayed preparations of c errobend block for boost fields (55%), holidays (11%) and treatment toxi city and side effects (34%). Patients tre ated with prolonged OTT (> 45 days) had significant poorer local progression-free survival than whom with OTT of =45 days, 1, 3 and 5 year actuarial local progression-free survivals being 49, 17 and 15% for the former, and 74, 35 and 25% for the latter, respectively (P<0.001). BED-T that contained the factor of OTT correlated directly to local controls, which implied that BED-T represented radiobiological effects accurately, in other words, OTT had played a role in determining the radiobiological effects. Linear-regression on 103 cases treated with BED of 80-85 Gy(10) showed that 3 year local progression-free survival decreased by 9% per week with prolongation of OTT, or vice versa it increased by 9% per week with shortening OTT in an OTT range of 30-76 days. Cox multivariate analyses confirmed that OTT was an independent prognostic factor for local controls. OTT may have played an important role in determining local controls in radiotherapy for NSCLC. One should always keep in mind to make the OTT as short as possible, provided the patients can tolerate it, and to reduce irradiation interruptions for whatever reasons to a minimum.
Morse Code, Scrabble, and the Alphabet
ERIC Educational Resources Information Center
Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss
2004-01-01
In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…
DOE Office of Scientific and Technical Information (OSTI.GOV)
2015-09-14
This package contains statistical routines for extracting features from multivariate time-series data which can then be used for subsequent multivariate statistical analysis to identify patterns and anomalous behavior. It calculates local linear or quadratic regression model fits to moving windows for each series and then summarizes the model coefficients across user-defined time intervals for each series. These methods are domain agnostic-but they have been successfully applied to a variety of domains, including commercial aviation and electric power grid data.
Adaptive Local Linear Regression with Application to Printer Color Management
2008-01-01
values formed the test samples. This process guaranteed that the CIELAB test samples were in the gamut for each printer, but each printer had a...digital images has recently led to increased consumer demand for accurate color reproduction. Given a CIELAB color one would like to reproduce, the color...management problem is to determine what RGB color one must send the printer to minimize the error between the desired CIELAB color and the CIELAB
Russell, Brook T; Wang, Dewei; McMahan, Christopher S
2017-08-01
Fine particulate matter (PM 2.5 ) poses a significant risk to human health, with long-term exposure being linked to conditions such as asthma, chronic bronchitis, lung cancer, atherosclerosis, etc. In order to improve current pollution control strategies and to better shape public policy, the development of a more comprehensive understanding of this air pollutant is necessary. To this end, this work attempts to quantify the relationship between certain meteorological drivers and the levels of PM 2.5 . It is expected that the set of important meteorological drivers will vary both spatially and within the conditional distribution of PM 2.5 levels. To account for these characteristics, a new local linear penalized quantile regression methodology is developed. The proposed estimator uniquely selects the set of important drivers at every spatial location and for each quantile of the conditional distribution of PM 2.5 levels. The performance of the proposed methodology is illustrated through simulation, and it is then used to determine the association between several meteorological drivers and PM 2.5 over the Eastern United States (US). This analysis suggests that the primary drivers throughout much of the Eastern US tend to differ based on season and geographic location, with similarities existing between "typical" and "high" PM 2.5 levels.
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
NASA Astrophysics Data System (ADS)
Kang, Pilsang; Koo, Changhoi; Roh, Hokyu
2017-11-01
Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
Spectral Regression Discriminant Analysis for Hyperspectral Image Classification
NASA Astrophysics Data System (ADS)
Pan, Y.; Wu, J.; Huang, H.; Liu, J.
2012-08-01
Dimensionality reduction algorithms, which aim to select a small set of efficient and discriminant features, have attracted great attention for Hyperspectral Image Classification. The manifold learning methods are popular for dimensionality reduction, such as Locally Linear Embedding, Isomap, and Laplacian Eigenmap. However, a disadvantage of many manifold learning methods is that their computations usually involve eigen-decomposition of dense matrices which is expensive in both time and memory. In this paper, we introduce a new dimensionality reduction method, called Spectral Regression Discriminant Analysis (SRDA). SRDA casts the problem of learning an embedding function into a regression framework, which avoids eigen-decomposition of dense matrices. Also, with the regression based framework, different kinds of regularizes can be naturally incorporated into our algorithm which makes it more flexible. It can make efficient use of data points to discover the intrinsic discriminant structure in the data. Experimental results on Washington DC Mall and AVIRIS Indian Pines hyperspectral data sets demonstrate the effectiveness of the proposed method.
Barzi, Federica; Jones, Graham R D; Hughes, Jaquelyne T; Lawton, Paul D; Hoy, Wendy; O'Dea, Kerin; Jerums, George; MacIsaac, Richard J; Cass, Alan; Maple-Brown, Louise J
2018-03-01
Being able to estimate kidney decline accurately is particularly important in Indigenous Australians, a population at increased risk of developing chronic kidney disease and end stage kidney disease. The aim of this analysis was to explore the trend of decline in estimated glomerular filtration rate (eGFR) over a four year period using multiple local creatinine measures, compared with estimates derived using centrally-measured enzymatic creatinine and with estimates derived using only two local measures. The eGFR study comprised a cohort of over 600 Aboriginal Australian participants recruited from over twenty sites in urban, regional and remote Australia across five strata of health, diabetes and kidney function. Trajectories of eGFR were explored on 385 participants with at least three local creatinine records using graphical methods that compared the linear trends fitted using linear mixed models with non-linear trends fitted using fractional polynomial equations. Temporal changes of local creatinine were also characterized using group-based modelling. Analyses were stratified by eGFR (<60; 60-89; 90-119 and ≥120ml/min/1.73m 2 ) and albuminuria categories (<3mg/mmol; 3-30mg/mmol; >30mg/mmol). Mean age of the participants was 48years, 64% were female and the median follow-up was 3years. Decline of eGFR was accurately estimated using simple linear regression models and locally measured creatinine was as good as centrally measured creatinine at predicting kidney decline in people with an eGFR<60 and an eGFR 60-90ml/min/1.73m 2 with albuminuria. Analyses showed that one baseline and one follow-up locally measured creatinine may be sufficient to estimate short term (up to four years) kidney function decline. The greatest yearly decline was estimated in those with eGFR 60-90 and macro-albuminuria: -6.21 (-8.20, -4.23) ml/min/1.73m 2 . Short term estimates of kidney function decline can be reliably derived using an easy to implement and simple to interpret linear mixed effect model. Locally measured creatinine did not differ to centrally measured creatinine, thus is an accurate cost-efficient and timely means to monitoring kidney function progression. Copyright © 2018 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
Marrero-Ponce, Yovani
2004-01-01
This report describes a new set of molecular descriptors of relevance to QSAR/QSPR studies and drug design, atom linear indices fk(xi). These atomic level chemical descriptors are based on the calculation of linear maps on Rn[fk(xi): Rn--> Rn] in canonical basis. In this context, the kth power of the molecular pseudograph's atom adjacency matrix [Mk(G)] denotes the matrix of fk(xi) with respect to the canonical basis. In addition, a local-fragment (atom-type) formalism was developed. The kth atom-type linear indices are calculated by summing the kth atom linear indices of all atoms of the same atom type in the molecules. Moreover, total (whole-molecule) linear indices are also proposed. This descriptor is a linear functional (linear form) on Rn. That is, the kth total linear indices is a linear map from Rn to the scalar R[ fk(x): Rn --> R]. Thus, the kth total linear indices are calculated by summing the atom linear indices of all atoms in the molecule. The features of the kth total and local linear indices are illustrated by examples of various types of molecular structures, including chain-lengthening, branching, heteroatoms-content, and multiple bonds. Additionally, the linear independence of the local linear indices to other 0D, 1D, 2D, and 3D molecular descriptors is demonstrated by using principal component analysis for 42 very heterogeneous molecules. Much redundancy and overlapping was found among total linear indices and most of the other structural indices presently in use in the QSPR/QSAR practice. On the contrary, the information carried by atom-type linear indices was strikingly different from that codified in most of the 229 0D-3D molecular descriptors used in this study. It is concluded that the local linear indices are an independent indices containing important structural information to be used in QSPR/QSAR and drug design studies. In this sense, atom, atom-type, and total linear indices were used for the prediction of pIC50 values for the cleavage process of a set of flavone derivatives inhibitors of HIV-1 integrase. Quantitative models found are significant from a statistical point of view (R of 0.965, 0.902, and 0.927, respectively) and permit a clear interpretation of the studied properties in terms of the structural features of molecules. A LOO cross-validation procedure revealed that the regression models had a fairly good predictability (q2 of 0.679, 0.543, and 0.721, respectively). The comparison with other approaches reveals good behavior of the method proposed. The approach described in this paper appears to be an excellent alternative or guides for discovery and optimization of new lead compounds.
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.
Ferrari, Alberto; Comelli, Mario
2016-12-01
In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. This clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods is available, but they are often technically challenging and a comparative assessment of their performances in behavioral setups has not been performed. We studied the performances of some methods applicable to the analysis of proportions; namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; plus, we describe results from the application of these methods on data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparable applicability of orthogonal projections to latent structures (OPLS) statistical model vs traditional linear regression in order to investigate the role of trans cranial doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation on the first week of admission and again six months later. All data was primarily analyzed using simple linear regression and later considered for multivariate analysis using PLS/OPLS models through the SIMCA P+12 statistical software package. The linear regression analysis results used for the identification of TCD predictors of stroke prognosis were confirmed through the OPLS modeling technique. Moreover, in comparison to linear regression, the OPLS model appeared to have higher sensitivity in detecting the predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single vessel involvement as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
Shoeibi, Nasser; Hosseini, Seyedeh Maryam; Banaee, Touka; Ansari-Astaneh, Mohammad-Reza; Abrishami, Majid; Ahmadieh, Hamid
2018-01-01
Reporting a special clinical finding after intravitreal bevacizumab monotherapy for retinopathy of prematurity. In a retrospective case series, the clinical courses of five premature infants with similar vitreous changes after a single dose of intravitreal bevacizumab (IVB) injection without additional laser therapy were reported. The mean post-conceptional age at IVB injection was 39.8 ± 2.2 (range 37-43) weeks. Localized vitreous syneresis and linear fibrotic vitreous condensation occurred 8.2 ± 2.3 weeks after IVB monotherapy in our patients (15.5% of injections). The mean last post injection visit was 61.6 ± 5.3 weeks (post-conceptional age). Further regression and complete retinal vascularization occurred in all patients. Thread-like vitreous condensation with localized vitreous liquefaction may be related to involutional ROP disease itself, combined to anti VEGF therapy and may be a predictor factor for further regression and retinal vascularization. The case series describes a successful response to anti-VEGF monotherapy with no further complications.
Quality of life in breast cancer patients--a quantile regression analysis.
Pourhoseingholi, Mohamad Amin; Safaee, Azadeh; Moghimi-Dehkordi, Bijan; Zeighami, Bahram; Faghihzadeh, Soghrat; Tabatabaee, Hamid Reza; Pourhoseingholi, Asma
2008-01-01
Quality of life study has an important role in health care especially in chronic diseases, in clinical judgment and in medical resources supplying. Statistical tools like linear regression are widely used to assess the predictors of quality of life. But when the response is not normal the results are misleading. The aim of this study is to determine the predictors of quality of life in breast cancer patients, using quantile regression model and compare to linear regression. A cross-sectional study conducted on 119 breast cancer patients that admitted and treated in chemotherapy ward of Namazi hospital in Shiraz. We used QLQ-C30 questionnaire to assessment quality of life in these patients. A quantile regression was employed to assess the assocciated factors and the results were compared to linear regression. All analysis carried out using SAS. The mean score for the global health status for breast cancer patients was 64.92+/-11.42. Linear regression showed that only grade of tumor, occupational status, menopausal status, financial difficulties and dyspnea were statistically significant. In spite of linear regression, financial difficulties were not significant in quantile regression analysis and dyspnea was only significant for first quartile. Also emotion functioning and duration of disease statistically predicted the QOL score in the third quartile. The results have demonstrated that using quantile regression leads to better interpretation and richer inference about predictors of the breast cancer patient quality of life.
Interpretation of commonly used statistical regression models.
Kasza, Jessica; Wolfe, Rory
2014-01-01
A review of some regression models commonly used in respiratory health applications is provided in this article. Simple linear regression, multiple linear regression, logistic regression and ordinal logistic regression are considered. The focus of this article is on the interpretation of the regression coefficients of each model, which are illustrated through the application of these models to a respiratory health research study. © 2013 The Authors. Respirology © 2013 Asian Pacific Society of Respirology.
Synoptic and meteorological drivers of extreme ozone concentrations over Europe
NASA Astrophysics Data System (ADS)
Otero, Noelia Felipe; Sillmann, Jana; Schnell, Jordan L.; Rust, Henning W.; Butler, Tim
2016-04-01
The present work assesses the relationship between local and synoptic meteorological conditions and surface ozone concentration over Europe in spring and summer months, during the period 1998-2012 using a new interpolated data set of observed surface ozone concentrations over the European domain. Along with local meteorological conditions, the influence of large-scale atmospheric circulation on surface ozone is addressed through a set of airflow indices computed with a novel implementation of a grid-by-grid weather type classification across Europe. Drivers of surface ozone over the full distribution of maximum daily 8-hour average values are investigated, along with drivers of the extreme high percentiles and exceedances or air quality guideline thresholds. Three different regression techniques are applied: multiple linear regression to assess the drivers of maximum daily ozone, logistic regression to assess the probability of threshold exceedances and quantile regression to estimate the meteorological influence on extreme values, as represented by the 95th percentile. The relative importance of the input parameters (predictors) is assessed by a backward stepwise regression procedure that allows the identification of the most important predictors in each model. Spatial patterns of model performance exhibit distinct variations between regions. The inclusion of the ozone persistence is particularly relevant over Southern Europe. In general, the best model performance is found over Central Europe, where the maximum temperature plays an important role as a driver of maximum daily ozone as well as its extreme values, especially during warmer months.
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
External contribution to urban air pollution.
Grima, Ramon; Micallef, Alfred; Colls, Jeremy J
2002-02-01
Elevated particulate matter concentrations in urban locations have normally been associated with local traffic emissions. Recently it has been suggested that such episodes are influenced to a high degree by PM10 sources external to urban areas. To further corroborate this hypothesis, linear regression was sought between PM10 concentrations measured at eight urban sites in the U.K., with particulate sulphate concentration measured at two rural sites, for the years 1993-1997. Analysis of the slopes, intercepts and correlation coefficients indicate a possible relationship between urban PM10 and rural sulphate concentrations. The influences of wind direction and of the distance of the urban from the rural sites on the values of the three statistical parameters are also explored. The value of linear regression as an analysis tool in such cases is discussed and it is shown that an analysis of the sign of the rate of change of the urban PM10 and rural sulphate concentrations provides a more realistic method of correlation. The results indicate a major influence on urban PM10 concentrations from the eastern side of the United Kingdom. Linear correlation was also sought using PM10 data from nine urban sites in London and nearby rural Rochester. Analysis of the magnitude of the gradients and intercepts together with episode correlation analysis between the two sites showed the effect of transported PM10 on the local London concentrations. This article also presents methods to estimate the influence of rural and urban PM10 sources on urban PM10 concentrations and to obtain a rough estimate of the transboundary contribution to urban air pollution from the PM10 concentration data of the urban site.
Simplified large African carnivore density estimators from track indices.
Winterbach, Christiaan W; Ferreira, Sam M; Funston, Paul J; Somers, Michael J
2016-01-01
The range, population size and trend of large carnivores are important parameters to assess their status globally and to plan conservation strategies. One can use linear models to assess population size and trends of large carnivores from track-based surveys on suitable substrates. The conventional approach of a linear model with intercept may not intercept at zero, but may fit the data better than linear model through the origin. We assess whether a linear regression through the origin is more appropriate than a linear regression with intercept to model large African carnivore densities and track indices. We did simple linear regression with intercept analysis and simple linear regression through the origin and used the confidence interval for ß in the linear model y = αx + ß, Standard Error of Estimate, Mean Squares Residual and Akaike Information Criteria to evaluate the models. The Lion on Clay and Low Density on Sand models with intercept were not significant ( P > 0.05). The other four models with intercept and the six models thorough origin were all significant ( P < 0.05). The models using linear regression with intercept all included zero in the confidence interval for ß and the null hypothesis that ß = 0 could not be rejected. All models showed that the linear model through the origin provided a better fit than the linear model with intercept, as indicated by the Standard Error of Estimate and Mean Square Residuals. Akaike Information Criteria showed that linear models through the origin were better and that none of the linear models with intercept had substantial support. Our results showed that linear regression through the origin is justified over the more typical linear regression with intercept for all models we tested. A general model can be used to estimate large carnivore densities from track densities across species and study areas. The formula observed track density = 3.26 × carnivore density can be used to estimate densities of large African carnivores using track counts on sandy substrates in areas where carnivore densities are 0.27 carnivores/100 km 2 or higher. To improve the current models, we need independent data to validate the models and data to test for non-linear relationship between track indices and true density at low densities.
[From clinical judgment to linear regression model.
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as linear regression model, we think that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful to predict or show the relationship between two or more variables as long as the dependent variable is quantitative and has normal distribution. Stated in another way, the regression is used to predict a measure based on the knowledge of at least one other variable. Linear regression has as it's first objective to determine the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant and it is equivalent to "Y" value when "X" equals 0 and "b" (also called slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases in one unit. In the regression line, "b" is called regression coefficient. The coefficient of determination (R 2 ) indicates the importance of independent variables in the outcome.
Hemmila, April; McGill, Jim; Ritter, David
2008-03-01
To determine if changes in fingerprint infrared spectra linear with age can be found, partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
Gimelfarb, A.; Willis, J. H.
1994-01-01
An experiment was conducted to investigate the offspring-parent regression for three quantitative traits (weight, abdominal bristles and wing length) in Drosophila melanogaster. Linear and polynomial models were fitted for the regressions of a character in offspring on both parents. It is demonstrated that responses by the characters to selection predicted by the nonlinear regressions may differ substantially from those predicted by the linear regressions. This is true even, and especially, if selection is weak. The realized heritability for a character under selection is shown to be determined not only by the offspring-parent regression but also by the distribution of the character and by the form and strength of selection. PMID:7828818
Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control.
Hahne, J M; Biessmann, F; Jiang, N; Rehbaum, H; Farina, D; Meinecke, F C; Muller, K-R; Parra, L C
2014-03-01
In recent years the number of active controllable joints in electrically powered hand-prostheses has increased significantly. However, the control strategies for these devices in current clinical use are inadequate as they require separate and sequential control of each degree-of-freedom (DoF). In this study we systematically compare linear and nonlinear regression techniques for an independent, simultaneous and proportional myoelectric control of wrist movements with two DoF. These techniques include linear regression, mixture of linear experts (ME), multilayer-perceptron, and kernel ridge regression (KRR). They are investigated offline with electro-myographic signals acquired from ten able-bodied subjects and one person with congenital upper limb deficiency. The control accuracy is reported as a function of the number of electrodes and the amount and diversity of training data providing guidance for the requirements in clinical practice. The results showed that KRR, a nonparametric statistical learning method, outperformed the other methods. However, simple transformations in the feature space could linearize the problem, so that linear models could achieve similar performance as KRR at much lower computational costs. Especially ME, a physiologically inspired extension of linear regression represents a promising candidate for the next generation of prosthetic devices.
A multimedia retrieval framework based on semi-supervised ranking and relevance feedback.
Yang, Yi; Nie, Feiping; Xu, Dong; Luo, Jiebo; Zhuang, Yueting; Pan, Yunhe
2012-04-01
We present a new framework for multimedia content analysis and retrieval which consists of two independent algorithms. First, we propose a new semi-supervised algorithm called ranking with Local Regression and Global Alignment (LRGA) to learn a robust Laplacian matrix for data ranking. In LRGA, for each data point, a local linear regression model is used to predict the ranking scores of its neighboring points. A unified objective function is then proposed to globally align the local models from all the data points so that an optimal ranking score can be assigned to each data point. Second, we propose a semi-supervised long-term Relevance Feedback (RF) algorithm to refine the multimedia data representation. The proposed long-term RF algorithm utilizes both the multimedia data distribution in multimedia feature space and the history RF information provided by users. A trace ratio optimization problem is then formulated and solved by an efficient algorithm. The algorithms have been applied to several content-based multimedia retrieval applications, including cross-media retrieval, image retrieval, and 3D motion/pose data retrieval. Comprehensive experiments on four data sets have demonstrated its advantages in precision, robustness, scalability, and computational efficiency.
Unitary Response Regression Models
ERIC Educational Resources Information Center
Lipovetsky, S.
2007-01-01
The dependent variable in a regular linear regression is a numerical variable, and in a logistic regression it is a binary or categorical variable. In these models the dependent variable has varying values. However, there are problems yielding an identity output of a constant value which can also be modelled in a linear or logistic regression with…
An Expert System for the Evaluation of Cost Models
1990-09-01
contrast to the condition of equal error variance, called homoscedasticity. (Reference: Applied Linear Regression Models by John Neter - page 423...normal. (Reference: Applied Linear Regression Models by John Neter - page 125) Click Here to continue -> Autocorrelation Click Here for the index - Index...over time. Error terms correlated over time are said to be autocorrelated or serially correlated. (REFERENCE: Applied Linear Regression Models by John
Seeking maximum linearity of transfer functions
NASA Astrophysics Data System (ADS)
Silva, Filipi N.; Comin, Cesar H.; Costa, Luciano da F.
2016-12-01
Linearity is an important and frequently sought property in electronics and instrumentation. Here, we report a method capable of, given a transfer function (theoretical or derived from some real system), identifying the respective most linear region of operation with a fixed width. This methodology, which is based on least squares regression and systematic consideration of all possible regions, has been illustrated with respect to both an analytical (sigmoid transfer function) and a simple situation involving experimental data of a low-power, one-stage class A transistor current amplifier. Such an approach, which has been addressed in terms of transfer functions derived from experimentally obtained characteristic surface, also yielded contributions such as the estimation of local constants of the device, as opposed to typically considered average values. The reported method and results pave the way to several further applications in other types of devices and systems, intelligent control operation, and other areas such as identifying regions of power law behavior.
Robust Multiple Linear Regression.
1982-12-01
difficulty, but it might have more solutions corresponding to local minima. Influence Function of M-Estimates The influence function describes the effect...distributionn n function. In case of M-Estimates the influence function was found to be pro- portional to and given as T(X F)) " C(xpF,T) = .(X.T(F) F(dx...where the inverse of any distribution function F is defined in the usual way as F- (s) = inf{x IF(x) > s) 0<sə Influence Function of L-Estimates In a
Nonparametric regression applied to quantitative structure-activity relationships
Constans; Hirst
2000-03-01
Several nonparametric regressors have been applied to modeling quantitative structure-activity relationship (QSAR) data. The simplest regressor, the Nadaraya-Watson, was assessed in a genuine multivariate setting. Other regressors, the local linear and the shifted Nadaraya-Watson, were implemented within additive models--a computationally more expedient approach, better suited for low-density designs. Performances were benchmarked against the nonlinear method of smoothing splines. A linear reference point was provided by multilinear regression (MLR). Variable selection was explored using systematic combinations of different variables and combinations of principal components. For the data set examined, 47 inhibitors of dopamine beta-hydroxylase, the additive nonparametric regressors have greater predictive accuracy (as measured by the mean absolute error of the predictions or the Pearson correlation in cross-validation trails) than MLR. The use of principal components did not improve the performance of the nonparametric regressors over use of the original descriptors, since the original descriptors are not strongly correlated. It remains to be seen if the nonparametric regressors can be successfully coupled with better variable selection and dimensionality reduction in the context of high-dimensional QSARs.
Compound Identification Using Penalized Linear Regression on Metabolomics
Liu, Ruiqi; Wu, Dongfeng; Zhang, Xiang; Kim, Seongho
2014-01-01
Compound identification is often achieved by matching the experimental mass spectra to the mass spectra stored in a reference library based on mass spectral similarity. Because the number of compounds in the reference library is much larger than the range of mass-to-charge ratio (m/z) values so that the data become high dimensional data suffering from singularity. For this reason, penalized linear regressions such as ridge regression and the lasso are used instead of the ordinary least squares regression. Furthermore, two-step approaches using the dot product and Pearson’s correlation along with the penalized linear regression are proposed in this study. PMID:27212894
Ferrari, Andrea; Lo Vullo, Salvatore; Giardiello, Daniele; Veneroni, Laura; Magni, Chiara; Clerici, Carlo Alfredo; Chiaravalli, Stefano; Casanova, Michela; Luksch, Roberto; Terenziani, Monica; Spreafico, Filippo; Meazza, Cristina; Catania, Serena; Schiavello, Elisabetta; Biassoni, Veronica; Podda, Marta; Bergamaschi, Luca; Puma, Nadia; Massimino, Maura; Mariani, Luigi
2016-03-01
The potential impact of diagnostic delays on patients' outcomes is a debated issue in pediatric oncology and discordant results have been published so far. We attempted to tackle this issue by analyzing a prospective series of 351 consecutive children and adolescents with solid malignancies using innovative statistical tools. To address the nonlinear complexity of the association between symptom interval and overall survival (OS), a regression tree algorithm was constructed with sequential binary splitting rules and used to identify homogeneous patient groups vis-à-vis functional relationship between diagnostic delay and OS. Three different groups were identified: group A, with localized disease and good prognosis (5-year OS 85.4%); group B, with locally or regionally advanced, or metastatic disease and intermediate prognosis (5-year OS 72.9%), including neuroblastoma, Wilms tumor, non-rhabdomyosarcoma soft tissue sarcoma, and germ cell tumor; and group C, with locally or regionally advanced, or metastatic disease and poor prognosis (5-year OS 45%), including brain tumors, rhabdomyosarcoma, and bone sarcoma. The functional relationship between symptom interval and mortality risk differed between the three subgroups, there being no association in group A (hazard ratio [HR]: 0.96), a positive linear association in group B (HR: 1.48), and a negative linear association in group C (HR: 0.61). Our analysis suggests that at least a subset of patients can benefit from an earlier diagnosis in terms of survival. For others, intrinsic aggressiveness may mask the potential effect of diagnostic delays. Based on these findings, early diagnosis should remain a goal for pediatric cancer patients. © 2015 Wiley Periodicals, Inc.
Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties–sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen–in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models–multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)–were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of redness, coloration and saturation were prominent predictors in digital soil mapping. Considering the increased availability of freely available Remote Sensing data (e.g. Landsat, SRTM, Sentinels), soil information at local and regional scales in data poor regions such as West Africa can be improved with relatively little financial and human resources. PMID:28114334
Forkuor, Gerald; Hounkpatin, Ozias K L; Welp, Gerhard; Thiel, Michael
2017-01-01
Accurate and detailed spatial soil information is essential for environmental modelling, risk assessment and decision making. The use of Remote Sensing data as secondary sources of information in digital soil mapping has been found to be cost effective and less time consuming compared to traditional soil mapping approaches. But the potentials of Remote Sensing data in improving knowledge of local scale soil information in West Africa have not been fully explored. This study investigated the use of high spatial resolution satellite data (RapidEye and Landsat), terrain/climatic data and laboratory analysed soil samples to map the spatial distribution of six soil properties-sand, silt, clay, cation exchange capacity (CEC), soil organic carbon (SOC) and nitrogen-in a 580 km2 agricultural watershed in south-western Burkina Faso. Four statistical prediction models-multiple linear regression (MLR), random forest regression (RFR), support vector machine (SVM), stochastic gradient boosting (SGB)-were tested and compared. Internal validation was conducted by cross validation while the predictions were validated against an independent set of soil samples considering the modelling area and an extrapolation area. Model performance statistics revealed that the machine learning techniques performed marginally better than the MLR, with the RFR providing in most cases the highest accuracy. The inability of MLR to handle non-linear relationships between dependent and independent variables was found to be a limitation in accurately predicting soil properties at unsampled locations. Satellite data acquired during ploughing or early crop development stages (e.g. May, June) were found to be the most important spectral predictors while elevation, temperature and precipitation came up as prominent terrain/climatic variables in predicting soil properties. The results further showed that shortwave infrared and near infrared channels of Landsat8 as well as soil specific indices of redness, coloration and saturation were prominent predictors in digital soil mapping. Considering the increased availability of freely available Remote Sensing data (e.g. Landsat, SRTM, Sentinels), soil information at local and regional scales in data poor regions such as West Africa can be improved with relatively little financial and human resources.
Control Variate Selection for Multiresponse Simulation.
1987-05-01
M. H. Knuter, Applied Linear Regression Mfodels, Richard D. Erwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F., Probability, Allyn and Bacon...1982. Neter, J., V. Wasserman, and M. H. Knuter, Applied Linear Regression .fodels, Richard D. Erwin, Inc., Homewood, Illinois, 1983. Neuts, Marcel F...Aspects of J%,ultivariate Statistical Theory, John Wiley and Sons, New York, New York, 1982. dY Neter, J., W. Wasserman, and M. H. Knuter, Applied Linear Regression Mfodels
ERIC Educational Resources Information Center
Kobrin, Jennifer L.; Sinharay, Sandip; Haberman, Shelby J.; Chajewski, Michael
2011-01-01
This study examined the adequacy of a multiple linear regression model for predicting first-year college grade point average (FYGPA) using SAT[R] scores and high school grade point average (HSGPA). A variety of techniques, both graphical and statistical, were used to examine if it is possible to improve on the linear regression model. The results…
High correlations between MRI brain volume measurements based on NeuroQuant® and FreeSurfer.
Ross, David E; Ochs, Alfred L; Tate, David F; Tokac, Umit; Seabaugh, John; Abildskov, Tracy J; Bigler, Erin D
2018-05-30
NeuroQuant ® (NQ) and FreeSurfer (FS) are commonly used computer-automated programs for measuring MRI brain volume. Previously they were reported to have high intermethod reliabilities but often large intermethod effect size differences. We hypothesized that linear transformations could be used to reduce the large effect sizes. This study was an extension of our previously reported study. We performed NQ and FS brain volume measurements on 60 subjects (including normal controls, patients with traumatic brain injury, and patients with Alzheimer's disease). We used two statistical approaches in parallel to develop methods for transforming FS volumes into NQ volumes: traditional linear regression, and Bayesian linear regression. For both methods, we used regression analyses to develop linear transformations of the FS volumes to make them more similar to the NQ volumes. The FS-to-NQ transformations based on traditional linear regression resulted in effect sizes which were small to moderate. The transformations based on Bayesian linear regression resulted in all effect sizes being trivially small. To our knowledge, this is the first report describing a method for transforming FS to NQ data so as to achieve high reliability and low effect size differences. Machine learning methods like Bayesian regression may be more useful than traditional methods. Copyright © 2018 Elsevier B.V. All rights reserved.
Quantile Regression in the Study of Developmental Sciences
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of the outcome’s distribution. Using data from the High School and Beyond and U.S. Sustained Effects Study databases, quantile regression is demonstrated and contrasted with linear regression when considering models with: (a) one continuous predictor, (b) one dichotomous predictor, (c) a continuous and a dichotomous predictor, and (d) a longitudinal application. Results from each example exhibited the differential inferences which may be drawn using linear or quantile regression. PMID:24329596
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. PMID:28747393
A SEMIPARAMETRIC BAYESIAN MODEL FOR CIRCULAR-LINEAR REGRESSION
We present a Bayesian approach to regress a circular variable on a linear predictor. The regression coefficients are assumed to have a nonparametric distribution with a Dirichlet process prior. The semiparametric Bayesian approach gives added flexibility to the model and is usefu...
Kumar, K Vasanth; Sivanesan, S
2006-08-25
Pseudo second order kinetic expressions of Ho, Sobkowsk and Czerwinski, Blanachard et al. and Ritchie were fitted to the experimental kinetic data of malachite green onto activated carbon by non-linear and linear method. Non-linear method was found to be a better way of obtaining the parameters involved in the second order rate kinetic expressions. Both linear and non-linear regression showed that the Sobkowsk and Czerwinski and Ritchie's pseudo second order model were the same. Non-linear regression analysis showed that both Blanachard et al. and Ho have similar ideas on the pseudo second order model but with different assumptions. The best fit of experimental data in Ho's pseudo second order expression by linear and non-linear regression method showed that Ho pseudo second order model was a better kinetic expression when compared to other pseudo second order kinetic expressions. The amount of dye adsorbed at equilibrium, q(e), was predicted from Ho pseudo second order expression and were fitted to the Langmuir, Freundlich and Redlich Peterson expressions by both linear and non-linear method to obtain the pseudo isotherms. The best fitting pseudo isotherm was found to be the Langmuir and Redlich Peterson isotherm. Redlich Peterson is a special case of Langmuir when the constant g equals unity.
2015-07-15
Long-term effects on cancer survivors’ quality of life of physical training versus physical training combined with cognitive-behavioral therapy ...COMPARISON OF NEURAL NETWORK AND LINEAR REGRESSION MODELS IN STATISTICALLY PREDICTING MENTAL AND PHYSICAL HEALTH STATUS OF BREAST...34Comparison of Neural Network and Linear Regression Models in Statistically Predicting Mental and Physical Health Status of Breast Cancer Survivors
Prediction of the Main Engine Power of a New Container Ship at the Preliminary Design Stage
NASA Astrophysics Data System (ADS)
Cepowski, Tomasz
2017-06-01
The paper presents mathematical relationships that allow us to forecast the estimated main engine power of new container ships, based on data concerning vessels built in 2005-2015. The presented approximations allow us to estimate the engine power based on the length between perpendiculars and the number of containers the ship will carry. The approximations were developed using simple linear regression and multivariate linear regression analysis. The presented relations have practical application for estimation of container ship engine power needed in preliminary parametric design of the ship. It follows from the above that the use of multiple linear regression to predict the main engine power of a container ship brings more accurate solutions than simple linear regression.
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
Ernst, Anja F; Albers, Casper J
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking.
Ernst, Anja F.
2017-01-01
Misconceptions about the assumptions behind the standard linear regression model are widespread and dangerous. These lead to using linear regression when inappropriate, and to employing alternative procedures with less statistical power when unnecessary. Our systematic literature review investigated employment and reporting of assumption checks in twelve clinical psychology journals. Findings indicate that normality of the variables themselves, rather than of the errors, was wrongfully held for a necessary assumption in 4% of papers that use regression. Furthermore, 92% of all papers using linear regression were unclear about their assumption checks, violating APA-recommendations. This paper appeals for a heightened awareness for and increased transparency in the reporting of statistical assumption checking. PMID:28533971
Estimating linear temporal trends from aggregated environmental monitoring data
Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.
2017-01-01
Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs is often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variations to be separated. We used simulated time-series to compare linear trend estimations from three state-space models, a simple linear regression model, and an auto-regressive model. We also compared the performance of these five models to estimate trends from a long term monitoring program. We specifically estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the given models because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job estimating populations in a given year. The state-space models did not estimate trends well, but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregression and state-space models when used to analyze aggregated environmental monitoring data.
NASA Astrophysics Data System (ADS)
Camera, Corrado; Bruggeman, Adriana; Hadjinicolaou, Panos; Pashiardis, Stelios; Lange, Manfred A.
2014-01-01
High-resolution gridded daily data sets are essential for natural resource management and the analyses of climate changes and their effects. This study aims to evaluate the performance of 15 simple or complex interpolation techniques in reproducing daily precipitation at a resolution of 1 km2 over topographically complex areas. Methods are tested considering two different sets of observation densities and different rainfall amounts. We used rainfall data that were recorded at 74 and 145 observational stations, respectively, spread over the 5760 km2 of the Republic of Cyprus, in the Eastern Mediterranean. Regression analyses utilizing geographical copredictors and neighboring interpolation techniques were evaluated both in isolation and combined. Linear multiple regression (LMR) and geographically weighted regression methods (GWR) were tested. These included a step-wise selection of covariables, as well as inverse distance weighting (IDW), kriging, and 3D-thin plate splines (TPS). The relative rank of the different techniques changes with different station density and rainfall amounts. Our results indicate that TPS performs well for low station density and large-scale events and also when coupled with regression models. It performs poorly for high station density. The opposite is observed when using IDW. Simple IDW performs best for local events, while a combination of step-wise GWR and IDW proves to be the best method for large-scale events and high station density. This study indicates that the use of step-wise regression with a variable set of geographic parameters can improve the interpolation of large-scale events because it facilitates the representation of local climate dynamics.
Large signal-to-noise ratio quantification in MLE for ARARMAX models
NASA Astrophysics Data System (ADS)
Zou, Yiqun; Tang, Xiafei
2014-06-01
It has been shown that closed-loop linear system identification by indirect method can be generally transferred to open-loop ARARMAX (AutoRegressive AutoRegressive Moving Average with eXogenous input) estimation. For such models, the gradient-related optimisation with large enough signal-to-noise ratio (SNR) can avoid the potential local convergence in maximum likelihood estimation. To ease the application of this condition, the threshold SNR needs to be quantified. In this paper, we build the amplitude coefficient which is an equivalence to the SNR and prove the finiteness of the threshold amplitude coefficient within the stability region. The quantification of threshold is achieved by the minimisation of an elaborately designed multi-variable cost function which unifies all the restrictions on the amplitude coefficient. The corresponding algorithm based on two sets of physically realisable system input-output data details the minimisation and also points out how to use the gradient-related method to estimate ARARMAX parameters when local minimum is present as the SNR is small. Then, the algorithm is tested on a theoretical AutoRegressive Moving Average with eXogenous input model for the derivation of the threshold and a gas turbine engine real system for model identification, respectively. Finally, the graphical validation of threshold on a two-dimensional plot is discussed.
Comparing The Effectiveness of a90/95 Calculations (Preprint)
2006-09-01
Nachtsheim, John Neter, William Li, Applied Linear Statistical Models , 5th ed., McGraw-Hill/Irwin, 2005 5. Mood, Graybill and Boes, Introduction...curves is based on methods that are only valid for ordinary linear regression. Requirements for a valid Ordinary Least-Squares Regression Model There... linear . For example is a linear model ; is not. 2. Uniform variance (homoscedasticity
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R 2 ), using R 2 as the primary metric of assay agreement. However, the use of R 2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing assays (NGS). NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
2017-10-01
ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID PROPELLANT GRAIN GEOMETRIES Brian...author(s) and should not be construed as an official Department of the Army position, policy, or decision, unless so designated by other documentation...U.S. ARMY ARMAMENT RESEARCH, DEVELOPMENT AND ENGINEERING CENTER GRAIN EVALUATION SOFTWARE TO NUMERICALLY PREDICT LINEAR BURN REGRESSION FOR SOLID
Geographically weighted regression as a generalized Wombling to detect barriers to gene flow.
Diniz-Filho, José Alexandre Felizola; Soares, Thannya Nascimento; de Campos Telles, Mariana Pires
2016-08-01
Barriers to gene flow play an important role in structuring populations, especially in human-modified landscapes, and several methods have been proposed to detect such barriers. However, most applications of these methods require a relative large number of individuals or populations distributed in space, connected by vertices from Delaunay or Gabriel networks. Here we show, using both simulated and empirical data, a new application of geographically weighted regression (GWR) to detect such barriers, modeling the genetic variation as a "local" linear function of geographic coordinates (latitude and longitude). In the GWR, standard regression statistics, such as R(2) and slopes, are estimated for each sampling unit and thus are mapped. Peaks in these local statistics are then expected close to the barriers if genetic discontinuities exist, capturing a higher rate of population differentiation among neighboring populations. Isolation-by-Distance simulations on a longitudinally warped lattice revealed that higher local slopes from GWR coincide with the barrier detected with Monmonier algorithm. Even with a relatively small effect of the barrier, the power of local GWR in detecting the east-west barriers was higher than 95 %. We also analyzed empirical data of genetic differentiation among tree populations of Dipteryx alata and Eugenia dysenterica Brazilian Cerrado. GWR was applied to the principal coordinate of the pairwise FST matrix based on microsatellite loci. In both simulated and empirical data, the GWR results were consistent with discontinuities detected by Monmonier algorithm, as well as with previous explanations for the spatial patterns of genetic differentiation for the two species. Our analyses reveal how this new application of GWR can viewed as a generalized Wombling in a continuous space and be a useful approach to detect barriers and discontinuities to gene flow.
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
Modeling Laterality of the Globus Pallidus Internus in Patients With Parkinson's Disease.
Sharim, Justin; Yazdi, Daniel; Baohan, Amy; Behnke, Eric; Pouratian, Nader
2017-04-01
Neurosurgical interventions such as deep brain stimulation surgery of the globus pallidus internus (GPi) play an important role in the treatment of medically refractory Parkinson's disease (PD), and require high targeting accuracy. Variability in the laterality of the GPi across patients with PD has not been well characterized. The aim of this report is to identify factors that may contribute to differences in position of the motor region of GPi. The charts and operative reports of 101 PD patients following deep brain stimulation surgery (70 males, aged 11-78 years) representing 201 GPi were retrospectively reviewed. Data extracted for each subject include age, gender, anterior and posterior commissures (AC-PC) distance, and third ventricular width. Multiple linear regression, stepwise regression, and relative importance of regressors analysis were performed to assess the predictive ability of these variables on GPi laterality. Multiple linear regression for target vs. third ventricular width, gender, AC-PC distance, and age were significant for normalized linear regression coefficients of 0.333 (p < 0.0001), 0.206 (p = 0.00219), 0.168 (p = 0.0119), and 0.159 (p = 0.0136), respectively. Third ventricular width, gender, AC-PC distance, and age each account for 44.06% (21.38-65.69%, 95% CI), 20.82% (10.51-35.88%), 21.46% (8.28-37.05%), and 13.66% (2.62-28.64%) of the R 2 value, respectively. Effect size calculation was significant for a change in the GPi laterality of 0.19 mm per mm of ventricular width, 0.11 mm per mm of AC-PC distance, 0.017 mm per year in age, and 0.54 mm increase for male gender. This variability highlights the limitations of indirect targeting alone, and argues for the continued use of MRI as well as intraoperative physiological testing to account for such factors that contribute to patient-specific variability in GPi localization. © 2016 International Neuromodulation Society.
Multiple imputation for cure rate quantile regression with censored data.
Wu, Yuanshan; Yin, Guosheng
2017-03-01
The main challenge in the context of cure rate analysis is that one never knows whether censored subjects are cured or uncured, or whether they are susceptible or insusceptible to the event of interest. Considering the susceptible indicator as missing data, we propose a multiple imputation approach to cure rate quantile regression for censored data with a survival fraction. We develop an iterative algorithm to estimate the conditionally uncured probability for each subject. By utilizing this estimated probability and Bernoulli sample imputation, we can classify each subject as cured or uncured, and then employ the locally weighted method to estimate the quantile regression coefficients with only the uncured subjects. Repeating the imputation procedure multiple times and taking an average over the resultant estimators, we obtain consistent estimators for the quantile regression coefficients. Our approach relaxes the usual global linearity assumption, so that we can apply quantile regression to any particular quantile of interest. We establish asymptotic properties for the proposed estimators, including both consistency and asymptotic normality. We conduct simulation studies to assess the finite-sample performance of the proposed multiple imputation method and apply it to a lung cancer study as an illustration. © 2016, The International Biometric Society.
A Constrained Linear Estimator for Multiple Regression
ERIC Educational Resources Information Center
Davis-Stober, Clintin P.; Dana, Jason; Budescu, David V.
2010-01-01
"Improper linear models" (see Dawes, Am. Psychol. 34:571-582, "1979"), such as equal weighting, have garnered interest as alternatives to standard regression models. We analyze the general circumstances under which these models perform well by recasting a class of "improper" linear models as "proper" statistical models with a single predictor. We…
Neuberger, Ulf; Kickingereder, Philipp; Helluy, Xavier; Fischer, Manuel; Bendszus, Martin; Heiland, Sabine
2017-12-01
Non-invasive detection of 2-hydroxyglutarate (2HG) by magnetic resonance spectroscopy is attractive since it is related to tumor metabolism. Here, we compare the detection accuracy of 2HG in a controlled phantom setting via widely used localized spectroscopy sequences quantified by linear combination of metabolite signals vs. a more complex approach applying a J-difference editing technique at 9.4T. Different phantoms, comprised out of a concentration series of 2HG and overlapping brain metabolites, were measured with an optimized point-resolved-spectroscopy sequence (PRESS) and an in-house developed J-difference editing sequence. The acquired spectra were post-processed with LCModel and a simulated metabolite set (PRESS) or with a quantification formula for J-difference editing. Linear regression analysis demonstrated a high correlation of real 2HG values with those measured with the PRESS method (adjusted R-squared: 0.700, p<0.001) as well as with those measured with the J-difference editing method (adjusted R-squared: 0.908, p<0.001). The regression model with the J-difference editing method however had a significantly higher explanatory value over the regression model with the PRESS method (p<0.0001). Moreover, with J-difference editing 2HG was discernible down to 1mM, whereas with the PRESS method 2HG values were not discernable below 2mM and with higher systematic errors, particularly in phantoms with high concentrations of N-acetyl-asparate (NAA) and glutamate (Glu). In summary, quantification of 2HG with linear combination of metabolite signals shows high systematic errors particularly at low 2HG concentration and high concentration of confounding metabolites such as NAA and Glu. In contrast, J-difference editing offers a more accurate quantification even at low 2HG concentrations, which outweighs the downsides of longer measurement time and more complex postprocessing. Copyright © 2017. Published by Elsevier GmbH.
On the design of classifiers for crop inventories
NASA Technical Reports Server (NTRS)
Heydorn, R. P.; Takacs, H. C.
1986-01-01
Crop proportion estimators that use classifications of satellite data to correct, in an additive way, a given estimate acquired from ground observations are discussed. A linear version of these estimators is optimal, in terms of minimum variance, when the regression of the ground observations onto the satellite observations in linear. When this regression is not linear, but the reverse regression (satellite observations onto ground observations) is linear, the estimator is suboptimal but still has certain appealing variance properties. In this paper expressions are derived for those regressions which relate the intercepts and slopes to conditional classification probabilities. These expressions are then used to discuss the question of classifier designs that can lead to low-variance crop proportion estimates. Variance expressions for these estimates in terms of classifier omission and commission errors are also derived.
Georgopoulos, Michael; Zehetmayer, Martin; Ruhswurm, Irene; Toma-Bstaendig, Sabine; Ségur-Eltz, Nikolaus; Sacu, Stefan; Menapace, Rupert
2003-01-01
This study assesses differences in relative tumour regression and internal acoustic reflectivity after 3 methods of radiotherapy for uveal melanoma: (1) brachytherapy with ruthenium-106 radioactive plaques (RU), (2) fractionated high-dose gamma knife stereotactic irradiation in 2-3 fractions (GK) or (3) fractionated linear-accelerator-based stereotactic teletherapy in 5 fractions (Linac). Ultrasound measurements of tumour thickness and internal reflectivity were performed with standardised A scan pre-operatively and 3, 6, 9, 12, 18, 24 and 36 months postoperatively. Of 211 patients included in the study, 111 had a complete 3-year follow-up (RU: 41, GK: 37, Linac: 33). Differences in tumour thickness and internal reflectivity were assessed with analysis of variance, and post hoc multiple comparisons were calculated with Tukey's honestly significant difference test. Local tumour control was excellent with all 3 methods (>93%). At 36 months, relative tumour height reduction was 69, 50 and 30% after RU, GK and Linac, respectively. In all 3 treatment groups, internal reflectivity increased from about 30% initially to 60-70% 3 years after treatment. Brachytherapy with ruthenium-106 plaques results in a faster tumour regression as compared to teletherapy with gamma knife or Linac. Internal reflectivity increases comparably in all 3 groups. Besides tumour growth arrest, increasing internal reflectivity is considered as an important factor indicating successful treatment. Copyright 2003 S. Karger AG, Basel
Bokhari, Syed Akhtar H; Khan, Ayyaz A; Butt, Arshad K; Hanif, Mohammad; Izhar, Mateen; Tatakis, Dimitris N; Ashfaq, Mohammad
2014-11-01
Few studies have examined the relationship of individual periodontal parameters with individual systemic biomarkers. This study assessed the possible association between specific clinical parameters of periodontitis and systemic biomarkers of coronary heart disease risk in coronary heart disease patients with periodontitis. Angiographically proven coronary heart disease patients with periodontitis (n = 317), aged >30 years and without other systemic illness were examined. Periodontal clinical parameters of bleeding on probing (BOP), probing depth (PD), and clinical attachment level (CAL) and systemic levels of high-sensitivity C-reactive protein (CRP), fibrinogen (FIB) and white blood cells (WBC) were noted and analyzed to identify associations through linear and stepwise multiple regression analyses. Unadjusted linear regression showed significant associations between periodontal and systemic parameters; the strongest association (r = 0.629; p < 0.001) was found between BOP and CRP levels, the periodontal and systemic inflammation marker, respectively. Stepwise regression analysis models revealed that BOP was a predictor of systemic CRP levels (p < 0.0001). BOP was the only periodontal parameter significantly associated with each systemic parameter (CRP, FIB, and WBC). In coronary heart disease patients with periodontitis, BOP is strongly associated with systemic CRP levels; this association possibly reflects the potential significance of the local periodontal inflammatory burden for systemic inflammation. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
NASA Astrophysics Data System (ADS)
Le Duy, Nguyen; Heidbüchel, Ingo; Meyer, Hanno; Merz, Bruno; Apel, Heiko
2018-02-01
This study analyzes the influence of local and regional climatic factors on the stable isotopic composition of rainfall in the Vietnamese Mekong Delta (VMD) as part of the Asian monsoon region. It is based on 1.5 years of weekly rainfall samples. In the first step, the isotopic composition of the samples is analyzed by local meteoric water lines (LMWLs) and single-factor linear correlations. Additionally, the contribution of several regional and local factors is quantified by multiple linear regression (MLR) of all possible factor combinations and by relative importance analysis. This approach is novel for the interpretation of isotopic records and enables an objective quantification of the explained variance in isotopic records for individual factors. In this study, the local factors are extracted from local climate records, while the regional factors are derived from atmospheric backward trajectories of water particles. The regional factors, i.e., precipitation, temperature, relative humidity and the length of backward trajectories, are combined with equivalent local climatic parameters to explain the response variables δ18O, δ2H, and d-excess of precipitation at the station of measurement. The results indicate that (i) MLR can better explain the isotopic variation in precipitation (R2 = 0.8) compared to single-factor linear regression (R2 = 0.3); (ii) the isotopic variation in precipitation is controlled dominantly by regional moisture regimes (˜ 70 %) compared to local climatic conditions (˜ 30 %); (iii) the most important climatic parameter during the rainy season is the precipitation amount along the trajectories of air mass movement; (iv) the influence of local precipitation amount and temperature is not significant during the rainy season, unlike the regional precipitation amount effect; (v) secondary fractionation processes (e.g., sub-cloud evaporation) can be identified through the d-excess and take place mainly in the dry season, either locally for δ18O and δ2H, or along the air mass trajectories for d-excess. The analysis shows that regional and local factors vary in importance over the seasons and that the source regions and transport pathways, and particularly the climatic conditions along the pathways, have a large influence on the isotopic composition of rainfall. Although the general results have been reported qualitatively in previous studies (proving the validity of the approach), the proposed method provides quantitative estimates of the controlling factors, both for the whole data set and for distinct seasons. Therefore, it is argued that the approach constitutes an advancement in the statistical analysis of isotopic records in rainfall that can supplement or precede more complex studies utilizing atmospheric models. Due to its relative simplicity, the method can be easily transferred to other regions, or extended with other factors. The results illustrate that the interpretation of the isotopic composition of precipitation as a recorder of local climatic conditions, as for example performed for paleorecords of water isotopes, may not be adequate in the southern part of the Indochinese Peninsula, and likely neither in other regions affected by monsoon processes. However, the presented approach could open a pathway towards better and seasonally differentiated reconstruction of paleoclimates based on isotopic records.
Durango delta: Complications on San Juan basin Cretaceous linear strandline theme
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zech, R.S.; Wright, R.
1989-09-01
The Upper Cretaceous Point Lookout Sandstone generally conforms to a predictable cyclic shoreface model in which prograding linear strandline lithosomes dominate formation architecture. Multiple transgressive-regressive cycles results in systematic repetition of lithologies deposited in beach to inner shelf environments. Deposits of approximately five cycles are locally grouped into bundles. Such bundles extend at least 20 km along depositional strike and change from foreshore sandstone to offshore, time-equivalent Mancos mud rock in a downdip distance of 17 to 20 km. Excellent hydrocarbon reservoirs exist where well-sorted shoreface sandstone bundles stack and the formation thickens. This depositional model breaks down in themore » vicinity of Durango, Colorado, where a fluvial-dominated delta front and associated large distributary channels characterize the Point Lookout Sandstone and overlying Menefee Formation.« less
NASA Astrophysics Data System (ADS)
Webb, Mathew A.; Hall, Andrew; Kidd, Darren; Minansy, Budiman
2016-05-01
Assessment of local spatial climatic variability is important in the planning of planting locations for horticultural crops. This study investigated three regression-based calibration methods (i.e. traditional versus two optimized methods) to relate short-term 12-month data series from 170 temperature loggers and 4 weather station sites with data series from nearby long-term Australian Bureau of Meteorology climate stations. The techniques trialled to interpolate climatic temperature variables, such as frost risk, growing degree days (GDDs) and chill hours, were regression kriging (RK), regression trees (RTs) and random forests (RFs). All three calibration methods produced accurate results, with the RK-based calibration method delivering the most accurate validation measures: coefficients of determination ( R 2) of 0.92, 0.97 and 0.95 and root-mean-square errors of 1.30, 0.80 and 1.31 °C, for daily minimum, daily maximum and hourly temperatures, respectively. Compared with the traditional method of calibration using direct linear regression between short-term and long-term stations, the RK-based calibration method improved R 2 and reduced root-mean-square error (RMSE) by at least 5 % and 0.47 °C for daily minimum temperature, 1 % and 0.23 °C for daily maximum temperature and 3 % and 0.33 °C for hourly temperature. Spatial modelling indicated insignificant differences between the interpolation methods, with the RK technique tending to be the slightly better method due to the high degree of spatial autocorrelation between logger sites.
NASA Astrophysics Data System (ADS)
Powell, James Eckhardt
Emissions inventories are an important tool, often built by governments, and used to manage emissions. To build an inventory of urban CO2 emissions and other fossil fuel combustion products in the urban atmosphere, an inventory of on-road traffic is required. In particular, a high resolution inventory is necessary to capture the local characteristics of transport emissions. These emissions vary widely due to the local nature of the fleet, fuel, and roads. Here we show a new model of ADT for the Portland, OR metropolitan region. The backbone is traffic counter recordings made by the Portland Bureau of Transportation at 7,767 sites over 21 years (1986-2006), augmented with PORTAL (The Portland Regional Transportation Archive Listing) freeway traffic count data. We constructed a regression model to fill in traffic network gaps using GIS data such as road class and population density. An EPA-supplied emissions factor was used to estimate transportation CO2 emissions, which is compared to several other estimates for the city's CO2 footprint.
Bowen, Stephen R; Chappell, Richard J; Bentzen, Søren M; Deveau, Michael A; Forrest, Lisa J; Jeraj, Robert
2012-01-01
Purpose To quantify associations between pre-radiotherapy and post-radiotherapy PET parameters via spatially resolved regression. Materials and methods Ten canine sinonasal cancer patients underwent PET/CT scans of [18F]FDG (FDGpre), [18F]FLT (FLTpre), and [61Cu]Cu-ATSM (Cu-ATSMpre). Following radiotherapy regimens of 50 Gy in 10 fractions, veterinary patients underwent FDG PET/CT scans at three months (FDGpost). Regression of standardized uptake values in baseline FDGpre, FLTpre and Cu-ATSMpre tumour voxels to those in FDGpost images was performed for linear, log-linear, generalized-linear and mixed-fit linear models. Goodness-of-fit in regression coefficients was assessed by R2. Hypothesis testing of coefficients over the patient population was performed. Results Multivariate linear model fits of FDGpre to FDGpost were significantly positive over the population (FDGpost~0.17 FDGpre, p=0.03), and classified slopes of RECIST non-responders and responders to be different (0.37 vs. 0.07, p=0.01). Generalized-linear model fits related FDGpre to FDGpost by a linear power law (FDGpost~FDGpre0.93, p<0.001). Univariate mixture model fits of FDGpre improved R2 from 0.17 to 0.52. Neither baseline FLT PET nor Cu-ATSM PET uptake contributed statistically significant multivariate regression coefficients. Conclusions Spatially resolved regression analysis indicates that pre-treatment FDG PET uptake is most strongly associated with three-month post-treatment FDG PET uptake in this patient population, though associations are histopathology-dependent. PMID:22682748
An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin
Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We exploremore » the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.« less
An open-access CMIP5 pattern library for temperature and precipitation: Description and methodology
Lynch, Cary D.; Hartin, Corinne A.; Bond-Lamberty, Benjamin; ...
2017-05-15
Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squared regression methods. We exploremore » the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90°N/S). Bias and mean errors between modeled and pattern predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5°C, but choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. As a result, this paper describes our library of least squared regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns.« less
Linear regression analysis of survival data with missing censoring indicators.
Wang, Qihua; Dinse, Gregg E
2011-04-01
Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
An Analysis of COLA (Cost of Living Adjustment) Allocation within the United States Coast Guard.
1983-09-01
books Applied Linear Regression [Ref. 39], and Statistical Methods in Research and Production [Ref. 40], or any other book on regression. In the event...Indexes, Master’s Thesis, Air Force Institute of Technology, Wright-Patterson AFB, 1976. 39. Weisberg, Stanford, Applied Linear Regression , Wiley, 1980. 40
Testing hypotheses for differences between linear regression lines
Stanley J. Zarnoch
2009-01-01
Five hypotheses are identified for testing differences between simple linear regression lines. The distinctions between these hypotheses are based on a priori assumptions and illustrated with full and reduced models. The contrast approach is presented as an easy and complete method for testing for overall differences between the regressions and for making pairwise...
Graphical Description of Johnson-Neyman Outcomes for Linear and Quadratic Regression Surfaces.
ERIC Educational Resources Information Center
Schafer, William D.; Wang, Yuh-Yin
A modification of the usual graphical representation of heterogeneous regressions is described that can aid in interpreting significant regions for linear or quadratic surfaces. The standard Johnson-Neyman graph is a bivariate plot with the criterion variable on the ordinate and the predictor variable on the abscissa. Regression surfaces are drawn…
Teaching the Concept of Breakdown Point in Simple Linear Regression.
ERIC Educational Resources Information Center
Chan, Wai-Sum
2001-01-01
Most introductory textbooks on simple linear regression analysis mention the fact that extreme data points have a great influence on ordinary least-squares regression estimation; however, not many textbooks provide a rigorous mathematical explanation of this phenomenon. Suggests a way to fill this gap by teaching students the concept of breakdown…
A Novel Local Learning based Approach With Application to Breast Cancer Diagnosis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Songhua; Tourassi, Georgia
2012-01-01
The purpose of this study is to develop and evaluate a novel local learning-based approach for computer-assisted diagnosis of breast cancer. Our new local learning based algorithm using the linear logistic regression method as its base learner is described. Overall, our algorithm will perform its stochastic searching process until the total allowed computing time is used up by our random walk process in identifying the most suitable population subdivision scheme and their corresponding individual base learners. The proposed local learning-based approach was applied for the prediction of breast cancer given 11 mammographic and clinical findings reported by physicians using themore » BI-RADS lexicon. Our database consisted of 850 patients with biopsy confirmed diagnosis (290 malignant and 560 benign). We also compared the performance of our method with a collection of publicly available state-of-the-art machine learning methods. Predictive performance for all classifiers was evaluated using 10-fold cross validation and Receiver Operating Characteristics (ROC) analysis. Figure 1 reports the performance of 54 machine learning methods implemented in the machine learning toolkit Weka (version 3.0). We introduced a novel local learning-based classifier and compared it with an extensive list of other classifiers for the problem of breast cancer diagnosis. Our experiments show that the algorithm superior prediction performance outperforming a wide range of other well established machine learning techniques. Our conclusion complements the existing understanding in the machine learning field that local learning may capture complicated, non-linear relationships exhibited by real-world datasets.« less
Effect of Malmquist bias on correlation studies with IRAS data base
NASA Technical Reports Server (NTRS)
Verter, Frances
1993-01-01
The relationships between galaxy properties in the sample of Trinchieri et al. (1989) are reexamined with corrections for Malmquist bias. The linear correlations are tested and linear regressions are fit for log-log plots of L(FIR), L(H-alpha), and L(B) as well as ratios of these quantities. The linear correlations for Malmquist bias are corrected using the method of Verter (1988), in which each galaxy observation is weighted by the inverse of its sampling volume. The linear regressions are corrected for Malmquist bias by a new method invented here in which each galaxy observation is weighted by its sampling volume. The results of correlation and regressions among the sample are significantly changed in the anticipated sense that the corrected correlation confidences are lower and the corrected slopes of the linear regressions are lower. The elimination of Malmquist bias eliminates the nonlinear rise in luminosity that has caused some authors to hypothesize additional components of FIR emission.
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
Choo, Richard; Klotz, Laurence; Deboer, Gerrit; Danjoux, Cyril; Morton, Gerard C
2004-08-01
To assess the prostate specific antigen (PSA) doubling time of untreated, clinically localized, low-to-intermediate grade prostate carcinoma. A prospective single-arm cohort study has been in progress since November 1995 to assess the feasibility of a watchful-observation protocol with selective delayed intervention for clinically localized, low-to-intermediate grade prostate adenocarcinoma. The PSA doubling time was estimated from a linear regression of ln(PSA) against time, assuming a simple exponential growth model. As of March 2003, 231 patients had at least 6 months of follow-up (median 45) and at least three PSA measurements (median 8, range 3-21). The distribution of the doubling time was: < 2 years, 26 patients; 2-5 years, 65; 5-10 years, 42; 10-20 years, 26; 20-50 years, 16; >50 years, 56. The median doubling time was 7.0 years; 42% of men had a doubling time of >10 years. The doubling time of untreated clinically localized, low-to-intermediate grade prostate cancer varies widely.
Changes in Clavicle Length and Maturation in Americans: 1840-1980.
Langley, Natalie R; Cridlin, Sandra
2016-01-01
Secular changes refer to short-term biological changes ostensibly due to environmental factors. Two well-documented secular trends in many populations are earlier age of menarche and increasing stature. This study synthesizes data on maximum clavicle length and fusion of the medial epiphysis in 1840-1980 American birth cohorts to provide a comprehensive assessment of developmental and morphological change in the clavicle. Clavicles from the Hamann-Todd Human Osteological Collection (n = 354), McKern and Stewart Korean War males (n = 341), Forensic Anthropology Data Bank (n = 1,239), and the McCormick Clavicle Collection (n = 1,137) were used in the analysis. Transition analysis was used to evaluate fusion of the medial epiphysis (scored as unfused, fusing, or fused). Several statistical treatments were used to assess fluctuations in maximum clavicle length. First, Durbin-Watson tests were used to evaluate autocorrelation, and a local regression (LOESS) was used to identify visual shifts in the regression slope. Next, piecewise regression was used to fit linear regression models before and after the estimated breakpoints. Multiple starting parameters were tested in the range determined to contain the breakpoint, and the model with the smallest mean squared error was chosen as the best fit. The parameters from the best-fit models were then used to derive the piecewise models, which were compared with the initial simple linear regression models to determine which model provided the best fit for the secular change data. The epiphyseal union data indicate a decline in the age at onset of fusion since the early twentieth century. Fusion commences approximately four years earlier in mid- to late twentieth-century birth cohorts than in late nineteenth- and early twentieth-century birth cohorts. However, fusion is completed at roughly the same age across cohorts. The most significant decline in age at onset of epiphyseal union appears to have occurred since the mid-twentieth century. LOESS plots show a breakpoint in the clavicle length data around the mid-twentieth century in both sexes, and piecewise regression models indicate a significant decrease in clavicle length in the American population after 1940. The piecewise model provides a slightly better fit than the simple linear model. Since the model standard error is not substantially different from the piecewise model, an argument could be made to select the less complex linear model. However, we chose the piecewise model to detect changes in clavicle length that are overfitted with a linear model. The decrease in maximum clavicle length is in line with a documented narrowing of the American skeletal form, as shown by analyses of cranial and facial breadth and bi-iliac breadth of the pelvis. Environmental influences on skeletal form include increases in body mass index, health improvements, improved socioeconomic status, and elimination of infectious diseases. Secular changes in bony dimensions and skeletal maturation stipulate that medical and forensic standards used to deduce information about growth, health, and biological traits must be derived from modern populations.
ERIC Educational Resources Information Center
Rocconi, Louis M.
2013-01-01
This study examined the differing conclusions one may come to depending upon the type of analysis chosen, hierarchical linear modeling or ordinary least squares (OLS) regression. To illustrate this point, this study examined the influences of seniors' self-reported critical thinking abilities three ways: (1) an OLS regression with the student…
Tokunaga, Mutsumi; Hashimoto, Hideki
2013-01-01
While the distinct behaviors of for-profit and non-profit providers in the healthcare market have been compared in the economic literature, their choices regarding market entry and exit have only recently been debated. Since 2000, when public Long-Term Care Insurance was introduced in Japan, for-profit providers have been able to provide formal long-term homecare services. The aim of this study is to determine which factors have affected market entry of for-profit providers under price regulation and in competition with existing non-profit providers. We used nation-wide panel data from 2002 to 2010, aggregated at the level of local public insurers (n = 1557), a basic area unit of service provision. The number of for-profit providers per elderly population in the area unit was regressed against factors related to local demand and service costs using first-difference linear regression, a fixed effects model, and Tobit regression for robustness checking. Results showed that demand (the number of eligible care recipients) and cost factors (population density and minimum wage) significantly influenced for-profit providers' choice of market entry. These findings indicate that for-profit providers will strategically choose a local market for maximizing profit. We believe that price regulation should be redesigned to incorporate quality of care and market conditions, regardless of the profit status of the providers, to ensure equal access to efficient delivery of long-term care across all regions. Copyright © 2012 Elsevier Ltd. All rights reserved.
Pakes, D; Boulding, E G
2010-08-01
Empirical estimates of selection gradients caused by predators are common, yet no one has quantified how these estimates vary with predator ontogeny. We used logistic regression to investigate how selection on gastropod shell thickness changed with predator size. Only small and medium purple shore crabs (Hemigrapsus nudus) exerted a linear selection gradient for increased shell-thickness within a single population of the intertidal snail (Littorina subrotundata). The shape of the fitness function for shell thickness was confirmed to be linear for small and medium crabs but was humped for large male crabs, suggesting no directional selection. A second experiment using two prey species to amplify shell thickness differences established that the selection differential on adult snails decreased linearly as crab size increased. We observed differences in size distribution and sex ratios among three natural shore crab populations that may cause spatial and temporal variation in predator-mediated selection on local snail populations.
ERIC Educational Resources Information Center
Rocconi, Louis M.
2011-01-01
Hierarchical linear models (HLM) solve the problems associated with the unit of analysis problem such as misestimated standard errors, heterogeneity of regression and aggregation bias by modeling all levels of interest simultaneously. Hierarchical linear modeling resolves the problem of misestimated standard errors by incorporating a unique random…
ERIC Educational Resources Information Center
Preacher, Kristopher J.; Curran, Patrick J.; Bauer, Daniel J.
2006-01-01
Simple slopes, regions of significance, and confidence bands are commonly used to evaluate interactions in multiple linear regression (MLR) models, and the use of these techniques has recently been extended to multilevel or hierarchical linear modeling (HLM) and latent curve analysis (LCA). However, conducting these tests and plotting the…
Classical Testing in Functional Linear Models.
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications.
Classical Testing in Functional Linear Models
Kong, Dehan; Staicu, Ana-Maria; Maity, Arnab
2016-01-01
We extend four tests common in classical regression - Wald, score, likelihood ratio and F tests - to functional linear regression, for testing the null hypothesis, that there is no association between a scalar response and a functional covariate. Using functional principal component analysis, we re-express the functional linear model as a standard linear model, where the effect of the functional covariate can be approximated by a finite linear combination of the functional principal component scores. In this setting, we consider application of the four traditional tests. The proposed testing procedures are investigated theoretically for densely observed functional covariates when the number of principal components diverges. Using the theoretical distribution of the tests under the alternative hypothesis, we develop a procedure for sample size calculation in the context of functional linear regression. The four tests are further compared numerically for both densely and sparsely observed noisy functional data in simulation experiments and using two real data applications. PMID:28955155
Musuku, Adrien; Tan, Aimin; Awaiye, Kayode; Trabelsi, Fethi
2013-09-01
Linear calibration is usually performed using eight to ten calibration concentration levels in regulated LC-MS bioanalysis because a minimum of six are specified in regulatory guidelines. However, we have previously reported that two-concentration linear calibration is as reliable as or even better than using multiple concentrations. The purpose of this research is to compare two-concentration with multiple-concentration linear calibration through retrospective data analysis of multiple bioanalytical projects that were conducted in an independent regulated bioanalytical laboratory. A total of 12 bioanalytical projects were randomly selected: two validations and two studies for each of the three most commonly used types of sample extraction methods (protein precipitation, liquid-liquid extraction, solid-phase extraction). When the existing data were retrospectively linearly regressed using only the lowest and the highest concentration levels, no extra batch failure/QC rejection was observed and the differences in accuracy and precision between the original multi-concentration regression and the new two-concentration linear regression are negligible. Specifically, the differences in overall mean apparent bias (square root of mean individual bias squares) are within the ranges of -0.3% to 0.7% and 0.1-0.7% for the validations and studies, respectively. The differences in mean QC concentrations are within the ranges of -0.6% to 1.8% and -0.8% to 2.5% for the validations and studies, respectively. The differences in %CV are within the ranges of -0.7% to 0.9% and -0.3% to 0.6% for the validations and studies, respectively. The average differences in study sample concentrations are within the range of -0.8% to 2.3%. With two-concentration linear regression, an average of 13% of time and cost could have been saved for each batch together with 53% of saving in the lead-in for each project (the preparation of working standard solutions, spiking, and aliquoting). Furthermore, examples are given as how to evaluate the linearity over the entire concentration range when only two concentration levels are used for linear regression. To conclude, two-concentration linear regression is accurate and robust enough for routine use in regulated LC-MS bioanalysis and it significantly saves time and cost as well. Copyright © 2013 Elsevier B.V. All rights reserved.
A Linear Regression and Markov Chain Model for the Arabian Horse Registry
1993-04-01
as a tax deduction? Yes No T-4367 68 26. Regardless of previous equine tax deductions, do you consider your current horse activities to be... (Mark one...E L T-4367 A Linear Regression and Markov Chain Model For the Arabian Horse Registry Accesion For NTIS CRA&I UT 7 4:iC=D 5 D-IC JA" LI J:13tjlC,3 lO...the Arabian Horse Registry, which needed to forecast its future registration of purebred Arabian horses . A linear regression model was utilized to
An improved multiple linear regression and data analysis computer program package
NASA Technical Reports Server (NTRS)
Sidik, S. M.
1972-01-01
NEWRAP, an improved version of a previous multiple linear regression program called RAPIER, CREDUC, and CRSPLT, allows for a complete regression analysis including cross plots of the independent and dependent variables, correlation coefficients, regression coefficients, analysis of variance tables, t-statistics and their probability levels, rejection of independent variables, plots of residuals against the independent and dependent variables, and a canonical reduction of quadratic response functions useful in optimum seeking experimentation. A major improvement over RAPIER is that all regression calculations are done in double precision arithmetic.
Wu, Yao; Yang, Wei; Lu, Lijun; Lu, Zhentai; Zhong, Liming; Huang, Meiyan; Feng, Yanqiu; Feng, Qianjin; Chen, Wufan
2016-10-01
Attenuation correction is important for PET reconstruction. In PET/MR, MR intensities are not directly related to attenuation coefficients that are needed in PET imaging. The attenuation coefficient map can be derived from CT images. Therefore, prediction of CT substitutes from MR images is desired for attenuation correction in PET/MR. This study presents a patch-based method for CT prediction from MR images, generating attenuation maps for PET reconstruction. Because no global relation exists between MR and CT intensities, we propose local diffeomorphic mapping (LDM) for CT prediction. In LDM, we assume that MR and CT patches are located on 2 nonlinear manifolds, and the mapping from the MR manifold to the CT manifold approximates a diffeomorphism under a local constraint. Locality is important in LDM and is constrained by the following techniques. The first is local dictionary construction, wherein, for each patch in the testing MR image, a local search window is used to extract patches from training MR/CT pairs to construct MR and CT dictionaries. The k-nearest neighbors and an outlier detection strategy are then used to constrain the locality in MR and CT dictionaries. Second is local linear representation, wherein, local anchor embedding is used to solve MR dictionary coefficients when representing the MR testing sample. Under these local constraints, dictionary coefficients are linearly transferred from the MR manifold to the CT manifold and used to combine CT training samples to generate CT predictions. Our dataset contains 13 healthy subjects, each with T1- and T2-weighted MR and CT brain images. This method provides CT predictions with a mean absolute error of 110.1 Hounsfield units, Pearson linear correlation of 0.82, peak signal-to-noise ratio of 24.81 dB, and Dice in bone regions of 0.84 as compared with real CTs. CT substitute-based PET reconstruction has a regression slope of 1.0084 and R 2 of 0.9903 compared with real CT-based PET. In this method, no image segmentation or accurate registration is required. Our method demonstrates superior performance in CT prediction and PET reconstruction compared with competing methods. © 2016 by the Society of Nuclear Medicine and Molecular Imaging, Inc.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and is dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
Benign-malignant mass classification in mammogram using edge weighted local texture features
NASA Astrophysics Data System (ADS)
Rabidas, Rinku; Midya, Abhishek; Sadhu, Anup; Chakraborty, Jayasree
2016-03-01
This paper introduces novel Discriminative Robust Local Binary Pattern (DRLBP) and Discriminative Robust Local Ternary Pattern (DRLTP) for the classification of mammographic masses as benign or malignant. Mass is one of the common, however, challenging evidence of breast cancer in mammography and diagnosis of masses is a difficult task. Since DRLBP and DRLTP overcome the drawbacks of Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) by discriminating a brighter object against the dark background and vice-versa, in addition to the preservation of the edge information along with the texture information, several edge-preserving texture features are extracted, in this study, from DRLBP and DRLTP. Finally, a Fisher Linear Discriminant Analysis method is incorporated with discriminating features, selected by stepwise logistic regression method, for the classification of benign and malignant masses. The performance characteristics of DRLBP and DRLTP features are evaluated using a ten-fold cross-validation technique with 58 masses from the mini-MIAS database, and the best result is observed with DRLBP having an area under the receiver operating characteristic curve of 0.982.
Barnett, Elizabeth; Halverson, Joel
2001-01-01
Objectives. This study analyzed coronary heart disease (CHD) mortality trends from 1985 to 1995, by race and sex, among Black and White adults 35 years and older to determine whether adverse trends were evident in any US localities. Methods. Log-linear regression models of annual age-adjusted death rates provided a quantitative measure of local mortality trends. Results. Increasing trends in CHD mortality were observed in 11 of 174 labor market areas for Black women, 23 of 175 areas for Black men, 10 of 394 areas for White women, and 4 of 394 areas for White men. Nationwide, adverse trends affected 1.7% of Black women, 8.0% of Black men, 1.1% of White women, and 0.3% of White men. Conclusions. From 1985 to 1995, moderate to strong local increases in CHD mortality were observed, predominantly in the southern United States. Black men evidenced the most unfavorable trends and were 25 times as likely as White men to be part of a local population experiencing increases in coronary heart disease mortality. PMID:11527788
Fast neutron irradiation for locally advanced pancreatic cancer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, F.P.; Schein, P.S.; MacDonald, J.S.
1981-11-01
Nineteen patients with locally advanced pancreatic cancer and one patient with islet cell cancer were treated with 1700-1500 neutron rad alone or in combination with 5-fluorouracil to exploit the theoretic advantages of higher linear energy of transfer, and lower oxygen enhancement ratio of neutrons. Only 5 of 14 (36%) obtained partial tumor regression. The median survival for all patients with pancreatic cancer was 6 months, which is less than that reported with 5-fluorouracil and conventional photon irradiation. Gastrointestinal toxicity was considerable; hemorhagic gastritis in five patients, colitis in two and esophagitis in one. One patient developed radiation myelitis. We therefore,more » caution any enthusiasm for this modality of therapy until clear evidence of a therapeutic advantage over photon therapy is demonstrated in controlled clinical trials.« less
NASA Astrophysics Data System (ADS)
Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.
2015-03-01
During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained that learns a 3D spatial displacement function between the according reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxelwise forest predictions on landmark position contribute to a common voting map which reflects the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using a simple candidate selection and a single resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and from better search coverage.
Biostatistics Series Module 6: Correlation and Linear Regression.
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient ( r ). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r 2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation ( y = a + bx ), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous.
Biostatistics Series Module 6: Correlation and Linear Regression
Hazra, Avijit; Gogtay, Nithya
2016-01-01
Correlation and linear regression are the most commonly used techniques for quantifying the association between two numeric variables. Correlation quantifies the strength of the linear relationship between paired variables, expressing this as a correlation coefficient. If both variables x and y are normally distributed, we calculate Pearson's correlation coefficient (r). If normality assumption is not met for one or both variables in a correlation analysis, a rank correlation coefficient, such as Spearman's rho (ρ) may be calculated. A hypothesis test of correlation tests whether the linear relationship between the two variables holds in the underlying population, in which case it returns a P < 0.05. A 95% confidence interval of the correlation coefficient can also be calculated for an idea of the correlation in the population. The value r2 denotes the proportion of the variability of the dependent variable y that can be attributed to its linear relation with the independent variable x and is called the coefficient of determination. Linear regression is a technique that attempts to link two correlated variables x and y in the form of a mathematical equation (y = a + bx), such that given the value of one variable the other may be predicted. In general, the method of least squares is applied to obtain the equation of the regression line. Correlation and linear regression analysis are based on certain assumptions pertaining to the data sets. If these assumptions are not met, misleading conclusions may be drawn. The first assumption is that of linear relationship between the two variables. A scatter plot is essential before embarking on any correlation-regression analysis to show that this is indeed the case. Outliers or clustering within data sets can distort the correlation coefficient value. Finally, it is vital to remember that though strong correlation can be a pointer toward causation, the two are not synonymous. PMID:27904175
ERIC Educational Resources Information Center
Quinino, Roberto C.; Reis, Edna A.; Bessegato, Lupercio F.
2013-01-01
This article proposes the use of the coefficient of determination as a statistic for hypothesis testing in multiple linear regression based on distributions acquired by beta sampling. (Contains 3 figures.)
Statistical power analyses using G*Power 3.1: tests for correlation and regression analyses.
Faul, Franz; Erdfelder, Edgar; Buchner, Axel; Lang, Albert-Georg
2009-11-01
G*Power is a free power analysis program for a variety of statistical tests. We present extensions and improvements of the version introduced by Faul, Erdfelder, Lang, and Buchner (2007) in the domain of correlation and regression analyses. In the new version, we have added procedures to analyze the power of tests based on (1) single-sample tetrachoric correlations, (2) comparisons of dependent correlations, (3) bivariate linear regression, (4) multiple linear regression based on the random predictor model, (5) logistic regression, and (6) Poisson regression. We describe these new features and provide a brief introduction to their scope and handling.
Rasmussen, Patrick P.; Gray, John R.; Glysson, G. Douglas; Ziegler, Andrew C.
2009-01-01
In-stream continuous turbidity and streamflow data, calibrated with measured suspended-sediment concentration data, can be used to compute a time series of suspended-sediment concentration and load at a stream site. Development of a simple linear (ordinary least squares) regression model for computing suspended-sediment concentrations from instantaneous turbidity data is the first step in the computation process. If the model standard percentage error (MSPE) of the simple linear regression model meets a minimum criterion, this model should be used to compute a time series of suspended-sediment concentrations. Otherwise, a multiple linear regression model using paired instantaneous turbidity and streamflow data is developed and compared to the simple regression model. If the inclusion of the streamflow variable proves to be statistically significant and the uncertainty associated with the multiple regression model results in an improvement over that for the simple linear model, the turbidity-streamflow multiple linear regression model should be used to compute a suspended-sediment concentration time series. The computed concentration time series is subsequently used with its paired streamflow time series to compute suspended-sediment loads by standard U.S. Geological Survey techniques. Once an acceptable regression model is developed, it can be used to compute suspended-sediment concentration beyond the period of record used in model development with proper ongoing collection and analysis of calibration samples. Regression models to compute suspended-sediment concentrations are generally site specific and should never be considered static, but they represent a set period in a continually dynamic system in which additional data will help verify any change in sediment load, type, and source.
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatlands sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration over time c(t) evolution in the chamber headspace and estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model which is considered to be caused by violations of the underlying model assumptions. Especially the effects of turbulence and pressure disturbances by the chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
NASA Astrophysics Data System (ADS)
Kumar, V.; Melet, A.; Meyssignac, B.; Ganachaud, A.; Kessler, W. S.; Singh, A.; Aucan, J.
2018-02-01
Rising sea levels are a critical concern in small island nations. The problem is especially serious in the western south Pacific, where the total sea level rise over the last 60 years has been up to 3 times the global average. In this study, we aim at reconstructing sea levels at selected sites in the region (Suva, Lautoka—Fiji, and Nouméa—New Caledonia) as a multilinear regression (MLR) of atmospheric and oceanic variables. We focus on sea level variability at interannual-to-interdecadal time scales, and trend over the 1988-2014 period. Local sea levels are first expressed as a sum of steric and mass changes. Then a dynamical approach is used based on wind stress curl as a proxy for the thermosteric component, as wind stress curl anomalies can modulate the thermocline depth and resultant sea levels via Rossby wave propagation. Statistically significant predictors among wind stress curl, halosteric sea level, zonal/meridional wind stress components, and sea surface temperature are used to construct a MLR model simulating local sea levels. Although we are focusing on the local scale, the global mean sea level needs to be adjusted for. Our reconstructions provide insights on key drivers of sea level variability at the selected sites, showing that while local dynamics and the global signal modulate sea level to a given extent, most of the variance is driven by regional factors. On average, the MLR model is able to reproduce 82% of the variance in island sea level, and could be used to derive local sea level projections via downscaling of climate models.
NASA Astrophysics Data System (ADS)
Mahaboob, B.; Venkateswarlu, B.; Sankar, J. Ravi; Balasiddamuni, P.
2017-11-01
This paper uses matrix calculus techniques to obtain Nonlinear Least Squares Estimator (NLSE), Maximum Likelihood Estimator (MLE) and Linear Pseudo model for nonlinear regression model. David Pollard and Peter Radchenko [1] explained analytic techniques to compute the NLSE. However the present research paper introduces an innovative method to compute the NLSE using principles in multivariate calculus. This study is concerned with very new optimization techniques used to compute MLE and NLSE. Anh [2] derived NLSE and MLE of a heteroscedatistic regression model. Lemcoff [3] discussed a procedure to get linear pseudo model for nonlinear regression model. In this research article a new technique is developed to get the linear pseudo model for nonlinear regression model using multivariate calculus. The linear pseudo model of Edmond Malinvaud [4] has been explained in a very different way in this paper. David Pollard et.al used empirical process techniques to study the asymptotic of the LSE (Least-squares estimation) for the fitting of nonlinear regression function in 2006. In Jae Myung [13] provided a go conceptual for Maximum likelihood estimation in his work “Tutorial on maximum likelihood estimation
Real-time model learning using Incremental Sparse Spectrum Gaussian Process Regression.
Gijsberts, Arjan; Metta, Giorgio
2013-05-01
Novel applications in unstructured and non-stationary human environments require robots that learn from experience and adapt autonomously to changing conditions. Predictive models therefore not only need to be accurate, but should also be updated incrementally in real-time and require minimal human intervention. Incremental Sparse Spectrum Gaussian Process Regression is an algorithm that is targeted specifically for use in this context. Rather than developing a novel algorithm from the ground up, the method is based on the thoroughly studied Gaussian Process Regression algorithm, therefore ensuring a solid theoretical foundation. Non-linearity and a bounded update complexity are achieved simultaneously by means of a finite dimensional random feature mapping that approximates a kernel function. As a result, the computational cost for each update remains constant over time. Finally, algorithmic simplicity and support for automated hyperparameter optimization ensures convenience when employed in practice. Empirical validation on a number of synthetic and real-life learning problems confirms that the performance of Incremental Sparse Spectrum Gaussian Process Regression is superior with respect to the popular Locally Weighted Projection Regression, while computational requirements are found to be significantly lower. The method is therefore particularly suited for learning with real-time constraints or when computational resources are limited. Copyright © 2012 Elsevier Ltd. All rights reserved.
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
GIS Tools to Estimate Average Annual Daily Traffic
DOT National Transportation Integrated Search
2012-06-01
This project presents five tools that were created for a geographical information system to estimate Annual Average Daily : Traffic using linear regression. Three of the tools can be used to prepare spatial data for linear regression. One tool can be...
Zhang, Hua; Kurgan, Lukasz
2014-12-01
Knowledge of protein flexibility is vital for deciphering the corresponding functional mechanisms. This knowledge would help, for instance, in improving computational drug design and refinement in homology-based modeling. We propose a new predictor of the residue flexibility, which is expressed by B-factors, from protein chains that use local (in the chain) predicted (or native) relative solvent accessibility (RSA) and custom-derived amino acid (AA) alphabets. Our predictor is implemented as a two-stage linear regression model that uses RSA-based space in a local sequence window in the first stage and a reduced AA pair-based space in the second stage as the inputs. This method is easy to comprehend explicit linear form in both stages. Particle swarm optimization was used to find an optimal reduced AA alphabet to simplify the input space and improve the prediction performance. The average correlation coefficients between the native and predicted B-factors measured on a large benchmark dataset are improved from 0.65 to 0.67 when using the native RSA values and from 0.55 to 0.57 when using the predicted RSA values. Blind tests that were performed on two independent datasets show consistent improvements in the average correlation coefficients by a modest value of 0.02 for both native and predicted RSA-based predictions.
Jose F. Negron; Willis C. Schaupp; Kenneth E. Gibson; John Anhold; Dawn Hansen; Ralph Thier; Phil Mocettini
1999-01-01
Data collected from Douglas-fir stands infected by the Douglas-fir beetle in Wyoming, Montana, Idaho, and Utah, were used to develop models to estimate amount of mortality in terms of basal area killed. Models were built using stepwise linear regression and regression tree approaches. Linear regression models using initial Douglas-fir basal area were built for all...
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction model for health workforce and hospital beds in county hospitals of Hunan by multiple linear regression. We surveyed 16 counties in Hunan with stratified random sampling according to uniform questionnaires,and multiple linear regression analysis with 20 quotas selected by literature view was done. Independent variables in the multiple linear regression model on medical personnels in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the the population of aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction model shows good explanatory and fitting, and may be used for short- and mid-term forecasting.
An open-access CMIP5 pattern library for temperature and precipitation: description and methodology
NASA Astrophysics Data System (ADS)
Lynch, Cary; Hartin, Corinne; Bond-Lamberty, Ben; Kravitz, Ben
2017-05-01
Pattern scaling is used to efficiently emulate general circulation models and explore uncertainty in climate projections under multiple forcing scenarios. Pattern scaling methods assume that local climate changes scale with a global mean temperature increase, allowing for spatial patterns to be generated for multiple models for any future emission scenario. For uncertainty quantification and probabilistic statistical analysis, a library of patterns with descriptive statistics for each file would be beneficial, but such a library does not presently exist. Of the possible techniques used to generate patterns, the two most prominent are the delta and least squares regression methods. We explore the differences and statistical significance between patterns generated by each method and assess performance of the generated patterns across methods and scenarios. Differences in patterns across seasons between methods and epochs were largest in high latitudes (60-90° N/S). Bias and mean errors between modeled and pattern-predicted output from the linear regression method were smaller than patterns generated by the delta method. Across scenarios, differences in the linear regression method patterns were more statistically significant, especially at high latitudes. We found that pattern generation methodologies were able to approximate the forced signal of change to within ≤ 0.5 °C, but the choice of pattern generation methodology for pattern scaling purposes should be informed by user goals and criteria. This paper describes our library of least squares regression patterns from all CMIP5 models for temperature and precipitation on an annual and sub-annual basis, along with the code used to generate these patterns. The dataset and netCDF data generation code are available at doi:10.5281/zenodo.495632.
Watanabe, Hiroyuki; Miyazaki, Hiroyasu
2006-01-01
Over- and/or under-correction of QT intervals for changes in heart rate may lead to misleading conclusions and/or masking the potential of a drug to prolong the QT interval. This study examines a nonparametric regression model (Loess Smoother) to adjust the QT interval for differences in heart rate, with an improved fitness over a wide range of heart rates. 240 sets of (QT, RR) observations collected from each of 8 conscious and non-treated beagle dogs were used as the materials for investigation. The fitness of the nonparametric regression model to the QT-RR relationship was compared with four models (individual linear regression, common linear regression, and Bazett's and Fridericia's correlation models) with reference to Akaike's Information Criterion (AIC). Residuals were visually assessed. The bias-corrected AIC of the nonparametric regression model was the best of the models examined in this study. Although the parametric models did not fit, the nonparametric regression model improved the fitting at both fast and slow heart rates. The nonparametric regression model is the more flexible method compared with the parametric method. The mathematical fit for linear regression models was unsatisfactory at both fast and slow heart rates, while the nonparametric regression model showed significant improvement at all heart rates in beagle dogs.
Hashimoto, Ken; Zúniga, Concepción; Romero, Eduardo; Morales, Zoraida; Maguire, James H
2015-01-01
Central American countries face a major challenge in the control of Triatoma dimidiata, a widespread vector of Chagas disease that cannot be eliminated. The key to maintaining the risk of transmission of Trypanosoma cruzi at lowest levels is to sustain surveillance throughout endemic areas. Guatemala, El Salvador, and Honduras integrated community-based vector surveillance into local health systems. Community participation was effective in detection of the vector, but some health services had difficulty sustaining their response to reports of vectors from the population. To date, no research has investigated how best to maintain and reinforce health service responsiveness, especially in resource-limited settings. We reviewed surveillance and response records of 12 health centers in Guatemala, El Salvador, and Honduras from 2008 to 2012 and analyzed the data in relation to the volume of reports of vector infestation, local geography, demography, human resources, managerial approach, and results of interviews with health workers. Health service responsiveness was defined as the percentage of households that reported vector infestation for which the local health service provided indoor residual spraying of insecticide or educational advice. Eight potential determinants of responsiveness were evaluated by linear and mixed-effects multi-linear regression. Health service responsiveness (overall 77.4%) was significantly associated with quarterly monitoring by departmental health offices. Other potential determinants of responsiveness were not found to be significant, partly because of short- and long-term strategies, such as temporary adjustments in manpower and redistribution of tasks among local participants in the effort. Consistent monitoring within the local health system contributes to sustainability of health service responsiveness in community-based vector surveillance of Chagas disease. Even with limited resources, countries can improve health service responsiveness with thoughtful strategies and management practices in the local health systems.
Dussaillant, Francisca; Apablaza, Mauricio
2017-08-01
After a major earthquake, the assignment of scarce mental health emergency personnel to different geographic areas is crucial to the effective management of the crisis. The scarce information that is available in the aftermath of a disaster may be valuable in helping predict where are the populations that are in most need. The objectives of this study were to derive algorithms to predict posttraumatic stress (PTS) symptom prevalence and local distribution after an earthquake and to test whether there are algorithms that require few input data and are still reasonably predictive. A rich database of PTS symptoms, informed after Chile's 2010 earthquake and tsunami, was used. Several model specifications for the mean and centiles of the distribution of PTS symptoms, together with posttraumatic stress disorder (PTSD) prevalence, were estimated via linear and quantile regressions. The models varied in the set of covariates included. Adjusted R2 for the most liberal specifications (in terms of numbers of covariates included) ranged from 0.62 to 0.74, depending on the outcome. When only including peak ground acceleration (PGA), poverty rate, and household damage in linear and quadratic form, predictive capacity was still good (adjusted R2 from 0.59 to 0.67 were obtained). Information about local poverty, household damage, and PGA can be used as an aid to predict PTS symptom prevalence and local distribution after an earthquake. This can be of help to improve the assignment of mental health personnel to the affected localities. Dussaillant F , Apablaza M . Predicting posttraumatic stress symptom prevalence and local distribution after an earthquake with scarce data. Prehosp Disaster Med. 2017;32(4):357-367.
Linear regression analysis: part 14 of a series on evaluation of scientific publications.
Schneider, Astrid; Hommel, Gerhard; Blettner, Maria
2010-11-01
Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
Grajeda, Laura M; Ivanescu, Andrada; Saito, Mayuko; Crainiceanu, Ciprian; Jaganath, Devan; Gilman, Robert H; Crabtree, Jean E; Kelleher, Dermott; Cabrera, Lilia; Cama, Vitaliano; Checkley, William
2016-01-01
Childhood growth is a cornerstone of pediatric research. Statistical models need to consider individual trajectories to adequately describe growth outcomes. Specifically, well-defined longitudinal models are essential to characterize both population and subject-specific growth. Linear mixed-effect models with cubic regression splines can account for the nonlinearity of growth curves and provide reasonable estimators of population and subject-specific growth, velocity and acceleration. We provide a stepwise approach that builds from simple to complex models, and account for the intrinsic complexity of the data. We start with standard cubic splines regression models and build up to a model that includes subject-specific random intercepts and slopes and residual autocorrelation. We then compared cubic regression splines vis-à-vis linear piecewise splines, and with varying number of knots and positions. Statistical code is provided to ensure reproducibility and improve dissemination of methods. Models are applied to longitudinal height measurements in a cohort of 215 Peruvian children followed from birth until their fourth year of life. Unexplained variability, as measured by the variance of the regression model, was reduced from 7.34 when using ordinary least squares to 0.81 (p < 0.001) when using a linear mixed-effect models with random slopes and a first order continuous autoregressive error term. There was substantial heterogeneity in both the intercept (p < 0.001) and slopes (p < 0.001) of the individual growth trajectories. We also identified important serial correlation within the structure of the data (ρ = 0.66; 95 % CI 0.64 to 0.68; p < 0.001), which we modeled with a first order continuous autoregressive error term as evidenced by the variogram of the residuals and by a lack of association among residuals. The final model provides a parametric linear regression equation for both estimation and prediction of population- and individual-level growth in height. We show that cubic regression splines are superior to linear regression splines for the case of a small number of knots in both estimation and prediction with the full linear mixed effect model (AIC 19,352 vs. 19,598, respectively). While the regression parameters are more complex to interpret in the former, we argue that inference for any problem depends more on the estimated curve or differences in curves rather than the coefficients. Moreover, use of cubic regression splines provides biological meaningful growth velocity and acceleration curves despite increased complexity in coefficient interpretation. Through this stepwise approach, we provide a set of tools to model longitudinal childhood data for non-statisticians using linear mixed-effect models.
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach
NASA Astrophysics Data System (ADS)
Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew
2017-05-01
This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
Regression Model Term Selection for the Analysis of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert Manfred; Volden, Thomas R.
2010-01-01
The paper discusses the selection of regression model terms for the analysis of wind tunnel strain-gage balance calibration data. Different function class combinations are presented that may be used to analyze calibration data using either a non-iterative or an iterative method. The role of the intercept term in a regression model of calibration data is reviewed. In addition, useful algorithms and metrics originating from linear algebra and statistics are recommended that will help an analyst (i) to identify and avoid both linear and near-linear dependencies between regression model terms and (ii) to make sure that the selected regression model of the calibration data uses only statistically significant terms. Three different tests are suggested that may be used to objectively assess the predictive capability of the final regression model of the calibration data. These tests use both the original data points and regression model independent confirmation points. Finally, data from a simplified manual calibration of the Ames MK40 balance is used to illustrate the application of some of the metrics and tests to a realistic calibration data set.
SU-C-304-05: Use of Local Noise Power Spectrum and Wavelets in Comprehensive EPID Quality Assurance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lee, S; Gopal, A; Yan, G
2015-06-15
Purpose: As EPIDs are increasingly used for IMRT QA and real-time treatment verification, comprehensive quality assurance (QA) of EPIDs becomes critical. Current QA with phantoms such as the Las Vegas and PIPSpro™ can fail in the early detection of EPID artifacts. Beyond image quality assessment, we propose a quantitative methodology using local noise power spectrum (NPS) to characterize image noise and wavelet transform to identify bad pixels and inter-subpanel flat-fielding artifacts. Methods: A total of 93 image sets including bar-pattern images and open exposure images were collected from four iViewGT a-Si EPID systems over three years. Quantitative metrics such asmore » modulation transform function (MTF), NPS and detective quantum efficiency (DQE) were computed for each image set. Local 2D NPS was calculated for each subpanel. A 1D NPS was obtained by radial averaging the 2D NPS and fitted to a power-law function. R-square and slope of the linear regression analysis were used for panel performance assessment. Haar wavelet transformation was employed to identify pixel defects and non-uniform gain correction across subpanels. Results: Overall image quality was assessed with DQE based on empirically derived area under curve (AUC) thresholds. Using linear regression analysis of 1D NPS, panels with acceptable flat fielding were indicated by r-square between 0.8 and 1, and slopes of −0.4 to −0.7. However, for panels requiring flat fielding recalibration, r-square values less than 0.8 and slopes from +0.2 to −0.4 were observed. The wavelet transform successfully identified pixel defects and inter-subpanel flat fielding artifacts. Standard QA with the Las Vegas and PIPSpro phantoms failed to detect these artifacts. Conclusion: The proposed QA methodology is promising for the early detection of imaging and dosimetric artifacts of EPIDs. Local NPS can accurately characterize the noise level within each subpanel, while the wavelet transforms can detect bad pixels and inter-subpanel flat fielding artifacts.« less
NASA Astrophysics Data System (ADS)
Lorenzi, Juan M.; Stecher, Thomas; Reuter, Karsten; Matera, Sebastian
2017-10-01
Many problems in computational materials science and chemistry require the evaluation of expensive functions with locally rapid changes, such as the turn-over frequency of first principles kinetic Monte Carlo models for heterogeneous catalysis. Because of the high computational cost, it is often desirable to replace the original with a surrogate model, e.g., for use in coupled multiscale simulations. The construction of surrogates becomes particularly challenging in high-dimensions. Here, we present a novel version of the modified Shepard interpolation method which can overcome the curse of dimensionality for such functions to give faithful reconstructions even from very modest numbers of function evaluations. The introduction of local metrics allows us to take advantage of the fact that, on a local scale, rapid variation often occurs only across a small number of directions. Furthermore, we use local error estimates to weigh different local approximations, which helps avoid artificial oscillations. Finally, we test our approach on a number of challenging analytic functions as well as a realistic kinetic Monte Carlo model. Our method not only outperforms existing isotropic metric Shepard methods but also state-of-the-art Gaussian process regression.
Lorenzi, Juan M; Stecher, Thomas; Reuter, Karsten; Matera, Sebastian
2017-10-28
Many problems in computational materials science and chemistry require the evaluation of expensive functions with locally rapid changes, such as the turn-over frequency of first principles kinetic Monte Carlo models for heterogeneous catalysis. Because of the high computational cost, it is often desirable to replace the original with a surrogate model, e.g., for use in coupled multiscale simulations. The construction of surrogates becomes particularly challenging in high-dimensions. Here, we present a novel version of the modified Shepard interpolation method which can overcome the curse of dimensionality for such functions to give faithful reconstructions even from very modest numbers of function evaluations. The introduction of local metrics allows us to take advantage of the fact that, on a local scale, rapid variation often occurs only across a small number of directions. Furthermore, we use local error estimates to weigh different local approximations, which helps avoid artificial oscillations. Finally, we test our approach on a number of challenging analytic functions as well as a realistic kinetic Monte Carlo model. Our method not only outperforms existing isotropic metric Shepard methods but also state-of-the-art Gaussian process regression.
Local variations in the timing of RSV epidemics.
Noveroske, Douglas B; Warren, Joshua L; Pitzer, Virginia E; Weinberger, Daniel M
2016-11-11
Respiratory syncytial virus (RSV) is a primary cause of hospitalizations in children worldwide. The timing of seasonal RSV epidemics needs to be known in order to administer prophylaxis to high-risk infants at the appropriate time. We used data from the Connecticut State Inpatient Database to identify RSV hospitalizations based on ICD-9 diagnostic codes. Harmonic regression analyses were used to evaluate RSV epidemic timing at the county level and ZIP code levels. Linear regression was used to investigate associations between the socioeconomic status of a locality and RSV epidemic timing. 9,740 hospitalizations coded as RSV occurred among children less than 2 years old between July 1, 1997 and June 30, 2013. The earliest ZIP code had a seasonal RSV epidemic that peaked, on average, 4.64 weeks earlier than the latest ZIP code. Earlier epidemic timing was significantly associated with demographic characteristics (higher population density and larger fraction of the population that was black). Seasonal RSV epidemics in Connecticut occurred earlier in areas that were more urban (higher population density and larger fraction of the population that was). These findings could be used to better time the administration of prophylaxis to high-risk infants.
Robust extraction of baseline signal of atmospheric trace species using local regression
NASA Astrophysics Data System (ADS)
Ruckstuhl, A. F.; Henne, S.; Reimann, S.; Steinbacher, M.; Vollmer, M. K.; O'Doherty, S.; Buchmann, B.; Hueglin, C.
2012-11-01
The identification of atmospheric trace species measurements that are representative of well-mixed background air masses is required for monitoring atmospheric composition change at background sites. We present a statistical method based on robust local regression that is well suited for the selection of background measurements and the estimation of associated baseline curves. The bootstrap technique is applied to calculate the uncertainty in the resulting baseline curve. The non-parametric nature of the proposed approach makes it a very flexible data filtering method. Application to carbon monoxide (CO) measured from 1996 to 2009 at the high-alpine site Jungfraujoch (Switzerland, 3580 m a.s.l.), and to measurements of 1,1-difluoroethane (HFC-152a) from Jungfraujoch (2000 to 2009) and Mace Head (Ireland, 1995 to 2009) demonstrates the feasibility and usefulness of the proposed approach. The determined average annual change of CO at Jungfraujoch for the 1996 to 2009 period as estimated from filtered annual mean CO concentrations is -2.2 ± 1.1 ppb yr-1. For comparison, the linear trend of unfiltered CO measurements at Jungfraujoch for this time period is -2.9 ± 1.3 ppb yr-1.
Locally Weighted Score Estimation for Quantile Classification in Binary Regression Models
Rice, John D.; Taylor, Jeremy M. G.
2016-01-01
One common use of binary response regression methods is classification based on an arbitrary probability threshold dictated by the particular application. Since this is given to us a priori, it is sensible to incorporate the threshold into our estimation procedure. Specifically, for the linear logistic model, we solve a set of locally weighted score equations, using a kernel-like weight function centered at the threshold. The bandwidth for the weight function is selected by cross validation of a novel hybrid loss function that combines classification error and a continuous measure of divergence between observed and fitted values; other possible cross-validation functions based on more common binary classification metrics are also examined. This work has much in common with robust estimation, but diers from previous approaches in this area in its focus on prediction, specifically classification into high- and low-risk groups. Simulation results are given showing the reduction in error rates that can be obtained with this method when compared with maximum likelihood estimation, especially under certain forms of model misspecification. Analysis of a melanoma data set is presented to illustrate the use of the method in practice. PMID:28018492
Vengeance, Condomless Sex and HIV Disclosure Among Men Who Have Sex with Men Living with HIV.
Brown, Monique J; Serovich, Julianne M; Kimberly, Judy A; Hu, Jinxiang
2017-09-01
Vengeance has been shown to be a risk factor for HIV nondisclosure. Research examining the associations between vengeance, condomless sex, and HIV nondisclosure is lacking. The aim of the current study was to explore the association between vengeance, condomless sex and disclosure (behavior, attitude and intention) among men who have sex with men (MSM) living with HIV. Participants included 266 MSM who were a part of a disclosure intervention study. Men were recruited from local and state AIDS service organizations (ASOs), HIV-related venues and forums, and at local eating and drinking establishments in Tampa, Florida, and Columbus and Dayton, Ohio metropolitan statistical areas (MSAs). Advertisements were also placed in local daily newspapers. Vengeance was operationalized into three groups based on percentiles (least, more, and most vengeful) and as a continuous variable. Crude and multivariable logistic regression models were used to examine the association between vengeance and condomless sex in the past 30 days. Simple and multiple linear regression models were used to determine the association between vengeance and HIV disclosure. After adjusting for demographic and geographic characteristics, participants who were "most vengeful" had, on average, an approximate six-point decrease (β: -5.46; 95% CI -9.55, -1.36) in disclosure intention compared to MSM who were "least vengeful." Prevention and intervention programs geared towards improving disclosure among MSM should address vengeance.
A simple approach to lifetime learning in genetic programming-based symbolic regression.
Azad, Raja Muhammad Atif; Ryan, Conor
2014-01-01
Genetic programming (GP) coarsely models natural evolution to evolve computer programs. Unlike in nature, where individuals can often improve their fitness through lifetime experience, the fitness of GP individuals generally does not change during their lifetime, and there is usually no opportunity to pass on acquired knowledge. This paper introduces the Chameleon system to address this discrepancy and augment GP with lifetime learning by adding a simple local search that operates by tuning the internal nodes of individuals. Although not the first attempt to combine local search with GP, its simplicity means that it is easy to understand and cheap to implement. A simple cache is added which leverages the local search to reduce the tuning cost to a small fraction of the expected cost, and we provide a theoretical upper limit on the maximum tuning expense given the average tree size of the population and show that this limit grows very conservatively as the average tree size of the population increases. We show that Chameleon uses available genetic material more efficiently by exploring more actively than with standard GP, and demonstrate that not only does Chameleon outperform standard GP (on both training and test data) over a number of symbolic regression type problems, it does so by producing smaller individuals and it works harmoniously with two other well-known extensions to GP, namely, linear scaling and a diversity-promoting tournament selection method.
Scarneciu, Camelia C; Sangeorzan, Livia; Rus, Horatiu; Scarneciu, Vlad D; Varciu, Mihai S; Andreescu, Oana; Scarneciu, Ioan
2017-01-01
This study aimed at assessing the incidence of pulmonary hypertension (PH) at newly diagnosed hyperthyroid patients and at finding a simple model showing the complex functional relation between pulmonary hypertension in hyperthyroidism and the factors causing it. The 53 hyperthyroid patients (H-group) were evaluated mainly by using an echocardiographical method and compared with 35 euthyroid (E-group) and 25 healthy people (C-group). In order to identify the factors causing pulmonary hypertension the statistical method of comparing the values of arithmetical means is used. The functional relation between the two random variables (PAPs and each of the factors determining it within our research study) can be expressed by linear or non-linear function. By applying the linear regression method described by a first-degree equation the line of regression (linear model) has been determined; by applying the non-linear regression method described by a second degree equation, a parabola-type curve of regression (non-linear or polynomial model) has been determined. We made the comparison and the validation of these two models by calculating the determination coefficient (criterion 1), the comparison of residuals (criterion 2), application of AIC criterion (criterion 3) and use of F-test (criterion 4). From the H-group, 47% have pulmonary hypertension completely reversible when obtaining euthyroidism. The factors causing pulmonary hypertension were identified: previously known- level of free thyroxin, pulmonary vascular resistance, cardiac output; new factors identified in this study- pretreatment period, age, systolic blood pressure. According to the four criteria and to the clinical judgment, we consider that the polynomial model (graphically parabola- type) is better than the linear one. The better model showing the functional relation between the pulmonary hypertension in hyperthyroidism and the factors identified in this study is given by a polynomial equation of second degree where the parabola is its graphical representation.
As a fast and effective technique, the multiple linear regression (MLR) method has been widely used in modeling and prediction of beach bacteria concentrations. Among previous works on this subject, however, several issues were insufficiently or inconsistently addressed. Those is...
A simplified competition data analysis for radioligand specific activity determination.
Venturino, A; Rivera, E S; Bergoc, R M; Caro, R A
1990-01-01
Non-linear regression and two-step linear fit methods were developed to determine the actual specific activity of 125I-ovine prolactin by radioreceptor self-displacement analysis. The experimental results obtained by the different methods are superposable. The non-linear regression method is considered to be the most adequate procedure to calculate the specific activity, but if its software is not available, the other described methods are also suitable.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements explored to tracking nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a sequence of these factors may not allow accurate estimation or measurements; in those cases, it can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations which coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than that obtained with conventional linear regressions. In all the cases, the predictions are non-sensitive to ethnicity, and to gender, if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
NASA Astrophysics Data System (ADS)
Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.
2009-08-01
In this study two techniques, for modeling electricity consumption of the Jordanian industrial sector, are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables that have significant effect on future electrical power demand. The results showed that both the multivariate linear regression and neuro-fuzzy models are generally comparable and can be used adequately to simulate industrial electricity consumption. However, comparison that is based on the square root average squared error of data suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
Carvalho, Carlos; Gomes, Danielo G.; Agoulmine, Nazim; de Souza, José Neuman
2011-01-01
This paper proposes a method based on multivariate spatial and temporal correlation to improve prediction accuracy in data reduction for Wireless Sensor Networks (WSN). Prediction of data not sent to the sink node is a technique used to save energy in WSNs by reducing the amount of data traffic. However, it may not be very accurate. Simulations were made involving simple linear regression and multiple linear regression functions to assess the performance of the proposed method. The results show a higher correlation between gathered inputs when compared to time, which is an independent variable widely used for prediction and forecasting. Prediction accuracy is lower when simple linear regression is used, whereas multiple linear regression is the most accurate one. In addition to that, our proposal outperforms some current solutions by about 50% in humidity prediction and 21% in light prediction. To the best of our knowledge, we believe that we are probably the first to address prediction based on multivariate correlation for WSN data reduction. PMID:22346626
Confidence limits for data mining models of options prices
NASA Astrophysics Data System (ADS)
Healy, J. V.; Dixon, M.; Read, B. J.; Cai, F. F.
2004-12-01
Non-parametric methods such as artificial neural nets can successfully model prices of financial options, out-performing the Black-Scholes analytic model (Eur. Phys. J. B 27 (2002) 219). However, the accuracy of such approaches is usually expressed only by a global fitting/error measure. This paper describes a robust method for determining prediction intervals for models derived by non-linear regression. We have demonstrated it by application to a standard synthetic example (29th Annual Conference of the IEEE Industrial Electronics Society, Special Session on Intelligent Systems, pp. 1926-1931). The method is used here to obtain prediction intervals for option prices using market data for LIFFE “ESX” FTSE 100 index options ( http://www.liffe.com/liffedata/contracts/month_onmonth.xls). We avoid special neural net architectures and use standard regression procedures to determine local error bars. The method is appropriate for target data with non constant variance (or volatility).
Using the Ridge Regression Procedures to Estimate the Multiple Linear Regression Coefficients
NASA Astrophysics Data System (ADS)
Gorgees, HazimMansoor; Mahdi, FatimahAssim
2018-05-01
This article concerns with comparing the performance of different types of ordinary ridge regression estimators that have been already proposed to estimate the regression parameters when the near exact linear relationships among the explanatory variables is presented. For this situations we employ the data obtained from tagi gas filling company during the period (2008-2010). The main result we reached is that the method based on the condition number performs better than other methods since it has smaller mean square error (MSE) than the other stated methods.
NASA Astrophysics Data System (ADS)
Hoffman, A.; Forest, C. E.; Kemanian, A.
2016-12-01
A significant number of food-insecure nations exist in regions of the world where dust plays a large role in the climate system. While the impacts of common climate variables (e.g. temperature, precipitation, ozone, and carbon dioxide) on crop yields are relatively well understood, the impact of mineral aerosols on yields have not yet been thoroughly investigated. This research aims to develop the data and tools to progress our understanding of mineral aerosol impacts on crop yields. Suspended dust affects crop yields by altering the amount and type of radiation reaching the plant, modifying local temperature and precipitation. While dust events (i.e. dust storms) affect crop yields by depleting the soil of nutrients or by defoliation via particle abrasion. The impact of dust on yields is modeled statistically because we are uncertain which impacts will dominate the response on national and regional scales considered in this study. Multiple linear regression is used in a number of large-scale statistical crop modeling studies to estimate yield responses to various climate variables. In alignment with previous work, we develop linear crop models, but build upon this simple method of regression with machine-learning techniques (e.g. random forests) to identify important statistical predictors and isolate how dust affects yields on the scales of interest. To perform this analysis, we develop a crop-climate dataset for maize, soybean, groundnut, sorghum, rice, and wheat for the regions of West Africa, East Africa, South Africa, and the Sahel. Random forest regression models consistently model historic crop yields better than the linear models. In several instances, the random forest models accurately capture the temperature and precipitation threshold behavior in crops. Additionally, improving agricultural technology has caused a well-documented positive trend that dominates time series of global and regional yields. This trend is often removed before regression with traditional crop models, but likely at the cost of removing climate information. Our random forest models consistently discover the positive trend without removing any additional data. The application of random forests as a statistical crop model provides insight into understanding the impact of dust on yields in marginal food producing regions.
Alzheimer's Disease Detection by Pseudo Zernike Moment and Linear Regression Classification.
Wang, Shui-Hua; Du, Sidan; Zhang, Yin; Phillips, Preetha; Wu, Le-Nan; Chen, Xian-Qing; Zhang, Yu-Dong
2017-01-01
This study presents an improved method based on "Gorji et al. Neuroscience. 2015" by introducing a relatively new classifier-linear regression classification. Our method selects one axial slice from 3D brain image, and employed pseudo Zernike moment with maximum order of 15 to extract 256 features from each image. Finally, linear regression classification was harnessed as the classifier. The proposed approach obtains an accuracy of 97.51%, a sensitivity of 96.71%, and a specificity of 97.73%. Our method performs better than Gorji's approach and five other state-of-the-art approaches. Copyright© Bentham Science Publishers; For any queries, please email at epub@benthamscience.org.
New Approach To Hour-By-Hour Weather Forecast
NASA Astrophysics Data System (ADS)
Liao, Q. Q.; Wang, B.
2017-12-01
Fine hourly forecast in single station weather forecast is required in many human production and life application situations. Most previous MOS (Model Output Statistics) which used a linear regression model are hard to solve nonlinear natures of the weather prediction and forecast accuracy has not been sufficient at high temporal resolution. This study is to predict the future meteorological elements including temperature, precipitation, relative humidity and wind speed in a local region over a relatively short period of time at hourly level. By means of hour-to-hour NWP (Numeral Weather Prediction)meteorological field from Forcastio (https://darksky.net/dev/docs/forecast) and real-time instrumental observation including 29 stations in Yunnan and 3 stations in Tianjin of China from June to October 2016, predictions are made of the 24-hour hour-by-hour ahead. This study presents an ensemble approach to combine the information of instrumental observation itself and NWP. Use autoregressive-moving-average (ARMA) model to predict future values of the observation time series. Put newest NWP products into the equations derived from the multiple linear regression MOS technique. Handle residual series of MOS outputs with autoregressive (AR) model for the linear property presented in time series. Due to the complexity of non-linear property of atmospheric flow, support vector machine (SVM) is also introduced . Therefore basic data quality control and cross validation makes it able to optimize the model function parameters , and do 24 hours ahead residual reduction with AR/SVM model. Results show that AR model technique is better than corresponding multi-variant MOS regression method especially at the early 4 hours when the predictor is temperature. MOS-AR combined model which is comparable to MOS-SVM model outperform than MOS. Both of their root mean square error and correlation coefficients for 2 m temperature are reduced to 1.6 degree Celsius and 0.91 respectively. The forecast accuracy of 24- hour forecast deviation no more than 2 degree Celsius is 78.75 % for MOS-AR model and 81.23 % for AR model.
Comparison of machine-learning methods for above-ground biomass estimation based on Landsat imagery
NASA Astrophysics Data System (ADS)
Wu, Chaofan; Shen, Huanhuan; Shen, Aihua; Deng, Jinsong; Gan, Muye; Zhu, Jinxia; Xu, Hongwei; Wang, Ke
2016-07-01
Biomass is one significant biophysical parameter of a forest ecosystem, and accurate biomass estimation on the regional scale provides important information for carbon-cycle investigation and sustainable forest management. In this study, Landsat satellite imagery data combined with field-based measurements were integrated through comparisons of five regression approaches [stepwise linear regression, K-nearest neighbor, support vector regression, random forest (RF), and stochastic gradient boosting] with two different candidate variable strategies to implement the optimal spatial above-ground biomass (AGB) estimation. The results suggested that RF algorithm exhibited the best performance by 10-fold cross-validation with respect to R2 (0.63) and root-mean-square error (26.44 ton/ha). Consequently, the map of estimated AGB was generated with a mean value of 89.34 ton/ha in northwestern Zhejiang Province, China, with a similar pattern to the distribution mode of local forest species. This research indicates that machine-learning approaches associated with Landsat imagery provide an economical way for biomass estimation. Moreover, ensemble methods using all candidate variables, especially for Landsat images, provide an alternative for regional biomass simulation.
Prediction of hourly PM2.5 using a space-time support vector regression model
NASA Astrophysics Data System (ADS)
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
Kwan, Johnny S H; Kung, Annie W C; Sham, Pak C
2011-09-01
Selective genotyping can increase power in quantitative trait association. One example of selective genotyping is two-tail extreme selection, but simple linear regression analysis gives a biased genetic effect estimate. Here, we present a simple correction for the bias.
Predicting residue-wise contact orders in proteins by support vector regression.
Song, Jiangning; Burrage, Kevin
2006-10-03
The residue-wise contact order (RWCO) describes the sequence separations between the residues of interest and its contacting residues in a protein sequence. It is a new kind of one-dimensional protein structure that represents the extent of long-range contacts and is considered as a generalization of contact order. Together with secondary structure, accessible surface area, the B factor, and contact number, RWCO provides comprehensive and indispensable important information to reconstructing the protein three-dimensional structure from a set of one-dimensional structural properties. Accurately predicting RWCO values could have many important applications in protein three-dimensional structure prediction and protein folding rate prediction, and give deep insights into protein sequence-structure relationships. We developed a novel approach to predict residue-wise contact order values in proteins based on support vector regression (SVR), starting from primary amino acid sequences. We explored seven different sequence encoding schemes to examine their effects on the prediction performance, including local sequence in the form of PSI-BLAST profiles, local sequence plus amino acid composition, local sequence plus molecular weight, local sequence plus secondary structure predicted by PSIPRED, local sequence plus molecular weight and amino acid composition, local sequence plus molecular weight and predicted secondary structure, and local sequence plus molecular weight, amino acid composition and predicted secondary structure. When using local sequences with multiple sequence alignments in the form of PSI-BLAST profiles, we could predict the RWCO distribution with a Pearson correlation coefficient (CC) between the predicted and observed RWCO values of 0.55, and root mean square error (RMSE) of 0.82, based on a well-defined dataset with 680 protein sequences. Moreover, by incorporating global features such as molecular weight and amino acid composition we could further improve the prediction performance with the CC to 0.57 and an RMSE of 0.79. In addition, combining the predicted secondary structure by PSIPRED was found to significantly improve the prediction performance and could yield the best prediction accuracy with a CC of 0.60 and RMSE of 0.78, which provided at least comparable performance compared with the other existing methods. The SVR method shows a prediction performance competitive with or at least comparable to the previously developed linear regression-based methods for predicting RWCO values. In contrast to support vector classification (SVC), SVR is very good at estimating the raw value profiles of the samples. The successful application of the SVR approach in this study reinforces the fact that support vector regression is a powerful tool in extracting the protein sequence-structure relationship and in estimating the protein structural profiles from amino acid sequences.
2013-01-01
application of the Hammett equation with the constants rph in the chemistry of organophosphorus compounds, Russ. Chem. Rev. 38 (1969) 795–811. [13...of oximes and OP compounds and the ability of oximes to reactivate OP- inhibited AChE. Multiple linear regression equations were analyzed using...phosphonate pairs, 21 oxime/ phosphoramidate pairs and 12 oxime/phosphate pairs. The best linear regression equation resulting from multiple regression anal
Singh, Simone R; Young, Gary J
2017-12-01
To investigate whether tax-exempt hospitals' investments in community health are associated with patterns of governmental public health spending focusing specifically on the relationship between hospitals' community benefit expenditures and the spending patterns of local health departments (LHDs). We combined data on tax-exempt hospitals' community benefit spending with data on spending by the corresponding LHD that served the county in which a hospital was located. Data were available for 2 years, 2009 and 2013. Generalized linear regressions were estimated with indicators of hospital community benefit spending as the dependent variable and LHD spending as the key independent variable. Hospital community benefit spending was unrelated to how much local public health agencies spent, per capita, on public health in their communities. Patterns of local public health spending do not appear to impact the investments of tax-exempt hospitals in community health activities. Opportunities may, however, exist for a more active engagement between the public and private sector to ensure that the expenditures of all stakeholders involved in community health improvement efforts complement one another. © Health Research and Educational Trust.
Oxygen isotope variations in phosphate of deer bones
NASA Astrophysics Data System (ADS)
Luz, Boaz; Cormie, Allison B.; Schwarcz, Henry P.
1990-06-01
Variations of δ 18O of bone phosphate (δ p) of white tailed deer were studied in samples with wide geographic distribution in North America. Bones from the same locality have similar isotopic values, and the difference between specimens (0.4‰) is not large relative to the measurement error (0.3‰). The total range of δ p values is about 12‰. This indicates that deer use water from a relatively small area, and thus their δ p indicates local environmental conditions. Multiple regression analysis between oxygen isotope composition of deer bone phosphate and of local relative humidity and precipitation (δ w) yields a high correlation coefficient (0.95). This correlation is significantly better than the linear correlation (0.81) between δ p and δ w of precipitation alone. Thus δ p depends on both isotopic composition of precipitation and on relative humidity. This is because deer obtain most of their water from leaves, the isotopic composition of which is partly controlled by relative humidity through evaporation/transpiration.
Institutional and Economic Determinants of Public Health System Performance
Mays, Glen P.; McHugh, Megan C.; Shim, Kyumin; Perry, Natalie; Lenaway, Dennis; Halverson, Paul K.; Moonesinghe, Ramal
2006-01-01
Objectives. Although a growing body of evidence demonstrates that availability and quality of essential public health services vary widely across communities, relatively little is known about the factors that give rise to these variations. We examined the association of institutional, financial, and community characteristics of local public health delivery systems and the performance of essential services. Methods. Performance measures were collected from local public health systems in 7 states and combined with secondary data sources. Multivariate, linear, and nonlinear regression models were used to estimate associations between system characteristics and the performance of essential services. Results. Performance varied significantly with the size, financial resources, and organizational structure of local public health systems, with some public health services appearing more sensitive to these characteristics than others. Staffing levels and community characteristics also appeared to be related to the performance of selected services. Conclusions. Reconfiguring the organization and financing of public health systems in some communities—such as through consolidation and enhanced intergovernmental coordination—may hold promise for improving the performance of essential services. PMID:16449584
Specialization Agreements in the Council for Mutual Economic Assistance
1988-02-01
proportions to stabilize variance (S. Weisberg, Applied Linear Regression , 2nd ed., John Wiley & Sons, New York, 1985, p. 134). If the dependent...27, 1986, p. 3. Weisberg, S., Applied Linear Regression , 2nd ed., John Wiley & Sons, New York, 1985, p. 134. Wiles, P. J., Communist International
Radio Propagation Prediction Software for Complex Mixed Path Physical Channels
2006-08-14
63 4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz 69 4.4.7. Projected Scaling to...4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz In order to construct a comprehensive numerical algorithm capable of
Due to the complexity of the processes contributing to beach bacteria concentrations, many researchers rely on statistical modeling, among which multiple linear regression (MLR) modeling is most widely used. Despite its ease of use and interpretation, there may be time dependence...
Data Transformations for Inference with Linear Regression: Clarifications and Recommendations
ERIC Educational Resources Information Center
Pek, Jolynn; Wong, Octavia; Wong, C. M.
2017-01-01
Data transformations have been promoted as a popular and easy-to-implement remedy to address the assumption of normally distributed errors (in the population) in linear regression. However, the application of data transformations introduces non-ignorable complexities which should be fully appreciated before their implementation. This paper adds to…
USING LINEAR AND POLYNOMIAL MODELS TO EXAMINE THE ENVIRONMENTAL STABILITY OF VIRUSES
The article presents the development of model equations for describing the fate of viral infectivity in environmental samples. Most of the models were based upon the use of a two-step linear regression approach. The first step employs regression of log base 10 transformed viral t...
Identifying the Factors That Influence Change in SEBD Using Logistic Regression Analysis
ERIC Educational Resources Information Center
Camilleri, Liberato; Cefai, Carmel
2013-01-01
Multiple linear regression and ANOVA models are widely used in applications since they provide effective statistical tools for assessing the relationship between a continuous dependent variable and several predictors. However these models rely heavily on linearity and normality assumptions and they do not accommodate categorical dependent…
Analytics of Radioactive Materials Released in the Fukushima Daiichi Nuclear Accident
DOE Office of Scientific and Technical Information (OSTI.GOV)
Egarievwe, Stephen U.; Nuclear Engineering Department, University of Tennessee, Knoxville, TN; Coble, Jamie B.
The 2011 Fukushima Daiichi nuclear accident in Japan resulted in the release of radioactive materials into the atmosphere, the nearby sea, and the surrounding land. Following the accident, several meteorological models were used to predict the transport of the radioactive materials to other continents such as North America and Europe. Also of high importance is the dispersion of radioactive materials locally and within Japan. Based on the International Atomic Energy Agency (IAEA) Convention on Early Notification of a nuclear accident, several radiological data sets were collected on the accident by the Japanese authorities. Among the radioactive materials monitored, are I-131more » and Cs-137 which form the major contributions to the contamination of drinking water. The radiation dose in the atmosphere was also measured. It is impractical to measure contamination and radiation dose in every place of interest. Therefore, modeling helps to predict contamination and radiation dose. Some modeling studies that have been reported in the literature include the simulation of transport and deposition of I-131 and Cs-137 from the accident, Cs-137 deposition and contamination of Japanese soils, and preliminary estimates of I-131 and Cs-137 discharged from the plant into the atmosphere. In this paper, we present statistical analytics of I-131 and Cs-137 with the goal of predicting gamma dose from the Fukushima Daiichi nuclear accident. The data sets used in our study were collected from the IAEA Fukushima Monitoring Database. As part of this study, we investigated several regression models to find the best algorithm for modeling the gamma dose. The modeling techniques used in our study include linear regression, principal component regression (PCR), partial least square (PLS) regression, and ridge regression. Our preliminary results on the first set of data showed that the linear regression model with one variable was the best with a root mean square error of 0.0133 μSv/h, compared to 0.0210 μSv/h for PCR, 0.231 μSv/h for ridge regression L-curve, 0.0856 μSv/h for PLS, and 0.0860 μSv/h for ridge regression cross validation. Complete results using the full datasets for these models will also be presented. (authors)« less
Simple and multiple linear regression: sample size considerations.
Hanley, James A
2016-11-01
The suggested "two subjects per variable" (2SPV) rule of thumb in the Austin and Steyerberg article is a chance to bring out some long-established and quite intuitive sample size considerations for both simple and multiple linear regression. This article distinguishes two of the major uses of regression models that imply very different sample size considerations, neither served well by the 2SPV rule. The first is etiological research, which contrasts mean Y levels at differing "exposure" (X) values and thus tends to focus on a single regression coefficient, possibly adjusted for confounders. The second research genre guides clinical practice. It addresses Y levels for individuals with different covariate patterns or "profiles." It focuses on the profile-specific (mean) Y levels themselves, estimating them via linear compounds of regression coefficients and covariates. By drawing on long-established closed-form variance formulae that lie beneath the standard errors in multiple regression, and by rearranging them for heuristic purposes, one arrives at quite intuitive sample size considerations for both research genres. Copyright © 2016 Elsevier Inc. All rights reserved.
Pang, Junbiao; Qin, Lei; Zhang, Chunjie; Zhang, Weigang; Huang, Qingming; Yin, Baocai
2015-12-01
Local coordinate coding (LCC) is a framework to approximate a Lipschitz smooth function by combining linear functions into a nonlinear one. For locally linear classification, LCC requires a coding scheme that heavily determines the nonlinear approximation ability, posing two main challenges: 1) the locality making faraway anchors have smaller influences on current data and 2) the flexibility balancing well between the reconstruction of current data and the locality. In this paper, we address the problem from the theoretical analysis of the simplest local coding schemes, i.e., local Gaussian coding and local student coding, and propose local Laplacian coding (LPC) to achieve the locality and the flexibility. We apply LPC into locally linear classifiers to solve diverse classification tasks. The comparable or exceeded performances of state-of-the-art methods demonstrate the effectiveness of the proposed method.
Esserman, Denise A.; Moore, Charity G.; Roth, Mary T.
2009-01-01
Older community dwelling adults often take multiple medications for numerous chronic diseases. Non-adherence to these medications can have a large public health impact. Therefore, the measurement and modeling of medication adherence in the setting of polypharmacy is an important area of research. We apply a variety of different modeling techniques (standard linear regression; weighted linear regression; adjusted linear regression; naïve logistic regression; beta-binomial (BB) regression; generalized estimating equations (GEE)) to binary medication adherence data from a study in a North Carolina based population of older adults, where each medication an individual was taking was classified as adherent or non-adherent. In addition, through simulation we compare these different methods based on Type I error rates, bias, power, empirical 95% coverage, and goodness of fit. We find that estimation and inference using GEE is robust to a wide variety of scenarios and we recommend using this in the setting of polypharmacy when adherence is dichotomously measured for multiple medications per person. PMID:20414358
Genetic Programming Transforms in Linear Regression Situations
NASA Astrophysics Data System (ADS)
Castillo, Flor; Kordon, Arthur; Villa, Carlos
The chapter summarizes the use of Genetic Programming (GP) inMultiple Linear Regression (MLR) to address multicollinearity and Lack of Fit (LOF). The basis of the proposed method is applying appropriate input transforms (model respecification) that deal with these issues while preserving the information content of the original variables. The transforms are selected from symbolic regression models with optimal trade-off between accuracy of prediction and expressional complexity, generated by multiobjective Pareto-front GP. The chapter includes a comparative study of the GP-generated transforms with Ridge Regression, a variant of ordinary Multiple Linear Regression, which has been a useful and commonly employed approach for reducing multicollinearity. The advantages of GP-generated model respecification are clearly defined and demonstrated. Some recommendations for transforms selection are given as well. The application benefits of the proposed approach are illustrated with a real industrial application in one of the broadest empirical modeling areas in manufacturing - robust inferential sensors. The chapter contributes to increasing the awareness of the potential of GP in statistical model building by MLR.
Naval Research Logistics Quarterly. Volume 28. Number 3,
1981-09-01
denotes component-wise maximum. f has antone (isotone) differences on C x D if for cl < c2 and d, < d2, NAVAL RESEARCH LOGISTICS QUARTERLY VOL. 28...or negative correlations and linear or nonlinear regressions. Given are the mo- ments to order two and, for special cases, (he regression function and...data sets. We designate this bnb distribution as G - B - N(a, 0, v). The distribution admits only of positive correlation and linear regressions
NASA Astrophysics Data System (ADS)
Jakubowski, J.; Stypulkowski, J. B.; Bernardeau, F. G.
2017-12-01
The first phase of the Abu Hamour drainage and storm tunnel was completed in early 2017. The 9.5 km long, 3.7 m diameter tunnel was excavated with two Earth Pressure Balance (EPB) Tunnel Boring Machines from Herrenknecht. TBM operation processes were monitored and recorded by Data Acquisition and Evaluation System. The authors coupled collected TBM drive data with available information on rock mass properties, cleansed, completed with secondary variables and aggregated by weeks and shifts. Correlations and descriptive statistics charts were examined. Multivariate Linear Regression and CART regression tree models linking TBM penetration rate (PR), penetration per revolution (PPR) and field penetration index (FPI) with TBM operational and geotechnical characteristics were performed for the conditions of the weak/soft rock of Doha. Both regression methods are interpretable and the data were screened with different computational approaches allowing enriched insight. The primary goal of the analysis was to investigate empirical relations between multiple explanatory and responding variables, to search for best subsets of explanatory variables and to evaluate the strength of linear and non-linear relations. For each of the penetration indices, a predictive model coupling both regression methods was built and validated. The resultant models appeared to be stronger than constituent ones and indicated an opportunity for more accurate and robust TBM performance predictions.
Spectral-Spatial Shared Linear Regression for Hyperspectral Image Classification.
Haoliang Yuan; Yuan Yan Tang
2017-04-01
Classification of the pixels in hyperspectral image (HSI) is an important task and has been popularly applied in many practical applications. Its major challenge is the high-dimensional small-sized problem. To deal with this problem, lots of subspace learning (SL) methods are developed to reduce the dimension of the pixels while preserving the important discriminant information. Motivated by ridge linear regression (RLR) framework for SL, we propose a spectral-spatial shared linear regression method (SSSLR) for extracting the feature representation. Comparing with RLR, our proposed SSSLR has the following two advantages. First, we utilize a convex set to explore the spatial structure for computing the linear projection matrix. Second, we utilize a shared structure learning model, which is formed by original data space and a hidden feature space, to learn a more discriminant linear projection matrix for classification. To optimize our proposed method, an efficient iterative algorithm is proposed. Experimental results on two popular HSI data sets, i.e., Indian Pines and Salinas demonstrate that our proposed methods outperform many SL methods.
[Psychological conditions and the influence factors of the Sichuan Three Gorges immigrations].
Cong, Jianni; Wang, Lin; Wang, Yang; Li, Ge
2009-01-01
To learn and analyze the psychological conditions and the influence factors of Sichuan immigrations so as to provide the science basis for the government. Take residents generally questionnaire, symptom checklist (SCL90), psychosocial stress survey for groups(PSSG) and social support rating scale (SSRS) four questionnaires to collect and analyze the mental conditions and influences of Sichuan immigrations and local residents by cluster stratified random sampling. There is no difference in the sex, age, marriage, culture, occupation, economy and character between immigrations and local residents. Immigrations owned medical safeguard are less than local residents (P < 0.01). The SCL-90 (symptom checklist 90) and PSSG (psychosocial stress survey for groups) scores of Sichuan immigrations are higher than the local residents (P < 0.01). Social support of immigrations is worse than local residents (P < 0.01). 56.00% occupations are changed after the immigration. Multiple linear regression analysis that whether immigrates, the age, the marriage, the occupation, psychological stress and social support of migrants relate to the mental health of migrants. The mental health of Sichuan immigrations is bad, so the government should strengthen their financial support and pay attention to their humanist concern.
Simple linear and multivariate regression models.
Rodríguez del Águila, M M; Benítez-Parejo, N
2011-01-01
In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.
Monopole and dipole estimation for multi-frequency sky maps by linear regression
NASA Astrophysics Data System (ADS)
Wehus, I. K.; Fuskeland, U.; Eriksen, H. K.; Banday, A. J.; Dickinson, C.; Ghosh, T.; Górski, K. M.; Lawrence, C. R.; Leahy, J. P.; Maino, D.; Reich, P.; Reich, W.
2017-01-01
We describe a simple but efficient method for deriving a consistent set of monopole and dipole corrections for multi-frequency sky map data sets, allowing robust parametric component separation with the same data set. The computational core of this method is linear regression between pairs of frequency maps, often called T-T plots. Individual contributions from monopole and dipole terms are determined by performing the regression locally in patches on the sky, while the degeneracy between different frequencies is lifted whenever the dominant foreground component exhibits a significant spatial spectral index variation. Based on this method, we present two different, but each internally consistent, sets of monopole and dipole coefficients for the nine-year WMAP, Planck 2013, SFD 100 μm, Haslam 408 MHz and Reich & Reich 1420 MHz maps. The two sets have been derived with different analysis assumptions and data selection, and provide an estimate of residual systematic uncertainties. In general, our values are in good agreement with previously published results. Among the most notable results are a relative dipole between the WMAP and Planck experiments of 10-15μK (depending on frequency), an estimate of the 408 MHz map monopole of 8.9 ± 1.3 K, and a non-zero dipole in the 1420 MHz map of 0.15 ± 0.03 K pointing towards Galactic coordinates (l,b) = (308°,-36°) ± 14°. These values represent the sum of any instrumental and data processing offsets, as well as any Galactic or extra-Galactic component that is spectrally uniform over the full sky.
Narayanan, Neethu; Gupta, Suman; Gajbhiye, V T; Manjaiah, K M
2017-04-01
A carboxy methyl cellulose-nano organoclay (nano montmorillonite modified with 35-45 wt % dimethyl dialkyl (C 14 -C 18 ) amine (DMDA)) composite was prepared by solution intercalation method. The prepared composite was characterized by infrared spectroscopy (FTIR), X-Ray diffraction spectroscopy (XRD) and scanning electron microscopy (SEM). The composite was utilized for its pesticide sorption efficiency for atrazine, imidacloprid and thiamethoxam. The sorption data was fitted into Langmuir and Freundlich isotherms using linear and non linear methods. The linear regression method suggested best fitting of sorption data into Type II Langmuir and Freundlich isotherms. In order to avoid the bias resulting from linearization, seven different error parameters were also analyzed by non linear regression method. The non linear error analysis suggested that the sorption data fitted well into Langmuir model rather than in Freundlich model. The maximum sorption capacity, Q 0 (μg/g) was given by imidacloprid (2000) followed by thiamethoxam (1667) and atrazine (1429). The study suggests that the degree of determination of linear regression alone cannot be used for comparing the best fitting of Langmuir and Freundlich models and non-linear error analysis needs to be done to avoid inaccurate results. Copyright © 2017 Elsevier Ltd. All rights reserved.
London Measure of Unplanned Pregnancy: guidance for its use as an outcome measure
Hall, Jennifer A; Barrett, Geraldine; Copas, Andrew; Stephenson, Judith
2017-01-01
Background The London Measure of Unplanned Pregnancy (LMUP) is a psychometrically validated measure of the degree of intention of a current or recent pregnancy. The LMUP is increasingly being used worldwide, and can be used to evaluate family planning or preconception care programs. However, beyond recommending the use of the full LMUP scale, there is no published guidance on how to use the LMUP as an outcome measure. Ordinal logistic regression has been recommended informally, but studies published to date have all used binary logistic regression and dichotomized the scale at different cut points. There is thus a need for evidence-based guidance to provide a standardized methodology for multivariate analysis and to enable comparison of results. This paper makes recommendations for the regression method for analysis of the LMUP as an outcome measure. Materials and methods Data collected from 4,244 pregnant women in Malawi were used to compare five regression methods: linear, logistic with two cut points, and ordinal logistic with either the full or grouped LMUP score. The recommendations were then tested on the original UK LMUP data. Results There were small but no important differences in the findings across the regression models. Logistic regression resulted in the largest loss of information, and assumptions were violated for the linear and ordinal logistic regression. Consequently, robust standard errors were used for linear regression and a partial proportional odds ordinal logistic regression model attempted. The latter could only be fitted for grouped LMUP score. Conclusion We recommend the linear regression model with robust standard errors to make full use of the LMUP score when analyzed as an outcome measure. Ordinal logistic regression could be considered, but a partial proportional odds model with grouped LMUP score may be required. Logistic regression is the least-favored option, due to the loss of information. For logistic regression, the cut point for un/planned pregnancy should be between nine and ten. These recommendations will standardize the analysis of LMUP data and enhance comparability of results across studies. PMID:28435343
1994-09-01
Institute of Technology, Wright- Patterson AFB OH, January 1994. 4. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 5...Technology, Wright-Patterson AFB OH 5 April 1994. 29. Neter, John and others. Applied Linear Regression Models. Boston: Irwin, 1989. 30. Office of
An Evaluation of the Automated Cost Estimating Integrated Tools (ACEIT) System
1989-09-01
residual and it is described as the residual divided by its standard deviation (13:App A,17). Neter, Wasserman, and Kutner, in Applied Linear Regression Models...others. Applied Linear Regression Models. Homewood IL: Irwin, 1983. 19. Raduchel, William J. "A Professional’s Perspective on User-Friendliness," Byte
A Simple and Convenient Method of Multiple Linear Regression to Calculate Iodine Molecular Constants
ERIC Educational Resources Information Center
Cooper, Paul D.
2010-01-01
A new procedure using a student-friendly least-squares multiple linear-regression technique utilizing a function within Microsoft Excel is described that enables students to calculate molecular constants from the vibronic spectrum of iodine. This method is advantageous pedagogically as it calculates molecular constants for ground and excited…
Conjoint Analysis: A Study of the Effects of Using Person Variables.
ERIC Educational Resources Information Center
Fraas, John W.; Newman, Isadore
Three statistical techniques--conjoint analysis, a multiple linear regression model, and a multiple linear regression model with a surrogate person variable--were used to estimate the relative importance of five university attributes for students in the process of selecting a college. The five attributes include: availability and variety of…
Fitting program for linear regressions according to Mahon (1996)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trappitsch, Reto G.
2018-01-09
This program takes the users' Input data and fits a linear regression to it using the prescription presented by Mahon (1996). Compared to the commonly used York fit, this method has the correct prescription for measurement error propagation. This software should facilitate the proper fitting of measurements with a simple Interface.
How Robust Is Linear Regression with Dummy Variables?
ERIC Educational Resources Information Center
Blankmeyer, Eric
2006-01-01
Researchers in education and the social sciences make extensive use of linear regression models in which the dependent variable is continuous-valued while the explanatory variables are a combination of continuous-valued regressors and dummy variables. The dummies partition the sample into groups, some of which may contain only a few observations.…
Revisiting the Scale-Invariant, Two-Dimensional Linear Regression Method
ERIC Educational Resources Information Center
Patzer, A. Beate C.; Bauer, Hans; Chang, Christian; Bolte, Jan; Su¨lzle, Detlev
2018-01-01
The scale-invariant way to analyze two-dimensional experimental and theoretical data with statistical errors in both the independent and dependent variables is revisited by using what we call the triangular linear regression method. This is compared to the standard least-squares fit approach by applying it to typical simple sets of example data…
ERIC Educational Resources Information Center
Thompson, Russel L.
Homoscedasticity is an important assumption of linear regression. This paper explains what it is and why it is important to the researcher. Graphical and mathematical methods for testing the homoscedasticity assumption are demonstrated. Sources of homoscedasticity and types of homoscedasticity are discussed, and methods for correction are…
On the null distribution of Bayes factors in linear regression
USDA-ARS?s Scientific Manuscript database
We show that under the null, the 2 log (Bayes factor) is asymptotically distributed as a weighted sum of chi-squared random variables with a shifted mean. This claim holds for Bayesian multi-linear regression with a family of conjugate priors, namely, the normal-inverse-gamma prior, the g-prior, and...
Common pitfalls in statistical analysis: Linear regression analysis
Aggarwal, Rakesh; Ranganathan, Priya
2017-01-01
In a previous article in this series, we explained correlation analysis which describes the strength of relationship between two continuous variables. In this article, we deal with linear regression analysis which predicts the value of one continuous variable from another. We also discuss the assumptions and pitfalls associated with this analysis. PMID:28447022
Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.
Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo
2015-08-01
Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.
NASA Astrophysics Data System (ADS)
Thompson, A. P.; Swiler, L. P.; Trott, C. R.; Foiles, S. M.; Tucker, G. J.
2015-03-01
We present a new interatomic potential for solids and liquids called Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected onto a basis of hyperspherical harmonics in four dimensions. The bispectrum components are the same bond-orientational order parameters employed by the GAP potential [1]. The SNAP potential, unlike GAP, assumes a linear relationship between atom energy and bispectrum components. The linear SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. We demonstrate that a previously unnoticed symmetry property can be exploited to reduce the computational cost of the force calculations by more than one order of magnitude. We present results for a SNAP potential for tantalum, showing that it accurately reproduces a range of commonly calculated properties of both the crystalline solid and the liquid phases. In addition, unlike simpler existing potentials, SNAP correctly predicts the energy barrier for screw dislocation migration in BCC tantalum.
Nam, R K; Klotz, L H; Jewett, M A; Danjoux, C; Trachtenberg, J
1998-01-01
To study the rate of change in prostate specific antigen (PSA velocity) in patients with prostate cancer initially managed by 'watchful waiting'. Serial PSA levels were determined in 141 patients with prostate cancer confirmed by biopsy, who were initially managed expectantly and enrolled between May 1990 and December 1995. Sixty-seven patients eventually underwent surgery (mean age 59 years) because they chose it (the decision for surgery was not based on PSA velocity). A cohort of 74 patients remained on 'watchful waiting' (mean age 69 years). Linear regression and logarithmic transformations were used to segregate those patients who showed a rapid rise, defined as a > 50% rise in PSA per year (or a doubling time of < 2 years) and designated 'rapid risers'. An initial analysis based on a minimum of two PSA values showed that 31% were rapid risers. Only 15% of patients with more than three serial PSA determinations over > or = 6 months showed a rapid rise in PSA level. There was no advantage of log-linear analysis over linear regression models. Three serial PSA determinations over > or = 6 months in patients with clinically localized prostate cancer identifies a subset (15%) of patients with a rapidly rising PSA level. Shorter PSA surveillance with fewer PSA values may falsely identify patients with rapid rises in PSA level. However, further follow-up is required to determine if a rapid rise in PSA level identifies a subset of patients with an aggressive biological phenotype who are either still curable or who have already progressed to incurability through metastatic disease.
Lead in teeth from lead-dosed goats: Microdistribution and relationship to the cumulative lead dose
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bellis, David J.; Hetter, Katherine M.; Jones, Joseph
2008-01-15
Teeth are commonly used as a biomarker of long-term lead exposure. There appear to be few data, however, on the content or distribution of lead in teeth where data on specific lead intake (dose) are also available. This study describes the analysis of a convenience sample of teeth from animals that were dosed with lead for other purposes, i.e., a proficiency testing program for blood lead. Lead concentration of whole teeth obtained from 23 animals, as determined by atomic absorption spectrometry, varied from 0.6 to 80 {mu}g g{sup -1}. Linear regression of whole tooth lead ({mu}g g{sup -1}) on themore » cumulative lead dose received by the animal (g) yielded a slope of 1.2, with r{sup 2}=0.647 (p<0.0001). Laser ablation inductively coupled plasma mass spectrometry was employed to determine lead content at micrometer scale spatial resolution in the teeth of seven goats representing the dosing range. Highly localized concentrations of lead, ranging from about 10 to 2000 {mu}g g{sup -1}, were found in circumpulpal dentine. Linear regression of circumpulpal lead ({mu}g g{sup -1}) on cumulative lead dose (g) yielded a slope of 23 with r{sup 2}=0.961 (p=0.0001). The data indicated that whole tooth lead, and especially circumpulpal lead, of dosed goats increased linearly with cumulative lead exposure. These data suggest that circumpulpal dentine is a better biomarker of cumulative lead exposure than is whole tooth lead, at least for lead-dosed goats.« less
NASA Astrophysics Data System (ADS)
Tao, Yulong; Miao, Yunshui; Han, Jiaqi; Yan, Feiyun
2018-05-01
Aiming at the low accuracy of traditional forecasting methods such as linear regression method, this paper presents a prediction method for predicting the relationship between bridge steel box girder and its displacement with wavelet neural network. Compared with traditional forecasting methods, this scheme has better local characteristics and learning ability, which greatly improves the prediction ability of deformation. Through analysis of the instance and found that after compared with the traditional prediction method based on wavelet neural network, the rigid beam deformation prediction accuracy is higher, and is superior to the BP neural network prediction results, conform to the actual demand of engineering design.
Puente, Celso
1976-01-01
Water-level, springflow, and streamflow data were used to develop simple and multiple linear-regression equations for use in estimating water levels in wells and the flow of three major springs in the Edwards aquifer in the eastern San Antonio area. The equations provide daily, monthly, and annual estimates that compare very favorably with observed data. Analyses of geologic and hydrologic data indicate that the water discharged by the major springs is supplied primarily by regional underflow from the west and southwest and by local recharge in the infiltration area in northern Bexar, Comal, and Hays Counties.
NASA Astrophysics Data System (ADS)
Wu, Cheng; Zhen Yu, Jian
2018-03-01
Linear regression techniques are widely used in atmospheric science, but they are often improperly applied due to lack of consideration or inappropriate handling of measurement uncertainty. In this work, numerical experiments are performed to evaluate the performance of five linear regression techniques, significantly extending previous works by Chu and Saylor. The five techniques are ordinary least squares (OLS), Deming regression (DR), orthogonal distance regression (ODR), weighted ODR (WODR), and York regression (YR). We first introduce a new data generation scheme that employs the Mersenne twister (MT) pseudorandom number generator. The numerical simulations are also improved by (a) refining the parameterization of nonlinear measurement uncertainties, (b) inclusion of a linear measurement uncertainty, and (c) inclusion of WODR for comparison. Results show that DR, WODR and YR produce an accurate slope, but the intercept by WODR and YR is overestimated and the degree of bias is more pronounced with a low R2 XY dataset. The importance of a properly weighting parameter λ in DR is investigated by sensitivity tests, and it is found that an improper λ in DR can lead to a bias in both the slope and intercept estimation. Because the λ calculation depends on the actual form of the measurement error, it is essential to determine the exact form of measurement error in the XY data during the measurement stage. If a priori error in one of the variables is unknown, or the measurement error described cannot be trusted, DR, WODR and YR can provide the least biases in slope and intercept among all tested regression techniques. For these reasons, DR, WODR and YR are recommended for atmospheric studies when both X and Y data have measurement errors. An Igor Pro-based program (Scatter Plot) was developed to facilitate the implementation of error-in-variables regressions.
Afantitis, Antreas; Melagraki, Georgia; Sarimveis, Haralambos; Koutentis, Panayiotis A; Markopoulos, John; Igglessi-Markopoulou, Olga
2006-08-01
A quantitative-structure activity relationship was obtained by applying Multiple Linear Regression Analysis to a series of 80 1-[2-hydroxyethoxy-methyl]-6-(phenylthio) thymine (HEPT) derivatives with significant anti-HIV activity. For the selection of the best among 37 different descriptors, the Elimination Selection Stepwise Regression Method (ES-SWR) was utilized. The resulting QSAR model (R (2) (CV) = 0.8160; S (PRESS) = 0.5680) proved to be very accurate both in training and predictive stages.
Wavelet regression model in forecasting crude oil price
NASA Astrophysics Data System (ADS)
Hamid, Mohd Helmie; Shabri, Ani
2017-05-01
This study presents the performance of wavelet multiple linear regression (WMLR) technique in daily crude oil forecasting. WMLR model was developed by integrating the discrete wavelet transform (DWT) and multiple linear regression (MLR) model. The original time series was decomposed to sub-time series with different scales by wavelet theory. Correlation analysis was conducted to assist in the selection of optimal decomposed components as inputs for the WMLR model. The daily WTI crude oil price series has been used in this study to test the prediction capability of the proposed model. The forecasting performance of WMLR model were also compared with regular multiple linear regression (MLR), Autoregressive Moving Average (ARIMA) and Generalized Autoregressive Conditional Heteroscedasticity (GARCH) using root mean square errors (RMSE) and mean absolute errors (MAE). Based on the experimental results, it appears that the WMLR model performs better than the other forecasting technique tested in this study.
Satellite remote sensing of fine particulate air pollutants over Indian mega cities
NASA Astrophysics Data System (ADS)
Sreekanth, V.; Mahesh, B.; Niranjan, K.
2017-11-01
In the backdrop of the need for high spatio-temporal resolution data on PM2.5 mass concentrations for health and epidemiological studies over India, empirical relations between Aerosol Optical Depth (AOD) and PM2.5 mass concentrations are established over five Indian mega cities. These relations are sought to predict the surface PM2.5 mass concentrations from high resolution columnar AOD datasets. Current study utilizes multi-city public domain PM2.5 data (from US Consulate and Embassy's air monitoring program) and MODIS AOD, spanning for almost four years. PM2.5 is found to be positively correlated with AOD. Station-wise linear regression analysis has shown spatially varying regression coefficients. Similar analysis has been repeated by eliminating data from the elevated aerosol prone seasons, which has improved the correlation coefficient. The impact of the day to day variability in the local meteorological conditions on the AOD-PM2.5 relationship has been explored by performing a multiple regression analysis. A cross-validation approach for the multiple regression analysis considering three years of data as training dataset and one-year data as validation dataset yielded an R value of ∼0.63. The study was concluded by discussing the factors which can improve the relationship.
The utility of gravity and water-level monitoring at alluvial aquifer wells in southern Arizona
Pool, D.R.
2008-01-01
Coincident monitoring of gravity and water levels at 39 wells in southern Arizona indicate that water-level change might not be a reliable indicator of aquifer-storage change for alluvial aquifer systems. One reason is that water levels in wells that are screened across single or multiple aquifers might not represent the hydraulic head and storage change in a local unconfined aquifer. Gravity estimates of aquifer-storage change can be approximated as a one-dimensional feature except near some withdrawal wells and recharge sources. The aquifer storage coefficient is estimated by the linear regression slope of storage change (estimated using gravity methods) and water-level change. Nonaquifer storage change that does not percolate to the aquifer can be significant, greater than 3 ??Gal, when water is held in the root zone during brief periods following extreme rates of precipitation. Monitor-ing of storage change using gravity methods at wells also can improve understanding of local hydrogeologic conditions. In the study area, confined aquifer conditions are likely at three wells where large water-level variations were accompanied by little gravity change. Unconfined conditions were indicated at 15 wells where significant water-level and gravity change were positively linearly correlated. Good positive linear correlations resulted in extremely large specific-yield values, greater than 0.35, at seven wells where it is likely that significant ephemeral streamflow infiltration resulted in unsaturated storage change. Poor or negative linear correlations indicate the occurrence of confined, multiple, or perched aquifers. Monitoring of a multiple compressible aquifer system at one well resulted in negative correlation of rising water levels and subsidence-corrected gravity change, which suggests that water-level trends at the well are not a good indicatior of overall storage change. ?? 2008 Society of Exploration Geophysicists. All rights reserved.
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial bias function (RBF) kernels the non-parametric models of relative blood volume (RBV) change with time as well as percentage change in HR with respect to RBV were obtained. The e-insensitivity based loss function was used for SVR modeling. Selection of the design parameters which includes capacity (C), insensitivity region (e) and the RBF kernel parameter (sigma) was made based on a grid search approach and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) as well as testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training as well as testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
van de Kassteele, Jan; Zwakhals, Laurens; Breugelmans, Oscar; Ameling, Caroline; van den Brink, Carolien
2017-07-01
Local policy makers increasingly need information on health-related indicators at smaller geographic levels like districts or neighbourhoods. Although more large data sources have become available, direct estimates of the prevalence of a health-related indicator cannot be produced for neighbourhoods for which only small samples or no samples are available. Small area estimation provides a solution, but unit-level models for binary-valued outcomes that can handle both non-linear effects of the predictors and spatially correlated random effects in a unified framework are rarely encountered. We used data on 26 binary-valued health-related indicators collected on 387,195 persons in the Netherlands. We associated the health-related indicators at the individual level with a set of 12 predictors obtained from national registry data. We formulated a structured additive regression model for small area estimation. The model captured potential non-linear relations between the predictors and the outcome through additive terms in a functional form using penalized splines and included a term that accounted for spatially correlated heterogeneity between neighbourhoods. The registry data were used to predict individual outcomes which in turn are aggregated into higher geographical levels, i.e. neighbourhoods. We validated our method by comparing the estimated prevalences with observed prevalences at the individual level and by comparing the estimated prevalences with direct estimates obtained by weighting methods at municipality level. We estimated the prevalence of the 26 health-related indicators for 415 municipalities, 2599 districts and 11,432 neighbourhoods in the Netherlands. We illustrate our method on overweight data and show that there are distinct geographic patterns in the overweight prevalence. Calibration plots show that the estimated prevalences agree very well with observed prevalences at the individual level. The estimated prevalences agree reasonably well with the direct estimates at the municipal level. Structured additive regression is a useful tool to provide small area estimates in a unified framework. We are able to produce valid nationwide small area estimates of 26 health-related indicators at neighbourhood level in the Netherlands. The results can be used for local policy makers to make appropriate health policy decisions.
Ding, Chuan; Chen, Peng; Jiao, Junfeng
2018-03-01
Although a growing body of literature focuses on the relationship between the built environment and pedestrian crashes, limited evidence is provided about the relative importance of many built environment attributes by accounting for their mutual interaction effects and their non-linear effects on automobile-involved pedestrian crashes. This study adopts the approach of Multiple Additive Poisson Regression Trees (MAPRT) to fill such gaps using pedestrian collision data collected from Seattle, Washington. Traffic analysis zones are chosen as the analytical unit. The effects of various factors on pedestrian crash frequency investigated include characteristics the of road network, street elements, land use patterns, and traffic demand. Density and the degree of mixed land use have major effects on pedestrian crash frequency, accounting for approximately 66% of the effects in total. More importantly, some factors show clear non-linear relationships with pedestrian crash frequency, challenging the linearity assumption commonly used in existing studies which employ statistical models. With various accurately identified non-linear relationships between the built environment and pedestrian crashes, this study suggests local agencies to adopt geo-spatial differentiated policies to establish a safe walking environment. These findings, especially the effective ranges of the built environment, provide evidence to support for transport and land use planning, policy recommendations, and road safety programs. Copyright © 2018 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Rabidas, Rinku; Midya, Abhishek; Chakraborty, Jayasree; Sadhu, Anup; Arif, Wasim
2018-02-01
In this paper, Curvelet based local attributes, Curvelet-Local configuration pattern (C-LCP), is introduced for the characterization of mammographic masses as benign or malignant. Amid different anomalies such as micro- calcification, bilateral asymmetry, architectural distortion, and masses, the reason for targeting the mass lesions is due to their variation in shape, size, and margin which makes the diagnosis a challenging task. Being efficient in classification, multi-resolution property of the Curvelet transform is exploited and local information is extracted from the coefficients of each subband using Local configuration pattern (LCP). The microscopic measures in concatenation with the local textural information provide more discriminating capability than individual. The measures embody the magnitude information along with the pixel-wise relationships among the neighboring pixels. The performance analysis is conducted with 200 mammograms of the DDSM database containing 100 mass cases of each benign and malignant. The optimal set of features is acquired via stepwise logistic regression method and the classification is carried out with Fisher linear discriminant analysis. The best area under the receiver operating characteristic curve and accuracy of 0.95 and 87.55% are achieved with the proposed method, which is further compared with some of the state-of-the-art competing methods.
A new method for wind speed forecasting based on copula theory.
Wang, Yuankun; Ma, Huiqun; Wang, Dong; Wang, Guizuo; Wu, Jichun; Bian, Jinyu; Liu, Jiufu
2018-01-01
How to determine representative wind speed is crucial in wind resource assessment. Accurate wind resource assessments are important to wind farms development. Linear regressions are usually used to obtain the representative wind speed. However, terrain flexibility of wind farm and long distance between wind speed sites often lead to low correlation. In this study, copula method is used to determine the representative year's wind speed in wind farm by interpreting the interaction of the local wind farm and the meteorological station. The result shows that the method proposed here can not only determine the relationship between the local anemometric tower and nearby meteorological station through Kendall's tau, but also determine the joint distribution without assuming the variables to be independent. Moreover, the representative wind data can be obtained by the conditional distribution much more reasonably. We hope this study could provide scientific reference for accurate wind resource assessments. Copyright © 2017 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Zeng, Xiang-Yang; Wang, Shu-Guang; Gao, Li-Ping
2010-09-01
As the basic data for virtual auditory technology, head-related transfer function (HRTF) has many applications in the areas of room acoustic modeling, spatial hearing and multimedia. How to individualize HRTF fast and effectively has become an opening problem at present. Based on the similarity and relativity of anthropometric structures, a hybrid HRTF customization algorithm, which has combined the method of principal component analysis (PCA), multiple linear regression (MLR) and database matching (DM), has been presented in this paper. The HRTFs selected by both the best match and the worst match have been applied into obtaining binaurally auralized sounds, which are then used for subjective listening experiments and the results are compared. For the area in the horizontal plane, the localization results have shown that the selection of HRTFs can enhance the localization accuracy and can also abate the problem of front-back confusion.
Commuting to work: RN travel time to employment in rural and urban areas.
Rosenberg, Marie-Claire; Corcoran, Sean P; Kovner, Christine; Brewer, Carol
2011-02-01
To investigate the variation in average daily travel time to work among registered nurses (RNs) living in urban, suburban, and rural areas. We examine how travel time varies across RN characteristics, job setting, and availability of local employment opportunities. Descriptive statistics and linear regression using a 5% sample from the 2000 Census and a longitudinal survey of newly licensed RNs (NLRN). Travel time for NLRN respondents was estimated using geographic information systems (GIS) software. In the NLRN, rural nurses and those living in small towns had significantly longer average commute times. Young married RNs and RNs with children also tended to have longer commute times, as did RNs employed by hospitals. The findings indicate that travel time to work varies significantly across locale types. Further research is needed to understand whether and to what extent lengthy commute times impact RN workforce needs in rural and urban areas.
Post-processing through linear regression
NASA Astrophysics Data System (ADS)
van Schaeybroeck, B.; Vannitsem, S.
2011-03-01
Various post-processing techniques are compared for both deterministic and ensemble forecasts, all based on linear regression between forecast data and observations. In order to evaluate the quality of the regression methods, three criteria are proposed, related to the effective correction of forecast error, the optimal variability of the corrected forecast and multicollinearity. The regression schemes under consideration include the ordinary least-square (OLS) method, a new time-dependent Tikhonov regularization (TDTR) method, the total least-square method, a new geometric-mean regression (GM), a recently introduced error-in-variables (EVMOS) method and, finally, a "best member" OLS method. The advantages and drawbacks of each method are clarified. These techniques are applied in the context of the 63 Lorenz system, whose model version is affected by both initial condition and model errors. For short forecast lead times, the number and choice of predictors plays an important role. Contrarily to the other techniques, GM degrades when the number of predictors increases. At intermediate lead times, linear regression is unable to provide corrections to the forecast and can sometimes degrade the performance (GM and the best member OLS with noise). At long lead times the regression schemes (EVMOS, TDTR) which yield the correct variability and the largest correlation between ensemble error and spread, should be preferred.
Linear regression metamodeling as a tool to summarize and present simulation model results.
Jalal, Hawre; Dowd, Bryan; Sainfort, François; Kuntz, Karen M
2013-10-01
Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
Aptel, Florent; Sayous, Romain; Fortoul, Vincent; Beccat, Sylvain; Denis, Philippe
2010-12-01
To evaluate and compare the regional relationships between visual field sensitivity and retinal nerve fiber layer (RNFL) thickness as measured by spectral-domain optical coherence tomography (OCT) and scanning laser polarimetry. Prospective cross-sectional study. One hundred and twenty eyes of 120 patients (40 with healthy eyes, 40 with suspected glaucoma, and 40 with glaucoma) were tested on Cirrus-OCT, GDx VCC, and standard automated perimetry. Raw data on RNFL thickness were extracted for 256 peripapillary sectors of 1.40625 degrees each for the OCT measurement ellipse and 64 peripapillary sectors of 5.625 degrees each for the GDx VCC measurement ellipse. Correlations between peripapillary RNFL thickness in 6 sectors and visual field sensitivity in the 6 corresponding areas were evaluated using linear and logarithmic regression analysis. Receiver operating curve areas were calculated for each instrument. With spectral-domain OCT, the correlations (r(2)) between RNFL thickness and visual field sensitivity ranged from 0.082 (nasal RNFL and corresponding visual field area, linear regression) to 0.726 (supratemporal RNFL and corresponding visual field area, logarithmic regression). By comparison, with GDx-VCC, the correlations ranged from 0.062 (temporal RNFL and corresponding visual field area, linear regression) to 0.362 (supratemporal RNFL and corresponding visual field area, logarithmic regression). In pairwise comparisons, these structure-function correlations were generally stronger with spectral-domain OCT than with GDx VCC and with logarithmic regression than with linear regression. The largest areas under the receiver operating curve were seen for OCT superior thickness (0.963 ± 0.022; P < .001) in eyes with glaucoma and for OCT average thickness (0.888 ± 0.072; P < .001) in eyes with suspected glaucoma. The structure-function relationship was significantly stronger with spectral-domain OCT than with scanning laser polarimetry, and was better expressed logarithmically than linearly. Measurements with these 2 instruments should not be considered to be interchangeable. Copyright © 2010 Elsevier Inc. All rights reserved.
ERIC Educational Resources Information Center
Rule, David L.
Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…
Unit Cohesion and the Surface Navy: Does Cohesion Affect Performance
1989-12-01
v. 68, 1968. Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. Rand Corporation R-2607...Neter, J., Wasserman, W., and Kutner, M. H., Applied Linear Regression Models, 2d ed., Boston, MA: Irwin, 1989. SAS User’s Guide: Basics, Version 5 ed
1990-03-01
and M.H. Knuter. Applied Linear Regression Models. Homewood IL: Richard D. Erwin Inc., 1983. Pritsker, A. Alan B. Introduction to Simulation and SLAM...Control Variates in Simulation," European Journal of Operational Research, 42: (1989). Neter, J., W. Wasserman, and M.H. Xnuter. Applied Linear Regression Models
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Calibrated Peer Review for Interpreting Linear Regression Parameters: Results from a Graduate Course
ERIC Educational Resources Information Center
Enders, Felicity B.; Jenkins, Sarah; Hoverman, Verna
2010-01-01
Biostatistics is traditionally a difficult subject for students to learn. While the mathematical aspects are challenging, it can also be demanding for students to learn the exact language to use to correctly interpret statistical results. In particular, correctly interpreting the parameters from linear regression is both a vital tool and a…
ERIC Educational Resources Information Center
Richter, Tobias
2006-01-01
Most reading time studies using naturalistic texts yield data sets characterized by a multilevel structure: Sentences (sentence level) are nested within persons (person level). In contrast to analysis of variance and multiple regression techniques, hierarchical linear models take the multilevel structure of reading time data into account. They…
Some Applied Research Concerns Using Multiple Linear Regression Analysis.
ERIC Educational Resources Information Center
Newman, Isadore; Fraas, John W.
The intention of this paper is to provide an overall reference on how a researcher can apply multiple linear regression in order to utilize the advantages that it has to offer. The advantages and some concerns expressed about the technique are examined. A number of practical ways by which researchers can deal with such concerns as…
ERIC Educational Resources Information Center
Nelson, Dean
2009-01-01
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…
Quantum State Tomography via Linear Regression Estimation
Qi, Bo; Hou, Zhibo; Li, Li; Dong, Daoyi; Xiang, Guoyong; Guo, Guangcan
2013-01-01
A simple yet efficient state reconstruction algorithm of linear regression estimation (LRE) is presented for quantum state tomography. In this method, quantum state reconstruction is converted into a parameter estimation problem of a linear regression model and the least-squares method is employed to estimate the unknown parameters. An asymptotic mean squared error (MSE) upper bound for all possible states to be estimated is given analytically, which depends explicitly upon the involved measurement bases. This analytical MSE upper bound can guide one to choose optimal measurement sets. The computational complexity of LRE is O(d4) where d is the dimension of the quantum state. Numerical examples show that LRE is much faster than maximum-likelihood estimation for quantum state tomography. PMID:24336519
Applications of statistics to medical science, III. Correlation and regression.
Watanabe, Hiroshi
2012-01-01
In this third part of a series surveying medical statistics, the concepts of correlation and regression are reviewed. In particular, methods of linear regression and logistic regression are discussed. Arguments related to survival analysis will be made in a subsequent paper.
A phenomenological biological dose model for proton therapy based on linear energy transfer spectra.
Rørvik, Eivind; Thörnqvist, Sara; Stokkevåg, Camilla H; Dahle, Tordis J; Fjaera, Lars Fredrik; Ytre-Hauge, Kristian S
2017-06-01
The relative biological effectiveness (RBE) of protons varies with the radiation quality, quantified by the linear energy transfer (LET). Most phenomenological models employ a linear dependency of the dose-averaged LET (LET d ) to calculate the biological dose. However, several experiments have indicated a possible non-linear trend. Our aim was to investigate if biological dose models including non-linear LET dependencies should be considered, by introducing a LET spectrum based dose model. The RBE-LET relationship was investigated by fitting of polynomials from 1st to 5th degree to a database of 85 data points from aerobic in vitro experiments. We included both unweighted and weighted regression, the latter taking into account experimental uncertainties. Statistical testing was performed to decide whether higher degree polynomials provided better fits to the data as compared to lower degrees. The newly developed models were compared to three published LET d based models for a simulated spread out Bragg peak (SOBP) scenario. The statistical analysis of the weighted regression analysis favored a non-linear RBE-LET relationship, with the quartic polynomial found to best represent the experimental data (P = 0.010). The results of the unweighted regression analysis were on the borderline of statistical significance for non-linear functions (P = 0.053), and with the current database a linear dependency could not be rejected. For the SOBP scenario, the weighted non-linear model estimated a similar mean RBE value (1.14) compared to the three established models (1.13-1.17). The unweighted model calculated a considerably higher RBE value (1.22). The analysis indicated that non-linear models could give a better representation of the RBE-LET relationship. However, this is not decisive, as inclusion of the experimental uncertainties in the regression analysis had a significant impact on the determination and ranking of the models. As differences between the models were observed for the SOBP scenario, both non-linear LET spectrum- and linear LET d based models should be further evaluated in clinically realistic scenarios. © 2017 American Association of Physicists in Medicine.
Goovaerts, P; Albuquerque, Teresa; Antunes, Margarida
2016-11-01
This paper describes a multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration, with an application to an abandoned sedimentary gold mining region in Portugal. The main challenge was the existence of only a dozen gold measurements confined to the grounds of the old gold mines, which precluded the application of traditional interpolation techniques, such as cokriging. The analysis could, however, capitalize on 376 stream sediment samples that were analyzed for twenty two elements. Gold (Au) was first predicted at all 376 locations using linear regression (R 2 =0.798) and four metals (Fe, As, Sn and W), which are known to be mostly associated with the local gold's paragenesis. One hundred realizations of the spatial distribution of gold content were generated using sequential indicator simulation and a soft indicator coding of regression estimates, to supplement the hard indicator coding of gold measurements. Each simulated map then underwent a local cluster analysis to identify significant aggregates of low or high values. The one hundred classified maps were processed to derive the most likely classification of each simulated node and the associated probability of occurrence. Examining the distribution of the hot-spots and cold-spots reveals a clear enrichment in Au along the Erges River downstream from the old sedimentary mineralization.
Regression of non-linear coupling of noise in LIGO detectors
NASA Astrophysics Data System (ADS)
Da Silva Costa, C. F.; Billman, C.; Effler, A.; Klimenko, S.; Cheng, H.-P.
2018-03-01
In 2015, after their upgrade, the advanced Laser Interferometer Gravitational-Wave Observatory (LIGO) detectors started acquiring data. The effort to improve their sensitivity has never stopped since then. The goal to achieve design sensitivity is challenging. Environmental and instrumental noise couple to the detector output with different, linear and non-linear, coupling mechanisms. The noise regression method we use is based on the Wiener–Kolmogorov filter, which uses witness channels to make noise predictions. We present here how this method helped to determine complex non-linear noise couplings in the output mode cleaner and in the mirror suspension system of the LIGO detector.
Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan
2012-12-01
A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Yalcin, Semra; Leroux, Shawn James
2018-04-14
Land-cover and climate change are two main drivers of changes in species ranges. Yet, the majority of studies investigating the impacts of global change on biodiversity focus on one global change driver and usually use simulations to project biodiversity responses to future conditions. We conduct an empirical test of the relative and combined effects of land-cover and climate change on species occurrence changes. Specifically, we examine whether observed local colonization and extinctions of North American birds between 1981-1985 and 2001-2005 are correlated with land-cover and climate change and whether bird life history and ecological traits explain interspecific variation in observed occurrence changes. We fit logistic regression models to test the impact of physical land-cover change, changes in net primary productivity, winter precipitation, mean summer temperature, and mean winter temperature on the probability of Ontario breeding bird local colonization and extinction. Models with climate change, land-cover change, and the combination of these two drivers were the top ranked models of local colonization for 30%, 27%, and 29% of species, respectively. Conversely, models with climate change, land-cover change, and the combination of these two drivers were the top ranked models of local extinction for 61%, 7%, and 9% of species, respectively. The quantitative impacts of land-cover and climate change variables also vary among bird species. We then fit linear regression models to test whether the variation in regional colonization and extinction rate could be explained by mean body mass, migratory strategy, and habitat preference of birds. Overall, species traits were weakly correlated with heterogeneity in species occurrence changes. We provide empirical evidence showing that land-cover change, climate change, and the combination of multiple global change drivers can differentially explain observed species local colonization and extinction. © 2018 John Wiley & Sons Ltd.
Evaluating Differential Effects Using Regression Interactions and Regression Mixture Models
ERIC Educational Resources Information Center
Van Horn, M. Lee; Jaki, Thomas; Masyn, Katherine; Howe, George; Feaster, Daniel J.; Lamont, Andrea E.; George, Melissa R. W.; Kim, Minjung
2015-01-01
Research increasingly emphasizes understanding differential effects. This article focuses on understanding regression mixture models, which are relatively new statistical methods for assessing differential effects by comparing results to using an interactive term in linear regression. The research questions which each model answers, their…
Zhang, Hong-guang; Lu, Jian-gang
2016-02-01
Abstract To overcome the problems of significant difference among samples and nonlinearity between the property and spectra of samples in spectral quantitative analysis, a local regression algorithm is proposed in this paper. In this algorithm, net signal analysis method(NAS) was firstly used to obtain the net analyte signal of the calibration samples and unknown samples, then the Euclidean distance between net analyte signal of the sample and net analyte signal of calibration samples was calculated and utilized as similarity index. According to the defined similarity index, the local calibration sets were individually selected for each unknown sample. Finally, a local PLS regression model was built on each local calibration sets for each unknown sample. The proposed method was applied to a set of near infrared spectra of meat samples. The results demonstrate that the prediction precision and model complexity of the proposed method are superior to global PLS regression method and conventional local regression algorithm based on spectral Euclidean distance.
Motion prediction in MRI-guided radiotherapy based on interleaved orthogonal cine-MRI
NASA Astrophysics Data System (ADS)
Seregni, M.; Paganelli, C.; Lee, D.; Greer, P. B.; Baroni, G.; Keall, P. J.; Riboldi, M.
2016-01-01
In-room cine-MRI guidance can provide non-invasive target localization during radiotherapy treatment. However, in order to cope with finite imaging frequency and system latencies between target localization and dose delivery, tumour motion prediction is required. This work proposes a framework for motion prediction dedicated to cine-MRI guidance, aiming at quantifying the geometric uncertainties introduced by this process for both tumour tracking and beam gating. The tumour position, identified through scale invariant features detected in cine-MRI slices, is estimated at high-frequency (25 Hz) using three independent predictors, one for each anatomical coordinate. Linear extrapolation, auto-regressive and support vector machine algorithms are compared against systems that use no prediction or surrogate-based motion estimation. Geometric uncertainties are reported as a function of image acquisition period and system latency. Average results show that the tracking error RMS can be decreased down to a [0.2; 1.2] mm range, for acquisition periods between 250 and 750 ms and system latencies between 50 and 300 ms. Except for the linear extrapolator, tracking and gating prediction errors were, on average, lower than those measured for surrogate-based motion estimation. This finding suggests that cine-MRI guidance, combined with appropriate prediction algorithms, could relevantly decrease geometric uncertainties in motion compensated treatments.
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism, and therefore is a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions which has the same prediction accuracy of the linear model based on LASSO. The weights in our new model share the same general trend as those in the previous model, but have only 25 nonzero weights out of a total 84 weights, a 54% reduction compared to the previous model. Contrary to the linear model based on LASSO, our model suggests that only a few positions are influential on the efficacy of the siRNA, which are the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task LASSO is not able to accomplish well due to its high dimensional nature. Our results demonstrate the superiority of sparse logistic regression as a technique for both feature selection and regression over LASSO in the context of siRNA design.
Predictive and mechanistic multivariate linear regression models for reaction development
Santiago, Celine B.; Guo, Jing-Yao
2018-01-01
Multivariate Linear Regression (MLR) models utilizing computationally-derived and empirically-derived physical organic molecular descriptors are described in this review. Several reports demonstrating the effectiveness of this methodological approach towards reaction optimization and mechanistic interrogation are discussed. A detailed protocol to access quantitative and predictive MLR models is provided as a guide for model development and parameter analysis. PMID:29719711
Adding a Parameter Increases the Variance of an Estimated Regression Function
ERIC Educational Resources Information Center
Withers, Christopher S.; Nadarajah, Saralees
2011-01-01
The linear regression model is one of the most popular models in statistics. It is also one of the simplest models in statistics. It has received applications in almost every area of science, engineering and medicine. In this article, the authors show that adding a predictor to a linear model increases the variance of the estimated regression…
Using nonlinear quantile regression to estimate the self-thinning boundary curve
Quang V. Cao; Thomas J. Dean
2015-01-01
The relationship between tree size (quadratic mean diameter) and tree density (number of trees per unit area) has been a topic of research and discussion for many decades. Starting with Reineke in 1933, the maximum size-density relationship, on a log-log scale, has been assumed to be linear. Several techniques, including linear quantile regression, have been employed...
Simultaneous spectrophotometric determination of salbutamol and bromhexine in tablets.
Habib, I H I; Hassouna, M E M; Zaki, G A
2005-03-01
Typical anti-mucolytic drugs called salbutamol hydrochloride and bromhexine sulfate encountered in tablets were determined simultaneously either by using linear regression at zero-crossing wavelengths of the first derivation of UV-spectra or by application of multiple linear partial least squares regression method. The results obtained by the two proposed mathematical methods were compared with those obtained by the HPLC technique.
Laurens, L M L; Wolfrum, E J
2013-12-18
One of the challenges associated with microalgal biomass characterization and the comparison of microalgal strains and conversion processes is the rapid determination of the composition of algae. We have developed and applied a high-throughput screening technology based on near-infrared (NIR) spectroscopy for the rapid and accurate determination of algal biomass composition. We show that NIR spectroscopy can accurately predict the full composition using multivariate linear regression analysis of varying lipid, protein, and carbohydrate content of algal biomass samples from three strains. We also demonstrate a high quality of predictions of an independent validation set. A high-throughput 96-well configuration for spectroscopy gives equally good prediction relative to a ring-cup configuration, and thus, spectra can be obtained from as little as 10-20 mg of material. We found that lipids exhibit a dominant, distinct, and unique fingerprint in the NIR spectrum that allows for the use of single and multiple linear regression of respective wavelengths for the prediction of the biomass lipid content. This is not the case for carbohydrate and protein content, and thus, the use of multivariate statistical modeling approaches remains necessary.
Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei
2014-01-01
The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binominal distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.
Hashimoto, Ken; Zúniga, Concepción; Romero, Eduardo; Morales, Zoraida; Maguire, James H.
2015-01-01
Background Central American countries face a major challenge in the control of Triatoma dimidiata, a widespread vector of Chagas disease that cannot be eliminated. The key to maintaining the risk of transmission of Trypanosoma cruzi at lowest levels is to sustain surveillance throughout endemic areas. Guatemala, El Salvador, and Honduras integrated community-based vector surveillance into local health systems. Community participation was effective in detection of the vector, but some health services had difficulty sustaining their response to reports of vectors from the population. To date, no research has investigated how best to maintain and reinforce health service responsiveness, especially in resource-limited settings. Methodology/Principal Findings We reviewed surveillance and response records of 12 health centers in Guatemala, El Salvador, and Honduras from 2008 to 2012 and analyzed the data in relation to the volume of reports of vector infestation, local geography, demography, human resources, managerial approach, and results of interviews with health workers. Health service responsiveness was defined as the percentage of households that reported vector infestation for which the local health service provided indoor residual spraying of insecticide or educational advice. Eight potential determinants of responsiveness were evaluated by linear and mixed-effects multi-linear regression. Health service responsiveness (overall 77.4%) was significantly associated with quarterly monitoring by departmental health offices. Other potential determinants of responsiveness were not found to be significant, partly because of short- and long-term strategies, such as temporary adjustments in manpower and redistribution of tasks among local participants in the effort. Conclusions/Significance Consistent monitoring within the local health system contributes to sustainability of health service responsiveness in community-based vector surveillance of Chagas disease. Even with limited resources, countries can improve health service responsiveness with thoughtful strategies and management practices in the local health systems. PMID:26252767
A Continuous Threshold Expectile Model.
Zhang, Feipeng; Li, Qunhua
2017-12-01
Expectile regression is a useful tool for exploring the relation between the response and the explanatory variables beyond the conditional mean. A continuous threshold expectile regression is developed for modeling data in which the effect of a covariate on the response variable is linear but varies below and above an unknown threshold in a continuous way. The estimators for the threshold and the regression coefficients are obtained using a grid search approach. The asymptotic properties for all the estimators are derived, and the estimator for the threshold is shown to achieve root-n consistency. A weighted CUSUM type test statistic is proposed for the existence of a threshold at a given expectile, and its asymptotic properties are derived under both the null and the local alternative models. This test only requires fitting the model under the null hypothesis in the absence of a threshold, thus it is computationally more efficient than the likelihood-ratio type tests. Simulation studies show that the proposed estimators and test have desirable finite sample performance in both homoscedastic and heteroscedastic cases. The application of the proposed method on a Dutch growth data and a baseball pitcher salary data reveals interesting insights. The proposed method is implemented in the R package cthreshER .
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
Preliminary Estimation of Kappa Parameter in Croatia
NASA Astrophysics Data System (ADS)
Stanko, Davor; Markušić, Snježana; Ivančić, Ines; Mario, Gazdek; Gülerce, Zeynep
2017-12-01
Spectral parameter kappa κ is used to describe spectral amplitude decay “crash syndrome” at high frequencies. The purpose of this research is to estimate spectral parameter kappa for the first time in Croatia based on small and moderate earthquakes. Recordings of local earthquakes with magnitudes higher than 3, epicentre distances less than 150 km, and focal depths less than 30 km from seismological stations in Croatia are used. The value of kappa was estimated from the acceleration amplitude spectrum of shear waves from the slope of the high-frequency part where the spectrum starts to decay rapidly to a noise floor. Kappa models as a function of a site and distance were derived from a standard linear regression of kappa-distance dependence. Site kappa was determined from the extrapolation of the regression line to a zero distance. The preliminary results of site kappa across Croatia are promising. In this research, these results are compared with local site condition parameters for each station, e.g. shear wave velocity in the upper 30 m from geophysical measurements and with existing global shear wave velocity - site kappa values. Spatial distribution of individual kappa’s is compared with the azimuthal distribution of earthquake epicentres. These results are significant for a couple of reasons: to extend the knowledge of the attenuation of near-surface crust layers of the Dinarides and to provide additional information on the local earthquake parameters for updating seismic hazard maps of studied area. Site kappa can be used in the re-creation, and re-calibration of attenuation of peak horizontal and/or vertical acceleration in the Dinarides area since information on the local site conditions were not included in the previous studies.
NASA Astrophysics Data System (ADS)
Anick, David J.
2003-12-01
A method is described for a rapid prediction of B3LYP-optimized geometries for polyhedral water clusters (PWCs). Starting with a database of 121 B3LYP-optimized PWCs containing 2277 H-bonds, linear regressions yield formulas correlating O-O distances, O-O-O angles, and H-O-H orientation parameters, with local and global cluster descriptors. The formulas predict O-O distances with a rms error of 0.85 pm to 1.29 pm and predict O-O-O angles with a rms error of 0.6° to 2.2°. An algorithm is given which uses the O-O and O-O-O formulas to determine coordinates for the oxygen nuclei of a PWC. The H-O-H formulas then determine positions for two H's at each O. For 15 test clusters, the gap between the electronic energy of the predicted geometry and the true B3LYP optimum ranges from 0.11 to 0.54 kcal/mol or 4 to 18 cal/mol per H-bond. Linear regression also identifies 14 parameters that strongly correlate with PWC electronic energy. These descriptors include the number of H-bonds in which both oxygens carry a non-H-bonding H, the number of quadrilateral faces, the number of symmetric angles in 5- and in 6-sided faces, and the square of the cluster's estimated dipole moment.
NASA Astrophysics Data System (ADS)
Xiao, Fan; Chen, Zhijun; Chen, Jianguo; Zhou, Yongzhang
2016-05-01
In this study, a novel batch sliding window (BSW) based singularity mapping approach was proposed. Compared to the traditional sliding window (SW) technique with disadvantages of the empirical predetermination of a fixed maximum window size and outliers sensitivity of least-squares (LS) linear regression method, the BSW based singularity mapping approach can automatically determine the optimal size of the largest window for each estimated position, and utilizes robust linear regression (RLR) which is insensitive to outlier values. In the case study, tin geochemical data in Gejiu, Yunnan, have been processed by BSW based singularity mapping approach. The results show that the BSW approach can improve the accuracy of the calculation of singularity exponent values due to the determination of the optimal maximum window size. The utilization of RLR method in the BSW approach can smoothen the distribution of singularity index values with few or even without much high fluctuate values looking like noise points that usually make a singularity map much roughly and discontinuously. Furthermore, the student's t-statistic diagram indicates a strong spatial correlation between high geochemical anomaly and known tin polymetallic deposits. The target areas within high tin geochemical anomaly could probably have much higher potential for the exploration of new tin polymetallic deposits than other areas, particularly for the areas that show strong tin geochemical anomalies whereas no tin polymetallic deposits have been found in them.
2011-01-01
Background The majority of studies of the local food environment in relation to obesity risk have been conducted in the US, UK, and Australia. The evidence remains limited to western societies. The aim of this paper is to examine the association of local food environment to body mass index (BMI) in a study of older Japanese individuals. Methods The analysis was based on 12,595 respondents from cross-sectional data of the Aichi Gerontological Evaluation Study (AGES), conducted in 2006 and 2007. Using Geographic Information Systems (GIS), we mapped respondents' access to supermarkets, convenience stores, and fast food outlets, based on a street network (both the distance to the nearest stores and the number of stores within 500 m of the respondents' home). Multiple linear regression and logistic regression analyses were performed to examine the association between food environment and BMI. Results In contrast to previous reports, we found that better access to supermarkets was related to higher BMI. Better access to fast food outlets or convenience stores was also associated with higher BMI, but only among those living alone. The logistic regression analysis, using categorized BMI, showed that the access to supermarkets was only related to being overweight or obese, but not related to being underweight. Conclusions Our findings provide mixed support for the types of food environment measures previously used in western settings. Importantly, our results suggest the need to develop culture-specific approaches to characterizing neighborhood contexts when hypotheses are extrapolated across national borders. PMID:21777439
Zhu, Hongxiao; Morris, Jeffrey S; Wei, Fengrong; Cox, Dennis D
2017-07-01
Many scientific studies measure different types of high-dimensional signals or images from the same subject, producing multivariate functional data. These functional measurements carry different types of information about the scientific process, and a joint analysis that integrates information across them may provide new insights into the underlying mechanism for the phenomenon under study. Motivated by fluorescence spectroscopy data in a cervical pre-cancer study, a multivariate functional response regression model is proposed, which treats multivariate functional observations as responses and a common set of covariates as predictors. This novel modeling framework simultaneously accounts for correlations between functional variables and potential multi-level structures in data that are induced by experimental design. The model is fitted by performing a two-stage linear transformation-a basis expansion to each functional variable followed by principal component analysis for the concatenated basis coefficients. This transformation effectively reduces the intra-and inter-function correlations and facilitates fast and convenient calculation. A fully Bayesian approach is adopted to sample the model parameters in the transformed space, and posterior inference is performed after inverse-transforming the regression coefficients back to the original data domain. The proposed approach produces functional tests that flag local regions on the functional effects, while controlling the overall experiment-wise error rate or false discovery rate. It also enables functional discriminant analysis through posterior predictive calculation. Analysis of the fluorescence spectroscopy data reveals local regions with differential expressions across the pre-cancer and normal samples. These regions may serve as biomarkers for prognosis and disease assessment.
Song, Xiao-Dong; Zhang, Gan-Lin; Liu, Feng; Li, De-Cheng; Zhao, Yu-Guo
2016-11-01
The influence of anthropogenic activities and natural processes involved high uncertainties to the spatial variation modeling of soil available zinc (AZn) in plain river network regions. Four datasets with different sampling densities were split over the Qiaocheng district of Bozhou City, China. The difference of AZn concentrations regarding soil types was analyzed by the principal component analysis (PCA). Since the stationarity was not indicated and effective ranges of four datasets were larger than the sampling extent (about 400 m), two investigation tools, namely F3 test and stationarity index (SI), were employed to test the local non-stationarity. Geographically weighted regression (GWR) technique was performed to describe the spatial heterogeneity of AZn concentrations under the non-stationarity assumption. GWR based on grouped soil type information (GWRG for short) was proposed so as to benefit the local modeling of soil AZn within each soil-landscape unit. For reference, the multiple linear regression (MLR) model, a global regression technique, was also employed and incorporated the same predictors as in the GWR models. Validation results based on 100 times realization demonstrated that GWRG outperformed MLR and can produce similar or better accuracy than the GWR approach. Nevertheless, GWRG can generate better soil maps than GWR for limit soil data. Two-sample t test of produced soil maps also confirmed significantly different means. Variogram analysis of the model residuals exhibited weak spatial correlation, rejecting the use of hybrid kriging techniques. As a heuristically statistical method, the GWRG was beneficial in this study and potentially for other soil properties.
Understanding multidecadal variability in ENSO amplitude
NASA Astrophysics Data System (ADS)
Russell, A.; Gnanadesikan, A.
2013-12-01
Sea surface temperatures (SSTs) in the tropical Pacific vary as a result of the coupling between the ocean and atmosphere driven largely by the El Niño - Southern Oscillation (ENSO). ENSO has a large impact on the local climate and hydrology of the tropical Pacific, as well as broad-reaching effects on global climate. ENSO amplitude is known to vary on long timescales, which makes it very difficult to quantify its response to climate change and constrain the physical processes that drive it. In order to assess the extent of unforced multidecadal changes in ENSO variability, a linear regression of local SST changes is applied to the GFDL CM2.1 model 4000-yr pre-industrial control run. The resulting regression coefficient strengths, which represent the sensitivity of SST changes to thermocline depth and zonal wind stress, vary by up to a factor of 2 on multi-decadal time scales. This long-term modulation in ocean-atmosphere coupling is highly correlated with ENSO variability, but do not explain the reasons for such variability. Variation in the relationship between SST changes and wind stress points to a role for changing stratification in the central equatorial Pacific in modulating ENSO amplitudes with stronger stratification reducing the response to winds. The main driving mechanism we have identified for higher ENSO variance are changes in the response of zonal winds to SST anomalies. The shifting convection and precipitation patterns associated with the changing state of the atmosphere also contribute to the variability of the regression coefficients. These mechanisms drive much of the variability in ENSO amplitude and hence ocean-atmosphere coupling in the tropical Pacific.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, S.-Y.; Chang, K.-P.; Graduate Institute of Clinical Medical Sciences, Chang Gung University, Linkou, Taiwan
Purpose: The presence of Epstein-Barr virus latent membrane protein-1 (LMP-1) gene in nasopharyngeal swabs indicates the presence of nasopharyngeal carcinoma (NPC) mucosal tumor cells. This study was undertaken to investigate whether the time taken for LMP-1 to disappear after initiation of primary radiotherapy (RT) was inversely associated with NPC local control. Methods and Materials: During July 1999 and October 2002, there were 127 nondisseminated NPC patients receiving serial examinations of nasopharyngeal swabbing with detection of LMP-1 during the RT course. The time for LMP-1 regression was defined as the number of days after initiation of RT for LMP-1 results tomore » turn negative. The primary outcome was local control, which was represented by freedom from local recurrence. Results: The time for LMP-1 regression showed a statistically significant influence on NPC local control both univariately (p < 0.0001) and multivariately (p = 0.004). In multivariate analysis, the administration of chemotherapy conferred a significantly more favorable local control (p = 0.03). Advanced T status ({>=} T2b), overall treatment time of external photon radiotherapy longer than 55 days, and older age showed trends toward being poor prognosticators. The time for LMP-1 regression was very heterogeneous. According to the quartiles of the time for LMP-1 regression, we defined the pattern of LMP-1 regression as late regression if it required 40 days or more. Kaplan-Meier plots indicated that the patients with late regression had a significantly worse local control than those with intermediate or early regression (p 0.0129). Conclusion: Among the potential prognostic factors examined in this study, the time for LMP-1 regression was the most independently significant factor that was inversely associated with NPC local control.« less
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications. PMID:27806075
Miguel-Hurtado, Oscar; Guest, Richard; Stevenage, Sarah V; Neil, Greg J; Black, Sue
2016-01-01
Understanding the relationship between physiological measurements from human subjects and their demographic data is important within both the biometric and forensic domains. In this paper we explore the relationship between measurements of the human hand and a range of demographic features. We assess the ability of linear regression and machine learning classifiers to predict demographics from hand features, thereby providing evidence on both the strength of relationship and the key features underpinning this relationship. Our results show that we are able to predict sex, height, weight and foot size accurately within various data-range bin sizes, with machine learning classification algorithms out-performing linear regression in most situations. In addition, we identify the features used to provide these relationships applicable across multiple applications.
Kumar, K Vasanth; Porkodi, K; Rocha, F
2008-01-15
A comparison of linear and non-linear regression method in selecting the optimum isotherm was made to the experimental equilibrium data of basic red 9 sorption by activated carbon. The r(2) was used to select the best fit linear theoretical isotherm. In the case of non-linear regression method, six error functions namely coefficient of determination (r(2)), hybrid fractional error function (HYBRID), Marquardt's percent standard deviation (MPSD), the average relative error (ARE), sum of the errors squared (ERRSQ) and sum of the absolute errors (EABS) were used to predict the parameters involved in the two and three parameter isotherms and also to predict the optimum isotherm. Non-linear regression was found to be a better way to obtain the parameters involved in the isotherms and also the optimum isotherm. For two parameter isotherm, MPSD was found to be the best error function in minimizing the error distribution between the experimental equilibrium data and predicted isotherms. In the case of three parameter isotherm, r(2) was found to be the best error function to minimize the error distribution structure between experimental equilibrium data and theoretical isotherms. The present study showed that the size of the error function alone is not a deciding factor to choose the optimum isotherm. In addition to the size of error function, the theory behind the predicted isotherm should be verified with the help of experimental data while selecting the optimum isotherm. A coefficient of non-determination, K(2) was explained and was found to be very useful in identifying the best error function while selecting the optimum isotherm.
Kuriakose, Saji; Joe, I Hubert
2013-11-01
Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC=0.00009% v/v). The lowest root mean square error of prediction (RMSEP=0.00016% v/v) in the test set and the highest coefficient of determination (R(2)=0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model. Copyright © 2013 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Kuriakose, Saji; Joe, I. Hubert
2013-11-01
Determination of the authenticity of essential oils has become more significant, in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach - full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC = 0.00009% v/v). The lowest root mean square error of prediction (RMSEP = 0.00016% v/v) in the test set and the highest coefficient of determination (R2 = 0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model.
Applied Multiple Linear Regression: A General Research Strategy
ERIC Educational Resources Information Center
Smith, Brandon B.
1969-01-01
Illustrates some of the basic concepts and procedures for using regression analysis in experimental design, analysis of variance, analysis of covariance, and curvilinear regression. Applications to evaluation of instruction and vocational education programs are illustrated. (GR)
Music, Mark; Finderle, Zarko; Cankar, Ksenija
2011-05-01
The aim of the present study was to investigate the effect of quantitatively measured cold perception (CP) thresholds on microcirculatory response to local cooling as measured by direct and indirect response of laser-Doppler (LD) flux during local cooling at different temperatures. The CP thresholds were measured in 18 healthy males using the Marstock method (thermode placed on the thenar). The direct (at the cooling site) and indirect (on contralateral hand) LD flux responses were recorded during immersion of the hand in a water bath at 20°C, 15°C, and 10°C. The cold perception threshold correlated (linear regression analysis, Pearson correlation) with the indirect LD flux response at cooling temperatures 20°C (r=0.782, p<0.01) and 15°C (r=0.605, p<0.01). In contrast, there was no correlation between the CP threshold and the indirect LD flux response during cooling in water at 10°C. The results demonstrate that during local cooling, depending on the cooling temperature used, cold perception threshold influences indirect LD flux response. Copyright © 2011 Elsevier Inc. All rights reserved.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Murphy, Kevin; Birn, Rasmus M.; Handwerker, Daniel A.; Jones, Tyler B.; Bandettini, Peter A.
2009-01-01
Low-frequency fluctuations in fMRI signal have been used to map several consistent resting state networks in the brain. Using the posterior cingulate cortex as a seed region, functional connectivity analyses have found not only positive correlations in the default mode network but negative correlations in another resting state network related to attentional processes. The interpretation is that the human brain is intrinsically organized into dynamic, anti-correlated functional networks. Global variations of the BOLD signal are often considered nuisance effects and are commonly removed using a general linear model (GLM) technique. This global signal regression method has been shown to introduce negative activation measures in standard fMRI analyses. The topic of this paper is whether such a correction technique could be the cause of anti-correlated resting state networks in functional connectivity analyses. Here we show that, after global signal regression, correlation values to a seed voxel must sum to a negative value. Simulations also show that small phase differences between regions can lead to spurious negative correlation values. A combination breath holding and visual task demonstrates that the relative phase of global and local signals can affect connectivity measures and that, experimentally, global signal regression leads to bell-shaped correlation value distributions, centred on zero. Finally, analyses of negatively correlated networks in resting state data show that global signal regression is most likely the cause of anti-correlations. These results call into question the interpretation of negatively correlated regions in the brain when using global signal regression as an initial processing step. PMID:18976716
Murphy, Kevin; Birn, Rasmus M; Handwerker, Daniel A; Jones, Tyler B; Bandettini, Peter A
2009-02-01
Low-frequency fluctuations in fMRI signal have been used to map several consistent resting state networks in the brain. Using the posterior cingulate cortex as a seed region, functional connectivity analyses have found not only positive correlations in the default mode network but negative correlations in another resting state network related to attentional processes. The interpretation is that the human brain is intrinsically organized into dynamic, anti-correlated functional networks. Global variations of the BOLD signal are often considered nuisance effects and are commonly removed using a general linear model (GLM) technique. This global signal regression method has been shown to introduce negative activation measures in standard fMRI analyses. The topic of this paper is whether such a correction technique could be the cause of anti-correlated resting state networks in functional connectivity analyses. Here we show that, after global signal regression, correlation values to a seed voxel must sum to a negative value. Simulations also show that small phase differences between regions can lead to spurious negative correlation values. A combination breath holding and visual task demonstrates that the relative phase of global and local signals can affect connectivity measures and that, experimentally, global signal regression leads to bell-shaped correlation value distributions, centred on zero. Finally, analyses of negatively correlated networks in resting state data show that global signal regression is most likely the cause of anti-correlations. These results call into question the interpretation of negatively correlated regions in the brain when using global signal regression as an initial processing step.
Spatial interpolation schemes of daily precipitation for hydrologic modeling
Hwang, Y.; Clark, M.R.; Rajagopalan, B.; Leavesley, G.
2012-01-01
Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observational points to the specific grid points. We compare and contrast the performance of regression-based statistical methods for the spatial estimation of precipitation in two hydrologically different basins and confirmed that widely used regression-based estimation schemes fail to describe the realistic spatial variability of daily precipitation field. The methods assessed are: (1) inverse distance weighted average; (2) multiple linear regression (MLR); (3) climatological MLR; and (4) locally weighted polynomial regression (LWP). In order to improve the performance of the interpolations, the authors propose a two-step regression technique for effective daily precipitation estimation. In this simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model before estimate the amount of precipitation separately on wet days. This process generated the precipitation occurrence, amount, and spatial correlation effectively. A distributed hydrologic model (PRMS) was used for the impact analysis in daily time step simulation. Multiple simulations suggested noticeable differences between the input alternatives generated by three different interpolation schemes. Differences are shown in overall simulation error against the observations, degree of explained variability, and seasonal volumes. Simulated streamflows also showed different characteristics in mean, maximum, minimum, and peak flows. Given the same parameter optimization technique, LWP input showed least streamflow error in Alapaha basin and CMLR input showed least error (still very close to LWP) in Animas basin. All of the two-step interpolation inputs resulted in lower streamflow error compared to the directly interpolated inputs. ?? 2011 Springer-Verlag.
Bennett, Bradley C; Husby, Chad E
2008-03-28
Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly-employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R(2) to from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
Wen, Cheng; Dallimer, Martin; Carver, Steve; Ziv, Guy
2018-05-06
Despite the great potential of mitigating carbon emission, development of wind farms is often opposed by local communities due to the visual impact on landscape. A growing number of studies have applied nonmarket valuation methods like Choice Experiments (CE) to value the visual impact by eliciting respondents' willingness to pay (WTP) or willingness to accept (WTA) for hypothetical wind farms through survey questions. Several meta-analyses have been found in the literature to synthesize results from different valuation studies, but they have various limitations related to the use of the prevailing multivariate meta-regression analysis. In this paper, we propose a new meta-analysis method to establish general functions for the relationships between the estimated WTP or WTA and three wind farm attributes, namely the distance to residential/coastal areas, the number of turbines and turbine height. This method involves establishing WTA or WTP functions for individual studies, fitting the average derivative functions and deriving the general integral functions of WTP or WTA against wind farm attributes. Results indicate that respondents in different studies consistently showed increasing WTP for moving wind farms to greater distances, which can be fitted by non-linear (natural logarithm) functions. However, divergent preferences for the number of turbines and turbine height were found in different studies. We argue that the new analysis method proposed in this paper is an alternative to the mainstream multivariate meta-regression analysis for synthesizing CE studies and the general integral functions of WTP or WTA against wind farm attributes are useful for future spatial modelling and benefit transfer studies. We also suggest that future multivariate meta-analyses should include non-linear components in the regression functions. Copyright © 2018. Published by Elsevier B.V.
Homan, Tobias; Maire, Nicolas; Hiscox, Alexandra; Di Pasquale, Aurelio; Kiche, Ibrahim; Onoka, Kelvin; Mweresa, Collins; Mukabana, Wolfgang R; Ross, Amanda; Smith, Thomas A; Takken, Willem
2016-01-04
Large reductions in malaria transmission and mortality have been achieved over the last decade, and this has mainly been attributed to the scale-up of long-lasting insecticidal bed nets and indoor residual spraying with insecticides. Despite these gains considerable residual, spatially heterogeneous, transmission remains. To reduce transmission in these foci, researchers need to consider the local demographical, environmental and social context, and design an appropriate set of interventions. Exploring spatially variable risk factors for malaria can give insight into which human and environmental characteristics play important roles in sustaining malaria transmission. On Rusinga Island, western Kenya, malaria infection was tested by rapid diagnostic tests during two cross-sectional surveys conducted 3 months apart in 3632 individuals from 790 households. For all households demographic data were collected by means of questionnaires. Environmental variables were derived using Quickbird satellite images. Analyses were performed on 81 project clusters constructed by a traveling salesman algorithm, each containing 50-51 households. A standard linear regression model was fitted containing multiple variables to determine how much of the spatial variation in malaria prevalence could be explained by the demographic and environmental data. Subsequently, a geographically-weighted regression (GWR) was performed assuming non-stationarity of risk factors. Special attention was taken to investigate the effect of residual spatial autocorrelation and local multicollinearity. Combining the data from both surveys, overall malaria prevalence was 24%. Scan statistics revealed two clusters which had significantly elevated numbers of malaria cases compared to the background prevalence across the rest of the study area. A multivariable linear model including environmental and household factors revealed that higher socioeconomic status, outdoor occupation and population density were associated with increased malaria risk. The local GWR model improved the model fit considerably and the relationship of malaria with risk factors was found to vary spatially over the island; in different areas of the island socio-economic status, outdoor occupation and population density were found to be positively or negatively associated with malaria prevalence. Identification of risk factors for malaria that vary geographically can provide insight into the local epidemiology of malaria. Examining spatially variable relationships can be a helpful tool in exploring which set of targeted interventions could locally be implemented. Supplementary malaria control may be directed at areas, which are identified as at risk. For instance, areas with many people that work outdoors at night may need more focus in terms of vector control. Trialregister.nl NTR3496-SolarMal, registered on 20 June 2012.
Jacob, Benjamin J; Krapp, Fiorella; Ponce, Mario; Gottuzzo, Eduardo; Griffith, Daniel A; Novak, Robert J
2010-05-01
Spatial autocorrelation is problematic for classical hierarchical cluster detection tests commonly used in multi-drug resistant tuberculosis (MDR-TB) analyses as considerable random error can occur. Therefore, when MDRTB clusters are spatially autocorrelated the assumption that the clusters are independently random is invalid. In this research, a product moment correlation coefficient (i.e., the Moran's coefficient) was used to quantify local spatial variation in multiple clinical and environmental predictor variables sampled in San Juan de Lurigancho, Lima, Peru. Initially, QuickBird 0.61 m data, encompassing visible bands and the near infra-red bands, were selected to synthesize images of land cover attributes of the study site. Data of residential addresses of individual patients with smear-positive MDR-TB were geocoded, prevalence rates calculated and then digitally overlaid onto the satellite data within a 2 km buffer of 31 georeferenced health centers, using a 10 m2 grid-based algorithm. Geographical information system (GIS)-gridded measurements of each health center were generated based on preliminary base maps of the georeferenced data aggregated to block groups and census tracts within each buffered area. A three-dimensional model of the study site was constructed based on a digital elevation model (DEM) to determine terrain covariates associated with the sampled MDR-TB covariates. Pearson's correlation was used to evaluate the linear relationship between the DEM and the sampled MDR-TB data. A SAS/GIS(R) module was then used to calculate univariate statistics and to perform linear and non-linear regression analyses using the sampled predictor variables. The estimates generated from a global autocorrelation analyses were then spatially decomposed into empirical orthogonal bases using a negative binomial regression with a non-homogeneous mean. Results of the DEM analyses indicated a statistically non-significant, linear relationship between georeferenced health centers and the sampled covariate elevation. The data exhibited positive spatial autocorrelation and the decomposition of Moran's coefficient into uncorrelated, orthogonal map pattern components revealed global spatial heterogeneities necessary to capture latent autocorrelation in the MDR-TB model. It was thus shown that Poisson regression analyses and spatial eigenvector mapping can elucidate the mechanics of MDR-TB transmission by prioritizing clinical and environmental-sampled predictor variables for identifying high risk populations.
An Application to the Prediction of LOD Change Based on General Regression Neural Network
NASA Astrophysics Data System (ADS)
Zhang, X. H.; Wang, Q. J.; Zhu, J. J.; Zhang, H.
2011-07-01
Traditional prediction of the LOD (length of day) change was based on linear models, such as the least square model and the autoregressive technique, etc. Due to the complex non-linear features of the LOD variation, the performances of the linear model predictors are not fully satisfactory. This paper applies a non-linear neural network - general regression neural network (GRNN) model to forecast the LOD change, and the results are analyzed and compared with those obtained with the back propagation neural network and other models. The comparison shows that the performance of the GRNN model in the prediction of the LOD change is efficient and feasible.
DOT National Transportation Integrated Search
2016-09-01
We consider the problem of solving mixed random linear equations with k components. This is the noiseless setting of mixed linear regression. The goal is to estimate multiple linear models from mixed samples in the case where the labels (which sample...
Linear regression techniques for use in the EC tracer method of secondary organic aerosol estimation
NASA Astrophysics Data System (ADS)
Saylor, Rick D.; Edgerton, Eric S.; Hartsell, Benjamin E.
A variety of linear regression techniques and simple slope estimators are evaluated for use in the elemental carbon (EC) tracer method of secondary organic carbon (OC) estimation. Linear regression techniques based on ordinary least squares are not suitable for situations where measurement uncertainties exist in both regressed variables. In the past, regression based on the method of Deming [1943. Statistical Adjustment of Data. Wiley, London] has been the preferred choice for EC tracer method parameter estimation. In agreement with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], we find that in the limited case where primary non-combustion OC (OC non-comb) is assumed to be zero, the ratio of averages (ROA) approach provides a stable and reliable estimate of the primary OC-EC ratio, (OC/EC) pri. In contrast with Chu [2005. Stable estimate of primary OC/EC ratios in the EC tracer method. Atmospheric Environment 39, 1383-1392], however, we find that the optimal use of Deming regression (and the more general York et al. [2004. Unified equations for the slope, intercept, and standard errors of the best straight line. American Journal of Physics 72, 367-375] regression) provides excellent results as well. For the more typical case where OC non-comb is allowed to obtain a non-zero value, we find that regression based on the method of York is the preferred choice for EC tracer method parameter estimation. In the York regression technique, detailed information on uncertainties in the measurement of OC and EC is used to improve the linear best fit to the given data. If only limited information is available on the relative uncertainties of OC and EC, then Deming regression should be used. On the other hand, use of ROA in the estimation of secondary OC, and thus the assumption of a zero OC non-comb value, generally leads to an overestimation of the contribution of secondary OC to total measured OC.
Chen, Qihong; Long, Rong; Quan, Shuhai
2014-01-01
This paper presents a neural network predictive control strategy to optimize power distribution for a fuel cell/ultracapacitor hybrid power system of a robot. We model the nonlinear power system by employing time variant auto-regressive moving average with exogenous (ARMAX), and using recurrent neural network to represent the complicated coefficients of the ARMAX model. Because the dynamic of the system is viewed as operating- state- dependent time varying local linear behavior in this frame, a linear constrained model predictive control algorithm is developed to optimize the power splitting between the fuel cell and ultracapacitor. The proposed algorithm significantly simplifies implementation of the controller and can handle multiple constraints, such as limiting substantial fluctuation of fuel cell current. Experiment and simulation results demonstrate that the control strategy can optimally split power between the fuel cell and ultracapacitor, limit the change rate of the fuel cell current, and so as to extend the lifetime of the fuel cell. PMID:24707206
An Improved Incremental Learning Approach for KPI Prognosis of Dynamic Fuel Cell System.
Yin, Shen; Xie, Xiaochen; Lam, James; Cheung, Kie Chung; Gao, Huijun
2016-12-01
The key performance indicator (KPI) has an important practical value with respect to the product quality and economic benefits for modern industry. To cope with the KPI prognosis issue under nonlinear conditions, this paper presents an improved incremental learning approach based on available process measurements. The proposed approach takes advantage of the algorithm overlapping of locally weighted projection regression (LWPR) and partial least squares (PLS), implementing the PLS-based prognosis in each locally linear model produced by the incremental learning process of LWPR. The global prognosis results including KPI prediction and process monitoring are obtained from the corresponding normalized weighted means of all the local models. The statistical indicators for prognosis are enhanced as well by the design of novel KPI-related and KPI-unrelated statistics with suitable control limits for non-Gaussian data. For application-oriented purpose, the process measurements from real datasets of a proton exchange membrane fuel cell system are employed to demonstrate the effectiveness of KPI prognosis. The proposed approach is finally extended to a long-term voltage prediction for potential reference of further fuel cell applications.
Issel, L Michele; Olorunsaiye, Comfort; Snebold, Laura; Handler, Arden
2015-04-01
We explored the relationships between local health department (LHD) structure, capacity, and macro-context variables and performance of essential public health services (EPHS). In 2012, we assessed a stratified, random sample of 195 LHDs that provided data via an online survey regarding performance of EPHS, the services provided or contracted out, the financial strategies used in response to budgetary pressures, and the extent of collaborations. We performed weighted analyses that included analysis of variance, pairwise correlations by jurisdiction population size, and linear regressions. On average, LHDs provided approximately 13 (36%) of 35 possible services either directly or by contract. Rather than cut services or externally consolidating, LHDs took steps to generate more revenue and maximize capacity. Higher LHD performance of EPHS was significantly associated with delivering more services, initiating more financial strategies, and engaging in collaboration, after adjusting for the effects of the Affordable Care Act and jurisdiction size. During changing economic and health care environments, we found that strong structural capacity enhanced local health department EPHS performance for maternal, child, and adolescent health.
Yang, Xiaowei; Nie, Kun
2008-03-15
Longitudinal data sets in biomedical research often consist of large numbers of repeated measures. In many cases, the trajectories do not look globally linear or polynomial, making it difficult to summarize the data or test hypotheses using standard longitudinal data analysis based on various linear models. An alternative approach is to apply the approaches of functional data analysis, which directly target the continuous nonlinear curves underlying discretely sampled repeated measures. For the purposes of data exploration, many functional data analysis strategies have been developed based on various schemes of smoothing, but fewer options are available for making causal inferences regarding predictor-outcome relationships, a common task seen in hypothesis-driven medical studies. To compare groups of curves, two testing strategies with good power have been proposed for high-dimensional analysis of variance: the Fourier-based adaptive Neyman test and the wavelet-based thresholding test. Using a smoking cessation clinical trial data set, this paper demonstrates how to extend the strategies for hypothesis testing into the framework of functional linear regression models (FLRMs) with continuous functional responses and categorical or continuous scalar predictors. The analysis procedure consists of three steps: first, apply the Fourier or wavelet transform to the original repeated measures; then fit a multivariate linear model in the transformed domain; and finally, test the regression coefficients using either adaptive Neyman or thresholding statistics. Since a FLRM can be viewed as a natural extension of the traditional multiple linear regression model, the development of this model and computational tools should enhance the capacity of medical statistics for longitudinal data.
NASA Astrophysics Data System (ADS)
Gonçalves, Karen dos Santos; Winkler, Mirko S.; Benchimol-Barbosa, Paulo Roberto; de Hoogh, Kees; Artaxo, Paulo Eduardo; de Souza Hacon, Sandra; Schindler, Christian; Künzli, Nino
2018-07-01
Epidemiological studies generally use particulate matter measurements with diameter less 2.5 μm (PM2.5) from monitoring networks. Satellite aerosol optical depth (AOD) data has considerable potential in predicting PM2.5 concentrations, and thus provides an alternative method for producing knowledge regarding the level of pollution and its health impact in areas where no ground PM2.5 measurements are available. This is the case in the Brazilian Amazon rainforest region where forest fires are frequent sources of high pollution. In this study, we applied a non-linear model for predicting PM2.5 concentration from AOD retrievals using interaction terms between average temperature, relative humidity, sine, cosine of date in a period of 365,25 days and the square of the lagged relative residual. Regression performance statistics were tested comparing the goodness of fit and R2 based on results from linear regression and non-linear regression for six different models. The regression results for non-linear prediction showed the best performance, explaining on average 82% of the daily PM2.5 concentrations when considering the whole period studied. In the context of Amazonia, it was the first study predicting PM2.5 concentrations using the latest high-resolution AOD products also in combination with the testing of a non-linear model performance. Our results permitted a reliable prediction considering the AOD-PM2.5 relationship and set the basis for further investigations on air pollution impacts in the complex context of Brazilian Amazon Region.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, Aidan P.; Swiler, Laura P.; Trott, Christian R.
2015-03-15
Here, we present a new interatomic potential for solids and liquids called Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected onto a basis of hyperspherical harmonics in four dimensions. The bispectrum components are the same bond-orientational order parameters employed by the GAP potential [1].more » The SNAP potential, unlike GAP, assumes a linear relationship between atom energy and bispectrum components. The linear SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. We demonstrate that a previously unnoticed symmetry property can be exploited to reduce the computational cost of the force calculations by more than one order of magnitude. We present results for a SNAP potential for tantalum, showing that it accurately reproduces a range of commonly calculated properties of both the crystalline solid and the liquid phases. In addition, unlike simpler existing potentials, SNAP correctly predicts the energy barrier for screw dislocation migration in BCC tantalum.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, A.P., E-mail: athomps@sandia.gov; Swiler, L.P., E-mail: lpswile@sandia.gov; Trott, C.R., E-mail: crtrott@sandia.gov
2015-03-15
We present a new interatomic potential for solids and liquids called Spectral Neighbor Analysis Potential (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projected onto a basis of hyperspherical harmonics in four dimensions. The bispectrum components are the same bond-orientational order parameters employed by the GAP potential [1]. Themore » SNAP potential, unlike GAP, assumes a linear relationship between atom energy and bispectrum components. The linear SNAP coefficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. We demonstrate that a previously unnoticed symmetry property can be exploited to reduce the computational cost of the force calculations by more than one order of magnitude. We present results for a SNAP potential for tantalum, showing that it accurately reproduces a range of commonly calculated properties of both the crystalline solid and the liquid phases. In addition, unlike simpler existing potentials, SNAP correctly predicts the energy barrier for screw dislocation migration in BCC tantalum.« less
NASA Astrophysics Data System (ADS)
Li, X.; Gao, M.
2017-12-01
The magnitude of an earthquake is one of its basic parameters and is a measure of its scale. It plays a significant role in seismology and earthquake engineering research, particularly in the calculations of the seismic rate and b value in earthquake prediction and seismic hazard analysis. However, several current types of magnitudes used in seismology research, such as local magnitude (ML), surface wave magnitude (MS), and body-wave magnitude (MB), have a common limitation, which is the magnitude saturation phenomenon. Fortunately, the problem of magnitude saturation was solved by a formula for calculating the seismic moment magnitude (MW) based on the seismic moment, which describes the seismic source strength. Now the moment magnitude is very commonly used in seismology research. However, in China, the earthquake scale is primarily based on local and surface-wave magnitudes. In the present work, we studied the empirical relationships between moment magnitude (MW) and local magnitude (ML) as well as surface wave magnitude (MS) in the Chinese Mainland. The China Earthquake Networks Center (CENC) ML catalog, China Seismograph Network (CSN) MS catalog, ANSS Comprehensive Earthquake Catalog (ComCat), and Global Centroid Moment Tensor (GCMT) are adopted to regress the relationships using the orthogonal regression method. The obtained relationships are as follows: MW=0.64+0.87MS; MW=1.16+0.75ML. Therefore, in China, if the moment magnitude of an earthquake is not reported by any agency in the world, we can use the equations mentioned above for converting ML to MW and MS to MW. These relationships are very important, because they will allow the China earthquake catalogs to be used more effectively for seismic hazard analysis, earthquake prediction, and other seismology research. We also computed the relationships of and (where Mo is the seismic moment) by linear regression using the Global Centroid Moment Tensor. The obtained relationships are as follows: logMo=18.21+1.05ML; logMo=17.04+1.32MS. This formula can be used by seismologists to convert the ML/MS of Chinese mainland events into their seismic moments.
Application of Local Linear Embedding to Nonlinear Exploratory Latent Structure Analysis
ERIC Educational Resources Information Center
Wang, Haonan; Iyer, Hari
2007-01-01
In this paper we discuss the use of a recent dimension reduction technique called Locally Linear Embedding, introduced by Roweis and Saul, for performing an exploratory latent structure analysis. The coordinate variables from the locally linear embedding describing the manifold on which the data reside serve as the latent variable scores. We…
Senn, Stephen; Graf, Erika; Caputo, Angelika
2007-12-30
Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence the coarsening property of the propensity score-adjustment for a one-dimensional confounder instead of a high-dimensional covariate-may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome. Copyright (c) 2007 John Wiley & Sons, Ltd.
Procedures for adjusting regional regression models of urban-runoff quality using local data
Hoos, A.B.; Sisolak, J.K.
1993-01-01
Statistical operations termed model-adjustment procedures (MAP?s) can be used to incorporate local data into existing regression models to improve the prediction of urban-runoff quality. Each MAP is a form of regression analysis in which the local data base is used as a calibration data set. Regression coefficients are determined from the local data base, and the resulting `adjusted? regression models can then be used to predict storm-runoff quality at unmonitored sites. The response variable in the regression analyses is the observed load or mean concentration of a constituent in storm runoff for a single storm. The set of explanatory variables used in the regression analyses is different for each MAP, but always includes the predicted value of load or mean concentration from a regional regression model. The four MAP?s examined in this study were: single-factor regression against the regional model prediction, P, (termed MAP-lF-P), regression against P,, (termed MAP-R-P), regression against P, and additional local variables (termed MAP-R-P+nV), and a weighted combination of P, and a local-regression prediction (termed MAP-W). The procedures were tested by means of split-sample analysis, using data from three cities included in the Nationwide Urban Runoff Program: Denver, Colorado; Bellevue, Washington; and Knoxville, Tennessee. The MAP that provided the greatest predictive accuracy for the verification data set differed among the three test data bases and among model types (MAP-W for Denver and Knoxville, MAP-lF-P and MAP-R-P for Bellevue load models, and MAP-R-P+nV for Bellevue concentration models) and, in many cases, was not clearly indicated by the values of standard error of estimate for the calibration data set. A scheme to guide MAP selection, based on exploratory data analysis of the calibration data set, is presented and tested. The MAP?s were tested for sensitivity to the size of a calibration data set. As expected, predictive accuracy of all MAP?s for the verification data set decreased as the calibration data-set size decreased, but predictive accuracy was not as sensitive for the MAP?s as it was for the local regression models.
Non-Linear Approach in Kinesiology Should Be Preferred to the Linear--A Case of Basketball.
Trninić, Marko; Jeličić, Mario; Papić, Vladan
2015-07-01
In kinesiology, medicine, biology and psychology, in which research focus is on dynamical self-organized systems, complex connections exist between variables. Non-linear nature of complex systems has been discussed and explained by the example of non-linear anthropometric predictors of performance in basketball. Previous studies interpreted relations between anthropometric features and measures of effectiveness in basketball by (a) using linear correlation models, and by (b) including all basketball athletes in the same sample of participants regardless of their playing position. In this paper the significance and character of linear and non-linear relations between simple anthropometric predictors (AP) and performance criteria consisting of situation-related measures of effectiveness (SE) in basketball were determined and evaluated. The sample of participants consisted of top-level junior basketball players divided in three groups according to their playing time (8 minutes and more per game) and playing position: guards (N = 42), forwards (N = 26) and centers (N = 40). Linear (general model) and non-linear (general model) regression models were calculated simultaneously and separately for each group. The conclusion is viable: non-linear regressions are frequently superior to linear correlations when interpreting actual association logic among research variables.
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A.
2013-01-01
Background Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. Objective We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Design Using cross-sectional data for children aged 0–24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. Results At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Conclusions Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role. PMID:24223839
Fenske, Nora; Burns, Jacob; Hothorn, Torsten; Rehfuess, Eva A
2013-01-01
Most attempts to address undernutrition, responsible for one third of global child deaths, have fallen behind expectations. This suggests that the assumptions underlying current modelling and intervention practices should be revisited. We undertook a comprehensive analysis of the determinants of child stunting in India, and explored whether the established focus on linear effects of single risks is appropriate. Using cross-sectional data for children aged 0-24 months from the Indian National Family Health Survey for 2005/2006, we populated an evidence-based diagram of immediate, intermediate and underlying determinants of stunting. We modelled linear, non-linear, spatial and age-varying effects of these determinants using additive quantile regression for four quantiles of the Z-score of standardized height-for-age and logistic regression for stunting and severe stunting. At least one variable within each of eleven groups of determinants was significantly associated with height-for-age in the 35% Z-score quantile regression. The non-modifiable risk factors child age and sex, and the protective factors household wealth, maternal education and BMI showed the largest effects. Being a twin or multiple birth was associated with dramatically decreased height-for-age. Maternal age, maternal BMI, birth order and number of antenatal visits influenced child stunting in non-linear ways. Findings across the four quantile and two logistic regression models were largely comparable. Our analysis confirms the multifactorial nature of child stunting. It emphasizes the need to pursue a systems-based approach and to consider non-linear effects, and suggests that differential effects across the height-for-age distribution do not play a major role.
González-Aparicio, I; Hidalgo, J; Baklanov, A; Padró, A; Santa-Coloma, O
2013-07-01
There is extensive evidence of the negative impacts on health linked to the rise of the regional background of particulate matter (PM) 10 levels. These levels are often increased over urban areas becoming one of the main air pollution concerns. This is the case on the Bilbao metropolitan area, Spain. This study describes a data-driven model to diagnose PM10 levels in Bilbao at hourly intervals. The model is built with a training period of 7-year historical data covering different urban environments (inland, city centre and coastal sites). The explanatory variables are quantitative-log [NO2], temperature, short-wave incoming radiation, wind speed and direction, specific humidity, hour and vehicle intensity-and qualitative-working days/weekends, season (winter/summer), the hour (from 00 to 23 UTC) and precipitation/no precipitation. Three different linear regression models are compared: simple linear regression; linear regression with interaction terms (INT); and linear regression with interaction terms following the Sawa's Bayesian Information Criteria (INT-BIC). Each type of model is calculated selecting two different periods: the training (it consists of 6 years) and the testing dataset (it consists of 1 year). The results of each type of model show that the INT-BIC-based model (R(2) = 0.42) is the best. Results were R of 0.65, 0.63 and 0.60 for the city centre, inland and coastal sites, respectively, a level of confidence similar to the state-of-the art methodology. The related error calculated for longer time intervals (monthly or seasonal means) diminished significantly (R of 0.75-0.80 for monthly means and R of 0.80 to 0.98 at seasonally means) with respect to shorter periods.
O'Leary, Neil; Chauhan, Balwantray C; Artes, Paul H
2012-10-01
To establish a method for estimating the overall statistical significance of visual field deterioration from an individual patient's data, and to compare its performance to pointwise linear regression. The Truncated Product Method was used to calculate a statistic S that combines evidence of deterioration from individual test locations in the visual field. The overall statistical significance (P value) of visual field deterioration was inferred by comparing S with its permutation distribution, derived from repeated reordering of the visual field series. Permutation of pointwise linear regression (PoPLR) and pointwise linear regression were evaluated in data from patients with glaucoma (944 eyes, median mean deviation -2.9 dB, interquartile range: -6.3, -1.2 dB) followed for more than 4 years (median 10 examinations over 8 years). False-positive rates were estimated from randomly reordered series of this dataset, and hit rates (proportion of eyes with significant deterioration) were estimated from the original series. The false-positive rates of PoPLR were indistinguishable from the corresponding nominal significance levels and were independent of baseline visual field damage and length of follow-up. At P < 0.05, the hit rates of PoPLR were 12, 29, and 42%, at the fifth, eighth, and final examinations, respectively, and at matching specificities they were consistently higher than those of pointwise linear regression. In contrast to population-based progression analyses, PoPLR provides a continuous estimate of statistical significance for visual field deterioration individualized to a particular patient's data. This allows close control over specificity, essential for monitoring patients in clinical practice and in clinical trials.
ERIC Educational Resources Information Center
Liou, Pey-Yan
2009-01-01
The current study examines three regression models: OLS (ordinary least square) linear regression, Poisson regression, and negative binomial regression for analyzing count data. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
Use of AMMI and linear regression models to analyze genotype-environment interaction in durum wheat.
Nachit, M M; Nachit, G; Ketata, H; Gauch, H G; Zobel, R W
1992-03-01
The joint durum wheat (Triticum turgidum L var 'durum') breeding program of the International Maize and Wheat Improvement Center (CIMMYT) and the International Center for Agricultural Research in the Dry Areas (ICARDA) for the Mediterranean region employs extensive multilocation testing. Multilocation testing produces significant genotype-environment (GE) interaction that reduces the accuracy for estimating yield and selecting appropriate germ plasm. The sum of squares (SS) of GE interaction was partitioned by linear regression techniques into joint, genotypic, and environmental regressions, and by Additive Main effects and the Multiplicative Interactions (AMMI) model into five significant Interaction Principal Component Axes (IPCA). The AMMI model was more effective in partitioning the interaction SS than the linear regression technique. The SS contained in the AMMI model was 6 times higher than the SS for all three regressions. Postdictive assessment recommended the use of the first five IPCA axes, while predictive assessment AMMI1 (main effects plus IPCA1). After elimination of random variation, AMMI1 estimates for genotypic yields within sites were more precise than unadjusted means. This increased precision was equivalent to increasing the number of replications by a factor of 3.7.
Lorenzo-Seva, Urbano; Ferrando, Pere J
2011-03-01
We provide an SPSS program that implements currently recommended techniques and recent developments for selecting variables in multiple linear regression analysis via the relative importance of predictors. The approach consists of: (1) optimally splitting the data for cross-validation, (2) selecting the final set of predictors to be retained in the equation regression, and (3) assessing the behavior of the chosen model using standard indices and procedures. The SPSS syntax, a short manual, and data files related to this article are available as supplemental materials from brm.psychonomic-journals.org/content/supplemental.
NASA Astrophysics Data System (ADS)
Gusriani, N.; Firdaniza
2018-03-01
The existence of outliers on multiple linear regression analysis causes the Gaussian assumption to be unfulfilled. If the Least Square method is forcedly used on these data, it will produce a model that cannot represent most data. For that, we need a robust regression method against outliers. This paper will compare the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contains outliers. Based on the robust determinant coefficient value, MCD method produces a better model compared to TELBS method.
Orthogonal Projection in Teaching Regression and Financial Mathematics
ERIC Educational Resources Information Center
Kachapova, Farida; Kachapov, Ilias
2010-01-01
Two improvements in teaching linear regression are suggested. The first is to include the population regression model at the beginning of the topic. The second is to use a geometric approach: to interpret the regression estimate as an orthogonal projection and the estimation error as the distance (which is minimized by the projection). Linear…
Logistic models--an odd(s) kind of regression.
Jupiter, Daniel C
2013-01-01
The logistic regression model bears some similarity to the multivariable linear regression with which we are familiar. However, the differences are great enough to warrant a discussion of the need for and interpretation of logistic regression. Copyright © 2013 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
Wang, Zengjian; Zhang, Delong; Liang, Bishan; Chang, Song; Pan, Jinghua; Huang, Ruiwang; Liu, Ming
2016-01-01
Biological motion perception (BMP) refers to the ability to perceive the moving form of a human figure from a limited amount of stimuli, such as from a few point lights located on the joints of a moving body. BMP is commonplace and important, but there is great inter-individual variability in this ability. This study used multiple regression model analysis to explore the association between BMP performance and intrinsic brain activity, in order to investigate the neural substrates underlying inter-individual variability of BMP performance. The resting-state functional magnetic resonance imaging (rs-fMRI) and BMP performance data were collected from 24 healthy participants, for whom intrinsic brain networks were constructed, and a graph-based network efficiency metric was measured. Then, a multiple linear regression model was used to explore the association between network regional efficiency and BMP performance. We found that the local and global network efficiency of many regions was significantly correlated with BMP performance. Further analysis showed that the local efficiency rather than global efficiency could be used to explain most of the BMP inter-individual variability, and the regions involved were predominately located in the Default Mode Network (DMN). Additionally, discrimination analysis showed that the local efficiency of certain regions such as the thalamus could be used to classify BMP performance across participants. Notably, the association pattern between network nodal efficiency and BMP was different from the association pattern of static directional/gender information perception. Overall, these findings show that intrinsic brain network efficiency may be considered a neural factor that explains BMP inter-individual variability. PMID:27853427
Analysis of Learning Curve Fitting Techniques.
1987-09-01
1986. 15. Neter, John and others. Applied Linear Regression Models. Homewood IL: Irwin, 19-33. 16. SAS User’s Guide: Basics, Version 5 Edition. SAS... Linear Regression Techniques (15:23-52). Random errors are assumed to be normally distributed when using -# ordinary least-squares, according to Johnston...lot estimated by the improvement curve formula. For a more detailed explanation of the ordinary least-squares technique, see Neter, et. al., Applied
On vertical profile of ozone at Syowa
NASA Technical Reports Server (NTRS)
Chubachi, Shigeru
1994-01-01
The difference in the vertical ozone profile at Syowa between 1966-1981 and 1982-1988 is shown. The month-height cross section of the slope of the linear regressions between ozone partial pressure and 100-mb temperature is also shown. The vertically integrated values of the slopes are in close agreement with the slopes calculated by linear regression of Dobson total ozone on 100-mb temperature in the period of 1982-1988.
Kovačević, Strahinja; Karadžić, Milica; Podunavac-Kuzmanović, Sanja; Jevrić, Lidija
2018-01-01
The present study is based on the quantitative structure-activity relationship (QSAR) analysis of binding affinity toward human prion protein (huPrP C ) of quinacrine, pyridine dicarbonitrile, diphenylthiazole and diphenyloxazole analogs applying different linear and non-linear chemometric regression techniques, including univariate linear regression, multiple linear regression, partial least squares regression and artificial neural networks. The QSAR analysis distinguished molecular lipophilicity as an important factor that contributes to the binding affinity. Principal component analysis was used in order to reveal similarities or dissimilarities among the studied compounds. The analysis of in silico absorption, distribution, metabolism, excretion and toxicity (ADMET) parameters was conducted. The ranking of the studied analogs on the basis of their ADMET parameters was done applying the sum of ranking differences, as a relatively new chemometric method. The main aim of the study was to reveal the most important molecular features whose changes lead to the changes in the binding affinities of the studied compounds. Another point of view on the binding affinity of the most promising analogs was established by application of molecular docking analysis. The results of the molecular docking were proven to be in agreement with the experimental outcome. Copyright © 2017 Elsevier B.V. All rights reserved.
Classification of sodium MRI data of cartilage using machine learning.
Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R
2015-11-01
To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interests in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each regions of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interests for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
Claessens, T E; Georgakopoulos, D; Afanasyeva, M; Vermeersch, S J; Millar, H D; Stergiopulos, N; Westerhof, N; Verdonck, P R; Segers, P
2006-04-01
The linear time-varying elastance theory is frequently used to describe the change in ventricular stiffness during the cardiac cycle. The concept assumes that all isochrones (i.e., curves that connect pressure-volume data occurring at the same time) are linear and have a common volume intercept. Of specific interest is the steepest isochrone, the end-systolic pressure-volume relationship (ESPVR), of which the slope serves as an index for cardiac contractile function. Pressure-volume measurements, achieved with a combined pressure-conductance catheter in the left ventricle of 13 open-chest anesthetized mice, showed a marked curvilinearity of the isochrones. We therefore analyzed the shape of the isochrones by using six regression algorithms (two linear, two quadratic, and two logarithmic, each with a fixed or time-varying intercept) and discussed the consequences for the elastance concept. Our main observations were 1) the volume intercept varies considerably with time; 2) isochrones are equally well described by using quadratic or logarithmic regression; 3) linear regression with a fixed intercept shows poor correlation (R(2) < 0.75) during isovolumic relaxation and early filling; and 4) logarithmic regression is superior in estimating the fixed volume intercept of the ESPVR. In conclusion, the linear time-varying elastance fails to provide a sufficiently robust model to account for changes in pressure and volume during the cardiac cycle in the mouse ventricle. A new framework accounting for the nonlinear shape of the isochrones needs to be developed.
Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M
2017-06-01
The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.
Application of General Regression Neural Network to the Prediction of LOD Change
NASA Astrophysics Data System (ADS)
Zhang, Xiao-Hong; Wang, Qi-Jie; Zhu, Jian-Jun; Zhang, Hao
2012-01-01
Traditional methods for predicting the change in length of day (LOD change) are mainly based on some linear models, such as the least square model and autoregression model, etc. However, the LOD change comprises complicated non-linear factors and the prediction effect of the linear models is always not so ideal. Thus, a kind of non-linear neural network — general regression neural network (GRNN) model is tried to make the prediction of the LOD change and the result is compared with the predicted results obtained by taking advantage of the BP (back propagation) neural network model and other models. The comparison result shows that the application of the GRNN to the prediction of the LOD change is highly effective and feasible.
[Visual field progression in glaucoma: cluster analysis].
Bresson-Dumont, H; Hatton, J; Foucher, J; Fonteneau, M
2012-11-01
Visual field progression analysis is one of the key points in glaucoma monitoring, but distinction between true progression and random fluctuation is sometimes difficult. There are several different algorithms but no real consensus for detecting visual field progression. The trend analysis of global indices (MD, sLV) may miss localized deficits or be affected by media opacities. Conversely, point-by-point analysis makes progression difficult to differentiate from physiological variability, particularly when the sensitivity of a point is already low. The goal of our study was to analyse visual field progression with the EyeSuite™ Octopus Perimetry Clusters algorithm in patients with no significant changes in global indices or worsening of the analysis of pointwise linear regression. We analyzed the visual fields of 162 eyes (100 patients - 58 women, 42 men, average age 66.8 ± 10.91) with ocular hypertension or glaucoma. For inclusion, at least six reliable visual fields per eye were required, and the trend analysis (EyeSuite™ Perimetry) of visual field global indices (MD and SLV), could show no significant progression. The analysis of changes in cluster mode was then performed. In a second step, eyes with statistically significant worsening of at least one of their clusters were analyzed point-by-point with the Octopus Field Analysis (OFA). Fifty four eyes (33.33%) had a significant worsening in some clusters, while their global indices remained stable over time. In this group of patients, more advanced glaucoma was present than in stable group (MD 6.41 dB vs. 2.87); 64.82% (35/54) of those eyes in which the clusters progressed, however, had no statistically significant change in the trend analysis by pointwise linear regression. Most software algorithms for analyzing visual field progression are essentially trend analyses of global indices, or point-by-point linear regression. This study shows the potential role of analysis by clusters trend. However, for best results, it is preferable to compare the analyses of several tests in combination with morphologic exam. Copyright © 2012 Elsevier Masson SAS. All rights reserved.
Wu, F; Callisaya, M; Laslett, L L; Wills, K; Zhou, Y; Jones, G; Winzenberg, T
2016-07-01
This was the first study investigating both linear associations between lower limb muscle strength and balance in middle-aged women and the potential for thresholds for the associations. There was strong evidence that even in middle-aged women, poorer LMS was associated with reduced balance. However, no evidence was found for thresholds. Decline in balance begins in middle age, yet, the role of muscle strength in balance is rarely examined in this age group. We aimed to determine the association between lower limb muscle strength (LMS) and balance in middle-aged women and investigate whether cut-points of LMS exist that might identify women at risk of poorer balance. Cross-sectional analysis of 345 women aged 36-57 years was done. Associations between LMS and balance tests (timed up and go (TUG), step test (ST), functional reach test (FRT), and lateral reach test (LRT)) were assessed using linear regression. Nonlinear associations were explored using locally weighted regression smoothing (LOWESS) and potential cut-points identified using nonlinear least-squares estimation. Segmented regression was used to estimate associations above and below the identified cut-points. Weaker LMS was associated with poorer performance on the TUG (β -0.008 (95 % CI: -0.010, -0.005) second/kg), ST (β 0.031 (0.011, 0.051) step/kg), FRT (β 0.071 (0.047, 0.096) cm/kg), and LRT (β 0.028 (0.011, 0.044) cm/kg), independent of confounders. Potential nonlinear associations were evident from LOWESS results; significant cut-points of LMS were identified for all balance tests (29-50 kg). However, excepting ST, cut-points did not persist after excluding potentially influential data points. In middle-aged women, poorer LMS is associated with reduced balance. Therefore, improving muscle strength in middle-age may be a useful strategy to improve balance and reduce falls risk in later life. Middle-aged women with low muscle strength may be an effective target group for future randomized controlled trials. Australian New Zealand Clinical Trials Registry (ANZCTR) NCT00273260.
The Evaluation on the Cadmium Net Concentration for Soil Ecosystems.
Yao, Yu; Wang, Pei-Fang; Wang, Chao; Hou, Jun; Miao, Ling-Zhan
2017-03-12
Yixing, known as the "City of Ceramics", is facing a new dilemma: a raw material crisis. Cadmium (Cd) exists in extremely high concentrations in soil due to the considerable input of industrial wastewater into the soil ecosystem. The in situ technique of diffusive gradients in thin film (DGT), the ex situ static equilibrium approach (HAc, EDTA and CaCl2), and the dissolved concentration in soil solution, as well as microwave digestion, were applied to predict the Cd bioavailability of soil, aiming to provide a robust and accurate method for Cd bioavailability evaluation in Yixing. Moreover, the typical local cash crops-paddy and zizania aquatica-were selected for Cd accumulation, aiming to select the ideal plants with tolerance to the soil Cd contamination. The results indicated that the biomasses of the two applied plants were sufficiently sensitive to reflect the stark regional differences of different sampling sites. The zizania aquatica could effectively reduce the total Cd concentration, as indicated by the high accumulation coefficients. However, the fact that the zizania aquatica has extremely high transfer coefficients, and its stem, as the edible part, might accumulate large amounts of Cd, led to the conclusion that zizania aquatica was not an ideal cash crop in Yixing. Furthermore, the labile Cd concentrations which were obtained by the DGT technique and dissolved in the soil solution showed a significant correlation with the Cd concentrations of the biota accumulation. However, the ex situ methods and the microwave digestion-obtained Cd concentrations showed a poor correlation with the accumulated Cd concentration in plant tissue. Correspondingly, the multiple linear regression models were built for fundamental analysis of the performance of different methods available for Cd bioavailability evaluation. The correlation coefficients of DGT obtained by the improved multiple linear regression model have not significantly improved compared to the coefficients obtained by the simple linear regression model. The results revealed that DGT was a robust measurement, which could obtain the labile Cd concentrations independent of the physicochemical features' variation in the soil ecosystem. Consequently, these findings provide stronger evidence that DGT is an effective and ideal tool for labile Cd evaluation in Yixing.
The Evaluation on the Cadmium Net Concentration for Soil Ecosystems
Yao, Yu; Wang, Pei-Fang; Wang, Chao; Hou, Jun; Miao, Ling-Zhan
2017-01-01
Yixing, known as the “City of Ceramics”, is facing a new dilemma: a raw material crisis. Cadmium (Cd) exists in extremely high concentrations in soil due to the considerable input of industrial wastewater into the soil ecosystem. The in situ technique of diffusive gradients in thin film (DGT), the ex situ static equilibrium approach (HAc, EDTA and CaCl2), and the dissolved concentration in soil solution, as well as microwave digestion, were applied to predict the Cd bioavailability of soil, aiming to provide a robust and accurate method for Cd bioavailability evaluation in Yixing. Moreover, the typical local cash crops—paddy and zizania aquatica—were selected for Cd accumulation, aiming to select the ideal plants with tolerance to the soil Cd contamination. The results indicated that the biomasses of the two applied plants were sufficiently sensitive to reflect the stark regional differences of different sampling sites. The zizania aquatica could effectively reduce the total Cd concentration, as indicated by the high accumulation coefficients. However, the fact that the zizania aquatica has extremely high transfer coefficients, and its stem, as the edible part, might accumulate large amounts of Cd, led to the conclusion that zizania aquatica was not an ideal cash crop in Yixing. Furthermore, the labile Cd concentrations which were obtained by the DGT technique and dissolved in the soil solution showed a significant correlation with the Cd concentrations of the biota accumulation. However, the ex situ methods and the microwave digestion-obtained Cd concentrations showed a poor correlation with the accumulated Cd concentration in plant tissue. Correspondingly, the multiple linear regression models were built for fundamental analysis of the performance of different methods available for Cd bioavailability evaluation. The correlation coefficients of DGT obtained by the improved multiple linear regression model have not significantly improved compared to the coefficients obtained by the simple linear regression model. The results revealed that DGT was a robust measurement, which could obtain the labile Cd concentrations independent of the physicochemical features’ variation in the soil ecosystem. Consequently, these findings provide stronger evidence that DGT is an effective and ideal tool for labile Cd evaluation in Yixing. PMID:28287500
Estimating effects of limiting factors with regression quantiles
Cade, B.S.; Terrell, J.W.; Schroeder, R.L.
1999-01-01
In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS) are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yield unbiased estimates only under the null for both conditional and unconditional logistic regression, adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
The third molar development (TMD) has been widely utilized as one of the radiographic method for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated to the TMD regression model. This study aims to evaluate the performance of dental age estimation in individual method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data was divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² yielded an increment in the linear regressions of combined model as compared to the individual models. The overall decrease in RMSE was detected in combined model as compared to TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, low value of adjusted R(2) and high RMSE except in male were exhibited in combined model. Dental age estimation is better predicted using combined model in multiple linear regression models. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
40 CFR 1066.220 - Linearity verification for chassis dynamometer systems.
Code of Federal Regulations, 2014 CFR
2014-07-01
... dynamometer speed and torque at least as frequently as indicated in Table 1 of § 1066.215. The intent of... linear regression and the linearity criteria specified in Table 1 of this section. (b) Performance requirements. If a measurement system does not meet the applicable linearity criteria in Table 1 of this...
ERIC Educational Resources Information Center
Hovardas, Tasos
2016-01-01
Although ecological systems at varying scales involve non-linear interactions, learners insist thinking in a linear fashion when they deal with ecological phenomena. The overall objective of the present contribution was to propose a hypothetical learning progression for developing non-linear reasoning in prey-predator systems and to provide…
ERIC Educational Resources Information Center
Ker, H. W.
2014-01-01
Multilevel data are very common in educational research. Hierarchical linear models/linear mixed-effects models (HLMs/LMEs) are often utilized to analyze multilevel data nowadays. This paper discusses the problems of utilizing ordinary regressions for modeling multilevel educational data, compare the data analytic results from three regression…
Artes, Paul H; Crabb, David P
2010-01-01
To investigate why the specificity of the Moorfields Regression Analysis (MRA) of the Heidelberg Retina Tomograph (HRT) varies with disc size, and to derive accurate normative limits for neuroretinal rim area to address this problem. Two datasets from healthy subjects (Manchester, UK, n = 88; Halifax, Nova Scotia, Canada, n = 75) were used to investigate the physiological relationship between the optic disc and neuroretinal rim area. Normative limits for rim area were derived by quantile regression (QR) and compared with those of the MRA (derived by linear regression). Logistic regression analyses were performed to quantify the association between disc size and positive classifications with the MRA, as well as with the QR-derived normative limits. In both datasets, the specificity of the MRA depended on optic disc size. The odds of observing a borderline or outside-normal-limits classification increased by approximately 10% for each 0.1 mm(2) increase in disc area (P < 0.1). The lower specificity of the MRA with large optic discs could be explained by the failure of linear regression to model the extremes of the rim area distribution (observations far from the mean). In comparison, the normative limits predicted by QR were larger for smaller discs (less specific, more sensitive), and smaller for larger discs, such that false-positive rates became independent of optic disc size. Normative limits derived by quantile regression appear to remove the size-dependence of specificity with the MRA. Because quantile regression does not rely on the restrictive assumptions of standard linear regression, it may be a more appropriate method for establishing normative limits in other clinical applications where the underlying distributions are nonnormal or have nonconstant variance.
NASA Astrophysics Data System (ADS)
Lu, Cai; Jia, Yifei; Jing, Lei; Zeng, Qing; Lei, Jialin; Zhang, Shuanghu; Lei, Guangchun; Wen, Li
2018-04-01
Better understanding of the dynamics of hydrological connectivity between river and floodplain is essential for the ecological integrity of river systems. In this study, we proposed a regime-switch modelling (RSM) framework, which integrates change point analysis with dynamic linear regression, to detect and date change points in linear regression, and to quantify the relative importance of natural variations and anthropogenic disturbances. The approach was applied to the long-term hydrological time series to investigate the evolution of river-floodplain relation in Dongting Lake in the last five decades, during which the Yangtze River system experienced unprecedented anthropogenic manipulations. Our results suggested that 1) there were five distinct regimes during which the influence of inflows and local climate on lake water level changed significantly. The detected change points were well corresponding to the major events occurred upon the Yangtze; 2) although the importance of inflows from the Yangtze was greater than that of the tributaries flows over the five regimes, the relative contribution gradually decreased from regime 1 to regime 5. The weakening of hydrological forcing from the Yangtze was mainly attributed to the reduction in channel capacity resulting from sedimentation in the outfalls and water level dropping caused by river bed scour in the mainstream; 3) the effects of local climate was much smaller than these of inflows; and 4) since the operation of The Three Gorges Dam in 2006, the river-floodplain relationship entered a new equilibrium in that all investigated variables changed synchronously in terms of direction and magnitude. The results from this study reveal the mechanisms underlying the alternated inundation regime in Dongting Lake. The identified change points, some of which have not been previously reported, will allow a reappraisal of the current dam and reservoir operation strategies not only for flood/drought risk management but also for the maintenance and restoration of the regional ecological integrity.
Cui, Yang; Wang, Silong; Yan, Shaokui
2016-01-01
Phi coefficient directly depends on the frequencies of occurrence of organisms and has been widely used in vegetation ecology to analyse the associations of organisms with site groups, providing a characterization of ecological preference, but its application in soil ecology remains rare. Based on a single field experiment, this study assessed the applicability of phi coefficient in indicating the habitat preferences of soil fauna, through comparing phi coefficient-induced results with those of ordination methods in charactering soil fauna-habitat(factors) relationships. Eight different habitats of soil fauna were implemented by reciprocal transfer of defaunated soil cores between two types of subtropical forests. Canonical correlation analysis (CCorA) showed that ecological patterns of fauna-habitat relationships and inter-fauna taxa relationships expressed, respectively, by phi coefficients and predicted abundances calculated from partial redundancy analysis (RDA), were extremely similar, and a highly significant relationship between the two datasets was observed (Pillai's trace statistic = 1.998, P = 0.007). In addition, highly positive correlations between phi coefficients and predicted abundances for Acari, Collembola, Nematode and Hemiptera were observed using linear regression analysis. Quantitative relationships between habitat preferences and soil chemical variables were also obtained by linear regression, which were analogous to the results displayed in a partial RDA biplot. Our results suggest that phi coefficient could be applicable on a local scale in evaluating habitat preferences of soil fauna at coarse taxonomic levels, and that the phi coefficient-induced information, such as ecological preferences and the associated quantitative relationships with habitat factors, will be largely complementary to the results of ordination methods. The application of phi coefficient in soil ecology may extend our knowledge about habitat preferences and distribution-abundance relationships, which will benefit the understanding of biodistributions and variations in community compositions in the soil. Similar studies in other places and scales apart from our local site will be need for further evaluation of phi coefficient.
Cui, Yang; Wang, Silong; Yan, Shaokui
2016-01-01
Phi coefficient directly depends on the frequencies of occurrence of organisms and has been widely used in vegetation ecology to analyse the associations of organisms with site groups, providing a characterization of ecological preference, but its application in soil ecology remains rare. Based on a single field experiment, this study assessed the applicability of phi coefficient in indicating the habitat preferences of soil fauna, through comparing phi coefficient-induced results with those of ordination methods in charactering soil fauna-habitat(factors) relationships. Eight different habitats of soil fauna were implemented by reciprocal transfer of defaunated soil cores between two types of subtropical forests. Canonical correlation analysis (CCorA) showed that ecological patterns of fauna-habitat relationships and inter-fauna taxa relationships expressed, respectively, by phi coefficients and predicted abundances calculated from partial redundancy analysis (RDA), were extremely similar, and a highly significant relationship between the two datasets was observed (Pillai's trace statistic = 1.998, P = 0.007). In addition, highly positive correlations between phi coefficients and predicted abundances for Acari, Collembola, Nematode and Hemiptera were observed using linear regression analysis. Quantitative relationships between habitat preferences and soil chemical variables were also obtained by linear regression, which were analogous to the results displayed in a partial RDA biplot. Our results suggest that phi coefficient could be applicable on a local scale in evaluating habitat preferences of soil fauna at coarse taxonomic levels, and that the phi coefficient-induced information, such as ecological preferences and the associated quantitative relationships with habitat factors, will be largely complementary to the results of ordination methods. The application of phi coefficient in soil ecology may extend our knowledge about habitat preferences and distribution-abundance relationships, which will benefit the understanding of biodistributions and variations in community compositions in the soil. Similar studies in other places and scales apart from our local site will be need for further evaluation of phi coefficient. PMID:26930593
Díaz, Humberto González; de Armas, Ronal Ramos; Molina, Reinaldo
2003-11-01
Many experts worldwide have highlighted the potential of RNA molecules as drug targets for the chemotherapeutic treatment of a range of diseases. In particular, the molecular pockets of RNA in the HIV-1 packaging region have been postulated as promising sites for antiviral action. The discovery of simpler methods to accurately represent drug-RNA interactions could therefore become an interesting and rapid way to generate models that are complementary to docking-based systems. The entropies of a vibrational Markov chain have been introduced here as physically meaningful descriptors for the local drug-nucleic acid complexes. A study of the interaction of the antibiotic Paromomycin with the packaging region of the RNA present in type-1 HIV has been carried out as an illustrative example of this approach. A linear discriminant function gave rise to excellent discrimination among 80.13% of interacting/non-interacting sites. More specifically, the model classified 36/45 nucleotides (80.0%) that interacted with paromomycin and, in addition, 85/106 (80.2%) footprinted (non-interacting) sites from the RNA viral sequence were recognized. The model showed a high Matthews' regression coefficient (C = 0.64). The Jackknife method was also used to assess the stability and predictability of the model by leaving out adenines, C, G, or U. Matthews' coefficients and overall accuracies for these approaches were between 0.55 and 0.68 and 75.8 and 82.7, respectively. On the other hand, a linear regression model predicted the local binding affinity constants between a specific nucleotide and the aforementioned antibiotic (R2 = 0.83,Q2 = 0.825). These kinds of models may play an important role either in the discovery of new anti-HIV compounds or in the elucidation of their mode of action. On request from the corresponding author (humbertogd@cbq.uclv.edu.cu or humbertogd@navegalia.com).
Automated Algorithms for Quantum-Level Accuracy in Atomistic Simulations: LDRD Final Report.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Thompson, Aidan Patrick; Schultz, Peter Andrew; Crozier, Paul
2014-09-01
This report summarizes the result of LDRD project 12-0395, titled "Automated Algorithms for Quantum-level Accuracy in Atomistic Simulations." During the course of this LDRD, we have developed an interatomic potential for solids and liquids called Spectral Neighbor Analysis Poten- tial (SNAP). The SNAP potential has a very general form and uses machine-learning techniques to reproduce the energies, forces, and stress tensors of a large set of small configurations of atoms, which are obtained using high-accuracy quantum electronic structure (QM) calculations. The local environment of each atom is characterized by a set of bispectrum components of the local neighbor density projectedmore » on to a basis of hyperspherical harmonics in four dimensions. The SNAP coef- ficients are determined using weighted least-squares linear regression against the full QM training set. This allows the SNAP potential to be fit in a robust, automated manner to large QM data sets using many bispectrum components. The calculation of the bispectrum components and the SNAP potential are implemented in the LAMMPS parallel molecular dynamics code. Global optimization methods in the DAKOTA software package are used to seek out good choices of hyperparameters that define the overall structure of the SNAP potential. FitSnap.py, a Python-based software pack- age interfacing to both LAMMPS and DAKOTA is used to formulate the linear regression problem, solve it, and analyze the accuracy of the resultant SNAP potential. We describe a SNAP potential for tantalum that accurately reproduces a variety of solid and liquid properties. Most significantly, in contrast to existing tantalum potentials, SNAP correctly predicts the Peierls barrier for screw dislocation motion. We also present results from SNAP potentials generated for indium phosphide (InP) and silica (SiO 2 ). We describe efficient algorithms for calculating SNAP forces and energies in molecular dynamics simulations using massively parallel computers and advanced processor ar- chitectures. Finally, we briefly describe the MSM method for efficient calculation of electrostatic interactions on massively parallel computers.« less
El Beltagi, Tarek A; Bowd, Christopher; Boden, Catherine; Amini, Payam; Sample, Pamela A; Zangwill, Linda M; Weinreb, Robert N
2003-11-01
To determine the relationship between areas of glaucomatous retinal nerve fiber layer thinning identified by optical coherence tomography and areas of decreased visual field sensitivity identified by standard automated perimetry in glaucomatous eyes. Retrospective observational case series. Forty-three patients with glaucomatous optic neuropathy identified by optic disc stereo photographs and standard automated perimetry mean deviations >-8 dB were included. Participants were imaged with optical coherence tomography within 6 months of reliable standard automated perimetry testing. The location and number of optical coherence tomography clock hour retinal nerve fiber layer thickness measures outside normal limits were compared with the location and number of standard automated perimetry visual field zones outside normal limits. Further, the relationship between the deviation from normal optical coherence tomography-measured retinal nerve fiber layer thickness at each clock hour and the average pattern deviation in each visual field zone was examined by using linear regression (R(2)). The retinal nerve fiber layer areas most frequently outside normal limits were the inferior and inferior temporal regions. The least sensitive visual field zones were in the superior hemifield. Linear regression results (R(2)) showed that deviation from the normal retinal nerve fiber layer thickness at optical coherence tomography clock hour positions 6 o'clock, 7 o'clock, and 8 o'clock (inferior and inferior temporal) was best correlated with standard automated perimetry pattern deviation in visual field zones corresponding to the superior arcuate and nasal step regions (R(2) range, 0.34-0.57). These associations were much stronger than those between clock hour position 6 o'clock and the visual field zone corresponding to the inferior nasal step region (R(2) = 0.01). Localized retinal nerve fiber layer thinning, measured by optical coherence tomography, is topographically related to decreased localized standard automated perimetry sensitivity in glaucoma patients.
NASA Technical Reports Server (NTRS)
MCKissick, Burnell T. (Technical Monitor); Plassman, Gerald E.; Mall, Gerald H.; Quagliano, John R.
2005-01-01
Linear multivariable regression models for predicting day and night Eddy Dissipation Rate (EDR) from available meteorological data sources are defined and validated. Model definition is based on a combination of 1997-2000 Dallas/Fort Worth (DFW) data sources, EDR from Aircraft Vortex Spacing System (AVOSS) deployment data, and regression variables primarily from corresponding Automated Surface Observation System (ASOS) data. Model validation is accomplished through EDR predictions on a similar combination of 1994-1995 Memphis (MEM) AVOSS and ASOS data. Model forms include an intercept plus a single term of fixed optimal power for each of these regression variables; 30-minute forward averaged mean and variance of near-surface wind speed and temperature, variance of wind direction, and a discrete cloud cover metric. Distinct day and night models, regressing on EDR and the natural log of EDR respectively, yield best performance and avoid model discontinuity over day/night data boundaries.
NASA Astrophysics Data System (ADS)
Chu, Hone-Jay; Kong, Shish-Jeng; Chang, Chih-Hua
2018-03-01
The turbidity (TB) of a water body varies with time and space. Water quality is traditionally estimated via linear regression based on satellite images. However, estimating and mapping water quality require a spatio-temporal nonstationary model, while TB mapping necessitates the use of geographically and temporally weighted regression (GTWR) and geographically weighted regression (GWR) models, both of which are more precise than linear regression. Given the temporal nonstationary models for mapping water quality, GTWR offers the best option for estimating regional water quality. Compared with GWR, GTWR provides highly reliable information for water quality mapping, boasts a relatively high goodness of fit, improves the explanation of variance from 44% to 87%, and shows a sufficient space-time explanatory power. The seasonal patterns of TB and the main spatial patterns of TB variability can be identified using the estimated TB maps from GTWR and by conducting an empirical orthogonal function (EOF) analysis.
Vaughan, Adam S; Kramer, Michael R; Waller, Lance A; Schieb, Linda J; Greer, Sophia; Casper, Michele
2015-05-01
To demonstrate the implications of choosing analytical methods for quantifying spatiotemporal trends, we compare the assumptions, implementation, and outcomes of popular methods using county-level heart disease mortality in the United States between 1973 and 2010. We applied four regression-based approaches (joinpoint regression, both aspatial and spatial generalized linear mixed models, and Bayesian space-time model) and compared resulting inferences for geographic patterns of local estimates of annual percent change and associated uncertainty. The average local percent change in heart disease mortality from each method was -4.5%, with the Bayesian model having the smallest range of values. The associated uncertainty in percent change differed markedly across the methods, with the Bayesian space-time model producing the narrowest range of variance (0.0-0.8). The geographic pattern of percent change was consistent across methods with smaller declines in the South Central United States and larger declines in the Northeast and Midwest. However, the geographic patterns of uncertainty differed markedly between methods. The similarity of results, including geographic patterns, for magnitude of percent change across these methods validates the underlying spatial pattern of declines in heart disease mortality. However, marked differences in degree of uncertainty indicate that Bayesian modeling offers substantially more precise estimates. Copyright © 2015 Elsevier Inc. All rights reserved.
Goovaerts, P.; Albuquerque, Teresa; Antunes, Margarida
2015-01-01
This paper describes a multivariate geostatistical methodology to delineate areas of potential interest for future sedimentary gold exploration, with an application to an abandoned sedimentary gold mining region in Portugal. The main challenge was the existence of only a dozen gold measurements confined to the grounds of the old gold mines, which precluded the application of traditional interpolation techniques, such as cokriging. The analysis could, however, capitalize on 376 stream sediment samples that were analyzed for twenty two elements. Gold (Au) was first predicted at all 376 locations using linear regression (R2=0.798) and four metals (Fe, As, Sn and W), which are known to be mostly associated with the local gold’s paragenesis. One hundred realizations of the spatial distribution of gold content were generated using sequential indicator simulation and a soft indicator coding of regression estimates, to supplement the hard indicator coding of gold measurements. Each simulated map then underwent a local cluster analysis to identify significant aggregates of low or high values. The one hundred classified maps were processed to derive the most likely classification of each simulated node and the associated probability of occurrence. Examining the distribution of the hot-spots and cold-spots reveals a clear enrichment in Au along the Erges River downstream from the old sedimentary mineralization. PMID:27777638
Mental chronometry with simple linear regression.
Chen, J Y
1997-10-01
Typically, mental chronometry is performed by means of introducing an independent variable postulated to affect selectively some stage of a presumed multistage process. However, the effect could be a global one that spreads proportionally over all stages of the process. Currently, there is no method to test this possibility although simple linear regression might serve the purpose. In the present study, the regression approach was tested with tasks (memory scanning and mental rotation) that involved a selective effect and with a task (word superiority effect) that involved a global effect, by the dominant theories. The results indicate (1) the manipulation of the size of a memory set or of angular disparity affects the intercept of the regression function that relates the times for memory scanning with different set sizes or for mental rotation with different angular disparities and (2) the manipulation of context affects the slope of the regression function that relates the times for detecting a target character under word and nonword conditions. These ratify the regression approach as a useful method for doing mental chronometry.
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
Nonparametric Regression and the Parametric Bootstrap for Local Dependence Assessment.
ERIC Educational Resources Information Center
Habing, Brian
2001-01-01
Discusses ideas underlying nonparametric regression and the parametric bootstrap with an overview of their application to item response theory and the assessment of local dependence. Illustrates the use of the method in assessing local dependence that varies with examinee trait levels. (SLD)
How is the weather? Forecasting inpatient glycemic control
Saulnier, George E; Castro, Janna C; Cook, Curtiss B; Thompson, Bithika M
2017-01-01
Aim: Apply methods of damped trend analysis to forecast inpatient glycemic control. Method: Observed and calculated point-of-care blood glucose data trends were determined over 62 weeks. Mean absolute percent error was used to calculate differences between observed and forecasted values. Comparisons were drawn between model results and linear regression forecasting. Results: The forecasted mean glucose trends observed during the first 24 and 48 weeks of projections compared favorably to the results provided by linear regression forecasting. However, in some scenarios, the damped trend method changed inferences compared with linear regression. In all scenarios, mean absolute percent error values remained below the 10% accepted by demand industries. Conclusion: Results indicate that forecasting methods historically applied within demand industries can project future inpatient glycemic control. Additional study is needed to determine if forecasting is useful in the analyses of other glucometric parameters and, if so, how to apply the techniques to quality improvement. PMID:29134125
Lee, Eunjee; Zhu, Hongtu; Kong, Dehan; Wang, Yalin; Giovanello, Kelly Sullivan; Ibrahim, Joseph G
2015-01-01
The aim of this paper is to develop a Bayesian functional linear Cox regression model (BFLCRM) with both functional and scalar covariates. This new development is motivated by establishing the likelihood of conversion to Alzheimer’s disease (AD) in 346 patients with mild cognitive impairment (MCI) enrolled in the Alzheimer’s Disease Neuroimaging Initiative 1 (ADNI-1) and the early markers of conversion. These 346 MCI patients were followed over 48 months, with 161 MCI participants progressing to AD at 48 months. The functional linear Cox regression model was used to establish that functional covariates including hippocampus surface morphology and scalar covariates including brain MRI volumes, cognitive performance (ADAS-Cog), and APOE status can accurately predict time to onset of AD. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of BFLCRM. PMID:26900412
Liquid electrolyte informatics using an exhaustive search with linear regression.
Sodeyama, Keitaro; Igarashi, Yasuhiko; Nakayama, Tomofumi; Tateyama, Yoshitaka; Okada, Masato
2018-06-14
Exploring new liquid electrolyte materials is a fundamental target for developing new high-performance lithium-ion batteries. In contrast to solid materials, disordered liquid solution properties have been less studied by data-driven information techniques. Here, we examined the estimation accuracy and efficiency of three information techniques, multiple linear regression (MLR), least absolute shrinkage and selection operator (LASSO), and exhaustive search with linear regression (ES-LiR), by using coordination energy and melting point as test liquid properties. We then confirmed that ES-LiR gives the most accurate estimation among the techniques. We also found that ES-LiR can provide the relationship between the "prediction accuracy" and "calculation cost" of the properties via a weight diagram of descriptors. This technique makes it possible to choose the balance of the "accuracy" and "cost" when the search of a huge amount of new materials was carried out.
Huang, Jian; Zhang, Cun-Hui
2013-01-01
The ℓ1-penalized method, or the Lasso, has emerged as an important tool for the analysis of large data sets. Many important results have been obtained for the Lasso in linear regression which have led to a deeper understanding of high-dimensional statistical problems. In this article, we consider a class of weighted ℓ1-penalized estimators for convex loss functions of a general form, including the generalized linear models. We study the estimation, prediction, selection and sparsity properties of the weighted ℓ1-penalized estimator in sparse, high-dimensional settings where the number of predictors p can be much larger than the sample size n. Adaptive Lasso is considered as a special case. A multistage method is developed to approximate concave regularized estimation by applying an adaptive Lasso recursively. We provide prediction and estimation oracle inequalities for single- and multi-stage estimators, a general selection consistency theorem, and an upper bound for the dimension of the Lasso estimator. Important models including the linear regression, logistic regression and log-linear models are used throughout to illustrate the applications of the general results. PMID:24348100
Takahashi, Masanori; Maruyama, Hitoshi; Shimada, Taro; Kamezaki, Hidehiro; Okabe, Shinichiro; Kanai, Fumihiko; Yoshikawa, Masaharu; Yokosuka, Osamu
2012-11-01
This prospective study was performed in 179 hepatocellular carcinoma (HCC) lesions treated by radio-frequency ablation (RFA) to explore the clinical outcome of "linear enhancement" on contrast-enhanced sonogram. Thirty-three lesions (18.4%) showed linear enhancement, a linear-shaped positive enhancement in the RFA-treated area. Seventeen of them were followed up with no treatment (remaining 16; dropout in eight, additional RFA in six and ineffective treatment in two) and three lesions (3/17, 17.6%) showed local tumor progression corresponding to linear enhancement at 7, 14, 19 months after RFA. Although there was no significant difference in local recurrence rate between the lesions with (3/17) and without linear enhancement (10/35), local tumor progression inside the ablation zone occurred only in the lesions with linear enhancement. In conclusion, linear enhancement inside the RFA-treated area should be followed up within 7 months because it has a risk of local tumor progression. Histology of linear enhancement and its influence on distant recurrence remain to be solved. Copyright © 2012 World Federation for Ultrasound in Medicine & Biology. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Haris, A.; Nafian, M.; Riyanto, A.
2017-07-01
Danish North Sea Fields consist of several formations (Ekofisk, Tor, and Cromer Knoll) that was started from the age of Paleocene to Miocene. In this study, the integration of seismic and well log data set is carried out to determine the chalk sand distribution in the Danish North Sea field. The integration of seismic and well log data set is performed by using the seismic inversion analysis and seismic multi-attribute. The seismic inversion algorithm, which is used to derive acoustic impedance (AI), is model-based technique. The derived AI is then used as external attributes for the input of multi-attribute analysis. Moreover, the multi-attribute analysis is used to generate the linear and non-linear transformation of among well log properties. In the case of the linear model, selected transformation is conducted by weighting step-wise linear regression (SWR), while for the non-linear model is performed by using probabilistic neural networks (PNN). The estimated porosity, which is resulted by PNN shows better suited to the well log data compared with the results of SWR. This result can be understood since PNN perform non-linear regression so that the relationship between the attribute data and predicted log data can be optimized. The distribution of chalk sand has been successfully identified and characterized by porosity value ranging from 23% up to 30%.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso.
Kong, Shengchun; Nan, Bin
2014-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses.
Non-Asymptotic Oracle Inequalities for the High-Dimensional Cox Regression via Lasso
Kong, Shengchun; Nan, Bin
2013-01-01
We consider finite sample properties of the regularized high-dimensional Cox regression via lasso. Existing literature focuses on linear models or generalized linear models with Lipschitz loss functions, where the empirical risk functions are the summations of independent and identically distributed (iid) losses. The summands in the negative log partial likelihood function for censored survival data, however, are neither iid nor Lipschitz.We first approximate the negative log partial likelihood function by a sum of iid non-Lipschitz terms, then derive the non-asymptotic oracle inequalities for the lasso penalized Cox regression using pointwise arguments to tackle the difficulties caused by lacking iid Lipschitz losses. PMID:24516328
Brain tumor segmentation based on local independent projection-based classification.
Huang, Meiyan; Yang, Wei; Wu, Yao; Jiang, Jun; Chen, Wufan; Feng, Qianjin
2014-10-01
Brain tumor segmentation is an important procedure for early tumor diagnosis and radiotherapy planning. Although numerous brain tumor segmentation methods have been presented, enhancing tumor segmentation methods is still challenging because brain tumor MRI images exhibit complex characteristics, such as high diversity in tumor appearance and ambiguous tumor boundaries. To address this problem, we propose a novel automatic tumor segmentation method for MRI images. This method treats tumor segmentation as a classification problem. Additionally, the local independent projection-based classification (LIPC) method is used to classify each voxel into different classes. A novel classification framework is derived by introducing the local independent projection into the classical classification model. Locality is important in the calculation of local independent projections for LIPC. Locality is also considered in determining whether local anchor embedding is more applicable in solving linear projection weights compared with other coding methods. Moreover, LIPC considers the data distribution of different classes by learning a softmax regression model, which can further improve classification performance. In this study, 80 brain tumor MRI images with ground truth data are used as training data and 40 images without ground truth data are used as testing data. The segmentation results of testing data are evaluated by an online evaluation tool. The average dice similarities of the proposed method for segmenting complete tumor, tumor core, and contrast-enhancing tumor on real patient data are 0.84, 0.685, and 0.585, respectively. These results are comparable to other state-of-the-art methods.
Functional Relationships and Regression Analysis.
ERIC Educational Resources Information Center
Preece, Peter F. W.
1978-01-01
Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
Isolating and Examining Sources of Suppression and Multicollinearity in Multiple Linear Regression
ERIC Educational Resources Information Center
Beckstead, Jason W.
2012-01-01
The presence of suppression (and multicollinearity) in multiple regression analysis complicates interpretation of predictor-criterion relationships. The mathematical conditions that produce suppression in regression analysis have received considerable attention in the methodological literature but until now nothing in the way of an analytic…
Suppression Situations in Multiple Linear Regression
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
Local field potential spectral tuning in motor cortex during reaching.
Heldman, Dustin A; Wang, Wei; Chan, Sherwin S; Moran, Daniel W
2006-06-01
In this paper, intracortical local field potentials (LFPs) and single units were recorded from the motor cortices of monkeys (Macaca fascicularis) while they preformed a standard three-dimensional (3-D) center-out reaching task. During the center-out task, the subjects held their hands at the location of a central target and then reached to one of eight peripheral targets forming the corners of a virtual cube. The spectral amplitudes of the recorded LFPs were calculated, with the high-frequency LFP (HF-LFP) defined as the average spectral amplitude change from baseline from 60 to 200 Hz. A 3-D linear regression across the eight center-out targets revealed that approximately 6% of the beta LFPs (18-26 Hz) and 18% of the HF-LFPs were tuned for velocity (p-value < 0.05), while 10% of the beta LFPs and 15% of the HF-LFPs were tuned for position. These results suggest that a multidegree-of-freedom brain-machine interface is possible using high-frequency LFP recordings in motor cortex.
Gerbasi, Margaret E.; Richards, Lauren K.; Thomas, Jennifer J.; Agnew-Blais, Jessica C.; Thompson-Brenner, Heather; Gilman, Stephen E.; Becker, Anne E.
2014-01-01
Objective The increasing global health burden imposed by eating disorders warrants close examination of social exposures associated with globalization that potentially elevate risk during the critical developmental period of adolescence in low- and middle-income countries (LMICs). The study aim was to investigate the association of peer influence and perceived social norms with adolescent eating pathology in Fiji, a LMIC undergoing rapid social change. Method We measured peer influence on eating concerns (with the Inventory of Peer Influence on Eating Concerns; IPIEC), perceived peer norms associated with disordered eating and body concerns, perceived community cultural norms, and individual cultural orientations in a representative sample of school-going ethnic Fijian adolescent girls (n=523). We then developed a multivariable linear regression model to examine their relation to eating pathology (measured by the Eating Disorder Examination-Questionnaire; EDE-Q). Results We found independent and statistically significant associations between both IPIEC scores and our proxy for perceived social norms specific to disordered eating (both p <.001) and EDE-Q global scores in a fully adjusted linear regression model. Discussion Study findings support the possibility that peer influence as well as perceived social norms relevant to disordered eating may elevate risk for disordered eating in Fiji, during the critical developmental period of adolescence. Replication and extension of these research findings in other populations undergoing rapid social transition—and where globalization is also influencing local social norms—may enrich etiologic models and inform strategies to mitigate risk. PMID:25139374
Gerbasi, Margaret E; Richards, Lauren K; Thomas, Jennifer J; Agnew-Blais, Jessica C; Thompson-Brenner, Heather; Gilman, Stephen E; Becker, Anne E
2014-11-01
The increasing global health burden imposed by eating disorders warrants close examination of social exposures associated with globalization that potentially elevate risk during the critical developmental period of adolescence in low- and middle-income countries (LMICs). The study aim was to investigate the association of peer influence and perceived social norms with adolescent eating pathology in Fiji, a LMIC undergoing rapid social change. We measured peer influence on eating concerns (with the Inventory of Peer Influence on Eating Concerns; IPIEC), perceived peer norms associated with disordered eating and body concerns, perceived community cultural norms, and individual cultural orientations in a representative sample of school-going ethnic Fijian adolescent girls (n = 523). We then developed a multivariable linear regression model to examine their relation to eating pathology (measured by the Eating Disorder Examination-Questionnaire; EDE-Q). We found independent and statistically significant associations between both IPIEC scores and our proxy for perceived social norms specific to disordered eating (both p < .001) and EDE-Q global scores in a fully adjusted linear regression model. Study findings support the possibility that peer influence as well as perceived social norms relevant to disordered eating may elevate risk for disordered eating in Fiji, during the critical developmental period of adolescence. Replication and extension of these research findings in other populations undergoing rapid social transition--and where globalization is also influencing local social norms--may enrich etiologic models and inform strategies to mitigate risk. © 2014 Wiley Periodicals, Inc.
Qu, Mingkai; Wang, Yan; Huang, Biao; Zhao, Yongcun
2018-06-01
The traditional source apportionment models, such as absolute principal component scores-multiple linear regression (APCS-MLR), are usually susceptible to outliers, which may be widely present in the regional geochemical dataset. Furthermore, the models are merely built on variable space instead of geographical space and thus cannot effectively capture the local spatial characteristics of each source contributions. To overcome the limitations, a new receptor model, robust absolute principal component scores-robust geographically weighted regression (RAPCS-RGWR), was proposed based on the traditional APCS-MLR model. Then, the new method was applied to the source apportionment of soil metal elements in a region of Wuhan City, China as a case study. Evaluations revealed that: (i) RAPCS-RGWR model had better performance than APCS-MLR model in the identification of the major sources of soil metal elements, and (ii) source contributions estimated by RAPCS-RGWR model were more close to the true soil metal concentrations than that estimated by APCS-MLR model. It is shown that the proposed RAPCS-RGWR model is a more effective source apportionment method than APCS-MLR (i.e., non-robust and global model) in dealing with the regional geochemical dataset. Copyright © 2018 Elsevier B.V. All rights reserved.
Brumback, Babette A; Cai, Zhuangyu; Dailey, Amy B
2014-05-15
Reasons for health disparities may include neighborhood-level factors, such as availability of health services, social norms, and environmental determinants, as well as individual-level factors. Investigating health inequalities using nationally or locally representative data often requires an approach that can accommodate a complex sampling design, in which individuals have unequal probabilities of selection into the study. The goal of the present article is to review and compare methods of estimating or accounting for neighborhood influences with complex survey data. We considered 3 types of methods, each generalized for use with complex survey data: ordinary regression, conditional likelihood regression, and generalized linear mixed-model regression. The relative strengths and weaknesses of each method differ from one study to another; we provide an overview of the advantages and disadvantages of each method theoretically, in terms of the nature of the estimable associations and the plausibility of the assumptions required for validity, and also practically, via a simulation study and 2 epidemiologic data analyses. The first analysis addresses determinants of repeat mammography screening use using data from the 2005 National Health Interview Survey. The second analysis addresses disparities in preventive oral health care using data from the 2008 Florida Behavioral Risk Factor Surveillance System Survey.
Predicting U.S. Army Reserve Unit Manning Using Market Demographics
2015-06-01
develops linear regression , classification tree, and logistic regression models to determine the ability of the location to support manning requirements... logistic regression model delivers predictive results that allow decision-makers to identify locations with a high probability of meeting unit...manning requirements. The recommendation of this thesis is that the USAR implement the logistic regression model. 14. SUBJECT TERMS U.S
Real, J; Cleries, R; Forné, C; Roso-Llorach, A; Martínez-Sánchez, J M
In medicine and biomedical research, statistical techniques like logistic, linear, Cox and Poisson regression are widely known. The main objective is to describe the evolution of multivariate techniques used in observational studies indexed in PubMed (1970-2013), and to check the requirements of the STROBE guidelines in the author guidelines in Spanish journals indexed in PubMed. A targeted PubMed search was performed to identify papers that used logistic linear Cox and Poisson models. Furthermore, a review was also made of the author guidelines of journals published in Spain and indexed in PubMed and Web of Science. Only 6.1% of the indexed manuscripts included a term related to multivariate analysis, increasing from 0.14% in 1980 to 12.3% in 2013. In 2013, 6.7, 2.5, 3.5, and 0.31% of the manuscripts contained terms related to logistic, linear, Cox and Poisson regression, respectively. On the other hand, 12.8% of journals author guidelines explicitly recommend to follow the STROBE guidelines, and 35.9% recommend the CONSORT guideline. A low percentage of Spanish scientific journals indexed in PubMed include the STROBE statement requirement in the author guidelines. Multivariate regression models in published observational studies such as logistic regression, linear, Cox and Poisson are increasingly used both at international level, as well as in journals published in Spanish. Copyright © 2015 Sociedad Española de Médicos de Atención Primaria (SEMERGEN). Publicado por Elsevier España, S.L.U. All rights reserved.
Wu, Lingtao; Lord, Dominique
2017-05-01
This study further examined the use of regression models for developing crash modification factors (CMFs), specifically focusing on the misspecification in the link function. The primary objectives were to validate the accuracy of CMFs derived from the commonly used regression models (i.e., generalized linear models or GLMs with additive linear link functions) when some of the variables have nonlinear relationships and quantify the amount of bias as a function of the nonlinearity. Using the concept of artificial realistic data, various linear and nonlinear crash modification functions (CM-Functions) were assumed for three variables. Crash counts were randomly generated based on these CM-Functions. CMFs were then derived from regression models for three different scenarios. The results were compared with the assumed true values. The main findings are summarized as follows: (1) when some variables have nonlinear relationships with crash risk, the CMFs for these variables derived from the commonly used GLMs are all biased, especially around areas away from the baseline conditions (e.g., boundary areas); (2) with the increase in nonlinearity (i.e., nonlinear relationship becomes stronger), the bias becomes more significant; (3) the quality of CMFs for other variables having linear relationships can be influenced when mixed with those having nonlinear relationships, but the accuracy may still be acceptable; and (4) the misuse of the link function for one or more variables can also lead to biased estimates for other parameters. This study raised the importance of the link function when using regression models for developing CMFs. Copyright © 2017 Elsevier Ltd. All rights reserved.
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
Regression Commonality Analysis: A Technique for Quantitative Theory Building
ERIC Educational Resources Information Center
Nimon, Kim; Reio, Thomas G., Jr.
2011-01-01
When it comes to multiple linear regression analysis (MLR), it is common for social and behavioral science researchers to rely predominately on beta weights when evaluating how predictors contribute to a regression model. Presenting an underutilized statistical technique, this article describes how organizational researchers can use commonality…
Precision Efficacy Analysis for Regression.
ERIC Educational Resources Information Center
Brooks, Gordon P.
When multiple linear regression is used to develop a prediction model, sample size must be large enough to ensure stable coefficients. If the derivation sample size is inadequate, the model may not predict well for future subjects. The precision efficacy analysis for regression (PEAR) method uses a cross- validity approach to select sample sizes…
ERIC Educational Resources Information Center
Jurs, Stephen; And Others
The scree test and its linear regression technique are reviewed, and results of its use in factor analysis and Delphi data sets are described. The scree test was originally a visual approach for making judgments about eigenvalues, which considered the relationships of the eigenvalues to one another as well as their actual values. The graph that is…
Mays, G. P.; Halverson, P. K.; Stevens, R.
2001-01-01
OBJECTIVE: The authors examine the extent and nature of managed care plans participating in local public health activities. METHODS: In 1998, the authors surveyed the directors of all US local health departments serving jurisdictions of at least 100,000 residents to collect information about public health activities performed in their jurisdictions and about organizations participating in the activities. Multivariate logistic and linear regression models were used to examine organizational and market characteristics associated with managed care plan participation in public health activities. RESULTS: Managed care plans were reported to participate in public health activities in 164 (46%) of the jurisdictions surveyed, and to contribute to 13% of the public health activities performed in the average jurisdiction. Plans appeared most likely to participate in public health activities involving the delivery or management of personal health services and the exchange of health-related information. Managed care participation was more likely to occur in jurisdictions with higher HMO penetration, fewer competing plans, and larger proportions of plans enrolling Medicaid recipients. Participation was positively associated with the overall scope and perceived effectiveness of local public health activities. CONCLUSIONS: Although plans participate in a narrow range of activities, these contributions may complement the work of public health agencies. PMID:11889275
NASA Astrophysics Data System (ADS)
Benhalouche, Fatima Zohra; Karoui, Moussa Sofiane; Deville, Yannick; Ouamri, Abdelaziz
2015-10-01
In this paper, a new Spectral-Unmixing-based approach, using Nonnegative Matrix Factorization (NMF), is proposed to locally multi-sharpen hyperspectral data by integrating a Digital Surface Model (DSM) obtained from LIDAR data. In this new approach, the nature of the local mixing model is detected by using the local variance of the object elevations. The hyper/multispectral images are explored using small zones. In each zone, the variance of the object elevations is calculated from the DSM data in this zone. This variance is compared to a threshold value and the adequate linear/linearquadratic spectral unmixing technique is used in the considered zone to independently unmix hyperspectral and multispectral data, using an adequate linear/linear-quadratic NMF-based approach. The obtained spectral and spatial information thus respectively extracted from the hyper/multispectral images are then recombined in the considered zone, according to the selected mixing model. Experiments based on synthetic hyper/multispectral data are carried out to evaluate the performance of the proposed multi-sharpening approach and literature linear/linear-quadratic approaches used on the whole hyper/multispectral data. In these experiments, real DSM data are used to generate synthetic data containing linear and linear-quadratic mixed pixel zones. The DSM data are also used for locally detecting the nature of the mixing model in the proposed approach. Globally, the proposed approach yields good spatial and spectral fidelities for the multi-sharpened data and significantly outperforms the used literature methods.
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive models and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However it has been a controversial issue among many studies to consider ADD as an important variable in predicting stormwater discharge characteristics. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results of total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMC in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method.
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2015-11-18
Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available.
Henrard, S; Speybroeck, N; Hermans, C
2015-11-01
Haemophilia is a rare genetic haemorrhagic disease characterized by partial or complete deficiency of coagulation factor VIII, for haemophilia A, or IX, for haemophilia B. As in any other medical research domain, the field of haemophilia research is increasingly concerned with finding factors associated with binary or continuous outcomes through multivariable models. Traditional models include multiple logistic regressions, for binary outcomes, and multiple linear regressions for continuous outcomes. Yet these regression models are at times difficult to implement, especially for non-statisticians, and can be difficult to interpret. The present paper sought to didactically explain how, why, and when to use classification and regression tree (CART) analysis for haemophilia research. The CART method is non-parametric and non-linear, based on the repeated partitioning of a sample into subgroups based on a certain criterion. Breiman developed this method in 1984. Classification trees (CTs) are used to analyse categorical outcomes and regression trees (RTs) to analyse continuous ones. The CART methodology has become increasingly popular in the medical field, yet only a few examples of studies using this methodology specifically in haemophilia have to date been published. Two examples using CART analysis and previously published in this field are didactically explained in details. There is increasing interest in using CART analysis in the health domain, primarily due to its ease of implementation, use, and interpretation, thus facilitating medical decision-making. This method should be promoted for analysing continuous or categorical outcomes in haemophilia, when applicable. © 2015 John Wiley & Sons Ltd.
Birthweight Related Factors in Northwestern Iran: Using Quantile Regression Method
Fallah, Ramazan; Kazemnejad, Anoshirvan; Zayeri, Farid; Shoghli, Alireza
2016-01-01
Introduction: Birthweight is one of the most important predicting indicators of the health status in adulthood. Having a balanced birthweight is one of the priorities of the health system in most of the industrial and developed countries. This indicator is used to assess the growth and health status of the infants. The aim of this study was to assess the birthweight of the neonates by using quantile regression in Zanjan province. Methods: This analytical descriptive study was carried out using pre-registered (March 2010 - March 2012) data of neonates in urban/rural health centers of Zanjan province using multiple-stage cluster sampling. Data were analyzed using multiple linear regressions andquantile regression method and SAS 9.2 statistical software. Results: From 8456 newborn baby, 4146 (49%) were female. The mean age of the mothers was 27.1±5.4 years. The mean birthweight of the neonates was 3104 ± 431 grams. Five hundred and seventy-three patients (6.8%) of the neonates were less than 2500 grams. In all quantiles, gestational age of neonates (p<0.05), weight and educational level of the mothers (p<0.05) showed a linear significant relationship with the i of the neonates. However, sex and birth rank of the neonates, mothers age, place of residence (urban/rural) and career were not significant in all quantiles (p>0.05). Conclusion: This study revealed the results of multiple linear regression and quantile regression were not identical. We strictly recommend the use of quantile regression when an asymmetric response variable or data with outliers is available. PMID:26925889
Is the local linearity of space-time inherited from the linearity of probabilities?
NASA Astrophysics Data System (ADS)
Müller, Markus P.; Carrozza, Sylvain; Höhn, Philipp A.
2017-02-01
The appearance of linear spaces, describing physical quantities by vectors and tensors, is ubiquitous in all of physics, from classical mechanics to the modern notion of local Lorentz invariance. However, as natural as this seems to the physicist, most computer scientists would argue that something like a ‘local linear tangent space’ is not very typical and in fact a quite surprising property of any conceivable world or algorithm. In this paper, we take the perspective of the computer scientist seriously, and ask whether there could be any inherently information-theoretic reason to expect this notion of linearity to appear in physics. We give a series of simple arguments, spanning quantum information theory, group representation theory, and renormalization in quantum gravity, that supports a surprising thesis: namely, that the local linearity of space-time might ultimately be a consequence of the linearity of probabilities. While our arguments involve a fair amount of speculation, they have the virtue of being independent of any detailed assumptions on quantum gravity, and they are in harmony with several independent recent ideas on emergent space-time in high-energy physics.
Jia, Peng; Xierali, Imam M
2015-09-17
Congestive heart failure (CHF) is a major public health problem in the United States and is a leading cause of hospitalization in the elderly population. Understanding the health care travel patterns of CHF patients and their underlying cause is important to balance the supply and demand for local hospital resources. This article explores the nonclinical factors that prompt CHF patients to seek distant instead of local hospitalization. Local hospitalization was defined as inpatients staying within hospital service areas, and distant hospitalization was defined as inpatients traveling outside hospital service areas, based on individual hospital discharge data in 2011 generated by a Dartmouth-Swiss hybrid approach. Multiple logistic and linear regression models were used to compare the travel patterns of different groups of inpatients in Florida. Black patients, no-charge patients, patients living in large metropolitan areas, and patients with a low socioeconomic status were more likely to seek local hospitalization than were white patients, those who were privately insured, those who lived in rural areas, and those with a high socioeconomic status, respectively. Findings indicate that different populations diagnosed with CHF had different travel patterns for hospitalization. Changes or disruptions in local hospital supply could differentially affect different groups in a population. Policy makers could target efforts to CHF patients who are less likely to travel to seek treatment.
Xierali, Imam M.
2015-01-01
Introduction Congestive heart failure (CHF) is a major public health problem in the United States and is a leading cause of hospitalization in the elderly population. Understanding the health care travel patterns of CHF patients and their underlying cause is important to balance the supply and demand for local hospital resources. This article explores the nonclinical factors that prompt CHF patients to seek distant instead of local hospitalization. Methods Local hospitalization was defined as inpatients staying within hospital service areas, and distant hospitalization was defined as inpatients traveling outside hospital service areas, based on individual hospital discharge data in 2011 generated by a Dartmouth–Swiss hybrid approach. Multiple logistic and linear regression models were used to compare the travel patterns of different groups of inpatients in Florida. Results Black patients, no-charge patients, patients living in large metropolitan areas, and patients with a low socioeconomic status were more likely to seek local hospitalization than were white patients, those who were privately insured, those who lived in rural areas, and those with a high socioeconomic status, respectively. Conclusion Findings indicate that different populations diagnosed with CHF had different travel patterns for hospitalization. Changes or disruptions in local hospital supply could differentially affect different groups in a population. Policy makers could target efforts to CHF patients who are less likely to travel to seek treatment. PMID:26378896
Local concentration of fast-food outlets is associated with poor nutrition and obesity.
Kruger, Daniel J; Greenberg, Emily; Murphy, Jillian B; DiFazio, Lindsay A; Youra, Kathryn R
2014-01-01
We investigated the relationship of the local availability of fast-food restaurant locations with diet and obesity. We geocoded addresses of survey respondents and fast-food restaurant locations to assess the association between the local concentration of fast-food outlets, BMI, and fruit and vegetable consumption. The survey was conducted in Genesee County, Michigan. There were 1345 individuals included in this analysis, and the response rate was 25%. The Speak to Your Health! Community Survey included fruit and vegetable consumption items from the Behavioral Risk Factor Surveillance System, height, weight, and demographics. We used ArcGIS to map fast-food outlets and survey respondents. Stepwise linear regressions identified unique predictors of body mass index (BMI) and fruit and vegetable consumption. Survey respondents had 8 ± 7 fast-food outlets within 2 miles of their home. Individuals living in close proximity to fast-food restaurants had higher BMIs t(1342) = 3.21, p < .001, and lower fruit and vegetable consumption, t(1342) = 2.67, p = .008. Individuals may be at greater risk for adverse consequences of poor nutrition because of the patterns in local food availability, which may constrain the success of nutrition promotion efforts. Efforts to decrease the local availability of unhealthy foods as well as programs to help consumers identify strategies for obtaining healthy meals at fast-food outlets may improve health outcomes.
NASA Astrophysics Data System (ADS)
Karl, Thomas R.; Wang, Wei-Chyung; Schlesinger, Michael E.; Knight, Richard W.; Portman, David
1990-10-01
Important surface observations such as the daily maximum and minimum temperature, daily precipitation, and cloud ceilings often have localized characteristics that are difficult to reproduce with the current resolution and the physical parameterizations in state-of-the-art General Circulation climate Models (GCMs). Many of the difficulties can be partially attributed to mismatches in scale, local topography. regional geography and boundary conditions between models and surface-based observations. Here, we present a method, called climatological projection by model statistics (CPMS), to relate GCM grid-point flee-atmosphere statistics, the predictors, to these important local surface observations. The method can be viewed as a generalization of the model output statistics (MOS) and perfect prog (PP) procedures used in numerical weather prediction (NWP) models. It consists of the application of three statistical methods: 1) principle component analysis (FICA), 2) canonical correlation, and 3) inflated regression analysis. The PCA reduces the redundancy of the predictors The canonical correlation is used to develop simultaneous relationships between linear combinations of the predictors, the canonical variables, and the surface-based observations. Finally, inflated regression is used to relate the important canonical variables to each of the surface-based observed variables.We demonstrate that even an early version of the Oregon State University two-level atmospheric GCM (with prescribed sea surface temperature) produces free-atmosphere statistics than can, when standardized using the model's internal means and variances (the MOS-like version of CPMS), closely approximate the observed local climate. When the model data are standardized by the observed free-atmosphere means and variances (the PP version of CPMS), however, the model does not reproduce the observed surface climate as well. Our results indicate that in the MOS-like version of CPMS the differences between the output of a ten-year GCM control run and the surface-based observations are often smaller than the differences between the observations of two ten-year periods. Such positive results suggest that GCMs may already contain important climatological information that can be used to infer the local climate.
Some comparisons of complexity in dictionary-based and linear computational models.
Gnecco, Giorgio; Kůrková, Věra; Sanguineti, Marcello
2011-03-01
Neural networks provide a more flexible approximation of functions than traditional linear regression. In the latter, one can only adjust the coefficients in linear combinations of fixed sets of functions, such as orthogonal polynomials or Hermite functions, while for neural networks, one may also adjust the parameters of the functions which are being combined. However, some useful properties of linear approximators (such as uniqueness, homogeneity, and continuity of best approximation operators) are not satisfied by neural networks. Moreover, optimization of parameters in neural networks becomes more difficult than in linear regression. Experimental results suggest that these drawbacks of neural networks are offset by substantially lower model complexity, allowing accuracy of approximation even in high-dimensional cases. We give some theoretical results comparing requirements on model complexity for two types of approximators, the traditional linear ones and so called variable-basis types, which include neural networks, radial, and kernel models. We compare upper bounds on worst-case errors in variable-basis approximation with lower bounds on such errors for any linear approximator. Using methods from nonlinear approximation and integral representations tailored to computational units, we describe some cases where neural networks outperform any linear approximator. Copyright © 2010 Elsevier Ltd. All rights reserved.
Montoye, Alexander H K; Begum, Munni; Henning, Zachary; Pfeiffer, Karin A
2017-02-01
This study had three purposes, all related to evaluating energy expenditure (EE) prediction accuracy from body-worn accelerometers: (1) compare linear regression to linear mixed models, (2) compare linear models to artificial neural network models, and (3) compare accuracy of accelerometers placed on the hip, thigh, and wrists. Forty individuals performed 13 activities in a 90 min semi-structured, laboratory-based protocol. Participants wore accelerometers on the right hip, right thigh, and both wrists and a portable metabolic analyzer (EE criterion). Four EE prediction models were developed for each accelerometer: linear regression, linear mixed, and two ANN models. EE prediction accuracy was assessed using correlations, root mean square error (RMSE), and bias and was compared across models and accelerometers using repeated-measures analysis of variance. For all accelerometer placements, there were no significant differences for correlations or RMSE between linear regression and linear mixed models (correlations: r = 0.71-0.88, RMSE: 1.11-1.61 METs; p > 0.05). For the thigh-worn accelerometer, there were no differences in correlations or RMSE between linear and ANN models (ANN-correlations: r = 0.89, RMSE: 1.07-1.08 METs. Linear models-correlations: r = 0.88, RMSE: 1.10-1.11 METs; p > 0.05). Conversely, one ANN had higher correlations and lower RMSE than both linear models for the hip (ANN-correlation: r = 0.88, RMSE: 1.12 METs. Linear models-correlations: r = 0.86, RMSE: 1.18-1.19 METs; p < 0.05), and both ANNs had higher correlations and lower RMSE than both linear models for the wrist-worn accelerometers (ANN-correlations: r = 0.82-0.84, RMSE: 1.26-1.32 METs. Linear models-correlations: r = 0.71-0.73, RMSE: 1.55-1.61 METs; p < 0.01). For studies using wrist-worn accelerometers, machine learning models offer a significant improvement in EE prediction accuracy over linear models. Conversely, linear models showed similar EE prediction accuracy to machine learning models for hip- and thigh-worn accelerometers and may be viable alternative modeling techniques for EE prediction for hip- or thigh-worn accelerometers.
Diagnosis of Enzyme Inhibition Using Excel Solver: A Combined Dry and Wet Laboratory Exercise
ERIC Educational Resources Information Center
Dias, Albino A.; Pinto, Paula A.; Fraga, Irene; Bezerra, Rui M. F.
2014-01-01
In enzyme kinetic studies, linear transformations of the Michaelis-Menten equation, such as the Lineweaver-Burk double-reciprocal transformation, present some constraints. The linear transformation distorts the experimental error and the relationship between "x" and "y" axes; consequently, linear regression of transformed data…
Clustering performance comparison using K-means and expectation maximization algorithms.
Jung, Yong Gyu; Kang, Min Soo; Heo, Jun
2014-11-14
Clustering is an important means of data mining based on separating data categories by similar features. Unlike the classification algorithm, clustering belongs to the unsupervised type of algorithms. Two representatives of the clustering algorithms are the K -means and the expectation maximization (EM) algorithm. Linear regression analysis was extended to the category-type dependent variable, while logistic regression was achieved using a linear combination of independent variables. To predict the possibility of occurrence of an event, a statistical approach is used. However, the classification of all data by means of logistic regression analysis cannot guarantee the accuracy of the results. In this paper, the logistic regression analysis is applied to EM clusters and the K -means clustering method for quality assessment of red wine, and a method is proposed for ensuring the accuracy of the classification results.
Improvement of Storm Forecasts Using Gridded Bayesian Linear Regression for Northeast United States
NASA Astrophysics Data System (ADS)
Yang, J.; Astitha, M.; Schwartz, C. S.
2017-12-01
Bayesian linear regression (BLR) is a post-processing technique in which regression coefficients are derived and used to correct raw forecasts based on pairs of observation-model values. This study presents the development and application of a gridded Bayesian linear regression (GBLR) as a new post-processing technique to improve numerical weather prediction (NWP) of rain and wind storm forecasts over northeast United States. Ten controlled variables produced from ten ensemble members of the National Center for Atmospheric Research (NCAR) real-time prediction system are used for a GBLR model. In the GBLR framework, leave-one-storm-out cross-validation is utilized to study the performances of the post-processing technique in a database composed of 92 storms. To estimate the regression coefficients of the GBLR, optimization procedures that minimize the systematic and random error of predicted atmospheric variables (wind speed, precipitation, etc.) are implemented for the modeled-observed pairs of training storms. The regression coefficients calculated for meteorological stations of the National Weather Service are interpolated back to the model domain. An analysis of forecast improvements based on error reductions during the storms will demonstrate the value of GBLR approach. This presentation will also illustrate how the variances are optimized for the training partition in GBLR and discuss the verification strategy for grid points where no observations are available. The new post-processing technique is successful in improving wind speed and precipitation storm forecasts using past event-based data and has the potential to be implemented in real-time.
Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello
2016-05-01
Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite of the advantages of turbidimetric bioassays when compared to other methods, it has limitations concerning the linearity and range of the dose-response curve determination. Here, we proposed to use partial least squares (PLS) regression to solve these limitations and to improve the prediction of relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramacyin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growths were measured as absorbance up to 180 and 300min for apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbances or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression, however there were significant deviation of linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting and it improved the linear range of turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from agar diffusion official methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics with significant advantages when compared to conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.
Mixed kernel function support vector regression for global sensitivity analysis
NASA Astrophysics Data System (ADS)
Cheng, Kai; Lu, Zhenzhou; Wei, Yuhao; Shi, Yan; Zhou, Yicheng
2017-11-01
Global sensitivity analysis (GSA) plays an important role in exploring the respective effects of input variables on an assigned output response. Amongst the wide sensitivity analyses in literature, the Sobol indices have attracted much attention since they can provide accurate information for most models. In this paper, a mixed kernel function (MKF) based support vector regression (SVR) model is employed to evaluate the Sobol indices at low computational cost. By the proposed derivation, the estimation of the Sobol indices can be obtained by post-processing the coefficients of the SVR meta-model. The MKF is constituted by the orthogonal polynomials kernel function and Gaussian radial basis kernel function, thus the MKF possesses both the global characteristic advantage of the polynomials kernel function and the local characteristic advantage of the Gaussian radial basis kernel function. The proposed approach is suitable for high-dimensional and non-linear problems. Performance of the proposed approach is validated by various analytical functions and compared with the popular polynomial chaos expansion (PCE). Results demonstrate that the proposed approach is an efficient method for global sensitivity analysis.
ERIC Educational Resources Information Center
Dolan, Conor V.; Wicherts, Jelte M.; Molenaar, Peter C. M.
2004-01-01
We consider the question of how variation in the number and reliability of indicators affects the power to reject the hypothesis that the regression coefficients are zero in latent linear regression analysis. We show that power remains constant as long as the coefficient of determination remains unchanged. Any increase in the number of indicators…
Local Composite Quantile Regression Smoothing for Harris Recurrent Markov Processes
Li, Degui; Li, Runze
2016-01-01
In this paper, we study the local polynomial composite quantile regression (CQR) smoothing method for the nonlinear and nonparametric models under the Harris recurrent Markov chain framework. The local polynomial CQR regression method is a robust alternative to the widely-used local polynomial method, and has been well studied in stationary time series. In this paper, we relax the stationarity restriction on the model, and allow that the regressors are generated by a general Harris recurrent Markov process which includes both the stationary (positive recurrent) and nonstationary (null recurrent) cases. Under some mild conditions, we establish the asymptotic theory for the proposed local polynomial CQR estimator of the mean regression function, and show that the convergence rate for the estimator in nonstationary case is slower than that in stationary case. Furthermore, a weighted type local polynomial CQR estimator is provided to improve the estimation efficiency, and a data-driven bandwidth selection is introduced to choose the optimal bandwidth involved in the nonparametric estimators. Finally, we give some numerical studies to examine the finite sample performance of the developed methodology and theory. PMID:27667894
Liu, Xiaoxiao; Bertazzon, Stefania
2017-01-01
Spatial and temporal analyses are critical to understand the pattern of myocardial infarction (MI) hospitalizations over space and time, and to identify their underlying determinants. In this paper, we analyze MI hospitalizations in Calgary from 2004 to 2013, stratified by age and gender. First, a seasonal trend decomposition analyzes the seasonality; then a linear regression models the trend component. Moran’s I and hot spot analyses explore the spatial pattern. Though exploratory, results show that most age and gender groups feature a statistically significant decline over the 10 years, consistent with previous studies in Canada. Decline rates vary across ages and genders, with the slowest decline observed for younger males. Each gender exhibits a seasonal pattern with peaks in both winter and summer. Spatially, MI hot spots are identified in older communities, and in socioeconomically and environmentally disadvantaged communities. In the older communities, higher MI rates appear to be more highly associated with demographics. Conversely, worse air quality appears to be locally associated with higher MI incidence in younger age groups. The study helps identify areas of concern, where MI hot spots are identified for younger age groups, suggesting the need for localized public health policies to target local risk factors. PMID:29232910
Predictors of Serum Dioxin, Furan and PCB Concentrations among Women from Chapaevsk, Russia
Humblet, Olivier; Williams, Paige L.; Korrick, Susan A.; Sergeyev, Oleg; Emond, Claude; Birnbaum, Linda S.; Burns, Jane S.; Altshul, Larisa; Patterson, Donald G.; Turner, Wayman E.; Lee, Mary M.; Revich, Boris; Hauser, Russ
2011-01-01
INTRODUCTION Dioxins, furans and polychlorinated biphenyls (PCBs) are persistent and bioaccumulative toxic chemicals that are ubiquitous in the environment. We assessed predictors of their serum concentrations among women living in a Russian town contaminated by past industrial activity. METHODS Blood samples from 446 mothers aged 23–52 years were collected between 2003–2005 as part of the Russian Children’s Study. Serum dioxin, furan and PCB concentrations were quantified using high-resolution gas chromatography-mass spectrometry. Potential determinants of exposure were collected through interviews. Multivariate linear regression models were used to identify predictors of serum concentrations and toxic equivalencies (TEQs). RESULTS AND DISCUSSION The median total PCB concentrations and total TEQs were 260 ng/g lipid and 25 pg TEQ/g lipid, respectively. In multivariate analyses, both total PCB concentrations and total TEQs increased significantly with age, residential proximity to a local chemical plant, duration of local farming, and consumption of local beef. Both decreased with longer breastfeeding, recent increases in body mass index, and later blood draw date. These demographic and lifestyle predictors showed generally similar associations with the various measures of serum dioxins, furans, and PCBs. PMID:20578718
NASA Technical Reports Server (NTRS)
Whitlock, C. H.; Kuo, C. Y.
1979-01-01
The objective of this paper is to define optical physics and/or environmental conditions under which the linear multiple-regression should be applicable. An investigation of the signal-response equations is conducted and the concept is tested by application to actual remote sensing data from a laboratory experiment performed under controlled conditions. Investigation of the signal-response equations shows that the exact solution for a number of optical physics conditions is of the same form as a linearized multiple-regression equation, even if nonlinear contributions from surface reflections, atmospheric constituents, or other water pollutants are included. Limitations on achieving this type of solution are defined.
Wang, Yun; Huang, Fangzhou
2018-01-01
The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC2), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible. PMID:29666661
Xu, Jiucheng; Mu, Huiyu; Wang, Yun; Huang, Fangzhou
2018-01-01
The selection of feature genes with high recognition ability from the gene expression profiles has gained great significance in biology. However, most of the existing methods have a high time complexity and poor classification performance. Motivated by this, an effective feature selection method, called supervised locally linear embedding and Spearman's rank correlation coefficient (SLLE-SC 2 ), is proposed which is based on the concept of locally linear embedding and correlation coefficient algorithms. Supervised locally linear embedding takes into account class label information and improves the classification performance. Furthermore, Spearman's rank correlation coefficient is used to remove the coexpression genes. The experiment results obtained on four public tumor microarray datasets illustrate that our method is valid and feasible.
Locally-Based Kernal PLS Smoothing to Non-Parametric Regression Curve Fitting
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Trejo, Leonard J.; Wheeler, Kevin; Korsmeyer, David (Technical Monitor)
2002-01-01
We present a novel smoothing approach to non-parametric regression curve fitting. This is based on kernel partial least squares (PLS) regression in reproducing kernel Hilbert space. It is our concern to apply the methodology for smoothing experimental data where some level of knowledge about the approximate shape, local inhomogeneities or points where the desired function changes its curvature is known a priori or can be derived based on the observed noisy data. We propose locally-based kernel PLS regression that extends the previous kernel PLS methodology by incorporating this knowledge. We compare our approach with existing smoothing splines, hybrid adaptive splines and wavelet shrinkage techniques on two generated data sets.
1981-09-01
corresponds to the same square footage that consumed the electrical energy. 3. The basic assumptions of multiple linear regres- sion, as enumerated in...7. Data related to the sample of bases is assumed to be representative of bases in the population. Limitations Basic limitations on this research were... Ratemaking --Overview. Rand Report R-5894, Santa Monica CA, May 1977. Chatterjee, Samprit, and Bertram Price. Regression Analysis by Example. New York: John
Szilágyi, Keely L.; Liu, Cong; Zhang, Xu; Wang, Ting; Fortman, Jeffrey D.; Zhang, Wei; Garcia, Joe G.N.
2016-01-01
Acute respiratory distress syndrome (ARDS) is a devastating clinical syndrome with a considerable case fatality rate (~30-40%). Health disparities exist with African descent subjects (ADs) exhibiting greater mortality than European descent individuals (EDs). Myosin light chain kinase (MLCK) is encoded by MYLK whose genetic variants are implicated in ARDS pathogenesis and may influence ARDS mortality. As baseline population-specific epigenetic changes, i.e. cytosine modifications, have been observed between AD and ED individuals, epigenetic variations in MYLK may provide insights into ARDS disparities. We compared methylation levels of MYLK CpGs between ARDS patients and ICU controls overall and by ethnicity in a nested case control study of 39 ARDS cases and 75 non-ARDS intensive care unit controls. Two MYLK CpG sites (cg03892735, cg23344121) were differentially modified between ARDS subjects and controls (p<0.05; q<0.25) in a logistic regression model, where no effect modification from ethnicity or age was found. One CpG site was associated with ARDS in patients less than 58 years old, cg19611163 (intron 19,20). Two CpG sites were associated with ARDS in EDs only, gene body CpG (cg01894985, intron 2,3) and CpG (cg16212219, intron 31,32), with higher modification levels exhibited in ARDS subjects than controls. Cis-acting mQTL (modified cytosine quantitative trait loci) were identified using linear regression between local genetic variants and modification levels for two ARDS-associated CpGs (cg23344121, cg16212219). In summary, these ARDS-associated MYLK CpGs with effect modification by ethnicity and local mQTL, suggest that MYLK epigenetic variation and local genetic background may contribute to health disparities observed in ARDS. PMID:27543902
NASA Astrophysics Data System (ADS)
Nieschulze, Jens; Erasmi, Stefan; Dietz, Johannes; Hölscher, Dirk
2009-01-01
SummaryRainforest conversion to other land use types drastically alters the hydrological cycle in which changes in rainfall interception contribute significantly to the observed differences. However, little is known about the effects of more gradual changes in forest structure and at regional scales. We studied land use types ranging from natural forest over selectively-logged forest to cacao agroforest in a lower montane region in Central Sulawesi, Indonesia, and tested the suitability of high-resolution optical satellite imagery for modeling observed interception patterns. Investigated characteristics indicating canopy structure were mean and standard deviation of reflectance values, local maxima, and self-similarity measures based on the grey level co-occurrence matrix and geostatistical variogram analysis. Previously studied and published rainfall interception data comprised twelve plots and median values per land use type ranged from 30% in natural forest to 18% in cacao agroforests. A linear regression model with local maxima, mean contrast and normalized digital vegetation index (NDVI) as regressors was able to explain more than 84% ( Radj2) of the variation encountered in the data. Other investigated characteristics did not prove significant in the regression analysis. The model yielded stable results with respect to cross-validation and also produced realistic values and spatial patterns when applied at the landscape level (783.6 ha). High values of interception were rare and localized in natural forest stands distant to villages, whereas low interception characterized the intensively used sites close to settlements. We conclude that forest use intensity significantly reduced rainfall interception and satellite image analysis can successfully be applied for its regional prediction, and most forest in the study region has already been subject to human-induced structural changes.
Study on power grid characteristics in summer based on Linear regression analysis
NASA Astrophysics Data System (ADS)
Tang, Jin-hui; Liu, You-fei; Liu, Juan; Liu, Qiang; Liu, Zhuan; Xu, Xi
2018-05-01
The correlation analysis of power load and temperature is the precondition and foundation for accurate load prediction, and a great deal of research has been made. This paper constructed the linear correlation model between temperature and power load, then the correlation of fault maintenance work orders with the power load is researched. Data details of Jiangxi province in 2017 summer such as temperature, power load, fault maintenance work orders were adopted in this paper to develop data analysis and mining. Linear regression models established in this paper will promote electricity load growth forecast, fault repair work order review, distribution network operation weakness analysis and other work to further deepen the refinement.
Interpreting Regression Results: beta Weights and Structure Coefficients are Both Important.
ERIC Educational Resources Information Center
Thompson, Bruce
Various realizations have led to less frequent use of the "OVA" methods (analysis of variance--ANOVA--among others) and to more frequent use of general linear model approaches such as regression. However, too few researchers understand all the various coefficients produced in regression. This paper explains these coefficients and their…
Spatial Assessment of Model Errors from Four Regression Techniques
Lianjun Zhang; Jeffrey H. Gove; Jeffrey H. Gove
2005-01-01
Fomst modelers have attempted to account for the spatial autocorrelations among trees in growth and yield models by applying alternative regression techniques such as linear mixed models (LMM), generalized additive models (GAM), and geographicalIy weighted regression (GWR). However, the model errors are commonly assessed using average errors across the entire study...
Quantile Regression in the Study of Developmental Sciences
ERIC Educational Resources Information Center
Petscher, Yaacov; Logan, Jessica A. R.
2014-01-01
Linear regression analysis is one of the most common techniques applied in developmental research, but only allows for an estimate of the average relations between the predictor(s) and the outcome. This study describes quantile regression, which provides estimates of the relations between the predictor(s) and outcome, but across multiple points of…
Maintenance Operations in Mission Oriented Protective Posture Level IV (MOPPIV)
1987-10-01
Repair FADAC Printed Circuit Board ............. 6 3. Data Analysis Techniques ............................. 6 a. Multiple Linear Regression... ANALYSIS /DISCUSSION ............................... 12 1. Exa-ple of Regression Analysis ..................... 12 S2. Regression results for all tasks...6 * TABLE 9. Task Grouping for Analysis ........................ 7 "TABXLE 10. Remove/Replace H60A3 Power Pack................. 8 TABLE
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is simultaneous equation regression model and fusion of parametric and nonparametric model. The regression model comprise several models and each model has two components, parametric and nonparametric. The used model has linear function as parametric and polynomial truncated spline as nonparametric component. The model can handle both linearity and nonlinearity relationship between response and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model for modeling of effect of regional socio-economic on use of information technology. More specific, the response variables are percentage of households has access to internet and percentage of households has personal computer. Then, predictor variables are percentage of literacy people, percentage of electrification and percentage of economic growth. Based on identification of the relationship between response and predictor variable, economic growth is treated as nonparametric predictor and the others are parametric predictors. The result shows that the multiresponse semiparametric regression can be applied well as indicate by the high coefficient determination, 90 percent.
Regression analysis using dependent Polya trees.
Schörgendorfer, Angela; Branscum, Adam J
2013-11-30
Many commonly used models for linear regression analysis force overly simplistic shape and scale constraints on the residual structure of data. We propose a semiparametric Bayesian model for regression analysis that produces data-driven inference by using a new type of dependent Polya tree prior to model arbitrary residual distributions that are allowed to evolve across increasing levels of an ordinal covariate (e.g., time, in repeated measurement studies). By modeling residual distributions at consecutive covariate levels or time points using separate, but dependent Polya tree priors, distributional information is pooled while allowing for broad pliability to accommodate many types of changing residual distributions. We can use the proposed dependent residual structure in a wide range of regression settings, including fixed-effects and mixed-effects linear and nonlinear models for cross-sectional, prospective, and repeated measurement data. A simulation study illustrates the flexibility of our novel semiparametric regression model to accurately capture evolving residual distributions. In an application to immune development data on immunoglobulin G antibodies in children, our new model outperforms several contemporary semiparametric regression models based on a predictive model selection criterion. Copyright © 2013 John Wiley & Sons, Ltd.
Intramuscular Pressure Measurement During Locomotion in Humans
NASA Technical Reports Server (NTRS)
Ballard, Ricard E.
1996-01-01
To assess the usefulness of intramuscular pressure (IMP) measurement for studying muscle function during gait, IMP was recorded in the soleus and tibialis anterior muscles of ten volunteers during, treadmill walking, and running using transducer-tipped catheters. Soleus IMP exhibited single peaks during late-stance phase of walking (181 +/- 69 mmHg, mean +/- S.E.) and running (269 +/- 95 mmHg). Tibialis anterior IMP showed a biphasic response, with the largest peak (90 +/- 15 mmHg during walking and 151 +/- 25 mmHg during running) occurring shortly after heel strike. IMP magnitude increased with gait speed in both muscles. Linear regression of soleus IMP against ankle joint torque obtained by a dynamometer in two subjects produced linear relationships (r = 0.97). Application of these relationships to IMP data yielded estimated peak soleus moment contributions of 0.95-165 Nm/Kg during walking, and 1.43-2.70 Nm/Kg during running. IMP results from local muscle tissue deformations caused by muscle force development and thus, provides a direct, practical index of muscle function during locomotion in humans.
Leg intramuscular pressures during locomotion in humans
NASA Technical Reports Server (NTRS)
Ballard, R. E.; Watenpaugh, D. E.; Breit, G. A.; Murthy, G.; Holley, D. C.; Hargens, A. R.
1998-01-01
To assess the usefulness of intramuscular pressure (IMP) measurement for studying muscle function during gait, IMP was recorded in the soleus and tibialis anterior muscles of 10 volunteers during treadmill walking and running by using transducer-tipped catheters. Soleus IMP exhibited single peaks during late-stance phase of walking [181 +/- 69 (SE) mmHg] and running (269 +/- 95 mmHg). Tibialis anterior IMP showed a biphasic response, with the largest peak (90 +/- 15 mmHg during walking and 151 +/- 25 mmHg during running) occurring shortly after heel strike. IMP magnitude increased with gait speed in both muscles. Linear regression of soleus IMP against ankle joint torque obtained by a dynamometer produced linear relationships (n = 2, r = 0.97 for both). Application of these relationships to IMP data yielded estimated peak soleus moment contributions of 0.95-1.65 N . m/kg during walking, and 1.43-2.70 N . m/kg during running. Phasic elevations of IMP during exercise are probably generated by local muscle tissue deformations due to muscle force development. Thus profiles of IMP provide a direct, reproducible index of muscle function during locomotion in humans.
Cruz, Antonio M; Barr, Cameron; Puñales-Pozo, Elsa
2008-01-01
This research's main goals were to build a predictor for a turnaround time (TAT) indicator for estimating its values and use a numerical clustering technique for finding possible causes of undesirable TAT values. The following stages were used: domain understanding, data characterisation and sample reduction and insight characterisation. Building the TAT indicator multiple linear regression predictor and clustering techniques were used for improving corrective maintenance task efficiency in a clinical engineering department (CED). The indicator being studied was turnaround time (TAT). Multiple linear regression was used for building a predictive TAT value model. The variables contributing to such model were clinical engineering department response time (CE(rt), 0.415 positive coefficient), stock service response time (Stock(rt), 0.734 positive coefficient), priority level (0.21 positive coefficient) and service time (0.06 positive coefficient). The regression process showed heavy reliance on Stock(rt), CE(rt) and priority, in that order. Clustering techniques revealed the main causes of high TAT values. This examination has provided a means for analysing current technical service quality and effectiveness. In doing so, it has demonstrated a process for identifying areas and methods of improvement and a model against which to analyse these methods' effectiveness.
A New SEYHAN's Approach in Case of Heterogeneity of Regression Slopes in ANCOVA.
Ankarali, Handan; Cangur, Sengul; Ankarali, Seyit
2018-06-01
In this study, when the assumptions of linearity and homogeneity of regression slopes of conventional ANCOVA are not met, a new approach named as SEYHAN has been suggested to use conventional ANCOVA instead of robust or nonlinear ANCOVA. The proposed SEYHAN's approach involves transformation of continuous covariate into categorical structure when the relationship between covariate and dependent variable is nonlinear and the regression slopes are not homogenous. A simulated data set was used to explain SEYHAN's approach. In this approach, we performed conventional ANCOVA in each subgroup which is constituted according to knot values and analysis of variance with two-factor model after MARS method was used for categorization of covariate. The first model is a simpler model than the second model that includes interaction term. Since the model with interaction effect has more subjects, the power of test also increases and the existing significant difference is revealed better. We can say that linearity and homogeneity of regression slopes are not problem for data analysis by conventional linear ANCOVA model by helping this approach. It can be used fast and efficiently for the presence of one or more covariates.
The Influential Effect of Blending, Bump, Changing Period, and Eclipsing Cepheids on the Leavitt Law
NASA Astrophysics Data System (ADS)
García-Varela, A.; Muñoz, J. R.; Sabogal, B. E.; Vargas Domínguez, S.; Martínez, J.
2016-06-01
The investigation of the nonlinearity of the Leavitt law (LL) is a topic that began more than seven decades ago, when some of the studies in this field found that the LL has a break at about 10 days. The goal of this work is to investigate a possible statistical cause of this nonlinearity. By applying linear regressions to OGLE-II and OGLE-IV data, we find that to obtain the LL by using linear regression, robust techniques to deal with influential points and/or outliers are needed instead of the ordinary least-squares regression traditionally used. In particular, by using M- and MM-regressions we establish firmly and without doubt the linearity of the LL in the Large Magellanic Cloud, without rejecting or excluding Cepheid data from the analysis. This implies that light curves of Cepheids suggesting blending, bumps, eclipses, or period changes do not affect the LL for this galaxy. For the Small Magellanic Cloud, when including Cepheids of this kind, it is not possible to find an adequate model, probably because of the geometry of the galaxy. In that case, a possible influence of these stars could exist.
Multiple regression technique for Pth degree polynominals with and without linear cross products
NASA Technical Reports Server (NTRS)
Davis, J. W.
1973-01-01
A multiple regression technique was developed by which the nonlinear behavior of specified independent variables can be related to a given dependent variable. The polynomial expression can be of Pth degree and can incorporate N independent variables. Two cases are treated such that mathematical models can be studied both with and without linear cross products. The resulting surface fits can be used to summarize trends for a given phenomenon and provide a mathematical relationship for subsequent analysis. To implement this technique, separate computer programs were developed for the case without linear cross products and for the case incorporating such cross products which evaluate the various constants in the model regression equation. In addition, the significance of the estimated regression equation is considered and the standard deviation, the F statistic, the maximum absolute percent error, and the average of the absolute values of the percent of error evaluated. The computer programs and their manner of utilization are described. Sample problems are included to illustrate the use and capability of the technique which show the output formats and typical plots comparing computer results to each set of input data.
Zhang, Hanze; Huang, Yangxin; Wang, Wei; Chen, Henian; Langland-Orban, Barbara
2017-01-01
In longitudinal AIDS studies, it is of interest to investigate the relationship between HIV viral load and CD4 cell counts, as well as the complicated time effect. Most of common models to analyze such complex longitudinal data are based on mean-regression, which fails to provide efficient estimates due to outliers and/or heavy tails. Quantile regression-based partially linear mixed-effects models, a special case of semiparametric models enjoying benefits of both parametric and nonparametric models, have the flexibility to monitor the viral dynamics nonparametrically and detect the varying CD4 effects parametrically at different quantiles of viral load. Meanwhile, it is critical to consider various data features of repeated measurements, including left-censoring due to a limit of detection, covariate measurement error, and asymmetric distribution. In this research, we first establish a Bayesian joint models that accounts for all these data features simultaneously in the framework of quantile regression-based partially linear mixed-effects models. The proposed models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also conducted to assess the performance of the proposed methods under different scenarios.
Kumar, K Vasanth
2006-10-11
Batch kinetic experiments were carried out for the sorption of methylene blue onto activated carbon. The experimental kinetics were fitted to the pseudo first-order and pseudo second-order kinetics by linear and a non-linear method. The five different types of Ho pseudo second-order expression have been discussed. A comparison of linear least-squares method and a trial and error non-linear method of estimating the pseudo second-order rate kinetic parameters were examined. The sorption process was found to follow a both pseudo first-order kinetic and pseudo second-order kinetic model. Present investigation showed that it is inappropriate to use a type 1 and type pseudo second-order expressions as proposed by Ho and Blanachard et al. respectively for predicting the kinetic rate constants and the initial sorption rate for the studied system. Three correct possible alternate linear expressions (type 2 to type 4) to better predict the initial sorption rate and kinetic rate constants for the studied system (methylene blue/activated carbon) was proposed. Linear method was found to check only the hypothesis instead of verifying the kinetic model. Non-linear regression method was found to be the more appropriate method to determine the rate kinetic parameters.
Adjusted variable plots for Cox's proportional hazards regression model.
Hall, C B; Zeger, S L; Bandeen-Roche, K J
1996-01-01
Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each point, the regression coefficient and standard error from a Cox proportional hazards regression is obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.
NASA Astrophysics Data System (ADS)
Sahabiev, I. A.; Ryazanov, S. S.; Kolcova, T. G.; Grigoryan, B. R.
2018-03-01
The three most common techniques to interpolate soil properties at a field scale—ordinary kriging (OK), regression kriging with multiple linear regression drift model (RK + MLR), and regression kriging with principal component regression drift model (RK + PCR)—were examined. The results of the performed study were compiled into an algorithm of choosing the most appropriate soil mapping technique. Relief attributes were used as the auxiliary variables. When spatial dependence of a target variable was strong, the OK method showed more accurate interpolation results, and the inclusion of the auxiliary data resulted in an insignificant improvement in prediction accuracy. According to the algorithm, the RK + PCR method effectively eliminates multicollinearity of explanatory variables. However, if the number of predictors is less than ten, the probability of multicollinearity is reduced, and application of the PCR becomes irrational. In that case, the multiple linear regression should be used instead.
Jupiter, Daniel C
2012-01-01
In this first of a series of statistical methodology commentaries for the clinician, we discuss the use of multivariate linear regression. Copyright © 2012 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.
An evaluation of bias in propensity score-adjusted non-linear regression models.
Wan, Fei; Mitra, Nandita
2018-03-01
Propensity score methods are commonly used to adjust for observed confounding when estimating the conditional treatment effect in observational studies. One popular method, covariate adjustment of the propensity score in a regression model, has been empirically shown to be biased in non-linear models. However, no compelling underlying theoretical reason has been presented. We propose a new framework to investigate bias and consistency of propensity score-adjusted treatment effects in non-linear models that uses a simple geometric approach to forge a link between the consistency of the propensity score estimator and the collapsibility of non-linear models. Under this framework, we demonstrate that adjustment of the propensity score in an outcome model results in the decomposition of observed covariates into the propensity score and a remainder term. Omission of this remainder term from a non-collapsible regression model leads to biased estimates of the conditional odds ratio and conditional hazard ratio, but not for the conditional rate ratio. We further show, via simulation studies, that the bias in these propensity score-adjusted estimators increases with larger treatment effect size, larger covariate effects, and increasing dissimilarity between the coefficients of the covariates in the treatment model versus the outcome model.
Modification of the USLE K factor for soil erodibility assessment on calcareous soils in Iran
NASA Astrophysics Data System (ADS)
Ostovari, Yaser; Ghorbani-Dashtaki, Shoja; Bahrami, Hossein-Ali; Naderi, Mehdi; Dematte, Jose Alexandre M.; Kerry, Ruth
2016-11-01
The measurement of soil erodibility (K) in the field is tedious, time-consuming and expensive; therefore, its prediction through pedotransfer functions (PTFs) could be far less costly and time-consuming. The aim of this study was to develop new PTFs to estimate the K factor using multiple linear regression, Mamdani fuzzy inference systems, and artificial neural networks. For this purpose, K was measured in 40 erosion plots with natural rainfall. Various soil properties including the soil particle size distribution, calcium carbonate equivalent, organic matter, permeability, and wet-aggregate stability were measured. The results showed that the mean measured K was 0.014 t h MJ- 1 mm- 1 and 2.08 times less than the estimated mean K (0.030 t h MJ- 1 mm- 1) using the USLE model. Permeability, wet-aggregate stability, very fine sand, and calcium carbonate were selected as independent variables by forward stepwise regression in order to assess the ability of multiple linear regression, Mamdani fuzzy inference systems and artificial neural networks to predict K. The calcium carbonate equivalent, which is not accounted for in the USLE model, had a significant impact on K in multiple linear regression due to its strong influence on the stability of aggregates and soil permeability. Statistical indices in validation and calibration datasets determined that the artificial neural networks method with the highest R2, lowest RMSE, and lowest ME was the best model for estimating the K factor. A strong correlation (R2 = 0.81, n = 40, p < 0.05) between the estimated K from multiple linear regression and measured K indicates that the use of calcium carbonate equivalent as a predictor variable gives a better estimation of K in areas with calcareous soils.
Postmolar gestational trophoblastic neoplasia: beyond the traditional risk factors.
Bakhtiyari, Mahmood; Mirzamoradi, Masoumeh; Kimyaiee, Parichehr; Aghaie, Abbas; Mansournia, Mohammd Ali; Ashrafi-Vand, Sepideh; Sarfjoo, Fatemeh Sadat
2015-09-01
To investigate the slope of linear regression of postevacuation serum hCG as an independent risk factor for postmolar gestational trophoblastic neoplasia (GTN). Multicenter retrospective cohort study. Academic referral health care centers. All subjects with confirmed hydatidiform mole and at least four measurements of β-hCG titer. None. Type and magnitude of the relationship between the slope of linear regression of β-hCG as a new risk factor and GTN using Bayesian logistic regression with penalized log-likelihood estimation. Among the high-risk and low-risk molar pregnancy cases, 11 (18.6%) and 19 cases (13.3%) had GTN, respectively. No significant relationship was found between the components of a high-risk pregnancy and GTN. The β-hCG return slope was higher in the spontaneous cure group. However, the initial level of this hormone in the first measurement was higher in the GTN group compared with in the spontaneous recovery group. The average time for diagnosing GTN in the high-risk molar pregnancy group was 2 weeks less than that of the low-risk molar pregnancy group. In addition to slope of linear regression of β-hCG (odds ratio [OR], 12.74, confidence interval [CI], 5.42-29.2), abortion history (OR, 2.53; 95% CI, 1.27-5.04) and large uterine height for gestational age (OR, 1.26; CI, 1.04-1.54) had the maximum effects on GTN outcome, respectively. The slope of linear regression of β-hCG was introduced as an independent risk factor, which could be used for clinical decision making based on records of β-hCG titer and subsequent prevention program. Copyright © 2015 American Society for Reproductive Medicine. Published by Elsevier Inc. All rights reserved.
Tolerance of ciliated protozoan Paramecium bursaria (Protozoa, Ciliophora) to ammonia and nitrites
NASA Astrophysics Data System (ADS)
Xu, Henglong; Song, Weibo; Lu, Lu; Alan, Warren
2005-09-01
The tolerance to ammonia and nitrites in freshwater ciliate Paramecium bursaria was measured in a conventional open system. The ciliate was exposed to different concentrations of ammonia and nitrites for 2h and 12h in order to determine the lethal concentrations. Linear regression analysis revealed that the 2h-LC50 value for ammonia was 95.94 mg/L and for nitrite 27.35 mg/L using probit scale method (with 95% confidence intervals). There was a linear correlation between the mortality probit scale and logarithmic concentration of ammonia which fit by a regression equation y=7.32 x 9.51 ( R 2=0.98; y, mortality probit scale; x, logarithmic concentration of ammonia), by which 2 h-LC50 value for ammonia was found to be 95.50 mg/L. A linear correlation between mortality probit scales and logarithmic concentration of nitrite is also followed the regression equation y=2.86 x+0.89 ( R 2=0.95; y, mortality probit scale; x, logarithmic concentration of nitrite). The regression analysis of toxicity curves showed that the linear correlation between exposed time of ammonia-N LC50 value and ammonia-N LC50 value followed the regression equation y=2 862.85 e -0.08 x ( R 2=0.95; y, duration of exposure to LC50 value; x, LC50 value), and that between exposed time of nitrite-N LC50 value and nitrite-N LC50 value followed the regression equation y=127.15 e -0.13 x ( R 2=0.91; y, exposed time of LC50 value; x, LC50 value). The results demonstrate that the tolerance to ammonia in P. bursaria is considerably higher than that of the larvae or juveniles of some metozoa, e.g. cultured prawns and oysters. In addition, ciliates, as bacterial predators, are likely to play a positive role in maintaining and improving water quality in aquatic environments with high-level ammonium, such as sewage treatment systems.