Agogo, George O.; van der Voet, Hilko; Veer, Pieter van’t; Ferrari, Pietro; Leenders, Max; Muller, David C.; Sánchez-Cantalejo, Emilio; Bamia, Christina; Braaten, Tonje; Knüppel, Sven; Johansson, Ingegerd; van Eeuwijk, Fred A.; Boshuizen, Hendriek
2014-01-01
In epidemiologic studies, measurement error in dietary variables often attenuates the association between dietary intake and disease occurrence. To adjust for the attenuation caused by error in dietary intake, regression calibration is commonly used. To apply regression calibration, unbiased reference measurements are required. Short-term reference measurements for foods that are not consumed daily contain excess zeroes that pose challenges in the calibration model. We adapted the two-part regression calibration model, initially developed for multiple replicates of reference measurements per individual, to a single-replicate setting. We showed how to handle excess zero reference measurements with a two-step modeling approach, how to explore heteroscedasticity in the consumed amount with a variance-mean graph, how to explore nonlinearity with generalized additive modeling (GAM) and empirical logit approaches, and how to select covariates in the calibration model. The performance of the two-part calibration model was compared with that of its one-part counterpart. We used vegetable intake and mortality data from the European Prospective Investigation into Cancer and Nutrition (EPIC) study, in which reference measurements were taken with 24-hour recalls. For each of the three vegetable subgroups assessed separately, correcting for error with an appropriately specified two-part calibration model resulted in about a threefold increase in the strength of association with all-cause mortality, as measured by the log hazard ratio. We further found that the standard way of including covariates in the calibration model can lead to overfitting the two-part calibration model. Moreover, the extent of adjustment for error is influenced by the number and forms of covariates in the calibration model. For episodically consumed foods, we advise researchers to pay special attention to the response distribution, nonlinearity, and covariate inclusion in specifying the calibration model. PMID:25402487
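A minimal two-step sketch of such a two-part calibration in R, assuming a data frame d with a zero-heavy 24-hour-recall reference measurement r24, an FFQ measurement ffq, and covariates age and sex (all hypothetical names, not taken from the paper):

```r
# Part 1: probability of any consumption (zero vs. positive recall)
part1 <- glm(I(r24 > 0) ~ ffq + age + sex, family = binomial, data = d)

# Part 2: consumed amount, modeled on the log scale among consumers only
part2 <- lm(log(r24) ~ ffq + age + sex, data = subset(d, r24 > 0))

# Calibrated intake: P(consume) * E[amount | consume]; Duan's smearing
# factor corrects the back-transformation from the log scale
p_hat <- predict(part1, newdata = d, type = "response")
smear <- mean(exp(residuals(part2)))
d$calibrated <- p_hat * exp(predict(part2, newdata = d)) * smear
```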
Logistic regression for dichotomized counts.
Preisser, John S; Das, Kalyan; Benecha, Habtamu; Stamm, John W
2016-12-01
Sometimes there is interest in a dichotomized outcome indicating whether a count variable is positive or zero. Under this scenario, the application of ordinary logistic regression may result in efficiency loss, which is quantifiable under an assumed model for the counts. In such situations, a shared-parameter hurdle model is investigated for more efficient estimation of regression parameters relating to overall effects of covariates on the dichotomous outcome, while handling count data with many zeroes. One model part provides a logistic regression containing marginal log odds ratio effects of primary interest, while an ancillary model part describes the mean count of a Poisson or negative binomial process in terms of nuisance regression parameters. Asymptotic efficiency of the logistic model parameter estimators of the two-part models is evaluated with respect to ordinary logistic regression. Simulations are used to assess the properties of the models with respect to power and Type I error, the latter investigated under both misspecified and correctly specified models. The methods are applied to data from a randomized clinical trial of three toothpaste formulations to prevent incident dental caries in a large population of Scottish schoolchildren.
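For contrast, a sketch of the two fits compared in the abstract — ordinary logistic regression on the dichotomized count versus a standard hurdle model from the pscl package (note: the standard model, not the paper's shared-parameter variant) — on simulated stand-in data:

```r
library(pscl)

set.seed(1)
d <- data.frame(caries = rpois(500, lambda = 0.4),
                group  = gl(2, 250, labels = c("control", "treated")))

# Ordinary logistic regression on the dichotomized outcome
fit_logit <- glm(I(caries > 0) ~ group, family = binomial, data = d)

# Hurdle model: binomial zero part plus truncated negative binomial count part
fit_hurdle <- hurdle(caries ~ group, data = d,
                     dist = "negbin", zero.dist = "binomial", link = "logit")
summary(fit_hurdle)
```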
Two-Part and Related Regression Models for Longitudinal Data
Farewell, V.T.; Long, D.L.; Tom, B.D.M.; Yiu, S.; Su, L.
2017-01-01
Statistical models that involve a two-part mixture distribution are applicable in a variety of situations. Frequently, the two parts are a model for the binary response variable and a model for the outcome variable that is conditioned on the binary response. Two common examples are zero-inflated or hurdle models for count data and two-part models for semicontinuous data. Recently, there has been particular interest in the use of these models for the analysis of repeated measures of an outcome variable over time. The aim of this review is to consider motivations for the use of such models in this context and to highlight the central issues that arise with their use. We examine two-part models for semicontinuous and zero-heavy count data, and we also consider models for count data with a two-part random effects distribution. PMID:28890906
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
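A minimal illustration in R, with simulated clinical-style data (systolic blood pressure regressed on age; values invented, not from the article):

```r
set.seed(1)
age <- runif(50, 30, 80)
sbp <- 100 + 0.6 * age + rnorm(50, sd = 8)   # true slope 0.6, plus noise

fit <- lm(sbp ~ age)           # least-squares regression line
summary(fit)                   # slope, intercept, and inference tests
confint(fit)                   # exact confidence intervals for coefficients
plot(age, sbp); abline(fit)    # data with the fitted line
```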
NASA Astrophysics Data System (ADS)
Mansouri, Edris; Feizi, Faranak; Jafari Rad, Alireza; Arian, Mehran
2018-03-01
This paper uses multivariate regression to create a mathematical model for iron skarn exploration in the Sarvian area, central Iran, for mineral prospectivity mapping (MPM). The main target of this paper is to apply multivariate regression analysis to map iron outcrops in the northeastern part of the study area in order to discover new iron deposits in other parts of the study area. Two types of multivariate regression models using two linear equations were employed to discover new mineral deposits. This method is one of the reliable methods for processing satellite images. ASTER satellite images (14 bands) were used as unique independent variables (UIVs), and iron outcrops were mapped as dependent variables for MPM. According to the probability value (p value), the coefficient of determination (R2), and the adjusted coefficient of determination (Radj2), the second regression model (which consisted of multiple UIVs) fitted better than the other models. The accuracy of the model was confirmed by the iron outcrop map and geological observation. Based on field observation, iron mineralization occurs at the contact of limestone and intrusive rocks (skarn type).
Cevenini, Gabriele; Barbini, Emanuela; Scolletta, Sabino; Biagioli, Bonizella; Giomarelli, Pierpaolo; Barbini, Paolo
2007-11-22
Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part modelling techniques and intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part the performances of the same models are evaluated in an illustrative example. Eight models were developed: Bayes linear and quadratic models, k-nearest neighbour model, logistic regression model, Higgins and direct scoring systems and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets each of 545 cases were used. The optimal set of predictors was chosen among a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and Hosmer-Lemeshow goodness-of-fit test, respectively. Scoring systems and the logistic regression model required the largest set of predictors, while Bayesian and k-nearest neighbour models were much more parsimonious. In testing data, all models showed acceptable discrimination capacities, however the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while artificial neural networks and scoring systems gave the worst results. Finally, poor calibration was obtained when using scoring systems, k-nearest neighbour model and artificial neural networks, while Bayes (after recalibration) and logistic regression models gave adequate results. Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others, because they also had good generalization and calibration. The Bayes quadratic model seemed to be a convincing alternative to the much more usual Bayes linear and logistic regression models. It showed its capacity to identify a minimum core of predictors generally recognized as essential to pragmatically evaluate the risk of developing morbidity after heart surgery.
Weibull mixture regression for marginal inference in zero-heavy continuous outcomes.
Gebregziabher, Mulugeta; Voronca, Delia; Teklehaimanot, Abeba; Santa Ana, Elizabeth J
2017-06-01
Continuous outcomes with a preponderance of zero values are ubiquitous in data that arise from biomedical studies, for example studies of addictive disorders. This is known to lead to violation of standard assumptions in parametric inference and enhances the risk of misleading conclusions unless managed properly. Two-part models are commonly used to deal with this problem. However, standard two-part models have limitations with respect to obtaining parameter estimates that have a marginal interpretation of covariate effects, which is important in many biomedical applications. Recently, marginalized two-part models have been proposed, but their development is limited to log-normal and log-skew-normal distributions. Thus, in this paper, we propose a finite mixture approach, with Weibull mixture regression as a special case, to deal with the problem. We use an extensive simulation study to assess the performance of the proposed model in finite samples and to make comparisons with other families of models via statistical information and mean squared error criteria. We demonstrate its application on real data from a randomized controlled trial of addictive disorders. Our results show that a two-component Weibull mixture model is preferred for modeling zero-heavy continuous data when the non-zero part is generated from a Weibull or similar distribution such as the Gamma or truncated Gaussian.
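Not the paper's marginalized mixture, but a minimal two-part analogue for zero-heavy continuous data in R: a logistic model for the zero mass plus a Weibull regression (via survival::survreg) for the positive amounts; all data simulated:

```r
library(survival)

set.seed(3)
x <- rnorm(300)
z <- rbinom(300, 1, plogis(0.5 + x))    # 1 = positive outcome observed
y <- ifelse(z == 1,
            rweibull(300, shape = 1.5, scale = exp(1 + 0.4 * x)), 0)

zero_part <- glm(z ~ x, family = binomial)   # models P(y > 0)

pos <- data.frame(y = y[z == 1], x = x[z == 1])
pos_part <- survreg(Surv(y) ~ x, data = pos, dist = "weibull")
summary(pos_part)                            # Weibull fit on the positives
```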
Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)
NASA Astrophysics Data System (ADS)
Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi
2017-06-01
Hurdle negative binomial regression is a method that can be used for a discrete dependent variable with excess zeros and under- or overdispersion. It uses a two-part approach. The first part, the zero-hurdle model, models the zero outcomes of the dependent variable, while the second part, a truncated negative binomial model, models the positive counts (non-negative integers). The discrete dependent variable in such cases is censored for some values; the type of censoring studied in this research is right censoring. This study aims to obtain the parameter estimator of hurdle negative binomial regression for a right-censored dependent variable. Parameters are estimated by maximum likelihood estimation (MLE). The hurdle negative binomial regression model for a right-censored dependent variable is applied to the number of neonatorum tetanus cases in Indonesia. The data are counts that contain zero values in some observations and varied values in others. This study also aims to obtain the parameter estimator and test statistic of the censored hurdle negative binomial model. Based on the regression results, the factors that influence neonatorum tetanus cases in Indonesia are the percentage of baby health care coverage and neonatal visits.
NASA Astrophysics Data System (ADS)
Bae, Gihyun; Huh, Hoon; Park, Sungho
This paper deals with a regression model for the lightweight and crashworthiness enhancement design of automotive parts in a frontal car crash. The ULSAB-AVC model is employed for the crash analysis, and effective parts are selected based on the amount of energy absorption during the crash behavior. Finite element analyses are carried out for designated design cases in order to investigate the crashworthiness and weight according to the material and thickness of the main energy absorption parts. Based on the simulation results, a regression analysis is performed to construct a regression model for the lightweight and crashworthiness enhancement design of automotive parts. An example of weight reduction of the main energy absorption parts demonstrates the validity of the constructed regression model.
A kinetic energy model of two-vehicle crash injury severity.
Sobhani, Amir; Young, William; Logan, David; Bahrololoom, Sareh
2011-05-01
An important part of any model of vehicle crashes is the development of a procedure to estimate crash injury severity. After reviewing existing models of crash severity, this paper outlines the development of a modelling approach aimed at measuring the injury severity of people in two-vehicle road crashes. This model can be incorporated into a discrete event traffic simulation model, using simulation model outputs as its input. The model can then serve as an integral part of a simulation model estimating the crash potential of components of the traffic system. The model is developed using Newtonian Mechanics and Generalised Linear Regression. The factors contributing to the speed change (ΔV(s)) of a subject vehicle are identified using the law of conservation of momentum. A Log-Gamma regression model is fitted to measure speed change (ΔV(s)) of the subject vehicle based on the identified crash characteristics. The kinetic energy applied to the subject vehicle is calculated by the model, which in turn uses a Log-Gamma Regression Model to estimate the Injury Severity Score of the crash from the calculated kinetic energy, crash impact type, presence of airbag and/or seat belt and occupant age.
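One plausible reading of the "Log-Gamma regression" step is a Gamma GLM with a log link for the positive severity outcome; a sketch with invented variables, not the paper's fitted model:

```r
set.seed(4)
delta_v <- runif(200, 5, 60)                  # speed change, km/h
kin_e   <- 0.5 * 1500 * (delta_v / 3.6)^2     # kinetic energy (J), 1500 kg car
iss     <- rgamma(200, shape = 2,
                  rate = 2 / exp(1 + 2e-4 * sqrt(kin_e)))  # mean grows with energy

fit <- glm(iss ~ sqrt(kin_e), family = Gamma(link = "log"))
summary(fit)    # coefficients act multiplicatively on expected severity
```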
NASA Technical Reports Server (NTRS)
Kuo, Kenneth K.; Lu, Yeu-Cherng; Chiaverini, Martin J.; Johnson, David K.; Serin, Nadir; Risha, Grant A.; Merkle, Charles L.; Venkateswaran, Sankaran
1996-01-01
This final report summarizes the major findings on the subject of 'Fundamental Phenomena on Fuel Decomposition and Boundary-Layer Combustion Processes with Applications to Hybrid Rocket Motors', performed from 1 April 1994 to 30 June 1996. Both experimental results from Task 1 and theoretical/numerical results from Task 2 are reported here in two parts. Part 1 covers the experimental work performed and describes the test facility setup, the data reduction techniques employed, and the results of the test firings, including the effects of operating conditions and fuel additives on solid fuel regression rate and thermal profiles of the condensed phase. Part 2 concerns the theoretical/numerical work. It covers physical modeling of the combustion processes, including gas/surface coupling and the radiation effect on regression rate. The numerical solutions for the flowfield structure and condensed-phase regression behavior are presented. Experimental data from the test firings were used for numerical model validation.
Mechanisms behind the estimation of photosynthesis traits from leaf reflectance observations
NASA Astrophysics Data System (ADS)
Dechant, Benjamin; Cuntz, Matthias; Doktor, Daniel; Vohland, Michael
2016-04-01
Many studies have investigated the reflectance-based estimation of leaf chlorophyll, water and dry matter contents of plants. Only a few studies have focused on photosynthesis traits, however. The maximum potential uptake of carbon dioxide under given environmental conditions is determined mainly by RuBisCO activity, which limits carboxylation, or by the speed of photosynthetic electron transport. These two main limitations are represented by the maximum carboxylation capacity, Vcmax,25, and the maximum electron transport rate, Jmax,25. These traits have been estimated from leaf reflectance before, but the mechanisms underlying the estimation remain rather speculative. The aim of this study was therefore to reveal the mechanisms behind reflectance-based estimation of Vcmax,25 and Jmax,25. Leaf reflectance, photosynthetic response curves, as well as nitrogen content per area, Narea, and leaf mass per area, LMA, were measured on 37 deciduous tree species. Vcmax,25 and Jmax,25 were determined from the response curves. Partial least squares (PLS) regression models for the two photosynthesis traits Vcmax,25 and Jmax,25, as well as for Narea and LMA, were studied using a cross-validation approach. Analyses of linear regression models based on Narea and other leaf traits estimated via PROSPECT inversion, of PLS regression coefficients, and of model residuals were conducted in order to reveal the mechanisms behind the reflectance-based estimation. We found that Vcmax,25 and Jmax,25 can be estimated from leaf reflectance with good to moderate accuracy for a large number of species and different light conditions. The dominant mechanism behind the estimations was the strong relationship between photosynthesis traits and leaf nitrogen content. This was concluded from very strong relationships between PLS regression coefficients and model residuals, as well as from the prediction performance of Narea-based linear regression models compared to PLS regression models. While the PLS regression model for Vcmax,25 was fully based on the correlation to Narea, the PLS regression model for Jmax,25 was not entirely based on it. Analyses of the contributions of different parts of the reflectance spectrum revealed that the information contributing to the Jmax,25 PLS regression model, in addition to the main source of information, Narea, was mainly located in the visible part of the spectrum (500-900 nm). Estimated chlorophyll content could be excluded as a potential source of this extra information. The PLS regression coefficients of the Jmax,25 model indicated possible contributions from chlorophyll fluorescence and cytochrome f content. In summary, we found that the main mechanism behind the estimation of Vcmax,25 and Jmax,25 from leaf reflectance observations is the correlation to Narea, but that there is additional information related to Jmax,25, mainly in the visible part of the spectrum.
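A minimal PLS regression with leave-one-out cross-validation using R's pls package, mirroring the setup described; the spectra matrix and trait values are simulated placeholders:

```r
library(pls)

set.seed(5)
refl  <- matrix(rnorm(37 * 200), nrow = 37)   # 37 leaves x 200 wavelengths
vcmax <- rnorm(37, mean = 60, sd = 15)        # trait to be predicted
d     <- data.frame(vcmax = vcmax, refl = I(refl))

fit <- plsr(vcmax ~ refl, ncomp = 10, data = d, validation = "LOO")
summary(fit)             # RMSEP as a function of the number of components
coef(fit, ncomp = 5)     # regression coefficients across wavelengths
```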
Steen, Paul J.; Passino-Reader, Dora R.; Wiley, Michael J.
2006-01-01
As a part of the Great Lakes Regional Aquatic Gap Analysis Project, we evaluated methodologies for modeling associations between fish species and habitat characteristics at a landscape scale. To do this, we created brook trout Salvelinus fontinalis presence and absence models based on four different techniques: multiple linear regression, logistic regression, neural networks, and classification trees. The models were tested in two ways: by application to an independent validation database and cross-validation using the training data, and by visual comparison of statewide distribution maps with historically recorded occurrences from the Michigan Fish Atlas. Although differences in the accuracy of our models were slight, the logistic regression model predicted with the least error, followed by multiple regression, classification trees, and neural networks. These models will provide natural resource managers with a way to identify habitats requiring protection for the conservation of fish species.
Environmental, Spatial, and Sociodemographic Factors Associated with Nonfatal Injuries in Indonesia.
Irianti, Sri; Prasetyoputra, Puguh
2017-01-01
Background. The determinants of injuries and their reoccurrence in Indonesia are not well understood, despite their importance in the prevention of injuries. Therefore, this study seeks to investigate the environmental, spatial, and sociodemographic factors associated with the reoccurrence of injuries among Indonesian people. Methods. Data from the 2013 round of the Indonesia Baseline Health Research (IBHR 2013) were analysed using a two-part hurdle regression model. A logit regression model was chosen for the zero-hurdle part, while a zero-truncated negative binomial regression model was selected for the counts part. The odds ratio (OR) and incidence rate ratio (IRR) were the respective measures of association. Results. The results suggest that living in a household with a distant drinking water source, residing in slum areas, residing in Eastern Indonesia, having low educational attainment, being a man, and being poorer are positively related to the likelihood of experiencing injury. Moreover, being a farmer or fisherman, having low educational attainment, and being a man are positively associated with the frequency of injuries. Conclusion. This study would be useful for prioritising injury prevention programs in Indonesia based on environmental, spatial, and sociodemographic characteristics.
Fischer, A; Friggens, N C; Berry, D P; Faverdin, P
2018-07-01
The ability to properly assess and accurately phenotype true differences in feed efficiency among dairy cows is key to the development of breeding programs for improving feed efficiency. The variability among individuals in feed efficiency is commonly characterised by the residual intake approach. Residual feed intake is represented by the residuals of a linear regression of intake on the corresponding quantities of the biological functions that consume (or release) energy. However, the residuals include model-fitting and measurement errors as well as any variability in cow efficiency. The objective of this study was to isolate the individual animal variability in feed efficiency from the residual component. Two separate models were fitted. In the first, the standard residual energy intake (REI) was calculated as the residual of a multiple linear regression of lactation-average net energy intake (NEI) on lactation-average milk energy output, average metabolic BW, and lactation loss and gain of body condition score. In the second, a linear mixed model was used to simultaneously fit fixed linear regressions and random cow levels on the biological traits and intercept, using fortnightly repeated measures for the variables. This method split the predicted NEI into two parts: one quantifying the population mean intercept and coefficients, and one quantifying cow-specific deviations in the intercept and coefficients. The cow-specific part of predicted NEI was assumed to isolate true differences in feed efficiency among cows. NEI and associated energy expenditure phenotypes were available for the first 17 fortnights of lactation from 119 Holstein cows, all fed a constant energy-rich diet. Mixed models fitting cow-specific intercepts and coefficients to different combinations of the aforementioned energy expenditure traits, calculated on a fortnightly basis, were compared. The variance of REI estimated with the lactation-average model represented only 8% of the variance of measured NEI. Among all compared mixed models, the variance of the cow-specific part of predicted NEI represented between 53% and 59% of the variance of REI estimated from the lactation-average model, or between 4% and 5% of the variance of measured NEI. The remaining 41% to 47% of the variance of REI estimated with the lactation-average model may therefore reflect model-fitting or measurement errors. In conclusion, the use of a mixed model framework with cow-specific random regressions seems to be a promising method to isolate the cow-specific component of REI in dairy cows.
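The random-regression idea can be sketched with lme4: fixed population-level coefficients plus cow-specific random deviations in intercept and slopes on the energy sinks. All variable names (nei, milk_e, mbw, bcs_loss, bcs_gain, cow) are illustrative, and d is an assumed data frame of fortnightly records:

```r
library(lme4)

# Population-mean regression plus cow-specific intercepts and coefficients
fit <- lmer(nei ~ milk_e + mbw + bcs_loss + bcs_gain +
              (1 + milk_e + mbw | cow), data = d)

# Cow-specific deviations from the population coefficients: the component
# of predicted NEI interpreted as true efficiency differences among cows
ranef(fit)$cow
```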
Variable selection with stepwise and best subset approaches
2016-01-01
While purposeful selection is performed partly by software and partly by hand, the stepwise and best subset approaches are automatically performed by software. Two R functions, stepAIC() and bestglm(), are well designed for stepwise and best subset regression, respectively. The stepAIC() function begins with a full or null model, and the method for stepwise regression can be specified in the direction argument with character values "forward", "backward" and "both". The bestglm() function begins with a data frame containing explanatory variables and response variables; the response variable should be in the last column. A variety of goodness-of-fit criteria can be specified in the IC argument. The Bayesian information criterion (BIC) usually results in a more parsimonious model than the Akaike information criterion. PMID:27162786
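A direct illustration of the two functions on R's built-in mtcars data:

```r
library(MASS)
library(bestglm)

# Stepwise search from the full model, moving in both directions
full <- lm(mpg ~ ., data = mtcars)
step_fit <- stepAIC(full, direction = "both")

# Best subset: explanatory variables first, response in the last column
Xy <- cbind(mtcars[, setdiff(names(mtcars), "mpg")], y = mtcars$mpg)
best_fit <- bestglm(Xy, IC = "BIC")
best_fit$BestModel
```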
Egler, Silvia G; Rodrigues-Filho, Saulo; Villas-Bôas, Roberto C; Beinhoff, Christian
2006-09-01
This study examines the total Hg contamination in soil and sediments, and the correlation between the total Hg concentration in soil and vegetables, in two small-scale gold mining areas, São Chico and Creporizinho, in the State of Pará, Brazilian Amazon. Total Hg values for soil samples from both study areas are higher than regional background values (ca. 0.15 mg/kg). At São Chico, mean values in soil samples are higher than at Creporizinho, but without significant differences at the alpha < 0.05 level. São Chico's aboveground produce samples possess significantly higher total Hg levels than samples from Creporizinho. Creporizinho's soil-root produce regression model was significant, with a negative slope. Creporizinho's soil-aboveground and root wild-plant regression models were also significant, with positive slopes. Although aboveground:root ratios were >1 in all of São Chico's produce samples, the soil-plant part regressions were not significant, and Hg uptake probably occurs through stomata by atmospheric mercury deposition. Wild-plant aboveground:root ratios were <1 at both study areas, and soil-plant part regressions were significant in samples from Creporizinho, suggesting that these plants function as excluders. The average total contents of Hg in edible parts of produce were close to FAO/WHO/JECFA PTWI values in the São Chico area, and much lower in Creporizinho. However, the small gastrointestinal absorption of inorganic Hg reduces its adverse health effects.
Prediction of dynamical systems by symbolic regression
NASA Astrophysics Data System (ADS)
Quade, Markus; Abel, Markus; Shafi, Kamran; Niven, Robert K.; Noack, Bernd R.
2016-07-01
We study the modeling and prediction of dynamical systems based on conventional models derived from measurements. Such algorithms are highly desirable in situations where the underlying dynamics are hard to model from physical principles or where simplified models need to be found. We focus on symbolic regression methods as a part of machine learning. These algorithms are capable of learning an analytically tractable model from data, a highly valuable property. Symbolic regression methods can be considered generalized regression methods. We investigate two particular algorithms: the so-called fast function extraction, which is a generalized linear regression algorithm, and genetic programming, which is a very general method. Both are able to combine functions in such a way that a good model for the prediction of the temporal evolution of a dynamical system can be identified. We illustrate the algorithms by finding a prediction for the evolution of a harmonic oscillator based on measurements, by detecting an arriving front in an excitable system, and, as a real-world application, by predicting solar power production based on energy production observations at a given site together with the weather forecast.
Assessing the Impact of Drug Use on Hospital Costs
Stuart, Bruce C; Doshi, Jalpa A; Terza, Joseph V
2009-01-01
Objective To assess whether outpatient prescription drug utilization produces offsets in the cost of hospitalization for Medicare beneficiaries. Data Sources/Study Setting The study analyzed a sample (N=3,101) of community-dwelling fee-for-service U.S. Medicare beneficiaries drawn from the 1999 and 2000 Medicare Current Beneficiary Surveys. Study Design Using a two-part model specification, we regressed any hospital admission (part 1: probit) and hospital spending by those with one or more admissions (part 2: nonlinear least squares regression) on drug use in a standard model with strong covariate controls and a residual inclusion instrumental variable (IV) model using an exogenous measure of drug coverage as the instrument. Principal Findings The covariate control model predicted that each additional prescription drug used (mean=30) raised hospital spending by $16 (p<.001). The residual inclusion IV model prediction was that each additional prescription fill reduced hospital spending by $104 (p<.001). Conclusions The findings indicate that drug use is associated with cost offsets in hospitalization among Medicare beneficiaries, once omitted variable bias is corrected using an IV technique appropriate for nonlinear applications. PMID:18783453
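A hedged sketch of the residual-inclusion (2SRI) instrumental variable idea described in this abstract: the stage-1 residual from regressing drug use on the instrument enters both parts of the outcome model. A Gamma GLM with log link stands in for the paper's nonlinear least squares second part, and every variable name is invented:

```r
set.seed(6)
n <- 3101
coverage <- rbinom(n, 1, 0.5)                  # instrument: drug coverage
rx   <- rpois(n, 20 + 10 * coverage)           # endogenous drug use
hosp <- rbinom(n, 1, plogis(-1 + 0.01 * rx))   # any admission
cost <- ifelse(hosp == 1,
               exp(8 + 0.005 * rx + rnorm(n, 0, 0.5)), 0)

res1 <- residuals(lm(rx ~ coverage))           # stage-1 residual

# Two-part outcome model with the stage-1 residual included in both parts
part1 <- glm(hosp ~ rx + res1, family = binomial(link = "probit"))
part2 <- glm(cost ~ rx + res1, family = Gamma(link = "log"),
             subset = hosp == 1)
```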
Estimating linear temporal trends from aggregated environmental monitoring data
Erickson, Richard A.; Gray, Brian R.; Eager, Eric A.
2017-01-01
Trend estimates are often used as part of environmental monitoring programs. These trends inform managers (e.g., are desired species increasing or undesired species decreasing?). Data collected from environmental monitoring programs are often aggregated (i.e., averaged), which confounds sampling and process variation. State-space models allow sampling variation and process variation to be separated. We used simulated time series to compare linear trend estimates from three state-space models, a simple linear regression model, and an autoregressive model. We also compared the performance of these five models in estimating trends from a long-term monitoring program. Specifically, we estimated trends for two species of fish and four species of aquatic vegetation from the Upper Mississippi River system. We found that the simple linear regression had the best performance of all the models considered because it was best able to recover parameters and had consistent numerical convergence. Conversely, the simple linear regression did the worst job of estimating populations in a given year. The state-space models did not estimate trends well but estimated population sizes best when the models converged. We found that a simple linear regression performed better than more complex autoregressive and state-space models when used to analyze aggregated environmental monitoring data.
Detection of epistatic effects with logic regression and a classical linear regression model.
Malina, Magdalena; Ickstadt, Katja; Schwender, Holger; Posch, Martin; Bogdan, Małgorzata
2014-02-01
To locate multiple interacting quantitative trait loci (QTL) influencing a trait of interest within experimental populations, methods such as Cockerham's model are usually applied. Within this framework, interactions are understood as the part of the joint effect of several genes that cannot be explained as the sum of their additive effects. However, if a change in the phenotype (such as disease) is caused by Boolean combinations of genotypes of several QTLs, this Cockerham's approach is often not capable of identifying them properly. To detect such interactions more efficiently, we propose a logic regression framework. Even though a larger number of models has to be considered with the logic regression approach (requiring more stringent multiple testing correction), the efficient representation of higher-order logic interactions in logic regression models leads to a significant increase in power to detect such interactions compared with Cockerham's approach. The increase in power is demonstrated analytically for a simple two-way interaction model and illustrated in more complex settings with a simulation study and real data analysis.
A primer on marginal effects-part II: health services research applications.
Onukwugha, E; Bergtold, J; Jain, R
2015-02-01
Marginal analysis evaluates changes in a regression function associated with a unit change in a relevant variable. The primary statistic of marginal analysis is the marginal effect (ME). The ME facilitates the examination of outcomes for defined patient profiles or individuals while measuring the change in original units (e.g., costs, probabilities). The ME has a long history in economics; however, it is not widely used in health services research despite its flexibility and ability to provide unique insights. This article, the second in a two-part series, discusses practical issues that arise in the estimation and interpretation of the ME for a variety of regression models often used in health services research. Part one provided an overview of prior studies discussing the ME, followed by derivation of ME formulas for various regression models relevant to health services research studies examining costs and utilization. The current article illustrates the calculation and interpretation of the ME in practice and discusses practical issues that arise during implementation, including: understanding differences between software packages in terms of the functionality available for calculating the ME and its confidence interval, interpretation of the average marginal effect versus the marginal effect at the mean, and the difference between the ME and relative effects (e.g., odds ratios). Programming code to calculate the ME using SAS, STATA, LIMDEP, and MATLAB is also provided. The illustration, discussion, and application of the ME in this two-part series support the conduct of future studies applying the concept of marginal analysis.
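A worked contrast between the two quantities the abstract distinguishes — the average marginal effect (AME) and the marginal effect at the mean (MEM) — for a logit model, computed by hand in R on simulated data:

```r
set.seed(7)
x <- rnorm(1000)
y <- rbinom(1000, 1, plogis(-0.5 + 0.7 * x))
fit <- glm(y ~ x, family = binomial)

p    <- fitted(fit)
beta <- coef(fit)["x"]
ame  <- mean(beta * p * (1 - p))                       # average marginal effect
mem  <- beta * dlogis(sum(coef(fit) * c(1, mean(x))))  # marginal effect at the mean
c(AME = ame, MEM = mem)                                # typically close, not equal
```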
NASA Astrophysics Data System (ADS)
Jintao, Xue; Yufei, Liu; Liming, Ye; Chunyan, Li; Quanwei, Yang; Weiying, Wang; Yun, Jing; Minxiang, Zhang; Peng, Li
2018-01-01
Near-infrared spectroscopy (NIRS) was used for the first time to develop a method for the rapid and simultaneous determination of 5 active alkaloids (berberine, coptisine, palmatine, epiberberine and jatrorrhizine) in 4 parts (rhizome, fibrous root, stem and leaf) of Coptidis Rhizoma. A total of 100 samples from 4 main places of origin were collected and studied. With HPLC analysis values as the calibration reference, quantitative analysis of the 5 marker components was performed by two different modeling methods: partial least squares (PLS) regression as linear regression and artificial neural networks (ANN) as non-linear regression. The results indicated that both types of models were robust, accurate and repeatable for the five active alkaloids; the ANN models were more suitable for the determination of berberine, coptisine and palmatine, while the PLS models were more suitable for the analysis of epiberberine and jatrorrhizine. The performance of the optimal models was as follows: the correlation coefficient (R) for berberine, coptisine, palmatine, epiberberine and jatrorrhizine was 0.9958, 0.9956, 0.9959, 0.9963 and 0.9923, respectively; the root mean square error of prediction (RMSEP) was 0.5093, 0.0578, 0.0443, 0.0563 and 0.0090, respectively. Furthermore, for the comprehensive exploitation and utilization of the plant resources of Coptidis Rhizoma, the established NIR models were used to analyze the content of the 5 active alkaloids in the 4 parts of Coptidis Rhizoma from the 4 main places of origin. This work demonstrated that NIRS may be a promising method for routine screening in off-line fast analysis or on-line quality assessment of traditional Chinese medicine (TCM).
Moderation analysis using a two-level regression model.
Yuan, Ke-Hai; Cheng, Ying; Maxwell, Scott
2014-10-01
Moderation analysis is widely used in social and behavioral research. The most commonly used model for moderation analysis is moderated multiple regression (MMR) in which the explanatory variables of the regression model include product terms, and the model is typically estimated by least squares (LS). This paper argues for a two-level regression model in which the regression coefficients of a criterion variable on predictors are further regressed on moderator variables. An algorithm for estimating the parameters of the two-level model by normal-distribution-based maximum likelihood (NML) is developed. Formulas for the standard errors (SEs) of the parameter estimates are provided and studied. Results indicate that, when heteroscedasticity exists, NML with the two-level model gives more efficient and more accurate parameter estimates than the LS analysis of the MMR model. When error variances are homoscedastic, NML with the two-level model leads to essentially the same results as LS with the MMR model. Most importantly, the two-level regression model permits estimating the percentage of variance of each regression coefficient that is due to moderator variables. When applied to data from General Social Surveys 1991, NML with the two-level model identified a significant moderation effect of race on the regression of job prestige on years of education while LS with the MMR model did not. An R package is also developed and documented to facilitate the application of the two-level model.
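The MMR baseline that the two-level model is compared against is a one-line least-squares fit in R; y ~ x * m expands to the main effects plus the x:m product term (data simulated):

```r
set.seed(8)
x <- rnorm(200); m <- rnorm(200)
y <- 1 + 0.5 * x + 0.3 * m + 0.4 * x * m + rnorm(200)

mmr <- lm(y ~ x * m)    # predictor, moderator, and their product term
summary(mmr)            # the x:m coefficient estimates the moderation effect
```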
Goodness-Of-Fit Test for Nonparametric Regression Models: Smoothing Spline ANOVA Models as Example.
Teran Hidalgo, Sebastian J; Wu, Michael C; Engel, Stephanie M; Kosorok, Michael R
2018-06-01
Nonparametric regression models do not require the specification of the functional form between the outcome and the covariates. Despite their popularity, the number of diagnostic statistics available for them, in comparison to their parametric counterparts, is small. We propose a goodness-of-fit test for nonparametric regression models with a linear smoother form. In particular, we apply this testing framework to smoothing spline ANOVA models. The test can consider two sources of lack of fit: whether covariates that are not currently in the model need to be included, and whether the current model fits the data well. The proposed method derives estimated residuals from the model. Then, statistical dependence is assessed between the estimated residuals and the covariates using the Hilbert-Schmidt independence criterion (HSIC). If dependence exists, the model does not capture all the variability in the outcome associated with the covariates; otherwise, the model fits the data well. The bootstrap is used to obtain p-values. Application of the method is demonstrated with a neonatal mental development data analysis. We demonstrate correct Type I error as well as power performance through simulations.
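A compact version of the dependence check the test builds on: a biased HSIC estimate between residuals and a covariate, with Gaussian kernels and median-heuristic bandwidths (the paper's test adds a bootstrap to obtain p-values):

```r
hsic <- function(x, y) {
  n <- length(x)
  K <- exp(-as.matrix(dist(x))^2 / (2 * median(dist(x))^2))  # kernel on x
  L <- exp(-as.matrix(dist(y))^2 / (2 * median(dist(y))^2))  # kernel on y
  H <- diag(n) - matrix(1 / n, n, n)                         # centering matrix
  sum((H %*% K %*% H) * L) / n^2                             # trace(HKHL)/n^2
}

set.seed(9)
x <- rnorm(100)
res_bad  <- x^2 - mean(x^2) + rnorm(100, 0, 0.1)  # residuals that depend on x
res_good <- rnorm(100)
c(dependent = hsic(x, res_bad), independent = hsic(x, res_good))
```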
Nonlinear-regression flow model of the Gulf Coast aquifer systems in the south-central United States
Kuiper, L.K.
1994-01-01
A multiple-regression methodology was used to help answer questions concerning model reliability and to calibrate a time-dependent variable-density ground-water flow model of the gulf coast aquifer systems in the south-central United States. More than 40 regression models with 2 to 31 regression parameters are used, and detailed results are presented for 12 of the models. More than 3,000 values for grid-element volume-averaged head and hydraulic conductivity are used for the regression model observations. Calculated prediction interval half widths, though perhaps inaccurate due to a lack of normality of the residuals, are the smallest for models with only four regression parameters. In addition, the root-mean weighted residual decreases very little with an increase in the number of regression parameters. The various models showed considerable overlap between the prediction intervals for shallow head and hydraulic conductivity. Approximate 95-percent prediction interval half widths for volume-averaged freshwater head exceed 108 feet; for volume-averaged base-10 logarithm hydraulic conductivity, they exceed 0.89. All of the models are unreliable for the prediction of head and ground-water flow in the deeper parts of the aquifer systems, including the amount of flow coming from the underlying geopressured zone. Truncating the domain of solution of one model to exclude the part of the systems having a ground-water density greater than 1.005 grams per cubic centimeter, or to exclude the part of the systems below a depth of 3,000 feet, and setting the density to that of freshwater does not appreciably change the results for head and ground-water flow, except for locations close to the truncation surface.
Faculty Personality: A Factor of Student Retention
ERIC Educational Resources Information Center
Shaw, Cassandra S.; Wu, Xiaodong; Irwin, Kathleen C.; Patrizi, L. A. Chad
2016-01-01
The purpose of this study was to determine the relationship between student retention and faculty personality, as it was hypothesized that faculty personality has an effect on student retention. The methodology adopted for this study was quantitative and in two parts: 1) using linear regression models to examine the impact or causality of faculty…
NASA Astrophysics Data System (ADS)
Keat, Sim Chong; Chun, Beh Boon; San, Lim Hwee; Jafri, Mohd Zubir Mat
2015-04-01
Climate change due to carbon dioxide (CO2) emissions is one of the most complex challenges threatening our planet. The issue is a matter of great international concern, attributed primarily to the burning of different fossil fuels. In this paper, a regression model is used to analyze the relationship between CO2 emissions and energy consumption in Malaysia, using time series data for the period 1980-2010. The equations were developed using a regression model based on the eight major sources that contribute to CO2 emissions: non-energy use, Liquefied Petroleum Gas (LPG), diesel, kerosene, refinery gas, Aviation Turbine Fuel (ATF) and Aviation Gasoline (AV Gas), fuel oil, and motor petrol. Part of the data (1980-2000) was used to fit the regression model and the rest (2001-2010) to validate it. Comparison of the model predictions with the measured data showed a high correlation coefficient (R2 = 0.9544), indicating the model's accuracy and efficiency. These results are accurate and can be used in early warning so that the population can comply with air quality standards.
Weichenthal, Scott; Ryswyk, Keith Van; Goldstein, Alon; Bagg, Scott; Shekkarizfard, Maryam; Hatzopoulou, Marianne
2016-04-01
Existing evidence suggests that ambient ultrafine particles (UFPs) (<0.1 µm) may contribute to acute cardiorespiratory morbidity. However, few studies have examined the long-term health effects of these pollutants owing in part to a need for exposure surfaces that can be applied in large population-based studies. To address this need, we developed a land use regression model for UFPs in Montreal, Canada using mobile monitoring data collected from 414 road segments during the summer and winter months between 2011 and 2012. Two different approaches were examined for model development including standard multivariable linear regression and a machine learning approach (kernel-based regularized least squares (KRLS)) that learns the functional form of covariate impacts on ambient UFP concentrations from the data. The final models included parameters for population density, ambient temperature and wind speed, land use parameters (park space and open space), length of local roads and rail, and estimated annual average NOx emissions from traffic. The final multivariable linear regression model explained 62% of the spatial variation in ambient UFP concentrations whereas the KRLS model explained 79% of the variance. The KRLS model performed slightly better than the linear regression model when evaluated using an external dataset (R2 = 0.58 vs. 0.55) or a cross-validation procedure (R2 = 0.67 vs. 0.60). In general, our findings suggest that the KRLS approach may offer modest improvements in predictive performance compared to standard multivariable linear regression models used to estimate spatial variations in ambient UFPs. However, differences in predictive performance were not statistically significant when evaluated using the cross-validation procedure.
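The two approaches compared can be sketched side by side in R, assuming the KRLS package (Hainmueller and Hazlett's kernel-based regularized least squares) and simulated data:

```r
library(KRLS)

set.seed(10)
X <- matrix(rnorm(300 * 3), ncol = 3)
y <- sin(X[, 1]) + 0.5 * X[, 2] + rnorm(300, 0, 0.3)  # nonlinear in X[, 1]

ols  <- lm(y ~ X)               # imposes a linear functional form
kfit <- krls(X = X, y = y)      # learns the functional form from the data
summary(kfit)                   # pointwise marginal effects and fit
```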
NASA Astrophysics Data System (ADS)
Suhartono; Lee, Muhammad Hisyam; Prastyo, Dedy Dwi
2015-12-01
The aim of this research is to develop a calendar variation model for forecasting retail sales data with the Eid ul-Fitr effect. The proposed model is based on two methods, namely two-level ARIMAX and regression methods. The two-level ARIMAX and regression models are built by using ARIMAX for the first level and regression for the second level. Monthly men's jeans and women's trousers sales in a retail company for the period January 2002 to September 2009 are used as a case study. In general, the two-level calendar variation model yields two models: the first model reconstructs the sales pattern that has already occurred, and the second model forecasts the increase in sales due to Eid ul-Fitr, which affects sales in the same and the previous month. The results show that the proposed two-level calendar variation model based on ARIMAX and regression methods yields better forecasts than the seasonal ARIMA model and neural networks.
NASA Astrophysics Data System (ADS)
Nazeer, Majid; Bilal, Muhammad
2018-04-01
A Landsat-5 Thematic Mapper (TM) dataset has been used to estimate salinity in the coastal area of Hong Kong. Four adjacent Landsat TM images were used in this study and were atmospherically corrected using the Second Simulation of the Satellite Signal in the Solar Spectrum (6S) radiative transfer code. The atmospherically corrected images were further used to develop models for salinity using Ordinary Least Squares (OLS) regression and Geographically Weighted Regression (GWR) based on in situ data from October 2009. Results show that the coefficient of determination (R2) of 0.42 between the OLS-estimated and in situ measured salinity is much lower than that of the GWR model, which is two times higher (R2 = 0.86). This indicates that the GWR model predicts salinity better than the OLS regression model and better captures its spatial heterogeneity. Salinity was high in Deep Bay (the north-western part of Hong Kong), which might be due to industrial waste disposal, whereas salinity was estimated to be constant (32 practical salinity units) towards the open sea.
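A sketch of the OLS-versus-GWR comparison using the spgwr package, with simulated coordinates and salinity values standing in for the satellite data:

```r
library(spgwr)

set.seed(11)
n <- 100
lon <- runif(n); lat <- runif(n)
band <- runif(n)                                      # reflectance-style predictor
sal  <- 30 + (2 + 3 * lat) * band + rnorm(n, 0, 0.3)  # effect varies over space
d <- data.frame(sal, band)

ols <- lm(sal ~ band, data = d)                       # one global coefficient

bw   <- gwr.sel(sal ~ band, data = d, coords = cbind(lon, lat))
gfit <- gwr(sal ~ band, data = d, coords = cbind(lon, lat), bandwidth = bw)
gfit                                                  # local coefficients by location
```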
NASA Astrophysics Data System (ADS)
Öktem, H.
2012-01-01
Plastic injection molding plays a key role in the production of high-quality plastic parts. Shrinkage is one of the most significant quality problems of a plastic part in plastic injection molding. This article focuses on the modeling and analysis of the effects of process parameters on shrinkage by evaluating the quality of the plastic part of a DVD-ROM cover made of Acrylonitrile Butadiene Styrene (ABS) polymer. An effective regression model was developed to determine the mathematical relationship between the process parameters (mold temperature, melt temperature, injection pressure, injection time, and cooling time) and the volumetric shrinkage by utilizing the analysis data. Finite element (FE) analyses designed by a Taguchi (L27) orthogonal array were run in the Moldflow simulation program. Analysis of variance (ANOVA) was then performed to check the adequacy of the regression model and to determine the effect of the process parameters on shrinkage. Experiments were conducted to verify the accuracy of the regression model against the FE analyses obtained from Moldflow. The results show that the regression model agrees very well with the FE analyses and the experiments. From this, it can be concluded that this study succeeded in modeling the shrinkage problem in our application.
Above-ground biomass of mangrove species. I. Analysis of models
NASA Astrophysics Data System (ADS)
Soares, Mário Luiz Gomes; Schaeffer-Novelli, Yara
2005-10-01
This study analyzes the above-ground biomass of Rhizophora mangle and Laguncularia racemosa located in the mangroves of Bertioga (SP) and Guaratiba (RJ), Southeast Brazil. Its purpose is to determine the best regression model to estimate total above-ground biomass and compartment (leaves, reproductive parts, twigs, branches, trunk and prop roots) biomass indirectly. To do this, we used structural measurements such as height, diameter at breast height (DBH), and crown area. A combination of regression types with several compositions of independent variables generated 2,272 models that were later tested. Subsequent analysis of the models indicated that the biomass of reproductive parts, branches, and prop roots showed great variability, probably because of environmental factors and seasonality (in the case of reproductive parts). It also indicated the superiority of multiple regression for estimating above-ground biomass, as it allows researchers to consider several aspects that affect above-ground biomass, especially the influence of environmental factors. This was attested by the models that estimated the biomass of the crown compartments.
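A typical allometric form behind such biomass models is a log-log multiple regression on DBH and height; a sketch with simulated values (not the Bertioga/Guaratiba data):

```r
set.seed(12)
dbh    <- runif(80, 2, 30)                        # diameter at breast height, cm
height <- 1.5 * dbh^0.7 * exp(rnorm(80, 0, 0.1))  # m, allometrically related
agb    <- 0.25 * dbh^2.1 * height^0.6 * exp(rnorm(80, 0, 0.2))  # biomass, kg

fit <- lm(log(agb) ~ log(dbh) + log(height))
summary(fit)    # coefficients are elasticities of biomass in DBH and height
```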
Vaeth, Michael; Skovlund, Eva
2004-06-15
For a given regression problem it is possible to identify a suitably defined equivalent two-sample problem such that the power or sample size obtained for the two-sample problem also applies to the regression problem. For a standard linear regression model the equivalent two-sample problem is easily identified, but for generalized linear models and for Cox regression models the situation is more complicated. An approximately equivalent two-sample problem may, however, also be identified here. In particular, we show that for logistic regression and Cox regression models the equivalent two-sample problem is obtained by selecting two equally sized samples for which the parameters differ by a value equal to the slope times twice the standard deviation of the independent variable and further requiring that the overall expected number of events is unchanged. In a simulation study we examine the validity of this approach to power calculations in logistic regression and Cox regression models. Several different covariate distributions are considered for selected values of the overall response probability and a range of alternatives. For the Cox regression model we consider both constant and non-constant hazard rates. The results show that in general the approach is remarkably accurate even in relatively small samples. Some discrepancies are, however, found in small samples with few events and a highly skewed covariate distribution. Comparison with results based on alternative methods for logistic regression models with a single continuous covariate indicates that the proposed method is at least as good as its competitors. The method is easy to implement and therefore provides a simple way to extend the range of problems that can be covered by the usual formulas for power and sample size determination.
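A worked illustration of the stated equivalence for logistic regression: a slope beta on a covariate x maps to a two-sample comparison whose log odds differ by beta times twice the standard deviation of x, so standard two-sample machinery such as power.prop.test applies (numbers invented):

```r
beta <- 0.4; sdx <- 1.2; p_overall <- 0.3

# Two groups whose log odds sit beta*sdx on either side of the overall logit,
# so the difference in log odds equals beta * 2 * sd(x)
p0 <- plogis(qlogis(p_overall) - beta * sdx)
p1 <- plogis(qlogis(p_overall) + beta * sdx)

power.prop.test(n = 150, p1 = p0, p2 = p1)   # approximate power, n per group
```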
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
NASA Astrophysics Data System (ADS)
Bhattacharyya, Sidhakam; Bandyopadhyay, Gautam
2010-10-01
The council of most Urban Local Bodies (ULBs) has limited scope for decision making in the absence of an appropriate financial control mechanism. Information about the expected amount of own fund during a particular period is of great importance for decision making. Therefore, in this paper, efforts are made to present a set of findings and to establish a model for estimating receipts from own sources, and payments thereof, using multiple regression analysis. Data for sixty months from a reputed ULB in West Bengal have been considered for ascertaining the regression models. This can be used as part of the financial management and control procedure by the council to estimate the effect on own fund. In our study we have considered two models using multiple regression analysis. "Model I" comprises total adjusted receipts as the dependent variable and selected individual receipts as the independent variables. Similarly, "Model II" consists of total adjusted payments as the dependent variable and selected individual payments as independent variables. The combined result of Model I and Model II is the surplus or deficit affecting the own fund. This may be applied for decision-making purposes by the council.
Poisson Mixture Regression Models for Heart Disease Prediction.
Mufudza, Chipo; Erol, Hamza
2016-01-01
Early heart disease control can be achieved by high disease prediction and diagnosis efficiency. This paper focuses on the use of model-based clustering techniques to predict and diagnose heart disease via Poisson mixture regression models. Analysis and application of Poisson mixture regression models are here addressed under two different classes: standard and concomitant variable mixture regression models. Results show that a two-component concomitant variable Poisson mixture regression model predicts heart disease better than both the standard Poisson mixture regression model and the ordinary general linear Poisson regression model due to its low Bayesian Information Criterion value. Furthermore, a zero-inflated Poisson mixture regression model turned out to be the best model for heart disease prediction over all models, as it both clusters individuals into high- or low-risk categories and predicts the rate of heart disease componentwise given the clusters available. It is deduced that heart disease prediction can be effectively done by identifying the major risks componentwise using a Poisson mixture regression model. PMID:27999611
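The concomitant-variable Poisson mixture favored in this abstract can be fit with R's flexmix package: two Poisson regression components, with a multinomial concomitant model for component membership (data simulated):

```r
library(flexmix)

set.seed(13)
n <- 400
w  <- rnorm(n)                                   # concomitant variable
cl <- rbinom(n, 1, plogis(w))                    # latent class membership
x  <- rnorm(n)
y  <- rpois(n, exp(ifelse(cl == 1, 0.5 + 0.8 * x, -0.5 + 0.1 * x)))
d  <- data.frame(y, x, w)

fit <- flexmix(y ~ x, data = d, k = 2,
               model = FLXMRglm(family = "poisson"),
               concomitant = FLXPmultinom(~ w))
parameters(fit)    # component-wise Poisson regression coefficients
```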
Section 3. The SPARROW Surface Water-Quality Model: Theory, Application and User Documentation
Schwarz, G.E.; Hoos, A.B.; Alexander, R.B.; Smith, R.A.
2006-01-01
SPARROW (SPAtially Referenced Regressions On Watershed attributes) is a watershed modeling technique for relating water-quality measurements made at a network of monitoring stations to attributes of the watersheds containing the stations. The core of the model consists of a nonlinear regression equation describing the non-conservative transport of contaminants from point and diffuse sources on land to rivers and through the stream and river network. The model predicts contaminant flux, concentration, and yield in streams and has been used to evaluate alternative hypotheses about the important contaminant sources and watershed properties that control transport over large spatial scales. This report provides documentation for the SPARROW modeling technique and computer software to guide users in constructing and applying basic SPARROW models. The documentation gives details of the SPARROW software, including the input data and installation requirements, and guidance in the specification, calibration, and application of basic SPARROW models, as well as descriptions of the model output and its interpretation. The documentation is intended for both researchers and water-resource managers with interest in using the results of existing models and developing and applying new SPARROW models. The documentation of the model is presented in two parts. Part 1 provides a theoretical and practical introduction to SPARROW modeling techniques, which includes a discussion of the objectives, conceptual attributes, and model infrastructure of SPARROW. Part 1 also includes background on the commonly used model specifications and the methods for estimating and evaluating parameters, evaluating model fit, and generating water-quality predictions and measures of uncertainty. Part 2 provides a user's guide to SPARROW, which includes a discussion of the software architecture and details of the model input requirements and output files, graphs, and maps. The text documentation and computer software are available on the Web at http://usgs.er.gov/sparrow/sparrow-mod/.
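The flavor of the core nonlinear source-attenuation regression can be conveyed with a toy first-order decay model fit by scipy; the functional form and data below are purely illustrative and are not the SPARROW specification:

    import numpy as np
    from scipy.optimize import curve_fit

    # Toy SPARROW-like flux model: source loads attenuated exponentially
    # with travel time through the stream network (illustrative only).
    def flux(X, b_point, b_diffuse, k):
        point, diffuse, t = X
        return (b_point * point + b_diffuse * diffuse) * np.exp(-k * t)

    rng = np.random.default_rng(2)
    point = rng.gamma(2, 5, 80)
    diffuse = rng.gamma(3, 4, 80)
    t = rng.uniform(0, 5, 80)
    obs = flux((point, diffuse, t), 1.2, 0.6, 0.3) * rng.lognormal(0, 0.1, 80)

    params, cov = curve_fit(flux, (point, diffuse, t), obs, p0=[1.0, 1.0, 0.1])
    print(params)                                      # estimated source and decay terms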
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wenzel, Tom P.
2016-05-20
Previous analyses have indicated that mass reduction is associated with an increase in crash frequency (crashes per VMT), but a decrease in fatality or casualty risk once a crash has occurred, across all types of light-duty vehicles. These results are counter-intuitive: one would expect that lighter, and perhaps smaller, vehicles have better handling and shorter braking distances, and thus should be able to avoid crashes that heavier vehicles cannot. And one would expect that heavier vehicles would have lower risk once a crash has occurred than lighter vehicles. However, these trends occur under several alternative regression model specifications. This report tests whether these results continue to hold after accounting for crash severity, by excluding crashes that result in relatively minor damage to the vehicle(s) involved in the crash. Excluding non-severe crashes from the initial LBNL Phase 2 and simultaneous two-stage regression models for the most part has little effect on the unexpected relationships observed in the baseline regression models. This finding suggests that other subtle differences in vehicles and/or their drivers, or perhaps biases in the data reported in state crash databases, are causing the unexpected results from the regression models.
Development of Ensemble Model Based Water Demand Forecasting Model
NASA Astrophysics Data System (ADS)
Kwon, Hyun-Han; So, Byung-Jin; Kim, Seong-Hyeon; Kim, Byung-Seop
2014-05-01
The Smart Water Grid (SWG) concept has emerged globally over the last decade and has gained significant recognition in South Korea. In particular, growing interest in water demand forecasting and optimal pump operation has led to various studies on energy saving and improvement of water supply reliability. Existing water demand forecasting models fall into two groups according to how they model and predict behavior in time series. One considers embedded patterns such as seasonality, periodicity and trends; the other is an autoregressive model using short-memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of such models is limited predictability of water demand at about the sub-daily scale, because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting which allows us to estimate uncertainties across different model classes. The proposed model consists of two parts. One is a multi-model scheme based on a combination of independent prediction models. The other is a cross-validation scheme, the Bagging approach introduced by Breiman (1996), used to derive weighting factors for the individual models. The individual forecasting models used in this study are a linear regression model, polynomial regression, multivariate adaptive regression splines (MARS), and a support vector machine (SVM). The concepts are demonstrated through application to data observed at water plants at several locations in South Korea. Keywords: water demand, non-linear model, ensemble forecasting model, uncertainty. Acknowledgements This subject is supported by the Korea Ministry of Environment as "Projects for Developing Eco-Innovation Technologies (GT-11-G-02-001-6)".
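A minimal sketch of the weighted multi-model idea, with cross-validated errors standing in for the Bagging-derived weights of the paper (data and model choices are illustrative; MARS is omitted because it is not in scikit-learn):

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.svm import SVR
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(3)
    X = rng.uniform(0, 24, (300, 1))                   # hour of day (illustrative)
    y = 50 + 20 * np.sin(X[:, 0] * np.pi / 12) + rng.normal(0, 3, 300)

    models = {
        "linear": LinearRegression(),
        "poly": make_pipeline(PolynomialFeatures(3), LinearRegression()),
        "svm": SVR(C=10.0),
    }
    # Cross-validated MSE per model; weights inversely proportional to error.
    mse = {k: -cross_val_score(m, X, y, cv=5, scoring="neg_mean_squared_error").mean()
           for k, m in models.items()}
    w = {k: 1.0 / v for k, v in mse.items()}
    total = sum(w.values())
    w = {k: v / total for k, v in w.items()}

    ensemble = sum(w[k] * m.fit(X, y).predict(X) for k, m in models.items())
    print(w, ensemble[:3])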
Predictive models of safety based on audit findings: Part 2: Measurement of model validity.
Hsiao, Yu-Lin; Drury, Colin; Wu, Changxu; Paquet, Victor
2013-07-01
Part 1 of this study sequence developed a human factors/ergonomics (HF/E) based classification system (termed HFACS-MA) for safety audit findings and established its measurement reliability. In Part 2, we used the human error categories of HFACS-MA as predictors of future safety performance. Audit records and monthly safety incident reports from two airlines submitted to their regulatory authority were available for analysis, covering over 6.5 years. Two participants derived consensus results of HF/E errors from the audit reports using HFACS-MA. We adopted Neural Network and Poisson regression methods to establish nonlinear and linear prediction models, respectively. These models were tested for validity of prediction of the safety data, and only the Neural Network method resulted in substantially significant predictive ability for each airline. Alternative predictions from counts of audit findings and from the time sequence of safety data produced some significant results, but of much smaller magnitude than HFACS-MA. The use of HF/E analysis of audit findings provided proactive predictors of future safety performance in the aviation maintenance field. Copyright © 2013 Elsevier Ltd and The Ergonomics Society. All rights reserved.
Mental health status and healthcare utilization among community dwelling older adults.
Adepoju, Omolola; Lin, Szu-Hsuan; Mileski, Michael; Kruse, Clemens Scott; Mask, Andrew
2018-04-27
Shifts in mental health utilization patterns are necessary to allow for meaningful access to care for vulnerable populations. There have been long-standing issues in how mental health care is provided, which has limited how efficacious that care is for those seeking it. The objective was to assess the relationship between mental health status and healthcare utilization among adults ≥65 years. A negative binomial regression model was used to assess the relationship between mental health status and utilization of office-based physician visits, while a two-part model, consisting of logistic regression and negative binomial regression, was used to separately model emergency visits and inpatient services. The receipt of care in office-based settings was marginally higher for subjects with mental health difficulties. Both probabilities and counts of inpatient hospitalizations were similar across mental health categories. The count of ER visits was similar across mental health categories; however, the probability of having an emergency department visit was marginally higher for older adults who reported mental health difficulties in 2012. These findings are encouraging and lend promise to the recent initiatives on addressing gaps in mental healthcare services.
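A minimal sketch of such a two-part model on simulated data: a logit for the probability of any use, then a negative binomial count model among users (a zero-truncated count model would be the stricter choice; plain NB is used here for brevity):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    n = 1000
    mh = rng.integers(0, 2, n)                         # 1 = mental health difficulty
    X = sm.add_constant(mh.astype(float))

    # Simulated ER utilization: any visit, then a count among users.
    p_any = 1 / (1 + np.exp(-(-1.0 + 0.3 * mh)))
    any_visit = rng.random(n) < p_any
    counts = np.where(any_visit, rng.negative_binomial(2, 0.5, n) + 1, 0)

    # Part 1: logistic regression for the probability of any use.
    part1 = sm.Logit((counts > 0).astype(int), X).fit(disp=0)
    # Part 2: negative binomial regression for counts among users.
    users = counts > 0
    part2 = sm.NegativeBinomial(counts[users], X[users]).fit(disp=0)
    print(part1.params, part2.params)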
ERIC Educational Resources Information Center
Werts, Charles E.; And Others
1979-01-01
It is shown how partial covariance, part and partial correlation, and regression weights can be estimated and tested for significance by means of a factor analytic model. Comparable partial covariance, correlations, and regression weights have identical significance tests. (Author)
Savary, Serge; Delbac, Lionel; Rochas, Amélie; Taisant, Guillaume; Willocquet, Laetitia
2009-08-01
Dual epidemics are defined as epidemics developing on two or several plant organs in the course of a cropping season. Agricultural pathosystems where such epidemics develop are often very important, because the harvestable part is one of the organs affected. These epidemics also are often difficult to manage, because the linkage between epidemiological components occurring on different organs is poorly understood, and because prediction of the risk toward the harvestable organs is difficult. In the case of downy mildew (DM) and powdery mildew (PM) of grapevine, nonlinear modeling and logistic regression indicated nonlinearity in the foliage-cluster relationships. Nonlinear modeling enabled the parameterization of a transmission coefficient that numerically links the two components, leaves and clusters, in DM and PM epidemics. Logistic regression analysis yielded a series of probabilistic models that enabled predicting preset levels of cluster infection risks based on DM and PM severities on the foliage at successive crop stages. The usefulness of this framework for tactical decision-making for disease control is discussed.
A Survey of UML Based Regression Testing
NASA Astrophysics Data System (ADS)
Fahad, Muhammad; Nadeem, Aamer
Regression testing is the process of ensuring software quality by analyzing whether changed parts behave as intended and unchanged parts are not affected by the modifications. Since it is a costly process, many techniques have been proposed in the research literature to guide testers in building a regression test suite from an existing test suite at minimum cost. In this paper, we discuss the advantages and drawbacks of using UML diagrams for regression testing and find that UML models help in identifying changes for regression test selection effectively. We survey the existing UML based regression testing techniques and provide an analysis matrix to give a quick insight into prominent features of the literature. We also discuss open research issues, such as managing and reducing the size of the regression test suite and prioritizing test cases under tight schedules and limited resources, that remain to be addressed for UML based regression testing.
SPSS macros to compare any two fitted values from a regression model.
Weaver, Bruce; Dubois, Sacha
2012-12-01
In regression models with first-order terms only, the coefficient for a given variable is typically interpreted as the change in the fitted value of Y for a one-unit increase in that variable, with all other variables held constant. Therefore, each regression coefficient represents the difference between two fitted values of Y. But the coefficients represent only a fraction of the possible fitted value comparisons that might be of interest to researchers. For many fitted value comparisons that are not captured by any of the regression coefficients, common statistical software packages do not provide the standard errors needed to compute confidence intervals or carry out statistical tests, particularly in more complex models that include interactions, polynomial terms, or regression splines. We describe two SPSS macros that implement a matrix algebra method for comparing any two fitted values from a regression model. The !OLScomp and !MLEcomp macros are for use with models fitted via ordinary least squares and maximum likelihood estimation, respectively. The output from the macros includes the standard error of the difference between the two fitted values, a 95% confidence interval for the difference, and a corresponding statistical test with its p-value.
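The matrix algebra behind such macros is simple: for design rows x1 and x2 with d = x2 - x1 and coefficient covariance matrix V, the difference in fitted values is d'b and its variance is d'Vd. A sketch in Python rather than SPSS, on illustrative data with a polynomial term:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.uniform(0, 10, 200)
    y = 1 + 2 * x - 0.1 * x**2 + rng.normal(0, 1, 200)
    X = sm.add_constant(np.column_stack([x, x**2]))    # model with a squared term
    fit = sm.OLS(y, X).fit()

    # Compare fitted values at x = 2 and x = 7; no single coefficient gives this.
    x1 = np.array([1.0, 2.0, 4.0])
    x2 = np.array([1.0, 7.0, 49.0])
    d = x2 - x1
    diff = d @ fit.params
    se = np.sqrt(d @ fit.cov_params() @ d)             # matrix-algebra standard error
    print(diff, se, (diff - 1.96 * se, diff + 1.96 * se))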
A gentle introduction to quantile regression for ecologists
Cade, B.S.; Noon, B.R.
2003-01-01
Quantile regression is a way to estimate the conditional quantiles of a response variable distribution in the linear model that provides a more complete view of possible causal relationships between variables in ecological processes. Typically, not all the factors that affect ecological processes are measured and included in the statistical models used to investigate relationships between variables associated with those processes. As a consequence, there may be a weak or no predictive relationship between the mean of the response variable (y) distribution and the measured predictive factors (X). Yet there may be stronger, useful predictive relationships with other parts of the response variable distribution. This primer relates quantile regression estimates to prediction intervals in parametric error distribution regression models (e.g., least squares), and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of the estimates for homogeneous and heterogeneous regression models.
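A short sketch of the idea with statsmodels on simulated heterogeneous data, where the slope differs across quantiles of the response even though the mean relationship is modest:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    x = rng.uniform(0, 10, 300)
    # Heterogeneous variance: the effect of x differs across the response distribution.
    y = 1 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x, 300)
    df = pd.DataFrame({"x": x, "y": y})

    for q in (0.1, 0.5, 0.9):
        fit = smf.quantreg("y ~ x", df).fit(q=q)
        print(q, fit.params["x"])                      # slope varies by quantile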
Coker, Freya; Williams, Cylie M; Taylor, Nicholas F; Caspers, Kirsten; McAlinden, Fiona; Wilton, Anita; Shields, Nora; Haines, Terry P
2018-05-10
This protocol considers three allied health staffing models across public health subacute hospitals. This quasi-experimental mixed-methods study, including qualitative process evaluation, aims to evaluate the impact of additional allied health services in subacute care, in rehabilitation and geriatric evaluation management settings, on patient, health service and societal outcomes. This health services research will analyse outcomes of patients exposed to different allied health models of care at three health services. Each health service will have a control ward (routine care) and an intervention ward (additional allied health). This project has two parts. Part 1: a whole of site data extraction for included wards. Outcome measures will include: length of stay, rate of readmissions, discharge destinations, community referrals, patient feedback and staff perspectives. Part 2: Functional Independence Measure scores will be collected every 2-3 days for the duration of 60 patient admissions. Data from part 1 will be analysed by linear regression analysis for continuous outcomes using patient-level data and logistic regression analysis for binary outcomes. Qualitative data will be analysed using a deductive thematic approach. For part 2, a linear mixed model analysis will be conducted using therapy service delivery and days since admission to subacute care as fixed factors in the model and individual participant as a random factor. Graphical analysis will be used to examine the growth curve of the model and transformations. The days since admission factor will be used to examine non-linear growth trajectories to determine if they lead to better model fit. Findings will be disseminated through local reports and to the Department of Health and Human Services Victoria. Results will be presented at conferences and submitted to peer-reviewed journals. The Monash Health Human Research Ethics committee approved this multisite research (HREC/17/MonH/144 and HREC/17/MonH/547). © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
ERIC Educational Resources Information Center
Mitchell, Don C.; Shen, Xingjia; Green, Matthew J.; Hodgson, Timothy L.
2008-01-01
When people read temporarily ambiguous sentences, there is often an increased prevalence of regressive eye-movements launched from the word that resolves the ambiguity. Traditionally, such regressions have been interpreted at least in part as reflecting readers' efforts to re-read and reconfigure earlier material, as exemplified by the Selective…
Holtschlag, David J.; Koschik, John A.
2002-01-01
The St. Clair–Detroit River Waterway connects Lake Huron with Lake Erie in the Great Lakes basin to form part of the international boundary between the United States and Canada. A two-dimensional hydrodynamic model is developed to compute flow velocities and water levels as part of a source-water assessment of public water intakes. The model, which uses the generalized finite-element code RMA2, discretizes the waterway into a mesh formed by 13,783 quadratic elements defined by 42,936 nodes. Seven steady-state scenarios are used to calibrate the model by adjusting parameters associated with channel roughness in 25 material zones in sub-areas of the waterway. An inverse modeling code is used to systematically adjust model parameters and to determine their associated uncertainty by use of nonlinear regression. Calibration results show close agreement between simulated and expected flows in major channels and water levels at gaging stations. Sensitivity analyses describe the amount of information available to estimate individual model parameters, and quantify the utility of flow measurements at selected cross sections and water-level measurements at gaging stations. Further data collection, model calibration analysis, and grid refinements are planned to assess and enhance two-dimensional flow simulation capabilities describing the horizontal flow distributions in St. Clair and Detroit Rivers and circulation patterns in Lake St. Clair.
Chen, Baojiang; Qin, Jing
2014-05-10
In statistical analysis, a regression model is needed if one is interested in the relationship between a response variable and covariates. When the response depends on a covariate, it may do so through some function of that covariate. If one has no knowledge of this functional form beyond expecting it to be monotonically increasing or decreasing, then the isotonic regression model is preferable. Estimation of parameters for isotonic regression models is based on the pool-adjacent-violators algorithm (PAVA), in which the monotonicity constraints are built in. With missing data, one often employs an augmented estimating method to improve estimation efficiency by incorporating auxiliary information through a working regression model. However, under the framework of the isotonic regression model, the PAVA does not work, as the monotonicity constraints are violated. In this paper, we develop an empirical likelihood-based method for the isotonic regression model to incorporate the auxiliary information. Because the monotonicity constraints still hold, the PAVA can be used for parameter estimation. Simulation studies demonstrate that the proposed method can yield more efficient estimates, and in some situations the efficiency improvement is substantial. We apply this method to a dementia study. Copyright © 2013 John Wiley & Sons, Ltd.
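A minimal sketch of the monotone fit itself (scikit-learn's IsotonicRegression implements PAVA); the empirical-likelihood augmentation of the paper is not reproduced here:

    import numpy as np
    from sklearn.isotonic import IsotonicRegression

    rng = np.random.default_rng(7)
    x = np.sort(rng.uniform(0, 10, 100))
    y = np.log1p(x) + rng.normal(0, 0.2, 100)          # monotone trend plus noise

    # PAVA-based isotonic fit: monotonicity is built in, no functional form assumed.
    iso = IsotonicRegression(increasing=True)
    y_fit = iso.fit_transform(x, y)
    print(y_fit[:5])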
A Technique of Fuzzy C-Mean in Multiple Linear Regression Model toward Paddy Yield
NASA Astrophysics Data System (ADS)
Syazwan Wahab, Nur; Saifullah Rusiman, Mohd; Mohamad, Mahathir; Amira Azmi, Nur; Che Him, Norziha; Ghazali Kamardan, M.; Ali, Maselan
2018-04-01
In this paper, we propose a hybrid model combining a multiple linear regression model with the fuzzy c-means method. The research examined the relationship between paddy yield and 20 topsoil variates analyzed prior to planting, at standard fertilizer rates. The data came from the multi-location rice trials carried out by MARDI at major paddy granaries in Peninsular Malaysia during the period from 2009 to 2012. Missing observations were estimated using mean estimation techniques. The data were analyzed using a multiple linear regression model alone and the proposed hybrid of multiple linear regression and fuzzy c-means. Analyses of normality and multicollinearity indicated that the data are normally distributed without multicollinearity among the independent variables. Fuzzy c-means analysis clustered the paddy yields into two clusters before the multiple linear regression model was applied. The comparison between the two methods indicates that the hybrid of the multiple linear regression model and the fuzzy c-means method outperforms the multiple linear regression model alone, with a lower mean square error.
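A minimal sketch of the hybrid on simulated data: a hand-rolled fuzzy c-means (kept short to avoid extra dependencies) clusters the yields into two groups, then a separate linear regression is fit per cluster:

    import numpy as np
    import statsmodels.api as sm

    def fcm(X, c=2, m=2.0, iters=100, seed=0):
        # Minimal fuzzy c-means: returns the (n x c) membership matrix.
        rng = np.random.default_rng(seed)
        U = rng.dirichlet(np.ones(c), size=len(X))
        for _ in range(iters):
            Um = U ** m
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
            U = 1.0 / d ** (2.0 / (m - 1.0))
            U /= U.sum(axis=1, keepdims=True)
        return U

    rng = np.random.default_rng(8)
    soil = rng.normal(size=(200, 3))                   # illustrative topsoil variates
    paddy = 5 + soil @ np.array([1.0, -0.5, 0.3]) + rng.normal(0, 0.5, 200)

    # Cluster the yields into two fuzzy clusters, then harden the memberships.
    labels = fcm(paddy.reshape(-1, 1), c=2).argmax(axis=1)
    for k in (0, 1):                                   # one regression per cluster
        idx = labels == k
        fit = sm.OLS(paddy[idx], sm.add_constant(soil[idx])).fit()
        print(k, fit.params)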
van der Ploeg, Tjeerd; Austin, Peter C; Steyerberg, Ewout W
2014-12-22
Modern modelling techniques may potentially provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size ("data hungriness"). We performed simulation studies based on three clinical cohorts: 1282 patients with head and neck cancer (with 46.9% 5 year survival), 1731 patients with traumatic brain injury (22.3% 6 month mortality) and 3181 patients with minor head injury (7.6% with CT scan abnormalities). We compared three relatively modern modelling techniques: support vector machines (SVM), neural nets (NN), and random forests (RF) and two classical techniques: logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20 fold, 10 fold and 6 fold replication of subjects, where we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly larger development parts (100 repetitions). The area under the ROC-curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC <0.01). We found that a stable AUC was reached by LR at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample sizes and the same ranking of techniques. The RF, SVM and NN models showed instability and a high optimism even with >200 events per variable. Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable to achieve a stable AUC and a small optimism than classical modelling techniques such as LR. This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available.
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, N.; Bader, Jon B.
2010-01-01
Calibration data of a wind tunnel sting balance was processed using a candidate math model search algorithm that recommends an optimized regression model for the data analysis. During the calibration the normal force and the moment at the balance moment center were selected as independent calibration variables. The sting balance itself had two moment gages. Therefore, after analyzing the connection between calibration loads and gage outputs, it was decided to choose the difference and the sum of the gage outputs as the two responses that best describe the behavior of the balance. The math model search algorithm was applied to these two responses. An optimized regression model was obtained for each response. Classical strain gage balance load transformations and the equations of the deflection of a cantilever beam under load are used to show that the search algorithm's two optimized regression models are supported by a theoretical analysis of the relationship between the applied calibration loads and the measured gage outputs. The analysis of the sting balance calibration data set is a rare example of a situation when terms of a regression model of a balance can directly be derived from first principles of physics. In addition, it is interesting to note that the search algorithm recommended the correct regression model term combinations using only a set of statistical quality metrics that were applied to the experimental data during the algorithm's term selection process.
NASA Astrophysics Data System (ADS)
Chardon, Jérémy; Hingray, Benoit; Favre, Anne-Catherine
2018-01-01
Statistical downscaling models (SDMs) are often used to produce local weather scenarios from large-scale atmospheric information. SDMs include transfer functions which are based on a statistical link identified from observations between local weather and a set of large-scale predictors. As the physical processes driving surface weather vary in time, the most relevant predictors and the regression link are likely to vary in time too. This is well known for precipitation, for instance, and the link is thus often estimated after some seasonal stratification of the data. In this study, we present a two-stage analog/regression model where the regression link is estimated from atmospheric analogs of the current prediction day. Atmospheric analogs are identified from fields of geopotential heights at 1000 and 500 hPa. For the regression stage, two generalized linear models are further used to model the probability of precipitation occurrence and the distribution of non-zero precipitation amounts, respectively. The two-stage model is evaluated for the probabilistic prediction of small-scale precipitation over France. It noticeably improves the skill of the prediction for both precipitation occurrence and amount. As the analog days vary from one prediction day to another, the atmospheric predictors selected in the regression stage and the values of the corresponding regression coefficients can also vary. The model thus allows for day-to-day adaptive and tailored downscaling. It can also reveal specific predictors for peculiar and non-frequent weather configurations.
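A minimal sketch of the two-stage scheme on simulated data: nearest atmospheric analogs are selected by Euclidean distance in a stand-in predictor space, then two GLMs are fit on the analog set (occurrence via a binomial GLM, non-zero amounts via a log-link Gamma GLM); everything below is illustrative, not the paper's configuration:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(9)
    n = 2000
    Z = rng.normal(size=(n, 2))               # stand-in for geopotential-height fields
    occ = rng.random(n) < 1 / (1 + np.exp(-(-0.5 + 1.0 * Z[:, 0])))
    amount = np.where(occ, rng.gamma(2.0, np.exp(0.3 * Z[:, 1])), 0.0)

    def predict_day(z0, k=200):
        # Stage 1: the k closest atmospheric analogs of the prediction day.
        idx = np.argsort(((Z - z0) ** 2).sum(axis=1))[:k]
        Xa = sm.add_constant(Z[idx])
        x0 = np.r_[1.0, z0][None, :]
        # Stage 2a: GLM for precipitation occurrence on the analog set.
        occ_fit = sm.GLM(occ[idx].astype(float), Xa,
                         family=sm.families.Binomial()).fit()
        # Stage 2b: GLM for non-zero amounts on the wet analog days.
        wet = occ[idx]
        amt_fit = sm.GLM(amount[idx][wet], Xa[wet],
                         family=sm.families.Gamma(sm.families.links.Log())).fit()
        return occ_fit.predict(x0)[0], amt_fit.predict(x0)[0]

    print(predict_day(np.array([0.5, -0.2])))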
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley P.
2004-01-01
Propulsion ground test facilities face the daily challenges of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Due to budgetary and schedule constraints, NASA and industry customers are pushing to test more components, for less money, in a shorter period of time. As these new rocket engine component test programs are undertaken, the lack of technology maturity in the test articles, combined with pushing the test facilities' capabilities to their limits, tends to lead to an increase in facility breakdowns and unsuccessful tests. Over the last five years Stennis Space Center's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and broken numerous test facility and test article parts. While various initiatives have been implemented to provide better propulsion test techniques and improve the quality, reliability, and maintainability of goods and parts used in the propulsion test facilities, unexpected failures during testing still occur quite regularly due to the harsh environment in which the propulsion test facilities operate. Previous attempts at modeling the lifecycle of a propulsion component test project have met with little success. Each of the attempts suffered from incomplete or inconsistent data on which to base the models. By focusing on the actual test phase of the test project rather than the formulation, design, or construction phases, the quality and quantity of available data increase dramatically. A logistic regression model has been developed from the data collected over the last five years, allowing the probability of successfully completing a rocket propulsion component test to be calculated. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X1, X2, ..., Xk to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure. Logistic regression has primarily been used in the fields of epidemiology and biomedical research, but lends itself to many other applications. As indicated, the use of logistic regression is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from the models provide project managers with insight and confidence into the effectiveness of rocket engine component ground test projects. The initial success in modeling rocket propulsion ground test projects clears the way for more complex models to be developed in this area.
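A minimal sketch of such a success-probability model on simulated records (the predictors are hypothetical stand-ins, not the Stennis data):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(10)
    n = 300
    maturity = rng.uniform(1, 9, n)                    # hypothetical predictors
    duration = rng.uniform(10, 600, n)
    logit = -1.0 + 0.4 * maturity - 0.003 * duration
    success = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X = sm.add_constant(np.column_stack([maturity, duration]))
    fit = sm.Logit(success, X).fit(disp=0)

    # Probability of successfully completing a hypothetical new test.
    new = np.array([[1.0, 6.0, 120.0]])
    print(fit.predict(new))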
Montesinos-López, Abelardo; Montesinos-López, Osval A; Cuevas, Jaime; Mata-López, Walter A; Burgueño, Juan; Mondal, Sushismita; Huerta, Julio; Singh, Ravi; Autrique, Enrique; González-Pérez, Lorena; Crossa, José
2017-01-01
Modern agriculture uses hyperspectral cameras that provide hundreds of reflectance data at discrete narrow bands in many environments. These bands often cover the whole visible light spectrum and part of the infrared and ultraviolet light spectra. From the bands, vegetation indices are constructed for predicting agronomically important traits such as grain yield and biomass. However, since vegetation indices use only some wavelengths (referred to as bands), we propose using all bands simultaneously as predictor variables for the primary trait grain yield; results of several multi-environment maize (Aguate et al. in Crop Sci 57(5):1-8, 2017) and wheat (Montesinos-López et al. in Plant Methods 13(4):1-23, 2017) breeding trials indicated that using all bands produced better prediction accuracy than vegetation indices. However, until now, these prediction models have not accounted for the effects of genotype × environment (G × E) and band × environment (B × E) interactions incorporating genomic or pedigree information. In this study, we propose Bayesian functional regression models that take into account all available bands, genomic or pedigree information, the main effects of lines and environments, as well as G × E and B × E interaction effects. The data set used comprises 976 wheat lines evaluated for grain yield in three environments (Drought, Irrigated and Reduced Irrigation). The reflectance data were measured in 250 discrete narrow bands ranging from 392 to 851 nanometers (nm). The proposed Bayesian functional regression models were implemented using two types of basis: B-splines and Fourier. Results of the proposed Bayesian functional regression models, including all the wavelengths for predicting grain yield, were compared with results from conventional models with and without bands. We observed that the models with B × E interaction terms were the most accurate, whereas the functional regression models (with B-spline and Fourier bases) and the conventional models performed similarly in terms of prediction accuracy. However, the functional regression models are more parsimonious and computationally more efficient because only 21 beta coefficients (the number of basis functions) need to be estimated, rather than the 250 regression coefficients for all bands. In this study, adding pedigree or genomic information did not increase prediction accuracy.
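The parsimony argument is easy to see in code: project the 250-band spectra onto a 21-function B-spline basis and regress on the 21 scores. A sketch assuming scipy >= 1.8 for BSpline.design_matrix, with simulated spectra and a plain least-squares fit standing in for the Bayesian machinery:

    import numpy as np
    from scipy.interpolate import BSpline

    rng = np.random.default_rng(11)
    n_lines, n_bands, n_basis, k = 200, 250, 21, 3
    wl = np.linspace(392, 851, n_bands)                # band wavelengths (nm)
    refl = rng.normal(size=(n_lines, n_bands)).cumsum(axis=1)  # smooth-ish spectra
    beta_fun = np.sin((wl - 392) / 80.0)               # true functional coefficient
    grain = refl @ beta_fun / n_bands + rng.normal(0, 0.1, n_lines)

    # B-spline design matrix over wavelength: one column per basis function.
    t = np.r_[[wl[0]] * k, np.linspace(wl[0], wl[-1], n_basis - k + 1), [wl[-1]] * k]
    B = BSpline.design_matrix(wl, t, k).toarray()      # shape (250, 21)

    # Compress 250 bands to 21 scores, then fit a linear model on the scores.
    S = refl @ B / n_bands
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(n_lines), S]), grain,
                               rcond=None)
    print(coef.shape)                                  # 22 = intercept + 21 basis coefs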
Jin, H; Wu, S; Vidyanti, I; Di Capua, P; Wu, B
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Depression is a common and often undiagnosed condition for patients with diabetes. It is also a condition that significantly impacts healthcare outcomes, use, and cost, as well as elevating suicide risk. Therefore, a model to predict depression among diabetes patients is a promising and valuable tool for providers to proactively assess depressive symptoms and identify those with depression. This study seeks to develop a generalized multilevel regression model, using a longitudinal data set from a recent large-scale clinical trial, to predict depression severity and presence of major depression among patients with diabetes. Severity of depression was measured by the Patient Health Questionnaire PHQ-9 score. Predictors were selected from 29 candidate factors to develop a 2-level Poisson regression model that can make population-average predictions for all patients and subject-specific predictions for individual patients with historical records. Newly obtained patient records can be incorporated with historical records to update the prediction model. Root-mean-square errors (RMSE) were used to evaluate the predictive accuracy of PHQ-9 scores. The study also evaluated the ability to classify patients as having major depression using the predicted PHQ-9 scores. Two time-invariant and 10 time-varying predictors were selected for the model. Incorporating historical records and using them to update the model may improve both the predictive accuracy of PHQ-9 scores and the classification ability of the predicted scores. Subject-specific predictions (for individual patients with historical records) achieved an RMSE of about 4 and areas under the receiver operating characteristic (ROC) curve of about 0.9, and were better than population-average predictions. The study developed a generalized multilevel regression model to predict depression and demonstrated that generalized multilevel regression based on longitudinal patient records can achieve high predictive ability.
Regression Models For Multivariate Count Data
Zhang, Yiwen; Zhou, Hua; Zhou, Jin; Sun, Wei
2016-01-01
Data with multivariate count responses frequently occur in modern applications. The commonly used multinomial-logit model is limiting due to its restrictive mean-variance structure. For instance, analyzing count data from the recent RNA-seq technology by the multinomial-logit model leads to serious errors in hypothesis testing. The ubiquity of over-dispersion and complicated correlation structures among multivariate counts calls for more flexible regression models. In this article, we study some generalized linear models that incorporate various correlation structures among the counts. Current literature lacks a treatment of these models, partly due to the fact that they do not belong to the natural exponential family. We study the estimation, testing, and variable selection for these models in a unifying framework. The regression models are compared on both synthetic and real RNA-seq data. PMID:28348500
Determination of riverbank erosion probability using Locally Weighted Logistic Regression
NASA Astrophysics Data System (ADS)
Ioannidou, Elena; Flori, Aikaterini; Varouchakis, Emmanouil A.; Giannakis, Georgios; Vozinaki, Anthi Eirini K.; Karatzas, George P.; Nikolaidis, Nikolaos
2015-04-01
Riverbank erosion is a natural geomorphologic process that affects the fluvial environment. The most important issue concerning riverbank erosion is the identification of vulnerable locations. An alternative to the usual hydrodynamic models for predicting vulnerable locations is to quantify the probability of erosion occurrence. This can be achieved by identifying the underlying relations between riverbank erosion and the geomorphological or hydrological variables that prevent or stimulate erosion. Thus, riverbank erosion can be modelled by a regression on independent variables that are considered to affect the erosion process. The impact of such variables may vary spatially; therefore, a non-stationary regression model is preferred to a stationary equivalent, and Locally Weighted Regression (LWR) is proposed as a suitable choice. The method can be extended to predict the binary presence or absence of erosion from a series of local independent variables by using the logistic regression model; the combination is referred to as Locally Weighted Logistic Regression (LWLR). Logistic regression predicts the outcome of a categorical (e.g. binary) dependent variable from one or more, usually continuous, predictor variables by modelling the probabilities of the possible outcomes as a logistic function of the independent variables. Combined with LWR, weights are assigned to the local independent variables, allowing model parameters to vary over space in order to reflect spatial heterogeneity. The fitted model then predicts success or failure of a given binary variable (e.g. erosion presence or absence) for any value of the independent variables. The erosion occurrence probability can be calculated in conjunction with the model deviance with respect to the independent variables tested; the most straightforward measure of goodness of fit is the G statistic, a simple and effective way to evaluate the efficiency of the logistic regression model and the reliability of each independent variable. The developed statistical model is applied to the Koiliaris River Basin on the island of Crete, Greece. Two datasets of riverbank slope, river cross-section width and indications of erosion were available for the analysis (12 and 8 locations). Two types of spatial dependence function, exponential and tricubic, were examined to determine the local spatial dependence of the independent variables at the measurement locations. The results show a significant improvement when the tricubic function is applied, as the erosion probability is accurately predicted at all eight validation locations. Results for the model deviance show that cross-section width is more important than bank slope in the estimation of erosion probability along the Koiliaris riverbanks. The proposed statistical model is a useful tool that quantifies the erosion probability along riverbanks and can be used to assist in managing erosion and flooding events. Acknowledgements This work is part of an on-going THALES project (CYBERSENSORS - High Frequency Monitoring System for Integrated Water Resources Management of Rivers).
The project has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: THALES. Investing in knowledge society through the European Social Fund.
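A minimal sketch of the LWLR idea on simulated data: tricubic weights centered on a prediction location enter a weighted logistic fit, so the coefficients vary over space (the bandwidth, covariates, and data are all illustrative, and fractional freq_weights are assumed to be accepted by statsmodels):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(12)
    n = 40
    coords = rng.uniform(0, 10, (n, 2))                # measurement locations
    slope = rng.uniform(10, 60, n)
    width = rng.uniform(5, 40, n)
    erosion = (rng.random(n) <
               1 / (1 + np.exp(-(0.05 * slope - 0.1 * width)))).astype(int)
    X = sm.add_constant(np.column_stack([slope, width]))

    def lwlr_predict(s0, x0, bandwidth=8.0):
        # Tricubic spatial weights centered on the prediction location s0.
        d = np.linalg.norm(coords - s0, axis=1)
        w = np.clip(1 - (d / bandwidth) ** 3, 0, None) ** 3
        fit = sm.GLM(erosion, X, family=sm.families.Binomial(),
                     freq_weights=w).fit()
        return fit.predict(np.r_[1.0, x0][None, :])[0]  # local erosion probability

    print(lwlr_predict(np.array([5.0, 5.0]), np.array([30.0, 20.0])))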
Cardiac surgery productivity and throughput improvements.
Lehtonen, Juha-Matti; Kujala, Jaakko; Kouri, Juhani; Hippeläinen, Mikko
2007-01-01
High variability in cardiac surgery length is one of the main challenges for staff managing productivity. This study aims to evaluate the impact of six interventions on open-heart surgery operating theatre productivity. A discrete-event operating theatre simulation model with empirical operation time input data from 2603 patients is used to evaluate the effect that these process interventions have on surgery output and overtime work. A linear regression model was used to obtain operation time forecasts for surgery scheduling, and it could also be used to explain operation time. A forecasting model based on the linear regression of variables available before the surgery explains 46 per cent of operating time variance. The main factors influencing operation length were the type of operation, redoing the operation, and the head surgeon. Reduction of changeover time between surgeries, by inducing anaesthesia outside the operating theatre and by reducing slack time at the end of the day after a second surgery, has the strongest effects on surgery output and productivity. A more accurate operation time forecast did not have any effect on output, although an improved operation time forecast did decrease overtime work. A reduction in the operation time itself is not studied in this article. However, the forecasting model can also be applied to discover which factors are most significant in explaining variation in the length of open-heart surgery. The challenge of scheduling two open-heart surgeries in one day can be partly resolved by increasing the length of the day, decreasing the time between two surgeries, or by improving patient scheduling procedures so that two short surgeries can be paired. A linear regression model is created in the paper to increase the accuracy of operation time forecasting and to identify the factors that have the most influence on operation time. A simulation model is used to analyse the impact of improved surgical length forecasting and five selected process interventions on productivity in cardiac surgery.
CIEL*a*b* color space predictive models for colorimetry devices--analysis of perfume quality.
Korifi, Rabia; Le Dréau, Yveline; Antinelli, Jean-François; Valls, Robert; Dupuy, Nathalie
2013-01-30
Color perception plays a major role in the consumer evaluation of perfume quality. Consumers need first to be entirely satisfied with the sensory properties of products before other quality dimensions become relevant. The evaluation of the color of complex mixtures presents a challenge even for modern analytical techniques. A variety of instruments are available for color measurement; they can be classified as tristimulus colorimeters and spectrophotometers. Obsolescence of the electronics of old tristimulus colorimeters arises from the difficulty of finding repair parts and leads to their replacement by more modern instruments. High quality levels in color measurement, i.e., accuracy and reliability in color control, are the major advantages of the new generation of color instrumentation, the integrating sphere spectrophotometer. Two models of spectrophotometer were tested in transmittance mode, employing the d/0° geometry. The CIEL(*)a(*)b(*) color space parameters were measured with each instrument for 380 samples of raw materials and bases used in perfume compositions. The results were graphically compared between the colorimeter device and the spectrophotometer devices. All color space parameters obtained with the colorimeter were used as dependent variables to generate regression equations with values obtained from the spectrophotometers. The data were statistically analyzed to create predictive models between the reference and the target instruments through two methods. The first method uses linear regression analysis and the second consists of partial least squares regression (PLS) on each component. Copyright © 2012 Elsevier B.V. All rights reserved.
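A sketch of the instrument-transfer idea with scikit-learn's PLS regression; the simulated Lab readings below are illustrative, not the perfume data:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(13)
    n = 380
    # Illustrative CIEL*a*b* readings from the target spectrophotometer ...
    spec_lab = np.column_stack([rng.uniform(0, 100, n),
                                rng.uniform(-60, 60, n),
                                rng.uniform(-60, 60, n)])
    # ... and from the reference colorimeter (linear drift plus noise, for the sketch).
    colorimeter_lab = (spec_lab @ np.diag([0.98, 1.02, 0.99])
                       + np.array([1.0, -0.5, 0.3])
                       + rng.normal(0, 0.2, (n, 3)))

    # PLS transfer model predicting reference values from the new instrument.
    pls = PLSRegression(n_components=3).fit(spec_lab, colorimeter_lab)
    print(pls.predict(spec_lab[:2]))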
Holtschlag, David J.; Shively, Dawn; Whitman, Richard L.; Haack, Sheridan K.; Fogarty, Lisa R.
2008-01-01
Regression analyses and hydrodynamic modeling were used to identify environmental factors and flow paths associated with Escherichia coli (E. coli) concentrations at Memorial and Metropolitan Beaches on Lake St. Clair in Macomb County, Mich. Lake St. Clair is part of the binational waterway between the United States and Canada that connects Lake Huron with Lake Erie in the Great Lakes Basin. Linear regression, regression-tree, and logistic regression models were developed from E. coli concentration and ancillary environmental data. Linear regression models on log10 E. coli concentrations indicated that rainfall prior to sampling, water temperature, and turbidity were positively associated with bacteria concentrations at both beaches. Flow from Clinton River, changes in water levels, wind conditions, and log10 E. coli concentrations 2 days before or after the target bacteria concentrations were statistically significant at one or both beaches. In addition, various interaction terms were significant at Memorial Beach. Linear regression models for both beaches explained only about 30 percent of the variability in log10 E. coli concentrations. Regression-tree models were developed from data from both Memorial and Metropolitan Beaches but were found to have limited predictive capability in this study. The results indicate that too few observations were available to develop reliable regression-tree models. Linear logistic models were developed to estimate the probability of E. coli concentrations exceeding 300 most probable number (MPN) per 100 milliliters (mL). Rainfall amounts before bacteria sampling were positively associated with exceedance probabilities at both beaches. Flow of Clinton River, turbidity, and log10 E. coli concentrations measured before or after the target E. coli measurements were related to exceedances at one or both beaches. The linear logistic models were effective in estimating bacteria exceedances at both beaches. A receiver operating characteristic (ROC) analysis was used to determine cut points for maximizing the true positive rate prediction while minimizing the false positive rate. A two-dimensional hydrodynamic model was developed to simulate horizontal current patterns on Lake St. Clair in response to wind, flow, and water-level conditions at model boundaries. Simulated velocity fields were used to track hypothetical massless particles backward in time from the beaches along flow paths toward source areas. Reverse particle tracking for idealized steady-state conditions shows changes in expected flow paths and traveltimes with wind speeds and directions from 24 sectors. The results indicate that three to four sets of contiguous wind sectors have similar effects on flow paths in the vicinity of the beaches. In addition, reverse particle tracking was used for transient conditions to identify expected flow paths for 10 E. coli sampling events in 2004. These results demonstrate the ability to track hypothetical particles from the beaches, backward in time, to likely source areas. This ability, coupled with a greater frequency of bacteria sampling, may provide insight into changes in bacteria concentrations between source and sink areas.
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
Anderson, Ryan; Clegg, Samuel M.; Frydenvang, Jens; Wiens, Roger C.; McLennan, Scott M.; Morris, Richard V.; Ehlmann, Bethany L.; Dyar, M. Darby
2017-01-01
Accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibrations methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element’s emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple “sub-model” method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then “blending” these “sub-models” into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. The sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
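A minimal sketch of the sub-model idea with PLS in scikit-learn: sub-models are trained on restricted composition ranges and blended near the boundary. The routing rule, blend window, and data are illustrative, not the ChemCam calibration:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    rng = np.random.default_rng(14)
    n, p = 400, 60
    spectra = rng.normal(size=(n, p))                  # stand-in LIBS spectra
    sio2 = 50 + 10 * spectra[:, 0] - 5 * spectra[:, 1] + rng.normal(0, 1, n)

    # Full-range model plus two sub-models on restricted composition ranges.
    full = PLSRegression(5).fit(spectra, sio2)
    low = sio2 < 50
    m_low = PLSRegression(5).fit(spectra[low], sio2[low])
    m_high = PLSRegression(5).fit(spectra[~low], sio2[~low])

    def blended_predict(s):
        # Route by the full-range prediction, linearly blending near the boundary.
        s = s.reshape(1, -1)
        first = full.predict(s).item()
        w = np.clip((first - 45.0) / 10.0, 0.0, 1.0)   # 0 -> low model, 1 -> high
        return (1 - w) * m_low.predict(s).item() + w * m_high.predict(s).item()

    print(blended_predict(spectra[0]))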
Reliability analysis of C-130 turboprop engine components using artificial neural network
NASA Astrophysics Data System (ADS)
Qattan, Nizar A.
In this study, we predict the failure rate of the Lockheed C-130 engine turbine. More than thirty years of local operational field data were used for failure rate prediction and validation. The Weibull regression model and artificial neural network models, including feed-forward back-propagation, radial basis, and multilayer perceptron neural networks, are utilized to perform this study. For this purpose, the thesis is divided into five major parts. The first part deals with the Weibull regression model used to predict the turbine's general failure rate and the rate of failures that require overhaul maintenance. The second part covers the Artificial Neural Network (ANN) model utilizing the feed-forward back-propagation algorithm as a learning rule. The MATLAB package is used to build and design code to simulate the given data; the inputs to the neural network are the independent variables, and the outputs are the general failure rate of the turbine and the failures which required overhaul maintenance. In the third part we predict the general failure rate of the turbine and the failures which require overhaul maintenance using a radial basis neural network model in the MATLAB toolbox. In the fourth part we compare the predictions of the feed-forward back-propagation model with those of the Weibull regression model and the radial basis neural network model. The results show that the failure rate predicted by the feed-forward back-propagation artificial neural network model is in closer agreement with the actual field data, and with the radial basis neural network model, than the failure rate predicted by the Weibull model. By the end of the study, we forecast the general failure rate of the Lockheed C-130 engine turbine, the failures which required overhaul maintenance, and six categorical failures using a multilayer perceptron (MLP) neural network model in the DTREG commercial software. The results also give insight into the reliability of the engine turbine under actual operating conditions, which can be used by aircraft operators for assessing system and component failures and customizing the maintenance programs recommended by the manufacturer.
Structural nested mean models for assessing time-varying effect moderation.
Almirall, Daniel; Ten Have, Thomas; Murphy, Susan A
2010-03-01
This article considers the problem of assessing causal effect moderation in longitudinal settings in which treatment (or exposure) is time-varying and so are the covariates said to moderate its effect. Intermediate causal effects that describe time-varying causal effects of treatment conditional on past covariate history are introduced and considered as part of Robins' structural nested mean model. Two estimators of the intermediate causal effects, and their standard errors, are presented and discussed: the first is a proposed two-stage regression estimator; the second is Robins' G-estimator. The results of a small simulation study are presented that begin to shed light on the small- versus large-sample performance of the estimators and on the bias-variance trade-off between them. The methodology is illustrated using longitudinal data from a depression study.
Zhou, Qingping; Jiang, Haiyan; Wang, Jianzhou; Zhou, Jianling
2014-10-15
Exposure to high concentrations of fine particulate matter (PM₂.₅) can cause serious health problems because PM₂.₅ contains microscopic solid or liquid droplets that are sufficiently small to be ingested deep into human lungs. Thus, daily prediction of PM₂.₅ levels is notably important for regulatory plans that inform the public and restrict social activities in advance when harmful episodes are foreseen. A hybrid EEMD-GRNN (ensemble empirical mode decomposition-general regression neural network) model based on data preprocessing and analysis is firstly proposed in this paper for one-day-ahead prediction of PM₂.₅ concentrations. The EEMD part is utilized to decompose original PM₂.₅ data into several intrinsic mode functions (IMFs), while the GRNN part is used for the prediction of each IMF. The hybrid EEMD-GRNN model is trained using input variables obtained from principal component regression (PCR) model to remove redundancy. These input variables accurately and succinctly reflect the relationships between PM₂.₅ and both air quality and meteorological data. The model is trained with data from January 1 to November 1, 2013 and is validated with data from November 2 to November 21, 2013 in Xi'an Province, China. The experimental results show that the developed hybrid EEMD-GRNN model outperforms a single GRNN model without EEMD, a multiple linear regression (MLR) model, a PCR model, and a traditional autoregressive integrated moving average (ARIMA) model. The hybrid model with fast and accurate results can be used to develop rapid air quality warning systems. Copyright © 2014 Elsevier B.V. All rights reserved.
Schuller, Alwin G; Barry, Evan R; Jones, Rhys D O; Henry, Ryan E; Frigault, Melanie M; Beran, Garry; Linsenmayer, David; Hattersley, Maureen; Smith, Aaron; Wilson, Joanne; Cairo, Stefano; Déas, Olivier; Nicolle, Delphine; Adam, Ammar; Zinda, Michael; Reimer, Corinne; Fawell, Stephen E; Clark, Edwin A; D'Cruz, Celina M
2015-06-15
Papillary renal cell carcinoma (PRCC) is the second most common cancer of the kidney and carries a poor prognosis for patients with nonlocalized disease. The HGF receptor MET plays a central role in PRCC, with aberrations, through mutation, copy number gain, or trisomy of chromosome 7, occurring in the majority of cases. The development of effective therapies in PRCC has been hampered in part by a lack of available preclinical models. We determined the pharmacodynamic and antitumor response of the selective MET inhibitor AZD6094 in two PRCC patient-derived xenograft (PDX) models. Two PRCC PDX models were identified, and MET mutation status and copy number were determined. Pharmacodynamic and antitumor activity of AZD6094 was tested using a dose response up to 25 mg/kg daily, representing clinically achievable exposures, and compared with the activity of the RCC standard-of-care sunitinib (in RCC43b) or the multikinase inhibitor crizotinib (in RCC47). AZD6094 treatment resulted in tumor regressions, whereas sunitinib or crizotinib resulted in unsustained growth inhibition. Pharmacodynamic analysis of tumors revealed that AZD6094 could robustly suppress pMET, and the duration of target inhibition was dose related. AZD6094 inhibited multiple signaling nodes, including MAPK, PI3K, and EGFR. Finally, at doses that induced tumor regression, AZD6094 resulted in a dose- and time-dependent induction of cleaved PARP, a marker of cell death. The data presented provide the first report of testing therapeutics in preclinical in vivo models of PRCC and support the clinical development of AZD6094 in this indication. ©2015 American Association for Cancer Research.
Regression estimators for generic health-related quality of life and quality-adjusted life years.
Basu, Anirban; Manca, Andrea
2012-01-01
To develop regression models for outcomes with truncated supports, such as health-related quality of life (HRQoL) data, and to account for features typical of such data, such as a skewed distribution, spikes at 1 or 0, and heteroskedasticity. Regression estimators based on features of the Beta distribution are used. First, both a single-equation and a 2-part model are presented, along with estimation algorithms based on maximum-likelihood, quasi-likelihood, and Bayesian Markov-chain Monte Carlo methods. A novel Bayesian quasi-likelihood estimator is proposed. Second, a simulation exercise is presented to assess the performance of the proposed estimators against ordinary least squares (OLS) regression for a variety of HRQoL distributions that are encountered in practice. Finally, the performance of the proposed estimators is assessed by using them to quantify the treatment effect on QALYs in the EVALUATE hysterectomy trial. Overall model fit is studied using several goodness-of-fit tests, such as Pearson's correlation test, link and reset tests, and a modified Hosmer-Lemeshow test. The simulation results indicate that the proposed methods are more robust in estimating covariate effects than OLS, especially when the effects are large or the HRQoL distribution has a large spike at 1. Quasi-likelihood techniques are more robust than maximum likelihood estimators. When applied to the EVALUATE trial, all but the maximum likelihood estimators produce unbiased estimates of the treatment effect. One- and two-part Beta regression models provide flexible approaches to regress outcomes with truncated supports, such as HRQoL, on covariates, after accounting for many idiosyncratic features of the outcome distribution. This work will provide applied researchers with a practical set of tools to model outcomes in cost-effectiveness analysis.
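A minimal sketch of the two-part idea on simulated HRQoL data, assuming a statsmodels version (>= 0.13) that ships BetaModel: a logit handles the spike at 1, a Beta regression the interior values:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.othermod.betareg import BetaModel  # statsmodels >= 0.13

    rng = np.random.default_rng(15)
    n = 500
    trt = rng.integers(0, 2, n).astype(float)
    X = sm.add_constant(trt)

    # Simulated HRQoL: a spike at 1 plus skewed interior values (illustrative).
    at_one = rng.random(n) < 0.25 + 0.1 * trt
    mu = 1 / (1 + np.exp(-(0.5 + 0.3 * trt)))
    hrqol = np.where(at_one, 1.0, rng.beta(mu * 10, (1 - mu) * 10))

    # Part 1: logistic regression for the spike at 1.
    part1 = sm.Logit(at_one.astype(int), X).fit(disp=0)
    # Part 2: Beta regression for values strictly inside (0, 1).
    interior = ~at_one
    part2 = BetaModel(hrqol[interior], X[interior]).fit()
    print(part1.params, part2.params)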
Modified Regression Correlation Coefficient for Poisson Regression Model
NASA Astrophysics Data System (ADS)
Kaengthong, Nattacha; Domthong, Uthumporn
2017-09-01
This study focuses on indicators of the predictive power of the Generalized Linear Model (GLM), which are widely used but often subject to restrictions. We are interested in the regression correlation coefficient for a Poisson regression model. This is a measure of predictive power defined by the relationship between the dependent variable (Y) and the expected value of the dependent variable given the independent variables [E(Y|X)] for the Poisson regression model, where the dependent variable follows a Poisson distribution. The purpose of this research was to modify the regression correlation coefficient for the Poisson regression model. We also compare the proposed modified regression correlation coefficient with the traditional regression correlation coefficient in the case of two or more independent variables and in the presence of multicollinearity among the independent variables. The results show that the proposed regression correlation coefficient is better than the traditional regression correlation coefficient in terms of bias and root mean square error (RMSE).
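As a minimal illustration of the quantity being modified, the sketch below fits a Poisson GLM on simulated data (statsmodels assumed) and computes the sample analogue of the regression correlation coefficient, corr(Y, E(Y|X)):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))
y = rng.poisson(np.exp(X @ np.array([0.2, 0.4, -0.3])))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
rcc = np.corrcoef(y, fit.fittedvalues)[0, 1]  # sample corr(Y, E[Y|X])
print(f"regression correlation coefficient: {rcc:.3f}")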
USDA-ARS's Scientific Manuscript database
Incomplete meteorological data have been a problem in environmental modeling studies. The objective of this work was to develop a technique to reconstruct missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neural networks (ANN)....
USDA-ARS's Scientific Manuscript database
Missing meteorological data have to be estimated for agricultural and environmental modeling. The objective of this work was to develop a technique to reconstruct the missing daily precipitation data in the central part of the Chesapeake Bay Watershed using regression trees (RT) and artificial neura...
Robust mislabel logistic regression without modeling mislabel probabilities.
Hung, Hung; Jou, Zhi-Yu; Huang, Su-Yun
2018-03-01
Logistic regression is among the most widely used statistical methods for linear discriminant analysis. In many applications, we only observe possibly mislabeled responses, and fitting a conventional logistic regression can then lead to biased estimation. One common resolution is to fit a mislabel logistic regression model, which takes mislabeled responses into consideration. Another common method is to adopt robust M-estimation, down-weighting suspected instances. In this work, we propose a new robust mislabel logistic regression based on γ-divergence. Our proposal possesses two advantageous features: (1) it does not need to model the mislabel probabilities; (2) the minimum γ-divergence estimation leads to a weighted estimating equation without the need to include any bias correction term, that is, it is automatically bias-corrected. These features make the proposed γ-logistic regression more robust in model fitting and more intuitive for model interpretation through a simple weighting scheme. Our method is also easy to implement, and two types of algorithms are included. Simulation studies and the Pima data application are presented to demonstrate the performance of γ-logistic regression. © 2017, The International Biometric Society.
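The weighting scheme can be caricatured in a few lines: each observation is weighted by the fitted probability of its observed label raised to γ, so suspected mislabels are down-weighted. The Python sketch below (simulated data, a simple fixed-point iteration with scikit-learn) illustrates that idea only; it is not the authors' minimum γ-divergence algorithm.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n, gamma = 1000, 0.5
X = rng.normal(size=(n, 2))
y = (X @ np.array([1.5, -1.0]) + rng.logistic(size=n) > 0).astype(int)
flip = rng.random(n) < 0.1          # 10% mislabeled responses
y = np.where(flip, 1 - y, y)

w = np.ones(n)
for _ in range(10):
    clf = LogisticRegression().fit(X, y, sample_weight=w)
    p = clf.predict_proba(X)[np.arange(n), y]  # P(observed label | x)
    w = p ** gamma                             # likely mislabels get low weight
print("coefficients:", clf.coef_.ravel())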
Linear regression in astronomy. II
NASA Technical Reports Server (NTRS)
Feigelson, Eric D.; Babu, Gutti J.
1992-01-01
A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.
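For class (1), a minimal sketch of an unweighted regression line with bootstrap resampling of the slope and intercept (Python/numpy; the data are simulated):

import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=80)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=80)

idx = [rng.integers(0, len(x), len(x)) for _ in range(2000)]  # resampled indices
boot = np.array([np.polyfit(x[i], y[i], 1) for i in idx])     # [slope, intercept]
slope_se, intercept_se = boot.std(axis=0)
print(f"bootstrap SEs: slope {slope_se:.3f}, intercept {intercept_se:.3f}")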
Analysis of Sting Balance Calibration Data Using Optimized Regression Models
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert; Bader, Jon B.
2009-01-01
Calibration data of a wind tunnel sting balance were processed using a search algorithm that identifies an optimized regression model for the data analysis. The selected sting balance had two moment gages that were mounted forward and aft of the balance moment center. The difference and the sum of the two gage outputs were fitted in the least squares sense using the normal force and the pitching moment at the balance moment center as independent variables. The regression model search algorithm predicted that the difference of the gage outputs should be modeled using the intercept and the normal force. The sum of the two gage outputs, on the other hand, should be modeled using the intercept, the pitching moment, and the square of the pitching moment. Equations of the deflection of a cantilever beam are used to show that the search algorithm's two recommended math models can also be obtained after performing a rigorous theoretical analysis of the deflection of the sting balance under load. The analysis of the sting balance calibration data set is a rare example of a situation when regression models of balance calibration data can be derived directly from first principles of physics and engineering. In addition, it is interesting to see that the search algorithm recommended the same regression models for the data analysis using only a set of statistical quality metrics.
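The two recommended math models reduce to two small least-squares fits. A sketch with invented gage data (Python/numpy assumed):

import numpy as np

rng = np.random.default_rng(4)
N = rng.uniform(-100, 100, 50)   # normal force
M = rng.uniform(-50, 50, 50)     # pitching moment at the balance moment center
diff = 0.02 * N + rng.normal(scale=1e-3, size=50)                # gage difference
summ = 0.05 * M + 1e-4 * M**2 + rng.normal(scale=1e-3, size=50)  # gage sum

c_diff, *_ = np.linalg.lstsq(np.column_stack([np.ones(50), N]), diff, rcond=None)
c_sum, *_ = np.linalg.lstsq(np.column_stack([np.ones(50), M, M**2]), summ, rcond=None)
print("difference model (intercept, N):", c_diff)
print("sum model (intercept, M, M^2):", c_sum)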
Cox regression analysis with missing covariates via nonparametric multiple imputation.
Hsu, Chiu-Hsieh; Yu, Mandi
2018-01-01
We consider the situation of estimating Cox regression in which some covariates are subject to missingness, and there exists additional information (including observed event time, censoring indicator, and fully observed covariates) that may be predictive of the missing covariates. We propose to use two working regression models: one for predicting the missing covariates and the other for predicting the missingness probabilities. For each missing covariate observation, these two working models are used to define a nearest neighbor imputing set. This set is then used to non-parametrically impute covariate values for the missing observation. Upon the completion of imputation, Cox regression is performed on the multiply imputed datasets to estimate the regression coefficients. In a simulation study, we compare the nonparametric multiple imputation approach with the augmented inverse probability weighted (AIPW) method, which directly incorporates the two working models into estimation of Cox regression, and the predictive mean matching imputation (PMM) method. We show that all approaches can reduce bias due to a non-ignorable missingness mechanism. The proposed nonparametric imputation method is robust to mis-specification of either one of the two working models and robust to mis-specification of the link function of the two working models. In contrast, the PMM method is sensitive to misspecification of the covariates included in imputation. The AIPW method is sensitive to the selection probability. We apply the approaches to a breast cancer dataset from the Surveillance, Epidemiology, and End Results (SEER) Program.
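Stripped of the Cox fit, the core construction can be sketched as follows (Python/scikit-learn, simulated data; the two working models are a linear regression for the missing covariate and a logistic regression for the missingness probability, both illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(5)
n = 300
z = rng.normal(size=(n, 2))                  # fully observed covariates
x = z @ np.array([1.0, -0.5]) + rng.normal(size=n)
miss = rng.random(n) < 0.3                   # rows with covariate x missing

# Working model 1 predicts x; working model 2 predicts the missingness probability
score1 = LinearRegression().fit(z[~miss], x[~miss]).predict(z)
score2 = LogisticRegression().fit(z, miss.astype(int)).predict_proba(z)[:, 1]
s = np.column_stack([(score1 - score1.mean()) / score1.std(),
                     (score2 - score2.mean()) / score2.std()])

K = 5
for j in np.where(miss)[0]:
    d = np.linalg.norm(s[~miss] - s[j], axis=1)
    nn = np.argsort(d)[:K]                   # nearest neighbor imputing set
    x[j] = rng.choice(x[~miss][nn])          # draw one value; repeat per imputation
# (in real data x[miss] would be unobserved; here it is overwritten to illustrate)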
A consistent framework for Horton regression statistics that leads to a modified Hack's law
Furey, P.R.; Troutman, B.M.
2008-01-01
A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for the Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
NASA Astrophysics Data System (ADS)
Sadler, J. M.; Goodall, J. L.; Morsy, M. M.; Spencer, K.
2018-04-01
Sea level rise has already caused more frequent and severe coastal flooding, and this trend will likely continue. Flood prediction is an essential part of a coastal city's capacity to adapt to and mitigate this growing problem. Complex coastal urban hydrological systems, however, do not always lend themselves easily to physically based flood prediction approaches. This paper presents a method for using a data-driven approach to estimate flood severity in an urban coastal setting using crowd-sourced data, a non-traditional but growing data source, along with environmental observation data. Two data-driven models, Poisson regression and Random Forest regression, are trained to predict the number of flood reports per storm event as a proxy for flood severity, given extensive environmental data (i.e., rainfall, tide, groundwater table level, and wind conditions) as input. The method is demonstrated using data from Norfolk, Virginia, USA from September 2010 to October 2016. Quality-controlled, crowd-sourced street flooding reports ranging from 1 to 159 per storm event for 45 storm events are used to train and evaluate the models. Random Forest performed better than Poisson regression at predicting the number of flood reports and had a lower false negative rate. From the Random Forest model, total cumulative rainfall was by far the most dominant input variable in predicting flood severity, followed by low tide and lower low tide. These methods serve as a first step toward using data-driven methods for spatially and temporally detailed coastal urban flood prediction.
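A minimal sketch of the model comparison (scikit-learn assumed; the features below are invented stand-ins for the storm-event inputs, not the Norfolk dataset):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import PoissonRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n = 45  # storm events
X = np.column_stack([rng.gamma(2, 20, n),      # cumulative rainfall
                     rng.normal(0.5, 0.2, n),  # low tide level
                     rng.normal(0, 1, n)])     # wind
reports = rng.poisson(np.exp(0.02 * X[:, 0] + X[:, 1]))  # flood reports per event

for model in (PoissonRegressor(max_iter=1000),
              RandomForestRegressor(random_state=0)):
    r2 = cross_val_score(model, X, reports, cv=5).mean()
    print(type(model).__name__, f"mean CV R^2 = {r2:.2f}")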
Genetic parameters of legendre polynomials for first parity lactation curves.
Pool, M H; Janss, L L; Meuwissen, T H
2000-11-01
Variance components of the covariance function coefficients in a random regression test-day model were estimated by Legendre polynomials up to a fifth order for first-parity records of Dutch dairy cows using Gibbs sampling. Two Legendre polynomials of equal order were used to model the random part of the lactation curve, one for the genetic component and one for permanent environment. Test-day records from cows registered between 1990 and 1996 and collected by regular milk recording were available. For the data set, 23,700 complete lactations were selected from 475 herds sired by 262 sires. Because the application of a random regression model is limited by computing capacity, we investigated the minimum order needed to fit the variance structure in the data sufficiently. Predictions of genetic and permanent environmental variance structures were compared with bivariate estimates on 30-d intervals. A third-order or higher polynomial modeled the shape of variance curves over days in milk (DIM) with sufficient accuracy for the genetic and permanent environment part. Also, the genetic correlation structure was fitted with sufficient accuracy by a third-order polynomial, but, for the permanent environmental component, a fourth order was needed. Because equal orders are suggested in the literature, a fourth-order Legendre polynomial is recommended in this study. However, a rank of three for the genetic covariance matrix and of four for permanent environment allows a simpler covariance function with a reduced number of parameters based on the eigenvalues and eigenvectors.
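Building the Legendre covariate matrix for such a random regression model is a one-liner once days in milk are rescaled to [-1, 1]. A sketch (Python/numpy; the DIM range 5-305 and the unnormalized polynomials are illustrative choices):

import numpy as np

dim = np.arange(5, 306)                       # test days 5..305
t = 2 * (dim - dim.min()) / (dim.max() - dim.min()) - 1
Phi = np.polynomial.legendre.legvander(t, 3)  # columns P0(t)..P3(t), third order
print(Phi.shape)  # (301, 4); e.g. the genetic covariances become Phi @ G @ Phi.T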
Improved accuracy in quantitative laser-induced breakdown spectroscopy using sub-models
Anderson, Ryan B.; Clegg, Samuel M.; Frydenvang, Jens; ...
2016-12-15
We report that accurate quantitative analysis of diverse geologic materials is one of the primary challenges faced by the Laser-Induced Breakdown Spectroscopy (LIBS)-based ChemCam instrument on the Mars Science Laboratory (MSL) rover. The SuperCam instrument on the Mars 2020 rover, as well as other LIBS instruments developed for geochemical analysis on Earth or other planets, will face the same challenge. Consequently, part of the ChemCam science team has focused on the development of improved multivariate analysis calibration methods. Developing a single regression model capable of accurately determining the composition of very different target materials is difficult because the response of an element's emission lines in LIBS spectra can vary with the concentration of other elements. We demonstrate a conceptually simple "sub-model" method for improving the accuracy of quantitative LIBS analysis of diverse target materials. The method is based on training several regression models on sets of targets with limited composition ranges and then "blending" these "sub-models" into a single final result. Tests of the sub-model method show improvement in test set root mean squared error of prediction (RMSEP) for almost all cases. Lastly, the sub-model method, using partial least squares regression (PLS), is being used as part of the current ChemCam quantitative calibration, but the sub-model method is applicable to any multivariate regression method and may yield similar improvements.
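The train-then-blend idea can be caricatured with scikit-learn's PLS implementation. Everything below (mock spectra, the median split into "low" and "high" sub-models, the linear blending weight) is an illustrative assumption, not the ChemCam calibration:

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
spectra = rng.normal(size=(120, 50))             # mock LIBS spectra
comp = np.abs(spectra[:, :5].sum(axis=1)) * 5    # mock element concentration

full = PLSRegression(n_components=5).fit(spectra, comp)      # full-range model
low_i, high_i = comp < np.median(comp), comp >= np.median(comp)
low = PLSRegression(n_components=5).fit(spectra[low_i], comp[low_i])
high = PLSRegression(n_components=5).fit(spectra[high_i], comp[high_i])

def blended_predict(s):
    guess = full.predict(s).ravel()              # full model locates the range
    w = np.clip((guess - comp.min()) / np.ptp(comp), 0, 1)
    return (1 - w) * low.predict(s).ravel() + w * high.predict(s).ravel()

print(blended_predict(spectra[:3]))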
Correlation and simple linear regression.
Eberly, Lynn E
2007-01-01
This chapter highlights important steps in using correlation and simple linear regression to address scientific questions about the association of two continuous variables with each other. These steps include estimation and inference, assessing model fit, the connection between regression and ANOVA, and study design. Examples in microbiology are used throughout. This chapter provides a framework that is helpful in understanding more complex statistical techniques, such as multiple linear regression, linear mixed effects models, logistic regression, and proportional hazards regression.
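For instance, estimation and inference for a simple linear regression of one continuous variable on another take a few lines (Python/scipy, simulated data):

import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + rng.normal(size=30)
res = stats.linregress(x, y)
print(f"slope={res.slope:.2f} (p={res.pvalue:.3g}), r={res.rvalue:.2f}")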
2014-01-01
Background Meta-regression is becoming increasingly used to model study level covariate effects. However this type of statistical analysis presents many difficulties and challenges. Here two methods for calculating confidence intervals for the magnitude of the residual between-study variance in random effects meta-regression models are developed. A further suggestion for calculating credible intervals using informative prior distributions for the residual between-study variance is presented. Methods Two recently proposed and, under the assumptions of the random effects model, exact methods for constructing confidence intervals for the between-study variance in random effects meta-analyses are extended to the meta-regression setting. The use of Generalised Cochran heterogeneity statistics is extended to the meta-regression setting and a Newton-Raphson procedure is developed to implement the Q profile method for meta-analysis and meta-regression. WinBUGS is used to implement informative priors for the residual between-study variance in the context of Bayesian meta-regressions. Results Results are obtained for two contrasting examples, where the first example involves a binary covariate and the second involves a continuous covariate. Intervals for the residual between-study variance are wide for both examples. Conclusions Statistical methods, and R computer software, are available to compute exact confidence intervals for the residual between-study variance under the random effects model for meta-regression. These frequentist methods are almost as easily implemented as their established counterparts for meta-analysis. Bayesian meta-regressions are also easily performed by analysts who are comfortable using WinBUGS. Estimates of the residual between-study variance in random effects meta-regressions should be routinely reported and accompanied by some measure of their uncertainty. Confidence and/or credible intervals are well-suited to this purpose. PMID:25196829
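A sketch of the Q profile idea extended to meta-regression (Python/scipy, simulated studies): the confidence limits for the residual between-study variance are the values of tau^2 at which the generalised Q statistic equals the appropriate chi-square quantiles, truncated at zero when no root exists. The data and search bounds are illustrative.

import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(9)
k = 12                                            # number of studies
X = np.column_stack([np.ones(k), rng.random(k)])  # intercept + one covariate
v = rng.uniform(0.02, 0.1, k)                     # within-study variances
y = X @ np.array([0.3, 0.5]) + rng.normal(scale=np.sqrt(v + 0.05))

def Q(tau2):
    # generalised Q statistic at a candidate residual between-study variance
    w = 1.0 / (v + tau2)
    beta = np.linalg.solve((X.T * w) @ X, (X.T * w) @ y)  # weighted LS fit
    return np.sum(w * (y - X @ beta) ** 2)

df = k - X.shape[1]

def ci_bound(q_target):
    if Q(0.0) <= q_target:
        return 0.0                                # truncate the bound at zero
    return optimize.brentq(lambda t: Q(t) - q_target, 0.0, 100.0)

lo = ci_bound(stats.chi2.ppf(0.975, df))
hi = ci_bound(stats.chi2.ppf(0.025, df))
print(f"95% CI for residual tau^2: ({lo:.3f}, {hi:.3f})")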
Meyer, Katharina; Niedermann, Karin; Tschopp, Alois; Klipstein, Andreas
2013-01-01
Background Work incapacity among patients with ankylosing spondylitis (AS) ranges between 3% and 50% in Europe. In many countries, work incapacity is difficult to quantify. The work ability index (WAI) is used to measure work ability in workers, but it has not been well investigated in patients. Aims To investigate work incapacity in terms of absence days in patients with AS and to evaluate whether the WAI reflects absence from work. Hypothesis Absence days can be estimated based on the WAI and other variables. Design Cross-sectional design. Setting In a secondary care centre in Switzerland, the WAI and a questionnaire about work absence were administered to AS patients prior to cardiovascular training. The number of absence days was collected retrospectively. The absence days were estimated using a two-part regression model. Participants 92 AS patients (58 men (63%)). Inclusion criteria: AS diagnosis, ability to cycle, age between 18 and 65 years. Exclusion criteria: severe heart disease. Primary and secondary outcome measures Absence days. Results Of the 92 patients, 14 received a disability pension and 78 were in the working process. The median absence days per year of the 78 patients due to AS alone and including other reasons was 0 days (IQR 0–12.3) and 2.5 days (IQR 0–19), respectively. The WAI score (regression coefficient=−4.66 (p<0.001, CI −6.1 to −3.2)), ‘getting a disability pension’ (regression coefficient=−106.8 (p<0.001, 95% CI −141.6 to −72.0)), and other non-significant variables explained 70% of the variance in absence days (p<0.001), and therefore may estimate the number of absence days. Conclusions Absences in our sample of AS patients were equal to those in pan-European countries. In groups of AS patients, the WAI and other variables can validly estimate absence days with the help of a two-part regression model. PMID:23524041
Li, Feiming; Gimpel, John R; Arenson, Ethan; Song, Hao; Bates, Bruce P; Ludwin, Fredric
2014-04-01
Few studies have investigated how well scores from the Comprehensive Osteopathic Medical Licensing Examination-USA (COMLEX-USA) series predict resident outcomes, such as performance on board certification examinations. To determine how well COMLEX-USA predicts performance on the American Osteopathic Board of Emergency Medicine (AOBEM) Part I certification examination. The target study population was first-time examinees who took AOBEM Part I in 2011 and 2012 with matched performances on COMLEX-USA Level 1, Level 2-Cognitive Evaluation (CE), and Level 3. Pearson correlations were computed between AOBEM Part I first-attempt scores and COMLEX-USA performances to measure the association between these examinations. Stepwise linear regression analysis was conducted to predict AOBEM Part I scores by the 3 COMLEX-USA scores. An independent t test was conducted to compare mean COMLEX-USA performances between candidates who passed and who failed AOBEM Part I, and a stepwise logistic regression analysis was used to predict the log-odds of passing AOBEM Part I on the basis of COMLEX-USA scores. Scores from AOBEM Part I had the highest correlation with COMLEX-USA Level 3 scores (.57) and slightly lower correlation with COMLEX-USA Level 2-CE scores (.53). The lowest correlation was between AOBEM Part I and COMLEX-USA Level 1 scores (.47). According to the stepwise regression model, COMLEX-USA Level 1 and Level 2-CE scores, which residency programs often use as selection criteria, together explained 30% of variance in AOBEM Part I scores. Adding Level 3 scores explained 37% of variance. The independent t test indicated that the 397 examinees passing AOBEM Part I performed significantly better than the 54 examinees failing AOBEM Part I in all 3 COMLEX-USA levels (P<.001 for all 3 levels). The logistic regression model showed that COMLEX-USA Level 1 and Level 3 scores predicted the log-odds of passing AOBEM Part I (P=.03 and P<.001, respectively). The present study empirically supported the predictive and discriminant validities of the COMLEX-USA series in relation to the AOBEM Part I certification examination. Although residency programs may use COMLEX-USA Level 1 and Level 2-CE scores as partial criteria in selecting residents, Level 3 scores, though typically not available at the time of application, are actually the most statistically related to performances on AOBEM Part I.
1974-01-01
Digital Image Restoration under a Regression Model: The Unconstrained, Linear Equality and Inequality Constrained Approaches. Nelson Delfino d'Avila Mascarenhas. Report 520, January 1974. A two-dimensional form adequately describes the linear model. A discretization is performed by using quadrature methods. By trans…
NASA Astrophysics Data System (ADS)
Li, Wang; Niu, Zheng; Gao, Shuai; Wang, Cheng
2014-11-01
Light Detection and Ranging (LiDAR) and Synthetic Aperture Radar (SAR) are two competitive active remote sensing techniques for forest above ground biomass (AGB) estimation, which is important for forest management and global climate change study. This study aims to further explore their capabilities in temperate forest AGB estimation by emphasizing the spatial auto-correlation of variables obtained from these two remote sensing tools, an often overlooked aspect in remote sensing applications to vegetation studies. Remote sensing variables including airborne LiDAR metrics, backscattering coefficients for different SAR polarizations, and their ratio variables for Radarsat-2 imagery were calculated. First, simple linear regression (SLR) models were established between the field-estimated above ground biomass and the remote sensing variables. Pearson's correlation coefficient (R²) was used to find which LiDAR metric showed the most significant correlation with the regression residuals and could be selected as co-variable in regression co-kriging (RCoKrig). Second, regression co-kriging was conducted by choosing the regression residuals as dependent variable and the LiDAR metric (Hmean) with the highest R² as co-variable. Third, above ground biomass over the study area was estimated using the SLR model and the RCoKrig model, respectively. The results for these two models were validated using the same ground points. Results showed that both methods achieved satisfactory prediction accuracy, with regression co-kriging showing the lower estimation error. It is proved that the regression co-kriging model is feasible and effective in mapping the spatial pattern of AGB in the temperate forest using Radarsat-2 data calibrated by airborne LiDAR metrics.
A multilevel model for comorbid outcomes: obesity and diabetes in the US.
Congdon, Peter
2010-02-01
Multilevel models are overwhelmingly applied to single health outcomes, but when two or more health conditions are closely related, it is important that contextual variation in their joint prevalence (e.g., variations over different geographic settings) is considered. A multinomial multilevel logit regression approach for analysing joint prevalence is proposed here that includes subject level risk factors (e.g., age, race, education) while also taking account of geographic context. Data from a US population health survey (the 2007 Behavioral Risk Factor Surveillance System or BRFSS) are used to illustrate the method, with a six category multinomial outcome defined by diabetic status and weight category (obese, overweight, normal). The influence of geographic context is partly represented by known geographic variables (e.g., county poverty), and partly by a model for latent area influences. In particular, a shared latent variable (common factor) approach is proposed to measure the impact of unobserved area influences on joint weight and diabetes status, with the latent variable being spatially structured to reflect geographic clustering in risk.
Regression modeling of ground-water flow
Cooley, R.L.; Naff, R.L.
1985-01-01
Nonlinear multiple regression methods are developed to model and analyze groundwater flow systems. Complete descriptions of regression methodology as applied to groundwater flow models allow scientists and engineers engaged in flow modeling to apply the methods to a wide range of problems. Organization of the text proceeds from an introduction that discusses the general topic of groundwater flow modeling, to a review of basic statistics necessary to properly apply regression techniques, and then to the main topic: exposition and use of linear and nonlinear regression to model groundwater flow. Statistical procedures are given to analyze and use the regression models. A number of exercises and answers are included to exercise the student on nearly all the methods that are presented for modeling and statistical analysis. Three computer programs implement the more complex methods. These three are a general two-dimensional, steady-state regression model for flow in an anisotropic, heterogeneous porous medium, a program to calculate a measure of model nonlinearity with respect to the regression parameters, and a program to analyze model errors in computed dependent variables such as hydraulic head. (USGS)
Optimization of Regression Models of Experimental Data Using Confirmation Points
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2010-01-01
A new search metric is discussed that may be used to better assess the predictive capability of different math term combinations during the optimization of a regression model of experimental data. The new search metric can be determined for each tested math term combination if the given experimental data set is split into two subsets. The first subset consists of data points that are only used to determine the coefficients of the regression model. The second subset consists of confirmation points that are exclusively used to test the regression model. The new search metric value is assigned after comparing two values that describe the quality of the fit of each subset. The first value is the standard deviation of the PRESS residuals of the data points. The second value is the standard deviation of the response residuals of the confirmation points. The greater of the two values is used as the new search metric value. This choice guarantees that both standard deviations are always less than or equal to the value that is used during the optimization. Experimental data from the calibration of a wind tunnel strain-gage balance are used to illustrate the application of the new search metric. The new search metric ultimately generates an optimized regression model that was already tested at confirmation points independent of the regression model before it is ever used to predict an unknown response from a set of regressors.
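The metric itself is easy to compute once the split is fixed. A sketch with simulated calibration data (Python/numpy; the even/odd split and the quadratic model are illustrative choices):

import numpy as np

rng = np.random.default_rng(10)
x = rng.uniform(-1, 1, 60)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.05, size=60)
fit_idx, conf_idx = np.arange(0, 60, 2), np.arange(1, 60, 2)  # two subsets

X = np.column_stack([np.ones(30), x[fit_idx], x[fit_idx] ** 2])
beta, *_ = np.linalg.lstsq(X, y[fit_idx], rcond=None)
H = X @ np.linalg.inv(X.T @ X) @ X.T                  # hat matrix
press = (y[fit_idx] - X @ beta) / (1 - np.diag(H))    # PRESS residuals

Xc = np.column_stack([np.ones(30), x[conf_idx], x[conf_idx] ** 2])
conf_resid = y[conf_idx] - Xc @ beta                  # confirmation residuals

metric = max(press.std(ddof=1), conf_resid.std(ddof=1))  # greater of the two
print(f"search metric = {metric:.4f}")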
ERIC Educational Resources Information Center
Li, Deping; Oranje, Andreas
2007-01-01
Two versions of a general method for approximating standard error of regression effect estimates within an IRT-based latent regression model are compared. The general method is based on Binder's (1983) approach, accounting for complex samples and finite populations by Taylor series linearization. In contrast, the current National Assessment of…
NASA Astrophysics Data System (ADS)
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points that appear peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points, "outliers", in the data set causes many problems in research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robustness measures: the proportion of detected outliers and the masking and swamping rates.
Suppression Situations in Multiple Linear Regression
ERIC Educational Resources Information Center
Shieh, Gwowen
2006-01-01
This article proposes alternative expressions for the two most prevailing definitions of suppression without resorting to the standardized regression modeling. The formulation provides a simple basis for the examination of their relationship. For the two-predictor regression, the author demonstrates that the previous results in the literature are…
Modelling Nitrogen Oxides in Los Angeles Using a Hybrid Dispersion/Land Use Regression Model
NASA Astrophysics Data System (ADS)
Wilton, Darren C.
The goal of this dissertation is to develop models capable of predicting long-term annual average NOx concentrations in urban areas. Predictions from simple meteorological dispersion models and seasonal proxies for NO2 oxidation were included as covariates in a land use regression (LUR) model for NOx in Los Angeles, CA. The NOx measurements were obtained from a comprehensive measurement campaign that is part of the Multi-Ethnic Study of Atherosclerosis Air Pollution Study (MESA Air). Simple land use regression models were initially developed using a suite of GIS-derived land use variables developed from various buffer sizes (R²=0.15). Caline3, a simple steady-state Gaussian line source model, was initially incorporated into the land use regression framework. The addition of this spatio-temporally varying Caline3 covariate improved the simple LUR model predictions. The extent of improvement was much more pronounced for models based solely on the summer measurements (simple LUR: R²=0.45; Caline3/LUR: R²=0.70) than it was for models based on all seasons (R²=0.20). We then used a Lagrangian dispersion model to convert static land use covariates for population density and commercial/industrial area into spatially and temporally varying covariates. The inclusion of these covariates resulted in significant improvement in model prediction (R²=0.57). In addition to the dispersion model covariates described above, a two-week average value of daily peak-hour ozone was included as a surrogate of the oxidation of NO2 during the different sampling periods. This additional covariate further improved overall model performance for all models. The best model by 10-fold cross validation (R²=0.73) contained the Caline3 prediction, a static covariate for length of A3 roads within 50 meters, the Calpuff-adjusted covariates derived from both population density and industrial/commercial land area, and the ozone covariate. This model was tested against annual average NOx concentrations from an independent data set from the EPA's Air Quality System (AQS) and MESA Air fixed site monitors, and performed very well (R²=0.82).
Sparse kernel methods for high-dimensional survival data.
Evers, Ludger; Messow, Claudia-Martina
2008-07-15
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model, however, yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where, akin to support vector classification, the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another, akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
Chaudhuri, Anoshua; Roy, Kakoli
2008-10-01
Economic reforms in Vietnam initiated in the late 1980s included deregulation of the health system, resulting in extensive changes in health care delivery, access, and financing. One aspect of the health sector reform was the introduction of user fees at both public and private health facilities, in stark contrast to the former socialized system of free medical care. Subsequently, health insurance and free health care cards for the poor were introduced to mitigate the barriers to seeking care and the financial burden imposed by out-of-pocket (OOP) health payments as a result of the user fees. The aim was to examine the determinants of seeking care and OOP payments, as well as the relationship between individual OOP health expenditures and household ability to pay (ATP), during 1992-2002. The data are drawn from the 1992-93 and 1997-98 Vietnam Living Standard Surveys (VLSS) and the 2002 Vietnam Household and Living Standards Survey (VHLSS). We use a two-part model where the first part is a probit model that estimates the probability that an individual will seek treatment. The second part is a truncated non-linear regression model that uses ordinary least-squares and fixed effects methods to estimate the determinants of OOP payments, measured both as absolute and as relative expenditures. Based on the analysis, we examine the relationship between the predicted shares of individual OOP health payments and household ATP as well as selected socioeconomic characteristics. Our results indicate that payments increased with increasing ATP, but the consequent financial burden (payment share) decreased with increasing ATP, indicating a regressive system during the first two periods. However, the share of payments increased with ATP, indicating a progressive system, by 2002. When comparing across years, we find horizontal inequities in all the years, which worsened between 1992 and 1998 but improved by 2002. The regressivity in payments noted during 1992 and 1998 might be because the rich could avail themselves of health insurance more readily than those at lower incomes and, as a consequence, were able to use the healthcare system more effectively without paying high OOP payments. In contrast, the poor either incurred higher OOP payments or were discouraged from seeking treatment until their ailment became serious. This inequality was exacerbated in 1998, when insurance take-up rates were not high but the impact of privatization and deregulation was already occurring. By 2002, insurance take-up rates were much higher, and poverty alleviation policies (e.g., free health insurance and health fund membership targeted at the poor) had been instituted, which may have resulted in a less regressive system.
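The paper's two-part structure can be sketched in a few lines (Python/statsmodels, simulated data; variable names are illustrative): a probit for whether any care is sought, then a regression for log OOP payments among users.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 2000
atp = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # household ability to pay
X = sm.add_constant(np.log(atp))

seek = (0.2 + 0.6 * np.log(atp) + rng.normal(size=n)) > 0  # sought treatment?
part1 = sm.Probit(seek.astype(int), X).fit(disp=0)         # part 1: probit

log_oop = 1.0 + 0.8 * np.log(atp[seek]) + rng.normal(size=seek.sum())
part2 = sm.OLS(log_oop, X[seek]).fit()                     # part 2: users only
print("probit:", part1.params, "| OLS on users:", part2.params)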
Cooley, Richard L.
1982-01-01
Prior information on the parameters of a groundwater flow model can be used to improve parameter estimates obtained from nonlinear regression solution of a modeling problem. Two scales of prior information can be available: (1) prior information having known reliability (that is, bias and random error structure) and (2) prior information consisting of best available estimates of unknown reliability. A regression method that incorporates the second scale of prior information assumes the prior information to be fixed for any particular analysis to produce improved, although biased, parameter estimates. Approximate optimization of two auxiliary parameters of the formulation is used to help minimize the bias, which is almost always much smaller than that resulting from standard ridge regression. It is shown that if both scales of prior information are available, then a combined regression analysis may be made.
Normality of raw data in general linear models: The most widespread myth in statistics
Kery, Marc; Hatfield, Jeff S.
2003-01-01
In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
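A tiny simulation makes the point concrete (Python/scipy; the two-group design and effect size are arbitrary):

import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
group = np.repeat([0, 1], 100)
y = np.where(group == 0, 0.0, 8.0) + rng.normal(size=200)  # bimodal raw response

resid = y - np.array([y[group == g].mean() for g in group])  # subtract group means
print("raw data  p =", stats.shapiro(y).pvalue)      # typically rejects normality
print("residuals p =", stats.shapiro(resid).pvalue)  # typically does not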
Factors affecting dental service quality.
Bahadori, Mohammadkarim; Raadabadi, Mehdi; Ravangard, Ramin; Baldacchino, Donia
2015-01-01
Measuring dental clinic service quality is the first and most important factor in improving care. The quality provided plays an important role in patient satisfaction. The purpose of this paper is to identify factors affecting dental service quality from the patients' viewpoint. This cross-sectional, descriptive-analytical study was conducted in a dental clinic in Tehran between January and June 2014. A sample of 385 patients was selected from two work shifts using stratified sampling proportional to size and simple random sampling methods. The data were collected using a self-administered questionnaire designed for the purpose of the study and based on Parasuraman and Zeithaml's model of service quality, which consisted of two parts: the patients' demographic characteristics and a 30-item questionnaire to measure the five dimensions of service quality. The collected data were analysed using SPSS 21.0 and Amos 18.0, with descriptive statistics such as the mean and standard deviation as well as analytical methods including confirmatory factor analysis (CFA). Results showed that the correlation coefficients for all dimensions were higher than 0.5. In this model, assurance (regression weight=0.99) and tangibility (regression weight=0.86) had, respectively, the highest and lowest effects on dental service quality. Parasuraman and Zeithaml's model is suitable for measuring quality in dental services, and the variables related to dental service quality were constructed according to the model. This is a pioneering study that uses Parasuraman and Zeithaml's model and CFA in a dental setting. This study provides useful insights and guidance for dental service quality assurance.
A Novel Approach to Implement Takagi-Sugeno Fuzzy Models.
Chang, Chia-Wen; Tao, Chin-Wang
2017-09-01
This paper proposes new algorithms based on the fuzzy c-regression model algorithm for Takagi-Sugeno (T-S) fuzzy modeling of complex nonlinear systems. A fuzzy c-regression state model (FCRSM) algorithm is a T-S fuzzy model in which the functional antecedent and the state-space-model-type consequent are considered with the available input-output data. The antecedent and consequent forms of the proposed FCRSM offer two main advantages: first, the FCRSM has a low computation load because only one input variable is considered in the antecedent part; second, the unknown system can be modeled not only in polynomial form but also in state-space form. Moreover, the FCRSM can be extended to the FCRSM-ND and FCRSM-Free algorithms. The FCRSM-ND algorithm is presented to find the T-S fuzzy state-space model of a nonlinear system when the input-output data cannot be precollected and an assumed effective controller is available. In practical applications, the mathematical model of the controller may be hard to obtain. In this case, an online tuning algorithm, FCRSM-Free, is designed such that the parameters of a T-S fuzzy controller and the T-S fuzzy state model of an unknown system can be tuned online simultaneously. Four numerical simulations are given to demonstrate the effectiveness of the proposed approach.
ERIC Educational Resources Information Center
Li, Spencer D.
2011-01-01
Mediation analysis in child and adolescent development research is possible using large secondary data sets. This article provides an overview of two statistical methods commonly used to test mediated effects in secondary analysis: multiple regression and structural equation modeling (SEM). Two empirical studies are presented to illustrate the…
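A minimal regression-based sketch of a mediated effect, the classic three-equation decomposition, is shown below (Python/statsmodels, simulated data; this is a generic illustration, not the article's examples):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 500
x = rng.normal(size=n)                    # predictor
m = 0.5 * x + rng.normal(size=n)          # mediator
y = 0.4 * m + 0.1 * x + rng.normal(size=n)

total = sm.OLS(y, sm.add_constant(x)).fit()
a = sm.OLS(m, sm.add_constant(x)).fit()
b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()
print(f"total: {total.params[1]:.2f}, "
      f"indirect (a*b): {a.params[1] * b.params[2]:.2f}, "
      f"direct: {b.params[1]:.2f}")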
ERIC Educational Resources Information Center
Jaeger, Audrey J.; Hinz, Derik
2009-01-01
Part-time faculty clearly serve a valuable purpose in higher education; however, their increased use raises concerns for administrators, faculty, and policy makers. Part-time faculty members spend a greater proportion of their overall time teaching, but the initial evidence suggests that these instructors are less available to students and are…
Modeling Governance KB with CATPCA to Overcome Multicollinearity in the Logistic Regression
NASA Astrophysics Data System (ADS)
Khikmah, L.; Wijayanto, H.; Syafitri, U. D.
2017-04-01
A problem often encountered in logistic regression modeling is multicollinearity. Multicollinearity between explanatory variables results in biased parameter estimates, and it can also produce errors in classification. Stepwise regression is generally used to overcome multicollinearity in regression. There is also another method, which keeps all variables involved for prediction: Principal Component Analysis (PCA). However, classical PCA is only suitable for numeric data; when the data are categorical, one method to solve the problem is Categorical Principal Component Analysis (CATPCA). The data used in this research were a part of the Demographic and Population Survey Indonesia (IDHS) 2012. This research focuses on the characteristics of women using contraceptive methods. Classification results were evaluated using Area Under Curve (AUC) values; the higher the AUC value, the better. Based on AUC values, the classification of contraceptive method use by the stepwise method (58.66%) is better than that by the logistic regression model (57.39%) and CATPCA (57.39%). Evaluation of the logistic regression results using sensitivity shows the opposite, where the CATPCA method (99.79%) is better than the logistic regression method (92.43%) and stepwise (92.05%). Because this study focuses on the major class (using a contraceptive method), the selected model is CATPCA, as it raises the accuracy for the major class.
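The decorrelate-then-classify idea can be sketched with classical PCA on numeric stand-ins (Python/scikit-learn; CATPCA itself relies on optimal scaling of categorical variables, which scikit-learn does not provide):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(14)
n = 400
base = rng.normal(size=(n, 3))
X = np.hstack([base, base + rng.normal(scale=0.05, size=(n, 3))])  # collinear pairs
y = (base[:, 0] + rng.normal(size=n) > 0).astype(int)

pipe = make_pipeline(PCA(n_components=3), LogisticRegression())
print("AUC:", cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean())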
Comparison and Contrast of Two General Functional Regression Modeling Frameworks
Morris, Jeffrey S.
2017-01-01
In this article, Greven and Scheipl describe an impressively general framework for performing functional regression that builds upon the generalized additive modeling framework. Over the past number of years, my collaborators and I have also been developing a general framework for functional regression, functional mixed models, which shares many similarities with this framework, but has many differences as well. In this discussion, I compare and contrast these two frameworks, to hopefully illuminate characteristics of each, highlighting their respective strengths and weaknesses, and providing recommendations regarding the settings in which each approach might be preferable. PMID:28736502
Developing a predictive tropospheric ozone model for Tabriz
NASA Astrophysics Data System (ADS)
Khatibi, Rahman; Naghipour, Leila; Ghorbani, Mohammad A.; Smith, Michael S.; Karimi, Vahid; Farhoudi, Reza; Delafrouz, Hadi; Arvanaghi, Hadi
2013-04-01
Predictive ozone models are becoming indispensable tools, providing pollution alerts for people who are vulnerable to the risks. We have developed a tropospheric ozone prediction capability for Tabriz, Iran, using the following five modeling strategies: three regression-type methods, Multiple Linear Regression (MLR), Artificial Neural Networks (ANNs), and Gene Expression Programming (GEP); and two auto-regression-type models, Nonlinear Local Prediction (NLP), which implements chaos theory, and Auto-Regressive Integrated Moving Average (ARIMA) models. The regression-type strategies explain the data in terms of temperature, solar radiation, dew point temperature, and wind speed; the auto-regression-type models regress present ozone values on their past values. The ozone time series are available at various time intervals, including hourly intervals, from August 2010 to March 2011. The results for the MLR, ANN, and GEP models are not overly good, but those produced by NLP and ARIMA are promising for establishing a forecasting capability.
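A minimal sketch of the ARIMA strategy on an hourly ozone-like series (Python/statsmodels; the data are synthetic and the order (1, 0, 1) is an assumption, not the study's fitted model):

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(15)
hours = pd.date_range("2010-08-01", periods=500, freq="h")
ozone = (30 + 10 * np.sin(2 * np.pi * np.arange(500) / 24)
         + rng.normal(scale=2, size=500))         # diurnal cycle + noise
series = pd.Series(ozone, index=hours)

fit = ARIMA(series, order=(1, 0, 1)).fit()
print(fit.forecast(steps=24))  # next-day hourly predictions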
Efficient least angle regression for identification of linear-in-the-parameters models
Beach, Thomas H.; Rezgui, Yacine
2017-01-01
Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods, in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance through sacrificing part of model bias property in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. The direct involvement of matrix inversions is thereby relieved. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140
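For reference, the stock LARS implementation in scikit-learn applied to a linear-in-the-parameters model looks as follows (simulated data; the recursive correlation updates described above are internal to the authors' algorithm and not reproduced here):

import numpy as np
from sklearn.linear_model import Lars

rng = np.random.default_rng(16)
X = rng.normal(size=(100, 20))                     # candidate model terms
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.1, size=100)

model = Lars(n_nonzero_coefs=5).fit(X, y)
print("selected terms:", np.flatnonzero(model.coef_))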
Milne, Roger L.; Herranz, Jesús; Michailidou, Kyriaki; Dennis, Joe; Tyrer, Jonathan P.; Zamora, M. Pilar; Arias-Perez, José Ignacio; González-Neira, Anna; Pita, Guillermo; Alonso, M. Rosario; Wang, Qin; Bolla, Manjeet K.; Czene, Kamila; Eriksson, Mikael; Humphreys, Keith; Darabi, Hatef; Li, Jingmei; Anton-Culver, Hoda; Neuhausen, Susan L.; Ziogas, Argyrios; Clarke, Christina A.; Hopper, John L.; Dite, Gillian S.; Apicella, Carmel; Southey, Melissa C.; Chenevix-Trench, Georgia; Swerdlow, Anthony; Ashworth, Alan; Orr, Nicholas; Schoemaker, Minouk; Jakubowska, Anna; Lubinski, Jan; Jaworska-Bieniek, Katarzyna; Durda, Katarzyna; Andrulis, Irene L.; Knight, Julia A.; Glendon, Gord; Mulligan, Anna Marie; Bojesen, Stig E.; Nordestgaard, Børge G.; Flyger, Henrik; Nevanlinna, Heli; Muranen, Taru A.; Aittomäki, Kristiina; Blomqvist, Carl; Chang-Claude, Jenny; Rudolph, Anja; Seibold, Petra; Flesch-Janys, Dieter; Wang, Xianshu; Olson, Janet E.; Vachon, Celine; Purrington, Kristen; Winqvist, Robert; Pylkäs, Katri; Jukkola-Vuorinen, Arja; Grip, Mervi; Dunning, Alison M.; Shah, Mitul; Guénel, Pascal; Truong, Thérèse; Sanchez, Marie; Mulot, Claire; Brenner, Hermann; Dieffenbach, Aida Karina; Arndt, Volker; Stegmaier, Christa; Lindblom, Annika; Margolin, Sara; Hooning, Maartje J.; Hollestelle, Antoinette; Collée, J. Margriet; Jager, Agnes; Cox, Angela; Brock, Ian W.; Reed, Malcolm W.R.; Devilee, Peter; Tollenaar, Robert A.E.M.; Seynaeve, Caroline; Haiman, Christopher A.; Henderson, Brian E.; Schumacher, Fredrick; Le Marchand, Loic; Simard, Jacques; Dumont, Martine; Soucy, Penny; Dörk, Thilo; Bogdanova, Natalia V.; Hamann, Ute; Försti, Asta; Rüdiger, Thomas; Ulmer, Hans-Ulrich; Fasching, Peter A.; Häberle, Lothar; Ekici, Arif B.; Beckmann, Matthias W.; Fletcher, Olivia; Johnson, Nichola; dos Santos Silva, Isabel; Peto, Julian; Radice, Paolo; Peterlongo, Paolo; Peissel, Bernard; Mariani, Paolo; Giles, Graham G.; Severi, Gianluca; Baglietto, Laura; Sawyer, Elinor; Tomlinson, Ian; Kerin, Michael; Miller, Nicola; Marme, Federik; Burwinkel, Barbara; Mannermaa, Arto; Kataja, Vesa; Kosma, Veli-Matti; Hartikainen, Jaana M.; Lambrechts, Diether; Yesilyurt, Betul T.; Floris, Giuseppe; Leunen, Karin; Alnæs, Grethe Grenaker; Kristensen, Vessela; Børresen-Dale, Anne-Lise; García-Closas, Montserrat; Chanock, Stephen J.; Lissowska, Jolanta; Figueroa, Jonine D.; Schmidt, Marjanka K.; Broeks, Annegien; Verhoef, Senno; Rutgers, Emiel J.; Brauch, Hiltrud; Brüning, Thomas; Ko, Yon-Dschun; Couch, Fergus J.; Toland, Amanda E.; Yannoukakos, Drakoulis; Pharoah, Paul D.P.; Hall, Per; Benítez, Javier; Malats, Núria; Easton, Douglas F.
2014-01-01
Part of the substantial unexplained familial aggregation of breast cancer may be due to interactions between common variants, but few studies have had adequate statistical power to detect interactions of realistic magnitude. We aimed to assess all two-way interactions in breast cancer susceptibility between 70 917 single nucleotide polymorphisms (SNPs) selected primarily based on prior evidence of a marginal effect. Thirty-eight international studies contributed data for 46 450 breast cancer cases and 42 461 controls of European origin as part of a multi-consortium project (COGS). First, SNPs were preselected based on evidence (P < 0.01) of a per-allele main effect, and all two-way combinations of those were evaluated by a per-allele (1 d.f.) test for interaction using logistic regression. Second, all 2.5 billion possible two-SNP combinations were evaluated using Boolean operation-based screening and testing, and SNP pairs with the strongest evidence of interaction (P < 10−4) were selected for more careful assessment by logistic regression. Under the first approach, 3277 SNPs were preselected, but an evaluation of all possible two-SNP combinations (1 d.f.) identified no interactions at P < 10−8. Results from the second analytic approach were consistent with those from the first (P > 10−10). In summary, we observed little evidence of two-way SNP interactions in breast cancer susceptibility, despite the large number of SNPs with potential marginal effects considered and the very large sample size. This finding may have important implications for risk prediction, simplifying the modelling required. Further comprehensive, large-scale genome-wide interaction studies may identify novel interacting loci if the inherent logistic and computational challenges can be overcome. PMID:24242184
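The per-allele 1 d.f. interaction test for a single SNP pair reduces to a likelihood-ratio comparison of two logistic models. A sketch (Python/statsmodels; genotypes and effect sizes are simulated, not COGS data):

import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(17)
n = 5000
g1, g2 = rng.binomial(2, 0.3, n), rng.binomial(2, 0.2, n)  # allele counts 0/1/2
y = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.2 * g1 + 0.1 * g2))))

X0 = sm.add_constant(np.column_stack([g1, g2]))            # main effects only
X1 = sm.add_constant(np.column_stack([g1, g2, g1 * g2]))   # plus interaction
lr = 2 * (sm.Logit(y, X1).fit(disp=0).llf - sm.Logit(y, X0).fit(disp=0).llf)
print("interaction P =", stats.chi2.sf(lr, df=1))          # 1 d.f. LRT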
Milne, Roger L; Herranz, Jesús; Michailidou, Kyriaki; Dennis, Joe; Tyrer, Jonathan P; Zamora, M Pilar; Arias-Perez, José Ignacio; González-Neira, Anna; Pita, Guillermo; Alonso, M Rosario; Wang, Qin; Bolla, Manjeet K; Czene, Kamila; Eriksson, Mikael; Humphreys, Keith; Darabi, Hatef; Li, Jingmei; Anton-Culver, Hoda; Neuhausen, Susan L; Ziogas, Argyrios; Clarke, Christina A; Hopper, John L; Dite, Gillian S; Apicella, Carmel; Southey, Melissa C; Chenevix-Trench, Georgia; Swerdlow, Anthony; Ashworth, Alan; Orr, Nicholas; Schoemaker, Minouk; Jakubowska, Anna; Lubinski, Jan; Jaworska-Bieniek, Katarzyna; Durda, Katarzyna; Andrulis, Irene L; Knight, Julia A; Glendon, Gord; Mulligan, Anna Marie; Bojesen, Stig E; Nordestgaard, Børge G; Flyger, Henrik; Nevanlinna, Heli; Muranen, Taru A; Aittomäki, Kristiina; Blomqvist, Carl; Chang-Claude, Jenny; Rudolph, Anja; Seibold, Petra; Flesch-Janys, Dieter; Wang, Xianshu; Olson, Janet E; Vachon, Celine; Purrington, Kristen; Winqvist, Robert; Pylkäs, Katri; Jukkola-Vuorinen, Arja; Grip, Mervi; Dunning, Alison M; Shah, Mitul; Guénel, Pascal; Truong, Thérèse; Sanchez, Marie; Mulot, Claire; Brenner, Hermann; Dieffenbach, Aida Karina; Arndt, Volker; Stegmaier, Christa; Lindblom, Annika; Margolin, Sara; Hooning, Maartje J; Hollestelle, Antoinette; Collée, J Margriet; Jager, Agnes; Cox, Angela; Brock, Ian W; Reed, Malcolm W R; Devilee, Peter; Tollenaar, Robert A E M; Seynaeve, Caroline; Haiman, Christopher A; Henderson, Brian E; Schumacher, Fredrick; Le Marchand, Loic; Simard, Jacques; Dumont, Martine; Soucy, Penny; Dörk, Thilo; Bogdanova, Natalia V; Hamann, Ute; Försti, Asta; Rüdiger, Thomas; Ulmer, Hans-Ulrich; Fasching, Peter A; Häberle, Lothar; Ekici, Arif B; Beckmann, Matthias W; Fletcher, Olivia; Johnson, Nichola; dos Santos Silva, Isabel; Peto, Julian; Radice, Paolo; Peterlongo, Paolo; Peissel, Bernard; Mariani, Paolo; Giles, Graham G; Severi, Gianluca; Baglietto, Laura; Sawyer, Elinor; Tomlinson, Ian; Kerin, Michael; Miller, Nicola; Marme, Federik; Burwinkel, Barbara; Mannermaa, Arto; Kataja, Vesa; Kosma, Veli-Matti; Hartikainen, Jaana M; Lambrechts, Diether; Yesilyurt, Betul T; Floris, Giuseppe; Leunen, Karin; Alnæs, Grethe Grenaker; Kristensen, Vessela; Børresen-Dale, Anne-Lise; García-Closas, Montserrat; Chanock, Stephen J; Lissowska, Jolanta; Figueroa, Jonine D; Schmidt, Marjanka K; Broeks, Annegien; Verhoef, Senno; Rutgers, Emiel J; Brauch, Hiltrud; Brüning, Thomas; Ko, Yon-Dschun; Couch, Fergus J; Toland, Amanda E; Yannoukakos, Drakoulis; Pharoah, Paul D P; Hall, Per; Benítez, Javier; Malats, Núria; Easton, Douglas F
2014-04-01
Part of the substantial unexplained familial aggregation of breast cancer may be due to interactions between common variants, but few studies have had adequate statistical power to detect interactions of realistic magnitude. We aimed to assess all two-way interactions in breast cancer susceptibility between 70,917 single nucleotide polymorphisms (SNPs) selected primarily based on prior evidence of a marginal effect. Thirty-eight international studies contributed data for 46,450 breast cancer cases and 42,461 controls of European origin as part of a multi-consortium project (COGS). First, SNPs were preselected based on evidence (P < 0.01) of a per-allele main effect, and all two-way combinations of those were evaluated by a per-allele (1 d.f.) test for interaction using logistic regression. Second, all 2.5 billion possible two-SNP combinations were evaluated using Boolean operation-based screening and testing, and SNP pairs with the strongest evidence of interaction (P < 10⁻⁴) were selected for more careful assessment by logistic regression. Under the first approach, 3277 SNPs were preselected, but an evaluation of all possible two-SNP combinations (1 d.f.) identified no interactions at P < 10⁻⁸. Results from the second analytic approach were consistent with those from the first (P > 10⁻¹⁰). In summary, we observed little evidence of two-way SNP interactions in breast cancer susceptibility, despite the large number of SNPs with potential marginal effects considered and the very large sample size. This finding may have important implications for risk prediction, simplifying the modelling required. Further comprehensive, large-scale genome-wide interaction studies may identify novel interacting loci if the inherent logistic and computational challenges can be overcome. PMID:24242184
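For illustration, a minimal sketch of the 1-d.f. per-allele interaction test described above, on synthetic data: each SNP is coded as a 0/1/2 minor-allele count, and a logistic model with the product term is compared to one without it by a likelihood-ratio test. The data, sample size, and effect sizes are invented, not taken from COGS.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 5000
snp1 = rng.binomial(2, 0.3, n)            # minor-allele counts for SNP 1
snp2 = rng.binomial(2, 0.2, n)            # minor-allele counts for SNP 2
logit_p = -1.0 + 0.1 * snp1 + 0.05 * snp2
y = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))   # case/control status

X0 = sm.add_constant(np.column_stack([snp1, snp2]))              # main effects only
X1 = sm.add_constant(np.column_stack([snp1, snp2, snp1 * snp2])) # plus interaction

fit0 = sm.Logit(y, X0).fit(disp=0)
fit1 = sm.Logit(y, X1).fit(disp=0)
lr = 2 * (fit1.llf - fit0.llf)            # 1-d.f. likelihood-ratio statistic
p_int = stats.chi2.sf(lr, df=1)
print(f"LR = {lr:.3f}, interaction P = {p_int:.3g}")
```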
Mohamed, Omar Ahmed; Masood, Syed Hasan; Bhowmik, Jahar Lal
2016-11-04
Fused deposition modeling (FDM) additive manufacturing has been intensively used for many industrial applications due to its attractive advantages over traditional manufacturing processes. The process parameters used in FDM have significant influence on the part quality and its properties. This process produces the plastic part through complex mechanisms and it involves complex relationships between the manufacturing conditions and the quality of the processed part. In the present study, the influence of multi-level manufacturing parameters on the temperature-dependent dynamic mechanical properties of FDM processed parts was investigated using IV-optimality response surface methodology (RSM) and multilayer feed-forward neural networks (MFNNs). The process parameters considered for optimization and investigation are slice thickness, raster to raster air gap, deposition angle, part print direction, bead width, and number of perimeters. Storage compliance and loss compliance were considered as response variables. The effect of each process parameter was investigated using developed regression models and multiple regression analysis. The surface characteristics are studied using scanning electron microscope (SEM). Furthermore, performance of optimum conditions was determined and validated by conducting confirmation experiment. The comparison between the experimental values and the predicted values by IV-Optimal RSM and MFNN was conducted for each experimental run and results indicate that the MFNN provides better predictions than IV-Optimal RSM. PMID:28774019
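As a loose sketch of the kind of comparison described above (not the authors' pipeline), the snippet below fits a quadratic response surface as a stand-in for the IV-optimal RSM model and a small multilayer feed-forward network, then compares their held-out errors on invented process-parameter data.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(15)
X = rng.uniform(0, 1, size=(80, 3))   # e.g. slice thickness, air gap, bead width
y = 2 * X[:, 0] ** 2 + np.sin(3 * X[:, 1]) + 0.1 * rng.normal(size=80)

Xtr, Xte, ytr, yte = X[:60], X[60:], y[:60], y[60:]
quad = PolynomialFeatures(degree=2)                   # quadratic response surface
rsm = LinearRegression().fit(quad.fit_transform(Xtr), ytr)
mfnn = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000,
                    random_state=0).fit(Xtr, ytr)     # feed-forward network

print("RSM  MSE:", mean_squared_error(yte, rsm.predict(quad.transform(Xte))))
print("MFNN MSE:", mean_squared_error(yte, mfnn.predict(Xte)))
```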
Erosion and soil displacement related to timber harvesting in northwestern California, U.S.A.
R.M. Rice; D.J. Furbish
1984-01-01
The relationship between measures of site disturbance and erosion resulting from timber harvest was studied by regression analyses. None of the 12 regression models developed and tested yielded a coefficient of determination (R2) greater than 0.60. The results indicated that the poor fits to the data were due, in part, to unexplained qualitative...
"Erosion and soil displacement related to timber harvesting in northwestern California, U.S.A."
R. M. Rice; D. J. Furbish
1984-01-01
The relationship between measures of site disturbance and erosion resulting from timber harvest was studied by regression analyses. None of the 12 regression models developed and tested yielded a coefficient of determination (R 2) greater than 0.60. The results indicated that the poor fits to the data were due, in part, to unexplained qualitative differences in...
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-11-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation by a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach has been justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test if the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and if the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration evolution over time, c(t), in the chamber headspace and for estimation of the initial CO2 fluxes at closure time for the majority of experiments. However, a rather large percentage of the exponential regression functions showed curvatures not consistent with the theoretical model, which is considered to be caused by violations of the underlying model assumptions. In particular, the effects of turbulence and of pressure disturbances caused by chamber deployment are suspected to have caused unexplainable curvatures. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes. The degree of underestimation increased with increasing CO2 flux strength and was dependent on soil and vegetation conditions, which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect by linear regression was observed to be different for CO2 uptake and release situations, which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
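A minimal sketch (assuming a simple exponential form, not necessarily the authors' exact formulation) of fitting a nonlinear concentration curve to closed-chamber data and taking its initial slope as the flux estimate; the chamber volume-to-area ratio and the synthetic readings are invented.

```python
import numpy as np
from scipy.optimize import curve_fit

def c_model(t, c_s, c0, k):
    """Exponential approach from initial concentration c0 to asymptote c_s."""
    return c_s + (c0 - c_s) * np.exp(-k * t)

t = np.arange(0, 121, 5)                  # seconds after chamber closure
c_obs = c_model(t, 520.0, 380.0, 0.01) \
        + np.random.default_rng(1).normal(0, 1.5, t.size)

popt, _ = curve_fit(c_model, t, c_obs, p0=[c_obs[-1], c_obs[0], 0.005])
c_s, c0, k = popt
dcdt0 = k * (c_s - c0)                    # dc/dt at t = 0 (ppm per second)
V_over_A = 0.3                            # assumed chamber volume/area ratio (m)
flux0 = dcdt0 * V_over_A                  # initial flux estimate
print(f"initial slope {dcdt0:.3f} ppm/s, flux {flux0:.3f}")
```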
CDF and PDF Comparison Between Humacao, Puerto Rico and Florida
NASA Technical Reports Server (NTRS)
Gonzalez-Rodriguez, Rosana
2004-01-01
Knowledge of atmospheric phenomena is an important part of communication system design. The principal factor contributing to attenuation in a Ka-band communication system is rain attenuation. We have four years of tropical-region observations; the data in the tropical region were taken in Humacao, Puerto Rico. Previous data had been collected in various climate regions such as deserts, temperate areas, and sub-tropical regions. Figure 1 shows the ITU-R rain zone map for North America. Rain rates are important to rain attenuation prediction models. The models that predict attenuation are generally of two kinds. The first is regression models: using a data set, these models provide an idea of the observed attenuation and rain rate distributions in the present, past, and future. The second kind is physical models, which use probability density functions (PDF).
Evaluation of trends in wheat yield models
NASA Technical Reports Server (NTRS)
Ferguson, M. C.
1982-01-01
Trend terms in models for wheat yield in the U.S. Great Plains for the years 1932 to 1976 are evaluated. The subset of meteorological variables yielding the largest adjusted R² is selected using the method of leaps and bounds. Latent root regression is used to eliminate multicollinearities, and generalized ridge regression is used to introduce bias to provide stability in the data matrix. The regression model used provides for two trends in each of two models: a dependent model in which the trend line is piece-wise continuous, and an independent model in which the trend line is discontinuous at the year of the slope change. It was found that the trend lines best describing the wheat yields consisted of combinations of increasing, decreasing, and constant trend: four combinations for the dependent model and seven for the independent model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Butler, W.J.; Kalasinski, L.A.
In this paper, a generalized logistic regression model for correlated observations is used to analyze epidemiologic data on the frequency of spontaneous abortion among a group of women office workers. The results are compared to those obtained from the use of the standard logistic regression model that assumes statistical independence among all the pregnancies contributed by one woman. In this example, the correlation among pregnancies from the same woman is fairly small and did not have a substantial impact on the magnitude of estimates of parameters of the model. This is due at least partly to the small average number of pregnancies contributed by each woman.
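One standard way to implement a logistic model for correlated binary outcomes of this kind is a GEE fit with an exchangeable working correlation; the sketch below uses hypothetical variable names and simulated pregnancies, and is a stand-in rather than the paper's exact model.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n_women, preg_per_woman = 400, 3
df = pd.DataFrame({
    "woman_id": np.repeat(np.arange(n_women), preg_per_woman),
    "exposed": np.repeat(rng.binomial(1, 0.4, n_women), preg_per_woman),
})
df["abortion"] = rng.binomial(1, 0.1 + 0.05 * df["exposed"])

# GEE logistic regression; pregnancies are clustered within women
gee = smf.gee("abortion ~ exposed", groups="woman_id", data=df,
              family=sm.families.Binomial(),
              cov_struct=sm.cov_struct.Exchangeable()).fit()
print(gee.summary())
```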
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kersaudy, Pierric, E-mail: pierric.kersaudy@orange.com; Whist Lab, 38 avenue du Général Leclerc, 92130 Issy-les-Moulineaux; ESYCOM, Université Paris-Est Marne-la-Vallée, 5 boulevard Descartes, 77700 Marne-la-Vallée
2015-04-01
In numerical dosimetry, the recent advances in high performance computing led to a strong reduction of the required computational time to assess the specific absorption rate (SAR) characterizing the human exposure to electromagnetic waves. However, this procedure remains time-consuming and a single simulation can request several hours. As a consequence, the influence of uncertain input parameters on the SAR cannot be analyzed using crude Monte Carlo simulation. The solution presented here to perform such an analysis is surrogate modeling. This paper proposes a novel approach to build such a surrogate model from a design of experiments. Considering a sparse representation of the polynomial chaos expansions using least-angle regression as a selection algorithm to retain the most influential polynomials, this paper proposes to use the selected polynomials as regression functions for the universal Kriging model. The leave-one-out cross validation is used to select the optimal number of polynomials in the deterministic part of the Kriging model. The proposed approach, called LARS-Kriging-PC modeling, is applied to three benchmark examples and then to a full-scale metamodeling problem involving the exposure of a numerical fetus model to a femtocell device. The performances of the LARS-Kriging-PC are compared to an ordinary Kriging model and to a classical sparse polynomial chaos expansion. The LARS-Kriging-PC appears to have better performances than the two other approaches. A significant accuracy improvement is observed compared to the ordinary Kriging or to the sparse polynomial chaos depending on the studied case. This approach seems to be an optimal solution between the two other classical approaches. A global sensitivity analysis is finally performed on the LARS-Kriging-PC model of the fetus exposure problem.
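A much-simplified sketch of the LARS-then-Kriging idea: least-angle regression selects a sparse set of polynomial terms, which serve as a trend whose residuals are modelled by a Gaussian process. This is a crude stand-in for universal Kriging with a polynomial-chaos trend and leave-one-out selection; the design and response are invented.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Lars
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(60, 3))       # design of experiments (3 inputs)
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.1 * rng.normal(size=60)

poly = PolynomialFeatures(degree=3, include_bias=False)
P = poly.fit_transform(X)
trend = Lars(n_nonzero_coefs=5).fit(P, y)  # sparse selection of trend terms
resid = y - trend.predict(P)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, resid)

X_new = rng.uniform(-1, 1, size=(5, 3))
y_hat = trend.predict(poly.transform(X_new)) + gp.predict(X_new)
```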
Forsman, Henrietta; Rudman, Ann; Gustavsson, Petter; Ehrenberg, Anna; Wallin, Lars
2012-05-18
Nurses' research utilization (RU) as part of evidence-based practice is strongly emphasized in today's nursing education and clinical practice. The primary aim of RU is to provide high-quality nursing care to patients. Data on newly graduated nurses' RU are scarce, but a predominance of low use has been reported in recent studies. Factors associated with nurses' RU have previously been identified among individual and organizational/contextual factors, but there is a lack of knowledge about how these factors, including educational ones, interact with each other and with RU, particularly in nurses during the first years after graduation. The purpose of this study was therefore to identify factors that predict the probability for low RU among registered nurses two years after graduation. Data were collected as part of the LANE study (Longitudinal Analysis of Nursing Education), a Swedish national survey of nursing students and registered nurses. Data on nurses' instrumental, conceptual, and persuasive RU were collected two years after graduation (2007, n = 845), together with data on work contextual factors. Data on individual and educational factors were collected in the first year (2002) and last term of education (2004). Guided by an analytic schedule, bivariate analyses, followed by logistic regression modeling, were applied. Of the variables associated with RU in the bivariate analyses, six were found to be significantly related to low RU in the final logistic regression model: work in the psychiatric setting, role ambiguity, sufficient staffing, low work challenge, being male, and low student activity. A number of factors associated with nurses' low extent of RU two years postgraduation were found, most of them potentially modifiable. These findings illustrate the multitude of factors related to low RU extent and take their interrelationships into account. This knowledge might serve as useful input in planning future studies aiming to improve nurses', specifically newly graduated nurses', RU. PMID:22607663
Survival Data and Regression Models
NASA Astrophysics Data System (ADS)
Grégoire, G.
2014-12-01
We start this chapter by introducing some basic elements of the analysis of censored survival data. Then we focus on right-censored data and develop two types of regression models. The first concerns the so-called accelerated failure time (AFT) models, which are parametric models where a function of a parameter depends linearly on the covariables. The second is a semiparametric model, where the covariables enter in a multiplicative form in the expression of the hazard rate function. The main statistical tool for analysing these regression models is the maximum likelihood methodology and, although we recall some essential results about ML theory, we refer to the chapter "Logistic Regression" for a more detailed presentation.
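As a hedged illustration of the semiparametric (multiplicative-hazard) model described above, the sketch below fits a Cox-type proportional hazards regression on simulated right-censored data using statsmodels' PHReg; the covariable and censoring mechanism are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(16)
n = 500
x = rng.binomial(1, 0.5, n)                     # a single binary covariable
t_event = rng.exponential(1 / np.exp(0.7 * x))  # hazard multiplied by exp(0.7 x)
t_cens = rng.exponential(2.0, n)                # independent censoring times
time = np.minimum(t_event, t_cens)
status = (t_event <= t_cens).astype(int)        # 1 = event observed, 0 = censored

fit = sm.PHReg(time, x[:, None], status=status).fit()
print(fit.summary())                            # log hazard ratio should be near 0.7
```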
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
Tøndel, Kristin; Indahl, Ulf G; Gjuvsland, Arne B; Vik, Jon Olav; Hunter, Peter; Omholt, Stig W; Martens, Harald
2011-06-01
Deterministic dynamic models of complex biological systems contain a large number of parameters and state variables, related through nonlinear differential equations with various types of feedback. A metamodel of such a dynamic model is a statistical approximation model that maps variation in parameters and initial conditions (inputs) to variation in features of the trajectories of the state variables (outputs) throughout the entire biologically relevant input space. A sufficiently accurate mapping can be exploited both instrumentally and epistemically. Multivariate regression methodology is a commonly used approach for emulating dynamic models. However, when the input-output relations are highly nonlinear or non-monotone, a standard linear regression approach is prone to give suboptimal results. We therefore hypothesised that a more accurate mapping can be obtained by locally linear or locally polynomial regression. We present here a new method for local regression modelling, Hierarchical Cluster-based PLS regression (HC-PLSR), where fuzzy C-means clustering is used to separate the data set into parts according to the structure of the response surface. We compare the metamodelling performance of HC-PLSR with polynomial partial least squares regression (PLSR) and ordinary least squares (OLS) regression on various systems: six different gene regulatory network models with various types of feedback, a deterministic mathematical model of the mammalian circadian clock and a model of the mouse ventricular myocyte function. Our results indicate that multivariate regression is well suited for emulating dynamic models in systems biology. The hierarchical approach turned out to be superior to both polynomial PLSR and OLS regression in all three test cases. The advantage, in terms of explained variance and prediction accuracy, was largest in systems with highly nonlinear functional relationships and in systems with positive feedback loops. HC-PLSR is a promising approach for metamodelling in systems biology, especially for highly nonlinear or non-monotone parameter to phenotype maps. The algorithm can be flexibly adjusted to suit the complexity of the dynamic model behaviour, inviting automation in the metamodelling of complex systems. PMID:21627852
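A toy sketch of local PLS regression in the spirit of HC-PLSR, with hard K-means standing in for the fuzzy C-means step and one PLS model fitted per cluster; the data and dimensions are invented.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 10))          # model parameters (inputs)
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

# Partition the input space, then fit one local PLS model per cluster
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
local_models = {c: PLSRegression(n_components=3).fit(X[km.labels_ == c],
                                                     y[km.labels_ == c])
                for c in range(3)}

def predict(X_new):
    """Route each new point to its cluster's local PLS model."""
    labels = km.predict(X_new)
    return np.array([local_models[c].predict(x[None, :])[0, 0]
                     for c, x in zip(labels, X_new)])
```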
Zhang, Tao; Yang, Xiaojun
2013-01-01
Watershed-wide land-cover proportions can be used to predict the in-stream non-point source pollutant loadings through regression modeling. However, the model performance can vary greatly across different study sites and among various watersheds. Existing literature has shown that this type of regression modeling tends to perform better for large watersheds than for small ones, and that such a performance variation has been largely linked with different interwatershed landscape heterogeneity levels. The purpose of this study is to further examine the previously mentioned empirical observation based on a set of watersheds in the northern part of Georgia (USA) to explore the underlying causes of the variation in model performance. Through the combined use of the neutral landscape modeling approach and a spatially explicit nutrient loading model, we tested whether the regression model performance variation over the watershed groups ranging in size is due to the different watershed landscape heterogeneity levels. We adopted three neutral landscape modeling criteria that were tied with different similarity levels in watershed landscape properties and used the nutrient loading model to estimate the nitrogen loads for these neutral watersheds. Then we compared the regression model performance for the real and neutral landscape scenarios, respectively. We found that watershed size can affect the regression model performance both directly and indirectly. Along with the indirect effect through interwatershed heterogeneity, watershed size can directly affect the model performance over the watersheds varying in size. We also found that the regression model performance can be more significantly affected by other physiographic properties shaping nitrogen delivery effectiveness than the watershed land-cover heterogeneity. This study contrasts with many existing studies because it goes beyond hypothesis formulation based on empirical observations and into hypothesis testing to explore the fundamental mechanism.
Johnston, Kenton J; Hockenberry, Jason M; Rask, Kimberly J; Cunningham, Lynn; Brigham, Kenneth L; Martin, Greg S
2015-08-01
To evaluate the impact of a pilot workplace health partner intervention delivered by a predictive health institute to university and academic medical center employees on per-member, per-month health care expenditures. We analyzed the health care claims of participants versus nonparticipants, with a 12-month baseline and 24-month intervention period. Total per-member, per-month expenditures were analyzed using two-part regression models that controlled for sex, age, health benefit plan type, medical member months, and active employment months. Our regression results found no statistical differences in total expenditures at baseline and intervention. Further sensitivity analyses controlling for high cost outliers, comorbidities, and propensity to be in the intervention group confirmed these findings. We find no difference in health care expenditures attributable to the health partner intervention. The intervention does not seem to have raised expenditures in the short term.
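A minimal two-part expenditure model of the kind referred to above, on invented data: a logistic model for any spending and an OLS model for log spending among spenders, combined under a lognormal assumption. Variable names are hypothetical and the smearing correction is replaced by the lognormal mean.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 2000
df = pd.DataFrame({"treated": rng.binomial(1, 0.5, n),
                   "age": rng.integers(25, 65, n)})
spender = rng.binomial(1, 0.6, n)
df["pmpm"] = spender * np.exp(5 + 0.01 * df["age"] + rng.normal(0, 1, n))
df["any_spend"] = (df["pmpm"] > 0).astype(int)

# Part 1: probability of any spending; Part 2: log spending among spenders
part1 = smf.logit("any_spend ~ treated + age", data=df).fit(disp=0)
part2 = smf.ols("np.log(pmpm) ~ treated + age", data=df[df["pmpm"] > 0]).fit()

# Expected expenditure = P(spend) * E[spend | spend > 0] (lognormal mean)
p_spend = part1.predict(df)
expected = p_spend * np.exp(part2.predict(df) + part2.mse_resid / 2)
```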
NASA Astrophysics Data System (ADS)
Hoffmann, P.
2018-04-01
In this study two complementary approaches have been combined to estimate the reliability of the data-driven seasonal predictability of the meteorological summer mean temperature (T_JJA) over Europe. The developed model is based on linear regressions and uses early-season predictors to estimate the target value T_JJA. We found for the Potsdam (Germany) climate station that the monthly standard deviations (σ) from January to April and the temperature mean (m) in April are good predictors to describe T_JJA after 1990. However, before 1990 the model failed. The core region where this model works is the north-eastern part of Central Europe. We also analyzed long-term trends of monthly Hess/Brezowsky weather types as possible causes of the dynamical changes. In spring, a significant increase of the occurrences of two opposite weather patterns was found: Zonal Ridge across Central Europe (BM) and Trough over Central Europe (TRM). Both currently make up about 30% of the total alternating weather systems over Europe. Other weather types are predominantly decreasing or their trends are not significant. Thus, the predictability may be attributed to these two weather types where the difference between the two Z500 composite patterns is large. This also applies to the north-eastern part of Central Europe. Finally, the detected enhanced seasonal predictability over Europe is alarming, because severe side effects may occur. One of these is more frequent climate extremes in the summer half-year.
Stamey, Timothy C.
1998-01-01
Simple and reliable methods for estimating hourly streamflow are needed for the calibration and verification of a Chattahoochee River basin model between Buford Dam and Franklin, Ga. The river basin model is being developed by Georgia Department of Natural Resources, Environmental Protection Division, as part of their Chattahoochee River Modeling Project. Concurrent streamflow data collected at 19 continuous-record, and 31 partial-record streamflow stations, were used in ordinary least-squares linear regression analyses to define estimating equations, and in verifying drainage-area prorations. The resulting regression or drainage-area ratio estimating equations were used to compute hourly streamflow at the partial-record stations. The coefficients of determination (r-squared values) for the regression estimating equations ranged from 0.90 to 0.99. Observed and estimated hourly and daily streamflow data were computed for May 1, 1995, through October 31, 1995. Comparisons of observed and estimated daily streamflow data for 12 continuous-record tributary stations, that had available streamflow data for all or part of the period from May 1, 1995, to October 31, 1995, indicate that the mean error of estimate for the daily streamflow was about 25 percent.
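A tiny illustration of the two estimation schemes named above, with made-up flows and drainage areas: an OLS regression of concurrent flows at a partial-record station on an index station, and the simple drainage-area-ratio proration.

```python
import numpy as np
import statsmodels.api as sm

q_index = np.array([120., 95., 210., 60., 150.])   # index-station flows (cfs)
q_partial = np.array([35., 28., 70., 15., 47.])    # concurrent partial-record flows

# Scheme 1: OLS regression between concurrent flows
ols = sm.OLS(q_partial, sm.add_constant(q_index)).fit()
X_new = np.column_stack([np.ones(1), [130.0]])     # index flow of 130 cfs
q_hat_regression = ols.predict(X_new)

# Scheme 2: drainage-area-ratio proration (hypothetical areas in mi^2)
area_ratio = 42.0 / 155.0
q_hat_ratio = 130.0 * area_ratio
```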
Guo, Changning; Doub, William H; Kauffman, John F
2010-08-01
Monte Carlo simulations were applied to investigate the propagation of uncertainty in both input variables and response measurements on model prediction for nasal spray product performance design of experiment (DOE) models in the first part of this study, with an initial assumption that the models perfectly represent the relationship between input variables and the measured responses. In this article, we discard the initial assumption, and extended the Monte Carlo simulation study to examine the influence of both input variable variation and product performance measurement variation on the uncertainty in DOE model coefficients. The Monte Carlo simulations presented in this article illustrate the importance of careful error propagation during product performance modeling. Our results show that the error estimates based on Monte Carlo simulation result in smaller model coefficient standard deviations than those from regression methods. This suggests that the estimated standard deviations from regression may overestimate the uncertainties in the model coefficients. Monte Carlo simulations provide a simple software solution to understand the propagation of uncertainty in complex DOE models so that design space can be specified with statistically meaningful confidence levels. (c) 2010 Wiley-Liss, Inc. and the American Pharmacists Association
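A conceptual sketch of the Monte Carlo uncertainty-propagation scheme described above: both the input settings and the responses are perturbed, the DOE regression is refitted each time, and the spread of the coefficients is inspected. The model, noise levels, and design are invented.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(20, 2))            # nominal DOE settings
beta_true = np.array([1.0, 2.0, -1.5])
X1 = np.column_stack([np.ones(20), X])
y = X1 @ beta_true + rng.normal(0, 0.1, 20)    # nominal responses

coefs = []
for _ in range(5000):
    X_p = X + rng.normal(0, 0.02, X.shape)     # input-variable uncertainty
    y_p = y + rng.normal(0, 0.1, y.shape)      # response measurement noise
    Xp1 = np.column_stack([np.ones(20), X_p])
    coefs.append(np.linalg.lstsq(Xp1, y_p, rcond=None)[0])

print(np.std(coefs, axis=0))                   # Monte Carlo coefficient SDs
```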
Developing an Adequately Specified Model of State Level Student Achievement with Multilevel Data.
ERIC Educational Resources Information Center
Bernstein, Lawrence
Limitations of using linear, unilevel regression procedures in modeling student achievement are discussed. This study is a part of a broader study that is developing an empirically-based predictive model of variables associated with academic achievement from a multilevel perspective and examining the differences by which parameters are estimated…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Braeton J.; Shaneyfelt, Calvin R.
A NISAC study on the economic effects of a hypothetical H1N1 pandemic was done in order to assess the differential impacts at the state and industry levels given changes in absenteeism, mortality, and consumer spending rates. Part of the analysis was to determine if there were any direct relationships between pandemic impacts and gross domestic product (GDP) losses. Multiple regression analysis was used because it shows very clearly which predictors are significant in their impact on GDP. GDP impact data taken from the REMI PI+ (Regional Economic Models, Inc., Policy Insight +) model was used to serve as the response variable. NISAC economists selected the average absenteeism rate, mortality rate, and consumer spending categories as the predictor variables. Two outliers were found in the data: Nevada and Washington, DC. The analysis was done twice, with the outliers removed for the second analysis. The second set of regressions yielded a cleaner model, but for the purposes of this study, the analysts deemed it not as useful because particular interest was placed on determining the differential impacts to states. Hospitals and accommodation were found to be the most important predictors of percentage change in GDP among the consumer spending variables.
ERIC Educational Resources Information Center
Leow, Christine; Wen, Xiaoli; Korfmacher, Jon
2015-01-01
This article compares regression modeling and propensity score analysis as different types of statistical techniques used in addressing selection bias when estimating the impact of two-year versus one-year Head Start on children's school readiness. The analyses were based on the national Head Start secondary dataset. After controlling for…
Austin, Peter C; Lee, Douglas S; Steyerberg, Ewout W; Tu, Jack V
2012-01-01
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease. PMID:22777999
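A sketch of the kind of comparison reported above, on synthetic data: a random forest versus a logistic regression whose continuous predictor enters through patsy's natural cubic spline basis cr() (used here as a stand-in for restricted cubic splines). The outcome, predictors, and effect sizes are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
n = 4000
df = pd.DataFrame({"age": rng.uniform(30, 90, n),
                   "sbp": rng.normal(130, 20, n)})
p = 1 / (1 + np.exp(-(-6 + 0.00008 * (df["age"] - 40) ** 3 + 0.01 * df["sbp"])))
df["died30"] = rng.binomial(1, p)              # 30-day mortality indicator

train, test = df.iloc[:3000], df.iloc[3000:]
logit = smf.logit("died30 ~ cr(age, df=4) + sbp", data=train).fit(disp=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(
    train[["age", "sbp"]], train["died30"])

print("spline logit AUC:", roc_auc_score(test["died30"], logit.predict(test)))
print("random forest AUC:", roc_auc_score(
    test["died30"], rf.predict_proba(test[["age", "sbp"]])[:, 1]))
```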
An Exploration of Changes in the Measurement of Mammography in the National Health Interview Survey.
Gonzales, Felisa A; Willis, Gordon B; Breen, Nancy; Yan, Ting; Cronin, Kathy A; Taplin, Stephen H; Yu, Mandi
2017-11-01
Background: Using the National Health Interview Survey (NHIS), we examined the effect of question wording on estimates of past-year mammography among racially/ethnically diverse women ages 40-49 and 50-74 without a history of breast cancer. Methods: Data from one-part ("Have you had a mammogram during the past 12 months?") and two-part ("Have you ever had a mammogram"; "When did you have your most recent mammogram?") mammography history questions administered in the 2008, 2011, and 2013 NHIS were analyzed. χ² tests provided estimates of changes in mammography when question wording was either the same (two-part question) or differed (two-part question followed by one-part question) in the two survey years compared. Crosstabulations and regression models assessed the type, extent, and correlates of inconsistent responses to the two questions in 2013. Results: Reports of past-year mammography were slightly higher in years when the one-part question was asked than when the two-part question was asked. Nearly 10% of women provided inconsistent responses to the two questions asked in 2013. Black women ages 50 to 74 [adjusted OR (aOR), 1.50; 95% confidence interval (CI), 1.16-1.93] and women ages 40-49 in poor health (aOR, 2.22; 95% CI, 1.09-4.52) had higher odds of inconsistent responses; women without a usual source of care had lower odds (40-49: aOR, 0.42; 95% CI, 0.21-0.85; 50-74: aOR, 0.42; 95% CI, 0.24-0.74). Conclusions: Self-reports of mammography are sensitive to question wording. Researchers should use equivalent questions that have been designed to minimize response biases such as telescoping and social desirability. Impact: Trend analyses relying on differently worded questions may be misleading and conceal disparities. Cancer Epidemiol Biomarkers Prev; 26(11); 1611-8. ©2017 AACR.
A method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1971-01-01
A computer-oriented technique is presented for performing a nonlinear exponential regression analysis on decay-type experimental data. The technique involves the least squares procedure wherein the nonlinear problem is linearized by expansion in a Taylor series. A linear curve fitting procedure for determining the initial nominal estimates for the unknown exponential model parameters is included as an integral part of the technique. A correction matrix was derived and then applied to the nominal estimate to produce an improved set of model parameters. The solution cycle is repeated until some predetermined criterion is satisfied.
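A hedged illustration of the technique described above: a log-linear fit supplies the initial nominal estimates, and Gauss-Newton iterations derived from the Taylor-series linearization apply corrections until a convergence criterion is met. The decay model and data are invented.

```python
import numpy as np

rng = np.random.default_rng(8)
t = np.linspace(0, 10, 40)
y = 5.0 * np.exp(-0.4 * t) + rng.normal(0, 0.05, t.size)

# Step 1: initial nominal estimates from a linear fit of log(y) on t
mask = y > 0
b0, log_a0 = np.polyfit(t[mask], np.log(y[mask]), 1)
theta = np.array([np.exp(log_a0), b0])          # model y = a * exp(b t)

# Step 2: iterate corrections from the Taylor-series linearization
for _ in range(20):
    a, b = theta
    f = a * np.exp(b * t)
    J = np.column_stack([np.exp(b * t), a * t * np.exp(b * t)])  # Jacobian
    delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)            # correction
    theta += delta
    if np.max(np.abs(delta)) < 1e-10:           # convergence criterion
        break
print(theta)                                    # approximately [5.0, -0.4]
```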
USING LINEAR AND POLYNOMIAL MODELS TO EXAMINE THE ENVIRONMENTAL STABILITY OF VIRUSES
The article presents the development of model equations for describing the fate of viral infectivity in environmental samples. Most of the models were based upon the use of a two-step linear regression approach. The first step employs regression of log base 10 transformed viral t...
NASA Astrophysics Data System (ADS)
Wibowo, Wahyu; Wene, Chatrien; Budiantara, I. Nyoman; Permatasari, Erma Oktania
2017-03-01
Multiresponse semiparametric regression is a simultaneous-equation regression model that fuses parametric and nonparametric components. The regression model comprises several equations, each with two components: a parametric part and a nonparametric part. The model used here has a linear function as the parametric component and a truncated polynomial spline as the nonparametric component. The model can handle both linear and nonlinear relationships between the responses and the sets of predictor variables. The aim of this paper is to demonstrate the application of the regression model to modeling the effect of regional socio-economic conditions on the use of information technology. More specifically, the response variables are the percentage of households with internet access and the percentage of households with a personal computer, and the predictor variables are the percentage of literate people, the percentage of electrification, and the percentage of economic growth. Based on identification of the relationships between responses and predictors, economic growth is treated as a nonparametric predictor and the others as parametric predictors. The result shows that multiresponse semiparametric regression can be applied well here, as indicated by the high coefficient of determination of 90 percent.
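An illustrative sketch of a semiparametric fit of the general kind named above: parametric predictors enter linearly while the nonparametric predictor enters through a truncated-power spline basis. The predictors, knots, and data are invented, and a single response is used instead of the multiresponse system.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
n = 200
literacy = rng.uniform(80, 100, n)           # parametric predictor
growth = rng.uniform(0, 10, n)               # nonparametric predictor
y = 0.5 * literacy + np.where(growth > 5, 3 * (growth - 5), 0) \
    + rng.normal(0, 1, n)

# Truncated-power (linear) spline basis for the nonparametric component
knots = [3.0, 5.0, 7.0]
spline_terms = np.column_stack([np.clip(growth - k, 0, None) for k in knots])

X = sm.add_constant(np.column_stack([literacy, growth, spline_terms]))
fit = sm.OLS(y, X).fit()
print(fit.rsquared)
```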
Regression Models and Fuzzy Logic Prediction of TBM Penetration Rate
NASA Astrophysics Data System (ADS)
Minh, Vu Trieu; Katushin, Dmitri; Antonov, Maksim; Veinthal, Renno
2017-03-01
This paper presents statistical analyses of rock engineering properties and the measured penetration rate of tunnel boring machine (TBM) based on the data of an actual project. The aim of this study is to analyze the influence of rock engineering properties including uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW), and the alpha angle (Alpha) between the tunnel axis and the planes of weakness on the TBM rate of penetration (ROP). Four
[From clinical judgment to linear regression model].
Palacios-Cruz, Lino; Pérez, Marcela; Rivas-Ruiz, Rodolfo; Talavera, Juan O
2013-01-01
When we think about mathematical models, such as the linear regression model, we assume that these terms are only used by those engaged in research, a notion that is far from the truth. Legendre described the first mathematical model in 1805, and Galton introduced the formal term in 1886. Linear regression is one of the most commonly used regression models in clinical practice. It is useful for predicting or showing the relationship between two or more variables, as long as the dependent variable is quantitative and normally distributed. Stated another way, regression is used to predict a measure based on knowledge of at least one other variable. Linear regression has as its first objective determining the slope or inclination of the regression line: Y = a + bx, where "a" is the intercept or regression constant, equivalent to the value of "Y" when "X" equals 0, and "b" (also called the slope) indicates the increase or decrease that occurs when the variable "x" increases or decreases by one unit. In the regression line, "b" is called the regression coefficient. The coefficient of determination (R²) indicates the importance of the independent variables in the outcome.
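A worked numeric example of the quantities defined above: the slope b, the intercept a, and the coefficient of determination R², computed on made-up data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()          # value of Y when X equals 0
y_hat = a + b * x
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"Y = {a:.2f} + {b:.2f}x, R² = {r2:.3f}")
```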
Recruitment of burbot (Lota lota L.) in Lake Erie: An empirical modelling approach
Stapanian, M.A.; Witzel, L.D.; Cook, A.
2010-01-01
World-wide, many burbot Lota lota (L.) populations have been extirpated or are otherwise in need of conservation measures. By contrast, burbot made a dramatic recovery in Lake Erie during 1993-2001 but declined during 2002-2007, due in part to a sharp decrease in recruitment. We used Akaike's Information Criterion to evaluate 129 linear regression models that included all combinations of one to seven ecological indices as predictors of burbot recruitment. Two models were substantially supported by the data: (i) the number of days in which water temperatures were within optimal ranges for burbot spawning and development combined with biomass of yearling and older (YAO) yellow perch Perca flavescens (Mitchill); and (ii) biomass of YAO yellow perch. Warmer winter water temperatures and increases in yellow perch biomass were associated with decreases in burbot recruitment. Continued warm winter water temperatures could result in declines in burbot recruitment, particularly in the southern part of the species' range. Published 2010. This article is a US Government work and is in the public domain in the USA.
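A sketch of the all-subsets AIC screening described above: every combination of one to seven candidate predictors is fitted by OLS and ranked by AIC. The predictor names and data are placeholders, not the study's ecological indices.

```python
import itertools
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)
df = pd.DataFrame(rng.normal(size=(30, 7)),
                  columns=[f"x{i}" for i in range(1, 8)])
df["recruitment"] = 2 * df["x1"] - df["x3"] + rng.normal(0, 0.5, 30)

results = []
predictors = df.columns[:-1]
for k in range(1, len(predictors) + 1):
    for combo in itertools.combinations(predictors, k):
        X = sm.add_constant(df[list(combo)])
        results.append((sm.OLS(df["recruitment"], X).fit().aic, combo))

for aic, combo in sorted(results)[:3]:          # best-supported models
    print(f"AIC={aic:.1f}: {combo}")
```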
A Multilevel Model for Comorbid Outcomes: Obesity and Diabetes in the US
Congdon, Peter
2010-01-01
Multilevel models are overwhelmingly applied to single health outcomes, but when two or more health conditions are closely related, it is important that contextual variation in their joint prevalence (e.g., variations over different geographic settings) is considered. A multinomial multilevel logit regression approach for analysing joint prevalence is proposed here that includes subject level risk factors (e.g., age, race, education) while also taking account of geographic context. Data from a US population health survey (the 2007 Behavioral Risk Factor Surveillance System or BRFSS) are used to illustrate the method, with a six category multinomial outcome defined by diabetic status and weight category (obese, overweight, normal). The influence of geographic context is partly represented by known geographic variables (e.g., county poverty), and partly by a model for latent area influences. In particular, a shared latent variable (common factor) approach is proposed to measure the impact of unobserved area influences on joint weight and diabetes status, with the latent variable being spatially structured to reflect geographic clustering in risk. PMID:20616977
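A minimal sketch of the multinomial logit core of the approach above, omitting the multilevel structure and the shared latent spatial factor; the six-category outcome and covariates are simulated without any real structure, purely to show the model form.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(17)
n = 3000
df = pd.DataFrame({"age": rng.integers(18, 85, n),
                   "county_poverty": rng.uniform(5, 30, n)})
# Outcome categories 0..5: (no diabetes/diabetes) x (normal/overweight/obese)
df["category"] = rng.choice(6, size=n)

X = sm.add_constant(df[["age", "county_poverty"]])
fit = sm.MNLogit(df["category"], X).fit(disp=0)
print(fit.params.shape)   # 3 coefficients x 5 non-reference categories
```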
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radial basis function (RBF) artificial neural network (ANN) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis (PVT). The analysis included 353 patients with AP admitted between January 2011 and December 2015. An RBF ANN model and a logistic regression model were each constructed based on eleven factors relevant to AP. Statistical indexes were used to evaluate the predictive value of the two models. The sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of the RBF ANN model for predicting PVT were 73.3%, 91.4%, 68.8%, 93.0%, and 87.7%, respectively. There were significant differences between the RBF ANN and logistic regression models in these parameters (P<0.05). In addition, a comparison of the areas under the receiver operating characteristic curves of the two models showed a statistically significant difference (P<0.05). The RBF ANN model is thus better able to predict the occurrence of AP-induced PVT than the logistic regression model. D-dimer, AMY, Hct, and PT were important predictive factors for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
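For reference, the evaluation metrics reported above follow directly from a 2x2 confusion matrix; the counts below are invented to roughly reproduce the reported percentages, not the study's actual table.

```python
tp, fn, fp, tn = 33, 12, 15, 160              # hypothetical 2x2 table

sensitivity = tp / (tp + fn)                  # ~0.733
specificity = tn / (tn + fp)                  # ~0.914
ppv = tp / (tp + fp)                          # ~0.688
npv = tn / (tn + fn)                          # ~0.930
accuracy = (tp + tn) / (tp + fn + fp + tn)    # ~0.877
print(sensitivity, specificity, ppv, npv, accuracy)
```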
Spatial interpolation schemes of daily precipitation for hydrologic modeling
Hwang, Y.; Clark, M.R.; Rajagopalan, B.; Leavesley, G.
2012-01-01
Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observational points to the specific grid points. We compare and contrast the performance of regression-based statistical methods for the spatial estimation of precipitation in two hydrologically different basins and confirmed that widely used regression-based estimation schemes fail to describe the realistic spatial variability of the daily precipitation field. The methods assessed are: (1) inverse distance weighted average; (2) multiple linear regression (MLR); (3) climatological MLR; and (4) locally weighted polynomial regression (LWP). In order to improve the performance of the interpolations, the authors propose a two-step regression technique for effective daily precipitation estimation. In this simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model, before the amount of precipitation is estimated separately on wet days. This process generated the precipitation occurrence, amount, and spatial correlation effectively. A distributed hydrologic model (PRMS) was used for the impact analysis in daily time step simulation. Multiple simulations suggested noticeable differences between the input alternatives generated by three different interpolation schemes. Differences are shown in overall simulation error against the observations, degree of explained variability, and seasonal volumes. Simulated streamflows also showed different characteristics in mean, maximum, minimum, and peak flows. Given the same parameter optimization technique, LWP input showed the least streamflow error in the Alapaha basin and CMLR input showed the least error (still very close to LWP) in the Animas basin. All of the two-step interpolation inputs resulted in lower streamflow error compared to the directly interpolated inputs. © 2011 Springer-Verlag.
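A minimal sketch of the proposed two-step scheme: a logistic regression for precipitation occurrence, then a regression for amounts on wet days, combined into an expected-precipitation estimate. The single predictor (elevation) and all data are invented, and the retransformation bias correction is omitted.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 1000
elev = rng.uniform(200, 3000, n)
wet = rng.binomial(1, 1 / (1 + np.exp(-(-1 + 0.001 * elev))))
amount = np.where(wet == 1,
                  np.exp(1 + 0.0005 * elev + rng.normal(0, 0.5, n)), 0)

X = sm.add_constant(elev)
occ_model = sm.Logit(wet, X).fit(disp=0)                        # step 1: occurrence
amt_model = sm.OLS(np.log(amount[wet == 1]), X[wet == 1]).fit() # step 2: amount

p_wet = occ_model.predict(X)
expected_precip = p_wet * np.exp(amt_model.predict(X))          # combined estimate
```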
ERIC Educational Resources Information Center
Yan, Jun; Aseltine, Robert H., Jr.; Harel, Ofer
2013-01-01
Comparing regression coefficients between models when one model is nested within another is of great practical interest when two explanations of a given phenomenon are specified as linear models. The statistical problem is whether the coefficients associated with a given set of covariates change significantly when other covariates are added into…
Hsu, David
2015-09-27
Clustering methods are often used to model energy consumption for two reasons. First, clustering is often used to process data and to improve the predictive accuracy of subsequent energy models. Second, stable clusters that are reproducible with respect to non-essential changes can be used to group, target, and interpret observed subjects. However, it is well known that clustering methods are highly sensitive to the choice of algorithms and variables. This can lead to misleading assessments of predictive accuracy and mis-interpretation of clusters in policymaking. This paper therefore introduces two methods to the modeling of energy consumption in buildings: clusterwise regression, also known as latent class regression, which integrates clustering and regression simultaneously; and cluster validation methods to measure stability. Using a large dataset of multifamily buildings in New York City, clusterwise regression is compared to common two-stage algorithms that use K-means and model-based clustering with linear regression. Predictive accuracy is evaluated using 20-fold cross validation, and the stability of the perturbed clusters is measured using the Jaccard coefficient. These results show that there seems to be an inherent tradeoff between prediction accuracy and cluster stability. This paper concludes by discussing which clustering methods may be appropriate for different analytical purposes.
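A toy version of clusterwise regression (alternating hard assignments between two regression lines), as a simplified stand-in for the latent-class regression discussed above; a real analysis would use an EM-based mixture implementation and a stability measure such as the Jaccard coefficient.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 400
x = rng.uniform(0, 10, n)
z = rng.integers(0, 2, n)                       # true latent classes
y = np.where(z == 0, 1 + 2 * x, 20 - 1.5 * x) + rng.normal(0, 1, n)
X = np.column_stack([np.ones(n), x])

labels = rng.integers(0, 2, n)                  # random initial assignment
for _ in range(50):
    # Refit one regression per cluster, then reassign points to the
    # cluster whose line fits them best
    betas = [np.linalg.lstsq(X[labels == k], y[labels == k], rcond=None)[0]
             for k in (0, 1)]
    resid = np.column_stack([(y - X @ b) ** 2 for b in betas])
    new_labels = resid.argmin(axis=1)
    if np.array_equal(new_labels, labels):      # converged
        break
    labels = new_labels
print(betas)
```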
Zhang, Chao; Jia, Pengli; Yu, Liu; Xu, Chang
2018-05-01
Dose-response meta-analysis (DRMA) is widely applied to investigate the dose-specific relationship between independent and dependent variables. Such methods have been in use for over 30 years and are increasingly employed in healthcare and clinical decision-making. In this article, we give an overview of the methodology used in DRMA. We summarize the commonly used regression models and the pooling methods in DRMA, and use an example to illustrate how to perform a DRMA with these methods. Five regression models (linear regression, piecewise regression, natural polynomial regression, fractional polynomial regression, and restricted cubic spline regression) are illustrated in this article for fitting the dose-response relationship, and two types of pooling approaches, the one-stage approach and the two-stage approach, are illustrated for pooling the dose-response relationship across studies. The example showed similar results among these models. Several dose-response meta-analysis methods can be used for investigating the relationship between exposure level and the risk of an outcome; however, the methodology of DRMA still needs to be improved. © 2018 Chinese Cochrane Center, West China Hospital of Sichuan University and John Wiley & Sons Australia, Ltd.
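A sketch of the two-stage approach named above, on invented summary data: a linear log relative risk slope is estimated within each study by weighted least squares through the origin, then the slopes are pooled by inverse-variance weighting. Within-study covariance of the log-RRs is ignored here for brevity.

```python
import numpy as np

# Per study: dose levels, log relative risks, and their standard errors
studies = [
    (np.array([0., 2., 4.]), np.array([0., 0.18, 0.35]), np.array([.01, .08, .09])),
    (np.array([0., 3., 6.]), np.array([0., 0.25, 0.55]), np.array([.01, .07, .10])),
]

slopes, variances = [], []
for dose, logrr, se in studies:
    w = 1 / se ** 2
    b = np.sum(w * dose * logrr) / np.sum(w * dose ** 2)  # WLS through origin
    slopes.append(b)
    variances.append(1 / np.sum(w * dose ** 2))

w = 1 / np.array(variances)
pooled = np.sum(w * np.array(slopes)) / np.sum(w)         # fixed-effect pooling
print(f"pooled log-RR per unit dose: {pooled:.3f}")
```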
Lee, Byeong-Ju; Kim, Hye-Youn; Lim, Sa Rang; Huang, Linfang; Choi, Hyung-Kyoon
2017-01-01
Panax ginseng C.A. Meyer is a herb used for medicinal purposes, and its discrimination according to cultivation age has been an important and practical issue. This study employed Fourier-transform infrared (FT-IR) spectroscopy with multivariate statistical analysis to obtain a prediction model for discriminating cultivation ages (5 and 6 years) and three different parts (rhizome, tap root, and lateral root) of P. ginseng. The optimal partial-least-squares regression (PLSR) models for discriminating ginseng samples were determined by selecting normalization methods, number of partial-least-squares (PLS) components, and variable influence on projection (VIP) cutoff values. The best prediction model for discriminating 5- and 6-year-old ginseng was developed using tap root, vector normalization applied after the second differentiation, one PLS component, and a VIP cutoff of 1.0 (based on the lowest root-mean-square error of prediction value). In addition, for discriminating among the three parts of P. ginseng, optimized PLSR models were established using data sets obtained from vector normalization, two PLS components, and VIP cutoff values of 1.5 (for 5-year-old ginseng) and 1.3 (for 6-year-old ginseng). To our knowledge, this is the first study to provide a novel strategy for rapidly discriminating the cultivation ages and parts of P. ginseng using FT-IR by selected normalization methods, number of PLS components, and VIP cutoff values. PMID:29049369
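A sketch of how VIP-based variable screening for a PLSR model might be computed with scikit-learn; the VIP formula is the standard one, but the "spectra" are random placeholders for FT-IR data and the cutoff of 1.0 mirrors the value mentioned above.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def vip_scores(pls):
    """Variable-influence-on-projection scores for a fitted PLS model."""
    t, w, q = pls.x_scores_, pls.x_weights_, pls.y_loadings_
    p, n_comp = w.shape
    ssy = np.sum(t ** 2, axis=0) * q.ravel() ** 2   # y-variance per component
    w_norm = w / np.linalg.norm(w, axis=0)
    return np.sqrt(p * (w_norm ** 2 @ ssy) / ssy.sum())

rng = np.random.default_rng(13)
X = rng.normal(size=(40, 200))                  # 40 spectra x 200 wavenumbers
y = (rng.uniform(size=40) > 0.5).astype(float)  # 5- vs 6-year-old, coded 0/1

pls = PLSRegression(n_components=2).fit(X, y)
vip = vip_scores(pls)
selected = np.where(vip > 1.0)[0]               # VIP cutoff of 1.0
```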
Mager, P P; Rothe, H
1990-10-01
Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics of regression coefficients of the ordinary least-squares (OLS) model applied usually to QSARs. Beside the diagnosis of the known simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. Only if the absolute values of PCRA estimators are order statistics that decrease monotonically, the effects of multicollinearity can be circumvented. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive model power of a QSAR model.
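A brief sketch of principal component regression as a multicollinearity diagnostic and remedy of the kind discussed above: two strongly collinear descriptors are decomposed into principal components, and the response is regressed on the leading, stable component. The descriptors and data are invented.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(12)
n = 80
logP = rng.normal(2, 1, n)
mr = 0.9 * logP + rng.normal(0, 0.1, n)   # descriptor nearly collinear with logP
X = np.column_stack([logP, mr])
y = 1.5 * logP + rng.normal(0, 0.3, n)    # biological activity

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)
print(pca.explained_variance_ratio_)      # tiny second component flags collinearity
pcr = LinearRegression().fit(scores[:, :1], y)   # regress on the stable PC only
```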
Occlusal factors are not related to self-reported bruxism.
Manfredini, Daniele; Visscher, Corine M; Guarda-Nardini, Luca; Lobbezoo, Frank
2012-01-01
To estimate the contribution of various occlusal features of the natural dentition that may identify self-reported bruxers compared to nonbruxers. Two age- and sex-matched groups of self-reported bruxers (n = 67) and self-reported nonbruxers (n = 75) took part in the study. For each patient, the following occlusal features were clinically assessed: retruded contact position (RCP) to intercuspal contact position (ICP) slide length (< 2 mm was considered normal), vertical overlap (< 0 mm was considered an anterior open bite; > 4 mm, a deep bite), horizontal overlap (> 4 mm was considered a large horizontal overlap), incisor dental midline discrepancy (< 2 mm was considered normal), and the presence of a unilateral posterior crossbite, mediotrusive interferences, and laterotrusive interferences. A multiple logistic regression model was used to identify significant associations between the assessed occlusal features (independent variables) and self-reported bruxism (dependent variable). Accuracy values to predict self-reported bruxism were unacceptable for all occlusal variables. The only variable remaining in the final regression model was laterotrusive interferences (P = .030). The percentage of variance in bruxism explained by the final multiple regression model was 4.6%. This model, which included only one occlusal factor, showed low positive (58.1%) and negative (59.7%) predictive values, and thus poor accuracy in predicting the presence of self-reported bruxism (59.2%). This investigation suggests that the contribution of occlusion to the differentiation between bruxers and nonbruxers is negligible. This finding supports theories that advocate a much diminished role for peripheral anatomical-structural factors in the pathogenesis of bruxism.
Hendriks, A Jan; Smítková, Hana; Huijbregts, Mark A J
2007-11-01
Exposure of humans to chemicals in beef or milk is part of almost all risk evaluation procedures carried out to reduce emissions or to remediate sites. Concentrations of substances in these livestock products are often estimated using log-log regressions that relate the biotransfer factor BTF to the octanol-water partition ratio K_ow. However, the correctness of these empirical correlations has been questioned. Here, we compare them to the mechanistic model OMEGA, which describes the distribution of substances in organisms by integrating theory on chemical fugacity and biological allometry. OMEGA has been calibrated and validated on thousands of laboratory and field data, reflecting many chemical substances and biological species. Overall fluxes of water, food, tissue (growth), milk and stable substances calculated by OMEGA are within a factor of two of independent data obtained in experiments. Rate constants measured for elimination of individual compounds of a recalcitrant nature vary around the level expected from the model for output to faeces and milk. Both data and model suggest that the biotransfer factor BTF of stable substances to beef and milk is independent of the octanol-water partition ratio K_ow in the range of 10^3 to 10^6. This contradicts empirical regressions that include both stable and labile compounds. As expected, levels of labile substances vary widely around a tentative indication derived from the model. Transformation and accumulation of labile substances remain highly specific to the chemical and organism concerned but depend weakly on the octanol-water partition ratio K_ow. Several possibilities for additional refinement are identified.
NASA Astrophysics Data System (ADS)
Kutzbach, L.; Schneider, J.; Sachs, T.; Giebels, M.; Nykänen, H.; Shurpali, N. J.; Martikainen, P. J.; Alm, J.; Wilmking, M.
2007-07-01
Closed (non-steady state) chambers are widely used for quantifying carbon dioxide (CO2) fluxes between soils or low-stature canopies and the atmosphere. It is well recognised that covering a soil or vegetation with a closed chamber inherently disturbs the natural CO2 fluxes by altering the concentration gradients between the soil, the vegetation and the overlying air. Thus, the driving factors of CO2 fluxes are not constant during the closed chamber experiment, and no linear increase or decrease of CO2 concentration over time within the chamber headspace can be expected. Nevertheless, linear regression has been applied for calculating CO2 fluxes in many recent, partly influential, studies. This approach was justified by keeping the closure time short and assuming the concentration change over time to be in the linear range. Here, we test whether the application of linear regression is really appropriate for estimating CO2 fluxes using closed chambers over short closure times and whether the application of nonlinear regression is necessary. We developed a nonlinear exponential regression model from diffusion and photosynthesis theory. This exponential model was tested with four different datasets of CO2 flux measurements (total number: 1764) conducted at three peatland sites in Finland and a tundra site in Siberia. The flux measurements were performed using transparent chambers on vegetated surfaces and opaque chambers on bare peat surfaces. Thorough analyses of residuals demonstrated that linear regression was frequently not appropriate for the determination of CO2 fluxes by closed-chamber methods, even if closure times were kept short. The developed exponential model was well suited for nonlinear regression of the concentration evolution c(t) in the chamber headspace and for estimation of the initial CO2 fluxes at closure time for the majority of experiments. CO2 flux estimates by linear regression can be as low as 40% of the flux estimates of exponential regression for closure times of only two minutes, and even lower for longer closure times. The degree of underestimation increased with increasing CO2 flux strength and depended on soil and vegetation conditions, which can disturb not only the quantitative but also the qualitative evaluation of CO2 flux dynamics. The underestimation effect of linear regression was observed to differ between CO2 uptake and release situations, which can lead to stronger bias in the daily, seasonal and annual CO2 balances than in the individual fluxes. To avoid serious bias of CO2 flux estimates based on closed chamber experiments, we suggest further tests using published datasets and recommend the use of nonlinear regression models for future closed chamber studies.
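The abstract does not give the functional form of the exponential model, so the sketch below assumes a saturating exponential c(t) = cs + (c0 - cs)·exp(-kt), a common choice in the chamber literature, and recovers the initial flux from the fitted slope at t = 0. The data and the volume-to-area conversion factor are placeholders.

```python
# Hedged sketch: fit an exponential headspace concentration curve to
# closed-chamber CO2 data and estimate the initial flux from dc/dt at t = 0.
import numpy as np
from scipy.optimize import curve_fit

def c_exp(t, c0, cs, k):
    """Exponential concentration model (ppm): saturates at cs."""
    return cs + (c0 - cs) * np.exp(-k * t)

t = np.linspace(0, 120, 25)                       # seconds of chamber closure
c_obs = c_exp(t, 380.0, 520.0, 0.01) \
        + np.random.default_rng(1).normal(0, 1.0, t.size)

popt, _ = curve_fit(c_exp, t, c_obs, p0=[c_obs[0], c_obs[-1], 0.005])
c0_fit, cs_fit, k_fit = popt
dcdt0 = k_fit * (cs_fit - c0_fit)                 # initial slope, ppm s^-1
flux0 = dcdt0 * 0.04                              # times assumed V/A + unit factor
print(f"initial CO2 flux estimate: {flux0:.4f} (model units)")
```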
Predicting Quantitative Traits With Regression Models for Dense Molecular Markers and Pedigree
de los Campos, Gustavo; Naya, Hugo; Gianola, Daniel; Crossa, José; Legarra, Andrés; Manfredi, Eduardo; Weigel, Kent; Cotes, José Miguel
2009-01-01
The availability of genomewide dense markers brings opportunities and challenges to breeding programs. An important question concerns the ways in which dense markers and pedigrees, together with phenotypic records, should be used to arrive at predictions of genetic values for complex traits. If a large number of markers are included in a regression model, marker-specific shrinkage of regression coefficients may be needed. For this reason, the Bayesian least absolute shrinkage and selection operator (LASSO) (BL) appears to be an interesting approach for fitting marker effects in a regression model. This article adapts the BL to arrive at a regression model where markers, pedigrees, and covariates other than markers are considered jointly. Connections between BL and other marker-based regression models are discussed, and the sensitivity of BL with respect to the choice of prior distributions assigned to key parameters is evaluated using simulation. The proposed model was fitted to two data sets from wheat and mouse populations, and evaluated using cross-validation methods. Results indicate that inclusion of markers in the regression further improved the predictive ability of models. An R program that implements the proposed model is freely available. PMID:19293140
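The Bayesian LASSO itself is not available in scikit-learn; purely as a hedged illustration of the shared idea, marker-specific shrinkage in a p >> n regression, here is a plain frequentist LASSO sketch on synthetic genotypes, with the penalty chosen by cross-validation.

```python
# Frequentist stand-in for the marker-shrinkage idea: LASSO regression of a
# phenotype on dense markers (many more markers than individuals).
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(42)
n, p = 200, 1000
X = rng.integers(0, 3, size=(n, p)).astype(float)       # marker genotypes 0/1/2
beta = np.zeros(p)
beta[:10] = rng.normal(0, 0.5, 10)                       # few causal markers
y = X @ beta + rng.normal(0, 1.0, n)                     # synthetic phenotype

model = LassoCV(cv=5).fit(X, y)                          # penalty by 5-fold CV
print("non-zero marker effects:", np.sum(model.coef_ != 0))
```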
NASA Technical Reports Server (NTRS)
Duda, David P.; Minnis, Patrick
2009-01-01
Previous studies have shown that probabilistic forecasting may be a useful method for predicting persistent contrail formation. A probabilistic forecast to accurately predict contrail formation over the contiguous United States (CONUS) is created using hourly meteorological analyses from the Advanced Regional Prediction System (ARPS) and the Rapid Update Cycle (RUC), as well as GOES water vapor channel measurements, combined with surface and satellite observations of contrails. Two groups of logistic models were created. The first group of models (SURFACE models) is based on surface-based contrail observations supplemented with satellite observations of contrail occurrence. The second group of models (OUTBREAK models) is derived from a selected subgroup of satellite-based observations of widespread persistent contrails. The mean accuracies for both the SURFACE and OUTBREAK models typically exceeded 75 percent when based on the RUC or ARPS analysis data, but decreased when the logistic models were derived from ARPS forecast data.
Static and moving solid/gas interface modeling in a hybrid rocket engine
NASA Astrophysics Data System (ADS)
Mangeot, Alexandre; William-Louis, Mame; Gillard, Philippe
2018-07-01
A numerical model was developed with the CFD-ACE software to study the working condition of an oxygen-nitrogen/polyethylene hybrid rocket combustor. As a first approach, a simplified numerical model is presented. It includes a compressible transient gas phase in which a two-step combustion mechanism is implemented, coupled to a radiative model. The solid phase from the fuel grain is a semi-opaque material whose degradation process is modeled by an Arrhenius-type law. Two versions of the model were tested. The first considers the solid/gas interface with a static grid, while the second uses grid deformation during the computation to follow the asymmetrical regression. The numerical results are obtained with two different regression kinetics, originating from thermogravimetric analysis and from test bench results. In each case, the fuel surface temperature is retrieved to within 5% error. However, good results are only found using the kinetics from the test bench. The regression rate is predicted to within 0.03 mm s-1, and the average combustor pressure and its variation over time have the same magnitude as the measurements conducted on the test bench. The simulation that uses grid deformation to follow the regression shows good stability over a 10 s simulated time.
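As a side illustration of the Arrhenius-type degradation law mentioned above, the sketch below recovers an activation energy and pre-exponential factor by linear regression of ln(rate) on 1/T; the TGA-style data and parameter values are invented for the example.

```python
# Arrhenius fit r = A*exp(-Ea/(R*T)) by linearization: ln r = ln A - Ea/(R*T).
import numpy as np

R = 8.314                                    # gas constant, J mol^-1 K^-1
rng = np.random.default_rng(47)
T = np.linspace(600.0, 800.0, 15)            # surface temperatures (K)
Ea_true, A_true = 125e3, 3.0e6               # illustrative kinetics
rate = A_true * np.exp(-Ea_true / (R * T)) * np.exp(rng.normal(0, 0.05, T.size))

slope, intercept = np.polyfit(1.0 / T, np.log(rate), 1)
print(f"Ea = {-slope * R / 1e3:.1f} kJ/mol, A = {np.exp(intercept):.3g}")
```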
NASA Astrophysics Data System (ADS)
Yang, J.; Astitha, M.; Anagnostou, E. N.; Hartman, B.; Kallos, G. B.
2015-12-01
Weather prediction accuracy has become very important for the Northeast U.S. given the devastating effects of extreme weather events in recent years. Weather forecasting systems are used towards building strategies to prevent catastrophic losses for human lives and the environment. Concurrently, weather forecast tools and techniques have evolved with improved forecast skill as numerical prediction techniques are strengthened by increased super-computing resources. In this study, we examine the combination of two state-of-the-science atmospheric models (WRF and RAMS/ICLAMS) by utilizing a Bayesian regression approach to improve the prediction of extreme weather events for the Northeast U.S. The basic concept behind the Bayesian regression approach is to take advantage of the strengths of the two atmospheric modeling systems and, similar to the multi-model ensemble approach, limit their weaknesses, which are related to systematic and random errors in the numerical prediction of physical processes. The first part of this study is focused on retrospective simulations of seventeen storms that affected the region in the period 2004-2013. Optimal variances are estimated by minimizing the root mean square error and are applied to out-of-sample weather events. The applicability and usefulness of this approach are demonstrated by conducting an error analysis based on in-situ observations from meteorological stations of the National Weather Service (NWS) for wind speed and wind direction, and on NCEP Stage IV mosaicked multi-sensor precipitation data. The preliminary results indicate a significant improvement in the statistical metrics of the modeled-observed pairs for meteorological variables using various combinations of sixteen of the events as predictors of the seventeenth. This presentation will illustrate the implemented methodology and the obtained results for wind speed, wind direction and precipitation, as well as set out the research steps that will be followed in the future.
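The study estimates optimal variances for the Bayesian combination; the much-simplified sketch below conveys only the flavor of the idea by choosing a single scalar weight that minimizes RMSE of a blend of two synthetic forecasts over training events. It is not the paper's method.

```python
# Choose a convex-combination weight for two model forecasts by minimizing
# RMSE against observations on training events, then reuse it out of sample.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
obs = rng.uniform(5, 25, 200)               # observed wind speeds (m/s)
model_a = obs + rng.normal(1.0, 2.0, 200)   # forecast A: small bias, low noise
model_b = obs + rng.normal(-0.5, 3.0, 200)  # forecast B: opposite bias, noisier

def rmse_of_weight(w):
    blend = w * model_a + (1 - w) * model_b
    return np.sqrt(np.mean((blend - obs) ** 2))

w_opt = minimize_scalar(rmse_of_weight, bounds=(0, 1), method="bounded").x
print(f"optimal weight on A: {w_opt:.2f}, blended RMSE: {rmse_of_weight(w_opt):.2f}")
```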
The Fringe-Imaging Skin Friction Technique PC Application User's Manual
NASA Technical Reports Server (NTRS)
Zilliac, Gregory G.
1999-01-01
A personal computer application (CXWIN4G) has been written that greatly simplifies the task of extracting skin friction measurements from interferograms of oil flows on the surface of wind tunnel models. Images are first calibrated, using a novel approach to one-camera photogrammetry, to obtain accurate spatial information on surfaces with curvature. As part of the image calibration process, an auxiliary file containing the wind tunnel model geometry is used in conjunction with a two-dimensional direct linear transformation to relate the image plane to the physical (model) coordinates. The program then fits a nonlinear regression model to accurately determine the fringe spacing from interferometric intensity records, as required by the Fringe Imaging Skin Friction (FISF) technique. The skin friction is found through application of a simple expression that uses lubrication theory to relate fringe spacing to skin friction.
Spatial vulnerability assessments by regression kriging
NASA Astrophysics Data System (ADS)
Pásztor, László; Laborczi, Annamária; Takács, Katalin; Szatmári, Gábor
2016-04-01
Two fairly different, complex environmental phenomena causing natural hazards were mapped based on a combined spatial inference approach. Their behaviour is related to various environmental factors, and the applied approach enables the inclusion of several spatially exhaustive auxiliary variables that are available for mapping. Inland excess water (IEW) is an interrelated natural and human-induced phenomenon that causes several problems in the flat-land regions of Hungary, which cover nearly half of the country. The term 'inland excess water' refers to the occurrence of inundations outside the flood levee that originate from sources other than flood overflow; it is surplus surface water forming due to the lack of runoff, insufficient absorption capability of soil or the upwelling of groundwater. There is a multiplicity of definitions, which indicates the complexity of the processes that govern this phenomenon. Most of the definitions have a common part, namely, that inland excess water is temporary water inundation that occurs in flat-lands due to both precipitation and groundwater emerging on the surface as substantial sources. Radon gas is produced in the radioactive decay chain of uranium, an element that is naturally present in soils. Radon is transported mainly by diffusion and convection mechanisms through the soil, depending mainly on soil physical and meteorological parameters, and can enter and accumulate in buildings. The health risk originating from indoor radon concentration attributed to natural factors is characterized by the geogenic radon potential (GRP). In addition to geology and meteorology, physical soil properties play a significant role in the determination of GRP. Identification of areas with high risk requires spatial modelling, that is, mapping of the specific natural hazards. In both cases external environmental factors determine the behaviour of the target process (occurrence/frequency of IEW and grade of GRP, respectively). Spatial auxiliary information representing the IEW- or GRP-forming environmental factors was taken into account to support the spatial inference of the locally experienced IEW frequency and the measured GRP values, respectively. An efficient spatial prediction methodology, regression kriging (RK), was applied to construct reliable maps using spatially exhaustive auxiliary data on soil, geology, topography, land use and climate. RK divides the spatial inference into two parts. First, the deterministic component of the target variable is determined by a regression model. The residuals of the multiple linear regression analysis represent the spatially varying but dependent stochastic component, which is interpolated by kriging. The final map is the sum of the two component predictions. Application of RK also provides the possibility of inherent accuracy assessment. The resulting maps are characterized by global and local measures of their accuracy. Additionally, the method enables interval estimation for the spatial extension of areas of predefined risk categories. All of these outputs provide a useful contribution to spatial planning, action planning and decision making. Acknowledgement: Our work was partly supported by the Hungarian National Scientific Research Foundation (OTKA, Grant No. K105167).
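A compact sketch of the RK recipe, trend regression plus interpolation of the residuals, follows; true kriging is approximated here with a Gaussian-process regressor (RBF kernel), its close machine-learning analogue, and all covariates and coordinates are synthetic.

```python
# Regression kriging in two steps: deterministic trend by linear regression on
# auxiliary covariates, stochastic residual by a GP over spatial coordinates.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
coords = rng.uniform(0, 100, size=(150, 2))            # observation locations (km)
aux = np.column_stack([coords[:, 0] * 0.01,            # e.g. topography proxy
                       rng.normal(size=150)])          # e.g. soil covariate
z = 2.0 + 1.5 * aux[:, 0] + rng.normal(0, 0.3, 150)    # target, e.g. IEW frequency

trend = LinearRegression().fit(aux, z)                 # deterministic component
resid = z - trend.predict(aux)                         # stochastic component
gp = GaussianProcessRegressor(kernel=RBF(20.0) + WhiteKernel(0.1)).fit(coords, resid)

new_coords = np.array([[50.0, 50.0]])
new_aux = np.array([[0.5, 0.0]])
prediction = trend.predict(new_aux) + gp.predict(new_coords)  # sum of the parts
print("regression-kriging prediction:", prediction)
```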
Orthogonal Projection in Teaching Regression and Financial Mathematics
ERIC Educational Resources Information Center
Kachapova, Farida; Kachapov, Ilias
2010-01-01
Two improvements in teaching linear regression are suggested. The first is to include the population regression model at the beginning of the topic. The second is to use a geometric approach: to interpret the regression estimate as an orthogonal projection and the estimation error as the distance (which is minimized by the projection). Linear…
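The geometric point is easy to verify numerically: the OLS fitted values are the orthogonal projection of y onto the column space of the design matrix, and the residual vector is orthogonal to that space.

```python
# OLS as orthogonal projection: y_hat = H y with H the hat (projection) matrix.
import numpy as np

rng = np.random.default_rng(5)
X = np.column_stack([np.ones(30), rng.normal(size=30)])  # design matrix
y = 2.0 + 3.0 * X[:, 1] + rng.normal(size=30)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
H = X @ np.linalg.inv(X.T @ X) @ X.T           # projection onto col(X)
y_hat = H @ y
resid = y - y_hat

print(np.allclose(y_hat, X @ beta))            # True: projection = fitted values
print(np.allclose(X.T @ resid, 0))             # True: residual is orthogonal to X
```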
NASA Astrophysics Data System (ADS)
Mitra, Ashis; Majumdar, Prabal Kumar; Bannerjee, Debamalya
2013-03-01
This paper presents a comparative analysis of two modeling methodologies for the prediction of air permeability of plain woven handloom cotton fabrics. Four basic fabric constructional parameters, namely ends per inch, picks per inch, warp count and weft count, have been used as inputs for artificial neural network (ANN) and regression models. Of the four regression models tried, the interaction model showed very good prediction performance, with a mean absolute error of just 2.017%. However, the ANN models demonstrated superiority over the regression models in terms of both correlation coefficient and mean absolute error. The ANN model with 10 nodes in the single hidden layer showed very good correlation coefficients of 0.982 and 0.929 and mean absolute errors of only 0.923% and 2.043% for training and testing data, respectively.
Interquantile Shrinkage in Regression Models
Jiang, Liewen; Wang, Huixia Judy; Bondell, Howard D.
2012-01-01
Conventional analysis using quantile regression typically focuses on fitting the regression model at different quantiles separately. However, in situations where the quantile coefficients share some common feature, joint modeling of multiple quantiles to accommodate the commonality often leads to more efficient estimation. One example of common features is that a predictor may have a constant effect over one region of quantile levels but varying effects in other regions. To automatically perform estimation and detection of the interquantile commonality, we develop two penalization methods. When the quantile slope coefficients indeed do not change across quantile levels, the proposed methods will shrink the slopes towards constant and thus improve the estimation efficiency. We establish the oracle properties of the two proposed penalization methods. Through numerical investigations, we demonstrate that the proposed methods lead to estimations with competitive or higher efficiency than the standard quantile regression estimation in finite samples. Supplemental materials for the article are available online. PMID:24363546
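For contrast with the proposed joint-penalization idea, the conventional quantile-by-quantile fit the abstract starts from looks like this in statsmodels (synthetic heteroscedastic data, so the slope drifts across quantiles):

```python
# Fit several quantiles separately with statsmodels QuantReg; the paper's
# penalties would additionally shrink near-constant slopes toward each other.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.uniform(0, 10, 300)
y = 1.0 + 0.5 * x + rng.normal(0, 1 + 0.2 * x, 300)   # heteroscedastic errors
X = sm.add_constant(x)

for q in (0.25, 0.50, 0.75):
    slope = sm.QuantReg(y, X).fit(q=q).params[1]
    print(f"quantile {q}: slope = {slope:.3f}")        # slopes increase with q
```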
Bioinactivation: Software for modelling dynamic microbial inactivation.
Garre, Alberto; Fernández, Pablo S; Lindqvist, Roland; Egea, Jose A
2017-03-01
This contribution presents the bioinactivation software, which implements functions for the modelling of isothermal and non-isothermal microbial inactivation. This software offers features such as user-friendliness, modelling of dynamic conditions, possibility to choose the fitting algorithm and generation of prediction intervals. The software is offered in two different formats: Bioinactivation core and Bioinactivation SE. Bioinactivation core is a package for the R programming language, which includes features for the generation of predictions and for the fitting of models to inactivation experiments using non-linear regression or a Markov Chain Monte Carlo algorithm (MCMC). The calculations are based on inactivation models common in academia and industry (Bigelow, Peleg, Mafart and Geeraerd). Bioinactivation SE supplies a user-friendly interface to selected functions of Bioinactivation core, namely the model fitting of non-isothermal experiments and the generation of prediction intervals. The capabilities of bioinactivation are presented in this paper through a case study, modelling the non-isothermal inactivation of Bacillus sporothermodurans. This study has provided a full characterization of the response of the bacteria to dynamic temperature conditions, including confidence intervals for the model parameters and a prediction interval of the survivor curve. We conclude that the MCMC algorithm produces a better characterization of the biological uncertainty and variability than non-linear regression. The bioinactivation software can be relevant to the food and pharmaceutical industry, as well as to regulatory agencies, as part of a (quantitative) microbial risk assessment. Copyright © 2017 Elsevier Ltd. All rights reserved.
A Fast Vector Radiative Transfer Model for Atmospheric and Oceanic Remote Sensing
NASA Astrophysics Data System (ADS)
Ding, J.; Yang, P.; King, M. D.; Platnick, S. E.; Meyer, K.
2017-12-01
A fast vector radiative transfer model is developed in support of atmospheric and oceanic remote sensing. This model is capable of simulating the Stokes vector observed at the top of the atmosphere (TOA) and the terrestrial surface by considering absorption, scattering, and emission. The gas absorption is parameterized in terms of atmospheric gas concentrations, temperature, and pressure. The parameterization scheme combines a regression method and the correlated-K distribution method, and can easily be integrated with multiple scattering computations. The approach is more than four orders of magnitude faster than a line-by-line radiative transfer model, with errors of less than 0.5% in terms of transmissivity. A two-component approach is utilized to solve the vector radiative transfer equation (VRTE). The VRTE solver separates the phase matrices of aerosol and cloud into forward and diffuse parts, and thus the solution is also separated. The forward solution can be expressed by a semi-analytical equation based on the small-angle approximation, and serves as the source of the diffuse part. The diffuse part is solved by the adding-doubling method. The adding-doubling implementation is computationally efficient because the diffuse component needs far fewer spherical function expansion terms. The simulated Stokes vectors at both the TOA and the surface have accuracy comparable to counterparts based on numerically rigorous methods.
Kempe, P T; van Oppen, P; de Haan, E; Twisk, J W R; Sluis, A; Smit, J H; van Dyck, R; van Balkom, A J L M
2007-09-01
Two methods for predicting remissions in obsessive-compulsive disorder (OCD) treatment are evaluated. Y-BOCS measurements of 88 patients with a primary OCD (DSM-III-R) diagnosis were performed over a 16-week treatment period, and during three follow-ups. Remission at any measurement was defined as a Y-BOCS score lower than thirteen combined with a reduction of seven points when compared with baseline. Logistic regression models were compared with a Cox regression for recurrent events model. Logistic regression yielded different models at different evaluation times. The recurrent events model remained stable when fewer measurements were used. Higher baseline levels of neuroticism and more severe OCD symptoms were associated with a lower chance of remission, early age of onset and more depressive symptoms with a higher chance. Choice of outcome time affects logistic regression prediction models. Recurrent events analysis uses all information on remissions and relapses. Short- and long-term predictors for OCD remission show overlap.
NASA Astrophysics Data System (ADS)
Wagner, Kurt Collins
2001-10-01
This research asks the fundamental question: "What is the profile of the successful AP chemistry student?" Two populations of students are studied. The first population is comprised of students who attend or attended the South Carolina Governor's School for Science and Mathematics, a specialized high school for high ability students, and who have taken the Advanced Placement (AP) chemistry examination in the past five years. The second population is comprised of the 581 South Carolina public school students at 46 high schools who took the AP chemistry examination in 2000. The first part of the study is intended to be useful in recruitment and placement decisions for schools in the National Consortium for Specialized Secondary Schools of Mathematics, Science and Technology. The second part of the study is intended to facilitate AP chemistry recruitment in South Carolina public schools. The first part of the study was conducted by ex post facto searches of teacher and school records at the South Carolina Governor's School for Science and Mathematics. The second part of the study was conducted by obtaining school participation information from the SC Department of Education and soliciting data from the public schools. Data were collected from 440 of 581 (75.7%) of students in 35 of 46 (76.1%) of schools. Intercorrelational and Multiple Regression Analyses (MRA) have yielded different results for these two populations. For the specialized school population, the significant predictors for success in AP chemistry are PSAT Math, placement test, and PSAT Writing. For the population of SC students, significant predictors for success are PSAT Math, count of prior science courses, and PSAT Writing. Multiple regression models have been successfully developed for the two populations studied. Recommendations for their application are made.
Retargeted Least Squares Regression Algorithm.
Zhang, Xu-Yao; Wang, Lingfeng; Xiang, Shiming; Liu, Cheng-Lin
2015-09-01
This brief presents a framework of retargeted least squares regression (ReLSR) for multicategory classification. The core idea is to directly learn the regression targets from data rather than using the traditional zero-one matrix as regression targets. The learned target matrix can guarantee a large margin constraint for the requirement of correct classification of each data point. Compared with traditional least squares regression (LSR) and a recently proposed discriminative LSR model, ReLSR is much more accurate in measuring the classification error of the regression model. Furthermore, ReLSR is a single and compact model, hence there is no need to train two-class (binary) machines that are independent of each other. The convex optimization problem of ReLSR is solved elegantly and efficiently with an alternating procedure including regression and retargeting as substeps. The experimental evaluation over a range of databases confirms the validity of our method.
Ardoino, Ilaria; Lanzoni, Monica; Marano, Giuseppe; Boracchi, Patrizia; Sagrini, Elisabetta; Gianstefani, Alice; Piscaglia, Fabio; Biganzoli, Elia M
2017-04-01
The interpretation of regression model results can often benefit from the generation of nomograms, 'user friendly' graphical devices especially useful for assisting decision-making processes. However, in the case of multinomial regression models, whenever categorical responses with more than two classes are involved, nomograms cannot be drawn in the conventional way. This difficulty in managing and interpreting the outcome can limit the use of multinomial regression in decision-making support. In the present paper, we illustrate the derivation of a non-conventional nomogram for multinomial regression models, intended to overcome this issue. Although it may appear less straightforward at first sight, the proposed methodology allows an easy interpretation of the results of multinomial regression models and makes them more accessible for clinicians and general practitioners too. The development of a prediction model based on multinomial logistic regression and of the pertinent graphical tool is illustrated by means of an example involving the prediction of the extent of liver fibrosis in hepatitis C patients by routinely available markers.
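A minimal multinomial logistic regression of the kind that would underlie such a nomogram, with invented markers and a three-class outcome standing in for fibrosis extent:

```python
# Three-class multinomial logistic regression on synthetic routine markers;
# scikit-learn's default lbfgs solver fits a multinomial model for 3+ classes.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(13)
n = 300
markers = rng.normal(size=(n, 3))                  # e.g. scaled AST, platelets, age
logits = markers @ np.array([[1.0, 0.0], [-0.5, 0.5], [0.2, 1.0]])
stage = np.argmax(np.column_stack([np.zeros(n), logits]), axis=1)  # classes 0/1/2

clf = LogisticRegression(max_iter=1000).fit(markers, stage)
print(clf.predict_proba(markers[:1]))              # per-class probabilities
```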
Intrinsic Raman spectroscopy for quantitative biological spectroscopy Part II
Bechtel, Kate L.; Shih, Wei-Chuan; Feld, Michael S.
2009-01-01
We demonstrate the effectiveness of intrinsic Raman spectroscopy (IRS) at reducing errors caused by absorption and scattering. Physical tissue models, solutions of varying absorption and scattering coefficients with known concentrations of Raman scatterers, are studied. We show significant improvement in prediction error by implementing IRS to predict concentrations of Raman scatterers using both ordinary least squares regression (OLS) and partial least squares regression (PLS). In particular, we show that IRS provides a robust calibration model that does not increase in error when applied to samples with optical properties outside the range of calibration. PMID:18711512
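For readers unfamiliar with the calibration tools compared here, a small PLS calibration sketch on simulated spectra follows; the spectral shape, noise level, and component count are all illustrative.

```python
# PLS calibration: predict analyte concentration from collinear "spectra".
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(17)
n, wavelengths = 80, 200
conc = rng.uniform(0, 1, n)                       # analyte concentrations
peak = np.exp(-0.5 * ((np.arange(wavelengths) - 100) / 8.0) ** 2)
spectra = np.outer(conc, peak) + rng.normal(0, 0.02, (n, wavelengths))

pls = PLSRegression(n_components=3).fit(spectra, conc)
print("PLS calibration R^2:", pls.score(spectra, conc))
```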
The use of generalized estimating equations in the analysis of motor vehicle crash data.
Hutchings, Caroline B; Knight, Stacey; Reading, James C
2003-01-01
The purpose of this study was to determine if it is necessary to use generalized estimating equations (GEEs) in the analysis of seat belt effectiveness in preventing injuries in motor vehicle crashes. The 1992 Utah crash dataset was used, excluding crash participants where seat belt use was not appropriate (n=93,633). The model used in the 1996 Report to Congress [Report to congress on benefits of safety belts and motorcycle helmets, based on data from the Crash Outcome Data Evaluation System (CODES). National Center for Statistics and Analysis, NHTSA, Washington, DC, February 1996] was analyzed for all occupants with logistic regression, one level of nesting (occupants within crashes), and two levels of nesting (occupants within vehicles within crashes) to compare the use of GEEs with logistic regression. When using one level of nesting compared to logistic regression, 13 of 16 variance estimates changed more than 10%, and eight of 16 parameter estimates changed more than 10%. In addition, three of the independent variables changed from significant to insignificant (alpha=0.05). With the use of two levels of nesting, two of 16 variance estimates and three of 16 parameter estimates changed more than 10% from the variance and parameter estimates in one level of nesting. One of the independent variables changed from insignificant to significant (alpha=0.05) in the two levels of nesting model; therefore, only two of the independent variables changed from significant to insignificant when the logistic regression model was compared to the two levels of nesting model. The odds ratio of seat belt effectiveness in preventing injuries was 12% lower when a one-level nested model was used. Based on these results, we stress the need to use a nested model and GEEs when analyzing motor vehicle crash data.
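A hedged sketch of the one-level nesting comparison: a GEE logistic model with occupants clustered within crashes versus ordinary logistic regression on the same synthetic data. The data-generating values are invented; only the modeling pattern mirrors the study.

```python
# GEE with exchangeable working correlation (occupants nested in crashes)
# versus naive logistic regression that ignores the clustering.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(19)
n_crashes, occ = 500, 3
crash_id = np.repeat(np.arange(n_crashes), occ)
belt = rng.integers(0, 2, n_crashes * occ)
crash_effect = np.repeat(rng.normal(0, 1, n_crashes), occ)   # shared severity
p = 1 / (1 + np.exp(-(-0.5 - 1.0 * belt + crash_effect)))
injury = rng.binomial(1, p)
X = sm.add_constant(belt.astype(float))

gee = sm.GEE(injury, X, groups=crash_id,
             family=sm.families.Binomial(),
             cov_struct=sm.cov_struct.Exchangeable()).fit()
logit = sm.Logit(injury, X).fit(disp=False)
print("GEE belt SE:", gee.bse[1], " naive logistic SE:", logit.bse[1])
```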
Predictors of change in life skills in schizophrenia after cognitive remediation.
Kurtz, Matthew M; Seltzer, James C; Fujimoto, Marco; Shagan, Dana S; Wexler, Bruce E
2009-02-01
Few studies have investigated predictors of response to cognitive remediation interventions in patients with schizophrenia. Predictor studies to date have selected treatment outcome measures that were either part of the remediation intervention itself or closely linked to the intervention, with few studies investigating factors that predict generalization to measures of everyday life-skills as an index of treatment-related improvement. In the current study we investigated the relationship between four measures of neurocognitive function (crystallized verbal ability; auditory sustained attention and working memory; verbal learning and memory; and problem-solving), two measures of symptoms (total positive and negative symptoms), and the process variables of treatment intensity and duration, and change on a performance-based measure of everyday life-skills after a year of computer-assisted cognitive remediation offered as part of intensive outpatient rehabilitation treatment. Thirty-six patients with schizophrenia or schizoaffective disorder were studied. Results of a linear regression model revealed that auditory attention and working memory predicted a significant amount of the variance in change in performance-based measures of everyday life skills after cognitive remediation, even when variance for all other neurocognitive variables in the model was controlled. Stepwise regression revealed that auditory attention and working memory predicted change in everyday life-skills across the trial even when baseline life-skill scores, symptoms and treatment process variables were controlled. These findings emphasize the importance of sustained auditory attention and working memory for benefiting from extended programs of cognitive remediation.
The intergenerational transmission of conduct problems.
Raudino, Alessandra; Fergusson, David M; Woodward, Lianne J; Horwood, L John
2013-03-01
Drawing on prospective longitudinal data, this paper examines the intergenerational transmission of childhood conduct problems in a sample of 209 parents and their 331 biological offspring studied as part of the Christchurch Health and Developmental Study. The aims were to estimate the association between parental and offspring conduct problems and to examine the extent to which this association could be explained by (a) confounding social/family factors from the parent's childhood and (b) intervening factors reflecting parental behaviours and family functioning. The same item set was used to assess childhood conduct problems in parents and offspring. Two approaches to data analysis (generalised estimating equation regression methods and latent variable structural equation modelling) were used to examine possible explanations of the intergenerational continuity in behaviour. Regression analysis suggested that there was moderate intergenerational continuity (r = 0.23, p < 0.001) between parental and offspring conduct problems. This continuity was not explained by confounding factors but was partially mediated by parenting behaviours, particularly parental over-reactivity. Latent variable modelling designed to take account of non-observed common genetic and environmental factors underlying the continuities in problem behaviours across generations also suggested that parenting behaviour played a role in mediating the intergenerational transmission of conduct problems. There is clear evidence of intergenerational continuity in conduct problems. In part this association reflects a causal chain process in which parental conduct problems are associated (directly or indirectly) with impaired parenting behaviours that in turn influence risks of conduct problems in offspring.
An empirical model for estimating annual consumption by freshwater fish populations
Liao, H.; Pierce, C.L.; Larscheid, J.G.
2005-01-01
Population consumption is an important process linking predator populations to their prey resources. Simple tools are needed to enable fisheries managers to estimate population consumption. We assembled 74 individual estimates of annual consumption by freshwater fish populations and their mean annual population size, 41 of which also included estimates of mean annual biomass. The data set included 14 freshwater fish species from 10 different bodies of water. From this data set we developed two simple linear regression models predicting annual population consumption. Log-transformed population size explained 94% of the variation in log-transformed annual population consumption. Log-transformed biomass explained 98% of the variation in log-transformed annual population consumption. We quantified the accuracy of our regressions and three alternative consumption models as the mean percent difference from observed (bioenergetics-derived) estimates in a test data set. Predictions from our population-size regression matched observed consumption estimates poorly (mean percent difference = 222%). Predictions from our biomass regression matched observed consumption reasonably well (mean percent difference = 24%). The biomass regression was superior to an alternative model similar in complexity, and comparable to two alternative models that were more complex and difficult to apply. Our biomass regression model, log10(consumption) = 0.5442 + 0.9962 × log10(biomass), will be a useful tool for fishery managers, enabling them to make reasonably accurate annual population consumption predictions from mean annual biomass estimates. © Copyright by the American Fisheries Society 2005.
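The biomass regression reported above is simple enough to apply directly; assuming consumption and biomass are expressed in the same units, it reduces to a one-line predictor:

```python
# The published biomass regression as a predictor of annual consumption.
import math

def annual_consumption(biomass):
    """log10(consumption) = 0.5442 + 0.9962 * log10(biomass)."""
    return 10 ** (0.5442 + 0.9962 * math.log10(biomass))

print(annual_consumption(1000.0))   # consumption for 1000 units of biomass
```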
Regression analysis of current-status data: an application to breast-feeding.
Grummer-Strawn, L M
1993-09-01
"Although techniques for calculating mean survival time from current-status data are well known, their use in multiple regression models is somewhat troublesome. Using data on current breast-feeding behavior, this article considers a number of techniques that have been suggested in the literature, including parametric, nonparametric, and semiparametric models as well as the application of standard schedules. Models are tested in both proportional-odds and proportional-hazards frameworks....I fit [the] models to current status data on breast-feeding from the Demographic and Health Survey (DHS) in six countries: two African (Mali and Ondo State, Nigeria), two Asian (Indonesia and Sri Lanka), and two Latin American (Colombia and Peru)." excerpt
Method for nonlinear exponential regression analysis
NASA Technical Reports Server (NTRS)
Junkin, B. G.
1972-01-01
Two computer programs, developed according to two general types of exponential models for conducting nonlinear exponential regression analysis, are described. A least-squares procedure is used in which the nonlinear problem is linearized by expanding in a Taylor series. The programs are written in FORTRAN 5 for the Univac 1108 computer.
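The Taylor-series linearization the abstract describes is the classical Gauss-Newton iteration; a compact modern rendering for the model y = a·exp(bx), on synthetic data and with reasonable starting guesses, might look like:

```python
# Gauss-Newton for y = a*exp(b*x): linearize around current (a, b) via the
# Jacobian and update by linear least squares. Sensitive to starting values.
import numpy as np

rng = np.random.default_rng(23)
x = np.linspace(0, 4, 40)
y = 2.0 * np.exp(0.7 * x) + rng.normal(0, 0.5, x.size)

a, b = 1.0, 0.5                      # reasonable starting guesses
for _ in range(15):
    f = a * np.exp(b * x)            # model at current parameters
    J = np.column_stack([np.exp(b * x),            # df/da
                         a * x * np.exp(b * x)])   # df/db
    delta, *_ = np.linalg.lstsq(J, y - f, rcond=None)
    a, b = a + delta[0], b + delta[1]

print(f"a = {a:.3f}, b = {b:.3f}")   # should land near the true (2.0, 0.7)
```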
Confidence Intervals for Assessing Heterogeneity in Generalized Linear Mixed Models
ERIC Educational Resources Information Center
Wagler, Amy E.
2014-01-01
Generalized linear mixed models are frequently applied to data with clustered categorical outcomes. The effect of clustering on the response is often difficult to practically assess partly because it is reported on a scale on which comparisons with regression parameters are difficult to make. This article proposes confidence intervals for…
Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan
2011-11-01
To explore the application of negative binomial regression and modified Poisson regression analysis in analyzing the influential factors for injury frequency and the risk factors leading to increased injury frequency. A total of 2917 primary and secondary school students were selected from Hefei by a cluster random sampling method and surveyed by questionnaire. The count data on injury events were used to fit modified Poisson regression and negative binomial regression models. The risk factors for increased unintentional injury frequency among juvenile students were explored, so as to probe the efficiency of these two models in studying the influential factors for injury frequency. The Poisson model exhibited over-dispersion (P < 0.0001) according to the Lagrange multiplier test. The over-dispersed data were therefore better fitted by the modified Poisson regression and negative binomial regression models. Both showed that male gender, younger age, a father working away from the hometown, a guardian educated above junior high school level and smoking might result in higher injury frequencies. For clustered count data on injury events, both modified Poisson regression analysis and negative binomial regression analysis can be used. However, based on our data, the modified Poisson regression fitted better, and this model could give a more accurate interpretation of the relevant factors affecting the frequency of injury.
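In statsmodels terms, the two models compared here are a Poisson GLM with robust (sandwich) standard errors, the usual reading of "modified Poisson", and a negative binomial count model; the sketch below fits both to synthetic over-dispersed counts.

```python
# Modified Poisson (robust SEs) versus negative binomial on injury-like counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(29)
n = 1000
male = rng.integers(0, 2, n).astype(float)
X = sm.add_constant(male)
mu = np.exp(-0.5 + 0.4 * male)
# NB(size, p) with p = size/(size+mu) has mean mu -> over-dispersed counts
counts = rng.negative_binomial(1.5, 1.5 / (1.5 + mu))

modified_poisson = sm.GLM(counts, X, family=sm.families.Poisson()).fit(cov_type="HC0")
negbin = sm.NegativeBinomial(counts, X).fit(disp=False)
print("robust Poisson coef/SE:", modified_poisson.params[1], modified_poisson.bse[1])
print("negative binomial coef/SE:", negbin.params[1], negbin.bse[1])
```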
Changes in flowering phenology of woody plants in North China
NASA Astrophysics Data System (ADS)
Dai, Junhu
2016-04-01
Over the past several decades, abundant evidence has shown that the first flowering date of plants in the Northern Hemisphere has become earlier in response to climate warming. However, existing results on the impact of climate change on flowering duration are controversial. In this study, we examined temporal trends in the first flowering date (FFD), end of flowering date (EFD) and flowering duration (FD) of 94 woody plants from 1963 to 2014 at three stations (Harbin, Beijing and Xi'an) in North China. Meanwhile, we analyzed the relationship between the length of flowering periods and temperature using two phenological models (a regression model and a growing degree day model). At all stations, more than 90% of the observed species showed earlier flowering over time from 1963 to 2014. The average trends in FFD were 1.33, 1.77 and 3.01 days decade-1 at Harbin, Beijing and Xi'an, respectively. During the same period, EFD also became earlier, at mean rates of 2.19, 1.39 and 2.00 days decade-1, respectively. Regarding FD, a significant shortening was observed at Harbin (-0.86 days decade-1), but FD extended by 0.37 and 1.01 days decade-1 at Beijing and Xi'an, respectively. At the interspecific level, plant species with longer FD tended to show stronger trends of FD extension. Through regression analyses, we found that more than 85% of the time series revealed a significant negative relationship between FFD (or EFD) and preseason temperature. The regression model could simulate the interannual changes in FFD and EFD with mean goodness of fit (R2) ranging from 0.38 to 0.67, but failed to simulate FD accurately, with R2 ranging from 0.09 to 0.18. For FFD and EFD, the growing degree day model improved the R2 of the simulation, but it also could not simulate FD accurately. Therefore, we conclude that FFD and EFD have advanced notably in recent decades as a result of climate warming, but the direction of FD changes depends on the locations and species involved. In addition, the conventional phenological models could not explain most of the interannual variance in FD, partly due to the superposition of errors caused by simultaneously simulating FFD and EFD. Therefore, the mechanism of FD changes and additional drivers of FD, such as soil moisture and light, need to be further studied.
Regression Models for Identifying Noise Sources in Magnetic Resonance Images
Zhu, Hongtu; Li, Yimei; Ibrahim, Joseph G.; Shi, Xiaoyan; An, Hongyu; Chen, Yashen; Gao, Wei; Lin, Weili; Rowe, Daniel B.; Peterson, Bradley S.
2009-01-01
Stochastic noise, susceptibility artifacts, magnetic field and radiofrequency inhomogeneities, and other noise components in magnetic resonance images (MRIs) can introduce serious bias into any measurements made with those images. We formally introduce three regression models including a Rician regression model and two associated normal models to characterize stochastic noise in various magnetic resonance imaging modalities, including diffusion-weighted imaging (DWI) and functional MRI (fMRI). Estimation algorithms are introduced to maximize the likelihood function of the three regression models. We also develop a diagnostic procedure for systematically exploring MR images to identify noise components other than simple stochastic noise, and to detect discrepancies between the fitted regression models and MRI data. The diagnostic procedure includes goodness-of-fit statistics, measures of influence, and tools for graphical display. The goodness-of-fit statistics can assess the key assumptions of the three regression models, whereas measures of influence can isolate outliers caused by certain noise components, including motion artifacts. The tools for graphical display permit graphical visualization of the values for the goodness-of-fit statistic and influence measures. Finally, we conduct simulation studies to evaluate performance of these methods, and we analyze a real dataset to illustrate how our diagnostic procedure localizes subtle image artifacts by detecting intravoxel variability that is not captured by the regression models. PMID:19890478
NASA Astrophysics Data System (ADS)
Wang, Y. P.; Lu, Z. P.; Sun, D. S.; Wang, N.
2016-01-01
In order to better express the characteristics of satellite clock bias (SCB) and improve SCB prediction precision, this paper proposes a new SCB prediction model that takes into consideration the physical characteristics of the space-borne atomic clock and the cyclic variation and random parts of the SCB. First, the new model employs a quadratic polynomial model with periodic terms to fit and extract the trend and cyclic terms of the SCB; then, based on the characteristics of the fitting residuals, a time series ARIMA (Auto-Regressive Integrated Moving Average) model is used to model the residuals; finally, the results from the two models are combined to obtain the final SCB prediction values. This paper uses precise SCB data from the IGS (International GNSS Service) to conduct prediction tests, and the results show that the proposed model is effective and has better prediction performance than the quadratic polynomial model, the grey model, and the ARIMA model. In addition, the new method can also overcome the insufficiency of the ARIMA model in model identification and order determination.
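A hedged sketch of the combined scheme follows: a quadratic-plus-harmonic trend extracted by least squares, an ARIMA model fitted to the residuals, and the two forecasts summed. The clock-bias data, period, ARIMA order, and noise level are all illustrative.

```python
# Trend + cycle by least squares, stochastic residual by ARIMA, summed forecast.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(31)
t = np.arange(0.0, 96.0)                           # epochs (e.g. 15-min samples)
period = 24.0
scb = (5.0 + 0.01 * t + 1e-4 * t**2
       + 0.05 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.01, t.size))

def design(tt):
    return np.column_stack([np.ones_like(tt), tt, tt**2,
                            np.sin(2 * np.pi * tt / period),
                            np.cos(2 * np.pi * tt / period)])

coef, *_ = np.linalg.lstsq(design(t), scb, rcond=None)
resid = scb - design(t) @ coef                     # random component

arima = ARIMA(resid, order=(1, 0, 1)).fit()
h = 12                                             # prediction horizon
t_new = np.arange(t[-1] + 1, t[-1] + 1 + h)
prediction = design(t_new) @ coef + arima.forecast(steps=h)
print(prediction[:3])
```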
NASA Astrophysics Data System (ADS)
Wilson, Barry T.; Knight, Joseph F.; McRoberts, Ronald E.
2018-03-01
Imagery from the Landsat Program has been used frequently as a source of auxiliary data for modeling land cover, as well as a variety of attributes associated with tree cover. With ready access to all scenes in the archive since 2008 due to the USGS Landsat Data Policy, new approaches to deriving such auxiliary data from dense Landsat time series are required. Several methods have previously been developed for use with finer temporal resolution imagery (e.g. AVHRR and MODIS), including image compositing and harmonic regression using Fourier series. This manuscript presents a study using Minnesota, USA, during the years 2009-2013 as the study area and timeframe. The study examined the relative predictive power of land cover models, in particular those related to tree cover, using predictor variables based solely on composite imagery versus those using estimated harmonic regression coefficients. The study used two common non-parametric modeling approaches (i.e. k-nearest neighbors and random forests) for fitting classification and regression models of multiple attributes measured on USFS Forest Inventory and Analysis plots, using all available Landsat imagery for the study area and timeframe. The estimated Fourier coefficients developed by harmonic regression of tasseled cap transformation time series data were shown to be correlated with land cover, including tree cover. Regression models using estimated Fourier coefficients as predictor variables showed a two- to threefold increase in explained variance for a small set of continuous response variables, relative to comparable models using monthly image composites. Similarly, the overall accuracies of classification models using the estimated Fourier coefficients were approximately 10-20 percentage points higher than those of the models using the image composites, with corresponding individual class accuracies between six and 45 percentage points higher.
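A single-pixel illustration of harmonic regression: fitting an intercept plus one Fourier pair to an irregular time series and reading off the seasonal amplitude and phase that would serve as predictor variables. The NDVI-like data and acquisition dates are synthetic.

```python
# Harmonic (Fourier) regression of one pixel's time series by least squares.
import numpy as np

rng = np.random.default_rng(37)
doy = np.sort(rng.integers(1, 366, 60))            # irregular acquisition dates
omega = 2 * np.pi * doy / 365.0
ndvi = 0.4 + 0.25 * np.cos(omega - 2.0) + rng.normal(0, 0.03, doy.size)

A = np.column_stack([np.ones_like(omega), np.cos(omega), np.sin(omega)])
coef, *_ = np.linalg.lstsq(A, ndvi, rcond=None)
amplitude = np.hypot(coef[1], coef[2])             # seasonal amplitude
phase = np.arctan2(coef[2], coef[1])               # timing of the peak (radians)
print(f"mean = {coef[0]:.3f}, amplitude = {amplitude:.3f}, phase = {phase:.2f}")
```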
Two Paradoxes in Linear Regression Analysis.
Feng, Ge; Peng, Jing; Tu, Dongke; Zheng, Julia Z; Feng, Changyong
2016-12-25
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection.
Kernel Partial Least Squares for Nonlinear Regression and Discrimination
NASA Technical Reports Server (NTRS)
Rosipal, Roman; Clancy, Daniel (Technical Monitor)
2002-01-01
This paper summarizes recent results on applying the method of partial least squares (PLS) in a reproducing kernel Hilbert space (RKHS). A previously proposed kernel PLS regression model was proven to be competitive with other regularized regression methods in RKHS. The family of nonlinear kernel-based PLS models is extended by considering the kernel PLS method for discrimination. Theoretical and experimental results on a two-class discrimination problem indicate usefulness of the method.
Non-Gaussian spatiotemporal simulation of multisite daily precipitation: downscaling framework
NASA Astrophysics Data System (ADS)
Ben Alaya, M. A.; Ouarda, T. B. M. J.; Chebana, F.
2018-01-01
Probabilistic regression approaches for downscaling daily precipitation are very useful. They provide the whole conditional distribution at each forecast step to better represent the temporal variability. The question addressed in this paper is: how can the spatiotemporal characteristics of multisite daily precipitation be simulated from probabilistic regression models? Recent publications point out the complexity of the multisite properties of daily precipitation and highlight the need for a flexible non-Gaussian tool. This work proposes a reasonable compromise between simplicity and flexibility, avoiding model misspecification. A suitable nonparametric bootstrapping (NB) technique is adopted. A downscaling model that merges a vector generalized linear model (VGLM, as a probabilistic regression tool) and the proposed bootstrapping technique is introduced to simulate realistic multisite precipitation series. The model is applied to data sets from the southern part of the province of Quebec, Canada. It is shown that the model is capable of reproducing both at-site properties and the spatial structure of daily precipitation. Results indicate the superiority of the proposed NB technique over a multivariate autoregressive Gaussian framework (i.e. a Gaussian copula).
Yilmaz, Banu; Aras, Egemen; Nacar, Sinan; Kankal, Murat
2018-05-23
The functional life of a dam is often determined by the rate of sediment delivery to its reservoir. Therefore, an accurate estimate of the sediment load in rivers with dams is essential for designing and predicting a dam's useful lifespan. The most credible method is direct measurement of sediment input, but this can be very costly and cannot always be implemented at all gauging stations. In this study, we tested various regression models to estimate suspended sediment load (SSL) at two gauging stations on the Çoruh River in Turkey, including artificial bee colony (ABC), the teaching-learning-based optimization algorithm (TLBO), and multivariate adaptive regression splines (MARS). These models were also compared with one another and with classical regression analyses (CRA). Streamflow values and previously collected SSL data were used as model inputs, with predicted SSL data as output. Two different training and testing dataset configurations were used to reinforce model accuracy. For the MARS method, the root mean square error value was found to range between 35% and 39% for the two test gauging stations, which was lower than the errors of the other models. Error values were even lower (7% to 15%) using another dataset. Our results indicate that simultaneous measurements of streamflow with SSL provide the most effective parameter for obtaining accurate predictive models and that MARS is the most accurate model for predicting SSL. Copyright © 2017 Elsevier B.V. All rights reserved.
Functional Relationships and Regression Analysis.
ERIC Educational Resources Information Center
Preece, Peter F. W.
1978-01-01
Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…
Hidden Connections between Regression Models of Strain-Gage Balance Calibration Data
NASA Technical Reports Server (NTRS)
Ulbrich, Norbert
2013-01-01
Hidden connections between regression models of wind tunnel strain-gage balance calibration data are investigated. These connections become visible whenever balance calibration data is supplied in its design format and both the Iterative and Non-Iterative Method are used to process the data. First, it is shown how the regression coefficients of the fitted balance loads of a force balance can be approximated by using the corresponding regression coefficients of the fitted strain-gage outputs. Then, data from the manual calibration of the Ames MK40 six-component force balance is chosen to illustrate how estimates of the regression coefficients of the fitted balance loads can be obtained from the regression coefficients of the fitted strain-gage outputs. The study illustrates that load predictions obtained by applying the Iterative or the Non-Iterative Method originate from two related regression solutions of the balance calibration data as long as balance loads are given in the design format of the balance, gage outputs behave highly linear, strict statistical quality metrics are used to assess regression models of the data, and regression model term combinations of the fitted loads and gage outputs can be obtained by a simple variable exchange.
Simultaneous Estimation of Electromechanical Modes and Forced Oscillations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Follum, Jim; Pierre, John W.; Martin, Russell
Over the past several years, great strides have been made in the effort to monitor the small-signal stability of power systems. These efforts focus on estimating electromechanical modes, a property of the system that dictates how generators in different parts of the system exchange energy. Though the algorithms designed for this task are powerful and important for reliable operation of the power system, they are susceptible to severe bias when forced oscillations are present in the system. Forced oscillations are fundamentally different from electromechanical oscillations in that they are the result of a rogue input to the system, rather than a property of the system itself. To address the presence of forced oscillations, the frequently used AutoRegressive Moving Average (ARMA) model is adapted to include sinusoidal inputs, resulting in the AutoRegressive Moving Average plus Sinusoid (ARMA+S) model. From this model, a new Two-Stage Least Squares algorithm is derived to incorporate the forced oscillations, thereby enabling the simultaneous estimation of the electromechanical modes and the amplitude and phase of the forced oscillations. The method is validated using simulated power system data as well as data obtained from the western North American power system (wNAPS) and the Eastern Interconnection (EI).
A nonparametric multiple imputation approach for missing categorical data.
Zhou, Muhan; He, Yulei; Yu, Mandi; Hsu, Chiu-Hsieh
2017-06-06
Incomplete categorical variables with more than two categories are common in public health data. However, most of the existing missing-data methods do not use the information from nonresponse (missingness) probabilities. We propose a nearest-neighbour multiple imputation approach to impute a missing at random categorical outcome and to estimate the proportion of each category. The donor set for imputation is formed by measuring distances between each missing value with other non-missing values. The distance function is calculated based on a predictive score, which is derived from two working models: one fits a multinomial logistic regression for predicting the missing categorical outcome (the outcome model) and the other fits a logistic regression for predicting missingness probabilities (the missingness model). A weighting scheme is used to accommodate contributions from two working models when generating the predictive score. A missing value is imputed by randomly selecting one of the non-missing values with the smallest distances. We conduct a simulation to evaluate the performance of the proposed method and compare it with several alternative methods. A real-data application is also presented. The simulation study suggests that the proposed method performs well when missingness probabilities are not extreme under some misspecifications of the working models. However, the calibration estimator, which is also based on two working models, can be highly unstable when missingness probabilities for some observations are extremely high. In this scenario, the proposed method produces more stable and better estimates. In addition, proper weights need to be chosen to balance the contributions from the two working models and achieve optimal results for the proposed method. We conclude that the proposed multiple imputation method is a reasonable approach to dealing with missing categorical outcome data with more than two levels for assessing the distribution of the outcome. In terms of the choices for the working models, we suggest a multinomial logistic regression for predicting the missing outcome and a binary logistic regression for predicting the missingness probability.
Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures.
Bobb, Jennifer F; Valeri, Linda; Claus Henn, Birgit; Christiani, David C; Wright, Robert O; Mazumdar, Maitreyi; Godleski, John J; Coull, Brent A
2015-07-01
Because humans are invariably exposed to complex chemical mixtures, estimating the health effects of multi-pollutant exposures is of critical concern in environmental epidemiology, and to regulatory agencies such as the U.S. Environmental Protection Agency. However, most health effects studies focus on single agents or consider simple two-way interaction models, in part because we lack the statistical methodology to more realistically capture the complexity of mixed exposures. We introduce Bayesian kernel machine regression (BKMR) as a new approach to study mixtures, in which the health outcome is regressed on a flexible function of the mixture (e.g. air pollution or toxic waste) components that is specified using a kernel function. In high-dimensional settings, a novel hierarchical variable selection approach is incorporated to identify important mixture components and account for the correlated structure of the mixture. Simulation studies demonstrate the success of BKMR in estimating the exposure-response function and in identifying the individual components of the mixture responsible for health effects. We demonstrate the features of the method through epidemiology and toxicology applications. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
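As a rough, non-Bayesian analogue of the kernel-machine component, kernel ridge regression with an RBF kernel can illustrate how a flexible function of the mixture is estimated; the hierarchical variable selection and posterior inference that define BKMR are omitted here, and all data are simulated.

```python
# Sketch: a kernel-machine exposure-response surface (simplified stand-in).
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(2)
n, p = 300, 4                         # n subjects, p correlated pollutants
Z = rng.multivariate_normal(np.zeros(p), 0.5 + 0.5 * np.eye(p), size=n)
# Nonlinear exposure-response: only components 0 and 1 matter
h = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] ** 2
y = h + rng.normal(scale=0.3, size=n)

model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.2).fit(Z, y)
# Probe the fitted surface along component 0, others held at their medians
grid = np.tile(np.median(Z, axis=0), (50, 1))
grid[:, 0] = np.linspace(Z[:, 0].min(), Z[:, 0].max(), 50)
print(model.predict(grid)[:5])
```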
A Semiparametric Change-Point Regression Model for Longitudinal Observations.
Xing, Haipeng; Ying, Zhiliang
2012-12-01
Many longitudinal studies involve relating an outcome process to a set of possibly time-varying covariates, giving rise to the usual regression models for longitudinal data. When the purpose of the study is to investigate the covariate effects when the experimental environment undergoes abrupt changes, or to locate the periods with different levels of covariate effects, a simple and easy-to-interpret approach is to introduce change-points in the regression coefficients. In this connection, we propose a semiparametric change-point regression model, in which the error process (stochastic component) is nonparametric and the baseline mean function (functional part) is completely unspecified, the observation times are allowed to be subject-specific, and the number, locations and magnitudes of change-points are unknown and need to be estimated. We further develop an estimation procedure which combines recent advances in semiparametric analysis based on counting-process arguments with multiple change-point inference, and discuss its large-sample properties, including consistency and asymptotic normality, under suitable regularity conditions. Simulation results show that the proposed methods work well under a variety of scenarios. An application to a real data set is also given.
An empirical study using permutation-based resampling in meta-regression
2012-01-01
Background In meta-regression, as the number of trials in the analyses decreases, the risk of false positives or false negatives increases. This is partly due to the assumption of normality that may not hold in small samples. Creation of a distribution from the observed trials using permutation methods to calculate P values may allow for less spurious findings. Permutation has not been empirically tested in meta-regression. The objective of this study was to perform an empirical investigation to explore the differences in results for meta-analyses on a small number of trials using standard large-sample approaches versus permutation-based methods for meta-regression. Methods We isolated a sample of randomized controlled clinical trials (RCTs) for interventions that have a small number of trials (herbal medicine trials). Trials were then grouped by herbal species and condition and assessed for methodological quality using the Jadad scale, and data were extracted for each outcome. Finally, we performed meta-analyses on the primary outcome of each group of trials and meta-regression for methodological quality subgroups within each meta-analysis. We used large-sample methods and permutation methods in our meta-regression modeling. We then compared final models and final P values between methods. Results We collected 110 trials across 5 intervention/outcome pairings and 5 to 10 trials per covariate. When applying large-sample methods and permutation-based methods in our backwards stepwise regression, the covariates in the final models were identical in all cases. The P values for the covariates in the final model were larger in 78% (7/9) of the cases for permutation and identical for 22% (2/9) of the cases. Conclusions We present empirical evidence that permutation-based resampling may not change final models when using backwards stepwise regression, but may increase P values in meta-regression of multiple covariates for a relatively small number of trials. PMID:22587815
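The permutation idea reduces to re-computing the meta-regression test statistic under shuffled covariate labels. A minimal sketch, with made-up effect sizes, variances, and quality scores:

```python
# Sketch: permutation p-value for one meta-regression covariate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
effect = np.array([0.12, 0.40, 0.05, 0.33, 0.21, 0.50, 0.18])  # trial effects
var = np.array([0.02, 0.05, 0.01, 0.04, 0.03, 0.06, 0.02])     # their variances
quality = np.array([3, 1, 5, 2, 4, 1, 3])                      # Jadad-style scores

def wls_t(x):
    # Weighted least-squares meta-regression; return the slope t-statistic
    res = sm.WLS(effect, sm.add_constant(x), weights=1 / var).fit()
    return res.tvalues[1]

t_obs = wls_t(quality)
n_perm = 5000
t_perm = np.array([wls_t(rng.permutation(quality)) for _ in range(n_perm)])
p_perm = (1 + np.sum(np.abs(t_perm) >= abs(t_obs))) / (n_perm + 1)
print(f"observed t = {t_obs:.2f}, permutation p = {p_perm:.3f}")
```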
Wagner, Daniel M.; Krieger, Joshua D.; Veilleux, Andrea G.
2016-08-04
In 2013, the U.S. Geological Survey initiated a study to update regional skew, annual exceedance probability discharges, and regional regression equations used to estimate annual exceedance probability discharges for ungaged locations on streams in the study area with the use of recent geospatial data, new analytical methods, and available annual peak-discharge data through the 2013 water year. An analysis of regional skew using Bayesian weighted least-squares/Bayesian generalized-least squares regression was performed for Arkansas, Louisiana, and parts of Missouri and Oklahoma. The newly developed constant regional skew of -0.17 was used in the computation of annual exceedance probability discharges for 281 streamgages used in the regional regression analysis. Based on analysis of covariance, four flood regions were identified for use in the generation of regional regression models. Thirty-nine basin characteristics were considered as potential explanatory variables, and ordinary least-squares regression techniques were used to determine the optimum combinations of basin characteristics for each of the four regions. Basin characteristics in candidate models were evaluated based on multicollinearity with other basin characteristics (variance inflation factor < 2.5) and statistical significance at the 95-percent confidence level (p ≤ 0.05). Generalized least-squares regression was used to develop the final regression models for each flood region. Average standard errors of prediction of the generalized least-squares models ranged from 32.76 to 59.53 percent, with the largest range in flood region D. Pseudo coefficients of determination of the generalized least-squares models ranged from 90.29 to 97.28 percent, with the largest range also in flood region D. The regional regression equations apply only to locations on streams in Arkansas where annual peak discharges are not substantially affected by regulation, diversion, channelization, backwater, or urbanization. The applicability and accuracy of the regional regression equations depend on the basin characteristics measured for an ungaged location on a stream being within range of those used to develop the equations.
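The screening rules quoted above (VIF < 2.5, p ≤ 0.05) are straightforward to apply with standard tools. A sketch with synthetic basin characteristics, using OLS only (the study's final generalized least-squares step is not reproduced):

```python
# Sketch: VIF and significance screening for a regional regression model.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 120
basin = pd.DataFrame({
    "log_area":   rng.normal(2.0, 0.5, n),   # drainage area
    "log_slope":  rng.normal(0.5, 0.2, n),   # channel slope
    "log_precip": rng.normal(1.6, 0.1, n),   # mean annual precipitation
})
y = 1.2 + 0.8 * basin["log_area"] + 0.3 * basin["log_slope"] \
    + rng.normal(0, 0.1, n)                  # log peak discharge, synthetic

X = sm.add_constant(basin)
vifs = [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])]
print(dict(zip(basin.columns, np.round(vifs, 2))))   # retain only VIF < 2.5

ols = sm.OLS(y, X).fit()
print(ols.pvalues < 0.05)                            # retain only p <= 0.05
```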
A Fast Hyperspectral Vector Radiative Transfer Model in UV to IR spectral bands
NASA Astrophysics Data System (ADS)
Ding, J.; Yang, P.; Sun, B.; Kattawar, G. W.; Platnick, S. E.; Meyer, K.; Wang, C.
2016-12-01
We develop a fast hyperspectral vector radiative transfer model with a spectral range from UV to IR at 5 nm resolution. This model can simulate top-of-the-atmosphere (TOA) diffuse radiance and polarized reflectance by considering gas absorption, Rayleigh scattering, and aerosol and cloud scattering. The absorption component considers several major atmospheric absorbers such as water vapor, CO2, O3, and O2, including both line and continuum absorption. A regression-based method is used to parameterize the layer effective optical thickness for each gas, which substantially increases the computational efficiency for absorption while maintaining high accuracy. This method is over 500 times faster than the existing line-by-line method. The scattering component uses the successive order of scattering (SOS) method. For Rayleigh scattering, convergence is fast due to the small optical thickness of atmospheric gases. For cloud and aerosol layers, a small-angle approximation method is used in the SOS calculations. The scattering process is divided into two parts, a forward part and a diffuse part. The scattering in the small-angle range in the forward direction is approximated as forward scattering. A cloud or aerosol layer is divided into thin layers. As the ray propagates through each thin layer, a portion diverges as diffuse radiation, while the remainder continues propagating in the forward direction. The computed diffuse radiance is the sum of all of the diffuse parts. The small-angle approximation makes the SOS calculation converge rapidly even in a thick cloud layer.
Predicting recycling behaviour: Comparison of a linear regression model and a fuzzy logic model.
Vesely, Stepan; Klöckner, Christian A; Dohnal, Mirko
2016-03-01
In this paper we demonstrate that fuzzy logic can provide a better tool for predicting recycling behaviour than the customarily used linear regression. To show this, we take a set of empirical data on recycling behaviour (N=664), which we randomly divide into two halves. The first half is used to estimate a linear regression model of recycling behaviour, and to develop a fuzzy logic model of recycling behaviour. As the first comparison, the fit of both models to the data included in estimation of the models (N=332) is evaluated. As the second comparison, predictive accuracy of both models for "new" cases (hold-out data not included in building the models, N=332) is assessed. In both cases, the fuzzy logic model significantly outperforms the regression model in terms of fit. To conclude, when accurate predictions of recycling and possibly other environmental behaviours are needed, fuzzy logic modelling seems to be a promising technique. Copyright © 2015 Elsevier Ltd. All rights reserved.
Out-of-pocket expenditures for pharmaceuticals: lessons from the Austrian household budget survey.
Sanwald, Alice; Theurl, Engelbert
2017-05-01
Paying for pharmaceuticals out of pocket is an important source of financing pharmaceutical consumption. Only limited empirical knowledge is available on the determinants of these expenditures. In this article we analyze which characteristics of private households influence out-of-pocket pharmaceutical expenditure (OOPPE) in Austria. We use cross-sectional information on OOPPE and household characteristics provided by the Austrian household budget survey 2009/10. We split pharmaceutical expenditures into the two components prescription fees and over-the-counter (OTC) expenditures. To adjust for the specific characteristics of the data, we compare different econometric approaches: a two-part model, hurdle model, generalized linear model and zero-inflated negative binomial regression model. The econometric approaches finally selected give a quite consistent picture. The probability of expenditures of both types is strongly influenced by the household structure. It increases with age, doctor visits and the presence of a female householder. Education level and income only increase the probability of OTC pharmaceutical expenditures. The level of OTC expenditures remains widely unexplained, while household structure and age influence the expenditures for prescription fees. Insurance characteristics of private households, either private or public, play a minor role in explaining the expenditure levels in all specifications. This points to a homogeneous and comprehensive provision of pharmaceuticals in the public part of the Austrian health care system. The article gives useful insights into the determinants of pharmaceutical expenditures of private households and supplements previous research that focuses on the individual level.
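Of the candidate specifications, the two-part model is the most transparent: a logistic model for whether a household spends anything, and a gamma GLM with log link for the amount among spenders. A sketch on simulated household data (covariates and coefficients are illustrative assumptions):

```python
# Sketch: two-part expenditure model (logistic + gamma GLM with log link).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
age = rng.uniform(20, 80, n)
visits = rng.poisson(2, n)
X = sm.add_constant(np.column_stack([age, visits]))

p_any = 1 / (1 + np.exp(-(-2.0 + 0.02 * age + 0.3 * visits)))
any_spend = rng.random(n) < p_any
amount = np.where(any_spend,
                  rng.gamma(2.0, np.exp(1.0 + 0.01 * age) / 2.0), 0.0)

# Part 1: probability of any spending
part1 = sm.GLM(any_spend.astype(float), X,
               family=sm.families.Binomial()).fit()
# Part 2: amount, conditional on spending
pos = any_spend
part2 = sm.GLM(amount[pos], X[pos],
               family=sm.families.Gamma(sm.families.links.Log())).fit()

# Expected spending = P(any) * E[amount | any]
expected = part1.predict(X) * part2.predict(X)
print(expected[:5])
```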
Babcock, Chad; Finley, Andrew O.; Bradford, John B.; Kolka, Randall K.; Birdsey, Richard A.; Ryan, Michael G.
2015-01-01
Many studies and production inventory systems have shown the utility of coupling covariates derived from Light Detection and Ranging (LiDAR) data with forest variables measured on georeferenced inventory plots through regression models. The objective of this study was to propose and assess the use of a Bayesian hierarchical modeling framework that accommodates both residual spatial dependence and non-stationarity of model covariates through the introduction of spatial random effects. We explored this objective using four forest inventory datasets that are part of the North American Carbon Program, each comprising point-referenced measures of above-ground forest biomass and discrete LiDAR. For each dataset, we considered at least five regression model specifications of varying complexity. Models were assessed based on goodness of fit criteria and predictive performance using a 10-fold cross-validation procedure. Results showed that the addition of spatial random effects to the regression model intercept improved fit and predictive performance in the presence of substantial residual spatial dependence. Additionally, in some cases, allowing either some or all regression slope parameters to vary spatially, via the addition of spatial random effects, further improved model fit and predictive performance. In other instances, models showed improved fit but decreased predictive performance—indicating over-fitting and underscoring the need for cross-validation to assess predictive ability. The proposed Bayesian modeling framework provided access to pixel-level posterior predictive distributions that were useful for uncertainty mapping, diagnosing spatial extrapolation issues, revealing missing model covariates, and discovering locally significant parameters.
NASA Astrophysics Data System (ADS)
Madonna, Erica; Ginsbourger, David; Martius, Olivia
2018-05-01
In Switzerland, hail regularly causes substantial damage to agriculture, cars and infrastructure; however, little is known about its long-term variability. To study the variability, the monthly number of days with hail in northern Switzerland is modeled in a regression framework using large-scale predictors derived from ERA-Interim reanalysis. The model is developed and verified using radar-based hail observations for the extended summer season (April-September) in the period 2002-2014. The seasonality of hail is explicitly modeled with a categorical predictor (month), and monthly anomalies of several large-scale predictors are used to capture the year-to-year variability. Several regression models are applied and their performance tested with respect to standard scores and cross-validation. The chosen model includes four predictors: the monthly anomaly of the two-meter temperature, the monthly anomaly of the logarithm of the convective available potential energy (CAPE), the monthly anomaly of the wind shear, and the month. This model captures the intra-annual variability well and slightly underestimates the inter-annual variability. The regression model is applied to the reanalysis data back to 1980. The resulting hail-day time series shows an increase in the number of hail days per month, which is (in the model) related to an increase in temperature and CAPE. The trend corresponds to approximately 0.5 days per month per decade. The results of the regression model have been compared with two independent data sets. All data sets agree on the sign of the trend, but the trend is weaker in the other data sets.
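One plausible reading of this set-up is a count regression of monthly hail days on a month factor plus standardized anomalies. The sketch below uses a Poisson GLM on synthetic data; the paper compares several regression models, so treat this as one candidate, not the chosen specification.

```python
# Sketch: monthly hail-day counts ~ month factor + large-scale anomalies.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
years = np.repeat(np.arange(2002, 2015), 6)      # 2002-2014
months = np.tile(np.arange(4, 10), 13)           # April-September
t2m = rng.normal(size=years.size)                # monthly anomalies (synthetic)
logcape = rng.normal(size=years.size)
shear = rng.normal(size=years.size)
mu = np.exp(0.6 + 0.3 * t2m + 0.4 * logcape - 0.2 * shear
            + 0.3 * np.sin(2 * np.pi * months / 12))
df = pd.DataFrame({"haildays": rng.poisson(mu), "month": months,
                   "t2m": t2m, "logcape": logcape, "shear": shear})

fit = smf.glm("haildays ~ C(month) + t2m + logcape + shear",
              data=df, family=sm.families.Poisson()).fit()
print(fit.params[["t2m", "logcape", "shear"]])
```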
Zhu, Yu; Xia, Jie-lai; Wang, Jing
2009-09-01
Application of the single autoregressive integrated moving average (ARIMA) model and the ARIMA-generalized regression neural network (GRNN) combination model to research on the incidence of scarlet fever. An ARIMA model was established based on monthly scarlet fever incidence data from one city for 2000 to 2006. The fitted values of the ARIMA model were used as input to the GRNN, and the actual values were used as output. After training the GRNN, the performance of the single ARIMA model was compared with that of the ARIMA-GRNN combination model. The mean error rates (MER) of the single ARIMA model and the ARIMA-GRNN combination model were 31.6% and 28.7%, respectively, and the coefficients of determination (R(2)) of the two models were 0.801 and 0.872, respectively. The fitting performance of the ARIMA-GRNN combination model was better than that of the single ARIMA model, which has practical value in research on time-series data such as the incidence of scarlet fever.
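The combination scheme can be sketched compactly: fit an ARIMA model, then pass its fitted values through a GRNN, written here in its Nadaraya-Watson kernel form. Incidence data and the smoothing parameter sigma are synthetic assumptions.

```python
# Sketch: ARIMA-GRNN combination on a synthetic monthly incidence series.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(7)
t = np.arange(84)                                # 7 years of monthly data
y = 5 + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)

arima_fit = ARIMA(y, order=(1, 0, 1)).fit()
x = arima_fit.fittedvalues                       # GRNN input

def grnn_predict(x_train, y_train, x_new, sigma=0.3):
    # Nadaraya-Watson form of a GRNN: kernel-weighted average of targets
    w = np.exp(-((x_new[:, None] - x_train[None, :]) ** 2) / (2 * sigma ** 2))
    return (w @ y_train) / w.sum(axis=1)

combined = grnn_predict(x, y, x)
mer = np.mean(np.abs(combined - y)) / y.mean()   # mean error rate, as in text
print(f"MER of the combination model: {mer:.3f}")
```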
Application of GLAS laser altimetry to detect elevation changes in East Antarctica
NASA Astrophysics Data System (ADS)
Scaioni, M.; Tong, X.; Li, R.
2013-10-01
In this paper, the use of the ICESat/GLAS laser altimeter for estimating multi-temporal elevation changes on polar ice sheets is addressed. Because laser spots do not overlap during repeat passes, interpolation methods are required to make comparisons. After reviewing the main methods described in the literature (crossover point analysis, cross-track DEM projection, space-temporal regressions), the last was chosen for its capability of providing more elevation-change-rate measurements. The standard implementation of the space-temporal linear regression technique has been revisited and improved to better cope with outliers and to check the estimability of the model parameters. GLAS data over the PANDA route in East Antarctica were used for testing. The results are physically meaningful, confirming the trend reported in the literature of constant snow accumulation in the area during the past two decades, unlike most of the continent, which has been losing mass.
A novel model incorporating two variability sources for describing motor evoked potentials
Goetz, Stefan M.; Luber, Bruce; Lisanby, Sarah H.; Peterchev, Angel V.
2014-01-01
Objective Motor evoked potentials (MEPs) play a pivotal role in transcranial magnetic stimulation (TMS), e.g., for determining the motor threshold and probing cortical excitability. Sampled across the range of stimulation strengths, MEPs outline an input–output (IO) curve, which is often used to characterize the corticospinal tract. More detailed understanding of the signal generation and variability of MEPs would provide insight into the underlying physiology and aid correct statistical treatment of MEP data. Methods A novel regression model is tested using measured IO data of twelve subjects. The model splits MEP variability into two independent contributions, acting on both sides of a strong sigmoidal nonlinearity that represents neural recruitment. Traditional sigmoidal regression with a single variability source after the nonlinearity is used for comparison. Results The distribution of MEP amplitudes varied across different stimulation strengths, violating statistical assumptions in traditional regression models. In contrast to the conventional regression model, the dual variability source model better described the IO characteristics including phenomena such as changing distribution spread and skewness along the IO curve. Conclusions MEP variability is best described by two sources that most likely separate variability in the initial excitation process from effects occurring later on. The new model enables more accurate and sensitive estimation of the IO curve characteristics, enhancing its power as a detection tool, and may apply to other brain stimulation modalities. Furthermore, it extracts new information from the IO data concerning the neural variability—information that has previously been treated as noise. PMID:24794287
Friesz, Paul J.
2010-01-01
Areas contributing recharge to four well fields in two study sites in southern Rhode Island were delineated on the basis of steady-state groundwater-flow models representing average hydrologic conditions. The wells are screened in sand and gravel deposits in wetland and coastal settings. The groundwater-flow models were calibrated by inverse modeling using nonlinear regression. Summary statistics from nonlinear regression were used to evaluate the uncertainty associated with the predicted areas contributing recharge to the well fields. In South Kingstown, two United Water Rhode Island well fields are in Mink Brook watershed and near Worden Pond and extensive wetlands. Wetland deposits of peat near the well fields generally range in thickness from 5 to 8 feet. Analysis of water-level drawdowns in a piezometer screened beneath the peat during a 20-day pumping period indicated vertical leakage and a vertical hydraulic conductivity for the peat of roughly 0.01 ft/d. The simulated area contributing recharge for average withdrawals of 2,138 gallons per minute during 2003-07 extended to groundwater divides in mostly till and morainal deposits, and it encompassed 2.30 square miles. Most of a sand and gravel mining operation between the well fields was in the simulated contributing area. For the maximum pumping capacity (5,100 gallons per minute), the simulated area contributing recharge expanded to 5.54 square miles. The well fields intercepted most of the precipitation recharge in Mink Brook watershed and in an adjacent small watershed, and simulated streams ceased to flow. The simulated contributing area to the well fields included an area beneath Worden Pond and a remote, isolated area in upland till on the opposite side of Worden Pond from the well fields. About 12 percent of the pumped water was derived from Worden Pond. In Charlestown, the Central Beach Fire District and the East Beach Water Association well fields are on a small (0.85 square mile) peninsula in a coastal setting. The wells are screened in a coarse-grained, ice-proximal part of a morphosequence with saturated thicknesses generally less than 30 feet on the peninsula. The simulated area contributing recharge for the average withdrawal (16 gallons per minute) during 2003-07 was 0.018 square mile. The contributing area extended southwestward from the well fields to a simulated groundwater mound; it underlay part of a small nearby wetland, and it included isolated areas on the side of the wetland opposite the well fields. For the maximum pumping rate (230 gallons per minute), the simulated area contributing recharge (0.26 square mile) expanded in all directions; it included a till area on the peninsula, and it underlay part of a nearby pond. Because the well fields are screened in a thin aquifer, simulated groundwater traveltimes from recharge locations to the discharging wells were short: 94 percent of the traveltimes were 10 years or less, and the median traveltime was 1.3 years. Model-prediction uncertainty was evaluated using a Monte Carlo analysis; the parameter variance-covariance matrix from nonlinear regression was used to create parameter sets for the analysis. Important parameters for model prediction that could not be estimated by nonlinear regression were incorporated into the variance-covariance matrix. 
For the South Kingstown study site, observations provided enough information to constrain the uncertainty of these parameters within realistic ranges, but for the Charlestown study site, prior information on parameters was required. Thus, the uncertainty analysis for the South Kingstown study site was an outcome of calibrating the model to available observations, but the analysis for the Charlestown study site also depended on information provided by the modeler. A water budget and model-fit statistical criteria were used to assess parameter sets so that prediction uncertainty was not overestimated. For the scenarios using maximum pumping rates at both study
Aerodynamic parameters of High-Angle-of attack Research Vehicle (HARV) estimated from flight data
NASA Technical Reports Server (NTRS)
Klein, Vladislav; Ratvasky, Thomas R.; Cobleigh, Brent R.
1990-01-01
Aerodynamic parameters of the High-Angle-of-Attack Research Aircraft (HARV) were estimated from flight data at different values of the angle of attack between 10 degrees and 50 degrees. The main part of the data was obtained from small amplitude longitudinal and lateral maneuvers. A small number of large amplitude maneuvers was also used in the estimation. The measured data were first checked for their compatibility. It was found that the accuracy of air data was degraded by unexplained bias errors. Then, the data were analyzed by a stepwise regression method for obtaining a structure of aerodynamic model equations and least squares parameter estimates. Because of high data collinearity in several maneuvers, some of the longitudinal and all lateral maneuvers were reanalyzed by using two biased estimation techniques, the principal components regression and mixed estimation. The estimated parameters in the form of stability and control derivatives, and aerodynamic coefficients were plotted against the angle of attack and compared with the wind tunnel measurements. The influential parameters are, in general, estimated with acceptable accuracy and most of them are in agreement with wind tunnel results. The simulated responses of the aircraft showed good prediction capabilities of the resulting model.
Kendall, G M; Wakeford, R; Athanson, M; Vincent, T J; Carter, E J; McColl, N P; Little, M P
2016-03-01
Gamma radiation from natural sources (including directly ionising cosmic rays) is an important component of background radiation. In the present paper, indoor measurements of naturally occurring gamma rays that were undertaken as part of the UK Childhood Cancer Study are summarised, and it is shown that these are broadly compatible with an earlier UK National Survey. The distribution of indoor gamma-ray dose rates in Great Britain is approximately normal with mean 96 nGy/h and standard deviation 23 nGy/h. Directly ionising cosmic rays contribute about one-third of the total. The expanded dataset allows a more detailed description than previously of indoor gamma-ray exposures and in particular their geographical variation. Various strategies for predicting indoor natural background gamma-ray dose rates were explored. In the first of these, a geostatistical model was fitted, which assumes an underlying geologically determined spatial variation, superimposed on which is a Gaussian stochastic process with Matérn correlation structure that models the observed tendency of dose rates in neighbouring houses to correlate. In the second approach, a number of dose-rate interpolation measures were first derived, based on averages over geologically or administratively defined areas or using distance-weighted averages of measurements at nearest-neighbour points. Linear regression was then used to derive an optimal linear combination of these interpolation measures. The predictive performances of the two models were compared via cross-validation, using a randomly selected 70% of the data to fit the models and the remaining 30% to test them. The mean square error (MSE) of the linear-regression model was lower than that of the Gaussian-Matérn model (MSE 378 and 411, respectively). The predictive performance of the two candidate models was also evaluated via simulation; the linear-regression model performs significantly better than the Gaussian-Matérn model.
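The model comparison described above can be mimicked with standard tools: a Gaussian process with a Matérn kernel versus a linear model, scored by MSE on a held-out 30% split. Coordinates and dose rates below are synthetic placeholders, not the survey data.

```python
# Sketch: Gaussian-Matern model vs. linear regression, held-out MSE.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(8)
n = 400
coords = rng.uniform(0, 100, size=(n, 2))           # easting/northing, km
geology_mean = 90 + 10 * np.sin(coords[:, 0] / 15)  # geologically driven part
dose = geology_mean + rng.normal(0, 15, n)          # nGy/h, synthetic

Xtr, Xte, ytr, yte = train_test_split(coords, dose,
                                      test_size=0.3, random_state=0)

gp = GaussianProcessRegressor(kernel=Matern(length_scale=10, nu=1.5)
                              + WhiteKernel(),
                              normalize_y=True).fit(Xtr, ytr)
lin = LinearRegression().fit(Xtr, ytr)

print("GP-Matern MSE:", mean_squared_error(yte, gp.predict(Xte)))
print("Linear MSE:   ", mean_squared_error(yte, lin.predict(Xte)))
```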
Two Paradoxes in Linear Regression Analysis
FENG, Ge; PENG, Jing; TU, Dongke; ZHENG, Julia Z.; FENG, Changyong
2016-01-01
Regression is one of the favorite tools in applied statistics. However, misuse and misinterpretation of results from regression analysis are common in biomedical research. In this paper we use statistical theory and simulation studies to clarify some paradoxes around this popular statistical method. In particular, we show that a widely used model selection procedure employed in many publications in top medical journals is wrong. Formal procedures based on solid statistical theory should be used in model selection. PMID:28638214
[Calculating Pearson residual in logistic regressions: a comparison between SPSS and SAS].
Xu, Hao; Zhang, Tao; Li, Xiao-song; Liu, Yuan-yuan
2015-01-01
To compare the results of Pearson residual calculations in logistic regression models using SPSS and SAS. We reviewed Pearson residual calculation methods and used two sets of data to test logistic models constructed in SPSS and SAS. One model contained a small number of covariates relative to the number of observations. The other contained a number of covariates similar to the number of observations. The two software packages produced similar Pearson residual estimates when the models contained a number of covariates similar to the number of observations, but the results differed when the number of observations was much greater than the number of covariates. The two software packages produce different Pearson residuals, especially when the models contain a small number of covariates. Further studies are warranted.
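Package discrepancies of this kind are easiest to diagnose by computing the Pearson residual directly from its definition, r_i = (y_i - p_i) / sqrt(p_i (1 - p_i)) for binary data, and checking it against the package output. A sketch using statsmodels as the reference implementation:

```python
# Sketch: manual Pearson residuals checked against a GLM fit.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 200
X = sm.add_constant(rng.normal(size=(n, 3)))
p = 1 / (1 + np.exp(-X @ np.array([-0.5, 1.0, -0.8, 0.3])))
y = (rng.random(n) < p).astype(float)

fit = sm.GLM(y, X, family=sm.families.Binomial()).fit()
p_hat = fit.predict(X)
pearson_manual = (y - p_hat) / np.sqrt(p_hat * (1 - p_hat))
print(np.allclose(pearson_manual, fit.resid_pearson))   # True
```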
A Comparison of Methods for Estimating Quadratic Effects in Nonlinear Structural Equation Models
ERIC Educational Resources Information Center
Harring, Jeffrey R.; Weiss, Brandi A.; Hsu, Jui-Chen
2012-01-01
Two Monte Carlo simulations were performed to compare methods for estimating and testing hypotheses of quadratic effects in latent variable regression models. The methods considered in the current study were (a) a 2-stage moderated regression approach using latent variable scores, (b) an unconstrained product indicator approach, (c) a latent…
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material that is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein, the support vector regression gave the best result, with an RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, even before compound synthesis.
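Both prediction routes are easy to prototype: a one-variable linear regression of Tg on Tm, and an SVR on calculated descriptors. The sketch below uses synthetic values (the loose empirical rule Tg ≈ (2/3)Tm motivates the linear branch); the descriptor set and SVR settings are assumptions.

```python
# Sketch: Tg from Tm (linear) vs. Tg from descriptors (SVR).
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(10)
n = 71
Tm = rng.uniform(350, 550, n)                 # melting temperature, K
descriptors = rng.normal(size=(n, 4))         # 4 calculated descriptors
Tg = 0.7 * Tm + 5 * descriptors[:, 0] + rng.normal(0, 15, n)

lin = LinearRegression().fit(Tm[:, None], Tg)
svr = make_pipeline(StandardScaler(), SVR(C=100, epsilon=5)).fit(descriptors, Tg)

rmse = lambda model, Z: np.sqrt(np.mean((model.predict(Z) - Tg) ** 2))
print("Tm-based RMSE:", rmse(lin, Tm[:, None]),
      "SVR RMSE:", rmse(svr, descriptors))
```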
Regression Model Optimization for the Analysis of Experimental Data
NASA Technical Reports Server (NTRS)
Ulbrich, N.
2009-01-01
A candidate math model search algorithm was developed at Ames Research Center that determines a recommended math model for the multivariate regression analysis of experimental data. The search algorithm is applicable to classical regression analysis problems as well as wind tunnel strain gage balance calibration analysis applications. The algorithm compares the predictive capability of different regression models using the standard deviation of the PRESS residuals of the responses as a search metric. This search metric is minimized during the search. Singular value decomposition is used during the search to reject math models that lead to a singular solution of the regression analysis problem. Two threshold dependent constraints are also applied. The first constraint rejects math models with insignificant terms. The second constraint rejects math models with near-linear dependencies between terms. The math term hierarchy rule may also be applied as an optional constraint during or after the candidate math model search. The final term selection of the recommended math model depends on the regressor and response values of the data set, the user's function class combination choice, the user's constraint selections, and the result of the search metric minimization. A frequently used regression analysis example from the literature is used to illustrate the application of the search algorithm to experimental data.
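The search metric is the standard deviation of the PRESS residuals, which for a linear model can be computed without refitting by using the hat-matrix identity e_PRESS,i = e_i / (1 - h_ii). A minimal sketch comparing two candidate math models:

```python
# Sketch: PRESS-residual standard deviation as a model-search metric.
import numpy as np

def press_std(X, y):
    """Standard deviation of PRESS residuals for a linear regression."""
    H = X @ np.linalg.solve(X.T @ X, X.T)     # hat matrix
    e = y - H @ y                             # ordinary residuals
    e_press = e / (1 - np.diag(H))            # leave-one-out residuals
    return e_press.std(ddof=1)

rng = np.random.default_rng(11)
x = rng.uniform(-1, 1, 60)
y = 1 + 2 * x + 0.5 * x ** 2 + rng.normal(0, 0.1, 60)

# Compare two candidate math models: linear vs. quadratic
X1 = np.column_stack([np.ones_like(x), x])
X2 = np.column_stack([np.ones_like(x), x, x ** 2])
print("linear:", press_std(X1, y), "quadratic:", press_std(X2, y))
```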
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited prediction power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explored the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression on each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. Furthermore, the developed CLR models tend to achieve better sensitivity of more than 10% over the standard classification models, which can be translated to correct labeling of an additional 400-500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise the awareness of collecting data on additional markers and developing necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
Andrew T. Hudak; Nicholas L. Crookston; Jeffrey S. Evans; Michael K. Falkowski; Alistair M. S. Smith; Paul E. Gessler; Penelope Morgan
2006-01-01
We compared the utility of discrete-return light detection and ranging (lidar) data and multispectral satellite imagery, and their integration, for modeling and mapping basal area and tree density across two diverse coniferous forest landscapes in north-central Idaho. We applied multiple linear regression models subset from a suite of 26 predictor variables derived...
Using ridge regression in systematic pointing error corrections
NASA Technical Reports Server (NTRS)
Guiar, C. N.
1988-01-01
A pointing error model is used in the antenna calibration process. Data from spacecraft or radio star observations are used to determine the parameters in the model. However, the regression variables are not truly independent, displaying a condition known as multicollinearity. Ridge regression, a biased estimation technique, is used to combat the multicollinearity problem. Two data sets pertaining to Voyager 1 spacecraft tracking (days 105 and 106 of 1987) were analyzed using both linear least squares and ridge regression methods. The advantages and limitations of employing the technique are presented. The problem is not yet fully resolved.
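A toy version of the multicollinearity problem and the ridge remedy, with two nearly collinear regressors standing in for coupled pointing-model terms; the shrinkage values are arbitrary:

```python
# Sketch: OLS instability under multicollinearity vs. ridge shrinkage.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(16)
n = 80
az = rng.uniform(0, 2 * np.pi, n)
x1 = np.sin(az)
x2 = np.sin(az) + rng.normal(0, 0.01, n)       # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + rng.normal(0, 0.1, n)    # pointing error, synthetic

ols = LinearRegression().fit(X, y)
print("OLS coefficients (unstable):", ols.coef_)
for alpha in [0.01, 0.1, 1.0]:
    r = Ridge(alpha=alpha).fit(X, y)
    print(f"ridge alpha={alpha}:", r.coef_)     # shrunk toward stability
```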
Predicting spatio-temporal failure in large scale observational and micro scale experimental systems
NASA Astrophysics Data System (ADS)
de las Heras, Alejandro; Hu, Yong
2006-10-01
Forecasting has become an essential part of modern thought, but the practical limitations are still manifold. We addressed future rates of change by comparing models that take time into account and models that focus more on space. Cox regression confirmed that linear change can be safely assumed in the short term. Spatially explicit Poisson regression provided a ceiling value for the number of deforestation spots. With several observed and estimated rates, it was decided to forecast using the more robust assumptions. A Markov-chain cellular automaton thus projected 5-year deforestation in the Amazonian Arc of Deforestation, showing that even a stable rate of change would largely deplete the forest area. More generally, the resolution and implementation of the existing models could explain many of the modelling difficulties still affecting forecasting.
Xu, Yun; Muhamadali, Howbeer; Sayqal, Ali; Dixon, Neil; Goodacre, Royston
2016-10-28
Partial least squares (PLS) is one of the most commonly used supervised modelling approaches for analysing multivariate metabolomics data. PLS is typically employed as either a regression model (PLS-R) or a classification model (PLS-DA). However, in metabolomics studies it is common to investigate multiple, potentially interacting, factors simultaneously following a specific experimental design. Such data often cannot be considered as a "pure" regression or a classification problem. Nevertheless, these data have often still been treated as a regression or classification problem and this could lead to ambiguous results. In this study, we investigated the feasibility of designing a hybrid target matrix Y that better reflects the experimental design than simple regression or binary class membership coding commonly used in PLS modelling. The new design of Y coding was based on the same principle used by structural modelling in machine learning techniques. Two real metabolomics datasets were used as examples to illustrate how the new Y coding can improve the interpretability of the PLS model compared to classic regression/classification coding.
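The proposal amounts to building Y as a design-aware matrix rather than a single column or class dummies. A sketch with a simulated two-factor metabolomics design (dose and strain, both encodings illustrative) fitted by a single PLS model:

```python
# Sketch: PLS with a hybrid target matrix Y encoding a two-factor design.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(12)
n_per = 10
dose = np.repeat([0.0, 0.5, 1.0], 2 * n_per)          # continuous factor
strain = np.tile(np.repeat([0.0, 1.0], n_per), 3)     # categorical factor
X = (np.outer(dose, rng.normal(size=50))
     + np.outer(strain, rng.normal(size=50))
     + rng.normal(0, 0.3, size=(6 * n_per, 50)))      # metabolite matrix

Y = np.column_stack([dose, strain])                   # hybrid target matrix
pls = PLSRegression(n_components=3).fit(X, Y)
print(pls.score(X, Y))                                # R^2 over both columns
```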
NASA Astrophysics Data System (ADS)
Köhler, S. J.; Buffam, I.; Seibert, J.; Bishop, K. H.; Laudon, H.
2009-06-01
Two different but complementary modelling approaches for reproducing the observed dynamics of total organic carbon (TOC) in a boreal stream are presented. One is based on a regression analysis, while the other is based on riparian soil conditions using a convolution of flow and concentration. Both approaches are relatively simple to establish and help to identify gaps in the process understanding of the TOC transport from soils to catchment runoff. The largest part of the temporal variation of stream TOC concentrations (4-46 mg L⁻¹) in a forested headwater stream in the boreal zone in northern Sweden may be described using a four-parameter regression equation that has runoff and transformed air temperature as sole input variables. Runoff is assumed to be a proxy for soil wetness conditions and changing flow pathways, which in turn caused most of the stream TOC variation. Temperature explained a significant part of the observed inter-annual variability. Long-term riparian hydrochemistry in soil solutions within 4 m of the stream also captures a surprisingly large part of the observed variation of stream TOC and highlights the importance of riparian soils. The riparian zone was used to reproduce stream TOC with the help of a convolution model based on flow and average riparian chemistry as input variables. There is a significant effect of wetting of the riparian soil that translates into a memory effect for subsequent episodes and thus contributes to controlling stream TOC concentrations. Situations with high flow introduce a large amount of variability into stream water TOC that may be related to memory effects, rapid groundwater fluctuations and other processes not identified so far. Two different climate scenarios for the region based on the IPCC scenarios were applied to the regression equation to test what effect the expected increase in precipitation and temperature and resulting changes in runoff would have on stream TOC concentrations, assuming that the soil conditions remain unchanged. Both scenarios resulted in a mean increase of stream TOC concentrations of between 1.5 and 2.5 mg L⁻¹ during the snow-free season, which amounts to approximately 15% more TOC export compared to present conditions. Wetter and warmer conditions in the late autumn led to a difference of monthly average TOC of up to 5 mg L⁻¹, suggesting that stream TOC may be particularly susceptible to climate variability during this season.
Application of logistic regression to case-control association studies involving two causative loci.
North, Bernard V; Curtis, David; Sham, Pak C
2005-01-01
Models in which two susceptibility loci jointly influence the risk of developing disease can be explored using logistic regression analysis. Comparison of likelihoods of models incorporating different sets of disease model parameters allows inferences to be drawn regarding the nature of the joint effect of the loci. We have simulated case-control samples generated assuming different two-locus models and then analysed them using logistic regression. We show that this method is practicable and that, for the models we have used, it can be expected to allow useful inferences to be drawn from sample sizes consisting of hundreds of subjects. Interactions between loci can be explored, but interactive effects do not exactly correspond with classical definitions of epistasis. We have particularly examined the issue of the extent to which it is helpful to utilise information from a previously identified locus when investigating a second, unknown locus. We show that for some models conditional analysis can have substantially greater power while for others unconditional analysis can be more powerful. Hence we conclude that in general both conditional and unconditional analyses should be performed when searching for additional loci.
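The core comparison, a joint two-locus logistic model with and without an interaction term judged by a likelihood-ratio test, can be sketched in a few lines; genotype frequencies and effect sizes below are arbitrary simulation settings, not the authors':

```python
# Sketch: two-locus logistic regression with a likelihood-ratio test
# for interaction between loci.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(13)
n = 800
g1 = rng.binomial(2, 0.3, n)                 # minor-allele counts, locus 1
g2 = rng.binomial(2, 0.2, n)                 # minor-allele counts, locus 2
logit = -1.0 + 0.4 * g1 + 0.3 * g2 + 0.35 * g1 * g2   # interacting loci
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(float)

X_main = sm.add_constant(np.column_stack([g1, g2]))
X_int = sm.add_constant(np.column_stack([g1, g2, g1 * g2]))
fit_main = sm.Logit(y, X_main).fit(disp=0)
fit_int = sm.Logit(y, X_int).fit(disp=0)

lr = 2 * (fit_int.llf - fit_main.llf)        # likelihood-ratio statistic
print("LRT p-value for interaction:", chi2.sf(lr, df=1))
```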
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map land cover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, land cover, rainfall precipitation, and normalized difference vegetation index (NDVI), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of applying the logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross-application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest (90%) prediction accuracy, whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%). Qualitatively, the cross-application model yields reasonable results which can be used for preliminary landslide hazard mapping.
Modeling vertebrate diversity in Oregon using satellite imagery
NASA Astrophysics Data System (ADS)
Cablk, Mary Elizabeth
Vertebrate diversity was modeled for the state of Oregon using a parametric approach to regression tree analysis. This exploratory data analysis effectively modeled the non-linear relationships between vertebrate richness and phenology, terrain, and climate. Phenology was derived from time-series NOAA-AVHRR satellite imagery for the year 1992 using two methods: principal component analysis and derivation of EROS Data Center greenness metrics. These two measures of spatial and temporal vegetation condition incorporated the critical temporal element in this analysis. The first three principal components were shown to contain spatial and temporal information about the landscape and discriminated phenologically distinct regions in Oregon. Principal components 2 and 3, six greenness metrics, elevation, slope, aspect, annual precipitation, and annual seasonal temperature difference were investigated as correlates to amphibians, birds, all vertebrates, reptiles, and mammals. The variation explained by the regression tree for each taxon was: amphibians (91%), birds (67%), all vertebrates (66%), reptiles (57%), and mammals (55%). Spatial statistics were used to quantify the pattern of each taxon and assess the validity of the resulting predictions from the regression tree models. Regression tree analysis was relatively robust against spatial autocorrelation in the response data, and graphical results indicated the models were well fit to the data.
Estimating irrigation water use in the humid eastern United States
Levin, Sara B.; Zarriello, Phillip J.
2013-01-01
Accurate accounting of irrigation water use is an important part of the U.S. Geological Survey National Water-Use Information Program and the WaterSMART initiative to help maintain sustainable water resources in the Nation. Irrigation water use in the humid eastern United States is not well characterized because of inadequate reporting and wide variability associated with climate, soils, crops, and farming practices. To better understand irrigation water use in the eastern United States, two types of predictive models were developed and compared by using metered irrigation water-use data for corn, cotton, peanut, and soybean crops in Georgia and turf farms in Rhode Island. Reliable metered irrigation data were limited to these areas. The first predictive model that was developed uses logistic regression to predict the occurrence of irrigation on the basis of antecedent climate conditions. Logistic regression equations were developed for corn, cotton, peanut, and soybean crops by using weekly irrigation water-use data from 36 metered sites in Georgia in 2009 and 2010 and turf farms in Rhode Island from 2000 to 2004. For the weeks when irrigation was predicted to take place, the irrigation water-use volume was estimated by multiplying the average metered irrigation application rate by the irrigated acreage for a given crop. The second predictive model that was developed is a crop-water-demand model that uses a daily soil water balance to estimate the water needs of a crop on a given day based on climate, soil, and plant properties. Crop-water-demand models were developed independently of reported irrigation water-use practices and relied on knowledge of plant properties that are available in the literature. Both modeling approaches require accurate accounting of irrigated area and crop type to estimate total irrigation water use. Water-use estimates from both modeling methods were compared to the metered irrigation data from Rhode Island and Georgia that were used to develop the models as well as two independent validation datasets from Georgia and Virginia that were not used in model development. Irrigation water-use estimates from the logistic regression method more closely matched mean reported irrigation rates than estimates from the crop-water-demand model when compared to the irrigation data used to develop the equations. The root mean squared errors (RMSEs) for the logistic regression estimates of mean annual irrigation ranged from 0.3 to 2.0 inches (in.) for the five crop types; RMSEs for the crop-water-demand models ranged from 1.4 to 3.9 in. However, when the models were applied and compared to the independent validation datasets from southwest Georgia from 2010, and from Virginia from 1999 to 2007, the crop-water-demand model estimates were as good as or better at predicting the mean irrigation volume than the logistic regression models for most crop types. RMSEs for logistic regression estimates of mean annual irrigation ranged from 1.0 to 7.0 in. for validation data from Georgia and from 1.8 to 4.9 in. for validation data from Virginia; RMSEs for crop-water-demand model estimates ranged from 2.1 to 5.8 in. for Georgia data and from 2.0 to 3.9 in. for Virginia data. In general, regression-based models performed better in areas that had quality daily or weekly irrigation data from which the regression equations were developed; however, the regression models were less reliable than the crop-water-demand models when applied outside the area for which they were developed. 
In most eastern coastal states that do not have quality irrigation data, the crop-water-demand model can be used more reliably. The development of predictive models of irrigation water use in this study was hindered by a lack of quality irrigation data. Many mid-Atlantic and New England states do not require irrigation water use to be reported. A survey of irrigation data from 14 eastern coastal states from Maine to Georgia indicated that, with the exception of the data in Georgia, irrigation data in the states that do require reporting commonly did not contain requisite ancillary information such as irrigated area or crop type, lacked precision, or were at an aggregated temporal scale making them unsuitable for use in the development of predictive models. Confidence in the reliability of either modeling method is affected by uncertainty in the reported data from which the models were developed or validated. Only through additional collection of quality data and further study can the accuracy and uncertainty of irrigation water-use estimates be improved in the humid eastern United States.
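The crop-water-demand idea can be reduced to a daily bookkeeping loop: accumulate crop evapotranspiration minus rain in a root-zone deficit and trigger irrigation when the deficit exceeds an allowable depletion. All parameter values in this sketch (crop coefficient, available water capacity, depletion fraction) are illustrative, not the report's calibrated values.

```python
# Sketch: daily soil-water-balance crop-water-demand model.
import numpy as np

rng = np.random.default_rng(17)
days = 120
rain = rng.choice([0.0, 0.0, 0.0, 0.4], size=days)        # inches/day
et0 = np.clip(rng.normal(0.18, 0.03, days), 0.05, None)   # reference ET, in/day
kc = 0.9                                                  # crop coefficient
awc = 2.0                                                 # available water, inches
mad = 0.5                                                 # allowable depletion

deficit, irrigation = 0.0, 0.0
for d in range(days):
    deficit += kc * et0[d] - rain[d]      # crop ET draws down the root zone
    deficit = max(deficit, 0.0)           # excess rain runs off or drains
    if deficit > mad * awc:
        irrigation += deficit             # refill to field capacity
        deficit = 0.0
print(f"seasonal irrigation demand: {irrigation:.1f} inches")
```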
Bayesian Regression of Thermodynamic Models of Redox Active Materials
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnston, Katherine
Finding a suitable functional redox material is a critical challenge to achieving scalable, economically viable technologies for storing concentrated solar energy in the form of a defected oxide. Demonstrating effectiveness for thermal storage or solar fuel is largely accomplished by using a thermodynamic model derived from experimental data. The purpose of this project is to test the accuracy of our regression model on representative data sets. Determining the accuracy of the model includes parameter fitting the model to the data, comparing models using different numbers of parameters, and analyzing the entropy and enthalpy calculated from the model. Three data sets were considered in this project: two demonstrating materials for solar fuels by water splitting and one of a material for thermal storage. Using Bayesian inference and Markov chain Monte Carlo (MCMC), parameter estimation was performed on the three data sets. Good results were achieved, except for some deviations at the edges of the data input ranges. The evidence values were then calculated in a variety of ways and used to compare models with different numbers of parameters. It was believed that at least one of the parameters was unnecessary, and comparing evidence values demonstrated that the parameter was needed on one data set and not significantly helpful on another. The entropy was calculated by taking the derivative in one variable and integrating over another, and its uncertainty was calculated by evaluating the entropy over multiple MCMC samples. Afterwards, all the parts were written up as a tutorial for the Uncertainty Quantification Toolkit (UQTk).
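The project used Bayesian inference via the UQTk toolkit; as a generic illustration of the same workflow, the sketch below fits a toy two-parameter model to synthetic data with a plain random-walk Metropolis sampler under a Gaussian likelihood. The model form, noise level, and proposal scales are all assumptions.

```python
# Sketch: random-walk Metropolis for a toy two-parameter thermodynamic fit.
import numpy as np

rng = np.random.default_rng(14)
T = np.linspace(1000, 1500, 30)                  # temperature grid, K
true = np.array([8.0, 0.004])

def model(theta, T):
    return theta[0] * np.exp(-theta[1] * T)      # toy nonstoichiometry model

y = model(true, T) + rng.normal(0, 0.01, T.size)

def log_post(theta):
    if np.any(theta <= 0):
        return -np.inf                           # positivity prior
    resid = y - model(theta, T)
    return -0.5 * np.sum(resid ** 2) / 0.01 ** 2

theta = np.array([5.0, 0.003])
lp = log_post(theta)
samples = []
for _ in range(20000):
    prop = theta + rng.normal(0, [0.05, 0.00005])
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:      # Metropolis accept step
        theta, lp = prop, lp_prop
    samples.append(theta)
samples = np.array(samples)[5000:]               # drop burn-in
print("posterior means:", samples.mean(axis=0))
print("posterior std devs:", samples.std(axis=0))
```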
Factors associated with independent pharmacy owners' satisfaction with Medicare Part D contracts.
Zhang, Su; Doucette, William R; Urmie, Julie M; Xie, Yang; Brooks, John M
2010-06-01
As Medicare Part D contracts apply pressure on the profitability of independent pharmacies, there is concern about their owners' willingness to sign such contracts. Identifying factors affecting independent pharmacy owners' satisfaction with Medicare Part D contracts could inform policy makers in managing Medicare Part D. (1) To identify influences on independent pharmacy owners' satisfaction with Medicare Part D contracts and (2) to characterize comments made by independent pharmacy owners about Medicare Part D. This cross-sectional study used a mail survey of independent pharmacy owners in 15 states comprising 6 Medicare regions to collect information on their most- and least-favorable Medicare Part D contracts, including satisfaction, contract management activities, market position, pharmacy operation, and specific payment levels on brand and generic drugs. Of the 1649 surveys mailed, 296 surveys were analyzed. The regression models for satisfaction with both the least- and the most-favorable Part D contracts were significant (P<0.05). A different set of significant influences on satisfaction was identified for each regression model. For the most-favorable contract, the influences were contending and equity. For the least-favorable contract, the influences were negotiation, equity, generic rate bonus, and medication therapy management (MTM) payment. About one-third of the survey respondents made at least 1 comment. The most frequent themes in the comments were that the Medicare Part D reimbursement rate is too low (28%) and that contracts are offered without negotiation in a "take it or leave it" manner (20%). Equity, contending, negotiation, generic rate bonus, and MTM payments were identified as influences on independent pharmacy owners' satisfaction with Medicare Part D contracts. Generic rate bonus and MTM payment provide additional financial incentives to less financially favorable contracts and, in turn, contribute to independent pharmacy owners' satisfaction with these contracts. Copyright 2010 Elsevier Inc. All rights reserved.
Wu, Chung-Hsuen; Erickson, Steven R
2012-09-01
The purpose of this study was to evaluate the association between asthma status and the occurrence and length of work absences among US working adults. A cross-sectional study was conducted using the 2008 Medical Expenditure Panel Survey (MEPS). Employed respondents between ages 18 and 55 years were included. The association between asthma status (whether respondents have asthma or not) and the occurrence of absences and the length of time per absence was evaluated using a two-part model. A multivariate logistic regression, the first part of the model, estimated the probability of being absent from work at least once during the observation period as a function of asthma status. A multivariate negative binomial regression, the second part of the model, was used to assess whether the length of each absence from work was associated with asthma status among respondents who reported at least one absence from work. Sociodemographic, socioeconomic, employment-related, health status, and comorbidity variables were included in each model as covariates. Of 12,161 respondents, 8.2% reported having asthma, which accounted for 10.4 million working adults in the United States in 2008. Employed adults with asthma were more likely to report having at least one absence from work compared to those without asthma in bivariate analyses (26.2% vs. 16.2%, p < .01). After adjusting for the number of comorbid chronic conditions and other covariates, asthma was not significantly associated with absenteeism among respondents (odds ratio (OR) = 1.31, 95% confidence interval (CI) = 0.99-1.72; rate ratio (RR) = 1.25, 95% CI = 0.91-1.72). Overall burden of illness as measured by comorbidity indices and perceived health status, but not asthma alone, contributes to absenteeism as well as to the number of days off during each occurrence among employed people. It is important for health services researchers to consider overall burden of illness when examining the association between a general outcome such as absence from work and specific conditions such as asthma.
NASA Astrophysics Data System (ADS)
Ibrahim, Elsy; Kim, Wonkook; Crawford, Melba; Monbaliu, Jaak
2017-02-01
Remote sensing has been successfully utilized to distinguish and quantify sediment properties in the intertidal environment. Classification approaches to imagery are popular and powerful yet can lead to site- and case-specific results. Such specificity creates challenges for temporal studies. Thus, this paper investigates the use of regression models to quantify sediment properties instead of classifying them. Two regression approaches, namely multiple regression (MR) and support vector regression (SVR), are used in this study for the retrieval of bio-physical variables of intertidal surface sediment of the IJzermonding, a Belgian nature reserve. In the regression analysis, mud content, chlorophyll a concentration, organic matter content, and soil moisture are estimated using radiometric variables of two airborne sensors, namely the airborne hyperspectral sensor (AHS) and the airborne prism experiment (APEX), and using field hyperspectral acquisitions by an analytical spectral device (ASD). The performance of the two regression approaches is best for the estimation of moisture content. SVR attains the highest accuracy without feature reduction, while MR achieves good results when feature reduction is carried out. Sediment property maps are successfully obtained using the models and hyperspectral imagery, where SVR used with all bands achieves the best performance. The study also involves the extraction of weights identifying the contribution of each band of the images in the quantification of each sediment property when MR and principal component analysis are used.
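The MR-versus-SVR comparison can be sketched with scikit-learn as follows; the band matrix and moisture values are simulated stand-ins for the AHS/APEX radiometric variables, and PCA stands in for the study's feature-reduction step:

```python
# Compare multiple regression (with feature reduction) against SVR (all
# bands); data are synthetic placeholders, not the IJzermonding acquisitions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
bands = rng.normal(size=(200, 60))                       # 60 spectral bands
moisture = bands[:, :5].sum(axis=1) + rng.normal(0, 0.5, 200)

svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
mr = make_pipeline(StandardScaler(), PCA(n_components=5), LinearRegression())

for name, model in [("SVR, all bands", svr), ("MR + reduction", mr)]:
    r2 = cross_val_score(model, bands, moisture, cv=5, scoring="r2").mean()
    print(name, round(r2, 3))
```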
Carbon dioxide stripping in aquaculture -- part III: model verification
Colt, John; Watten, Barnaby; Pfeiffer, Tim
2012-01-01
Based on conventional mass transfer models developed for oxygen, the non-linear ASCE method, the 2-point method, and a one-parameter linear-regression method were evaluated for carbon dioxide stripping data. For values of KLaCO2 < approximately 1.5/h, the 2-point and ASCE methods fit the experimental data well, but the fit breaks down at higher values of KLaCO2. How to correct KLaCO2 for gas-phase enrichment remains to be determined. The one-parameter linear-regression model was used to vary C*CO2 over the test, but it did not result in a better fit to the experimental data when compared to the ASCE or fixed-C*CO2 assumptions.
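For illustration, the nonlinear fit at the heart of the ASCE-style method can be sketched with scipy's curve_fit applied to the standard first-order mass-transfer model C(t) = C* - (C* - C0)exp(-KLa t); the times and concentrations below are made-up stand-ins, not the paper's data:

```python
# Nonlinear estimation of KLa, C*, and C0 from a stripping test;
# the data points are illustrative placeholders.
import numpy as np
from scipy.optimize import curve_fit

def stripping(t, kla, c_star, c0):
    return c_star - (c_star - c0) * np.exp(-kla * t)

t = np.array([0.0, 0.25, 0.5, 1.0, 2.0, 4.0])        # h
c = np.array([25.0, 20.1, 16.5, 11.4, 6.3, 3.1])     # mg/L CO2

params, _ = curve_fit(stripping, t, c, p0=[1.0, 2.0, 25.0])
kla, c_star, c0 = params
print(f"KLa = {kla:.2f} 1/h, C* = {c_star:.2f} mg/L")
```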
ERIC Educational Resources Information Center
Kane, Michael T.; Mroch, Andrew A.
2010-01-01
In evaluating the relationship between two measures across different groups (i.e., in evaluating "differential validity") it is necessary to examine differences in correlation coefficients and in regression lines. Ordinary least squares (OLS) regression is the standard method for fitting lines to data, but its criterion for optimal fit…
NASA Astrophysics Data System (ADS)
Durmaz, Murat; Karslioglu, Mahmut Onur
2015-04-01
There are various global and regional methods that have been proposed for the modeling of ionospheric vertical total electron content (VTEC). Global distribution of VTEC is usually modeled by spherical harmonic expansions, while tensor products of compactly supported univariate B-splines can be used for regional modeling. In these empirical parametric models, the coefficients of the basis functions as well as differential code biases (DCBs) of satellites and receivers can be treated as unknown parameters which can be estimated from geometry-free linear combinations of global positioning system observables. In this work we propose a new semi-parametric multivariate adaptive regression B-splines (SP-BMARS) method for the regional modeling of VTEC together with satellite and receiver DCBs, where the parametric part of the model is related to the DCBs as fixed parameters and the non-parametric part adaptively models the spatio-temporal distribution of VTEC. The latter is based on multivariate adaptive regression B-splines which is a non-parametric modeling technique making use of compactly supported B-spline basis functions that are generated from the observations automatically. This algorithm takes advantage of an adaptive scale-by-scale model building strategy that searches for best-fitting B-splines to the data at each scale. The VTEC maps generated from the proposed method are compared numerically and visually with the global ionosphere maps (GIMs) which are provided by the Center for Orbit Determination in Europe (CODE). The VTEC values from SP-BMARS and CODE GIMs are also compared with VTEC values obtained through calibration using local ionospheric model. The estimated satellite and receiver DCBs from the SP-BMARS model are compared with the CODE distributed DCBs. The results show that the SP-BMARS algorithm can be used to estimate satellite and receiver DCBs while adaptively and flexibly modeling the daily regional VTEC.
The measurement of linear frequency drift in oscillators
NASA Astrophysics Data System (ADS)
Barnes, J. A.
1985-04-01
A linear drift in frequency is an important element in most stochastic models of oscillator performance. Quartz crystal oscillators often have drifts in excess of a part in ten to the tenth power per day. Even commercial cesium beam devices often show drifts of a few parts in ten to the thirteenth per year. There are many ways to estimate the drift rate from data samples (e.g., regress the phase on a quadratic; regress the frequency on a linear function; compute the simple mean of the first difference of frequency; use Kalman filters with a drift term as one element in the state vector; and others). Although most of these estimators are unbiased, they vary in efficiency (i.e., in confidence intervals). Further, the estimation of confidence intervals using the standard analysis of variance (typically associated with the specific estimating technique) can give amazingly optimistic results. The source of these problems is not an error in, say, the regression techniques, but rather the problems arise from correlations within the residuals. That is, the oscillator model is often not consistent with constraints on the analysis technique or, in other words, some specific analysis techniques are often inappropriate for the task at hand. The appropriateness of a specific analysis technique is critically dependent on the oscillator model and can often be checked with a simple whiteness test on the residuals.
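Two of the estimators listed above are easy to sketch on synthetic fractional-frequency data; the drift and noise levels are illustrative only:

```python
# Estimate a linear frequency drift two ways on simulated data.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(365.0)                       # days
true_drift = 1e-13                         # per day
y = true_drift * t + rng.normal(0, 5e-13, t.size)   # fractional frequency

slope = np.polyfit(t, y, 1)[0]             # regress frequency on time
mean_diff = np.mean(np.diff(y))            # mean of first differences

print(slope, mean_diff)  # both unbiased; efficiency differs, and naive
                         # confidence intervals mislead if residuals correlate
```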
Two-dimensional advective transport in ground-water flow parameter estimation
Anderman, E.R.; Hill, M.C.; Poeter, E.P.
1996-01-01
Nonlinear regression is useful in ground-water flow parameter estimation, but problems of parameter insensitivity and correlation often exist given commonly available hydraulic-head and head-dependent flow (for example, stream and lake gain or loss) observations. To address this problem, advective-transport observations are added to the ground-water-flow parameter-estimation model MODFLOWP using particle-tracking methods. The resulting model is used to investigate the importance of advective-transport observations relative to head-dependent flow observations when either or both are used in conjunction with hydraulic-head observations in a simulation of the sewage-discharge plume at Otis Air Force Base, Cape Cod, Massachusetts, USA. The analysis procedure for evaluating the probable effect of new observations on the regression results consists of two steps: (1) parameter sensitivities and correlations calculated at initial parameter values are used to assess the model parameterization and expected relative contributions of different types of observations to the regression; and (2) optimal parameter values are estimated by nonlinear regression and evaluated. In the Cape Cod parameter-estimation model, advective-transport observations did not significantly increase the overall parameter sensitivity; however: (1) inclusion of advective-transport observations decreased parameter correlation enough for more unique parameter values to be estimated by the regression; (2) realistic uncertainties in advective-transport observations had a small effect on parameter estimates relative to the precision with which the parameters were estimated; and (3) the regression results and sensitivity analysis provided insight into the dynamics of the ground-water flow system, especially the importance of accurate boundary conditions. In this work, advective-transport observations improved the calibration of the model and the estimation of ground-water flow parameters, and use of regression and related techniques produced significant insight into the physical system.
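Step (1) of this procedure can be sketched generically: given a hypothetical weighted sensitivity (Jacobian) matrix J, the linearized parameter covariance is (J'WJ)^-1, from which parameter correlations follow. The matrix below is random, purely for illustration:

```python
# Parameter correlations from weighted sensitivities at initial values.
import numpy as np

def parameter_correlations(J, w):
    """Correlation matrix of parameter estimates from (J^T W J)^-1."""
    JtWJ = J.T @ (w[:, None] * J)
    cov = np.linalg.inv(JtWJ)          # linearized parameter covariance
    s = np.sqrt(np.diag(cov))
    return cov / np.outer(s, s)

J = np.random.default_rng(3).normal(size=(50, 4))   # 50 obs, 4 parameters
w = np.ones(50)                                     # observation weights
print(parameter_correlations(J, w).round(2))        # near +/-1 => poorly resolved
```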
Time series modeling by a regression approach based on a latent process.
Chamroukhi, Faicel; Samé, Allou; Govaert, Gérard; Aknin, Patrice
2009-01-01
Time series are used in many domains, including finance, engineering, economics and bioinformatics, generally to represent the change of a measurement over time. Modeling techniques may then be used to give a synthetic representation of such data. A new approach for time series modeling is proposed in this paper. It consists of a regression model incorporating a discrete hidden logistic process that allows different polynomial regression models to be activated smoothly or abruptly. The model parameters are estimated by the maximum likelihood method via a dedicated Expectation-Maximization (EM) algorithm. The M-step of the EM algorithm uses a multi-class Iterative Reweighted Least Squares (IRLS) algorithm to estimate the hidden process parameters. To evaluate the proposed approach, an experimental study on simulated data and real-world data was performed using two alternative approaches: a heteroskedastic piecewise regression model using a global optimization algorithm based on dynamic programming, and a Hidden Markov Regression Model whose parameters are estimated by the Baum-Welch algorithm. Finally, in the context of the remote monitoring of components of the French railway infrastructure, and more particularly the switch mechanism, the proposed approach has been applied to modeling and classifying time series representing the condition measurements acquired during switch operations.
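A deliberately simplified, hard-assignment caricature of the idea is sketched below (the paper uses a soft EM with an IRLS M-step and a multinomial logistic process; here a two-regime logistic gate, a fixed noise variance, and synthetic data stand in):

```python
# Hard-EM caricature: alternate between fitting per-regime polynomial
# regressions plus a logistic gate over time, and reassigning points.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 300)
y = np.where(t < 0.5, 1 + 2 * t, 4 - 3 * t) + rng.normal(0, 0.1, t.size)

X = np.vander(t, 3)                          # quadratic polynomial basis
labels = (t > np.median(t)).astype(int)      # crude initial segmentation
sigma2 = 0.1 ** 2                            # assumed (known) noise variance

for _ in range(20):
    # M-step: least squares per regime; logistic gate over time
    betas = [np.linalg.lstsq(X[labels == k], y[labels == k], rcond=None)[0]
             for k in (0, 1)]
    gate = LogisticRegression().fit(t[:, None], labels)
    # E-step (hard): assign each point to the regime that best explains it
    sq_resid = np.stack([(y - X @ b) ** 2 for b in betas], axis=1)
    labels = np.argmax(gate.predict_log_proba(t[:, None])
                       - sq_resid / (2 * sigma2), axis=1)

print(gate.coef_, [b.round(2) for b in betas])
```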
Multivariate Analysis of Seismic Field Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Alam, M. Kathleen
1999-06-01
This report includes the details of the model building procedure and prediction of seismic field data. Principal Components Regression, a multivariate analysis technique, was used to model seismic data collected as two pieces of equipment were cycled on and off. Models built that included only the two pieces of equipment of interest had trouble predicting data containing signals not included in the model. Evidence for poor predictions came from the prediction curves as well as spectral F-ratio plots. Once the extraneous signals were included in the model, predictions improved dramatically. While Principal Components Regression performed well for the present data sets, the present data analysis suggests further work will be needed to develop more robust modeling methods as the data become more complex.
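Principal Components Regression itself is compactly expressed as a PCA-then-least-squares pipeline; the spectra and target below are simulated placeholders, since the seismic data are not public:

```python
# Principal Components Regression sketch with scikit-learn.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
spectra = rng.normal(size=(120, 300))          # 120 records x 300 channels
target = spectra[:, 10] - 0.5 * spectra[:, 200] + rng.normal(0, 0.1, 120)

pcr = make_pipeline(PCA(n_components=8), LinearRegression())
pcr.fit(spectra, target)
print(pcr.score(spectra, target))
# Predictions degrade for signals absent from the training data, which is
# the report's motivation for adding the extraneous sources to the model.
```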
EFFECTS OF LANDSCAPE CHARACTERISTICS ON LAND-COVER CLASS ACCURACY
Utilizing land-cover data gathered as part of the National Land-Cover Data (NLCD) set accuracy assessment, several logistic regression models were formulated to analyze the effects of patch size and land-cover heterogeneity on classification accuracy. Specific land-cover ...
Christensen, A L; Lundbye-Christensen, S; Dethlefsen, C
2011-12-01
Several statistical methods of assessing seasonal variation are available. Brookhart and Rothman [3] proposed a second-order moment-based estimator based on the geometrical model derived by Edwards [1], and reported that this estimator is superior in estimating the peak-to-trough ratio of seasonal variation compared with Edwards' estimator with respect to bias and mean squared error. Alternatively, seasonal variation may be modelled using a Poisson regression model, which provides flexibility in modelling the pattern of seasonal variation and allows adjustment for covariates. Based on a Monte Carlo simulation study, three estimators, one based on the geometrical model and two based on log-linear Poisson regression models, were evaluated with regard to bias and standard deviation (SD). We evaluated the estimators on data simulated according to schemes varying in seasonal variation and presence of a secular trend. All methods and analyses in this paper are available in the R package Peak2Trough [13]. Applying a Poisson regression model resulted in lower absolute bias and SD for data simulated according to the corresponding model assumptions. Poisson regression models also had lower bias and SD than the geometrical model for data simulated to deviate from the corresponding model assumptions. This simulation study encourages the use of Poisson regression models in estimating the peak-to-trough ratio of seasonal variation as opposed to the geometrical model. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
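A log-linear Poisson model of seasonality with one harmonic can be sketched as follows; with coefficients b1 and b2 on the cosine and sine terms, the peak-to-trough ratio is exp(2*sqrt(b1^2 + b2^2)). The counts are simulated:

```python
# Poisson regression estimator of the peak-to-trough ratio.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
month = np.tile(np.arange(12), 10)                    # 10 years of months
mu = np.exp(3.0 + 0.3 * np.cos(2 * np.pi * month / 12))
counts = rng.poisson(mu)

X = sm.add_constant(np.column_stack([np.cos(2 * np.pi * month / 12),
                                     np.sin(2 * np.pi * month / 12)]))
fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
b1, b2 = fit.params[1], fit.params[2]
# peak mu = exp(b0 + A), trough mu = exp(b0 - A), A = sqrt(b1^2 + b2^2)
print("peak-to-trough ratio:", np.exp(2 * np.hypot(b1, b2)))  # truth: exp(0.6)
```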
Detecting DIF in Polytomous Items Using MACS, IRT and Ordinal Logistic Regression
ERIC Educational Resources Information Center
Elosua, Paula; Wells, Craig
2013-01-01
The purpose of the present study was to compare the Type I error rate and power of two model-based procedures, the mean and covariance structure model (MACS) and the item response theory (IRT), and an observed-score based procedure, ordinal logistic regression, for detecting differential item functioning (DIF) in polytomous items. A simulation…
ERIC Educational Resources Information Center
Bloom, Allan M.; And Others
In response to the increasing importance of student performance in required classes, research was conducted to compare two prediction procedures, linear modeling using multiple regression and nonlinear modeling using AID3. Performance in the first college math course (College Mathematics, Calculus, or Business Calculus Matrices) was the dependent…
Tiedeman, C.R.; Kernodle, J.M.; McAda, D.P.
1998-01-01
This report documents the application of nonlinear-regression methods to a numerical model of ground-water flow in the Albuquerque Basin, New Mexico. In the Albuquerque Basin, ground water is the primary source for most water uses. Ground-water withdrawal has steadily increased since the 1940's, resulting in large declines in water levels in the Albuquerque area. A ground-water flow model was developed in 1994 and revised and updated in 1995 for the purpose of managing basin ground- water resources. In the work presented here, nonlinear-regression methods were applied to a modified version of the previous flow model. Goals of this work were to use regression methods to calibrate the model with each of six different configurations of the basin subsurface and to assess and compare optimal parameter estimates, model fit, and model error among the resulting calibrations. The Albuquerque Basin is one in a series of north trending structural basins within the Rio Grande Rift, a region of Cenozoic crustal extension. Mountains, uplifts, and fault zones bound the basin, and rock units within the basin include pre-Santa Fe Group deposits, Tertiary Santa Fe Group basin fill, and post-Santa Fe Group volcanics and sediments. The Santa Fe Group is greater than 14,000 feet (ft) thick in the central part of the basin. During deposition of the Santa Fe Group, crustal extension resulted in development of north trending normal faults with vertical displacements of as much as 30,000 ft. Ground-water flow in the Albuquerque Basin occurs primarily in the Santa Fe Group and post-Santa Fe Group deposits. Water flows between the ground-water system and surface-water bodies in the inner valley of the basin, where the Rio Grande, a network of interconnected canals and drains, and Cochiti Reservoir are located. Recharge to the ground-water flow system occurs as infiltration of precipitation along mountain fronts and infiltration of stream water along tributaries to the Rio Grande; subsurface flow from adjacent regions; irrigation and septic field seepage; and leakage through the Rio Grande, canal, and Cochiti Reservoir beds. Ground water is discharged from the basin by withdrawal; evapotranspiration; subsurface flow; and flow to the Rio Grande, canals, and drains. The transient, three-dimensional numerical model of ground-water flow to which nonlinear-regression methods were applied simulates flow in the Albuquerque Basin from 1900 to March 1995. Six different basin subsurface configurations are considered in the model. These configurations are designed to test the effects of (1) varying the simulated basin thickness, (2) including a hypothesized hydrogeologic unit with large hydraulic conductivity in the western part of the basin (the west basin high-K zone), and (3) substantially lowering the simulated hydraulic conductivity of a fault in the western part of the basin (the low-K fault zone). The model with each of the subsurface configurations was calibrated using a nonlinear least- squares regression technique. The calibration data set includes 802 hydraulic-head measurements that provide broad spatial and temporal coverage of basin conditions, and one measurement of net flow from the Rio Grande and drains to the ground-water system in the Albuquerque area. Data are weighted on the basis of estimates of the standard deviations of measurement errors. 
The 10 to 12 parameters to which the calibration data as a whole are generally most sensitive were estimated by nonlinear regression, whereas the remaining model parameter values were specified. Results of model calibration indicate that the optimal parameter estimates as a whole are most reasonable in calibrations of the model with configurations 3 (which contains 1,600-ft-thick basin deposits and the west basin high-K zone) and 4 (which contains 5,000-ft-thick basin deposits).
Complex Environmental Data Modelling Using Adaptive General Regression Neural Networks
NASA Astrophysics Data System (ADS)
Kanevski, Mikhail
2015-04-01
The research deals with an adaptation and application of Adaptive General Regression Neural Networks (GRNN) to high-dimensional environmental data. GRNN [1,2,3] are efficient modelling tools for both spatial and temporal data and are based on nonparametric kernel methods closely related to the classical Nadaraya-Watson estimator. Adaptive GRNN, using anisotropic kernels, can also be applied to feature selection tasks when working with high-dimensional data [1,3]. In the present research, Adaptive GRNN are used to study geospatial data predictability and relevant feature selection using both simulated and real data case studies. The original raw data were either three-dimensional monthly precipitation data or monthly wind speeds embedded into a 13-dimensional space constructed from geographical coordinates and geo-features calculated from a digital elevation model. GRNN were applied in two different ways: 1) adaptive GRNN with the resulting list of features ordered according to their relevancy; and 2) adaptive GRNN applied to evaluate all possible models N [in the case of wind fields, N = (2^13 - 1) = 8191] and rank them according to the cross-validation error. In both cases, training was carried out using a leave-one-out procedure. An important result of the study is that the set of the most relevant features depends on the month (strong seasonal effect) and year. The predictabilities of precipitation and wind field patterns, estimated using the cross-validation and testing errors of raw and shuffled data, were studied in detail. The results of both approaches were qualitatively and quantitatively compared. In conclusion, Adaptive GRNN, with their ability to select features and efficiently model complex high-dimensional data, can be widely used in automatic/on-line mapping and as an integrated part of environmental decision support systems. 1. Kanevski M., Pozdnoukhov A., Timonin V. Machine Learning for Spatial Environmental Data. Theory, applications and software. EPFL Press. With a CD: data, software, guides. (2009). 2. Kanevski M. Spatial Predictions of Soil Contamination Using General Regression Neural Networks. Systems Research and Information Systems, Volume 8, number 4, 1999. 3. Robert S., Foresti L., Kanevski M. Spatial prediction of monthly wind speeds in complex terrain with adaptive general regression neural networks. International Journal of Climatology, 33, pp. 1793-1804, 2013.
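The GRNN core is the Nadaraya-Watson estimator; a bare-bones anisotropic version is sketched below on synthetic data, where giving an irrelevant feature a large bandwidth effectively removes it from the kernel (the feature-relevance idea above):

```python
# Nadaraya-Watson / GRNN prediction with per-feature (anisotropic) bandwidths.
import numpy as np

def grnn_predict(Xtr, ytr, Xte, bandwidths):
    d = (Xte[:, None, :] - Xtr[None, :, :]) / bandwidths   # scaled differences
    w = np.exp(-0.5 * np.sum(d ** 2, axis=2))              # Gaussian kernel
    return (w @ ytr) / w.sum(axis=1)

rng = np.random.default_rng(6)
X = rng.uniform(size=(200, 2))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, 200)  # feature 2 irrelevant

# Large bandwidth on feature 2 makes it drop out of the kernel distance.
pred = grnn_predict(X, y, X, bandwidths=np.array([0.05, 10.0]))
print(np.mean((pred - y) ** 2))
```

In the study proper the bandwidths are tuned by leave-one-out cross-validation rather than set by hand as here.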
David R. Weise; Eunmo Koo; Xiangyang Zhou; Shankar Mahalingam; Frédéric Morandini; Jacques-Henri Balbi
2016-01-01
Fire behaviour data from 240 laboratory fires in high-density live chaparral fuel beds were compared with model predictions. Logistic regression was used to develop a model to predict fire spread success in the fuel beds and linear regression was used to predict rate of spread. Predictions from the Rothermel equation and three proposed changes as well as two physically...
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p > n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p, small n' setting is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using the L1-penalized regression models. And we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
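The flavor of L1-penalized shrinkage in the 'large p, small n' setting can be sketched with scikit-learn; the expression matrix is simulated and the regularization strength is illustrative:

```python
# L1-penalized logistic regression shrinks most gene coefficients to zero,
# guarding against the chance classifiers the abstract warns about.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n, p = 40, 2000                                   # large p, small n
expr = rng.normal(size=(n, p))
classes = (expr[:, :3].sum(axis=1) + rng.normal(0, 0.5, n) > 0).astype(int)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso.fit(expr, classes)
print("genes kept:", np.flatnonzero(lasso.coef_).size, "of", p)
```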
Estimation of elimination half-lives of organic chemicals in humans using gradient boosting machine.
Lu, Jing; Lu, Dong; Zhang, Xiaochen; Bi, Yi; Cheng, Keguang; Zheng, Mingyue; Luo, Xiaomin
2016-11-01
Elimination half-life is an important pharmacokinetic parameter that determines the exposure duration needed to approach the steady state of drugs and regulates drug administration. The experimental evaluation of half-life is time-consuming and costly. Thus, it is attractive to build an accurate prediction model for half-life. In this study, several machine learning methods, including gradient boosting machine (GBM), support vector regressions (RBF-SVR and Linear-SVR), local lazy regression (LLR), SA, SR, and GP, were employed to build high-quality prediction models. Two strategies for building consensus models were explored to improve the accuracy of prediction. Moreover, the applicability domains (ADs) of the models were determined by using a distance-based threshold. Among the seven individual models, GBM showed the best performance (R² = 0.820 and RMSE = 0.555 for the test set), and Linear-SVR produced the inferior prediction accuracy (R² = 0.738 and RMSE = 0.672). The use of distance-based ADs effectively determined the scope of the QSAR models. However, the consensus models combining the individual models could not improve the prediction performance. Some essential descriptors relevant to half-life were identified and analyzed. An accurate prediction model for elimination half-life was built by GBM, which was superior to the reference model (R² = 0.723 and RMSE = 0.698). Encouraged by the promising results, we expect that the GBM model for elimination half-life would have potential applications for early pharmacokinetic evaluations, and provide guidance for designing drug candidates with a favorable in vivo exposure profile. This article is part of a Special Issue entitled "System Genetics" Guest Editor: Dr. Yudong Cai and Dr. Tao Huang. Copyright © 2016 Elsevier B.V. All rights reserved.
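A GBM regression of this kind is straightforward to sketch with scikit-learn; descriptors, half-lives, and hyperparameters below are placeholders rather than the paper's settings:

```python
# Gradient boosting regression with held-out R2 and RMSE.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(8)
X = rng.normal(size=(500, 30))                               # descriptors
y = X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.3, 500)   # log half-life

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
gbm = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                max_depth=3).fit(X_tr, y_tr)
pred = gbm.predict(X_te)
print("R2 =", r2_score(y_te, pred),
      "RMSE =", mean_squared_error(y_te, pred) ** 0.5)
```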
Prediction of coagulation and flocculation processes using ANN models and fuzzy regression.
Zangooei, Hossein; Delnavaz, Mohammad; Asadollahfardi, Gholamreza
2016-09-01
Coagulation and flocculation are two main processes used to aggregate colloidal particles into larger particles and are two main stages of primary water treatment. Coagulation and flocculation processes are only needed when colloidal particles are a significant part of the total suspended solid fraction. Our objective was to predict the turbidity of water after the coagulation and flocculation process when other parameters, such as the type and concentration of coagulant, pH, and influent turbidity of the raw water, were known. We used a multilayer perceptron (MLP), a radial basis function (RBF) artificial neural network (ANN), and various kinds of fuzzy regression analysis to predict turbidity after the coagulation and flocculation processes. The coagulant used in the pilot plant, which was located in a water treatment plant, was poly aluminum chloride. We used existing data, including the type and concentration of coagulant, pH and influent turbidity of the raw water, because these data were available from the pilot plant for simulation; the data were collected by the Tehran water authority. The results indicated that the ANNs had more ability than fuzzy regression analysis in simulating the coagulation and flocculation process and predicting turbidity removal with different experimental data, and may have the ability to reduce the number of jar tests, which are time-consuming and expensive. The MLP neural network proved to be the best model compared to the RBF neural network and fuzzy regression analysis in this study. The MLP neural network can predict the effluent turbidity of the coagulation and flocculation process with a coefficient of determination (R²) of 0.96 and a root mean square error of 0.0106.
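An MLP sketch in this spirit follows; the feature set mirrors the paper (coagulant dose, pH, influent turbidity) but the data, ranges, and network size are made up:

```python
# MLP regression for effluent turbidity from jar-test-style inputs.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(9)
X = np.column_stack([rng.uniform(5, 40, 300),        # coagulant dose (mg/L)
                     rng.uniform(6, 9, 300),         # pH
                     rng.uniform(50, 500, 300)])     # influent turbidity (NTU)
y = 5 + 0.02 * X[:, 2] - 0.1 * X[:, 0] + rng.normal(0, 0.5, 300)  # effluent NTU

mlp = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                                 random_state=0)).fit(X, y)
print(mlp.score(X, y))   # in-sample R^2
```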
Linear models for calculating digestible energy for sheep diets.
Fonnesbeck, P V; Christiansen, M L; Harris, L E
1981-05-01
Equations for estimating the digestible energy (DE) content of sheep diets were generated from the chemical contents and a factorial description of diets fed to lambs in digestion trials. The diet factors were two forages (alfalfa and grass hay), harvested at three stages of maturity (late vegetative, early bloom and full bloom), fed in two ingredient combinations (all hay or a 50:50 hay and corn grain mixture) and prepared by two forage texture processes (coarsely chopped or finely chopped and pelleted). The 2 x 3 x 2 x 2 factorial arrangement produced 24 diet treatments. These were replicated twice, for a total of 48 lamb digestion trials. In model 1 regression equations, DE was calculated directly from chemical composition of the diet. In model 2, regression equations predicted the percentage of digested nutrient from the chemical contents of the diet and then DE of the diet was calculated as the sum of the gross energy of the digested organic components. Expanded forms of model 1 and model 2 were also developed that included diet factors as qualitative indicator variables to adjust the regression constant and regression coefficients for the diet description. The expanded forms of the equations accounted for significantly more variation in DE than did the simple models and more accurately estimated DE of the diet. Information provided by the diet description proved as useful as chemical analyses for the prediction of digestibility of nutrients. The statistics indicate that, with model 1, neutral detergent fiber and plant cell wall analyses provided as much information for the estimation of DE as did model 2 with the combined information from crude protein, available carbohydrate, total lipid, cellulose and hemicellulose. Regression equations are presented for estimating DE with the most currently analyzed organic components, including linear and curvilinear variables and diet factors that significantly reduce the standard error of the estimate. To estimate DE of a diet, the user utilizes the equation that uses the chemical analysis information and diet description most effectively.
NASA Astrophysics Data System (ADS)
Bloomfield, J. P.; Allen, D. J.; Griffiths, K. J.
2009-06-01
Linear regression methods can be used to quantify geological controls on the baseflow index (BFI). This is illustrated using an example from the Thames Basin, UK. Two approaches have been adopted. The areal extents of geological classes based on lithostratigraphic and hydrogeological classification schemes have been correlated with BFI for 44 'natural' catchments from the Thames Basin. When regression models are built using lithostratigraphic classes that include a constant term, the model is shown to have some physical meaning and the relative influence of the different geological classes on BFI can be quantified. For example, the regression constants for two such models, 0.64 and 0.69, are consistent with the mean observed BFI (0.65) for the Thames Basin, and the signs and relative magnitudes of the regression coefficients for each of the lithostratigraphic classes are consistent with the hydrogeology of the Basin. In addition, regression coefficients for the lithostratigraphic classes scale linearly with estimates of log10 hydraulic conductivity for each lithological class. When a regression is built using a hydrogeological classification scheme with no constant term, the model does not have any physical meaning, but it has a relatively high adjusted R² value and, because of the continuous coverage of the hydrogeological classification scheme, the model can be used for predictive purposes. A model calibrated on the 44 'natural' catchments and using four hydrogeological classes (low-permeability surficial deposits, consolidated aquitards, fractured aquifers and intergranular aquifers) is shown to perform as well as a model based on a hydrology of soil types (BFIHOST) scheme in predicting BFI in the Thames Basin. Validation of this model using 110 other 'variably impacted' catchments in the Basin shows that there is a correlation between modelled and observed BFI. Where the observed BFI is significantly higher than the modelled BFI, the deviations can be explained by an exogenous factor, catchment urban area. It is inferred that this may be due to influences from sewage discharge, mains leakage, and leakage from septic tanks.
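The two regression set-ups, with and without a constant term, can be sketched on made-up compositional data (class areal fractions summing to one); note that with a constant, one class must be dropped to avoid perfect collinearity:

```python
# BFI regressed on geological class fractions, with and without a constant.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(10)
props = rng.dirichlet(np.ones(4), size=44)          # class areal fractions
bfi = props @ np.array([0.3, 0.45, 0.75, 0.9]) + rng.normal(0, 0.03, 44)

with_const = LinearRegression().fit(props[:, :3], bfi)   # drop one class
no_const = LinearRegression(fit_intercept=False).fit(props, bfi)

print(with_const.intercept_, with_const.coef_)  # interpretable constant
print(no_const.coef_)     # predictive use only, no physical interpretation
```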
Miller, Justin B; Axelrod, Bradley N; Schutte, Christian
2012-01-01
The recent release of the Wechsler Memory Scale Fourth Edition contains many improvements from a theoretical and administration perspective, including demographic corrections using the Advanced Clinical Solutions. Although the administration time has been reduced from previous versions, a shortened version may be desirable in certain situations given practical time limitations in clinical practice. The current study evaluated two- and three-subtest estimations of demographically corrected Immediate and Delayed Memory index scores using both simple arithmetic prorating and regression models. All estimated values were significantly associated with observed index scores. Use of Lin's Concordance Correlation Coefficient as a measure of agreement showed a high degree of precision and virtually zero bias in the models, although the regression models showed a stronger association than prorated models. Regression-based models proved to be more accurate than prorated estimates with less dispersion around observed values, particularly when using three subtest regression models. Overall, the present research shows strong support for estimating demographically corrected index scores on the WMS-IV in clinical practice with an adequate performance using arithmetically prorated models and a stronger performance using regression models to predict index scores.
D'Archivio, Angelo Antonio; Incani, Angela; Ruggieri, Fabrizio
2011-01-01
In this paper, we use a quantitative structure-retention relationship (QSRR) method to predict the retention times of polychlorinated biphenyls (PCBs) in comprehensive two-dimensional gas chromatography (GC×GC). We analyse GC×GC retention data taken from the literature by comparing the predictive capability of different regression methods. The various models are generated using 70 out of 209 PCB congeners in the calibration stage, while their predictive performance is evaluated on the remaining 139 compounds. The two-dimensional chromatogram is initially estimated by separately modelling the retention times of PCBs in the first and in the second column (¹tR and ²tR, respectively). In particular, multilinear regression (MLR) combined with genetic algorithm (GA) variable selection is performed to extract two small subsets of predictors for ¹tR and ²tR from a large set of theoretical molecular descriptors provided by the popular software Dragon, which after removal of highly correlated or almost constant variables consists of 237 structure-related quantities. Based on GA-MLR analysis, a four-dimensional and a five-dimensional relationship modelling ¹tR and ²tR, respectively, are identified. Single-response partial least squares (PLS-1) regression is alternatively applied to independently model ¹tR and ²tR without the need for preliminary GA variable selection. Further, we explore the possibility of predicting the two-dimensional chromatogram of PCBs in a single calibration procedure by using a two-response PLS (PLS-2) model or a feed-forward artificial neural network (ANN) with two output neurons. In the first case, regression is carried out on the full set of 237 descriptors, while the variables previously selected by GA-MLR are initially considered as ANN inputs and subjected to a sensitivity analysis to remove the redundant ones. Results show that PLS-1 regression exhibits a noticeably better descriptive and predictive performance than the other investigated approaches. The observed values of the determination coefficients for ¹tR and ²tR in calibration (0.9999 and 0.9993, respectively) and prediction (0.9987 and 0.9793, respectively) provided by PLS-1 demonstrate that the GC×GC behaviour of PCBs is properly modelled. In particular, the predicted two-dimensional GC×GC chromatogram of the 139 PCBs not involved in the calibration stage closely resembles the experimental one. Based on the above lines of evidence, the proposed approach ensures accurate simulation of the whole GC×GC chromatogram of PCBs using experimental determination of only one-third of the retention data of representative congeners.
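PLS-1 modelling of the two retention times amounts to fitting one single-response PLS model per column; a sketch with simulated stand-ins for the Dragon descriptor block:

```python
# One single-response PLS model per retention time, fit on the full
# descriptor block without prior variable selection.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(11)
D = rng.normal(size=(70, 237))                          # calibration descriptors
t1 = D[:, :10].sum(axis=1) + rng.normal(0, 0.1, 70)     # surrogate for 1tR
t2 = D[:, 10:20].sum(axis=1) + rng.normal(0, 0.1, 70)   # surrogate for 2tR

pls_t1 = PLSRegression(n_components=8).fit(D, t1)
pls_t2 = PLSRegression(n_components=8).fit(D, t2)
print(pls_t1.score(D, t1), pls_t2.score(D, t2))         # calibration R^2
```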
Su, Li; Farewell, Vernon T
2013-01-01
For semi-continuous data which are a mixture of true zeros and continuously distributed positive values, the use of two-part mixed models provides a convenient modelling framework. However, deriving population-averaged (marginal) effects from such models is not always straightforward. Su et al. presented a model that provided convenient estimation of marginal effects for the logistic component of the two-part model but the specification of marginal effects for the continuous part of the model presented in that paper was based on an incorrect formulation. We present a corrected formulation and additionally explore the use of the two-part model for inferences on the overall marginal mean, which may be of more practical relevance in our application and more generally. PMID:24201470
Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K
2011-10-01
To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. We compare svm-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints over models only based on ranking constraints. This work gives empirical evidence that svm-based models using regression constraints perform significantly better than svm-based models based on ranking constraints. Our experiments show a comparable performance for methods including only regression or both regression and ranking constraints on clinical data. On high-dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included. Copyright © 2011 Elsevier B.V. All rights reserved.
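The concordance index used here can be computed directly; a simple reference implementation, where higher predicted risk should accompany shorter survival and only pairs whose ordering is determined under censoring are counted:

```python
# Concordance index (c-index) for censored survival data.
import numpy as np

def concordance_index(time, event, risk):
    """time: follow-up times; event: 1 if death observed; risk: higher = worse."""
    num = den = 0.0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # a pair is comparable only if the earlier time is an observed event
            if event[i] and time[i] < time[j]:
                den += 1
                num += risk[i] > risk[j]
                num += 0.5 * (risk[i] == risk[j])   # ties get half credit
    return num / den

t = np.array([5, 8, 3, 9]); e = np.array([1, 0, 1, 1])
r = np.array([2.0, 1.0, 3.0, 0.5])
print(concordance_index(t, e, r))   # 1.0 for this perfectly ordered toy case
```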
Green, Kimberly T.; Beckham, Jean C.; Youssef, Nagy; Elbogen, Eric B.
2013-01-01
Objective: The present study sought to investigate the longitudinal effects of psychological resilience against alcohol misuse, adjusting for socio-demographic factors, trauma-related variables, and self-reported history of alcohol abuse. Methodology: Data were from National Post-Deployment Adjustment Study (NPDAS) participants who completed both a baseline and a one-year follow-up survey (N=1090). Survey questionnaires measured combat exposure, probable posttraumatic stress disorder (PTSD), psychological resilience, and alcohol misuse, all of which were measured at two discrete time periods (baseline and one-year follow-up). Baseline resilience and change in resilience (increased or decreased) were utilized as independent variables in separate models evaluating alcohol misuse at the one-year follow-up. Results: Multiple linear regression analyses controlled for age, gender, level of educational attainment, combat exposure, PTSD symptom severity, and self-reported alcohol abuse. Accounting for these covariates, findings revealed that lower baseline resilience, younger age, male gender, and self-reported alcohol abuse were related to alcohol misuse at the one-year follow-up. A separate regression analysis, adjusting for the same covariates, revealed a relationship between change in resilience (from baseline to the one-year follow-up) and alcohol misuse at the one-year follow-up. The regression model evaluating these variables in a subset of the sample in which all the participants had been deployed to Iraq and/or Afghanistan was consistent with findings involving the overall era sample. Finally, logistic regression analyses of the one-year follow-up data yielded similar results to the baseline and resilience change models. Conclusions: These findings suggest that increased psychological resilience is inversely related to alcohol misuse and is protective against alcohol misuse over time. Additionally, it supports the conceptualization of resilience as a process which evolves over time. Moreover, our results underscore the importance of assessing resilience as part of alcohol use screening for preventing alcohol misuse in Iraq and Afghanistan era military veterans. PMID:24090625
Modelling the spatial spread of Dengue Hemorrhagic Fever (DHF) in Central Java using a spatial Durbin model
NASA Astrophysics Data System (ADS)
Ispriyanti, Dwi; Prahutama, Alan; Taryono, Arkadina PN
2018-05-01
Dengue Hemorrhagic Fever (DHF) is one of the major public health problems in Indonesia. From year to year, DHF causes extraordinary outbreak events in most parts of Indonesia, especially Central Java. Central Java consists of 35 districts or cities, each region close to the others. Spatial regression estimates the influence of independent variables on a dependent variable while accounting for spatial effects among regions. Spatial regression models include the spatial autoregressive model (SAR), the spatial error model (SEM) and the spatial autoregressive moving average model (SARMA). The spatial Durbin model (SDM) is a development of SAR in which both the dependent and independent variables have spatial influence. In this research, the dependent variable is the number of DHF sufferers. The independent variables observed are population density, numbers of hospitals, residents, and health centers, and mean years of schooling. From the multiple regression model test, the variables that significantly affect the spread of DHF are the population and mean years of schooling. Using queen contiguity and rook contiguity, the best model produced is the SDM with queen contiguity because it has the smallest AIC value, 494.12. The factors that generally affect the spread of DHF in Central Java Province are population size and mean years of schooling.
An improved strategy for regression of biophysical variables and Landsat ETM+ data.
Warren B. Cohen; Thomas K. Maiersperger; Stith T. Gower; David P. Turner
2003-01-01
Empirical models are important tools for relating field-measured biophysical variables to remote sensing data. Regression analysis has been a popular empirical method of linking these two types of data to provide continuous estimates for variables such as biomass, percent woody canopy cover, and leaf area index (LAI). Traditional methods of regression are not...
Prediction of hourly PM2.5 using a space-time support vector regression model
NASA Astrophysics Data System (ADS)
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
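The cluster-then-regress scheme can be sketched as follows; station coordinates, lagged features, the spatial-lag variable, and SVR settings are all simulated or assumed:

```python
# Divide stations into quasi-homogeneous subareas, then fit one SVR per
# subarea with a spatially lagged PM2.5 variable among the inputs.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVR

rng = np.random.default_rng(12)
coords = rng.uniform(size=(150, 2))                  # station locations
lagged = rng.uniform(20, 120, size=(150, 3))         # past-hour features
spatial_lag = rng.uniform(20, 120, 150)              # neighbour-weighted PM2.5
pm25 = 0.6 * lagged[:, 0] + 0.3 * spatial_lag + rng.normal(0, 5, 150)

subarea = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(coords)
for k in range(3):
    m = subarea == k
    X_k = np.column_stack([lagged[m], spatial_lag[m]])
    svr = SVR(kernel="rbf", C=100).fit(X_k, pm25[m])
    print("subarea", k, "in-sample R2:", round(svr.score(X_k, pm25[m]), 2))
```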
Predicting ecological flow regime at ungaged sites: A comparison of methods
Murphy, Jennifer C.; Knight, Rodney R.; Wolfe, William J.; Gain, W. Scott
2012-01-01
Nineteen ecologically relevant streamflow characteristics were estimated using published rainfall–runoff and regional regression models for six sites with observed daily streamflow records in Kentucky. The regional regression model produced median estimates closer to the observed median for all but two characteristics. The variability of predictions from both models was generally less than the observed variability. The variability of the predictions from the rainfall–runoff model was greater than that from the regional regression model for all but three characteristics. Eight characteristics predicted by the rainfall–runoff model display positive or negative bias across all six sites; biases are not as pronounced for the regional regression model. Results suggest that a rainfall–runoff model calibrated on a single characteristic is less likely to perform well as a predictor of a range of other characteristics (flow regime) when compared with a regional regression model calibrated individually on multiple characteristics used to represent the flow regime. Poor model performance may misrepresent hydrologic conditions, potentially distorting the perceived risk of ecological degradation. Without prior selection of streamflow characteristics, targeted calibration, and error quantification, the widespread application of general hydrologic models to ecological flow studies is problematic. Published 2012. This article is a U.S. Government work and is in the public domain in the USA.
Kim, Yoonsang; Choi, Young-Ku; Emery, Sherry
2013-08-01
Several statistical packages are capable of estimating generalized linear mixed models and these packages provide one or more of three estimation methods: penalized quasi-likelihood, Laplace, and Gauss-Hermite. Many studies have investigated these methods' performance for the mixed-effects logistic regression model. However, the authors focused on models with one or two random effects and assumed a simple covariance structure between them, which may not be realistic. When there are multiple correlated random effects in a model, the computation becomes intensive, and often an algorithm fails to converge. Moreover, in our analysis of smoking status and exposure to anti-tobacco advertisements, we have observed that when a model included multiple random effects, parameter estimates varied considerably from one statistical package to another even when using the same estimation method. This article presents a comprehensive review of the advantages and disadvantages of each estimation method. In addition, we compare the performances of the three methods across statistical packages via simulation, which involves two- and three-level logistic regression models with at least three correlated random effects. We apply our findings to a real dataset. Our results suggest that two packages-SAS GLIMMIX Laplace and SuperMix Gaussian quadrature-perform well in terms of accuracy, precision, convergence rates, and computing speed. We also discuss the strengths and weaknesses of the two packages in regard to sample sizes.
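The Gauss-Hermite method itself is easy to demystify for a random-intercept logistic model: the integral over the normal random effect in the marginal likelihood is replaced by a weighted sum over quadrature nodes. A self-contained sketch on simulated data:

```python
# Marginal likelihood of a random-intercept logistic model via
# Gauss-Hermite quadrature, maximized with a derivative-free optimizer.
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(13)
n_groups, n_per = 50, 10
b = rng.normal(0, 1.0, n_groups)                 # true random intercepts
x = rng.normal(size=(n_groups, n_per))
y = rng.binomial(1, expit(-0.5 + 1.0 * x + b[:, None]))

nodes, weights = hermegauss(15)                  # weight function exp(-u^2/2)

def neg_marginal_loglik(theta):
    beta0, beta1, log_sd = theta
    ll = 0.0
    for g in range(n_groups):
        # group likelihood evaluated at each quadrature node
        eta = beta0 + beta1 * x[g][:, None] + np.exp(log_sd) * nodes[None, :]
        p = expit(eta)
        lik_g = np.prod(np.where(y[g][:, None] == 1, p, 1 - p), axis=0)
        ll += np.log(lik_g @ weights / np.sqrt(2 * np.pi))
    return -ll

fit = minimize(neg_marginal_loglik, x0=[0.0, 0.0, 0.0], method="Nelder-Mead")
print(fit.x)   # estimates of (beta0, beta1, log random-intercept SD)
```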
Regression-based model of skin diffuse reflectance for skin color analysis
NASA Astrophysics Data System (ADS)
Tsumura, Norimichi; Kawazoe, Daisuke; Nakaguchi, Toshiya; Ojima, Nobutoshi; Miyake, Yoichi
2008-11-01
A simple regression-based model of skin diffuse reflectance is developed based on reflectance samples calculated by Monte Carlo simulation of light transport in a two-layered skin model. This reflectance model includes the values of spectral reflectance in the visible spectrum for Japanese women. The modified Lambert-Beer law holds in the proposed model with a modified mean free path length in non-linear density space. The average RMS and maximum errors of the proposed model were 1.1% and 3.1%, respectively, over the above range.
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, providing a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. Canopy spectral reflectance and LAI values of ninety apple trees were measured with an ASD FieldSpec 3 spectrometer and an LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models predicting LAI were built with two multivariate regression methods, support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NDVI656 and GRVI517, together with the two main vegetation indices from previous work, NDVI670 and NDVI705, are well correlated with LAI. In the RF regression model, the calibration-set determination coefficient C-R² of 0.920 and the validation-set determination coefficient V-R² of 0.889 are higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration-set root mean square error C-RMSE of 0.249 and the validation-set root mean square error V-RMSE of 0.236 are lower than those of the SVM regression model by 0.054 and 0.058, respectively. The residual predictive deviations for the calibration and validation sets, C-RPD and V-RPD, reached 3.363 and 2.520, higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the measured-versus-predicted trend lines for the calibration and validation sets, C-S and V-S, are close to 1. The estimation results of the RF regression model are better than those of the SVM, and the RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
Kennedy, Jeffrey R.; Paretti, Nicholas V.; Veilleux, Andrea G.
2014-01-01
Regression equations, which allow predictions of n-day flood-duration flows for selected annual exceedance probabilities at ungaged sites, were developed using generalized least-squares regression and flood-duration flow frequency estimates at 56 streamgaging stations within a single, relatively uniform physiographic region in the central part of Arizona, between the Colorado Plateau and Basin and Range Province, called the Transition Zone. Drainage area explained most of the variation in the n-day flood-duration annual exceedance probabilities, but mean annual precipitation and mean elevation were also significant variables in the regression models. Standard error of prediction for the regression equations varies from 28 to 53 percent and generally decreases with increasing n-day duration. Outside the Transition Zone there are insufficient streamgaging stations to develop regression equations, but flood-duration flow frequency estimates are presented at select streamgaging stations.
Shrinkage Estimation of Varying Covariate Effects Based On Quantile Regression
Peng, Limin; Xu, Jinfeng; Kutner, Nancy
2013-01-01
Varying covariate effects often manifest meaningful heterogeneity in covariate-response associations. In this paper, we adopt a quantile regression model that assumes linearity at a continuous range of quantile levels as a tool to explore such data dynamics. The consideration of potential non-constancy of covariate effects necessitates a new perspective for variable selection, which, under the assumed quantile regression model, is to retain variables that have effects on all quantiles of interest as well as those that influence only part of quantiles considered. Current work on l1-penalized quantile regression either does not concern varying covariate effects or may not produce consistent variable selection in the presence of covariates with partial effects, a practical scenario of interest. In this work, we propose a shrinkage approach by adopting a novel uniform adaptive LASSO penalty. The new approach enjoys easy implementation without requiring smoothing. Moreover, it can consistently identify the true model (uniformly across quantiles) and achieve the oracle estimation efficiency. We further extend the proposed shrinkage method to the case where responses are subject to random right censoring. Numerical studies confirm the theoretical results and support the utility of our proposals. PMID:25332515
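Fitting the same linear specification over a grid of quantile levels (here with statsmodels' QuantReg) makes varying and partial covariate effects visible; the simulated covariate below affects only the upper quantiles:

```python
# Quantile regression across a grid of quantile levels; the covariate has a
# 'partial effect': zero slope below the median, positive slope above it.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(14)
x = rng.uniform(size=500)
y = 1.0 + x * np.maximum(rng.normal(0, 1, 500), 0)   # effect in upper tail only

X = sm.add_constant(x)
for tau in (0.25, 0.5, 0.75, 0.9):
    fit = sm.QuantReg(y, X).fit(q=tau)
    print(f"tau={tau}: slope={fit.params[1]:.2f}")   # ~0, ~0, ~0.67, ~1.28
```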
NASA Astrophysics Data System (ADS)
Lu, Lin; Chang, Yunlong; Li, Yingmin; He, Youyou
2013-05-01
A transverse magnetic field was introduced to the arc plasma in the process of welding stainless steel tubes by high-speed Tungsten Inert Gas (TIG) arc welding without filler wire. The influence of the external magnetic field on welding quality was investigated. Nine sets of parameters were designed by means of an orthogonal experiment. The tensile strength of the welded joint and the form factor of the weld were regarded as the main measures of welding quality. A binary quadratic nonlinear regression equation was established with magnetic induction and Ar gas flow rate as the factors. The residual standard deviation was calculated to assess the accuracy of the regression model. The results showed that the regression model was correct and effective in calculating the tensile strength and aspect ratio of the weld. Two 3D regression models were designed, and the effect of magnetic induction on welding quality was then investigated.
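A binary quadratic (second-order, two-factor) regression of this kind can be sketched with a full polynomial basis; the design points and responses below are placeholders, not the paper's measurements:

```python
# Second-order response surface in magnetic induction B and Ar flow rate Q.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

B = np.array([0, 0, 0, 10, 10, 10, 20, 20, 20], float)   # mT (assumed levels)
Q = np.array([8, 12, 16, 8, 12, 16, 8, 12, 16], float)   # L/min (assumed)
strength = np.array([520, 560, 540, 580, 640, 600,
                     550, 610, 570], float)              # MPa (placeholders)

X = PolynomialFeatures(degree=2).fit_transform(np.column_stack([B, Q]))
model = LinearRegression(fit_intercept=False).fit(X, strength)
print(model.coef_)   # coefficients for 1, B, Q, B^2, B*Q, Q^2
```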
NASA Astrophysics Data System (ADS)
Solimun
2017-05-01
The aim of this research is to model survival data from kidney-transplant patients using partial least squares (PLS)-Cox regression, which is applicable whether or not the no-multicollinearity assumption is met. The secondary data were obtained from research entitled "Factors affecting the survival of kidney-transplant patients". The research subjects comprised 250 patients. The predictor variables were: age (X1); sex (X2, two categories); prior hemodialysis duration (X3); diabetes (X4, two categories); prior transplantation number (X5); number of blood transfusions (X6); discrepancy score (X7); and use of antilymphocyte globulin (ALG) (X8, two categories), while the response variable was patient survival time (in months). Partial least squares regression relates the predictor variables X to the response variable y and is robust to multicollinearity among the predictors. The analyses show that the survival of kidney-transplant recipients ranged from 0 to 55 months, with 62% of the patients surviving through the 55-month observation period. The PLS-Cox regression analysis revealed that patients' age and the use of ALG significantly affected the survival time of patients. Patients' age (X1) in the PLS-Cox regression model affected the failure probability by a factor of 1.201; the model indicates that the probability of dying for elderly kidney-transplant patients is 1.152 times that for younger patients.
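One plausible way to assemble such a pipeline, sketched with scikit-learn and the third-party lifelines package: compress the possibly collinear predictors with PLS, then fit a Cox model on the scores. The PLS step here is a crude stand-in for the iterative PLS-Cox algorithm, and the data are simulated, not the kidney-transplant records:

```python
# PLS dimension reduction followed by a Cox proportional hazards fit.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(15)
X = rng.normal(size=(250, 8))                    # X1..X8, possibly collinear
hazard = np.exp(0.5 * X[:, 0])
time = rng.exponential(1.0 / hazard)             # survival times (months)
event = (rng.uniform(size=250) < 0.62).astype(int)   # ~62% events, as reported

# Crude surrogate: PLS against the observed time, ignoring censoring.
scores = PLSRegression(n_components=2).fit(X, time).transform(X)
df = pd.DataFrame(scores, columns=["c1", "c2"]).assign(time=time, event=event)
CoxPHFitter().fit(df, duration_col="time", event_col="event").print_summary()
```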
Guenole, Nigel; Brown, Anna
2014-01-01
We report a Monte Carlo study examining the effects of two strategies for handling measurement non-invariance – modeling and ignoring non-invariant items – on structural regression coefficients between latent variables measured with item response theory models for categorical indicators. These strategies were examined across four levels and three types of non-invariance – non-invariant loadings, non-invariant thresholds, and combined non-invariance on loadings and thresholds – in simple, partial, mediated and moderated regression models where the non-invariant latent variable occupied predictor, mediator, and criterion positions in the structural regression models. When non-invariance is ignored in the latent predictor, the focal group regression parameters are biased in the opposite direction to the difference in loadings and thresholds relative to the referent group (i.e., lower loadings and thresholds for the focal group lead to overestimated regression parameters). With criterion non-invariance, the focal group regression parameters are biased in the same direction as the difference in loadings and thresholds relative to the referent group. While unacceptable levels of parameter bias were confined to the focal group, bias occurred at considerably lower levels of ignored non-invariance than was previously recognized in referent and focal groups. PMID:25278911
Kim, So-Hyun; Shin, Yoo-Soo; Choi, Hyung-Kyoon
2016-03-01
Korean ginseng (Panax ginseng C.A. Meyer) is one of the most popular medicinal herbs used in Asia, including Korea and China. In the present study, lipid profiling of two officially registered cultivars (P. ginseng 'Chunpoong' and P. ginseng 'Yunpoong') was performed at different cultivation ages (5 and 6 years) and on different plant parts (tap roots, lateral roots, and rhizomes) using nano-electrospray ionization-mass spectrometry (nanoESI-MS). In total, 30 compounds, including galactolipids, phospholipids, triacylglycerols, and ginsenosides, were identified. Among them, triacylglycerol 54:6 (18:2/18:2/18:2), phosphatidylglycerol 34:3 (16:0/18:3), monogalactosyldiacylglycerol 36:4 (18:2/18:2), and the phosphatidic acid species 36:4 (18:2/18:2) and 34:1 (16:0/18:1) were selected as biomarkers to discriminate cultivars, cultivation ages, and parts. In addition, the characteristics of an unknown P. ginseng sample were successfully predicted by applying validated partial least squares projection to latent structures regression models. This is the first study to identify intact lipid species from P. ginseng and to predict cultivars, cultivation ages, and parts of P. ginseng using nanoESI-MS-based lipidomic profiling with multivariate statistical analysis.
Fenlon, Caroline; O'Grady, Luke; Butler, Stephen; Doherty, Michael L; Dunnion, John
2017-01-01
Herd fertility in pasture-based dairy farms is a key driver of farm economics. Models for predicting reproductive outcomes in nulliparous heifers are rare, but age, genetics, weight, and BCS have been identified as factors influencing heifer conception. The aim of this study was to create and thoroughly evaluate a simulation model of heifer conception to service. Artificial insemination service records from two research herds and ten commercial herds were provided to build and evaluate the models. All were managed as spring-calving pasture-based systems. The factors studied were related to age, genetics, and time of service. The data were split into training and testing sets, and bootstrapping was used to train the models. Logistic regression (with and without random effects) and generalised additive modelling were selected as the model-building techniques. Two types of evaluation were used to test the predictive ability of the models: discrimination and calibration. Discrimination, which includes sensitivity, specificity, accuracy and ROC analysis, measures a model's ability to distinguish between positive and negative outcomes. Calibration measures the accuracy of the predicted probabilities with the Hosmer-Lemeshow goodness-of-fit test, calibration plot and calibration error. After data cleaning and the removal of services with missing values, 1396 services remained to train the models and 597 were left for testing. Age, breed, genetic predicted transmitting ability for calving interval, month and year were significant in the multivariate models. The regression models also included an interaction between age and month. Year within herd was a random effect in the mixed regression model. Overall prediction accuracy was between 77.1% and 78.9%. All three models had very high sensitivity, but low specificity. The two regression models were very well-calibrated, and the mean absolute calibration errors were all below 4%. Because the models were not adept at identifying unsuccessful services, they are not suggested for use in predicting the outcome of individual heifer services. Instead, they are useful for the comparison of services with different covariate values or as sub-models in whole-farm simulations. The mixed regression model was identified as the best model for prediction, as the random effects can be ignored and the other variables can be easily obtained or simulated.
Many-level multilevel structural equation modeling: An efficient evaluation strategy.
Pritikin, Joshua N; Hunter, Michael D; von Oertzen, Timo; Brick, Timothy R; Boker, Steven M
2017-01-01
Structural equation models are increasingly used for clustered or multilevel data in cases where mixed regression is too inflexible. However, when there are many levels of nesting, these models can become difficult to estimate. We introduce a novel evaluation strategy, Rampart, that applies an orthogonal rotation to the parts of a model that conform to commonly met requirements. This rotation dramatically simplifies fit evaluation in a way that becomes more potent as the size of the data set increases. We validate and evaluate the implementation using a 3-level latent regression simulation study. Then we analyze data from a state-wide child behavioral health measure administered by the Oklahoma Department of Human Services. We demonstrate the efficiency of Rampart compared to other similar software using a latent factor model with a 5-level decomposition of latent variance. Rampart is implemented in OpenMx, a free and open source software.
Interaction Models for Functional Regression.
Usset, Joseph; Staicu, Ana-Maria; Maity, Arnab
2016-02-01
A functional regression model with a scalar response and multiple functional predictors is proposed that accommodates two-way interactions in addition to their main effects. The proposed estimation procedure models the main effects using penalized regression splines and the interaction effect by a tensor product basis. Extensions to generalized linear models and data observed on sparse grids or with measurement error are presented. A hypothesis testing procedure for the functional interaction effect is described. The proposed method can be easily implemented through existing software. Numerical studies show that fitting an additive model in the presence of interaction leads to both poor estimation performance and a loss of predictive power, while fitting an interaction model where there is in fact no interaction leads to negligible losses. The methodology is illustrated on the AneuRisk65 study data.
Two statistical approaches, weighted regression on time, discharge, and season and generalized additive models, have recently been used to evaluate water quality trends in estuaries. Both models have been used in similar contexts despite differences in statistical foundations and...
NASA Astrophysics Data System (ADS)
Batzias, Dimitris F.; Ifanti, Konstantina
2012-12-01
Process simulation models are usually empirical; they therefore face an inherent difficulty in serving as carriers for knowledge acquisition and technology transfer, since their parameters have no physical meaning that would facilitate verification of their dependence on the production conditions. In such a case, a 'black box' regression model or a neural network might be used simply to connect input-output characteristics. In several cases, scientific/mechanistic models may prove valid, in which case parameter identification is required to find out which independent/explanatory variables and parameters each model parameter depends on. This is a difficult task, since the phenomenological level at which each parameter is defined is different. In this paper, we have developed a methodological framework in the form of an algorithmic procedure to solve this problem. The main parts of this procedure are: (i) stratification of relevant knowledge in discrete layers immediately adjacent to the layer that the initial model under investigation belongs to, (ii) design of the ontology corresponding to these layers, (iii) elimination of the less relevant parts of the ontology by thinning, (iv) retrieval of the stronger interrelations between the remaining nodes within the revised ontological network, and (v) parameter identification taking into account the most influential interrelations revealed in (iv). The functionality of this methodology is demonstrated by quoting two representative case examples on wastewater treatment.
Tom, Brian Dm; Su, Li; Farewell, Vernon T
2016-10-01
For semi-continuous data which are a mixture of true zeros and continuously distributed positive values, the use of two-part mixed models provides a convenient modelling framework. However, deriving population-averaged (marginal) effects from such models is not always straightforward. Su et al. presented a model that provided convenient estimation of marginal effects for the logistic component of the two-part model but the specification of marginal effects for the continuous part of the model presented in that paper was based on an incorrect formulation. We present a corrected formulation and additionally explore the use of the two-part model for inferences on the overall marginal mean, which may be of more practical relevance in our application and more generally. © The Author(s) 2013.
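For readers unfamiliar with the two-part setup, a minimal sketch of the overall marginal mean E[Y] = Pr(Y > 0) x E[Y | Y > 0] on simulated cross-sectional data (the paper's mixed-model and marginalization details are not reproduced; the log-normal positive part and Duan smearing are assumptions of this sketch):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 2000
    x = rng.normal(size=n)
    occurs = rng.random(n) < 1 / (1 + np.exp(-(0.3 + 0.8 * x)))   # logistic part
    amount = np.exp(1.0 + 0.5 * x + 0.6 * rng.normal(size=n))      # positive part
    y = np.where(occurs, amount, 0.0)

    X = sm.add_constant(x)
    logit = sm.Logit(occurs.astype(float), X).fit(disp=0)          # Pr(Y > 0 | x)
    pos = y > 0
    ols = sm.OLS(np.log(y[pos]), X[pos]).fit()                     # E[log Y | Y > 0, x]
    smear = np.mean(np.exp(ols.resid))                             # Duan's smearing factor

    x_new = np.array([[1.0, 0.5]])                                 # covariate value of interest
    marginal_mean = logit.predict(x_new) * np.exp(ols.predict(x_new)) * smear
    print("estimated E[Y | x=0.5]:", marginal_mean)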
NASA Astrophysics Data System (ADS)
Samhouri, M.; Al-Ghandoor, A.; Fouad, R. H.
2009-08-01
In this study, two techniques for modeling electricity consumption of the Jordanian industrial sector are presented: (i) multivariate linear regression and (ii) neuro-fuzzy models. Electricity consumption is modeled as a function of different variables such as number of establishments, number of employees, electricity tariff, prevailing fuel prices, production outputs, capacity utilizations, and structural effects. It was found that industrial production and capacity utilization are the most important variables with a significant effect on future electrical power demand. The results showed that the multivariate linear regression and neuro-fuzzy models are generally comparable and can both be used adequately to simulate industrial electricity consumption. However, a comparison based on the root-mean-square error suggests that the neuro-fuzzy model performs slightly better for future prediction of electricity consumption than the multivariate linear regression model. Such results are in full agreement with similar work, using different methods, for other countries.
Li, Weide; Kong, Demeng; Wu, Jinran
2017-01-01
Air pollution in China is becoming more serious, especially for particulate matter (PM), because of rapid economic growth and fast urbanization. To address these growing environmental problems, daily PM2.5 and PM10 concentration data from January 1, 2015, to August 23, 2016, in Kunming and Yuxi (two important cities in Yunnan Province, China) are used to develop a new hybrid model, CI-FPA-SVM, to forecast PM2.5 and PM10 concentrations. The proposed model involves two parts. First, because support vector machine (SVM) regression alone cannot assess the possible correlations between different variables, cointegration theory is introduced to obtain the input-output relationship; the nonlinear dynamical system is then modeled with SVM, whose parameters c and g are optimized by the flower pollination algorithm (FPA). Six benchmark models, including FPA-SVM, CI-SVM, CI-GA-SVM, CI-PSO-SVM, CI-FPA-NN, and a multiple linear regression model, are considered to verify the superiority of the proposed hybrid model. The empirical results demonstrate that the proposed model CI-FPA-SVM is remarkably superior to all considered benchmark models in prediction accuracy, and application of the model for forecasting can support effective monitoring and management of air quality. PMID:28932237
Modeling health survey data with excessive zero and K responses.
Lin, Ting Hsiang; Tsai, Min-Hsiao
2013-04-30
Zero-inflated Poisson regression is a popular tool for analyzing data with excessive zeros. Although much work has been done on fitting zero-inflated data, most models depend heavily on special features of the individual data set; in particular, health survey data often contain a sizable group of respondents who endorse the same answer, so the counts show peaks at particular values. In this paper, we propose a new model with the flexibility to handle excessive counts other than zero. The model is a mixture of multinomial logistic and Poisson regression, in which the multinomial logistic component models the occurrence of excessive counts, including zeros, K (where K is a positive integer), and all other values, while the Poisson regression component models the counts that are assumed to follow a Poisson distribution. Two examples are provided to illustrate the model when the data have counts containing many ones and sixes. In both, the zero-inflated and K-inflated models exhibit a better fit than the zero-inflated Poisson and standard Poisson regressions. Copyright © 2012 John Wiley & Sons, Ltd.
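A minimal intercept-only sketch of such a zero-and-K-inflated Poisson likelihood, fit by direct maximization (the paper's multinomial-logistic and Poisson regression components with covariates are omitted; all values are simulated):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import poisson

    rng = np.random.default_rng(2)
    K = 6
    comp = rng.choice([0, 1, 2], size=3000, p=[0.15, 0.10, 0.75])  # 0-infl, K-infl, Poisson
    y = np.where(comp == 0, 0, np.where(comp == 1, K, rng.poisson(3.0, 3000)))

    def negloglik(theta):
        a0, aK, b = theta                                 # multinomial logits + log-rate
        denom = 1 + np.exp(a0) + np.exp(aK)
        p0, pK = np.exp(a0) / denom, np.exp(aK) / denom
        lam = np.exp(b)
        base = (1 - p0 - pK) * poisson.pmf(y, lam)
        lik = base + p0 * (y == 0) + pK * (y == K)        # inflate zeros and K
        return -np.sum(np.log(lik))

    fit = minimize(negloglik, x0=[-1.0, -1.0, 1.0], method="Nelder-Mead")
    a0, aK, b = fit.x
    denom = 1 + np.exp(a0) + np.exp(aK)
    print("p0, pK, lambda:", np.exp(a0) / denom, np.exp(aK) / denom, np.exp(b))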
Mapping health outcome measures from a stroke registry to EQ-5D weights.
Ghatnekar, Ola; Eriksson, Marie; Glader, Eva-Lotta
2013-03-07
To map health outcome related variables from a national register, not part of any validated instrument, with EQ-5D weights among stroke patients. We used two cross-sectional data sets including patient characteristics, outcome variables and EQ-5D weights from the national Swedish stroke register. Three regression techniques were used on the estimation set (n=272): ordinary least squares (OLS), Tobit, and censored least absolute deviation (CLAD). The regression coefficients for "dressing", "toileting", "mobility", "mood", "general health" and "proxy-responders" were applied to the validation set (n=272), and the performance was analysed with mean absolute error (MAE) and mean square error (MSE). The number of statistically significant coefficients varied by model, but all models generated consistent coefficients in terms of sign. Mean utility was underestimated in all models (least in OLS) and with lower variation (least in OLS) compared to the observed. The maximum attainable EQ-5D weight ranged from 0.90 (OLS) to 1.00 (Tobit and CLAD). Health states with utility weights <0.5 had greater errors than those with weights ≥ 0.5 (P<0.01). This study indicates that it is possible to map non-validated health outcome measures from a stroke register into preference-based utilities to study the development of stroke care over time, and to compare with other conditions in terms of utility.
Chau, Kénora; Kabuth, Bernard; Chau, Nearkasen
2016-11-01
The risk of suicide behaviors in immigrant adolescents varies across countries and remains only partly understood. We conducted a study in France to examine immigrant adolescents' likelihood of experiencing suicide ideation in the last 12 months (SI) and lifetime suicide attempts (SA) compared with their native counterparts, and the contribution of socioeconomic factors and school, behavior, and health-related difficulties. Questionnaires were completed by 1559 middle-school adolescents from north-eastern France and covered various risk factors, SI, SA, and their first occurrence over the adolescent's life course (except SI). Data were analyzed using logistic regression models for SI and Cox regression models for SA (retaining only school, behavior, and health-related difficulties that started before SA). Immigrant adolescents had a twofold higher risk of SI and SA than their native counterparts. Using nested models, the excess SI risk was largely explained by socioeconomic factors (27%) and additional school, behavior, and health-related difficulties (24%) but remained significant. The excess SA risk was more fully explained by these issues (40% and 85%, respectively) and became non-significant. These findings demonstrate the risk patterns of SI and SA and the prominent confounding roles of socioeconomic factors and school, behavior, and health-related difficulties. They may be useful to policy makers, schools, carers, and various organizations concerned with immigrant, adolescent, and suicide-behavior problems.
Two Enhancements of the Logarithmic Least-Squares Method for Analyzing Subjective Comparisons
1989-03-25
error term. For this model, the total sum of squares (SSTO), defined as SSTO = Σ_{i=1}^{n} (y_i − ȳ)², can be partitioned into error and regression sums of squares ... of the regression line around the mean value. Mathematically, for the model given by equation A.4, SSTO = SSE + SSR (A.6), where SSTO is the total sum of squares (i.e., the variance of the y_i's), SSE is the error sum of squares, and SSR is the regression sum of squares. SSTO, SSE, and SSR are given ...
Linear regression analysis: part 14 of a series on evaluation of scientific publications.
Schneider, Astrid; Hommel, Gerhard; Blettner, Maria
2010-11-01
Regression analysis is an important statistical method for the analysis of medical data. It enables the identification and characterization of relationships among multiple factors. It also enables the identification of prognostically relevant risk factors and the calculation of risk scores for individual prognostication. This article is based on selected textbooks of statistics, a selective review of the literature, and our own experience. After a brief introduction of the uni- and multivariable regression models, illustrative examples are given to explain what the important considerations are before a regression analysis is performed, and how the results should be interpreted. The reader should then be able to judge whether the method has been used correctly and interpret the results appropriately. The performance and interpretation of linear regression analysis are subject to a variety of pitfalls, which are discussed here in detail. The reader is made aware of common errors of interpretation through practical examples. Both the opportunities for applying linear regression analysis and its limitations are presented.
Can patient comorbidities be included in clinical performance measures for radiation oncology?
Owen, Jean B; Khalid, Najma; Ho, Alex; Kachnic, Lisa A; Komaki, Ritsuko; Tao, May Lin; Currey, Adam; Wilson, J Frank
2014-05-01
Patient comorbidities may affect the applicability of performance measures that are inherent in multidisciplinary cancer treatment guidelines. This article describes the distribution of common comorbid conditions by disease site and by patient and facility characteristics in patients who received radiation therapy as part of treatment for cancer of the breast, cervix, lung, prostate, and stomach, and investigates the association of comorbidities with treatment decisions. Stratified two-stage cluster sampling provided a random sample of radiation oncology facilities. Eligible patients were randomly sampled from each participating facility for each disease site, and data were abstracted from medical records. The Adult Comorbidity Evaluation Index (ACE-27) was used to measure comorbid conditions and their severity. National estimates were calculated using SUDAAN statistical software. Multivariable logistic regression models predicted the dependent variable "treatment changed or contraindicated due to comorbidities." The final model showed that ACE-27 was highly associated with change in treatment for patients with severe or moderate index values compared to those with none or mild (P < .001). Two other covariates, age and medical coverage, had no (age) or little (medical coverage) significant contribution to predicting treatment change in the multivariable model. Disease site was associated with treatment change after adjusting for other covariates in the model. ACE-27 is highly predictive of treatment modifications for patients treated for these cancers who receive radiation as part of their care. A standardized tool identifying patients who should be excluded from clinical performance measures allows more accurate use of these measures. Copyright © 2014 by American Society of Clinical Oncology. PMID:24643573
Estimating effects of limiting factors with regression quantiles
Cade, B.S.; Terrell, J.W.; Schroeder, R.L.
1999-01-01
In a recent Concepts paper in Ecology, Thomson et al. emphasized that assumptions of conventional correlation and regression analyses fundamentally conflict with the ecological concept of limiting factors, and they called for new statistical procedures to address this problem. The analytical issue is that unmeasured factors may be the active limiting constraint and may induce a pattern of unequal variation in the biological response variable through an interaction with the measured factors. Consequently, changes near the maxima, rather than at the center of response distributions, are better estimates of the effects expected when the observed factor is the active limiting constraint. Regression quantiles provide estimates for linear models fit to any part of a response distribution, including near the upper bounds, and require minimal assumptions about the form of the error distribution. Regression quantiles extend the concept of one-sample quantiles to the linear model by solving an optimization problem of minimizing an asymmetric function of absolute errors. Rank-score tests for regression quantiles provide tests of hypotheses and confidence intervals for parameters in linear models with heteroscedastic errors, conditions likely to occur in models of limiting ecological relations. We used selected regression quantiles (e.g., 5th, 10th, ..., 95th) and confidence intervals to test hypotheses that parameters equal zero for estimated changes in average annual acorn biomass due to forest canopy cover of oak (Quercus spp.) and oak species diversity. Regression quantiles also were used to estimate changes in glacier lily (Erythronium grandiflorum) seedling numbers as a function of lily flower numbers, rockiness, and pocket gopher (Thomomys talpoides fossor) activity, data that motivated the query by Thomson et al. for new statistical procedures. Both example applications showed that effects of limiting factors estimated by changes in some upper regression quantile (e.g., 90-95th) were greater than if effects were estimated by changes in the means from standard linear model procedures. Estimating a range of regression quantiles (e.g., 5-95th) provides a comprehensive description of biological response patterns for exploratory and inferential analyses in observational studies of limiting factors, especially when sampling large spatial and temporal scales.
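A brief sketch of estimating upper regression quantiles with statsmodels (simulated data mimicking the limiting-factor pattern; variable names are invented):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n = 400
    canopy = rng.uniform(0, 100, n)
    # Limiting-factor pattern: canopy caps the response while unmeasured
    # factors pull observations below the ceiling (heteroscedastic errors).
    biomass = rng.uniform(0, 1, n) * (5 + 0.8 * canopy)
    df = pd.DataFrame({"biomass": biomass, "canopy": canopy})

    for q in (0.50, 0.90, 0.95):
        fit = smf.quantreg("biomass ~ canopy", df).fit(q=q)
        print(f"q={q:.2f} canopy effect: {fit.params['canopy']:.3f}")
    # The 90th-95th percentile slopes approach the true ceiling slope (0.8),
    # while the median slope understates the limiting effect.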
TG study of the Li0.4Fe2.4Zn0.2O4 ferrite synthesis
NASA Astrophysics Data System (ADS)
Lysenko, E. N.; Nikolaev, E. V.; Surzhikov, A. P.
2016-02-01
In this paper, the kinetics of Li-Zn ferrite synthesis were studied by thermogravimetry (TG) through the simultaneous application of non-linear regression to several measurements run at different heating rates (multivariate non-linear regression). Using TG curves obtained at four heating rates and the Netzsch Thermokinetics software package, kinetic models with minimal adjustable parameters were selected to quantitatively describe the reaction of Li-Zn ferrite synthesis. The experimental TG curves clearly suggest a two-step process for the ferrite synthesis, and therefore a model-fitting kinetic analysis based on multivariate non-linear regression was conducted. The complex reaction was described by a scheme of two sequential reaction steps. The best results were obtained using the Jander three-dimensional diffusion model for the first step and the Ginstling-Brounshtein model for the second step. The kinetic parameters for the lithium-zinc ferrite synthesis reaction were determined and discussed.
Can Predictive Modeling Identify Head and Neck Oncology Patients at Risk for Readmission?
Manning, Amy M; Casper, Keith A; Peter, Kay St; Wilson, Keith M; Mark, Jonathan R; Collar, Ryan M
2018-05-01
Objective: Unplanned readmission within 30 days is a contributor to health care costs in the United States. The use of predictive modeling during hospitalization to identify patients at risk for readmission offers a novel approach to quality improvement and cost reduction. Study Design: Two-phase study including retrospective analysis of prospectively collected data followed by a prospective longitudinal study. Setting: Tertiary academic medical center. Subjects and Methods: Prospectively collected data for patients undergoing surgical treatment for head and neck cancer from January 2013 to January 2015 were used to build predictive models for readmission within 30 days of discharge using logistic regression, classification and regression tree (CART) analysis, and random forests. One model (logistic regression) was then placed prospectively into the discharge workflow from March 2016 to May 2016 to determine the model's ability to predict which patients would be readmitted within 30 days. Results: In total, 174 admissions had descriptive data. Thirty-two were excluded due to incomplete data. Logistic regression, CART, and random forest predictive models were constructed using the remaining 142 admissions. When applied to 106 consecutive prospective head and neck oncology patients at the time of discharge, the logistic regression model predicted readmissions with a specificity of 94%, a sensitivity of 47%, a negative predictive value of 90%, and a positive predictive value of 62% (odds ratio, 14.9; 95% confidence interval, 4.02-55.45). Conclusion: Prospectively collected head and neck cancer databases can be used to develop predictive models that can accurately predict which patients will be readmitted. This offers valuable support for quality improvement initiatives and readmission-related cost reduction in head and neck cancer care.
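A small sketch of the discrimination metrics quoted above, computed from a logistic model's predicted probabilities on synthetic data (the study's actual variables and prospective design are not reproduced; evaluation here is in-sample for brevity):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix, roc_auc_score

    rng = np.random.default_rng(4)
    X = rng.normal(size=(300, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 1.2).astype(int)  # minority outcome

    prob = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]
    pred = (prob >= 0.5).astype(int)                      # discharge-time decision rule
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    print("sensitivity:", tp / (tp + fn))
    print("specificity:", tn / (tn + fp))
    print("PPV:", tp / (tp + fp), "NPV:", tn / (tn + fn))
    print("AUC:", roc_auc_score(y, prob))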
NASA Technical Reports Server (NTRS)
Cagliostro, Domenick E.; Riccitiello, Salvatore R.
1993-01-01
In the first part of this work, a model is developed for the deposition of silicon from the reduction of silicon tetrachloride with hydrogen in a tubular reactor at 700-1100 C, at atmospheric pressure. The model is based on gas chromatography of the volatile products of the reaction, followed by gravimetric analysis of total Si deposition on the tube. In the second part of this work, a model is developed for the case of SiC deposition from the pyrolysis of dichlorodimethylsilane in hydrogen under the same reactor conditions. The rate constants derived from a nonlinear regression analysis are reported.
Fang, Xingang; Bagui, Sikha; Bagui, Subhash
2017-08-01
The readily available high throughput screening (HTS) data from the PubChem database provides an opportunity for mining of small molecules in a variety of biological systems using machine learning techniques. From the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structural-activity relationship (QSAR) model. For the development of a systematic descriptor selection strategy, we need the understanding of the relationship between: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset on the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross validation test. In testing the primary HTS screening data with more than 200K molecular structures, the logistic regression model exhibited the capability of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the Human kallikrein 5 (hK 5) target suggested a feasible descriptor/model selection strategy on similar targets. Copyright © 2017 Elsevier Ltd. All rights reserved.
Quantum regression theorem and non-Markovianity of quantum dynamics
NASA Astrophysics Data System (ADS)
Guarnieri, Giacomo; Smirne, Andrea; Vacchini, Bassano
2014-08-01
We explore the connection between two recently introduced notions of non-Markovian quantum dynamics and the validity of the so-called quantum regression theorem. While non-Markovianity of a quantum dynamics has been defined looking at the behavior in time of the statistical operator, which determines the evolution of mean values, the quantum regression theorem makes statements about the behavior of system correlation functions of order two and higher. The comparison relies on an estimate of the validity of the quantum regression hypothesis, which can be obtained exactly evaluating two-point correlation functions. To this aim we consider a qubit undergoing dephasing due to interaction with a bosonic bath, comparing the exact evaluation of the non-Markovianity measures with the violation of the quantum regression theorem for a class of spectral densities. We further study a photonic dephasing model, recently exploited for the experimental measurement of non-Markovianity. It appears that while a non-Markovian dynamics according to either definition brings with itself violation of the regression hypothesis, even Markovian dynamics can lead to a failure of the regression relation.
Determining factors influencing survival of breast cancer by fuzzy logistic regression model.
Nikbakht, Roya; Bahrampour, Abbas
2017-01-01
Fuzzy logistic regression can be used to determine influential factors of disease. This study explores the important factors that actually predict survival of breast cancer patients. We used breast cancer data collected by the cancer registry of Kerman University of Medical Sciences during the period 2000-2007. Variables such as morphology, grade, age, and treatments (surgery, radiotherapy, and chemotherapy) were included in the fuzzy logistic regression model. Model performance was assessed in terms of the mean degree of membership (MDM). The results showed that almost 41% of patients were in the neoplasm and malignant group, and more than two-thirds of them were still alive after the 5-year follow-up. Based on the fuzzy logistic model, the most important factors influencing survival were chemotherapy, morphology, and radiotherapy, respectively. Furthermore, the MDM criterion shows that the fuzzy logistic regression has a good fit to the data (MDM = 0.86). The fuzzy logistic regression model showed that chemotherapy is more important than radiotherapy for the survival of patients with breast cancer. Another capability of this model is calculating the possibilistic odds of survival in cancer patients. The results of this study can be applied in clinical research, and because few studies have applied fuzzy logistic models, we recommend using this model in various research areas.
Tay, Cheryl Sihui; Sterzing, Thorsten; Lim, Chen Yen; Ding, Rui; Kong, Pui Wah
2017-05-01
This study examined (a) the strength of four individual footwear perception factors to influence the overall preference of running shoes and (b) whether these perception factors satisfied the nonmulticollinear assumption in a regression model. Running footwear must fulfill multiple functional criteria to satisfy its potential users. Footwear perception factors, such as fit and cushioning, are commonly used to guide shoe design and development, but it is unclear whether running-footwear users are able to differentiate one factor from another. One hundred casual runners assessed four running shoes on a 15-cm visual analogue scale for four footwear perception factors (fit, cushioning, arch support, and stability) as well as for overall preference during a treadmill running protocol. Diagnostic tests showed an absence of multicollinearity between factors, where values for tolerance ranged from .36 to .72, corresponding to variance inflation factors of 2.8 to 1.4. The multiple regression model of these four footwear perception variables accounted for 77.7% to 81.6% of variance in overall preference, with each factor explaining a unique part of the total variance. Casual runners were able to rate each footwear perception factor separately, thus assigning each factor a true potential to improve overall preference for the users. The results also support the use of a multiple regression model of footwear perception factors to predict overall running shoe preference. Regression modeling is a useful tool for running-shoe manufacturers to more precisely evaluate how individual factors contribute to the subjective assessment of running footwear.
González Costa, J J; Reigosa, M J; Matías, J M; Covelo, E F
2017-09-01
The aim of this study was to model the sorption and retention of Cd, Cu, Ni, Pb and Zn in soils. To that end, the sorption and retention of these metals were studied and the soil characterization was performed separately. Multiple stepwise regression was used to produce multivariate models with linear techniques and with support vector machines, all of which included 15 explanatory variables characterizing the soils. When the R-squared values are examined, two groups emerge. Cr, Cu and Pb sorption and retention show higher R-squared values, the most explanatory variables being humified organic matter, Al oxides and, in some cases, cation-exchange capacity (CEC). The other group of metals (Cd, Ni and Zn) shows lower R-squared values, and clays are the most explanatory variables, including the percentages of vermiculite and silt. In some cases, quartz, plagioclase or hematite percentages also show some explanatory capacity. Support vector machine (SVM) regression shows that the models are not as regular as in multiple regression in terms of the number of variables, the regression for nickel adsorption having the highest number of variables in its optimal model. On the other hand, there are cases where the most explanatory variables are the same for two metals, as happens with Cd and Cr adsorption; a similar adsorption mechanism is thus postulated. These patterns of variable entry into the models allow us to create explainability sequences. Those most similar to the selectivity sequences obtained by Covelo (2005) involve Mn oxides in multiple regression and cation-exchange capacity in SVM. Among all the variables, the only one that is explanatory for all the metals after applying the maximum parsimony principle is the percentage of sand in the retention process. In the competitive model arising from the aforementioned sequences, the most intense competition for the adsorption and retention of the different metals appears between Cr and Cd, and between Cu and Zn, in multiple regression, and between Cr and Cd in SVM regression. Copyright © 2017 Elsevier B.V. All rights reserved.
Strand, Matthew; Sillau, Stefan; Grunwald, Gary K; Rabinovitch, Nathan
2014-02-10
Regression calibration provides a way to obtain unbiased estimators of fixed effects in regression models when one or more predictors are measured with error. Recent development of measurement error methods has focused on models that include interaction terms between measured-with-error predictors, and separately, on methods for estimation in models that account for correlated data. In this work, we derive explicit and novel forms of regression calibration estimators and associated asymptotic variances for longitudinal models that include interaction terms, when data from instrumental and unbiased surrogate variables are available but not the actual predictors of interest. The longitudinal data are fit using linear mixed models that contain random intercepts and account for serial correlation and unequally spaced observations. The motivating application involves a longitudinal study of exposure to two pollutants (predictors) - outdoor fine particulate matter and cigarette smoke - and their association, in interactive form, with levels of a biomarker of inflammation, leukotriene E4 (LTE4, outcome), in asthmatic children. Because the exposure concentrations could not be directly observed, we used measurements from a fixed outdoor monitor and urinary cotinine concentrations as instrumental variables, and we used concentrations of fine ambient particulate matter and cigarette smoke measured with error by personal monitors as unbiased surrogate variables. We applied the derived regression calibration methods to estimate coefficients of the unobserved predictors and their interaction, allowing for direct comparison of toxicity of the different pollutants. We used simulations to verify accuracy of inferential methods based on asymptotic theory. Copyright © 2013 John Wiley & Sons, Ltd.
Multivariate decoding of brain images using ordinal regression.
Doyle, O M; Ashburner, J; Zelaya, F O; Williams, S C R; Mehta, M A; Marquand, A F
2013-11-01
Neuroimaging data are increasingly being used to predict potential outcomes or groupings, such as clinical severity, drug dose response, and transitional illness states. In these examples, the variable (target) we want to predict is ordinal in nature. Conventional classification schemes assume that the targets are nominal and hence ignore their ranked nature, whereas parametric and/or non-parametric regression models enforce a metric notion of distance between classes. Here, we propose a novel, alternative multivariate approach that overcomes these limitations - whole brain probabilistic ordinal regression using a Gaussian process framework. We applied this technique to two data sets of pharmacological neuroimaging data from healthy volunteers. The first study was designed to investigate the effect of ketamine on brain activity and its subsequent modulation with two compounds - lamotrigine and risperidone. The second study investigates the effect of scopolamine on cerebral blood flow and its modulation using donepezil. We compared ordinal regression to multi-class classification schemes and metric regression. Considering the modulation of ketamine with lamotrigine, we found that ordinal regression significantly outperformed multi-class classification and metric regression in terms of accuracy and mean absolute error. However, for risperidone ordinal regression significantly outperformed metric regression but performed similarly to multi-class classification both in terms of accuracy and mean absolute error. For the scopolamine data set, ordinal regression was found to outperform both multi-class and metric regression techniques considering the regional cerebral blood flow in the anterior cingulate cortex. Ordinal regression was thus the only method that performed well in all cases. Our results indicate the potential of an ordinal regression approach for neuroimaging data while providing a fully probabilistic framework with elegant approaches for model selection. Copyright © 2013. Published by Elsevier Inc.
Lenters, Virissa; Portengen, Lützen; Rignell-Hydbom, Anna; Jönsson, Bo A.G.; Lindh, Christian H.; Piersma, Aldert H.; Toft, Gunnar; Bonde, Jens Peter; Heederik, Dick; Rylander, Lars; Vermeulen, Roel
2015-01-01
Background Some legacy and emerging environmental contaminants are suspected risk factors for intrauterine growth restriction. However, the evidence is equivocal, in part due to difficulties in disentangling the effects of mixtures. Objectives We assessed associations between multiple correlated biomarkers of environmental exposure and birth weight. Methods We evaluated a cohort of 1,250 term (≥ 37 weeks gestation) singleton infants, born to 513 mothers from Greenland, 180 from Poland, and 557 from Ukraine, who were recruited during antenatal care visits in 2002‒2004. Secondary metabolites of diethylhexyl and diisononyl phthalates (DEHP, DiNP), eight perfluoroalkyl acids, and organochlorines (PCB-153 and p,p´-DDE) were quantifiable in 72‒100% of maternal serum samples. We assessed associations between exposures and term birth weight, adjusting for co-exposures and covariates, including prepregnancy body mass index. To identify independent associations, we applied the elastic net penalty to linear regression models. Results Two phthalate metabolites (MEHHP, MOiNP), perfluorooctanoic acid (PFOA), and p,p´-DDE were most consistently predictive of term birth weight based on elastic net penalty regression. In an adjusted, unpenalized regression model of the four exposures, 2-SD increases in natural log–transformed MEHHP, PFOA, and p,p´-DDE were associated with lower birth weight: –87 g (95% CI: –137, –340 per 1.70 ng/mL), –43 g (95% CI: –108, 23 per 1.18 ng/mL), and –135 g (95% CI: –192, –78 per 1.82 ng/g lipid), respectively; and MOiNP was associated with higher birth weight (46 g; 95% CI: –5, 97 per 2.22 ng/mL). Conclusions This study suggests that several of the environmental contaminants, belonging to three chemical classes, may be independently associated with impaired fetal growth. These results warrant follow-up in other cohorts. Citation Lenters V, Portengen L, Rignell-Hydbom A, Jönsson BA, Lindh CH, Piersma AH, Toft G, Bonde JP, Heederik D, Rylander L, Vermeulen R. 2016. Prenatal phthalate, perfluoroalkyl acid, and organochlorine exposures and term birth weight in three birth cohorts: multi-pollutant models based on elastic net regression. Environ Health Perspect 124:365–372; http://dx.doi.org/10.1289/ehp.1408933 PMID:26115335
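A compact sketch of multi-pollutant selection with the elastic net penalty on simulated, class-wise correlated exposures (scikit-learn's ElasticNetCV stands in for the authors' procedure; all names and effect sizes are invented):

    import numpy as np
    from sklearn.linear_model import ElasticNetCV
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(5)
    n, p = 1000, 12                       # 12 correlated exposure biomarkers
    base = rng.normal(size=(n, 3))        # 3 latent chemical classes
    X = np.repeat(base, 4, axis=1) + 0.7 * rng.normal(size=(n, p))
    beta = np.zeros(p)
    beta[[0, 4, 8]] = [-45, -20, -60]     # one truly active exposure per class (g per SD)
    birthweight = 3450 + X @ beta + 350 * rng.normal(size=n)

    Xs = StandardScaler().fit_transform(X)
    enet = ElasticNetCV(l1_ratio=[0.2, 0.5, 0.8], cv=5).fit(Xs, birthweight)
    sel = np.flatnonzero(np.abs(enet.coef_) > 1e-6)
    print("selected exposures:", sel)
    print("coefficients (g per SD):", np.round(enet.coef_[sel], 1))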
Innovating patient care delivery: DSRIP's interrupted time series analysis paradigm.
Shenoy, Amrita G; Begley, Charles E; Revere, Lee; Linder, Stephen H; Daiger, Stephen P
2017-12-08
Adoption of a Medicaid Section 1115 waiver is one of many ways of innovating the healthcare delivery system. The Delivery System Reform Incentive Payment (DSRIP) pool, one of the waiver's two funding pools, has four categories: infrastructure development; program innovation and redesign; quality improvement reporting; and population health improvement. A metric of the fourth category, the preventable hospitalization (PH) rate, was analyzed for eight conditions over two time periods, the pre-reporting years (2010-2012) and the post-reporting years (2013-2015), for two hospital cohorts, DSRIP participating and non-participating hospitals. The study examines how DSRIP affected the PH rates of the eight conditions in both hospital cohorts across the two periods. Each of the eight PH rates was regressed as the dependent variable on time, intervention, and post-DSRIP intervention as independent variables. The PH rates of the eight conditions were then consolidated into one rate and regressed on the same independent variables to evaluate the overall impact of DSRIP. An interrupted time series regression was performed after accounting for autocorrelation, stationarity, and seasonality in the dataset. In the individual regression models, the PH rates showed statistically significant coefficients for seven of the eight conditions in DSRIP participating hospitals. In the combined regression model, the PH rate showed a statistically significant decrease, with negative regression coefficients in DSRIP participating hospitals compared with positive coefficients in DSRIP non-participating hospitals. Several macro- and micro-level factors likely contributed to DSRIP participating hospitals outperforming non-participating hospitals: healthcare organization/provider collaboration, support from healthcare professionals, DSRIP's design, state reimbursement, and coordination in care delivery methods. Level of evidence: IV, a retrospective cohort study based on longitudinal data. Copyright © 2017 Elsevier Inc. All rights reserved.
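A minimal segmented-regression sketch of an interrupted time series with level and slope change at the start of the reporting period (simulated monthly rates; the study's corrections for autocorrelation and seasonality are omitted for brevity):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)
    months = np.arange(72)                        # 2010-2015
    post = (months >= 36).astype(int)             # reporting period begins 2013
    time_after = np.where(post == 1, months - 36, 0)
    ph_rate = (12 - 0.01 * months - 0.8 * post - 0.05 * time_after
               + rng.normal(0, 0.4, 72))
    df = pd.DataFrame({"ph": ph_rate, "t": months, "post": post, "t_after": time_after})

    its = smf.ols("ph ~ t + post + t_after", data=df).fit()
    print(its.params)    # 'post' = level change, 't_after' = slope change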
In silico study of in vitro GPCR assays by QSAR modeling ...
The U.S. EPA is screening thousands of chemicals of environmental interest in hundreds of in vitro high-throughput screening (HTS) assays (the ToxCast program). One goal is to prioritize chemicals for more detailed analyses based on activity in molecular initiating events (MIE) of adverse outcome pathways (AOPs). However, the chemical space of interest for environmental exposure is much wider than this set of chemicals, so there is a need to fill data gaps with in silico methods; quantitative structure-activity relationships (QSARs) are a proven and cost-effective approach to predict biological activity. ToxCast in turn provides relatively large datasets that are ideal for training and testing QSAR models. The overall goal of the study described here was to develop QSAR models to fill the data gaps in a larger environmental database of ~32k structures. The specific aim of the current work was to build QSAR models for 18 G-protein coupled receptor (GPCR) assays, part of the aminergic category. Two QSAR modeling strategies were adopted: classification models were developed to separate chemicals into active/non-active classes, and then regression models were built to predict the potency values of the bioassays for the active chemicals. Multiple software programs were used to calculate constitutional, topological and substructural molecular descriptors from two-dimensional (2D) chemical structures. Model-fitting methods included PLSDA (partial least squares discriminant analysis).
Area-to-point regression kriging for pan-sharpening
NASA Astrophysics Data System (ADS)
Wang, Qunming; Shi, Wenzhong; Atkinson, Peter M.
2016-04-01
Pan-sharpening is a technique to combine the fine spatial resolution panchromatic (PAN) band with the coarse spatial resolution multispectral bands of the same satellite to create a fine spatial resolution multispectral image. In this paper, area-to-point regression kriging (ATPRK) is proposed for pan-sharpening. ATPRK considers the PAN band as the covariate. Moreover, ATPRK is extended with a local approach, called adaptive ATPRK (AATPRK), which fits a regression model using a local, non-stationary scheme such that the regression coefficients change across the image. The two geostatistical approaches, ATPRK and AATPRK, were compared to the 13 state-of-the-art pan-sharpening approaches summarized in Vivone et al. (2015) in experiments on three separate datasets. ATPRK and AATPRK produced more accurate pan-sharpened images than the 13 benchmark algorithms in all three experiments. Unlike the benchmark algorithms, the two geostatistical solutions precisely preserved the spectral properties of the original coarse data. Furthermore, ATPRK can be enhanced by a local scheme in AATRPK, in cases where the residuals from a global regression model are such that their spatial character varies locally.
NASA Astrophysics Data System (ADS)
Schlechtingen, Meik; Ferreira Santos, Ilmar
2011-07-01
This paper presents the research results of a comparison of three different model based approaches for wind turbine fault detection in online SCADA data, by applying developed models to five real measured faults and anomalies. The regression based model as the simplest approach to build a normal behavior model is compared to two artificial neural network based approaches, which are a full signal reconstruction and an autoregressive normal behavior model. Based on a real time series containing two generator bearing damages the capabilities of identifying the incipient fault prior to the actual failure are investigated. The period after the first bearing damage is used to develop the three normal behavior models. The developed or trained models are used to investigate how the second damage manifests in the prediction error. Furthermore the full signal reconstruction and the autoregressive approach are applied to further real time series containing gearbox bearing damages and stator temperature anomalies. The comparison revealed all three models being capable of detecting incipient faults. However, they differ in the effort required for model development and the remaining operational time after first indication of damage. The general nonlinear neural network approaches outperform the regression model. The remaining seasonality in the regression model prediction error makes it difficult to detect abnormality and leads to increased alarm levels and thus a shorter remaining operational period. For the bearing damages and the stator anomalies under investigation the full signal reconstruction neural network gave the best fault visibility and thus led to the highest confidence level.
Analytics of Radioactive Materials Released in the Fukushima Daiichi Nuclear Accident
DOE Office of Scientific and Technical Information (OSTI.GOV)
Egarievwe, Stephen U.; Nuclear Engineering Department, University of Tennessee, Knoxville, TN; Coble, Jamie B.
The 2011 Fukushima Daiichi nuclear accident in Japan resulted in the release of radioactive materials into the atmosphere, the nearby sea, and the surrounding land. Following the accident, several meteorological models were used to predict the transport of the radioactive materials to other continents such as North America and Europe. Also of high importance is the dispersion of radioactive materials locally and within Japan. Based on the International Atomic Energy Agency (IAEA) Convention on Early Notification of a nuclear accident, several radiological data sets were collected on the accident by the Japanese authorities. Among the radioactive materials monitored are I-131 and Cs-137, which are the major contributors to the contamination of drinking water. The radiation dose in the atmosphere was also measured. It is impractical to measure contamination and radiation dose in every place of interest, so modeling helps to predict both. Modeling studies reported in the literature include the simulation of transport and deposition of I-131 and Cs-137 from the accident, Cs-137 deposition and contamination of Japanese soils, and preliminary estimates of I-131 and Cs-137 discharged from the plant into the atmosphere. In this paper, we present statistical analytics of I-131 and Cs-137 with the goal of predicting gamma dose from the Fukushima Daiichi nuclear accident. The data sets used in our study were collected from the IAEA Fukushima Monitoring Database. As part of this study, we investigated several regression models to find the best algorithm for modeling the gamma dose. The modeling techniques used in our study include linear regression, principal component regression (PCR), partial least squares (PLS) regression, and ridge regression. Our preliminary results on the first set of data showed that the linear regression model with one variable was the best, with a root mean square error of 0.0133 μSv/h, compared to 0.0210 μSv/h for PCR, 0.231 μSv/h for ridge regression with L-curve selection, 0.0856 μSv/h for PLS, and 0.0860 μSv/h for ridge regression with cross validation. Complete results using the full datasets for these models will also be presented. (authors)
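A sketch of how such a four-way comparison might look in scikit-learn, on synthetic stand-in data (the IAEA variables and the L-curve/cross-validation ridge tuning are not reproduced):

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(7)
    X = rng.normal(size=(400, 8))                     # e.g., monitored activities
    gamma_dose = 0.05 + 0.02 * X[:, 0] + 0.01 * X[:, 1] + 0.005 * rng.normal(size=400)
    Xtr, Xte, ytr, yte = train_test_split(X, gamma_dose, random_state=0)

    models = {
        "linear": LinearRegression(),
        "PCR": make_pipeline(PCA(n_components=3), LinearRegression()),
        "PLS": PLSRegression(n_components=3),
        "ridge": Ridge(alpha=1.0),
    }
    for name, m in models.items():
        m.fit(Xtr, ytr)
        rmse = np.sqrt(mean_squared_error(yte, np.ravel(m.predict(Xte))))
        print(f"{name}: RMSE = {rmse:.4f} uSv/h")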
Granato, Gregory E.
2006-01-01
The Kendall-Theil Robust Line software (KTRLine-version 1.0) is a Visual Basic program that may be used with the Microsoft Windows operating system to calculate parameters for robust, nonparametric estimates of linear-regression coefficients between two continuous variables. The KTRLine software was developed by the U.S. Geological Survey, in cooperation with the Federal Highway Administration, for use in stochastic data modeling with local, regional, and national hydrologic data sets to develop planning-level estimates of potential effects of highway runoff on the quality of receiving waters. The Kendall-Theil robust line was selected because this robust nonparametric method is resistant to the effects of outliers and nonnormality in residuals that commonly characterize hydrologic data sets. The slope of the line is calculated as the median of all possible pairwise slopes between points. The intercept is calculated so that the line will run through the median of input data. A single-line model or a multisegment model may be specified. The program was developed to provide regression equations with an error component for stochastic data generation because nonparametric multisegment regression tools are not available with the software that is commonly used to develop regression models. The Kendall-Theil robust line is a median line and, therefore, may underestimate total mass, volume, or loads unless the error component or a bias correction factor is incorporated into the estimate. Regression statistics such as the median error, the median absolute deviation, the prediction error sum of squares, the root mean square error, the confidence interval for the slope, and the bias correction factor for median estimates are calculated by use of nonparametric methods. These statistics, however, may be used to formulate estimates of mass, volume, or total loads. The program is used to read a two- or three-column tab-delimited input file with variable names in the first row and data in subsequent rows. The user may choose the columns that contain the independent (X) and dependent (Y) variable. A third column, if present, may contain metadata such as the sample-collection location and date. The program screens the input files and plots the data. The KTRLine software is a graphical tool that facilitates development of regression models by use of graphs of the regression line with data, the regression residuals (with X or Y), and percentile plots of the cumulative frequency of the X variable, Y variable, and the regression residuals. The user may individually transform the independent and dependent variables to reduce heteroscedasticity and to linearize data. The program plots the data and the regression line. The program also prints model specifications and regression statistics to the screen. The user may save and print the regression results. The program can accept data sets that contain up to about 15,000 XY data points, but because the program must sort the array of all pairwise slopes, the program may be perceptibly slow with data sets that contain more than about 1,000 points.
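The core Kendall-Theil computation is compact; a sketch alongside SciPy's equivalent Theil-Sen estimator on synthetic heavy-tailed data (the KTRLine program's multisegment models, transformations, and bias corrections are not shown):

    import numpy as np
    from scipy.stats import theilslopes

    def kendall_theil(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        i, j = np.triu_indices(len(x), k=1)
        keep = x[i] != x[j]                              # skip undefined slopes
        slopes = (y[j] - y[i])[keep] / (x[j] - x[i])[keep]
        slope = np.median(slopes)                        # median of all pairwise slopes
        intercept = np.median(y) - slope * np.median(x)  # line through the data medians
        return slope, intercept

    rng = np.random.default_rng(8)
    x = rng.uniform(0, 10, 50)
    y = 2.0 + 0.5 * x + rng.standard_t(df=2, size=50)    # heavy-tailed residuals
    print("manual:", kendall_theil(x, y))
    print("scipy :", theilslopes(y, x)[:2])              # (slope, intercept, ...)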
Large unbalanced credit scoring using Lasso-logistic regression ensemble.
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data.
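A minimal sketch of the balance-then-ensemble idea, with simple random undersampling standing in for the paper's clustering-based balancing step and illustrative parameter values; this is not the authors' implementation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def lasso_logit_ensemble(X, y, n_bags=25, C=0.5, seed=0):
    """Bagged L1-regularized logistic regression for unbalanced data.

    Each bag pairs all minority-class rows with an equal-sized random
    sample of majority-class rows before fitting a lasso-logit model."""
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    models = []
    for _ in range(n_bags):
        maj_sample = rng.choice(majority, size=len(minority), replace=True)
        idx = np.concatenate([minority, maj_sample])
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=C)
        models.append(clf.fit(X[idx], y[idx]))
    return models

def ensemble_proba(models, X):
    # Average the predicted default probabilities over all bags
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```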
NASA Technical Reports Server (NTRS)
2004-01-01
The grant closure report is organized into the following chapters: the first chapter describes the two research areas, design optimization and solid mechanics; ten journal publications are listed in the second chapter; five highlights are the subject matter of chapter three. CHAPTER 1. The Design Optimization Test Bed CometBoards. CHAPTER 2. Solid Mechanics: Integrated Force Method of Analysis. CHAPTER 3. Five Highlights: Neural Network and Regression Methods Demonstrated in the Design Optimization of a Subsonic Aircraft. Neural Network and Regression Soft Model Extended for PX-300 Aircraft Engine. Engine with Regression and Neural Network Approximators Designed. Cascade Optimization Strategy with Neural Network and Regression Approximations Demonstrated on a Preliminary Aircraft Engine Design. Neural Network and Regression Approximations Used in Aircraft Design.
A surrogate model for thermal characteristics of stratospheric airship
NASA Astrophysics Data System (ADS)
Zhao, Da; Liu, Dongxu; Zhu, Ming
2018-06-01
A simple and accurate surrogate model is much needed to reduce the analysis complexity of the thermal characteristics of a stratospheric airship. In this paper, a surrogate model based on Least Squares Support Vector Regression (LSSVR) is proposed. The Gravitational Search Algorithm (GSA) is used to optimize the hyperparameters. A novel framework consisting of a preprocessing classifier and two regression models is designed to train the surrogate model. Various temperature datasets of the airship envelope and the internal gas are obtained by a three-dimensional transient model for thermal characteristics. Using these thermal datasets, two-factor and multi-factor surrogate models are trained and several comparison simulations are conducted. Results illustrate that the surrogate models based on LSSVR-GSA have good fitting and generalization abilities. The pre-treated classification strategy proposed in this paper plays a significant role in improving the accuracy of the surrogate model.
Susan L. King
2003-01-01
The performance of two classifiers, logistic regression and neural networks, is compared for modeling noncatastrophic individual tree mortality for 21 species of trees in West Virginia. The output of the classifier is usually a continuous number between 0 and 1. A threshold is selected between 0 and 1 and all of the trees below the threshold are classified as...
Fonseca, Maria de Jesus Mendes da; Juvanhol, Leidjaira Lopes; Rotenberg, Lúcia; Nobre, Aline Araújo; Griep, Rosane Härter; Alves, Márcia Guimarães de Mello; Cardoso, Letícia de Oliveira; Giatti, Luana; Nunes, Maria Angélica; Aquino, Estela M L; Chor, Dóra
2017-11-17
This paper explores the association between job strain and adiposity, using two statistical analysis approaches and considering the role of gender. The research evaluated 11,960 active baseline participants (2008-2010) in the ELSA-Brasil study. Job strain was evaluated through a demand-control questionnaire, while body mass index (BMI) and waist circumference (WC) were evaluated in continuous form. The associations were estimated using gamma regression models with an identity link function. Quantile regression models were also estimated from the final set of co-variables established by gamma regression. The relationship that was found varied by analytical approach and gender. Among the women, no association was observed between job strain and adiposity in the fitted gamma models. In the quantile models, a pattern of increasing effects of high strain was observed at higher BMI and WC distribution quantiles. Among the men, high strain was associated with adiposity in the gamma regression models. However, when quantile regression was used, that association was found not to be homogeneous across outcome distributions. In addition, in the quantile models an association was observed between active jobs and BMI. Our results point to an association between job strain and adiposity, which follows a heterogeneous pattern. Modelling strategies can produce different results and should, accordingly, be used to complement one another.
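For readers unfamiliar with the two modeling strategies, a hedged sketch of both a gamma regression with identity link and quantile regression in statsmodels, on simulated stand-in data (note that the link class is spelled identity() in older statsmodels releases):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy stand-ins for BMI and a binary high-strain indicator
rng = np.random.default_rng(1)
df = pd.DataFrame({"strain": rng.integers(0, 2, 500)})
df["bmi"] = rng.gamma(shape=20, scale=(25 + 1.5 * df["strain"]) / 20)

# Gamma regression with identity link: effects are on the BMI scale itself
gamma_fit = smf.glm(
    "bmi ~ strain", data=df,
    family=sm.families.Gamma(link=sm.families.links.Identity()),
).fit()

# Quantile regression across the BMI distribution, as in the second approach
for q in (0.25, 0.5, 0.9):
    qfit = smf.quantreg("bmi ~ strain", df).fit(q=q)
    print(q, qfit.params["strain"])
```

Comparing the strain coefficient across quantiles is what reveals the heterogeneous pattern the abstract describes.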
Jang, Dae -Heung; Anderson-Cook, Christine Michaela
2016-11-22
With many predictors in regression, fitting the full model can induce multicollinearity problems. The Least Absolute Shrinkage and Selection Operator (LASSO) is useful when the effects of many explanatory variables are sparse in a high-dimensional dataset. Influential points can have a disproportionate impact on the estimated values of model parameters. This paper describes a new influence plot that can be used to increase understanding of the contributions of individual observations and the robustness of results. It can serve as a complement to other regression diagnostics techniques in the LASSO regression setting. Using this influence plot, we can find influential points and their impact on the shrinkage of model parameters and on model selection. Lastly, we provide two examples to illustrate the methods.
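The paper's influence plot is not reproduced here, but a simple case-deletion diagnostic in the same spirit can be sketched: refit the LASSO with each observation removed and measure how far the coefficient vector moves.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_case_influence(X, y, alpha=0.1):
    """Case-deletion influence on LASSO coefficients: refit with each
    observation left out and record the shift of the coefficient vector,
    which also captures changes in which variables are selected."""
    full = Lasso(alpha=alpha).fit(X, y)
    influence = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        fit_i = Lasso(alpha=alpha).fit(X[mask], y[mask])
        influence[i] = np.linalg.norm(fit_i.coef_ - full.coef_)
    return influence  # large values flag points that move the fit most
```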
Real-time quality monitoring in debutanizer column with regression tree and ANFIS
NASA Astrophysics Data System (ADS)
Siddharth, Kumar; Pathak, Amey; Pani, Ajaya Kumar
2018-05-01
A debutanizer column is an integral part of any petroleum refinery. Online composition monitoring of debutanizer column outlet streams is highly desirable in order to maximize the production of liquefied petroleum gas. In this article, data-driven models for the debutanizer column are developed for real-time composition monitoring. The dataset used has seven process variables as inputs, and the output is the butane concentration in the debutanizer column bottom product. The input-output dataset is divided equally into a training (calibration) set and a validation (testing) set. The training set data were used to develop fuzzy inference, adaptive neuro-fuzzy (ANFIS) and regression tree models for the debutanizer column. The accuracy of the developed models was evaluated by simulation of the models with the validation dataset. It is observed that the ANFIS model has better estimation accuracy than the other models developed in this work and many data-driven models proposed so far in the literature for the debutanizer column.
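A minimal sketch of the regression tree part of such a soft sensor, using simulated stand-ins for the seven process variables (ANFIS has no standard scikit-learn implementation and is omitted here):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-ins for the seven process variables and the butane
# concentration in the bottom product
rng = np.random.default_rng(11)
X = rng.normal(size=(2000, 7))
y = 0.3 * X[:, 0] - 0.2 * X[:, 3] + rng.normal(scale=0.05, size=2000)

# Equal split into calibration and validation halves, as in the study
X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=0.5,
                                              random_state=0)
tree = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_cal, y_cal)
print("validation R^2:", round(tree.score(X_val, y_val), 2))
```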
Spatial variation of natural radiation and childhood leukaemia incidence in Great Britain.
Richardson, S; Monfort, C; Green, M; Draper, G; Muirhead, C
This paper describes an analysis of the geographical variation of childhood leukaemia incidence in Great Britain over a 15 year period in relation to natural radiation (gamma and radon). Data at the level of the 459 district level local authorities in England, Wales and regional districts in Scotland are analysed in two complementary ways: first, by Poisson regressions with the inclusion of environmental covariates and a smooth spatial structure; secondly, by a hierarchical Bayesian model in which extra-Poisson variability is modelled explicitly in terms of spatial and non-spatial components. From this analysis, we deduce a strong indication that a main part of the variability is accounted for by a local neighbourhood 'clustering' structure. This structure is furthermore relatively stable over the 15 year period for the lymphocytic leukaemias which make up the majority of observed cases. We found no evidence of a positive association of childhood leukaemia incidence with outdoor or indoor gamma radiation levels. There is no consistent evidence of any association with radon levels. Indeed, in the Poisson regressions, a significant positive association was only observed for one 5-year period, a result which is not compatible with a stable environmental effect. Moreover, this positive association became clearly non-significant when over-dispersion relative to the Poisson distribution was taken into account.
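The first analysis style, Poisson regression of area counts on environmental covariates with an offset for expected cases, can be sketched as follows (hypothetical data; the published analysis additionally included a smooth spatial structure):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical district-level data: observed cases, expected cases from
# age-standardized rates, and standardized environmental covariates
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "expected": rng.uniform(5, 50, 459),
    "gamma_dose": rng.normal(0, 1, 459),
    "radon": rng.normal(0, 1, 459),
})
df["cases"] = rng.poisson(df["expected"])

# log(expected) enters as an offset, so coefficients act on relative risk
fit = smf.glm(
    "cases ~ gamma_dose + radon", data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["expected"]),
).fit()
print(fit.summary())
# Over-dispersion check: Pearson chi-square / residual df well above 1
print(fit.pearson_chi2 / fit.df_resid)
```

An over-dispersion ratio well above 1 is the signal that motivates the second, hierarchical Bayesian analysis with explicit extra-Poisson variability.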
NASA Astrophysics Data System (ADS)
Ghazali, Amirul Syafiq Mohd; Ali, Zalila; Noor, Norlida Mohd; Baharum, Adam
2015-10-01
Multinomial logistic regression is widely used to model the outcomes of a polytomous response variable, a categorical dependent variable with more than two categories. The model assumes that the conditional mean of the dependent categorical variables is the logistic function of an affine combination of predictor variables. Its procedure gives a number of logistic regression models that make specific comparisons of the response categories. When there are q categories of the response variable, the model consists of q-1 logit equations which are fitted simultaneously. The model is validated by variable selection procedures, tests of regression coefficients, a significant test of the overall model, goodness-of-fit measures, and validation of predicted probabilities using odds ratio. This study used the multinomial logistic regression model to investigate obesity and overweight among primary school students in a rural area on the basis of their demographic profiles, lifestyles and on the diet and food intake. The results indicated that obesity and overweight of students are related to gender, religion, sleep duration, time spent on electronic games, breakfast intake in a week, with whom meals are taken, protein intake, and also, the interaction between breakfast intake in a week with sleep duration, and the interaction between gender and protein intake.
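A hedged sketch of fitting the q-1 simultaneous logit equations with statsmodels, using hypothetical covariates loosely modeled on those in the abstract:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: 3 weight categories (0=normal, 1=overweight, 2=obese)
rng = np.random.default_rng(3)
n = 600
X = pd.DataFrame({
    "gender": rng.integers(0, 2, n),
    "sleep_hours": rng.normal(8, 1, n),
    "protein_intake": rng.normal(50, 10, n),
})
X["gender_x_protein"] = X["gender"] * X["protein_intake"]  # interaction
y = rng.integers(0, 3, n)

# With q = 3 categories, MNLogit fits q - 1 = 2 logit equations at once,
# each comparing a category against the baseline category 0
fit = sm.MNLogit(y, sm.add_constant(X)).fit(disp=0)
print(fit.summary())
print(np.exp(fit.params))  # odds ratios relative to the baseline category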
2012-01-01
Background Route environments may influence people's active commuting positively and thereby contribute to public health. Assessments of route environments are, however, needed in order to better understand the possible relationship between active commuting and the route environment. The aim of this study was, therefore, to assess the potential associations between perceptions of whether the route environment on the whole hinders or stimulates bicycle commuting and perceptions of environmental factors. Methods The Active Commuting Route Environment Scale (ACRES) was used for the assessment of bicycle commuters' perceptions of their route environments in the inner urban parts of Greater Stockholm, Sweden. Bicycle commuters (n = 827) were recruited by advertisements in newspapers. Simultaneous multiple regression analyses were used to assess the relation between predictor variables (such as levels of exhaust fumes, noise, traffic speed, traffic congestion and greenery) and the outcome variable (hindering - stimulating route environments). Two models were run, (Model 1) without and (Model 2) with the item traffic: unsafe or safe included as a predictor. Results Overall, about 40% of the variance of hindering - stimulating route environments was explained by the environmental predictors in our models (Model 1, R2 = 0.415, and Model 2, R2 = 0.435). The regression equation for Model 1 was: y = 8.53 + 0.33 ugly or beautiful + 0.14 greenery + (-0.14) course of the route + (-0.13) exhaust fumes + (-0.09) congestion: all types of vehicles (p ≤ 0.019). The regression equation for Model 2 was: y = 6.55 + 0.31 ugly or beautiful + 0.16 traffic: unsafe or safe + (-0.13) exhaust fumes + 0.12 greenery + (-0.12) course of the route (p ≤ 0.001). Conclusions The main results indicate that beautiful, green and safe route environments seem to be, independently of each other, stimulating factors for bicycle commuting in inner urban areas. On the other hand, exhaust fumes, traffic congestion and low 'directness' of the route seem to be hindering factors. Furthermore, the overall results illustrate the complexity of a research area at the beginning of exploration. PMID:22401492
Madarang, Krish J; Kang, Joo-Hyon
2014-06-01
Stormwater runoff has been identified as a source of pollution for the environment, especially for receiving waters. In order to quantify and manage the impacts of stormwater runoff on the environment, predictive and mathematical models have been developed. Predictive tools such as regression models have been widely used to predict stormwater discharge characteristics. Storm event characteristics, such as antecedent dry days (ADD), have been related to response variables, such as pollutant loads and concentrations. However, whether ADD is an important variable in predicting stormwater discharge characteristics has been a controversial issue among many studies. In this study, we examined the accuracy of general linear regression models in predicting discharge characteristics of roadway runoff. A total of 17 storm events were monitored in two highway segments, located in Gwangju, Korea. Data from the monitoring were used to calibrate the United States Environmental Protection Agency's Storm Water Management Model (SWMM). The calibrated SWMM was simulated for 55 storm events, and the results for total suspended solid (TSS) discharge loads and event mean concentrations (EMC) were extracted. From these data, linear regression models were developed. R(2) and p-values of the regression of ADD for both TSS loads and EMCs were investigated. Results showed that pollutant loads were better predicted than pollutant EMCs in the multiple regression models. Regression may not provide the true effect of site-specific characteristics, due to uncertainty in the data. Copyright © 2014 The Research Centre for Eco-Environmental Sciences, Chinese Academy of Sciences. Published by Elsevier B.V. All rights reserved.
Anderson, Chauncey W.; Rounds, Stewart A.
2010-01-01
Management of water quality in streams of the United States is becoming increasingly complex as regulators seek to control aquatic pollution and ecological problems through Total Maximum Daily Load programs that target reductions in the concentrations of certain constituents. Sediment, nutrients, and bacteria, for example, are constituents that regulators target for reduction nationally and in the Tualatin River basin, Oregon. These constituents require laboratory analysis of discrete samples for definitive determinations of concentrations in streams. Recent technological advances in the nearly continuous, in situ monitoring of related water-quality parameters has fostered the use of these parameters as surrogates for the labor intensive, laboratory-analyzed constituents. Although these correlative techniques have been successful in large rivers, it was unclear whether they could be applied successfully in tributaries of the Tualatin River, primarily because these streams tend to be small, have rapid hydrologic response to rainfall and high streamflow variability, and may contain unique sources of sediment, nutrients, and bacteria. This report evaluates the feasibility of developing correlative regression models for predicting dependent variables (concentrations of total suspended solids, total phosphorus, and Escherichia coli bacteria) in two Tualatin River basin streams: one draining highly urbanized land (Fanno Creek near Durham, Oregon) and one draining rural agricultural land (Dairy Creek at Highway 8 near Hillsboro, Oregon), during 2002-04. An important difference between these two streams is their response to storm runoff; Fanno Creek has a relatively rapid response due to extensive upstream impervious areas and Dairy Creek has a relatively slow response because of the large amount of undeveloped upstream land. Four other stream sites also were evaluated, but in less detail. Potential explanatory variables included continuously monitored streamflow (discharge), stream stage, specific conductance, turbidity, and time (to account for seasonal processes). Preliminary multiple-regression models were identified using stepwise regression and Mallow's Cp, which maximizes regression correlation coefficients and accounts for the loss of additional degrees of freedom when extra explanatory variables are used. Several data scenarios were created and evaluated for each site to assess the representativeness of existing monitoring data and autosampler-derived data, and to assess the utility of the available data to develop robust predictive models. The goodness-of-fit of candidate predictive models was assessed with diagnostic statistics from validation exercises that compared predictions against a subset of the available data. The regression modeling met with mixed success. Functional model forms that have a high likelihood of success were identified for most (but not all) dependent variables at each site, but there were limitations in the available datasets, notably the lack of samples from high-flows. These limitations increase the uncertainty in the predictions of the models and suggest that the models are not yet ready for use in assessing these streams, particularly under high-flow conditions, without additional data collection and recalibration of model coefficients. Nonetheless, the results reveal opportunities to use existing resources more efficiently. Baseline conditions are well represented in the available data, and, for the most part, the models reproduced these conditions well. 
Future sampling might therefore focus on high flow conditions, without much loss of ability to characterize the baseline. Seasonal cycles, as represented by trigonometric functions of time, were not significant in the evaluated models, perhaps because the baseline conditions are well characterized in the datasets or because the other explanatory variables indirectly incorporate seasonal aspects. Multicollinearity among independent variabl
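The subset-scoring step with Mallows' Cp that the report describes can be sketched generically (this is an illustration, not the report's code):

```python
import numpy as np

def mallows_cp(X, y, subsets):
    """Mallows' Cp for candidate subsets of the columns of X.

    Cp = SSE_p / s2_full - n + 2p, where s2_full is the residual variance
    of the model with all candidate explanatory variables; subsets with
    Cp close to p (parameters including the intercept) show little bias."""
    n = len(y)

    def sse(cols):
        A = np.column_stack([np.ones(n), X[:, list(cols)]])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return r @ r

    k = X.shape[1]
    s2_full = sse(range(k)) / (n - k - 1)
    return {tuple(c): sse(c) / s2_full - n + 2 * (len(c) + 1) for c in subsets}

# Example: score every nonempty subset of 4 candidate predictors with
# itertools.combinations(range(4), k) for k = 1..4.
```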
Sharma, Ashok K.; Srivastava, Gopal N.; Roy, Ankita; Sharma, Vineet K.
2017-01-01
The experimental methods for the prediction of molecular toxicity are tedious and time-consuming tasks. Thus, the computational approaches could be used to develop alternative methods for toxicity prediction. We have developed a tool for the prediction of molecular toxicity along with the aqueous solubility and permeability of any molecule/metabolite. Using a comprehensive and curated set of toxin molecules as a training set, the different chemical and structural based features such as descriptors and fingerprints were exploited for feature selection, optimization and development of machine learning based classification and regression models. The compositional differences in the distribution of atoms were apparent between toxins and non-toxins, and hence, the molecular features were used for the classification and regression. On 10-fold cross-validation, the descriptor-based, fingerprint-based and hybrid-based classification models showed similar accuracy (93%) and Matthews's correlation coefficient (0.84). The performances of all the three models were comparable (Matthews's correlation coefficient = 0.84–0.87) on the blind dataset. In addition, the regression-based models using descriptors as input features were also compared and evaluated on the blind dataset. Random forest based regression model for the prediction of solubility performed better (R2 = 0.84) than the multi-linear regression (MLR) and partial least square regression (PLSR) models, whereas, the partial least squares based regression model for the prediction of permeability (caco-2) performed better (R2 = 0.68) in comparison to the random forest and MLR based regression models. The performance of final classification and regression models was evaluated using the two validation datasets including the known toxins and commonly used constituents of health products, which attests to its accuracy. The ToxiM web server would be a highly useful and reliable tool for the prediction of toxicity, solubility, and permeability of small molecules. PMID:29249969
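A hedged sketch of the regression-model comparison described above, with a random stand-in for the descriptor matrix and endpoint:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical descriptor matrix and a continuous endpoint (e.g. solubility)
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 40))
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=300)

for name, model in [
    ("random forest", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("PLS (5 components)", PLSRegression(n_components=5)),
]:
    r2 = cross_val_score(model, X, y, cv=10, scoring="r2")
    print(f"{name}: mean 10-fold R^2 = {r2.mean():.2f}")
```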
Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.
2008-01-01
Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR), and partial least square coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0%-86.7%, while that of PLS-logistic regression models ranges from 73.3%-80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
Urban, H-Jörg; Tricker, Anthony R; Leyden, Donald E; Forte, Natasa; Zenzen, Volker; Feuersenger, Astrid; Assink, Mareike; Kallischnigg, Gerd; Schorp, Matthias K
2012-11-01
A modeling approach termed 'nicotine bridging' is presented to estimate exposure to mainstream smoke constituents. The method is based on: (1) determination of harmful and potentially harmful constituents (HPHC) and in vitro toxicity parameter-to-nicotine regressions obtained using multiple machine-smoking protocols, (2) nicotine uptake distributions determined from 24-h excretion of nicotine metabolites in a clinical study, and (3) modeled HPHC uptake distributions using steps 1 and 2. An example of 'nicotine bridging' is provided, using a subset of the data reported in Part 2 of this supplement (Zenzen et al., 2012) for two conventional lit-end cigarettes (CC) and the Electrically Heated Cigarette Smoking System (EHCSS) series-K6 cigarette. The bridging method provides justified extrapolations of HPHC exposure distributions that cannot be obtained for smoke constituents due to the lack of specific biomarkers of exposure to cigarette smoke constituents in clinical evaluations. Using this modeling approach, exposure reduction is evident when the HPHC exposure distribution curves between the MRTP and the CC users are substantially separated with little or no overlap between the distribution curves. Copyright © 2012 Elsevier Inc. All rights reserved.
Bootstrap investigation of the stability of a Cox regression model.
Altman, D G; Andersen, P K
1989-07-01
We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.
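A minimal sketch of the bootstrap stability check using the lifelines package (an assumed modern stand-in; the original analysis predates it), collecting coefficient estimates across resamples:

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

def bootstrap_cox(df, duration_col, event_col, n_boot=100, seed=0):
    """Refit a Cox model on bootstrap resamples of the patients and
    collect the coefficient estimates to judge model stability."""
    rng = np.random.default_rng(seed)
    coefs = []
    for _ in range(n_boot):
        sample = df.sample(n=len(df), replace=True,
                           random_state=rng.integers(1 << 31))
        cph = CoxPHFitter().fit(sample, duration_col=duration_col,
                                event_col=event_col)
        coefs.append(cph.params_)
    return pd.DataFrame(coefs)  # percentiles give bootstrap intervals
```

Percentiles of predicted two-year survival across resamples would give the per-patient confidence intervals the abstract describes.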
Validation of a heteroscedastic hazards regression model.
Wu, Hong-Dar Isaac; Hsieh, Fushing; Chen, Chen-Hsin
2002-03-01
A Cox-type regression model accommodating heteroscedasticity, with a power factor of the baseline cumulative hazard, is investigated for analyzing data with crossing-hazards behavior. Since the approach of partial likelihood cannot eliminate the baseline hazard, an overidentified estimating equation (OEE) approach is introduced in the estimation procedure. Its by-product, a model checking statistic, is presented to test for the overall adequacy of the heteroscedastic model. Further, under the heteroscedastic model setting, we propose two statistics to test the proportional hazards assumption. Implementation of this model is illustrated in a data analysis of a cancer clinical trial.
Quill mites in Brazilian psittacine birds (Aves: Psittaciformes).
Jardim, Cassius Catão Gomes; Cunha, Lucas Maciel; Rezende, Leandro do Carmo; Teixeira, Cristina Mara; Martins, Nelson Rodrigo da Silva; de Oliveira, Paulo Roberto; Leite, Romário Cerqueira; Faccini, João Luiz Horácio; Leite, Rômulo Cerqueira
2012-09-01
The primary and secondary feathers of 170 Brazilian psittacine birds (Aves: Psittaciformes) were examined in order to identify feather quill mite fauna. Birds were held captive in two locations in the state of Minas Gerais (MG), and two in the state of Espirito Santo (ES). The quills were cut longitudinally and were examined under optical microscopy. The genus of quill mites most frequently found was Paralgopsis (Astigmata: Pyrogliphidae), followed by Cystoidosoma (Astigmata: Syringobiidae). Astigmata: Syringophilidae mites were sporadically observed. After analyzing the data using logistic regression models, it was determined that there was higher infestation risk for psittacines in ES state, as compared with those in MG, and a significant increase in risk depending on the psittacine host species. However, the location of captivity did not have a significant effect. Lesions were observed in infested feathers. Cystoidosoma sp. and Paralgopsis sp. were always observed together, with parts of Paralgopsis found inside Cystoidosoma sp., suggesting thanatochresis or predation.
Lui, Kung-Jong; Chang, Kuang-Chao
2015-01-01
When comparing two doses of a new drug with a placebo, we may consider using a crossover design subject to the condition that the high dose cannot be administered before the low dose. Under a random-effects logistic regression model, we focus our attention on dichotomous responses when the high dose cannot be used first in a three-period crossover trial. We derive asymptotic test procedures for testing equality between treatments. We further derive interval estimators to assess the magnitude of the relative treatment effects. We employ Monte Carlo simulation to evaluate the performance of these test procedures and interval estimators in a variety of situations. We use data taken as part of a trial comparing two different doses of an analgesic with a placebo for the relief of primary dysmenorrhea to illustrate the use of the proposed test procedures and estimators.
Cognitive and Social Functioning Correlates of Employment Among People with Severe Mental Illness.
Saavedra, Javier; López, Marcelino; González, Sergio; Arias, Samuel; Crawford, Paul
2016-10-01
We assess how social and cognitive functioning is associated with gaining employment for 213 people diagnosed with severe mental illness taking part in employment programs in Andalusia (Spain). We used the Repeatable Battery for the Assessment of Neuropsychological Status and the Social Functioning Scale and conducted two binary logistic regression analyses. Response variables were: having a job or not, in ordinary companies (OCs) and social enterprises, and working in an OC or not. There were two variables with significant adjusted odds ratios for having a job: "Attention" and "Educational level". There were five variables with significant odds ratios for having a job in an OC: "Sex", "Educational level", "Attention", "Communication", and "Independence-competence". The study looks at the possible benefits of combining employment with support and social enterprises in employment programs for these people and underlines how both social and cognitive functioning are central to developing employment models.
A multimodel approach to interannual and seasonal prediction of Danube discharge anomalies
NASA Astrophysics Data System (ADS)
Rimbu, Norel; Ionita, Monica; Patrut, Simona; Dima, Mihai
2010-05-01
Interannual and seasonal predictability of Danube river discharge is investigated using three model types: 1) time series models, 2) linear regression models of discharge with large-scale climate mode indices, and 3) models based on stable teleconnections. All models are calibrated using discharge and climatic data for the period 1901-1977 and validated for the period 1978-2008. Various time series models, like autoregressive (AR), moving average (MA), autoregressive moving average (ARMA) or singular spectrum analysis combined with autoregressive moving average (SSA+ARMA) models, have been calibrated and their skills evaluated. The best results were obtained using SSA+ARMA models. SSA+ARMA models proved to have the highest forecast skill also for other European rivers (Gamiz-Fortis et al. 2008). Multiple linear regression models using large-scale climatic mode indices as predictors have a higher forecast skill than the time series models. The best predictors for Danube discharge are the North Atlantic Oscillation (NAO) and the East Atlantic/Western Russia patterns during winter and spring. Other patterns, like Polar/Eurasian or Tropical Northern Hemisphere (TNH), are good predictors for summer and autumn discharge. Based on the stable teleconnection approach (Ionita et al. 2008), we construct prediction models through a combination of sea surface temperature (SST), temperature (T) and precipitation (PP) from the regions where discharge and SST, T and PP variations are stably correlated. Forecast skills of these models are higher than the forecast skills of the time series and multiple regression models. The models calibrated and validated in our study can be used for operational prediction of interannual and seasonal Danube discharge anomalies. References: Gamiz-Fortis, S., D. Pozo-Vazquez, R.M. Trigo, and Y. Castro-Diez, Quantifying the predictability of winter river flow in Iberia. Part I: interannual predictability. J. Climate, 2484-2501, 2008. Gamiz-Fortis, S., D. Pozo-Vazquez, R.M. Trigo, and Y. Castro-Diez, Quantifying the predictability of winter river flow in Iberia. Part II: seasonal predictability. J. Climate, 2503-2518, 2008. Ionita, M., G. Lohmann, and N. Rimbu, Prediction of spring Elbe river discharge based on stable teleconnections with global temperature and precipitation. J. Climate, 6215-6226, 2008.
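A minimal sketch of the time-series component with statsmodels, on simulated anomalies and with the SSA pre-filtering step omitted; split years and ARMA order are illustrative:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical annual discharge anomalies; calibration/validation split
rng = np.random.default_rng(5)
y = rng.normal(size=108)          # e.g. anomalies for 1901-2008
calib, valid = y[:77], y[77:]     # 1901-1977 vs 1978-2008

# ARMA(2, 1) fitted on the calibration period (the order would normally
# be chosen by AIC; an SSA step could pre-filter the series first)
fit = ARIMA(calib, order=(2, 0, 1)).fit()
forecast = fit.forecast(steps=len(valid))
skill = np.corrcoef(forecast, valid)[0, 1]  # simple correlation skill score
```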
A primer for biomedical scientists on how to execute model II linear regression analysis.
Ludbrook, John
2012-04-01
1. There are two very different ways of executing linear regression analysis. One is Model I, when the x-values are fixed by the experimenter. The other is Model II, in which the x-values are free to vary and are subject to error. 2. I have received numerous complaints from biomedical scientists that they have great difficulty in executing Model II linear regression analysis. This may explain the results of a Google Scholar search, which showed that the authors of articles in journals of physiology, pharmacology and biochemistry rarely use Model II regression analysis. 3. I repeat my previous arguments in favour of using least products linear regression analysis for Model II regressions. I review three methods for executing ordinary least products (OLP) and weighted least products (WLP) regression analysis: (i) scientific calculator and/or computer spreadsheet; (ii) specific purpose computer programs; and (iii) general purpose computer programs. 4. Using a scientific calculator and/or computer spreadsheet, it is easy to obtain correct values for OLP slope and intercept, but the corresponding 95% confidence intervals (CI) are inaccurate. 5. Using specific purpose computer programs, the freeware computer program smatr gives the correct OLP regression coefficients and obtains 95% CI by bootstrapping. In addition, smatr can be used to compare the slopes of OLP lines. 6. When using general purpose computer programs, I recommend the commercial programs systat and Statistica for those who regularly undertake linear regression analysis and I give step-by-step instructions in the Supplementary Information as to how to use loss functions. © 2011 The Author. Clinical and Experimental Pharmacology and Physiology. © 2011 Blackwell Publishing Asia Pty Ltd.
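The OLP (geometric-mean) slope and intercept can be computed directly from summary statistics; a minimal sketch follows (bootstrapping, as smatr does, would supply the confidence intervals):

```python
import numpy as np

def least_products_line(x, y):
    """Ordinary least products (geometric mean / reduced major axis)
    regression for Model II problems where x is also subject to error."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    r = np.corrcoef(x, y)[0, 1]
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Bootstrap CIs would resample (x, y) pairs with replacement and take
# percentiles of the recomputed slopes and intercepts.
```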
Deep ensemble learning of sparse regression models for brain disease diagnosis.
Suk, Heung-Il; Lee, Seong-Whan; Shen, Dinggang
2017-04-01
Recent studies on brain imaging analysis witnessed the core roles of machine learning techniques in computer-assisted intervention for brain disease diagnosis. Of various machine-learning techniques, sparse regression models have proved their effectiveness in handling high-dimensional data but with a small number of training samples, especially in medical problems. In the meantime, deep learning methods have been making great successes by outperforming the state-of-the-art performances in various applications. In this paper, we propose a novel framework that combines the two conceptually different methods of sparse regression and deep learning for Alzheimer's disease/mild cognitive impairment diagnosis and prognosis. Specifically, we first train multiple sparse regression models, each of which is trained with different values of a regularization control parameter. Thus, our multiple sparse regression models potentially select different feature subsets from the original feature set; thereby they have different powers to predict the response values, i.e., clinical label and clinical scores in our work. By regarding the response values from our sparse regression models as target-level representations, we then build a deep convolutional neural network for clinical decision making, which thus we call 'Deep Ensemble Sparse Regression Network.' To our best knowledge, this is the first work that combines sparse regression models with deep neural network. In our experiments with the ADNI cohort, we validated the effectiveness of the proposed method by achieving the highest diagnostic accuracies in three classification tasks. We also rigorously analyzed our results and compared with the previous studies on the ADNI cohort in the literature. Copyright © 2017 Elsevier B.V. All rights reserved.
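A heavily simplified sketch of the pipeline: several LASSO models over a grid of penalties produce target-level representations, which then feed a neural network. An MLP stands in for the paper's convolutional network, and train/test separation is omitted for brevity:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(400, 200))                      # high-dimensional features
y = (X[:, :3].sum(axis=1) > 0).astype(int)           # hypothetical labels
scores = X[:, 0] + rng.normal(scale=0.3, size=400)   # clinical-score target

# Step 1: sparse regression models across a range of penalties, each
# potentially selecting a different feature subset
alphas = [0.01, 0.05, 0.1, 0.5]
ensemble_outputs = np.column_stack(
    [Lasso(alpha=a).fit(X, scores).predict(X) for a in alphas])

# Step 2: the stacked target-level representations feed a neural network
clf = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000,
                    random_state=0).fit(ensemble_outputs, y)
```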
Takaki, Koki; Wade, Andrew J; Collins, Chris D
2015-11-01
The aim of this study was to assess and improve the accuracy of biotransfer models for organic pollutants (PCBs, PCDD/Fs, PBDEs, PFCAs, and pesticides) into cow's milk and beef used in human exposure assessment. Metabolic rate in cattle is known to be a key parameter for this biotransfer; however, few experimental data and no simulation methods are currently available. In this research, metabolic rate was estimated using existing QSAR biodegradation models of microorganisms (BioWIN) and fish (EPI-HL and IFS-HL). This simulated metabolic rate was then incorporated into the mechanistic cattle biotransfer models (RAIDAR, ACC-HUMAN, OMEGA, and CKow). The goodness-of-fit tests showed that the RAIDAR, ACC-HUMAN, and OMEGA model performances were significantly improved using either of the QSARs when comparing the new model outputs to observed data. The CKow model is the only one that separates the processes in the gut and liver. This model showed the lowest residual error of all the models tested when the BioWIN model was used to represent the ruminant metabolic process in the gut and the two fish QSARs were used to represent the metabolic process in the liver. Our testing included EUSES and CalTOX, which are KOW-regression models that are widely used in regulatory assessment. New regressions based on the simulated rate of the two metabolic processes are also proposed as an alternative to KOW-regression models for a screening risk assessment. The modified CKow model is more physiologically realistic, but has equivalent usability to existing KOW-regression models for estimating cattle biotransfer of organic pollutants. Copyright © 2015. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Xu, Jianxin; Liang, Hong
2013-07-01
Terrestrial laser scanning creates a point cloud composed of thousands or millions of 3D points. Through pre-processing, generating TINs, and mapping texture, a 3D model of a real object is obtained. When the object is too large, it is separated into several parts. This paper focuses on the problem of uneven gray levels at the intersection of two adjacent textures. A new algorithm, per-pixel linear interpolation along a loop-line buffer, is presented. The experimental data derive from a point cloud of the stone lion situated in front of the west gate of Henan Polytechnic University. The modeling flow consists of three parts: first, the large object is separated into two parts; then each part is modeled; finally, the whole 3D model of the stone lion is composed from the two part models. When the two part models are combined, there is an obvious fissure line in the overlapping section of the two adjacent textures. Some researchers decrease the brightness values of all pixels in the two adjacent textures, but with such algorithms the fissure line can still remain. The algorithm presented here corrects the gray unevenness of the two adjacent textures, eliminating the fissure line in the overlapping-section textures and making the gray transition in the overlapping section smoother.
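A minimal sketch of per-pixel linear interpolation across the overlap strip of two textures (an illustration of the blending idea, not the paper's exact loop-line-buffer algorithm):

```python
import numpy as np

def blend_overlap(tex_a, tex_b):
    """Per-pixel linear interpolation across the overlapping strip of two
    adjacent textures: the weight ramps from all-A at one edge to all-B
    at the other, removing the visible fissure line."""
    h, w = tex_a.shape[:2]
    # Column-wise weights, broadcast over rows (and channels, if present)
    t = np.linspace(0.0, 1.0, w).reshape(1, w, *([1] * (tex_a.ndim - 2)))
    return ((1.0 - t) * tex_a.astype(float)
            + t * tex_b.astype(float)).astype(tex_a.dtype)
```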
Su, Liyun; Zhao, Yanyong; Yan, Tianshun; Li, Fenglan
2012-01-01
Multivariate local polynomial fitting is applied to the multivariate linear heteroscedastic regression model. Firstly, local polynomial fitting is applied to estimate the heteroscedastic function, and then the coefficients of the regression model are obtained using the generalized least squares method. One noteworthy feature of our approach is that we avoid testing for heteroscedasticity by improving the traditional two-stage method. Owing to the non-parametric technique of local polynomial estimation, it is unnecessary to know the form of the heteroscedastic function. Therefore, we can improve the estimation precision when the heteroscedastic function is unknown. Furthermore, we verify that the regression coefficients are asymptotically normal based on numerical simulations and normal Q-Q plots of residuals. Finally, the simulation results and the local polynomial estimation of real data indicate that our approach is effective in finite-sample situations.
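A hedged sketch of the two-stage idea, with a simple kernel smoother standing in for the multivariate local polynomial fit of the variance function:

```python
import numpy as np

def two_stage_wls(X, y, bandwidth=1.0):
    """Two-stage estimation for heteroscedastic linear regression:
    (1) estimate the variance function nonparametrically from squared
    OLS residuals with a kernel-weighted local fit, then (2) refit the
    coefficients by weighted least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta_ols
    r2 = (y - fitted) ** 2

    # Local constant (Nadaraya-Watson) estimate of Var(y|x), standing in
    # for the paper's local polynomial fit
    d = fitted[:, None] - fitted[None, :]
    K = np.exp(-0.5 * (d / bandwidth) ** 2)
    var_hat = (K @ r2) / K.sum(axis=1)

    w = 1.0 / np.clip(var_hat, 1e-8, None)
    Aw = A * np.sqrt(w)[:, None]
    beta_wls, *_ = np.linalg.lstsq(Aw, y * np.sqrt(w), rcond=None)
    return beta_wls
```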
Discrete post-processing of total cloud cover ensemble forecasts
NASA Astrophysics Data System (ADS)
Hemri, Stephan; Haiden, Thomas; Pappenberger, Florian
2017-04-01
This contribution presents an approach to post-process ensemble forecasts for the discrete and bounded weather variable of total cloud cover. Two methods for discrete statistical post-processing of ensemble predictions are tested: the first is based on multinomial logistic regression, the second involves a proportional odds logistic regression model. Applying them to total cloud cover raw ensemble forecasts from the European Centre for Medium-Range Weather Forecasts improves forecast skill significantly. Based on station-wise post-processing of raw ensemble total cloud cover forecasts for a global set of 3330 stations over the period from 2007 to early 2014, the more parsimonious proportional odds logistic regression model proved to slightly outperform the multinomial logistic regression model. Reference: Hemri, S., Haiden, T., & Pappenberger, F. (2016). Discrete post-processing of total cloud cover ensemble forecasts. Monthly Weather Review 144, 2565-2577.
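A minimal proportional odds fit with statsmodels' OrderedModel on simulated okta data (illustrative predictors; not the operational post-processing code):

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Hypothetical data: observed cloud cover in oktas (0-8, ordered) and
# simple summaries of the raw ensemble as predictors
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "ens_mean": rng.uniform(0, 8, 1000),
    "ens_std": rng.uniform(0, 2, 1000),
})
okta = np.clip(np.round(df["ens_mean"] + rng.normal(0, 1, 1000)), 0, 8)
y = pd.Series(pd.Categorical(okta.astype(int), categories=range(9),
                             ordered=True))

# Proportional odds model: one set of slopes plus 8 ordered cutpoints --
# far fewer parameters than a full multinomial logit over 9 categories
fit = OrderedModel(y, df[["ens_mean", "ens_std"]], distr="logit").fit(
    method="bfgs", disp=0)
print(fit.summary())
```

The parameter economy visible here is what makes the proportional odds model the more parsimonious of the two approaches.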
Barton, Alan J; Valdés, Julio J; Orchard, Robert
2009-01-01
Classical neural networks are composed of neurons whose nature is determined by a certain function (the neuron model), usually pre-specified. In this paper, a type of neural network (NN-GP) is presented in which: (i) each neuron may have its own neuron model in the form of a general function, (ii) any layout (i.e., network interconnection) is possible, and (iii) no bias nodes or weights are associated with the connections, neurons or layers. The general functions associated with a neuron are learned by searching a function space. They are not provided a priori, but are rather built as part of an Evolutionary Computation process based on Genetic Programming. The resulting network solutions are evaluated based on a fitness measure, which may, for example, be based on classification or regression errors. Two real-world examples are presented to illustrate the promising behaviour on classification problems via construction of a low-dimensional representation of a high-dimensional parameter space associated with the set of all network solutions.
Sensitivity Analysis of the Integrated Medical Model for ISS Programs
NASA Technical Reports Server (NTRS)
Goodenow, D. A.; Myers, J. G.; Arellano, J.; Boley, L.; Garcia, Y.; Saile, L.; Walton, M.; Kerstman, E.; Reyes, D.; Young, M.
2016-01-01
Sensitivity analysis estimates the relative contribution of the uncertainty in input values to the uncertainty of model outputs. Partial Rank Correlation Coefficient (PRCC) and Standardized Rank Regression Coefficient (SRRC) are methods of conducting sensitivity analysis on nonlinear simulation models like the Integrated Medical Model (IMM). The PRCC method estimates the sensitivity using partial correlation of the ranks of the generated input values to each generated output value. The partial part is so named because adjustments are made for the linear effects of all the other input values in the calculation of correlation between a particular input and each output. In SRRC, standardized regression-based coefficients measure the sensitivity of each input, adjusted for all the other inputs, on each output. Because the relative ranking of each of the inputs and outputs is used, as opposed to the values themselves, both methods accommodate the nonlinear relationship of the underlying model. As part of the IMM v4.0 validation study, simulations are available that predict 33 person-missions on ISS and 111 person-missions on STS. These simulated data predictions feed the sensitivity analysis procedures. The inputs to the sensitivity procedures include the number of occurrences of each of the one hundred IMM medical conditions generated over the simulations and the associated IMM outputs: total quality time lost (QTL), number of evacuations (EVAC), and number of loss of crew lives (LOCL). The IMM team will report the results of using PRCC and SRRC on IMM v4.0 predictions of the ISS and STS missions created as part of the external validation study. Tornado plots will assist in the visualization of the condition-related input sensitivities to each of the main outcomes. The outcomes of this sensitivity analysis will drive review focus by identifying conditions where changes in uncertainty could drive changes in overall model output uncertainty. These efforts are an integral part of the overall verification, validation, and credibility review of IMM v4.0.
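A generic PRCC computation can be sketched as follows (rank-transform, partial out the other inputs by linear regression, then correlate the residuals); this is an illustration, not the IMM team's code:

```python
import numpy as np
from scipy import stats

def prcc(inputs, output):
    """Partial rank correlation coefficient of each input with the output.

    Rank-transform everything, then correlate the residuals that remain
    after regressing the target input and the output on all other inputs."""
    R = np.apply_along_axis(stats.rankdata, 0, inputs)
    ry = stats.rankdata(output)
    k = R.shape[1]
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(len(ry)), np.delete(R, j, axis=1)])
        bx, *_ = np.linalg.lstsq(others, R[:, j], rcond=None)
        by, *_ = np.linalg.lstsq(others, ry, rcond=None)
        out[j] = stats.pearsonr(R[:, j] - others @ bx, ry - others @ by)[0]
    return out
```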
Individual Differences in Holistic Processing Predict the Own-Race Advantage in Recognition Memory
DeGutis, Joseph; Mercado, Rogelio J.; Wilmer, Jeremy; Rosenblatt, Andrew
2013-01-01
Individuals are consistently better at recognizing own-race faces compared to other-race faces (other-race effect, ORE). One popular hypothesis is that this recognition memory ORE is caused by differential own- and other-race holistic processing, the simultaneous integration of part and configural face information into a coherent whole. Holistic processing may create a more rich, detailed memory representation of own-race faces compared to other-race faces. Despite several studies showing that own-race faces are processed more holistically than other-race faces, studies have yet to link the holistic processing ORE and the recognition memory ORE. In the current study, we sought to use a more valid method of analyzing individual differences in holistic processing by using regression to statistically remove the influence of the control condition (part trials in the part-whole task) from the condition of interest (whole trials in the part-whole task). We also employed regression to separately examine the two components of the ORE: own-race advantage (regressing other-race from own-race performance) and other-race decrement (regressing own-race from other-race performance). First, we demonstrated that own-race faces were processed more holistically than other-race faces, particularly the eye region. Notably, using regression, we showed a significant association between the own-race advantage in recognition memory and the own-race advantage in holistic processing and that these associations were weaker when examining the other-race decrement. We also demonstrated that performance on own- and other-race faces across all of our tasks was highly correlated, suggesting that the differences we found between own- and other-race faces are quantitative rather than qualitative. Together, this suggests that own- and other-race faces recruit largely similar mechanisms, that own-race faces more thoroughly engage holistic processing, and that this greater engagement of holistic processing is significantly associated with the own-race advantage in recognition memory. PMID:23593119
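The residualizing approach described above amounts to a simple helper (a sketch with hypothetical arrays):

```python
import numpy as np

def residualize(control, interest):
    """Regression-based individual-difference score: the part of the
    condition of interest not predicted by the control condition
    (e.g. whole-trial accuracy residualized on part-trial accuracy)."""
    A = np.column_stack([np.ones(len(control)), control])
    beta, *_ = np.linalg.lstsq(A, interest, rcond=None)
    return interest - A @ beta

# Own-race advantage: residualize own-race whole-trial performance on part
# trials, then on other-race performance, and correlate the result with
# the similarly residualized recognition-memory advantage.
```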
Ettner, Susan L; M Harwood, Jessica; Thalmayer, Amber; Ong, Michael K; Xu, Haiyong; Bresolin, Michael J; Wells, Kenneth B; Tseng, Chi-Hong; Azocar, Francisca
2016-12-01
Interrupted time series with and without controls was used to evaluate whether the federal Mental Health Parity and Addiction Equity Act (MHPAEA) and its Interim Final Rule increased the probability of specialty behavioral health treatment and levels of utilization and expenditures among patients receiving treatment. Linked insurance claims, eligibility, plan and employer data from 2008 to 2013 were used to estimate segmented regression analyses, allowing for level and slope changes during the transition (2010) and post-MHPAEA (2011-2013) periods. The sample included 1,812,541 individuals ages 27-64 (49,968,367 person-months) in 10,010 Optum "carve-out" plans. Two-part regression models with Generalized Estimating Equations were used to estimate expenditures by payer and outpatient, intermediate and inpatient service use. We found little evidence that MHPAEA increased utilization significantly, but somewhat more robust evidence that costs shifted from patients to plans. Thus the primary impact of MHPAEA among carve-out enrollees may have been a reduction in patient financial burden. Copyright © 2016 Elsevier B.V. All rights reserved.
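A stripped-down two-part model in the same spirit, on simulated data (the published models additionally used GEE to handle clustering within plans, which is omitted here; older statsmodels spells the link class log()):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical person-month data: post-parity indicator and spending
rng = np.random.default_rng(8)
df = pd.DataFrame({"post": rng.integers(0, 2, 5000)})
any_use = rng.random(5000) < 0.1 + 0.02 * df["post"]
df["spend"] = np.where(any_use, rng.gamma(2.0, 100.0, 5000), 0.0)
df["any_use"] = (df["spend"] > 0).astype(int)

# Part 1: probability of any specialty behavioral health use
part1 = smf.logit("any_use ~ post", data=df).fit(disp=0)

# Part 2: expenditure level among users (gamma GLM with log link)
users = df[df["spend"] > 0]
part2 = smf.glm("spend ~ post", data=users,
                family=sm.families.Gamma(link=sm.families.links.Log())).fit()

# Overall expected spend = Pr(use) * E[spend | use]
expected = part1.predict(df) * part2.predict(df)
```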
Comparison of Mental Health Treatment Adequacy and Costs in Public Hospitals in Boston and Madrid.
Carmona, Rodrigo; Cook, Benjamin Lê; Baca-García, Enrique; Chavez, Ligia; Alvarez, Kiara; Iza, Miren; Alegría, Margarita
2018-03-07
Analyses of healthcare expenditures and adequacy are needed to identify cost-effective policies and practices that improve mental healthcare quality. Data are from 2010 to 2012 electronic health records from three hospital psychiatry departments in Madrid (n = 29,944 person-years) and three in Boston (n = 14,109 person-years). Two-part multivariate generalized linear regression and logistic regression models were estimated to identify site differences in mental healthcare expenditures and quality of care. Annual total average treatment expenditures were $4442.14 in Boston and $2277.48 in Madrid. Boston patients used inpatient services more frequently and had higher 30-day re-admission rates (23.7 vs. 8.7%) despite higher rates of minimally adequate care (49.5 vs. 34.8%). Patients in Madrid were more likely to receive psychotropic medication, had fewer inpatient stays and readmissions, and had lower expenditures, but had lower rates of minimally adequate care. Differences in insurance and healthcare system policies and mental health professional roles may explain these dissimilarities.
Abásolo, Ignacio; Saez, Marc; López-Casasnovas, Guillem
2017-07-24
The objective of this paper is to analyse whether the recent recession has altered the health care utilisation patterns of different income groups in Spain. Based on information concerning individuals' income and health care use, along with health need indicators and demographic characteristics (provided by the Spanish National Health Surveys from 2006/07 and 2011/12), econometric models are estimated in two parts (mixed logistic regressions and truncated negative binomial regressions) for each of the public health services studied (family doctor appointments, appointments with specialists, hospitalisations, emergencies and prescription drug use). The results show that the principle of universal access to public health provision does not in fact prevent a financial crisis from affecting certain income groups more than others in their utilisation of public health services. Specifically, in relative terms the recession has been more detrimental to low-income groups in the cases of specialist appointments and hospitalisations, whereas it has worked to their advantage in the cases of emergency services and family doctor appointments.
Analyzing hospitalization data: potential limitations of Poisson regression.
Weaver, Colin G; Ravani, Pietro; Oliver, Matthew J; Austin, Peter C; Quinn, Robert R
2015-08-01
Poisson regression is commonly used to analyze hospitalization data when outcomes are expressed as counts (e.g. number of days in hospital). However, data often violate the assumptions on which Poisson regression is based. More appropriate extensions of this model, while available, are rarely used. We compared hospitalization data between 206 patients treated with hemodialysis (HD) and 107 treated with peritoneal dialysis (PD) using Poisson regression and compared results from standard Poisson regression with those obtained using three other approaches for modeling count data: negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression. We examined the appropriateness of each model and compared the results obtained with each approach. During a mean 1.9 years of follow-up, 183 of 313 patients (58%) were never hospitalized (indicating an excess of 'zeros'). The data also displayed overdispersion (variance greater than mean), violating another assumption of the Poisson model. Using four criteria, we determined that the NB and ZINB models performed best. According to these two models, patients treated with HD experienced similar hospitalization rates as those receiving PD {NB rate ratio (RR): 1.04 [bootstrapped 95% confidence interval (CI): 0.49-2.20]; ZINB summary RR: 1.21 (bootstrapped 95% CI 0.60-2.46)}. Poisson and ZIP models fit the data poorly and had much larger point estimates than the NB and ZINB models [Poisson RR: 1.93 (bootstrapped 95% CI 0.88-4.23); ZIP summary RR: 1.84 (bootstrapped 95% CI 0.88-3.84)]. We found substantially different results when modeling hospitalization data, depending on the approach used. Our results argue strongly for a sound model selection process and improved reporting around statistical methods used for modeling count data. © The Author 2015. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
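A hedged sketch of the model comparison described above, fitting Poisson, NB, ZIP and ZINB models to simulated zero-heavy, overdispersed counts with statsmodels and comparing AIC; the data and the single treatment covariate are illustrative, not the study's:

```python
# Fit the four count models compared above (Poisson, NB, ZIP, ZINB)
# to a hospitalization-days outcome and compare AIC.
# Simulated overdispersed, zero-heavy data; names are illustrative.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import Poisson, NegativeBinomial
from statsmodels.discrete.count_model import (ZeroInflatedPoisson,
                                              ZeroInflatedNegativeBinomialP)

rng = np.random.default_rng(2)
n = 313
hd = rng.integers(0, 2, n)                            # 1 = HD, 0 = PD
mu = np.exp(0.8 + 0.04 * hd)
days = rng.negative_binomial(0.5, 0.5 / (0.5 + mu))   # overdispersed counts
days[rng.random(n) < 0.35] = 0                        # excess zeros

X = sm.add_constant(hd.astype(float))
for name, model in [("Poisson", Poisson(days, X)),
                    ("NB", NegativeBinomial(days, X)),
                    ("ZIP", ZeroInflatedPoisson(days, X, exog_infl=X)),
                    ("ZINB", ZeroInflatedNegativeBinomialP(days, X, exog_infl=X))]:
    res = model.fit(disp=False, maxiter=200)
    print(f"{name:7s} AIC = {res.aic:.1f}")
```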
NASA Astrophysics Data System (ADS)
Nooruddin, Hasan A.; Anifowose, Fatai; Abdulraheem, Abdulazeez
2014-03-01
Soft computing techniques have recently become very popular in the oil industry. A number of computational intelligence-based predictive methods have been widely applied in the industry with high prediction capabilities. Some of the popular methods include feed-forward neural networks, radial basis function networks, generalized regression neural networks, functional networks, support vector regression and adaptive network fuzzy inference systems. A comparative study among the most popular soft computing techniques is presented using a large dataset published in the literature describing multimodal pore systems in the Arab D formation. The inputs to the models are air porosity, grain density, and Thomeer parameters obtained using mercury injection capillary pressure profiles. Corrected air permeability is the target variable. Applying the developed permeability models in a recent reservoir characterization workflow ensures consistency between micro- and macro-scale information, represented mainly by Thomeer parameters and absolute permeability. The dataset was divided into two parts, with 80% of the data used for training and 20% for testing. The target permeability variable was transformed to the logarithmic scale as a pre-processing step to show better correlations with the input variables. Statistical and graphical analyses of the results, including permeability cross-plots and detailed error measures, were produced. In general, the comparative study showed very close results among the developed models. The feed-forward neural network permeability model showed the lowest average relative error, average absolute relative error, standard deviation of error and root mean square error, making it the best model for such problems. The adaptive network fuzzy inference system also showed very good results.
Simple linear and multivariate regression models.
Rodríguez del Águila, M M; Benítez-Parejo, N
2011-01-01
In biomedical research it is common to find problems in which we wish to relate a response variable to one or more variables capable of describing the behaviour of the former variable by means of mathematical models. Regression techniques are used to this effect, in which an equation is determined relating the two variables. While such equations can have different forms, linear equations are the most widely used form and are easy to interpret. The present article describes simple and multiple linear regression models, how they are calculated, and how their applicability assumptions are checked. Illustrative examples are provided, based on the use of the freely accessible R program. Copyright © 2011 SEICAP. Published by Elsevier Espana. All rights reserved.
Karabatsos, George
2017-02-01
Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
Hoos, A.B.; McMahon, G.
2009-01-01
Understanding how nitrogen transport across the landscape varies with landscape characteristics is important for developing sound nitrogen management policies. We used a spatially referenced regression analysis (SPARROW) to examine landscape characteristics influencing delivery of nitrogen from sources in a watershed to stream channels. Modelled landscape delivery ratio varies widely (by a factor of 4) among watersheds in the southeastern United States - higher in the western part (Tennessee, Alabama, and Mississippi) than in the eastern part, and the average value for the region is lower compared to other parts of the nation. When we model landscape delivery ratio as a continuous function of local-scale landscape characteristics, we estimate a spatial pattern that varies as a function of soil and climate characteristics but exhibits spatial structure in residuals (observed load minus predicted load). The spatial pattern of modelled landscape delivery ratio and the spatial pattern of residuals coincide spatially with Level III ecoregions and also with hydrologic landscape regions. Subsequent incorporation into the model of these frameworks as regional scale variables improves estimation of landscape delivery ratio, evidenced by reduced spatial bias in residuals, and suggests that cross-scale processes affect nitrogen attenuation on the landscape. The model-fitted coefficient values are logically consistent with the hypothesis that broad-scale classifications of hydrologic response help to explain differential rates of nitrogen attenuation, controlling for local-scale landscape characteristics. Negative model coefficients for hydrologic landscape regions where the primary flow path is shallow ground water suggest that a lower fraction of nitrogen mass will be delivered to streams; this relation is reversed for regions where the primary flow path is overland flow.
Regression Models for the Analysis of Longitudinal Gaussian Data from Multiple Sources
O’Brien, Liam M.; Fitzmaurice, Garrett M.
2006-01-01
We present a regression model for the joint analysis of longitudinal multiple source Gaussian data. Longitudinal multiple source data arise when repeated measurements are taken from two or more sources, and each source provides a measure of the same underlying variable and on the same scale. This type of data generally produces a relatively large number of observations per subject; thus estimation of an unstructured covariance matrix often may not be possible. We consider two methods by which parsimonious models for the covariance can be obtained for longitudinal multiple source data. The methods are illustrated with an example of multiple informant data arising from a longitudinal interventional trial in psychiatry. PMID:15726666
European Wintertime Windstorms and Their Links to Large-Scale Variability Modes
NASA Astrophysics Data System (ADS)
Befort, D. J.; Wild, S.; Walz, M. A.; Knight, J. R.; Lockwood, J. F.; Thornton, H. E.; Hermanson, L.; Bett, P.; Weisheimer, A.; Leckebusch, G. C.
2017-12-01
Winter storms associated with extreme wind speeds and heavy precipitation are the most costly natural hazard in several European countries. Improved understanding and seasonal forecast skill of winter storms will thus help society, policy-makers and the (re)insurance industry to be better prepared for such events. We first assess the ability of three seasonal forecast ensemble suites (ECMWF System3, ECMWF System4 and GloSea5) to represent extra-tropical windstorms over the Northern Hemisphere. Our results show significant skill for inter-annual variability of windstorm frequency over parts of Europe in two of these forecast suites (ECMWF-S4 and GloSea5), indicating the potential use of current seasonal forecast systems. In a regression model we further derive windstorm variability using the forecasted NAO from the seasonal model suites, thus assessing the suitability of the NAO as the only predictor. We find that the NAO, as the main large-scale mode over Europe, can explain some of the achieved skill and is therefore an important source of variability in the seasonal models. However, our results show that the regression model fails to reproduce the skill level of the directly forecast windstorm frequency over large areas of central Europe. This suggests that the seasonal models also capture sources of windstorm variability/predictability other than the NAO. In order to investigate which other large-scale variability modes steer the interannual variability of windstorms, we develop a statistical model using a Poisson GLM. We find that the Scandinavian Pattern (SCA) in fact explains a larger amount of variability for Central Europe during the 20th century than the NAO. This statistical model is able to skilfully reproduce the interannual variability of windstorm frequency, especially for the British Isles and Central Europe, with correlations up to 0.8.
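A minimal sketch of a Poisson GLM relating seasonal windstorm counts to large-scale mode indices, analogous to the statistical model described above; the indices, counts and coefficients are simulated assumptions:

```python
# Poisson GLM: seasonal windstorm counts regressed on NAO and SCA indices.
# Simulated indices and counts; coefficients are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
years = 100
nao = rng.normal(size=years)
sca = rng.normal(size=years)
storms = rng.poisson(np.exp(1.2 + 0.15 * nao + 0.35 * sca))

X = sm.add_constant(np.column_stack([nao, sca]))
glm = sm.GLM(storms, X, family=sm.families.Poisson()).fit()
print(glm.params.round(3))                       # const, NAO, SCA effects

# Interannual skill: correlation of fitted with observed counts
print(np.corrcoef(glm.fittedvalues, storms)[0, 1].round(3))
```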
Comparison of Sub-Pixel Classification Approaches for Crop-Specific Mapping
This paper examined two non-linear models, Multilayer Perceptron (MLP) regression and Regression Tree (RT), for estimating sub-pixel crop proportions using time-series MODIS-NDVI data. The sub-pixel proportions were estimated for three major crop types including corn, soybean, a...
González-Madroño, A; Mancha, A; Rodríguez, F J; Culebras, J; de Ulibarri, J I
2012-01-01
To ratify previous validations of the CONUT nutritional screening tool by developing two probabilistic models using the parameters included in the CONUT, to see whether the CONUT's effectiveness could be improved. This is a two-step prospective study. In Step 1, 101 patients were randomly selected and assessed with the SGA and the CONUT. With the data obtained, an unconditional logistic regression model was developed, and two variants of the CONUT were constructed: Model 1 was built by logistic regression; Model 2 was built by dividing the undernutrition probabilities obtained in Model 1 into seven regular intervals. In Step 2, 60 patients were selected and underwent the SGA, the original CONUT and the newly developed models. The diagnostic efficacy of the original CONUT and the new models was tested by means of ROC curves. Samples 1 and 2 were then pooled to measure the degree of agreement between the original CONUT and the SGA, and diagnostic efficacy parameters were calculated. No statistically significant differences were found between samples 1 and 2 regarding age, sex or medical/surgical distribution, and undernutrition rates were similar (over 40%). The AUCs for the ROC curves were 0.862 for the original CONUT, and 0.839 and 0.874 for Models 1 and 2, respectively. The kappa index for the CONUT and the SGA was 0.680. The CONUT, with the original scores assigned by the authors, is as good as the mathematical models and is thus a valuable, highly useful and efficient tool for clinical undernutrition screening.
NASA Astrophysics Data System (ADS)
Lombardo, L.; Cama, M.; Maerker, M.; Parisi, L.; Rotigliano, E.
2014-12-01
This study aims at comparing the performances of Binary Logistic Regression (BLR) and Boosted Regression Trees (BRT) methods in assessing landslide susceptibility for multiple-occurrence regional landslide events within the Mediterranean region. A test area was selected in the north-eastern sector of Sicily (southern Italy), corresponding to the catchments of the Briga and the Giampilieri streams, both stretching for a few kilometres from the Peloritan ridge (eastern Sicily, Italy) to the Ionian sea. This area was struck on 1 October 2009 by an extreme climatic event resulting in thousands of rapid shallow landslides, mainly debris flows and debris avalanches, involving the weathered layer of a low- to high-grade metamorphic bedrock. Exploiting the same set of predictors and the 2009 landslide archive, BLR- and BRT-based susceptibility models were obtained for the two catchments separately, adopting a random partition (RP) technique for validation; in addition, the models trained in one of the two catchments (Briga) were tested in predicting the landslide distribution in the other (Giampilieri), adopting a spatial partition (SP) based validation procedure. All the validation procedures were based on multi-fold tests so as to evaluate and compare the reliability of the fitting, the prediction skill, the coherence in the predictor selection and the precision of the susceptibility estimates. All the obtained models for the two methods produced very high predictive performances, with a general congruence between BLR and BRT in the predictor importance. In particular, the research highlighted that BRT models reached a higher prediction performance with respect to BLR models for RP-based modelling, whilst for the SP-based models the difference in predictive skill between the two methods dropped drastically, converging to an analogous excellent performance. However, when looking at the precision of the probability estimates, BLR proved to produce more robust models in terms of selected predictors and coefficients, as well as of dispersion of the estimated probabilities around the mean value for each mapped pixel. The difference in behaviour could be interpreted as the result of overfitting effects, which affect decision-tree classification more heavily than logistic regression techniques.
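A minimal sketch contrasting binary logistic regression with a boosted-tree classifier on simulated presence/absence data, scored by AUC; sklearn's gradient boosting is used here as a stand-in for BRT, and the data are synthetic, not the study's:

```python
# BLR vs. a boosted-tree classifier on a binary (presence/absence)
# response, compared by AUC. Simulated data; names are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("BLR", LogisticRegression(max_iter=1000)),
                  ("BRT", GradientBoostingClassifier(random_state=0))]:
    p = clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    print(name, "AUC =", round(roc_auc_score(y_te, p), 3))
```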
Fatigue Reliability of Gas Turbine Engine Structures
NASA Technical Reports Server (NTRS)
Cruse, Thomas A.; Mahadevan, Sankaran; Tryon, Robert G.
1997-01-01
The results of an investigation are described for fatigue reliability in engine structures. The description consists of two parts. Part 1 is for method development. Part 2 is a specific case study. In Part 1, the essential concepts and practical approaches to damage tolerance design in the gas turbine industry are summarized. These have evolved over the years in response to flight safety certification requirements. The effect of Non-Destructive Evaluation (NDE) methods on these approaches is also reviewed. Assessment methods based on probabilistic fracture mechanics, with regard to both crack initiation and crack growth, are outlined. Limit state modeling techniques from structural reliability theory are shown to be appropriate for application to this problem, for both individual failure mode and system-level assessment. In Part 2, the results of a case study for the high pressure turbine of a turboprop engine are described. The response surface approach is used to construct a fatigue performance function. This performance function is used with the First Order Reliability Method (FORM) to determine the probability of failure and the sensitivity of the fatigue life to the engine parameters for the first stage disk rim of the two stage turbine. A hybrid combination of regression and Monte Carlo simulation is used to incorporate time-dependent random variables. System reliability is used to determine the system probability of failure and the sensitivity of the system fatigue life to the engine parameters of the high pressure turbine. The variation in the primary hot gas and secondary cooling air, the uncertainty of the complex mission loading, and the scatter in the material data are considered.
Rebich, Richard A; Houston, Natalie A; Mize, Scott V; Pearson, Daniel K; Ging, Patricia B; Evan Hornig, C
2011-01-01
SPAtially Referenced Regressions On Watershed attributes (SPARROW) models were developed to estimate nutrient inputs [total nitrogen (TN) and total phosphorus (TP)] to the northwestern part of the Gulf of Mexico from streams in the South-Central United States (U.S.). This area included drainages of the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf hydrologic regions. The models were standardized to reflect nutrient sources and stream conditions during 2002. Model predictions of nutrient loads (mass per time) and yields (mass per area per time) generally were greatest in streams in the eastern part of the region and along reaches near the Texas and Louisiana shoreline. The Mississippi River and Atchafalaya River watersheds, which drain nearly two-thirds of the conterminous U.S., delivered the largest nutrient loads to the Gulf of Mexico, as expected. However, the three largest delivered TN yields were from the Trinity River/Galveston Bay, Calcasieu River, and Aransas River watersheds, while the three largest delivered TP yields were from the Calcasieu River, Mermentau River, and Trinity River/Galveston Bay watersheds. Model output indicated that the three largest sources of nitrogen from the region were atmospheric deposition (42%), commercial fertilizer (20%), and livestock manure (unconfined, 17%). The three largest sources of phosphorus were commercial fertilizer (28%), urban runoff (23%), and livestock manure (confined and unconfined, 23%). PMID:22457582
Simulation studies of improved sounding systems
NASA Technical Reports Server (NTRS)
Yates, H.; Wark, D.; Aumann, H.; Evans, N.; Phillips, N.; Susskind, J.; Mcmillin, L.; Goldman, A.; Chahine, M.; Crone, L.
1989-01-01
Two instrument designs for indirect satellite sounding of the atmosphere in the infrared are represented by the High Resolution Infra-Red Sounder, Model 2 (HIRS-2) and by the Advanced Meteorological Temperature Sounder (AMTS). The relative capabilities of the two instruments were tested by simulating satellite measurements from a group of temperature soundings, allowing the two participants to retrieve the temperature profiles from the simulated data, and comparing the results with the original temperature profiles. Four data sets were produced from radiosonde data extrapolated to a suitable altitude, representing continents and oceans between 30S and 30N. From the information available, temperature profiles were retrieved by two different methods: statistical regression and inversion of the radiative transfer equation. Results show the consequences of greater spectral purity, a concomitant increase in the number of spectral intervals, and better spatial resolution in partly clouded areas. At the same time, the limitations of the HIRS-2 without its companion instrument lead to some results which should be ignored in comparing the two instruments. A clear superiority of the AMTS results is shown.
Social security status and mortality in Belgian and Spanish male workers.
Duran, Xavier; Vanroelen, Christophe; Deboosere, Patrick; Benavides, Fernando G
2016-01-01
To assess differences in mortality rates between social security statuses in two independent samples of Belgian and Spanish male workers. Study of two retrospective cohorts (Belgium, n=23,607; Spain, n=44,385) of 50-60 year old male employees with 4 years of follow-up. Mortality rate ratios (MRR) were estimated using Poisson regression models. Mortality for subjects with permanent disability was higher than for the employed, for both Belgium [MRR=4.56 (95% CI: 2.88-7.21)] and Spain [MRR=7.15 (95% CI: 5.37-9.51)]. For the unemployed/early retirees, mortality was higher in Spain [MRR=1.64 (95% CI: 1.24-2.17)] than in Belgium [MRR=0.88 (95% CI: 0.46-1.71)]. MRR differences between Belgium and Spain for unemployed workers could be partly explained because of differences between the two social security systems. Future studies should further explore mortality differences between countries with different social security systems. Copyright © 2016 SESPAS. Published by Elsevier Espana. All rights reserved.
Nonlinear-regression groundwater flow modeling of a deep regional aquifer system
Cooley, Richard L.; Konikow, Leonard F.; Naff, Richard L.
1986-01-01
A nonlinear regression groundwater flow model, based on a Galerkin finite-element discretization, was used to analyze steady state two-dimensional groundwater flow in the areally extensive Madison aquifer in a 75,000 mi2 area of the Northern Great Plains. Regression parameters estimated include intrinsic permeabilities of the main aquifer and separate lineament zones, discharges from eight major springs surrounding the Black Hills, and specified heads on the model boundaries. Aquifer thickness and temperature variations were included as specified functions. The regression model was applied using sequential F testing so that the fewest number and simplest zonation of intrinsic permeabilities, combined with the simplest overall model, were evaluated initially; additional complexities (such as subdivisions of zones and variations in temperature and thickness) were added in stages to evaluate the subsequent degree of improvement in the model results. It was found that only the eight major springs, a single main aquifer intrinsic permeability, two separate lineament intrinsic permeabilities of much smaller values, and temperature variations are warranted by the observed data (hydraulic heads and prior information on some parameters) for inclusion in a model that attempts to explain significant controls on groundwater flow. Addition of thickness variations did not significantly improve model results; however, thickness variations were included in the final model because they are fairly well defined. Effects on the observed head distribution from other features, such as vertical leakage and regional variations in intrinsic permeability, apparently were overshadowed by measurement errors in the observed heads. Estimates of the parameters correspond well to estimates obtained from other independent sources.
Liu, Fengchen; Porco, Travis C; Amza, Abdou; Kadri, Boubacar; Nassirou, Baido; West, Sheila K; Bailey, Robin L; Keenan, Jeremy D; Solomon, Anthony W; Emerson, Paul M; Gambhir, Manoj; Lietman, Thomas M
2015-08-01
Trachoma programs rely on guidelines made in large part using expert opinion of what will happen with and without intervention. Large community-randomized trials offer an opportunity to actually compare forecasting methods in a masked fashion. The Program for the Rapid Elimination of Trachoma trials estimated longitudinal prevalence of ocular chlamydial infection from 24 communities treated annually with mass azithromycin. Given antibiotic coverage and biannual assessments from baseline through 30 months, forecasts of the prevalence of infection in each of the 24 communities at 36 months were made by three methods: the sum of 15 experts' opinion, statistical regression of the square-root-transformed prevalence, and a stochastic hidden Markov model of infection transmission (Susceptible-Infectious-Susceptible, or SIS model). All forecasters were masked to the 36-month results and to the other forecasts. Forecasts of the 24 communities were scored by the likelihood of the observed results and compared using Wilcoxon's signed-rank statistic. Regression and SIS hidden Markov models had significantly better likelihood than community expert opinion (p = 0.004 and p = 0.01, respectively). All forecasts scored better when perturbed to decrease Fisher's information. Each individual expert's forecast was poorer than the sum of experts. Regression and SIS models performed significantly better than expert opinion, although all forecasts were overly confident. Further model refinements may score better, although would need to be tested and compared in new masked studies. Construction of guidelines that rely on forecasting future prevalence could consider use of mathematical and statistical models. Clinicaltrials.gov NCT00792922.
Science of Test Research Consortium: Year Two Final Report
2012-10-02
July 2012. Analysis of an Intervention for Small Unmanned Aerial System (SUAS) Accidents, submitted to Quality Engineering, LQEN-2012-0056. Stone... Systems Engineering.
Wolf, S. E., R. R. Hill, and J. J. Pignatiello. June 2012. Using Neural Networks and Logistic Regression to Model Small Unmanned ...Human Retina.
Wolf, S. E. March 2012. Modeling Small Unmanned Aerial System Mishaps using Logistic Regression and Artificial Neural Networks.
Dudley, Robert W.; Hodgkins, Glenn A.; Dickinson, Jesse
2017-01-01
We present a logistic regression approach for forecasting the probability of future groundwater levels declining below, or remaining below, specific groundwater-level thresholds. We tested our approach on 102 groundwater wells in different climatic regions and aquifers of the United States that are part of the U.S. Geological Survey Groundwater Climate Response Network. We evaluated the importance of current groundwater levels, precipitation, streamflow, seasonal variability, the Palmer Drought Severity Index, and atmosphere/ocean indices for developing the logistic regression equations. Several diagnostics of model fit were used to evaluate the regression equations, including testing of autocorrelation of residuals, goodness-of-fit metrics, and bootstrap validation testing. The probabilistic predictions were most successful at wells with high persistence (low month-to-month variability) in their groundwater records and at wells where the groundwater level remained below the defined low threshold for sustained periods (generally three months or longer). The model fit was weakest at wells with strong seasonal variability in levels and with shorter-duration low-threshold events. We identified challenges in deriving probabilistic-forecasting models and possible approaches for addressing those challenges.
Waite, Ian R.
2014-01-01
As part of the USGS study of nutrient enrichment of streams in agricultural regions throughout the United States, about 30 sites within each of eight study areas were selected to capture a gradient of nutrient conditions. The objective was to develop watershed disturbance predictive models for macroinvertebrate and algal metrics at national and three regional landscape scales to obtain a better understanding of important explanatory variables. Explanatory variables in the models were generated from landscape data, habitat, and chemistry. Instream nutrient concentration and variables assessing the amount of disturbance to the riparian zone (e.g., percent row crops or percent agriculture) were selected as the most important explanatory variables in almost all boosted regression tree models, regardless of landscape scale or assemblage. Frequently, TN and TP concentrations and riparian agricultural land-use variables showed a threshold-type response at relatively low values for the biotic metrics modeled. Some measure of habitat condition was also commonly selected in the final invertebrate models, though the variable(s) varied across regions. Results suggest national models tended to account for more general landscape/climate differences, while regional models incorporated both broad landscape-scale and more specific local-scale variables.
Adjusted variable plots for Cox's proportional hazards regression model.
Hall, C B; Zeger, S L; Bandeen-Roche, K J
1996-01-01
Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. In this paper, we extend adjusted variable plots to Cox's proportional hazards model for possibly censored survival data. We propose three different plots: a risk level adjusted variable (RLAV) plot in which each observation in each risk set appears, a subject level adjusted variable (SLAV) plot in which each subject is represented by one point, and an event level adjusted variable (ELAV) plot in which the entire risk set at each failure event is represented by a single point. The latter two plots are derived from the RLAV by combining multiple points. In each plot, the regression coefficient and standard error from a Cox proportional hazards regression are obtained by a simple linear regression through the origin fit to the coordinates of the pictured points. The plots are illustrated with a reanalysis of a dataset of 65 patients with multiple myeloma.
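A minimal sketch of the adjusted (added) variable plot construction in ordinary linear regression, the building block the paper extends to the Cox model; the data are simulated, and the through-origin slope recovering the multiple-regression coefficient follows from the Frisch-Waugh-Lovell result:

```python
# Coordinates of an adjusted variable plot in linear regression:
# residuals of y ~ Z plotted against residuals of x ~ Z; a fit through
# the origin recovers the coefficient of x from the full regression.
# Simulated data; names are illustrative.
import numpy as np

rng = np.random.default_rng(4)
n = 200
Z = np.column_stack([np.ones(n), rng.normal(size=n)])  # adjustment covariates
x = Z @ np.array([0.5, 1.0]) + rng.normal(size=n)
y = 2.0 * x + Z @ np.array([1.0, -0.5]) + rng.normal(size=n)

rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # residuals of x ~ Z
ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]   # residuals of y ~ Z

beta = (rx @ ry) / (rx @ rx)   # through-origin slope == adjusted coefficient
print(round(beta, 3))          # close to 2.0
```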
USDA-ARS's Scientific Manuscript database
Probabilistic forecasts of US Drought Monitor (USDM) intensification over two, four and eight week time periods are developed based on recent anomalies in precipitation, evapotranspiration and soil moisture. These statistical forecasts are computed using logistic regression with cross validation. Wh...
Part Two: Infantile Spasms--The New Consensus
ERIC Educational Resources Information Center
Pellock, John M.; O'Hara, Kathryn
2011-01-01
This article presents the conclusion made by the consensus group regarding infantile spasms. The consensus group concluded that "infantile spasms are a major form of severe epileptic encephalopathy of early childhood that results in neurodevelopmental regression and imposes a significant health burden." The entire group agrees that the best…
NASA Astrophysics Data System (ADS)
Maas, A.; Alrajhi, M.; Alobeid, A.; Heipke, C.
2017-05-01
Updating topographic geospatial databases is often performed based on current remotely sensed images. To automatically extract the object information (labels) from the images, supervised classifiers are employed. Decisions to be taken in this process concern the definition of the classes which should be recognised, the features to describe each class and the training data necessary in the learning part of classification. With a view to large-scale topographic databases for fast-developing urban areas in the Kingdom of Saudi Arabia, we conducted a case study which investigated the following two questions: (a) which set of features is best suited for the classification?; (b) what is the added value of height information, e.g. derived from stereo imagery? Using stereoscopic GeoEye and Ikonos satellite data, we investigate these two questions based on our research on label-tolerant classification using logistic regression and partly incorrect training data. We show that between five and ten features can be recommended to obtain a stable solution, that height information consistently yields an improved overall classification accuracy of about 5%, and that label noise can be successfully modelled and thus only marginally influences the classification results.
NASA Astrophysics Data System (ADS)
Karl, Thomas R.; Wang, Wei-Chyung; Schlesinger, Michael E.; Knight, Richard W.; Portman, David
1990-10-01
Important surface observations such as the daily maximum and minimum temperature, daily precipitation, and cloud ceilings often have localized characteristics that are difficult to reproduce with the current resolution and the physical parameterizations in state-of-the-art General Circulation climate Models (GCMs). Many of the difficulties can be partially attributed to mismatches in scale, local topography, regional geography and boundary conditions between models and surface-based observations. Here, we present a method, called climatological projection by model statistics (CPMS), to relate GCM grid-point free-atmosphere statistics, the predictors, to these important local surface observations. The method can be viewed as a generalization of the model output statistics (MOS) and perfect prog (PP) procedures used in numerical weather prediction (NWP) models. It consists of the application of three statistical methods: 1) principal component analysis (PCA), 2) canonical correlation, and 3) inflated regression analysis. The PCA reduces the redundancy of the predictors. The canonical correlation is used to develop simultaneous relationships between linear combinations of the predictors, the canonical variables, and the surface-based observations. Finally, inflated regression is used to relate the important canonical variables to each of the surface-based observed variables. We demonstrate that even an early version of the Oregon State University two-level atmospheric GCM (with prescribed sea surface temperature) produces free-atmosphere statistics that can, when standardized using the model's internal means and variances (the MOS-like version of CPMS), closely approximate the observed local climate. When the model data are standardized by the observed free-atmosphere means and variances (the PP version of CPMS), however, the model does not reproduce the observed surface climate as well. Our results indicate that in the MOS-like version of CPMS the differences between the output of a ten-year GCM control run and the surface-based observations are often smaller than the differences between the observations of two ten-year periods. Such positive results suggest that GCMs may already contain important climatological information that can be used to infer the local climate.
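A hedged sketch of the three-stage CPMS-style pipeline (PCA to compress the predictors, canonical correlation to link them to the surface variables, then regression on the canonical variables) on simulated fields; the variance-inflation step of inflated regression is omitted, and all dimensions and names are illustrative:

```python
# PCA -> canonical correlation -> regression on canonical variables,
# in the spirit of the CPMS procedure. Simulated fields; names are
# illustrative, and the inflation step is not shown.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(5)
n = 3650                                   # daily samples
atmos = rng.normal(size=(n, 40))           # free-atmosphere statistics
surface = atmos[:, :3] @ rng.normal(size=(3, 2)) + rng.normal(size=(n, 2))

pcs = PCA(n_components=10).fit_transform(atmos)   # 1) compress predictors
cca = CCA(n_components=2).fit(pcs, surface)       # 2) canonical variables
U, V = cca.transform(pcs, surface)                # U: predictor-side scores

fit = LinearRegression().fit(U, surface)          # 3) regress surface vars
print(round(fit.score(U, surface), 3))            # mean R^2 across targets
```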
Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble
Wang, Hong; Xu, Qingsong; Zhou, Lifeng
2015-01-01
Recently, various ensemble learning methods with different base classifiers have been proposed for credit scoring problems. However, for various reasons, there has been little research using logistic regression as the base classifier. In this paper, given large unbalanced data, we consider the plausibility of ensemble learning using regularized logistic regression as the base classifier to deal with credit scoring problems. In this research, the data is first balanced and diversified by clustering and bagging algorithms. Then we apply a Lasso-logistic regression learning ensemble to evaluate the credit risks. We show that the proposed algorithm outperforms popular credit scoring models such as decision tree, Lasso-logistic regression and random forests in terms of AUC and F-measure. We also provide two importance measures for the proposed model to identify important variables in the data. PMID:25706988
Mohd Yusof, Mohd Yusmiaidil Putera; Cauwels, Rita; Deschepper, Ellen; Martens, Luc
2015-08-01
The third molar development (TMD) has been widely utilized as one of the radiographic methods for dental age estimation. By using the same radiograph of the same individual, third molar eruption (TME) information can be incorporated into the TMD regression model. This study aims to evaluate the performance of dental age estimation in the individual-method models and the combined model (TMD and TME) based on the classic regressions of multiple linear and principal component analysis. A sample of 705 digital panoramic radiographs of Malay sub-adults aged between 14.1 and 23.8 years was collected. The techniques described by Gleiser and Hunt (modified by Kohler) and Olze were employed to stage the TMD and TME, respectively. The data were divided to develop three respective models based on the two regressions of multiple linear and principal component analysis. The trained models were then validated on the test sample and the accuracy of age prediction was compared between each model. The coefficient of determination (R²) and root mean square error (RMSE) were calculated. In both genders, adjusted R² increased in the linear regressions of the combined model compared with the individual models. An overall decrease in RMSE was detected in the combined model compared with TMD (0.03-0.06) and TME (0.2-0.8). In principal component regression, the combined model exhibited low adjusted R² values and high RMSE, except in males. Dental age is better predicted using the combined model in multiple linear regression. Copyright © 2015 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
ERIC Educational Resources Information Center
Clarke, Paul; Crawford, Claire; Steele, Fiona; Vignoles, Anna
2015-01-01
The use of fixed (FE) and random effects (RE) in two-level hierarchical linear regression is discussed in the context of education research. We compare the robustness of FE models with the modelling flexibility and potential efficiency of those from RE models. We argue that the two should be seen as complementary approaches. We then compare both…
Ship traffic and shoreline erosion in the Lagoon of Venice
NASA Astrophysics Data System (ADS)
Scarpa, Gian Marco; Zaggia, Luca; Lorenzetti, Giuliano; Manfè, Giorgia; Parnell, Kevin; Molinaroli, Emanuela; Rapaglia, John; Gionta, Sofia
2016-04-01
A study based on the analysis of a historical sequence of aerial photographs and satellite images combined with in situ measurements revealed an unprecedented shoreline regression on the side of a major waterway in the Venice Lagoon, Italy. The study considered long- and short-term recession rates caused by ship-induced depression wakes in an area that was reclaimed at the end of the 1960s for the expansion of the nearby Porto Marghera Industrial Zone and never used since. The GIS analysis performed with the available imagery shows an average retreat of about 4 m yr-1 between 1965 and 2015. Field measurements carried out between April 2014 and January 2015 also revealed that the shoreline regression still proceeds at a rate comparable to the long-term average, regardless of the distance from the navigation channel, and is not constant through time. Periods of high water levels determined by astronomical tide or storm surges, more common in the winter season, are characterized by faster regression rates. The retreat proceeds by collapse of slabs of the reclaimed muddy soil after erosion and removal of the underlying original salt marsh sediments, and is a process discontinuous in time and space depending on the morphology, intrinsic properties and vegetation cover of the artificial deposits. Digitization of historical maps and new bathymetric surveys made in April 2015 allowed for the reconstruction of two digital terrain models of the past and present situations. The two models were used to calculate the total volume of sediment lost between 1970 and 2015. The results of this study show that ship-channel interactions can dominate the morphodynamics of a waterway and its margins, and they permitted a better understanding of how this part of the Venice Lagoon reacted to the pressure of human activities in the post-industrial period. Evaluation of the temporal and spatial variation of shoreline position is also crucial to predicting future scenarios and managing the lagoon and its ecosystem services in the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoon Sohn; Charles Farrar; Norman Hunter
2001-01-01
This report summarizes the analysis of fiber-optic strain gauge data obtained from a surface-effect fast patrol boat being studied by the staff at the Norwegian Defense Research Establishment (NDRE) in Norway and the Naval Research Laboratory (NRL) in Washington D.C. Data from two different structural conditions were provided to the staff at Los Alamos National Laboratory. The problem was then approached from a statistical pattern recognition paradigm. This paradigm can be described as a four-part process: (1) operational evaluation, (2) data acquisition & cleansing, (3) feature extraction and data reduction, and (4) statistical model development for feature discrimination. Given that the first two portions of this paradigm were mostly completed by the NDRE and NRL staff, this study focused on data normalization, feature extraction, and statistical modeling for feature discrimination. The feature extraction process began by looking at relatively simple statistics of the signals and progressed to using the residual errors from auto-regressive (AR) models fit to the measured data as the damage-sensitive features. Data normalization proved to be the most challenging portion of this investigation. A novel approach to data normalization, where the residual errors in the AR model are considered to be an unmeasured input and an auto-regressive model with exogenous inputs (ARX) is then fit to portions of the data exhibiting similar waveforms, was successfully applied to this problem. With this normalization procedure, a clear distinction between the two different structural conditions was obtained. A false-positive study was also run, and the procedure developed herein did not yield any false-positive indications of damage. Finally, the results must be qualified by the fact that this procedure has only been applied to very limited data samples. A more complete analysis of additional data taken under various operational and environmental conditions as well as other structural conditions is necessary before one can definitively state that the procedure is robust enough to be used in practice.
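A minimal sketch of the feature-extraction step described above: fit an AR model to a baseline-condition signal and use the spread of one-step residual errors on new signals as a damage-sensitive feature; the signals are simulated and the ARX normalization stage is omitted:

```python
# AR-residual damage features: an AR model is fit to a baseline signal,
# and the std of its one-step prediction errors on new signals flags a
# changed structural condition. Simulated signals; names are illustrative.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

rng = np.random.default_rng(6)
t = np.arange(2000)
baseline = np.sin(0.2 * t) + 0.1 * rng.normal(size=t.size)
damaged = np.sin(0.2 * t) + 0.25 * np.sin(0.55 * t) \
          + 0.1 * rng.normal(size=t.size)            # altered dynamics

ar = AutoReg(baseline, lags=10).fit()                # params: [const, L1..L10]

def residual_std(signal, params, lags=10):
    """Std of one-step AR prediction errors under the baseline model."""
    X = np.column_stack([signal[lags - k - 1:-k - 1] for k in range(lags)])
    pred = params[0] + X @ params[1:]
    return np.std(signal[lags:] - pred)

print(residual_std(baseline, ar.params))   # small: matches baseline model
print(residual_std(damaged, ar.params))    # larger: flags changed condition
```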
Modeling animal-vehicle collisions using diagonal inflated bivariate Poisson regression.
Lao, Yunteng; Wu, Yao-Jan; Corey, Jonathan; Wang, Yinhai
2011-01-01
Two types of animal-vehicle collision (AVC) data are commonly adopted for AVC-related risk analysis research: reported AVC data and carcass removal data. One issue with these two data sets is that they were found to have significant discrepancies by previous studies. In order to model these two types of data together and provide a better understanding of highway AVCs, this study adopts a diagonal inflated bivariate Poisson regression method, an inflated version of bivariate Poisson regression model, to fit the reported AVC and carcass removal data sets collected in Washington State during 2002-2006. The diagonal inflated bivariate Poisson model not only can model paired data with correlation, but also handle under- or over-dispersed data sets as well. Compared with three other types of models, double Poisson, bivariate Poisson, and zero-inflated double Poisson, the diagonal inflated bivariate Poisson model demonstrates its capability of fitting two data sets with remarkable overlapping portions resulting from the same stochastic process. Therefore, the diagonal inflated bivariate Poisson model provides researchers a new approach to investigating AVCs from a different perspective involving the three distribution parameters (λ1, λ2 and λ3). The modeling results show the impacts of traffic elements, geometric design and geographic characteristics on the occurrences of both reported AVC and carcass removal data. It is found that the increase of some associated factors, such as speed limit, annual average daily traffic, and shoulder width, will increase the numbers of reported AVCs and carcass removals. Conversely, the presence of some geometric factors, such as rolling and mountainous terrain, will decrease the number of reported AVCs. Published by Elsevier Ltd.
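A simulation sketch of the trivariate-reduction construction underlying the bivariate Poisson model above, showing how the shared component induces the covariance between the two counts; fitting the diagonal inflated model itself requires specialized software, so this only illustrates the distribution:

```python
# Bivariate Poisson via trivariate reduction: two correlated counts
# built from a shared Poisson component, with the three parameters
# (lambda1, lambda2, lambda3) named in the abstract. Simulation only.
import numpy as np

rng = np.random.default_rng(7)
lam1, lam2, lam3 = 2.0, 1.5, 0.8   # lam3 controls the covariance
n = 100_000

y1 = rng.poisson(lam1, n)
y2 = rng.poisson(lam2, n)
y3 = rng.poisson(lam3, n)          # shared component

reported = y1 + y3                 # e.g. reported AVCs
carcass = y2 + y3                  # e.g. carcass removals

print(np.cov(reported, carcass)[0, 1])   # approx lam3
print(reported.mean(), carcass.mean())   # approx lam1+lam3, lam2+lam3
```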
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
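A minimal sketch of a truncated power basis for a piecewise polynomial regression spline with continuity at the knots, the kind of fixed-knot basis the mixed-model method above reparameterizes; the knots, degree and data are illustrative:

```python
# Truncated power basis for a fixed-knot regression spline: columns
# 1, t, ..., t^degree, then (t - k)_+^degree per knot, so the pieces
# join with degree-1 smoothness at the knots.
import numpy as np

def spline_basis(t, knots, degree=2):
    """Design matrix of a degree-`degree` spline with the given knots."""
    cols = [t ** d for d in range(degree + 1)]
    cols += [np.clip(t - k, 0, None) ** degree for k in knots]
    return np.column_stack(cols)

t = np.linspace(0, 24, 200)                  # e.g. hours in a 24-h profile
X = spline_basis(t, knots=[8.0, 16.0])

# Least-squares fit to a smooth target (simulated)
y = np.sin(t / 4) + 0.1 * np.random.default_rng(8).normal(size=t.size)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta.round(3))
```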
Azeez, Adeboye; Obaromi, Davies; Odeyemi, Akinwumi; Ndege, James; Muntabayi, Ruffin
2016-07-26
Tuberculosis (TB) is a deadly infectious disease caused by Mycobacterium tuberculosis. Tuberculosis, as a chronic and highly infectious disease, is prevalent in almost every part of the globe. More than 95% of TB mortality occurs in low/middle income countries. In 2014, approximately 10 million people were diagnosed with active TB and two million died from the disease. In this study, our aim is to compare the predictive powers of the seasonal autoregressive integrated moving average (SARIMA) model and a combined SARIMA and neural network auto-regression (SARIMA-NNAR) model for TB incidence, and to analyse its seasonality in South Africa. TB incidence case data from January 2010 to December 2015 were extracted from the Eastern Cape Health facility report of the electronic Tuberculosis Register (ERT.Net). A SARIMA model and a combined SARIMA-NNAR model were used in analysing and predicting the TB data from 2010 to 2015. Simulation performance parameters of mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), mean percent error (MPE), mean absolute scaled error (MASE) and mean absolute percentage error (MAPE) were applied to assess which model gave the better prediction. Though practically both models could predict TB incidence, the combined model displayed better performance. For the combined model, the Akaike information criterion (AIC), second-order AIC (AICc) and Bayesian information criterion (BIC) were 288.56, 308.31 and 299.09 respectively, lower than the SARIMA model's corresponding values of 329.02, 327.20 and 341.99. The seasonal TB incidence trend was forecast to be slightly higher by the SARIMA-NNAR model than by the single model. The combined model gave a better TB incidence forecast, with a lower AICc. The model also indicates the need for resolute intervention to reduce infectious disease transmission, given co-infection with HIV and other concomitant diseases, and at festival peak periods.
Sun, Zhuolu; Laporte, Audrey; Guerriere, Denise N; Coyte, Peter C
2017-05-01
With health system restructuring in Canada and a general preference by care recipients and their families to receive palliative care at home, attention to home-based palliative care continues to increase. A multidisciplinary team of health professionals is the most common delivery model for home-based palliative care in Canada. However, little is known about the changing temporal trends in the propensity and intensity of home-based palliative care. The purpose of this study was to assess the propensity to use home-based palliative care services, and once used, the intensity of that use for three main service categories: physician visits, nurse visits and care by personal support workers (PSWs) over the last decade. Three prospective cohort data sets were used to track changes in service use over the period 2005 to 2015. Service use for each category was assessed using a two-part model, and a Heckit regression was performed to assess the presence of selectivity bias. Service propensity was modelled using multivariate logistic regression analysis and service intensity was modelled using log-transformed ordinary least squares regression analysis. Both the propensity and intensity to use home-based physician visits and PSWs increased over the last decade, while service propensity and the intensity of nurse visits decreased. Meanwhile, there was a general tendency for service propensity and intensity to increase as the end of life approached. These findings demonstrate temporal changes towards increased use of home-based palliative care, and a shift to substitute care away from nursing to less expensive forms of care, specifically PSWs. These findings may provide a general idea of the types of services that are used more intensely and require more resources from multidisciplinary teams, as increased use of home-based palliative care has placed dramatic pressures on the budgets of local home and community care organisations. © 2016 John Wiley & Sons Ltd.
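A minimal sketch of a basic two-part model in the spirit of the analysis above: a logistic model for the propensity of any service use and a log-transformed OLS model for intensity among users; the data and the single covariate are simulated assumptions, and the Heckit selection step is not shown:

```python
# Two-part model: part 1 models the probability of any use (logistic),
# part 2 models log intensity among users (OLS). Simulated data;
# names are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 5000
months_to_death = rng.uniform(0, 12, n)
X = sm.add_constant(months_to_death)

p_use = 1 / (1 + np.exp(-(0.5 - 0.1 * months_to_death)))
any_use = rng.random(n) < p_use
cost = np.where(any_use,
                np.exp(5 - 0.08 * months_to_death + rng.normal(0, 0.7, n)),
                0.0)

part1 = sm.Logit(any_use.astype(float), X).fit(disp=False)  # propensity
users = cost > 0
part2 = sm.OLS(np.log(cost[users]), X[users]).fit()         # intensity

print(part1.params.round(3))
print(part2.params.round(3))
```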
Knol, Mirjam J; van der Tweel, Ingeborg; Grobbee, Diederick E; Numans, Mattijs E; Geerlings, Mirjam I
2007-10-01
To determine the presence of interaction in epidemiologic research, typically a product term is added to the regression model. In linear regression, the regression coefficient of the product term reflects interaction as departure from additivity, whereas in logistic regression it refers to interaction as departure from multiplicativity. Rothman has argued that interaction estimated as departure from additivity better reflects biologic interaction. So far, the literature on estimating interaction on an additive scale using logistic regression has focused only on dichotomous determinants. The objective of the present study was to provide methods to estimate interaction between continuous determinants and to illustrate these methods with a clinical example. From the existing literature we derived the formulas to quantify interaction as departure from additivity between one continuous and one dichotomous determinant and between two continuous determinants using logistic regression. Bootstrapping was used to calculate the corresponding confidence intervals. To illustrate the theory with an empirical example, data from the Utrecht Health Project were used, with age and body mass index as risk factors for elevated diastolic blood pressure. The methods and formulas presented in this article are intended to assist epidemiologists in calculating interaction on an additive scale between two variables on a certain outcome. The proposed methods are included in a spreadsheet which is freely available at: http://www.juliuscenter.nl/additive-interaction.xls.
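A minimal sketch of additive-scale interaction from a logistic model, on simulated data rather than the Utrecht Health Project: the relative excess risk due to interaction (RERI) between a continuous determinant x (for a chosen reference value and increment) and a dichotomous determinant z, with a bootstrap percentile confidence interval.

# RERI between continuous x and dichotomous z from a fitted logistic model.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({"x": rng.normal(0, 1, n), "z": rng.integers(0, 2, n)})
lp = -1 + 0.5*df.x + 0.7*df.z + 0.3*df.x*df.z
df["y"] = (rng.random(n) < 1/(1 + np.exp(-lp))).astype(int)

def reri(d, x0=0.0, delta=1.0):
    """RERI for moving from (x0, z=0) to (x0 + delta, z=1)."""
    b = smf.logit("y ~ x * z", data=d).fit(disp=False).params
    or10 = np.exp(b["x"]*delta)                          # x up, z = 0
    or01 = np.exp(b["z"] + b["x:z"]*x0)                  # z = 1, x at x0
    or11 = np.exp(b["x"]*delta + b["z"] + b["x:z"]*(x0 + delta))
    return or11 - or10 - or01 + 1

est = reri(df)
boot = [reri(df.sample(n, replace=True)) for _ in range(200)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"RERI = {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")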
Estimating the Probability of Rare Events Occurring Using a Local Model Averaging.
Chen, Jin-Hua; Chen, Chun-Shu; Huang, Meng-Fan; Lin, Hung-Chih
2016-10-01
In statistical applications, logistic regression is a popular method for analyzing binary data accompanied by explanatory variables. But when one of the two outcomes is rare, the estimation of model parameters has been shown to be severely biased and hence estimating the probability of rare events occurring based on a logistic regression model would be inaccurate. In this article, we focus on estimating the probability of rare events occurring based on logistic regression models. Instead of selecting a best model, we propose a local model averaging procedure based on a data perturbation technique applied to different information criteria to obtain different probability estimates of rare events occurring. Then an approximately unbiased estimator of Kullback-Leibler loss is used to choose the best one among them. We design complete simulations to show the effectiveness of our approach. For illustration, a necrotizing enterocolitis (NEC) data set is analyzed. © 2016 Society for Risk Analysis.
Mathews, Mathew
2011-01-01
Culture is important to how populations understand the cause of mental disorder, a variable that has implications for treatment-seeking behaviour. Asian populations underutilize professional mental health treatment partly because of their endorsement of supernatural causation models to explain mental disorders, beliefs that stem from their religious backgrounds. This study sought to understand the dimensions of explanatory models used by three groups of Singaporean Chinese youth (n = 842)--Christian, Chinese religionist, no religion--and examined their responses to an instrument that combined explanations from psychological and organic perspectives on mental disorder with approaches from Asian and Western religious traditions. Factor analysis revealed five factors. Two were psychological, corresponding to the humanistic and cognitive-behavioural perspectives respectively. Another two, supernatural in nature, dealt with karmic beliefs popular among Asian religionists and with more classical religious explanations common in monotheistic religions. The remaining factor was deemed a physiological model, although it incorporated an item that made it consistent with an Asian organic model. While the groups differed in their endorsement of supernatural explanations, psychological perspectives had the strongest endorsement in this population. Regression analysis showed that individuals who endorsed supernatural explanations more strongly tended to have no exposure to psychology courses and heightened religiosity.
Apostolopoulos, K N; Deligianni, D D
2008-02-01
An experimental model which can simulate physical changes that occur during aging was developed in order to evaluate the effects of changes in mineral content and microstructure on the ultrasonic properties of bovine cancellous bone. Timed immersion in hydrochloric acid was used to selectively alter the mineral content. Scanning electron microscopy and histological staining of the acid-treated trabeculae demonstrated a heterogeneous structure consisting of a mineralized core and a demineralized layer. The presence of organic matrix contributed very little to normalized broadband ultrasound attenuation (nBUA) and speed of sound. All three ultrasonic parameters, speed of sound, nBUA and backscatter coefficient, were sensitive to changes in the apparent density of bovine cancellous bone. A two-component model utilizing a combination of two autocorrelation functions (a densely populated model and a spherical distribution) was used to approximate the backscatter coefficient. The predicted attenuation due to scattering constituted a significant part of the measured total attenuation (due to both scattering and absorption mechanisms) for bovine cancellous bone. Linear regression between trabecular thickness values and the correlation lengths estimated from the model showed significant linear correlation, with R(2)=0.81 before and R(2)=0.80 after demineralization. The accuracy of estimation was found to increase with trabecular thickness.
Structure/activity relationships for biodegradability and their role in environmental assessment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boethling, R.S.
1994-12-31
Assessment of biodegradability is an important part of the review process for both new and existing chemicals under the Toxic Substances Control Act. It is often necessary to estimate biodegradability because experimental data are unavailable. Structure/biodegradability relationships (SBR) are a means to this end. Quantitative SBR have been developed, but this approach has not been very useful because they apply only to a few narrowly defined classes of chemicals. In response to the need for more widely applicable methods, multivariate analysis has been used to develop biodegradability classification models. For example, recent efforts have produced four new models. Two calculate the probability of rapid biodegradation and can be used for classification; the other two models allow semi-quantitative estimation of primary and ultimate biodegradation rates. All are based on multiple regressions against 36 preselected substructures plus molecular weight. Such efforts have been fairly successful by statistical criteria, but in general are hampered by a lack of large and consistent datasets. Knowledge-based expert systems may represent the next step in the evolution of SBR. In principle such systems need not be as severely limited by imperfect datasets. However, the codification of expert knowledge and reasoning is a critical prerequisite. Results of knowledge acquisition exercises and modeling based on them will also be described.
A regression-based 3-D shoulder rhythm.
Xu, Xu; Lin, Jia-hua; McGorry, Raymond W
2014-03-21
In biomechanical modeling of the shoulder, it is important to know the orientation of each bone in the shoulder girdle when estimating the loads on each musculoskeletal element. However, because of the soft tissue overlying the bones, it is difficult to accurately derive the orientation of the clavicle and scapula using surface markers during dynamic movement. The purpose of this study is to develop two regression models which predict the orientation of the clavicle and the scapula. The first regression model uses humerus orientation and individual factors such as age, gender, and anthropometry data as the predictors. The second regression model includes only the humerus orientation as the predictor. Thirty-eight participants performed 118 static postures covering the volume of the right hand reach. The orientation of the thorax, clavicle, scapula and humerus were measured with a motion tracking system. Regression analysis was performed on the Euler angles decomposed from the orientation of each bone from 26 randomly selected participants. The regression models were then validated with the remaining 12 participants. The results indicate that for the first model, the r(2) of the predicted orientation of the clavicle and the scapula ranged between 0.31 and 0.65, and the RMSE obtained from the validation dataset ranged from 6.92° to 10.39°. For the second model, the r(2) ranged between 0.19 and 0.57, and the RMSE obtained from the validation dataset ranged from 6.62° to 11.13°. The derived regression-based shoulder rhythm could be useful in future biomechanical modeling of the shoulder. Copyright © 2014 The Authors. Published by Elsevier Ltd. All rights reserved.
Baldi, F; Alencar, M M; Albuquerque, L G
2010-12-01
The objective of this work was to estimate covariance functions using random regression models on B-spline functions of animal age, for weights from birth to adult age in Canchim cattle. Data comprised 49,011 records on 2435 females. The model of analysis included fixed effects of contemporary groups, age of dam as a quadratic covariable, and the population mean trend taken into account by a cubic regression on orthogonal polynomials of animal age. Residual variances were modelled through a step function with four classes. The direct and maternal additive genetic effects, and animal and maternal permanent environmental effects, were included as random effects in the model. A total of seventeen analyses, considering linear, quadratic and cubic B-spline functions and up to seven knots, were carried out. B-spline functions of the same order were considered for all random effects. Random regression models on B-spline functions were compared to a random regression model on Legendre polynomials and to a multitrait model. Results from the different models of analysis were compared using the REML form of the Akaike information criterion and Schwarz's Bayesian information criterion. In addition, the variance components and genetic parameters estimated for each random regression model were also used as criteria to choose the most adequate model to describe the covariance structure of the data. A model fitting quadratic B-splines, with four knots or three segments for the direct additive genetic effect and animal permanent environmental effect and two knots for the maternal additive genetic effect and maternal permanent environmental effect, was the most adequate to describe the covariance structure of the data. Random regression models using B-spline functions as base functions fitted the data better than Legendre polynomials, especially at mature ages, but a higher number of parameters needs to be estimated with B-spline functions. © 2010 Blackwell Verlag GmbH.
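To make the base functions concrete, here is a fixed-effects growth curve fitted with a quadratic B-spline basis via patsy's bs() inside a statsmodels formula; the random genetic and permanent environmental terms of the models above are omitted, and the data are simulated.

# B-spline regression of weight on age; knots are placed by df/degree.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(14)
age = rng.uniform(0, 60, 400)                         # age in months
weight = 30 + 8*np.sqrt(age) + rng.normal(0, 5, 400)  # kg, illustrative curve
df = pd.DataFrame({"age": age, "weight": weight})

m = smf.ols("weight ~ bs(age, degree=2, df=5)", data=df).fit()
print(m.params.round(2))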
NASA Astrophysics Data System (ADS)
Ilie, Iulia; Dittrich, Peter; Carvalhais, Nuno; Jung, Martin; Heinemeyer, Andreas; Migliavacca, Mirco; Morison, James I. L.; Sippel, Sebastian; Subke, Jens-Arne; Wilkinson, Matthew; Mahecha, Miguel D.
2017-09-01
Accurate model representation of land-atmosphere carbon fluxes is essential for climate projections. However, the exact responses of carbon cycle processes to climatic drivers often remain uncertain. Presently, knowledge derived from experiments, complemented by a steadily evolving body of mechanistic theory, provides the main basis for developing such models. The strongly increasing availability of measurements may facilitate new ways of identifying suitable model structures using machine learning. Here, we explore the potential of gene expression programming (GEP) to derive relevant model formulations based solely on the signals present in data by automatically applying various mathematical transformations to potential predictors and repeatedly evolving the resulting model structures. In contrast to most other machine learning regression techniques, the GEP approach generates readable models that allow for prediction and possibly for interpretation. Our study is based on two cases: artificially generated data and real observations. Simulations based on artificial data show that GEP is successful in identifying prescribed functions, with the prediction capacity of the models comparable to four state-of-the-art machine learning methods (random forests, support vector machines, artificial neural networks, and kernel ridge regressions). Based on real observations we explore the responses of the different components of terrestrial respiration at an oak forest in south-eastern England. We find that the GEP-retrieved models are often better in prediction than some established respiration models. Based on their structures, we find previously unconsidered exponential dependencies of respiration on seasonal ecosystem carbon assimilation and water dynamics. We noticed that the GEP models are only partly portable across respiration components, with the identification of a general terrestrial respiration model possibly prevented by equifinality issues. Overall, GEP is a promising tool for uncovering new model structures for terrestrial ecology in the data-rich era, complementing more traditional modelling approaches.
Uhrich, Mark A.; Kolasinac, Jasna; Booth, Pamela L.; Fountain, Robert L.; Spicer, Kurt R.; Mosbrucker, Adam R.
2014-01-01
Researchers at the U.S. Geological Survey, Cascades Volcano Observatory, investigated alternative methods to the traditional sample-based sediment record procedure in determining suspended-sediment concentration (SSC) and discharge. One such sediment-surrogate technique was developed using turbidity and discharge to estimate SSC for two gaging stations in the Toutle River Basin near Mount St. Helens, Washington. To provide context for the study, methods for collecting sediment data and monitoring turbidity are discussed. Statistical methods used include the development of ordinary least squares regression models for each gaging station. Issues of time-related autocorrelation also are evaluated. Addition of lagged explanatory variables was used to account for autocorrelation in the turbidity, discharge, and SSC data. Final regression model equations and plots are presented for the two gaging stations. The regression models support near-real-time estimates of SSC and improved suspended-sediment discharge records by incorporating continuous instream turbidity. Future use of such models may potentially lower the costs of sediment monitoring by reducing the time it takes to collect and process samples and to derive a sediment-discharge record.
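A rough sketch of the lagged-variable device on simulated series: a log-linear regression of SSC on turbidity and discharge plus their one-step lags, with a check of the residual lag-1 autocorrelation. Forms and names are illustrative, not the published USGS models.

# Log-linear SSC model with lagged turbidity and discharge covariates.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
turb = pd.Series(np.exp(rng.normal(2, .5, n))).rolling(3, min_periods=1).mean()
q = pd.Series(np.exp(rng.normal(4, .4, n))).rolling(3, min_periods=1).mean()
ssc = np.exp(0.5 + 0.8*np.log(turb) + 0.3*np.log(q) + rng.normal(0, .2, n))

df = pd.DataFrame({"lssc": np.log(ssc), "lturb": np.log(turb), "lq": np.log(q)})
df["lturb_1"] = df.lturb.shift(1)     # lagged turbidity
df["lq_1"] = df.lq.shift(1)           # lagged discharge

m = smf.ols("lssc ~ lturb + lq + lturb_1 + lq_1", data=df.dropna()).fit()
print(m.params.round(3))
print("residual lag-1 autocorrelation:", round(pd.Series(m.resid).autocorr(1), 3))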
Astudillo, Mariana; Kuendig, Hervé; Centeno-Gil, Adriana; Wicki, Matthias; Gmel, Gerhard
2014-09-01
This study investigated the associations of alcohol outlet density with specific alcohol outcomes (consumption and consequences) among young men in Switzerland and assessed the possible geographically related variations. Alcohol consumption and drinking consequences were measured in a 2010-2011 study assessing substance use risk factors (Cohort Study on Substance Use Risk Factors) among 5519 young Swiss men. Outlet density was based on the number of on- and off-premise outlets in the district of residence. Linear regression models were run separately for drinking level, heavy episodic drinking (HED) and drinking consequences. Geographically weighted regression models were estimated when variations were recorded at the district level. No consistent association was found between outlet density and drinking consequences. A positive association of both drinking level and HED with on-premise outlet density was found. Geographically weighted regressions were run for drinking level and HED. The predicted values for HED were higher in the southwest part of Switzerland (French-speaking part). Among Swiss young men, the density of outlets and, in particular, the abundance of bars, clubs and other on-premise outlets was associated with drinking level and HED, even when drinking consequences were not significantly affected. These findings support the idea that outlet density needs to be considered when developing and implementing regional-based prevention initiatives. © 2014 Australasian Professional Society on Alcohol and other Drugs.
Maximum Entropy Discrimination Poisson Regression for Software Reliability Modeling.
Chatzis, Sotirios P; Andreou, Andreas S
2015-11-01
Reliably predicting software defects is one of the most significant tasks in software engineering. Two of the major components of modern software reliability modeling approaches are: 1) extraction of salient features for software system representation, based on appropriately designed software metrics and 2) development of intricate regression models for count data, to allow effective software reliability data modeling and prediction. Surprisingly, research in the latter frontier of count data regression modeling has been rather limited. More specifically, a lack of simple and efficient algorithms for posterior computation has made the Bayesian approaches appear unattractive, and thus underdeveloped in the context of software reliability modeling. In this paper, we try to address these issues by introducing a novel Bayesian regression model for count data, based on the concept of max-margin data modeling, effected in the context of a fully Bayesian model treatment with simple and efficient posterior distribution updates. Our novel approach yields a more discriminative learning technique, making more effective use of our training data during model inference. In addition, it allows better handling of uncertainty in the modeled data, which can be a significant problem when the training data are limited. We derive elegant inference algorithms for our model under the mean-field paradigm and exhibit its effectiveness using the publicly available benchmark data sets.
Learning Models and Real-Time Speech Recognition.
ERIC Educational Resources Information Center
Danforth, Douglas G.; And Others
This report describes the construction and testing of two "psychological" learning models for the purpose of computer recognition of human speech over the telephone. One of the two models was found to be superior in all tests. A regression analysis yielded a 92.3% recognition rate for 14 subjects ranging in age from 6 to 13 years. Tests…
This paper presents an analysis of the CMAQ v4.5 model performance for particulate matter and its chemical components for the simulated year 2001. This is part two of a two-part series of papers that examines the model performance of CMAQ v4.5.
Genomic-Enabled Prediction of Ordinal Data with Bayesian Logistic Ordinal Regression.
Montesinos-López, Osval A; Montesinos-López, Abelardo; Crossa, José; Burgueño, Juan; Eskridge, Kent
2015-08-18
Most genomic-enabled prediction models developed so far assume that the response variable is continuous and normally distributed. The exception is the probit model, developed for ordered categorical phenotypes. In statistical applications, because of the easy implementation of the Bayesian probit ordinal regression (BPOR) model, Bayesian logistic ordinal regression (BLOR) is rarely implemented in the context of genomic-enabled prediction, where the sample size (n) is much smaller than the number of parameters (p). For this reason, in this paper we propose a BLOR model using the Pólya-Gamma data augmentation approach, which produces a Gibbs sampler with full conditional distributions similar to those of the BPOR model and with the advantage that the BPOR model is a particular case of the BLOR model. We evaluated the proposed model by using simulation and two real data sets. Results indicate that our BLOR model is a good alternative for analyzing ordinal data in the context of genomic-enabled prediction with the probit or logit link. Copyright © 2015 Montesinos-López et al.
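For flavour, a frequentist proportional-odds (cumulative logit) fit with statsmodels' OrderedModel on simulated marker data; this is a maximum-likelihood stand-in for ordinal regression generally, not the paper's Pólya-Gamma Gibbs sampler.

# Ordinal logistic regression on simulated covariates and a 3-level phenotype.
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(13)
n, p = 300, 5
X = rng.normal(0, 1, (n, p))                          # e.g. marker covariates
latent = X @ np.array([.8, -.5, .3, 0, 0]) + rng.logistic(0, 1, n)
y = np.digitize(latent, [-1, 1])                      # ordered categories 0/1/2

res = OrderedModel(y, X, distr="logit").fit(method="bfgs", disp=False)
print(res.params.round(2))                            # slopes, then thresholds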
Klijs, Bart; Kibele, Eva U B; Ellwardt, Lea; Zuidersma, Marij; Stolk, Ronald P; Wittek, Rafael P M; Mendes de Leon, Carlos M; Smidt, Nynke
2016-08-11
Previous studies are inconclusive on whether poor socioeconomic conditions in the neighborhood are associated with major depressive disorder. Furthermore, conceptual models that relate neighborhood conditions to depressive disorder have not been evaluated using empirical data. In this study, we investigated whether neighborhood income is associated with major depressive episodes. We evaluated three conceptual models. Conceptual model 1: The association between neighborhood income and major depressive episodes is explained by diseases, lifestyle factors, stress and social participation. Conceptual model 2: A low individual income relative to the mean income in the neighborhood is associated with major depressive episodes. Conceptual model 3: A high income of the neighborhood buffers the effect of a low individual income on major depressive disorder. We used adult baseline data from the LifeLines Cohort Study (N = 71,058) linked with data on the participants' neighborhoods from Statistics Netherlands. The current presence of a major depressive episode was assessed using the MINI neuropsychiatric interview. The association between neighborhood income and major depressive episodes was assessed using a mixed effect logistic regression model adjusted for age, sex, marital status, education and individual (equalized) income. This regression model was sequentially adjusted for lifestyle factors, chronic diseases, stress, and social participation to evaluate conceptual model 1. To evaluate conceptual models 2 and 3, an interaction term for neighborhood income*individual income was included. Multivariate regression analysis showed that a low neighborhood income is associated with major depressive episodes (OR (95 % CI): 0.82 (0.73;0.93)). Adjustment for diseases, lifestyle factors, stress, and social participation attenuated this association (ORs (95 % CI): 0.90 (0.79;1.01)). Low individual income was also associated with major depressive episodes (OR (95 % CI): 0.72 (0.68;0.76)). The interaction of individual income*neighborhood income on major depressive episodes was not significant (p = 0.173). Living in a low-income neighborhood is associated with major depressive episodes. Our results suggest that this association is partly explained by chronic diseases, lifestyle factors, stress and poor social participation, and thereby partly confirm conceptual model 1. Our results do not support conceptual models 2 and 3.
AN IMPROVED STRATEGY FOR REGRESSION OF BIOPHYSICAL VARIABLES AND LANDSAT ETM+ DATA. (R828309)
Empirical models are important tools for relating field-measured biophysical variables to remote sensing data. Regression analysis has been a popular empirical method of linking these two types of data to provide continuous estimates for variables such as biomass, percent wood...
Selection of higher order regression models in the analysis of multi-factorial transcription data.
Prazeres da Costa, Olivia; Hoffman, Arthur; Rey, Johannes W; Mansmann, Ulrich; Buch, Thorsten; Tresch, Achim
2014-01-01
Many studies examine gene expression data that has been obtained under the influence of multiple factors, such as genetic background, environmental conditions, or exposure to diseases. The interplay of multiple factors may lead to effect modification and confounding. Higher order linear regression models can account for these effects. We present a new methodology for linear model selection and apply it to microarray data of bone marrow-derived macrophages. This experiment investigates the influence of three variable factors: the genetic background of the mice from which the macrophages were obtained, Yersinia enterocolitica infection (two strains, and a mock control), and treatment/non-treatment with interferon-γ. We set up four different linear regression models in a hierarchical order. We introduce the eruption plot as a new practical tool for model selection complementary to global testing. It visually compares the size and significance of effect estimates between two nested models. Using this methodology we were able to select the most appropriate model by keeping only relevant factors showing additional explanatory power. Application to experimental data allowed us to qualify the interaction of factors as either neutral (no interaction), alleviating (co-occurring effects are weaker than expected from the single effects), or aggravating (stronger than expected). We find a biologically meaningful gene cluster of putative C2TA target genes that appear to be co-regulated with MHC class II genes. We introduced the eruption plot as a tool for visual model comparison to identify relevant higher order interactions in the analysis of expression data obtained under the influence of multiple factors. We conclude that model selection in higher order linear regression models should generally be performed for the analysis of multi-factorial microarray data.
Traffic effects on bird counts on North American Breeding Bird Survey routes
Griffith, Emily H.; Sauer, John R.; Royle, J. Andrew
2010-01-01
The North American Breeding Bird Survey (BBS) is an annual roadside survey used to estimate population change in >420 species of birds that breed in North America. Roadside sampling has been criticized, in part because traffic noise can interfere with bird counts. Since 1997, data have been collected on the numbers of vehicles that pass during counts at each stop. We assessed the effect of traffic by modeling total vehicles as a covariate of counts in hierarchical Poisson regression models used to estimate population change. We selected species for analysis that represent birds detected at low and high abundance and birds with songs of low and high frequencies. Increases in vehicle counts were associated with decreases in bird counts in most of the species examined. The size and direction of these effects remained relatively constant between two alternative models that we analyzed. Although this analysis indicated only a small effect of incorporating traffic effects when modeling roadside counts of birds, we suggest that continued evaluation of changes in traffic at BBS stops should be a component of future BBS analyses.
Ford, Christopher N; Ng, Shu Wen; Popkin, Barry M
2015-01-01
Background: How beverage taxes might influence purchases of foods and beverages among households with preschool children is unclear. Thus, we examined the relation between beverage taxes and food and beverage purchases among US households with a child 2–5 y of age. Objectives: We examined how a potential tax on sugar-sweetened beverages (SSBs), or SSBs and >1% fat and/or high-sugar milk, would influence household food and beverage purchases among US households with a preschool child. We aimed to identify the lowest tax rate associated with meaningful changes in purchases. Methods: We used household food and beverage purchase data from households with a single child who participated in the 2009–2012 Nielsen Homescan Panel. A 2-part, multilevel panel model was used to examine the relation between beverage prices and food and beverage purchases. Logistic regression was used in the first part of the model to estimate the probability of a food/beverage being purchased, whereas the second part of the model used log-linear regression to estimate predicted changes in purchases among reporting households. Estimates from both parts were combined, and bootstrapping was performed to obtain corrected SEs. In separate models, prices of SSBs, or SSBs and >1% and/or high-sugar milk, were perturbed by +10%, +15%, and +20%. Predicted changes in food and beverage purchases were compared across models. Results: Price increases of 10%, 15%, and 20% on SSBs were associated with fewer purchases of juice drinks, whereas price increases of 10%, 15%, and 20% simulated on both SSBs plus >1% fat and/or high-sugar milk (combined tax) were associated with fewer kilocalories purchased from >1% fat, low-sugar milk, and meat, poultry, fish, and mixed meat dishes. Conclusions: Our study provides further evidence that a tax on beverages high in sugar and/or fat may be associated with favorable changes in beverage purchases among US households with a preschool child. PMID:26063069
Olyphant, G.A.; Thomas, Joan; Whitman, R.L.; Harper, D.
2003-01-01
Two watersheds in northwestern Indiana were selected for detailed monitoring of bacterially contaminated discharges (Escherichia coli) into Lake Michigan. A large watershed that drains an urbanized area with treatment plants that release raw sewage during storms discharges into Lake Michigan at the outlet of Burns Ditch. A small watershed drains part of the Great Marsh, a wetland complex that has been disrupted by ditching and limited residential development, at the outlet of Derby Ditch. Monitoring at the outlet of Burns Ditch in 1999 and 2000 indicated that E. coli concentrations vary over two orders of magnitude during storms. During one storm, sewage overflows caused concentrations to increase to more than 10,000 cfu/100 mL for several hours. Monitoring at Derby Ditch from 1997 to 2000 also indicated that E. coli concentrations increase during storms with the highest concentrations generally occurring during rising streamflow. Multiple regression analysis indicated that 60% of the variability in measured outflows of E. coli from Derby Ditch (n = 88) could be accounted for by a model that utilizes continuously measured rainfall, stream discharge, soil temperature and depth to water table in the Great Marsh. A similar analysis indicated that 90% of the variability in measured E. coli concentrations at the outlet of Burns Ditch (n = 43) during storms could be accounted for by a combination of continuously measured water-quality variables including nitrate and ammonium. These models, which utilize data that can be collected on a real-time basis, could form part of an Early Warning System for predicting beach closures.
1990-05-01
[Unrecoverable tabular residue from a 1990 anthropometric report: stepwise regression models predicting thumb breadth (THUMBBR) from measurements such as hand circumference (HANDCIRC), lip length (LIPLGTHH), sleeve length spine-elbow (SLLSPEL) and ball of foot circumference (BLFTCIRC); the coefficient tables are not reconstructable.]
Suppression of the oculocephalic reflex (doll's eyes phenomenon) in normal full-term babies.
Snir, Moshe; Hasanreisoglu, Murat; Goldenberg-Cohen, Nitza; Friling, Ronit; Katz, Kalman; Nachum, Yoav; Benjamini, Yoav; Herscovici, Zvi; Axer-Siegel, Ruth
2010-05-01
To determine the precise age of suppression of the oculocephalic reflex in infants and its relationship to specific clinical characteristics. The oculocephalic reflex was prospectively tested in 325 healthy full-term babies aged 1 to 32 weeks attending an orthopedic outpatient clinic. Two ophthalmologists raised the baby's head 30 degrees above horizontal and rapidly rotated it in the horizontal and vertical planes while watching the conjugate eye movement. Suppression of the reflex, by observer agreement, was analyzed in relation to gestational age, postpartum age, postconceptional age, birth weight, and current weight. The data were fitted to a logistic regression model to determine the probability of suppression of the reflex according to the clinical variables. The oculocephalic reflex was suppressed in 75% of babies by the age of 11.5 weeks and in more than 95% of babies aged 20 weeks. Although postpartum age had a greater influence than gestational age, both were significantly correlated with suppression of the reflex (p = 0.01 and p = 0.04, respectively; two-sided t-test). Postpartum age was the best single variable explaining absence of the reflex. On logistic regression with cross-validation, the model including postpartum age and current weight yielded the best results; both these factors were highly correlated with suppression of the reflex (r = 0.74). The oculocephalic reflex is suppressed in the vast majority of normal infants by age 11.5 weeks. The disappearance of the reflex occurs gradually and longitudinally and is part of the normal maturation of the visual system.
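A small sketch of the modelling step, on simulated babies rather than the study cohort: logistic regression of reflex suppression on postpartum age and current weight, evaluated by cross-validated AUC.

# Logistic model of suppression probability with 5-fold cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n = 325
age_wk = rng.uniform(1, 32, n)                     # postpartum age in weeks
weight = 3.5 + 0.2*age_wk + rng.normal(0, .5, n)   # current weight, kg
p = 1/(1 + np.exp(-(-4 + 0.35*age_wk)))            # suppression grows with age
suppressed = (rng.random(n) < p).astype(int)

X = np.column_stack([age_wk, weight])
auc = cross_val_score(LogisticRegression(), X, suppressed, cv=5, scoring="roc_auc")
print("cross-validated AUC:", auc.round(2))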
NASA Astrophysics Data System (ADS)
Mei, Zhixiong; Wu, Hao; Li, Shiyun
2018-06-01
The Conversion of Land Use and its Effects at Small regional extent (CLUE-S), which is a widely used model for land-use simulation, utilizes logistic regression to estimate the relationships between land use and its drivers, and thus, predict land-use change probabilities. However, logistic regression disregards possible spatial autocorrelation and self-organization in land-use data. Autologistic regression can depict spatial autocorrelation but cannot address self-organization, while logistic regression considering only self-organization (NE-logistic regression) fails to capture spatial autocorrelation. Therefore, this study developed an NE-autologistic regression method, which incorporates both spatial autocorrelation and self-organization, to improve CLUE-S. The Zengcheng District of Guangzhou, China was selected as the study area. The land-use data of 2001, 2005, and 2009, as well as 10 typical driving factors, were used to validate the proposed regression method and the improved CLUE-S model. Then, three future land-use scenarios in 2020: the natural growth scenario, ecological protection scenario, and economic development scenario, were simulated using the improved model. Validation results showed that NE-autologistic regression performed better than logistic regression, autologistic regression, and NE-logistic regression in predicting land-use change probabilities. The spatial allocation accuracy and kappa values of NE-autologistic-CLUE-S were higher than those of logistic-CLUE-S, autologistic-CLUE-S, and NE-logistic-CLUE-S for the simulations of two periods, 2001-2009 and 2005-2009, which proved that the improved CLUE-S model achieved the best simulation and was thereby effective to a certain extent. The scenario simulation results indicated that under all three scenarios, traffic land and residential/industrial land would increase, whereas arable land and unused land would decrease during 2009-2020. Apparent differences also existed in the simulated change sizes and locations of each land-use type under different scenarios. The results not only demonstrate the validity of the improved model but also provide a valuable reference for relevant policy-makers.
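The autologistic ingredient can be sketched in a few lines: a logistic land-use model augmented with an autocovariate, here the mean of the binary outcome over each cell's eight neighbours on a simulated grid (the self-organization term of the NE variants is not reproduced).

# Autologistic regression: logistic model plus a neighbourhood autocovariate.
import numpy as np
import statsmodels.api as sm
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(5)
size = 60
slope = rng.normal(0, 1, (size, size))            # a driving factor
urban = (rng.random((size, size)) < 1/(1 + np.exp(0.8*slope))).astype(float)

# The 3x3 mean includes the centre cell; remove it to get the 8-neighbour mean.
auto = (uniform_filter(urban, size=3) * 9 - urban) / 8

X = sm.add_constant(np.column_stack([slope.ravel(), auto.ravel()]))
fit = sm.Logit(urban.ravel(), X).fit(disp=False)
print(fit.params)                                 # [const, slope, autocovariate]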
Li, Ji; Gray, B.R.; Bates, D.M.
2008-01-01
Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived the corresponding formulae for the multi-level logistic regression model and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of the methods used for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasi-likelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPC in scientific data analysis.
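For orientation, the most widely used of Goldstein's four definitions, the latent-variable formulation, has a simple closed form for the two-level case; this is standard background and stated here as an assumption, not necessarily the exact multi-level formulae the authors derive:

\[
  \mathrm{VPC} = \frac{\sigma_u^2}{\sigma_u^2 + \pi^2/3}
\]

where \(\sigma_u^2\) is the level-2 (cluster) variance and \(\pi^2/3 \approx 3.29\) is the variance of the standard logistic distribution, playing the role of the level-1 residual variance on the latent scale.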
A Heckman selection model for the safety analysis of signalized intersections
Wong, S. C.; Zhu, Feng; Pei, Xin; Huang, Helai; Liu, Youjun
2017-01-01
Purpose: The objective of this paper is to provide a new method for estimating crash rate and severity simultaneously. Methods: This study explores a Heckman selection model of the crash rate and severity simultaneously at different levels, and a two-step procedure is used to investigate the crash rate and severity levels. The first step uses a probit regression model to determine the sample selection process, and the second step develops a multiple regression model to simultaneously evaluate the crash rate and severity for slight injury and killed or seriously injured (KSI) crashes, respectively. The model uses 555 observations from 262 signalized intersections in the Hong Kong metropolitan area, integrated with information on traffic flow, geometric road design, road environment, traffic control and any crashes that occurred over two years. Results: The results of the proposed two-step Heckman selection model illustrate the necessity of different crash rates for different crash severity levels. Conclusions: A comparison with the existing approaches suggests that the Heckman selection model offers an efficient and convenient alternative method for evaluating the safety performance at signalized intersections. PMID:28732050
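A compact sketch of the two-step estimator on simulated data: a probit selection equation, then an outcome regression augmented with the inverse Mills ratio from the first step. A significant Mills-ratio coefficient signals selectivity bias.

# Heckman two-step: probit selection, then OLS with the inverse Mills ratio.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 1000
z = rng.normal(0, 1, n)                        # selection covariate
x = rng.normal(0, 1, n)                        # outcome covariate
u = rng.multivariate_normal([0, 0], [[1, .5], [.5, 1]], n)   # correlated errors
selected = 0.5 + 1.0*z + u[:, 0] > 0
y = 1.0 + 2.0*x + u[:, 1]                      # observed only when selected

probit = sm.Probit(selected.astype(int), sm.add_constant(z)).fit(disp=False)
xb = probit.fittedvalues                       # first-step linear predictor
imr = norm.pdf(xb) / norm.cdf(xb)              # inverse Mills ratio

X2 = sm.add_constant(np.column_stack([x, imr]))[selected]
ols = sm.OLS(y[selected], X2).fit()
print(ols.params)                              # [const, x, lambda]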
Ohm, Friedemann; Vogel, Daniela; Sehner, Susanne; Wijnen-Meijer, Marjo; Harendza, Sigrid
2013-05-09
History taking and empathetic communication are two important aspects in successful physician-patient interaction. Gathering important information from the patient's medical history is needed for effective clinical decision making while empathy is relevant for patient satisfaction. We wanted to investigate whether medical students near graduation are able to combine both skills as required in daily medical practice. Thirty near graduates from Hamburg Medical School participated in an assessment for clinical competences including a consultation hour with five standardized patients. Each patient interview was videotaped and standardized patients rated participants with the CARE questionnaire for consultation and relational empathy. All videotaped interviews were rated with a checklist based on the number of important medical aspects for each case. Data were analysed with the linear mixed model to correct for random effects. Regression analysis was performed to look for correlations between the number of questions asked by a participant and their respective empathy rating. Of the 123 aspects that could have been gathered in total, students only requested 56.4% (95% CI 53.5-59.3). While no difference between male and female participants was found, a significant difference (p < .001) was observed between the two parts of the checklist, with 61.1% (95% CI 57.9-64.3) of aspects asked for in part 1 (patient's symptoms) versus 52.0% (95% CI 47.4-56.7) in part 2 (further history). All female standardized patients combined rated female participants (mean score 14.2, 95% CI 12.3-16.3) to be significantly (p < .01) more empathetic than male participants (mean score 19.2, 95% CI 16.3-22.6). Regression analysis revealed no correlation between the number of medical aspects gathered by a participant and his or her respective empathy score given by the standardized patient in the CARE questionnaire. Gathering sufficient medical data from a patient's history and empathetic communication are two completely separate sides of the coin of history taking. While both skills have to be acquired during medical school training with particular focus on their respective learning objectives, medical students need to be provided with additional learning and feedback opportunities where they can be observed exercising both skills combined as required in physicians' daily practice.
Finding gene clusters for a replicated time course study
2014-01-01
Background: Finding genes that share similar expression patterns across samples is an important question that is frequently asked in high-throughput microarray studies. Traditional clustering algorithms such as K-means clustering and hierarchical clustering base gene clustering directly on the observed measurements and do not take into account the specific experimental design under which the microarray data were collected. A new model-based clustering method, the clustering of regression models method, takes into account the specific design of the microarray study and bases the clustering on how genes are related to sample covariates. It can find useful gene clusters for studies from complicated study designs such as replicated time course studies. Findings: In this paper, we applied the clustering of regression models method to data from a time course study of yeast on two genotypes, wild type and YOX1 mutant, each with two technical replicates, and compared the clustering results with K-means clustering. We identified gene clusters that have similar expression patterns in wild type yeast, two of which were missed by K-means clustering. We further identified gene clusters whose expression patterns were changed in YOX1 mutant yeast compared to wild type yeast. Conclusions: The clustering of regression models method can be a valuable tool for identifying genes that are coordinately transcribed by a common mechanism. PMID:24460656
Fritscher, Karl; Schuler, Benedikt; Link, Thomas; Eckstein, Felix; Suhm, Norbert; Hänni, Markus; Hengg, Clemens; Schubert, Rainer
2008-01-01
Fractures of the proximal femur are one of the principal causes of mortality among elderly persons. Traditional methods for the determination of femoral fracture risk rely on measuring bone mineral density (BMD). However, BMD alone is not sufficient to predict bone failure load for an individual patient, and additional parameters have to be determined for this purpose. In this work, an approach that uses statistical models of appearance to identify relevant regions and parameters for the prediction of biomechanical properties of the proximal femur is presented. Using Support Vector Regression, the proposed model-based approach is capable of predicting two different biomechanical parameters accurately and fully automatically in two different testing scenarios.
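A minimal sketch of the regression step, with simulated features standing in for the statistical appearance-model modes: Support Vector Regression with an RBF kernel, scored by cross-validated R^2.

# SVR predicting a biomechanical target from appearance-model features.
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(7)
X = rng.normal(0, 1, (100, 10))        # mode weights of a shape/appearance model
y = 3000 + 400*X[:, 0] - 250*X[:, 1] + rng.normal(0, 100, 100)   # failure load [N]

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5, scoring="r2").round(2))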
Mapping health outcome measures from a stroke registry to EQ-5D weights
2013-01-01
Purpose: To map health outcome related variables from a national register, not part of any validated instrument, to EQ-5D weights among stroke patients. Methods: We used two cross-sectional data sets including patient characteristics, outcome variables and EQ-5D weights from the national Swedish stroke register. Three regression techniques were used on the estimation set (n = 272): ordinary least squares (OLS), Tobit, and censored least absolute deviation (CLAD). The regression coefficients for "dressing", "toileting", "mobility", "mood", "general health" and "proxy-responders" were applied to the validation set (n = 272), and the performance was analysed with mean absolute error (MAE) and mean square error (MSE). Results: The number of statistically significant coefficients varied by model, but all models generated consistent coefficients in terms of sign. Mean utility was underestimated in all models (least in OLS) and with lower variation (least in OLS) compared to the observed. The maximum attainable EQ-5D weight ranged from 0.90 (OLS) to 1.00 (Tobit and CLAD). Health states with utility weights <0.5 had greater errors than those with weights ≥0.5 (P < 0.01). Conclusion: This study indicates that it is possible to map non-validated health outcome measures from a stroke register into preference-based utilities to study the development of stroke care over time, and to compare with other conditions in terms of utility. PMID:23496957
Chau, Kénora; Kabuth, Bernard; Chau, Nearkasen
2016-01-01
The risk of suicide behaviors in immigrant adolescents varies across countries and remains only partly understood. We conducted a study in France to examine immigrant adolescents' likelihood of experiencing suicide ideation in the last 12 months (SI) and lifetime suicide attempts (SA) compared with their native counterparts, and the contribution of socioeconomic factors and school, behavior, and health-related difficulties. Questionnaires covering various risk factors, SI, SA, and their first occurrence over the adolescent's life course (except SI) were completed by 1559 middle-school adolescents from north-eastern France. Data were analyzed using logistic regression models for SI and Cox regression models for SA (retaining only school, behavior, and health-related difficulties that started before SA). Immigrant adolescents had a twofold higher risk of SI and SA than their native counterparts. Using nested models, the excess SI risk was highly explained by socioeconomic factors (27%) and additional school, behavior, and health-related difficulties (24%) but remained significant. The excess SA risk was more highly explained by these issues (40% and 85%, respectively) and became non-significant. These findings demonstrate the risk patterns of SI and SA and the prominent confounding roles of socioeconomic factors and school, behavior, and health-related difficulties. They may be provided to policy makers, schools, carers, and various organizations interested in immigrant, adolescent, and suicide-behavior problems. PMID:27809296
NASA Astrophysics Data System (ADS)
Shen, S. S.
2014-12-01
This presentation describes a suite of global precipitation products reconstructed by a multivariate regression method using an empirical orthogonal function (EOF) expansion. The sampling errors of the reconstruction are estimated for each product datum entry. The maximum temporal coverage is 1850-present and the spatial coverage is quasi-global (75°S to 75°N). The temporal resolution ranges from 5-day, monthly, to seasonal and annual. The Global Precipitation Climatology Project (GPCP) precipitation data from 1979-2008 are used to calculate the EOFs. The Global Historical Climatology Network (GHCN) gridded data are used to calculate the regression coefficients for reconstructions. The sampling errors of the reconstruction are analyzed in detail for different EOF modes. Our reconstructed 1900-2011 time series of the global average annual precipitation shows a 0.024 (mm/day)/100a trend, which is very close to the trend derived from the mean of 25 models of the CMIP5 (Coupled Model Intercomparison Project Phase 5). Our reconstruction examples of 1983 El Niño precipitation and 1917 La Niña precipitation (Figure 1) demonstrate that the El Niño and La Niña precipitation patterns are well reflected in the first two EOFs. The validation of our reconstruction results with GPCP makes it possible to use the reconstruction as the benchmark data for climate models. This will help the climate modeling community to improve model precipitation mechanisms and reduce the systematic difference between observed global precipitation, which hovers at around 2.7 mm/day for reconstructions and GPCP, and model precipitations, which have a range of 2.6-3.3 mm/day for CMIP5. Our precipitation products are publicly available online, including digital data, precipitation animations, computer codes, readme files, and the user manual. This work is a joint effort between San Diego State University (Sam Shen, Nancy Tafolla, Barbara Sperberg, and Melanie Thorn) and University of Maryland (Phil Arkin, Tom Smith, Li Ren, and Li Dai) and supported in part by the U.S. National Science Foundation (Awards No. AGS-1015926 and AGS-1015957).
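The reconstruction logic can be sketched on synthetic fields: EOFs come from a dense "satellite-era" data set, sparse gauge observations are regressed onto the leading EOFs at each time step, and the full field is rebuilt from the fitted amplitudes. All dimensions and noise levels here are illustrative.

# EOF-based field reconstruction from sparse observations via least squares.
import numpy as np

rng = np.random.default_rng(15)
t_dense, t_sparse, ngrid = 30, 100, 200
field = rng.normal(0, 1, (t_dense, ngrid))        # stand-in for the dense era
U, s, Vt = np.linalg.svd(field - field.mean(0), full_matrices=False)
eofs = Vt[:5]                                     # leading 5 spatial modes

obs_idx = rng.choice(ngrid, 40, replace=False)    # sparse gauge locations
truth = rng.normal(0, 1, (t_sparse, 5)) @ eofs    # synthetic historical field
obs = truth[:, obs_idx] + rng.normal(0, .1, (t_sparse, 40))

A = eofs[:, obs_idx].T                            # (40 x 5) design matrix
amps = np.linalg.lstsq(A, obs.T, rcond=None)[0].T # per-step EOF amplitudes
recon = amps @ eofs
print("reconstruction RMSE:", np.sqrt(np.mean((recon - truth)**2)).round(3))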
A rational model of function learning.
Lucas, Christopher G; Griffiths, Thomas L; Williams, Joseph J; Kalish, Michael L
2015-10-01
Theories of how people learn relationships between continuous variables have tended to focus on two possibilities: one, that people are estimating explicit functions, or two, that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, which provide a probabilistic basis for similarity-based function learning, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a rational model of human function learning that combines the strengths of both approaches and accounts for a wide variety of experimental results.
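The equivalence the authors build on can be checked numerically: Bayesian linear regression with a Gaussian prior on the weights and a Gaussian process with the corresponding linear kernel give identical predictive means. A sketch with no intercept and simulated data:

# Weight-space vs. function-space view of the same regression posterior.
import numpy as np

rng = np.random.default_rng(8)
x = rng.uniform(-2, 2, 30)
y = 1.5*x + rng.normal(0, .3, 30)
s2w, s2n = 1.0, 0.09           # prior weight variance, noise variance

# Weight-space: posterior mean of w under a N(0, s2w) prior.
w_post = s2w*np.sum(x*y) / (s2w*np.sum(x*x) + s2n)
xs = np.array([-1.0, 0.5, 1.8])
pred_weights = w_post * xs

# Function-space: GP with linear kernel k(x, x') = s2w * x * x'.
K = s2w*np.outer(x, x) + s2n*np.eye(len(x))
Ks = s2w*np.outer(xs, x)
pred_gp = Ks @ np.linalg.solve(K, y)

print(np.allclose(pred_weights, pred_gp))   # True: two views, one solution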
Comparative analysis of used car price evaluation models
NASA Astrophysics Data System (ADS)
Chen, Chuancan; Hao, Lulu; Xu, Cong
2017-05-01
An accurate used car price evaluation is a catalyst for the healthy development of the used car market. Data mining has been applied to predict used car prices in several articles; however, little has been done to compare different algorithms for used car price estimation. This paper collects more than 100,000 used car dealing records throughout China for an empirical comparison of two algorithms: linear regression and random forest. These two algorithms are used to predict used car price in three different models: a model for a certain car make, a model for a certain car series, and a universal model. Results show that random forest has a stable but not ideal effect in the price evaluation model for a certain car make, but it shows a great advantage in the universal model compared with linear regression. This indicates that random forest is an optimal algorithm when handling complex models with a large number of variables and samples, yet it shows no obvious advantage when coping with simple models with fewer variables.
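A toy version of the comparison, on simulated records rather than the 100,000 Chinese transactions: price depends nonlinearly on age, so the random forest should beat the linear fit on held-out MAE.

# Linear regression vs. random forest on simulated used-car records.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(9)
n = 5000
age = rng.uniform(0, 10, n)                  # years
km = rng.uniform(0, 200, n)                  # thousand km
price = 20*np.exp(-0.15*age) - 0.03*km + rng.normal(0, .8, n)

X = np.column_stack([age, km])
Xtr, Xte, ytr, yte = train_test_split(X, price, random_state=0)
for name, model in [("linear", LinearRegression()),
                    ("forest", RandomForestRegressor(n_estimators=200, random_state=0))]:
    mae = mean_absolute_error(yte, model.fit(Xtr, ytr).predict(Xte))
    print(f"{name}: held-out MAE = {mae:.2f}")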
NASA Astrophysics Data System (ADS)
Shafizadeh-Moghadam, Hossein; Helbich, Marco
2015-03-01
The rapid growth of megacities requires special attention among urban planners worldwide, and particularly in Mumbai, India, where growth is very pronounced. To cope with the planning challenges this will bring, developing a retrospective understanding of urban land-use dynamics and the underlying driving-forces behind urban growth is a key prerequisite. This research uses regression-based land-use change models - and in particular non-spatial logistic regression models (LR) and auto-logistic regression models (ALR) - for the Mumbai region over the period 1973-2010, in order to determine the drivers behind spatiotemporal urban expansion. Both global models are complemented by a local, spatial model, the so-called geographically weighted logistic regression (GWLR) model, one that explicitly permits variations in driving-forces across space. The study comes to two main conclusions. First, both global models suggest similar driving-forces behind urban growth over time, revealing that LRs and ALRs result in estimated coefficients with comparable magnitudes. Second, all the local coefficients show distinctive temporal and spatial variations. It is therefore concluded that GWLR aids our understanding of urban growth processes, and so can assist context-related planning and policymaking activities when seeking to secure a sustainable urban future.
Statistical Modeling of Fire Occurrence Using Data from the Tōhoku, Japan Earthquake and Tsunami.
Anderson, Dana; Davidson, Rachel A; Himoto, Keisuke; Scawthorn, Charles
2016-02-01
In this article, we develop statistical models to predict the number and geographic distribution of fires caused by earthquake ground motion and tsunami inundation in Japan. Using new, uniquely large, and consistent data sets from the 2011 Tōhoku earthquake and tsunami, we fitted three types of models: generalized linear models (GLMs), generalized additive models (GAMs), and boosted regression trees (BRTs). This is the first time the latter two have been used in this application. A simple conceptual framework guided identification of candidate covariates. Models were then compared based on their out-of-sample predictive power, goodness of fit to the data, ease of implementation, and relative importance of the framework concepts. For the ground motion data set, we recommend a Poisson GAM; for the tsunami data set, a negative binomial (NB) GLM or NB GAM. The best models generate out-of-sample predictions of the total number of ignitions in the region within one or two ignitions. Prefecture-level prediction errors average approximately three. All models demonstrate predictive power far superior to four models from the literature that were also tested. A nonlinear relationship is apparent between ignitions and ground motion, so for GLMs, which assume a linear response-covariate relationship, instrumental intensity was the preferred ground motion covariate because it captures part of that nonlinearity. Measures of commercial exposure were preferred over measures of residential exposure for both ground motion and tsunami ignition models. This may vary in other regions, but nevertheless highlights the value of testing alternative measures for each concept. Models with the best predictive power included two or three covariates. © 2015 Society for Risk Analysis.
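As one concrete member of the recommended model family for the tsunami data, the sketch below fits a negative binomial GLM of ignition counts on shaking intensity and exposure with statsmodels; the data and coefficients are simulated, not the Tōhoku data set.

# Negative binomial GLM for ignition counts.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 300
intensity = rng.uniform(4, 10, n)            # instrumental intensity
exposure = rng.normal(0, 1, n)               # standardized commercial exposure
mu = np.exp(-4 + 0.5*intensity + 0.4*exposure)
ignitions = rng.negative_binomial(2, 2/(2 + mu))   # NB counts with mean mu

X = sm.add_constant(np.column_stack([intensity, exposure]))
nb = sm.GLM(ignitions, X, family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(nb.params)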
Geographically weighted regression model on poverty indicator
NASA Astrophysics Data System (ADS)
Slamet, I.; Nugroho, N. F. T. A.; Muslich
2017-12-01
In this research, we applied geographically weighted regression (GWR) to analyze poverty in Central Java, using a Gaussian kernel as the weighting function. GWR uses the diagonal matrix obtained by evaluating the Gaussian kernel function as the weight matrix in the regression model. The kernel weights are used to handle spatial effects in the data, so that a separate model can be obtained for each location. The purpose of this paper is to model poverty-percentage data in Central Java province using GWR with a Gaussian kernel weighting function and to determine the influencing factors in each regency/city of the province. Based on this research, we obtained a geographically weighted regression model with a Gaussian kernel weighting function for poverty-percentage data in Central Java province. We found that the percentage of the population working as farmers, the population growth rate, the percentage of households with regular sanitation, and the number of BPJS beneficiaries are the variables that affect the percentage of poverty in Central Java province. The coefficient of determination R2 is 68.64%. The districts fall into two categories, each influenced by a different set of significant factors.
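GWR with a Gaussian kernel can be written directly as one weighted least-squares fit per location. The sketch below uses simulated locations and a fixed, uncalibrated bandwidth; dedicated packages handle bandwidth selection.

```python
# Sketch: geographically weighted regression with a Gaussian kernel.
import numpy as np

rng = np.random.default_rng(2)
n = 200
coords = rng.uniform(0, 100, (n, 2))          # regency/city locations (assumed)
x = rng.normal(size=n)                        # e.g. % of population farming
beta = 1 + coords[:, 0] / 100                 # spatially varying effect
y = 3 + beta * x + rng.normal(0, 0.5, n)

X = np.column_stack([np.ones(n), x])
bandwidth = 20.0                              # assumed, not calibrated

local_coefs = np.empty((n, 2))
for i in range(n):
    d = np.linalg.norm(coords - coords[i], axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel weights
    W = np.diag(w)                            # diagonal weight matrix
    local_coefs[i] = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

print(local_coefs[:5])  # one intercept/slope pair per location
```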
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model using classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened and served as derivation and validation cohorts, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve and compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity, identified by multivariate logistic regression analysis. The performance of the CART model (0.896) was similar to that of the logistic regression model (0.914, P=.382), and both exceeded that of the MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting 3-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated, practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
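The CART step corresponds to a standard decision-tree classifier. A minimal sketch with the abstract's four predictors, on simulated data with invented coefficients:

```python
# Sketch: CART for 3-month mortality with the abstract's four predictors.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 1000
X = np.column_stack([
    rng.uniform(50, 600, n),    # total bilirubin (umol/L)
    rng.uniform(18, 80, n),     # age (years)
    rng.uniform(120, 145, n),   # serum sodium (mmol/L)
    rng.uniform(1.0, 5.0, n),   # INR
])
logit = -6 + 0.004 * X[:, 0] + 0.03 * X[:, 1] - 0.02 * (X[:, 2] - 130) + 0.8 * X[:, 3]
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50).fit(X_tr, y_tr)
print("AUROC:", round(roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1]), 3))
```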
Xing, Dongyuan; Huang, Yangxin; Chen, Henian; Zhu, Yiliang; Dagne, Getachew A; Baldwin, Julie
2017-08-01
Semicontinuous data, featuring an excessive proportion of zeros and right-skewed continuous positive values, arise frequently in practice. One example is substance abuse/dependence symptom data, for which a substantial proportion of subjects investigated may report zero. Two-part mixed-effects models have been developed to analyze repeated measures of semicontinuous data from longitudinal studies. In this paper, we propose a flexible two-part mixed-effects model with skew distributions for correlated semicontinuous alcohol data under the framework of a Bayesian approach. The proposed model specification consists of two mixed-effects models linked by correlated random effects: (i) a model on the occurrence of positive values using a generalized logistic mixed-effects model (Part I); and (ii) a model on the intensity of positive values using a linear mixed-effects model in which the model errors follow skew distributions, including the skew-t and skew-normal (Part II). The proposed method is illustrated with alcohol abuse/dependence symptom data from a longitudinal observational study, and the analytic results are reported by comparing potential models under different random-effects structures. Simulation studies are conducted to assess the performance of the proposed models and method.
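Stripped of the random effects, skew errors, and Bayesian machinery, the two-part structure reduces to a logistic model for occurrence plus a linear model for log intensity. A minimal frequentist sketch on simulated data:

```python
# Sketch: two-part model for semicontinuous data (simplified, no random effects).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 800
x = rng.normal(size=n)
occurs = rng.random(n) < 1 / (1 + np.exp(-(-0.5 + x)))        # Part I: any use?
amount = np.where(occurs, np.exp(1 + 0.5 * x + rng.normal(0, 0.7, n)), 0.0)

X = sm.add_constant(x)
part1 = sm.Logit((amount > 0).astype(int), X).fit(disp=False)  # occurrence
pos = amount > 0
part2 = sm.OLS(np.log(amount[pos]), X[pos]).fit()              # intensity (log scale)
print(part1.params, part2.params)
```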
[Primary and secondary encopresis].
Lang-Langer, Ellen
2007-01-01
While the difficulty a child has in parting with its faeces in primary encopresis is linked to an incapacity to experience the object as separate and independent from itself, secondary encopresis reflects a more advanced state of psychical development. In that case we are dealing with regression caused by conflict. Two case studies clearly show the differences.
A Regression Framework for Effect Size Assessments in Longitudinal Modeling of Group Differences
Feingold, Alan
2013-01-01
The use of growth modeling analysis (GMA)--particularly multilevel analysis and latent growth modeling--to test the significance of intervention effects has increased exponentially in prevention science, clinical psychology, and psychiatry over the past 15 years. Model-based effect sizes for differences in means between two independent groups in GMA can be expressed in the same metric (Cohen’s d) commonly used in classical analysis and meta-analysis. This article first reviews conceptual issues regarding calculation of d for findings from GMA and then introduces an integrative framework for effect size assessments that subsumes GMA. The new approach uses the structure of the linear regression model, from which effect sizes for findings from diverse cross-sectional and longitudinal analyses can be calculated with familiar statistics, such as the regression coefficient, the standard deviation of the dependent measure, and study duration. PMID:23956615
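One common formulation of a model-based d in this framework multiplies the group-by-time regression coefficient by the study duration and scales by the standard deviation of the dependent measure. The numbers below are invented for illustration:

```python
# Sketch: Cohen's d for a group difference in growth rates (illustrative numbers).
b_interaction = 0.25   # group-by-time regression coefficient (slope difference)
duration = 4.0         # study duration, in the time metric of the model
sd_raw = 2.0           # SD of the dependent measure (e.g., pooled baseline SD)

d = b_interaction * duration / sd_raw
print(f"Model-based d = {d:.2f}")
```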
Comparison of stream invertebrate response models for bioassessment metric
Waite, Ian R.; Kennen, Jonathan G.; May, Jason T.; Brown, Larry R.; Cuffney, Thomas F.; Jones, Kimberly A.; Orlando, James L.
2012-01-01
We aggregated invertebrate data from various sources to assemble a data set for modeling in two ecoregions in Oregon and one in California. Our goal was to compare the performance of models developed using multiple linear regression (MLR) techniques with models developed using three relatively new techniques: classification and regression trees (CART), random forest (RF), and boosted regression trees (BRT). We used tolerance of taxa based on richness (RICHTOL) and the ratio of observed to expected taxa (O/E) as response variables and land use/land cover as explanatory variables. Responses were generally linear; therefore, there was little improvement over the MLR models when using CART and RF. In general, the four modeling techniques (MLR, CART, RF, and BRT) consistently selected the same primary explanatory variables for each region. However, results from the BRT models showed significant improvement over the MLR models for each region, with increases in R2 of 0.09 to 0.20. The O/E metric derived from models specifically calibrated for Oregon consistently had lower R2 values than RICHTOL for the two regions tested. Modeled O/E R2 values were between 0.06 and 0.10 lower for each of the four modeling methods applied in the Willamette Valley and between 0.19 and 0.36 lower for the Blue Mountains. As a result, BRT models may indeed represent a good alternative to MLR for modeling species distribution relative to environmental variables.
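BRT models of the kind compared here are available as gradient-boosted trees in scikit-learn. A minimal sketch against MLR on simulated data with an interaction; the response and land-cover predictors are stand-ins:

```python
# Sketch: boosted regression trees vs. multiple linear regression.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 400
urban = rng.uniform(0, 100, n)          # % urban land cover (assumed)
ag = rng.uniform(0, 100, n)             # % agricultural land cover (assumed)
richtol = 3 + 0.02 * urban + 0.01 * ag + 0.0003 * urban * ag + rng.normal(0, 0.5, n)
X = np.column_stack([urban, ag])

for model in (LinearRegression(),
              GradientBoostingRegressor(n_estimators=300, learning_rate=0.05)):
    r2 = cross_val_score(model, X, richtol, cv=5, scoring="r2").mean()
    print(type(model).__name__, round(r2, 3))
```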
Estimating parasitic sea lamprey abundance in Lake Huron from heterogenous data sources
Young, Robert J.; Jones, Michael L.; Bence, James R.; McDonald, Rodney B.; Mullett, Katherine M.; Bergstedt, Roger A.
2003-01-01
The Great Lakes Fishery Commission uses time series of transformer, parasitic, and spawning population estimates to evaluate the effectiveness of its sea lamprey (Petromyzon marinus) control program. This study used an inverse variance weighting method to integrate Lake Huron sea lamprey population estimates derived from two estimation procedures: 1) prediction of the lake-wide spawning population from a regression model based on stream size and, 2) whole-lake mark and recapture estimates. In addition, we used a re-sampling procedure to evaluate the effect of trading off sampling effort between the regression and mark-recapture models. Population estimates derived from the regression model ranged from 132,000 to 377,000 while mark-recapture estimates of marked recently metamorphosed juveniles and parasitic sea lampreys ranged from 536,000 to 634,000 and 484,000 to 1,608,000, respectively. The precision of the estimates varied greatly among estimation procedures and years. The integrated estimate of the mark-recapture and spawner regression procedures ranged from 252,000 to 702,000 transformers. The re-sampling procedure indicated that the regression model is more sensitive to reduction in sampling effort than the mark-recapture model. Reliance on either the regression or mark-recapture model alone could produce misleading estimates of abundance of sea lampreys and the effect of the control program on sea lamprey abundance. These analyses indicate that the precision of the lakewide population estimate can be maximized by re-allocating sampling effort from marking sea lampreys to trapping additional streams.
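The inverse variance weighting step itself is a two-line calculation: each estimate is weighted by the reciprocal of its variance. The numbers below are placeholders, not the study's estimates:

```python
# Sketch: inverse-variance weighted combination of two abundance estimates.
import numpy as np

estimates = np.array([300_000.0, 550_000.0])  # regression and mark-recapture (assumed)
variances = np.array([4.0e9, 9.0e9])          # their estimated variances (assumed)

w = 1.0 / variances
combined = np.sum(w * estimates) / np.sum(w)
combined_var = 1.0 / np.sum(w)                # variance of the combined estimate
print(f"combined estimate: {combined:,.0f} (SE {np.sqrt(combined_var):,.0f})")
```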
Farid, Asam; Jadoon, Khanzaib; Akhter, Gulraiz; Iqbal, Muhammad Asim
2013-03-01
Hydrostratigraphy and hydrogeology of the Maira vicinity are important for characterizing the aquifer system and developing numerical groundwater flow models to predict the future availability of the water resource. Conventionally, aquifer parameters are obtained by the analysis of pumping test data, which provide limited spatial information and are costly and time consuming. Vertical electrical soundings and pump tests of boreholes were conducted to delineate the aquifer system in the western part of the Maira area, Khyber Pakhtunkhwa, Pakistan. Aquifer lithology in the eastern part of the study area is dominated by coarse sand and gravel, whereas the western part is characterized by fine sand. An attempt has been made to estimate the hydraulic conductivity of the aquifer system by establishing a relationship between the pumping test results and the vertical electrical soundings using a regression technique. The relationship is applied along the resistivity profiles where boreholes have not been drilled. Our findings show a good match between pumped hydraulic conductivity and estimated hydraulic conductivity. Where borehole data are sparse, this regression technique is useful for estimating the hydraulic properties of aquifers with varying lithology.
Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.
Xie, Yanmei; Zhang, Biao
2017-04-20
Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
Ordinal regression models to describe tourist satisfaction with Sintra's world heritage
NASA Astrophysics Data System (ADS)
Mouriño, Helena
2013-10-01
In Tourism Research, ordinal regression models are becoming a very powerful tool for modelling the relationship between an ordinal response variable and a set of explanatory variables. In August and September 2010, we conducted a pioneering tourist survey in Sintra, Portugal. The data were obtained by face-to-face interviews at the entrances of the Palaces and Parks of Sintra. The work developed in this paper focuses on two main points: tourists' perception of the entrance fees, and their overall level of satisfaction with this heritage site. To attain these goals, ordinal regression models were developed. We concluded that tourists' nationality was the only significant variable for describing the perception of the admission fees. Also, Sintra's image among tourists depends not only on their nationality, but also on previous knowledge of Sintra's World Heritage status.
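An ordinal (proportional-odds) regression of this kind can be fitted with the OrderedModel class available in recent statsmodels versions. The survey responses below are simulated stand-ins for the Sintra data:

```python
# Sketch: ordinal (proportional-odds) regression for a satisfaction rating.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(6)
n = 300
knows_heritage = rng.integers(0, 2, n)            # prior knowledge of WH status
latent = 0.8 * knows_heritage + rng.logistic(size=n)
satisfaction = pd.Series(pd.Categorical.from_codes(
    np.digitize(latent, [-0.5, 0.5, 1.5]),        # four ordered levels
    categories=["low", "fair", "good", "high"], ordered=True))

exog = pd.DataFrame({"knows_heritage": knows_heritage})
model = OrderedModel(satisfaction, exog, distr="logit")
print(model.fit(method="bfgs", disp=False).summary())
```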
NASA Astrophysics Data System (ADS)
Kneringer, Philipp; Dietz, Sebastian; Mayr, Georg J.; Zeileis, Achim
2017-04-01
Low-visibility conditions have a large impact on aviation safety and the economic efficiency of airports and airlines. To support decision makers, we develop a statistical probabilistic nowcasting tool for the occurrence of capacity-reducing operations related to low visibility. The probabilities of four different low-visibility classes are predicted with an ordered logistic regression model based on time series of meteorological point measurements. Potential predictor variables for the statistical models are visibility, humidity, temperature and wind measurements at several measurement sites. A stepwise variable selection method indicates that visibility and humidity measurements are the most important model inputs. The forecasts are tested with a 30-minute forecast interval up to two hours, which is a sufficient time span for tactical planning at Vienna Airport. The ordered logistic regression models outperform persistence and are competitive with human forecasters.
An algebraic method for constructing stable and consistent autoregressive filters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harlim, John, E-mail: jharlim@psu.edu; Department of Meteorology, the Pennsylvania State University, University Park, PA 16802; Hong, Hoon, E-mail: hong@ncsu.edu
2015-02-15
In this paper, we introduce an algebraic method to construct stable and consistent univariate autoregressive (AR) models of low order for filtering and predicting nonlinear turbulent signals with memory depth. By stable, we refer to the classical stability condition for the AR model. By consistent, we refer to the classical consistency constraints of Adams–Bashforth methods of order two. One attractive feature of this algebraic method is that the model parameters can be obtained without directly knowing any training data set, as opposed to many standard, regression-based parameterization methods; it takes only long-time average statistics as inputs. The proposed method provides a discretization time-step interval which guarantees the existence of a stable and consistent AR model and simultaneously produces the parameters for the AR models. In our numerical examples with two chaotic time series with different characteristics of decaying time scales, we find that the proposed AR models produce significantly more accurate short-term predictive skill and comparable filtering skill relative to the linear regression-based AR models. These encouraging results are robust across wide ranges of discretization times, observation times, and observation noise variances. Finally, we also find that the proposed model produces an improved short-time prediction relative to the linear regression-based AR models in forecasting a data set that characterizes the variability of the Madden–Julian Oscillation, a dominant tropical atmospheric wave pattern.
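For reference, the classical regression-based route to an AR model, and the stability check the authors' condition refers to, can be sketched as follows. This is the standard least-squares fit, not the paper's algebraic construction:

```python
# Sketch: least-squares AR(2) fit plus the classical stability check.
import numpy as np

rng = np.random.default_rng(7)
T = 2000
x = np.zeros(T)
for t in range(2, T):                      # simulate a stable AR(2) signal
    x[t] = 0.5 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()

# Regress x_t on (x_{t-1}, x_{t-2}) to estimate the AR coefficients.
X = np.column_stack([x[1:-1], x[:-2]])
a = np.linalg.lstsq(X, x[2:], rcond=None)[0]

# Stability: eigenvalues of the companion matrix must lie inside the unit circle.
companion = np.array([[a[0], a[1]], [1.0, 0.0]])
eig = np.linalg.eigvals(companion)
print("coefficients:", a.round(3), "stable:", bool(np.all(np.abs(eig) < 1)))
```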
A controlled experiment in ground water flow model calibration
Hill, M.C.; Cooley, R.L.; Pollock, D.W.
1998-01-01
Nonlinear regression was introduced to ground water modeling in the 1970s, but has been used very little to calibrate numerical models of complicated ground water systems. Apparently, nonlinear regression is thought by many to be incapable of addressing such complex problems. With what we believe to be the most complicated synthetic test case used for such a study, this work investigates using nonlinear regression in ground water model calibration. Results of the study fall into two categories. First, the study demonstrates how systematic use of a well designed nonlinear regression method can indicate the importance of different types of data and can lead to successive improvement of models and their parameterizations. Our method differs from previous methods presented in the ground water literature in that (1) weighting is more closely related to expected data errors than is usually the case; (2) defined diagnostic statistics allow for more effective evaluation of the available data, the model, and their interaction; and (3) prior information is used more cautiously. Second, our results challenge some commonly held beliefs about model calibration. For the test case considered, we show that (1) field measured values of hydraulic conductivity are not as directly applicable to models as their use in some geostatistical methods imply; (2) a unique model does not necessarily need to be identified to obtain accurate predictions; and (3) in the absence of obvious model bias, model error was normally distributed. The complexity of the test case involved implies that the methods used and conclusions drawn are likely to be powerful in practice.
Coley, Rebekah Levine; Lombardi, Caitlin McPherran
2012-01-01
This study assessed whether previous findings linking early maternal employment to lower cognitive and behavioral skills among middle-class and White children generalized to other groups. Using a representative sample of urban, low-income, predominantly African American and Hispanic families (n = 444), OLS regression and propensity score matching models assessed links between maternal employment in the two years after childbearing and children’s functioning at age 7. Children whose mothers were employed early, particularly in their first 8 months, showed enhanced socio-emotional functioning compared to peers whose mothers remained nonemployed. Protective associations emerged for both part-time and full-time employment, and were driven by African American children, with neutral effects for Hispanics. Informal home-based child care also heightened positive links. PMID:22931466
Multicollinearity and Regression Analysis
NASA Astrophysics Data System (ADS)
Daoud, Jamal I.
2017-12-01
In regression analysis, correlation between the response and the predictor(s) is expected, but correlation among the predictors is undesirable. The number of predictors included in a regression model depends on many factors, among them historical data and experience. Ultimately, the selection of the most important predictors is a judgement made by the researcher. Multicollinearity is a phenomenon in which two or more predictors are correlated; when this happens, the standard errors of the coefficients increase [8]. Inflated standard errors mean that the coefficients of some or all independent variables may be found not to differ significantly from zero. In other words, by overinflating the standard errors, multicollinearity makes some variables statistically insignificant when they should be significant. In this paper we focus on multicollinearity, its causes, and its consequences for the reliability of the regression model.
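Multicollinearity is commonly screened with variance inflation factors; statsmodels provides a helper for this. A minimal sketch on simulated predictors, two of which are nearly collinear:

```python
# Sketch: diagnosing multicollinearity with variance inflation factors (VIF).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(8)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)      # nearly collinear with x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
for i in range(1, X.shape[1]):               # skip the constant column
    print(f"VIF(x{i}) = {variance_inflation_factor(X, i):.1f}")
```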
Development of a Standalone Thermal Wellbore Simulator
NASA Astrophysics Data System (ADS)
Xiong, Wanqiang
With the continuous development of increasingly sophisticated wells in the petroleum industry, wellbore modeling and simulation have received growing attention. Especially in unconventional oil and gas recovery processes, there is a growing demand for more accurate wellbore modeling. Despite notable advancements in wellbore modeling, none of the existing wellbore simulators has been as successful as reservoir simulators such as Eclipse and CMG's, and further research on issues such as accurate heat-loss modeling and multi-tubing wellbore modeling is necessary. A series of mathematical equations, including the main governing equations, auxiliary equations, PVT equations, thermodynamic equations, drift-flux model equations, and wellbore heat-loss equations, was collected and screened from publications. Based on these modeling equations, workflows for wellbore simulation and software development are proposed. Research was conducted on the key steps in developing a wellbore simulator: discretization, the grid system, the solution method, the linear equation solver, and the computer language. A standalone thermal wellbore simulator was developed in standard C++. This wellbore simulator can simulate single-phase injection and production, two-phase steam injection, and two-phase oil and water production. By implementing a multi-part scheme that divides a wellbore with a sophisticated configuration into several relatively simple simulation units, the simulator can handle complex wellbores: wellbores with multistage casings, horizontal wells, multilateral wells and double tubing. To improve the accuracy of heat-loss calculations to the surrounding formations, a semi-numerical method is proposed and a series of FLUENT simulations was conducted in this study. This semi-numerical method extends the 2D formation heat-transfer simulation to include the casing wall and cement, and adopts new correlations regressed in this study. Meanwhile, a correlation for heat transfer in the double-tubing annulus was regressed; this work initiates research on heat transfer in double-tubing wellbore systems. A series of validation and test runs was performed for hot-water injection, steam injection, real field data, a horizontal well, a double-tubing well, and comparison with the Ramey method. The program also performs well in matching measured field data and in simulating horizontal and double-tubing wells.
Xing, Jian; Burkom, Howard; Tokars, Jerome
2011-12-01
Automated surveillance systems require statistical methods to recognize increases in visit counts that might indicate an outbreak. In prior work we presented methods to enhance the sensitivity of C2, a commonly used time series method. In this study, we compared the enhanced C2 method with five regression models. We used emergency department chief-complaint data from the US CDC BioSense surveillance system, aggregated by city (a total of 206 hospitals in 16 cities) during 5/2008-4/2009. Data for six syndromes (asthma, gastrointestinal, nausea and vomiting, rash, respiratory, and influenza-like illness) were used and were stratified by mean count (1-19, 20-49, ≥50 per day) into 14 syndrome-count categories. We compared the sensitivity for detecting single-day, artificially added increases in syndrome counts. Four modifications of the C2 time series method and five regression models (two linear and three Poisson) were tested. A constant alert rate of 1% was used for all methods. Among the regression models tested, we found that a Poisson model controlling for the logarithm of total visits (i.e., visits both meeting and not meeting a syndrome definition), day of week, and 14-day time period was best. Among the 14 syndrome-count categories, time series and regression methods produced approximately the same sensitivity (<5% difference) in six; in six categories, the regression method had higher sensitivity (range 6-14% improvement), and in two categories the time series method had higher sensitivity. When automated data are aggregated to the city level, a Poisson regression model that controls for total visits produces the best overall sensitivity for detecting artificially added visit counts. This improvement was achieved without increasing the alert rate, which was held constant at 1% for all methods. These findings will improve our ability to detect outbreaks in automated surveillance system data. Published by Elsevier Inc.
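The preferred model corresponds to a Poisson GLM with log total visits and day-of-week terms as covariates. A simplified sketch on simulated daily counts; the 14-day time-period terms are omitted:

```python
# Sketch: Poisson regression for daily syndrome counts, controlling for
# log(total visits) and day of week (simplified from the abstract's model).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(9)
days = pd.date_range("2008-05-01", periods=365, freq="D")
total = rng.integers(800, 1200, len(days))            # all ED visits per day
mu = np.exp(-4 + 0.9 * np.log(total) + 0.1 * (days.dayofweek == 0))
y = rng.poisson(mu)                                   # syndrome counts

dow = pd.get_dummies(days.dayofweek, prefix="dow", drop_first=True).astype(float)
X = pd.concat([pd.Series(np.log(total), name="log_total"), dow], axis=1)
X = sm.add_constant(X)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params.round(3))
```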
Fast and exact Newton and Bidirectional fitting of Active Appearance Models.
Kossaifi, Jean; Tzimiropoulos, Yorgos; Pantic, Maja
2016-12-21
Active Appearance Models (AAMs) are generative models of shape and appearance that have proven very attractive for their ability to handle wide changes in illumination, pose and occlusion when trained in the wild, while not requiring the large training datasets of regression-based or deep-learning methods. The problem of fitting an AAM is usually formulated as a non-linear least squares problem, and the main way of solving it is the standard Gauss-Newton algorithm. In this paper we extend Active Appearance Models in two ways: we first extend the Gauss-Newton framework by formulating a bidirectional fitting method that deforms both the image and the template to fit a new instance. We then formulate a second-order method by deriving an efficient Newton method for AAM fitting. We derive both methods in a unified framework for two types of Active Appearance Models, holistic and part-based, and additionally show how to exploit the structure of the problem to derive fast yet exact solutions. We perform a thorough evaluation of all algorithms on three challenging and recently annotated in-the-wild datasets, and investigate fitting accuracy, convergence properties and the influence of noise in the initialisation. We compare our proposed methods to other algorithms and show that they yield state-of-the-art results, outperforming other methods while having superior convergence properties.
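The Gauss-Newton solver at the core of AAM fitting is generic nonlinear least squares: linearize the residual, solve the normal equations, iterate. A scalar-model sketch, not the AAM warp itself:

```python
# Sketch: generic Gauss-Newton iteration for nonlinear least squares.
import numpy as np

rng = np.random.default_rng(10)
t = np.linspace(0, 4, 50)
y = 2.0 * np.exp(-1.3 * t) + rng.normal(0, 0.02, t.size)  # data from a*exp(b*t)

p = np.array([1.0, -1.0])                                 # initial (a, b)
for _ in range(20):
    model = p[0] * np.exp(p[1] * t)
    r = y - model                                         # residual vector
    J = np.column_stack([np.exp(p[1] * t),                # d model / d a
                         p[0] * t * np.exp(p[1] * t)])    # d model / d b
    step = np.linalg.solve(J.T @ J, J.T @ r)              # normal equations
    p += step
    if np.linalg.norm(step) < 1e-10:
        break
print("estimated (a, b):", p.round(3))
```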
Multivariate Boosting for Integrative Analysis of High-Dimensional Cancer Genomic Data
Xiong, Lie; Kuan, Pei-Fen; Tian, Jianan; Keles, Sunduz; Wang, Sijian
2015-01-01
In this paper, we propose a novel multivariate component-wise boosting method for fitting multivariate response regression models under the high-dimension, low sample size setting. Our method is motivated by modeling the association among different biological molecules based on multiple types of high-dimensional genomic data. Particularly, we are interested in two applications: studying the influence of DNA copy number alterations on RNA transcript levels and investigating the association between DNA methylation and gene expression. For this purpose, we model the dependence of the RNA expression levels on DNA copy number alterations and the dependence of gene expression on DNA methylation through multivariate regression models and utilize boosting-type method to handle the high dimensionality as well as model the possible nonlinear associations. The performance of the proposed method is demonstrated through simulation studies. Finally, our multivariate boosting method is applied to two breast cancer studies. PMID:26609213
Applicability of Cameriere's and Drusini's age estimation methods to a sample of Turkish adults.
Hatice, Boyacioglu Dogru; Nihal, Avcu; Nursel, Akkaya; Humeyra Ozge, Yilanci; Goksuluk, Dincer
2017-10-01
The aim of this study was to investigate the applicability of Drusini's and Cameriere's methods to a sample of Turkish people. Panoramic images of 200 individuals were allocated into two groups, a study group and a test group, and examined by two observers. Tooth coronal indices (TCI), the ratio between coronal pulp cavity height and crown height, were calculated for the mandibular first and second premolars and molars. Pulp/tooth area ratios (ARs) were calculated for the maxillary and mandibular canine teeth. Study group measurements were used to derive a regression model. Test group measurements were used to evaluate the accuracy of the regression model. Pearson's correlation coefficients and regression analysis were used. The correlations between TCIs and age were -0.230, -0.301, -0.344 and -0.257 for the mandibular first premolar, second premolar, first molar and second molar, respectively. Those for the maxillary canine (MX) and mandibular canine (MN) ARs were -0.716 and -0.514, respectively. The MX ARs were used to build a linear regression model that explained 51.2% of the total variation, with a standard error of 9.23 years. The mean error of the estimates in the test group was 8 years, and the ages of 64% of the individuals were estimated with an error of <±10 years, which is acceptable in forensic age prediction. The low correlation coefficients between age and TCI indicate that Drusini's method is not applicable to the estimation of age in a Turkish population. Using Cameriere's method, we derived a regression model.
Ground effects in FAA's Integrated Noise Model
DOT National Transportation Integrated Search
2000-01-01
The lateral attenuation algorithm in the Federal Aviation Administration's (FAA) Integrated Noise Model (INM) has historically been based on the two regression equations described in the Society of Automotive Engineers' (SAE) Aerospace Information Re...
NASA Astrophysics Data System (ADS)
Mfumu Kihumba, Antoine; Ndembo Longo, Jean; Vanclooster, Marnik
2016-03-01
A multivariate statistical modelling approach was applied to explain the anthropogenic pressure of nitrate pollution on the Kinshasa groundwater body (Democratic Republic of Congo). Multiple regression and regression tree models were compared and used to identify major environmental factors that control the groundwater nitrate concentration in this region. The analyses were made in terms of physical attributes related to the topography, land use, geology and hydrogeology in the capture zone of different groundwater sampling stations. For the nitrate data, groundwater datasets from two different surveys were used. The statistical models identified the topography, the residential area, the service land (cemetery), and the surface-water land-use classes as major factors explaining nitrate occurrence in the groundwater. Also, groundwater nitrate pollution depends not on one single factor but on the combined influence of factors representing nitrogen loading sources and aquifer susceptibility characteristics. The groundwater nitrate pressure was better predicted with the regression tree model than with the multiple regression model. Furthermore, the results elucidated the sensitivity of the model performance towards the method of delineation of the capture zones. For pollution modelling at the monitoring points, therefore, it is better to identify capture-zone shapes based on a conceptual hydrogeological model rather than to adopt arbitrary circular capture zones.
The prediction of intelligence in preschool children using alternative models to regression.
Finch, W Holmes; Chang, Mei; Davis, Andrew S; Holden, Jocelyn E; Rothlisberg, Barbara A; McIntosh, David E
2011-12-01
Statistical prediction of an outcome variable using multiple independent variables is a common practice in the social and behavioral sciences. For example, neuropsychologists are sometimes called upon to provide predictions of preinjury cognitive functioning for individuals who have suffered a traumatic brain injury. Typically, these predictions are made using standard multiple linear regression models with several demographic variables (e.g., gender, ethnicity, education level) as predictors. Prior research has shown conflicting evidence regarding the ability of such models to provide accurate predictions of outcome variables such as full-scale intelligence (FSIQ) test scores. The present study had two goals: (1) to demonstrate the utility of a set of alternative prediction methods that have been applied extensively in the natural sciences and business but have not been frequently explored in the social sciences and (2) to develop models that can be used to predict premorbid cognitive functioning in preschool children. Predictions of Stanford-Binet 5 FSIQ scores for preschool-aged children is used to compare the performance of a multiple regression model with several of these alternative methods. Results demonstrate that classification and regression trees provided more accurate predictions of FSIQ scores than does the more traditional regression approach. Implications of these results are discussed.
Galindo-Romero, Marta; Lippert, Tristan; Gavrilov, Alexander
2015-12-01
This paper presents an empirical linear equation to predict peak pressure level of anthropogenic impulsive signals based on its correlation with the sound exposure level. The regression coefficients are shown to be weakly dependent on the environmental characteristics but governed by the source type and parameters. The equation can be applied to values of the sound exposure level predicted with a numerical model, which provides a significant improvement in the prediction of the peak pressure level. Part I presents the analysis for airgun arrays signals, and Part II considers the application of the empirical equation to offshore impact piling noise.
Information, Avoidance Behavior, and Health: The Effect of Ozone on Asthma Hospitalizations
ERIC Educational Resources Information Center
Neidell, Matthew
2009-01-01
This paper assesses whether responses to information about risk impact estimates of the relationship between ozone and asthma in Southern California. Using a regression discontinuity design, I find smog alerts significantly reduce daily attendance at two major outdoor facilities. Using daily time-series regression models that include year-month…
Using within-day hive weight changes to measure environmental effects on honey bee colonies
USDA-ARS?s Scientific Manuscript database
Patterns in within-day hive weight data from two independent datasets in Arizona and California were modeled using piecewise regression, and analyzed with respect to honey bee colony behavior and landscape effects. The regression analysis yielded information on the start and finish of a colony’s dai...
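With a known breakpoint, piecewise linear regression reduces to ordinary least squares on a hinge term. A minimal sketch with a hypothetical within-day weight pattern and an assumed breakpoint:

```python
# Sketch: piecewise linear regression with one known breakpoint.
import numpy as np

rng = np.random.default_rng(11)
hour = np.linspace(6, 20, 100)
bp = 9.0                                   # assumed breakpoint (foraging start)
weight = 50 - 0.4 * np.clip(hour - 6, 0, bp - 6) + 0.15 * np.maximum(hour - bp, 0)
weight += rng.normal(0, 0.05, hour.size)   # hive weight in kg (simulated)

# Design matrix: intercept, hour, and the hinge max(hour - bp, 0).
X = np.column_stack([np.ones_like(hour), hour, np.maximum(hour - bp, 0.0)])
coef = np.linalg.lstsq(X, weight, rcond=None)[0]
print("slope before bp:", round(coef[1], 3),
      "slope after bp:", round(coef[1] + coef[2], 3))
```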
Correlation Weights in Multiple Regression
ERIC Educational Resources Information Center
Waller, Niels G.; Jones, Jeff A.
2010-01-01
A general theory on the use of correlation weights in linear prediction has yet to be proposed. In this paper we take initial steps in developing such a theory by describing the conditions under which correlation weights perform well in population regression models. Using OLS weights as a comparison, we define cases in which the two weighting…
Regression Models of Quarterly Overhead Costs for Six Government Aerospace Contractors.
1986-03-01
These two estimators, found to be superior, are both two-stage estimators that are calculated utilizing Wallis's test statistic for fourth-order autocorrelation.
NASA Astrophysics Data System (ADS)
Al-Harrasi, Ahmed; Rehman, Najeeb Ur; Mabood, Fazal; Albroumi, Muhammaed; Ali, Liaqat; Hussain, Javid; Hussain, Hidayat; Csuk, René; Khan, Abdul Latif; Alam, Tanveer; Alameri, Saif
2017-09-01
In the present study, for the first time, NIR spectroscopy coupled with PLS regression was developed as a rapid alternative method to quantify the amount of keto-β-boswellic acid (KBA) in different plant parts of Boswellia sacra and in the resin exudates of the trunk. NIR spectroscopy was used to measure KBA standards and B. sacra samples in absorption mode over the wavelength range 700-2500 nm. A PLS regression model was built from the obtained spectral data using 70% of the KBA standards (training set) in the range from 0.1 ppm to 100 ppm. The PLS regression model had an R-squared value of 98% (correlation 0.99) and good predictive ability, with an RMSEP of 3.2 and a prediction correlation of 0.99. It was then used to quantify the amount of KBA in the samples of B. sacra. The results indicated that the MeOH extract of the resin has the highest concentration of KBA (0.6%), followed by the essential oil (0.1%); no KBA was found in the aqueous extract. The MeOH extract of the resin was subjected to column chromatography to obtain sub-fractions at different polarities of organic solvent. The sub-fraction at 4% MeOH/CHCl3 (4.1% KBA) contained the highest percentage of KBA, followed by the sub-fraction at 2% MeOH/CHCl3 (2.2% KBA). The present results also indicated that KBA is present only in the gum-resin of the trunk and not in all parts of the plant. These results were further confirmed through HPLC analysis, and it is therefore concluded that NIRS coupled with PLS regression is a rapid alternative method for quantification of KBA in Boswellia sacra. It is non-destructive, rapid, sensitive and uses simple methods of sample preparation.
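The calibration workflow (spectra in, concentration out) maps onto scikit-learn's PLSRegression. The sketch below uses simulated NIR-like spectra, not the study's measurements, and an arbitrary number of components:

```python
# Sketch: PLS regression calibration of concentration from NIR-like spectra.
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(12)
n, wavelengths = 120, 400
conc = rng.uniform(0.1, 100, n)                    # KBA standards, ppm (assumed)
peak = np.exp(-0.5 * ((np.arange(wavelengths) - 150) / 20) ** 2)
spectra = np.outer(conc, peak) + rng.normal(0, 0.5, (n, wavelengths))

X_tr, X_te, y_tr, y_te = train_test_split(spectra, conc, train_size=0.7,
                                          random_state=0)
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)
pred = pls.predict(X_te).ravel()
print("R2:", round(r2_score(y_te, pred), 3),
      "RMSEP:", round(mean_squared_error(y_te, pred) ** 0.5, 2))
```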
Stuart P. Cottrell; Alan R. Graefe
1995-01-01
This paper examines predictors of boater behavior in a specific behavior situation, namely the percentage of raw sewage discharged from recreational vessels at a sanitation pumpout facility on the Chesapeake Bay. Results of a multiple regression analysis show that knowledge predicts behavior in specific issue situations. In addition, the more specific the...
ERIC Educational Resources Information Center
Campbell, Tammy
2017-01-01
This paper tests the hypothesis that stream placement influences teacher judgements of pupils, thus investigating a route through which streaming by "ability" may contribute to inequalities. Regression modelling of data for 800+ 7-year-olds taking part in the Millennium Cohort Study examines whether teachers' reported perceptions of…
Mark Chopping; Anne Nolin; Gretchen G. Moisen; John V. Martonchik; Michael Bull
2009-01-01
In this study, retrievals of forest canopy height were obtained through adjustment of a simple geometric-optical (GO) model against red-band surface bidirectional reflectance estimates from NASA's Multiangle Imaging SpectroRadiometer (MISR), mapped to a 250 m grid. The soil-understory background contribution was partly isolated prior to inversion using regression...
Heterosexual Risk Behaviors Among Urban Young Adolescents
ERIC Educational Resources Information Center
O'Donnell, Lydia; Stueve, Ann; Wilson-Simmons, Renee; Dash, Kim; Agronick, Gail; JeanBaptiste, Varzi
2006-01-01
Urban 6th graders (n = 294) participate in a survey assessing early heterosexual risk behaviors as part of the Reach for Health Middle Childhood Study. About half the boys (47%) and 20% of girls report having a girlfriend or boyfriend; 42% of boys and 10% of girls report kissing and hugging for a long time. Stepwise regressions model the…
Composites from southern pine juvenile wood. Part 3. Juvenile and mature wood furnish mixtures
A.D. Pugel; E.W. Price; Chung-Yun Hse; T.F. Shupe
2004-01-01
Composite panels made from mixtures of mature and juvenile southern pine (Pinus taeda L.) were evaluated for initial mechanical properties and dimensional stability. The effect that the proportion of juvenile wood had on panel properties was analyzed by regression and rule-of-mixtures models. The mixed furnish data: 1) highlighted the degree to which...
Hot callusing for propagation of American beech by grafting
David W. Carey; Mary E. Mason; Paul Bloese; Jennifer L. Koch
2013-01-01
To increase grafting success rate, a hot callus grafting system was designed and implemented as part of a multiagency collaborative project to manage beech bark disease (BBD) through the establishment of regional BBD-resistant grafted seed orchards. Five years of data from over 2000 hot callus graft attempts were analyzed using a logistic regression model to determine...
Stoichev, T; Tessier, E; Amouroux, D; Almeida, C M; Basto, M C P; Vasconcelos, V M
2016-11-15
Spatial and seasonal variations in the aqueous concentrations and distributions of mercury species were investigated during six sampling campaigns at four locations within Laranjo Bay, the most mercury-contaminated area of the Aveiro Lagoon (Portugal). Inorganic mercury (IHg(II)) and methylmercury (MeHg) were determined in filter-retained (IHgPART, MeHgPART) and filtered (<0.45μm) fractions (IHg(II)DISS, MeHgDISS). The concentrations of IHgPART depended on site and on dilution with downstream particles. Similar processes were evidenced for MeHgPART; however, its concentrations increased for particles rich in phaeophytin (Pha). The concentrations of MeHgDISS, and especially those of IHg(II)DISS, increased with Pha concentrations in the water. Multiple regression models are able to describe MeHgPART, IHg(II)DISS and MeHgDISS concentrations, with salinity and Pha concentrations exhibiting additive statistical effects and allowing separation of possible addition and removal processes. A link between phytoplankton/algae and consumers' grazing pressure in the contaminated area may act to increase concentrations of IHg(II)DISS and MeHgPART. These processes could lead to suspended particles enriched with MeHg and to enhanced IHg(II) and MeHg availability in surface waters, with higher transfer to the food web. Copyright © 2016 Elsevier B.V. All rights reserved.
Use of probabilistic weights to enhance linear regression myoelectric control
NASA Astrophysics Data System (ADS)
Smith, Lauren H.; Kuiken, Todd A.; Hargrove, Levi J.
2015-12-01
Objective. Clinically available prostheses for transradial amputees do not allow simultaneous myoelectric control of degrees of freedom (DOFs). Linear regression methods can provide simultaneous myoelectric control, but frequently also result in difficulty with isolating individual DOFs when desired. This study evaluated the potential of using probabilistic estimates of categories of gross prosthesis movement, which are commonly used in classification-based myoelectric control, to enhance linear regression myoelectric control. Approach. Gaussian models were fit to electromyogram (EMG) feature distributions for three movement classes at each DOF (no movement, or movement in either direction) and used to weight the output of linear regression models by the probability that the user intended the movement. Eight able-bodied and two transradial amputee subjects worked in a virtual Fitts’ law task to evaluate differences in controllability between linear regression and probability-weighted regression for an intramuscular EMG-based three-DOF wrist and hand system. Main results. Real-time and offline analyses in able-bodied subjects demonstrated that probability weighting improved performance during single-DOF tasks (p < 0.05) by preventing extraneous movement at additional DOFs. Similar results were seen in experiments with two transradial amputees. Though goodness-of-fit evaluations suggested that the EMG feature distributions showed some deviations from the Gaussian, equal-covariance assumptions used in this experiment, the assumptions were sufficiently met to provide improved performance compared to linear regression control. Significance. Use of probability weights can improve the ability to isolate individual DOFs during linear regression myoelectric control, while maintaining the ability to simultaneously control multiple DOFs.
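The weighting idea can be sketched for a single DOF: Gaussian class models give the posterior probability that movement was intended, and the regression output is damped by the probability of the no-movement class. Everything below (features, class means, shared covariance, equal priors) is a simplified stand-in for the paper's pipeline:

```python
# Sketch: probability-weighted regression output for one DOF.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(13)

# Training features (2-D EMG features) for three classes at this DOF:
# 0 = no movement, 1 = flexion, 2 = extension (simulated).
means = {0: [0, 0], 1: [3, 1], 2: [-3, 1]}
feats = {c: rng.normal(m, 1.0, (200, 2)) for c, m in means.items()}

# One Gaussian per class, with a shared covariance estimated from pooled data.
pooled = np.vstack(list(feats.values()))
centers = np.repeat([means[c] for c in feats], 200, axis=0)
cov = np.cov(pooled - centers, rowvar=False)
gauss = {c: multivariate_normal(np.mean(f, axis=0), cov) for c, f in feats.items()}

def weighted_output(x, regression_out):
    """Scale the linear-regression velocity by P(movement intended | features)."""
    lik = np.array([gauss[c].pdf(x) for c in (0, 1, 2)])
    post = lik / lik.sum()                   # equal class priors assumed
    return (1.0 - post[0]) * regression_out  # damp output when 'no movement' likely

print(weighted_output(np.array([0.1, 0.0]), 0.8))   # near rest: heavily damped
print(weighted_output(np.array([3.0, 1.0]), 0.8))   # clear flexion: mostly passed
```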
Techniques for estimating magnitude and frequency of peak flows for Pennsylvania streams
Stuckey, Marla H.; Reed, Lloyd A.
2000-01-01
Regression equations for estimating the magnitude and frequency of floods on ungaged streams in Pennsylvania with drainage areas less than 2,000 square miles were developed on the basis of peak-flow data collected at 313 streamflow-gaging stations. All streamflow-gaging stations used in the development of the equations had 10 or more years of record and include active and discontinued continuous-record and crest-stage partial-record streamflow-gaging stations. Regional regression equations were developed for flood flows expected every 10, 25, 50, 100, and 500 years by the use of a weighted multiple linear regression model. The State was divided into two regions. The larger region, Region A, encompasses about 78 percent of Pennsylvania. The smaller region, Region B, includes only the northwestern part of the State. Basin characteristics used in the regression equations for Region A are drainage area, percentage of forest cover, percentage of urban development, percentage of basin underlain by carbonate bedrock, and percentage of basin controlled by lakes, swamps, and reservoirs. Basin characteristics used in the regression equations for Region B are drainage area and percentage of basin controlled by lakes, swamps, and reservoirs. The coefficient of determination (R2) values for the five flood-frequency equations range from 0.93 to 0.82 for Region A and from 0.96 to 0.89 for Region B. While the regression equations can be used to predict the magnitude and frequency of peak flows for most streams in the State, they should not be used for streams with drainage areas greater than 2,000 square miles or less than 1.5 square miles, for streams that drain extensively mined areas, or for stream reaches immediately below flood-control reservoirs. In addition, the equations presented for Region B should not be used if the stream drains a basin with more than 5 percent urban development.
Individualized prediction of lung-function decline in chronic obstructive pulmonary disease
Zafari, Zafar; Sin, Don D.; Postma, Dirkje S.; Löfdahl, Claes-Göran; Vonk, Judith; Bryan, Stirling; Lam, Stephen; Tammemagi, C. Martin; Khakban, Rahman; Man, S.F. Paul; Tashkin, Donald; Wise, Robert A.; Connett, John E.; McManus, Bruce; Ng, Raymond; Hollander, Zsuszanna; Sadatsafavi, Mohsen
2016-01-01
Background: The rate of lung-function decline in chronic obstructive pulmonary disease (COPD) varies substantially among individuals. We sought to develop and validate an individualized prediction model for forced expiratory volume in 1 second (FEV1) in current smokers with mild-to-moderate COPD. Methods: Using data from a large long-term clinical trial (the Lung Health Study), we derived mixed-effects regression models to predict future FEV1 values over 11 years according to clinical traits. We modelled heterogeneity by allowing regression coefficients to vary across individuals. Two independent cohorts with COPD were used to validate the equations. Results: We used data from 5594 patients (mean age 48.4 yr, 63% men, mean baseline FEV1 2.75 L) to create the individualized prediction equations. There was significant between-individual variability in the rate of FEV1 decline, with the interval for the annual rate of decline that contained 95% of individuals being −124 to −15 mL/yr for smokers and −83 to 15 mL/yr for sustained quitters. Clinical variables in the final model explained 88% of the variation around follow-up FEV1. The C statistic for predicting severity grades was 0.90. The prediction equations performed robustly in the 2 external data sets. Interpretation: A substantial part of individual variation in FEV1 decline can be explained by easily measured clinical variables. The model developed in this work can be used for prediction of future lung health in patients with mild-to-moderate COPD. Trial registration: Lung Health Study — ClinicalTrials.gov, no. NCT00000568; Pan-Canadian Early Detection of Lung Cancer Study — ClinicalTrials.gov, no. NCT00751660 PMID:27486205
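Random-slope mixed-effects models of this kind can be fitted with statsmodels. A minimal sketch on simulated FEV1-like trajectories, not the Lung Health Study data:

```python
# Sketch: random-intercept, random-slope model for FEV1 decline.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(14)
n_sub, n_visits = 200, 6
sub = np.repeat(np.arange(n_sub), n_visits)
year = np.tile(np.arange(n_visits, dtype=float), n_sub)
slope = rng.normal(-0.06, 0.02, n_sub)        # L/yr, varies across individuals
base = rng.normal(2.75, 0.4, n_sub)           # baseline FEV1 in litres
fev1 = base[sub] + slope[sub] * year + rng.normal(0, 0.05, sub.size)

df = pd.DataFrame({"fev1": fev1, "year": year, "subject": sub})
model = smf.mixedlm("fev1 ~ year", df, groups="subject", re_formula="~year")
print(model.fit().summary())
```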
[Effect of social desirability on dietary intake estimated from a food questionnaire].
Barros, Renata; Moreira, Pedro; Oliveira, Bruno
2005-01-01
Self-reported dietary intake can be biased by social desirability, thus affecting risk estimates in epidemiological studies. The objective of this study was to assess the effect of social desirability on dietary intake estimated from a food frequency questionnaire (FFQ). A convenience sample of 483 Portuguese university students was recruited. Subjects were invited to complete a two-part self-administered questionnaire: the first part included the Marlowe-Crowne Social Desirability Scale (M-CSDS), a physical activity questionnaire and self-reported height and weight; the second part included a semi-quantitative FFQ validated for Portuguese adults, to be returned after completion. All subjects completed the first part of the questionnaire and 40.4% returned the FFQ adequately completed. In multiple regression analysis, after adjustment for energy and confounders, social desirability produced a significant positive effect on the estimates of dietary fibre, vitamin C, vitamin E, magnesium and potassium, in both genders. After the same adjustment, social desirability had a significant positive effect on the estimates of vegetable consumption, for both genders, and a negative effect on white bread and beer, for women. Social desirability affected nutrient and food intake estimated from a food frequency questionnaire.
NASA Technical Reports Server (NTRS)
Batterson, J. G.
1986-01-01
The successful parametric modeling of the aerodynamics of an airplane operating at high angles of attack or sideslip is performed in two phases. First, the aerodynamic model structure must be determined; second, the associated aerodynamic parameters (stability and control derivatives) must be estimated for that model. The purpose of this paper is to document two versions of a stepwise regression computer program which were developed for the determination of airplane aerodynamic model structure, and to provide two examples of their use on computer-generated data. References are provided for the application of the programs to real flight data. The two computer programs that are the subject of this report, STEP and STEPSPL, are written in FORTRAN IV (ANSI 1966) compatible with a CDC FTN4 compiler. Both programs are adaptations of a standard forward stepwise regression algorithm. The purpose of the adaptation is to facilitate the selection of an adequate mathematical model of the aerodynamic force and moment coefficients of an airplane from flight test data. The major difference between STEP and STEPSPL is the basis for the model. The basis for the model in STEP is the standard polynomial Taylor series expansion of the aerodynamic function about some steady-state trim condition. Program STEPSPL utilizes a set of spline basis functions.
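A generic forward stepwise selection, the family of algorithms STEP and STEPSPL adapt, can be sketched as follows; this is an illustration, not a port of the FORTRAN programs:

```python
# Sketch: forward stepwise regression by residual sum of squares.
import numpy as np

def forward_stepwise(X, y, min_improvement=1e-3):
    """Greedily add columns of X that reduce RSS the most."""
    n, p = X.shape
    selected, remaining = [], list(range(p))
    rss_current = np.sum((y - y.mean()) ** 2)
    while remaining:
        best = None
        for j in remaining:
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = np.sum((y - A @ beta) ** 2)
            if best is None or rss < best[1]:
                best = (j, rss)
        if rss_current - best[1] < min_improvement * rss_current:
            break                               # no worthwhile improvement
        selected.append(best[0]); remaining.remove(best[0])
        rss_current = best[1]
    return selected

rng = np.random.default_rng(15)
X = rng.normal(size=(300, 8))
y = 2 * X[:, 1] - 1.5 * X[:, 4] + rng.normal(0, 0.5, 300)
print("selected terms:", forward_stepwise(X, y))    # expect [1, 4]
```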
Development and Application of Nonlinear Land-Use Regression Models
NASA Astrophysics Data System (ADS)
Champendal, Alexandre; Kanevski, Mikhail; Huguenot, Pierre-Emmanuel
2014-05-01
The problem of air pollution modelling in urban zones is of great importance both from scientific and applied points of view. At present there are several fundamental approaches, either based on science-based modelling (air pollution dispersion) or on the application of space-time geostatistical methods (e.g. the family of kriging models or conditional stochastic simulations). Recently, there have been important developments in so-called Land Use Regression (LUR) models. These models take into account geospatial information (e.g. traffic network, sources of pollution, average traffic, population census, land use, etc.) at different scales, for example, using buffering operations. Usually the dimension of the input space (number of independent variables) is within the range of 10-100. It was shown that LUR models have some potential to model complex and highly variable patterns of air pollution in urban zones. Most LUR models currently used are linear models. In the present research, nonlinear LUR models are developed and applied for the city of Geneva. Two nonlinear data-driven models were elaborated: a multilayer perceptron and a random forest. An important part of the research also deals with a comprehensive exploratory data analysis using statistical, geostatistical and time series tools. Unsupervised self-organizing maps were applied to better understand space-time patterns of the pollution. The real data case study deals with spatial-temporal air pollution data of Geneva (2002-2011). Nitrogen dioxide (NO2) has caught our attention: it has effects on human health and on plants, and it contributes to the phenomenon of acid rain. The negative effects of nitrogen dioxide on plants are reduced growth, production and pesticide resistance. Finally, regarding effects on materials, nitrogen dioxide increases corrosion. The data used for this study consist of a set of 106 NO2 passive sensors: 80 were used to build the models and the remaining 36 constituted the testing set. Missing data were completed using multiple linear regression, and annual average values of pollutant concentrations were computed. All sensors are dispersed homogeneously over the central urban area of Geneva. The main result of the study is that the nonlinear LUR models developed have demonstrated their efficiency in modelling the complex phenomena of air pollution in urban zones and significantly reduced the testing error in comparison with linear models. Further research deals with the development and application of other nonlinear data-driven models (Kanevski et al. 2009). References: Kanevski M., Pozdnoukhov A. and Timonin V. (2009). Machine Learning for Spatial Environmental Data: Theory, Applications and Software. EPFL Press, Lausanne.
Stature estimation from the lengths of the growing foot-a study on North Indian adolescents.
Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam; DiMaggio, John A
2012-12-01
Stature estimation is considered as one of the basic parameters of the investigation process in unknown and commingled human remains in medico-legal case work. Race, age and sex are the other parameters which help in this process. Stature estimation is of the utmost importance as it completes the biological profile of a person along with the other three parameters of identification. The present research is intended to formulate standards for stature estimation from foot dimensions in adolescent males from North India and study the pattern of foot growth during the growing years. 154 male adolescents from the Northern part of India were included in the study. Besides stature, five anthropometric measurements that included the length of the foot from each toe (T1, T2, T3, T4, and T5 respectively) to pternion were measured on each foot. The data was analyzed statistically using Student's t-test, Pearson's correlation, linear and multiple regression analysis for estimation of stature and growth of foot during ages 13-18 years. Correlation coefficients between stature and all the foot measurements were found to be highly significant and positively correlated. Linear regression models and multiple regression models (with age as a co-variable) were derived for estimation of stature from the different measurements of the foot. Multiple regression models (with age as a co-variable) estimate stature with greater accuracy than the regression models for 13-18 years age group. The study shows the growth pattern of feet in North Indian adolescents and indicates that anthropometric measurements of the foot and its segments are valuable in estimation of stature in growing individuals of that population. Copyright © 2012 Elsevier Ltd. All rights reserved.
Epidemiology of occupational injury among cleaners in the healthcare sector.
Alamgir, Hasanat; Yu, Shicheng
2008-09-01
The cleaning profession has been associated with multiple ergonomic and chemical hazards which elevate the risk of occupational injury. This study investigated the epidemiology of occupational injury among cleaners in healthcare work settings in the Canadian province of British Columbia. Incidents of occupational injury among cleaners resulting in lost time from work or medical care over a period of 1 year in two healthcare regions were extracted from a standardized operational database, with person-years at risk obtained from payroll data. Detailed analysis was conducted using Poisson regression modeling. A total of 145 injuries were identified among cleaners, corresponding to an annual incidence rate of 32.1 per 100 person-years. After adjustment for age, gender, subsector, facility, experience and employment status, Poisson regression models demonstrated that a significantly higher relative risk (RR) of all injury, musculoskeletal injury and cuts was associated with cleaning work in acute care facilities compared with long-term care facilities. Female cleaners were at a higher RR of all injuries and contusions than male cleaners. A lower risk of all injury and of allergy and irritation incidents was found among part-time or casual workers. Cleaners with >10 years of experience were at significantly lower risk of all injury, contusion, and allergy and irritation incidents. Cleaners were found to be at an elevated risk in all injury categories compared with healthcare workers in general.
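The rate analysis described above has a standard form: a Poisson regression of injury counts with log person-years as an offset, so that exponentiated coefficients are relative risks. The sketch below shows that structure on invented data; the covariate coding is a hypothetical simplification.

```python
# Hedged sketch: invented aggregate data, not the British Columbia records.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "injuries":     [40, 25, 30, 50, 18, 22],
    "person_years": [90.0, 110.0, 120.0, 130.0, 60.0, 80.0],
    "acute_care":   [1, 0, 1, 0, 1, 0],    # 1 = acute care, 0 = long-term care
    "female":       [1, 1, 0, 0, 1, 0],
})
fit = smf.glm("injuries ~ acute_care + female", data=df,
              family=sm.families.Poisson(),
              offset=np.log(df["person_years"])).fit()
print(np.exp(fit.params))  # exponentiated coefficients = relative risks (RR)
```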
Insulin Resistance: Regression and Clustering
Yoon, Sangho; Assimes, Themistocles L.; Quertermous, Thomas; Hsiao, Chin-Fu; Chuang, Lee-Ming; Hwu, Chii-Min; Rajaratnam, Bala; Olshen, Richard A.
2014-01-01
In this paper we try to define insulin resistance (IR) precisely for a group of Chinese women. Our definition deliberately does not depend upon body mass index (BMI) or age, although in other studies, with particular random effects models quite different from models used here, BMI accounts for a large part of the variability in IR. We accomplish our goal through application of Gauss mixture vector quantization (GMVQ), a technique for clustering that was developed for application to lossy data compression. Defining data come from measurements that play major roles in medical practice. A precise statement of what the data are is in Section 1. Their family structures are described in detail. They concern levels of lipids and the results of an oral glucose tolerance test (OGTT). We apply GMVQ to residuals obtained from regressions of outcomes of an OGTT and lipids on functions of age and BMI that are inferred from the data. A bootstrap procedure developed for our family data supplemented by insights from other approaches leads us to believe that two clusters are appropriate for defining IR precisely. One cluster consists of women who are IR, and the other of women who seem not to be. Genes and other features are used to predict cluster membership. We argue that prediction with “main effects” is not satisfactory, but prediction that includes interactions may be. PMID:24887437
Louys, Julien; Meloro, Carlo; Elton, Sarah; Ditchfield, Peter; Bishop, Laura C
2015-01-01
We test the performance of two models that use mammalian communities to reconstruct multivariate palaeoenvironments. While both models exploit the correlation between mammal communities (defined in terms of functional groups) and arboreal heterogeneity, the first uses a multiple multivariate regression of community structure and arboreal heterogeneity, while the second uses a linear regression of the principal components of each ecospace. The success of these methods means the palaeoenvironment of a particular locality can be reconstructed in terms of the proportions of heavy, moderate, light, and absent tree canopy cover. The linear regression is less biased, and more precisely and accurately reconstructs heavy tree canopy cover than the multiple multivariate model. However, the multiple multivariate model performs better than the linear regression for all other canopy cover categories. Both models consistently perform better than randomly generated reconstructions. We apply both models to the palaeocommunity of the Upper Laetolil Beds, Tanzania. Our reconstructions indicate that there was very little heavy tree cover at this site (likely less than 10%), with the palaeo-landscape instead comprising a mixture of light and absent tree cover. These reconstructions help resolve the previous conflicting palaeoecological reconstructions made for this site. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley
2007-01-01
Propulsion ground test facilities face the daily challenge of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Over the last decade NASA's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and exceeded the capabilities of numerous test facility and test article components. A logistic regression mathematical modeling technique has been developed to predict the probability of successfully completing a rocket propulsion test. A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X1, X2, ..., Xk to a binary or dichotomous dependent variable Y, where Y can take only one of two possible outcomes, in this case success or failure in accomplishing a full-duration test. The use of logistic regression modeling is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from this type of model provide project managers with insight into and confidence in the effectiveness of rocket propulsion ground testing.
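A minimal sketch of such a model follows; the two predictors are hypothetical stand-ins for facility and test-article variables, not the ones actually used in the NASA model.

```python
# Hedged sketch: synthetic data; 1 = full-duration success, 0 = failure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 200
duration = rng.uniform(0, 600, n)            # planned test duration, s (assumed)
new_article = rng.integers(0, 2, n)          # new test-article configuration (assumed)
X = sm.add_constant(np.column_stack([duration, new_article]))

p_true = 1 / (1 + np.exp(-(2.0 - 0.004 * duration - 1.0 * new_article)))
y = (rng.random(n) < p_true).astype(float)

fit = sm.Logit(y, X).fit(disp=0)
print(fit.predict(X)[:5])                    # estimated P(success) per test
```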
A New SEYHAN's Approach in Case of Heterogeneity of Regression Slopes in ANCOVA.
Ankarali, Handan; Cangur, Sengul; Ankarali, Seyit
2018-06-01
In this study, when the linearity and homogeneity-of-regression-slopes assumptions of conventional ANCOVA are not met, a new approach, named SEYHAN, is suggested that allows conventional ANCOVA to be used instead of robust or nonlinear ANCOVA. The proposed SEYHAN approach involves transformation of the continuous covariate into a categorical structure when the relationship between the covariate and the dependent variable is nonlinear and the regression slopes are not homogeneous. A simulated data set was used to illustrate SEYHAN's approach. In this approach, after the multivariate adaptive regression splines (MARS) method was used to categorize the covariate, we performed conventional ANCOVA in each subgroup formed according to the knot values, as well as an analysis of variance with a two-factor model. The first model is simpler than the second, which includes an interaction term. Since the model with the interaction effect uses more subjects, the power of the test increases and existing significant differences are revealed more clearly. With this approach, nonlinearity and heterogeneity of regression slopes are no longer a problem for data analysis with the conventional linear ANCOVA model. It can be applied quickly and efficiently in the presence of one or more covariates.
NASA Astrophysics Data System (ADS)
Houborg, Rasmus; McCabe, Matthew F.
2018-01-01
With an increasing volume and dimensionality of Earth observation data, enhanced integration of machine-learning methodologies is needed to effectively analyze and utilize these information rich datasets. In machine-learning, a training dataset is required to establish explicit associations between a suite of explanatory 'predictor' variables and the target property. The specifics of this learning process can significantly influence model validity and portability, with a higher generalization level expected with an increasing number of observable conditions being reflected in the training dataset. Here we propose a hybrid training approach for leaf area index (LAI) estimation, which harnesses synergistic attributes of scattered in-situ measurements and systematically distributed physically based model inversion results to enhance the information content and spatial representativeness of the training data. To do this, a complementary training dataset of independent LAI was derived from a regularized model inversion of RapidEye surface reflectances and subsequently used to guide the development of LAI regression models via Cubist and random forests (RF) decision tree methods. The application of the hybrid training approach to a broad set of Landsat 8 vegetation index (VI) predictor variables resulted in significantly improved LAI prediction accuracies and spatial consistencies, relative to results relying on in-situ measurements alone for model training. In comparing the prediction capacity and portability of the two machine-learning algorithms, a pair of relatively simple multivariate regression models established by Cubist performed best, with an overall relative mean absolute deviation (rMAD) of ∼11%, determined based on a stringent scene-specific cross-validation approach. In comparison, the portability of RF regression models was less effective (i.e., an overall rMAD of ∼15%), which was attributed partly to model saturation at high LAI in association with inherent extrapolation and transferability limitations. Explanatory VIs formed from bands in the near-infrared (NIR) and shortwave infrared domains (e.g., NDWI) were associated with the highest predictive ability, whereas Cubist models relying entirely on VIs based on NIR and red band combinations (e.g., NDVI) were associated with comparatively high uncertainties (i.e., rMAD ∼ 21%). The most transferable and best performing models were based on combinations of several predictor variables, which included both NDWI- and NDVI-like variables. In this process, prior screening of input VIs based on an assessment of variable relevance served as an effective mechanism for optimizing prediction accuracies from both Cubist and RF. While this study demonstrated the benefit of combining data mining operations with physically based constraints via a hybrid training approach, the concept of transferability and portability warrants further investigation in order to realize the full potential of emerging machine-learning techniques for regression purposes.
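The headline accuracy metric above, relative mean absolute deviation (rMAD), admits a simple reading: mean absolute error normalized by the mean observed value. The study may define it slightly differently; the sketch below encodes this assumed definition.

```python
# Hedged sketch: rMAD assumed to be mean(|pred - obs|) / mean(obs).
import numpy as np

def rmad(observed, predicted):
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(predicted - observed)) / np.mean(observed)

print(rmad([1.0, 2.0, 3.0], [1.1, 1.8, 3.3]))  # 0.10, i.e. ~10%
```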
Xu, Jian-Wu; Suzuki, Kenji
2011-01-01
Purpose: A massive-training artificial neural network (MTANN) has been developed for the reduction of false positives (FPs) in computer-aided detection (CADe) of polyps in CT colonography (CTC). A major limitation of the MTANN is the long training time. To address this issue, the authors investigated the feasibility of two state-of-the-art regression models, namely, support vector regression (SVR) and Gaussian process regression (GPR) models, in the massive-training framework and developed massive-training SVR (MTSVR) and massive-training GPR (MTGPR) for the reduction of FPs in CADe of polyps. Methods: The authors applied SVR and GPR as volume-processing techniques in the distinction of polyps from FP detections in a CTC CADe scheme. Unlike artificial neural networks (ANNs), both SVR and GPR are memory-based methods that store a part of or the entire training data for testing. Therefore, their training is generally fast and they are able to improve the efficiency of the massive-training methodology. Rooted in a maximum margin property, SVR offers excellent generalization ability and robustness to outliers. On the other hand, GPR approaches nonlinear regression from a Bayesian perspective, which produces both the optimal estimated function and the covariance associated with the estimation. Therefore, both SVR and GPR, as state-of-the-art nonlinear regression models, are able to offer a performance comparable or potentially superior to that of ANN, with highly efficient training. Both MTSVR and MTGPR were trained directly with voxel values from CTC images. A 3D scoring method based on a 3D Gaussian weighting function was applied to the outputs of MTSVR and MTGPR for distinction between polyps and nonpolyps. To test the performance of the proposed models, the authors compared them to the original MTANN in the distinction between actual polyps and various types of FPs in terms of training time reduction and FP reduction performance. The authors' CTC database consisted of 240 CTC data sets obtained from 120 patients in the supine and prone positions. The training set consisted of 27 patients, 10 of whom had 10 polyps. The authors selected 10 nonpolyps (i.e., FP sources) from the training set. These ten polyps and ten nonpolyps were used for training the proposed models. The testing set consisted of 93 patients, including 19 polyps in 7 patients and 86 negative patients with 474 FPs produced by an original CADe scheme. Results: With the MTSVR, the training time was reduced by a factor of 190, while a FP reduction performance [by-polyp sensitivity of 94.7% (18/19) with 2.5 (230/93) FPs/patient] comparable to that of the original MTANN [the same sensitivity with 2.6 (244/93) FPs/patient] was achieved. The classification performance in terms of the area under the receiver-operating-characteristic curve value of the MTGPR (0.82) was statistically significantly higher than that of the original MTANN (0.77), with a two-sided p-value of 0.03. The MTGPR yielded a 94.7% (18/19) by-polyp sensitivity at a FP rate of 2.5 (235/93) per patient and reduced the training time by a factor of 1.3. Conclusions: Both MTSVR and MTGPR improve the efficiency of training in the massive-training framework while maintaining comparable performance. PMID:21626922
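For readers unfamiliar with the two regression models adopted above, the sketch below fits SVR and GPR to toy one-dimensional data; the CTC voxel pipeline, 3D scoring and hyperparameters are not reproduced.

```python
# Hedged sketch: toy data, illustrative hyperparameters.
import numpy as np
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 60)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X, y)

x_new = np.array([[0.5]])
mean, std = gpr.predict(x_new, return_std=True)
print(svr.predict(x_new), mean, std)  # GPR also returns a Bayesian uncertainty
```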
NASA Astrophysics Data System (ADS)
Rooper, Christopher N.; Zimmermann, Mark; Prescott, Megan M.
2017-08-01
Deep-sea coral and sponge ecosystems are widespread throughout most of Alaska's marine waters, and are associated with many different species of fishes and invertebrates. These ecosystems are vulnerable to the effects of commercial fishing activities and climate change. We compared four commonly used species distribution models (general linear models (GLM), generalized additive models (GAM), boosted regression trees (BRT) and random forest (RF) models) and an ensemble model to predict the presence or absence and abundance of six groups of benthic invertebrate taxa in the Gulf of Alaska. All four model types performed adequately on training data for predicting presence and absence, with random forest models having the best overall performance as measured by the area under the receiver-operating curve (AUC). The models also performed well on the test data for presence and absence, with average AUCs ranging from 0.66 to 0.82. For the test data, ensemble models performed best. For abundance data, there was an obvious demarcation in performance between the two regression-based methods (GLM and GAM) and the tree-based models. The BRT and RF models out-performed the other models by a wide margin on both the training and testing data. However, there was a significant drop-off in performance (about 50%) for all models of invertebrate abundance when moving from the training data to the testing data. Ensemble model performance was between the tree-based and regression-based methods. The maps of predictions from the models for both presence and abundance agreed very well across model types, with an increase in variability in predictions for the abundance data. We conclude that where data conform well to the modeled distribution (such as the presence-absence data and binomial distribution in this study), the four types of models will provide similar results, although the regression-type models may be more consistent with biological theory. For data with highly zero-inflated and non-normal distributions, such as the abundance data from this study, the tree-based methods performed better. Ensemble models that averaged predictions across the four model types performed better than the GLM or GAM models but slightly poorer than the tree-based methods, suggesting ensemble models might be more robust to overfitting than tree methods, while mitigating some of the disadvantages in predictive performance of regression methods.
Medina-Solis, Carlo Eduardo; Maupomé, Gerardo; del Socorro, Herrera Miriam; Pérez-Núñez, Ricardo; Avila-Burgos, Leticia; Lamadrid-Figueroa, Hector
2008-01-01
To determine the factors associated with dental health services utilization among children ages 6 to 12 in León, Nicaragua, a cross-sectional study was carried out in 1,400 schoolchildren. Using a questionnaire, we collected information on utilization and the independent variables for the previous year. Oral health needs were established by means of a dental examination. To identify the independent variables associated with dental health services utilization, two types of multivariate regression models were used, according to the measurement scale of the outcome variable: a) frequency of utilization, coded as (0) none, (1) one, and (2) two or more, analyzed with ordered logistic regression; and b) the type of service utilized, coded as (0) none, (1) preventive services, (2) curative services, and (3) both services, analyzed with multinomial logistic regression. The proportion of children who received at least one dental service in the 12 months prior to the study was 27.7 percent. The variables associated with utilization in the two models were older age, female sex, more frequent toothbrushing, positive attitude of the mother toward the child's oral health, higher socioeconomic level, and higher oral health needs. Various predisposing, enabling, and oral health needs variables were associated with higher dental health services utilization. As in prior reports elsewhere, these results from Nicaragua confirmed that utilization inequalities exist between socioeconomic groups. The multinomial logistic regression model evidenced the association of different variables depending on the type of service used.
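The two outcome codings above map onto ordered and multinomial logistic regression, sketched here on synthetic data; the covariates are illustrative, not the survey's full predisposing/enabling/needs set.

```python
# Hedged sketch: random data, illustrative covariates.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(4)
n = 300
X = pd.DataFrame({"age": rng.integers(6, 13, n).astype(float),
                  "female": rng.integers(0, 2, n).astype(float)})
freq = pd.Series(pd.Categorical(rng.integers(0, 3, n), ordered=True))  # 0/1/2+
service = rng.integers(0, 4, n)     # none / preventive / curative / both

ordered = OrderedModel(freq, X, distr="logit").fit(method="bfgs", disp=0)
multinomial = sm.MNLogit(service, sm.add_constant(X)).fit(disp=0)
print(ordered.params)
print(multinomial.params)
```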
Bias and uncertainty of δ13CO2 isotopic mixing models
Zachary E. Kayler; Lisa Ganio; Mark Hauck; Thomas G. Pypker; Elizabeth W. Sulzman; Alan C. Mix; Barbara J. Bond
2009-01-01
The goal of this study was to evaluate how factorial combinations of two mixing models and two regression approaches (Keeling-OLS, Miller–Tans-OLS, Keeling-GMR, Miller–Tans-GMR) compare in small [CO2] range versus large [CO2] range regimes, with different combinations of...
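The distinction being tested above is between fitting the Keeling relationship (δ13C versus 1/[CO2]) by ordinary least squares and by geometric mean regression (reduced major axis), whose slope is the ratio of standard deviations signed by the correlation. A synthetic sketch:

```python
# Hedged sketch: synthetic Keeling data; source signature set near -26 permil.
import numpy as np

rng = np.random.default_rng(5)
co2 = rng.uniform(380, 800, 30)                     # ppm
x = 1.0 / co2
y = -26.0 + 6840.0 * x + rng.normal(0, 0.15, 30)    # delta13C, permil

b_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)      # OLS slope
a_ols = y.mean() - b_ols * x.mean()                 # OLS Keeling intercept

r = np.corrcoef(x, y)[0, 1]
b_gmr = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)  # GMR slope
a_gmr = y.mean() - b_gmr * x.mean()                 # GMR Keeling intercept

print(a_ols, a_gmr)  # both intercepts estimate the source delta13C
```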
Observed and Projected Precipitation Changes over the Nine US Climate Regions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chylek, Petr; Dubey, Manvendra; Hengartner, Nicholas
2017-10-25
Here, we analyze the past (1900–2015) temperature and precipitation changes in nine separate US climate regions. We find that the temperature increased in a statistically significant manner (at the 95% confidence level, equivalent to an alpha level of 0.05) in all of these regions. However, the variability in the observed precipitation was much more complex. In the eastern US (east of the Rocky Mountains), the precipitation increased in all five climate regions and the increase was statistically significant in three of them. In contrast, in the western US, the precipitation increased in two regions and decreased in two, with no statistical significance in any region. The CMIP5 climate models (an ensemble mean) were not able to capture properly either the large precipitation differences between the eastern and the western US, or the changes of precipitation between 1900 and 2015 in the eastern US. The statistical regression model explains the differences between the eastern and western US precipitation as the result of different significant predictors. Anthropogenic greenhouse gases and aerosol (GHGA) are the major forcing of the precipitation in the eastern part of the US, while the Pacific Decadal Oscillation (PDO) has the major influence on precipitation in the western part of the US. This analysis suggests that the precipitation over the eastern US increased at an approximate rate of 6.7%/K, in agreement with the Clausius-Clapeyron equation, while the precipitation of the western US was approximately constant, independent of the temperature. Future precipitation over the western part of the US will depend on the behavior of the PDO, and how the PDO may be affected by future warming. The low hydrological sensitivity (percent increase of precipitation per one K of warming) projected by the CMIP5 models for the eastern US suggests either an underestimate of future precipitation or an overestimate of future warming.
Kotwal, Ashwin A; Lauderdale, Diane S; Waite, Linda J; Dale, William
2016-07-01
Marriage is linked to improved colorectal cancer-related health, likely in part through preventive health behaviors, but it is unclear what role spouses play in colorectal cancer screening. We therefore determine whether self-reported colonoscopy rates are correlated within married couples and which characteristics of spouses are associated with colonoscopy use in each partner. We use US nationally representative 2010 data, which include 804 male-female married couples drawn from a total sample of 3137 community-dwelling adults aged 55-90 years. Using a logistic regression model in the full sample (N=3137), we first find married men have higher adjusted colonoscopy rates than unmarried men (61% versus 52%, p=0.023), but women's rates do not differ by marital status. In the couples' sample (N=804 couples), we use a bivariate probit regression model to estimate multiple regression equations for the two spouses simultaneously as a function of individual and spousal covariates, as well as the adjusted correlation within couples. We find that individuals are nearly twice as likely to receive a colonoscopy if their spouse recently has had one (OR=1.94, 95% CI: 1.39, 2.67, p<0.001). Additionally, we find that husbands have higher adjusted colonoscopy rates when their wives are: 1) happier with the marital relationship (65% vs 51%, p=0.020); 2) more highly educated (72% vs 51%, p=0.020); and 3) viewed as more supportive (65% vs 52%, p=0.020). Recognizing the role of marital status, relationship quality, and spousal characteristics in colonoscopy uptake, particularly in men, could help physicians increase guideline adherence. Copyright © 2016. Published by Elsevier Inc.
Comparing spatial regression to random forests for large ...
Environmental data may be “large” due to the number of records, the number of covariates, or both. Random forests has a reputation for good predictive performance when using many covariates, whereas spatial regression, when using reduced rank methods, has a reputation for good predictive performance when using many records. In this study, we compare these two techniques using a data set containing the macroinvertebrate multimetric index (MMI) at 1859 stream sites with over 200 landscape covariates. Our primary goal is predicting MMI at over 1.1 million perennial stream reaches across the USA. For spatial regression modeling, we develop two new methods to accommodate large data: (1) a procedure that estimates optimal Box-Cox transformations to linearize covariate relationships; and (2) a computationally efficient covariate selection routine that takes into account spatial autocorrelation. We show that our new methods lead to cross-validated performance similar to random forests, but that there is an advantage for spatial regression when quantifying the uncertainty of the predictions. Simulations are used to clarify the advantages of each method. This research investigates different approaches for modeling and mapping national stream condition. We use MMI data from the EPA's National Rivers and Streams Assessment and predictors from StreamCat (Hill et al., 2015). Previous studies have focused on modeling the MMI condition classes (i.e., good, fair, and poor).
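The covariate-linearization step described above can be illustrated with a maximum likelihood Box-Cox transform; scipy's implementation is used here as a stand-in for the authors' own procedure.

```python
# Hedged sketch: a skewed synthetic covariate, not StreamCat data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
covariate = rng.lognormal(mean=1.0, sigma=0.8, size=500)
transformed, lam = stats.boxcox(covariate)   # lambda estimated by MLE
print(lam, stats.skew(covariate), stats.skew(transformed))
```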
Ding, A Adam; Wu, Hulin
2014-10-01
We propose a new method that uses constrained local polynomial regression to estimate the unknown parameters in ordinary differential equation models, with the goal of improving the smoothing-based two-stage pseudo-least squares estimate. The equation constraints are derived from the differential equation model and are incorporated into the local polynomial regression in order to estimate the unknown parameters of the differential equation model. We also derive the asymptotic bias and variance of the proposed estimator. Our simulation studies show that our new estimator is clearly better than the pseudo-least squares estimator in estimation accuracy, at a small cost in computation. An application example on immune cell kinetics and trafficking for influenza infection further illustrates the benefits of the proposed new method.
Noninvasive and fast measurement of blood glucose in vivo by near infrared (NIR) spectroscopy
NASA Astrophysics Data System (ADS)
Jintao, Xue; Liming, Ye; Yufei, Liu; Chunyan, Li; Han, Chen
2017-05-01
This research aimed to develop a method for noninvasive and fast blood glucose assay in vivo. Near-infrared (NIR) spectroscopy, a more promising technique than other methods, was investigated in rats with diabetes and normal rats. Calibration models were generated by two different multivariate strategies: partial least squares (PLS) as a linear regression method and artificial neural networks (ANN) as a nonlinear regression method. The PLS model was optimized by considering spectral range, spectral pretreatment methods and the number of model factors, while the ANN model was tuned by selecting spectral pretreatment methods, network topology parameters, the number of hidden neurons, and the number of training epochs. The results of the validation showed that the two models were robust, accurate and repeatable. Compared to the ANN model, the performance of the PLS model was much better, with a lower root mean square error of prediction (RMSEP) of 0.419 and a higher correlation coefficient (R) of 96.22%.
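The PLS calibration route above can be sketched as follows; the synthetic 'spectra', factor count and train/test split are placeholders, not the optimized settings reported in the study.

```python
# Hedged sketch: random matrices standing in for NIR spectra.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(7)
spectra = rng.random((60, 400))                       # 60 samples x 400 wavelengths
glucose = 4 + 10 * spectra[:, 100] + rng.normal(0, 0.3, 60)  # mmol/L (invented)

pls = PLSRegression(n_components=5).fit(spectra[:40], glucose[:40])
pred = pls.predict(spectra[40:]).ravel()

rmsep = np.sqrt(np.mean((pred - glucose[40:]) ** 2))  # RMSE of prediction
r = np.corrcoef(pred, glucose[40:])[0, 1]
print(rmsep, r)
```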
NASA Astrophysics Data System (ADS)
Okuzawa, Yuki; Kato, Shohei; Kanoh, Masayoshi; Itoh, Hidenori
A knowledge-based approach to imitation learning of motion generation for humanoid robots and an imitative motion generation system based on motion knowledge learning and modification are described. The system has three parts: recognizing, learning, and modifying. The first part recognizes an instructed motion, distinguishing it from motions in the knowledge database using a continuous hidden Markov model. When the motion is recognized as unfamiliar, the second part learns it using locally weighted regression and acquires knowledge of the motion. When the robot recognizes the instructed motion as familiar, or judges that its acquired knowledge is applicable to generating the motion, the third part imitates the instructed motion by modifying a learned motion. This paper reports some performance results on the imitation of several radio gymnastics motions.
Pfeiffer, R M; Riedl, R
2015-08-15
We assess the asymptotic bias of estimates of exposure effects conditional on covariates when summary scores of confounders, instead of the confounders themselves, are used to analyze observational data. First, we study regression models for cohort data that are adjusted for summary scores. Second, we derive the asymptotic bias for case-control studies when cases and controls are matched on a summary score, and then analyzed either using conditional logistic regression or by unconditional logistic regression adjusted for the summary score. Two scores, the propensity score (PS) and the disease risk score (DRS), are studied in detail. For cohort analysis, when regression models are adjusted for the PS, the estimated conditional treatment effect is unbiased only for linear models, or at the null for non-linear models. Adjustment of cohort data for the DRS yields unbiased estimates only for linear regression; all other estimates of exposure effects are biased. Matching cases and controls on the DRS and analyzing them using conditional logistic regression yields unbiased estimates of exposure effect, whereas adjusting for the DRS in unconditional logistic regression yields biased estimates, even under the null hypothesis of no association. Matching cases and controls on the PS yields unbiased estimates only under the null for both conditional and unconditional logistic regression adjusted for the PS. We study the bias for various confounding scenarios and compare our asymptotic results with those from simulations with limited sample sizes. To create realistic correlations among multiple confounders, we also based simulations on a real dataset. Copyright © 2015 John Wiley & Sons, Ltd.
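A small simulation makes the cohort case concrete: estimate a propensity score by logistic regression, then adjust a linear outcome model for the score instead of the raw confounders. Consistent with the result above, the linear case recovers the exposure effect; data and effect sizes are invented.

```python
# Hedged sketch: synthetic cohort, linear outcome model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 2000
confounders = rng.normal(size=(n, 3))
p_exposure = 1 / (1 + np.exp(-(confounders @ np.array([0.5, -0.3, 0.2]))))
exposure = (rng.random(n) < p_exposure).astype(float)
outcome = (1.0 * exposure + confounders @ np.array([0.8, 0.4, -0.5])
           + rng.normal(size=n))

ps_fit = sm.Logit(exposure, sm.add_constant(confounders)).fit(disp=0)
ps = ps_fit.predict(sm.add_constant(confounders))

adjusted = sm.OLS(outcome, sm.add_constant(np.column_stack([exposure, ps]))).fit()
print(adjusted.params[1])  # close to the true exposure effect of 1.0
```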
Updated lateral attenuation in FAA's Integrated Noise Model
DOT National Transportation Integrated Search
2000-08-27
The lateral attenuation algorithm in the Federal Aviation Administration's (FAA) Integrated Noise Model (INM) has historically been based on the two regression equations described in the Society of Automotive Engineers' (SAE) Aerospace Information Re...
NASA Astrophysics Data System (ADS)
von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo
2014-06-01
Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatorics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.
Estimation of stature from the foot and its segments in a sub-adult female population of North India
Krishan, Kewal; Kanchan, Tanuj; Passi, Neelam
2011-11-21
Background: Establishing personal identity is one of the main concerns in forensic investigations. Estimation of stature forms a basic domain of the investigation process in unknown and co-mingled human remains in forensic anthropology casework. The objective of the present study was to set up standards for estimation of stature from the foot and its segments in a sub-adult female population. Methods: The sample for the study constituted 149 young females from the northern part of India. The participants were aged between 13 and 18 years. Besides stature, seven anthropometric measurements, the length of the foot from each toe (T1, T2, T3, T4 and T5, respectively), foot breadth at ball (BBAL) and foot breadth at heel (BHEL), were measured on both feet of each participant using standard methods and techniques. Results: The results indicated that statistically significant differences (p < 0.05) between left and right feet occur in both foot breadth measurements (BBAL and BHEL). Foot length measurements (T1 to T5 lengths) did not show any statistically significant bilateral asymmetry. The correlation between stature and all the foot measurements was found to be positive and statistically significant (p < 0.001). Linear regression models and multiple regression models were derived for estimation of stature from the measurements of the foot. The present study indicates that anthropometric measurements of the foot and its segments are valuable in the estimation of stature. Foot length measurements estimate stature with greater accuracy than foot breadth measurements. Conclusions: The present study concluded that foot measurements have a strong relationship with stature in the sub-adult female population of North India. Hence, the stature of an individual can be successfully estimated from the foot and its segments using the different regression models derived in the study. The regression models derived in the study may be applied successfully for the estimation of stature in sub-adult females whenever foot remains are brought for forensic examination. Stepwise multiple regression models tend to estimate stature more accurately than linear regression models in female sub-adults. PMID:22104433
The Intergenerational Transmission of Generosity
Wilhelm, Mark O.; Brown, Eleanor; Rooney, Patrick M.; Steinberg, Richard
2008-01-01
This paper estimates the correlation between the generosity of parents and the generosity of their adult children using regression models of adult children’s charitable giving. New charitable giving data are collected in the Panel Study of Income Dynamics and used to estimate the regression models. The regression models are estimated using a wide variety of techniques and specification tests, and the strength of the intergenerational giving correlations are compared with intergenerational correlations in income, wealth, and consumption expenditure from the same sample using the same set of controls. We find the religious giving of parents and children to be strongly correlated, as strongly correlated as are their income and wealth. The correlation in the secular giving (e.g., giving to the United Way, educational institutions, for poverty relief) of parents and children is smaller, similar in magnitude to the intergenerational correlation in consumption. Parents’ religious giving is positively associated with children’s secular giving, but in a more limited sense. Overall, the results are consistent with generosity emerging at least in part from the influence of parental charitable behavior. In contrast to intergenerational models in which parental generosity towards their children can undo government transfer policy (Ricardian equivalence), these results suggest that parental generosity towards charitable organizations might reinforce government policies, such as tax incentives aimed at encouraging voluntary transfers. PMID:19802345
NASA Astrophysics Data System (ADS)
Stigter, T. Y.; Ribeiro, L.; Dill, A. M. M. Carvalho
2008-07-01
Factorial regression models, based on correspondence analysis, are built to explain the high nitrate concentrations in groundwater beneath an agricultural area in the south of Portugal, exceeding 300 mg/l, as a function of chemical variables, electrical conductivity (EC), land use and hydrogeological setting. Two important advantages of the proposed methodology are that qualitative parameters can be involved in the regression analysis and that multicollinearity is avoided. Regression is performed on eigenvectors extracted from the data similarity matrix, the first of which clearly reveals the impact of agricultural practices and hydrogeological setting on the groundwater chemistry of the study area. Significant correlation exists between the response variable NO3- and the explanatory variables Ca2+, Cl-, SO42-, depth to water, aquifer media and land use. Substituting Cl- by the EC results in the most accurate regression model for nitrate, when disregarding the four largest outliers (model A). When built solely on land use and hydrogeological setting, the regression model (model B) is less accurate but more interesting from a practical viewpoint, as it is based on easily obtainable data and can be used to predict nitrate concentrations in groundwater in other areas with similar conditions. This is particularly useful for conservative contaminants, where risk and vulnerability assessment methods, based on assumed rather than established correlations, generally produce erroneous results. Another purpose of the models can be to predict the future evolution of nitrate concentrations under the influence of changes in land use or fertilization practices, which occur in compliance with policies such as the Nitrates Directive. Model B predicts a 40% decrease in nitrate concentrations in groundwater of the study area when horticulture is replaced by other land use with much lower fertilization and irrigation rates.
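Regressing on eigenvectors of a similarity decomposition, as done above, is closely related to principal components regression; the sketch below substitutes PCA for correspondence analysis purely for illustration.

```python
# Hedged sketch: PCA in place of correspondence analysis; synthetic data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(9)
n = 120
base = rng.normal(size=(n, 2))
X = np.column_stack([base, base @ rng.normal(size=(2, 4))])  # collinear block
nitrate = 50 + 40 * base[:, 0] + rng.normal(0, 5, n)         # mg/l (invented)

scores = PCA(n_components=3).fit_transform(X)  # orthogonal regressors
model = LinearRegression().fit(scores, nitrate)
print(model.score(scores, nitrate))            # R^2 on the component space
```

Because the component scores are mutually orthogonal, multicollinearity among the raw chemical variables cannot inflate the coefficient variances.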
Bohmanova, J; Miglior, F; Jamrozik, J; Misztal, I; Sullivan, P G
2008-09-01
A random regression model with both random and fixed regressions fitted by Legendre polynomials of order 4 was compared with 3 alternative models fitting linear splines with 4, 5, or 6 knots. The effects common for all models were a herd-test-date effect, fixed regressions on days in milk (DIM) nested within region-age-season of calving class, and random regressions for additive genetic and permanent environmental effects. Data were test-day milk, fat and protein yields, and SCS recorded from 5 to 365 DIM during the first 3 lactations of Canadian Holstein cows. A random sample of 50 herds consisting of 96,756 test-day records was generated to estimate variance components within a Bayesian framework via Gibbs sampling. Two sets of genetic evaluations were subsequently carried out to investigate performance of the 4 models. Models were compared by graphical inspection of variance functions, goodness of fit, error of prediction of breeding values, and stability of estimated breeding values. Models with splines gave lower estimates of variances at extremes of lactations than the model with Legendre polynomials. Differences among models in goodness of fit measured by percentages of squared bias, correlations between predicted and observed records, and residual variances were small. The deviance information criterion favored the spline model with 6 knots. Smaller error of prediction and higher stability of estimated breeding values were achieved by using spline models with 5 and 6 knots compared with the model with Legendre polynomials. In general, the spline model with 6 knots had the best overall performance based upon the considered model comparison criteria.
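The order-4 Legendre regressions above rest on a simple design matrix: days in milk rescaled to [-1, 1] and expanded in Legendre polynomials P0 through P4. A sketch of that basis construction:

```python
# Hedged sketch of the basis only; the genetic model itself is not reproduced.
import numpy as np
from numpy.polynomial.legendre import legvander

dim = np.arange(5, 366)                                       # days in milk
t = 2.0 * (dim - dim.min()) / (dim.max() - dim.min()) - 1.0   # rescale to [-1, 1]
basis = legvander(t, 4)                                       # columns P0(t)..P4(t)
print(basis.shape)                                            # (361, 5)
```

Each random regression coefficient multiplies one of these columns, so an animal's curve is a linear combination of the five polynomials.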
Estimating malaria burden in Nigeria: a geostatistical modelling approach.
Onyiri, Nnadozie
2015-11-04
This study has produced a map of malaria prevalence in Nigeria based on available data from the Mapping Malaria Risk in Africa (MARA) database, including all malaria prevalence surveys in Nigeria that could be geolocated, as well as data collected during fieldwork in Nigeria between March and June 2007. Logistic regression was fitted to malaria prevalence in Stata to identify significant demographic (age) and environmental covariates. The following environmental covariates were included in the spatial model: the normalized difference vegetation index, the enhanced vegetation index, the leaf area index, the land surface temperature for day and night, land use/land cover (LULC), distance to water bodies, and rainfall. The spatial model suggests that the two main environmental covariates correlating with malaria presence were daytime land surface temperature and rainfall. It was also found that malaria prevalence increased with distance to water bodies up to 4 km. The malaria risk map estimated from the spatial model shows that malaria prevalence in Nigeria varies from 20% in certain areas to 70% in others. The highest prevalence rates were found in the Niger Delta states of Rivers and Bayelsa, the areas surrounding the confluence of the rivers Niger and Benue, and also isolated parts of the north-eastern and north-western parts of the country. Isolated patches of low malaria prevalence were found scattered around the country, with northern Nigeria having more such areas than the rest of the country. Nigeria's belt of middle regions generally has malaria prevalence of 40% and above.
Rebich, R.A.; Houston, N.A.; Mize, S.V.; Pearson, D.K.; Ging, P.B.; Evan, Hornig C.
2011-01-01
SPAtially Referenced Regressions On Watershed attributes (SPARROW) models were developed to estimate nutrient inputs [total nitrogen (TN) and total phosphorus (TP)] to the northwestern part of the Gulf of Mexico from streams in the South-Central United States (U.S.). This area included drainages of the Lower Mississippi, Arkansas-White-Red, and Texas-Gulf hydrologic regions. The models were standardized to reflect nutrient sources and stream conditions during 2002. Model predictions of nutrient loads (mass per time) and yields (mass per area per time) generally were greatest in streams in the eastern part of the region and along reaches near the Texas and Louisiana shoreline. The Mississippi River and Atchafalaya River watersheds, which drain nearly two-thirds of the conterminous U.S., delivered the largest nutrient loads to the Gulf of Mexico, as expected. However, the three largest delivered TN yields were from the Trinity River/Galveston Bay, Calcasieu River, and Aransas River watersheds, while the three largest delivered TP yields were from the Calcasieu River, Mermentau River, and Trinity River/Galveston Bay watersheds. Model output indicated that the three largest sources of nitrogen from the region were atmospheric deposition (42%), commercial fertilizer (20%), and livestock manure (unconfined, 17%). The three largest sources of phosphorus were commercial fertilizer (28%), urban runoff (23%), and livestock manure (confined and unconfined, 23%). © 2011 American Water Resources Association. This article is a U.S. Government work and is in the public domain in the USA.
NASA Astrophysics Data System (ADS)
Mahmod, Wael Elham; Watanabe, Kunio; Zahr-Eldeen, Ashraf A.
2013-08-01
Management of groundwater resources can be enhanced by using numerical models to improve development strategies. However, the lack of basic data often limits the implementation of these models. The Kharga Oasis in the western desert of Egypt is an arid area that mainly depends on groundwater from the Nubian Sandstone Aquifer System (NSAS), for which the hydrogeological data needed for groundwater simulation are lacking, thereby introducing a problem for model calibration and validation. The Grey Model (GM) was adopted to analyze groundwater flow. This model combines a finite element method (FEM) with a linear regression model to try to obtain the best-fit piezometric-level trends compared to observations. The GM simulation results clearly show that the future water table in the northeastern part of the study area will face a severe drawdown compared with that in the southwestern part and that the hydraulic head difference between these parts will reach 140 m by 2060. Given the uncertainty and limitation of available data, the GM produced more realistic results compared with those obtained from a FEM alone. The GM could be applied to other cases with similar data limitations.
Commitment to personal values and guilt feelings in dementia caregivers.
Gallego-Alberto, Laura; Losada, Andrés; Márquez-González, María; Romero-Moreno, Rosa; Vara, Carlos
2017-01-01
Caregivers' commitment to personal values is linked to caregivers' well-being, although the effects of personal values on caregivers' guilt have not been explored to date. The goal of this study is to analyze the relationship between caregivers' commitment to personal values and guilt feelings. Participants were 179 dementia family caregivers. Face-to-face interviews were carried out to describe sociodemographic variables and assess stressors, caregivers' commitment to personal values and guilt feelings. Commitment to values was conceptualized as two factors (commitment to own values and commitment to family values) and 12 specific individual values (e.g. education, family or caregiving role). Hierarchical regressions were performed controlling for sociodemographic variables and stressors, and introducing the two commitment factors (in a first regression) or the commitment to individual/specific values (in a second regression) as predictors of guilt. In terms of the commitment to values factors, the analyzed regression model explained 21% of the variance of guilt feelings. Only the factor commitment to family values contributed significantly to the model, explaining 7% of the variance. With regard to the regression analyzing the contribution of specific values to caregivers' guilt, commitment to the caregiving role and to leisure contributed negatively and significantly to the explanation of caregivers' guilt. Commitment to work contributed positively to guilt feelings. The full model explained 30% of the variance of guilt feelings. The specific values explained 16% of the variance. Our findings suggest that commitment to personal values is a relevant variable for understanding guilt feelings in caregivers.
A catastrophe model for the prospect-utility theory question.
Oliva, Terence A; McDade, Sean R
2008-07-01
Anomalies have played a big part in the analysis of decision making under risk. Both expected utility and prospect theories were born out of anomalies exhibited by actual decision making behavior. Since the same individual can use both expected utility and prospect approaches at different times, it seems there should be a means of uniting the two. This paper turns to nonlinear dynamical systems (NDS), specifically a catastrophe model, to help suggest an 'out of the box' line of solution toward integration. We use a cusp model to create a value surface whose control dimensions are involvement and gains versus losses. By including 'involvement' as a variable the importance of the individual's psychological state is included, and it provides a rationale for how decision makers' changes from expected utility to prospect might occur. Additionally, it provides a possible explanation for what appears to be even more irrational decisions that individuals make when highly emotionally involved. We estimate the catastrophe model using a sample of 997 gamblers who attended a casino and compare it to the linear model using regression. Hence, we have actual data from individuals making real bets, under real conditions.
Automatic age and gender classification using supervised appearance model
NASA Astrophysics Data System (ADS)
Bukar, Ali Maina; Ugail, Hassan; Connah, David
2016-11-01
Age and gender classification are two important problems that recently gained popularity in the research community, due to their wide range of applications. Research has shown that both age and gender information are encoded in the face shape and texture, hence the active appearance model (AAM), a statistical model that captures shape and texture variations, has been one of the most widely used feature extraction techniques for the aforementioned problems. However, AAM suffers from some drawbacks, especially when used for classification. This is primarily because principal component analysis (PCA), which is at the core of the model, works in an unsupervised manner, i.e., PCA dimensionality reduction does not take into account how the predictor variables relate to the response (class labels). Rather, it explores only the underlying structure of the predictor variables, thus, it is no surprise if PCA discards valuable parts of the data that represent discriminatory features. Toward this end, we propose a supervised appearance model (sAM) that improves on AAM by replacing PCA with partial least-squares regression. This feature extraction technique is then used for the problems of age and gender classification. Our experiments show that sAM has better predictive power than the conventional AAM.
Suvak, Michael K; Walling, Sherry M; Iverson, Katherine M; Taft, Casey T; Resick, Patricia A
2009-12-01
Multilevel modeling is a powerful and flexible framework for analyzing nested data structures (e.g., repeated measures or longitudinal designs). The authors illustrate a series of multilevel regression procedures that can be used to elucidate the nature of the relationship between two variables across time. The goal is to help trauma researchers become more aware of the utility of multilevel modeling as a tool for increasing the field's understanding of posttraumatic adaptation. These procedures are demonstrated by examining the relationship between two posttraumatic symptoms, intrusion and avoidance, across five assessment points in a sample of rape and robbery survivors (n = 286). Results revealed that changes in intrusion were highly correlated with changes in avoidance over the 18-month posttrauma period.
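A minimal multilevel sketch of the kind of analysis described above: avoidance regressed on intrusion with a random intercept and random intrusion slope per subject. The data are synthetic; the original study's centering and time coding are not reproduced.

```python
# Hedged sketch: synthetic repeated measures for 50 subjects over 5 waves.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(10)
n_subj, n_waves = 50, 5
df = pd.DataFrame({
    "subject": np.repeat(np.arange(n_subj), n_waves),
    "wave": np.tile(np.arange(n_waves), n_subj),
})
u = rng.normal(0, 1, n_subj)                       # subject-level intercepts
df["intrusion"] = rng.normal(2, 1, len(df))
df["avoidance"] = (1 + u[df["subject"]] + 0.6 * df["intrusion"]
                   + rng.normal(0, 0.5, len(df)))

mlm = smf.mixedlm("avoidance ~ intrusion", df, groups="subject",
                  re_formula="~intrusion").fit()
print(mlm.params)
```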
Models for predicting the mass of lime fruits by some engineering properties.
Miraei Ashtiani, Seyed-Hassan; Baradaran Motie, Jalal; Emadi, Bagher; Aghkhani, Mohammad-Hosein
2014-11-01
Grading fruits based on mass is important in packaging; it reduces waste and increases the marketing value of agricultural produce. The aim of this study was mass modeling of two major cultivars of Iranian limes based on engineering attributes. Models were classified into three groups: (1) single and multiple variable regressions of lime mass on dimensional characteristics; (2) single and multiple variable regressions of lime mass on projected areas; and (3) single regressions of lime mass on its actual volume and on the volumes calculated by assuming ellipsoid and prolate spheroid shapes. All properties considered in the current study were found to be statistically significant (p < 0.01). The results indicated that mass modeling of lime based on minor diameter and on first projected area are the most appropriate models in the first and second classifications, respectively. In the third classification, the best model was obtained on the basis of the prolate spheroid volume. It was finally concluded that the most suitable system for grading lime by mass is based on the prolate spheroid volume.
Evaluating Internal Model Strength and Performance of Myoelectric Prosthesis Control Strategies.
Shehata, Ahmed W; Scheme, Erik J; Sensinger, Jonathon W
2018-05-01
On-going developments in myoelectric prosthesis control have provided prosthesis users with an assortment of control strategies that vary in reliability and performance. Many studies have focused on improving performance by providing feedback to the user but have overlooked the effect of this feedback on internal model development, which is key to improving long-term performance. In this paper, the strength of internal models developed for two commonly used myoelectric control strategies, raw control with raw feedback (using a regression-based approach) and filtered control with filtered feedback (using a classifier-based approach), was evaluated using two psychometric measures: trial-by-trial adaptation and just-noticeable difference. The performance of both strategies was also evaluated using a Schmidt-style target acquisition task. Results obtained from 24 able-bodied subjects showed that although filtered control with filtered feedback had better short-term performance in path efficiency, raw control with raw feedback resulted in stronger internal model development, which may lead to better long-term performance. Despite inherent noise in the control signals of the regression controller, these findings suggest that the rich feedback associated with regression control may be used to improve human understanding of the myoelectric control system.
Alamaniotis, Miltiadis; Bargiotas, Dimitrios; Tsoukalas, Lefteri H
2016-01-01
Integration of energy systems with information technologies has facilitated the realization of smart energy systems that utilize information to optimize system operation. To that end, accurate, ahead-of-time forecasting of load demand is crucial for optimizing energy system operation. In particular, load forecasting allows planning of system expansion and decision making for enhancing system safety and reliability. In this paper, the application of two types of kernel machines for medium term load forecasting (MTLF) is presented and their performance is recorded based on a set of historical electricity load demand data. The two kernel machine models, namely Gaussian process regression (GPR) and relevance vector regression (RVR), are utilized for making predictions over future load demand. Both models are equipped with a Gaussian kernel and are tested on daily predictions for a 30-day-ahead horizon taken from the New England Area. Furthermore, their performance is compared to the ARMA(2,2) model with respect to mean absolute percentage error and squared correlation coefficient. Results demonstrate the superiority of RVR over the other forecasting models in performing MTLF.
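A minimal sketch of the GPR side of this comparison, using scikit-learn's Gaussian (RBF) kernel on a synthetic load series; the weekly-cycle data and holdout scheme are illustrative stand-ins for the New England records:

```python
# Sketch of Gaussian process regression with an RBF kernel for a
# 30-day-ahead load forecast on a synthetic daily load series.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
days = np.arange(365.0)
load = 1000 + 150 * np.sin(2 * np.pi * days / 7) + rng.normal(0, 20, days.size)

X_train, y_train = days[:-30, None], load[:-30]
X_test, y_test = days[-30:, None], load[-30:]

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=5.0)
                               + WhiteKernel(noise_level=400.0),
                               normalize_y=True)
gpr.fit(X_train, y_train)
pred = gpr.predict(X_test)
mape = np.mean(np.abs((y_test - pred) / y_test)) * 100
print(f"30-day-ahead MAPE: {mape:.2f}%")
```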
Gender Performance Differences in Biochemistry
ERIC Educational Resources Information Center
Rauschenberger, Matthew M.; Sweeder, Ryan D.
2010-01-01
This study examined the historical performance of students at Michigan State University in a two-part biochemistry series, Biochem I (n = 5,900) and Biochem II (n = 5,214), for students enrolled from 1997 to 2009. Multiple linear regressions predicted 54.9-87.5% of the variance in student performance from Biochem I grade and 53.8-76.1% of the variance in…
Roy, Banibrata; Ripstein, Ira; Perry, Kyle; Cohen, Barry
2016-01-01
To determine whether the pre-medical Grade Point Average (GPA), Medical College Admission Test (MCAT), internal examination (Block), and National Board of Medical Examiners (NBME) scores are correlated with and predict the Medical Council of Canada Qualifying Examination Part I (MCCQE-1) scores. Data from 392 admitted students in the graduating classes of 2010-2013 at the University of Manitoba (UofM), College of Medicine were considered. Pearson's correlation was used to assess the strength of the relationships, multiple linear regression to estimate MCCQE-1 scores, and stepwise linear regression to investigate the amount of variance explained. Complete data from 367 (94%) students were studied. The MCCQE-1 had a moderate-to-large positive correlation with NBME and Block scores but a low correlation with GPA and MCAT scores. The multiple linear regression model gave a good estimate of the MCCQE-1 (R2 = 0.604). Stepwise regression analysis demonstrated that 59.2% of the variation in the MCCQE-1 was accounted for by the NBME, but only 1.9% by the Block exams, and negligible variation came from the GPA and the MCAT. Amongst all the examinations used at UofM, the NBME is most closely correlated with the MCCQE-1.
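For intuition on how stepwise entry apportions variance, here is a sketch with synthetic stand-ins for the NBME, Block, GPA, and MCAT scores; all coefficients are invented and the study's data are not reproduced:

```python
# Illustrative sketch of incremental R^2 as predictors are entered stepwise.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 367
nbme = rng.normal(70, 8, n)
block = 0.5 * nbme + rng.normal(0, 6, n)
gpa = rng.normal(3.7, 0.2, n)
mcat = rng.normal(30, 3, n)
mccqe = 2.5 * nbme + 0.4 * block + rng.normal(0, 15, n)
df = pd.DataFrame(dict(mccqe=mccqe, nbme=nbme, block=block, gpa=gpa, mcat=mcat))

r2_prev = 0.0
for formula in ("mccqe ~ nbme", "mccqe ~ nbme + block",
                "mccqe ~ nbme + block + gpa + mcat"):
    r2 = smf.ols(formula, df).fit().rsquared
    print(f"{formula:<35s} R^2 = {r2:.3f} (+{r2 - r2_prev:.3f})")
    r2_prev = r2
```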
Optimization of fixture layouts of glass laser optics using multiple kernel regression.
Su, Jianhua; Cao, Enhua; Qiao, Hong
2014-05-10
We aim to build an integrated fixturing model to describe the structural and thermal properties of the support frame of glass laser optics. With such a model, (a) a near-globally optimal set of clamps can be computed to minimize the surface shape error of the glass laser optic, and (b) a desired surface shape error can be obtained by adjusting the clamping forces under various environmental temperatures. To construct the model, we develop a new multiple kernel learning method, which we call multiple kernel support vector functional regression. The proposed method uses two layers of regression to group and order the data sources by the weights of the kernels and the factors of the layers. As a result, the influences of the clamps and the temperature can be evaluated by grouping them into different layers.
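A hedged sketch of the multiple-kernel idea, substituting scikit-learn's kernel ridge regression for the paper's support vector functional regression: a weighted sum of RBF kernels at different scales plays the role of the grouped data sources, with the weights fixed here where the paper's method learns them:

```python
# Sketch: a precomputed weighted-sum kernel feeding one kernel regression.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, (120, 2))          # e.g., clamp force and temperature
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]**2 + rng.normal(0, 0.05, 120)

weights = [0.7, 0.3]                      # kernel weights (fixed; MKL learns these)
gammas = [1.0, 10.0]                      # one kernel per data source/scale
K = sum(w * rbf_kernel(X, X, gamma=g) for w, g in zip(weights, gammas))

model = KernelRidge(alpha=1e-2, kernel="precomputed").fit(K, y)
print("train R^2:", model.score(K, y))
```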
Two-component Structure in the Entanglement Spectrum of Highly Excited States
NASA Astrophysics Data System (ADS)
Yang, Zhi-Cheng; Chamon, Claudio; Hamma, Alioscia; Mucciolo, Eduardo
We study the entanglement spectrum of highly excited eigenstates of two known models which exhibit a many-body localization transition, namely the one-dimensional random-field Heisenberg model and the quantum random energy model. Our results indicate that the entanglement spectrum shows a ``two-component'' structure: a universal part that is associated with Random Matrix Theory, and a non-universal part that is model dependent. The non-universal part manifests the deviation of the highly excited eigenstate from a true random state even in the thermalized phase where the Eigenstate Thermalization Hypothesis holds. The fraction of the spectrum containing the universal part decreases continuously as one approaches the critical point and vanishes in the localized phase in the thermodynamic limit. We use the universal-part fraction to construct a new order parameter for the many-body delocalized-to-localized transition. Two toy models based on Rokhsar-Kivelson-type wavefunctions are constructed, and their entanglement spectra are shown to exhibit the same structure.
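A minimal numerical sketch of how such an entanglement spectrum is computed (a small exact-diagonalization toy, far below the system sizes a real study would use): build a random-field Heisenberg chain, take a mid-spectrum eigenstate, and Schmidt-decompose it across the middle cut:

```python
# Entanglement spectrum of a mid-spectrum eigenstate of a small
# random-field Heisenberg chain (L = 8, open boundaries).
import numpy as np

sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2
I2 = np.eye(2)

def site_op(op, i, L):
    """Embed a single-site operator at site i of an L-site chain."""
    mats = [I2] * L
    mats[i] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

L, W = 8, 3.0                     # chain length and disorder strength
rng = np.random.default_rng(5)
H = np.zeros((2**L, 2**L), dtype=complex)
for i in range(L - 1):            # nearest-neighbour Heisenberg couplings
    for op in (sx, sy, sz):
        H += site_op(op, i, L) @ site_op(op, i + 1, L)
for i in range(L):                # random longitudinal fields h_i in [-W, W]
    H += rng.uniform(-W, W) * site_op(sz, i, L)

vals, vecs = np.linalg.eigh(H)
psi = vecs[:, len(vals) // 2]     # a highly excited (mid-spectrum) eigenstate

# Schmidt decomposition across the middle cut; levels xi_k = -ln(lambda_k).
s = np.linalg.svd(psi.reshape(2**(L // 2), 2**(L // 2)), compute_uv=False)
print("lowest entanglement levels:", np.sort(-np.log(s**2 + 1e-30))[:6])
```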
A general equation to obtain multiple cut-off scores on a test from multinomial logistic regression.
Bersabé, Rosa; Rivas, Teresa
2010-05-01
The authors derive a general equation to compute multiple cut-offs on a total test score in order to classify individuals into more than two ordinal categories. The equation is derived from the multinomial logistic regression (MLR) model, which is an extension of the binary logistic regression (BLR) model to accommodate polytomous outcome variables. From this analytical procedure, cut-off scores are established at the test score (the predictor variable) at which an individual is as likely to be in category j as in category j+1 of an ordinal outcome variable. The application of the complete procedure is illustrated by an example with data from an actual study on eating disorders. In this example, two cut-off scores on the Eating Attitudes Test (EAT-26) scores are obtained in order to classify individuals into three ordinal categories: asymptomatic, symptomatic and eating disorder. Diagnoses were made from the responses to a self-report (Q-EDD) that operationalises DSM-IV criteria for eating disorders. Alternatives to the MLR model to set multiple cut-off scores are discussed.
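The core of the derivation can be sketched directly: with multinomial logits a_j + b_j·x, adjacent categories j and j+1 are equally likely at x = (a_j - a_{j+1}) / (b_{j+1} - b_j). The code below fits an MLR model to synthetic EAT-like scores and solves for the two cut-offs; the data and category thresholds are invented:

```python
# Sketch of multiple cut-off scores from multinomial logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)
score = rng.uniform(0, 60, 600).reshape(-1, 1)
# Synthetic 3-category ordinal outcome driven by the test score.
latent = 0.15 * score.ravel() + rng.logistic(0, 1, 600)
y = np.digitize(latent, [3.0, 6.0])   # 0=asymptomatic, 1=symptomatic, 2=disorder

mlr = LogisticRegression().fit(score, y)
a, b = mlr.intercept_, mlr.coef_.ravel()
for j in (0, 1):
    # Equal adjacent-category probability: a_j + b_j*x = a_{j+1} + b_{j+1}*x.
    cut = (a[j] - a[j + 1]) / (b[j + 1] - b[j])
    print(f"cut-off between categories {j} and {j + 1}: {cut:.1f}")
```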
What are hierarchical models and how do we analyze them?
Royle, Andy
2016-01-01
In this chapter we provide a basic definition of hierarchical models and introduce the two canonical hierarchical models in this book: site occupancy and N-mixture models. The former is a hierarchical extension of logistic regression and the latter is a hierarchical extension of Poisson regression. We introduce basic concepts of probability modeling and statistical inference, including likelihood and Bayesian perspectives. We go through the mechanics of maximizing the likelihood and characterizing the posterior distribution by Markov chain Monte Carlo (MCMC) methods. We give a general perspective on topics such as model selection and assessment of model fit, although we demonstrate these topics in practice in later chapters (especially Chapters 5, 6, 7, and 10).
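A toy sketch of the first canonical model: the site-occupancy marginal likelihood, z_i ~ Bernoulli(psi) and y_ij | z_i ~ Bernoulli(z_i·p), maximized directly with scipy on synthetic detection histories (real analyses would typically use specialized software rather than hand-coded likelihoods):

```python
# Direct maximization of the site-occupancy likelihood on synthetic data.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(7)
n_sites, n_visits = 200, 4
psi_true, p_true = 0.6, 0.4
z = rng.binomial(1, psi_true, n_sites)
y = rng.binomial(1, z[:, None] * p_true, (n_sites, n_visits))

def negloglik(theta):
    psi, p = expit(theta)                  # keep parameters in (0, 1)
    det = y.sum(axis=1)
    # Marginal likelihood: occupied-and-detected vs never detected.
    lik = psi * p**det * (1 - p)**(n_visits - det)
    lik[det == 0] += (1 - psi)             # add the unoccupied pathway
    return -np.log(lik).sum()

fit = minimize(negloglik, x0=[0.0, 0.0], method="Nelder-Mead")
print("psi-hat, p-hat:", expit(fit.x).round(3))
```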
NASA Astrophysics Data System (ADS)
Nong, Yu; Du, Qingyun; Wang, Kun; Miao, Lei; Zhang, Weiwei
2008-10-01
Urban growth modeling, one of the most important aspects of land use and land cover change study, has attracted substantial attention because it helps to elucidate the mechanisms of land use change and thus informs relevant policy-making. This study applied multinomial logistic regression to model urban growth in Jiayu county of Hubei province, China, and to discover the relationship between urban growth and its driving forces, with biophysical and socio-economic factors selected as independent variables. This type of regression is similar to binary logistic regression, but it is more general because the dependent variable is not restricted to two categories, as in previous studies. The multinomial model can simulate the process of competition between multiple land uses: urban land, bare land, cultivated land, and orchard land. Taking urban land as the reference category, parameters could be estimated and interpreted as odds ratios. A probability map is generated from the model to predict where urban growth will occur.
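A sketch of the probability-map step under these assumptions: a multinomial model fitted to sampled pixels maps driver variables to per-class probabilities, from which a P(urban) surface can be rasterized. The drivers, labels, and coefficients below are all hypothetical:

```python
# Hypothetical multinomial land-use model producing urban-growth probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(11)
n = 5000                                        # sampled pixels
slope = rng.uniform(0, 30, n)
dist_road = rng.uniform(0, 5000, n)
X = np.column_stack([slope, dist_road])

# 0=urban (reference), 1=bare, 2=cultivated, 3=orchard (synthetic labels).
logits = np.column_stack([-0.1 * slope - 0.001 * dist_road,
                          np.zeros(n), 0.0005 * dist_road, 0.02 * slope])
y = np.array([rng.choice(4, p=np.exp(l) / np.exp(l).sum()) for l in logits])

mlr = LogisticRegression(max_iter=1000).fit(X, y)
p_urban = mlr.predict_proba(X)[:, 0]            # probability of urban growth
print("mean P(urban):", p_urban.mean().round(3))
```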
Guan, Yongtao; Li, Yehua; Sinha, Rajita
2011-01-01
In a cocaine dependence treatment study, we use linear and nonlinear regression models to model posttreatment cocaine craving scores and first cocaine relapse time. A subset of the covariates are summary statistics derived from baseline daily cocaine use trajectories, such as baseline cocaine use frequency and average daily use amount. These summary statistics are subject to estimation error and can therefore cause biased estimators for the regression coefficients. Unlike classical measurement error problems, the error we encounter here is heteroscedastic with an unknown distribution, and there are no replicates for the error-prone variables or instrumental variables. We propose two robust methods to correct for the bias: a computationally efficient method-of-moments-based method for linear regression models and a subsampling extrapolation method that is generally applicable to both linear and nonlinear regression models. Simulations and an application to the cocaine dependence treatment data are used to illustrate the efficacy of the proposed methods. Asymptotic theory and variance estimation for the proposed subsampling extrapolation method and some additional simulation results are described in the online supplementary material. PMID:21984854
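The flavor of a moment-based correction can be sketched in the simple linear case; this is the textbook correction that such estimators generalize, not the authors' exact method. With known per-observation error variances, the average error variance is subtracted from the denominator of the least-squares slope:

```python
# Method-of-moments attenuation correction under heteroscedastic covariate error.
import numpy as np

rng = np.random.default_rng(8)
n = 2000
x = rng.normal(0, 1, n)                      # true baseline summary statistic
sigma_u = rng.uniform(0.2, 0.8, n)           # known estimation-error SDs
w = x + rng.normal(0, sigma_u)               # error-prone observed covariate
y = 1.0 + 2.0 * x + rng.normal(0, 1, n)      # outcome, true slope = 2

naive = np.cov(w, y)[0, 1] / np.var(w, ddof=1)
# Subtract the average error variance from the denominator.
corrected = np.cov(w, y)[0, 1] / (np.var(w, ddof=1) - np.mean(sigma_u**2))
print(f"naive slope: {naive:.3f}, corrected slope: {corrected:.3f}")
```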
NASA Astrophysics Data System (ADS)
Wang, Lunche; Kisi, Ozgur; Zounemat-Kermani, Mohammad; Li, Hui
2017-01-01
Pan evaporation (Ep) plays an important role in agricultural water resources management. One of the basic challenges is modeling Ep using limited climatic parameters, because a number of factors affect the evaporation rate. This study investigated the abilities of six different soft computing methods, multi-layer perceptron (MLP), generalized regression neural network (GRNN), fuzzy genetic (FG), least square support vector machine (LSSVM), multivariate adaptive regression spline (MARS), and adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP), and two regression methods, multiple linear regression (MLR) and the Stephens and Stewart model (SS), in predicting monthly Ep. Long-term climatic data at various sites spanning a wide range of climates during 1961-2000 are used for model development and validation. The results showed that the models have different accuracies in different climates; the MLP model performed superior to the other models in predicting monthly Ep at most stations using local input combinations (for example, the MAE (mean absolute error), RMSE (root mean square error), and determination coefficient (R2) are 0.314 mm/day, 0.405 mm/day, and 0.988, respectively, for the HEB station), while the GRNN model performed better on the Tibetan Plateau (MAE, RMSE, and R2 of 0.459 mm/day, 0.592 mm/day, and 0.932, respectively). The accuracies of the above models ranked as: MLP, GRNN, LSSVM, FG, ANFIS-GP, MARS, and MLR. The overall results indicated that the soft computing techniques generally performed better than the regression methods, but the MLR and SS models may be preferred in some climatic zones over complex nonlinear models, for example at the BJ (Beijing), CQ (Chongqing), and HK (Haikou) stations. It can therefore be concluded that Ep can be successfully predicted using the above models in hydrological modeling studies.
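A sketch of the MLP benchmark with the paper's error metrics, on synthetic monthly records; the climatic inputs and their effects are invented, whereas real studies use long-term station data:

```python
# MLP regression for monthly pan evaporation on synthetic climatic inputs.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
n = 480                                     # 40 years of monthly records
temp = rng.uniform(-5, 30, n)
wind = rng.uniform(0, 8, n)
humidity = rng.uniform(20, 95, n)
ep = 0.12 * temp + 0.3 * wind - 0.02 * humidity + 3 + rng.normal(0, 0.3, n)

X = np.column_stack([temp, wind, humidity])
X_tr, X_te, y_tr, y_te = train_test_split(X, ep, random_state=0)
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                   random_state=0).fit(X_tr, y_tr)
pred = mlp.predict(X_te)
print(f"MAE={mean_absolute_error(y_te, pred):.3f} "
      f"RMSE={mean_squared_error(y_te, pred) ** 0.5:.3f} "
      f"R2={r2_score(y_te, pred):.3f}")
```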
Gruber, Susan; Logan, Roger W; Jarrín, Inmaculada; Monge, Susana; Hernán, Miguel A
2015-01-15
Inverse probability weights used to fit marginal structural models are typically estimated using logistic regression. However, a data-adaptive procedure may be able to better exploit information available in measured covariates. By combining predictions from multiple algorithms, ensemble learning offers an alternative to logistic regression modeling to further reduce bias in estimated marginal structural model parameters. We describe the application of two ensemble learning approaches to estimating stabilized weights: super learning (SL), an ensemble machine learning approach that relies on V-fold cross validation, and an ensemble learner (EL) that creates a single partition of the data into training and validation sets. Longitudinal data from two multicenter cohort studies in Spain (CoRIS and CoRIS-MD) were analyzed to estimate the mortality hazard ratio for initiation versus no initiation of combined antiretroviral therapy among HIV positive subjects. Both ensemble approaches produced hazard ratio estimates further away from the null, and with tighter confidence intervals, than logistic regression modeling. Computation time for EL was less than half that of SL. We conclude that ensemble learning using a library of diverse candidate algorithms offers an alternative to parametric modeling of inverse probability weights when fitting marginal structural models. With large datasets, EL provides a rich search over the solution space in less time than SL with comparable results. Copyright © 2014 John Wiley & Sons, Ltd.
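A hedged sketch of the general idea using scikit-learn's stacking (a single internal cross-validation scheme standing in for the paper's SL and EL variants): an ensemble replaces logistic regression in the treatment (denominator) model, and stabilized weights follow:

```python
# Ensemble-estimated stabilized inverse probability weights on synthetic data.
import numpy as np
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(10)
n = 1000
X = rng.normal(size=(n, 4))                       # measured covariates
p_treat = 1 / (1 + np.exp(-(X[:, 0] + 0.5 * X[:, 1] ** 2 - 1)))
A = rng.binomial(1, p_treat)                      # treatment initiation

ensemble = StackingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(n_estimators=200, random_state=0))],
    final_estimator=LogisticRegression(), cv=5)   # V-fold internal validation
ensemble.fit(X, A)

p_hat = ensemble.predict_proba(X)[:, 1]
p_marg = A.mean()                                 # numerator: marginal P(A=1)
sw = np.where(A == 1, p_marg / p_hat, (1 - p_marg) / (1 - p_hat))
print("stabilized weights: mean %.3f, max %.2f" % (sw.mean(), sw.max()))
```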
Factors That Influence Mandatory Child Abuse Reporting Attitudes of Pediatric Nurses in Korea.
Lee, In Sook; Kim, Kyoung Ja
This study aimed to assess knowledge of child abuse, awareness of child abuse reporting, and professionalism, and to identify factors that influence attitudes toward mandatory reporting, among a sample of pediatric nurses in Korea. One hundred sixteen pediatric nurses working at two university hospitals in Korea took part in the study and completed self-administered questionnaires. The data were analyzed using descriptive statistics, t tests, analysis of variance, Pearson correlation coefficients, and hierarchical regression analysis. Knowledge of child abuse, awareness of child abuse reporting, and attitudes toward mandatory reporting were low. Regarding nursing professionalism, social perceptions had the lowest mean score and nursing autonomy the highest. Attitudes toward mandatory reporting significantly correlated with professionalism. In the hierarchical regression model, the influences of nursing autonomy and intention to report child abuse on attitudes toward mandatory reporting were statistically significant (F = 2.176, p = .013), explaining 32% of the variation in attitudes toward mandatory reporting. The results of this study could be used to improve systems and policies addressing child abuse and to further develop reporting procedures for identifying children at risk of abuse, to ensure their protection as a professional responsibility.