Shin, Yoonseok
2015-01-01
Among the recent data mining techniques available, the boosting approach has attracted a great deal of attention because of its effective learning algorithm and strong bounds on its generalization performance. However, while the boosting approach has been actively utilized in other domains, it has yet to be used in regression problems within the construction domain, including cost estimation. Therefore, a boosting regression tree (BRT) is applied to cost estimation at the early stage of a construction project to examine the applicability of the boosting approach to a regression problem within the construction domain. To evaluate the performance of the BRT model, it was compared with a neural network (NN) model, which has been proven to perform well in cost estimation domains. The BRT model showed results similar to those of the NN model on 234 actual cost datasets of building construction projects. In addition, the BRT model can provide additional information, such as the importance plot and structure model, which can help estimators understand the decision-making process. Consequently, the boosting approach has potential applicability to preliminary cost estimation in building construction projects. PMID:26339227
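A minimal sketch of the BRT-versus-NN comparison on synthetic data (the 234-project dataset is not available, so the four predictors and their coefficients below are invented stand-ins); it also pulls out the feature importances that underlie the BRT "importance plot":

```python
# Boosted regression trees vs. a neural network for cost estimation,
# illustrated on synthetic data standing in for the 234 projects.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 234
X = rng.uniform(size=(n, 4))         # e.g. floor area, storeys, grade, duration
y = 100 * X[:, 0] + 30 * X[:, 1] + 10 * rng.normal(size=n)  # hypothetical cost

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
brt = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
nn = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000,
                  random_state=0).fit(X_tr, y_tr)

brt_r2 = brt.score(X_te, y_te)       # held-out R^2 for the boosted trees
nn_r2 = nn.score(X_te, y_te)         # held-out R^2 for the neural net
importances = brt.feature_importances_   # basis of the "importance plot"
```

With a dominant first predictor, the importance vector concentrates on feature 0, which is the kind of interpretable output the abstract argues can support estimators.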
NASA Astrophysics Data System (ADS)
Bae, Gihyun; Huh, Hoon; Park, Sungho
This paper presents a regression model for the lightweight and crashworthiness-enhancement design of automotive parts in a frontal car crash. The ULSAB-AVC model is employed for the crash analysis, and effective parts are selected based on the amount of energy absorbed during the crash. Finite element analyses are carried out for the designated design cases in order to investigate the crashworthiness and weight according to the material and thickness of the main energy-absorbing parts. Based on the simulation results, a regression analysis is performed to construct a regression model used for the lightweight and crashworthiness-enhancement design of automotive parts. An example of weight reduction of the main energy-absorbing parts demonstrates the validity of the constructed regression model.
Oil and gas pipeline construction cost analysis and developing regression models for cost estimation
NASA Astrophysics Data System (ADS)
Thaduri, Ravi Kiran
In this study, cost data for 180 pipelines and 136 compressor stations have been analyzed. On the basis of the distribution analysis, regression models have been developed. Material, labor, right-of-way (ROW), and miscellaneous costs make up the total cost of a pipeline construction. The pipelines are analyzed based on different pipeline lengths, diameter, location, pipeline volume and year of completion. In a pipeline construction, labor costs dominate the total costs with a share of about 40%. Multiple non-linear regression models are developed to estimate the component costs of pipelines for various cross-sectional areas, lengths and locations. The compressor stations are analyzed based on capacity, year of completion and location. Unlike the pipeline costs, material costs dominate the total costs in the construction of compressor stations, with an average share of about 50.6%. Land costs have very little influence on the total costs. Similar regression models are developed to estimate the component costs of compressor stations for various capacities and locations.
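One common form for such non-linear cost models is multiplicative, cost = a * L^b * D^c, which becomes linear after a log transform. The sketch below fits it to synthetic pipeline data (the exponents, coefficient, and noise level are invented, not the study's):

```python
# Fit a multiplicative pipeline-cost model, cost = a * L^b * D^c,
# by ordinary least squares in log space; data are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n = 180                                    # same count as the analyzed pipelines
length = rng.uniform(10, 300, n)           # miles (illustrative range)
diameter = rng.uniform(8, 42, n)           # inches (illustrative range)
cost = 50.0 * length * diameter**1.5 * rng.lognormal(0.0, 0.05, n)

# log(cost) = log(a) + b*log(L) + c*log(D) is linear in (log a, b, c).
A = np.column_stack([np.ones(n), np.log(length), np.log(diameter)])
coef, *_ = np.linalg.lstsq(A, np.log(cost), rcond=None)
a_hat, b_hat, c_hat = np.exp(coef[0]), coef[1], coef[2]
```

The recovered exponents land close to the generating values (b = 1, c = 1.5), which is why the log-linear trick is a standard first pass before fully non-linear fitting.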
Engoren, Milo; Habib, Robert H; Dooner, John J; Schwann, Thomas A
2013-08-01
As many as 14% of patients undergoing coronary artery bypass surgery are readmitted within 30 days. Readmission is usually the result of morbidity and may lead to death. The purpose of this study is to develop and compare statistical and genetic programming models to predict readmission. Patients were divided into separate Construction and Validation populations. Using 88 variables, logistic regression, genetic programs, and artificial neural nets were used to develop predictive models. Models were first constructed and tested on the Construction population, then validated on the Validation population. Areas under the receiver operator characteristic curves (AU ROC) were used to compare the models. Two hundred and two patients (7.6%) in the 2,644-patient Construction group and 216 (8.0%) of the 2,711-patient Validation group were readmitted within 30 days of CABG surgery. Logistic regression predicted readmission with AU ROC = .675 ± .021 in the Construction group. Genetic programs significantly improved the accuracy (AU ROC = .767 ± .001, p < .001). Artificial neural nets were less accurate, with AU ROC = .597 ± .001 in the Construction group. The predictive accuracy of all three techniques fell in the Validation group. However, the accuracy of genetic programming (AU ROC = .654 ± .001) was still slightly, though not statistically significantly, better than that of logistic regression (AU ROC = .644 ± .020, p = .61). Genetic programming and logistic regression provide alternative methods to predict readmission that are similarly accurate.
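The construct-then-validate workflow with AU ROC can be sketched in a few lines; here the 88 predictors are reduced to five synthetic stand-ins and the event rate is tuned to roughly 8%, mirroring the cohort sizes only:

```python
# Fit logistic regression on a "Construction" sample, then measure
# discrimination (AU ROC) on a held-out "Validation" sample.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 2644 + 2711
X = rng.normal(size=(n, 5))                    # stand-ins for the 88 variables
logit = -2.5 + 0.8 * X[:, 0] + 0.4 * X[:, 1]   # gives roughly an 8% event rate
y = rng.uniform(size=n) < 1 / (1 + np.exp(-logit))

X_c, y_c = X[:2644], y[:2644]                  # Construction set
X_v, y_v = X[2644:], y[2644:]                  # Validation set
clf = LogisticRegression().fit(X_c, y_c)
auc_c = roc_auc_score(y_c, clf.predict_proba(X_c)[:, 1])
auc_v = roc_auc_score(y_v, clf.predict_proba(X_v)[:, 1])
```

As in the study, the validation AUC is the honest figure: it is computed on patients the model never saw.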
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-25
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method, based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods, including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and the root-mean-square error of cross-validation (RMSECV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV calculated for the models constructed by the other methods was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with an RPLS model constructed from e-tongue data. PMID:26821026
Scoring and staging systems using Cox linear regression modeling and recursive partitioning.
Lee, J W; Um, S H; Lee, J B; Mun, J; Cho, H
2006-01-01
Scoring and staging systems are used to determine the order and class of data according to predictors. Systems used for medical data, such as the Child-Turcotte-Pugh scoring and staging systems for ordering and classifying patients with liver disease, are often derived strictly from physicians' experience and intuition. We construct objective and data-based scoring/staging systems using statistical methods. We consider Cox linear regression modeling and recursive partitioning techniques for censored survival data. In particular, to obtain a target number of stages we propose cross-validation and amalgamation algorithms. We also propose an algorithm for constructing scoring and staging systems by integrating local Cox linear regression models into recursive partitioning, so that we can retain the merits of both methods such as superior predictive accuracy, ease of use, and detection of interactions between predictors. The staging system construction algorithms are compared by cross-validation evaluation of real data. The data-based cross-validation comparison shows that Cox linear regression modeling is somewhat better than recursive partitioning when there are only continuous predictors, while recursive partitioning is better when there are significant categorical predictors. The proposed local Cox linear recursive partitioning has better predictive accuracy than Cox linear modeling and simple recursive partitioning. This study indicates that integrating local linear modeling into recursive partitioning can significantly improve prediction accuracy in constructing scoring and staging systems.
Predicting School Enrollments Using the Modified Regression Technique.
ERIC Educational Resources Information Center
Grip, Richard S.; Young, John W.
This report is based on a study in which a regression model was constructed to increase accuracy in enrollment predictions. A model, known as the Modified Regression Technique (MRT), was used to examine K-12 enrollment over the past 20 years in 2 New Jersey school districts of similar size and ethnicity. To test the model's accuracy, MRT was…
Hemmila, April; McGill, Jim; Ritter, David
2008-03-01
To determine whether changes in fingerprint infrared spectra that vary linearly with age can be found, a partial least squares (PLS1) regression of 155 fingerprint infrared spectra against the person's age was constructed. The regression produced a linear model of age as a function of spectrum with a root mean square error of calibration of less than 4 years, showing an inflection at about 25 years of age. The spectral ranges emphasized by the regression do not correspond to the highest-concentration constituents of the fingerprints. Separate linear regression models for old and young people can be constructed with even more statistical rigor. The success of the regression demonstrates that a combination of constituents can be found that changes linearly with age, with a significant shift around puberty.
Parisi Kern, Andrea; Ferreira Dias, Michele; Piva Kulakowski, Marlova; Paulo Gomes, Luciana
2015-05-01
Reducing construction waste is becoming a key environmental issue in the construction industry. The quantification of waste generation rates in the construction sector is an invaluable management tool in supporting mitigation actions. However, the quantification of waste can be a difficult process because of the specific characteristics and the wide range of materials used in different construction projects. Large variations are observed in the methods used to predict the amount of waste generated because of the range of variables involved in construction processes and the different contexts in which these methods are employed. This paper proposes a statistical model to determine the amount of waste generated in the construction of high-rise buildings by assessing the influence of the design process and production system, often mentioned as the major culprits behind the generation of waste in construction. Multiple regression was used to conduct a case study based on multiple sources of data from eighteen residential buildings. The resulting statistical model relates the dependent variable (the amount of waste generated) to independent variables associated with the design and the production system used. The best regression model obtained from the sample data had an adjusted R² value of 0.694, meaning that it explains approximately 69% of the variance in waste generation for similar constructions. Most independent variables showed a low determination coefficient when assessed in isolation, which emphasizes the importance of assessing their joint influence on the response (dependent) variable. Copyright © 2015 Elsevier Ltd. All rights reserved.
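The adjusted-R² figure can be checked directly from its definition; note the abstract does not state the number of predictors p, so the value of p below is an assumption chosen to illustrate the arithmetic:

```python
# Adjusted R-squared discounts plain R-squared for sample size n and
# number of predictors p: adj = 1 - (1 - R2) * (n - 1) / (n - p - 1).
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With n = 18 buildings and, hypothetically, p = 4 predictors, a raw R2
# of about 0.766 yields the reported adjusted value of 0.694.
raw = 0.766
adj = adjusted_r2(raw, n=18, p=4)
```

The small sample (eighteen buildings) is exactly the situation where the adjustment matters: each extra predictor costs noticeably in adjusted R².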
An Intelligent Decision Support System for Workforce Forecast
2011-01-01
…ARIMA) model to forecast the demand for construction skills in Hong Kong. This model was based…
Decision Trees; ARIMA; Rule-Based Forecasting; Segmentation Forecasting; Regression Analysis; Simulation Modeling; Input-Output Models; LP and NLP; Markovian…
…data • When results are needed as a set of easily interpretable rules. 4.1.4 ARIMA: Auto-regressive, integrated, moving-average (ARIMA) models
Construction of mathematical model for measuring material concentration by colorimetric method
NASA Astrophysics Data System (ADS)
Liu, Bing; Gao, Lingceng; Yu, Kairong; Tan, Xianghua
2018-06-01
This paper uses multiple linear regression to analyze the data of Problem C of the 2017 mathematical modeling contest. First, we established regression models for the concentrations of five substances, but only the regression model for the concentration of urea in milk passed the significance test. The regression model established from the second data set passed the significance test, but it exhibited serious multicollinearity. We improved the model by principal component analysis. The improved model is used to control the system so that the concentration of a material can be measured by the direct colorimetric method.
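The multicollinearity fix described (principal component analysis before regression) is principal component regression; a minimal sketch on synthetic, deliberately collinear absorbance channels (the contest data themselves are not reproduced here):

```python
# Principal component regression: project collinear predictors onto a
# few principal components, then run ordinary least squares on the scores.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
n = 60
base = rng.normal(size=n)
# Two nearly collinear "absorbance" channels plus one independent channel.
X = np.column_stack([base,
                     base + 0.01 * rng.normal(size=n),
                     rng.normal(size=n)])
conc = 1.0 + 2.0 * base + 0.5 * X[:, 2] + 0.1 * rng.normal(size=n)

pcr = make_pipeline(PCA(n_components=2), LinearRegression()).fit(X, conc)
r2 = pcr.score(X, conc)
```

Dropping the near-duplicate direction removes the unstable coefficients that plain multiple regression would produce, while keeping essentially all the predictive signal.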
Chen, Liang-Hsuan; Hsueh, Chan-Ching
2007-06-01
Fuzzy regression models are useful for investigating the relationship between explanatory and response variables with fuzzy observations. Different from previous studies, this correspondence proposes a mathematical programming method to construct a fuzzy regression model based on a distance criterion. The objective of the mathematical programming is to minimize the sum of distances between the estimated and observed responses on the X axis, such that the constructed fuzzy regression model has the minimal total estimation error in distance. Only a few alpha-cuts of the fuzzy observations are needed as inputs to the mathematical programming model; therefore, the applications are not restricted to triangular fuzzy numbers. Three examples adopted from previous studies, and a larger example modified from the crisp case, are used to illustrate the performance of the proposed approach. The results indicate that the proposed model performs better than those in the previous studies based on either the distance criterion or Kim and Bishu's criterion. In addition, the efficiency and effectiveness of solving the larger example with the proposed model are also satisfactory.
Fei, Yang; Hu, Jian; Gao, Kun; Tu, Jianfeng; Li, Wei-Qin; Wang, Wei
2017-06-01
To construct a radial basis function (RBF) artificial neural network (ANN) model to predict the incidence of acute pancreatitis (AP)-induced portal vein thrombosis (PVT). The analysis included 353 patients with AP who had been admitted between January 2011 and December 2015. An RBF ANN model and a logistic regression model were each constructed based on eleven factors relevant to AP. Statistical indices were used to evaluate the predictive value of the two models. The sensitivity, specificity, positive predictive value, negative predictive value and accuracy of the RBF ANN model for predicting PVT were 73.3%, 91.4%, 68.8%, 93.0% and 87.7%, respectively. There were significant differences between the RBF ANN and logistic regression models in these parameters (P<0.05). In addition, a comparison of the areas under the receiver operating characteristic curves of the two models showed a statistically significant difference (P<0.05). The RBF ANN model is more likely than the logistic regression model to predict the occurrence of PVT induced by AP. D-dimer, AMY, Hct and PT were important predictive factors for AP-induced PVT. Copyright © 2017 Elsevier Inc. All rights reserved.
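An RBF network is a Gaussian hidden layer over learned centres with a linear read-out; the sketch below builds one from k-means centres plus logistic regression on synthetic data (the cohort size is kept, but the four features, the boundary, and all hyperparameters are invented):

```python
# A minimal RBF-network classifier: k-means centres as the hidden layer,
# Gaussian activations, and a logistic read-out layer.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 353                                        # same cohort size as the study
X = rng.normal(size=(n, 4))                    # stand-ins for the eleven factors
y = ((X ** 2).sum(axis=1) < 3.4).astype(int)   # a non-linear (radial) boundary

centres = KMeans(n_clusters=10, n_init=10,
                 random_state=0).fit(X).cluster_centers_

def rbf_features(X, centres, gamma=0.5):
    """Gaussian hidden-layer activations for each sample/centre pair."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

clf = LogisticRegression(max_iter=1000).fit(rbf_features(X, centres), y)
acc = clf.score(rbf_features(X, centres), y)
```

On a radial boundary like this one, the RBF hidden layer gives the linear read-out something logistic regression on raw features cannot see, which is the intuition behind the study's comparison.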
Quantile regression models of animal habitat relationships
Cade, Brian S.
2003-01-01
Typically, not all of the factors that limit an organism are measured and included in statistical models used to investigate relationships with their environment. If important unmeasured variables interact multiplicatively with the measured variables, the statistical models often will have heterogeneous response distributions with unequal variances. Quantile regression is an approach for estimating the conditional quantiles of a response variable distribution in the linear model, providing a more complete view of possible causal relationships between variables in ecological processes. Chapter 1 introduces quantile regression and discusses the ordering characteristics, interval nature, sampling variation, weighting, and interpretation of estimates for homogeneous and heterogeneous regression models. Chapter 2 evaluates the performance of quantile rankscore tests used for hypothesis testing and for constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). A permutation F test maintained better Type I errors than the Chi-square T test for models with smaller n, a greater number of parameters p, and more extreme quantiles τ. Both versions of the test required weighting to maintain correct Type I errors when there was heterogeneity under the alternative model. An example application related trout densities to stream channel width:depth. Chapter 3 evaluates a drop-in-dispersion, F-ratio-like permutation test for hypothesis testing and for constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1). Chapter 4 simulates from a large (N = 10,000) finite population representing grid areas on a landscape to demonstrate various forms of hidden bias that might occur when the effect of a measured habitat variable on some animal is confounded with the effect of another unmeasured variable (spatially and not spatially structured).
Depending on whether interactions of the measured habitat and unmeasured variable were negative (interference interactions) or positive (facilitation interactions), either upper (τ > 0.5) or lower (τ < 0.5) quantile regression parameters were less biased than mean rate parameters. Sampling (n = 20 - 300) simulations demonstrated that confidence intervals constructed by inverting rankscore tests provided valid coverage of these biased parameters. Quantile regression was used to estimate effects of physical habitat resources on a bivalve mussel (Macomona liliana) in a New Zealand harbor by modeling the spatial trend surface as a cubic polynomial of location coordinates.
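Linear quantile regression minimises the pinball (check) loss rather than squared error; a sketch on synthetic, deliberately heterogeneous "density vs. width:depth" data (all numbers invented) shows the upper-quantile slope exceeding the median slope, the signature of a limiting-factor relationship:

```python
# Linear quantile regression by direct minimisation of the pinball loss,
# for tau = 0.9 and tau = 0.5, on heteroscedastic synthetic data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n = 300
width_depth = rng.uniform(5, 50, n)
# The spread of the response grows with the predictor, as when an
# unmeasured limiting factor interacts multiplicatively with it.
density = 10 + 0.5 * width_depth + width_depth * rng.uniform(0, 1, n)

def pinball(beta, x, y, tau):
    """Mean check loss for the line y = beta[0] + beta[1] * x."""
    resid = y - (beta[0] + beta[1] * x)
    return np.mean(np.where(resid >= 0, tau * resid, (tau - 1) * resid))

b90 = minimize(pinball, x0=[8.0, 1.0], args=(width_depth, density, 0.9),
               method="Nelder-Mead").x
b50 = minimize(pinball, x0=[8.0, 1.0], args=(width_depth, density, 0.5),
               method="Nelder-Mead").x
```

Here the true conditional quantile slope is 0.5 + tau, so the 0.9-quantile fit is steeper than the median fit, which an ordinary mean regression would average away.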
Workers' compensation costs among construction workers: a robust regression analysis.
Friedman, Lee S; Forst, Linda S
2009-11-01
Workers' compensation data are an important source for evaluating costs associated with construction injuries. We describe the characteristics of injured construction workers filing claims in Illinois between 2000 and 2005 and the factors associated with compensation costs, using a robust regression model. In the final multivariable model, the cumulative percent temporary and permanent disability, a measure of injury severity, explained 38.7% of the variance in cost. Attorney costs explained only 0.3% of the variance of the dependent variable. The model used in this study clearly indicated that percent disability was the most important determinant of cost, although the method and uniformity of percent impairment allocation could be better elucidated. There is a need to integrate analytical methods that are suitable for skewed data when analyzing claim costs.
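Why a robust fit for skewed claim costs: a handful of extreme claims can dominate ordinary least squares, while a bounded-influence (Huber) fit stays close to the bulk of the data. A sketch on synthetic claims (the disability-to-cost slope and outlier magnitudes are invented, and Huber loss is one of several robust choices):

```python
# Robust (Huber) vs. ordinary least-squares fit on right-skewed
# claim costs with a few extreme claims mixed in; data are synthetic.
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(7)
n = 500
pct_disability = rng.uniform(0, 100, n)
cost = 1000 + 300 * pct_disability + rng.normal(0, 500, n)
cost[:25] += rng.lognormal(11, 1, 25)      # a handful of extreme claims

X = pct_disability.reshape(-1, 1)
ols_slope = LinearRegression().fit(X, cost).coef_[0]
huber_slope = HuberRegressor().fit(X, cost).coef_[0]
```

The Huber slope stays near the generating value of 300 per percent disability despite the heavy right tail, which is the property that makes robust regression suitable for cost data like these.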
NASA Astrophysics Data System (ADS)
Arabzadeh, Vida; Niaki, S. T. A.; Arabzadeh, Vahid
2017-10-01
One of the most important processes in the early stages of construction projects is to estimate the cost involved. This process involves a wide range of uncertainties, which make it a challenging task. Because of unknown issues, using the experience of experts or looking for similar cases are the conventional ways to deal with cost estimation. The current study presents data-driven methods for cost estimation based on the application of artificial neural network (ANN) and regression models. The learning algorithms of the ANN are Levenberg-Marquardt and Bayesian regularization. Moreover, the regression models are hybridized with a genetic algorithm to obtain better estimates of the coefficients. The methods are applied to a real case, where the input parameters of the models are assigned based on the key issues involved in a spherical tank construction. The results reveal that a high correlation between the estimated cost and the real cost exists, and that both ANNs perform better than the hybridized regression models. In addition, the ANN with the Levenberg-Marquardt learning algorithm (LMNN) obtains a better estimation than the ANN with the Bayesian regularization learning algorithm (BRNN). The correlation between real data and estimated values is over 90%, while the mean square error is around 0.4. The proposed LMNN model can be effective in reducing uncertainty and complexity in the early stages of a construction project.
Vesicular stomatitis forecasting based on Google Trends
Lu, Yi; Zhou, GuangYa; Chen, Qin
2018-01-01
Background Vesicular stomatitis (VS) is an important viral disease of livestock. The main feature of VS is irregular blisters on the lips, tongue, oral mucosa, hoof crown and nipple. Humans can also be infected with vesicular stomatitis and develop meningitis. This study analyses the 2014 American VS outbreaks in order to accurately predict vesicular stomatitis outbreak trends. Methods American VS outbreak data were collected from the OIE. The data for VS keywords were obtained by inputting 24 disease-related keywords into Google Trends. After calculating the Pearson and Spearman correlation coefficients, a relationship was found between outbreaks and keywords derived from Google Trends. Finally, the predictive model was constructed based on qualitative classification and quantitative regression. Results For the regression model, the Pearson correlation coefficients between the predicted outbreaks and actual outbreaks are 0.953 and 0.948, respectively. For the qualitative classification model, we constructed five classification predictive models and chose the best as the result. The SN (sensitivity), SP (specificity) and ACC (prediction accuracy) values of the best classification predictive model are 78.52%, 72.5% and 77.14%, respectively. Conclusion This study applied Google search data to construct a qualitative classification model and a quantitative regression model. The results show that the method is effective and that these two models produce more accurate forecasts. PMID:29385198
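The three reported screening metrics come straight from a 2x2 confusion table; the counts below are not given in the abstract and were chosen here purely to reproduce the reported percentages:

```python
# Sensitivity (SN), specificity (SP) and accuracy (ACC) from a 2x2
# confusion table; the counts are illustrative, not the study's.
def screening_metrics(tp, fn, tn, fp):
    sn = tp / (tp + fn)                   # sensitivity
    sp = tn / (tn + fp)                   # specificity
    acc = (tp + tn) / (tp + fn + tn + fp) # overall accuracy
    return sn, sp, acc

# These hypothetical counts yield SN = 78.52%, SP = 72.5%, ACC = 77.14%.
sn, sp, acc = screening_metrics(tp=106, fn=29, tn=29, fp=11)
```

Seeing the formulas side by side also makes clear that ACC is a prevalence-weighted blend of SN and SP, so it sits between them here.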
Nishii, Takashi; Genkawa, Takuma; Watari, Masahiro; Ozaki, Yukihiro
2012-01-01
A new selection procedure of an informative near-infrared (NIR) region for regression model building is proposed that uses an online NIR/mid-infrared (mid-IR) dual-region spectrometer in conjunction with two-dimensional (2D) NIR/mid-IR heterospectral correlation spectroscopy. In this procedure, both NIR and mid-IR spectra of a liquid sample are acquired sequentially during a reaction process using the NIR/mid-IR dual-region spectrometer; the 2D NIR/mid-IR heterospectral correlation spectrum is subsequently calculated from the obtained spectral data set. From the calculated 2D spectrum, a NIR region is selected that includes bands of high positive correlation intensity with mid-IR bands assigned to the analyte, and used for the construction of a regression model. To evaluate the performance of this procedure, a partial least-squares (PLS) regression model of the ethanol concentration in a fermentation process was constructed. During fermentation, NIR/mid-IR spectra in the 10000 - 1200 cm(-1) region were acquired every 3 min, and a 2D NIR/mid-IR heterospectral correlation spectrum was calculated to investigate the correlation intensity between the NIR and mid-IR bands. NIR regions that include bands at 4343, 4416, 5778, 5904, and 5955 cm(-1), which result from the combinations and overtones of the C-H group of ethanol, were selected for use in the PLS regression models, by taking the correlation intensity of a mid-IR band at 2985 cm(-1) arising from the CH(3) asymmetric stretching vibration mode of ethanol as a reference. The predicted results indicate that the ethanol concentrations calculated from the PLS regression models fit well to those obtained by high-performance liquid chromatography. Thus, it can be concluded that the selection procedure using the NIR/mid-IR dual-region spectrometer combined with 2D NIR/mid-IR heterospectral correlation spectroscopy is a powerful method for the construction of a reliable regression model.
The use of cognitive ability measures as explanatory variables in regression analysis.
Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J
2012-12-01
Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual's wage, or a decision such as an individual's education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals' responses to test items, can be safely used in regression analysis. We examine the problems that can arise and suggest that an alternative approach, a "mixed effects structural equations" (MESE) model, may be more appropriate in many circumstances.
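The central problem can be demonstrated in a few lines: regressing on a noisy test score instead of the latent ability attenuates the coefficient toward zero by the score's reliability. A simulation sketch (all coefficients invented; this shows the bias MESE-style models correct, not the MESE model itself):

```python
# Measurement-error attenuation: the OLS slope on a noisy proxy is the
# true slope shrunk by the proxy's reliability (here 0.5).
import numpy as np

rng = np.random.default_rng(8)
n = 20000
ability = rng.normal(size=n)                 # latent construct (unobserved)
score = ability + rng.normal(0, 1.0, n)      # test score, reliability = 0.5
wage = 2.0 + 1.0 * ability + rng.normal(0, 0.5, n)

def ols_slope(x, y):
    """Simple-regression slope of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

true_slope = ols_slope(ability, wage)        # close to the true value 1.0
naive_slope = ols_slope(score, wage)         # close to 0.5 = reliability * 1.0
```

The naive slope understates the ability-wage relationship by half, exactly the reliability ratio, which is why treating the score as if it were the construct is unsafe.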
Chan, Siew Foong; Deeks, Jonathan J; Macaskill, Petra; Irwig, Les
2008-01-01
To compare three predictive models based on logistic regression for estimating adjusted likelihood ratios allowing for interdependency between diagnostic variables (tests). This study was a review of the theoretical basis, assumptions, and limitations of published models, and a statistical extension of methods and application to a case study of the diagnosis of obstructive airways disease based on history and clinical examination. Albert's method includes an offset term to estimate an adjusted likelihood ratio for combinations of tests. The Spiegelhalter and Knill-Jones method uses the unadjusted likelihood ratio for each test as a predictor and computes shrinkage factors to allow for interdependence. Knottnerus' method differs from the other methods because it requires sequencing of tests, which limits its application to situations where there are few tests and substantial data. Although parameter estimates differed between the models, predicted "posttest" probabilities were generally similar. Construction of predictive models using logistic regression is preferred to the independence Bayes' approach when it is important to adjust for dependency of tests. Methods to estimate adjusted likelihood ratios from predictive models should be considered in preference to a standard logistic regression model, to facilitate ease of interpretation and application. Albert's method provides the most straightforward approach.
Yu, Xu; Lin, Jun-Yu; Jiang, Feng; Du, Jun-Wei; Han, Ji-Zhong
2018-01-01
Cross-domain collaborative filtering (CDCF) solves the sparsity problem by transferring rating knowledge from auxiliary domains. Obviously, different auxiliary domains have different importance to the target domain. However, previous works cannot effectively evaluate the significance of different auxiliary domains. To overcome this drawback, we propose a cross-domain collaborative filtering algorithm based on Feature Construction and Locally Weighted Linear Regression (FCLWLR). We first construct features in different domains and use these features to represent different auxiliary domains. Thus the weight computation across different domains can be converted into weight computation across different features. Then we combine the features in the target domain and in the auxiliary domains together and convert the cross-domain recommendation problem into a regression problem. Finally, we employ a Locally Weighted Linear Regression (LWLR) model to solve the regression problem. As LWLR is a nonparametric regression method, it can effectively avoid the underfitting or overfitting problems that occur in parametric regression methods. We conduct extensive experiments to show that the proposed FCLWLR algorithm is effective in addressing the data sparsity problem by transferring useful knowledge from the auxiliary domains, as compared to many state-of-the-art single-domain or cross-domain CF methods. PMID:29623088
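The LWLR building block is a weighted least-squares fit re-solved at every query point, with weights from a kernel centred there; a one-dimensional sketch (the bandwidth, kernel, and data are illustrative, not the paper's recommendation setup):

```python
# Locally weighted linear regression: a Gaussian kernel assigns weights
# around the query point x0, then a weighted least-squares line is fit.
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(0, 6, 200)
y = np.sin(x) + 0.1 * rng.normal(size=200)   # non-linear target + noise

def lwlr_predict(x0, x, y, bandwidth=0.3):
    """Prediction at x0 from a locally (kernel-)weighted linear fit."""
    w = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(w)
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
    return beta[0] + beta[1] * x0

pred = lwlr_predict(np.pi / 2, x, y)         # true value: sin(pi/2) = 1
```

Because the fit is re-solved locally, no single global parametric form is imposed, which is the nonparametric flexibility the abstract credits for avoiding under- and overfitting.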
System dynamic modeling: an alternative method for budgeting.
Srijariya, Witsanuchai; Riewpaiboon, Arthorn; Chaikledkaew, Usa
2008-03-01
To construct, validate, and simulate a system dynamic financial model and compare it against the conventional method. The study was a cross-sectional analysis of secondary data retrieved from the National Health Security Office (NHSO) for fiscal year 2004. The sample consisted of all emergency patients who received emergency services outside their registered hospital-catchment areas. The dependent variable was the amount of money reimbursed. Two types of model were constructed: a system dynamic model built with the STELLA software, and a multiple linear regression model. The outputs of the two methods were compared. The study covered 284,716 patients from various levels of providers. The system dynamic model was capable of producing various types of output, for example financial and graphical analyses. For the regression analysis, statistically significant predictors were service type (outpatient or inpatient), operating procedures, length of stay, illness type (accident or not), hospital characteristics, age, and hospital location (adjusted R² = 0.74). The total budget estimated by the system dynamic model and the regression model was US$12,159,614.38 and US$7,301,217.18, respectively, whereas the actual NHSO reimbursement cost was US$12,840,805.69. The study illustrates that the system dynamic model is a useful financial management tool, although it is not easy to construct. The model is not only more accurate in prediction but also more capable of analyzing large and complex real-world situations than the conventional method.
Future climate data from RCP 4.5 and occurrence of malaria in Korea.
Kwak, Jaewon; Noh, Huiseong; Kim, Soojun; Singh, Vijay P; Hong, Seung Jin; Kim, Duckgil; Lee, Keonhaeng; Kang, Narae; Kim, Hung Soo
2014-10-15
Since its reappearance at the Military Demarcation Line in 1993, malaria has been occurring annually in Korea. Malaria is regarded as a third grade nationally notifiable disease susceptible to climate change. The objective of this study is to quantify the effect of climatic factors on the occurrence of malaria in Korea and construct a malaria occurrence model for predicting the future trend of malaria under the influence of climate change. Using data from 2001-2011, the effect of time lag between malaria occurrence and mean temperature, relative humidity and total precipitation was investigated using spectral analysis. Also, a principal component regression model was constructed, considering multicollinearity. Future climate data, generated from the RCP 4.5 climate change scenario and the CNCM3 climate model, were applied to the constructed regression model to simulate future malaria occurrence and analyze the trend of occurrence. Results show an increase in the occurrence of malaria and the shortening of the annual time of occurrence in the future. PMID:25321875
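Principal component regression of the kind used here can be sketched in a few lines: project the centred predictors onto their leading principal components, then run ordinary least squares on the scores. The example below uses plain NumPy on synthetic data; it is a generic illustration, not the paper's fitted malaria model.

```python
import numpy as np

def pcr_fit_predict(X, y, k):
    """Principal component regression: regress y on the top-k principal
    component scores of the centred predictor matrix, which sidesteps
    the multicollinearity among the original columns."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                          # component scores
    Zb = np.hstack([np.ones((len(Z), 1)), Z])  # add intercept
    beta, *_ = np.linalg.lstsq(Zb, y, rcond=None)
    return Zb @ beta                           # in-sample fit
```

Because the component scores are orthogonal by construction, the regression coefficients are stable even when the original climate variables (e.g. temperature and humidity) are strongly correlated.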
NASA Astrophysics Data System (ADS)
Mitra, Ashis; Majumdar, Prabal Kumar; Bannerjee, Debamalya
2013-03-01
This paper presents a comparative analysis of two modeling methodologies for predicting the air permeability of plain woven handloom cotton fabrics. Four basic fabric construction parameters, namely ends per inch, picks per inch, warp count and weft count, were used as inputs to artificial neural network (ANN) and regression models. Of the four regression models tried, the interaction model showed very good prediction performance, with a mean absolute error of only 2.017%. However, the ANN models demonstrated superiority over the regression models in terms of both correlation coefficient and mean absolute error. The ANN model with 10 nodes in its single hidden layer showed very good correlation coefficients of 0.982 and 0.929 and mean absolute errors of only 0.923% and 2.043% for the training and testing data, respectively.
The use of cognitive ability measures as explanatory variables in regression analysis
Junker, Brian; Schofield, Lynne Steuerle; Taylor, Lowell J
2015-01-01
Cognitive ability measures are often taken as explanatory variables in regression analysis, e.g., as a factor affecting a market outcome such as an individual’s wage, or a decision such as an individual’s education acquisition. Cognitive ability is a latent construct; its true value is unobserved. Nonetheless, researchers often assume that a test score, constructed via standard psychometric practice from individuals’ responses to test items, can be safely used in regression analysis. We examine problems that can arise, and suggest that an alternative approach, a “mixed effects structural equations” (MESE) model, may be more appropriate in many circumstances. PMID:26998417
Oviedo de la Fuente, Manuel; Febrero-Bande, Manuel; Muñoz, María Pilar; Domínguez, Àngela
2018-01-01
This paper proposes a novel approach that uses meteorological information to predict the incidence of influenza in Galicia (Spain). It extends Generalized Least Squares (GLS) methods in the multivariate framework to functional regression models with dependent errors. Such models are useful when the recent history of influenza incidence is unavailable (for instance, because of delays in communication with health informants) and the prediction must be constructed by correcting the temporal dependence of the residuals and using more accessible variables. A simulation study shows that the GLS estimators yield better estimates of the parameters of the regression model than the classical models do. They obtain extremely good results from the predictive point of view and are competitive with the classical time series approach for the incidence of influenza. An iterative version of the GLS estimator (called iGLS) is also proposed, which can help to model complicated dependence structures. For constructing the model, the distance correlation measure [Formula: see text] was employed to select relevant information for predicting the influenza rate, mixing multivariate and functional variables. Such models are extremely useful to health managers in allocating resources in advance to manage influenza epidemics.
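The GLS idea can be sketched for the simplest multivariate case with a known AR(1) error covariance; the functional-regression and distance-correlation machinery of the paper is beyond this illustration, and `rho` here is an assumed, known autocorrelation coefficient rather than one estimated from residuals.

```python
import numpy as np

def gls_estimate(X, y, rho):
    """Generalized least squares under AR(1) errors with known coefficient rho:
    beta_hat = (X' S^-1 X)^-1 X' S^-1 y, where S[i, j] = rho^|i-j|."""
    n = len(y)
    # AR(1) error covariance (up to a scale factor, which cancels)
    S = rho ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    Si = np.linalg.inv(S)
    return np.linalg.solve(X.T @ Si @ X, X.T @ Si @ y)
```

An iterative variant (like the paper's iGLS) would alternate between estimating `rho` from the residuals and re-solving this weighted system until the estimates stabilise.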
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Gupta, Shikha
Ensemble learning based decision treeboost (DTB) and decision tree forest (DTF) models are introduced to establish a quantitative structure-toxicity relationship (QSTR) for predicting the toxicity of 1450 diverse chemicals. Eight non-quantum-mechanical molecular descriptors were derived. The structural diversity of the chemicals was evaluated using the Tanimoto similarity index. DTB and DTF models supplemented with stochastic gradient boosting and bagging algorithms were constructed for classification and function optimization problems using the toxicity end-point in T. pyriformis. Special attention was paid to the prediction ability and robustness of the models, investigated both in external and 10-fold cross-validation processes. On the complete data, the optimal DTB and DTF models rendered accuracies of 98.90% and 98.83% in two-category and 98.14% and 98.14% in four-category toxicity classifications. Both models further yielded classification accuracies of 100% on external T. pyriformis toxicity data. The constructed regression models (DTB and DTF) using five descriptors yielded correlation coefficients (R²) of 0.945 and 0.944 between the measured and predicted toxicities, with mean squared errors (MSEs) of 0.059 and 0.064 on the complete T. pyriformis data. The T. pyriformis regression models (DTB and DTF) applied to the external toxicity data sets yielded R² and MSE values of 0.637, 0.655; 0.534, 0.507 (marine bacteria) and 0.741, 0.691; 0.155, 0.173 (algae). The results suggest wide applicability of the inter-species models in predicting the toxicity of new chemicals for regulatory purposes. These approaches provide a useful strategy and robust tools for screening the ecotoxicological risk or environmental hazard potential of chemicals. - Graphical abstract: Importance of input variables in DTB and DTF classification models for (a) two-category, and (b) four-category toxicity intervals in T. pyriformis data.
Generalization and predictive abilities of the constructed (c) DTB and (d) DTF regression models to predict the T. pyriformis toxicity of diverse chemicals. - Highlights: • Ensemble learning (EL) based models constructed for toxicity prediction of chemicals • Predictive models used a few simple non-quantum mechanical molecular descriptors. • EL-based DTB/DTF models successfully discriminated toxic and non-toxic chemicals. • DTB/DTF regression models precisely predicted toxicity of chemicals in multi-species. • Proposed EL based models can be used as tool to predict toxicity of new chemicals.
NASA Astrophysics Data System (ADS)
Schaeben, Helmut; Semmler, Georg
2016-09-01
The objective of prospectivity modeling is to predict the conditional probability of the presence (T = 1) or absence (T = 0) of a target T given favorable or prohibitive predictors B, or to construct a two-class {0,1} classification of T. A special case of logistic regression called weights-of-evidence (WofE) is geologists' favorite method of prospectivity modeling because of its apparent simplicity. However, the numerical simplicity is deceiving, as it rests on the severe modeling assumption of joint conditional independence of all predictors given the target. General weights of evidence are explicitly introduced which are as simple to estimate as conventional weights, i.e., by counting, but do not require conditional independence. Complementary to the regression view is the classification view of prospectivity modeling. Boosting is the construction of a strong classifier from a set of weak classifiers; from the regression point of view it is closely related to logistic regression. Boost weights-of-evidence (BoostWofE) was introduced into prospectivity modeling to counterbalance violations of the assumption of conditional independence, even though relaxing modeling assumptions on weak classifiers was not the (initial) purpose of boosting. In the original publication of BoostWofE a fabricated dataset was used to "validate" the approach. Using the same fabricated dataset, it is shown here that BoostWofE cannot generally compensate for lacking conditional independence, whatever the processing order of the predictors. Thus the alleged features of BoostWofE are disproved by way of counterexamples, while theoretical findings are confirmed that logistic regression including interaction terms can exactly compensate for violations of joint conditional independence if the predictors are indicators.
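Conventional weights of evidence are indeed estimated by counting, as the abstract notes. A minimal sketch, with hypothetical binary arrays `b` (predictor) and `t` (target presence):

```python
import numpy as np

def weights_of_evidence(b, t):
    """Conventional weights of evidence estimated by counting:
    W+ = ln P(B=1|T=1)/P(B=1|T=0),  W- = ln P(B=0|T=1)/P(B=0|T=0)."""
    b, t = np.asarray(b), np.asarray(t)
    p_b1_t1 = b[t == 1].mean()          # P(B=1 | target present)
    p_b1_t0 = b[t == 0].mean()          # P(B=1 | target absent)
    w_plus = np.log(p_b1_t1 / p_b1_t0)
    w_minus = np.log((1 - p_b1_t1) / (1 - p_b1_t0))
    return w_plus, w_minus
```

Summing such weights over several predictors to update a log-odds is exactly where the joint conditional independence assumption enters; the paper's point is that this shortcut fails when the assumption is violated.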
Portfolio optimization for index tracking modelling in Malaysia stock market
NASA Astrophysics Data System (ADS)
Siew, Lam Weng; Jaaman, Saiful Hafizah; Ismail, Hamizun
2016-06-01
Index tracking is an investment strategy in portfolio management which aims to construct an optimal portfolio that generates a mean return similar to that of a stock market index without purchasing all of the stocks that make up the index. The objective of this paper is to construct an optimal portfolio using an optimization model that adopts a regression approach to tracking the benchmark stock market index return. In this study, the data consist of weekly prices of stocks in the Malaysian market index, the FTSE Bursa Malaysia Kuala Lumpur Composite Index (FBMKLCI), from January 2010 until December 2013. The results show that the optimal portfolio is able to track the FBMKLCI Index with a minimum tracking error of 1.0027% and a 0.0290% excess mean return over the mean return of the FBMKLCI Index. The significance of this study is the construction of an optimal portfolio, using an optimization model with a regression approach, that tracks the stock market index without purchasing all index components.
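A regression approach to index tracking can be sketched as an ordinary least-squares fit of the index return series on the asset return series, followed by a budget normalisation. This is a toy version only: a practical tracker would add no-short-selling and cardinality constraints, which turn the problem into a constrained optimization.

```python
import numpy as np

def tracking_weights(R, r_index):
    """Least-squares index tracking: choose portfolio weights w so that
    R @ w approximates the index return series, then rescale w to sum to 1."""
    w, *_ = np.linalg.lstsq(R, r_index, rcond=None)
    return w / w.sum()                   # fully-invested normalisation
```

The tracking error then falls out as the standard deviation of the residual series `R @ w - r_index`, which is the quantity the study minimises.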
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.
2003-01-01
Logistic regression was used to predict the probability of debris flows occurring in areas recently burned by wildland fires. Multiple logistic regression is conceptually similar to multiple linear regression because statistical relations between one dependent variable and several independent variables are evaluated. In logistic regression, however, the dependent variable is transformed to a binary variable (debris flow did or did not occur), and the actual probability of the debris flow occurring is statistically modeled. Data from 399 basins located within 15 wildland fires that burned during 2000-2002 in Colorado, Idaho, Montana, and New Mexico were evaluated. More than 35 independent variables describing the burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows were delineated from National Elevation Data using a Geographic Information System (GIS). (2) Data describing the burn severity, geology, land surface gradient, rainfall, and soil properties were determined for each basin. These data were then downloaded to a statistics software package for analysis using logistic regression. (3) Relations between the occurrence/non-occurrence of debris flows and burn severity, geology, land surface gradient, rainfall, and soil properties were evaluated and several preliminary multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combination produced the most effective model. The multivariate model that best predicted the occurrence of debris flows was selected. (4) The multivariate logistic regression model was entered into a GIS, and a map showing the probability of debris flows was constructed. 
The most effective model incorporates the percentage of each basin with slope greater than 30 percent, percentage of land burned at medium and high burn severity in each basin, particle size sorting, average storm intensity (millimeters per hour), soil organic matter content, soil permeability, and soil drainage. The results of this study demonstrate that logistic regression is a valuable tool for predicting the probability of debris flows occurring in recently-burned landscapes.
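As a generic illustration of the modeling step (not the study's fitted model or its actual variables), a logistic regression on hypothetical basin descriptors might look like the following, assuming scikit-learn is available. The descriptor names and the synthetic rule are invented for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# hypothetical basin descriptors: slope fraction, burn severity, storm intensity
rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 3))
# synthetic rule: steep, severely burned basins under intense storms produce flows
p = 1 / (1 + np.exp(-(6 * X[:, 0] + 4 * X[:, 1] + 3 * X[:, 2] - 7)))
yb = (rng.uniform(size=300) < p).astype(int)

model = LogisticRegression().fit(X, yb)
prob = model.predict_proba(X)[:, 1]      # debris-flow probability per basin
```

The key point of the transformation the abstract describes is that the model outputs a probability in [0, 1] for each basin rather than a continuous response, which is what makes the probability maps possible.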
Yajima, Airi; Uesawa, Yoshihiro; Ogawa, Chiaki; Yatabe, Megumi; Kondo, Naoki; Saito, Shinichiro; Suzuki, Yoshihiko; Atsuda, Kouichiro; Kagaya, Hajime
2015-05-01
There exist various useful predictive models, such as the Cockcroft-Gault model, for estimating creatinine clearance (CLcr). However, the prediction of renal function is difficult in patients with cancer treated with cisplatin. Therefore, we attempted to construct a new model for predicting CLcr in such patients. Japanese patients with head and neck cancer who had received cisplatin-based chemotherapy were used as subjects. A multiple regression equation was constructed as a model for predicting CLcr values based on background and laboratory data. A model for predicting CLcr, which included body surface area, serum creatinine and albumin, was constructed. The model exhibited good performance prior to cisplatin therapy. In addition, it performed better than previously reported models after cisplatin therapy. The predictive model constructed in the present study displayed excellent potential and was useful for estimating the renal function of patients treated with cisplatin therapy. Copyright© 2015 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.
[A competency model of rural general practitioners: theory construction and empirical study].
Yang, Xiu-Mu; Qi, Yu-Long; Shne, Zheng-Fu; Han, Bu-Xin; Meng, Bei
2015-04-01
To perform theory construction and an empirical study of a competency model for rural general practitioners. Through a literature study, job analysis, interviews, and expert team discussion, a questionnaire on the competency of rural general practitioners was constructed. A total of 1458 rural general practitioners in 6 central provinces were surveyed with the questionnaire. The common factors were constructed using the principal component method of exploratory factor analysis and confirmatory factor analysis. The influence of the competency characteristics on working performance was analyzed using regression equation analysis. The Cronbach's alpha coefficient of the questionnaire was 0.974. The model consisted of 9 dimensions and 59 items. The 9 competency dimensions were basic public health service ability, basic clinical skills, system analysis capability, information management capability, communication and cooperation ability, occupational moral ability, non-medical professional knowledge, personal traits and psychological adaptability. The cumulative explained total variance was 76.855%. The model fit indices were Χ²/df = 1.88, GFI = 0.94, NFI = 0.96, NNFI = 0.98, PNFI = 0.91, RMSEA = 0.068, CFI = 0.97, IFI = 0.97, RFI = 0.96, suggesting good model fit. Regression analysis showed that the competency characteristics had a significant effect on job performance. The rural general practitioner competency model provides a reference for rural doctor training, the targeted cultivation of medical students for rural service, and competency-based performance management of rural general practitioners.
Logistic Regression in the Identification of Hazards in Construction
NASA Astrophysics Data System (ADS)
Drozd, Wojciech
2017-10-01
The construction site and its elements create circumstances that are conducive to the formation of safety risks during the execution of works. Analysis indicates the critical importance of these factors in the set of characteristics that describe the causes of accidents in the construction industry. This article attempts to analyse the characteristics related to the construction site in order to indicate their importance in defining the circumstances of accidents at work. The study covers sites inspected in 2014-2016 by the employees of the District Labour Inspectorate in Krakow (Poland). The analysed set of detailed (disaggregated) data includes both quantitative and qualitative characteristics. The substantive task focused on classification modelling for the identification of hazards in construction and on identifying which of the analysed characteristics are important in an accident. Methodologically, the data were analysed with statistical classifiers in the form of logistic regression.
Shi, Huilan; Jia, Junya; Li, Dong; Wei, Li; Shang, Wenya; Zheng, Zhenfeng
2018-02-09
Precise renal histopathological diagnosis guides therapy strategy in patients with lupus nephritis. Blood oxygen level dependent (BOLD) magnetic resonance imaging (MRI) has become an applicable noninvasive technique in renal disease. The current study was performed to explore whether BOLD MRI could contribute to diagnosing the renal pathological pattern. Adult patients with a renal pathological diagnosis of lupus nephritis were recruited for this study. Renal biopsy tissues were assessed based on the ISN/RPS 2003 lupus nephritis classification. BOLD MRI was used to obtain the functional magnetic resonance parameter, R2* values. Several functions of the R2* values were calculated and used to construct algorithmic models for renal pathological patterns, and the models were then compared as to their diagnostic capability. Both histopathology and BOLD MRI were used to examine a total of twelve patients. Renal pathological patterns included five class III (including 3 as class III + V) and seven class IV (including 4 as class IV + V). Three algorithmic models, namely decision tree, linear discriminant, and logistic regression, were constructed to distinguish the renal pathological patterns of class III and class IV. The sensitivity of the decision tree model was better than that of the linear discriminant model (71.87% vs 59.48%, P < 0.001) and inferior to that of the logistic regression model (71.87% vs 78.71%, P < 0.001). The specificity of the decision tree model was equivalent to that of the linear discriminant model (63.87% vs 63.73%, P = 0.939) and higher than that of the logistic regression model (63.87% vs 38.0%, P < 0.001). The area under the ROC curve (AUROCC) of the decision tree model was greater than that of the linear discriminant model (0.765 vs 0.629, P < 0.001) and the logistic regression model (0.765 vs 0.662, P < 0.001).
BOLD MRI is a useful non-invasive imaging technique for the evaluation of lupus nephritis. Decision tree models constructed using functions of R2* values may facilitate the prediction of renal pathological patterns.
Corron, Louise; Marchal, François; Condemi, Silvana; Chaumoître, Kathia; Adalian, Pascal
2017-01-01
Juvenile age estimation methods used in forensic anthropology generally lack methodological consistency and/or statistical validity. Considering this, a standardized approach using nonparametric Multivariate Adaptive Regression Splines (MARS) models was tested to predict age from iliac biometric variables of male and female juveniles from Marseilles, France, aged 0-12 years. Models using unidimensional (length and width) and bidimensional (module and surface) iliac data were constructed on a training sample of 176 individuals and validated on an independent test sample of 68 individuals. Results show that MARS prediction models using iliac width, module and area give overall better and statistically valid age estimates. These models capture punctual nonlinearities in the relationship between age and osteometric variables. By constructing valid prediction intervals whose size increases with age, MARS models take into account the normal increase in individual variability. MARS models can thus qualify as a practical and standardized approach for juvenile age estimation. © 2016 American Academy of Forensic Sciences.
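MARS models build exactly the kind of punctual nonlinearity described here from pairs of hinge functions max(0, x - t) and max(0, t - x). A minimal sketch with fixed knots (a real MARS fit selects knots adaptively, which is omitted here):

```python
import numpy as np

def hinge_basis(x, knots):
    """MARS-style basis: for each knot t, the pair of hinge functions
    max(0, x - t) and max(0, t - x), plus an intercept column."""
    cols = [np.ones_like(x)]
    for t in knots:
        cols.append(np.maximum(0.0, x - t))
        cols.append(np.maximum(0.0, t - x))
    return np.column_stack(cols)

def fit_piecewise(x, y, knots):
    """Least-squares fit on the hinge basis: a piecewise-linear spline."""
    B = hinge_basis(x, knots)
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return B @ coef
```

The resulting fit is linear between knots and changes slope exactly at each knot, which is how a MARS model can follow a growth curve whose rate changes at particular ages.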
Learning accurate and interpretable models based on regularized random forests regression
2014-01-01
Background Many biology-related research works combine data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it would be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. Methods In this study, we focus on regression problems for biological data where target outcomes are continuous. In general, models constructed from linear regression approaches are relatively easy to interpret. However, many practical biological applications are nonlinear in essence, where we can hardly find a direct linear relationship between input and output. Nonlinear regression techniques can reveal the nonlinear relationships in data, but are generally hard for humans to interpret. We propose a rule-based regression algorithm that uses 1-norm regularized random forests. The proposed approach simultaneously extracts a small number of rules from generated random forests and eliminates unimportant features. Results We tested the approach on several biological data sets. The proposed approach is able to construct a significantly smaller set of regression rules using a subset of attributes while achieving prediction performance comparable to that of random forests regression. Conclusion The approach demonstrates high potential in aiding the prediction and interpretation of nonlinear relationships of the subject being studied. PMID:25350120
Rupert, Michael G.; Cannon, Susan H.; Gartner, Joseph E.; Michael, John A.; Helsel, Dennis R.
2008-01-01
Logistic regression was used to develop statistical models that can be used to predict the probability of debris flows in areas recently burned by wildfires by using data from 14 wildfires that burned in southern California during 2003-2006. Twenty-eight independent variables describing the basin morphology, burn severity, rainfall, and soil properties of 306 drainage basins located within those burned areas were evaluated. The models were developed as follows: (1) Basins that did and did not produce debris flows soon after the 2003 to 2006 fires were delineated from data in the National Elevation Dataset using a geographic information system; (2) Data describing the basin morphology, burn severity, rainfall, and soil properties were compiled for each basin. These data were then input to a statistics software package for analysis using logistic regression; and (3) Relations between the occurrence or absence of debris flows and the basin morphology, burn severity, rainfall, and soil properties were evaluated, and five multivariate logistic regression models were constructed. All possible combinations of independent variables were evaluated to determine which combinations produced the most effective models, and the multivariate models that best predicted the occurrence of debris flows were identified. Percentage of high burn severity and 3-hour peak rainfall intensity were significant variables in all models. Soil organic matter content and soil clay content were significant variables in all models except Model 5. Soil slope was a significant variable in all models except Model 4. The most suitable model can be selected from these five models on the basis of the availability of independent variables in the particular area of interest and field checking of probability maps. 
The multivariate logistic regression models can be entered into a geographic information system, and maps showing the probability of debris flows can be constructed in recently burned areas of southern California. This study demonstrates that logistic regression is a valuable tool for developing models that predict the probability of debris flows occurring in recently burned landscapes.
Soltanzadeh, Ahmad; Mohammadfam, Iraj; Moghimbeigi, Abbas; Ghiasvand, Reza
2016-03-01
The construction industry involves the highest risk of occupational accidents and bodily injuries, which range from mild to very severe. The aim of this cross-sectional study was to identify the factors associated with the accident severity rate (ASR) in the largest Iranian construction companies, based on data about 500 occupational accidents recorded from 2009 to 2013. We also gathered data on safety and health risk management and training systems. Data were analysed using Pearson's chi-squared coefficient and multiple regression analysis. The median ASR (and interquartile range) was 107.50 (57.24-381.25). Fourteen of the 24 studied factors stood out as most affecting construction accident severity (p<0.05). These findings can be applied in the design and implementation of a comprehensive safety and health risk management system to reduce the ASR.
An algebraic method for constructing stable and consistent autoregressive filters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Harlim, John, E-mail: jharlim@psu.edu; Department of Meteorology, the Pennsylvania State University, University Park, PA 16802; Hong, Hoon, E-mail: hong@ncsu.edu
2015-02-15
In this paper, we introduce an algebraic method to construct stable and consistent univariate autoregressive (AR) models of low order for filtering and predicting nonlinear turbulent signals with memory depth. By stable, we refer to the classical stability condition for the AR model. By consistent, we refer to the classical consistency constraints of Adams-Bashforth methods of order two. One attractive feature of this algebraic method is that the model parameters can be obtained without directly knowing any training data set, as opposed to many standard, regression-based parameterization methods; it takes only long-time average statistics as inputs. The proposed method provides a discretization time step interval which guarantees the existence of a stable and consistent AR model and simultaneously produces the parameters of the AR models. In our numerical examples with two chaotic time series with different characteristics of decaying time scales, we find that the proposed AR models produce significantly more accurate short-term predictive skill and comparable filtering skill relative to the linear regression-based AR models. These encouraging results are robust across wide ranges of discretization times, observation times, and observation noise variances. Finally, we also find that the proposed model produces improved short-time prediction relative to the linear regression-based AR models in forecasting a data set that characterizes the variability of the Madden-Julian Oscillation, a dominant tropical atmospheric wave pattern.
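The classical stability condition referred to here requires the roots of the AR characteristic polynomial to lie strictly inside the unit circle. A small check, assuming the convention x_t = a1*x_{t-1} + ... + ap*x_{t-p} + noise:

```python
import numpy as np

def is_stable_ar(coeffs):
    """Classical stability check for an AR(p) model
    x_t = a1*x_{t-1} + ... + ap*x_{t-p} + noise:
    all roots of z^p - a1*z^(p-1) - ... - ap must lie inside the unit circle."""
    poly = np.concatenate(([1.0], -np.asarray(coeffs, dtype=float)))
    return bool(np.all(np.abs(np.roots(poly)) < 1.0))
```

An unstable parameter set makes the filtered state blow up over time, which is why the paper insists on a time step interval that guarantees stability before the model parameters are even produced.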
Safety evaluation model of urban cross-river tunnel based on driving simulation.
Ma, Yingqi; Lu, Linjun; Lu, Jian John
2017-09-01
Currently, Shanghai's urban cross-river tunnels have three principal characteristics: increased traffic, a high accident rate and rapid ongoing construction. Because of their complex geographic and hydrological settings, the alignment conditions in urban cross-river tunnels are more complicated than in highway tunnels, so a safety evaluation of urban cross-river tunnels is necessary to inform follow-up construction and changes in operational management. A driving risk index (DRI) for urban cross-river tunnels is proposed in this study. An index system was constructed, combining eight factors derived from the output of a driving simulator and covering three aspects of risk: following risk, lateral accident risk and driver workload. Analytic hierarchy process methods, expert scoring and normalization were applied to construct a mathematical model for the DRI. The driving simulator was used to simulate 12 Shanghai urban cross-river tunnels, and a relationship between the DRI for the tunnels and the corresponding accident rate (AR) was obtained via regression analysis. The regression results showed that the relationship between the DRI and the AR maps to an exponential function with a high degree of fit. In the absence of detailed accident data, a safety evaluation model based on factors derived from a driving simulation can effectively assess the driving risk in urban cross-river tunnels, whether constructed or in design.
Bootstrap investigation of the stability of a Cox regression model.
Altman, D G; Andersen, P K
1989-07-01
We describe a bootstrap investigation of the stability of a Cox proportional hazards regression model resulting from the analysis of a clinical trial of azathioprine versus placebo in patients with primary biliary cirrhosis. We have considered stability to refer both to the choice of variables included in the model and, more importantly, to the predictive ability of the model. In stepwise Cox regression analyses of 100 bootstrap samples using 17 candidate variables, the most frequently selected variables were those selected in the original analysis, and no other important variable was identified. Thus there was no reason to doubt the model obtained in the original analysis. For each patient in the trial, bootstrap confidence intervals were constructed for the estimated probability of surviving two years. It is shown graphically that these intervals are markedly wider than those obtained from the original model.
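The percentile bootstrap behind such confidence intervals can be sketched generically. This toy version resamples a plain statistic rather than refitting a Cox model on each bootstrap sample, which is what the study actually does; the interface is an assumption for illustration.

```python
import numpy as np

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any statistic:
    resample with replacement, recompute the statistic, take quantiles."""
    rng = np.random.default_rng(seed)
    reps = [stat(rng.choice(data, size=len(data), replace=True))
            for _ in range(n_boot)]
    return np.quantile(reps, [alpha / 2, 1 - alpha / 2])
```

In the Cox setting, each bootstrap replicate would be a full stepwise model fit, which is also how the stability of the variable selection itself can be assessed, as the abstract describes.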
Fei, Y; Hu, J; Li, W-Q; Wang, W; Zong, G-Q
2017-03-01
Essentials Predicting the occurrence of portosplenomesenteric vein thrombosis (PSMVT) is difficult. We studied 72 patients with acute pancreatitis. Artificial neural network modeling was more accurate than logistic regression in predicting PSMVT. Additional predictive factors may be incorporated into artificial neural networks. Objective To construct and validate artificial neural networks (ANNs) for predicting the occurrence of portosplenomesenteric venous thrombosis (PSMVT) and to compare the predictive ability of the ANNs with that of logistic regression. Methods The ANN and logistic regression models were constructed using simple clinical and laboratory data from 72 acute pancreatitis (AP) patients. The models were first trained on 48 randomly chosen patients and validated on the remaining 24 patients. Accuracy and performance characteristics were compared between the two approaches using SPSS 17.0 software. Results The training set and validation set did not differ on any of the 11 variables. After training, the back-propagation network training error converged to 1 × 10^-20, and the network retained excellent pattern recognition ability. When the ANN model was applied to the validation set, it showed a sensitivity of 80%, specificity of 85.7%, positive predictive value of 77.6% and negative predictive value of 90.7%. The accuracy was 83.3%. Differences were found between the ANN and logistic regression models in these parameters (10.0% [95% CI, -14.3 to 34.3%], 14.3% [95% CI, -8.6 to 37.2%], 15.7% [95% CI, -9.9 to 41.3%], 11.8% [95% CI, -8.2 to 31.8%], 22.6% [95% CI, -1.9 to 47.1%], respectively). When ANN modeling was used to identify PSMVT, the area under the receiver operating characteristic curve was 0.849 (95% CI, 0.807-0.901), demonstrating better overall properties than logistic regression modeling (AUC = 0.716; 95% CI, 0.679-0.761).
Conclusions ANNs modeling was a more accurate tool than logistic regression in predicting the occurrence of PSMVT following AP. More clinical factors or biomarkers may be incorporated into ANNs modeling to improve its predictive ability. © 2016 International Society on Thrombosis and Haemostasis.
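The sensitivity, specificity, predictive values and accuracy quoted above all follow from a 2x2 confusion matrix; a minimal sketch (the counts below are hypothetical, chosen only to reproduce similar figures for a 24-patient validation set):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic performance measures from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical counts for a 24-patient validation set.
m = diagnostic_metrics(tp=8, fp=2, fn=2, tn=12)
print(m["sensitivity"])  # → 0.8
print(m["accuracy"])     # 20/24 ≈ 0.833
```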
NASA Astrophysics Data System (ADS)
Guan, Yafu; Yang, Shuo; Zhang, Dong H.
2018-04-01
Gaussian process regression (GPR) is an efficient non-parametric method for constructing multi-dimensional potential energy surfaces (PESs) for polyatomic molecules. Since not only the posterior mean but also the posterior variance can be easily calculated, GPR provides a well-established model for active learning, through which PESs can be constructed more efficiently and accurately. We propose a strategy of active data selection for the construction of PESs with emphasis on low-energy regions. The validity of this strategy is verified through a three-dimensional (3D) example of H3. The PESs for two prototypical reactive systems, namely the H + H2O ↔ H2 + OH and H + CH4 ↔ H2 + CH3 reactions, are reconstructed using only 920 and 4000 points, respectively. The accuracy of the GP PESs is not only tested by energy errors but also validated by quantum scattering calculations.
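The active-learning idea rests on the GP posterior variance: candidates far from the training data have high variance and are selected next. A minimal pure-Python sketch with an RBF kernel and a two-point training set (toy 1D data, not the paper's PES construction; the 2x2 kernel system is inverted in closed form):

```python
import math

def rbf(a, b, ell=1.0):
    """Squared-exponential (RBF) kernel."""
    return math.exp(-0.5 * ((a - b) / ell) ** 2)

def gp_posterior(xs, ys, xstar, noise=1e-8):
    """Posterior mean and variance of a zero-mean GP with RBF kernel,
    for a two-point training set (2x2 system solved in closed form)."""
    k11 = rbf(xs[0], xs[0]) + noise
    k22 = rbf(xs[1], xs[1]) + noise
    k12 = rbf(xs[0], xs[1])
    det = k11 * k22 - k12 * k12
    inv = [[k22 / det, -k12 / det], [-k12 / det, k11 / det]]
    kstar = [rbf(xstar, xs[0]), rbf(xstar, xs[1])]
    alpha = [inv[0][0] * ys[0] + inv[0][1] * ys[1],
             inv[1][0] * ys[0] + inv[1][1] * ys[1]]
    mean = kstar[0] * alpha[0] + kstar[1] * alpha[1]
    # var = k(x*, x*) - k*^T K^-1 k*
    kik = [inv[0][0] * kstar[0] + inv[0][1] * kstar[1],
           inv[1][0] * kstar[0] + inv[1][1] * kstar[1]]
    var = rbf(xstar, xstar) - (kstar[0] * kik[0] + kstar[1] * kik[1])
    return mean, var

xs, ys = [0.0, 1.0], [0.0, 1.0]
# Active learning: among candidates, pick the point with the largest
# posterior variance (the least-known region) to label next.
candidates = [0.5, 3.0]
variances = [gp_posterior(xs, ys, c)[1] for c in candidates]
pick = candidates[variances.index(max(variances))]
print(pick)  # → 3.0 (far from both training points)
```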
NASA Astrophysics Data System (ADS)
Dobronets, Boris S.; Popova, Olga A.
2018-05-01
The paper considers a new approach to regression modeling that uses aggregated data presented in the form of density functions. Approaches to improving the reliability of aggregating empirical data are considered: improving accuracy and estimating errors. We discuss data aggregation procedures as a preprocessing stage for subsequent regression modeling. An important feature of the study is a demonstration of how to represent the aggregated data. It is proposed to use piecewise polynomial models, including spline aggregation functions. We show that the proposed approach to data aggregation can be interpreted as a frequency distribution; the density function concept is used to study its properties. Various types of mathematical models of data aggregation are discussed. For the construction of regression models, it is proposed to use data representation procedures based on piecewise polynomial models. New approaches to modeling functional dependencies based on spline aggregation are proposed.
Ridge regression for predicting elastic moduli and hardness of calcium aluminosilicate glasses
NASA Astrophysics Data System (ADS)
Deng, Yifan; Zeng, Huidan; Jiang, Yejia; Chen, Guorong; Chen, Jianding; Sun, Luyi
2018-03-01
It is of great significance to design glasses with satisfactory mechanical properties predictively through modeling. Among various modeling methods, data-driven modeling is a reliable approach that can dramatically shorten research duration, cut research costs and accelerate the development of glass materials. In this work, ridge regression (RR) analysis was used to construct regression models for predicting the compositional dependence of the elastic moduli (shear, bulk, and Young's moduli) and hardness of CaO-Al2O3-SiO2 glasses, based on the ternary diagram of the compositions. Property prediction over a large glass composition space was accomplished with known experimental data for various compositions in the literature, and the simulated results are in good agreement with the measured ones. This regression model can serve as a facile and effective tool for studying the relationship between composition and properties, enabling high-efficiency design of glasses to meet the requirements for specific elasticity and hardness.
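Ridge regression differs from ordinary least squares only by an L2 penalty that shrinks coefficients and stabilizes fits on correlated composition variables. For a single centered predictor the estimator has a one-line closed form; a sketch (toy data, not the glass dataset):

```python
def ridge_1d(x, y, lam):
    """Closed-form ridge coefficient for one centered predictor:
    b = sum(x*y) / (sum(x*x) + lambda). lambda = 0 recovers OLS."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + lam)

x = [-2.0, -1.0, 0.0, 1.0, 2.0]   # centered composition variable (toy)
y = [-3.9, -2.1, 0.0, 2.2, 4.0]   # centered property, e.g. a modulus (toy)

b_ols = ridge_1d(x, y, 0.0)
b_ridge = ridge_1d(x, y, 5.0)
print(abs(b_ridge) < abs(b_ols))  # → True: the penalty shrinks the fit
```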
Dong, J Q; Zhang, X Y; Wang, S Z; Jiang, X F; Zhang, K; Ma, G W; Wu, M Q; Li, H; Zhang, H
2018-01-01
Plasma very low-density lipoprotein (VLDL) can be used to select for low body fat or abdominal fat (AF) in broilers, but its correlation with AF is limited. We investigated whether any other biochemical indicator can be used in combination with VLDL for a better selective effect. Nineteen plasma biochemical indicators were measured in male chickens from the Northeast Agricultural University broiler lines divergently selected for AF content (NEAUHLF) in the fed state at 46 and 48 d of age. The average concentration of every parameter over the 2 d was used for statistical analysis. Levels of these 19 plasma biochemical parameters were compared between the lean and fat lines, and the phenotypic correlations between these plasma biochemical indicators and AF traits were analyzed. Then, multiple linear regression models were constructed to select the best model for selecting against AF content, and the heritabilities of the plasma indicators contained in the best models were estimated. The results showed that 11 plasma biochemical indicators (triglycerides, total bile acid, total protein, globulin, albumin/globulin, aspartate transaminase, alanine transaminase, gamma-glutamyl transpeptidase, uric acid, creatinine, and VLDL) differed significantly between the lean and fat lines (P < 0.01) and correlated significantly with AF traits (P < 0.05). The best multiple linear regression models, based on albumin/globulin, VLDL, triglycerides, globulin, total bile acid, and uric acid, had a higher R2 (0.73) than the model based only on VLDL (0.21). The plasma parameters included in the best models had moderate heritability estimates (0.21 ≤ h2 ≤ 0.43). These results indicate that these multiple linear regression models can be used to select for lean broiler chickens. © 2017 Poultry Science Association Inc.
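The jump in R2 from a single predictor to a multi-predictor model can be reproduced with a minimal least-squares fit via the normal equations; a pure-Python sketch on toy data (not the broiler measurements):

```python
def ols_r2(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares (normal equations,
    Gaussian elimination) and return the coefficient of determination R^2."""
    n, p = len(X), len(X[0]) + 1
    A = [[1.0] + list(row) for row in X]           # add intercept column
    # Normal equations: (A^T A) b = A^T y
    M = [[sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]
    v = [sum(A[k][i] * y[k] for k in range(n)) for i in range(p)]
    for i in range(p):                             # forward elimination
        for j in range(i + 1, p):
            f = M[j][i] / M[i][i]
            for k in range(i, p):
                M[j][k] -= f * M[i][k]
            v[j] -= f * v[i]
    b = [0.0] * p
    for i in reversed(range(p)):                   # back substitution
        b[i] = (v[i] - sum(M[i][k] * b[k] for k in range(i + 1, p))) / M[i][i]
    yhat = [sum(b[j] * A[k][j] for j in range(p)) for k in range(n)]
    ybar = sum(y) / n
    ss_res = sum((y[k] - yhat[k]) ** 2 for k in range(n))
    ss_tot = sum((y[k] - ybar) ** 2 for k in range(n))
    return 1 - ss_res / ss_tot

# Toy data: a second informative predictor lifts R^2, mirroring the
# reported jump from 0.21 (VLDL alone) to 0.73 (six indicators).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [2.9, 2.2, 5.1, 4.3, 7.0, 6.2]
r2_one = ols_r2([[a] for a in x1], y)
r2_two = ols_r2(list(zip(x1, x2)), y)
print(r2_two > r2_one)  # → True
```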
Modeling the prediction of business intelligence system effectiveness.
Weng, Sung-Shun; Yang, Ming-Hsien; Koo, Tian-Lih; Hsiao, Pei-I
2016-01-01
Although business intelligence (BI) technologies are continually evolving, the capability to apply BI technologies has become an indispensable resource for enterprises operating in today's complex, uncertain and dynamic business environment. This study performed pioneering work by constructing models and rules for the prediction of business intelligence system effectiveness (BISE) in relation to the implementation of BI solutions. For enterprises, effectively managing the critical attributes that determine BISE, and developing prediction models with a set of rules for self-evaluation of the effectiveness of BI solutions, is necessary to improve BI implementation and ensure its success. The main findings identified the critical prediction indicators of BISE that are important for forecasting BI performance, and highlighted five classification and prediction rules of BISE derived from decision tree structures, as well as a refined regression prediction model with four critical prediction indicators constructed by logistic regression analysis, which together can enable enterprises to improve BISE while effectively managing BI solution implementation and offer insights to academics for whom theory is important.
Constructive thinking, rational intelligence and irritable bowel syndrome.
Rey, Enrique; Moreno Ortega, Marta; Garcia Alonso, Monica-Olga; Diaz-Rubio, Manuel
2009-07-07
To evaluate rational and experiential intelligence in irritable bowel syndrome (IBS) sufferers. We recruited 100 subjects with IBS as per Rome II criteria (50 consulters and 50 non-consulters) and 100 healthy controls, matched by age, sex and educational level. Cases and controls completed a clinical questionnaire (including symptom characteristics and medical consultation) and the following tests: rational-intelligence (Wechsler Adult Intelligence Scale, 3rd edition); experiential-intelligence (Constructive Thinking Inventory); personality (NEO personality inventory); psychopathology (MMPI-2), anxiety (state-trait anxiety inventory) and life events (social readjustment rating scale). Analysis of variance was used to compare the test results of IBS-sufferers and controls, and a logistic regression model was then constructed and adjusted for age, sex and educational level to evaluate any possible association with IBS. No differences were found between IBS cases and controls in terms of IQ (102.0 +/- 10.8 vs 102.8 +/- 12.6), but IBS sufferers scored significantly lower in global constructive thinking (43.7 +/- 9.4 vs 49.6 +/- 9.7). In the logistic regression model, global constructive thinking score was independently linked to suffering from IBS [OR 0.92 (0.87-0.97)], without significant OR for total IQ. IBS subjects do not show lower rational intelligence than controls, but lower experiential intelligence is nevertheless associated with IBS.
Normal Theory Two-Stage ML Estimator When Data Are Missing at the Item Level
ERIC Educational Resources Information Center
Savalei, Victoria; Rhemtulla, Mijke
2017-01-01
In many modeling contexts, the variables in the model are linear composites of the raw items measured for each participant; for instance, regression and path analysis models rely on scale scores, and structural equation models often use parcels as indicators of latent constructs. Currently, no analytic estimation method exists to appropriately…
Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.
Hu, Yi-Chung
2014-01-01
On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, such models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms that resist outliers in interval regression analysis have been an interesting area of research. Several computational intelligence approaches are effective for resisting outliers, but their required parameters depend on whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination beforehand, this paper uses a multilayer perceptron to construct a robust nonlinear interval regression model using a genetic algorithm. Outliers beyond or beneath the data interval impose only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As fraudulent financial statements by enterprises become an increasingly serious problem, establishing a valid model for forecasting fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models for comparison. The study adopts financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies that issued fraudulent or nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the C5.0 decision tree achieves the best classification accuracy, 85.71%.
ERIC Educational Resources Information Center
Bullock, Emily E.; Reardon, Robert C.
2008-01-01
The study used the Self-Directed Search (SDS) and the NEO-FFI to explore profile elevation, four secondary constructs, and the Big Five personality factors in a sample of college students in a career course. Regression model results showed that openness, conscientiousness, differentiation high-low, differentiation Iachan, and consistency accounted…
Event-based soil loss models for construction sites
NASA Astrophysics Data System (ADS)
Trenouth, William R.; Gharabaghi, Bahram
2015-05-01
The elevated rates of soil erosion stemming from land clearing and grading activities during urban development can result in excessive amounts of eroded sediments entering waterways and causing harm to the biota living therein. However, construction site event-based soil loss simulations - required for reliable design of erosion and sediment controls - are among the most uncertain types of hydrologic models. This study presents models with an improved degree of accuracy to advance the design of erosion and sediment controls for construction sites. The new models are developed using multiple linear regression (MLR) on event-based permutations of the Universal Soil Loss Equation (USLE) and artificial neural networks (ANN). These models were developed using surface runoff monitoring datasets obtained from three sites - Greensborough, Cookstown, and Alcona - in Ontario and datasets mined from the literature for three additional sites - Treynor, Iowa; Coshocton, Ohio; and Cordoba, Spain. The predictive MLR and ANN models can serve as both diagnostic and design tools for the effective sizing of erosion and sediment controls on active construction sites, and can be used for dynamic scenario forecasting when considering rapidly changing land use conditions during various phases of construction.
NASA Astrophysics Data System (ADS)
Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Santos-Filho, Osvaldo A.; Esposito, Emilio X.; Hopfinger, Anton J.; Tseng, Yufeng J.
2008-06-01
In previous studies we developed categorical QSAR models for predicting skin-sensitization potency based on 4D-fingerprint (4D-FP) descriptors and in vivo murine local lymph node assay (LLNA) measures. Only 4D-FP descriptors derived from the ground-state (GMAX) structures of the molecules were used to build those QSAR models. In this study we generated 4D-FP descriptors from the first-excited-state (EMAX) structures of the molecules. The GMAX, EMAX and combined ground- and excited-state 4D-FP descriptors (GEMAX) were employed in building categorical QSAR models. Logistic regression (LR) and partial least squares coupled logistic regression (PLS-CLR), found to be effective model-building methods for the LLNA skin-sensitization measures in our previous studies, were used again here, which also permitted comparison of the prior ground-state models with those involving first-excited-state 4D-FP descriptors. Three types of categorical QSAR models were constructed for each of the GMAX, EMAX and GEMAX datasets: a binary model (2-state), an ordinal model (3-state) and a binary-binary model (two-2-state). No significant differences exist among the LR 2-state models constructed for the three datasets. However, the PLS-CLR 3-state and 2-state models based on the EMAX and GEMAX datasets have higher predictivity than those constructed using only the GMAX dataset. These EMAX and GEMAX categorical models are also more significant and predictive than the corresponding models built in our previous QSAR studies of LLNA skin-sensitization measures.
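A binary (2-state) logistic regression model of the kind described reduces, in its simplest form, to fitting a sigmoid of a descriptor score by gradient descent. A pure-Python sketch with one hypothetical descriptor (the data and threshold are illustrative, not the 4D-FP models):

```python
import math

def logistic_fit(xs, ys, lr=0.5, steps=2000):
    """Binary logistic regression on one feature via gradient descent
    on the negative log-likelihood."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x / n
            gb += (p - y) / n
        w -= lr * gw
        b -= lr * gb
    return w, b

# Hypothetical 1-descriptor data: higher value => sensitizer (class 1).
xs = [0.1, 0.4, 0.5, 0.9, 1.2, 1.5, 1.8, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = logistic_fit(xs, ys)
p_low = 1.0 / (1.0 + math.exp(-(w * 0.2 + b)))
p_high = 1.0 / (1.0 + math.exp(-(w * 1.9 + b)))
print(p_low < 0.5 < p_high)  # → True
```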
NASA Astrophysics Data System (ADS)
Saad, Ahmed S.; Hamdy, Abdallah M.; Salama, Fathy M.; Abdelkawy, Mohamed
2016-10-01
The effect of data manipulation in the preprocessing step preceding the construction of chemometric models was assessed. The same set of UV spectral data was used for the construction of PLS and PCR models, both directly and after mathematical manipulation by the well-known first- and second-derivative, ratio-spectra, and first- and second-derivative ratio-spectra spectrophotometric methods; the optimal working wavelength ranges were carefully selected for each model before construction. Unexpectedly, the number of latent variables used for model construction varied among the different methods. The prediction power of the different models was compared using a validation set of 8 mixtures prepared as per a multilevel multifactor design, and the results were statistically compared using a two-way ANOVA test. The root mean square error of prediction (RMSEP) was used for further comparison of predictability among the constructed models. Although no significant difference was found between the results obtained using the Partial Least Squares (PLS) and Principal Component Regression (PCR) models, the discrepancies among results were attributed to variation in the discrimination power of the adopted spectrophotometric methods on the spectral data.
NASA Astrophysics Data System (ADS)
Fomina, E. V.; Kozhukhova, N. I.; Sverguzova, S. V.; Fomin, A. E.
2018-05-01
In this paper, the regression equations method for the design of construction materials was studied. Regression and polynomial equations representing the correlation between the studied parameters were proposed. The logic design and software interface of the regression equations method focus on parameter optimization to provide an energy-saving effect at the design stage of autoclaved aerated concrete, considering the replacement of traditionally used quartz sand by a coal mining by-product, argillite. The mathematical model, represented by a quadratic polynomial for the design of experiment, was obtained using calculated and experimental data. This allowed the estimation of the relationship between the composition and the final properties of the aerated concrete. The response surface, graphically presented in a nomogram, allowed the estimation of concrete properties in response to variation of the composition within the x-space. The optimal range of argillite content was obtained, leading to a reduction in raw material demand, development of the target plastic strength of the aerated concrete, and a reduction of curing time before autoclave treatment. Generally, this method allows the design of autoclaved aerated concrete with the required performance without additional resource and time costs.
A method for fitting regression splines with varying polynomial order in the linear mixed model.
Edwards, Lloyd J; Stewart, Paul W; MacDougall, James E; Helms, Ronald W
2006-02-15
The linear mixed model has become a widely used tool for longitudinal analysis of continuous variables. The use of regression splines in these models offers the analyst additional flexibility in the formulation of descriptive analyses, exploratory analyses and hypothesis-driven confirmatory analyses. We propose a method for fitting piecewise polynomial regression splines with varying polynomial order in the fixed effects and/or random effects of the linear mixed model. The polynomial segments are explicitly constrained by side conditions for continuity and some smoothness at the points where they join. By using a reparameterization of this explicitly constrained linear mixed model, an implicitly constrained linear mixed model is constructed that simplifies implementation of fixed-knot regression splines. The proposed approach is relatively simple, handles splines in one variable or multiple variables, and can be easily programmed using existing commercial software such as SAS or S-plus. The method is illustrated using two examples: an analysis of longitudinal viral load data from a study of subjects with acute HIV-1 infection and an analysis of 24-hour ambulatory blood pressure profiles.
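The continuity side conditions described above are handled implicitly when the spline is expressed in a truncated power basis: each knot contributes a single (x - k)_+^d term, so any linear combination is automatically continuous (and smooth up to order d-1) at the knots. A minimal sketch (generic basis construction, not the paper's mixed-model reparameterization):

```python
def truncated_power_basis(x, knots, degree=2):
    """Design row for a degree-d regression spline: global polynomial
    terms plus one truncated term (x - k)_+^d per knot. The basis is
    continuous, with d-1 continuous derivatives, at each knot."""
    row = [x ** j for j in range(degree + 1)]          # 1, x, ..., x^d
    row += [max(0.0, x - k) ** degree for k in knots]  # (x - k)_+^d
    return row

# Because the fitted spline is one linear combination of these terms,
# continuity at the knot needs no explicit constraint equations.
knots = [2.0]
eps = 1e-6
left = truncated_power_basis(2.0 - eps, knots)
right = truncated_power_basis(2.0 + eps, knots)
print(all(abs(a - b) < 1e-4 for a, b in zip(left, right)))  # → True
```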
Pedersen, Nicklas Juel; Jensen, David Hebbelstrup; Lelkaitis, Giedrius; Kiss, Katalin; Charabi, Birgitte; Specht, Lena; von Buchwald, Christian
2017-01-01
It is challenging to identify at diagnosis those patients with early oral squamous cell carcinoma (OSCC), who have a poor prognosis and those that have a high risk of harboring occult lymph node metastases. The aim of this study was to develop a standardized and objective digital scoring method to evaluate the predictive value of tumor budding. We developed a semi-automated image-analysis algorithm, Digital Tumor Bud Count (DTBC), to evaluate tumor budding. The algorithm was tested in 222 consecutive patients with early-stage OSCC and major endpoints were overall (OS) and progression free survival (PFS). We subsequently constructed and cross-validated a binary logistic regression model and evaluated its clinical utility by decision curve analysis. A high DTBC was an independent predictor of both poor OS and PFS in a multivariate Cox regression model. The logistic regression model was able to identify patients with occult lymph node metastases with an area under the curve (AUC) of 0.83 (95% CI: 0.78–0.89, P <0.001) and a 10-fold cross-validated AUC of 0.79. Compared to other known histopathological risk factors, the DTBC had a higher diagnostic accuracy. The proposed, novel risk model could be used as a guide to identify patients who would benefit from an up-front neck dissection. PMID:28212555
Learning Models and Real-Time Speech Recognition.
ERIC Educational Resources Information Center
Danforth, Douglas G.; And Others
This report describes the construction and testing of two "psychological" learning models for the purpose of computer recognition of human speech over the telephone. One of the two models was found to be superior in all tests. A regression analysis yielded a 92.3% recognition rate for 14 subjects ranging in age from 6 to 13 years. Tests…
Bai, Wenming; Yoshimura, Norio; Takayanagi, Masao; Che, Jingai; Horiuchi, Naomi; Ogiwara, Isao
2016-06-28
Nondestructive prediction of the ingredient contents of farm products is useful for shipping and selling the products with guaranteed quality. Here, near-infrared spectroscopy is used to nondestructively predict the total sugar, total organic acid, and total anthocyanin content of each blueberry. The technique is expected to enable the selection of only delicious blueberries from all harvested ones. The near-infrared absorption spectra of blueberries are measured in diffuse reflectance mode at positions away from the calyx. The ingredient contents of a blueberry determined by high-performance liquid chromatography are used to construct models that predict the ingredient contents from the observed spectra. Partial least squares regression is used for the construction of the models. It is necessary to properly select the pretreatments for the observed spectra and the wavelength regions of the spectra used for the analyses. Validation is necessary for the constructed models to confirm that the ingredient contents are predicted with practical accuracy. Here we present a protocol to construct and validate models for the nondestructive prediction of ingredient contents in blueberries by near-infrared spectroscopy.
NASA Astrophysics Data System (ADS)
Wang, Jiangbo; Liu, Junhui; Li, Tiantian; Yin, Shuo; He, Xinhui
2018-01-01
Monthly electricity sales forecasting is basic work for ensuring the safety of the power system. This paper presents a monthly electricity sales forecasting method that comprehensively considers multiple coupled factors: temperature, economic growth, electric power replacement and business expansion. The mathematical model is constructed using a regression method. The simulation results show that the proposed method is accurate and effective.
Modeling of fugitive dust emission for construction sand and gravel processing plant.
Lee, C H; Tang, L W; Chang, C T
2001-05-15
Due to rapid economic development in Taiwan, a large quantity of construction sand and gravel is needed to support domestic civil construction projects. However, a construction sand and gravel processing plant is often a major source of air pollution, due to its associated fugitive dust emission. To predict the amount of fugitive dust emitted from this kind of processing plant, a semiempirical model was developed in this study. This model was developed on the basis of the actual dust emission data (i.e., total suspended particulate, TSP) and four on-site operating parameters (i.e., wind speed (u), soil moisture (M), soil silt content (s), and number (N) of trucks) measured at a construction sand and gravel processing plant. On the basis of the on-site measured data and an SAS nonlinear regression program, the expression of this model is E = 0.011 · u^2.653 · M^-1.875 · s^0.060 · N^0.896, where E is the amount (kg/ton) of dust emitted during the production of each ton of gravel and sand. This model can serve as a facile tool for predicting the fugitive dust emission from a construction sand and gravel processing plant.
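The fitted power-law model transcribes directly into code; the input values below are illustrative, not measurements from the study:

```python
def dust_emission(u, M, s, N):
    """Fugitive dust emission factor E (kg of dust per ton of product)
    from the fitted semiempirical model:
        E = 0.011 * u**2.653 * M**-1.875 * s**0.060 * N**0.896
    u: wind speed, M: soil moisture, s: soil silt content, N: truck count.
    """
    return 0.011 * u ** 2.653 * M ** -1.875 * s ** 0.060 * N ** 0.896

# Emission rises steeply with wind speed and falls with soil moisture,
# consistent with the signs of the fitted exponents (illustrative inputs).
print(dust_emission(4.0, 2.0, 10.0, 5.0) > dust_emission(2.0, 2.0, 10.0, 5.0))
# → True
```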
Parametric Human Body Reconstruction Based on Sparse Key Points.
Cheng, Ke-Li; Tong, Ruo-Feng; Tang, Min; Qian, Jing-Ye; Sarkis, Michel
2016-11-01
We propose an automatic parametric human body reconstruction algorithm which can efficiently construct a model using a single Kinect sensor. A user needs to stand still in front of the sensor for a couple of seconds to measure the range data. The user's body shape and pose will then be automatically constructed in several seconds. Traditional methods optimize dense correspondences between range data and meshes. In contrast, our proposed scheme relies on sparse key points for the reconstruction. It employs regression to find the corresponding key points between the scanned range data and some annotated training data. We design two kinds of feature descriptors as well as corresponding regression stages to make the regression robust and accurate. Our scheme follows with dense refinement where a pre-factorization method is applied to improve the computational efficiency. Compared with other methods, our scheme achieves similar reconstruction accuracy but significantly reduces runtime.
Computationally efficient algorithm for Gaussian Process regression in case of structured samples
NASA Astrophysics Data System (ADS)
Belyaev, M.; Burnaev, E.; Kapushev, Y.
2016-04-01
Surrogate modeling is widely used in many engineering problems. Data sets often have a Cartesian product structure (for instance, factorial design of experiments with missing points). In such cases the size of the data set can be very large, so one of the most popular approximation algorithms - Gaussian Process regression - can hardly be applied due to its computational complexity. In this paper a computationally efficient approach for constructing Gaussian Process regression for data sets with Cartesian product structure is presented. Efficiency is achieved by exploiting the special structure of the data set and operations with tensors. The proposed algorithm has low computational as well as memory complexity compared to existing algorithms. We also introduce a regularization procedure that takes into account the anisotropy of the data set and avoids degeneracy of the regression model.
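The core saving for grid-structured data is that the kernel matrix is a Kronecker product of small per-axis factors, which need never be formed explicitly: with row-major vectorization, (A ⊗ B) vec(X) = vec(A X Bᵀ), so a matrix-vector product costs O(mn(m+n)) instead of O(m²n²). A pure-Python check on small matrices (illustrative of the tensor identity only, not the paper's implementation):

```python
def matmul(P, Q):
    """Plain dense matrix product of nested-list matrices."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def kron(A, B):
    """Explicit Kronecker product (only for verification)."""
    m, n = len(A), len(B)
    return [[A[i // n][j // n] * B[i % n][j % n]
             for j in range(m * n)] for i in range(m * n)]

def kron_matvec(A, B, X):
    """Compute (A ⊗ B) vec(X) as vec(A · X · Bᵀ), row-major vec,
    without ever forming A ⊗ B."""
    Bt = [list(col) for col in zip(*B)]
    return [v for row in matmul(matmul(A, X), Bt) for v in row]

A = [[1.0, 2.0], [0.0, 1.0]]
B = [[2.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]]
X = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]      # 2x3 grid of values

fast = kron_matvec(A, B, X)
x_vec = [v for row in X for v in row]        # row-major vec(X)
K = kron(A, B)
slow = [sum(K[i][j] * x_vec[j] for j in range(6)) for i in range(6)]
print(all(abs(a - b) < 1e-12 for a, b in zip(fast, slow)))  # → True
```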
Efficient Regressions via Optimally Combining Quantile Information*
Zhao, Zhibiao; Xiao, Zhijie
2014-01-01
We develop a generally applicable framework for constructing efficient estimators of regression models via quantile regressions. The proposed method is based on optimally combining information over multiple quantiles and can be applied to a broad range of parametric and nonparametric settings. When combining information over a fixed number of quantiles, we derive an upper bound on the distance between the efficiency of the proposed estimator and the Fisher information. As the number of quantiles increases, this upper bound decreases and the asymptotic variance of the proposed estimator approaches the Cramér-Rao lower bound under appropriate conditions. In the case of non-regular statistical estimation, the proposed estimator leads to super-efficient estimation. We illustrate the proposed method for several widely used regression models. Both asymptotic theory and Monte Carlo experiments show the superior performance over existing methods. PMID:25484481
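Each quantile regression underlying such a combined estimator minimizes the check (pinball) loss, whose minimizer is the corresponding sample quantile. A minimal grid-search sketch verifying this for the median (illustrative only, not the paper's optimally weighted estimator):

```python
def pinball_loss(theta, data, tau):
    """Check-function loss rho_tau(x - theta) summed over the sample;
    its minimizer in theta is the tau-th sample quantile."""
    return sum((tau - (x < theta)) * (x - theta) for x in data)

data = [1.0, 2.0, 4.0, 7.0, 9.0, 10.0, 12.0]
tau = 0.5
grid = [i / 10 for i in range(0, 131)]
best = min(grid, key=lambda t: pinball_loss(t, data, tau))
print(best)  # → 7.0 (the sample median)
```

Repeating this for several tau values and averaging the resulting estimates with suitable weights is the intuition behind combining information across quantiles.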
Eaton, Jennifer L; Mohr, David C; Hodgson, Michael J; McPhaul, Kathleen M
2018-02-01
To describe development and validation of the work-related well-being (WRWB) index. Principal components analysis was performed using Federal Employee Viewpoint Survey (FEVS) data (N = 392,752) to extract variables representing worker well-being constructs. Confirmatory factor analysis was performed to verify factor structure. To validate the WRWB index, we used multiple regression analysis to examine relationships with burnout associated outcomes. Principal Components Analysis identified three positive psychology constructs: "Work Positivity", "Co-worker Relationships", and "Work Mastery". An 11 item index explaining 63.5% of variance was achieved. The structural equation model provided a very good fit to the data. Higher WRWB scores were positively associated with all three employee experience measures examined in regression models. The new WRWB index shows promise as a valid and widely accessible instrument to assess worker well-being.
Cubbin, Catherine; Heck, Katherine; Powell, Tara; Marchi, Kristen; Braveman, Paula
2015-01-01
We examined racial/ethnic disparities in depressive symptoms during pregnancy among a population-based sample of childbearing women in California (N = 24,587). We hypothesized that these racial/ethnic disparities would be eliminated when comparing women with similar incomes and neighborhood poverty environments. Neighborhood poverty trajectory descriptions were linked with survey data measuring age, parity, race/ethnicity, marital status, education, income, and depressive symptoms. We constructed logistic regression models among the overall sample to examine both crude and adjusted racial/ethnic disparities in feeling depressed. Next, stratified adjusted logistic regression models were constructed to examine racial/ethnic disparities in feeling depressed among women of similar income levels living in similar neighborhood poverty environments. We found that racial/ethnic disparities in feeling depressed remained only among women who were not poor themselves and who lived in long-term moderate or low poverty neighborhoods.
Constructive thinking, rational intelligence and irritable bowel syndrome
Rey, Enrique; Ortega, Marta Moreno; Alonso, Monica Olga Garcia; Diaz-Rubio, Manuel
2009-01-01
AIM: To evaluate rational and experiential intelligence in irritable bowel syndrome (IBS) sufferers. METHODS: We recruited 100 subjects with IBS as per Rome II criteria (50 consulters and 50 non-consulters) and 100 healthy controls, matched by age, sex and educational level. Cases and controls completed a clinical questionnaire (including symptom characteristics and medical consultation) and the following tests: rational-intelligence (Wechsler Adult Intelligence Scale, 3rd edition); experiential-intelligence (Constructive Thinking Inventory); personality (NEO personality inventory); psychopathology (MMPI-2), anxiety (state-trait anxiety inventory) and life events (social readjustment rating scale). Analysis of variance was used to compare the test results of IBS-sufferers and controls, and a logistic regression model was then constructed and adjusted for age, sex and educational level to evaluate any possible association with IBS. RESULTS: No differences were found between IBS cases and controls in terms of IQ (102.0 ± 10.8 vs 102.8 ± 12.6), but IBS sufferers scored significantly lower in global constructive thinking (43.7 ± 9.4 vs 49.6 ± 9.7). In the logistic regression model, global constructive thinking score was independently linked to suffering from IBS [OR 0.92 (0.87-0.97)], without significant OR for total IQ. CONCLUSION: IBS subjects do not show lower rational intelligence than controls, but lower experiential intelligence is nevertheless associated with IBS. PMID:19575489
Standards for Standardized Logistic Regression Coefficients
ERIC Educational Resources Information Center
Menard, Scott
2011-01-01
Standardized coefficients in logistic regression analysis have the same utility as standardized coefficients in linear regression analysis. Although there has been no consensus on the best way to construct standardized logistic regression coefficients, there is now sufficient evidence to suggest a single best approach to the construction of a…
NASA Astrophysics Data System (ADS)
Madhu, B.; Ashok, N. C.; Balasubramanian, S.
2014-11-01
Multinomial logistic regression analysis was used to develop a statistical model that can predict the probability of breast cancer in Southern Karnataka, using breast cancer occurrence data from 2007-2011. Independent socio-economic variables describing breast cancer occurrence, such as age, education, occupation, parity, type of family, health insurance coverage, residential locality and socioeconomic status, were obtained for each case. The models were developed as follows: i) spatial visualization of the urban-rural distribution of breast cancer cases obtained from the Bharat Hospital and Institute of Oncology; ii) socio-economic risk factors describing the breast cancer occurrences were compiled for each case, these data were then analysed using multinomial logistic regression in SPSS, relations between breast cancer occurrence and socioeconomic status and the other socio-economic variables were evaluated, and multinomial logistic regression models were constructed; iii) the model that best predicted the occurrence of breast cancer was identified. This multivariate logistic regression model was then entered into a geographic information system, and maps showing the predicted probability of breast cancer occurrence in Southern Karnataka were created. This study demonstrates that multinomial logistic regression is a valuable tool for developing models that predict the probability of breast cancer occurrence in Southern Karnataka.
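The modelling step in (ii) can be sketched generically. The following is not the authors' SPSS analysis: it fits a multinomial logistic regression in scikit-learn on synthetic stand-ins for the socio-economic predictors and a three-class outcome, then extracts per-class predicted probabilities of the kind that would be mapped in a GIS.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic stand-ins for socio-economic predictors (age, parity,
# insurance coverage, ...) and a three-class occurrence outcome.
X = rng.normal(size=(300, 4))
y = X[:, :3].argmax(axis=1)                   # class labels 0, 1, 2

model = LogisticRegression(max_iter=1000).fit(X, y)
proba = model.predict_proba(X)                # per-class probabilities
```

Each row of `proba` sums to one; in a susceptibility-mapping workflow these probabilities would be joined back to the spatial units for visualization.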
Optimization of fixture layouts of glass laser optics using multiple kernel regression.
Su, Jianhua; Cao, Enhua; Qiao, Hong
2014-05-10
We aim to build an integrated fixturing model to describe the structural properties and thermal properties of the support frame of glass laser optics. Therefore, (a) a near global optimal set of clamps can be computed to minimize the surface shape error of the glass laser optic based on the proposed model, and (b) a desired surface shape error can be obtained by adjusting the clamping forces under various environmental temperatures based on the model. To construct the model, we develop a new multiple kernel learning method and call it multiple kernel support vector functional regression. The proposed method uses two layer regressions to group and order the data sources by the weights of the kernels and the factors of the layers. Because of that, the influences of the clamps and the temperature can be evaluated by grouping them into different layers.
Table Rock Lake Water-Clarity Assessment Using Landsat Thematic Mapper Satellite Data
Krizanich, Gary; Finn, Michael P.
2009-01-01
Water quality of Table Rock Lake in southwestern Missouri is assessed using Landsat Thematic Mapper satellite data. A pilot study uses multidate satellite image scenes in conjunction with physical measurements of Secchi disk transparency collected by the Lakes of Missouri Volunteer Program to construct a regression model used to estimate water clarity. The natural log of Secchi disk transparency is the dependent variable in the regression, and the independent variables are Thematic Mapper band 1 (blue) reflectance and the ratio of band 1 to band 3 (red) reflectance. The regression model can be used to reliably predict water clarity anywhere within the lake. A pixel-level lake map of predicted water clarity or computed trophic state can be produced from the model output. Information derived from this model can be used by water-resource managers to assess water quality and evaluate effects of changes in the watershed on water quality.
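The regression form described, ln(Secchi transparency) on band-1 reflectance and the band-1/band-3 ratio, can be sketched with ordinary least squares. The generating coefficients and reflectance ranges below are invented for illustration and are not the study's fitted values.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50
b1 = rng.uniform(0.05, 0.15, n)               # TM band 1 (blue) reflectance
b3 = rng.uniform(0.02, 0.10, n)               # TM band 3 (red) reflectance
# Invented generating rule: higher blue reflectance -> more turbid water
secchi = np.exp(2.5 - 10.0 * b1 + 0.15 * (b1 / b3) + rng.normal(0, 0.1, n))

# ln(Secchi) regressed on band-1 reflectance and the band-1/band-3 ratio
X = np.column_stack([np.ones(n), b1, b1 / b3])
coef = np.linalg.lstsq(X, np.log(secchi), rcond=None)[0]
pred = np.exp(X @ coef)                        # back-transformed clarity
```

Applied pixel-by-pixel to an image, `pred` would yield the kind of lake-wide clarity map the abstract describes.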
NASA Astrophysics Data System (ADS)
Zhai, Mengting; Chen, Yan; Li, Jing; Zhou, Jun
2017-12-01
In this paper, the molecular electronegativity distance vector (MEDV-13) was used to describe the molecular structure of benzyl ether diamidine derivatives. Based on MEDV-13, a three-parameter (M3, M15, M47) QSAR model of insecticidal activity (pIC50) for 60 benzyl ether diamidine derivatives was constructed by leaps-and-bounds regression (LBR). The conventional correlation coefficient (R) and the cross-validation correlation coefficient (RCV) were 0.975 and 0.971, respectively. The robustness of the regression model was validated by the jackknife method, with correlation coefficients R between 0.971 and 0.983, and the independent variables in the model showed no autocorrelation. The regression results indicate that the model is robust and has good predictive capability. The research provides theoretical guidance for the development of a new generation of efficient, low-toxicity drugs against African trypanosomiasis.
Martinez-Fiestas, Myriam; Rodríguez-Garzón, Ignacio; Delgado-Padial, Antonio; Lucas-Ruiz, Valeriano
2017-09-01
This article presents a cross-cultural study on perceived risk in the construction industry. Worker samples from three different countries were studied: Spain, Peru and Nicaragua. The main goal was to explain how construction workers perceive their occupational hazards and to analyze how this perception is related to their national culture. The model used to measure perceived risk was the psychometric paradigm. The results show three very similar profiles, indicating that risk perception is independent of nationality. A cultural analysis was conducted using the Hofstede model; the results of this analysis and its relation to perceived risk showed that risk perception in construction is independent of national culture. Finally, a multiple linear regression analysis was conducted to determine which qualitative attributes could predict the global quantitative size of risk perception. All of the findings have important implications for the management of safety in the workplace.
Li, Yi; Tseng, Yufeng J.; Pan, Dahua; Liu, Jianzhong; Kern, Petra S.; Gerberick, G. Frank; Hopfinger, Anton J.
2008-01-01
Currently, the only validated methods to identify skin sensitization effects are in vivo models, such as the Local Lymph Node Assay (LLNA) and guinea pig studies. There is a tremendous need, in particular due to novel legislation, to develop animal alternatives, e.g., Quantitative Structure-Activity Relationship (QSAR) models. Here, QSAR models for skin sensitization using LLNA data have been constructed. The descriptors used to generate these models are derived from the 4D-molecular similarity paradigm and are referred to as universal 4D-fingerprints. A training set of 132 structurally diverse compounds and a test set of 15 structurally diverse compounds were used in this study. The statistical methodologies used to build the models are logistic regression (LR), and partial least squares coupled logistic regression (PLS-LR), which prove to be effective tools for studying skin sensitization measures expressed in the two categorical terms of sensitizer and non-sensitizer. QSAR models with low values of the Hosmer-Lemeshow goodness-of-fit statistic, χ²HL, are significant and predictive. For the training set, the cross-validated prediction accuracy of the logistic regression models ranges from 77.3% to 78.0%, while that of PLS-logistic regression models ranges from 87.1% to 89.4%. For the test set, the prediction accuracy of logistic regression models ranges from 80.0% to 86.7%, while that of PLS-logistic regression models ranges from 73.3% to 80.0%. The QSAR models are made up of 4D-fingerprints related to aromatic atoms, hydrogen bond acceptors and negatively partially charged atoms. PMID:17226934
ERIC Educational Resources Information Center
Vrieze, Scott I.
2012-01-01
This article reviews the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) in model selection and the appraisal of psychological theory. The focus is on latent variable models, given their growing use in theory testing and construction. Theoretical statistical results in regression are discussed, and more important…
Brouckaert, D; Uyttersprot, J-S; Broeckx, W; De Beer, T
2018-03-01
Calibration transfer or standardisation aims at creating a uniform spectral response on different spectroscopic instruments or under varying conditions, without requiring a full recalibration for each situation. In the current study, this strategy is applied to construct at-line multivariate calibration models and consequently employ them in-line in a continuous industrial production line, using the same spectrometer. Firstly, quantitative multivariate models are constructed at-line at laboratory scale for predicting the concentration of two main ingredients in hard surface cleaners. By regressing the Raman spectra of a set of small-scale calibration samples against their reference concentration values, partial least squares (PLS) models are developed to quantify the surfactant levels in the liquid detergent compositions under investigation. After evaluating the models' performance with a set of independent validation samples, a univariate slope/bias correction is applied in view of transporting these at-line calibration models to an in-line manufacturing set-up. This standardisation technique allows a fast and easy transfer of the PLS regression models, by simply correcting the model predictions on the in-line set-up, without adjusting anything in the original multivariate calibration models. An extensive statistical analysis is performed in order to assess the predictive quality of the transferred regression models. Before and after transfer, the R² and RMSEP of both models are compared to evaluate whether their magnitudes are similar. T-tests are then performed to investigate whether the slope and intercept of the transferred regression line differ statistically from 1 and 0, respectively, and whether any significant bias is present. F-tests are executed as well, to assess the linearity of the transfer regression line and the statistical coincidence of the transfer and validation regression lines.
Finally, a paired t-test is performed to compare the original at-line model to the slope/bias corrected in-line model, using interval hypotheses. It is shown that the calibration models of Surfactant 1 and Surfactant 2 yield satisfactory in-line predictions after slope/bias correction. While Surfactant 1 passes seven out of eight statistical tests, the recommended validation parameters are 100% successful for Surfactant 2. It is hence concluded that the proposed strategy for transferring at-line calibration models to an in-line industrial environment via a univariate slope/bias correction of the predicted values offers a successful standardisation approach. Copyright © 2017 Elsevier B.V. All rights reserved.
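A univariate slope/bias correction of the kind applied here is simple to state: regress reference values against the new set-up's predictions for a set of transfer samples, then apply the fitted line to all subsequent predictions. A minimal sketch follows; the function and variable names are illustrative, not taken from the study.

```python
import numpy as np

def slope_bias_correct(y_pred_new, y_pred_std, y_ref_std):
    """Univariate slope/bias correction for calibration transfer.

    Fit y_ref = a * y_pred + b on transfer-standard samples measured on
    the new (in-line) set-up, then correct subsequent predictions. The
    original multivariate calibration model is left untouched.
    """
    a, b = np.polyfit(y_pred_std, y_ref_std, 1)
    return a * y_pred_new + b

# Transfer standards: suppose the in-line set-up reads 10% low with a +2 offset
truth = np.linspace(10, 50, 8)
inline_pred = (truth - 2.0) / 0.9
corrected = slope_bias_correct(inline_pred, inline_pred, truth)
```

Because only the two scalars `a` and `b` are estimated, a handful of transfer samples suffices, which is what makes the approach so cheap compared to full recalibration.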
Perez-Guaita, David; Kuligowski, Julia; Quintás, Guillermo; Garrigues, Salvador; Guardia, Miguel de la
2013-03-30
Locally weighted partial least squares regression (LW-PLSR) has been applied to the determination of four clinical parameters in human serum samples (total protein, triglyceride, glucose and urea contents) by Fourier transform infrared (FTIR) spectroscopy. Classical LW-PLSR models were constructed using different spectral regions. For the selection of parameters by LW-PLSR modeling, a multi-parametric study was carried out employing the minimum root-mean square error of cross validation (RMSCV) as objective function. In order to overcome the effect of strong matrix interferences on the predictive accuracy of LW-PLSR models, this work focuses on sample selection. Accordingly, a novel strategy for the development of local models is proposed. It was based on the use of: (i) principal component analysis (PCA) performed on an analyte specific spectral region for identifying most similar sample spectra and (ii) partial least squares regression (PLSR) constructed using the whole spectrum. Results found by using this strategy were compared to those provided by PLSR using the same spectral intervals as for LW-PLSR. Prediction errors found by both, classical and modified LW-PLSR improved those obtained by PLSR. Hence, both proposed approaches were useful for the determination of analytes present in a complex matrix as in the case of human serum samples. Copyright © 2013 Elsevier B.V. All rights reserved.
Newman, J; Egan, T; Harbourne, N; O'Riordan, D; Jacquier, J C; O'Sullivan, M
2014-08-01
Sensory evaluation can be problematic for ingredients with a bitter taste during the research and development phase of new food products. In this study, 19 dairy protein hydrolysates (DPH) were analysed by an electronic tongue and characterised physicochemically; the data obtained from these methods were correlated with bitterness intensity as scored by a trained sensory panel, and each model was assessed for its predictive capability. The physicochemical characteristics of the DPHs investigated were degree of hydrolysis (DH%), and data relating to peptide size and relative hydrophobicity from size exclusion chromatography (SEC) and reverse-phase (RP) HPLC. Partial least squares (PLS) regression was used to construct the prediction models. All PLS regressions had good correlations (0.78 to 0.93), with the strongest being the combination of data obtained from SEC and RP-HPLC. However, the PLS model with the strongest predictive power was based on the e-tongue, which had the lowest predicted residual error sum of squares (PRESS) in the study. The results show that the PLS models constructed with the e-tongue and with the combination of SEC and RP-HPLC have potential to be used for the prediction of bitterness, thus reducing the reliance on sensory analysis of DPHs in future food research. Copyright © 2014 Elsevier B.V. All rights reserved.
Simultaneous confidence bands for Cox regression from semiparametric random censorship.
Mondal, Shoubhik; Subramanian, Sundarraman
2016-01-01
Cox regression is combined with semiparametric random censorship models to construct simultaneous confidence bands (SCBs) for subject-specific survival curves. Simulation results are presented to compare the performance of the proposed SCBs with the SCBs that are based only on standard Cox. The new SCBs provide correct empirical coverage and are more informative. The proposed SCBs are illustrated with two real examples. An extension to handle missing censoring indicators is also outlined.
Song, Chao; Kwan, Mei-Po; Zhu, Jiping
2017-04-08
An increasing number of fires are occurring with the rapid development of cities, resulting in increased risk for human beings and the environment. This study compares geographically weighted regression-based models, including geographically weighted regression (GWR) and geographically and temporally weighted regression (GTWR), which integrates spatial and temporal effects and global linear regression models (LM) for modeling fire risk at the city scale. The results show that the road density and the spatial distribution of enterprises have the strongest influences on fire risk, which implies that we should focus on areas where roads and enterprises are densely clustered. In addition, locations with a large number of enterprises have fewer fire ignition records, probably because of strict management and prevention measures. A changing number of significant variables across space indicate that heterogeneity mainly exists in the northern and eastern rural and suburban areas of Hefei city, where human-related facilities or road construction are only clustered in the city sub-centers. GTWR can capture small changes in the spatiotemporal heterogeneity of the variables while GWR and LM cannot. An approach that integrates space and time enables us to better understand the dynamic changes in fire risk. Thus governments can use the results to manage fire safety at the city scale.
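The core of GWR is a weighted least-squares fit repeated at every location, with neighbouring observations down-weighted by a distance kernel. This from-scratch NumPy sketch uses a fixed Gaussian bandwidth and synthetic data whose slope drifts from west to east; production GWR/GTWR implementations additionally calibrate the bandwidth and, for GTWR, add a temporal distance term.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Geographically weighted regression: one weighted LS fit per location."""
    betas = np.empty((len(y), X.shape[1]))
    for i, c in enumerate(coords):
        d = np.linalg.norm(coords - c, axis=1)      # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)     # Gaussian spatial kernel
        Xw = X * w[:, None]                          # weight the design matrix
        betas[i] = np.linalg.solve(Xw.T @ X, Xw.T @ y)
    return betas

rng = np.random.default_rng(11)
n = 100
coords = rng.uniform(0, 1, size=(n, 2))             # synthetic city-cell locations
x = rng.normal(size=n)                              # e.g. a road-density stand-in
X = np.column_stack([np.ones(n), x])
# Invented response: the covariate effect strengthens from west to east
y = 0.5 + 2.0 * coords[:, 0] * x + rng.normal(0, 0.1, n)
betas = gwr_coefficients(coords, X, y, bandwidth=0.2)
```

Plotting `betas[:, 1]` over the map is what reveals the spatial heterogeneity the abstract discusses: the local slope varies smoothly with location rather than being a single global constant.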
2014-01-01
Background Meta-regression is becoming increasingly used to model study level covariate effects. However this type of statistical analysis presents many difficulties and challenges. Here two methods for calculating confidence intervals for the magnitude of the residual between-study variance in random effects meta-regression models are developed. A further suggestion for calculating credible intervals using informative prior distributions for the residual between-study variance is presented. Methods Two recently proposed and, under the assumptions of the random effects model, exact methods for constructing confidence intervals for the between-study variance in random effects meta-analyses are extended to the meta-regression setting. The use of Generalised Cochran heterogeneity statistics is extended to the meta-regression setting and a Newton-Raphson procedure is developed to implement the Q profile method for meta-analysis and meta-regression. WinBUGS is used to implement informative priors for the residual between-study variance in the context of Bayesian meta-regressions. Results Results are obtained for two contrasting examples, where the first example involves a binary covariate and the second involves a continuous covariate. Intervals for the residual between-study variance are wide for both examples. Conclusions Statistical methods, and R computer software, are available to compute exact confidence intervals for the residual between-study variance under the random effects model for meta-regression. These frequentist methods are almost as easily implemented as their established counterparts for meta-analysis. Bayesian meta-regressions are also easily performed by analysts who are comfortable using WinBUGS. Estimates of the residual between-study variance in random effects meta-regressions should be routinely reported and accompanied by some measure of their uncertainty. Confidence and/or credible intervals are well-suited to this purpose. PMID:25196829
Construction and analysis of a modular model of caspase activation in apoptosis
Harrington, Heather A; Ho, Kenneth L; Ghosh, Samik; Tung, KC
2008-01-01
Background A key physiological mechanism employed by multicellular organisms is apoptosis, or programmed cell death. Apoptosis is triggered by the activation of caspases in response to both extracellular (extrinsic) and intracellular (intrinsic) signals. The extrinsic and intrinsic pathways are characterized by the formation of the death-inducing signaling complex (DISC) and the apoptosome, respectively; both the DISC and the apoptosome are oligomers with complex formation dynamics. Additionally, the extrinsic and intrinsic pathways are coupled through the mitochondrial apoptosis-induced channel via the Bcl-2 family of proteins. Results A model of caspase activation is constructed and analyzed. The apoptosis signaling network is simplified through modularization methodologies and equilibrium abstractions for three functional modules. The mathematical model is composed of a system of ordinary differential equations which is numerically solved. Multiple linear regression analysis investigates the role of each module, and reduced models are constructed to identify key contributions of the extrinsic and intrinsic pathways in triggering apoptosis for different cell lines. Conclusion Through linear regression techniques, we identified the feedbacks, dissociation of complexes, and negative regulators as the key components in apoptosis. The analysis and reduced models for our model formulation reveal that the chosen cell lines predominantly exhibit strong extrinsic caspase behavior, typical of type I cells. Furthermore, under the simplified model framework, the selected cell lines exhibit different modes by which caspase activation may occur. Finally, the proposed modularized model of apoptosis may generalize behavior for additional cells and tissues, specifically identifying and predicting components responsible for the transition from type I to type II cell behavior. PMID:19077196
NASA Astrophysics Data System (ADS)
Duman, T. Y.; Can, T.; Gokceoglu, C.; Nefeslioglu, H. A.; Sonmez, H.
2006-11-01
As a result of industrialization, cities throughout the world have been growing rapidly for the last century. One typical example of these growing cities is Istanbul, whose population is over 10 million. Due to rapid urbanization, new areas suitable for settlement and engineering structures are necessary. The Cekmece area located west of the Istanbul metropolitan area is studied because landslide activity is extensive in this area. The purpose of this study is to develop a model that can be used to characterize landslide susceptibility in map form using logistic regression analysis of an extensive landslide database. A database of landslide activity was constructed using both aerial photography and field studies. About 19.2% of the selected study area is covered by deep-seated landslides. The landslides that occur in the area are primarily located in sandstones with interbedded permeable and impermeable layers such as claystone, siltstone and mudstone; about 31.95% of the total landslide area is located in this unit. To apply logistic regression analyses, a data matrix including 37 variables was constructed. The variables used in the forward stepwise analyses are different measures of slope, aspect, elevation, stream power index (SPI), plan curvature, profile curvature, geology, geomorphology and relative permeability of lithological units. A total of 25 variables were identified as exerting strong influence on landslide occurrence and were included in the logistic regression equation. Wald statistics indicate that lithology, SPI and slope are more important than the other parameters in the equation. The beta coefficients of the 25 variables included in the logistic regression equation provide a model for landslide susceptibility in the Cekmece area. This model is used to generate a landslide susceptibility map that correctly classified 83.8% of the landslide-prone areas.
Estimating Building Age with 3d GIS
NASA Astrophysics Data System (ADS)
Biljecki, F.; Sindram, M.
2017-10-01
Building datasets (e.g. footprints in OpenStreetMap and 3D city models) are becoming increasingly available worldwide. However, the thematic (attribute) aspect is not always given attention, as many of such datasets are lacking in completeness of attributes. A prominent attribute of buildings is the year of construction, which is useful for some applications, but its availability may be scarce. This paper explores the potential of estimating the year of construction (or age) of buildings from other attributes using random forest regression. The developed method has a two-fold benefit: enriching datasets and quality control (verification of existing attributes). Experiments are carried out on a semantically rich LOD1 dataset of Rotterdam in the Netherlands using 9 attributes. The results are mixed: the accuracy in the estimation of building age depends on the available information used in the regression model. In the best scenario we have achieved predictions with an RMSE of 11 years, but in more realistic situations with limited knowledge about buildings the error is much larger (RMSE = 26 years). Hence the main conclusion of the paper is that inferring building age with 3D city models is possible to a certain extent because it reveals the approximate period of construction, but precise estimations remain a difficult task.
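Random forest regression of building age from other attributes can be sketched as follows. The attributes and the age-generating rule below are entirely invented for illustration (the paper's Rotterdam LOD1 attributes are not reproduced here), so the resulting RMSE is illustrative only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 500
area = rng.uniform(40, 400, n)                  # invented footprint area (m^2)
height = rng.uniform(3, 30, n)                  # invented building height (m)
storeys = np.maximum(1, (height // 3).astype(int))
# Invented rule: taller, larger buildings skew newer, +/- 10-year noise
year = 1900 + height + 0.05 * area + rng.normal(0, 10, n)

X = np.column_stack([area, height, storeys])
X_tr, X_te, y_tr, y_te = train_test_split(X, year, random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
rmse = float(np.sqrt(np.mean((rf.predict(X_te) - y_te) ** 2)))
```

As in the paper, the held-out RMSE, measured in years, is the natural figure of merit; `rf.feature_importances_` would additionally show which attributes drive the estimate.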
Lu, Lee-Jane W.; Nishino, Thomas K.; Khamapirad, Tuenchit; Grady, James J; Leonard, Morton H.; Brunder, Donald G.
2009-01-01
Breast density (the percentage of fibroglandular tissue in the breast) has been suggested to be a useful surrogate marker for breast cancer risk. It is conventionally measured on screen-film mammographic images by a labor-intensive histogram segmentation method (HSM). We have adapted and modified the HSM for measuring breast density from raw digital mammograms acquired by full-field digital mammography. Multiple regression model analyses showed that many of the instrument parameters for acquiring the screening mammograms (e.g., breast compression thickness, radiological thickness, radiation dose, compression force) and image pixel intensity statistics of the imaged breasts were strong predictors of the observed threshold values (model R² = 0.93) and %-density (R² = 0.84). The intra-class correlation coefficient of the %-density for duplicate images was estimated to be 0.80 using the regression-model-derived threshold values, and 0.94 if estimated directly from the parameter estimates of the %-density prediction regression model. Therefore, with additional research, these mathematical models could be used to compute breast density objectively and automatically, bypassing the HSM step, and could greatly facilitate breast cancer research studies. PMID:17671343
Krasikova, Dina V; Le, Huy; Bachura, Eric
2018-06-01
To address a long-standing concern regarding a gap between organizational science and practice, scholars called for more intuitive and meaningful ways of communicating research results to users of academic research. In this article, we develop a common language effect size index (CLβ) that can help translate research results to practice. We demonstrate how CLβ can be computed and used to interpret the effects of continuous and categorical predictors in multiple linear regression models. We also elaborate on how the proposed CLβ index is computed and used to interpret interactions and nonlinear effects in regression models. In addition, we test the robustness of the proposed index to violations of normality and provide means for computing standard errors and constructing confidence intervals around its estimates. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Ling, Ru; Liu, Jiawang
2011-12-01
To construct prediction models for the health workforce and hospital beds in county hospitals of Hunan by multiple linear regression, we surveyed 16 counties in Hunan with stratified random sampling using uniform questionnaires, and performed multiple linear regression analysis with 20 indicators selected by literature review. Independent variables in the multiple linear regression model on medical personnel in county hospitals included the counties' urban residents' income, crude death rate, medical beds, business occupancy, professional equipment value, the number of devices valued above 10 000 yuan, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, and utilization rate of hospital beds. Independent variables in the multiple linear regression model on county hospital beds included the population aged 65 and above in the counties, disposable income of urban residents, medical personnel of medical institutions in the county area, business occupancy, the total value of professional equipment, fixed assets, long-term debt, medical income, medical expenses, outpatient and emergency visits, hospital visits, actual available bed days, utilization rate of hospital beds, and length of hospitalization. The prediction models show good explanatory power and fit, and may be used for short- and mid-term forecasting.
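The models above are ordinary multiple linear regressions fitted to survey indicators. A minimal sketch of the fitting step is shown below with synthetic stand-in predictors; none of the numbers come from the Hunan survey data, and the coefficient values are invented for illustration.

```python
import numpy as np

# Synthetic stand-ins for three survey predictors (e.g. income, bed days, visits)
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
beta_true = np.array([5.0, 2.0, -1.0])          # hypothetical true effects
y = 300 + X @ beta_true + rng.normal(scale=0.5, size=n)

# Ordinary least squares with an explicit intercept column
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.round(coef, 1))                        # close to [300, 5, 2, -1]
```

The fitted coefficient vector can then be applied to a new county's indicator values to produce a short-term forecast.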
Rainfall-induced Landslide Susceptibility assessment at the Longnan county
NASA Astrophysics Data System (ADS)
Hong, Haoyuan; Zhang, Ying
2017-04-01
Landslides are a serious hazard in Longnan county, China; landslide susceptibility assessment is therefore a useful tool for government decision making. The main objective of this study is to investigate and compare the frequency ratio, support vector machine, and logistic regression approaches. Longnan county (Jiangxi province, China) was selected as the case study. First, a landslide inventory map with 354 landslide locations was constructed. The landslide locations were then randomly divided in a ratio of 70/30 for training and validating the models. Second, fourteen landslide conditioning factors were prepared: slope, aspect, altitude, topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), plan curvature, lithology, distance to faults, distance to rivers, distance to roads, land use, normalized difference vegetation index (NDVI), and rainfall. Using the frequency ratio, support vector machines, and logistic regression, a total of three landslide susceptibility models were constructed. Finally, the overall performance of the resulting models was assessed and compared using the receiver operating characteristic (ROC) curve technique. The results showed that the support vector machine model is the best model in the study area, with a success rate of 88.39% and a prediction rate of 84.06%.
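The success and prediction rates above are areas under ROC curves. A hedged sketch of that comparison metric, computed from its rank (Mann-Whitney) interpretation on synthetic susceptibility scores rather than any particular GIS toolchain:

```python
import numpy as np

def auc(scores_pos, scores_neg):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a random positive case outscores a random negative."""
    s_pos = np.asarray(scores_pos)[:, None]
    s_neg = np.asarray(scores_neg)[None, :]
    wins = (s_pos > s_neg).sum() + 0.5 * (s_pos == s_neg).sum()
    return wins / (s_pos.size * s_neg.size)

# Synthetic model scores at landslide cells vs stable cells (illustrative only)
rng = np.random.default_rng(1)
landslide = rng.normal(1.5, 1.0, 200)
stable = rng.normal(0.0, 1.0, 200)
print(round(auc(landslide, stable), 2))
```

Computing this on the 70% training split gives a success rate; on the held-out 30% split, a prediction rate.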
Statistical considerations in the development of injury risk functions.
McMurry, Timothy L; Poplin, Gerald S
2015-01-01
We address 4 frequently misunderstood and important statistical ideas in the construction of injury risk functions. These include the similarities of survival analysis and logistic regression, the correct scale on which to construct pointwise confidence intervals for injury risk, the ability to discern which form of injury risk function is optimal, and the handling of repeated tests on the same subject. The statistical models are explored through simulation and examination of the underlying mathematics. We provide recommendations for the statistically valid construction and correct interpretation of single-predictor injury risk functions. This article aims to provide useful and understandable statistical guidance to improve the practice in constructing injury risk functions.
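One of the four ideas above, the correct scale for pointwise confidence intervals, can be illustrated concretely: build the interval on the logit (linear predictor) scale and then transform, so the bounds cannot leave [0, 1]. The fitted value and standard error below are hypothetical.

```python
import numpy as np

def risk_ci(eta, se, z=1.96):
    """Pointwise CI for injury risk from a logistic model, constructed on the
    logit scale and mapped through the inverse logit."""
    expit = lambda t: 1.0 / (1.0 + np.exp(-t))
    return expit(eta - z * se), expit(eta), expit(eta + z * se)

# Hypothetical fitted linear predictor 2.0 with standard error 0.8
lo, mid, hi = risk_ci(2.0, 0.8)
print(round(lo, 3), round(mid, 3), round(hi, 3))
# A naive symmetric interval on the probability scale could exceed 1;
# this one stays inside (0, 1) by construction.
```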
Using the Graded Response Model to Control Spurious Interactions in Moderated Multiple Regression
ERIC Educational Resources Information Center
Morse, Brendan J.; Johanson, George A.; Griffeth, Rodger W.
2012-01-01
Recent simulation research has demonstrated that using simple raw scores to operationalize a latent construct can result in inflated Type I error rates for the interaction term of a moderated statistical model when the interaction (or lack thereof) is proposed at the latent variable level. Rescaling the scores using an appropriate item response…
ERIC Educational Resources Information Center
Smedema, Susan Miller; Chan, Fong; Yaghmaian, Rana A.; Cardoso, Elizabeth DaSilva; Muller, Veronica; Keegan, John; Dutta, Alo; Ebener, Deborah J.
2015-01-01
This study examined the factorial structure of the construct core self-evaluations (CSE) and tested a mediational model of the relationship between CSE and life satisfaction in college students with disabilities. We conducted a quantitative descriptive design using exploratory and confirmatory factor analysis and multiple regression analysis.…
Valente, Bruno D.; Morota, Gota; Peñagaricano, Francisco; Gianola, Daniel; Weigel, Kent; Rosa, Guilherme J. M.
2015-01-01
The term “effect” in additive genetic effect suggests a causal meaning. However, inferences of such quantities for selection purposes are typically viewed and conducted as a prediction task. Predictive ability as tested by cross-validation is currently the most acceptable criterion for comparing models and evaluating new methodologies. Nevertheless, it does not directly indicate if predictors reflect causal effects. Such evaluations would require causal inference methods that are not typical in genomic prediction for selection. This suggests that the usual approach to infer genetic effects contradicts the label of the quantity inferred. Here we investigate if genomic predictors for selection should be treated as standard predictors or if they must reflect a causal effect to be useful, requiring causal inference methods. Conducting the analysis as a prediction or as a causal inference task affects, for example, how covariates of the regression model are chosen, which may heavily affect the magnitude of genomic predictors and therefore selection decisions. We demonstrate that selection requires learning causal genetic effects. However, genomic predictors from some models might capture noncausal signal, providing good predictive ability but poorly representing true genetic effects. Simulated examples are used to show that aiming for predictive ability may lead to poor modeling decisions, while causal inference approaches may guide the construction of regression models that better infer the target genetic effect even when they underperform in cross-validation tests. In conclusion, genomic selection models should be constructed to aim primarily for identifiability of causal genetic effects, not for predictive ability. PMID:25908318
Karabatsos, George
2017-02-01
Most applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by the data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and of the model's predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses and estimates of the model's posterior predictive distribution for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.
NASA Astrophysics Data System (ADS)
Dalkilic, Turkan Erbay; Apaydin, Aysen
2009-11-01
In a regression analysis, it is assumed that the observations come from a single class in a data cluster and that the simple functional relationship between the dependent and independent variables can be expressed using the general model Y=f(X)+ε. However, a data cluster may consist of a combination of observations that have different distributions derived from different clusters. When a regression model must be estimated for fuzzy inputs derived from different distributions, the model is termed a 'switching regression model', in which each independent variable is associated with one of several classes; here l_i indicates the class number of the ith independent variable and p the number of independent variables [J.R. Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23 (3) (1993) 665-685; M. Michel, Fuzzy clustering and switching regression models using ambiguity and distance rejects, Fuzzy Sets and Systems 122 (2001) 363-399; E.Q. Richard, A new approach to estimating switching regressions, Journal of the American Statistical Association 67 (338) (1972) 306-310]. In this study, adaptive networks are used to construct a model formed by gathering the models obtained for each class. There are methods that suggest the class numbers of independent variables heuristically; alternatively, this study aims to define the optimal class number of independent variables using a suggested validity criterion for fuzzy clustering. For the case where independent variables have an exponential distribution, an algorithm is suggested for defining the unknown parameters of the switching regression model and for obtaining the estimated values after obtaining an optimal membership function suitable for the exponential distribution.
Virtual Beach version 2.2 (VB 2.2) is a decision support tool. It is designed to construct site-specific Multi-Linear Regression (MLR) models to predict pathogen indicator levels (or fecal indicator bacteria, FIB) at recreational beaches. MLR analysis has outperformed persisten...
[Calculating Pearson residual in logistic regressions: a comparison between SPSS and SAS].
Xu, Hao; Zhang, Tao; Li, Xiao-song; Liu, Yuan-yuan
2015-01-01
To compare the results of Pearson residual calculations in logistic regression models using SPSS and SAS, we reviewed Pearson residual calculation methods and used two sets of data to test logistic models constructed by SPSS and SAS. One model contained a small number of covariates relative to the number of observations; the other contained a number of covariates similar to the number of observations. The two software packages produced similar Pearson residual estimates when the number of covariates was similar to the number of observations, but the results differed when the number of observations was much greater than the number of covariates. The two software packages produce different results for Pearson residuals, especially when the models contain a small number of covariates. Further studies are warranted.
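For reference, the Pearson residual itself has a simple closed form, (y_i - p̂_i)/√(p̂_i(1-p̂_i)). The sketch below fits a logistic model from scratch by iteratively reweighted least squares (IRLS) on synthetic data and computes the residuals directly, independent of either software package.

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Fit a logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1 - p)
        # Newton step: solve (X'WX) delta = X'(y - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

# Synthetic data with true coefficients (0.5, 1.0)
rng = np.random.default_rng(2)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1 / (1 + np.exp(-(0.5 + 1.0 * X[:, 1])))).astype(float)

beta = logistic_irls(X, y)
p_hat = 1.0 / (1.0 + np.exp(-X @ beta))
# Pearson residual: (observed - fitted) scaled by the binomial standard deviation
pearson = (y - p_hat) / np.sqrt(p_hat * (1 - p_hat))
print(np.round(beta, 1))
```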
Prediction of silicon oxynitride plasma etching using a generalized regression neural network
NASA Astrophysics Data System (ADS)
Kim, Byungwhan; Lee, Byung Teak
2005-08-01
A prediction model of silicon oxynitride (SiON) etching was constructed using a neural network. Model prediction performance was improved by means of a genetic algorithm. The etching was conducted in a C2F6 inductively coupled plasma. A 2⁴ full factorial experiment was employed to systematically characterize parameter effects on SiON etching. The process parameters include radio frequency source power, bias power, pressure, and C2F6 flow rate. To test the appropriateness of the trained model, an additional 16 experiments were conducted. For comparison, four types of statistical regression models were built. Compared to the best regression model, the optimized neural network model demonstrated an improvement of about 52%. The optimized model was used to infer etch mechanisms as a function of the parameters. The pressure effect was noticeably large only when relatively large ion bombardment was maintained in the process chamber. Ion-bombardment-activated polymer deposition played the most significant role in interpreting the complex effect of bias power or C2F6 flow rate. Moreover, [CF2] was expected to be the predominant precursor to polymer deposition.
Shi, K-Q; Zhou, Y-Y; Yan, H-D; Li, H; Wu, F-L; Xie, Y-Y; Braddock, M; Lin, X-Y; Zheng, M-H
2017-02-01
At present, there is no ideal model for predicting the short-term outcome of patients with acute-on-chronic hepatitis B liver failure (ACHBLF). This study aimed to establish and validate a prognostic model by using classification and regression tree (CART) analysis. A total of 1047 patients from two separate medical centres with suspected ACHBLF were screened in the study, forming a derivation cohort and a validation cohort, respectively. CART analysis was applied to predict the 3-month mortality of patients with ACHBLF. The accuracy of the CART model was tested using the area under the receiver operating characteristic curve, which was compared with the model for end-stage liver disease (MELD) score and a new logistic regression model. CART analysis identified four variables as prognostic factors of ACHBLF: total bilirubin, age, serum sodium and INR, and three distinct risk groups: low risk (4.2%), intermediate risk (30.2%-53.2%) and high risk (81.4%-96.9%). The new logistic regression model was constructed with four independent factors, including age, total bilirubin, serum sodium and prothrombin activity, identified by multivariate logistic regression analysis. The performance of the CART model (0.896) was similar to that of the logistic regression model (0.914, P=.382) and exceeded that of the MELD score (0.667, P<.001). The results were confirmed in the validation cohort. We have developed and validated a novel CART model superior to MELD for predicting 3-month mortality of patients with ACHBLF. Thus, the CART model could facilitate medical decision-making and provide clinicians with a validated practical bedside tool for ACHBLF risk stratification. © 2016 John Wiley & Sons Ltd.
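The building block of a CART model is the search for the single best split on a prognostic variable. A minimal sketch of that search by Gini impurity follows; the variable is a synthetic stand-in loosely modeled on a factor such as total bilirubin, with an invented threshold effect, not patient data.

```python
import numpy as np

def best_split(x, y):
    """CART-style search: the threshold on x minimizing weighted Gini impurity."""
    def gini(labels):
        if labels.size == 0:
            return 0.0
        p = labels.mean()
        return 2 * p * (1 - p)
    best_t, best_g = None, np.inf
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        g = (left.size * gini(left) + right.size * gini(right)) / y.size
        if g < best_g:
            best_t, best_g = t, g
    return best_t, best_g

# Synthetic prognostic factor: mortality 0.8 above a true cut of 15, else 0.1
rng = np.random.default_rng(3)
x = rng.uniform(0, 30, 400)
y = (rng.random(400) < np.where(x > 15, 0.8, 0.1)).astype(float)

threshold, impurity = best_split(x, y)
print(round(threshold, 1))
```

A full CART model repeats this search recursively within each resulting subgroup, which is how the three risk strata in the abstract arise.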
Miller, Nathan; Prevatt, Frances
2017-10-01
The purpose of this study was to reexamine the latent structure of ADHD and sluggish cognitive tempo (SCT) due to issues with construct validity. Two proposed changes to the construct include viewing hyperactivity and sluggishness (hypoactivity) as a single continuum of activity level, and viewing inattention as a separate dimension from activity level. Data were collected from 1,398 adults using Amazon's MTurk. A new scale measuring activity level was developed, and scores of Inattention were regressed onto scores of Activity Level using curvilinear regression. The Activity Level scale showed acceptable levels of internal consistency, normality, and unimodality. Curvilinear regression indicates that a quadratic (curvilinear) model accurately explains a small but significant portion of the variance in levels of inattention. Hyperactivity and hypoactivity may be viewed as a continuum, rather than separate disorders. Inattention may have a U-shaped relationship with activity level. Linear analyses may be insufficient and inaccurate for studying ADHD.
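The curvilinear claim above corresponds to fitting a quadratic term and checking its sign. A brief illustration on synthetic data (not the MTurk sample), where inattention is highest at both extremes of activity level:

```python
import numpy as np

# Synthetic U-shaped relationship: inattention ~ activity^2 plus noise
rng = np.random.default_rng(4)
activity = rng.uniform(-2, 2, 300)
inattention = 1.0 * activity**2 + rng.normal(scale=0.5, size=300)

# np.polyfit returns the highest-degree coefficient first
quad, lin, const = np.polyfit(activity, inattention, deg=2)
print(round(quad, 1))   # a positive quadratic coefficient indicates a U shape
```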
Gallucci, Andrew; Martin, Ryan; Beaujean, Alex; Usdan, Stuart
2015-01-01
The misuse of prescription stimulants (MPS) is an emergent adverse health behavior among undergraduate college students. However, current research on MPS is largely atheoretical. The purpose of this study was to validate a survey to assess MPS-related theory of planned behavior (TPB) constructs (i.e. attitudes, subjective norms, and perceived behavioral control) and determine the relationship between these constructs, MPS-related risk factors (e.g. gender and class status), and current MPS (i.e. past 30 days use) among college students. Participants (N = 978, 67.8% female and 82.9% Caucasian) at a large public university in the southeastern USA completed a survey assessing MPS and MPS-related TPB constructs during fall 2010. To examine the relationship between MPS-related TPB constructs and current MPS, we conducted (1) confirmatory factor analyses to validate that our survey items assessed MPS-related TPB constructs and (2) a series of regression analyses to examine associations between MPS-related TPB constructs, potential MPS-related risk factors, and MPS in this sample. Our factor analyses indicated that the survey items assessed MPS-related TPB constructs and our multivariate logistic regression analysis indicated that perceived behavioral control was significantly associated with current MPS. In addition, analyses found that having a prescription stimulant was a protective factor against MPS when the model included MPS-related TPB variables.
Security of statistical data bases: invasion of privacy through attribute correlational modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Palley, M.A.
This study develops, defines, and applies a statistical technique for the compromise of confidential information in a statistical data base. Attribute Correlational Modeling (ACM) recognizes that the information contained in a statistical data base represents real world statistical phenomena. As such, ACM assumes correlational behavior among the database attributes. ACM proceeds to compromise confidential information through creation of a regression model, where the confidential attribute is treated as the dependent variable. The typical statistical data base may preclude the direct application of regression. In this scenario, the research introduces the notion of a synthetic data base, created through legitimate queries of the actual data base, and through proportional random variation of responses to these queries. The synthetic data base is constructed to resemble the actual data base as closely as possible in a statistical sense. ACM then applies regression analysis to the synthetic data base, and utilizes the derived model to estimate confidential information in the actual database.
Ertefaie, Ashkan; Shortreed, Susan; Chakraborty, Bibhas
2016-06-15
Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes, which are sequences of decision rules that use patient information to inform future treatment decisions. An optimal dynamic treatment regime is composed of a sequence of decision rules that indicate how to optimally individualize treatment using the patients' baseline and time-varying characteristics to optimize the final outcome. Constructing optimal dynamic regimes using Q-learning depends heavily on the assumption that regression models at each decision point are correctly specified; yet model checking in the context of Q-learning has been largely overlooked in the current literature. In this article, we show that residual plots obtained from standard Q-learning models may fail to adequately check the quality of the model fit. We present a modified Q-learning procedure that accommodates residual analyses using standard tools. We present simulation studies showing the advantage of the proposed modification over standard Q-learning. We illustrate this new Q-learning approach using data collected from a sequential multiple assignment randomized trial of patients with schizophrenia. Copyright © 2016 John Wiley & Sons, Ltd.
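The regression backbone of standard two-stage Q-learning can be sketched briefly: fit a stage-2 model for the outcome, form a pseudo-outcome by maximizing the fitted Q-function over the stage-2 treatment, then regress that pseudo-outcome at stage 1. Everything below is synthetic and illustrative, not the trial data or the authors' modified procedure.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
# Synthetic two-stage setting: s = patient state, a = treatment in {-1, +1}
s1 = rng.normal(size=n)
a1 = rng.choice([-1.0, 1.0], n)
s2 = 0.5 * s1 + 0.3 * a1 + rng.normal(size=n)
a2 = rng.choice([-1.0, 1.0], n)
# Outcome rewards treating in the direction of s2 at stage 2
y = s2 + a2 * s2 + rng.normal(size=n)

def fit(X, t):
    return np.linalg.lstsq(X, t, rcond=None)[0]

# Stage 2: regress y on (1, s2, a2, a2*s2)
X2 = np.column_stack([np.ones(n), s2, a2, a2 * s2])
b2 = fit(X2, y)

# Pseudo-outcome: value of acting optimally at stage 2
def q2(s, a):
    return b2[0] + b2[1] * s + b2[2] * a + b2[3] * a * s
pseudo = np.maximum(q2(s2, -1.0), q2(s2, 1.0))

# Stage 1: regress the pseudo-outcome on (1, s1, a1, a1*s1)
X1 = np.column_stack([np.ones(n), s1, a1, a1 * s1])
b1 = fit(X1, pseudo)
print(round(b2[3], 2))   # close to the true a2*s2 interaction of 1.0
```

The misspecification concern in the abstract arises exactly here: if X2 or X1 omits a needed term, the decision rules derived from b2 and b1 can be wrong, which is why residual checking matters.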
Constructing a consumption model of fine dining from the perspective of behavioral economics
Hsu, Sheng-Hsun; Hsiao, Cheng-Fu; Tsai, Sang-Bing
2018-01-01
Numerous factors affect how people choose a fine dining restaurant, including food quality, service quality, food safety, and hedonic value. A conceptual framework for evaluating restaurant selection behavior has not yet been developed. This study surveyed 150 individuals with fine dining experience and proposed the use of mental accounting and axiomatic design to construct a consumer economic behavior model. Linear and logistic regressions were employed to determine model correlations and the probability of each factor affecting behavior. The most crucial factor was food quality, followed by service and dining motivation, particularly regarding family dining. Safe ingredients, high cooking standards, and menu innovation all increased the likelihood of consumers choosing fine dining restaurants. PMID:29641554
WebGLORE: a web service for Grid LOgistic REgression.
Jiang, Wenchao; Li, Pinghao; Wang, Shuang; Wu, Yuan; Xue, Meng; Ohno-Machado, Lucila; Jiang, Xiaoqian
2013-12-15
WebGLORE is a free web service that enables privacy-preserving construction of a global logistic regression model from distributed datasets that are sensitive. It only transfers aggregated local statistics (from participants) through Hypertext Transfer Protocol Secure to a trusted server, where the global model is synthesized. WebGLORE seamlessly integrates AJAX, JAVA Applet/Servlet and PHP technologies to provide an easy-to-use web service for biomedical researchers to break down policy barriers during information exchange. http://dbmi-engine.ucsd.edu/webglore3/. WebGLORE can be used under the terms of GNU general public license as published by the Free Software Foundation.
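The core of GLORE-style distributed fitting can be sketched in a few lines: each site computes only aggregated Newton-step statistics (X'WX and the score X'(y - p)) on its own data, and a server sums them and updates the shared coefficients, so row-level records never leave a site. The two-site data below are synthetic; this is a sketch of the idea, not the WebGLORE implementation.

```python
import numpy as np

rng = np.random.default_rng(9)

def site_data(n):
    """One site's private data, generated with true coefficients (-0.5, 1.0)."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    p = 1 / (1 + np.exp(-(X @ np.array([-0.5, 1.0]))))
    y = (rng.random(n) < p).astype(float)
    return X, y

sites = [site_data(300), site_data(300)]
beta = np.zeros(2)
for _ in range(20):
    H = np.zeros((2, 2))
    g = np.zeros(2)
    for X, y in sites:                   # each site sends only these aggregates
        p = 1 / (1 + np.exp(-X @ beta))
        W = p * (1 - p)
        H += X.T @ (W[:, None] * X)
        g += X.T @ (y - p)
    beta += np.linalg.solve(H, g)        # server-side Newton update
print(np.round(beta, 1))                 # near the true (-0.5, 1.0)
```

Because the Newton update depends on the data only through these sums, the pooled fit matches what a centralized logistic regression on the combined data would produce.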
Jiménez-Huete, Adolfo; Riva, Elena; Toledano, Rafael; Campo, Pablo; Esteban, Jesús; Barrio, Antonio Del; Franch, Oriol
2014-12-01
The validity of neuropsychological tests for the differential diagnosis of degenerative dementias may depend on the clinical context. We constructed a series of logistic models taking this factor into account. We retrospectively analyzed the demographic and neuropsychological data of 301 patients with probable Alzheimer's disease (AD), frontotemporal lobar degeneration (FTLD), or dementia with Lewy bodies (DLB). Nine models were constructed taking into account the diagnostic question (e.g., AD vs DLB) and subpopulation (incident vs prevalent). The AD versus DLB model for all patients, including memory recovery and phonological fluency, was highly accurate (area under the curve = 0.919, sensitivity = 90%, and specificity = 80%). The results were comparable in incident and prevalent cases. The FTLD versus AD and DLB versus FTLD models were both inaccurate. The models constructed from basic neuropsychological variables allowed an accurate differential diagnosis of AD versus DLB but not of FTLD versus AD or DLB. © The Author(s) 2014.
Bjorner, Jakob Bue; Pejtersen, Jan Hyld
2010-02-01
To evaluate the construct validity of the Copenhagen Psychosocial Questionnaire II (COPSOQ II) by means of tests for differential item functioning (DIF) and differential item effect (DIE). We used a Danish general population postal survey (n = 4,732 with 3,517 wage earners) with a one-year register based follow up for long-term sickness absence. DIF was evaluated against age, gender, education, social class, public/private sector employment, and job type using ordinal logistic regression. DIE was evaluated against job satisfaction and self-rated health (using ordinal logistic regression), against depressive symptoms, burnout, and stress (using multiple linear regression), and against long-term sick leave (using a proportional hazards model). We used a cross-validation approach to counter the risk of significant results due to multiple testing. Out of 1,052 tests, we found 599 significant instances of DIF/DIE, 69 of which showed both practical and statistical significance across two independent samples. Most DIF occurred for job type (in 20 cases), while we found little DIF for age, gender, education, social class and sector. DIE seemed to pertain to particular items, which showed DIE in the same direction for several outcome variables. The results allowed a preliminary identification of items that have a positive impact on construct validity and items that have negative impact on construct validity. These results can be used to develop better shortform measures and to improve the conceptual framework, items and scales of the COPSOQ II. We conclude that tests of DIF and DIE are useful for evaluating construct validity.
Evaluation of statistical models for forecast errors from the HBV model
NASA Astrophysics Data System (ADS)
Engeland, Kolbjørn; Renard, Benjamin; Steinsland, Ingelin; Kolberg, Sjur
2010-04-01
Three statistical models for the forecast errors for inflow into the Langvatn reservoir in Northern Norway have been constructed and tested according to the agreement between (i) the forecast distribution and the observations and (ii) median values of the forecast distribution and the observations. For the first model, observed and forecasted inflows were transformed by the Box-Cox transformation before a first-order auto-regressive model was constructed for the forecast errors; the parameters were conditioned on weather classes. In the second model, the normal quantile transformation (NQT) was applied to observed and forecasted inflows before a similar first-order auto-regressive model was constructed for the forecast errors. In the third model, positive and negative errors were modeled separately; the errors were first NQT-transformed before conditioning the mean error values on climate, forecasted inflow, and yesterday's error. To test the three models we applied three criteria: we wanted (a) the forecast distribution to be reliable; (b) the forecast intervals to be narrow; (c) the median values of the forecast distribution to be close to the observed values. Models 1 and 2 gave almost identical results. The median values improved the forecast, with the Nash-Sutcliffe R_eff increasing from 0.77 for the original forecast to 0.87 for the corrected forecasts. Models 1 and 2 over-estimated the forecast intervals but gave the narrowest intervals; their main drawback was that their distributions are less reliable than Model 3's. For Model 3 the median values did not fit well since the auto-correlation was not accounted for. Since Model 3 did not benefit from the potential variance reduction that lies in bias estimation and removal, it gave on average wider forecast intervals than the two other models. At the same time, Model 3 on average slightly under-estimated the forecast intervals, probably explained by the use of average measures to evaluate the fit.
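The core of Models 1 and 2 is a first-order autoregression on transformed forecast errors, used to correct the next forecast. A simplified sketch on synthetic AR(1) errors (the paper additionally conditions parameters on weather classes, which is omitted here):

```python
import numpy as np

# Simulate AR(1) forecast errors with a hypothetical coefficient of 0.7
rng = np.random.default_rng(6)
n = 400
phi_true = 0.7
e = np.zeros(n)
for t in range(1, n):
    e[t] = phi_true * e[t - 1] + rng.normal(scale=0.3)

# Estimate the AR(1) coefficient by regressing e_t on e_{t-1}
phi_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)

# Corrected forecast = raw (transformed) forecast + predicted next error
raw_forecast = 10.0                      # hypothetical transformed inflow forecast
corrected = raw_forecast + phi_hat * e[-1]
print(round(phi_hat, 1))
```

In the paper this step operates on Box-Cox- or NQT-transformed inflows, and the corrected forecast is mapped back to the original scale.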
Self-Determination Theory and Outpatient Follow-Up After Psychiatric Hospitalization.
Sripada, Rebecca K; Bowersox, Nicholas W; Ganoczy, Dara; Valenstein, Marcia; Pfeiffer, Paul N
2016-08-01
The objective of this study was to assess whether the constructs of self-determination theory-autonomy, competence, and relatedness-are associated with adherence to outpatient follow-up appointments after psychiatric hospitalization. 242 individuals discharged from inpatient psychiatric treatment within the Veterans Health Administration completed surveys assessing self-determination theory constructs as well as measures of depression and barriers to treatment. Medical records were used to count the number of mental health visits and no-shows in the 14 weeks following discharge. Logistic regression models assessed the association between survey items assessing theory constructs and attendance at mental healthcare visits. In multivariate models, none of the self-determination theory factors predicted outpatient follow-up attendance. The constructs of self-determination theory as measured by a single self-report survey may not reliably predict adherence to post-hospital care. Need factors such as depression may be more strongly predictive of treatment adherence.
Developing a dengue forecast model using machine learning: A case study in China.
Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-10-01
In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.
Singh, Kunwar P; Gupta, Shikha; Ojha, Priyanka; Rai, Premanjali
2013-04-01
The research aims to develop an artificial intelligence (AI)-based model to predict the adsorptive removal of 2-chlorophenol (CP) in aqueous solution by coconut shell carbon (CSC) using four operational variables (pH of solution, adsorbate concentration, temperature, and contact time), and to investigate their effects on the adsorption process. Accordingly, based on a factorial design, 640 batch experiments were conducted. Nonlinearities in the experimental data were checked using Brock-Dechert-Scheinkman (BDS) statistics. Five nonlinear models were constructed to predict the adsorptive removal of CP in aqueous solution by CSC using the four variables as input. Performances of the constructed models were evaluated and compared using statistical criteria. BDS statistics revealed strong nonlinearity in the experimental data. Performance of all the models constructed here was satisfactory. Radial basis function network (RBFN) and multilayer perceptron network (MLPN) models performed better than the generalized regression neural network, support vector machine, and gene expression programming models. Sensitivity analysis revealed that contact time had the highest effect on adsorption, followed by solution pH, temperature, and CP concentration. The study concluded that all the models constructed here were capable of capturing the nonlinearity in the data. The better generalization and predictive performance of the RBFN and MLPN models suggests that they can be used to predict the adsorption of CP in aqueous solution using CSC.
Analysis of precision and accuracy in a simple model of machine learning
NASA Astrophysics Data System (ADS)
Lee, Julian
2017-12-01
Machine learning is a procedure whereby a model of the world is constructed from a training set of examples. It is important that the model capture the relevant features of the training set and, at the same time, make correct predictions for examples not included in the training set. I consider polynomial regression, the simplest method of learning, and analyze its accuracy and precision at different levels of model complexity.
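The trade-off the abstract describes can be reproduced in a few lines: with nested polynomial fits, training error can only fall as the degree grows, while error on unseen inputs behaves differently. The generating function and noise level below are arbitrary choices for illustration.

```python
# Polynomial regression at three model complexities: training error is
# guaranteed to shrink with degree, but accuracy on unseen inputs is not.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 20)
x_test = np.linspace(-1, 1, 101)
truth = lambda x: np.sin(np.pi * x)           # the "world" behind the examples
y_train = truth(x_train) + rng.normal(0, 0.1, x_train.size)

errors = {}
for degree in (1, 3, 12):
    coef = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coef, x_test) - truth(x_test)) ** 2)
    errors[degree] = (train_err, test_err)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```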
Ensemble learning with trees and rules: supervised, semi-supervised, unsupervised
USDA-ARS?s Scientific Manuscript database
In this article, we propose several new approaches for post-processing a large ensemble of conjunctive rules for supervised and semi-supervised learning problems. We show with various examples that for high-dimensional regression problems the models constructed by post-processing the rules with ...
Identifying pollution sources and predicting urban air quality using ensemble learning methods
NASA Astrophysics Data System (ADS)
Singh, Kunwar P.; Gupta, Shikha; Rai, Premanjali
2013-12-01
In this study, principal components analysis (PCA) was performed to identify air pollution sources, and tree-based ensemble learning models were constructed to predict the urban air quality of Lucknow (India) using air quality and meteorological databases covering a period of five years. PCA identified vehicular emissions and fuel combustion as the major air pollution sources. The air quality indices revealed that the air quality was unhealthy during the summer and winter. Ensemble models were constructed to discriminate between the seasonal air qualities and the factors responsible for the discrimination, and to predict the air quality indices. Accordingly, single decision tree (SDT), decision tree forest (DTF), and decision treeboost (DTB) models were constructed, and their generalization and predictive performance was evaluated in terms of several statistical parameters and compared with a conventional machine learning benchmark, support vector machines (SVM). The DT and SVM models discriminated the seasonal air quality, yielding misclassification rates (MR) of 8.32% (SDT), 4.12% (DTF), 5.62% (DTB), and 6.18% (SVM) on the complete data. The AQI and CAQI regression models yielded correlation (measured versus predicted) and root mean squared error values of 0.901, 6.67 and 0.825, 9.45 (SDT); 0.951, 4.85 and 0.922, 6.56 (DTF); 0.959, 4.38 and 0.929, 6.30 (DTB); and 0.890, 7.00 and 0.836, 9.16 (SVR) on the complete data. The DTF and DTB models outperformed the SVM in both classification and regression, which could be attributed to the incorporation of the bagging and boosting algorithms in these models. The proposed ensemble models successfully predicted the urban ambient air quality and can be used as effective tools for its management.
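A minimal sketch of the kind of comparison reported above: a bagged-tree and a boosted-tree regressor against an SVR benchmark. The data are synthetic, and the scikit-learn estimators are stand-ins for the DTF/DTB/SVM implementations used in the study.

```python
# Hedged sketch: bagging and boosting ensembles versus an SVR benchmark.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=(n, 4))                   # stand-ins for pollutant/weather inputs
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.5 * X[:, 2] + rng.normal(0, 0.2, n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

results = {}
for name, model in [
    ("bagging (random forest)", RandomForestRegressor(random_state=0)),
    ("boosting (gradient boosted trees)", GradientBoostingRegressor(random_state=0)),
    ("SVR benchmark", SVR()),
]:
    model.fit(X_tr, y_tr)
    results[name] = mean_squared_error(y_te, model.predict(X_te)) ** 0.5
    print(f"{name}: hold-out RMSE {results[name]:.3f}")
```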
Functional CAR models for large spatially correlated functional datasets.
Zhang, Lin; Baladandayuthapani, Veerabhadran; Zhu, Hongxiao; Baggerly, Keith A; Majewski, Tadeusz; Czerniak, Bogdan A; Morris, Jeffrey S
2016-01-01
We develop a functional conditional autoregressive (CAR) model for spatially correlated data for which functions are collected on areal units of a lattice. Our model performs functional response regression while accounting for spatial correlations with potentially nonseparable and nonstationary covariance structure, in both the space and functional domains. We show theoretically that our construction leads to a CAR model at each functional location, with spatial covariance parameters varying and borrowing strength across the functional domain. Using basis transformation strategies, the nonseparable spatial-functional model is computationally scalable to enormous functional datasets, generalizable to different basis functions, and can be used on functions defined on higher dimensional domains such as images. Through simulation studies, we demonstrate that accounting for the spatial correlation in our modeling leads to improved functional regression performance. Applied to a high-throughput spatially correlated copy number dataset, the model identifies genetic markers not identified by comparable methods that ignore spatial correlations.
Ye, Jiang-Feng; Zhao, Yu-Xin; Ju, Jian; Wang, Wei
2017-10-01
To discuss the value of the Bedside Index for Severity in Acute Pancreatitis (BISAP), the Modified Early Warning Score (MEWS), serum Ca2+ and red cell distribution width (RDW) for predicting the severity grade of acute pancreatitis, and to develop and verify a more accurate scoring system to predict the severity of AP. In 302 patients with AP, we calculated BISAP and MEWS scores and conducted regression analyses on the relationships of BISAP score, RDW, MEWS, and serum Ca2+ with the severity of AP using single-factor logistic regression. The variables with statistical significance in the single-factor logistic regression were used in a multi-factor logistic regression model; forward stepwise regression was used to screen variables and build a multi-factor prediction model. A receiver operating characteristic (ROC) curve was constructed, and the significance of the multi- and single-factor prediction models in predicting the severity of AP was evaluated using the area under the ROC curve (AUC). The internal validity of the model was verified through bootstrapping. Among the 302 patients with AP, 209 had mild acute pancreatitis (MAP) and 93 had severe acute pancreatitis (SAP). According to the single-factor logistic regression analysis, we found that BISAP, MEWS and serum Ca2+ are predictive indexes of the severity of AP (P-value<0.001), whereas RDW is not (P-value>0.05). The multi-factor logistic regression analysis showed that BISAP and serum Ca2+ are independent predictive indexes of AP severity (P-value<0.001), and MEWS is not (P-value>0.05); BISAP is negatively related to serum Ca2+ (r=-0.330, P-value<0.001). The constructed model is as follows: ln(p/(1-p)) = 7.306 + 1.151 × BISAP - 4.516 × serum Ca2+, where p is the probability of SAP. The predictive ability of each model for SAP follows the order: combined BISAP and serum Ca2+ prediction model > serum Ca2+ > BISAP.
The difference in predictive ability between BISAP and serum Ca2+ is not statistically significant (P-value>0.05); however, the newly built prediction model is significantly better than either BISAP or serum Ca2+ alone (P-value<0.01). Bootstrap verification of the internal validity of the models is favorable. BISAP and serum Ca2+ have high predictive value for the severity of AP; however, the model built by combining BISAP and serum Ca2+ is markedly superior to either BISAP or serum Ca2+ individually. Furthermore, this model is simple, practical and appropriate for clinical use. Copyright © 2016. Published by Elsevier Masson SAS.
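Read as a logistic regression, the fitted model can be turned directly into a predicted probability of SAP. The coefficients are those reported in the abstract, while the patient values in the example are invented.

```python
# Turning the reported logit model into a probability of SAP.
import math

def sap_probability(bisap: float, serum_ca: float) -> float:
    """Probability of severe acute pancreatitis under the fitted model
    ln(p / (1 - p)) = 7.306 + 1.151 * BISAP - 4.516 * serum Ca."""
    logit = 7.306 + 1.151 * bisap - 4.516 * serum_ca
    return 1.0 / (1.0 + math.exp(-logit))

# A higher BISAP score and lower serum calcium both raise the predicted risk.
print(sap_probability(bisap=3, serum_ca=2.0))   # hypothetical high-risk profile
print(sap_probability(bisap=0, serum_ca=2.4))   # hypothetical low-risk profile
```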
Robust scoring functions for protein-ligand interactions with quantum chemical charge models.
Wang, Jui-Chih; Lin, Jung-Hsin; Chen, Chung-Ming; Perryman, Alex L; Olson, Arthur J
2011-10-24
Ordinary least-squares (OLS) regression has been used widely for constructing the scoring functions for protein-ligand interactions. However, OLS is very sensitive to the existence of outliers, and models constructed using it are easily affected by the outliers or even by the choice of the data set. On the other hand, determination of atomic charges is regarded as of central importance, because the electrostatic interaction is known to be a key contributing factor for biomolecular association. In the development of the AutoDock4 scoring function, only OLS was conducted, and the simple Gasteiger method was adopted. It is therefore of considerable interest to see whether more rigorous charge models could improve the statistical performance of the AutoDock4 scoring function. In this study, we have employed two well-established quantum chemical approaches, namely the restrained electrostatic potential (RESP) and the Austin-model 1-bond charge correction (AM1-BCC) methods, to obtain atomic partial charges, and we have compared how different charge models affect the performance of AutoDock4 scoring functions. In combination with robust regression analysis and outlier exclusion, our new protein-ligand free energy regression model with AM1-BCC charges for ligands and Amber99SB charges for proteins achieves the lowest root-mean-squared errors of 1.637 kcal/mol for the training set of 147 complexes and 2.176 kcal/mol for the external test set of 1427 complexes. The assessment for binding pose prediction with the 100 external decoy sets indicates a very high success rate of 87% under the criterion of a predicted root-mean-squared deviation of less than 2 Å. The success rates and statistical performance of our robust scoring functions are only weakly class-dependent (hydrophobic, hydrophilic, or mixed).
He, Y J; Li, X T; Fan, Z Q; Li, Y L; Cao, K; Sun, Y S; Ouyang, T
2018-01-23
Objective: To construct a dynamic enhanced MR based predictive model for early assessment of pathological complete response (pCR) to neoadjuvant therapy in breast cancer, and to evaluate the clinical benefit of the model by using a decision curve. Methods: From December 2005 to December 2007, 170 patients with breast cancer treated with neoadjuvant therapy were identified, and their MR images before neoadjuvant therapy and at the end of the first cycle of neoadjuvant therapy were collected. A logistic regression model was used to detect independent factors for predicting pCR and to construct the predictive model accordingly; the receiver operating characteristic (ROC) curve and the decision curve were then used to evaluate the predictive model. Results: ΔArea(max) and Δslope(max) were independent predictive factors for pCR, with OR = 0.942 (95% CI: 0.918-0.967) and 0.961 (95% CI: 0.940-0.987), respectively. The area under the ROC curve (AUC) for the constructed model was 0.886 (95% CI: 0.820-0.951). The decision curve showed that, for threshold probabilities above 0.4, the predictive model presented increased net benefit as the threshold probability increased. Conclusions: The constructed predictive model for pCR is of potential clinical value, with an AUC>0.85. Meanwhile, decision curve analysis indicates the constructed predictive model has a net benefit of 3 to 8 percent in the likely range of probability thresholds from 80% to 90%.
Mapping urban environmental noise: a land use regression method.
Xie, Dan; Liu, Yi; Chen, Jining
2011-09-01
Forecasting and preventing urban noise pollution are major challenges in urban environmental management. Most existing efforts, including experiment-based models, statistical models, and noise mapping, however, have limited capacity to explain the association between urban growth and the corresponding noise change. Therefore, these conventional methods can hardly forecast urban noise for a given development layout. This paper, for the first time, introduces a land use regression method, which has been applied to simulating urban air quality for a decade, to construct an urban noise model (LUNOS) in Dalian Municipality, Northeast China. The LUNOS model describes noise as a dependent variable of the various surrounding land areas via a regression function. The results suggest that a linear model performs better in fitting the monitoring data, and that there is no significant difference in the LUNOS outputs when applied at different spatial scales. As the LUNOS facilitates a better understanding of the association between land use and urban environmental noise in comparison to conventional methods, it can be regarded as a promising tool for noise prediction for planning purposes and an aid to smart decision-making.
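The land-use-regression idea can be sketched as a plain linear fit of noise level against surrounding land-area shares. All shares and coefficients below are synthetic illustrations, not the Dalian data.

```python
# Illustrative land-use-regression fit in the spirit of LUNOS.
import numpy as np

rng = np.random.default_rng(3)
n_sites = 60
road = rng.uniform(0, 1, n_sites)            # surrounding land-area shares
industry = rng.uniform(0, 1, n_sites)        # (all synthetic)
green = rng.uniform(0, 1, n_sites)
noise_db = 50 + 12 * road + 6 * industry - 4 * green + rng.normal(0, 1, n_sites)

# Ordinary least squares: noise as a linear function of land-use areas
X = np.column_stack([np.ones(n_sites), road, industry, green])
beta, *_ = np.linalg.lstsq(X, noise_db, rcond=None)
print("intercept, road, industry, green coefficients:", np.round(beta, 2))
```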
Trojano, Luigi; Siciliano, Mattia; Cristinzio, Chiara; Grossi, Dario
2018-01-01
The present study aimed at exploring relationships among the visuospatial tasks included in the Battery for Visuospatial Abilities (BVA), and at assessing the relative contribution of different facets of visuospatial processing to tests tapping constructional abilities and nonverbal abstract reasoning. One hundred forty-four healthy subjects with normal scores on the Mini Mental State Examination completed the BVA plus Raven's Coloured Progressive Matrices and the Constructional Apraxia test. We used Principal Axis Factoring and Parallel Analysis to investigate relationships among the BVA visuospatial tasks, and performed regression analyses to assess the visuospatial contribution to constructional abilities and nonverbal abstract reasoning. Principal Axis Factoring and Parallel Analysis revealed two eigenvalues exceeding 1, accounting for about 60% of the variance. A 2-factor model provided the best fit. Factor 1 included subtests exploring "complex" visuospatial skills, whereas Factor 2 included two subtests tapping "simple" visuospatial skills. Regression analyses revealed that both Factor 1 and Factor 2 significantly affected performance on Raven's Coloured Progressive Matrices, whereas only Factor 1 affected performance on the Constructional Apraxia test. Our results supported the functional segregation proposed by De Renzi, suggested caution in using a single test to assess the visuospatial domain, and qualified the visuospatial contribution to drawing and non-verbal intelligence tests.
Psychosocial Correlates of Dating Violence Victimization among Latino Youth
ERIC Educational Resources Information Center
Howard, Donna E.; Beck, Kenneth; Kerr, Melissa Hallmark; Shattuck, Teresa
2005-01-01
To examine the association between physical dating violence victimization and risk and protective factors, an anonymous, cross-sectional, self-reported survey was administered to Latino youth (n = 446) residing in suburban Washington, DC. Multivariate logistic regression models were constructed, and adjusted OR and 95% CI were examined.…
System identification principles in studies of forest dynamics.
Rolfe A. Leary
1970-01-01
Shows how it is possible to obtain governing equation parameter estimates on the basis of observed system states. The approach used represents a constructive alternative to regression techniques for models expressed as differential equations. This approach allows scientists to more completely quantify knowledge of forest development processes, to express theories in...
NASA Astrophysics Data System (ADS)
Kiram, J. J.; Sulaiman, J.; Swanto, S.; Din, W. A.
2015-10-01
This study aims to construct a mathematical model of the relationship between a student's Language Learning Strategy usage and English Language proficiency. Fifty-six pre-university students of University Malaysia Sabah participated in this study. A self-report questionnaire called the Strategy Inventory for Language Learning was administered to them to measure their language learning strategy preferences before they sat for the Malaysian University English Test (MUET), the results of which were utilised to measure their English language proficiency. We fitted a multiple linear regression model, with variables selected using stepwise regression. We conducted various assessments of the model obtained, including the global F-test, root mean square error and R-squared. The model obtained suggests that not all language learning strategies should be included in the model when attempting to predict language proficiency.
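Forward stepwise selection of the kind described can be sketched directly: predictors enter one at a time while they keep improving the fit. The data are synthetic (56 cases to echo the sample size), and the 5% improvement threshold is an arbitrary stand-in for the study's entry criterion.

```python
# Hedged sketch of forward stepwise variable selection via residual sum of squares.
import numpy as np

rng = np.random.default_rng(4)
n, p = 56, 6                                  # 56 students, 6 strategy scores
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(0, 0.5, n)   # only 0 and 3 matter

def rss(cols):
    """Residual sum of squares of an OLS fit on the chosen columns."""
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

selected, remaining = [], list(range(p))
while remaining:
    best = min(remaining, key=lambda c: rss(selected + [c]))
    # Stop once the best candidate improves RSS by less than 5%.
    if selected and rss(selected + [best]) > 0.95 * rss(selected):
        break
    selected.append(best)
    remaining.remove(best)
print("predictors retained by stepwise selection:", selected)
```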
NASA Astrophysics Data System (ADS)
Yilmaz, Işık
2009-06-01
The purpose of this study is to compare the landslide susceptibility mapping methods of frequency ratio (FR), logistic regression and artificial neural networks (ANN) applied in Kat County (Tokat, Turkey). A digital elevation model (DEM) was first constructed using GIS software. Landslide-related factors such as geology, faults, drainage system, topographical elevation, slope angle, slope aspect, topographic wetness index (TWI) and stream power index (SPI) were used in the landslide susceptibility analyses. Landslide susceptibility maps were produced from the frequency ratio, logistic regression and neural network models, and they were then compared by means of their validations. The accuracies of the susceptibility maps for all three models were obtained by comparing the landslide susceptibility maps with the known landslide locations. Respective area under curve (AUC) values of 0.826, 0.842 and 0.852 for frequency ratio, logistic regression and artificial neural networks showed that the map obtained from the ANN model is more accurate than those from the other models, although the accuracies of all models can be considered relatively similar. The results obtained in this study also showed that the frequency ratio model can be used as a simple tool in the assessment of landslide susceptibility when sufficient data are available. Input processing, calculation and output processing are very simple and can be readily understood in the frequency ratio model, whereas logistic regression and neural networks require the conversion of data to ASCII or other formats. Moreover, it is also very hard to process large amounts of data in the statistical package.
Modeling Manpower and Equipment Productivity in Tall Building Construction Projects
NASA Astrophysics Data System (ADS)
Mudumbai Krishnaswamy, Parthasarathy; Rajiah, Murugasan; Vasan, Ramya
2017-12-01
Tall building construction projects involve two critical resources: manpower and equipment. Their usage, however, varies widely due to several factors affecting their productivity. Currently, no systematic study for estimating and increasing their productivity is available. What prevails instead is the use of empirical data, experience from similar projects and assumptions. As tall building projects are here to stay, and will increase to meet the emerging demands of ever-shrinking urban spaces, it is imperative to explore scientific productivity models for the basic construction activities (concrete, reinforcement, formwork, block work and plastering) for given inputs of specific resources in a mixed environment of manpower and equipment usage. Data pertaining to 72 tall building projects in India were collected and analyzed. Suitable productivity estimation models were then developed using multiple linear regression analysis and validated using independent field data. It is hoped that the models developed in this study will be useful for quantity surveyors, cost engineers and project managers to estimate the productivity of resources in tall building projects.
Biomarker combinations for diagnosis and prognosis in multicenter studies: Principles and methods.
Meisner, Allison; Parikh, Chirag R; Kerr, Kathleen F
2017-01-01
Many investigators are interested in combining biomarkers to predict a binary outcome or detect underlying disease. This endeavor is complicated by the fact that many biomarker studies involve data from multiple centers. Depending upon the relationship between center, the biomarkers, and the target of prediction, care must be taken when constructing and evaluating combinations of biomarkers. We introduce a taxonomy to describe the role of center and consider how a biomarker combination should be constructed and evaluated. We show that ignoring center, which is frequently done by clinical researchers, is often not appropriate. The limited statistical literature proposes using random intercept logistic regression models, an approach that we demonstrate is generally inadequate and may be misleading. We instead propose using fixed intercept logistic regression, which appropriately accounts for center without relying on untenable assumptions. After constructing the biomarker combination, we recommend using performance measures that account for the multicenter nature of the data, namely the center-adjusted area under the receiver operating characteristic curve. We apply these methods to data from a multicenter study of acute kidney injury after cardiac surgery. Appropriately accounting for center, both in construction and evaluation, may increase the likelihood of identifying clinically useful biomarker combinations.
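The fixed-intercept approach the authors recommend can be sketched as a logistic regression in which each center contributes its own dummy-coded intercept while the biomarker coefficients are shared. The data below are simulated, and a near-unpenalized scikit-learn fit (large C) stands in for plain maximum likelihood.

```python
# Fixed-intercept logistic regression: one dummy column per center.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, n_centers = 600, 3
center = rng.integers(0, n_centers, n)
marker1, marker2 = rng.normal(size=(2, n))
baseline = np.array([-1.0, 0.0, 1.0])[center]     # center-specific baselines
logit = baseline + 1.2 * marker1 - 0.8 * marker2
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Dummy-coded center intercepts replace the single shared intercept.
dummies = np.eye(n_centers)[center]
X = np.column_stack([dummies, marker1, marker2])
fit = LogisticRegression(fit_intercept=False, C=1e6, max_iter=1000).fit(X, y)
coefs = fit.coef_[0]
print("center intercepts:", np.round(coefs[:n_centers], 2))
print("shared biomarker combination:", np.round(coefs[n_centers:], 2))
```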
Stewart, James A.; Kohnert, Aaron A.; Capolungo, Laurent; ...
2018-03-06
The complexity of radiation effects in a material’s microstructure makes developing predictive models a difficult task. In principle, a complete list of all possible reactions between the defect species being considered can be used to elucidate damage evolution mechanisms and their associated impact on microstructure evolution. However, a central limitation is that many models use a limited and incomplete catalog of defect energetics and associated reactions. Even for a given model, estimating its input parameters remains a challenge, especially for complex material systems. Here, we present a computational analysis to identify the extent to which defect accumulation, energetics, and irradiation conditions can be determined via forward and reverse regression models constructed and trained from large data sets produced by cluster dynamics simulations. A global sensitivity analysis, via Sobol’ indices, concisely characterizes parameter sensitivity and demonstrates how this can be connected to variability in defect evolution. Based on this analysis, and depending on the definition of what constitutes the input and output spaces, forward and reverse regression models are constructed that allow for the direct calculation of defect accumulation, defect energetics, and irradiation conditions. This computational analysis, exercised on a simplified cluster dynamics model, demonstrates the ability to design predictive surrogate and reduced-order models, and provides guidelines for improving model predictions within the context of forward and reverse engineering of mathematical models for radiation effects in a material’s microstructure.
Ren, Y Y; Zhou, L C; Yang, L; Liu, P Y; Zhao, B W; Liu, H X
2016-09-01
This paper highlights the use of the logistic regression (LR) method to construct statistically significant, robust and predictive models for the classification of chemicals according to their aquatic toxic modes of action. The essential requirements for a reliable model were all considered carefully. The model predictors were selected by stepwise forward linear discriminant analysis (LDA) from a combined pool of experimental data and chemical structure-based descriptors calculated by the CODESSA and DRAGON software packages. Model predictive ability was validated both internally and externally. The applicability domain was checked by the leverage approach to verify prediction reliability. The obtained models are simple and easy to interpret. In general, LR performs much better than LDA and seems to be more attractive for the prediction of the more toxic compounds, i.e. compounds that exhibit excess toxicity versus non-polar narcotic compounds and more reactive compounds versus less reactive compounds. In addition, model fit and regression diagnostics were examined through the influence plot, which reflects the hat-values, studentized residuals, and Cook's distance statistics of each sample. Overdispersion was also checked for the LR model.
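The leverage check used for the applicability domain can be sketched directly: a query compound is flagged when its hat-value exceeds the customary warning threshold h* = 3(p+1)/n. The descriptor values below are synthetic.

```python
# Applicability-domain check via leverage (hat-values).
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 3
X_train = rng.normal(size=(n, p))             # training-set descriptors

def leverage(x, X):
    """Hat-value of query row x with respect to design matrix X (with intercept)."""
    A = np.column_stack([np.ones(len(X)), X])
    xa = np.concatenate([[1.0], x])
    return float(xa @ np.linalg.inv(A.T @ A) @ xa)

h_star = 3 * (p + 1) / n                      # common warning leverage threshold
inside = leverage(np.zeros(p), X_train)       # near the descriptor centroid
outside = leverage(np.full(p, 5.0), X_train)  # far outside the training space
print(f"h* = {h_star:.3f}, centroid query: {inside:.3f}, distant query: {outside:.3f}")
```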
Oil Formation Volume Factor Determination Through a Fused Intelligence
NASA Astrophysics Data System (ADS)
Gholami, Amin
2016-12-01
The volume change of oil between reservoir conditions and standard surface conditions is called the oil formation volume factor (FVF), which is very time-, cost- and labor-intensive to determine. This study proposes an accurate, rapid and cost-effective approach for determining FVF from reservoir temperature, dissolved gas oil ratio, and the specific gravity of both oil and dissolved gas. First, the structural risk minimization (SRM) principle of support vector regression (SVR) was employed to construct a robust model for estimating FVF from the aforementioned inputs. Subsequently, the alternating conditional expectations (ACE) algorithm was used to approximate optimal transformations of the input/output data into more highly correlated data, and consequently to develop a sophisticated model between the transformed data. Finally, a committee machine with SVR and ACE was constructed through the use of a hybrid genetic algorithm-pattern search (GA-PS). The committee machine integrates the ACE and SVR models in an optimal linear combination so as to benefit from both methods. A group of 342 data points was used for model development, and a group of 219 data points was used for blind testing of the constructed model. The results indicated that the committee machine performed better than the individual models.
Gaussian functional regression for output prediction: Model assimilation and experimental design
NASA Astrophysics Data System (ADS)
Nguyen, N. C.; Peraire, J.
2016-03-01
In this paper, we introduce a Gaussian functional regression (GFR) technique that integrates multi-fidelity models with model reduction to efficiently predict the input-output relationship of a high-fidelity model. The GFR method combines the high-fidelity model with a low-fidelity model to provide an estimate of the output of the high-fidelity model in the form of a posterior distribution that can characterize uncertainty in the prediction. A reduced basis approximation is constructed upon the low-fidelity model and incorporated into the GFR method to yield an inexpensive posterior distribution of the output estimate. As this posterior distribution depends crucially on a set of training inputs at which the high-fidelity models are simulated, we develop a greedy sampling algorithm to select the training inputs. Our approach results in an output prediction model that inherits the fidelity of the high-fidelity model and has the computational complexity of the reduced basis approximation. Numerical results are presented to demonstrate the proposed approach.
Study on power grid characteristics in summer based on linear regression analysis
NASA Astrophysics Data System (ADS)
Tang, Jin-hui; Liu, You-fei; Liu, Juan; Liu, Qiang; Liu, Zhuan; Xu, Xi
2018-05-01
Correlation analysis of power load and temperature is the precondition and foundation for accurate load prediction, and it has been studied extensively. This paper constructs a linear correlation model between temperature and power load, and then examines the correlation of fault-maintenance work orders with the power load. Temperature, power load and fault-maintenance work-order data from Jiangxi Province in the summer of 2017 were used for the data analysis and mining. The linear regression models established in this paper can support electricity load growth forecasting, fault repair work order review, distribution network weakness analysis and related refinement work.
WebGLORE: a Web service for Grid LOgistic REgression
Jiang, Wenchao; Li, Pinghao; Wang, Shuang; Wu, Yuan; Xue, Meng; Ohno-Machado, Lucila; Jiang, Xiaoqian
2013-01-01
WebGLORE is a free web service that enables privacy-preserving construction of a global logistic regression model from distributed datasets that are sensitive. It only transfers aggregated local statistics (from participants) through Hypertext Transfer Protocol Secure to a trusted server, where the global model is synthesized. WebGLORE seamlessly integrates AJAX, JAVA Applet/Servlet and PHP technologies to provide an easy-to-use web service for biomedical researchers to break down policy barriers during information exchange. Availability and implementation: http://dbmi-engine.ucsd.edu/webglore3/. WebGLORE can be used under the terms of GNU general public license as published by the Free Software Foundation. Contact: x1jiang@ucsd.edu PMID:24072732
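The GLORE idea behind the service can be sketched as distributed Newton-Raphson: each site computes only its local score vector and information matrix, the server sums them, and no raw records are exchanged. The two sites and their data below are simulated; this is an illustration of the principle, not the WebGLORE implementation.

```python
# Privacy-preserving global logistic regression from aggregated local statistics.
import numpy as np

rng = np.random.default_rng(7)
true_beta = np.array([-0.5, 1.0, -1.0])

def make_site(n):
    """Simulate one site's private data (intercept plus two covariates)."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = (rng.random(n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)
    return X, y

sites = [make_site(300), make_site(200)]

def local_stats(beta, X, y):
    """Aggregated statistics a site would share: score and information matrix."""
    prob = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (y - prob)
    info = (X * (prob * (1 - prob))[:, None]).T @ X
    return grad, info

beta = np.zeros(3)
for _ in range(25):                            # Newton-Raphson on the summed stats
    grads, infos = zip(*(local_stats(beta, X, y) for X, y in sites))
    beta = beta + np.linalg.solve(sum(infos), sum(grads))
print("global estimate from aggregated statistics:", np.round(beta, 2))
```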
Rebechi, S R; Vélez, M A; Vaira, S; Perotti, M C
2016-02-01
The aims of the present study were to test the accuracy of the fatty acid ratios established by the Argentinean legislation to detect adulterations of milk fat with animal fats, and to propose a regression model suitable for evaluating these adulterations. For this purpose, 70 milk fat, 10 tallow and 7 lard samples were collected and analyzed by gas chromatography. The data were used to arithmetically simulate adulterated milk fat samples at 0%, 2%, 5%, 10% and 15%, for both animal fats. The fatty acid ratios failed to distinguish adulterated milk fats containing less than 15% of tallow or lard. For each adulterant, multiple linear regression (MLR) was applied, and a model was chosen and validated. For this, calibration and validation matrices were constructed employing genuine and adulterated milk fat samples. The models were able to detect adulterations of milk fat at levels greater than 10% for tallow and 5% for lard. Copyright © 2015 Elsevier Ltd. All rights reserved.
Quantile Regression Models for Current Status Data
Ou, Fang-Shu; Zeng, Donglin; Cai, Jianwen
2016-01-01
Current status data arise frequently in demography, epidemiology, and econometrics where the exact failure time cannot be determined but is only known to have occurred before or after a known observation time. We propose a quantile regression model to analyze current status data, because it does not require distributional assumptions and the coefficients can be interpreted as direct regression effects on the distribution of failure time in the original time scale. Our model assumes that the conditional quantile of failure time is a linear function of covariates. We assume conditional independence between the failure time and observation time. An M-estimator is developed for parameter estimation which is computed using the concave-convex procedure and its confidence intervals are constructed using a subsampling method. Asymptotic properties for the estimator are derived and proven using modern empirical process theory. The small sample performance of the proposed method is demonstrated via simulation studies. Finally, we apply the proposed method to analyze data from the Mayo Clinic Study of Aging. PMID:27994307
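The linear conditional-quantile assumption can be illustrated with ordinary quantile regression via the check (pinball) loss. This sketch uses fully observed failure times and a generic optimizer, not the paper's current-status M-estimator or subsampling confidence intervals.

```python
# Linear quantile regression by minimizing the check (pinball) loss.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
n = 500
z = rng.uniform(0, 2, n)
t = 1.0 + 2.0 * z + rng.normal(0, 0.5, n)     # failure times, fully observed here

def check_loss(beta, tau=0.5):
    """Pinball loss for the linear quantile model Q_tau(T | z) = b0 + b1 * z."""
    resid = t - (beta[0] + beta[1] * z)
    return np.sum(resid * (tau - (resid < 0)))

beta = minimize(check_loss, x0=np.zeros(2), method="Nelder-Mead").x
print("median-regression coefficients:", np.round(beta, 2))
```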
Plausibility and the Theoreticians' Regress: Constructing the evolutionary fate of stars
NASA Astrophysics Data System (ADS)
Ipe, Alex Ike
2002-10-01
This project presents a case-study of a scientific controversy that occurred in theoretical astrophysics nearly seventy years ago following the conceptual discovery of a novel phenomenon relating to the evolution and structure of stellar matter, known as the limiting mass. The ensuing debate between the author of the finding, Subrahmanyan Chandrasekhar, and his primary critic, Arthur Stanley Eddington, witnessed both scientists trying to convince one another, as well as the astrophysical community, that their respective positions on the issue were the correct ones. Since there was no independent criterion—that is, no observational evidence—at the time of the dispute that could have been drawn upon to test the validity of the limiting mass concept, a logical, objective resolution to the controversy was not possible. In this respect, I argue that the dynamics of the Chandrasekhar-Eddington debate resonate with Kennefick's notion of the Theoreticians' Regress. However, whereas this model predicts that such a regress can be broken if both parties in a dispute come to agree on who was in error and collaborate on a calculation whose technical foundation can be agreed to, I argue that a more pragmatic path by which the Theoreticians' Regress is broken is when one side in a dispute is able to construct its argument as being more plausible than that of its opponent, and is so successful in doing so that its opposition is subsequently forced to withdraw from the debate. In order to adequately deal with the construction of plausibility in the context of scientific controversies, I draw upon Harvey's Plausibility Model as well as Pickering's work on the role socio-cultural factors play in the resolution of intellectual disputes. It is believed that the ideas embedded in these social-relativist-constructivist perspectives provide the most parsimonious explanation as to the reasons for the genesis and ultimate closure of this particular scientific controversy.
NASA Astrophysics Data System (ADS)
Pradhan, Biswajeet
2010-05-01
This paper presents the results of the cross-validation of a multivariate logistic regression model using remote sensing data and GIS for landslide hazard analysis on the Penang, Cameron, and Selangor areas in Malaysia. Landslide locations in the study areas were identified by interpreting aerial photographs and satellite images, supported by field surveys. SPOT 5 and Landsat TM satellite imagery were used to map landcover and vegetation index, respectively. Maps of topography, soil type, lineaments and land cover were constructed from the spatial datasets. Ten factors which influence landslide occurrence, i.e., slope, aspect, curvature, distance from drainage, lithology, distance from lineaments, soil type, landcover, rainfall precipitation, and normalized difference vegetation index (ndvi), were extracted from the spatial database and the logistic regression coefficient of each factor was computed. Then the landslide hazard was analysed using the multivariate logistic regression coefficients derived not only from the data for the respective area but also using the logistic regression coefficients calculated from each of the other two areas (nine hazard maps in all) as a cross-validation of the model. For verification of the model, the results of the analyses were then compared with the field-verified landslide locations. Among the three cases of the application of logistic regression coefficients in the same study area, the case of Selangor based on the Selangor logistic regression coefficients showed the highest accuracy (94%), whereas Penang based on the Penang coefficients showed the lowest accuracy (86%). Similarly, among the six cases from the cross-application of logistic regression coefficients in the other two areas, the case of Selangor based on the logistic coefficients of Cameron showed the highest prediction accuracy (90%), whereas the case of Penang based on the Selangor logistic regression coefficients showed the lowest accuracy (79%).
Qualitatively, the cross application model yields reasonable results which can be used for preliminary landslide hazard mapping.
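A minimal sketch of the susceptibility-mapping step: fit a logistic regression on causative factors and score each map cell with a hazard probability. The ten features here are random stand-ins for the GIS-derived layers (slope, aspect, etc.), and the coefficients are invented:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

# Hypothetical stand-ins for ten causative factors per map cell;
# y = 1 marks cells with a mapped landslide.
n = 1000
X = rng.normal(size=(n, 10))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]        # e.g. slope raises, drainage distance lowers risk
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

model = LogisticRegression().fit(X, y)
hazard = model.predict_proba(X)[:, 1]        # per-cell susceptibility index
print(round(roc_auc_score(y, hazard), 2))
```

Cross-application as in the paper amounts to calling `predict_proba` with a model fit on one area's data against another area's factor grid, then comparing the ranked hazard values with that area's verified landslide locations.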
Kelleher, John D; Ross, Robert J; Sloan, Colm; Mac Namee, Brian
2011-02-01
Although data-driven spatial template models provide a practical and cognitively motivated mechanism for characterizing spatial term meaning, the influence of perceptual rather than solely geometric and functional properties has yet to be systematically investigated. In the light of this, in this paper, we investigate the effects of the perceptual phenomenon of object occlusion on the semantics of projective terms. We did this by conducting a study to test whether object occlusion had a noticeable effect on the acceptance values assigned to projective terms with respect to a 2.5-dimensional visual stimulus. Based on the data collected, a regression model was constructed and presented. Subsequent analysis showed that the regression model that included the occlusion factor outperformed an adaptation of Regier & Carlson's well-regarded AVS model for that same spatial configuration.
NASA Astrophysics Data System (ADS)
Haritonova, Larisa
2018-03-01
The paper presents the recent change in the relative numbers of man-made and natural catastrophes and proposes some recommendations to increase firefighting efficiency in high-rise buildings. The article analyzes the methodology of modeling seismic effects and shows the promise of applying neural modeling and artificial neural networks to analyze such dynamic parameters of earthquake foci as the value of dislocation (the average rupture slip). The following two input signals were used: the power class and the number of earthquakes. A regression analysis was carried out for the predicted results and the target outputs. The regression equations for the outputs and targets are presented, together with the correlation coefficients for training, validation, testing, and the total (All) for the 2-5-5-1 network structure for the average rupture slip. Applying the results obtained in the article to the seismic design of newly constructed buildings and structures, together with the given recommendations, will provide additional protection from fire and earthquake risks and reduce their negative economic and environmental consequences.
Toyabe, Shin-ichi
2014-01-01
Inpatient falls are the most common adverse events that occur in a hospital, and about 3 to 10% of falls result in serious injuries such as bone fractures and intracranial haemorrhages. We previously reported that bone fractures and intracranial haemorrhages were the two major fall-related injuries and that the risk assessment score for osteoporotic bone fracture was significantly associated not only with bone fractures after falls but also with intracranial haemorrhage after falls. Based on these results, we tried to establish a risk assessment tool for predicting fall-related severe injuries in a hospital. Possible risk factors related to fall-related serious injuries were extracted from data on inpatients admitted to a tertiary-care university hospital by using multivariate Cox regression analysis and multiple logistic regression analysis. We found that fall risk score and fracture risk score were the two significant factors, and we constructed models to predict fall-related severe injuries incorporating these factors. When the prediction model was applied to another independent dataset, the constructed model could detect patients with fall-related severe injuries efficiently. The new assessment system could identify patients prone to severe injuries after falls in a reproducible fashion. PMID:25168984
Zhang, Jingyi; Li, Bin; Chen, Yumin; Chen, Meijie; Fang, Tao; Liu, Yongfeng
2018-06-11
This paper proposes a regression model using the Eigenvector Spatial Filtering (ESF) method to estimate ground PM2.5 concentrations. Covariates are derived from remotely sensed data including aerosol optical depth, normalized difference vegetation index, surface temperature, air pressure, relative humidity, height of planetary boundary layer and digital elevation model. In addition, cultural variables such as factory densities and road densities are also used in the model. With the Yangtze River Delta region as the study area, we constructed ESF-based Regression (ESFR) models at different time scales, using data for the period between December 2015 and November 2016. We found that the ESFR models effectively filtered spatial autocorrelation in the OLS residuals and resulted in increases in the goodness-of-fit metrics as well as reductions in residual standard errors and cross-validation errors, compared to the classic OLS models. The annual ESFR model explained 70% of the variability in PM2.5 concentrations, 16.7% more than the non-spatial OLS model. With the ESFR models, we performed detailed analyses on the spatial and temporal distributions of PM2.5 concentrations in the study area. The model predictions are lower than ground observations but match the general trend. The experiment shows that ESFR provides a promising approach to PM2.5 analysis and prediction.
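The ESF idea — adding eigenvectors of the doubly-centered spatial weights matrix as extra OLS covariates to soak up spatially autocorrelated residuals — can be sketched on a toy one-dimensional layout. The weights, coefficients and noise level here are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy spatial weights for points on a line: neighbors at distance 1.
n = 60
coords = np.arange(n, dtype=float)
C = (np.abs(coords[:, None] - coords[None, :]) == 1).astype(float)

# ESF: eigenvectors of the doubly-centered weights matrix M C M give
# orthogonal map patterns of spatial autocorrelation (Moran eigenvectors).
M = np.eye(n) - np.ones((n, n)) / n
vals, vecs = np.linalg.eigh(M @ C @ M)
E = vecs[:, np.argsort(vals)[::-1][:5]]   # five most positively autocorrelated patterns

# Augment an OLS design with the selected eigenvectors to filter residuals.
x = rng.normal(size=n)                    # stand-in for a covariate such as AOD
y = 2.0 * x + 3.0 * E[:, 0] + 0.1 * rng.standard_normal(n)
X_esf = np.column_stack([np.ones(n), x, E])
beta, *_ = np.linalg.lstsq(X_esf, y, rcond=None)
print(round(beta[1], 2))                  # slope on x
```

In applied work the candidate eigenvectors are usually screened (e.g. stepwise by residual Moran's I) rather than taken as a fixed top-5 set.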
Chung, Moo K.; Qiu, Anqi; Seo, Seongho; Vorperian, Houri K.
2014-01-01
We present a novel kernel regression framework for smoothing scalar surface data using the Laplace-Beltrami eigenfunctions. Starting with the heat kernel constructed from the eigenfunctions, we formulate a new bivariate kernel regression framework as a weighted eigenfunction expansion with the heat kernel as the weights. The new kernel regression is mathematically equivalent to isotropic heat diffusion, kernel smoothing and recently popular diffusion wavelets. Unlike many previous partial differential equation based approaches involving diffusion, our approach represents the solution of diffusion analytically, reducing numerical inaccuracy and slow convergence. The numerical implementation is validated on a unit sphere using spherical harmonics. As an illustration, we have applied the method in characterizing the localized growth pattern of mandible surfaces obtained in CT images from subjects between ages 0 and 20 years by regressing the length of displacement vectors with respect to the template surface. PMID:25791435
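A discrete sketch of the framework above: on a cycle graph the Laplacian eigenvectors stand in for the Laplace-Beltrami eigenfunctions, and smoothing is a weighted eigenfunction expansion with heat-kernel weights exp(-t*lambda). The signal and bandwidth are illustrative, not the mandible data:

```python
import numpy as np

# Cycle-graph Laplacian: the discrete analogue of the Laplace-Beltrami
# operator; its eigenvectors play the role of the eigenfunctions.
n = 128
L = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)
lam, phi = np.linalg.eigh(L)

rng = np.random.default_rng(4)
theta = 2 * np.pi * np.arange(n) / n
signal = np.sin(theta) + 0.5 * rng.standard_normal(n)   # noisy scalar "surface" data

# Heat-kernel smoothing: expand in eigenvectors, damp each coefficient
# by exp(-t * lambda), and reconstruct — the analytic diffusion solution.
t = 2.0                                                  # diffusion time (bandwidth)
coeffs = phi.T @ signal
smoothed = phi @ (np.exp(-t * lam) * coeffs)
print(round(np.std(signal - smoothed), 2))
```

Because the damping is applied analytically in the spectral domain, there is no iterative time-stepping and hence none of the numerical inaccuracy of finite-difference diffusion, which mirrors the paper's stated advantage.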
Li, Guowei; Thabane, Lehana; Delate, Thomas; Witt, Daniel M.; Levine, Mitchell A. H.; Cheng, Ji; Holbrook, Anne
2016-01-01
Objectives To construct and validate a prediction model for individual combined benefit and harm outcomes (stroke with no major bleeding, major bleeding with no stroke, neither event, or both) in patients with atrial fibrillation (AF) with and without warfarin therapy. Methods Using the Kaiser Permanente Colorado databases, we included patients newly diagnosed with AF between January 1, 2005 and December 31, 2012 for model construction and validation. The primary outcome was a prediction model of composite of stroke or major bleeding using polytomous logistic regression (PLR) modelling. The secondary outcome was a prediction model of all-cause mortality using the Cox regression modelling. Results We included 9074 patients with 4537 and 4537 warfarin users and non-users, respectively. In the derivation cohort (n = 4632), there were 136 strokes (2.94%), 280 major bleedings (6.04%) and 1194 deaths (25.78%) occurred. In the prediction models, warfarin use was not significantly associated with risk of stroke, but increased the risk of major bleeding and decreased the risk of death. Both the PLR and Cox models were robust, internally and externally validated, and with acceptable model performances. Conclusions In this study, we introduce a new methodology for predicting individual combined benefit and harm outcomes associated with warfarin therapy for patients with AF. Should this approach be validated in other patient populations, it has potential advantages over existing risk stratification approaches as a patient-physician aid for shared decision-making PMID:27513986
Evaluating the perennial stream using logistic regression in central Taiwan
NASA Astrophysics Data System (ADS)
Ruljigaljig, T.; Cheng, Y. S.; Lin, H. I.; Lee, C. H.; Yu, T. T.
2014-12-01
This study produces a perennial stream head potential map, based on a logistic regression method with a Geographic Information System (GIS). Perennial stream initiation locations, indicates the location of the groundwater and surface contact, were identified in the study area from field survey. The perennial stream potential map in central Taiwan was constructed using the relationship between perennial stream and their causative factors, such as Catchment area, slope gradient, aspect, elevation, groundwater recharge and precipitation. Here, the field surveys of 272 streams were determined in the study area. The areas under the curve for logistic regression methods were calculated as 0.87. The results illustrate the importance of catchment area and groundwater recharge as key factors within the model. The results obtained from the model within the GIS were then used to produce a map of perennial stream and estimate the location of perennial stream head.
The Effects of Local Economic Conditions on Navy Enlistments.
1980-03-18
Standard Metropolitan Statistical Area (SMSA) as the basic economic unit, cross-sectional regression models were constructed for enlistment rate, recruiter...to eligible population suggesting that a cheaper alternative to raising mili- tary wages would be to increase the number of recruiters. Arima (1978...is faced with a number of cri- teria that must be satisfied by an acceptable test variable. As with other variables included in the model , economic
A Global Model for Bankruptcy Prediction
Alaminos, David; del Castillo, Agustín; Fernández, Manuel Ángel
2016-01-01
The recent world financial crisis has increased the number of bankruptcies in numerous countries and has resulted in a new area of research which responds to the need to predict this phenomenon, not only at the level of individual countries, but also at a global level, offering explanations of the common characteristics shared by the affected companies. Nevertheless, few studies focus on the prediction of bankruptcies globally. In order to compensate for this lack of empirical literature, this study has used a methodological framework of logistic regression to construct predictive bankruptcy models for Asia, Europe and America, and other global models for the whole world. The objective is to construct a global model with a high capacity for predicting bankruptcy in any region of the world. The results obtained have allowed us to confirm the superiority of the global model in comparison to regional models over periods of up to three years prior to bankruptcy. PMID:27880810
NASA Astrophysics Data System (ADS)
Liu, Ronghua; Sun, Qiaofeng; Hu, Tian; Li, Lian; Nie, Lei; Wang, Jiayue; Zhou, Wanhui; Zang, Hengchang
2018-03-01
As a powerful process analytical technology (PAT) tool, near infrared (NIR) spectroscopy has been widely used in real-time monitoring. In this study, NIR spectroscopy was applied to monitor multiple parameters of the traditional Chinese medicine (TCM) Shenzhiling oral liquid during the concentration process to guarantee the quality of products. Five lab-scale batches were employed to construct quantitative models to determine five chemical ingredients and a physical property (sample density) during the concentration process. Paeoniflorin, albiflorin, liquiritin and sample density were modeled by partial least squares regression (PLSR), while the contents of glycyrrhizic acid and cinnamic acid were modeled by support vector machine regression (SVMR). Standard normal variate (SNV) and/or Savitzky-Golay (SG) smoothing with derivative methods were adopted for spectral pretreatment. Variable selection methods including correlation coefficient (CC), competitive adaptive reweighted sampling (CARS) and interval partial least squares regression (iPLS) were performed to optimize the models. The results indicated that NIR spectroscopy is an effective tool for monitoring the concentration process of Shenzhiling oral liquid.
NASA Astrophysics Data System (ADS)
Oh, Hyun-Joo; Lee, Saro; Chotikasathien, Wisut; Kim, Chang Hwan; Kwon, Ju Hyoung
2009-04-01
For predictive landslide susceptibility mapping, this study applied and verified a probability model (frequency ratio) and a statistical model (logistic regression) at Pechabun, Thailand, using a geographic information system (GIS) and remote sensing. Landslide locations were identified in the study area from interpretation of aerial photographs and field surveys, and maps of the topography, geology and land cover were compiled into a spatial database. The factors that influence landslide occurrence, such as slope gradient, slope aspect, curvature of topography and distance from drainage, were calculated from the topographic database. Lithology and distance from faults were extracted and calculated from the geology database. Land cover was classified from a Landsat TM satellite image. The frequency ratios and logistic regression coefficients were overlaid for landslide susceptibility mapping as each factor's ratings. The landslide susceptibility maps were then verified and compared using the existing landslide locations. In the verification, the frequency ratio model showed 76.39% and the logistic regression model 70.42% prediction accuracy. The method can be used to reduce hazards associated with landslides and to plan land cover.
ERIC Educational Resources Information Center
Chen, Shih-Neng; Tseng, Jauling
2010-01-01
Objective: To assess various marginal effects of nutrient intakes, health behaviours and nutrition knowledge on the entire distribution of body mass index (BMI) across individuals. Design: Quantitative and distributional study. Setting: Taiwan. Methods: This study applies Becker's (1965) model of health production to construct an individual's BMI…
The Effects of Televised Political Advertisements on Voter Perceptions about Candidates.
ERIC Educational Resources Information Center
Baskin, Otis Wayne
This study investigated whether candidate images could be designated as primarily either stimulus- or perceiver-determined and if a multiple regression model could be constructed to predict candidate image ratings from pre-stimulus perceptions of the candidate's party and post-stimulus ratings of the advertisement. One hundred twenty subjects were…
ERIC Educational Resources Information Center
Zullig, Keith; Ubbes, Valerie A.; Pyle, Jennifer; Valois, Robert F.
2006-01-01
This study explored the relationships among weight perceptions, dieting behavior, and breakfast eating in 4597 public high school adolescents using the Centers for Disease Control and Prevention Youth Risk Behavior Survey. Adjusted multiple logistic regression models were constructed separately for race and gender groups via SUDAAN (Survey Data…
Avalos, Marta; Adroher, Nuria Duran; Lagarde, Emmanuel; Thiessard, Frantz; Grandvalet, Yves; Contrand, Benjamin; Orriols, Ludivine
2012-09-01
Large data sets with many variables provide particular challenges when constructing analytic models. Lasso-related methods provide a useful tool, although one that remains unfamiliar to most epidemiologists. We illustrate the application of lasso methods in an analysis of the impact of prescribed drugs on the risk of a road traffic crash, using a large French nationwide database (PLoS Med 2010;7:e1000366). In the original case-control study, the authors analyzed each exposure separately. We use the lasso method, which can simultaneously perform estimation and variable selection in a single model. We compare point estimates and confidence intervals using (1) a separate logistic regression model for each drug with a Bonferroni correction and (2) lasso shrinkage logistic regression analysis. Shrinkage regression had little effect on (bias-corrected) point estimates, but led to less conservative results, noticeably for drugs with moderate levels of exposure. Carbamates, carboxamide derivative and fatty acid derivative antiepileptics, drugs used in opioid dependence, and mineral supplements of potassium showed stronger associations. Lasso is a relevant method in the analysis of databases with a large number of exposures and can be recommended as an alternative to conventional strategies.
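The single-model estimation-plus-selection property described above can be sketched as an L1-penalized logistic regression on simulated exposures; the exposure prevalences, effect sizes and penalty strength are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

# Hypothetical exposure matrix: 2000 subjects x 50 binary drug exposures,
# with only the first three exposures truly associated with the outcome.
n, p = 2000, 50
X = rng.binomial(1, 0.1, size=(n, p)).astype(float)
logit = -2.0 + 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.8 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# L1 (lasso) penalty performs estimation and variable selection in one
# model: coefficients of weakly associated exposures shrink exactly to 0.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
selected = np.flatnonzero(lasso.coef_[0])
print(selected.tolist())
```

In contrast to fitting 50 separate models with a Bonferroni correction, the penalty level (here `C`) controls the sparsity of the single joint model and would normally be chosen by cross-validation.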
Application of near-infrared spectroscopy in the detection of fat-soluble vitamins in premix feed
NASA Astrophysics Data System (ADS)
Jia, Lian Ping; Tian, Shu Li; Zheng, Xue Cong; Jiao, Peng; Jiang, Xun Peng
2018-02-01
Vitamins are organic compounds necessary for animal physiological maintenance. Rapid determination of the content of different vitamins in premix feed can help achieve accurate diets and efficient feeding. Compared with high-performance liquid chromatography and other wet chemical methods, near-infrared spectroscopy is a fast, non-destructive, non-polluting method. 168 samples of premix feed were collected and the contents of vitamin A, vitamin E and vitamin D3 were determined by the standard method. The near-infrared spectra of the samples, ranging from 10 000 to 4 000 cm⁻¹, were obtained. Partial least squares regression (PLSR) and support vector machine regression (SVMR) were used to construct the quantitative models. The results showed that the RMSEPs of the PLSR models for vitamin A, vitamin E and vitamin D3 were 0.43×10⁷ IU/kg, 0.09×10⁵ IU/kg and 0.17×10⁷ IU/kg, respectively. The RMSEPs of the SVMR models were 0.45×10⁷ IU/kg, 0.11×10⁵ IU/kg and 0.18×10⁷ IU/kg. Compared with the nonlinear regression method (SVMR), the linear regression method (PLSR) is more suitable for the quantitative analysis of vitamins in premix feed.
Kim, Jung Kwon; Ha, Seung Beom; Jeon, Chan Hoo; Oh, Jong Jin; Cho, Sung Yong; Oh, Seung-June; Kim, Hyeon Hoe; Jeong, Chang Wook
2016-01-01
Purpose Shock-wave lithotripsy (SWL) is accepted as the first-line treatment modality for uncomplicated upper urinary tract stones; however, validated prediction models with regards to stone-free rates (SFRs) are still needed. We aimed to develop nomograms predicting SFRs after the first and within the third session of SWL. Computed tomography (CT) information was also modeled for constructing nomograms. Materials and Methods From March 2006 to December 2013, 3028 patients were treated with SWL for ureter and renal stones at our three tertiary institutions. Four cohorts were constructed: Total-development, Total-validation, CT-development, and CT-validation cohorts. The nomograms were developed using multivariate logistic regression models with selected significant variables in a univariate logistic regression model. A C-index was used to assess the discrimination accuracy of nomograms and calibration plots were used to analyze the consistency of prediction. Results The SFR, after the first and within the third session, was 48.3% and 68.8%, respectively. Significant variables were sex, stone location, stone number, and maximal stone diameter in the Total-development cohort, and mean Hounsfield unit (HU) and grade of hydronephrosis (HN) were additional parameters in the CT-development cohort. The C-indices were 0.712 and 0.723 for after the first and within the third session of SWL in the Total-development cohort, and 0.755 and 0.756 in the CT-development cohort, respectively. The calibration plots showed good correspondences. Conclusions We constructed and validated nomograms to predict SFR after SWL. To the best of our knowledge, these are the first graphical nomograms to be modeled with CT information. These may be useful for patient counseling and treatment decision-making. PMID:26890006
Carnahan, Brian; Meyer, Gérard; Kuntz, Lois-Ann
2003-01-01
Multivariate classification models play an increasingly important role in human factors research. In the past, these models have been based primarily on discriminant analysis and logistic regression. Models developed from machine learning research offer the human factors professional a viable alternative to these traditional statistical classification methods. To illustrate this point, two machine learning approaches--genetic programming and decision tree induction--were used to construct classification models designed to predict whether or not a student truck driver would pass his or her commercial driver license (CDL) examination. The models were developed and validated using the curriculum scores and CDL exam performances of 37 student truck drivers who had completed a 320-hr driver training course. Results indicated that the machine learning classification models were superior to discriminant analysis and logistic regression in terms of predictive accuracy. Actual or potential applications of this research include the creation of models that more accurately predict human performance outcomes.
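A minimal sketch of the decision-tree-induction alternative to logistic regression on simulated curriculum scores; the feature names, weights and pass rule are hypothetical, not the study's CDL data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)

# Hypothetical curriculum scores (e.g. range test, backing test, written
# quiz) for student drivers; pass = 1 when a weighted combination clears
# a threshold — the rule the tree must rediscover from examples.
n = 300
scores = rng.uniform(50, 100, size=(n, 3))
passed = (0.5 * scores[:, 0] + 0.3 * scores[:, 1] + 0.2 * scores[:, 2] > 75).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(scores, passed)
acc = tree.score(scores, passed)
print(round(acc, 2))
```

Unlike discriminant analysis, the induced tree yields human-readable pass/fail rules (one threshold per split), which is part of its appeal for human factors applications.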
Nguyen, X Cuong; Chang, S Woong; Nguyen, Thi Loan; Ngo, H Hao; Kumar, Gopalakrishnan; Banu, J Rajesh; Vu, M Cuong; Le, H Sinh; Nguyen, D Duc
2018-09-15
A pilot-scale hybrid constructed wetland with vertical flow and horizontal flow in series was constructed and used to investigate organic material and nutrient removal rate constants for wastewater treatment and establish a practical predictive model for use. For this purpose, the performance of multiple parameters was statistically evaluated during the process and predictive models were suggested. The measurement of the kinetic rate constant was based on the use of the first-order derivation and Monod kinetic derivation (Monod) paired with a plug flow reactor (PFR) and a continuously stirred tank reactor (CSTR). Both the Lindeman, Merenda, and Gold (LMG) analysis and Bayesian model averaging (BMA) method were employed for identifying the relative importance of variables and their optimal multiple regression (MR). The results showed that the first-order-PFR (M2) model did not fit the data (P > 0.05 and R² < 0.5), whereas the first-order-CSTR (M1) model for the chemical oxygen demand (CODCr) and the Monod-CSTR (M3) model for CODCr and ammonium nitrogen (NH4-N) showed a high correlation with the experimental data (R² > 0.5). The pollutant removal rate in the case of M1 was 0.19 m/d (CODCr), and those for M3 were 25.2 g/m²·d for CODCr and 2.63 g/m²·d for NH4-N. By applying a multi-variable linear regression method, optimal empirical models were established for predicting the final effluent concentrations of five-day biochemical oxygen demand (BOD5) and NH4-N. In general, the hydraulic loading rate was considered an important variable with high relative importance, appearing in all the optimal predictive models. Copyright © 2018 Elsevier Ltd. All rights reserved.
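The first-order CSTR mass balance underlying the M1 model takes the form Ce = Ci / (1 + k·HRT) at steady state. The sketch below estimates k by nonlinear least squares on synthetic effluent data; the influent concentration, retention times and noise are invented, and this is a volumetric illustration (units of 1/d) whereas the paper reports areal rates:

```python
import numpy as np
from scipy.optimize import curve_fit

# First-order CSTR mass balance: Ci - Ce = k * HRT * Ce, so
# Ce = Ci / (1 + k * HRT), where HRT is hydraulic retention time (d).
def cstr_first_order(hrt, k):
    return ci / (1 + k * hrt)

rng = np.random.default_rng(8)
ci = 200.0                                   # hypothetical influent COD, mg/L
hrt = np.array([0.5, 1.0, 1.5, 2.0, 3.0])    # hypothetical retention times, d
k_true = 1.2
ce = ci / (1 + k_true * hrt) * (1 + 0.02 * rng.standard_normal(5))

k_fit, _ = curve_fit(cstr_first_order, hrt, ce, p0=[1.0])
print(round(float(k_fit[0]), 2))             # recovered rate constant, 1/d
```

Swapping the model function for the PFR form Ce = Ci·exp(-k·HRT), or a Monod expression, reproduces the alternative kinetic fits (M2, M3) that the study compared.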
Chung, Moo K; Qiu, Anqi; Seo, Seongho; Vorperian, Houri K
2015-05-01
We present a novel kernel regression framework for smoothing scalar surface data using the Laplace-Beltrami eigenfunctions. Starting with the heat kernel constructed from the eigenfunctions, we formulate a new bivariate kernel regression framework as a weighted eigenfunction expansion with the heat kernel as the weights. The new kernel method is mathematically equivalent to isotropic heat diffusion, kernel smoothing and recently popular diffusion wavelets. The numerical implementation is validated on a unit sphere using spherical harmonics. As an illustration, the method is applied to characterize the localized growth pattern of mandible surfaces obtained in CT images between ages 0 and 20 by regressing the length of displacement vectors with respect to a surface template. Copyright © 2015 Elsevier B.V. All rights reserved.
Can Predictive Modeling Identify Head and Neck Oncology Patients at Risk for Readmission?
Manning, Amy M; Casper, Keith A; Peter, Kay St; Wilson, Keith M; Mark, Jonathan R; Collar, Ryan M
2018-05-01
Objective Unplanned readmission within 30 days is a contributor to health care costs in the United States. The use of predictive modeling during hospitalization to identify patients at risk for readmission offers a novel approach to quality improvement and cost reduction. Study Design Two-phase study including retrospective analysis of prospectively collected data followed by prospective longitudinal study. Setting Tertiary academic medical center. Subjects and Methods Prospectively collected data for patients undergoing surgical treatment for head and neck cancer from January 2013 to January 2015 were used to build predictive models for readmission within 30 days of discharge using logistic regression, classification and regression tree (CART) analysis, and random forests. One model (logistic regression) was then placed prospectively into the discharge workflow from March 2016 to May 2016 to determine the model's ability to predict which patients would be readmitted within 30 days. Results In total, 174 admissions had descriptive data. Thirty-two were excluded due to incomplete data. Logistic regression, CART, and random forest predictive models were constructed using the remaining 142 admissions. When applied to 106 consecutive prospective head and neck oncology patients at the time of discharge, the logistic regression model predicted readmissions with a specificity of 94%, a sensitivity of 47%, a negative predictive value of 90%, and a positive predictive value of 62% (odds ratio, 14.9; 95% confidence interval, 4.02-55.45). Conclusion Prospectively collected head and neck cancer databases can be used to develop predictive models that can accurately predict which patients will be readmitted. This offers valuable support for quality improvement initiatives and readmission-related cost reduction in head and neck cancer care.
Beukinga, Roelof J; Hulshoff, Jan B; van Dijk, Lisanne V; Muijs, Christina T; Burgerhof, Johannes G M; Kats-Ugurlu, Gursah; Slart, Riemer H J A; Slump, Cornelis H; Mul, Véronique E M; Plukker, John Th M
2017-05-01
Adequate prediction of tumor response to neoadjuvant chemoradiotherapy (nCRT) in esophageal cancer (EC) patients is important in a more personalized treatment. The current best clinical method to predict pathologic complete response is SUV max in 18 F-FDG PET/CT imaging. To improve the prediction of response, we constructed a model to predict complete response to nCRT in EC based on pretreatment clinical parameters and 18 F-FDG PET/CT-derived textural features. Methods: From a prospectively maintained single-institution database, we reviewed 97 consecutive patients with locally advanced EC and a pretreatment 18 F-FDG PET/CT scan between 2009 and 2015. All patients were treated with nCRT (carboplatin/paclitaxel/41.4 Gy) followed by esophagectomy. We analyzed clinical, geometric, and pretreatment textural features extracted from both 18 F-FDG PET and CT. The current most accurate prediction model with SUV max as a predictor variable was compared with 6 different response prediction models constructed using least absolute shrinkage and selection operator regularized logistic regression. Internal validation was performed to estimate the model's performances. Pathologic response was defined as complete versus incomplete response (Mandard tumor regression grade system 1 vs. 2-5). Results: Pathologic examination revealed 19 (19.6%) complete and 78 (80.4%) incomplete responders. Least absolute shrinkage and selection operator regularization selected the clinical parameters: histologic type and clinical T stage, the 18 F-FDG PET-derived textural feature long run low gray level emphasis, and the CT-derived textural feature run percentage. Introducing these variables to a logistic regression analysis showed areas under the receiver-operating-characteristic curve (AUCs) of 0.78 compared with 0.58 in the SUV max model. The discrimination slopes were 0.17 compared with 0.01, respectively. After internal validation, the AUCs decreased to 0.74 and 0.54, respectively. 
Conclusion: The predictive value of the constructed models was superior to that of the standard method (SUVmax). These results can be considered an initial step in predicting tumor response to nCRT in locally advanced EC. Further research in refining the predictive value of these models is needed to justify omission of surgery. © 2017 by the Society of Nuclear Medicine and Molecular Imaging.
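As a hedged illustration of the modelling step this abstract describes, the sketch below fits an L1-regularized (LASSO) logistic regression and reports a training AUC. The data, feature layout, and penalty strength are synthetic stand-ins, not the study's variables.

```python
# Illustrative sketch: LASSO-regularized logistic regression for a binary
# response, in the spirit of the model described above. Entirely synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
# Hypothetical predictors: two informative columns (e.g. clinical stage and
# a texture feature) plus three pure-noise columns the L1 penalty can shrink.
X = rng.normal(size=(n, 5))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# The L1 penalty performs variable selection; liblinear supports it directly.
model = LogisticRegression(penalty="l1", C=0.5, solver="liblinear")
model.fit(X, y)

auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
selected = np.flatnonzero(model.coef_[0])  # indices of retained predictors
print(f"AUC = {auc:.2f}, selected features: {selected}")
```

In the study itself, internal validation (not the in-sample AUC shown here) was used to correct for optimism.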
Detection of fraudulent financial statements using the hybrid data mining approach.
Chen, Suduan
2016-01-01
The purpose of this study is to construct a valid and rigorous fraudulent financial statement (FFS) detection model. The research objects are companies which experienced both fraudulent and non-fraudulent financial statements between the years 2002 and 2013. In the first stage, two decision tree algorithms, the classification and regression trees (CART) and the chi-squared automatic interaction detector (CHAID), are applied to the selection of major variables. The second stage combines CART, CHAID, Bayesian belief network, support vector machine, and artificial neural network to construct FFS detection models. According to the results, the detection performance of the CHAID-CART model is the most effective, with an overall accuracy of 87.97% (FFS detection accuracy of 92.69%).
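A minimal sketch of the two-stage idea described above, using scikit-learn's CART implementation for both stages (CHAID, Bayesian belief networks, SVM and ANN are omitted, and the data are synthetic, not actual financial statements):

```python
# Two-stage sketch: a CART tree ranks variables, then a second classifier is
# fit on the selected subset. DecisionTreeClassifier implements CART; CHAID
# has no scikit-learn equivalent, so this is a simplified stand-in.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 10))
# Hypothetical "fraud" label driven by two of the ten financial ratios.
y = ((X[:, 2] + X[:, 7] + 0.3 * rng.normal(size=n)) > 0).astype(int)

# Stage 1: CART ranks variables by impurity-based importance.
stage1 = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
top = np.argsort(stage1.feature_importances_)[-2:]  # keep the 2 strongest

# Stage 2: refit a detection model on the selected variables only.
stage2 = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[:, top], y)
acc = stage2.score(X[:, top], y)
print(f"selected variables: {sorted(top.tolist())}, training accuracy: {acc:.2f}")
```

The study reports hold-out detection accuracy; the in-sample score above is only a smoke test of the pipeline shape.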
Developing a dengue forecast model using machine learning: A case study in China
Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-01-01
Background In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Methodology/Principal findings Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011–2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. Conclusion and significance The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. 
The findings can help the government and community respond early to dengue epidemics. PMID:29036169
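For illustration only, a toy version of the SVR forecasting setup: lagged values of a synthetic weekly series stand in for the dengue search index and climate covariates, and the lag count and kernel settings are invented.

```python
# Toy sketch of SVR-based epidemic forecasting: predict next week's value
# from the previous 4 weeks of a synthetic seasonal series.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
weeks = np.arange(208)  # four years of weekly data
series = 50 + 30 * np.sin(2 * np.pi * weeks / 52) + rng.normal(0, 2, weeks.size)

# Build a supervised set: y[t] regressed on the 4 preceding observations.
lags = 4
X = np.column_stack([series[i:i - lags] for i in range(lags)])
y = series[lags:]
train = slice(0, 150)

model = SVR(kernel="rbf", C=100.0, epsilon=0.5).fit(X[train], y[train])
pred = model.predict(X[150:])
rmse = float(np.sqrt(np.mean((pred - y[150:]) ** 2)))
print(f"hold-out RMSE = {rmse:.2f}")
```

The study additionally tuned the SVR by cross-validation and checked residual autocorrelation, which this sketch skips.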
Retrieval and Mapping of Heavy Metal Concentration in Soil Using Time Series Landsat 8 Imagery
NASA Astrophysics Data System (ADS)
Fang, Y.; Xu, L.; Peng, J.; Wang, H.; Wong, A.; Clausi, D. A.
2018-04-01
Heavy metal pollution is a critical global environmental problem and a long-standing concern. The traditional approach to obtaining heavy metal concentrations, relying on field sampling and lab testing, is expensive and time consuming. Although many related studies use spectrometer data to build a relational model between heavy metal concentration and spectral information, and then apply the model to hyperspectral imagery for prediction, this approach can hardly map soil metal concentrations of an area quickly and accurately because of the discrepancies between spectrometer data and remote sensing imagery. Taking advantage of the easy accessibility of Landsat 8 data, this study utilizes Landsat 8 imagery to retrieve soil Cu concentration and map its distribution in the study area. To enlarge the spectral information for more accurate retrieval and mapping, 11 single-date Landsat 8 images from 2013-2017 are selected to form a time series. Three regression methods, partial least squares regression (PLSR), artificial neural network (ANN) and support vector regression (SVR), are used for model construction. By comparing these models unbiasedly, the best model is selected to map the Cu concentration distribution. The produced distribution map shows good spatial autocorrelation and consistency with the mining area locations.
Sun, Yanqing; Sun, Liuquan; Zhou, Jie
2013-07-01
This paper studies the generalized semiparametric regression model for longitudinal data where the covariate effects are constant for some covariates and time-varying for others. Different link functions can be used to allow more flexible modelling of longitudinal data. The nonparametric components of the model are estimated using a local linear estimating equation and the parametric components are estimated through a profile estimating function. The method automatically adjusts for heterogeneity of sampling times, allowing the sampling strategy to depend on the past sampling history as well as possibly time-dependent covariates without specifically modelling such dependence. A [Formula: see text]-fold cross-validation bandwidth selection is proposed as a working tool for locating an appropriate bandwidth. A criterion for selecting the link function is proposed to provide a better fit to the data. Large sample properties of the proposed estimators are investigated. Large sample pointwise and simultaneous confidence intervals for the regression coefficients are constructed. Formal hypothesis testing procedures are proposed to check for the covariate effects and whether the effects are time-varying. A simulation study is conducted to examine the finite sample performance of the proposed estimation and hypothesis testing procedures. The methods are illustrated with a data example.
NASA Astrophysics Data System (ADS)
Yilmaz, Isik; Keskin, Inan; Marschalko, Marian; Bednarik, Martin
2010-05-01
This study compares GIS-based collapse susceptibility mapping methods, namely conditional probability (CP), logistic regression (LR) and artificial neural networks (ANN), applied to gypsum rock masses in the Sivas basin (Turkey). A Digital Elevation Model (DEM) was first constructed using GIS software. Collapse-related factors, directly or indirectly related to the causes of collapse occurrence, such as distance from faults, slope angle and aspect, topographical elevation, distance from drainage, topographic wetness index (TWI), stream power index (SPI), Normalized Difference Vegetation Index (NDVI) as a measure of vegetation cover, and distance from roads and settlements, were used in the collapse susceptibility analyses. In the last stage of the analyses, collapse susceptibility maps were produced from the CP, LR and ANN models, and they were then compared by means of their validations. Area Under Curve (AUC) values obtained from all three methodologies showed that the map obtained from the ANN model appears more accurate than the other models, and the results also showed that artificial neural networks are a useful tool in the preparation of collapse susceptibility maps and highly compatible with GIS operating features. Key words: Collapse; doline; susceptibility map; gypsum; GIS; conditional probability; logistic regression; artificial neural networks.
[Multivariate Adaptive Regression Splines (MARS), an alternative for the analysis of time series].
Vanegas, Jairo; Vásquez, Fabián
Multivariate Adaptive Regression Splines (MARS) is a non-parametric modelling method that extends the linear model, incorporating nonlinearities and interactions between variables. It is a flexible tool that automates the construction of predictive models: selecting relevant variables, transforming the predictor variables, processing missing values and preventing overfitting via self-testing. It is also able to predict, taking into account structural factors that might influence the outcome variable, thereby generating hypothetical models. The end result could identify relevant cut-off points in data series. It is rarely used in health, so it is proposed as a tool for the evaluation of relevant public health indicators. For demonstrative purposes, data series on the mortality of children under 5 years of age in Costa Rica over the period 1978-2008 were used. Copyright © 2016 SESPAS. Published by Elsevier España, S.L.U. All rights reserved.
Forecast of severe fever with thrombocytopenia syndrome incidence with meteorological factors.
Sun, Ji-Min; Lu, Liang; Liu, Ke-Ke; Yang, Jun; Wu, Hai-Xia; Liu, Qi-Yong
2018-06-01
Severe fever with thrombocytopenia syndrome (SFTS) is emerging, and some studies have reported that SFTS incidence is associated with meteorological factors, but no SFTS forecast model has been reported to date. In this study, we constructed and compared three forecast models using an autoregressive integrated moving average (ARIMA) model, a negative binomial regression model (NBM), and a quasi-Poisson generalized additive model (GAM). The data from 2011 to 2015 were used for model construction and the data from 2016 were used for external validity assessment. All three models fitted the SFTS cases reasonably well during the training and forecast processes, while the NBM forecasted better than the other two models. Moreover, we demonstrated that temperature and relative humidity played key roles in explaining the temporal dynamics of SFTS occurrence. Our study contributes to a better understanding of SFTS dynamics and provides predictive tools for the control and prevention of SFTS. Copyright © 2018 Elsevier B.V. All rights reserved.
Zhang, Xuan; Li, Wei; Yin, Bin; Chen, Weizhong; Kelly, Declan P; Wang, Xiaoxin; Zheng, Kaiyi; Du, Yiping
2013-10-01
Coffee is the most heavily consumed beverage in the world after water, and its quality is a key consideration in commercial trade. Caffeine content, which has a significant effect on the final quality of coffee products, therefore needs to be determined rapidly and reliably by new analytical techniques. The main purpose of this work was to establish a powerful and practical analytical method based on near infrared spectroscopy (NIRS) and chemometrics for quantitative determination of caffeine content in roasted Arabica coffees. Ground coffee samples spanning a wide range of roast levels were analyzed by NIR, while their caffeine contents were quantitatively determined by the commonly used HPLC-UV method to provide reference values. Calibration models were then developed based on chemometric analyses of the NIR spectral data and the reference concentrations of the coffee samples. Partial least squares (PLS) regression was used to construct the models. Furthermore, diverse spectral pretreatment and variable selection techniques were applied in order to obtain robust and reliable reduced-spectrum regression models. Comparing the respective quality of the different models constructed, the application of second derivative pretreatment and stability competitive adaptive reweighted sampling (SCARS) variable selection provided a notably improved regression model, with a root mean square error of cross validation (RMSECV) of 0.375 mg/g and a correlation coefficient (R) of 0.918 at a PLS factor of 7. An independent test set was used to assess the model, with a root mean square error of prediction (RMSEP) of 0.378 mg/g, a mean relative error of 1.976% and a mean relative standard deviation (RSD) of 1.707%.
Thus, the results provided by the high-quality calibration model revealed the feasibility of NIR spectroscopy for at-line prediction of the caffeine content of unknown roasted coffee samples, given the short analysis time of a few seconds and the non-destructive nature of NIRS. Copyright © 2013 Elsevier B.V. All rights reserved.
Lee, Seokho; Shin, Hyejin; Lee, Sang Han
2016-12-01
Alzheimer's disease (AD) is usually diagnosed by clinicians through cognitive and functional performance tests, with a potential risk of misdiagnosis. Since the progression of AD is known to cause structural changes in the corpus callosum (CC), CC thickness can be used as a functional covariate in the AD classification problem for diagnosis. However, misclassified class labels negatively impact classification performance. Motivated by AD-CC association studies, we propose a logistic regression for functional data classification that is robust to misdiagnosis or label noise. Specifically, our model is constructed by adding individual intercepts to the functional logistic regression model. This approach makes it possible to indicate which observations are possibly mislabeled and also leads to a robust and efficient classifier. An effective MM algorithm provides simple closed-form update formulas. We test our method on synthetic datasets to demonstrate its superiority over an existing method, and apply it to differentiating patients with AD from healthy normals based on CC thickness from MRI. © 2016, The International Biometric Society.
Altubasi, Ibrahim M
2018-06-07
Knee osteoarthritis is a common and disabling musculoskeletal disorder. Patients with knee osteoarthritis have activity limitations which are linked to the strength of the quadriceps muscle. Previous research reported that the relationship between quadriceps muscle strength and physical function is moderated by the level of knee joint frontal plane laxity. The purpose of the current study is to reexamine the moderating effect of knee joint laxity, as measured by stress radiographs, on the relationship between quadriceps muscle strength and physical function. One hundred and sixty osteoarthritis patients participated in this cross-sectional study. Isometric quadriceps muscle strength was measured using an isokinetic dynamometer. Self-rated and performance-based physical function were measured using the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) physical function subscale and the Get Up and Go test, respectively. Stress radiographs were taken while applying varus and valgus loads to the knee using the TELOS device. Knee joint laxity was determined by measuring the distance between joint surfaces on the medial and lateral sides. Hierarchical multiple regression models were constructed to study the moderating effect of laxity on the strength-function relationship. Two regression models were constructed for self-rated and performance-based function. After controlling for demographics, strength contributed significantly in the models. The addition of laxity and the laxity-strength interaction did not add significant contributions to the regression models. Frontal plane knee joint laxity measured by stress radiographs does not moderate the relationship between quadriceps muscle strength and physical function in patients with osteoarthritis. Copyright © 2018 Elsevier B.V. All rights reserved.
Messner, Steven F.; Raffalovich, Lawrence E.; Sutton, Gretchen M.
2011-01-01
This paper assesses the extent to which the infant mortality rate might be treated as a “proxy” for poverty in research on cross-national variation in homicide rates. We have assembled a pooled, cross-sectional time-series dataset for 16 advanced nations over the 1993–2000 period that includes standard measures of infant mortality and homicide and also contains information on two commonly used “income-based” poverty measures: a measure intended to reflect “absolute” deprivation and a measure intended to reflect “relative” deprivation. With these data, we are able to assess the criterion validity of the infant mortality rate with reference to the two income-based poverty measures. We are also able to estimate the effects of the various indicators of disadvantage on homicide rates in regression models, thereby assessing construct validity. The results reveal that the infant mortality rate is more strongly correlated with “relative poverty” than with “absolute poverty,” although much unexplained variance remains. In the regression models, the measure of infant mortality and the relative poverty measure yield significant positive effects on homicide rates, while the absolute poverty measure does not exhibit any significant effects. Our analyses suggest that it would be premature to dismiss relative deprivation in cross-national research on homicide, and that disadvantage is best conceptualized and measured as a multidimensional construct. PMID:21643432
Wang, Wei; Griswold, Michael E
2016-11-30
The random effect Tobit model is a regression model that accommodates both left- and/or right-censoring and within-cluster dependence of the outcome variable. Regression coefficients of random effect Tobit models have conditional interpretations on a constructed latent dependent variable and do not provide inference of overall exposure effects on the original outcome scale. Marginalized random effects model (MREM) permits likelihood-based estimation of marginal mean parameters for the clustered data. For random effect Tobit models, we extend the MREM to marginalize over both the random effects and the normal space and boundary components of the censored response to estimate overall exposure effects at population level. We also extend the 'Average Predicted Value' method to estimate the model-predicted marginal means for each person under different exposure status in a designated reference group by integrating over the random effects and then use the calculated difference to assess the overall exposure effect. The maximum likelihood estimation is proposed utilizing a quasi-Newton optimization algorithm with Gauss-Hermite quadrature to approximate the integration of the random effects. We use these methods to carefully analyze two real datasets. Copyright © 2016 John Wiley & Sons, Ltd.
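The Gauss-Hermite quadrature step mentioned above can be illustrated with a small self-contained example: marginalizing a logistic mean over a normal random effect. The linear predictor 0.5 + b and the logistic inverse link are invented for the demonstration, not taken from the paper's Tobit setting.

```python
# Gauss-Hermite sketch: approximate E[g(b)] for b ~ N(0, sigma^2) with a
# weighted sum over hermgauss nodes.
import numpy as np

def gh_expectation(g, sigma, n_nodes=20):
    """E[g(b)] for b ~ N(0, sigma^2) via Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    # Change of variables b = sqrt(2)*sigma*x maps the N(0, sigma^2) density
    # onto the Hermite weight exp(-x^2); the 1/sqrt(pi) factor normalizes.
    return float(np.sum(weights * g(np.sqrt(2.0) * sigma * nodes)) / np.sqrt(np.pi))

expit = lambda b: 1.0 / (1.0 + np.exp(-(0.5 + b)))  # example linear predictor
marginal_mean = gh_expectation(expit, sigma=1.0)
print(f"marginal mean = {marginal_mean:.4f}")
```

Note the attenuation: the marginal mean is closer to 0.5 than the conditional mean at b = 0, which is why conditional and marginal coefficients differ in such models.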
ERIC Educational Resources Information Center
Malin, Heather; Han, Hyemin; Liauw, Indrawati
2017-01-01
This study investigated the effects of internal and demographic variables on civic development in late adolescence using the construct "civic purpose." We conducted surveys on civic engagement with 480 high school seniors, and surveyed them again 2 years later. Using multivariate regression and linear mixed models, we tested the main…
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Schinka, John A.; Curtiss, Glenn
2004-01-01
This study examined the contributions of the 5-Factor Model (FFM; P. T. Costa & R. R. McCrae, 1992) and RIASEC (J. L. Holland, 1994) constructs of consistency, differentiation, and person-environment congruence in predicting job performance ratings in a large sample (N = 514) of employees. Hierarchical regression analyses conducted separately by…
Economics of Education and Work Life Demand in Terms of Earnings and Skills
ERIC Educational Resources Information Center
Xia, Belle Selene; Liitiäinen, Elia
2014-01-01
This article uses data from a major international survey to construct earnings functions in terms of learning outcomes and variables related to working life in different European countries. In order to complement the extended earnings regression model, the authors have used partial correlation analysis and the analysis of covariance (ANCOVA) to…
ERIC Educational Resources Information Center
Daberkow, Kevin S.; Lin, Wei
2012-01-01
Nearly half a century of lottery scholarship has measured lottery tax incidence predominantly through either the Suits Index or regression analysis. The present study builds on historic lottery tax burden measurement to present a comprehensive set of tools to determine the tax incidence of individual games in addition to determining which lottery…
Casero-Alonso, V; López-Fidalgo, J; Torsney, B
2017-01-01
Binary response models are used in many real applications. For these models the Fisher information matrix (FIM) is proportional to the FIM of a weighted simple linear regression model. The same is also true when the weight function has a finite integral. Thus, optimal designs for one binary model are also optimal for the corresponding weighted linear regression model. The main objective of this paper is to provide a tool for the construction of MV-optimal designs, minimizing the maximum of the variances of the estimates, for a general design space. MV-optimality is a potentially difficult criterion because of its nondifferentiability at equal variance designs. A methodology for obtaining MV-optimal designs where the design space is a compact interval [a, b] will be given for several standard weight functions. The methodology will allow us to build a user-friendly computer tool based on Mathematica to compute MV-optimal designs. Some illustrative examples will show a representation of MV-optimal designs in the Euclidean plane, taking a and b as the axes. The applet will be explained using two relevant models. In the first one the case of a weighted linear regression model is considered, where the weight function is directly chosen from a typical family. In the second example a binary response model is assumed, where the probability of the outcome is given by a typical probability distribution. Practitioners can use the provided applet to identify the solution and to know the exact support points and design weights. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Rupert, Michael G.
2003-01-01
Draft Federal regulations may require that each State develop a State Pesticide Management Plan for the herbicides atrazine, alachlor, metolachlor, and simazine. Maps were developed that the State of Colorado could use to predict the probability of detecting atrazine and desethyl-atrazine (a breakdown product of atrazine) in ground water in Colorado. These maps can be incorporated into the State Pesticide Management Plan and can help provide a sound hydrogeologic basis for atrazine management in Colorado. Maps showing the probability of detecting elevated nitrite plus nitrate as nitrogen (nitrate) concentrations in ground water in Colorado also were developed because nitrate is a contaminant of concern in many areas of Colorado. Maps showing the probability of detecting atrazine and(or) desethyl-atrazine (atrazine/DEA) at or greater than concentrations of 0.1 microgram per liter and nitrate concentrations in ground water greater than 5 milligrams per liter were developed as follows: (1) Ground-water quality data were overlaid with anthropogenic and hydrogeologic data using a geographic information system to produce a data set in which each well had corresponding data on atrazine use, fertilizer use, geology, hydrogeomorphic regions, land cover, precipitation, soils, and well construction. These data then were downloaded to a statistical software package for analysis by logistic regression. (2) Relations were observed between ground-water quality and the percentage of land-cover categories within circular regions (buffers) around wells. Several buffer sizes were evaluated; the buffer size that provided the strongest relation was selected for use in the logistic regression models. 
(3) Relations between concentrations of atrazine/DEA and nitrate in ground water and atrazine use, fertilizer use, geology, hydrogeomorphic regions, land cover, precipitation, soils, and well-construction data were evaluated, and several preliminary multivariate models with various combinations of independent variables were constructed. (4) The multivariate models that best predicted the presence of atrazine/DEA and elevated concentrations of nitrate in ground water were selected. (5) The accuracy of the multivariate models was confirmed by validating the models with an independent set of ground-water quality data. (6) The multivariate models were entered into a geographic information system and the probability maps were constructed.
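A schematic, not the USGS implementation: a logistic regression fitted to synthetic detections, then evaluated over a small grid of explanatory values to mimic the probability-mapping step. The variable names, coefficients and units are invented.

```python
# Probability-mapping sketch: logistic regression on detections (1) and
# non-detections (0), then cell-by-cell evaluation over explanatory values.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n_wells = 400
atrazine_use = rng.uniform(0, 1, n_wells)   # hypothetical use within a buffer
well_depth = rng.uniform(10, 200, n_wells)  # hypothetical depth, metres
X = np.column_stack([atrazine_use, well_depth])
true_logit = -1.0 + 4.0 * atrazine_use - 0.02 * well_depth
detected = (rng.random(n_wells) < 1 / (1 + np.exp(-true_logit))).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, detected)

# "Map" step: evaluate the fitted model over a grid of explanatory values.
grid = np.array([[use, depth] for use in (0.1, 0.9) for depth in (20.0, 150.0)])
prob = model.predict_proba(grid)[:, 1]
print(np.round(prob, 2))
```

In the actual workflow each grid cell of the GIS layers supplies its own covariate values, producing the statewide probability surface.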
Garcia, Tanya P; Ma, Yanyuan
2017-10-01
We develop consistent and efficient estimation of parameters in general regression models with mismeasured covariates. We assume the model error and covariate distributions are unspecified, and the measurement error distribution is a general parametric distribution with unknown variance-covariance. We construct root-n consistent, asymptotically normal and locally efficient estimators using the semiparametric efficient score. We do not estimate any unknown distribution or model error heteroskedasticity. Instead, we form the estimator under possibly incorrect working distribution models for the model error, error-prone covariate, or both. Empirical results demonstrate robustness to different incorrect working models in homoscedastic and heteroskedastic models with error-prone covariates.
Grobman, William A.; Lai, Yinglei; Landon, Mark B.; Spong, Catherine Y.; Leveno, Kenneth J.; Rouse, Dwight J.; Varner, Michael W.; Moawad, Atef H.; Simhan, Hyagriv N.; Harper, Margaret; Wapner, Ronald J.; Sorokin, Yoram; Miodovnik, Menachem; Carpenter, Marshall; O'sullivan, Mary J.; Sibai, Baha M.; Langer, Oded; Thorp, John M.; Ramin, Susan M.; Mercer, Brian M.
2010-01-01
Objective To construct a predictive model for vaginal birth after cesarean (VBAC) that combines factors that can be ascertained only as the pregnancy progresses with those known at initiation of prenatal care. Study design Using multivariable modeling, we constructed a predictive model for VBAC that included patient factors known at the initial prenatal visit as well as those that only became evident as the pregnancy progressed to the admission for delivery. Results 9616 women were analyzed. The regression equation for VBAC success included multiple factors that could not be known at the first prenatal visit. The area under the curve for this model was significantly greater (P < .001) than that of a model that included only factors available at the first prenatal visit. Conclusion A prediction model for VBAC success that incorporates factors that can be ascertained only as the pregnancy progresses adds to the predictive accuracy of a model that uses only factors available at a first prenatal visit. PMID:19813165
Creasy, John M; Midya, Abhishek; Chakraborty, Jayasree; Adams, Lauryn B; Gomes, Camilla; Gonen, Mithat; Seastedt, Kenneth P; Sutton, Elizabeth J; Cercek, Andrea; Kemeny, Nancy E; Shia, Jinru; Balachandran, Vinod P; Kingham, T Peter; Allen, Peter J; DeMatteo, Ronald P; Jarnagin, William R; D'Angelica, Michael I; Do, Richard K G; Simpson, Amber L
2018-06-19
This study investigates whether quantitative image analysis of pretreatment CT scans can predict volumetric response to chemotherapy for patients with colorectal liver metastases (CRLM). Patients treated with chemotherapy for CRLM (hepatic artery infusion (HAI) combined with systemic, or systemic alone) were included in the study. Patients were imaged at baseline and approximately 8 weeks after treatment. Response was measured as the percentage change in tumour volume from baseline. Quantitative imaging features were derived from the index hepatic tumour on pretreatment CT, and features statistically significant on univariate analysis were included in a linear regression model to predict volumetric response. The regression model was constructed from 70% of the data, while 30% were reserved for testing. Test data were input into the trained model. Model performance was evaluated with mean absolute prediction error (MAPE) and R2. Clinicopathologic factors were assessed for correlation with response. 157 patients were included, split into training (n = 110) and validation (n = 47) sets. MAPE from the multivariate linear regression model was 16.5% (R2 = 0.774) and 21.5% in the training and validation sets, respectively. Stratified by HAI utilisation, MAPE in the validation set was 19.6% for HAI and 25.1% for systemic chemotherapy alone. Clinical factors associated with differences in median tumour response were treatment strategy, systemic chemotherapy regimen, age and KRAS mutation status (p < 0.05). Quantitative imaging features extracted from pretreatment CT are promising predictors of volumetric response to chemotherapy in patients with CRLM. Pretreatment predictors of response have the potential to better select patients for specific therapies. • Colorectal liver metastases (CRLM) are downsized with chemotherapy but predicting the patients that will respond to chemotherapy is currently not possible.
• Heterogeneity and enhancement patterns of CRLM can be measured with quantitative imaging. • Prediction model constructed that predicts volumetric response with 20% error suggesting that quantitative imaging holds promise to better select patients for specific treatments.
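The 70/30 train-validation evaluation described above can be sketched generically as follows; the "imaging features" are synthetic placeholders, and MAPE is computed here in the outcome's own units rather than as the study's percent-volume-change error.

```python
# Generic sketch of the evaluation scheme: fit a linear regression on a 70%
# training split, report mean absolute prediction error on the 30% hold-out.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(5)
n = 157  # same sample size as the study, but the data are entirely synthetic
features = rng.normal(size=(n, 3))
response = 20.0 * features[:, 0] - 10.0 * features[:, 1] + rng.normal(0, 5, n)

X_tr, X_val, y_tr, y_val = train_test_split(
    features, response, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
mape = float(np.mean(np.abs(model.predict(X_val) - y_val)))
r2_train = model.score(X_tr, y_tr)
print(f"validation MAPE = {mape:.1f}, training R^2 = {r2_train:.3f}")
```

As in the study, the gap between training and validation error gives a rough sense of optimism in the fitted model.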
Investigation using data from ERTS to develop and implement utilization of living marine resources
NASA Technical Reports Server (NTRS)
Stevenson, W. H. (Principal Investigator); Pastula, E. J., Jr.
1973-01-01
The author has identified the following significant results. The feasibility of utilizing ERTS-1 data in conjunction with aerial remote sensing and sea truth information to predict the distribution of menhaden in the Mississippi Sound during a specific time frame has been demonstrated by employing a number of uniquely designed empirical regression models. The construction of these models was made possible through innovative statistical routines specifically developed to meet the stated objectives.
Sandborgh, Maria; Johansson, Ann-Christin; Söderlund, Anne
2016-01-01
In the fear-avoidance (FA) model social cognitive constructs could add to explaining the disabling process in whiplash associated disorder (WAD). The aim was to exemplify the possible input from Social Cognitive Theory on the FA model. Specifically the role of functional self-efficacy and perceived responses from a spouse/intimate partner was studied. A cross-sectional and correlational design was used. Data from 64 patients with acute WAD were used. Measures were pain intensity measured with a numerical rating scale, the Pain Disability Index, support, punishing responses, solicitous responses, and distracting responses subscales from the Multidimensional Pain Inventory, the Catastrophizing subscale from the Coping Strategies Questionnaire, the Tampa Scale of Kinesiophobia, and the Self-Efficacy Scale. Bivariate correlational, simple linear regression, and multiple regression analyses were used. In the statistical prediction models high pain intensity indicated high punishing responses, which indicated high catastrophizing. High catastrophizing indicated high fear of movement, which indicated low self-efficacy. Low self-efficacy indicated high disability, which indicated high pain intensity. All independent variables together explained 66.4% of the variance in pain disability, p < 0.001. Results suggest a possible link between one aspect of the social environment, perceived punishing responses from a spouse/intimate partner, pain intensity, and catastrophizing. Further, results support a mediating role of self-efficacy between fear of movement and disability in WAD.
Hill, Mary Catherine
1992-01-01
This report documents a new version of the U.S. Geological Survey modular, three-dimensional, finite-difference, ground-water flow model (MODFLOW) which, with the new Parameter-Estimation Package that also is documented in this report, can be used to estimate parameters by nonlinear regression. The new version of MODFLOW is called MODFLOWP (pronounced MOD-FLOW-P), and functions nearly identically to MODFLOW when the Parameter-Estimation Package is not used. Parameters are estimated by minimizing a weighted least-squares objective function by the modified Gauss-Newton method or by a conjugate-direction method. Parameters used to calculate the following MODFLOW model inputs can be estimated: transmissivity and storage coefficient of confined layers; hydraulic conductivity and specific yield of unconfined layers; vertical leakance; vertical anisotropy (used to calculate vertical leakance); horizontal anisotropy; hydraulic conductance of the River, Streamflow-Routing, General-Head Boundary, and Drain Packages; areal recharge rates; maximum evapotranspiration; pumpage rates; and the hydraulic head at constant-head boundaries. Any spatial variation in parameters can be defined by the user. Data used to estimate parameters can include existing independent estimates of parameter values, observed hydraulic heads or temporal changes in hydraulic heads, and observed gains and losses along head-dependent boundaries (such as streams). Model output includes statistics for analyzing the parameter estimates and the model; these statistics can be used to quantify the reliability of the resulting model, to suggest changes in model construction, and to compare results of models constructed in different ways.
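The record above names the modified Gauss-Newton method for minimizing the weighted least-squares objective. As an illustrative sketch only (a plain, unmodified Gauss-Newton step, with a toy exponential-decay model and made-up data standing in for a ground-water flow model), the core iteration might look like:

```python
import numpy as np

def gauss_newton(f, jac, y, w, b0, n_iter=20):
    """Minimize sum(w * (y - f(b))**2) over b by Gauss-Newton updates."""
    b = np.asarray(b0, dtype=float)
    W = np.diag(w)
    for _ in range(n_iter):
        r = y - f(b)                        # residuals at current estimate
        J = jac(b)                          # Jacobian of f at b
        # Normal equations: (J^T W J) db = J^T W r
        db = np.linalg.solve(J.T @ W @ J, J.T @ W @ r)
        b = b + db
    return b

# Toy model y = b0 * exp(-b1 * x) with made-up, noise-free data
x = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(-0.7 * x)
f = lambda b: b[0] * np.exp(-b[1] * x)
jac = lambda b: np.column_stack([np.exp(-b[1] * x),
                                 -b[0] * x * np.exp(-b[1] * x)])
b_hat = gauss_newton(f, jac, y, w=np.ones_like(x), b0=[1.5, 0.8])
```

The modified Gauss-Newton method in MODFLOWP additionally damps and scales the update step; that refinement is omitted here.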
Learning a Health Knowledge Graph from Electronic Medical Records.
Rotmensch, Maya; Halpern, Yoni; Tlimat, Abdulhakim; Horng, Steven; Sontag, David
2017-07-20
Demand for clinical decision support systems in medicine and self-diagnostic symptom checkers has substantially increased in recent years. Existing platforms rely on knowledge bases manually compiled through a labor-intensive process or automatically derived using simple pairwise statistics. This study explored an automated process to learn high quality knowledge bases linking diseases and symptoms directly from electronic medical records. Medical concepts were extracted from 273,174 de-identified patient records and maximum likelihood estimation of three probabilistic models was used to automatically construct knowledge graphs: logistic regression, naive Bayes classifier and a Bayesian network using noisy OR gates. A graph of disease-symptom relationships was elicited from the learned parameters and the constructed knowledge graphs were evaluated and validated, with permission, against Google's manually-constructed knowledge graph and against expert physician opinions. Our study shows that direct and automated construction of high quality health knowledge graphs from medical records using rudimentary concept extraction is feasible. The noisy OR model produces a high quality knowledge graph reaching precision of 0.85 for a recall of 0.6 in the clinical evaluation. Noisy OR significantly outperforms all tested models across evaluation frameworks (p < 0.01).
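As a hedged sketch of the noisy OR gate mentioned above (the edge probabilities and leak term below are hypothetical, not values learned in the study), the probability of a symptom given its active parent diseases is:

```python
import numpy as np

def noisy_or(p_causes, leak=0.0):
    """P(symptom present) under a noisy OR gate: each active parent
    disease i independently causes the symptom with probability p_i,
    plus a leak probability for causes outside the graph."""
    p_causes = np.asarray(p_causes, dtype=float)
    return 1.0 - (1.0 - leak) * np.prod(1.0 - p_causes)

# Hypothetical edge weights for two active diseases
p_symptom = noisy_or([0.6, 0.3])   # 1 - 0.4 * 0.7 = 0.72
```

The gate is what makes the learned parameters interpretable as per-edge disease-symptom strengths, from which a graph can be elicited by thresholding.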
Risk factors for injury among construction workers at Denver International Airport.
Lowery, J T; Borgerding, J A; Zhen, B; Glazner, J E; Bondy, J; Kreiss, K
1998-08-01
The Denver International Airport construction project provided a rare opportunity to identify risk factors for injury on a large construction project for which 769 contractors were hired to complete 2,843 construction contracts. Workers' compensation claims and payroll data for individual contracts were recorded in an administrative database developed by the project's Owner-Controlled Insurance Program. From claims and payroll data linked with employee demographic information, we calculated injury rates per 200,000 person-hours by contract and over contract characteristics of interest. We used Poisson regression models to examine contract-specific risk factors in relation to total injuries, lost-work-time (LWT), and non-LWT injuries. We included contract-specific expected loss rates (ELRs) in the model to control for prevailing risk of work and used logistic regression methods to determine the association between LWT and non-LWT injuries on contracts. Injury rates were highest during the first year of construction, at the beginning of contracts, and among older workers. Risk for total and non-LWT injuries was elevated for building construction contracts, contracts for special trades companies (SIC 17), contracts with payrolls over $1 million, and those with overtime payrolls greater than 20%. Risk for LWT injuries only was increased for site development contracts and contracts starting in the first year of construction. Contracts experiencing one or more minor injuries were four times as likely to have at least one major injury (OR = 4.0, 95% CI (2.9, 5.5)). Enhancement of DIA's safety infrastructure during the second year of construction appears to have been effective in reducing serious (LWT) injuries. The absence of correlation between injury rates among contracts belonging to the same company suggests that targeting of safety resources at the level of the contract may be an effective approach to injury prevention.
Interventions focused on high-risk contracts, including those with considerable overtime work, contracts held by special trades contractors (SIC 17), and contracts belonging to small and mid-sized companies, and on high-risk workers, such as those new to a construction site or new to a contract, may reduce injury burden on large construction sites. The joint occurrence of minor and major injuries at the contract level suggests that surveillance of minor injuries may be useful in identifying opportunities for prevention of major injuries.
Sources of Variability in Physical Activity Among Inactive People with Multiple Sclerosis.
Uszynski, Marcin K; Herring, Matthew P; Casey, Blathin; Hayes, Sara; Gallagher, Stephen; Motl, Robert W; Coote, Susan
2018-04-01
Evidence supports that physical activity (PA) improves symptoms of multiple sclerosis (MS). Although application of principles from Social Cognitive Theory (SCT) may facilitate positive changes in PA behaviour among people with multiple sclerosis (pwMS), the constructs often explain limited variance in PA. This study investigated the extent to which MS symptoms, including fatigue, depression, and walking limitations, combined with the SCT constructs, explained more variance in PA than SCT constructs alone among pwMS. Baseline data, including objectively assessed PA, exercise self-efficacy, goal setting, outcome expectations, 6-min walk test, fatigue and depression, from 65 participants of the Step It Up randomized controlled trial completed in Ireland (2016), were included. Multiple regression models quantified variance explained in PA and independent associations of (1) SCT constructs, (2) symptoms and (3) SCT constructs and symptoms. Model 1 included exercise self-efficacy, exercise goal setting and multidimensional outcome expectations for exercise and explained ~14% of the variance in PA (R² = 0.144, p < 0.05). Model 2 included walking limitations, fatigue and depression and explained 20% of the variance in PA (R² = 0.196, p < 0.01). Model 3 combined models 1 and 2, and the explained variance increased to ~29% (R² = 0.288, p < 0.01). In Model 3, exercise self-efficacy (β = 0.30, p < 0.05), walking limitations (β = 0.32, p < 0.01), fatigue (β = -0.41, p < 0.01) and depression (β = 0.34, p < 0.05) were significantly and independently associated with PA. Findings suggest that relevant MS symptoms improved by PA, including fatigue, depression and walking limitations, together with SCT constructs explained more variance in PA than SCT constructs alone, providing support for targeting both SCT constructs and these symptoms in the multifactorial promotion of PA among pwMS.
Zoellner, Jamie M; Porter, Kathleen J; Chen, Yvonnes; Hedrick, Valisa E; You, Wen; Hickman, Maja; Estabrooks, Paul A
2017-05-01
Guided by the theory of planned behaviour (TPB) and health literacy concepts, SIPsmartER is a six-month multicomponent intervention effective at improving sugar-sweetened beverage (SSB) behaviours. Using SIPsmartER data, this study explores prediction of SSB behavioural intention (BI) and behaviour from TPB constructs using: (1) cross-sectional and prospective models and (2) 11 single-item assessments from interactive voice response (IVR) technology. Quasi-experimental design, including pre- and post-outcome data and repeated-measures process data of 155 intervention participants. Validated multi-item TPB measures, single-item TPB measures, and self-reported SSB behaviours. Hypothesised relationships were investigated using correlation and multiple regression models. TPB constructs explained 32% of the variance in BI cross-sectionally and 20% prospectively, and explained 13-20% of the variance in behaviour cross-sectionally and 6% prospectively. Single-item scale models were significant, yet explained less variance. All IVR models predicting BI (average 21%, range 6-38%) and behaviour (average 30%, range 6-55%) were significant. Findings are interpreted in the context of other cross-sectional, prospective and experimental TPB health and dietary studies. Findings advance experimental application of the TPB, including understanding constructs at outcome and process time points and applying theory in all intervention development, implementation and evaluation phases.
Construction of the Second Quito Astrolabe Catalogue
NASA Astrophysics Data System (ADS)
Kolesnik, Y. B.
1994-03-01
A method for astrolabe catalogue construction is presented. It is based on classical concepts, but the model of conditional equations for the group reduction is modified, with additional parameters introduced in the stepwise regressions. The chain adjustment is neglected, and the advantages of this approach are discussed. The method has been applied to the data obtained with the astrolabe of the Quito Astronomical Observatory from 1964 to 1983. Various characteristics of the catalogue produced with this method are compared with those due to the rigorous classical method. Some improvement in both systematic and random errors is outlined.
Schmiege, Sarah J; Bryan, Angela D
2016-04-01
Justice-involved adolescents engage in high levels of risky sexual behavior and substance use, and understanding potential relationships among these constructs is important for effective HIV/STI prevention. A regression mixture modeling approach was used to determine whether subgroups could be identified based on the regression of two indicators of sexual risk (condom use and frequency of intercourse) on three measures of substance use (alcohol, marijuana and hard drugs). Three classes were observed among n = 596 adolescents on probation: none of the substances predicted outcomes for approximately 18 % of the sample; alcohol and marijuana use were predictive for approximately 59 % of the sample, and marijuana use and hard drug use were predictive in approximately 23 % of the sample. Demographic, individual difference, and additional sexual and substance use risk variables were examined in relation to class membership. Findings are discussed in terms of understanding profiles of risk behavior among at-risk youth.
Montaño, Daniel E; Kasprzyk, Danuta; Hamilton, Deven T; Tshimanga, Mufuta; Gorn, Gerald
2014-05-01
Male circumcision (MC) reduces HIV acquisition among men, leading WHO/UNAIDS to recommend a goal to circumcise 80 % of men in high HIV prevalence countries. Significant investment to increase MC capacity in priority countries was made, yet only 5 % of the goal has been achieved in Zimbabwe. The integrated behavioral model (IBM) was used as a framework to investigate the factors affecting MC motivation among men in Zimbabwe. A survey instrument was designed based on elicitation study results, and administered to a representative household-based sample of 1,201 men aged 18-30 from two urban and two rural areas in Zimbabwe. Multiple regression analysis found all five IBM constructs significantly explained MC Intention. Nearly all beliefs underlying the IBM constructs were significantly correlated with MC Intention. Stepwise regression analysis of beliefs underlying each construct respectively found that 13 behavioral beliefs, 5 normative beliefs, 4 descriptive norm beliefs, 6 efficacy beliefs, and 10 control beliefs were significant in explaining MC Intention. A final stepwise regression of the five sets of significant IBM construct beliefs identified 14 key beliefs that best explain Intention. Similar analyses were carried out with subgroups of men by urban-rural and age. Different sets of behavioral, normative, efficacy, and control beliefs were significant for each sub-group, suggesting communication messages need to be targeted to be most effective for sub-groups. Implications for the design of effective MC demand creation messages are discussed. This study demonstrates the application of theory-driven research to identify evidence-based targets for intervention messages to increase men's motivation to get circumcised and thereby improve demand for male circumcision.
Prediction models for Arabica coffee beverage quality based on aroma analyses and chemometrics.
Ribeiro, J S; Augusto, F; Salva, T J G; Ferreira, M M C
2012-11-15
In this work, soft modeling based on chemometric analyses of coffee beverage sensory data and the chromatographic profiles of volatile roasted coffee compounds is proposed to predict the scores of acidity, bitterness, flavor, cleanliness, body, and overall quality of the coffee beverage. A partial least squares (PLS) regression method was used to construct the models. The ordered predictor selection (OPS) algorithm was applied to select the compounds for the regression model of each sensory attribute in order to take only significant chromatographic peaks into account. The prediction errors of these models, using 4 or 5 latent variables, were equal to 0.28, 0.33, 0.35, 0.33, 0.34 and 0.41, for each of the attributes and compatible with the errors of the mean scores of the experts. Thus, the results proved the feasibility of using a similar methodology in on-line or routine applications to predict the sensory quality of Brazilian Arabica coffee. Copyright © 2012 Elsevier B.V. All rights reserved.
How Do DSM-5 Personality Traits Align With Schema Therapy Constructs?
Bach, Bo; Lee, Christopher; Mortensen, Erik Lykke; Simonsen, Erik
2016-08-01
DSM-5 offers an alternative model of personality pathology that includes 25 traits. Although personality disorders are mostly treated with psychotherapy, the correspondence between DSM-5 traits and concepts in evidence-based psychotherapy has not yet been evaluated adequately. Suitably, schema therapy was developed for treating personality disorders, and it has achieved promising evidence. The authors examined associations between DSM-5 traits and schema therapy constructs in a mixed sample of 662 adults, including 312 clinical participants. Associations were investigated in terms of factor loadings and regression coefficients in relation to five domains, followed by specific correlations among all constructs. The results indicated conceptually coherent associations, and 15 of 25 traits were strongly related to relevant schema therapy constructs. Conclusively, DSM-5 traits may be considered expressions of schema therapy constructs, which psychotherapists might take advantage of in terms of case formulation and targets of treatment. In turn, schema therapy constructs add theoretical understanding to DSM-5 traits.
Regression analysis for solving diagnosis problem of children's health
NASA Astrophysics Data System (ADS)
Cherkashina, Yu A.; Gerget, O. M.
2016-04-01
This paper presents the results of research devoted to the application of statistical techniques, namely regression analysis, to assess the health status of children in the neonatal period based on medical data (hemostatic parameters, parameters of blood tests, gestational age, vascular-endothelial growth factor) measured at 3-5 days of life. A detailed description of the studied medical data is given. A binary logistic regression procedure is discussed. Basic results of the research are presented. A classification table of predicted versus observed values is shown, and the overall percentage of correct recognition is determined. Regression equation coefficients are calculated, and the general regression equation is written based on them. Based on the results of the logistic regression, ROC analysis was performed: the sensitivity and specificity of the model are calculated and ROC curves are constructed. These mathematical techniques allow the health of children to be diagnosed with a high quality of recognition. The results make a significant contribution to the development of evidence-based medicine and have high practical importance in the professional activity of the author.
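A minimal sketch of the binary logistic regression and ROC analysis described above, using scikit-learn with simulated stand-ins for the neonatal predictors (all variable choices below are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 200
# Simulated stand-ins for neonatal predictors (e.g. gestational age,
# blood-test parameters); coefficients are hypothetical
X = rng.normal(size=(n, 3))
logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, y)
acc = model.score(X, y)                               # overall correct recognition
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])  # area under ROC curve
```

Sensitivity and specificity at a chosen probability cut-off, as in the paper, can then be read off a confusion matrix of `model.predict(X)` against `y`.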
NASA Astrophysics Data System (ADS)
Alekseenko, M. A.; Gendrina, I. Yu.
2017-11-01
Recently, owing to the abundance of various types of observational data in systems of vision through the atmosphere and the need to process them, methods of statistical research such as correlation-regression analysis, dynamic series and variance analysis have become relevant to the study of such systems. We have attempted to apply elements of correlation-regression analysis to the study and subsequent prediction of the patterns of radiation transfer in these systems, as in the construction of radiation models of the atmosphere. In this paper, we present some results of statistical processing of the results of numerical simulation of the characteristics of vision systems through the atmosphere, obtained with the help of a special software package.
A Proposal for Phase 4 of the Forest Inventory and Analysis Program
Ronald E. McRoberts
2005-01-01
Maps of forest cover were constructed using observations from forest inventory plots, Landsat Thematic Mapper satellite imagery, and a logistic regression model. Estimates of mean proportion forest area and the variance of the mean were calculated for circular study areas with radii ranging from 1 km to 15 km. The spatial correlation among pixel predictions was...
Accounting for informatively missing data in logistic regression by means of reassessment sampling.
Lin, Ji; Lyles, Robert H
2015-05-20
We explore the 'reassessment' design in a logistic regression setting, where a second wave of sampling is applied to recover a portion of the missing data on a binary exposure and/or outcome variable. We construct a joint likelihood function based on the original model of interest and a model for the missing data mechanism, with emphasis on non-ignorable missingness. The estimation is carried out by numerical maximization of the joint likelihood function with close approximation of the accompanying Hessian matrix, using sharable programs that take advantage of general optimization routines in standard software. We show how likelihood ratio tests can be used for model selection and how they facilitate direct hypothesis testing for whether missingness is at random. Examples and simulations are presented to demonstrate the performance of the proposed method. Copyright © 2015 John Wiley & Sons, Ltd.
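The paper's joint likelihood combines the outcome model with a missingness model; as a simplified stand-in, the same numerical-maximization approach can be sketched for a complete-data logistic likelihood using a general optimization routine, as the paper suggests (data and coefficients below are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
beta_true = np.array([-0.5, 1.2])   # hypothetical intercept and slope
p = 1.0 / (1.0 + np.exp(-(beta_true[0] + beta_true[1] * x)))
y = (rng.random(n) < p).astype(int)

def neg_loglik(beta):
    """Negative Bernoulli log-likelihood of a logistic model."""
    eta = beta[0] + beta[1] * x
    return -np.sum(y * eta - np.log1p(np.exp(eta)))

fit = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
beta_hat = fit.x
```

In the reassessment design, `neg_loglik` would additionally sum over the possible values of the missing exposure/outcome, weighted by the missingness-mechanism model; that extension is omitted here.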
Liu, Fengchen; Porco, Travis C.; Amza, Abdou; Kadri, Boubacar; Nassirou, Baido; West, Sheila K.; Bailey, Robin L.; Keenan, Jeremy D.; Solomon, Anthony W.; Emerson, Paul M.; Gambhir, Manoj; Lietman, Thomas M.
2015-01-01
Background Trachoma programs rely on guidelines made in large part using expert opinion of what will happen with and without intervention. Large community-randomized trials offer an opportunity to actually compare forecasting methods in a masked fashion. Methods The Program for the Rapid Elimination of Trachoma trials estimated longitudinal prevalence of ocular chlamydial infection from 24 communities treated annually with mass azithromycin. Given antibiotic coverage and biannual assessments from baseline through 30 months, forecasts of the prevalence of infection in each of the 24 communities at 36 months were made by three methods: the sum of 15 experts’ opinion, statistical regression of the square-root-transformed prevalence, and a stochastic hidden Markov model of infection transmission (Susceptible-Infectious-Susceptible, or SIS model). All forecasters were masked to the 36-month results and to the other forecasts. Forecasts of the 24 communities were scored by the likelihood of the observed results and compared using Wilcoxon’s signed-rank statistic. Findings Regression and SIS hidden Markov models had significantly better likelihood than community expert opinion (p = 0.004 and p = 0.01, respectively). All forecasts scored better when perturbed to decrease Fisher’s information. Each individual expert’s forecast was poorer than the sum of experts. Interpretation Regression and SIS models performed significantly better than expert opinion, although all forecasts were overly confident. Further model refinements may score better, although would need to be tested and compared in new masked studies. Construction of guidelines that rely on forecasting future prevalence could consider use of mathematical and statistical models. PMID:26302380
Liu, Fengchen; Porco, Travis C; Amza, Abdou; Kadri, Boubacar; Nassirou, Baido; West, Sheila K; Bailey, Robin L; Keenan, Jeremy D; Solomon, Anthony W; Emerson, Paul M; Gambhir, Manoj; Lietman, Thomas M
2015-08-01
Trachoma programs rely on guidelines made in large part using expert opinion of what will happen with and without intervention. Large community-randomized trials offer an opportunity to actually compare forecasting methods in a masked fashion. The Program for the Rapid Elimination of Trachoma trials estimated longitudinal prevalence of ocular chlamydial infection from 24 communities treated annually with mass azithromycin. Given antibiotic coverage and biannual assessments from baseline through 30 months, forecasts of the prevalence of infection in each of the 24 communities at 36 months were made by three methods: the sum of 15 experts' opinion, statistical regression of the square-root-transformed prevalence, and a stochastic hidden Markov model of infection transmission (Susceptible-Infectious-Susceptible, or SIS model). All forecasters were masked to the 36-month results and to the other forecasts. Forecasts of the 24 communities were scored by the likelihood of the observed results and compared using Wilcoxon's signed-rank statistic. Regression and SIS hidden Markov models had significantly better likelihood than community expert opinion (p = 0.004 and p = 0.01, respectively). All forecasts scored better when perturbed to decrease Fisher's information. Each individual expert's forecast was poorer than the sum of experts. Regression and SIS models performed significantly better than expert opinion, although all forecasts were overly confident. Further model refinements may score better, although would need to be tested and compared in new masked studies. Construction of guidelines that rely on forecasting future prevalence could consider use of mathematical and statistical models. Clinicaltrials.gov NCT00792922.
Maertens de Noordhout, Charline; Devleesschauwer, Brecht; Salomon, Joshua A; Turner, Heather; Cassini, Alessandro; Colzani, Edoardo; Speybroeck, Niko; Polinder, Suzanne; Kretzschmar, Mirjam E; Havelaar, Arie H; Haagsma, Juanita A
2018-01-01
Background In 2015, new disability weights (DWs) for infectious diseases were constructed based on data from four European countries. In this paper, we evaluated whether country, age, sex, disease experience status, income and educational levels have an impact on these DWs. Methods We analyzed paired comparison responses of the European DW study by participants' characteristics with separate probit regression models. To evaluate the effect of participants' characteristics, we performed correlation analyses between countries and within country by respondent characteristics and constructed seven probit regression models, including a null model and six models containing participants' characteristics. We compared these seven models using the Akaike Information Criterion (AIC). Results According to AIC, the probit model including country as a covariate was the best model. We found a lower correlation of the probit coefficients between countries and income levels (range rs: 0.97–0.99, P < 0.01) than between age groups (range rs: 0.98–0.99, P < 0.01), educational levels (range rs: 0.98–0.99, P < 0.01), sex (rs = 0.99, P < 0.01) and disease status (rs = 0.99, P < 0.01). Within country, the lowest correlations of the probit coefficients were between low and high income levels (range rs = 0.89–0.94, P < 0.01). Conclusions We observed variations in health valuation across countries and, within country, between income levels. These observations should be further explored in a systematic way, also in non-European countries. We recommend that future research study the effect of other characteristics of respondents on health assessment. PMID:29020343
Prediction equations of forced oscillation technique: the insidious role of collinearity.
Narchi, Hassib; AlBlooshi, Afaf
2018-03-27
Many studies have reported reference data for the forced oscillation technique (FOT) in healthy children. The prediction equations of FOT parameters were derived from a multivariable regression model examining the effect of age, gender, weight and height on each parameter. As many of these variables are likely to be correlated, collinearity might have affected the accuracy of the model, potentially resulting in misleading, erroneous or difficult-to-interpret conclusions. The aims of this work were: to review all FOT publications in children since 2005 to analyse whether collinearity was considered in the construction of the published prediction equations; to compare these prediction equations with our own study; and to analyse, in our study, how collinearity between the explanatory variables might affect the predicted equations if it was not considered in the model. The results showed that none of the ten reviewed studies had stated whether collinearity was checked for. Half of the reports had also included in their equations variables which are physiologically correlated, such as age, weight and height. The predicted resistance varied by up to 28% amongst these studies. In our study, multicollinearity was identified between the explanatory variables initially considered for the regression model (age, weight and height). Ignoring it would have resulted in inaccuracies in the coefficients of the equation, their signs (positive or negative), their 95% confidence intervals, their significance level and the model goodness of fit. In conclusion, with inaccurately constructed and improperly reported models, understanding the results and reproducing the models for future research might be compromised.
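Collinearity of the kind discussed here is commonly screened with variance inflation factors (VIFs); a minimal sketch follows, with simulated age/height/weight data chosen to be strongly correlated (the data and the VIF > 10 rule of thumb are illustrative, not taken from the study):

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X: regress the column
    on the remaining columns (plus an intercept) and return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid.var() / target.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated paediatric data: height and weight track age closely
rng = np.random.default_rng(3)
age = rng.uniform(5, 15, size=100)
height = 80 + 7 * age + rng.normal(0, 4, size=100)
weight = -10 + 0.5 * height + rng.normal(0, 3, size=100)
vifs = vif(np.column_stack([age, height, weight]))
```

A large VIF for any predictor signals that its coefficient, sign and confidence interval are unstable, which is exactly the failure mode the authors describe.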
Montesinos-López, Abelardo; Montesinos-López, Osval A; Cuevas, Jaime; Mata-López, Walter A; Burgueño, Juan; Mondal, Sushismita; Huerta, Julio; Singh, Ravi; Autrique, Enrique; González-Pérez, Lorena; Crossa, José
2017-01-01
Modern agriculture uses hyperspectral cameras that provide hundreds of reflectance data at discrete narrow bands in many environments. These bands often cover the whole visible light spectrum and part of the infrared and ultraviolet light spectra. With the bands, vegetation indices are constructed for predicting agronomically important traits such as grain yield and biomass. However, since vegetation indices only use some wavelengths (referred to as bands), we propose using all bands simultaneously as predictor variables for the primary trait grain yield; results of several multi-environment maize (Aguate et al. in Crop Sci 57(5):1-8, 2017) and wheat (Montesinos-López et al. in Plant Methods 13(4):1-23, 2017) breeding trials indicated that using all bands produced better prediction accuracy than vegetation indices. However, until now, these prediction models have not accounted for the effects of genotype × environment (G × E) and band × environment (B × E) interactions incorporating genomic or pedigree information. In this study, we propose Bayesian functional regression models that take into account all available bands, genomic or pedigree information, the main effects of lines and environments, as well as G × E and B × E interaction effects. The data set used is comprised of 976 wheat lines evaluated for grain yield in three environments (Drought, Irrigated and Reduced Irrigation). The reflectance data were measured in 250 discrete narrow bands ranging from 392 to 851 nanometres (nm). The proposed Bayesian functional regression models were implemented using two types of basis: B-splines and Fourier. Results of the proposed Bayesian functional regression models, including all the wavelengths for predicting grain yield, were compared with results from conventional models with and without bands.
We observed that the models with B × E interaction terms were the most accurate models, whereas the functional regression models (with B-spline and Fourier bases) and the conventional models performed similarly in terms of prediction accuracy. However, the functional regression models are more parsimonious and computationally more efficient because the number of beta coefficients to be estimated is 21 (the number of basis functions), rather than the 250 regression coefficients for all bands. In this study, adding pedigree or genomic information did not increase prediction accuracy.
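A hedged sketch of the basis-expansion idea described above: projecting 250 bands onto a small Fourier basis so that only 21 coefficients need to be estimated (the data are simulated and the fit below is ordinary least squares, not the paper's Bayesian functional regression):

```python
import numpy as np

def fourier_basis(n_points, n_basis):
    """Fourier basis matrix (n_points x n_basis): a constant column,
    then sine/cosine pairs of increasing frequency."""
    t = np.linspace(0.0, 1.0, n_points)
    cols = [np.ones(n_points)]
    k = 1
    while len(cols) < n_basis:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < n_basis:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

rng = np.random.default_rng(4)
n_lines, n_bands, n_basis = 100, 250, 21
X = rng.normal(size=(n_lines, n_bands))        # simulated reflectances
Phi = fourier_basis(n_bands, n_basis)          # 250 bands -> 21 scores
Z = X @ Phi                                    # reduced design matrix
# Toy grain-yield signal carried by the first non-constant basis score
y = Z[:, 1] + 0.1 * rng.normal(size=n_lines)
beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(n_lines), Z]), y,
                           rcond=None)         # fit 22 coefficients, not 251
```

The parsimony claim in the abstract corresponds to fitting the columns of `Z` instead of the raw 250 bands; a B-spline basis would replace `fourier_basis` with a spline design matrix.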
Chapman, Benjamin P.; Weiss, Alexander; Duberstein, Paul
2016-01-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in “big data” problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different from maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how three common SLT algorithms (Supervised Principal Components, Regularization, and Boosting) can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach, or perhaps because of them, SLT methods may hold value as a statistically rigorous approach to exploratory regression. PMID:27454257
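As an illustrative sketch of the regularization approach to criterion-keyed scale construction (simulated item pool; the lasso with a cross-validated penalty stands in for the broader family of SLT algorithms discussed):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(5)
n, n_items = 300, 50
items = rng.normal(size=(n, n_items))      # simulated personality item pool
# Outcome driven by a handful of items ("criterion keying" is sparse)
y = items[:, 0] + 0.8 * items[:, 1] - 0.6 * items[:, 2] + rng.normal(0, 0.5, n)

lasso = LassoCV(cv=5).fit(items, y)        # penalty chosen by cross-validation
selected = np.flatnonzero(lasso.coef_)     # items retained in the "scale"
```

The cross-validated penalty is exactly the EPE-minimizing step the abstract describes: the penalty strength that predicts best on held-out folds is kept, trading within-sample fit for out-of-sample accuracy.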
Sung, Yao-Ting; Chen, Ju-Ling; Cha, Ji-Her; Tseng, Hou-Chiang; Chang, Tao-Hsing; Chang, Kuo-En
2015-06-01
Multilevel linguistic features have been proposed for discourse analysis, but there have been few applications of multilevel linguistic features to readability models and also few validations of such models. Most traditional readability formulae are based on generalized linear models (GLMs; e.g., discriminant analysis and multiple regression), but these models have to comply with certain statistical assumptions about data properties and include all of the data in formulae construction without pruning the outliers in advance. The use of such readability formulae tends to produce a low text classification accuracy, while using a support vector machine (SVM) in machine learning can enhance the classification outcome. The present study constructed readability models by integrating multilevel linguistic features with SVM, which is more appropriate for text classification. Taking the Chinese language as an example, this study developed 31 linguistic features as the predicting variables at the word, semantic, syntax, and cohesion levels, with grade levels of texts as the criterion variable. The study compared four types of readability models by integrating unilevel and multilevel linguistic features with GLMs and an SVM. The results indicate that adopting a multilevel approach in readability analysis provides a better representation of the complexities of both texts and the reading comprehension process.
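A minimal sketch of the SVM idea on synthetic two-class data follows: a linear SVM trained with the Pegasos sub-gradient method. This is a generic illustration of hinge-loss classification, not the kernel, features, or configuration used in the readability study.

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy "readability" task: 5 hypothetical linguistic features, labels in {-1, +1}.
n, p = 300, 5
X = rng.normal(size=(n, p))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)   # separable-ish labels

def pegasos(X, y, lam=0.01, epochs=20):
    """Primal sub-gradient solver for a linear SVM (hinge loss + L2 penalty)."""
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (X[i] @ w)
            w *= (1 - eta * lam)          # shrink toward zero (regularization)
            if margin < 1:                # hinge-loss violation: move toward example
                w += eta * y[i] * X[i]
    return w

w = pegasos(X, y)
acc = np.mean(np.sign(X @ w) == y)
print(round(acc, 3))
```

In practice a library SVM with grade levels as classes and the 31 linguistic features as inputs would replace this hand-rolled solver; the sketch only shows why a max-margin classifier needs no distributional assumptions about the predictors.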
Mathematical Modelling of Optimization of Structures of Monolithic Coverings Based on Liquid Rubbers
NASA Astrophysics Data System (ADS)
Turgumbayeva, R. Kh; Abdikarimov, M. N.; Mussabekov, R.; Sartayev, D. T.
2018-05-01
The paper considers optimization of monolithic coating compositions using a computer and MPE methods. The goal of the paper was to construct a mathematical model of the complete factorial experiment, taking into account its plan and conditions. Several regression equations were obtained. The dependence of rubber parameters on the content of the components, including the quantity of rubber crumb, was considered. An optimal composition for manufacturing monolithic coatings was recommended based on the experimental data.
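A complete factorial plan and its regression equation can be sketched as below. The three coded factors and the response values are invented for illustration; only the mechanics (a 2^3 plan in coded units, a design matrix with main effects and interactions, least-squares coefficients) follow the standard method.

```python
import numpy as np
from itertools import product

# 2^3 complete factorial plan in coded units (-1, +1) for three factors.
plan = np.array(list(product([-1, 1], repeat=3)), dtype=float)

# Hypothetical measured response at each plan point (e.g., coating strength).
y = np.array([4.1, 5.0, 3.9, 5.2, 4.8, 6.1, 4.6, 6.3])

# Design matrix: intercept, main effects, and two-factor interactions.
x1, x2, x3 = plan.T
D = np.column_stack([np.ones(8), x1, x2, x3, x1 * x2, x1 * x3, x2 * x3])

# For a 2-level orthogonal plan, least squares reduces to simple contrasts.
b, *_ = np.linalg.lstsq(D, y, rcond=None)
print(np.round(b, 3))   # regression equation: y = b0 + b1*x1 + ... + b6*x2*x3
```

Because the plan is orthogonal, each coefficient can be estimated and tested independently, which is what makes the factorial-experiment regression equations easy to interpret.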
Chahine, Teresa; Schultz, Bradley D.; Zartarian, Valerie G.; Xue, Jianping; Subramanian, SV; Levy, Jonathan I.
2011-01-01
Community-based cumulative risk assessment requires characterization of exposures to multiple chemical and non-chemical stressors, with consideration of how the non-chemical stressors may influence risks from chemical stressors. Residential radon provides an interesting case example, given its large attributable risk, effect modification due to smoking, and significant variability in radon concentrations and smoking patterns. In spite of this fact, no study to date has estimated geographic and sociodemographic patterns of both radon and smoking in a manner that would allow for inclusion of radon in community-based cumulative risk assessment. In this study, we apply multi-level regression models to explain variability in radon based on housing characteristics and geological variables, and construct a regression model predicting housing characteristics using U.S. Census data. Multi-level regression models of smoking based on predictors common to the housing model allow us to link the exposures. We estimate county-average lifetime lung cancer risks from radon ranging from 0.15 to 1.8 in 100, with high-risk clusters in areas and for subpopulations with high predicted radon and smoking rates. Our findings demonstrate the viability of screening-level assessment to characterize patterns of lung cancer risk from radon, with an approach that can be generalized to multiple chemical and non-chemical stressors. PMID:22016710
Empirical Likelihood in Nonignorable Covariate-Missing Data Problems.
Xie, Yanmei; Zhang, Biao
2017-04-20
Missing covariate data occurs often in regression analysis, which frequently arises in the health and social sciences as well as in survey sampling. We study methods for the analysis of a nonignorable covariate-missing data problem in an assumed conditional mean function when some covariates are completely observed but other covariates are missing for some subjects. We adopt the semiparametric perspective of Bartlett et al. (Improving upon the efficiency of complete case analysis when covariates are MNAR. Biostatistics 2014;15:719-30) on regression analyses with nonignorable missing covariates, in which they have introduced the use of two working models, the working probability model of missingness and the working conditional score model. In this paper, we study an empirical likelihood approach to nonignorable covariate-missing data problems with the objective of effectively utilizing the two working models in the analysis of covariate-missing data. We propose a unified approach to constructing a system of unbiased estimating equations, where there are more equations than unknown parameters of interest. One useful feature of these unbiased estimating equations is that they naturally incorporate the incomplete data into the data analysis, making it possible to seek efficient estimation of the parameter of interest even when the working regression function is not specified to be the optimal regression function. We apply the general methodology of empirical likelihood to optimally combine these unbiased estimating equations. We propose three maximum empirical likelihood estimators of the underlying regression parameters and compare their efficiencies with other existing competitors. We present a simulation study to compare the finite-sample performance of various methods with respect to bias, efficiency, and robustness to model misspecification. The proposed empirical likelihood method is also illustrated by an analysis of a data set from the US National Health and Nutrition Examination Survey (NHANES).
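The full nonignorable-missingness machinery is beyond a short sketch, but the empirical likelihood core, reweighting the sample so an estimating equation holds exactly, can be shown in the simplest case: the mean, with estimating function g_i = x_i - mu and a bisection solve for the Lagrange multiplier. The data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=2.0, scale=1.0, size=200)

def el_log_ratio(x, mu):
    """-2 * empirical log-likelihood ratio for the mean mu.

    Solves sum g_i / (1 + lam * g_i) = 0 for the multiplier lam by bisection,
    then forms the EL probability weights p_i = 1 / (n * (1 + lam * g_i)).
    """
    g = x - mu
    lo = -1.0 / g.max() + 1e-8        # keep every 1 + lam * g_i > 0
    hi = -1.0 / g.min() - 1e-8
    for _ in range(200):              # the score in lam is strictly decreasing
        lam = 0.5 * (lo + hi)
        s = np.sum(g / (1.0 + lam * g))
        lo, hi = (lam, hi) if s > 0 else (lo, lam)
    w = 1.0 / (len(x) * (1.0 + lam * g))
    return -2.0 * np.sum(np.log(len(x) * w))

r0 = el_log_ratio(x, x.mean())        # ~0 at the sample mean
r1 = el_log_ratio(x, x.mean() + 0.5)  # grows as mu moves away (Wilks-type behavior)
print(round(r0, 6), round(r1, 2))
```

With more estimating equations than parameters, as in the abstract, the same construction carries over with a vector multiplier and a numerical optimizer in place of bisection.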
Blosnich, John; Bossarte, Robert
2011-02-01
Low-level violent behavior, particularly school bullying, remains a critical public health issue that has been associated with negative mental and physical health outcomes. School-based prevention programs, while a valuable line of defense to stave off bullying, have shown inconsistent results in terms of decreasing bullying. This study explored whether school safety measures (eg, security guards, cameras, ID badges) were associated with student reports of different forms of peer victimization related to bullying. Data came from the 2007 School Crime Supplement of the National Crime Victimization Survey. Chi-square tests of independence were used to examine differences among categorical variables. Logistic regression models were constructed for the peer victimization outcomes. A count variable was constructed among the bullying outcomes (0-7) with which a Poisson regression model was constructed to analyze school safety measures' impacts on degree of victimization. Of the various school safety measures, only having adults in hallways resulted in a significant reduction in odds of being physically bullied, having property vandalized, or having rumors spread. In terms of degree of victimization, having adults and/or staff supervising hallways was associated with an approximate 26% decrease in students experiencing an additional form of peer victimization. Results indicated that school safety measures overall were not associated with decreased reports of low-level violent behaviors related to bullying. More research is needed to further explore what best promotes comprehensive safety in schools. © 2011, American School Health Association.
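A Poisson regression for a victimization count, as used above, can be fitted with a few lines of Newton-Raphson (IRLS). The predictors and coefficients here are hypothetical stand-ins (binary indicators loosely analogous to hallway supervision and cameras), not the survey's variables.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
# Design: intercept plus two hypothetical binary safety-measure indicators.
X = np.column_stack([np.ones(n),
                     rng.integers(0, 2, n),
                     rng.integers(0, 2, n)]).astype(float)
beta_true = np.array([0.5, -0.3, 0.05])
y = rng.poisson(np.exp(X @ beta_true))     # count outcome (e.g., 0-7 forms)

def poisson_irls(X, y, iters=25):
    """Newton-Raphson for log-link Poisson regression (variance = mean)."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ b)
        grad = X.T @ (y - mu)
        hess = X.T @ (X * mu[:, None])
        b = b + np.linalg.solve(hess, grad)
    return b

b = poisson_irls(X, y)
print(np.round(b, 2))
# exp(b[1]) is the rate ratio for the first indicator; a value of ~0.74 would
# correspond to the ~26% decrease in victimization forms reported above.
```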
Wasserkampf, A; Silva, M N; Santos, I C; Carraça, E V; Meis, J J M; Kremers, S P J; Teixeira, P J
2014-12-01
This study analyzed psychosocial predictors of the Theory of Planned Behavior (TPB) and Self-Determination Theory (SDT) and evaluated their associations with short- and long-term moderate plus vigorous physical activity (MVPA) and lifestyle physical activity (PA) outcomes in women who underwent a weight-management program. 221 participants (age 37.6 ± 7.02 years) completed a 12-month SDT-based lifestyle intervention and were followed-up for 24 months. Multiple linear regression analyses tested associations between psychosocial variables and self-reported short- and long-term PA outcomes. Regression analyses showed that control constructs of both theories were significant determinants of short- and long-term MVPA, whereas affective and self-determination variables were strong predictors of short- and long-term lifestyle PA. Regarding short-term prediction models, TPB constructs were stronger in predicting MVPA, whereas SDT was more effective in predicting lifestyle PA. For long-term models, both forms of PA were better predicted by SDT in comparison to TPB. These results highlight the importance of comparing health behavior theories to identify the mechanisms involved in the behavior change process. Control and competence constructs are crucial during early adoption of structured PA behaviors, whereas affective and intrinsic sources of motivation are more involved in incidental types of PA, particularly in relation to behavioral maintenance. © The Author 2014. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Asghari, Mehdi Poursheikhali; Hayatshahi, Sayyed Hamed Sadat; Abdolmaleki, Parviz
2012-01-01
From both the structural and functional points of view, β-turns play important biological roles in proteins. In the present study, a novel two-stage hybrid procedure has been developed to identify β-turns in proteins. Binary logistic regression was initially used, for the first time, to select significant sequence parameters for the identification of β-turns via a re-substitution test procedure. The sequence parameters consisted of 80 amino acid positional occurrences and 20 amino acid percentages in the sequence. Among these parameters, the most significant ones selected by the binary logistic regression model were the percentages of Gly and Ser and the occurrence of Asn in position i+2, respectively. These significant parameters have the strongest effect on the constitution of a β-turn sequence. A neural network model was then constructed and fed with the parameters selected by binary logistic regression to build a hybrid predictor. The networks were trained and tested on a non-homologous dataset of 565 protein chains. Applying a nine-fold cross-validation test on the dataset, the network reached an overall accuracy (Qtotal) of 74%, which is comparable with the results of other β-turn prediction methods. In conclusion, this study shows that the parameter selection ability of binary logistic regression together with the prediction capability of neural networks leads to the development of more precise models for identifying β-turns in proteins. PMID:27418910
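The first stage of such a hybrid, screening parameters with binary logistic regression before handing the survivors to a neural network, can be sketched as below. The ten features and two planted signals are synthetic placeholders for the 100 sequence parameters; selection uses Wald z-scores from an IRLS fit.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 400, 10
X = rng.normal(size=(n, p))
# Only features 0 and 3 carry signal (stand-ins for, e.g., %Gly and %Ser).
logits = 1.2 * X[:, 0] - 0.9 * X[:, 3]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

def logistic_irls(X, y, iters=25):
    """Newton-Raphson for binary logistic regression, returning Wald z-scores."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p_hat = 1.0 / (1.0 + np.exp(-(X @ b)))
        W = p_hat * (1 - p_hat)
        hess = X.T @ (X * W[:, None])
        b = b + np.linalg.solve(hess, X.T @ (y - p_hat))
    cov = np.linalg.inv(hess)            # asymptotic covariance of b
    return b, b / np.sqrt(np.diag(cov))  # coefficients and z-scores

b, z = logistic_irls(X, y)
selected = np.argsort(-np.abs(z))[:2]    # keep the most significant parameters
print(sorted(int(i) for i in selected))  # [0, 3]
```

The selected columns would then become the (much smaller) input layer of the second-stage neural network.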
Liu, Zhenqiu; Sun, Fengzhu; Braun, Jonathan; McGovern, Dermot P B; Piantadosi, Steven
2015-04-01
Identifying disease-associated taxa and constructing networks of bacterial interactions are two important tasks usually studied separately. In reality, differentiation of disease-associated taxa and correlation among taxa may affect each other: one genus can be differentiated because it is highly correlated with another highly differentiated one. In addition, network structures may vary under different clinical conditions. Permutation tests are commonly used to detect differences between networks in distinct phenotypes, and they are time-consuming. In this manuscript, we propose a multilevel regularized regression method to simultaneously identify taxa and construct networks. We also extend the framework to allow construction of a common network and a differentiated network together. An efficient algorithm with a dual formulation is developed to handle the large-scale n ≪ m problem, with a large number of taxa (m) and a small number of samples (n). The proposed method is regularized with a general Lp (p ∈ [0, 2]) penalty and models the effects of taxa abundance differentiation and correlation jointly. We demonstrate that it can identify both true and biologically significant genera and network structures. Software MLRR in MATLAB is available at http://biostatistics.csmc.edu/mlrr/. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
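The n ≪ m regime with an Lp penalty can be illustrated with the p = 1 special case: lasso by coordinate descent on synthetic data with 50 samples and 200 "taxa", only three of which matter. The dimensions and coefficients are invented; this is the generic sparse-regression building block, not the paper's MLRR algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 50, 200                     # far fewer samples than taxa
X = rng.normal(size=(n, m))
beta_true = np.zeros(m)
beta_true[[3, 7, 11]] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.1 * rng.normal(size=n)

def lasso_cd(X, y, lam, iters=200):
    """Coordinate descent for L1-penalized least squares (soft-thresholding)."""
    n, m = X.shape
    b = np.zeros(m)
    col_sq = (X ** 2).sum(axis=0)
    r = y.copy()                   # running residual y - X @ b
    for _ in range(iters):
        for j in range(m):
            r += X[:, j] * b[j]    # remove feature j's current contribution
            rho = X[:, j] @ r
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

b = lasso_cd(X, y, lam=5.0)
support = np.flatnonzero(np.abs(b) > 0.05)
print(support)                     # the three planted taxa survive the penalty
```

For the joint taxa-plus-network estimation described above, the penalty would act on regression and partial-correlation parameters together, but the per-coordinate update is the same idea.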
Assessment of Cognitively Stimulating Activity in a Spanish Population.
Morales Ortiz, Manuel; Fernández, Aaron
2018-05-01
Theoretical models of active ageing and cognitive reserve emphasize the importance of leading an active life to delay age-related cognitive deterioration and maintain good levels of well-being and personal satisfaction in the elderly. The objective of this research was to construct a scale to measure cognitively stimulating activities (CSA) in the Spanish language. The sample consisted of a total of 453 older persons. The scale was constructed from a list of 28 items and validated using structural equation models. The scale obtained showed a negative correlation with age and a positive correlation with education and physical activity. Using hierarchical regression models, CSAs were found to have a significant effect on attention when controlling for the effect of age and education. Likewise, a significant interaction between age and CSA was found on the measure of episodic memory. The validated CSA scale will enable the relationships between changes in cognitive functions and stimulating activities to be studied.
Enhanced index tracking modeling in portfolio optimization with mixed-integer programming approach
NASA Astrophysics Data System (ADS)
Siew, Lam Weng; Jaaman, Saiful Hafizah Hj.; Ismail, Hamizun bin
2014-09-01
Enhanced index tracking is a popular form of portfolio management in stock market investment. It aims to construct an optimal portfolio that generates excess return over the return achieved by the stock market index, without purchasing all of the stocks that make up the index. The objective of this paper is to construct an optimal portfolio using a mixed-integer programming model that adopts a regression approach in order to generate a higher portfolio mean return than the stock market index return. In this study, the data consist of 24 component stocks of the Malaysian market index, the FTSE Bursa Malaysia Kuala Lumpur Composite Index, from January 2010 until December 2012. The results show that the optimal portfolio of the mixed-integer programming model is able to generate a higher mean return than the FTSE Bursa Malaysia Kuala Lumpur Composite Index while selecting only 30% of the total stock market index components.
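The cardinality-constrained regression at the heart of such a model can be sketched without a MIP solver: for a toy universe small enough to enumerate, pick the K-stock subset whose least-squares fit tracks the index with minimal residual error. Returns here are simulated; a real instance (24 stocks, as above) would use a branch-and-bound MIP solver instead of brute force.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
T, N, K = 120, 8, 3                       # 120 periods, 8 stocks, pick K of them
R = rng.normal(0.01, 0.05, size=(T, N))   # simulated component stock returns
index_ret = R.mean(axis=1)                # equal-weight proxy for the index

def tracking_fit(R_sub, target):
    """Regress the index on a stock subset; return weights and tracking error."""
    w, *_ = np.linalg.lstsq(R_sub, target, rcond=None)
    resid = target - R_sub @ w
    return w, float(resid @ resid)

# Enumerate all C(8, 3) subsets; a MIP encodes the same search with binaries.
err, picks = min((tracking_fit(R[:, list(c)], index_ret)[1], c)
                 for c in combinations(range(N), K))
print(picks, round(err, 4))
```

Enhanced tracking would add an excess-return term to the objective; the subset-selection structure is unchanged.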
Defining the student burnout construct: a structural analysis from three burnout inventories.
Maroco, João; Campos, Juliana Alvares Duarte Bonini
2012-12-01
College student burnout has been assessed mainly with the Maslach burnout inventory (MBI). However, the construct's definition and measurement with MBI has drawn several criticisms and new inventories have been suggested for the evaluation of the syndrome. A redefinition of the construct of student burnout is proposed by means of a structural equation model, reflecting burnout as a second order factor defined by factors from the MBI-student survey (MBI-SS); the Copenhagen burnout inventory-student survey (CBI-SS) and the Oldenburg burnout inventory-student survey (OLBI-SS). Standardized regression weights from Burnout to Exhaustion and Cynicism from the MBI-SS scale, personal burnout and studies related burnout from the CBI, and exhaustion and disengagement from OLBI, show that these factors are strong manifestations of students' burnout. For college students, the burnout construct is best defined by two dimensions described as "physical and psychological exhaustion" and "cynicism and disengagement".
Mitigation of two pyrethroid insecticides in a Mississippi Delta constructed wetland.
Moore, M T; Cooper, C M; Smith, S; Cullum, R F; Knight, S S; Locke, M A; Bennett, E R
2009-01-01
Constructed wetlands are a suggested best management practice to help mitigate agricultural runoff before entering receiving aquatic ecosystems. A constructed wetland system (180 m x 30 m), comprising a sediment retention basin and two treatment cells, was used to determine the fate and transport of simulated runoff containing the pyrethroid insecticides lambda-cyhalothrin and cyfluthrin, as well as suspended sediment. Wetland water, sediment, and plant samples were collected spatially and temporally over 55 d. Results showed 49 and 76% of the study's measured lambda-cyhalothrin and cyfluthrin masses were associated with vegetation, respectively. Based on conservative effects concentrations for invertebrates and regression analyses of maximum observed wetland aqueous concentrations, a wetland length of 215 m x 30 m width would be required to adequately mitigate 1% pesticide runoff from a 14 ha contributing area. Results of this experiment can be used to model future design specifications for constructed wetland mitigation of pyrethroid insecticides.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jiangjiang; Li, Weixuan; Lin, Guang
In decision-making for groundwater management and contamination remediation, it is important to accurately evaluate the probability of the occurrence of a failure event. For small failure probability analysis, a large number of model evaluations are needed in the Monte Carlo (MC) simulation, which is impractical for CPU-demanding models. One approach to alleviate the computational cost caused by the model evaluations is to construct a computationally inexpensive surrogate model instead. However, using a surrogate approximation can cause an extra error in the failure probability analysis. Moreover, constructing accurate surrogates is challenging for high-dimensional models, i.e., models containing many uncertain input parameters. To address these issues, we propose an efficient two-stage MC approach for small failure probability analysis in high-dimensional groundwater contaminant transport modeling. In the first stage, a low-dimensional representation of the original high-dimensional model is sought with Karhunen–Loève expansion and sliced inverse regression jointly, which allows for the easy construction of a surrogate with polynomial chaos expansion. Then a surrogate-based MC simulation is implemented. In the second stage, the small number of samples that are close to the failure boundary are re-evaluated with the original model, which corrects the bias introduced by the surrogate approximation. The proposed approach is tested with a numerical case study and is shown to be 100 times faster than the traditional MC approach in achieving the same level of estimation accuracy.
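The two-stage idea, surrogate everywhere, true model only near the failure boundary, can be sketched in two dimensions. The "expensive" model, the polynomial surrogate, the threshold, and the band width are all hypothetical; a real application would use the Karhunen-Loève/sliced-inverse-regression reduction and a polynomial chaos surrogate as described above.

```python
import numpy as np

rng = np.random.default_rng(8)

def true_model(x):
    """Stand-in for an expensive simulator; failure when output > threshold."""
    return x[:, 0] ** 3 + 2.0 * x[:, 1] + 0.5 * np.sin(3 * x[:, 0])

threshold = 6.0
X_mc = rng.normal(size=(200_000, 2))            # full Monte Carlo sample

# Stage 1: cheap polynomial surrogate fitted on a small design set.
X_tr = rng.normal(size=(200, 2))
y_tr = true_model(X_tr)

def features(x):
    return np.column_stack([np.ones(len(x)), x[:, 0], x[:, 1],
                            x[:, 0] ** 2, x[:, 0] * x[:, 1], x[:, 1] ** 2,
                            x[:, 0] ** 3])

c, *_ = np.linalg.lstsq(features(X_tr), y_tr, rcond=None)
y_surr = features(X_mc) @ c                     # surrogate prediction, all samples

# Stage 2: re-run the true model only for samples near the failure boundary.
band = np.abs(y_surr - threshold) < 1.0
y_corr = y_surr.copy()
y_corr[band] = true_model(X_mc[band])

p_fail = np.mean(y_corr > threshold)
print(p_fail, band.mean())                      # few true-model calls were needed
```

Samples far from the boundary cannot change classification under a bounded surrogate error, so only the small band fraction pays the full model cost.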
Development of Interpretable Predictive Models for BPH and Prostate Cancer.
Bermejo, Pablo; Vivo, Alicia; Tárraga, Pedro J; Rodríguez-Montes, J A
2015-01-01
Traditional methods for deciding whether to recommend a patient for a prostate biopsy are based on cut-off levels of stand-alone markers such as prostate-specific antigen (PSA) or any of its derivatives. However, in the last decade we have seen the increasing use of predictive models that combine, in a non-linear manner, several predictors and are better able to predict prostate cancer (PC), but these fail to help the clinician to distinguish between PC and benign prostate hyperplasia (BPH) patients. We construct two new models that are capable of predicting both PC and BPH. An observational study was performed on 150 patients with PSA ≥3 ng/mL and age >50 years. We built a decision tree and a logistic regression model, validated with the leave-one-out methodology, in order to predict PC or BPH, or reject both. Statistical dependence with PC and BPH was found for prostate volume (P-value < 0.001), PSA (P-value < 0.001), international prostate symptom score (IPSS; P-value < 0.001), digital rectal examination (DRE; P-value < 0.001), age (P-value < 0.002), antecedents (P-value < 0.006), and meat consumption (P-value < 0.08). The two predictive models that were constructed selected a subset of these, namely, volume, PSA, DRE, and IPSS, obtaining an area under the ROC curve (AUC) between 72% and 80% for both PC and BPH prediction. PSA and volume together help to build predictive models that accurately distinguish among PC, BPH, and patients without any of these pathologies. Our decision tree and logistic regression models outperform the AUC obtained in the compared studies. Using these models as decision support, the number of unnecessary biopsies might be significantly reduced.
Huen, Jenny M Y; Ip, Brian Y T; Ho, Samuel M Y; Yip, Paul S F
2015-01-01
The present study investigated whether hope and hopelessness are better conceptualized as a single construct of bipolar spectrum or two distinct constructs and whether hope can moderate the relationship between hopelessness and suicidal ideation. Hope, hopelessness, and suicidal ideation were measured in a community sample of 2106 participants through a population-based household survey. Confirmatory factor analyses showed that a measurement model with separate, correlated second-order factors of hope and hopelessness provided a good fit to the data and was significantly better than that of the model collapsing hope and hopelessness into a single second-order factor. Negative binomial regression showed that hope and hopelessness interacted such that the effect of hopelessness on suicidal ideation was lower in individuals with higher hope than individuals with lower hope. Hope and hopelessness are two distinct but correlated constructs. Hope can act as a resilience factor that buffers the impact of hopelessness on suicidal ideation. Inducing hope in people may be a promising avenue for suicide prevention.
Didarloo, Alireza; Nabilou, Bahram; Khalkhali, Hamid Reza
2017-11-03
Breast cancer is a life-threatening condition affecting women around the world. The early detection of breast lumps using a breast self-examination (BSE) is important for the prevention and control of this disease. The aim of this study was to examine BSE behavior and its predictive factors among female university students using the Health Belief Model (HBM). This investigation was a cross-sectional survey carried out with 334 female students at Urmia University of Medical Sciences in the northwest of Iran. To collect the necessary data, researchers applied a valid and reliable three-part questionnaire. The data were analyzed using descriptive statistics and a chi-square test, in addition to multivariate logistic regression statistics in SPSS software version 16.0 (SPSS Inc., Chicago, IL, USA). The results indicated that 82 of the 334 participants (24.6%) reported practicing BSEs. Multivariate logistic regression analyses showed that high perceived severity [OR = 2.38, 95% CI = (1.02-5.54)], high perceived benefits [OR = 1.94, 95% CI = (1.09-3.46)], and high perceived self-efficacy [OR = 13.15, 95% CI = (3.64-47.51)] were better predictors of BSE behavior (P < 0.05) than low perceived severity, benefits, and self-efficacy. The findings also showed that a high level of knowledge compared to a low level of knowledge [OR = 5.51, 95% CI = (1.79-16.86)] and academic undergraduate and graduate degrees compared to doctoral degrees [OR = 2.90, 95% CI = (1.42-5.92)] of the participants were predictors of BSE performance (P < 0.05). The study revealed that the HBM constructs are able to predict BSE behavior. Among these constructs, self-efficacy was the most important predictor of the behavior. Interventions based on the constructs of perceived self-efficacy, benefits, and severity are recommended for increasing women's regular screening for breast cancer.
NASA Astrophysics Data System (ADS)
Mansoor Gorgees, Hazim; Hilal, Mariam Mohammed
2018-05-01
Fatigue cracking is one of the common types of pavement distress and an indicator of structural failure; cracks allow moisture infiltration, increase roughness, and may further deteriorate into potholes. Causes of pavement deterioration include traffic loading, environmental influences, drainage deficiencies, materials quality problems, construction deficiencies, and external contributors. Many researchers have built models containing variables such as asphalt content, asphalt viscosity, fatigue life, stiffness of the asphalt mixture, temperature, and other parameters that affect fatigue life. For this situation, a fuzzy linear regression model was employed and analyzed using the traditional methods and our proposed method in order to overcome the multi-collinearity problem. The total spread error was used as a criterion to compare the performance of the studied methods. A simulation program was used to obtain the required results.
Sato, Takako; Zaitsu, Kei; Tsuboi, Kento; Nomura, Masakatsu; Kusano, Maiko; Shima, Noriaki; Abe, Shuntaro; Ishii, Akira; Tsuchihashi, Hitoshi; Suzuki, Koichi
2015-05-01
Estimation of postmortem interval (PMI) is an important goal in judicial autopsy. Although many approaches can estimate PMI through physical findings and biochemical tests, accurate PMI calculation by these conventional methods remains difficult because PMI is readily affected by surrounding conditions, such as ambient temperature and humidity. In this study, Sprague-Dawley (SD) rats (10 weeks) were sacrificed by suffocation, and blood was collected by dissection at various time intervals (0, 3, 6, 12, 24, and 48 h; n = 6) after death. A total of 70 endogenous metabolites were detected in plasma by gas chromatography-tandem mass spectrometry (GC-MS/MS). Each time group was separated from each other on the principal component analysis (PCA) score plot, suggesting that the various endogenous metabolites changed with time after death. To prepare a prediction model of a PMI, a partial least squares (or projection to latent structure, PLS) regression model was constructed using the levels of significantly different metabolites determined by variable importance in the projection (VIP) score and the Kruskal-Wallis test (P < 0.05). Because the constructed PLS regression model could successfully predict each PMI, this model was validated with another validation set (n = 3). In conclusion, plasma metabolic profiling demonstrated its ability to successfully estimate PMI under a certain condition. This result can be considered to be the first step for using the metabolomics method in future forensic casework.
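A PLS regression like the one above can be written compactly with the NIPALS algorithm. The metabolite matrix below is simulated (a single loading pattern drifting with log time plus noise), standing in for the 70-metabolite GC-MS/MS panel; the target is log-transformed PMI.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 36, 70                          # 6 PMIs x 6 animals; 70 metabolites
t_pmi = np.repeat([0, 3, 6, 12, 24, 48], 6).astype(float)
load = rng.normal(size=p)              # hypothetical metabolite loading pattern
X = np.outer(np.log1p(t_pmi), load) + 0.5 * rng.normal(size=(n, p))
y = np.log1p(t_pmi)

def pls1_fit(X, y, n_comp=2):
    """NIPALS PLS1: extract components maximizing covariance with y,
    deflating X and y after each component; returns in-sample fitted values."""
    Xc, yc = X - X.mean(0), y - y.mean()
    y_fit = np.full(len(y), y.mean())
    for _ in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)         # weight vector (direction of max covariance)
        t = Xc @ w                     # scores
        p_load = Xc.T @ t / (t @ t)    # X loadings
        q = (yc @ t) / (t @ t)         # y loading
        y_fit = y_fit + q * t
        Xc -= np.outer(t, p_load)      # deflate
        yc -= q * t
    return y_fit

y_fit = pls1_fit(X, y)
r2 = 1 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)
print(round(r2, 3))
```

Unlike ordinary least squares, the fit is well defined even with p = 70 predictors and n = 36 samples, which is why PLS suits metabolomics panels.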
Dong, Xiuwen Sue; Wang, Xuanwen; Largay, Julie A.
2015-01-01
Background: Many factors contribute to occupational injuries. However, these factors have been compartmentalized and isolated in most studies. Objective: To examine the relationship between work-related injuries and multiple occupational and non-occupational factors among construction workers in the USA. Methods: Data from the 1988–2000 National Longitudinal Survey of Youth, 1979 cohort (N = 12,686) were analyzed. Job exposures and health behaviors were examined and used as independent variables in four multivariate logistic regression models to identify associations with occupational injuries. Results: After controlling for demographic variables, occupational injuries were 18% (95% CI: 1.04–1.34) more likely in construction than in non-construction. Blue-collar occupations, job physical efforts, multiple jobs, and long working hours accounted for the escalated risk in construction. Smoking, obesity/overweight, and cocaine use significantly increased the risk of work-related injury when demographics and occupational factors were held constant. Conclusions: Workplace injuries are better explained by simultaneously examining occupational and non-occupational characteristics. PMID:25816923
Event-based total suspended sediment particle size distribution model
NASA Astrophysics Data System (ADS)
Thompson, Jennifer; Sattar, Ahmed M. A.; Gharabaghi, Bahram; Warner, Richard C.
2016-05-01
One of the most challenging modelling tasks in hydrology is prediction of the total suspended sediment particle size distribution (TSS-PSD) in stormwater runoff generated from exposed soil surfaces at active construction sites and surface mining operations. The main objective of this study is to employ gene expression programming (GEP) and artificial neural networks (ANN) to develop a new model with the ability to more accurately predict the TSS-PSD by taking advantage of both event-specific and site-specific factors in the model. To compile the data for this study, laboratory scale experiments using rainfall simulators were conducted on fourteen different soils to obtain TSS-PSD. This data is supplemented with field data from three construction sites in Ontario over a period of two years to capture the effect of transport and deposition within the site. The combined data sets provide a wide range of key overlooked site-specific and storm event-specific factors. Both parent soil and TSS-PSD in runoff are quantified by fitting each to a lognormal distribution. Compared to existing regression models, the developed model more accurately predicted the TSS-PSD using a more comprehensive list of key model input parameters. Employment of the new model will increase the efficiency of deployment of required best management practices, designed based on TSS-PSD, to minimize potential adverse effects of construction site runoff on aquatic life in the receiving watercourses.
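Both the parent soil and the runoff TSS-PSD above are summarized by lognormal fits. A minimal sketch of that step, with simulated particle diameters rather than the study's sieve data, reduces a distribution to two parameters (geometric mean and geometric standard deviation) and evaluates a percent-finer value from the fitted curve.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(10)
# Hypothetical suspended-sediment particle diameters in micrometres.
d = rng.lognormal(mean=np.log(20.0), sigma=0.8, size=5000)

# Lognormal PSD summary: geometric mean and geometric standard deviation.
log_d = np.log(d)
gm = np.exp(log_d.mean())      # geometric mean diameter
gsd = np.exp(log_d.std())      # geometric standard deviation (dimensionless)

# Fraction finer than 10 um implied by the fitted lognormal CDF.
frac_finer_10 = 0.5 * (1 + erf((np.log(10.0) - log_d.mean())
                               / (log_d.std() * sqrt(2))))
print(round(gm, 1), round(gsd, 2), round(frac_finer_10, 3))
```

A GEP or ANN model such as the one described can then predict these two runoff-PSD parameters directly from storm and site inputs, instead of predicting the whole distribution.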
de Vries, Natalie Jane; Carlson, Jamie; Moscato, Pablo
2014-01-01
Online consumer behavior in general, and online customer engagement with brands in particular, has become a major focus of research activity, fuelled by the exponential increase of interactive functions of the internet and social media platforms and applications. Current research in this area is mostly hypothesis-driven, and much debate about the concept of Customer Engagement and its related constructs remains in the literature. In this paper, we propose a novel methodology for reverse engineering a consumer behavior model for online customer engagement, based on a computational and data-driven perspective. This methodology could be generalized and prove useful for future research in the field of consumer behavior using questionnaire data, or for studies investigating other types of human behavior. The method we propose contains five main stages: symbolic regression analysis, graph building, community detection, evaluation of results and, finally, investigation of directed cycles and common feedback loops. The 'communities' of questionnaire items that emerge from our community detection method form possible 'functional constructs' inferred from data rather than assumed from literature and theory. Our results show consistent partitioning of questionnaire items into such 'functional constructs', suggesting that the method proposed here could be adopted as a new data-driven way of modeling human behavior.
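The graph-building and community-detection stages can be sketched on synthetic questionnaire data: eight items generated from two latent factors, an item graph built by thresholding absolute correlations, and connected components read off as candidate "functional constructs". Real applications would use symbolic-regression links and a modularity-based community algorithm rather than this deliberately simple version.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 500
# Hypothetical questionnaire: items 0-3 and 4-7 load on two latent factors.
f1, f2 = rng.normal(size=(2, n))
items = np.column_stack([f1 + 0.5 * rng.normal(size=n) for _ in range(4)] +
                        [f2 + 0.5 * rng.normal(size=n) for _ in range(4)])

# Graph building: connect items whose absolute correlation exceeds a threshold.
C = np.corrcoef(items.T)
adj = (np.abs(C) > 0.5) & ~np.eye(8, dtype=bool)

def components(adj):
    """Connected components by depth-first search over the adjacency matrix."""
    seen, comps = set(), []
    for s in range(len(adj)):
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(np.flatnonzero(adj[u]))
        seen |= comp
        comps.append(sorted(int(v) for v in comp))
    return comps

print(components(adj))   # → [[0, 1, 2, 3], [4, 5, 6, 7]]
```

The two recovered components match the generating factors, which is the sense in which such communities can stand in for constructs inferred from data rather than assumed from theory.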
Zoellner, Jamie M.; Porter, Kathleen J.; Chen, Yvonnes; Hedrick, Valisa E.; You, Wen; Hickman, Maja; Estabrooks, Paul A.
2017-01-01
Objective Guided by the theory of planned behaviour (TPB) and health literacy concepts, SIPsmartER is a six-month multicomponent intervention effective at improving sugar-sweetened beverage (SSB) behaviours. Using SIPsmartER data, this study explores prediction of SSB behavioural intention (BI) and behaviour from TPB constructs using: (1) cross-sectional and prospective models and (2) 11 single-item assessments from interactive voice response (IVR) technology. Design Quasi-experimental design, including pre- and post-outcome data and repeated-measures process data of 155 intervention participants. Main Outcome Measures Validated multi-item TPB measures, single-item TPB measures, and self-reported SSB behaviours. Hypothesised relationships were investigated using correlation and multiple regression models. Results TPB constructs explained 32% of the variance cross-sectionally and 20% prospectively in BI; and explained 13–20% of variance cross-sectionally and 6% prospectively in behaviour. Single-item scale models were significant, yet explained less variance. All IVR models predicting BI (average 21%, range 6–38%) and behaviour (average 30%, range 6–55%) were significant. Conclusion Findings are interpreted in the context of other cross-sectional, prospective and experimental TPB health and dietary studies. Findings advance experimental application of the TPB, including understanding constructs at outcome and process time points and applying theory in all intervention development, implementation and evaluation phases. PMID:28165771
Impact of grade separator on pedestrian risk taking behavior.
Khatoon, Mariya; Tiwari, Geetam; Chatterjee, Niladri
2013-01-01
Pedestrians on Delhi roads are often exposed to high risks. This is because the basic needs of pedestrians are not recognized as a part of the urban transport infrastructure improvement projects in Delhi. Rather, an ever increasing number of cars and motorized two-wheelers encourage the construction of large numbers of flyovers/grade separators to facilitate signal-free movement for motorized vehicles, exposing pedestrians to greater risk. This paper describes the statistical analysis of pedestrian risk taking behavior while crossing the road, before and after the construction of a grade separator at an intersection of Delhi. A significant number of pedestrians are willing to take risks in both the before and after situations. The results indicate that the absence of signals makes pedestrians behave independently, leading to increased variability in their risk taking behavior. Variability in the speeds of all categories of vehicles has increased after the construction of grade separators. After the construction of the grade separator, the waiting time of pedestrians at the starting point of crossing has increased, and the correlation between waiting times and gaps accepted by pedestrians shows that after a certain time of waiting, pedestrians become impatient and accept a smaller gap size to cross the road. A logistic regression model is fitted by assuming that the probability of road crossing by pedestrians depends on the gap size (in s) between pedestrian and conflicting vehicles, sex, age, type of pedestrians (single or in a group) and type of conflicting vehicles. The results of the logistic regression show that before the construction of the grade separator the probability of road crossing by the pedestrian depends only on the gap size parameter; however, after the construction of the grade separator, other parameters become significant in determining pedestrian risk taking behavior. Copyright © 2012 Elsevier Ltd. All rights reserved.
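The gap-acceptance part of the logistic model described above can be sketched in a few lines. The intercept and gap-size coefficient below are hypothetical values chosen for illustration, not the paper's fitted estimates:

```python
import math

def crossing_probability(gap_s, b0=-3.0, b1=0.9):
    """Logistic model of the probability that a pedestrian accepts a gap.

    b0 (intercept) and b1 (coefficient per second of gap) are
    illustrative placeholders, not the study's estimates.
    """
    z = b0 + b1 * gap_s
    return 1.0 / (1.0 + math.exp(-z))

# Larger gaps between pedestrian and conflicting vehicle should raise
# the predicted probability of crossing.
p_small = crossing_probability(2.0)   # 2 s gap
p_large = crossing_probability(6.0)   # 6 s gap
```

With any positive gap coefficient, the model reproduces the qualitative finding that longer gaps are more likely to be accepted.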
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Kyong Ju, E-mail: kjkim@cau.ac.kr; Yun, Won Gun, E-mail: ogun78@naver.com; Cho, Namho, E-mail: nhc51@cau.ac.kr
The recent rise in global concern for environmental issues such as global warming and air pollution is accentuating the need for environmental assessments in the construction industry. Promptly evaluating the environmental loads of the various design alternatives during the early stages of a construction project and adopting the most environmentally sustainable candidate is therefore of great importance. Yet, research on the early evaluation of a construction project's environmental load in order to aid the decision making process is hitherto lacking. In light of this dilemma, this study proposes a model for estimating the environmental load by employing only the most basic information accessible during the early design phases of a project for the pre-stressed concrete (PSC) beam bridge, the most common bridge structure. Firstly, a life cycle assessment (LCA) was conducted on the data from 99 bridges by integrating the bills of quantities (BOQ) with a life cycle inventory (LCI) database. The processed data was then utilized to construct a case based reasoning (CBR) model for estimating the environmental load. The accuracy of the estimation model was then validated using five test cases; the model's mean absolute error rate (MAER) for the total environmental load was calculated as 7.09%. These test results were shown to be superior to those obtained from a multiple-regression based model and a slab area base-unit analysis model. Henceforth, application of this model during the early stages of a project is expected to highly complement environmentally friendly design and construction by facilitating the swift evaluation of the environmental load from multiple standpoints. - Highlights: • This study develops a model for assessing environmental impacts based on LCA. • Bills of quantities from completed PSC beam designs were linked with the LCI DB. • Previous cases were used to estimate the environmental load of a new case by the CBR model. • The CBR model produces more accurate estimations (7.09% MAER) than other conventional models. • This study supports the decision making process in the early stage of a new construction case.
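The retrieval step of a case-based reasoning estimator, and the MAER metric used to validate it, can be sketched as follows. The feature vectors and load values are invented stand-ins for the bridge attributes and LCA results, not the paper's data:

```python
def cbr_estimate(new_case, past_cases):
    """Return the environmental load of the most similar past case.

    Each case is (features, load); similarity here is plain Euclidean
    distance on the feature vector - a simplified stand-in for the
    paper's CBR model, with hypothetical attribute choices.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    best = min(past_cases, key=lambda c: dist(c[0], new_case))
    return best[1]

def maer(actual, predicted):
    """Mean absolute error rate, the validation metric in the abstract."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Toy cases: (span in m, number of girders) -> environmental load.
cases = [((30.0, 2), 1500.0), ((45.0, 3), 2300.0), ((60.0, 4), 3100.0)]
pred = cbr_estimate((44.0, 3), cases)   # nearest case is (45.0, 3)
error = maer([2300.0], [pred])
```

Real CBR systems weight attributes and adapt the retrieved solution; this sketch shows only the nearest-neighbour retrieval idea.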
Connections between survey calibration estimators and semiparametric models for incomplete data
Lumley, Thomas; Shaw, Pamela A.; Dai, James Y.
2012-01-01
Survey calibration (or generalized raking) estimators are a standard approach to the use of auxiliary information in survey sampling, improving on the simple Horvitz–Thompson estimator. In this paper we relate the survey calibration estimators to the semiparametric incomplete-data estimators of Robins and coworkers, and to adjustment for baseline variables in a randomized trial. The development based on calibration estimators explains the ‘estimated weights’ paradox and provides useful heuristics for constructing practical estimators. We present some examples of using calibration to gain precision without making additional modelling assumptions in a variety of regression models. PMID:23833390
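The simplest calibration estimator, post-stratification, adjusts design weights so that weighted group counts match known population totals. A minimal sketch with invented numbers (the general raking case iterates this over several margins):

```python
from collections import defaultdict

def calibrate_weights(base_weights, groups, pop_totals):
    """Scale Horvitz-Thompson weights within each group so weighted
    group counts equal known population totals (post-stratification,
    the simplest survey calibration estimator)."""
    wsum = defaultdict(float)
    for w, g in zip(base_weights, groups):
        wsum[g] += w
    return [w * pop_totals[g] / wsum[g] for w, g in zip(base_weights, groups)]

# Two groups; the base weights under-cover group "b" (true total 30, not 20).
w = calibrate_weights([10, 10, 10, 10], ["a", "a", "b", "b"], {"a": 20, "b": 30})
```

After calibration the weighted totals match the auxiliary information exactly, which is the source of the precision gain the paper discusses.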
Aghamolaei, Teamur; Sadat Tavafian, Sedigheh; Madani, Abdoulhossain
2012-09-01
This study aimed to apply the conceptual framework of the theory of planned behavior (TPB) to explain fish consumption in a sample of people who lived in Bandar Abbass, Iran. We investigated the role of three traditional constructs of TPB that included attitude, social norms, and perceived behavioral control in an effort to characterize the intention to consume fish as well as the behavioral trends that characterize fish consumption. Data were derived from a cross-sectional sample of 321 subjects. Alpha coefficient correlation and linear regression analysis were applied to test the relationships between constructs. The predictors of fish consumption frequency were also evaluated. Multiple regression analysis revealed that attitude, subjective norms, and perceived behavioral control significantly predicted intention to eat fish (R2 = 0.54, F = 128.4, P < 0.001). Multiple regression analysis for the intention to eat fish and perceived behavioral control revealed that both factors significantly predicted fish consumption frequency (R2 = 0.58, F = 223.1, P < 0.001). The results indicated that the models fit well with the data. Attitude, subjective norms, and perceived behavioral control all had significant positive impacts on behavioral intention. Moreover, both intention and perceived behavioral control could be used to predict the frequency of fish consumption.
Singh, Kunwar P; Gupta, Shikha; Rai, Premanjali
2013-09-01
The research aims to develop global modeling tools capable of categorizing structurally diverse chemicals in various toxicity classes according to the EEC and European Community directives, and to predict their acute toxicity in fathead minnow using a set of selected molecular descriptors. Accordingly, artificial intelligence approach based classification and regression models, such as probabilistic neural networks (PNN), generalized regression neural networks (GRNN), multilayer perceptron neural network (MLPN), radial basis function neural network (RBFN), support vector machines (SVM), gene expression programming (GEP), and decision tree (DT) were constructed using the experimental toxicity data. Diversity and non-linearity in the chemicals' data were tested using the Tanimoto similarity index and Brock-Dechert-Scheinkman statistics. Predictive and generalization abilities of the various models constructed here were compared using several statistical parameters. PNN and GRNN models performed relatively better than MLPN, RBFN, SVM, GEP, and DT. In both two- and four-category classifications, PNN yielded a considerably high accuracy of classification in training (95.85 percent and 90.07 percent) and validation data (91.30 percent and 86.96 percent), respectively. GRNN rendered a high correlation between the measured and model predicted -log LC50 values both for the training (0.929) and validation (0.910) data and low prediction errors (RMSE) of 0.52 and 0.49 for the two sets. Efficiency of the selected PNN and GRNN models in predicting acute toxicity of new chemicals was adequately validated using external datasets of different fish species (fathead minnow, bluegill, trout, and guppy). The PNN and GRNN models showed good predictive and generalization abilities and can be used as tools for predicting toxicities of structurally diverse chemical compounds. Copyright © 2013 Elsevier Inc. All rights reserved.
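A GRNN prediction is essentially a Gaussian kernel-weighted average of the training targets (the Nadaraya-Watson form). A one-descriptor sketch with toy values, not the paper's toxicity descriptors:

```python
import math

def grnn_predict(x, train_x, train_y, sigma=1.0):
    """Generalized regression neural network output: a Gaussian
    kernel-weighted average of training targets. The descriptor values
    and smoothing parameter sigma are illustrative only."""
    weights = [math.exp(-((x - xi) ** 2) / (2 * sigma ** 2)) for xi in train_x]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)

# Toy one-descriptor set: the target rises linearly with the descriptor.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0, 4.0]
y_hat = grnn_predict(1.5, xs, ys, sigma=0.5)   # symmetric neighbours -> 2.5
```

The single smoothing parameter sigma is what makes GRNN fast to tune compared to backpropagation-trained networks.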
Kasprzyk, Danuta; Tshimanga, Mufuta; Hamilton, Deven T; Gorn, Gerald J; Montaño, Daniel E
2018-02-01
Male circumcision (MC) significantly reduces HIV acquisition among men, leading WHO/UNAIDS to recommend that high HIV and low MC prevalence countries circumcise 80% of adolescents and men aged 15-49. Despite significant investment to increase MC capacity, only 27% of the goal has been achieved in Zimbabwe. To increase adoption, research to create evidence-based messages is greatly needed. The Integrated Behavioral Model (IBM) was used to investigate factors affecting MC motivation among adolescents. Based on qualitative elicitation study results, a survey was designed and administered to a representative sample of 802 adolescent boys aged 13-17 in two urban and two rural areas in Zimbabwe. Multiple regression analysis found all six IBM constructs (2 attitude, 2 social influence, 2 personal agency) significantly explained MC intention (R2 = 0.55). Stepwise regression analysis of beliefs underlying each IBM belief-based construct found 9 behavioral, 6 injunctive norm, 2 descriptive norm, 5 efficacy, and 8 control beliefs significantly explained MC intention. A final stepwise regression of all the significant IBM construct beliefs identified 12 key beliefs best explaining intention. Similar analyses were carried out with subgroups of adolescents by urban-rural location and age. Different sets of behavioral, normative, efficacy, and control beliefs were significant for each sub-group. This study demonstrates the application of theory-driven research to identify evidence-based targets for the design of effective MC messages for interventions to increase adolescents' motivation. Incorporating these findings into communication campaigns is likely to improve demand for MC.
Policy Implications and Suggestions on Administrative Measures of Urban Flood
NASA Astrophysics Data System (ADS)
Lee, S. V.; Lee, M. J.; Lee, C.; Yoon, J. H.; Chae, S. H.
2017-12-01
The frequency and intensity of floods are increasing worldwide as recent climate change progresses gradually. Flood management should be policy-oriented in urban municipalities because urban areas are particularly subject to heavy damage. Therefore, the purpose of this study is to prepare a flood susceptibility map by using data mining models and to make policy suggestions on administrative measures for urban floods. To this end, we constructed a spatial database by collecting relevant factors including the topography, geology, soil and land use data of a representative city, Seoul, the capital city of Korea. The flood susceptibility map was constructed by applying the data mining models of random forest and boosted tree to the input data and existing flooded area data in 2010. The susceptibility map has been validated using the 2011 flood area data, which was not used for training. The predictor importance value of each factor to the results was calculated in this process. The distance from the water, DEM and geology showed high predictor importance values, which marks them as a high priority for flood preparation policy. As a result of receiver operating characteristic (ROC) analysis, the random forest model showed 78.78% and 79.18% accuracy for regression and classification, and the boosted tree model showed 77.55% and 77.26% accuracy for regression and classification, respectively. The results show that the flood susceptibility maps can be applied to flood prevention and management, and they can also help determine the priority areas for flood mitigation policy by providing useful information to policy makers.
[Developing a predictive model for the caregiver strain index].
Álvarez-Tello, Margarita; Casado-Mejía, Rosa; Praena-Fernández, Juan Manuel; Ortega-Calvo, Manuel
Home care of patients with multiple morbidities is increasingly common. The caregiver strain index is a tool in the form of a questionnaire that is designed to measure the perceived burden of those who care for their family members. The aim of this study is to construct a diagnostic nomogram of informal caregiver burden using data from a predictive model. The model was drawn up using binary logistic regression with the questionnaire items as dichotomous factors. The dependent variable was the final score obtained with the questionnaire, categorised in accordance with the literature. Scores between 0 and 6 were labelled as "no" (no caregiver stress) and scores at or greater than 7 as "yes". The R statistical software, version 3.1.1, was used. To construct confidence intervals for the ROC curve, 2000 bootstrap replicates were used. A sample of 67 caregivers was obtained. A diagnostic nomogram was constructed together with its calibration graph (scaled Brier score = 0.686, Nagelkerke R2 = 0.791) and the corresponding ROC curve (area under the curve = 0.962). The predictive model generated using binary logistic regression and the nomogram contain four items (1, 4, 5 and 9) of the questionnaire. R plotting functions allow a very good solution for validating a model like this. The area under the ROC curve (0.96; 95% CI: 0.941-0.994) achieves a high discriminative value. Calibration also shows high goodness-of-fit values, suggesting that the model may be clinically useful in community nursing and geriatric establishments. Copyright © 2015 SEGG. Publicado por Elsevier España, S.L.U. All rights reserved.
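The area under the ROC curve reported above can be computed directly as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counting half). A minimal sketch with invented scores, not the study's data:

```python
def roc_auc(scores_pos, scores_neg):
    """AUC via the rank (Mann-Whitney) interpretation: the fraction of
    positive/negative score pairs in which the positive wins, with
    ties counted as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy predicted probabilities for stressed vs non-stressed caregivers.
auc = roc_auc([0.9, 0.8, 0.7], [0.6, 0.8, 0.2])
```

Bootstrap confidence intervals, as used in the study, would repeat this computation over resampled score sets.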
NASA Astrophysics Data System (ADS)
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach in the context of low-rank separated representation to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov Chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternative least-square regression with Tikhonov regularization using a roughening matrix computing the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation linearly depends on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
Development of a hybrid model to predict construction and demolition waste: China as a case study.
Song, Yiliao; Wang, Yong; Liu, Feng; Zhang, Yixin
2017-01-01
Construction and demolition waste (C&DW) is currently a worldwide issue, and the situation is worst in China due to a rapid increase in the construction industry and the short life span of China's buildings. To create an opportunity out of this problem, comprehensive prevention measures and effective management strategies are urgently needed. One major gap in the literature on waste management is a lack of estimates of future C&DW generation. Therefore, this paper presents a forecasting procedure for C&DW in China that can forecast the quantity of each component in such waste. The proposed approach is based on a GM-SVR model that improves the forecasting effectiveness of the gray model (GM), which is achieved by adjusting the residual series with a support vector regression (SVR) method and a transition matrix that aims to estimate the discharge of each component in the C&DW. Through the proposed method, future C&DW volumes are listed and analyzed, including their potential components and distribution across different provinces in China. In addition, the model testing process provides mathematical evidence that the proposed model is an effective way to supply policy makers with information on future C&DW. Copyright © 2016 Elsevier Ltd. All rights reserved.
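The GM(1,1) grey model underlying the hybrid can be sketched as follows; the SVR residual-correction stage of the GM-SVR method is omitted here, and the series is an illustrative growth sequence, not the paper's C&DW data:

```python
import math

def gm11_forecast(x0, steps=1):
    """Grey model GM(1,1): accumulate the series, fit x0[k] = -a*z1[k] + b
    by least squares, solve the whitened equation, and difference back.
    This sketch omits the SVR residual correction of the GM-SVR hybrid."""
    n = len(x0)
    x1 = [sum(x0[: i + 1]) for i in range(n)]                # accumulated series
    z1 = [0.5 * (x1[i] + x1[i - 1]) for i in range(1, n)]    # background values
    # Least-squares fit of x0[k] = -a*z1[k] + b for k = 1..n-1.
    sz = sum(z1); szz = sum(z * z for z in z1)
    sy = sum(x0[1:]); szy = sum(z * y for z, y in zip(z1, x0[1:]))
    m = n - 1
    a = -(m * szy - sz * sy) / (m * szz - sz * sz)
    b = (sy + a * sz) / m
    # Whitened-equation solution, differenced back to the original scale.
    def x1_hat(k):
        return (x0[0] - b / a) * math.exp(-a * k) + b / a
    return [x1_hat(n + s) - x1_hat(n + s - 1) for s in range(steps)]

series = [100.0, 110.0, 121.0, 133.1, 146.41]   # ~10% growth per period
next_val = gm11_forecast(series)[0]
```

For near-exponential series like this one, the plain grey model is already close; the hybrid's SVR stage models the remaining residual pattern.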
Lozano, Oscar M; Rojas, Antonio J; Pérez, Cristino; González-Sáiz, Francisco; Ballesta, Rosario; Izaskun, Bilbao
2008-05-01
The aim of this work is to show evidence of the validity of the Health-Related Quality of Life for Drug Abusers Test (HRQoLDA Test). This test was developed to measure specific HRQoL for drug abusers, within the theoretical addiction framework of the biaxial model. The sample comprised 138 patients diagnosed with opiate drug dependence. In this study, the following constructs and variables of the biaxial model were measured: severity of dependence, physical health status, psychological adjustment and substance consumption. Results indicate that the HRQoLDA Test scores are related to dependency and consumption-related problems. Multiple regression analysis reveals that HRQoL can be predicted from drug dependence, physical health status and psychological adjustment. These results contribute empirical evidence of the theoretical relationships established between HRQoL and the biaxial model, and they support the interpretation of the HRQoLDA Test to measure HRQoL in drug abusers, thus providing a test to measure this specific construct in this population.
NASA Astrophysics Data System (ADS)
Ouyang, Qi; Lu, Wenxi; Lin, Jin; Deng, Wenbing; Cheng, Weiguo
2017-08-01
The surrogate-based simulation-optimization techniques are frequently used for optimal groundwater remediation design. When this technique is used, surrogate errors caused by surrogate-modeling uncertainty may lead to generation of infeasible designs. In this paper, a conservative strategy that pushes the optimal design into the feasible region was used to address surrogate-modeling uncertainty. In addition, chance-constrained programming (CCP) was adopted to compare with the conservative strategy in addressing this uncertainty. Three methods, multi-gene genetic programming (MGGP), Kriging (KRG) and support vector regression (SVR), were used to construct surrogate models for a time-consuming multi-phase flow model. To improve the performance of the surrogate model, ensemble surrogates were constructed based on combinations of different stand-alone surrogate models. The results show that: (1) the surrogate-modeling uncertainty was successfully addressed by the conservative strategy, which means that this method is promising for addressing surrogate-modeling uncertainty. (2) The ensemble surrogate model that combines MGGP with KRG showed the most favorable performance, which indicates that this ensemble surrogate can utilize both stand-alone surrogate models to improve the performance of the surrogate model.
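One common way to build an ensemble surrogate is to average stand-alone surrogates with weights inversely proportional to their validation errors. The two component models below are toy stand-ins for MGGP and Kriging, not the paper's multi-phase flow surrogates:

```python
def ensemble_predict(x, surrogates, val_errors):
    """Inverse-error-weighted ensemble of stand-alone surrogate models:
    better-validated surrogates get larger weights."""
    inv = [1.0 / e for e in val_errors]
    weights = [w / sum(inv) for w in inv]
    return sum(w * s(x) for w, s in zip(weights, surrogates))

mggp_like = lambda x: 2.0 * x          # toy stand-in for an MGGP surrogate
krg_like = lambda x: 2.0 * x + 0.4     # toy stand-in for a Kriging surrogate
# Validation RMSEs 0.1 vs 0.3 give the first surrogate weight 0.75.
y = ensemble_predict(3.0, [mggp_like, krg_like], val_errors=[0.1, 0.3])
```

More elaborate schemes choose the weights by minimizing ensemble error on a validation set, but the inverse-error heuristic captures the core idea.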
Chan, Kwun Chuen Gary; Wang, Mei-Cheng
2017-01-01
Recurrent event processes with marker measurements are largely studied with forward-time models starting from an initial event. Interestingly, the processes could exhibit important terminal behavior during a time period before occurrence of the failure event. A natural and direct way to study recurrent events prior to a failure event is to align the processes using the failure event as the time origin and to examine the terminal behavior by a backward time model. This paper studies regression models for backward recurrent marker processes by counting time backward from the failure event. A three-level semiparametric regression model is proposed for jointly modeling the time to a failure event, the backward recurrent event process, and the marker observed at the time of each backward recurrent event. The first level is a proportional hazards model for the failure time, the second level is a proportional rate model for the recurrent events occurring before the failure event, and the third level is a proportional mean model for the marker given the occurrence of a recurrent event backward in time. By jointly modeling the three components, estimating equations can be constructed for marked counting processes to estimate the target parameters in the three-level regression models. Large sample properties of the proposed estimators are studied and established. The proposed models and methods are illustrated by a community-based AIDS clinical trial to examine the terminal behavior of frequencies and severities of opportunistic infections among HIV infected individuals in the last six months of life.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rupšys, P.
A system of stochastic differential equations (SDEs) with mixed-effects parameters and a multivariate normal copula density function was used to develop a tree height model for Scots pine trees in Lithuania. A two-step maximum likelihood parameter estimation method is used and computational guidelines are given. After fitting the conditional probability density functions to outside-bark diameter at breast height and total tree height, a bivariate normal copula distribution model was constructed. Predictions from the mixed-effects parameters SDE tree height model calculated during this research were compared to regression tree height equations. The results are implemented in the symbolic computational language MAPLE.
NASA Astrophysics Data System (ADS)
Kim, Kyeong; Berezhnoy, Alexey; Wöhler, Christian; Grumpe, Arne; Rodriguez, Alexis; Hasebe, Nobuyuki; Van Gasselt, Stephan
2016-07-01
Using Kaguya GRS data, we investigated the Si distribution on the Moon, based on a study of the 4934 keV Si gamma ray peak caused by interaction between thermal neutrons and lunar Si-28 atoms. A Si peak analysis for a grid of 10 degrees in longitude and latitude was accomplished by the IRAP Aquarius program, followed by a correction for altitude and thermal neutron density. A spectral parameter based regression model of the Si distribution was built for latitudes between 60°S and 60°N based on the continuum slopes, band depths, widths and minimum wavelengths of the absorption bands near 1 μm and 2 μm. Based on these regression models, a nearly global cpm (counts per minute) map of Si with a resolution of 20 pixels per degree was constructed. The construction of a nearly global map of lunar Si abundances has been achieved by a combination of regression-based analysis of KGRS cpm data and M^3 spectral reflectance data, and it has been calibrated with respect to returned sample-based wt% values. The Si abundances estimated with our method systematically exceed those of the LP GRS Si data set but are consistent with typical Si abundances of lunar basalt samples (in the maria) and feldspathic mineral samples (in the highlands). Our Si map shows that the Si abundance values on the Moon are typically between 17 and 28 wt%. The obtained Si map will provide an important aspect in both understanding the distribution of minerals and the evolution of the lunar surface since its formation.
Chapman, Benjamin P; Weiss, Alexander; Duberstein, Paul R
2016-12-01
Statistical learning theory (SLT) is the statistical formulation of machine learning theory, a body of analytic methods common in "big data" problems. Regression-based SLT algorithms seek to maximize predictive accuracy for some outcome, given a large pool of potential predictors, without overfitting the sample. Research goals in psychology may sometimes call for high dimensional regression. One example is criterion-keyed scale construction, where a scale with maximal predictive validity must be built from a large item pool. Using this as a working example, we first introduce a core principle of SLT methods: minimization of expected prediction error (EPE). Minimizing EPE is fundamentally different than maximizing the within-sample likelihood, and hinges on building a predictive model of sufficient complexity to predict the outcome well, without undue complexity leading to overfitting. We describe how such models are built and refined via cross-validation. We then illustrate how 3 common SLT algorithms-supervised principal components, regularization, and boosting-can be used to construct a criterion-keyed scale predicting all-cause mortality, using a large personality item pool within a population cohort. Each algorithm illustrates a different approach to minimizing EPE. Finally, we consider broader applications of SLT predictive algorithms, both as supportive analytic tools for conventional methods, and as primary analytic tools in discovery phase research. We conclude that despite their differences from the classic null-hypothesis testing approach-or perhaps because of them-SLT methods may hold value as a statistically rigorous approach to exploratory regression. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
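The EPE-minimization principle described above can be illustrated with leave-one-out cross-validation comparing two candidate models of different complexity. The data and models below are toy examples, not the personality-scale analysis:

```python
def loo_mse(fit, data):
    """Leave-one-out estimate of expected prediction error (EPE) for a
    model-fitting function fit(train) -> predict."""
    errs = []
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        predict = fit(train)
        x, y = data[i]
        errs.append((y - predict(x)) ** 2)
    return sum(errs) / len(errs)

def fit_mean(train):
    """Simplest model: predict the training mean everywhere."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m

def fit_line(train):
    """One step up in complexity: ordinary least-squares line."""
    n = len(train)
    sx = sum(x for x, _ in train); sy = sum(y for _, y in train)
    sxx = sum(x * x for x, _ in train); sxy = sum(x * y for x, y in train)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return lambda x: a + b * x

# Near-linear toy data: the line should show much lower estimated EPE.
data = [(0.0, 0.1), (1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.0)]
epe_mean = loo_mse(fit_mean, data)
epe_line = loo_mse(fit_line, data)
```

Choosing the model with the lower cross-validated error is the out-of-sample criterion that distinguishes SLT methods from maximizing within-sample likelihood.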
Efficient robust doubly adaptive regularized regression with applications.
Karunamuni, Rohana J; Kong, Linglong; Tu, Wei
2018-01-01
We consider the problem of estimation and variable selection for general linear regression models. Regularized regression procedures have been widely used for variable selection, but most existing methods perform poorly in the presence of outliers. We construct a new penalized procedure that simultaneously attains full efficiency and maximum robustness. Furthermore, the proposed procedure satisfies the oracle properties. The new procedure is designed to achieve sparse and robust solutions by imposing adaptive weights on both the decision loss and the penalty function. The proposed method of estimation and variable selection attains full efficiency when the model is correct and, at the same time, achieves maximum robustness when outliers are present. We examine the robustness properties using the finite-sample breakdown point and an influence function. We show that the proposed estimator attains the maximum breakdown point. Furthermore, there is no loss in efficiency when there are no outliers or the error distribution is normal. For practical implementation of the proposed method, we present a computational algorithm. We examine the finite-sample and robustness properties using Monte Carlo studies. Two datasets are also analyzed.
Prediction of strontium bromide laser efficiency using cluster and decision tree analysis
NASA Astrophysics Data System (ADS)
Iliev, Iliycho; Gocheva-Ilieva, Snezhana; Kulin, Chavdar
2018-01-01
The subject of investigation is a new high-powered strontium bromide (SrBr2) vapor laser emitting in a multiline region of wavelengths. The laser is an alternative to atomic strontium lasers and free electron lasers, especially at the 6.45 μm line, which is used in surgery for processing biological tissues and bones with minimal damage. In this paper the experimental data from measurements of operational and output characteristics of the laser are statistically processed by means of cluster analysis and tree-based regression techniques. The aim is to extract from the available data the more important relationships and dependences that influence the increase of overall laser efficiency. A set of cluster models is constructed and analyzed. It is shown, using different cluster methods, that the seven investigated operational characteristics (laser tube diameter, length, supplied electrical power, and others) and laser efficiency are combined into two clusters. The regression tree models built using the Classification and Regression Trees (CART) technique yield dependences that predict efficiency values, and in particular the maximum efficiency, with over 95% accuracy.
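One growing step of a CART regression tree scans candidate thresholds on a predictor and keeps the split minimizing total within-node squared error. A minimal sketch with invented operational-characteristic/efficiency pairs, not the measured SrBr2 data:

```python
def best_split(xs, ys):
    """One CART step: find the threshold on a single predictor that
    minimizes the summed within-node squared error of the two children."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

# Toy data with two regimes: low values of the predictor give low
# efficiency, high values give high efficiency.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [0.5, 0.6, 0.55, 2.0, 2.1, 1.9]
threshold, cost = best_split(xs, ys)
```

A full CART implementation applies this search recursively over all predictors and then prunes the grown tree.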
Dinç, Erdal; Ozdemir, Abdil
2005-01-01
A multivariate chromatographic calibration technique was developed for the quantitative analysis of binary mixtures of enalapril maleate (EA) and hydrochlorothiazide (HCT) in tablets in the presence of losartan potassium (LST). The mathematical algorithm of the multivariate chromatographic calibration technique is based on the use of linear regression equations constructed using the relationship between concentration and peak area at a five-wavelength set. The algorithm of this mathematical calibration model, having a simple mathematical content, is briefly described. This approach is a powerful mathematical tool for optimum chromatographic multivariate calibration and for the elimination of fluctuations coming from instrumental and experimental conditions. This multivariate chromatographic calibration involves the reduction of multivariate linear regression functions to a univariate data set. The validation of the model was carried out by analyzing various synthetic binary mixtures and using the standard addition technique. The developed calibration technique was applied to the analysis of real pharmaceutical tablets containing EA and HCT. The obtained results were compared with those obtained by a classical HPLC method. It was observed that the proposed multivariate chromatographic calibration gives better results than classical HPLC.
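At a single wavelength, the calibration step reduces to fitting a least-squares line of peak area against concentration and inverting it to read concentrations from measured areas. A sketch with made-up standards (area = 50 + 20·conc), not the paper's chromatographic data:

```python
def fit_calibration(concs, areas):
    """Least-squares calibration line area = a + b*conc for one
    wavelength, returned as an inverse function that converts a
    measured peak area into a concentration."""
    n = len(concs)
    sx = sum(concs); sy = sum(areas)
    sxx = sum(c * c for c in concs)
    sxy = sum(c * A for c, A in zip(concs, areas))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return lambda area: (area - a) / b

# Synthetic standards lying exactly on area = 50 + 20*conc.
to_conc = fit_calibration([1.0, 2.0, 4.0, 8.0], [70.0, 90.0, 130.0, 210.0])
c = to_conc(150.0)   # expected concentration 5.0
```

The multivariate method in the abstract repeats this over a five-wavelength set and combines the resulting equations.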
Predicting local field potentials with recurrent neural networks.
Kim, Louis; Harer, Jacob; Rangamani, Akshay; Moran, James; Parks, Philip D; Widge, Alik; Eskandar, Emad; Dougherty, Darin; Chin, Sang Peter
2016-08-01
We present a Recurrent Neural Network using LSTM (Long Short-Term Memory) that is capable of modeling and predicting Local Field Potentials. We train and test the network on real data recorded from epilepsy patients. We construct networks that predict multi-channel LFPs for 1, 10, and 100 milliseconds forward in time. Our results show that prediction using LSTM outperforms regression when predicting 10 and 100 milliseconds forward in time.
Failure of Standard Training Sets in the Analysis of Fast-Scan Cyclic Voltammetry Data.
Johnson, Justin A; Rodeberg, Nathan T; Wightman, R Mark
2016-03-16
The use of principal component regression, a multivariate calibration method, in the analysis of in vivo fast-scan cyclic voltammetry data allows for separation of overlapping signal contributions, permitting evaluation of the temporal dynamics of multiple neurotransmitters simultaneously. To accomplish this, the technique relies on information about current-concentration relationships across the scan-potential window gained from analysis of training sets. The ability of the constructed models to resolve analytes depends critically on the quality of these data. Recently, the use of standard training sets obtained under conditions other than those of the experimental data collection (e.g., with different electrodes, animals, or equipment) has been reported. This study evaluates the analyte resolution capabilities of models constructed using this approach from both a theoretical and experimental viewpoint. A detailed discussion of the theory of principal component regression is provided to inform this discussion. The findings demonstrate that the use of standard training sets leads to misassignment of the current-concentration relationships across the scan-potential window. This directly results in poor analyte resolution and, consequently, inaccurate quantitation, which may lead to erroneous conclusions being drawn from experimental data. Thus, it is strongly advocated that training sets be obtained under the experimental conditions to allow for accurate data analysis.
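The projection-then-regression idea behind principal component regression can be illustrated with a minimal one-component sketch: center the data, find the leading principal axis (here by power iteration), and regress the response on the component scores. This is a generic toy example in Python, not the voltammetry pipeline or training-set procedure the abstract evaluates; all data are hypothetical.

```python
# Minimal principal component regression (PCR) sketch with one
# component found by power iteration. Illustrative only.

def pcr_fit(X, y, iters=200):
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    # Covariance matrix (p x p) of the centered data
    C = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / n for b in range(p)]
         for a in range(p)]
    # Power iteration for the leading eigenvector (first principal axis)
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Scores on the first component, then univariate OLS of y on scores
    t = [sum(Xc[i][j] * v[j] for j in range(p)) for i in range(n)]
    my = sum(y) / n
    beta = sum(ti * (yi - my) for ti, yi in zip(t, y)) / sum(ti * ti for ti in t)
    return means, v, beta, my

def pcr_predict(model, x):
    means, v, beta, my = model
    t = sum((x[j] - means[j]) * v[j] for j in range(len(x)))
    return my + beta * t

# Toy data: two collinear predictors, response proportional to their sum
X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [0.0, 4.0, 8.0, 12.0]
model = pcr_fit(X, y)
print(round(pcr_predict(model, [4.0, 4.0]), 2))  # → 16.0
```

The quality of the fitted loadings and scores is exactly what the paper argues degrades when training sets come from different electrodes or equipment.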
Zhang, Y J; Xue, F X; Bai, Z P
2017-03-06
The impact of maternal air pollution exposure on offspring health has received much attention. Precise and feasible exposure estimation is particularly important for clarifying exposure-response relationships and reducing heterogeneity among studies. Temporally-adjusted land use regression (LUR) models are exposure assessment methods developed in recent years that have the advantage of high spatial-temporal resolution. Studies on the health effects of outdoor air pollution exposure during pregnancy have increasingly been carried out using this model. In China, research applying LUR models has mostly remained at the model construction stage, and findings from related epidemiological studies have rarely been reported. In this paper, the sources of heterogeneity and the research progress of meta-analyses on the associations between air pollution and adverse pregnancy outcomes are analyzed. The methodological characteristics of temporally-adjusted LUR models are introduced. Current epidemiological studies on adverse pregnancy outcomes that applied this model are systematically summarized. Recommendations for the development and application of LUR models in China are presented. This will encourage the implementation of more valid exposure predictions during pregnancy in large-scale epidemiological studies on the health effects of air pollution in China.
Chiu, Herng-Chia; Ho, Te-Wei; Lee, King-Teh; Chen, Hong-Yaw; Ho, Wen-Hsien
2013-01-01
The aims of this study were, first, to compare significant predictors of mortality for hepatocellular carcinoma (HCC) patients undergoing resection between artificial neural network (ANN) and logistic regression (LR) models and, second, to evaluate the predictive accuracy of ANN and LR in survival estimation models for different years. We constructed a prognostic model for 434 patients with 21 potential input variables using a Cox regression model. Model performance was measured by the number of significant predictors and by predictive accuracy. The results indicated that ANN had double to triple the number of significant predictors in the 1-, 3-, and 5-year survival models compared with the LR models. Scores for accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUROC) of the 1-, 3-, and 5-year survival estimation models using ANN were superior to those of LR in all the training sets and most of the validation sets. The study demonstrated that ANN not only identified a greater number of significant mortality predictors but also provided more accurate prediction compared with conventional methods. It is suggested that physicians consider using data mining methods as supplemental tools for clinical decision-making and prognostic evaluation. PMID:23737707
Gaussian Process Regression (GPR) Representation in Predictive Model Markup Language (PMML)
Lechevalier, D.; Ak, R.; Ferguson, M.; Law, K. H.; Lee, Y.-T. T.; Rachuri, S.
2017-01-01
This paper describes Gaussian process regression (GPR) models presented in predictive model markup language (PMML). PMML is an extensible-markup-language (XML) -based standard language used to represent data-mining and predictive analytic models, as well as pre- and post-processed data. The previous PMML version, PMML 4.2, did not provide capabilities for representing probabilistic (stochastic) machine-learning algorithms that are widely used for constructing predictive models taking the associated uncertainties into consideration. The newly released PMML version 4.3, which includes the GPR model, provides new features: confidence bounds and distribution for the predictive estimations. Both features are needed to establish the foundation for uncertainty quantification analysis. Among various probabilistic machine-learning algorithms, GPR has been widely used for approximating a target function because of its capability of representing complex input and output relationships without predefining a set of basis functions, and predicting a target output with uncertainty quantification. GPR is being employed to various manufacturing data-analytics applications, which necessitates representing this model in a standardized form for easy and rapid employment. In this paper, we present a GPR model and its representation in PMML. Furthermore, we demonstrate a prototype using a real data set in the manufacturing domain. PMID:29202125
Hodyna, Diana; Kovalishyn, Vasyl; Rogalsky, Sergiy; Blagodatnyi, Volodymyr; Petko, Kirill; Metelytsia, Larisa
2016-09-01
Predictive QSAR models for inhibitors of B. subtilis and Ps. aeruginosa among imidazolium-based ionic liquids were developed using literature data. The regression QSAR models were created with artificial neural network and k-nearest neighbor procedures. The classification QSAR models were constructed using the WEKA random forest (RF) method. The predictive ability of the models was tested by fivefold cross-validation, giving q(2) = 0.77-0.92 for the regression models and an accuracy of 83-88% for the classification models. Twenty synthesized samples of 1,3-dialkylimidazolium ionic liquids with predicted levels of antimicrobial activity were evaluated. Among the asymmetric 1,3-dialkylimidazolium ionic liquids, only compounds containing at least one radical with an alkyl chain length of 12 carbon atoms showed high antibacterial activity. However, the activity of the symmetric 1,3-dialkylimidazolium salts was found to have the opposite relationship with the length of the aliphatic radical, being maximal for compounds based on the 1,3-dioctylimidazolium cation. The experimental results suggest that classification QSAR models are more accurate for predicting the activity of new imidazolium-based ILs as potential antibacterials. © 2016 John Wiley & Sons A/S.
Quantifying discrimination of Framingham risk functions with different survival C statistics.
Pencina, Michael J; D'Agostino, Ralph B; Song, Linye
2012-07-10
Cardiovascular risk prediction functions offer an important diagnostic tool for clinicians and patients themselves. They are usually constructed with the use of parametric or semi-parametric survival regression models. It is essential to be able to evaluate the performance of these models, preferably with summaries that offer natural and intuitive interpretations. The concept of discrimination, popular in the logistic regression context, has been extended to survival analysis. However, the extension is not unique. In this paper, we define discrimination in survival analysis as the model's ability to separate those with longer event-free survival from those with shorter event-free survival within some time horizon of interest. This definition remains consistent with that used in logistic regression, in the sense that it assesses how well the model-based predictions match the observed data. Practical and conceptual examples and numerical simulations are employed to examine four C statistics proposed in the literature to evaluate the performance of survival models. We observe that they differ in the numerical values and aspects of discrimination that they capture. We conclude that the index proposed by Harrell is the most appropriate to capture discrimination described by the above definition. We suggest researchers report which C statistic they are using, provide a rationale for their selection, and be aware that comparing different indices across studies may not be meaningful. Copyright © 2012 John Wiley & Sons, Ltd.
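Harrell's index, which the paper recommends, can be computed directly from its definition: among usable pairs (pairs in which the shorter follow-up time ended in an observed event), count how often the model assigns the higher risk to the subject who failed earlier. The sketch below is a generic toy implementation on hypothetical data, not the authors' simulation code.

```python
# Toy computation of Harrell's C-index for survival data: event times,
# event indicators (1 = event, 0 = censored), and model risk scores.

def harrell_c(times, events, risks):
    concordant = tied = usable = 0
    n = len(times)
    for i in range(n):
        for j in range(i + 1, n):
            # Order the pair so subject a has the shorter follow-up
            a, b = (i, j) if times[i] < times[j] else (j, i)
            if times[a] == times[b] or not events[a]:
                continue  # pair not usable (tied times, or earlier time censored)
            usable += 1
            if risks[a] > risks[b]:
                concordant += 1
            elif risks[a] == risks[b]:
                tied += 1
    return (concordant + 0.5 * tied) / usable

# Hypothetical cohort: perfectly ranked risks give C = 1
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.5, 0.1]
print(harrell_c(times, events, risks))  # → 1.0
```

Note that the choice of which pairs count as "usable" under censoring is exactly where the four C statistics compared in the paper diverge.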
Multiple Regression Analysis of mRNA-miRNA Associations in Colorectal Cancer Pathway
Wang, Fengfeng; Wong, S. C. Cesar; Chan, Lawrence W. C.; Cho, William C. S.; Yip, S. P.; Yung, Benjamin Y. M.
2014-01-01
Background. MicroRNA (miRNA) is a short, endogenous RNA molecule that regulates posttranscriptional gene expression. It is an important factor in the tumorigenesis of colorectal cancer (CRC) and a potential biomarker for the diagnosis, prognosis, and therapy of CRC. Our objective is to identify the related miRNAs and their associations with genes frequently involved in CRC microsatellite instability (MSI) and chromosomal instability (CIN) signaling pathways. Results. A regression model was adopted to identify the miRNAs significantly associated with a set of candidate genes frequently involved in colorectal cancer MSI and CIN pathways. Multiple linear regression analysis was used to construct the model and find the significant mRNA-miRNA associations. We identified three significantly associated mRNA-miRNA pairs: BCL2 was positively associated with miR-16 and SMAD4 was positively associated with miR-567 in the CRC tissue, while MSH6 was positively associated with miR-142-5p in the normal tissue. As for the whole models, the BCL2 and SMAD4 models were not significant, while the MSH6 model was significant. The significant associations differed between the normal and the CRC tissues. Conclusion. Our results lay a solid foundation for exploring novel CRC mechanisms and identifying miRNA roles as oncomirs or tumor-suppressor miRs in CRC. PMID:24895601
Weaver, Brian Thomas; Fitzsimons, Kathleen; Braman, Jerrod; Haut, Roger
2016-09-01
The goal of the current study was to expand on previous work to validate the use of pressure insole technology in conjunction with linear regression models to predict the free torque generated at the shoe-surface interface while wearing different athletic shoes. Three distinctly different shoe designs were utilised. The stiffness of each shoe was determined with a materials testing machine. Six participants wore each shoe, fitted with an insole pressure measurement device, and performed rotation trials on an embedded force plate. A pressure sensor mask was constructed from those sensors having a high linear correlation with free torque values. Linear regression models were developed to predict free torques from these pressure sensor data. The models predicted their own shoe's free torque accurately (RMS error 3.72 ± 0.74 Nm), but not that of the other shoes (RMS error 10.43 ± 3.79 Nm). Models performing self-prediction were also able to detect differences in shoe stiffness. The results of the current study show the need for participant- and shoe-specific linear regression models to ensure high prediction accuracy of free torques from pressure sensor data during isolated internal and external rotations of the body with respect to a planted foot.
Bernardo, Allan B I
2013-12-01
Two studies explore whether general beliefs about the social world, or social axioms, may be antecedents of dispositional hope. Social axioms are generalized cognitive representations that provide frames for constructing individuals' hope-related cognitions. Considering social axioms' instrumental and ego-defensive functions, two social axioms, social cynicism and reward for application, are hypothesized to be negative and positive predictors of hope, respectively. Study 1 used multiple regression analysis to test the hypothesis. Study 2 used structural equation modeling to test a model with one pathway linking reward for application to hope and another pathway linking social cynicism to hope, mediated by self-esteem. The results are discussed in terms of extending the range of psychological constructs and processes that foster the development of hope. © 2013 The Scandinavian Psychological Associations.
NASA Astrophysics Data System (ADS)
Holburn, E. R.; Bledsoe, B. P.; Poff, N. L.; Cuhaciyan, C. O.
2005-05-01
Using over 300 R/EMAP sites in OR and WA, we examine the relative explanatory power of watershed, valley, and reach scale descriptors in modeling variation in benthic macroinvertebrate indices. Innovative metrics describing flow regime, geomorphic processes, and hydrologic-distance weighted watershed and valley characteristics are used in multiple regression and regression tree modeling to predict EPT richness, % EPT, EPT/C, and % Plecoptera. A nested design using seven ecoregions is employed to evaluate the influence of geographic scale and environmental heterogeneity on the explanatory power of individual and combined scales. Regression tree models are constructed to explain variability while identifying threshold responses and interactions. Cross-validated models demonstrate differences in the explanatory power associated with single-scale and multi-scale models as environmental heterogeneity is varied. Models explaining the greatest variability in biological indices result from multi-scale combinations of physical descriptors. Results also indicate that substantial variation in benthic macroinvertebrate response can be explained with process-based watershed and valley scale metrics derived exclusively from common geospatial data. This study outlines a general framework for identifying key processes driving macroinvertebrate assemblages across a range of scales and establishing the geographic extent at which various levels of physical description best explain biological variability. Such information can guide process-based stratification to avoid spurious comparison of dissimilar stream types in bioassessments and ensure that key environmental gradients are adequately represented in sampling designs.
Groundwater depth prediction in a shallow aquifer in north China by a quantile regression model
NASA Astrophysics Data System (ADS)
Li, Fawen; Wei, Wan; Zhao, Yong; Qiao, Jiale
2017-01-01
There is a close relationship between the groundwater level in a shallow aquifer and the surface ecological environment; hence, it is important to accurately simulate and predict the groundwater level in eco-environmental construction projects. The multiple linear regression (MLR) model is one of the most useful methods for predicting groundwater level (depth); however, the values predicted by this model only reflect the mean distribution of the observations and cannot effectively fit extreme-value data (outliers). The study reported here builds a prediction model of groundwater-depth dynamics in a shallow aquifer using the quantile regression (QR) method on the basis of observed data of groundwater depth and related factors. The proposed approach was applied to five sites in Tianjin city, north China, and the groundwater depth was calculated at different quantiles, from which the optimal quantile was screened out according to the box plot method and compared to the values predicted by the MLR model. The results showed that the related factors at the five sites did not follow the standard normal distribution and that there were outliers in the precipitation and last-month (initial state) groundwater-depth factors; because the basic assumptions of the MLR model could not be satisfied, errors resulted. These conditions had no effect on the QR model, which could more effectively describe the distribution of the original data and fit the outliers with higher precision.
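The key difference from MLR is the loss function: QR minimizes the asymmetric "pinball" loss, whose minimizing constant is the sample quantile rather than the mean. The sketch below illustrates this with a brute-force search over constant predictors on hypothetical depth data containing one outlier; it is a didactic example, not the authors' model.

```python
# Pinball (quantile) loss and the best constant predictor at two
# quantiles. Toy depth data (metres) with one outlier; no covariates.

def pinball(y_true, pred, tau):
    """Quantile loss: under-predictions weighted tau, over-predictions 1-tau."""
    total = 0.0
    for y in y_true:
        d = y - pred
        total += tau * d if d >= 0 else (tau - 1) * d
    return total

depths = [2.1, 2.3, 2.4, 2.6, 5.0]  # 5.0 is an outlier

def best_constant(tau):
    # The minimizer of the pinball loss over constants is the tau-quantile.
    candidates = [i / 100 for i in range(100, 601)]
    return min(candidates, key=lambda c: pinball(depths, c, tau))

print(best_constant(0.5))  # → 2.4 (the median, insensitive to the outlier)
print(best_constant(0.9))  # → 5.0 (the upper tail)
```

This is why the QR model in the abstract fits outlying depths that the mean-based MLR model smooths over.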
Modeling of geogenic radon in Switzerland based on ordered logistic regression.
Kropat, Georg; Bochud, François; Murith, Christophe; Palacios Gruson, Martha; Baechler, Sébastien
2017-01-01
The estimation of the radon hazard of a future construction site should ideally be based on the geogenic radon potential (GRP), since this estimate is free of anthropogenic influences and building characteristics. The goal of this study was to evaluate terrestrial gamma dose rate (TGD), geology, fault lines, and topsoil permeability as predictors for the creation of a GRP map based on logistic regression. Soil gas radon measurements (SRC) are better suited for the estimation of GRP than indoor radon measurements (IRC), since the former do not depend on ventilation and heating habits or building characteristics. However, SRC have only been measured at a few locations in Switzerland. In former studies, a good correlation between spatial aggregates of IRC and SRC has been observed. We therefore used IRC measurements aggregated on a 10 km × 10 km grid to calibrate an ordered logistic regression model for geogenic radon potential (GRP). As predictors we took into account terrestrial gamma dose rate, regrouped geological units, fault-line density, and soil permeability. The classification success rate of the model was 56% when all four predictor variables were included. Our results suggest that terrestrial gamma dose rate and regrouped geological units are better suited to modeling GRP than fault-line density and soil permeability. Ordered logistic regression is a promising tool for the modeling of GRP maps due to its simplicity and fast computation time. Future studies should account for additional variables to improve the modeling of high radon hazard in the Jura Mountains of Switzerland. Copyright © 2016 The Authors. Published by Elsevier Ltd. All rights reserved.
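An ordered logistic model turns a single linear predictor into probabilities for ordered classes via cumulative logits separated by cutpoints. The sketch below shows the mechanics for three hypothetical GRP classes; the coefficients, cutpoints, and predictor values are invented for illustration and are not the fitted Swiss model.

```python
# Ordered logit class probabilities from a linear predictor and
# increasing cutpoints: P(Y <= k) = logistic(c_k - eta).
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(eta, cutpoints):
    """Class probabilities as differences of cumulative logistic terms."""
    cum = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for c in cum:
        probs.append(c - prev)
        prev = c
    return probs

# Hypothetical linear predictor: 0.8*(scaled gamma dose rate) + 0.5*(geology class)
eta = 0.8 * 1.2 + 0.5 * 1.0
probs = ordered_logit_probs(eta, cutpoints=[-0.5, 1.5])
print([round(p, 3) for p in probs])  # probabilities of low / medium / high GRP
```

The predicted class is the one with the highest probability, which is how a classification success rate like the 56% reported above would be scored.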
A simple diagnostic model for ruling out pneumoconiosis among construction workers.
Suarthana, Eva; Moons, Karel G M; Heederik, Dick; Meijer, Evert
2007-09-01
Construction workers exposed to silica-containing dust are at risk of developing silicosis even at low exposure levels. Health surveillance among these workers is commonly advised, but the exact diagnostic work-up is not specified and may therefore result in unnecessary chest x-ray investigations. The aim was to develop a simple diagnostic model to estimate the probability of an individual worker having pneumoconiosis from questionnaire and spirometry results, in order to accurately rule out workers without pneumoconiosis. The study was performed using cross-sectional data of 1291 Dutch natural stone and construction workers with potentially high quartz dust exposure. A multivariable logistic regression model was developed using chest x-ray with ILO profusion category ≥1/1 as the reference standard. The model's calibration was evaluated with the Hosmer-Lemeshow test; the discriminative ability was determined by calculating the area under the receiver operating characteristic curve (ROC area). Internal validity of the final model was assessed by a bootstrapping procedure. For clinical application, the diagnostic model was transformed into an easy-to-use score chart. Age 40 years or older, current smoking, a high-exposure job, working 15 years or longer in the construction industry, "feeling unhealthy", and FEV1 were independent predictors in the diagnostic model. The model showed good calibration (a non-significant Hosmer-Lemeshow test) and discriminative ability (ROC area 0.81, 95% CI 0.74 to 0.85). Internal validity was reasonable; the optimism-corrected ROC area was 0.76. By using a cut-off point with a high negative predictive value, the occupational physician can efficiently detect a large proportion of workers with a low probability of having pneumoconiosis and exclude them from unnecessary x-ray investigations. This diagnostic model is an efficient and effective instrument for ruling out pneumoconiosis among construction workers. Its use in health surveillance among these workers can reduce the number of redundant x-ray investigations.
Kepler AutoRegressive Planet Search: Motivation & Methodology
NASA Astrophysics Data System (ADS)
Caceres, Gabriel; Feigelson, Eric; Jogesh Babu, G.; Bahamonde, Natalia; Bertin, Karine; Christen, Alejandra; Curé, Michel; Meza, Cristian
2015-08-01
The Kepler AutoRegressive Planet Search (KARPS) project uses statistical methodology associated with autoregressive (AR) processes to model Kepler lightcurves in order to improve exoplanet transit detection in systems with high stellar variability. We also introduce a planet-search algorithm to detect transits in time-series residuals after application of the AR models. One of the main obstacles in detecting faint planetary transits is the intrinsic stellar variability of the host star. The variability displayed by many stars may have autoregressive properties, wherein later flux values are correlated with previous ones in some manner. Auto-Regressive Moving-Average (ARMA) models, Generalized Auto-Regressive Conditional Heteroskedasticity (GARCH) models, and related models are flexible, phenomenological methods used with great success to model stochastic temporal behaviors in many fields of study, particularly econometrics. Powerful statistical methods are implemented in the public statistical software environment R and its many packages. Modeling involves maximum likelihood fitting, model selection, and residual analysis. These techniques provide a useful framework to model stellar variability and are used in KARPS with the objective of reducing stellar noise to enhance opportunities to find as-yet-undiscovered planets. Our analysis procedure consists of three steps: pre-processing of the data to remove discontinuities, gaps, and outliers; ARMA-type model selection and fitting; and a transit signal search of the residuals using a new Transit Comb Filter (TCF) that replaces traditional box-finding algorithms. We apply the procedures to simulated Kepler-like time series with known stellar and planetary signals to evaluate the effectiveness of the KARPS procedures. The ARMA-type modeling is effective at reducing stellar noise, but it also reduces the transit signal, transforming it into ingress/egress spikes. A periodogram based on the TCF is constructed to concentrate the signal of these periodic spikes. When a periodic transit is found, the model is displayed on a standard period-folded averaged light curve. We also illustrate the efficient coding in R.
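KARPS fits ARMA-type models in R; as a minimal analogue of the "model, then search the residuals" step, the sketch below fits an AR(1) model to a synthetic autocorrelated series using the Yule-Walker lag-1 estimate and returns the whitened residuals. The coefficient and noise model are invented for illustration and are far simpler than the ARMA/GARCH fits described above.

```python
# Minimal AR(1) fit, x_t = phi * x_{t-1} + e_t, via the Yule-Walker
# estimate (lag-1 autocorrelation of the mean-centered series), plus
# the residual series in which a transit search would then run.
import random

def ar1_fit(x):
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    phi = sum(xc[t] * xc[t - 1] for t in range(1, n)) / sum(v * v for v in xc)
    residuals = [xc[t] - phi * xc[t - 1] for t in range(1, n)]
    return phi, residuals

# Synthetic "stellar variability": AR(1) with true coefficient 0.7
random.seed(0)
x = [0.0]
for _ in range(5000):
    x.append(0.7 * x[-1] + random.gauss(0.0, 1.0))

phi, resid = ar1_fit(x)
print(round(phi, 2))  # close to the true coefficient 0.7
```

In the full pipeline the residuals would then be scanned for the periodic ingress/egress spikes with the TCF periodogram.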
How to estimate greenhouse gas (GHG) emissions from an excavator by using CAT's performance chart
NASA Astrophysics Data System (ADS)
Hajji, Apif M.; Lewis, Michael P.
2017-09-01
Construction equipment activities are a major part of many infrastructure projects, and this type of equipment typically releases large quantities of greenhouse gas (GHG) emissions. GHG emissions come from fuel consumption, and equipment productivity in turn affects fuel consumption. Thus, an estimating tool based on the construction equipment productivity rate can accurately assess the GHG emissions resulting from equipment activities. This paper proposes a methodology to estimate the environmental impact of a common construction activity, presenting a sensitivity analysis and a case study of an excavator performing trench excavation. The methodology developed in this study can be applied as a stand-alone model or as a module integrated with other emissions estimators. GHG emissions are highly correlated with diesel fuel use, at approximately 10.15 kilograms (kg) of CO2 per gallon of diesel fuel. The results showed that the productivity rate model obtained from multiple regression analysis can be used as the basis for estimating GHG emissions, and also as a framework for developing emissions footprints and understanding the environmental impact of construction equipment activities.
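The estimating chain described above (productivity → hours → fuel → CO2) can be sketched in a few lines. The 10.15 kg CO2 per gallon factor is taken from the abstract; the productivity and fuel-rate numbers below are hypothetical placeholders, not values from CAT's performance chart or the paper's regression model.

```python
# Sketch of the productivity-based GHG estimating logic: hours follow
# from the productivity rate, fuel from hours, CO2 from the diesel
# emission factor. Job parameters are hypothetical.

CO2_KG_PER_GALLON_DIESEL = 10.15  # emission factor cited in the abstract

def excavation_co2(volume_m3, productivity_m3_per_hr, fuel_gal_per_hr):
    """CO2 (kg) for excavating a given trench volume."""
    hours = volume_m3 / productivity_m3_per_hr
    fuel = hours * fuel_gal_per_hr
    return fuel * CO2_KG_PER_GALLON_DIESEL

# Hypothetical job: 500 m3 trench, 60 m3/hr productivity, 8 gal/hr fuel use
print(round(excavation_co2(500, 60, 8), 1))  # → 676.7
```

In the paper's approach, the productivity term would come from the multiple regression model rather than a fixed number, which is what makes the estimate sensitive to site and equipment conditions.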
Barbot, Baptiste; Crossman, Elizabeth; Hunter, Scott R.; Grigorenko, Elena L.; Luthar, Suniya S.
2014-01-01
This study examines longitudinally the bidirectional influences between maternal parenting (behaviors and parenting stress) and mothers' perceptions of their children's adjustment, using a multivariate approach. Data were gathered from 361 low-income mothers (many with psychiatric diagnoses) reporting on their parenting behavior, parenting stress, and their child's adjustment in a two-wave longitudinal study over 5 years. Measurement models were developed to derive four broad parenting constructs (Involvement, Control, Rejection, and Stress) and three child adjustment constructs (Internalizing problems, Externalizing problems, and Social competence). After measurement invariance of these constructs was confirmed across relevant groups and over time, both measurement models were integrated in a single cross-lagged regression analysis of latent constructs. Multiple reciprocal influences were observed between parenting and perceived child adjustment over time: externalizing and internalizing problems in children were predicted by baseline maternal parenting behaviors, while child social competence was found to reduce parental stress and increase parental involvement and appropriate monitoring. These findings on the motherhood experience are discussed in light of recent research efforts to understand mother-child bidirectional influences, and their potential for practical applications. PMID:25089759
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Ruyck, Kim, E-mail: kim.deruyck@UGent.be; Sabbe, Nick; Oberije, Cary
2011-10-01
Purpose: To construct a model for the prediction of acute esophagitis in lung cancer patients receiving chemoradiotherapy by combining clinical data, treatment parameters, and genotyping profile. Patients and Methods: Data were available for 273 lung cancer patients treated with curative chemoradiotherapy. Clinical data included gender, age, World Health Organization performance score, nicotine use, diabetes, chronic disease, tumor type, tumor stage, lymph node stage, tumor location, and medical center. Treatment parameters included chemotherapy, surgery, radiotherapy technique, tumor dose, mean fractionation size, mean and maximal esophageal dose, and overall treatment time. A total of 332 genetic polymorphisms were considered in 112 candidate genes. The predicting model was achieved by lasso logistic regression for predictor selection, followed by classic logistic regression for unbiased estimation of the coefficients. Performance of the model was expressed as the area under the curve of the receiver operating characteristic and as the false-negative rate in the optimal point on the receiver operating characteristic curve. Results: A total of 110 patients (40%) developed acute esophagitis Grade ≥2 (Common Terminology Criteria for Adverse Events v3.0). The final model contained chemotherapy treatment, lymph node stage, mean esophageal dose, gender, overall treatment time, radiotherapy technique, rs2302535 (EGFR), rs16930129 (ENG), rs1131877 (TRAF3), and rs2230528 (ITGB2). The area under the curve was 0.87, and the false-negative rate was 16%. Conclusion: Prediction of acute esophagitis can be improved by combining clinical, treatment, and genetic factors. A multicomponent prediction model for acute esophagitis with a sensitivity of 84% was constructed with two clinical parameters, four treatment parameters, and four genetic polymorphisms.
Applying the health promotion model to development of a worksite intervention.
Lusk, S L; Kerr, M J; Ronis, D L; Eakin, B L
1999-01-01
Consistent use of hearing protection devices (HPDs) decreases noise-induced hearing loss; however, many workers do not use them consistently. Past research has supported the need to use a conceptual framework to understand behaviors and guide intervention programs; however, few reports have specified a process for translating a conceptual model into an intervention. The strongest predictors from the Health Promotion Model were used to design a training program to increase HPD use among construction workers. Carpenters (n = 118), operating engineers (n = 109), and plumber/pipefitters (n = 129) in the Midwest were recruited to participate in the study. Written questionnaires including scales measuring the components of the Health Promotion Model were completed in classroom settings at worker trade group meetings. All items from scales predicting HPD use were reviewed to determine the content of a program to promote the use of HPDs. Three selection criteria were developed: (1) correlation with use of hearing protection (at least .20), (2) amenability to change, and (3) room for improvement (mean score not at ceiling). Linear regression and Pearson's correlation were used to assess the components of the model as predictors of HPD use. Five predictors had statistically significant regression coefficients: perceived noise exposure, self-efficacy, value of use, barriers to use, and modeling of use of hearing protection. Using items meeting the selection criteria, a 20-minute videotape with written handouts was developed as the core of an intervention. A clearly defined practice session was also incorporated in the training intervention. Determining salient factors for worker populations and specific protective equipment prior to designing an intervention is essential. These predictors provided the basis for a training program that addressed the specific needs of construction workers. Results of tests of the effectiveness of the program will be available in the near future.
Folta, Sara C; Bell, Rick; Economos, Christina; Landers, Stewart; Goldberg, Jeanne P
2006-01-01
The purpose of this study was to test the utility of the Theory of Reasoned Action (TRA) in explaining young elementary school children's intention to consume legumes. A survey was conducted with children in an urban, multicultural community in Massachusetts. A total of 336 children participated. Logistic regression analysis was used to assess the strength of the relationship between attitude and subjective norm and intention. Although attitude was significantly associated with intention, the pseudo-R2 for the regression model that included only the TRA constructs was extremely low (.01). Adding demographic factors and preference improved the model's predictive ability, but attitude was no longer significant. The results of this study do not provide support for the predictive utility of the TRA with young elementary school children for this behavior, when demographic factors are accounted for. Hedonic factors, rather than reasoned judgments, may help drive children's intentions.
NASA Astrophysics Data System (ADS)
Liu, Yande; Ying, Yibin; Lu, Huishan; Fu, Xiaping
2005-11-01
A new method is proposed to eliminate varying background and noise simultaneously in the multivariate calibration of Fourier transform near infrared (FT-NIR) spectral signals. An ideal spectrum signal prototype was constructed based on the FT-NIR spectrum of fruit sugar content measurement. The performances of wavelet-based threshold de-noising approaches with different combinations of wavelet base functions were compared. Three families of wavelet base functions (Daubechies, Symlets, and Coiflets) were applied to evaluate the performance of those wavelet bases and threshold selection rules in a series of experiments. The experimental results show that the best de-noising performance was achieved with the Daubechies 4 or Symlet 4 wavelet base functions. Based on the optimized parameters, wavelet regression models for the sugar content of pear were also developed and resulted in a smaller prediction error than a traditional partial least squares regression (PLSR) model.
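Wavelet threshold de-noising of this kind can be sketched in a few lines. The sketch below is illustrative only: it uses the simple one-level Haar wavelet rather than the Daubechies 4 or Symlet 4 bases the study found best (implementing those filter banks from scratch is lengthy), with soft thresholding of the detail coefficients; the signal values and threshold are made up.

```python
import math

def haar_dwt(x):
    """One-level Haar DWT; x must have even length."""
    s = math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) / s for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / s for i in range(len(x) // 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Invert the one-level Haar DWT."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / s, (a - d) / s]
    return out

def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t (soft thresholding)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def denoise(x, t):
    """Threshold the detail coefficients, keep the approximation."""
    approx, detail = haar_dwt(x)
    return haar_idwt(approx, soft_threshold(detail, t))

noisy = [1.0, 1.2, 1.0, 0.8, 1.1, 0.9, 1.0, 1.0]
print(denoise(noisy, 0.1))
```

In practice a library such as PyWavelets supplies the db4/sym4 filters and multi-level transforms; the principle (shrink small detail coefficients, reconstruct) is the same.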
Schmid, Matthias; Küchenhoff, Helmut; Hoerauf, Achim; Tutz, Gerhard
2016-02-28
Survival trees are a popular alternative to parametric survival modeling when there are interactions between the predictor variables or when the aim is to stratify patients into prognostic subgroups. A limitation of classical survival tree methodology is that most algorithms for tree construction are designed for continuous outcome variables. Hence, classical methods might not be appropriate if failure time data are measured on a discrete time scale (as is often the case in longitudinal studies where data are collected, e.g., quarterly or yearly). To address this issue, we develop a method for discrete survival tree construction. The proposed technique is based on the result that the likelihood of a discrete survival model is equivalent to the likelihood of a regression model for binary outcome data. Hence, we modify tree construction methods for binary outcomes such that they result in optimized partitions for the estimation of discrete hazard functions. By applying the proposed method to data from a randomized trial in patients with filarial lymphedema, we demonstrate how discrete survival trees can be used to identify clinically relevant patient groups with similar survival behavior. Copyright © 2015 John Wiley & Sons, Ltd.
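The key result above — that the discrete survival likelihood equals a binary-regression likelihood — rests on expanding each subject into one binary record per discrete period at risk. A minimal sketch of that person-period expansion (the field names are illustrative, not from the paper):

```python
def person_period(subjects):
    """Expand (time, event) pairs into binary person-period records.

    A subject observed until discrete time T contributes one row per
    period 1..T; the binary outcome is 1 only in the final period and
    only if the event (rather than censoring) occurred there.
    """
    rows = []
    for sid, (time, event) in enumerate(subjects):
        for t in range(1, time + 1):
            y = 1 if (t == time and event == 1) else 0
            rows.append({"id": sid, "period": t, "y": y})
    return rows

# subject 0: event at time 3; subject 1: censored at time 2
rows = person_period([(3, 1), (2, 0)])
print([(r["period"], r["y"]) for r in rows])
# → [(1, 0), (2, 0), (3, 1), (1, 0), (2, 0)]
```

Any classifier or tree method for binary outcomes can then be fit to these rows to estimate the discrete hazard in each period.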
Dynamical Cognitive Models of Social Issues in Russia
NASA Astrophysics Data System (ADS)
Mitina, Olga; Abraham, Fred; Petrenko, Victor
We examine and model dynamics in three areas of social cognition: (1) political transformations within Russia, (2) evaluation of political trends in other countries by Russians, and (3) evaluation of Russian stereotypes concerning women. We try to represent consciousness as vectorfields and trajectories in a cognitive state space. We use psychosemantic techniques that allow definition of the state space and the systematic construction of these vectorfields and trajectories and their portrait from research data. Then we construct models to fit them, using multiple regression methods to obtain linear differential equations. These dynamical models of social cognition fit the data quite well. (1) The political transformations were modeled by a spiral repellor in a two-dimensional space of a democratic-totalitarian factor and social depression-optimism factor. (2) The evaluation of alien political trends included a flow away from a saddle toward more stable and moderate political regimes in a 2D space, of democratic-totalitarian and unstable-stable cognitive dimensions. (3) The gender study showed expectations (attractors) for more liberated, emancipated roles for women in the future.
Schmitt, Neal; Golubovich, Juliya; Leong, Frederick T L
2011-12-01
The impact of measurement invariance and the provision for partial invariance in confirmatory factor analytic models on factor intercorrelations, latent mean differences, and estimates of relations with external variables is investigated for measures of two sets of widely assessed constructs: Big Five personality and the six Holland interests (RIASEC). When models that include provisions for partial invariance are compared with models that do not, the results indicate quite small differences in parameter estimates involving the relations between factors, one relatively large standardized mean difference in factors between the subgroups compared, and relatively small differences in the regression coefficients when the factors are used to predict external variables. The results provide support for the use of partially invariant models, but there does not seem to be a great deal of difference between structural coefficients when the measurement model does or does not include separate estimates of subgroup parameters that differ across subgroups. Future research should include simulations in which the impact of various factors related to invariance is estimated.
QSAR modeling of flotation collectors using principal components extracted from topological indices.
Natarajan, R; Nirdosh, Inderjit; Basak, Subhash C; Mills, Denise R
2002-01-01
Several topological indices were calculated for substituted-cupferrons that were tested as collectors for the froth flotation of uranium. Principal component analysis (PCA) was used for data reduction. Seven principal components (PC) were found to account for 98.6% of the variance among the computed indices. The principal components thus extracted were used in stepwise regression analyses to construct regression models for the prediction of separation efficiencies (Es) of the collectors. A two-parameter model with a correlation coefficient of 0.889 and a three-parameter model with a correlation coefficient of 0.913 were obtained. PCs were found to be better than the partition coefficient for forming regression equations, and inclusion of an electronic parameter such as Hammett sigma or quantum mechanically derived electronic charges on the chelating atoms did not improve the correlation coefficient significantly. The method was extended to model the separation efficiencies of mercaptobenzothiazoles (MBT) and aminothiophenols (ATP) used in the flotation of lead and zinc ores, respectively. Five principal components were found to explain 99% of the data variability in each series. A three-parameter equation with correlation coefficient of 0.985 and a two-parameter equation with correlation coefficient of 0.926 were obtained for MBT and ATP, respectively. The amenability of separation efficiencies of chelating collectors to QSAR modeling using PCs based on topological indices might lead to the selection of collectors for synthesis and testing from a virtual database.
Development and Validation of MMPI-2-RF Scales for Indexing Triarchic Psychopathy Constructs.
Sellbom, Martin; Drislane, Laura E; Johnson, Alexandria K; Goodwin, Brandee E; Phillips, Tasha R; Patrick, Christopher J
2016-10-01
The triarchic model characterizes psychopathy in terms of three distinct dispositional constructs of boldness, meanness, and disinhibition. The model can be operationalized through scales designed specifically to index these domains or by using items from other inventories that provide coverage of related constructs. The present study sought to develop and validate scales for assessing the triarchic model domains using items from the Minnesota Multiphasic Personality Inventory-2-Restructured Form (MMPI-2-RF). A consensus rating approach was used to identify items relevant to each triarchic domain, and following psychometric refinement, the resulting MMPI-2-RF-based triarchic scales were evaluated for convergent and discriminant validity in relation to multiple psychopathy-relevant criterion variables in offender and nonoffender samples. Expected convergent and discriminant associations were evident very clearly for the Boldness and Disinhibition scales and somewhat less clearly for the Meanness scale. Moreover, hierarchical regression analyses indicated that all MMPI-2-RF triarchic scales incremented standard MMPI-2-RF scale scores in predicting extant triarchic model scale scores. The widespread use of MMPI-2-RF in clinical and forensic settings provides avenues for both clinical and research applications in contexts where traditional psychopathy measures are less likely to be administered. © The Author(s) 2015.
Kasaie, Parastu; Mathema, Barun; Kelton, W David; Azman, Andrew S; Pennington, Jeff; Dowdy, David W
2015-01-01
In any setting, a proportion of incident active tuberculosis (TB) reflects recent transmission ("recent transmission proportion"), whereas the remainder represents reactivation. Appropriately estimating the recent transmission proportion has important implications for local TB control, but existing approaches have known biases, especially where data are incomplete. We constructed a stochastic individual-based model of a TB epidemic and designed a set of simulations (derivation set) to develop two regression-based tools for estimating the recent transmission proportion from five inputs: underlying TB incidence, sampling coverage, study duration, clustered proportion of observed cases, and proportion of observed clusters in the sample. We tested these tools on a set of unrelated simulations (validation set), and compared their performance against that of the traditional 'n-1' approach. In the validation set, the regression tools reduced the absolute estimation bias (difference between estimated and true recent transmission proportion) in the 'n-1' technique by a median [interquartile range] of 60% [9%, 82%] and 69% [30%, 87%]. The bias in the 'n-1' model was highly sensitive to underlying levels of study coverage and duration, and substantially underestimated the recent transmission proportion in settings of incomplete data coverage. By contrast, the regression models' performance was more consistent across different epidemiological settings and study characteristics. We provide one of these regression models as a user-friendly, web-based tool. Novel tools can improve our ability to estimate the recent TB transmission proportion from data that are observable (or estimable) by public health practitioners with limited available molecular data.
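The traditional 'n-1' estimate referenced above attributes, within each genotype cluster of size n, n-1 cases to recent transmission. A minimal sketch of that baseline index (the regression-based correction the paper develops is not reproduced here):

```python
def n_minus_1_proportion(cluster_sizes, total_cases):
    """Classic 'n-1' recent-transmission proportion.

    cluster_sizes: sizes of genotype clusters containing >= 2 cases;
    total_cases:   all genotyped cases, clustered or not.
    """
    recent = sum(n - 1 for n in cluster_sizes)
    return recent / total_cases

# 10 genotyped cases: clusters of 3 and 2, plus 5 unique genotypes
print(n_minus_1_proportion([3, 2], 10))  # → 0.3
```

The paper's point is that this quantity is biased downward when sampling coverage or study duration is limited, which is what the regression tools correct for.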
A Novel Continuous Blood Pressure Estimation Approach Based on Data Mining Techniques.
Miao, Fen; Fu, Nan; Zhang, Yuan-Ting; Ding, Xiao-Rong; Hong, Xi; He, Qingyun; Li, Ye
2017-11-01
Continuous blood pressure (BP) estimation using pulse transit time (PTT) is a promising method for unobtrusive BP measurement. However, the accuracy of this approach must be improved for it to be viable for a wide range of applications. This study proposes a novel continuous BP estimation approach that combines data mining techniques with a traditional mechanism-driven model. First, 14 features derived from simultaneous electrocardiogram and photoplethysmogram signals were extracted for beat-to-beat BP estimation. A genetic algorithm-based feature selection method was then used to select BP indicators for each subject. Multivariate linear regression and support vector regression were employed to develop the BP model. The accuracy and robustness of the proposed approach were validated for static, dynamic, and follow-up performance. Experimental results based on 73 subjects showed that the proposed approach exhibited excellent accuracy in static BP estimation, with a correlation coefficient and mean error of 0.852 and -0.001 ± 3.102 mmHg for systolic BP, and 0.790 and -0.004 ± 2.199 mmHg for diastolic BP. Similar performance was observed for dynamic BP estimation. The robustness results indicated that the estimation accuracy was lower by a certain degree one day after model construction but was relatively stable from one day to six months after construction. The proposed approach is superior to the state-of-the-art PTT-based model, with an approximately 2-mmHg reduction in the standard deviation at different time intervals, thus providing potentially novel insights for cuffless BP estimation.
Wheeler, David C; Czarnota, Jenna; Jones, Resa M
2017-01-01
Socioeconomic status (SES) is often considered a risk factor for health outcomes. SES is typically measured using individual variables of educational attainment, income, housing, and employment variables or a composite of these variables. Approaches to building the composite variable include using equal weights for each variable or estimating the weights with principal components analysis or factor analysis. However, these methods do not consider the relationship between the outcome and the SES variables when constructing the index. In this project, we used weighted quantile sum (WQS) regression to estimate an area-level SES index and its effect in a model of colonoscopy screening adherence in the Minnesota-Wisconsin Metropolitan Statistical Area. We considered several specifications of the SES index including using different spatial scales (e.g., census block group-level, tract-level) for the SES variables. We found a significant positive association (odds ratio = 1.17, 95% CI: 1.15-1.19) between the SES index and colonoscopy adherence in the best fitting model. The model with the best goodness-of-fit included a multi-scale SES index with 10 variables at the block group-level and one at the tract-level, with home ownership, race, and income among the most important variables. Contrary to previous index construction, our results were not consistent with an assumption of equal importance of variables in the SES index when explaining colonoscopy screening adherence. Our approach is applicable in any study where an SES index is considered as a variable in a regression model and the weights for the SES variables are not known in advance.
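The WQS index above combines quantile-scored SES variables with nonnegative weights constrained to sum to 1; the weights themselves are estimated jointly with the outcome regression. This sketch only shows how the index is formed once weights are in hand, with made-up variables and weights (not the paper's estimates):

```python
def quartile_score(value, reference):
    """Score a value 0-3 by which quartile of the reference values it falls in."""
    below = sum(1 for v in reference if v < value)
    return min(3, (4 * below) // len(reference))

def wqs_index(record, data, weights):
    """Weighted quantile sum: weights are nonnegative and sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(w * quartile_score(record[k], [row[k] for row in data])
               for k, w in weights.items())

data = [
    {"income": 30, "education": 12},
    {"income": 45, "education": 14},
    {"income": 60, "education": 16},
    {"income": 90, "education": 18},
]
weights = {"income": 0.7, "education": 0.3}  # illustrative, not estimated
indices = [wqs_index(r, data, weights) for r in data]
print(indices)  # each index lies in [0, 3]
```

With quartile scoring, the index is bounded by the number of quantile bins minus one regardless of how many variables enter it, which is what makes the estimated weights directly comparable.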
Estimating Required Contingency Funds for Construction Projects using Multiple Linear Regression
2006-03-01
...the Breusch-Pagan test, in which the null hypothesis states that the residuals have constant variance. The alternate hypothesis is that the residuals do not ... variance, the Breusch-Pagan test provides statistical evidence that the assumption is justified. For the proposed model, the p-value is 0.173 ... entire test sample.
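The heteroskedasticity check described in this excerpt can be sketched as follows: fit OLS, regress the squared residuals on the predictor, and form the Lagrange multiplier statistic LM = n·R² of that auxiliary regression, to be compared against a chi-squared critical value. The single-predictor closed forms and the data below are illustrative, not from the thesis:

```python
def ols(x, y):
    """Simple-regression intercept and slope via closed form."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

def breusch_pagan_lm(x, y):
    """LM = n * R^2 from regressing squared OLS residuals on x."""
    a, b = ols(x, y)
    e2 = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    a2, b2 = ols(x, e2)
    ebar = sum(e2) / len(e2)
    ss_tot = sum((v - ebar) ** 2 for v in e2)
    ss_res = sum((v - (a2 + b2 * xi)) ** 2 for xi, v in zip(x, e2))
    return len(x) * (1 - ss_res / ss_tot) if ss_tot > 0 else 0.0

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
lm = breusch_pagan_lm(x, y)
print(round(lm, 3))  # compare against a chi-squared(1) critical value
```

A small LM (equivalently, a p-value above the chosen significance level, like the 0.173 quoted above) fails to reject constant variance.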
Predictors of Depression in Youth With Crohn Disease
Clark, Jeffrey G.; Srinath, Arvind I.; Youk, Ada O.; Kirshner, Margaret A.; McCarthy, F. Nicole; Keljo, David J.; Bousvaros, Athos; DeMaso, David R.; Szigethy, Eva M.
2014-01-01
Objective The aim of the study was to determine whether infliximab use and other potential predictors are associated with decreased prevalence and severity of depression in pediatric patients with Crohn disease (CD). Methods A total of 550 youth ages 9 to 17 years with biopsy-confirmed CD were consecutively recruited as part of a multicenter randomized controlled trial. Of these, 499 patients met study criteria and were included in the analysis. At recruitment, each subject and a parent completed the Children’s Depression Inventory (CDI). A child or parent CDI score ≥ 12 was used to denote clinically significant depressive symptoms (CSDS). Child and parent CDI scores were summed to form total CDI (CDIT). Infliximab use, demographic information, steroid use, laboratory values, and Pediatric Crohn’s Disease Activity Index (PCDAI) were collected as the potential predictors of depression. Univariate regression models were constructed to determine the relations among predictors, CSDS, and CDIT. Stepwise multivariate regression models were constructed to predict the relation between infliximab use and depression while controlling for other predictors of depression. Results Infliximab use was not associated with a decreased proportion of CSDS and CDIT after adjusting for multiple comparisons. CSDS and CDIT were positively associated with PCDAI, erythrocyte sedimentation rate, and steroid dose (P < 0.01) and negatively associated with socioeconomic status (SES) (P < 0.001). In multivariate models, PCDAI and SES were the strongest predictors of depression. Conclusions Disease activity and SES are significant predictors of depression in youth with Crohn disease. PMID:24343281
Estimating procedure times for surgeries by determining location parameters for the lognormal model.
Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H
2004-05-01
We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.
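A classical closed-form estimator of the lognormal location (shift) parameter uses three order statistics: the minimum, the maximum, and an interior order statistic, conventionally the median (the study's point is that other interior order statistics may perform better). A sketch, with the formula and the data chosen for illustration:

```python
def shift_estimate(sample, k=None):
    """Three-point estimator of the lognormal location (shift) parameter.

    Uses x(1), x(n), and an interior order statistic x(k) (the median
    by default):  t = (x1 * xn - xk^2) / (x1 + xn - 2 * xk).
    """
    xs = sorted(sample)
    x1, xn = xs[0], xs[-1]
    xk = xs[len(xs) // 2] if k is None else xs[k]
    denom = x1 + xn - 2 * xk
    if denom == 0:
        raise ValueError("degenerate sample for this estimator")
    return (x1 * xn - xk * xk) / denom

# shifted geometric spacing around t = 5: (x1-t)(xn-t) = (xk-t)^2,
# so the shift is recovered exactly
print(shift_estimate([6.0, 7.0, 9.0]))  # → 5.0
```

Choosing a different `k` corresponds to the paper's question of which order statistic gives the most accurate and stable estimate in practice.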
Zhang, Xingyu; Kim, Joyce; Patzer, Rachel E; Pitts, Stephen R; Patzer, Aaron; Schrager, Justin D
2017-10-26
To describe and compare logistic regression and neural network modeling strategies to predict hospital admission or transfer following initial presentation to Emergency Department (ED) triage with and without the addition of natural language processing elements. Using data from the National Hospital Ambulatory Medical Care Survey (NHAMCS), a cross-sectional probability sample of United States EDs from 2012 and 2013 survey years, we developed several predictive models with the outcome being admission to the hospital or transfer vs. discharge home. We included patient characteristics immediately available after the patient has presented to the ED and undergone a triage process. We used this information to construct logistic regression (LR) and multilayer neural network models (MLNN) which included natural language processing (NLP) and principal component analysis from the patient's reason for visit. Ten-fold cross validation was used to test the predictive capacity of each model, and the area under the receiver operating characteristic curve (AUC) was then calculated for each model. Of the 47,200 ED visits from 642 hospitals, 6,335 (13.42%) resulted in hospital admission (or transfer). A total of 48 principal components were extracted by NLP from the reason for visit fields, which explained 75% of the overall variance for hospitalization. In the model including only structured variables, the AUC was 0.824 (95% CI 0.818-0.830) for logistic regression and 0.823 (95% CI 0.817-0.829) for MLNN. Models including only free-text information generated AUC of 0.742 (95% CI 0.731-0.753) for logistic regression and 0.753 (95% CI 0.742-0.764) for MLNN. When both structured variables and free text variables were included, the AUC reached 0.846 (95% CI 0.839-0.853) for logistic regression and 0.844 (95% CI 0.836-0.852) for MLNN.
The predictive accuracy of hospital admission or transfer for patients who presented to ED triage overall was good, and was improved with the inclusion of free text data from a patient's reason for visit regardless of modeling approach. Natural language processing and neural networks that incorporate patient-reported outcome free text may increase predictive accuracy for hospital admission.
Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide
2009-01-01
The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models for a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many diverse structural analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models was better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically supervise the practical experiment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Kunwar P., E-mail: kpsingh_52@yahoo.com; Environmental Chemistry Division, CSIR-Indian Institute of Toxicology Research, Post Box 80, Mahatma Gandhi Marg, Lucknow 226 001; Gupta, Shikha
Robust global models capable of discriminating positive and non-positive carcinogens; and predicting carcinogenic potency of chemicals in rodents were developed. The dataset of 834 structurally diverse chemicals extracted from Carcinogenic Potency Database (CPDB) was used which contained 466 positive and 368 non-positive carcinogens. Twelve non-quantum mechanical molecular descriptors were derived. Structural diversity of the chemicals and nonlinearity in the data were evaluated using Tanimoto similarity index and Brock–Dechert–Scheinkman statistics. Probabilistic neural network (PNN) and generalized regression neural network (GRNN) models were constructed for classification and function optimization problems using the carcinogenicity end point in rat. Validation of the models was performed using the internal and external procedures employing a wide series of statistical checks. PNN constructed using five descriptors rendered classification accuracy of 92.09% in complete rat data. The PNN model rendered classification accuracies of 91.77%, 80.70% and 92.08% in mouse, hamster and pesticide data, respectively. The GRNN constructed with nine descriptors yielded correlation coefficient of 0.896 between the measured and predicted carcinogenic potency with mean squared error (MSE) of 0.44 in complete rat data. The rat carcinogenicity model (GRNN) applied to the mouse and hamster data yielded correlation coefficient and MSE of 0.758, 0.71 and 0.760, 0.46, respectively. The results suggest for wide applicability of the inter-species models in predicting carcinogenic potency of chemicals. Both the PNN and GRNN (inter-species) models constructed here can be useful tools in predicting the carcinogenicity of new chemicals for regulatory purposes. - Graphical abstract: Figure (a) shows classification accuracies (positive and non-positive carcinogens) in rat, mouse, hamster, and pesticide data yielded by optimal PNN model.
Figure (b) shows generalization and predictive abilities of the interspecies GRNN model to predict the carcinogenic potency of diverse chemicals. - Highlights: • Global robust models constructed for carcinogenicity prediction of diverse chemicals. • Tanimoto/BDS test revealed structural diversity of chemicals and nonlinearity in data. • PNN/GRNN successfully predicted carcinogenicity/carcinogenic potency of chemicals. • Developed interspecies PNN/GRNN models for carcinogenicity prediction. • Proposed models can be used as tools to predict carcinogenicity of new chemicals.
Statistically Controlling for Confounding Constructs Is Harder than You Think
Westfall, Jacob; Yarkoni, Tal
2016-01-01
Social scientists often seek to demonstrate that a construct has incremental validity over and above other related constructs. However, these claims are typically supported by measurement-level models that fail to consider the effects of measurement (un)reliability. We use intuitive examples, Monte Carlo simulations, and a novel analytical framework to demonstrate that common strategies for establishing incremental construct validity using multiple regression analysis exhibit extremely high Type I error rates under parameter regimes common in many psychological domains. Counterintuitively, we find that error rates are highest—in some cases approaching 100%—when sample sizes are large and reliability is moderate. Our findings suggest that a potentially large proportion of incremental validity claims made in the literature are spurious. We present a web application (http://jakewestfall.org/ivy/) that readers can use to explore the statistical properties of these and other incremental validity arguments. We conclude by reviewing SEM-based statistical approaches that appropriately control the Type I error rate when attempting to establish incremental validity. PMID:27031707
NASA Astrophysics Data System (ADS)
Shah-Heydari pour, A.; Pahlavani, P.; Bigdeli, B.
2017-09-01
With the industrialization of cities and the marked increase in pollutants and greenhouse gases, the importance of forests as the natural lungs of the earth is felt more than ever. Annually, a large part of the forests is destroyed due to the lack of timely action during fires. Knowledge about areas with a high risk of fire, and equipping these areas by constructing access routes and allocating fire-fighting equipment, can help to prevent the destruction of the forest. In this research, the fire risk of the region was forecast and a risk map was produced from MODIS images by applying a geographically weighted regression model with a Gaussian kernel and ordinary least squares over the parameters affecting forest fire, including distance from residential areas, distance from the river, distance from the road, height, slope, aspect, soil type, land use, average temperature, wind speed, and rainfall. After the evaluation, it was found that the geographically weighted regression model with a Gaussian kernel correctly forecast 93.4% of all fire points, whereas the ordinary least squares method correctly forecast only 66% of the fire points.
Lin, Lixin; Wang, Yunjia; Teng, Jiyao; Wang, Xuchen
2016-02-01
Hyperspectral estimation of soil organic matter (SOM) in coal mining regions is an important tool for enhancing fertilization in soil restoration programs. The correlation-partial least squares regression (PLSR) method effectively solves the information loss problem of correlation-multiple linear stepwise regression, but results of the correlation analysis must be optimized to improve precision. This study considers the relationship between spectral reflectance and SOM based on spectral reflectance curves of soil samples collected from coal mining regions. Based on the major absorption troughs in the 400-1006 nm spectral range, PLSR analysis was performed using 289 independent bands of the second derivative (SDR) with three levels and measured SOM values. A wavelet-correlation-PLSR (W-C-PLSR) model was then constructed. By amplifying useful information that was previously obscured by noise, the W-C-PLSR model was optimal for estimating SOM content, with smaller prediction errors in both calibration (R(2) = 0.970, root mean square error (RMSEC) = 3.10, and mean relative error (MREC) = 8.75) and validation (RMSEV = 5.85 and MREV = 14.32) analyses, as compared with other models. Results indicate that W-C-PLSR has great potential to estimate SOM in coal mining regions.
Analysis of a Rocket Based Combined Cycle Engine during Rocket Only Operation
NASA Technical Reports Server (NTRS)
Smith, T. D.; Steffen, C. J., Jr.; Yungster, S.; Keller, D. J.
1998-01-01
The all rocket mode of operation is a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. However, outside of performing experiments or a full three dimensional analysis, there are no first order parametric models to estimate performance. As a result, an axisymmetric RBCC engine was used to analytically determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and statistical regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, percent of injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-injector diameter ratio. A perfect gas computational fluid dynamics analysis was performed to obtain values of vacuum specific impulse. Statistical regression analysis was performed based on both full flow and gas generator engine cycles. Results were also found to be dependent upon the entire cycle assumptions. The statistical regression analysis determined that there were five significant linear effects, six interactions, and one second-order effect. Two parametric models were created to provide performance assessments of an RBCC engine in the all rocket mode of operation.
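A design-of-experiments regression of this kind starts from a coded design matrix: a two-level full factorial (±1) plus center points, expanded with columns for main effects, two-way interactions, and a squared term. This sketch is illustrative only; it uses three generic factors, not the six-parameter RBCC test matrix:

```python
from itertools import combinations, product

def doe_matrix(n_factors, n_center=3):
    """Coded two-level full factorial plus center runs, expanded with
    all two-way interaction columns and one quadratic column."""
    runs = [list(r) for r in product((-1.0, 1.0), repeat=n_factors)]
    runs += [[0.0] * n_factors for _ in range(n_center)]
    rows = []
    for r in runs:
        inter = [r[i] * r[j] for i, j in combinations(range(n_factors), 2)]
        # columns: intercept | main effects | two-factor interactions | quad
        rows.append([1.0] + r + inter + [r[0] ** 2])
    return rows

X = doe_matrix(3)
print(len(X), len(X[0]))  # 11 runs (8 factorial + 3 center), 8 columns
```

Fitting responses against such a matrix by least squares is what yields the significant linear effects, interactions, and second-order effect reported above.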
Feng, Yao-Ze; Elmasry, Gamal; Sun, Da-Wen; Scannell, Amalia G M; Walsh, Des; Morcy, Noha
2013-06-01
Bacterial pathogens are the main culprits for outbreaks of food-borne illnesses. This study aimed to use the hyperspectral imaging technique as a non-destructive tool for quantitative and direct determination of Enterobacteriaceae loads on chicken fillets. Partial least squares regression (PLSR) models were established and the best model using full wavelengths was obtained in the spectral range 930-1450 nm with coefficients of determination R(2)≥ 0.82 and root mean squared errors (RMSEs) ≤ 0.47 log(10)CFUg(-1). In further development of simplified models, second derivative spectra and weighted PLS regression coefficients (BW) were utilised to select important wavelengths. However, the three wavelengths (930, 1121 and 1345 nm) selected from BW were competent and more preferred for predicting Enterobacteriaceae loads with R(2) of 0.89, 0.86 and 0.87 and RMSEs of 0.33, 0.40 and 0.45 log(10)CFUg(-1) for calibration, cross-validation and prediction, respectively. In addition, the constructed prediction map provided the distribution of Enterobacteriaceae bacteria on chicken fillets, which cannot be achieved by conventional methods. It was demonstrated that hyperspectral imaging is a potential tool for determining food sanitation and detecting bacterial pathogens on food matrix without using complicated laboratory regimes. Copyright © 2012 Elsevier Ltd. All rights reserved.
Ugulu, Rex Asibuodu; Allen, Stephen
2017-12-01
The data presented in this article are original data on "Investigating the role of onsite learning in the optimisation of craft gang's productivity in the construction industry". This article describes the constraints influencing craft gang's productivity and the influence of onsite learning on the blockwork craft gang's productivity. It also presents the method of data collection, using a semi-structured interview and an observation method to collect data from construction organisations. We provide statistics on the most important constraints affecting the craft gang's productivity using 3-D bar charts. In addition, we computed the correlation coefficients and the regression model for the influence of onsite learning on craft gang's productivity, using man-hours as the dependent variable. The relationship between blockwork inputs and cycle numbers was determined at the 5% significance level. Finally, we present information on the application of learning curve theory using the unit straight-line model equations and compute the learning rate of the observed craft gang's repetitive blockwork.
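The unit straight-line learning curve model referenced above is log-linear: the time for cycle x is y(x) = a·x^b with learning rate L = 2^b, so each doubling of cycles multiplies the unit time by L. A sketch with illustrative numbers (the 10 man-hours and 80% rate are made up, not from the article's data):

```python
import math

def unit_time(a, rate, x):
    """Wright unit model: y(x) = a * x**b, where b = log(rate)/log(2)."""
    b = math.log(rate) / math.log(2)
    return a * x ** b

def learning_rate(y_x, y_2x):
    """Rate observed between cycle counts x and 2x."""
    return y_2x / y_x

a, rate = 10.0, 0.8          # first cycle takes 10 man-hours, 80% curve
y2 = unit_time(a, rate, 2)   # ≈ 8.0: second cycle is 80% of the first
y4 = unit_time(a, rate, 4)   # ≈ 6.4: fourth cycle is 80% of the second
print(y2, y4, learning_rate(y2, y4))
```

Fitting log(y) against log(x) on observed cycle times recovers a and b, and hence the gang's learning rate.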
Hope and Hopelessness: The Role of Hope in Buffering the Impact of Hopelessness on Suicidal Ideation
Huen, Jenny M. Y.; Ip, Brian Y. T.; Ho, Samuel M. Y.; Yip, Paul S. F.
2015-01-01
Objectives: The present study investigated whether hope and hopelessness are better conceptualized as a single construct on a bipolar spectrum or as two distinct constructs, and whether hope can moderate the relationship between hopelessness and suicidal ideation. Methods: Hope, hopelessness, and suicidal ideation were measured in a community sample of 2106 participants through a population-based household survey. Results: Confirmatory factor analyses showed that a measurement model with separate, correlated second-order factors of hope and hopelessness provided a good fit to the data and was significantly better than a model collapsing hope and hopelessness into a single second-order factor. Negative binomial regression showed that hope and hopelessness interacted such that the effect of hopelessness on suicidal ideation was lower in individuals with higher hope than in individuals with lower hope. Conclusions: Hope and hopelessness are two distinct but correlated constructs. Hope can act as a resilience factor that buffers the impact of hopelessness on suicidal ideation. Inducing hope in people may be a promising avenue for suicide prevention. PMID:26107687
Project risk management in the construction of high-rise buildings
NASA Astrophysics Data System (ADS)
Titarenko, Boris; Hasnaoui, Amir; Titarenko, Roman; Buzuk, Liliya
2018-03-01
This paper presents project risk management methods that make it possible to better identify risks in the construction of high-rise buildings and to manage them throughout the life cycle of the project. One of the project risk management processes is quantitative risk analysis, which usually includes assessing the potential impact of project risks and their probabilities. The paper reviews the most popular methods of risk probability assessment and indicates the advantages of the robust approach over traditional methods. Within the framework of the project risk management model, the robust approach of P. Huber is applied and extended to regression analysis of project data. The suggested algorithms for estimating the parameters of statistical models yield reliable estimates. The theoretical problems of developing robust models built on the methodology of minimax estimation are reviewed, and an algorithm for the situation of asymmetric "contamination" is developed.
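A minimal sketch of the robust idea the paper builds on, Huber's M-estimator fitted by iteratively reweighted least squares, is shown below for simple linear regression; the project data are invented for illustration, and this is not the authors' extended algorithm for asymmetric contamination:

```python
def huber_line(x, y, k=1.345, iters=50):
    """Simple linear regression with Huber's M-estimator via iteratively
    reweighted least squares. k is the usual tuning constant, in units of
    the residual scale (estimated here by the MAD)."""
    n = len(x)
    w = [1.0] * n
    a = b = 0.0
    for _ in range(iters):
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        b = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / \
            sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        a = my - b * mx
        r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        s = sorted(abs(ri) for ri in r)[n // 2] / 0.6745 or 1.0  # MAD scale
        # Huber weights: 1 inside the k*s band, downweighted outside it
        w = [1.0 if abs(ri) <= k * s else k * s / abs(ri) for ri in r]
    return a, b

# Synthetic project data with one gross outlier (true relation ~ y = 2x)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 4.0, 5.9, 8.2, 9.9, 12.1, 50.0, 16.1]
a, b = huber_line(x, y)
print(round(b, 2))
```

Plain OLS on these data gives a slope above 4 because of the single contaminated observation; the Huber fit downweights it and recovers a slope near 2.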
Translational systems pharmacology‐based predictive assessment of drug‐induced cardiomyopathy
Messinis, Dimitris E.; Melas, Ioannis N.; Hur, Junguk; Varshney, Navya; Alexopoulos, Leonidas G.
2018-01-01
Drug-induced cardiomyopathy contributes to drug attrition. We compared two predictive modeling pipelines: (1) applying elastic net (EN) regression to the differentially expressed genes (DEGs) of drugs; (2) applying integer linear programming (ILP) to construct each drug's signaling pathway, from its targets through downstream proteins and transcription factors to its DEGs in human cardiomyocytes, and then subjecting the genes/proteins in the drugs' signaling networks to EN regression. We classified 31 drugs with available DEGs into 13 toxic and 18 nontoxic drugs based on a clinical cardiomyopathy incidence cutoff of 0.1%. The ILP-augmented modeling increased prediction accuracy from 79% to 88% (sensitivity: 88%; specificity: 89%) under leave-one-out cross-validation. The ILP-constructed signaling networks of drugs were better predictors than the DEGs alone. According to the literature, the microRNAs that reportedly regulate expression of our six top predictors are of diagnostic value for natural heart failure or doxorubicin-induced cardiomyopathy. This translational predictive modeling might uncover potential biomarkers. PMID:29341478
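The elastic net step shared by both pipelines can be sketched by cyclic coordinate descent. The data are synthetic (roughly standardized columns, two informative features), and the penalty weights `lam` and `alpha` are assumptions of this sketch rather than the paper's settings:

```python
import random

def elastic_net(X, y, lam=0.5, alpha=0.9, iters=200):
    """Elastic net by cyclic coordinate descent.
    Penalty: lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2),
    applied to the (1/2n) residual sum of squares."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual excluding feature j
            r = [y[i] - sum(beta[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n + lam * (1 - alpha)
            t = lam * alpha
            # soft-thresholding update: small correlations collapse to 0
            beta[j] = (rho - t if rho > t else rho + t if rho < -t else 0.0) / z
    return beta

random.seed(1)
n, p = 120, 6
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true = [3.0, -2.0, 0.0, 0.0, 0.0, 0.0]   # only two informative features
y = [sum(t * x for t, x in zip(true, row)) + random.gauss(0, 0.5) for row in X]
beta = elastic_net(X, y)
print([round(b, 2) for b in beta])
```

The l1 part of the penalty zeroes out the uninformative coefficients (the feature-selection behavior that makes EN attractive for gene lists), while the surviving coefficients are shrunk toward zero.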
Bouwhuis, Stef; Geuskens, Goedele A; Boot, Cécile R L; Bongers, Paulien M; van der Beek, Allard J
2017-08-01
The aim was to construct prediction models for transitions to combination multiple job holding (MJH) (holding multiple jobs as an employee) and hybrid MJH (being an employee and self-employed) among employees aged 45-64. A total of 5187 employees in the Netherlands completed online questionnaires annually between 2010 and 2013. We applied logistic regression analyses with a backward elimination strategy to construct the prediction models. Transitions to combination MJH and hybrid MJH were best predicted by a combination of factors, including demographics, health and mastery, work characteristics, work history, skills and knowledge, social factors, and financial factors. Not having a permanent contract and a poor household financial situation predicted both transitions. Some predictors only predicted combination MJH (e.g., working part-time) or hybrid MJH (e.g., work-home interference). A wide variety of factors predict combination MJH and/or hybrid MJH. The prediction model approach allowed for the identification of predictors that have not been previously studied. © 2017 Wiley Periodicals, Inc.
Development of a Web Service for Analysis in a Distributed Network
Jiang, Xiaoqian; Wu, Yuan; Marsolo, Keith; Ohno-Machado, Lucila
2014-01-01
Objective: We describe functional specifications and practicalities in the software development process for a web service that allows the construction of the multivariate logistic regression model, Grid Logistic Regression (GLORE), by aggregating partial estimates from distributed sites, with no exchange of patient-level data. Background: We recently developed and published a web service for model construction and data analysis in a distributed environment. This recent paper provided an overview of the system that is useful for users, but included very few details that are relevant for biomedical informatics developers or network security personnel who may be interested in implementing this or similar systems. We focus here on how the system was conceived and implemented. Methods: We followed a two-stage development approach by first implementing the backbone system and incrementally improving the user experience through interactions with potential users during the development. Our system went through various stages such as concept proof, algorithm validation, user interface development, and system testing. We used the Zoho Project management system to track tasks and milestones. We leveraged Google Code and Apache Subversion to share code among team members, and developed an applet-servlet architecture to support the cross platform deployment. Discussion: During the development process, we encountered challenges such as Information Technology (IT) infrastructure gaps and limited team experience in user-interface design. We figured out solutions as well as enabling factors to support the translation of an innovative privacy-preserving, distributed modeling technology into a working prototype. 
Conclusion: Using GLORE (a distributed model that we developed earlier) as a pilot example, we demonstrated the feasibility of building and integrating distributed modeling technology into a usable framework that can support privacy-preserving, distributed data analysis among researchers at geographically dispersed institutes. PMID:25848586
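The core GLORE idea, in which each site shares only aggregate gradient and Hessian contributions that a coordinator sums into an ordinary Newton update, can be sketched as follows. The two sites, single covariate, and simulated data are invented for illustration; this sketches the aggregation algebra, not the published applet-servlet web service:

```python
import math, random

def site_stats(X, y, beta):
    """Per-site contribution: gradient and Hessian of the logistic
    log-likelihood computed on local data only (no patient rows leave
    the site, only these aggregates)."""
    p = len(beta)
    g = [0.0] * p
    H = [[0.0] * p for _ in range(p)]
    for row, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
        for j in range(p):
            g[j] += (yi - mu) * row[j]
            for k in range(p):
                H[j][k] += mu * (1 - mu) * row[j] * row[k]
    return g, H

def glore_fit(sites, p=2, iters=25):
    """Coordinator: sum the sites' gradients/Hessians and take Newton
    steps; algebraically this matches the update on pooled data."""
    beta = [0.0] * p
    for _ in range(iters):
        g = [0.0] * p
        H = [[0.0] * p for _ in range(p)]
        for X, y in sites:
            gs, Hs = site_stats(X, y, beta)
            for j in range(p):
                g[j] += gs[j]
                for k in range(p):
                    H[j][k] += Hs[j][k]
        # solve the 2x2 system H * step = g by the explicit inverse
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        step = [(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                (H[0][0] * g[1] - H[1][0] * g[0]) / det]
        beta = [b + s for b, s in zip(beta, step)]
    return beta

random.seed(2)
def make_site(n):
    """Simulate one site: intercept + one covariate, true beta = (-0.5, 1.2)."""
    X = [[1.0, random.gauss(0, 1)] for _ in range(n)]
    y = [1 if random.random() < 1 / (1 + math.exp(-(-0.5 + 1.2 * r[1]))) else 0
         for r in X]
    return X, y

beta = glore_fit([make_site(300), make_site(300)])
print([round(b, 2) for b in beta])
```

Because the pooled gradient and Hessian are sums over patients, summing the per-site sums reproduces the pooled Newton iteration exactly, which is why the distributed fit matches a centralized one without exchanging patient-level data.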
Yu, Ya-Hui; Xia, Wei-Xiong; Shi, Jun-Li; Ma, Wen-Juan; Li, Yong; Ye, Yan-Fang; Liang, Hu; Ke, Liang-Ru; Lv, Xing; Yang, Jing; Xiang, Yan-Qun; Guo, Xiang
2016-06-29
For patients with nasopharyngeal carcinoma (NPC) who undergo re-irradiation with intensity-modulated radiotherapy (IMRT), lethal nasopharyngeal necrosis (LNN) is a severe late adverse event. The purpose of this study was to identify risk factors for LNN and develop a model to predict LNN after radical re-irradiation with IMRT in patients with recurrent NPC. Patients who underwent radical re-irradiation with IMRT for locally recurrent NPC between March 2001 and December 2011 and who had no evidence of distant metastasis were included in this study. Clinical characteristics, including recurrent carcinoma conditions and dosimetric features, were evaluated as candidate risk factors for LNN. Logistic regression analysis was used to identify independent risk factors and construct the predictive scoring model. Among 228 patients enrolled in this study, 204 were at risk of developing LNN based on risk analysis. Of the 204 patients treated, 31 (15.2%) developed LNN. Logistic regression analysis showed that female sex (P = 0.008), necrosis before re-irradiation (P = 0.008), accumulated total prescription dose to the gross tumor volume (GTV) ≥145.5 Gy (P = 0.043), and recurrent tumor volume ≥25.38 cm(3) (P = 0.009) were independent risk factors for LNN. A model to predict LNN was then constructed that included these four independent risk factors. A model that includes sex, necrosis before re-irradiation, accumulated total prescription dose to GTV, and recurrent tumor volume can effectively predict the risk of developing LNN in NPC patients who undergo radical re-irradiation with IMRT.
NASA Astrophysics Data System (ADS)
Polat, Esra; Gunay, Suleyman
2013-10-01
One of the problems encountered in Multiple Linear Regression (MLR) is multicollinearity, which causes overestimation of the regression parameters and inflates their variances. Hence, when multicollinearity is present, biased estimation procedures such as classical Principal Component Regression (CPCR) and Partial Least Squares Regression (PLSR) are performed. The SIMPLS algorithm is the leading PLSR algorithm because of its speed and efficiency, and because its results are easier to interpret. However, both CPCR and SIMPLS yield very unreliable results when the data set contains outlying observations. Therefore, Hubert and Vanden Branden (2003) presented a robust PCR (RPCR) method and a robust PLSR (RPLSR) method called RSIMPLS. In RPCR, a robust Principal Component Analysis (PCA) method for high-dimensional data is first applied to the independent variables; the dependent variables are then regressed on the scores using a robust regression method. RSIMPLS is constructed from a robust covariance matrix for high-dimensional data and robust linear regression. The purpose of this study is to demonstrate the use of the RPCR and RSIMPLS methods on an econometric data set by comparing the two methods on an inflation model of Turkey. The methods are compared in terms of predictive ability and goodness of fit using a robust Root Mean Squared Error of Cross-validation (R-RMSECV), a robust R² value and the Robust Component Selection (RCS) statistic.
Trends in the Surgical Management of Crohn's Disease.
Geltzeiler, Cristina B; Hart, Kyle D; Lu, Kim C; Deveney, Karen E; Herzig, Daniel O; Tsikitis, Vassiliki L
2015-10-01
Although the medical management of Crohn's disease has changed in recent years, it is unclear whether surgical management has altered. We examined changes in the rates of surgical interventions and stoma constructions, including the subsets of ileostomy and colostomy constructions. We reviewed the Nationwide Inpatient Sample database from 1988 to 2011, examined the number of Crohn's-related operations and stoma constructions (including ileostomies and colostomies), and developed a multivariable logistic regression model. A total of 355,239 Crohn's-related operations were analyzed. Operations increased from 13,955 in 1988 to 17,577 in 2011, p < 0.001. Stoma constructions increased from 2493 to 4283, p < 0.001. The subset of ileostomies increased from 1201 to 3169, p < 0.001, while colostomies decreased from 1351 to 1201, p = 0.05. The percentage of operations resulting in stoma construction increased from 18% to 24%, p < 0.001. Weight loss (OR 2.25, 95% CI 1.88, 2.69) and the presence of perianal fistulizing disease (OR 2.91, 95% CI 2.31, 3.67) were the strongest predictors of requiring stoma construction. Crohn's-related surgical interventions and stoma constructions have increased. The largest predictors of stoma construction are weight loss and perianal fistulizing disease; as a result, nutrition should be optimized and the early involvement of a multidisciplinary team should be considered.
Predictors of Indoor Air Concentrations in Smoking and Non-Smoking Residences
Héroux, Marie-Eve; Clark, Nina; Van Ryswyk, Keith; Mallick, Ranjeeta; Gilbert, Nicolas L.; Harrison, Ian; Rispler, Kathleen; Wang, Daniel; Anastassopoulos, Angelos; Guay, Mireille; MacNeill, Morgan; Wheeler, Amanda J.
2010-01-01
Indoor concentrations of air pollutants (benzene, toluene, formaldehyde, acetaldehyde, acrolein, nitrogen dioxide, particulate matter, elemental carbon and ozone) were measured in residences in Regina, Saskatchewan, Canada. Data were collected in 106 homes in winter and 111 homes in summer of 2007, with 71 homes participating in both seasons. In addition, data for relative humidity, temperature, air exchange rates, housing characteristics and occupants’ activities during sampling were collected. Multiple linear regression analysis was used to construct season-specific models for the air pollutants. Where smoking was a major contributor to indoor concentrations, separate models were constructed for all homes and for those homes with no cigarette smoke exposure. The housing characteristics and occupants’ activities investigated in this study explained between 11% and 53% of the variability in indoor air pollutant concentrations, with ventilation, age of home and attached garage being important predictors for many pollutants. PMID:20948949
NASA Astrophysics Data System (ADS)
Pang, Guofei; Perdikaris, Paris; Cai, Wei; Karniadakis, George Em
2017-11-01
The fractional advection-dispersion equation (FADE) can accurately describe solute transport in groundwater, but its fractional order has to be determined a priori. Here, we employ multi-fidelity Bayesian optimization to obtain the fractional order under various conditions, and we obtain more accurate results than previously published data. Moreover, the present method is very efficient, as we use different levels of resolution to construct a stochastic surrogate model and quantify its uncertainty. We consider two problem setups. In the first, we obtain variable fractional orders of the one-dimensional FADE, considering both synthetic and field data. In the second, we identify constant fractional orders of the two-dimensional FADE using synthetic data. We employ multi-resolution simulations using two-level and three-level Gaussian process regression models to construct the surrogates.
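A single-fidelity building block of such a surrogate, Gaussian process regression with an RBF kernel, can be sketched as follows; the toy target function, lengthscale, and noise level are assumptions of this sketch, and the multi-fidelity coupling between resolution levels is not shown:

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for small dense systems."""
    n = len(A)
    A = [row[:] for row in A]
    b = b[:]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def gp_predict(xs, ys, x_star, length=0.2, noise=1e-4):
    """GP regression with an RBF kernel: the predictive mean at x_star is
    k_*' (K + noise*I)^{-1} y."""
    k = lambda a, b: math.exp(-(a - b) ** 2 / (2 * length ** 2))
    n = len(xs)
    K = [[k(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    alpha = solve(K, ys)                      # alpha = (K + noise*I)^{-1} y
    return sum(k(x_star, xi) * ai for xi, ai in zip(xs, alpha))

# Toy stand-in for an expensive forward solve (illustration only)
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [math.sin(2 * math.pi * x) for x in xs]
print(round(gp_predict(xs, ys, 0.125), 3))
```

In a multi-fidelity setting, a second GP models the discrepancy between a cheap low-resolution solver and a few expensive high-resolution runs, so most training points can come from the cheap level.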
Liggett, Jacqueline; Sellbom, Martin; Carmichael, Kieran L C
2017-12-01
The current study examined the extent to which the trait-based operationalization of obsessive-compulsive personality disorder (OCPD) in Section III of the DSM-5 describes the same construct as the one described in Section II. A community sample of 313 adults completed a series of personality inventories indexing the DSM-5 Sections II and III diagnostic criteria for OCPD, in addition to a measure of functional impairment modelled after the criteria in Section III. Results indicated that latent constructs representing Section II and Section III OCPD overlapped substantially (r = .75, p < .001). Hierarchical latent regression models revealed that at least three of the four DSM-5 Section III facets (Rigid Perfectionism, Perseveration, and Intimacy Avoidance) uniquely accounted for a large proportion of variance (53%) in a latent Section II OCPD variable. Further, Anxiousness and (low) Impulsivity, as well as self and interpersonal impairment, augmented the prediction of latent OCPD scores.
Preoperative predictive model of recovery of urinary continence after radical prostatectomy.
Matsushita, Kazuhito; Kent, Matthew T; Vickers, Andrew J; von Bodman, Christian; Bernstein, Melanie; Touijer, Karim A; Coleman, Jonathan A; Laudone, Vincent T; Scardino, Peter T; Eastham, James A; Akin, Oguz; Sandhu, Jaspreet S
2015-10-01
To build a predictive model of urinary continence recovery after radical prostatectomy (RP) that incorporates magnetic resonance imaging (MRI) parameters and clinical data. We conducted a retrospective review of data from 2,849 patients who underwent pelvic staging MRI before RP from November 2001 to June 2010. We used logistic regression to evaluate the association between each MRI variable and continence at 6 or 12 months, adjusting for age, body mass index (BMI) and American Society of Anesthesiologists (ASA) score, and then used multivariable logistic regression to create our model. A nomogram was constructed using the multivariable logistic regression models. In all, 68% (1,742/2,559) and 82% (2,205/2,689) regained function at 6 and 12 months, respectively. In the base model, age, BMI and ASA score were significant predictors of continence at 6 or 12 months on univariate analysis (P < 0.005). Among the preoperative MRI measurements, membranous urethral length, which showed great significance, was incorporated into the base model to create the full model. For continence recovery at 6 months, the addition of membranous urethral length increased the area under the curve (AUC) to 0.664 for the validation set, an increase of 0.064 over the base model. For continence recovery at 12 months, the AUC was 0.674, an increase of 0.085 over the base model. Using our model, the likelihood of continence recovery increases with membranous urethral length and decreases with age, BMI and ASA score. This model could be used for patient counselling and for the identification of patients at high risk for urinary incontinence in whom to study changes in operative technique that improve urinary function after RP. © 2015 The Authors BJU International © 2015 BJU International Published by John Wiley & Sons Ltd.
Sritara, C; Thakkinstian, A; Ongphiphadhanakul, B; Chailurkit, L; Chanprasertyothin, S; Ratanachaiwong, W; Vathesatogkit, P; Sritara, P
2014-05-01
Using mediation analysis, a causal relationship between the AHSG gene and bone mineral density (BMD) through fetuin-A and body mass index (BMI) mediators was suggested. Fetuin-A, a multifunctional protein of hepatic origin, is associated with bone mineral density. It is unclear if this association is causal. This study aimed at clarification of this issue. A cross-sectional study was conducted among 1,741 healthy workers from the Electricity Generating Authority of Thailand (EGAT) cohort. The alpha-2-Heremans-Schmid glycoprotein (AHSG) rs2248690 gene was genotyped. Three mediation models were constructed using seemingly unrelated regression analysis. First, the ln[fetuin-A] group was regressed on the AHSG gene. Second, the BMI group was regressed on the AHSG gene and the ln[fetuin-A] group. Finally, the BMD model was constructed by fitting BMD on two mediators (ln[fetuin-A] and BMI) and the independent AHSG variable. All three analyses were adjusted for confounders. The prevalence of the minor T allele for the AHSG locus was 15.2%. The AHSG locus was highly related to serum fetuin-A levels (P < 0.001). Multiple mediation analyses showed that AHSG was significantly associated with BMD through the ln[fetuin-A] and BMI pathway, with beta coefficients of 0.0060 (95% CI 0.0038, 0.0083) and 0.0030 (95% CI 0.0020, 0.0045) at the total hip and lumbar spine, respectively. About 27.3 and 26.0% of total genetic effects on hip and spine BMD, respectively, were explained by the mediation effects of fetuin-A and BMI. Our study suggested evidence of a causal relationship between the AHSG gene and BMD through fetuin-A and BMI mediators.
Singal, Amit G.; Mukherjee, Ashin; Elmunzer, B. Joseph; Higgins, Peter DR; Lok, Anna S.; Zhu, Ji; Marrero, Jorge A; Waljee, Akbar K
2015-01-01
Background Predictive models for hepatocellular carcinoma (HCC) have been limited by modest accuracy and lack of validation. Machine learning algorithms offer a novel methodology, which may improve HCC risk prognostication among patients with cirrhosis. Our study's aim was to develop and compare predictive models for HCC development among cirrhotic patients, using conventional regression analysis and machine learning algorithms. Methods We enrolled 442 patients with Child A or B cirrhosis at the University of Michigan between January 2004 and September 2006 (UM cohort) and prospectively followed them until HCC development, liver transplantation, death, or study termination. Regression analysis and machine learning algorithms were used to construct predictive models for HCC development, which were tested on an independent validation cohort from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial. Both models were also compared to the previously published HALT-C model. Discrimination was assessed using receiver operating characteristic curve analysis and diagnostic accuracy was assessed with net reclassification improvement and integrated discrimination improvement statistics. Results After a median follow-up of 3.5 years, 41 patients developed HCC. The UM regression model had a c-statistic of 0.61 (95%CI 0.56-0.67), whereas the machine learning algorithm had a c-statistic of 0.64 (95%CI 0.60–0.69) in the validation cohort. The machine learning algorithm had significantly better diagnostic accuracy as assessed by net reclassification improvement (p<0.001) and integrated discrimination improvement (p=0.04). The HALT-C model had a c-statistic of 0.60 (95%CI 0.50-0.70) in the validation cohort and was outperformed by the machine learning algorithm (p=0.047). Conclusion Machine learning algorithms improve the accuracy of risk stratifying patients with cirrhosis and can be used to accurately identify patients at high-risk for developing HCC. 
PMID:24169273
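The c-statistic used above to compare the models is just pairwise concordance and can be computed directly; the risk scores below are hypothetical, not the study's predictions:

```python
from itertools import combinations

def c_statistic(risk, event):
    """Concordance (c-statistic / AUC): among all pairs with different
    outcomes, the fraction in which the event case received the higher
    risk score; ties count one half."""
    pairs = conc = 0.0
    for i, j in combinations(range(len(risk)), 2):
        if event[i] == event[j]:
            continue
        pairs += 1
        hi, lo = (i, j) if event[i] else (j, i)
        if risk[hi] > risk[lo]:
            conc += 1
        elif risk[hi] == risk[lo]:
            conc += 0.5
    return conc / pairs

# Hypothetical risk scores for 4 cases (event = 1) and 6 non-cases (0)
risk  = [0.9, 0.8, 0.6, 0.3, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1]
event = [1,   1,   1,   1,   0,   0,   0,   0,   0,   0  ]
print(c_statistic(risk, event))  # → 0.8125
```

A value of 0.5 means the scores rank cases no better than chance, which is why the gap between 0.60 and 0.64 in the abstract, though small in absolute terms, is meaningful on this scale.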
Empirical Assessment of Spatial Prediction Methods for Location Cost Adjustment Factors
Migliaccio, Giovanni C.; Guindani, Michele; D'Incognito, Maria; Zhang, Linlin
2014-01-01
In the feasibility stage, the correct prediction of construction costs ensures that budget requirements are met from the start of a project's lifecycle. A very common approach for performing quick order-of-magnitude estimates is based on Location Cost Adjustment Factors (LCAFs), which compute historically based costs by project location. Numerous LCAF datasets are commercially available in North America, but they obviously do not include all locations. Hence, LCAFs for un-sampled locations need to be inferred through spatial interpolation or prediction methods. Currently, practitioners tend to select the value for a location using only one variable, namely the nearest linear distance between two sites. However, construction costs could be affected by socio-economic variables, as suggested by macroeconomic theories. Using a commonly used set of LCAFs, the City Cost Indexes (CCI) by RSMeans, and the socio-economic variables included in the ESRI Community Sourcebook, this article provides several contributions to the body of knowledge. First, the accuracy of various spatial prediction methods in estimating LCAF values for un-sampled locations was evaluated and assessed with respect to spatial interpolation methods. Two regression-based prediction models were selected: a global regression analysis and a geographically weighted regression (GWR) analysis. When these models were compared against interpolation methods, the results showed that GWR is the most appropriate way to model CCI as a function of multiple covariates. The outcome of GWR for each covariate was studied for all 48 states in the contiguous US. As a direct consequence of spatial non-stationarity, it was possible to discuss the influence of each covariate separately from state to state. In addition, the article includes a first attempt to determine whether the observed variability in cost index values could be, at least partially, explained by independent socio-economic variables.
PMID:25018582
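The GWR idea, refitting the regression at each location with distance-decayed weights so that coefficients can vary over space, can be sketched as follows; the locations, bandwidth, and regional effects are synthetic stand-ins, not the CCI data:

```python
import math

def gwr_coef(locs, x, y, target, bandwidth=1.0):
    """Geographically weighted regression at one target location:
    weighted least squares with Gaussian kernel weights based on the
    distance from each observation to the target."""
    w = [math.exp(-((lx - target[0]) ** 2 + (ly - target[1]) ** 2)
                  / (2 * bandwidth ** 2)) for lx, ly in locs]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    slope = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / \
            sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    return slope, my - slope * mx

# Synthetic non-stationary data: the covariate's effect on the cost index
# is +2 in the "west" cluster (x-coord 0) and +5 in the "east" (x-coord 10)
locs, x, y = [], [], []
for i in range(40):
    lx = 10.0 * (i % 2)              # two clusters of locations
    locs.append((lx, float(i % 5)))
    xi = float(i % 7)
    x.append(xi)
    y.append((2.0 if lx == 0.0 else 5.0) * xi + 1.0)

west, _ = gwr_coef(locs, x, y, target=(0.0, 2.0))
east, _ = gwr_coef(locs, x, y, target=(10.0, 2.0))
print(round(west, 2), round(east, 2))  # → 2.0 5.0
```

A single global regression would return one compromise slope for both regions; GWR recovers the local effect at each target, which is the spatial non-stationarity the article exploits state by state.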
Constructing inverse probability weights for continuous exposures: a comparison of methods.
Naimi, Ashley I; Moodie, Erica E M; Auger, Nathalie; Kaufman, Jay S
2014-03-01
Inverse probability-weighted marginal structural models with binary exposures are common in epidemiology. Constructing inverse probability weights for a continuous exposure can be complicated by the presence of outliers, and the need to identify a parametric form for the exposure and account for nonconstant exposure variance. We explored the performance of various methods to construct inverse probability weights for continuous exposures using Monte Carlo simulation. We generated two continuous exposures and binary outcomes using data sampled from a large empirical cohort. The first exposure followed a normal distribution with homoscedastic variance. The second exposure followed a contaminated Poisson distribution, with heteroscedastic variance equal to the conditional mean. We assessed six methods to construct inverse probability weights using: a normal distribution, a normal distribution with heteroscedastic variance, a truncated normal distribution with heteroscedastic variance, a gamma distribution, a t distribution (1, 3, and 5 degrees of freedom), and a quantile binning approach (based on 10, 15, and 20 exposure categories). We estimated the marginal odds ratio for a single-unit increase in each simulated exposure in a regression model weighted by the inverse probability weights constructed using each approach, and then computed the bias and mean squared error for each method. For the homoscedastic exposure, the standard normal, gamma, and quantile binning approaches performed best. For the heteroscedastic exposure, the quantile binning, gamma, and heteroscedastic normal approaches performed best. Our results suggest that the quantile binning approach is a simple and versatile way to construct inverse probability weights for continuous exposures.
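For the homoscedastic normal case above, stabilized weights can be sketched as a ratio of a marginal to a conditional normal density; the single confounder `L` and the simulated data are assumptions of this sketch, not the paper's simulation design:

```python
import math, random

def normal_pdf(x, mu, sd):
    return math.exp(-((x - mu) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def stabilized_weights(L, A):
    """Stabilized inverse probability weights for a continuous exposure A
    with one confounder L, assuming a normal exposure model:
    numerator = marginal density of A; denominator = density of A given L,
    with the conditional mean from a linear regression of A on L."""
    n = len(A)
    mA = sum(A) / n                           # marginal normal fit
    sA = math.sqrt(sum((a - mA) ** 2 for a in A) / (n - 1))
    mL = sum(L) / n                           # conditional model A = b0 + b1*L + e
    b1 = sum((l - mL) * (a - mA) for l, a in zip(L, A)) / \
         sum((l - mL) ** 2 for l in L)
    b0 = mA - b1 * mL
    resid = [a - (b0 + b1 * l) for l, a in zip(L, A)]
    sE = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))
    return [normal_pdf(a, mA, sA) / normal_pdf(a, b0 + b1 * l, sE)
            for l, a in zip(L, A)]

random.seed(3)
L = [random.gauss(0, 1) for _ in range(500)]
A = [0.5 * l + random.gauss(0, 1) for l in L]  # homoscedastic normal exposure
w = stabilized_weights(L, A)
print(round(sum(w) / len(w), 2))
```

Stabilized weights should average close to 1; a mean far from 1 or a few extreme weights is the usual warning sign of a misspecified denominator model, which motivates the paper's comparison of distributional choices.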
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of the Durbin-Watson statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then, by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 Chinese regions. The results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
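A Moran-type index in the spirit of the paper's statistics (not its exact definitions) can be sketched as a quadratic form in the demeaned residuals with a row-normalized weight matrix:

```python
def residual_autocorrelation(W, e):
    """Moran-type serial correlation index of regression residuals:
    demean the residual vector, row-normalize the spatial weight matrix,
    then take the quadratic form e'We / e'e. Positive values indicate
    that neighboring residuals tend to agree in sign."""
    n = len(e)
    m = sum(e) / n
    z = [ei - m for ei in e]
    ss = sum(zi ** 2 for zi in z)
    Wn = [[wij / (sum(row) or 1.0) for wij in row] for row in W]
    num = sum(z[i] * Wn[i][j] * z[j] for i in range(n) for j in range(n))
    return num / ss

# 5 regions on a line; neighbors share an edge (binary contiguity)
W = [[1 if abs(i - j) == 1 else 0 for j in range(5)] for i in range(5)]
pos = [1.0, 2.0, 3.0, 4.0, 5.0]    # smoothly trending residuals -> positive index
alt = [1.0, -1.0, 1.0, -1.0, 1.0]  # alternating residuals -> negative index
print(round(residual_autocorrelation(W, pos), 2),
      round(residual_autocorrelation(W, alt), 2))  # → 0.6 -1.0
```

Unlike the Durbin-Watson statistic, the index depends only on the weight matrix, not on any ordering of the observations, which is the property the paper needs for spatial random samples.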
Gu, Ja K; Charles, Luenda E; Fekedulegn, Desta; Ma, Claudia C; Andrew, Michael E; Burchfiel, Cecil M
2016-04-01
The aim of this study was to estimate the prevalence of injury by occupation and industry and to assess the role of obesity. Self-reported injuries were collected annually for US workers during 2004 to 2013. Prevalence ratios (PRs) and 95% confidence intervals (CIs) were obtained from fitted logistic regression models. Overall weighted injury prevalence during the previous three months was 77 per 10,000 workers. Age-adjusted injury prevalence was greatest for Construction and Extraction workers (169.7/10,000), followed by Production (160.6), among occupations, while workers in the Construction industry sector (147.9) had the highest injury prevalence, followed by the Agriculture/Forestry/Fishing/Mining/Utilities sector (122.1). Overweight and obese workers were 26% to 45% more likely to experience injuries than normal-weight workers. The prevalence of injury, highest for Construction workers, gradually increased as body mass index increased in most occupational and industry groups.
Analysis of ethnic disparities in workers' compensation claims using data linkage.
Friedman, Lee S; Ruestow, Peter; Forst, Linda
2012-10-01
The overall goal of this research project was to assess ethnic disparities in monetary compensation among construction workers injured on the job through the linkage of medical records and workers' compensation data. Probabilistic linkage of medical records with workers' compensation claim data. In the final multivariable robust regression model, compensation was $5824 higher (P = 0.030; 95% confidence interval: 551 to 11,097) for white non-Hispanic workers than for other ethnic groups when controlling for injury severity, affected body region, type of injury, average weekly wage, weeks of temporary total disability, percent permanent partial disability, death, or attorney use. The analysis indicates that white non-Hispanic construction workers are awarded higher monetary settlements despite the observation that for specific injuries the mean temporary total disability and permanent partial disability were equivalent to or lower than those in Hispanic and black construction workers.
Stephens, Peggy C; Sloboda, Zili; Stephens, Richard C; Teasdale, Brent; Grey, Scott F; Hawthorne, Richard D; Williams, Joseph
2009-06-01
We examined the relationships among the targeted constructs of social influences and competence enhancement prevention curricula and cigarette, alcohol, and marijuana use outcomes in a diverse sample of high school students. We tested the causal relationships of normative beliefs, perceptions of harm, attitudes toward use of these substances, and refusal, communication, and decision-making skills in predicting the self-reported use of each substance. In addition, we modeled the mediation of these constructs through intentions to use each substance and tested the moderating effects of the skills variables on the relationships between intentions to use and self-reported use of each of these substances. Logistic regression path models were constructed for each of the drug use outcomes. Models were run in the Mplus 5.0 statistical application using the complex-sample function to control for the sampling design of students nested within schools; full information maximum likelihood (FIML) estimates were used to address missing data. Relationships among targeted constructs and outcomes differed for each of the drugs, with communication skills having a potentially iatrogenic effect on alcohol use. Program targets were mediated through intentions to use these substances. We also found evidence of a moderating effect of decision-making skills on perceptions of harm and attitudes toward use, depending upon the outcome. Prevention curricula may need to target specific drugs. In addition to normative beliefs, perceptions of harm, and refusal and decision-making skills, programs should directly target constructs proximal to behavioral outcomes, such as attitudes and intentions. Finally, more research should examine the effects of communication skills on adolescent substance use.
Validity of VO(2 max) in predicting blood volume: implications for the effect of fitness on aging
NASA Technical Reports Server (NTRS)
Convertino, V. A.; Ludwig, D. A.
2000-01-01
A multiple regression model was constructed to investigate the premise that blood volume (BV) could be predicted using several anthropometric variables, age, and maximal oxygen uptake (VO(2 max)). To test this hypothesis, age, calculated body surface area (height/weight composite), percent body fat (hydrostatic weight), and VO(2 max) were regressed onto BV using data obtained from 66 normal healthy men. Results from the evaluation of the full model indicated that the most parsimonious result was obtained when age and VO(2 max) were regressed on BV expressed per kilogram body weight. The full model accounted for 52% of the total variance in BV per kilogram body weight. Both age and VO(2 max) were related to BV in the positive direction. Percent body fat contributed <1% to the explained variance in BV when expressed in absolute BV (ml) or as BV per kilogram body weight. When the model was cross validated on 41 new subjects and BV per kilogram body weight was reexpressed as raw BV, the results indicated that the statistical model would be stable under cross validation (e.g., predictive applications) with an accuracy of +/- 1,200 ml at 95% confidence. Our results support the hypothesis that BV is an increasing function of aerobic fitness and, to a lesser extent, the age of the subject. The results may have implications for a mechanism by which aerobic fitness and activity may be protective against the reduced BV associated with aging.
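The regression approach described in the abstract above can be sketched as follows. This is a minimal illustration with synthetic data: the dataset and coefficients are invented, and only the model form (BV per kg regressed on age and VO(2 max)) follows the abstract.

```python
import numpy as np

# Synthetic illustration of the abstract's model form: regress blood volume
# per kg body weight (ml/kg) on age (years) and VO2max (ml/kg/min).
# All numbers below are invented for demonstration purposes.
rng = np.random.default_rng(0)
n = 66
age = rng.uniform(20, 60, n)
vo2max = rng.uniform(30, 60, n)
bv_kg = 40 + 0.1 * age + 0.5 * vo2max + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), age, vo2max])   # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, bv_kg, rcond=None)  # ordinary least squares

pred = X @ beta
r2 = 1 - np.sum((bv_kg - pred) ** 2) / np.sum((bv_kg - bv_kg.mean()) ** 2)
print(beta, r2)
```

Both fitted slopes come out positive here by construction, mirroring the abstract's finding that age and VO(2 max) relate to BV in the positive direction.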
González-Madroño, A; Mancha, A; Rodríguez, F J; Culebras, J; de Ulibarri, J I
2012-01-01
To ratify previous validations of the CONUT nutritional screening tool by developing two probabilistic models using the parameters included in the CONUT, and to see whether the CONUT's effectiveness could be improved. This was a two-step prospective study. In Step 1, 101 patients were randomly selected, and the SGA and CONUT were performed. With the data obtained, an unconditional logistic regression model was developed, and two variants of the CONUT were constructed: Model 1 was built by logistic regression; Model 2 was built by dividing the probabilities of undernutrition obtained in Model 1 into seven regular intervals. In Step 2, 60 patients were selected and underwent the SGA, the original CONUT and the new models. The diagnostic efficacy of the original CONUT and the new models was tested by means of ROC curves. Samples 1 and 2 were then pooled to measure the degree of agreement between the original CONUT and the SGA, and diagnostic efficacy parameters were calculated. No statistically significant differences were found between samples 1 and 2 regarding age, sex or medical/surgical distribution, and undernutrition rates were similar (over 40%). The AUCs for the ROC curves were 0.862 for the original CONUT, and 0.839 and 0.874 for Models 1 and 2, respectively. The kappa index for the CONUT and SGA was 0.680. The CONUT, with the original scores assigned by its authors, performs as well as the mathematical models and is thus a valuable, highly useful and efficient tool for clinical undernutrition screening.
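The ROC AUC comparison used in the abstract above can be computed without specialized software via the Mann-Whitney formulation. A minimal sketch with hypothetical screening scores (the data are invented; only the AUC criterion follows the abstract):

```python
import numpy as np

def auc(y_true, score):
    """Area under the ROC curve via the Mann-Whitney statistic: the
    probability that a randomly chosen positive case scores higher than
    a randomly chosen negative case (ties count one half)."""
    y_true = np.asarray(y_true, dtype=bool)
    pos = np.asarray(score, float)[y_true]
    neg = np.asarray(score, float)[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical data: 1 = undernourished per the reference method (SGA),
# scores mimic a CONUT-like ordinal screening score.
y = [0, 0, 0, 1, 1, 0, 1, 1]
tool = [1, 2, 2, 4, 5, 3, 3, 6]
print(auc(y, tool))
```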
[Mathematical modeling for conditionality of cardiovascular disease by housing conditions].
Meshkov, N A
2014-01-01
The influence of living conditions (housing area per capita; availability of water supply, sewerage and central heating) on cardiovascular disease morbidity in children and adults was studied. Regression analysis established that the morbidity rate decreases significantly as housing area increases; the constructed models are statistically significant (p = 0.01 and p = 0.02, respectively). A relationship was also revealed between cardiovascular morbidity in children and adults and the provision of central heating (p = 0.02 and p = 0.009).
Limb-darkening and the structure of the Jovian atmosphere
NASA Technical Reports Server (NTRS)
Newman, W. I.; Sagan, C.
1978-01-01
By observing the transit of various cloud features across the Jovian disk, limb-darkening curves were constructed for three regions in the 4.6 to 5.1 μm band. Several models currently employed in describing the radiative or dynamical properties of planetary atmospheres are examined here to understand their implications for limb-darkening. The statistical problem of fitting these models to the observed data is reviewed, and methods for applying multiple regression analysis are discussed. Analysis of variance techniques are introduced to test the viability of a given physical process as a cause of the observed limb-darkening.
Chin, Shao-Hua; Kahathuduwa, Chanaka N; Stearns, Macy B; Davis, Tyler; Binks, Martin
2018-01-01
We considered 1) the influence of self-reported hunger on behavioral and fMRI food-cue reactivity (fMRI-FCR) and 2) optimal methods to model this. Adults (N = 32; 19-60 years; F = 21; BMI 30-39.9 kg/m²) participated in an fMRI-FCR task that required rating 240 images of food and matched objects for 'appeal'. Hunger, satiety, thirst, fullness and emptiness were measured pre- and post-scan (visual analogue scales). Hunger, satiety, fullness and emptiness were combined to form a latent factor (appetite). Post- vs. pre-scores were compared using paired t-tests. In mixed-effects models, appeal/fMRI-FCR responses were regressed on image type (i.e. food/objects), with random intercepts and slopes of image for functional runs nested within subjects. Each of hunger, satiety, thirst, fullness, emptiness and appetite was added as a covariate in 4 forms (separate models): 1) change; 2) post- and pre-mean; 3) pre-; 4) change and pre-. Satiety decreased (Δ = -13.39, p = 0.001) and thirst increased (Δ = 11.78, p = 0.006) during the scan. Changes in the other constructs were not significant (p's > 0.05). Including the covariates did not influence the food vs. object contrast of appeal ratings/fMRI-FCR. Significant image × covariate interactions were observed in some fMRI models. However, including these constructs did not improve the overall model fit. While some subjective, self-reported hunger, satiety and related constructs may moderate fMRI-FCR, these constructs do not appear to be salient influences on appeal/fMRI-FCR in people with obesity undergoing fMRI. Copyright © 2017 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Palou, Anna; Miró, Aira; Blanco, Marcelo; Larraz, Rafael; Gómez, José Francisco; Martínez, Teresa; González, Josep Maria; Alcalà, Manel
2017-06-01
Even though the feasibility of using near infrared (NIR) spectroscopy combined with partial least squares (PLS) regression for prediction of physico-chemical properties of biodiesel/diesel blends has been widely demonstrated, including the whole variability of diesel samples from diverse production origins in the calibration sets still remains an important challenge when constructing the models. This work presents a useful strategy for the systematic selection of calibration sets of samples of biodiesel/diesel blends from diverse origins, based on a binary code, principal component analysis (PCA) and the Kennard-Stone algorithm. Results show that using this methodology the models can keep their robustness over time. PLS calculations were done using specialized chemometric software as well as the software of the NIR instrument installed in the plant, and both produced RMSEP values below the reproducibility values of the reference methods. The models were proven in on-line simultaneous determination of seven properties: density, cetane index, fatty acid methyl esters (FAME) content, cloud point, boiling point at 95% of recovery, flash point and sulphur.
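The Kennard-Stone algorithm mentioned in the abstract above selects a calibration set that spans the sample space: start from the most distant pair, then repeatedly add the sample farthest from everything already selected. A minimal sketch (the toy points are invented):

```python
import numpy as np

def kennard_stone(X, k):
    """Select k calibration samples spanning the data space (Kennard-Stone).
    Start from the pair with the largest mutual distance, then repeatedly
    add the sample whose minimum distance to the selected set is largest."""
    X = np.asarray(X, float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    i, j = np.unravel_index(np.argmax(d), d.shape)              # most distant pair
    selected = [int(i), int(j)]
    while len(selected) < k:
        remaining = [m for m in range(len(X)) if m not in selected]
        # distance from each remaining point to its nearest selected point
        mins = d[np.ix_(remaining, selected)].min(axis=1)
        selected.append(remaining[int(np.argmax(mins))])
    return selected

pts = [[0, 0], [0, 1], [5, 5], [10, 10], [10, 9]]
print(kennard_stone(pts, 3))  # picks the two extremes, then the midpoint
```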
Data Analysis & Statistical Methods for Command File Errors
NASA Technical Reports Server (NTRS)
Meshkat, Leila; Waggoner, Bruce; Bryant, Larry
2014-01-01
This paper explains current work on modeling for managing the risk of command file errors. It is focused on analyzing actual data from a JPL spaceflight mission to build models for evaluating and predicting error rates as a function of several key variables. We constructed a rich dataset by considering the number of errors and the number of files radiated, including the number of commands and blocks in each file, as well as subjective estimates of workload and operational novelty. We have assessed these data using different curve fitting and distribution fitting techniques, such as multiple regression analysis and maximum likelihood estimation, to see how much of the variability in the error rates can be explained with these. We have also used goodness-of-fit testing strategies and principal component analysis to further assess our data. Finally, we constructed a model of expected error rates based on what these statistics bore out as the critical drivers of the error rate. This model allows project management to evaluate the error rate against a theoretically expected rate as well as anticipate future error rates.
Lowenstein, Ariela
2007-03-01
The purpose of this study was to test empirically two major conceptualizations of parent-child relations in later adulthood, the intergenerational solidarity-conflict and ambivalence paradigms, and their predictive validity on elders' quality of life using comparative cross-national data. Data were from a sample of 2,064 elders (aged 75 and older) from the five-country OASIS study (Old Age and Autonomy: The Role of Service Systems and Intergenerational Family Solidarity; Norway, England, Germany, Spain, and Israel). Multivariate and block-recursive regression models estimated the predictive power of the two conceptualizations of family dynamics on quality of life, controlling for country, personal characteristics, and activity of daily living functioning. Descriptive analyses indicated that family solidarity, especially the affective/cognitive component (called Solidarity A), was high in all five countries, whereas conflict and ambivalence were low. When I entered all three constructs into the regression, Solidarity A, reciprocal intergenerational support and ambivalence predicted quality of life. Controlling for activity of daily living functioning, socioeconomic status, and country, intergenerational relations had only a weak explanatory power, and personal resources explained most of the variance. The data suggest that the three constructs exist simultaneously but in varying combinations, confirming that in cross-cultural contexts family cohesion predominates, albeit with low degrees of conflict and ambivalence. The solidarity construct evidenced relatively robust measurement. More work is required to enhance the ambivalence measurement.
2004-03-01
constant variance via an analysis of the residuals, as well as the Breusch-Pagan test (see Figure 3 below). As a result, we follow the footsteps of... reasonably normal, which ensures that our residuals meet the assumption of constant variance by passing the Breusch-Pagan test (see Figure 4 below)... sections for Research and Development, Test and Evaluation (RDT&E), procurement and military construction (Jarvaise, 1996:3). While differing
1992-05-01
regression analysis. The strength of any one variable can be estimated along with the strength of the entire model in explaining the variance of percent... applicable a set of damage functions is to a particular situation. Sometimes depth-damage functions are embedded in computer programs which calculate... functions. Chapter Six concludes with recommended policies on the development and application of depth-damage functions.
2014-12-01
Primary Military Occupational Specialty PRO Proficiency Q-Q Quantile-Quantile RSS Residual Sum of Squares SI Shop Information T&R Training and... construct multivariate linear regression models to estimate Marines' Computed Tier Score and time to achieve E-4 based on their individual personal... Science (GS) score, ASVAB Mathematics Knowledge (MK) score, ASVAB Paragraph Comprehension (PC) score, weight, and whether a Marine receives a weight
Ono, Daiki; Bamba, Takeshi; Oku, Yuichi; Yonetani, Tsutomu; Fukusaki, Eiichiro
2011-09-01
In this study, we constructed prediction models by metabolic fingerprinting of fresh green tea leaves using Fourier transform near-infrared (FT-NIR) spectroscopy and partial least squares (PLS) regression analysis to objectively optimize the steaming process conditions in green tea manufacture. The steaming process is the most important step for manufacturing high quality green tea products. However, the parameter setting of the steamer is currently determined subjectively by the manufacturer. Therefore, a simple and robust system that can be used to objectively set the steaming process parameters is necessary. We focused on FT-NIR spectroscopy because of its simple operation, quick measurement, and low running costs. After removal of noise in the spectral data by principal component analysis (PCA), PLS regression analysis was performed using spectral information as independent variables, and the steaming parameters set by experienced manufacturers as dependent variables. The prediction models were successfully constructed with satisfactory accuracy. Moreover, the results of the demonstration experiment suggested that the green tea steaming process parameters could be predicted on a larger manufacturing scale. This technique will contribute to improvement of the quality and productivity of green tea because it can objectively optimize the complicated green tea steaming process and will be suitable for practical use in green tea manufacture. Copyright © 2011 The Society for Biotechnology, Japan. Published by Elsevier B.V. All rights reserved.
Is an index of co-occurring unhealthy lifestyles suitable for understanding migrant health?
Feng, Xiaoqi; Astell-Burt, Thomas; Kolt, Gregory S
2014-12-01
This study investigated variation in unhealthy lifestyles within Australia according to where people were born. Multilevel linear regression models were used to explore variation in co-occurring unhealthy lifestyles (from 0 to 8) constructed from responses to tobacco smoking, alcohol consumption, moderate-to-vigorous physical activity and a range of dietary indicators for 217,498 adults born in 22 different countries now living in Australia. Models were adjusted for socio-economic variables. Data was from the 45 and Up Study (2006-2009). Further analyses involved multilevel logistic regression to examine country-of-birth patterning of each individual unhealthy lifestyle. Small differences in the co-occurrence of unhealthy lifestyles were observed by country of birth, ranging from 3.1 (Philippines) to 3.8 (Russia). More substantial variation was observed for each individual unhealthy lifestyle. Smoking and alcohol ranged from 7.3% and 4.2% (both China) to 28.5% (Lebanon) and 30.8% (Ireland) respectively. Non-adherence to physical activity guidelines was joint-highest among participants born in Japan and China (both 74.5%), but lowest among those born in Scandinavian countries (52.5%). Substantial variation in meeting national dietary guidelines was also evident between participants born in different countries. The growing trend for constructing unhealthy lifestyle indices can hide important variation in individual unhealthy lifestyles by country of birth. Copyright © 2014. Published by Elsevier Inc.
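The co-occurring unhealthy-lifestyle index described in the abstract above is a simple count of unhealthy behaviours per respondent. A minimal sketch, with indicator names that are illustrative rather than the 45 and Up Study's exact items:

```python
# Hypothetical sketch of a co-occurring unhealthy-lifestyle index (0-8):
# one point per unhealthy behaviour, summed per respondent.
# Indicator names below are illustrative, not the study's exact items.
INDICATORS = ["smoking", "risky_alcohol", "inactive", "low_fruit",
              "low_veg", "processed_meat", "high_salt", "sugary_drinks"]

def lifestyle_index(responses):
    """responses: dict mapping indicator name -> bool (True = unhealthy).
    Missing indicators are treated as healthy (False)."""
    return sum(bool(responses.get(name, False)) for name in INDICATORS)

print(lifestyle_index({"smoking": True, "inactive": True, "low_veg": True}))  # → 3
```

As the abstract notes, a summary count like this can hide large variation in the individual behaviours, which is why each indicator was also modelled separately.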
Fetal nasal bone length and Down syndrome during the second trimester in a Chinese population.
Hung, Jeng-Hsiu; Fu, Chong Yau; Chen, Chih-Yao; Chao, Kuan-Chong; Hung, Jamie
2008-08-01
The purpose of the present study was to build a database of reference ranges of fetal nasal bone length (NBL) in a Chinese population. The accuracy of detecting Down syndrome was also analyzed using fetal NBL as a marker. The control group of fetuses included 342 normal singleton pregnancies with no chromosomal or congenital anomalies. The present study was a cross-sectional study, and the control group was used to construct percentile values of NBL from 13 to 29 gestational weeks of age. Two-dimensional ultrasonography was used for the nasal bone studies. Measurements of NBL were collected and each fetus contributed a single value to the reference sample. During the study period, 14 fetuses with Down syndrome were examined. Measurement of fetal NBL was made during amniocentesis, with gestational age ranging from 13 to 19 weeks. From 342 normal fetuses with gestational age ranging from 13 to 29 weeks, reference ranges of NBL were constructed. The reference ranges were constructed from the 100(1 − p)% reference range Y ± Z_p × √σ², where Y = 25 − exp(3.58 − 0.044t + 0.0006t²), with Y being the fitted mean of the regression model and t the gestational age (weeks). Using fetal NBL, the regression model for predicting Down syndrome was Pr(Down syndrome) = exp(W)/[1 + exp(W)], where W = 0.62 − 4.80 × NBL (in multiples of the median). Fetal NBL was found to have a sensitivity and specificity of 0.78 and 0.78, respectively, in predicting Down syndrome in the second trimester of pregnancy. Fetal NBL measurement can provide a simple and useful algorithm to predict Down syndrome during the second trimester of pregnancy.
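The logistic model quoted in the abstract above evaluates directly in a few lines; this sketch implements the published formula (W = 0.62 − 4.80 × NBL, with NBL in multiples of the median):

```python
import math

def down_syndrome_probability(nbl_mom):
    """Abstract's logistic model: Pr = exp(W) / (1 + exp(W)),
    where W = 0.62 - 4.80 * NBL in multiples of the median (MoM)."""
    w = 0.62 - 4.80 * nbl_mom
    return math.exp(w) / (1 + math.exp(w))

# A fetus with NBL at the population median (1.0 MoM) gets a low probability;
# a much shorter nasal bone (low MoM) pushes the probability up.
print(round(down_syndrome_probability(1.0), 4))
print(round(down_syndrome_probability(0.3), 4))
```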
Yang, Jun; Xie, Sheng-Xue; Huang, Yiling; Ling, Min; Liu, Jihong; Ran, Yali; Wang, Yanlin; Thrasher, J Brantley; Berkland, Cory; Li, Benyi
2012-01-01
Background Prostate cancer is the major cause of cancer death in men and the androgen receptor (AR) has been shown to play a critical role in the progression of the disease. Our previous reports showed that knocking down the expression of the AR gene using a siRNA-based approach in prostate cancer cells led to apoptotic cell death and xenograft tumor eradication. In this study, we utilized a biodegradable nanoparticle to deliver the therapeutic AR shRNA construct specifically to prostate cancer cells. Materials & methods The biodegradable nanoparticles were fabricated using a poly(dl-lactic-co-glycolic acid) polymer and the AR shRNA constructs were loaded inside the particles. The surface of the nanoparticles was then conjugated with prostate-specific membrane antigen aptamer A10 for prostate cancer cell-specific targeting. Results A10 conjugation largely enhanced cellular uptake of nanoparticles in both cell culture- and xenograft-based models. The efficacy of AR shRNA encapsulated in nanoparticles on AR gene silencing was confirmed in PC-3/AR-derived xenografts in nude mice. The therapeutic property of A10-conjugated AR shRNA-loaded nanoparticles was evaluated in xenograft models with different prostate cancer cell lines: 22RV1, LAPC-4 and LNCaP. Upon two injections of the AR shRNA-loaded nanoparticles, rapid tumor regression was observed over 2 weeks. Consistent with previous reports, A10 aptamer conjugation significantly enhanced xenograft tumor regression compared with nonconjugated nanoparticles. Discussion These data demonstrated that tissue-specific delivery of AR shRNA using a biodegradable nanoparticle approach represents a novel therapy for life-threatening prostate cancers. PMID:22583574
Sarac, Melike; Koc, Ismet
2018-07-01
Summary: The inability to have children affects couples worldwide and causes emotional and psychological distress in both men and women. Turkey is a country that lays particular emphasis on the issue of infertility, especially after experiencing a dramatic fertility decline over the last two decades. This study aimed to understand the changes in the prevalence of infertility in Turkey using three different approaches: the DHS Approach, the Constructed Approach and the Current Duration Approach. Furthermore, the factors contributing to elevated risks of infertility as derived from the Constructed Approach were investigated using four different logistic regression models. The data came from the 1993, 1998, 2003, 2008 and 2013 Demographic and Health Surveys conducted by the Hacettepe University Institute of Population Studies. The findings of the Constructed and Current Duration Approaches suggested that the prevalence of infertility decreased markedly from 1993 to 2013 in Turkey. This decline was the result of improvements in maternal health care services in Turkey, as well as an increase in the use of Assisted Reproductive Technology (ART), from 1.9% in 2008 to 4.1% in 2013. The results of the final logistic regression model suggested that the risk of infertility was significantly higher among women aged between 35 and 49 (p<0.01), uneducated women (p<0.01), women whose age at first marriage was over 30 (p<0.01), women defined as overweight (p<0.05) and women whose age at menarche was less than 12 years (p<0.05). This is the first nationwide study to examine the prevalence of infertility and its socio-demographic risk factors in Turkey, a developing country; previous studies have established these risk factors mainly in developed countries.
2011-01-01
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586
Keithley, Richard B; Wightman, R Mark
2011-06-07
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
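A compact sketch of principal component regression with Cook's distance screening, in the spirit of the abstract above. The data are synthetic and the implementation is a generic textbook version, not the authors' voltammetry code:

```python
import numpy as np

# Synthetic stand-in for a calibration problem: n samples, p measured
# variables, with the response driven by a few latent directions.
rng = np.random.default_rng(1)
n, p, k = 50, 10, 3                     # samples, variables, retained PCs
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, n)

# PCA via SVD on mean-centred data; keep k component scores as regressors.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T

# Ordinary least squares on the scores (plus intercept).
A = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Cook's distance from the hat matrix H = A (A'A)^-1 A' flags training-set
# outliers, as in the abstract's use of Cook's distance.
H = A @ np.linalg.inv(A.T @ A) @ A.T
h = np.diag(H)
mse = resid @ resid / (n - A.shape[1])
cooks = resid**2 / (A.shape[1] * mse) * h / (1 - h) ** 2
print(cooks.argmax(), cooks.max())
```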
Herrero, A M; de la Hoz, L; Ordóñez, J A; Herranz, B; Romero de Ávila, M D; Cambero, M I
2008-11-01
The possibilities of using breaking strength (BS) and energy to fracture (EF) for monitoring textural properties of some cooked meat sausages (chopped, mortadella and galantines) were studied. Texture profile analysis (TPA), a folding test and physico-chemical measurements were also performed. Principal component analysis enabled these meat products to be grouped into three textural profiles, which showed significant (p < 0.05) differences mainly for BS, hardness, adhesiveness and cohesiveness. Multivariate analysis indicated that BS, EF and TPA parameters were correlated (p < 0.05) for every individual meat product (chopped, mortadella and galantines) and for all products together. On the basis of these results, TPA parameters could be used to construct regression models to predict BS. The resulting regression model for all cooked meat products was BS = −0.160 + 6.600 × cohesiveness − 1.255 × adhesiveness + 0.048 × hardness − 506.31 × springiness (R² = 0.745, p < 0.00005). Simple linear regression analysis showed significant coefficients of determination between BS and folding test grade (FG) (R² = 0.586, p < 0.0001) and between EF and FG (R² = 0.564, p < 0.0001).
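The pooled regression model in the abstract above translates directly into code. A one-function sketch of the published equation; units follow the original TPA measurements, which the abstract does not restate:

```python
def predict_breaking_strength(cohesiveness, adhesiveness, hardness, springiness):
    """Abstract's pooled regression model for cooked meat products:
    BS = -0.160 + 6.600*cohesiveness - 1.255*adhesiveness
         + 0.048*hardness - 506.31*springiness   (R^2 = 0.745)"""
    return (-0.160 + 6.600 * cohesiveness - 1.255 * adhesiveness
            + 0.048 * hardness - 506.31 * springiness)

# Illustrative (invented) TPA values for one sample:
print(predict_breaking_strength(0.5, 0.1, 20, 0.01))
```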
NASA Astrophysics Data System (ADS)
Chen, Hui; Tan, Chao; Lin, Zan; Wu, Tong
2018-01-01
Milk is among the most popular nutrient sources worldwide, which is of great interest due to its beneficial medicinal properties. The feasibility of classifying milk powder samples with respect to their brands and of determining protein concentration is investigated by NIR spectroscopy along with chemometrics. Two datasets were prepared for the experiment. One contains 179 samples of four brands for classification and the other contains 30 samples for quantitative analysis. Principal component analysis (PCA) was used for exploratory analysis. Based on an effective model-independent variable selection method, i.e., minimal-redundancy maximal-relevance (MRMR), only 18 variables were selected to construct a partial least-squares discriminant analysis (PLS-DA) model. On the test set, the PLS-DA model based on the selected variable set was compared with the full-spectrum PLS-DA model, both of which achieved 100% accuracy. In quantitative analysis, the partial least-squares regression (PLSR) model constructed from the selected subset of 260 variables significantly outperforms the full-spectrum model. It seems that the combination of NIR spectroscopy, MRMR and PLS-DA or PLSR is a powerful tool for classifying different brands of milk and determining the protein content.
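The PLSR modeling in the abstract above can be sketched with a minimal NIPALS implementation. Synthetic "spectra" stand in for the NIR data, and the MRMR variable-selection step is omitted; this is a generic PLS1 textbook algorithm, not the authors' software:

```python
import numpy as np

def pls1_fit(X, y, n_comp):
    """Minimal PLS1 (NIPALS) regression: returns (x_mean, y_mean, b)
    such that y_hat = (X - x_mean) @ b + y_mean."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f                      # weight: covariance direction
        w /= np.linalg.norm(w)
        t = E @ w                        # scores
        tt = t @ t
        p = E.T @ t / tt                 # X loadings
        qa = f @ t / tt                  # y loading
        E = E - np.outer(t, p)           # deflate X
        f = f - qa * t                   # deflate y
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    b = W @ np.linalg.solve(P.T @ W, q)  # regression vector in X-space
    return x_mean, y_mean, b

# Synthetic "spectra": the response depends on two latent directions.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 50))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.05, 40)
xm, ym, b = pls1_fit(X, y, 2)
pred = (X - xm) @ b + ym
print(np.corrcoef(pred, y)[0, 1])
```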
Accessing and constructing driving data to develop fuel consumption forecast model
NASA Astrophysics Data System (ADS)
Yamashita, Rei-Jo; Yao, Hsiu-Hsen; Hung, Shih-Wei; Hackman, Acquah
2018-02-01
In this study, we develop forecasting models to estimate fuel consumption based on driving behavior, in which vehicles and routes are known. First, the driving data are collected via telematics and OBDII. Then, a driving fuel consumption formula is used to calculate the estimated fuel consumption, and driving behavior indicators (DBIs) are generated for analysis. Based on statistical analysis methods, the driving fuel consumption forecasting model is constructed. Field experiments were conducted in this study to generate hundreds of driving behavior indicators. Following a data mining approach, Pearson correlation analysis is used to filter the DBIs highly related to fuel consumption; only highly correlated DBIs are used in the model. These DBIs are divided into four classes: a speed class, an acceleration class, a left/right/U-turn class and an other category. We then use K-means cluster analysis to group drivers and routes into classes. Finally, more than 12 aggregate models (AMs) are generated from those highly correlated DBIs, using neural network models and regression analysis. Mean Absolute Percentage Error (MAPE) is used to evaluate the developed AMs; the best MAPE value among these AMs is below 5%.
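The MAPE criterion used in the abstract above has a standard definition; a short sketch (the abstract does not restate the formula, so this is the generic textbook version, with invented sample values):

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error: mean of |actual - predicted| / |actual|,
    expressed as a percentage. Actual values must be nonzero."""
    pairs = list(zip(actual, predicted))
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)

# Invented fuel-consumption values: errors of 10%, 10% and 0% average to ~6.67%.
print(mape([100, 200, 400], [110, 180, 400]))
```

A model with MAPE below 5%, as reported for the best aggregate models, predicts within 5% of the observed value on average.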
Teasing apart the effects of natural and constructed green ...
Summer low flows and stream temperature maxima are key drivers affecting the sustainability of fish populations. Thus, it is critical to understand the natural templates of spatiotemporal variability, how these are shifting due to anthropogenic influences of development and climate change, and how these impacts can be moderated by natural and constructed green infrastructure. Low flow statistics of New England streams have been characterized using a combination of regression equations to describe long-term averages as a function of indicators of hydrologic regime (rain- versus snow-dominated), precipitation, evapotranspiration or temperature, surface water storage, baseflow recession rates, and impervious cover. Difference equations have been constructed to describe interannual variation in low flow as a function of changing air temperature, precipitation, and ocean-atmospheric teleconnection indices. Spatial statistical network models have been applied to explore fine-scale variability of thermal regimes along stream networks in New England as a function of variables describing natural and altered energy inputs, groundwater contributions, and retention time. Low flows exacerbate temperature impacts by reducing the thermal inertia of streams to energy inputs. Based on these models, we can construct scenarios of fish habitat suitability using current and projected future climate and the potential for preservation and restoration of historic habitat regimes th
Asadi-Ghalehni, Majid; Rasaee, Mohamad Javad; RajabiBazl, Masoumeh; Khosravani, Masood; Motaghinejad, Majid; Javanmardi, Masoud; Khalili, Saeed; Modjtahedi, Helmout; Sadroddiny, Esmaeil
2017-12-01
Over-expression of epidermal growth factor receptor (EGFR) has been reported in a number of human malignancies. Strong expression of this receptor has been associated with poor survival in many such patients. Active immunizations that elicit antibodies of the desired type could be an appealing alternative to conventional passive immunization. In this regard, a novel recombinant peptide vaccine capable of prophylactic and therapeutic effects was constructed. A novel fusion recombinant peptide base vaccine consisting of L2 domain of murine extra-cellular domain-EGFR and EGFR mimotope (EM-L2) was constructed and its prophylactic and therapeutic effects in a Lewis lung carcinoma mouse (C57/BL6) model evaluated. Constructed recombinant peptide vaccine is capable of reacting with anti-EGFR antibodies. Immunization of mice with EM-L2 peptide resulted in antibody production against EM-L2. The constructed recombinant peptide vaccine reduced tumor growth and increased the survival rate. Designing effective peptide vaccines could be an encouraging strategy in contemporary cancer immunotherapy. Investigating the efficacy of such cancer immunotherapy approaches may open exciting possibilities concerning hyperimmunization, leading to more promising effects on tumor regression and proliferation. © 2017 The Societies and John Wiley & Sons Australia, Ltd.
Four Major South Korea's Rivers Using Deep Learning Models.
Lee, Sangmok; Lee, Donghyun
2018-06-24
Harmful algal blooms are an annual phenomenon that causes environmental damage, economic losses, and disease outbreaks. A fundamental solution to this problem is still lacking; thus, the best option for counteracting the effects of algal blooms is to improve advance warnings (predictions). However, existing physical prediction models have difficulties setting a clear coefficient indicating the relationship between each factor when predicting algal blooms, and many variable data sources are required for the analysis. These limitations are accompanied by high time and economic costs. Meanwhile, artificial intelligence and deep learning methods have become increasingly common in scientific research; attempts to apply the long short-term memory (LSTM) model to environmental research problems are increasing because the LSTM model exhibits good performance for time-series data prediction. However, few studies have applied deep learning models or LSTM to algal bloom prediction, especially in South Korea, where algal blooms occur annually. Therefore, we employed the LSTM model for algal bloom prediction in four major rivers of South Korea. We conducted short-term (one week) predictions by employing regression analysis and deep learning techniques on a newly constructed water quality and quantity dataset drawn from 16 dammed pools on the rivers. Three deep learning models (multilayer perceptron, MLP; recurrent neural network, RNN; and long short-term memory, LSTM) were used to predict chlorophyll-a, a recognized proxy for algal activity. The results were compared to those from OLS (ordinary least squares) regression analysis and actual data based on the root mean square error (RMSE). The LSTM model showed the highest prediction rate for harmful algal blooms and all deep learning models outperformed the OLS regression analysis. Our results reveal the potential for predicting algal blooms using LSTM and deep learning.
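The RMSE criterion used in the abstract above to compare the deep learning models against OLS has a standard definition; a short sketch with invented chlorophyll-a values:

```python
import math

def rmse(actual, predicted):
    """Root mean square error: sqrt of the mean squared difference between
    observed and predicted values; lower is better."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Hypothetical chlorophyll-a observations and two models' one-week forecasts.
observed = [12.0, 15.0, 30.0, 8.0]
model_a = [11.0, 16.0, 28.0, 9.0]   # smaller errors
model_b = [15.0, 10.0, 35.0, 3.0]   # larger errors
print(rmse(observed, model_a), rmse(observed, model_b))
```

Comparing models by RMSE on the same held-out observations, as the abstract does across MLP, RNN, LSTM and OLS, simply means preferring the model with the smaller value.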
The impact of social norm change strategies on smokers' quitting behaviours.
Zhang, Xueying; Cowling, David W; Tang, Hao
2010-04-01
Using a social norm change paradigm model that reflects the California Tobacco Control Program's (CTCP) priorities, we compare the strength of the relationship of the social norm constructs to key smoking behavioural outcomes. Social norm constructs that correspond to CTCP's priority areas were created from selected California Adult Tobacco Survey knowledge, attitude and belief questions using confirmatory factor analysis. We then examined the relationship between these constructs and quitting behaviours using logistic regression. The secondhand smoke (SHS) and countering pro-tobacco influences (CPTI) constructs followed a dose-response curve with quitting behaviours. Respondents who rated high on the SHS construct were about 70% more likely to have made a recent quit attempt in the last 12 months and about 100% more likely to intend to quit in the next 6 months than respondents who rated low on the SHS construct. For CPTI, respondents who rated high on this construct were 67% more likely to have made a recent quit attempt in the last 12 months and 62% more likely to have intentions to quit in the next 6 months than respondents who rated low on the CPTI construct. Social norm change constructs represent CTCP's priorities and are strongly related to desired individual behaviour outcomes. This analysis provides strong support for the framework underlying CTCP, namely, that changing social norms affects behaviour change at the individual level through changing population-level smoking-related behaviours.
Health belief model and reasoned action theory in predicting water saving behaviors in yazd, iran.
Morowatisharifabad, Mohammad Ali; Momayyezi, Mahdieh; Ghaneian, Mohammad Taghi
2012-01-01
People's behaviors and intentions about healthy behaviors depend on their beliefs, values, and knowledge about the issue. Various models of health education are used in determining predictors of different healthy behaviors, but their efficacy for cultural behaviors, such as water saving behaviors, has not been studied. The study was conducted to explain water saving behaviors in Yazd, Iran on the basis of the Health Belief Model and Reasoned Action Theory. The cross-sectional study used random cluster sampling to recruit 200 heads of households to collect the data. The survey questionnaire was tested for its content validity and reliability. Analysis of data included descriptive statistics, simple correlation, and hierarchical multiple regression. Simple correlations between water saving behaviors and Reasoned Action Theory and Health Belief Model constructs were statistically significant. Health Belief Model and Reasoned Action Theory constructs explained 20.80% and 8.40% of the variance in water saving behaviors, respectively. Perceived barriers were the strongest predictor. Additionally, there was a statistically significant positive correlation between water saving behaviors and intention. In designing interventions aimed at water waste prevention, barriers to water saving behaviors should be addressed first, followed by people's attitude towards water saving. The Health Belief Model, with the exception of the perceived severity and benefits constructs, is more powerful than Reasoned Action Theory in predicting water saving behavior and may be used as a framework for educational interventions aimed at improving water saving behaviors. PMID:24688927
Athamneh, Liqa; Essien, E James; Sansgiry, Sujit S; Abughosh, Susan
2017-01-01
In this study, we examined the effect of theory of planned behavior (TPB) constructs on the intention to quit water pipe smoking by using an observational, survey-based, cross-sectional study design with a convenience sample of Arab American adults in Houston, Texas. Multivariate logistic regression models were used to determine predictors of intention to quit water pipe smoking in the next year. A total of 340 participants completed the survey. Behavioral evaluation, normative beliefs, and motivation to comply were significant predictors of an intention to quit water pipe smoking, adjusting for age, gender, income, marital status, and education. Interventions and strategies that include these constructs will assist water pipe smokers in quitting.
NASA Technical Reports Server (NTRS)
York, P.; Labell, R. W.
1980-01-01
An aircraft wing weight estimating method based on a component buildup technique is described. A simplified analytically derived beam model, modified by a regression analysis, is used to estimate the wing box weight, utilizing a data base of 50 actual airplane wing weights. Factors representing materials and methods of construction were derived and incorporated into the basic wing box equations. Weight penalties to the wing box for fuel, engines, landing gear, stores and fold or pivot are also included. Methods for estimating the weight of additional items (secondary structure, control surfaces) have the option of using details available at the design stage (i.e., wing box area, flap area) or default values based on actual aircraft from the data base.
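The component-buildup idea above can be sketched as a basic box weight, scaled by a regression-derived materials/construction factor, plus itemized penalties. Every coefficient and value below is an invented placeholder, not the method's calibrated numbers.

```python
# Hypothetical sketch of a component-buildup wing weight estimate: a beam-type
# box weight, scaled by a regression-derived technology factor, plus itemized
# penalties. All values are invented placeholders for illustration only.
def wing_weight(box_weight_beam, tech_factor, penalties):
    """Total wing weight = adjusted box weight + sum of itemized penalties."""
    return box_weight_beam * tech_factor + sum(penalties.values())

penalties = {          # lb, illustrative values only
    "fuel": 120.0,
    "engines": 250.0,
    "landing_gear": 180.0,
    "stores": 60.0,
    "fold_or_pivot": 0.0,
}
total = wing_weight(box_weight_beam=2400.0, tech_factor=0.92, penalties=penalties)
print(f"estimated wing weight: {total:.0f} lb")
```

Secondary structure and control surfaces would enter the same way, either from design-stage details or from data-base default values as the abstract notes.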
Dorota, Myszkowska
2013-03-01
The aim of the study was to construct a model forecasting the birch pollen season characteristics in Cracow on the basis of an 18-year data series. The study was performed using the volumetric method (Lanzoni/Burkard trap). The 98/95% method was used to calculate the pollen season. Spearman's correlation test was applied to find the relationship between the meteorological parameters and pollen season characteristics. To construct the predictive model, backward stepwise multiple regression analysis was used, accounting for the multi-collinearity of variables. The predictive models fitted the pollen season start and end best, especially models containing two independent variables. The peak concentration value was predicted with a higher prediction error. The accuracy of the models predicting the pollen season characteristics was also higher in 2009 than in 2010. Both the multi-variable model and the one-variable model for the beginning of the pollen season included air temperature during the last 10 days of February, while the multi-variable model also included humidity at the beginning of April. The models forecasting the end of the pollen season were based on temperature in March-April, while the peak day was predicted using the temperature during the last 10 days of March.
NASA Astrophysics Data System (ADS)
Erener, Arzu; Sivas, A. Abdullah; Selcuk-Kestel, A. Sevtap; Düzgün, H. Sebnem
2017-07-01
All quantitative landslide susceptibility mapping (QLSM) methods require two basic data types, namely a landslide inventory and factors that influence landslide occurrence (landslide influencing factors, LIF). Depending on the type of landslides, the nature of triggers, and the LIF, the accuracy of QLSM methods differs. Moreover, how to balance the number of 0's (non-occurrence) and 1's (occurrence) in the training set obtained from the landslide inventory, and how to select which of the 1's and 0's to include in QLSM models, play a critical role in the accuracy of the QLSM. Although the performance of various QLSM methods has been widely investigated in the literature, the challenge of training set construction has not been adequately investigated for QLSM methods. In order to tackle this challenge, in this study three different training set selection strategies, along with the original data set, are used to test the performance of three regression methods, namely Logistic Regression (LR), Bayesian Logistic Regression (BLR) and Fuzzy Logistic Regression (FLR). The first sampling strategy is proportional random sampling (PRS), which takes into account a weighted selection of landslide occurrences in the sample set. The second method, non-selective nearby sampling (NNS), includes randomly selected sites and their surrounding neighboring points at certain preselected distances to include the impact of clustering. Selective nearby sampling (SNS) is the third method, which concentrates on the group of 1's and their surrounding neighborhood; a randomly selected group of landslide sites and their neighborhood are considered in the analyses, with parameters similar to NNS. It is found that the LR-PRS, FLR-PRS and BLR-whole-data set-ups, in that order, yield the best fits among the alternatives. The results indicate that in QLSM based on regression models, avoidance of spatial correlation in the data set is critical for model performance.
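The proportional random sampling idea (weighting how many 1's enter the training set) can be sketched as below. The ratio, set size, and toy grid are illustrative assumptions, not the study's actual configuration.

```python
import numpy as np

# Hedged sketch of proportional random sampling (PRS): build a training set
# with a chosen proportion of landslide occurrences (1's) relative to
# non-occurrences (0's). Ratios and sizes here are invented for illustration.
rng = np.random.default_rng(42)

def prs_training_set(labels, ratio_ones=0.5, n_total=200):
    """Sample indices so that about `ratio_ones` of the set are occurrences."""
    ones = np.flatnonzero(labels == 1)
    zeros = np.flatnonzero(labels == 0)
    n_ones = int(round(ratio_ones * n_total))
    idx = np.concatenate([
        rng.choice(ones, n_ones, replace=False),
        rng.choice(zeros, n_total - n_ones, replace=False),
    ])
    rng.shuffle(idx)
    return idx

# Toy inventory: 300 occurrence cells among 10,000 grid cells.
labels = np.zeros(10_000, dtype=int)
labels[rng.choice(10_000, 300, replace=False)] = 1
idx = prs_training_set(labels, ratio_ones=0.5, n_total=200)
print(labels[idx].mean())
```

The NNS and SNS variants would differ only in the index-selection step, pulling in spatial neighbors of the sampled cells rather than independent random cells.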
Using multiple linear regression model to estimate thunderstorm activity
NASA Astrophysics Data System (ADS)
Suparta, W.; Putro, W. S.
2017-03-01
This paper aims to develop a numerical model, using a nonlinear approach, to estimate thunderstorm activity. Meteorological data such as pressure (P), temperature (T), relative humidity (H), cloud (C), precipitable water vapor (PWV), and precipitation on a daily basis were used in the proposed method. The model was constructed with six configurations of input and one target output. The output tested in this work is the thunderstorm event when one year of data is used. Results showed that the model works well in estimating thunderstorm activities, with the maximum epoch reaching 1000 iterations and a percent error below 50%. The model also found that thunderstorm activities in May and October are higher than in the other months due to the inter-monsoon season.
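The six-inputs/one-output configuration can be sketched with an ordinary least-squares fit and a percent-error score. The data below are synthetic and the error metric is one plausible reading of "percent error"; both are assumptions, not the paper's definitions.

```python
import numpy as np

# Illustrative sketch (synthetic data): regress a daily thunderstorm index on
# the six meteorological inputs named in the abstract (P, T, H, C, PWV,
# precipitation) and report a percent error.
rng = np.random.default_rng(1)
n = 365
X = rng.normal(size=(n, 6))        # columns: P, T, H, C, PWV, precip (standardized)
true_w = np.array([0.2, 0.5, 0.8, 1.0, 0.6, 0.4])
y = X @ true_w + rng.normal(0, 0.5, n)

A = np.column_stack([np.ones(n), X])          # add intercept
w, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ w
pct_err = 100 * np.mean(np.abs(pred - y)) / np.mean(np.abs(y))
print(f"mean percent error: {pct_err:.1f}%")
```

A genuinely nonlinear model (as the abstract describes) would replace the linear design matrix with nonlinear features or an iteratively trained network; the evaluation step stays the same.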
Wang, Ying; Goh, Joshua O; Resnick, Susan M; Davatzikos, Christos
2013-01-01
In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM), and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance, with cross-validation to select the most informative brain regions (using recursive feature elimination) as imaging biomarkers and to optimize model parameters. Predicted cognitive scores from our regression algorithm based on the resulting brain regions correlated well with actual performance. Also, regression models obtained using combined GM, WM, and PET imaging modalities outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions, with LTM showing a stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal and occipito-temporal areas. The PET modality had a higher contribution to most cognitive domains except manipulation, which had a higher WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings based on machine-learning methods demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms and also their potential usage as biomarkers that predict cognitive status.
Hermes, Ilarraza-Lomelí; Marianna, García-Saldivia; Jessica, Rojano-Castillo; Carlos, Barrera-Ramírez; Rafael, Chávez-Domínguez; María Dolores, Rius-Suárez; Pedro, Iturralde
2016-10-01
Mortality due to cardiovascular disease is often associated with ventricular arrhythmias. Nowadays, patients with cardiovascular disease are increasingly encouraged to take part in physical training programs. Nevertheless, high-intensity exercise is associated with a higher risk of sudden death, even in apparently healthy people. During an exercise test (ET), health care professionals provide patients, in a controlled scenario, with an intense physiological stimulus that could precipitate cardiac arrhythmia in high-risk individuals. There is still no clinical or statistical tool to predict this incidence. The aim of this study was to develop a statistical model to predict the incidence of exercise-induced potentially life-threatening ventricular arrhythmia (PLVA) during high-intensity exercise. A total of 6415 patients underwent a symptom-limited ET with a Balke ramp protocol. A multivariate logistic regression model was fitted with PLVA as the primary outcome. The incidence of PLVA was 548 cases (8.5%). In a bivariate model, thirty-one clinical or ergometric variables were statistically associated with PLVA and were included in the regression model. In the multivariate model, 13 of these variables were found to be statistically significant. A regression model (G) with a chi-square of 283.987 and p < 0.001 was constructed. Significant variables included heart failure, antiarrhythmic drugs, myocardial lower-VD, age, and use of digoxin and nitrates, among others. This study allows clinicians to identify patients at risk of ventricular tachycardia or couplets during exercise, and to take preventive measures or provide appropriate supervision. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
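A multivariate logistic regression of this kind can be sketched with a plain gradient-ascent fit. The predictors, coefficients, and data below are synthetic placeholders loosely echoing the abstract's variables (heart failure, antiarrhythmic drugs, age), not the study's data.

```python
import numpy as np

# Minimal logistic-regression fit by gradient ascent on the log-likelihood,
# sketching how a multivariate model (outcome ~ heart failure, drugs, age, ...)
# can be estimated. All data and coefficients are synthetic placeholders.
rng = np.random.default_rng(7)
n = 6415
X = np.column_stack([
    rng.integers(0, 2, n),          # heart failure (0/1)
    rng.integers(0, 2, n),          # antiarrhythmic drugs (0/1)
    rng.normal(64, 11, n) / 100,    # age, rescaled
])
true_b = np.array([1.2, 0.8, 1.5])
p = 1 / (1 + np.exp(-(X @ true_b - 3.0)))
y = (rng.random(n) < p).astype(float)

A = np.column_stack([np.ones(n), X])
beta = np.zeros(A.shape[1])
for _ in range(3000):                       # gradient ascent, step size 0.5
    mu = 1 / (1 + np.exp(-A @ beta))
    beta += 0.5 * A.T @ (y - mu) / n

print(np.round(beta, 2))  # fitted coefficients (intercept first)
```

In practice one would use an iteratively reweighted least squares solver or a statistics package, which also supplies the chi-square statistic and p-values reported in the abstract.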
Tree allometry and improved estimation of carbon stocks and balance in tropical forests.
Chave, J; Andalo, C; Brown, S; Cairns, M A; Chambers, J Q; Eamus, D; Fölster, H; Fromard, F; Higuchi, N; Kira, T; Lescure, J-P; Nelson, B W; Ogawa, H; Puig, H; Riéra, B; Yamakura, T
2005-08-01
Tropical forests hold large stores of carbon, yet uncertainty remains regarding their quantitative contribution to the global carbon cycle. One approach to quantifying carbon biomass stores consists of inferring changes from long-term forest inventory plots. Regression models are used to convert inventory data into an estimate of aboveground biomass (AGB). We provide a critical reassessment of the quality and the robustness of these models across tropical forest types, using a large dataset of 2,410 trees ≥ 5 cm in diameter, directly harvested in 27 study sites across the tropics. Proportional relationships between aboveground biomass and the product of wood density, trunk cross-sectional area, and total height are constructed. We also develop a regression model involving wood density and stem diameter only. Our models were tested for secondary and old-growth forests, for dry, moist and wet forests, for lowland and montane forests, and for mangrove forests. The most important predictors of the AGB of a tree were, in decreasing order of importance, its trunk diameter, wood specific gravity, total height, and forest type (dry, moist, or wet). Overestimates prevailed, giving a bias of 0.5-6.5% when errors were averaged across all stands. Our regression models can be used reliably to predict aboveground tree biomass across a broad range of tropical forests. Because they are based on an unprecedented dataset, these models should improve the quality of tropical biomass estimates, and bring consensus about the contribution of the tropical forest biome and tropical deforestation to the global carbon cycle.
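The proportional form described above (AGB scaling with wood density × trunk cross-section × height) is typically fitted on a log scale. The sketch below uses synthetic trees with an invented allometry, so it illustrates only the workflow, not the published coefficients.

```python
import numpy as np

# Sketch of fitting AGB ∝ rho * D^2 * H on a log scale. An exponent near 1 on
# the compound predictor recovers the proportional model. Synthetic data only.
rng = np.random.default_rng(3)
n = 500
D = rng.uniform(5, 80, n)            # trunk diameter, cm
H = 2.0 * D ** 0.6                   # total height, m (toy allometry)
rho = rng.uniform(0.3, 0.9, n)       # wood specific gravity
agb = 0.05 * rho * D ** 2 * H * np.exp(rng.normal(0, 0.3, n))  # kg, with lognormal error

# ln(AGB) = a + b * ln(rho * D^2 * H): b near 1 indicates proportionality.
x = np.log(rho * D ** 2 * H)
A = np.column_stack([np.ones(n), x])
(a, b), *_ = np.linalg.lstsq(A, np.log(agb), rcond=None)
print(f"exponent b = {b:.2f}")
```

The simpler diameter-only model in the abstract would drop `rho * H` from the predictor and absorb them into site- or type-specific coefficients.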
NASA Astrophysics Data System (ADS)
Farag, A. Z. A.; Sultan, M.; Elkadiri, R.; Abdelhalim, A.
2014-12-01
An integrated approach using remote sensing, landscape analysis and statistical methods was conducted to assess the role of groundwater sapping in shaping the Saharan landscape. A GIS-based logistic regression model was constructed to automatically delineate the spatial distribution of the sapping features over areas occupied by the Nubian Sandstone Aquifer System (NSAS): (1) an inventory was compiled of known locations of sapping features identified either in the field or from satellite datasets (e.g. Orbview-3 and Google Earth Digital Globe imagery); (2) spatial analyses were conducted in a GIS environment and seven geomorphological and geological predisposing factors (i.e. slope, stream density, cross-sectional and profile curvature, minimum and maximum curvature, and lithology) were identified; (3) a binary logistic regression model was constructed, optimized and validated to describe the relationship between the sapping locations and the set of controlling factors; and (4) the generated model (prediction accuracy: 90.1%) was used to produce a regional sapping map over the NSAS. Model outputs indicate that: (1) groundwater discharge and structural control played an important role in excavating the Saharan natural depressions, as evidenced by the wide distribution of sapping features (areal extent: 1180 km²) along the fault-controlled escarpments of the Libyan Plateau; and (2) the proximity of mapped sapping features to reported paleolake and tufa deposits suggests a causal effect.
Our preliminary observations (from satellite imagery) and statistical analyses together with previous studies in the North Western Sahara Aquifer System (North Africa), Sinai Peninsula, Negev Desert, and The Plateau of Najd (Saudi Arabia) indicate extensive occurrence of sapping features along the escarpments bordering the northern margins of the Saharan-Arabian Desert; these areas share similar hydrologic settings with the NSAS domains and they too witnessed wet climatic periods in the Mid-Late Quaternary.
Wan, Ke; Zhao, Jianxun; Huang, Hao; Zhang, Qing; Chen, Xi; Zeng, Zhi; Zhang, Li; Chen, Yucheng
2015-01-01
Aims: High triglycerides (TG) and low high-density lipoprotein cholesterol (HDL-C) are cardiovascular risk factors. A positive correlation between elevated TG/HDL-C ratio and all-cause mortality and cardiovascular events exists in women. However, the utility of the TG to HDL-C ratio for prediction is unknown among patients with acute coronary syndrome (ACS). Methods: Fasting lipid profiles, detailed demographic data, and clinical data were obtained at baseline from 416 patients with ACS after coronary revascularization. Subjects were stratified into three levels of TG/HDL-C. We constructed multivariate Cox proportional hazards models for all-cause mortality over a median follow-up of 3 years, using log TG to HDL-C ratio as a predictor variable and analyzing traditional cardiovascular risk factors. We also constructed a logistic regression model for major adverse cardiovascular events (MACEs) to assess whether the TG/HDL-C ratio is a risk factor. Results: The subjects' mean age was 64 ± 11 years; 54.5% were hypertensive, 21.8% diabetic, and 61.0% current or prior smokers. The TG/HDL-C ratio ranged from 0.27 to 14.33. During the follow-up period, there were 43 deaths. In multivariate Cox models, after adjusting for age, smoking, hypertension, diabetes, and severity of angiographic coronary disease, patients in the highest tertile had a 5.32-fold increased risk of mortality compared with the lowest tertile. After adjusting for conventional coronary heart disease risk factors in the logistic regression model, the TG/HDL-C ratio was associated with MACEs. Conclusion: The TG to HDL-C ratio is a powerful independent predictor of all-cause mortality and is a risk factor for cardiovascular events. PMID:25880982
An improved advertising CTR prediction approach based on the fuzzy deep neural network
Jiang, Zilong; Gao, Shu; Li, Mingjiang
2018-01-01
Combining a deep neural network with fuzzy theory, this paper proposes an advertising click-through rate (CTR) prediction approach based on a fuzzy deep neural network (FDNN). In this approach, fuzzy Gaussian-Bernoulli restricted Boltzmann machine (FGBRBM) is first applied to input raw data from advertising datasets. Next, fuzzy restricted Boltzmann machine (FRBM) is used to construct the fuzzy deep belief network (FDBN) with the unsupervised method layer by layer. Finally, fuzzy logistic regression (FLR) is utilized for modeling the CTR. The experimental results show that the proposed FDNN model outperforms several baseline models in terms of both data representation capability and robustness in advertising click log datasets with noise. PMID:29727443
Studies of the Earth Energy Budget and Water Cycle Using Satellite Observations and Model Analyses
NASA Technical Reports Server (NTRS)
Campbell, G. G.; VonderHarr, T. H.; Randel, D. L.; Kidder, S. Q.
1997-01-01
During this research period we utilized the ERBE data set in comparisons with surface properties and water vapor observations in the atmosphere. A relationship between cloudiness and surface temperature anomalies was found. This same relationship was found in a general circulation model, verifying the model. The attempt to construct a homogeneous time series from Nimbus 6, Nimbus 7 and ERBE data is not complete because we are still waiting for the ERBE reanalysis to be completed. It will be difficult to merge in the Nimbus 6 data because its observations occurred when the average weather was different from that of the other periods, so regression adjustments are not effective.
NASA Astrophysics Data System (ADS)
Boeke, R.; Taylor, P. C.; Li, Y.
2017-12-01
Arctic cloud amount as simulated in CMIP5 models displays large intermodel spread: models disagree on the processes important for cloud formation as well as on the radiative impact of clouds. The radiative response to cloud forcing can be better assessed when the drivers of Arctic cloud formation are known. Arctic cloud amount (CA) is a function of both atmospheric and surface conditions, and it is crucial to separate the influences of distinct processes to understand why the models differ. This study uses a multilinear regression methodology to determine cloud changes using three variables as predictors: lower tropospheric stability (LTS), 500-hPa vertical velocity (ω500), and sea ice concentration (SIC). These three explanatory variables were chosen because their effects on clouds can be attributed to distinct climate processes: LTS is a thermodynamic indicator of the relationship between clouds and atmospheric stability, SIC determines the interaction between clouds and the surface, and ω500 is a metric of dynamical change. Vertical, seasonal profiles of the necessary variables are obtained from the Coupled Model Intercomparison Project 5 (CMIP5) historical simulation, an ocean-atmosphere coupled model experiment forced with the best-estimate natural and anthropogenic radiative forcing from 1850-2005, and statistical significance tests are used to confirm the regression equation. A unique heuristic model will be constructed for each climate model and for observations, and models will be tested by their ability to capture the observed cloud amount and behavior. Lastly, the intermodel spread in Arctic cloud amount will be attributed to individual processes, ranking the relative contributions of each factor to shed light on emergent constraints in the Arctic cloud radiative effect.
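The three-predictor multilinear regression can be sketched as below. The monthly fields, units, and coefficients are synthetic stand-ins for CMIP5 output, chosen only to make the fit well posed.

```python
import numpy as np

# Hedged sketch: cloud amount (CA) regressed on lower tropospheric stability
# (LTS), 500-hPa vertical velocity (omega500), and sea ice concentration (SIC).
# Synthetic monthly means stand in for model output.
rng = np.random.default_rng(5)
n = 240                                   # e.g. 20 years of monthly means
LTS = rng.normal(18, 3, n)                # K
omega500 = rng.normal(0, 0.05, n)         # Pa/s
SIC = np.clip(rng.normal(0.7, 0.2, n), 0, 1)
CA = 60 - 1.5 * LTS + 80 * omega500 + 20 * SIC + rng.normal(0, 2, n)  # percent

A = np.column_stack([np.ones(n), LTS, omega500, SIC])
coef, *_ = np.linalg.lstsq(A, CA, rcond=None)
resid = CA - A @ coef
r2 = 1 - resid.var() / CA.var()
print(np.round(coef[1:], 2), f"R^2 = {r2:.2f}")
```

Fitting the same equation separately to each climate model and to observations, as the abstract proposes, then lets the coefficients themselves be compared across models to rank the contributing processes.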
Weight management behaviors in a sample of Iranian adolescent girls.
Garousi, S; Garrusi, B; Baneshi, Mohammad Reza; Sharifi, Z
2016-09-01
Attempts to obtain the ideal body shape portrayed in advertising can result in behaviors that lead to an unhealthy reduction in weight. This study was designed to identify contributing factors that may be effective in changing the behavior of a sample of Iranian adolescents. Three hundred fifty adolescent girls from high schools in Kerman, Iran participated in a cross-sectional study based on a self-administered questionnaire. Multifactorial logistic regression modeling was used to identify the factors influencing each of the contributing factors for body management methods, and a decision tree model was constructed to identify individuals who were more or less likely to change their body shape. Approximately one-third of the adolescent girls had attempted dieting, and 37% had exercised to lose weight. The logistic regression model showed that pressure from their mother and the media, their father's education level, and body mass index (BMI) were important factors in dieting. BMI and perceived pressure from the media were risk factors for attempting exercise. BMI and perceived pressure from relatives, particularly mothers, and the media were important factors in attempts by adolescent girls to lose weight.
NASA Astrophysics Data System (ADS)
Powell, James Eckhardt
Emissions inventories are an important tool, often built by governments, and used to manage emissions. To build an inventory of urban CO2 emissions and other fossil fuel combustion products in the urban atmosphere, an inventory of on-road traffic is required. In particular, a high resolution inventory is necessary to capture the local characteristics of transport emissions. These emissions vary widely due to the local nature of the fleet, fuel, and roads. Here we show a new model of average daily traffic (ADT) for the Portland, OR metropolitan region. The backbone is traffic counter recordings made by the Portland Bureau of Transportation at 7,767 sites over 21 years (1986-2006), augmented with PORTAL (the Portland Regional Transportation Archive Listing) freeway traffic count data. We constructed a regression model to fill in traffic network gaps using GIS data such as road class and population density. An EPA-supplied emissions factor was used to estimate transportation CO2 emissions, which is compared to several other estimates for the city's CO2 footprint.
A robust nonparametric framework for reconstruction of stochastic differential equation models
NASA Astrophysics Data System (ADS)
Rajabzadeh, Yalda; Rezaie, Amir Hossein; Amindavar, Hamidreza
2016-05-01
In this paper, we employ a nonparametric framework to robustly estimate the functional forms of drift and diffusion terms from discrete stationary time series. The proposed method significantly improves the accuracy of the parameter estimation. In this framework, drift and diffusion coefficients are modeled through orthogonal Legendre polynomials. We employ the least squares regression approach along with the Euler-Maruyama approximation method to learn the coefficients of the stochastic model. Next, a numerical discrete construction of the mean squared prediction error (MSPE) is established to calculate the order of the Legendre polynomials in the drift and diffusion terms. We show numerically that the new method is robust against variation in sample size and sampling rate. The performance of our method in comparison with the kernel-based regression (KBR) method is demonstrated through simulation and real data. On a real dataset, we test our method for discriminating healthy electroencephalogram (EEG) signals from epileptic ones. We also demonstrate the efficiency of the method through prediction on financial data. In both simulation and real data, our algorithm outperforms the KBR method.
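The core idea, regressing Euler-Maruyama increments on a Legendre basis to recover drift and squared diffusion, can be sketched on a simulated Ornstein-Uhlenbeck process. The process parameters, polynomial order, and state rescaling below are illustrative choices, not the paper's settings.

```python
import numpy as np
from numpy.polynomial import legendre

# Sketch: simulate an Ornstein-Uhlenbeck process dX = -theta*X dt + sigma dW,
# then fit drift (dx/dt) and squared diffusion (dx^2/dt) as Legendre
# expansions of the state via least squares. Parameters are illustrative.
rng = np.random.default_rng(9)
dt, n, theta, sigma = 0.01, 200_000, 1.0, 0.5
noise = sigma * np.sqrt(dt) * rng.normal(size=n - 1)
x = np.empty(n)
x[0] = 0.0
for i in range(n - 1):                      # Euler-Maruyama simulation
    x[i + 1] = x[i] * (1 - theta * dt) + noise[i]

dx = np.diff(x)
u = x[:-1] / np.abs(x).max()                # map state into [-1, 1] for Legendre
V = legendre.legvander(u, 3)                # Legendre design matrix, order 3

drift_c, *_ = np.linalg.lstsq(V, dx / dt, rcond=None)
diff_c, *_ = np.linalg.lstsq(V, dx ** 2 / dt, rcond=None)

# For a constant-diffusion process, the P0 term should be near sigma^2 = 0.25,
# and the drift should load negatively on the linear (P1) term.
print(f"estimated sigma^2 ~ {diff_c[0]:.3f}")
```

The paper's MSPE criterion would then be used to choose the polynomial order instead of fixing it at 3 as done here.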
Miles, Jeremy N V; Kulesza, Magdalena; Ewing, Brett; Shih, Regina A; Tucker, Joan S; D'Amico, Elizabeth J
2015-01-01
When researchers find an association between two variables, it is useful to evaluate the role of other constructs in this association. While assessing these mediation effects, it is important to determine if results are equal for different groups. It is possible that the strength of a mediation effect may differ for males and females, for example; such an effect is known as moderated mediation. Participants were 2532 adolescents from diverse ethnic/racial backgrounds and equally distributed across gender. The goal of this study was to investigate parental respect as a potential mediator of the relationship between gender and delinquency and mental health, and to determine whether the observed mediation is moderated by gender. Parental respect mediated the association between gender and both delinquency and mental health. Specifically, parental respect was a protective factor against delinquency and mental health problems for both females and males. The study demonstrated the process of estimating the models in Lavaan, using two approaches (i.e., a single-group regression and a multiple-group regression model), and included covariates in both models.
Hao, Z Q; Li, C M; Shen, M; Yang, X Y; Li, K H; Guo, L B; Li, X Y; Lu, Y F; Zeng, X Y
2015-03-23
Laser-induced breakdown spectroscopy (LIBS) with partial least squares regression (PLSR) has been applied to measuring the acidity of iron ore, which can be defined by the concentrations of the oxides CaO, MgO, Al₂O₃, and SiO₂. With conventional internal standard calibration, it is difficult to establish the calibration curves of CaO, MgO, Al₂O₃, and SiO₂ in iron ore due to serious matrix effects. PLSR is effective at addressing this problem because of its excellent performance in compensating for matrix effects. In this work, fifty samples were used to construct the PLSR calibration models for the above-mentioned oxides. These calibration models were validated by the 10-fold cross-validation method with the minimum root-mean-square errors (RMSE). Another ten samples were used as a test set. The acidities were calculated according to the estimated concentrations of CaO, MgO, Al₂O₃, and SiO₂ using the PLSR models. The average relative error (ARE) and RMSE of the acidity reached 3.65% and 0.0048, respectively, for the test samples.
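The spectra-to-concentration step can be sketched with a toy single-component PLS fit. A real LIBS pipeline would use many components chosen by 10-fold cross-validation; the spectra, channel count, and concentrations below are synthetic assumptions.

```python
import numpy as np

# Toy single-component PLS regression: spectra (X) -> oxide concentration (y).
# Synthetic spectra are a concentration-scaled signature plus noise.
rng = np.random.default_rng(11)
n, p = 50, 120                       # 50 calibration samples, 120 spectral channels
signature = rng.normal(size=p)       # spectral signature of the analyte
conc = rng.uniform(0, 10, n)         # e.g. CaO concentration, wt%
X = np.outer(conc, signature) + rng.normal(0, 0.5, (n, p))
y = conc

Xc, yc = X - X.mean(0), y - y.mean()
w = Xc.T @ yc
w /= np.linalg.norm(w)                     # PLS weight vector (covariance direction)
t = Xc @ w                                 # scores
b = (t @ yc) / (t @ t)                     # regress y on the score

pred = (X - X.mean(0)) @ w * b + y.mean()
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(f"calibration RMSE: {rmse:.2f} wt%")
```

Because the weight vector pools information across all channels, this kind of model is less sensitive to matrix effects than a single internal-standard line ratio, which is the advantage the abstract describes.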
Miles, Jeremy N.V.; Kulesza, Magdalena; Ewing, Brett; Shih, Regina A.; Tucker, Joan S.; D’Amico, Elizabeth J.
2015-01-01
Purpose When researchers find an association between two variables, it is useful to evaluate the role of other constructs in this association. While assessing these mediation effects, it is important to determine if results are equal for different groups. It is possible that the strength of a mediation effect may differ for males and females, for example; such an effect is known as moderated mediation. Design Participants were 2532 adolescents from diverse ethnic/racial backgrounds and equally distributed across gender. The goal of this study was to investigate parental respect as a potential mediator of the relationship between gender and delinquency and mental health, and to determine whether the observed mediation is moderated by gender. Findings Parental respect mediated the association between gender and both delinquency and mental health. Specifically, parental respect was a protective factor against delinquency and mental health problems for both females and males. Practical implications We demonstrated the process of estimating these models in lavaan, using two approaches (i.e., a single-group regression model and a multiple-group regression model), and including covariates in both models. PMID:26500722
On the interannual oscillations in the northern temperate total ozone
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krzyscin, J.W.
1994-07-01
The interannual variations in total ozone are studied using revised Dobson total ozone records (1961-1990) from 17 stations located within the latitude band 30 deg N - 60 deg N. To obtain the quasi-biennial oscillation (QBO), El Nino-Southern Oscillation (ENSO), and 11-year solar cycle manifestation in the 'northern temperate' total ozone data, various multiple regression models are constructed by least squares fitting to the observed ozone. The statistical relationships between the selected indices of atmospheric variability and total ozone are described in linear and nonlinear regression models. Nonlinear relationships to the predictor variables are found. That is, the total ozone variations are statistically modeled by nonlinear terms accounting for the coupling between QBO and ENSO, QBO and solar activity, and ENSO and solar activity. It is suggested that a large reduction of total ozone values over the 'northern temperate' region occurs in the cold season when a strong ENSO warm event meets the west phase of the QBO during a period of high solar activity.
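The nonlinear coupling terms described above are, in a regression setting, products of the predictor indices. A minimal least-squares sketch follows, with synthetic monthly series standing in for the QBO, ENSO, and solar indices (the data and coefficients are invented for illustration):

```python
import numpy as np

# Synthetic monthly indices standing in for QBO, ENSO and solar activity.
rng = np.random.default_rng(1)
n = 360  # 30 years of monthly values
qbo, enso, solar = rng.normal(size=(3, n))

# "Observed" ozone anomaly with a QBO*ENSO coupling term baked in.
ozone = (2.0 * qbo - 1.0 * enso + 0.5 * solar
         + 1.5 * qbo * enso
         + rng.normal(scale=0.1, size=n))

# Design matrix: linear terms plus pairwise coupling (interaction) terms.
Xd = np.column_stack([
    np.ones(n), qbo, enso, solar,
    qbo * enso, qbo * solar, enso * solar,
])
coef, *_ = np.linalg.lstsq(Xd, ozone, rcond=None)
print(np.round(coef, 2))
```

The fitted coefficient on the qbo*enso column recovers the coupling strength, while the coefficients on the couplings that were not present in the synthetic signal come out near zero.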
Lin, Lei; Wang, Qian; Sadek, Adel W
2016-06-01
The duration of freeway traffic accidents is an important factor affecting traffic congestion, environmental pollution, and secondary accidents. In previous studies, the M5P algorithm has been shown to be an effective tool for predicting incident duration. M5P builds a tree-based model, like the traditional classification and regression tree (CART) method, but with multiple linear regression models as its leaves. The problem with M5P for accident duration prediction, however, is that whereas linear regression assumes that the conditional distribution of accident durations is normally distributed, the distribution of a "time-to-an-event" is almost certainly nonsymmetrical. A hazard-based duration model (HBDM) is a better choice for this kind of "time-to-event" modeling scenario, and given this, HBDMs have previously been applied to analyze and predict traffic accident durations. Previous research, however, has not yet applied HBDMs for accident duration prediction in association with clustering or classification of the dataset to minimize data heterogeneity. The current paper proposes a novel approach for accident duration prediction, which improves on the original M5P tree algorithm through the construction of an M5P-HBDM model, in which the leaves of the M5P tree model are HBDMs instead of linear regression models. Such a model offers the advantage of minimizing data heterogeneity through dataset classification, and avoids the incorrect assumption of normality for traffic accident durations. The proposed model was then tested on two freeway accident datasets. For each dataset, the first 500 records were used to train the following three models: (1) an M5P tree; (2) an HBDM; and (3) the proposed M5P-HBDM; the remainder of the data were used for testing. The results show that the proposed M5P-HBDM managed to identify more significant and meaningful variables than either M5P or HBDMs.
Moreover, the M5P-HBDM had the lowest overall mean absolute percentage error (MAPE). Copyright © 2016 Elsevier Ltd. All rights reserved.
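The MAPE criterion used to compare the models above is straightforward to compute. A minimal sketch with hypothetical accident-duration data (the numbers are invented for illustration):

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

# Hypothetical accident durations (minutes) and two models' predictions.
durations = np.array([30.0, 45.0, 60.0, 120.0])
model_a = np.array([33.0, 40.0, 66.0, 110.0])
model_b = np.array([25.0, 50.0, 80.0, 100.0])
print(mape(durations, model_a), mape(durations, model_b))
```

The model with the lower MAPE (here, model_a) would be preferred, which is how the M5P-HBDM was ranked against M5P and the HBDM in the abstract.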
Dehdari, Tahereh; Rahimi, Tahereh; Aryaeian, Naheed; Gohari, Mahmood Reza; Esfeh, Jabiz Modaresi
2014-01-01
To develop an instrument for measuring Health Promotion Model constructs in terms of breakfast consumption, and to identify the constructs that were predictors of breakfast consumption among Iranian female students. A questionnaire on Health Promotion Model variables was developed and potential predictors of breakfast consumption were assessed using this tool. One hundred female students, mean age 13 years (SD ± 1.2 years). Two middle schools from moderate-income areas in Qom, Iran. Health Promotion Model variables were assessed using a 58-item questionnaire. Breakfast consumption was also measured. Internal consistency (Cronbach alpha), content validity index, content validity ratio, multiple linear regression using stepwise method, and Pearson correlation. Content validity index and content validity ratio scores of the developed scale items were 0.89 and 0.93, respectively. Internal consistencies (range, .74-.91) of subscales were acceptable. Prior related behaviors, perceived barriers, self-efficacy, and competing demand and preferences were 4 constructs that could predict 63% variance of breakfast frequency per week among subjects. The instrument developed in this study may be a useful tool for researchers to explore factors affecting breakfast consumption among students. Students with a high level of self-efficacy, more prior related behavior, fewer perceived barriers, and fewer competing demands were most likely to regularly consume breakfast. Copyright © 2014 Society for Nutrition Education and Behavior. Published by Elsevier Inc. All rights reserved.
Trehan, Sumeet; Carlberg, Kevin T.; Durlofsky, Louis J.
2017-07-14
A machine learning-based framework for modeling the error introduced by surrogate models of parameterized dynamical systems is proposed. The framework entails the use of high-dimensional regression techniques (e.g., random forests and LASSO) to map a large set of inexpensively computed "error indicators" (i.e., features) produced by the surrogate model at a given time instance to a prediction of the surrogate-model error in a quantity of interest (QoI). This eliminates the need for the user to hand-select a small number of informative features. The methodology requires a training set of parameter instances at which the time-dependent surrogate-model error is computed by simulating both the high-fidelity and surrogate models. Using these training data, the method first determines regression-model locality (via classification or clustering) and subsequently constructs a "local" regression model to predict the time-instantaneous error within each identified region of feature space. We consider two uses for the resulting error model: (1) as a correction to the surrogate-model QoI prediction at each time instance and (2) as a way to statistically model arbitrary functions of the time-dependent surrogate-model error (e.g., time-integrated errors). We then apply the proposed framework to model errors in reduced-order models of nonlinear oil-water subsurface flow simulations, with time-varying well-control (bottom-hole pressure) parameters. The reduced-order models used in this work entail application of trajectory piecewise linearization in conjunction with proper orthogonal decomposition. When the first use of the method is considered, numerical experiments demonstrate consistent improvement in accuracy in the time-instantaneous QoI prediction relative to the original surrogate model, across a large number of test cases.
When the second use is considered, results show that the proposed method provides accurate statistical predictions of the time- and well-averaged errors.
Kasaie, Parastu; Mathema, Barun; Kelton, W. David; Azman, Andrew S.; Pennington, Jeff; Dowdy, David W.
2015-01-01
In any setting, a proportion of incident active tuberculosis (TB) reflects recent transmission (“recent transmission proportion”), whereas the remainder represents reactivation. Appropriately estimating the recent transmission proportion has important implications for local TB control, but existing approaches have known biases, especially where data are incomplete. We constructed a stochastic individual-based model of a TB epidemic and designed a set of simulations (derivation set) to develop two regression-based tools for estimating the recent transmission proportion from five inputs: underlying TB incidence, sampling coverage, study duration, clustered proportion of observed cases, and proportion of observed clusters in the sample. We tested these tools on a set of unrelated simulations (validation set), and compared their performance against that of the traditional ‘n-1’ approach. In the validation set, the regression tools reduced the absolute estimation bias (difference between estimated and true recent transmission proportion) in the ‘n-1’ technique by a median [interquartile range] of 60% [9%, 82%] and 69% [30%, 87%]. The bias in the ‘n-1’ model was highly sensitive to underlying levels of study coverage and duration, and substantially underestimated the recent transmission proportion in settings of incomplete data coverage. By contrast, the regression models’ performance was more consistent across different epidemiological settings and study characteristics. We provide one of these regression models as a user-friendly, web-based tool. Novel tools can improve our ability to estimate the recent TB transmission proportion from data that are observable (or estimable) by public health practitioners with limited available molecular data. PMID:26679499
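The traditional 'n-1' technique referenced above is conventionally computed as (clustered cases minus number of clusters) divided by total cases, so that a cluster of size k contributes k-1 recently transmitted cases. A small sketch under that convention, with made-up cluster sizes:

```python
def recent_transmission_n_minus_1(cluster_sizes, total_cases):
    """'n-1' estimate: each cluster of size k contributes k-1 recently
    transmitted cases; unclustered cases count as reactivation."""
    clustered = sum(k for k in cluster_sizes if k >= 2)
    n_clusters = sum(1 for k in cluster_sizes if k >= 2)
    return (clustered - n_clusters) / total_cases

# Hypothetical genotyping study: 100 cases, clusters of sizes 5, 3 and 2;
# the remaining 90 isolates have unique fingerprints.
print(recent_transmission_n_minus_1([5, 3, 2], total_cases=100))
```

As the abstract notes, this estimate is sensitive to sampling coverage: cases missing from the sample break clusters apart, which is the bias the regression-based tools were built to reduce.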
Nojima, Masanori; Tokunaga, Mutsumi; Nagamura, Fumitaka
2018-05-05
To investigate under what circumstances inappropriate use of 'multivariate analysis' is likely to occur and to identify the population that needs more support with medical statistics. The frequency of inappropriate regression model construction in multivariate analysis and related factors were investigated in observational medical research publications. The inappropriate algorithm of using only variables that were significant in univariate analysis was estimated to occur in 6.4% of publications (95% CI 4.8% to 8.5%). This was observed in 1.1% of the publications with a medical statistics expert (hereinafter 'expert') as the first author, in 3.5% if an expert was included as a coauthor, and in 12.2% if experts were not involved. In the publications where the number of cases was 50 or less and the study did not include experts, inappropriate algorithm usage was observed at a high proportion of 20.2%. The OR of the involvement of experts for this outcome was 0.28 (95% CI 0.15 to 0.53). A further analysis showed that the involvement of experts and the implementation of unfavourable multivariate analysis are negatively associated at the nation level (R = -0.652). Based on the results of this study, the benefit of the participation of medical statistics experts is obvious. Experts should be involved for proper confounding adjustment and interpretation of statistical models. © Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Toward a Comprehensive Model of Frailty: An Emerging Concept From the Hong Kong Centenarian Study.
Kwan, Joseph Shiu Kwong; Lau, Bobo Hi Po; Cheung, Karen Siu Lan
2015-06-01
A better understanding of the essential components of frailty is important for future developments of management strategies. We aimed to assess the incremental validity of a Comprehensive Model of Frailty (CMF) over Frailty Index (FI) in predicting self-rated health and functional dependency amongst near-centenarians and centenarians. Cross-sectional, community-based study. Two community-based social and clinical networks. One hundred twenty-four community-dwelling Chinese near-centenarians and centenarians. Frailty was first assessed using a 32-item FI (FI-32). Then, a new CMF was constructed by adding 12 items in the psychological, social/family, environmental, and economic domains to the FI-32. Hierarchical multiple regressions explored whether the new CMF provided significant additional predictive power for self-rated health and instrumental activities of daily living (IADL) dependency. Mean age was 97.7 (standard deviation 2.3) years, with a range from 95 to 108, and 74.2% were female. Overall, 16% of our participants were nonfrail, 59% were prefrail, and 25% were frail. Frailty according to FI-32 significantly predicted self-rated health and IADL dependency beyond the effect of age and gender. Inclusion of the new CMF into the regression models provided significant additional predictive power beyond FI-32 on self-rated health, but not IADL dependency. A CMF should ideally be a multidimensional and multidisciplinary construct including physical, cognitive, functional, psychosocial/family, environmental, and economic factors. Copyright © 2015 AMDA - The Society for Post-Acute and Long-Term Care Medicine. Published by Elsevier Inc. All rights reserved.
A real-time prediction model for post-irradiation malignant cervical lymph nodes.
Lo, W-C; Cheng, P-W; Shueng, P-W; Hsieh, C-H; Chang, Y-L; Liao, L-J
2018-04-01
To establish a real-time predictive scoring model based on sonographic characteristics for identifying malignant cervical lymph nodes (LNs) in cancer patients after neck irradiation. One hundred forty-four irradiation-treated patients underwent ultrasonography and ultrasound-guided fine-needle aspirations (USgFNAs), and the resultant data were used to construct a real-time, computerised predictive scoring model. This scoring system was further compared with our previously proposed prediction model. A predictive scoring model, 1.35 × (L axis) + 2.03 × (S axis) + 2.27 × (margin) + 1.48 × (echogenic hilum) + 3.7, was generated by stepwise multivariate logistic regression analysis. Neck LNs were considered to be malignant when the score was ≥ 7, corresponding to a sensitivity of 85.5%, specificity of 79.4%, positive predictive value (PPV) of 82.3%, negative predictive value (NPV) of 83.1%, and overall accuracy of 82.6%. When this new model and the original model were compared, the areas under the receiver operating characteristic curve (c-statistic) were 0.89 and 0.81, respectively (P < .05). A real-time sonographic predictive scoring model was constructed to provide prompt and reliable guidance for USgFNA biopsies to manage cervical LNs after neck irradiation. © 2017 John Wiley & Sons Ltd.
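The published score can be evaluated directly from the formula quoted in the abstract. In this sketch, coding the margin and echogenic-hilum predictors as 0/1 indicators, and the units of the axis measurements, are assumptions, since the abstract does not specify them:

```python
def ln_score(l_axis, s_axis, irregular_margin, absent_hilum):
    """Score from the regression model quoted in the abstract:
    1.35*L + 2.03*S + 2.27*margin + 1.48*hilum + 3.7.
    The coding of each predictor is an assumption here."""
    return (1.35 * l_axis + 2.03 * s_axis
            + 2.27 * irregular_margin + 1.48 * absent_hilum + 3.7)

def is_suspicious(score, cutoff=7.0):
    return score >= cutoff   # score >= 7 flagged as malignant in the abstract

# Hypothetical node: L axis 1.0, S axis 0.8, irregular margin, hilum present.
s = ln_score(l_axis=1.0, s_axis=0.8, irregular_margin=1, absent_hilum=0)
print(round(s, 3), is_suspicious(s))
```

A node scoring at or above the cutoff would be prioritized for USgFNA under the workflow the abstract describes.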
Novel Method for Incorporating Model Uncertainties into Gravitational Wave Parameter Estimates
NASA Astrophysics Data System (ADS)
Moore, Christopher J.; Gair, Jonathan R.
2014-12-01
Posterior distributions on parameters computed from experimental data using Bayesian techniques are only as accurate as the models used to construct them. In many applications, these models are incomplete, which both reduces the prospects of detection and leads to a systematic error in the parameter estimates. In the analysis of data from gravitational wave detectors, for example, accurate waveform templates can be computed using numerical methods, but the prohibitive cost of these simulations means this can only be done for a small handful of parameters. In this Letter, a novel method to fold model uncertainties into data analysis is proposed; the waveform uncertainty is analytically marginalized over using a prior distribution constructed by applying Gaussian process regression to interpolate the waveform difference from a small training set of accurate templates. The method is well motivated, easy to implement, and no more computationally expensive than standard techniques. The new method is shown to perform extremely well when applied to a toy problem. While we use the application to gravitational wave data analysis to motivate and illustrate the technique, it can be applied in any context where model uncertainties exist.
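A toy illustration of the Gaussian process regression step: interpolating a scalar "waveform difference" over a one-dimensional parameter space with an RBF kernel. The kernel choice, length scale, and training data here are all assumptions for illustration, not the authors' setup:

```python
import numpy as np

def rbf(a, b, length=0.5):
    """Squared-exponential (RBF) kernel matrix between point sets a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / length**2)

# Training set: "waveform difference" evaluated at a few parameter values.
x_train = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_train = np.sin(2.0 * x_train)   # toy stand-in for the waveform difference
jitter = 1e-10                     # numerical regularization

K = rbf(x_train, x_train) + jitter * np.eye(len(x_train))
alpha = np.linalg.solve(K, y_train)

def gp_mean(x_new):
    """Posterior mean of the zero-mean GP at new parameter values."""
    return rbf(np.atleast_1d(np.asarray(x_new, dtype=float)), x_train) @ alpha

print(float(gp_mean(0.5)[0]))  # interpolant passes through the training data
```

In the Letter's setting, the GP posterior additionally supplies an uncertainty that is marginalized over analytically; this sketch shows only the interpolating mean.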
Fischer, Thomas; Fischer, Susanne; Himmel, Wolfgang; Kochen, Michael M; Hummers-Pradier, Eva
2008-01-01
The influence of patient characteristics on family practitioners' (FPs') diagnostic decision making has mainly been investigated using indirect methods such as vignettes or questionnaires. Direct observation-borrowed from social and cultural anthropology-may be an alternative method for describing FPs' real-life behavior and may help in gaining insight into how FPs diagnose respiratory tract infections, which are frequent in primary care. To clarify FPs' diagnostic processes when treating patients suffering from symptoms of respiratory tract infection. This direct observation study was performed in 30 family practices using a checklist for patient complaints, history taking, physical examination, and diagnoses. The influence of patients' symptoms and complaints on the FPs' physical examination and diagnosis was calculated by logistic regression analyses. Dummy variables based on combinations of symptoms and complaints were constructed and tested against saturated (full) and backward regression models. In total, 273 patients (median age 37 years, 51% women) were included. The median number of symptoms described was 4 per patient, and most information was provided at the patients' own initiative. Multiple logistic regression analysis showed a strong association between patients' complaints and the physical examination. Frequent diagnoses were upper respiratory tract infection (URTI)/common cold (43%), bronchitis (26%), sinusitis (12%), and tonsillitis (11%). There were no significant statistical differences between "simple heuristic" models and saturated regression models in the diagnoses of bronchitis, sinusitis, and tonsillitis, indicating that simple heuristics are probably used by the FPs, whereas "URTI/common cold" was better explained by the full model. FPs tended to make their diagnosis based on a few patient symptoms and a limited physical examination. Simple heuristic models were almost as powerful in explaining most diagnoses as saturated models.
Direct observation allowed for the study of decision making under real conditions, yielding both quantitative data and "qualitative" information about the FPs' performance. It is important for investigators to be aware of the specific disadvantages of the method (e.g., a possible observer effect).
Meta-Analysis of the Reasoned Action Approach (RAA) to Understanding Health Behaviors.
McEachan, Rosemary; Taylor, Natalie; Harrison, Reema; Lawton, Rebecca; Gardner, Peter; Conner, Mark
2016-08-01
Reasoned action approach (RAA) includes subcomponents of attitude (experiential/instrumental), perceived norm (injunctive/descriptive), and perceived behavioral control (capacity/autonomy) to predict intention and behavior. To provide a meta-analysis of the RAA for health behaviors focusing on comparing the pairs of RAA subcomponents and differences between health protection and health-risk behaviors. The present research reports a meta-analysis of correlational tests of RAA subcomponents, examination of moderators, and combined effects of subcomponents on intention and behavior. Regressions were used to predict intention and behavior based on data from studies measuring all variables. Capacity and experiential attitude had large, and other constructs had small-medium-sized correlations with intention; all constructs except autonomy were significant independent predictors of intention in regressions. Intention, capacity, and experiential attitude had medium-large, and other constructs had small-medium-sized correlations with behavior; intention, capacity, experiential attitude, and descriptive norm were significant independent predictors of behavior in regressions. The RAA subcomponents have utility in predicting and understanding health behaviors.
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed from Normalized Difference Vegetation Index (NDVI) values derived from low-resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI, and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence, stretching from March to June, was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
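The BRT idea can be sketched as gradient boosting with depth-1 regression trees (stumps) under squared error. This is a generic illustration on synthetic "NDVI-like" features, not the authors' SPOT VEGETATION pipeline, and all names and parameters are assumptions:

```python
import numpy as np

def fit_stump(X, r):
    """Best single-split regression stump for residuals r (squared error)."""
    n, m = X.shape
    best = (np.inf, 0, 0.0, r.mean(), r.mean())
    for j in range(m):
        order = np.argsort(X[:, j])
        xs, rs = X[order, j], r[order]
        csum = np.cumsum(rs)
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue
            left_mean = csum[i - 1] / i
            right_mean = (csum[-1] - csum[i - 1]) / (n - i)
            sse = (np.sum((rs[:i] - left_mean)**2)
                   + np.sum((rs[i:] - right_mean)**2))
            if sse < best[0]:
                thr = 0.5 * (xs[i] + xs[i - 1])
                best = (sse, j, thr, left_mean, right_mean)
    return best[1:]

def boost(X, y, n_rounds=100, lr=0.1):
    """Gradient boosting for squared error: each stump fits the residuals."""
    pred = np.full(len(y), y.mean())
    stumps = []
    for _ in range(n_rounds):
        j, thr, lm, rm = fit_stump(X, y - pred)
        pred += lr * np.where(X[:, j] <= thr, lm, rm)
        stumps.append((j, thr, lm, rm))
    return y.mean(), stumps

def predict(X, base, stumps, lr=0.1):
    pred = np.full(len(X), base)
    for j, thr, lm, rm in stumps:
        pred += lr * np.where(X[:, j] <= thr, lm, rm)
    return pred

rng = np.random.default_rng(2)
X = rng.uniform(size=(200, 3))              # toy NDVI-style features
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1])   # yield-like target
base, stumps = boost(X, y)
rmse = np.sqrt(np.mean((predict(X, base, stumps) - y)**2))
print(rmse)
```

The per-stump split counts (how often each feature is chosen) are the raw material for the influence scores mentioned in the abstract.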
Liu, Xiaoyan; Li, Feng; Ding, Yongsheng; Zou, Ting; Wang, Lu; Hao, Kuangrong
2015-01-01
A hierarchical support vector regression model (HSVRM) was employed for the first time to correlate the compositions and mechanical properties of bicomponent stents composed of poly(lactic-co-glycolic acid) (PGLA) film and poly(glycolic acid) (PGA) fibers for ureteral repair. PGLA film and PGA fibers could provide ureteral stents with good compressive and tensile properties, respectively. In bicomponent stents, high film content led to high stiffness, while high fiber content resulted in poor compressional properties. To simplify the procedures for optimizing the ratio of PGLA film to PGA fiber in the stents, the HSVRM and a particle swarm optimization (PSO) algorithm were used to construct relationships between the film-to-fiber weight ratio and the measured compressional/tensile properties of the stents. The experimental data and simulated data fit well, proving that the HSVRM can closely reflect the relationship between the component ratio and the performance properties of the ureteral stents. PMID:28793658
Negash, Selam; Wilson, Robert S.; Leurgans, Sue E.; Wolk, David A.; Schneider, Julie A.; Buchman, Aron S.; Bennett, David A.; Arnold, Steven. E.
2014-01-01
Background Although it is now evident that normal cognition can occur despite significant AD pathology, few studies have attempted to characterize this discordance, or examine factors that may contribute to resilient brain aging in the setting of AD pathology. Methods More than 2,000 older persons underwent annual evaluation as part of participation in the Religious Orders Study or Rush Memory Aging Project. A total of 966 subjects who had brain autopsy and comprehensive cognitive testing proximate to death were analyzed. Resilience was quantified as a continuous measure using linear regression modeling, where global cognition was entered as a dependent variable and global pathology was an independent variable. Studentized residuals generated from the model represented the discordance between cognition and pathology, and served as measure of resilience. The relation of resilience index to known risk factors for AD and related variables was examined. Results Multivariate regression models that adjusted for demographic variables revealed significant associations for early life socioeconomic status, reading ability, APOE-ε4 status, and past cognitive activity. A stepwise regression model retained reading level (estimate = 0.10, SE = 0.02; p < 0.0001) and past cognitive activity (estimate = 0.27, SE = 0.09; p = 0.002), suggesting the potential mediating role of these variables for resilience. Conclusions The construct of resilient brain aging can provide a framework for quantifying the discordance between cognition and pathology, and help identify factors that may mediate this relationship. PMID:23919768
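The residual-based resilience index described above amounts to regressing global cognition on global pathology and keeping the studentized residuals. A sketch on synthetic data (not the Religious Orders Study / Memory and Aging Project data); the variable names and effect sizes are invented:

```python
import numpy as np

def studentized_residuals(x, y):
    """Internally studentized residuals from a simple linear regression
    of y on x (here: global cognition on global pathology)."""
    X = np.column_stack([np.ones_like(x), x])
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
    resid = y - H @ y
    dof = len(y) - X.shape[1]
    s2 = resid @ resid / dof
    return resid / np.sqrt(s2 * (1.0 - np.diag(H)))

rng = np.random.default_rng(3)
pathology = rng.normal(size=100)
cognition = -0.8 * pathology + rng.normal(scale=0.5, size=100)
cognition[10] += 5.0   # one subject with far better cognition than
                       # their pathology burden predicts ("resilient")
resilience = studentized_residuals(pathology, cognition)
print(int(np.argmax(resilience)))
```

Subjects with large positive residuals (cognition better than pathology predicts) score as resilient; the index can then be regressed on candidate factors such as reading ability or past cognitive activity, as in the abstract.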
Agarwal, Parul; Sambamoorthi, Usha
2015-12-01
Depression is common among individuals with osteoarthritis and leads to increased healthcare burden. The objective of this study was to examine excess total healthcare expenditures associated with depression among individuals with osteoarthritis in the US. Adults with self-reported osteoarthritis (n = 1881) were identified using data from the 2010 Medical Expenditure Panel Survey (MEPS). Among those with osteoarthritis, chi-square tests and ordinary least square regressions (OLS) were used to examine differences in healthcare expenditures between those with and without depression. Post-regression linear decomposition technique was used to estimate the relative contribution of different constructs of the Anderson's behavioral model, i.e., predisposing, enabling, need, personal healthcare practices, and external environment factors, to the excess expenditures associated with depression among individuals with osteoarthritis. All analysis accounted for the complex survey design of MEPS. Depression coexisted among 20.6 % of adults with osteoarthritis. The average total healthcare expenditures were $13,684 among adults with depression compared to $9284 among those without depression. Multivariable OLS regression revealed that adults with depression had 38.8 % higher healthcare expenditures (p < 0.001) compared to those without depression. Post-regression linear decomposition analysis indicated that 50 % of differences in expenditures among adults with and without depression can be explained by differences in need factors. Among individuals with coexisting osteoarthritis and depression, excess healthcare expenditures associated with depression were mainly due to comorbid anxiety, chronic conditions and poor health status. These expenditures may potentially be reduced by providing timely intervention for need factors or by providing care under a collaborative care model.
Yamakado, Minoru; Tanaka, Takayuki; Nagao, Kenji; Imaizumi, Akira; Komatsu, Michiharu; Daimon, Takashi; Miyano, Hiroshi; Tani, Mizuki; Toda, Akiko; Yamamoto, Hiroshi; Horimoto, Katsuhisa; Ishizaka, Yuko
2017-11-03
Fatty liver disease (FLD) increases the risk of diabetes, cardiovascular disease, and steatohepatitis, which leads to fibrosis, cirrhosis, and hepatocellular carcinoma. Thus, the early detection of FLD is necessary. We aimed to find a quantitative and feasible model for discriminating FLD based on plasma free amino acid (PFAA) profiles. We constructed models of the relationship between PFAA levels in 2,000 generally healthy Japanese subjects and the diagnosis of FLD by abdominal ultrasound scan, using multiple logistic regression analysis with variable selection. The performance of these models for FLD discrimination was validated using an independent data set of 2,160 subjects. The generated PFAA-based model was able to identify FLD patients. The area under the receiver operating characteristic curve for the model was 0.83, which was higher than those of other existing liver function-associated markers, ranging from 0.53 to 0.80. The value of the linear discriminant in the model yielded an adjusted odds ratio (with 95% confidence intervals) for a 1 standard deviation increase of 2.63 (2.14-3.25) in the multiple logistic regression analysis with known liver function-associated covariates. Interestingly, the linear discriminant values were significantly associated with the progression of FLD, and patients with nonalcoholic steatohepatitis also exhibited higher values.
Frank, Laurence E; Heiser, Willem J
2008-05-01
A set of features is the basis for the network representation of proximity data achieved by feature network models (FNMs). Features are binary variables that characterize the objects in an experiment, with some measure of proximity as response variable. Sometimes features are provided by theory and play an important role in the construction of the experimental conditions. In some research settings, the features are not known a priori. This paper shows how to generate features in this situation and how to select an adequate subset of features that takes into account a good compromise between model fit and model complexity, using a new version of least angle regression that restricts coefficients to be non-negative, called the Positive Lasso. It will be shown that features can be generated efficiently with Gray codes that are naturally linked to the FNMs. The model selection strategy makes use of the fact that FNM can be considered as univariate multiple regression model. A simulation study shows that the proposed strategy leads to satisfactory results if the number of objects is less than or equal to 22. If the number of objects is larger than 22, the number of features selected by our method exceeds the true number of features in some conditions.
Baba, Hiromi; Takahara, Jun-ichi; Yamashita, Fumiyoshi; Hashida, Mitsuru
2015-11-01
The solvent effect on skin permeability is important for assessing the effectiveness and toxicological risk of new dermatological formulations in pharmaceuticals and cosmetics development. The solvent effect occurs by diverse mechanisms, which could be elucidated by efficient and reliable prediction models. However, such prediction models have been hampered by the small variety of permeants and mixture components archived in databases and by low predictive performance. Here, we propose a solution to both problems. We first compiled a novel large database of 412 samples from 261 structurally diverse permeants and 31 solvents reported in the literature. The data were carefully screened to ensure their collection under consistent experimental conditions. To construct a high-performance predictive model, we then applied support vector regression (SVR) and random forest (RF) with greedy stepwise descriptor selection to our database. The models were internally and externally validated. The SVR achieved higher performance statistics than RF. The (externally validated) determination coefficient, root mean square error, and mean absolute error of SVR were 0.899, 0.351, and 0.268, respectively. Moreover, because all descriptors are fully computational, our method can predict as-yet unsynthesized compounds. Our high-performance prediction model offers an attractive alternative to permeability experiments for pharmaceutical and cosmetic candidate screening and optimizing skin-permeable topical formulations.
MEDISE: A macroeconomic model for energy planning in Costa Rica
DOE Office of Scientific and Technical Information (OSTI.GOV)
Booth, S.R.; Leiva, C.L.
This report describes the development and results of MEDISE, an econometric macroeconomic model for energy planning in Costa Rica. The model is a simultaneous system of 19 equations that was constructed using ENERPLAN, an energy planning tool developed by the United Nations for use in developing countries. The equations were estimated using regression analysis on a data time series from 1966 to 1984. ENERPLAN's model solution package was used to obtain forecasts of 19 economic variables from 1985 to 2005. The modeling effort was conducted jointly by Los Alamos Central American Energy and Resources Project (CAP) personnel and the Energy Sector Directorate of Costa Rica during 1986. The CAP was funded by the US Agency for International Development. 6 refs., 3 figs., 11 tabs.
Genetic Modification of the Relationship between Parental Rejection and Adolescent Alcohol Use.
Stogner, John M; Gibson, Chris L
2016-07-01
Parenting practices are associated with adolescents' alcohol consumption; however, not all youth respond similarly to challenging family situations and harsh environments. This study examines the relationship between perceived parental rejection and adolescent alcohol use, and specifically evaluates whether youth who possess greater genetic sensitivity to their environment are more susceptible to negative parental relationships. Analyzing data from the National Longitudinal Study of Adolescent Health, we estimated a series of regression models predicting alcohol use during adolescence. A multiplicative interaction term between parental rejection and a genetic index was constructed to evaluate this potential gene-environment interaction. Results from logistic regression analyses show a statistically significant gene-environment interaction predicting alcohol use. The relationship between parental rejection and alcohol use was moderated by the genetic index, indicating that adolescents possessing more 'risk alleles' for five candidate genes were affected more by stressful parental relationships. Feelings of parental rejection appear to influence the alcohol use decisions of youth, but they do not do so equally for all. Higher scores on the constructed genetic sensitivity measure are related to increased susceptibility to negative parental relationships. © The Author 2016. Medical Council on Alcohol and Oxford University Press. All rights reserved.
Factors associated with self-medication in Spain: a cross-sectional study in different age groups.
Niclós, Gracia; Olivar, Teresa; Rodilla, Vicent
2018-06-01
To identify factors which may influence a patient's decision to self-medicate. Descriptive, cross-sectional study of the adult population (at least 16 years old), using data from the 2009 European Health Interview Survey in Spain, which included 22 188 subjects. Logistic regression models enabled us to estimate the effect of each analysed variable on self-medication. In total, 14 863 (67%) individuals reported using medication (prescribed and non-prescribed) and 3274 (22.0%) of them self-medicated. Using logistic regression and stratifying by age, four different models were constructed. Our results include different variables in each of the models to explain self-medication, but the one that appears in all four models is education level. Age is the other important factor which influences self-medication. Self-medication is strongly associated with socio-demographic factors, such as sex, educational level or age, as well as several health factors such as long-standing illness or physical activity. When our data are compared to those from previous Spanish surveys carried out in 2003 and 2006, we can conclude that self-medication is increasing in Spain. © 2017 Royal Pharmaceutical Society.
Fossati, Andrea; Widiger, Thomas A; Borroni, Serena; Maffei, Cesare; Somma, Antonella
2017-06-01
To extend the evidence on the reliability and construct validity of the Five-Factor Model Rating Form (FFMRF) in its self-report version, two independent samples of Italian participants, which were composed of 510 adolescent high school students and 457 community-dwelling adults, respectively, were administered the FFMRF in its Italian translation. Adolescent participants were also administered the Italian translation of the Borderline Personality Features Scale for Children-11 (BPFSC-11), whereas adult participants were administered the Italian translation of the Triarchic Psychopathy Measure (TriPM). Cronbach α values were consistent with previous findings; in both samples, average interitem r values indicated acceptable internal consistency for all FFMRF scales. A multidimensional graded item response theory model indicated that the majority of FFMRF items had adequate discrimination parameters; information indices supported the reliability of the FFMRF scales. Both categorical (i.e., item-level) and scale-level regression analyses suggested that the FFMRF scores may predict a nonnegligible amount of variance in the BPFSC-11 total score in adolescent participants, and in the TriPM scale scores in adult participants.
Sharma, Praveen; Singh, Lakhvinder; Dilbaghi, Neeraj
2009-05-30
Decolorization of the textile azo dye Disperse Yellow 211 (DY 211) was carried out from simulated aqueous solution by the bacterial strain Bacillus subtilis. Response surface methodology (RSM), involving a Box-Behnken design matrix in the three most important operating variables (temperature, pH and initial dye concentration), was successfully employed for the study and optimization of the decolorization process. A total of 17 experiments were conducted towards the construction of a quadratic model. According to analysis of variance (ANOVA) results, the proposed model can be used to navigate the design space. Under optimized conditions the bacterial strain was able to decolorize DY 211 up to 80%. The model indicated that an initial dye concentration of 100 mg l(-1), pH 7 and a temperature of 32.5 degrees C were optimum for maximum percent decolorization. A very high regression coefficient between the variables and the response (R(2)=0.9930) indicated an excellent fit of the experimental data by the polynomial regression model. The combination of the three variables predicted through RSM was confirmed through confirmatory experiments; hence the bacterial strain holds great potential for the treatment of colored textile effluents.
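The 17-run design mentioned above matches a three-factor Box-Behnken layout (12 edge midpoints plus 5 center points), which can be sketched together with a quadratic response-surface fit; the response values below are hypothetical, not the study's decolorization data.

```python
# Box-Behnken design for 3 factors: each pair of factors at (+/-1, +/-1)
# with the third factor held at 0, plus 5 center points = 17 runs.
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

runs = []
for i, j in itertools.combinations(range(3), 2):
    for a, b in itertools.product([-1.0, 1.0], repeat=2):
        row = [0.0, 0.0, 0.0]
        row[i], row[j] = a, b
        runs.append(row)
runs += [[0.0, 0.0, 0.0]] * 5
X = np.array(runs)                 # coded temperature, pH, dye concentration

# Hypothetical decolorization (%) peaking at the center of the design.
rng = np.random.default_rng(2)
y = 80.0 - 5.0 * (X ** 2).sum(axis=1) + rng.standard_normal(17)

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
r2 = model.score(X, y)
print("quadratic model R^2:", round(r2, 3))
```

The Box-Behnken layout is chosen precisely because it supports estimating all linear, interaction, and squared terms of the quadratic model with few runs.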
Taslimitehrani, Vahid; Dong, Guozhu; Pereira, Naveen L; Panahiazar, Maryam; Pathak, Jyotishman
2016-04-01
Computerized survival prediction in healthcare, which identifies the risk of disease mortality, helps healthcare providers to effectively manage their patients by providing appropriate treatment options. In this study, we propose to apply a classification algorithm, Contrast Pattern Aided Logistic Regression (CPXR(Log)) with the probabilistic loss function, to develop and validate prognostic risk models to predict 1-, 2-, and 5-year survival in heart failure (HF) using data from electronic health records (EHRs) at Mayo Clinic. CPXR(Log) constructs a pattern aided logistic regression model defined by several patterns and corresponding local logistic regression models. One of the models generated by CPXR(Log) achieved an AUC and accuracy of 0.94 and 0.91, respectively, and significantly outperformed prognostic models reported in prior studies. Data extracted from EHRs allowed incorporation of patient co-morbidities into our models, which helped improve the performance of the CPXR(Log) models (15.9% AUC improvement), although it did not improve the accuracy of the models built by other classifiers. We also propose a probabilistic loss function to determine the large-error and small-error instances. The new loss function used in the algorithm outperforms functions used in previous studies by a 1% improvement in the AUC. This study revealed that using EHR data to build prediction models can be very challenging with existing classification methods due to the high dimensionality and complexity of EHR data. The risk models developed by CPXR(Log) also reveal that HF is a highly heterogeneous disease, i.e., different subgroups of HF patients require different types of considerations in their diagnosis and treatment.
Our risk models provided two valuable insights for application of predictive modeling techniques in biomedicine: Logistic risk models often make systematic prediction errors, and it is prudent to use subgroup based prediction models such as those given by CPXR(Log) when investigating heterogeneous diseases. Copyright © 2016 Elsevier Inc. All rights reserved.
PRESS-based EFOR algorithm for the dynamic parametrical modeling of nonlinear MDOF systems
NASA Astrophysics Data System (ADS)
Liu, Haopeng; Zhu, Yunpeng; Luo, Zhong; Han, Qingkai
2017-09-01
In response to the identification problem concerning multi-degree of freedom (MDOF) nonlinear systems, this study presents the extended forward orthogonal regression (EFOR) based on predicted residual sums of squares (PRESS) to construct a nonlinear dynamic parametrical model. The proposed parametrical model is based on the non-linear autoregressive with exogenous inputs (NARX) model and aims to explicitly reveal the physical design parameters of the system. The PRESS-based EFOR algorithm is proposed to identify such a model for MDOF systems. By using the algorithm, we built a common-structured model based on the fundamental concept of evaluating its generalization capability through cross-validation. The resulting model aims to prevent over-fitting with poor generalization performance caused by the average error reduction ratio (AERR)-based EFOR algorithm. Then, a functional relationship is established between the coefficients of the terms and the design parameters of the unified model. Moreover, a 5-DOF nonlinear system is taken as a case to illustrate the modeling of the proposed algorithm. Finally, a dynamic parametrical model of a cantilever beam is constructed from experimental data. Results indicate that the dynamic parametrical model of nonlinear systems, which depends on the PRESS-based EFOR, can accurately predict the output response, thus providing a theoretical basis for the optimal design of modeling methods for MDOF nonlinear systems.
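The PRESS criterion at the heart of the term-selection algorithm can be sketched for any linear-in-the-parameters model: leave-one-out residuals follow directly from the hat matrix without refitting. The model terms and data below are illustrative, not the NARX terms of the study.

```python
# PRESS (predicted residual sum of squares) for a linear-in-the-parameters
# model: leave-one-out residual e_i / (1 - h_ii) via the hat matrix.
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((50, 4))             # candidate model terms
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.standard_normal(50)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat (projection) matrix
press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
print("PRESS:", press)
```

Because PRESS scores each candidate model by its cross-validated error rather than in-sample error, a selection loop driven by it penalizes over-fitted terms, which is the motivation for preferring it over the AERR criterion above.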
Peeters, Yvette; Boersma, Sandra N; Koopman, Hendrik M
2008-01-01
Background Aim of this study is to further explore predictors of health related quality of life in children with asthma using factors derived from the extended stress-coping model. While the stress-coping model has often been used as a frame of reference in studying health related quality of life in chronic illness, few have actually tested the model in children with asthma. Method In this survey study data were obtained by means of self-report questionnaires from seventy-eight children with asthma and their parents. Based on data derived from these questionnaires the constructs of the extended stress-coping model were assessed, using regression analysis and path analysis. Results The results of both regression analysis and path analysis reveal tentative support for the proposed relationships between predictors and health related quality of life in the stress-coping model. Moreover, as indicated in the stress-coping model, HRQoL is directly predicted only by coping. Of the coping strategies, 'emotional reaction' (significantly) and 'avoidance' are directly related to HRQoL. Conclusion In children with asthma, the extended stress-coping model appears to be a useful theoretical framework for understanding the impact of the illness on their quality of life. Consequently, the factors suggested by this model should be taken into account when designing optimal psychosocial-care interventions. PMID:18366753
NASA Astrophysics Data System (ADS)
Fang, Kaizheng; Mu, Daobin; Chen, Shi; Wu, Borong; Wu, Feng
2012-06-01
In this study, a prediction model based on an artificial neural network is constructed for surface temperature simulation of a nickel-metal hydride battery. The model is developed from a back-propagation network trained by the Levenberg-Marquardt algorithm. Under each ambient temperature of 10 °C, 20 °C, 30 °C and 40 °C, an 8 Ah cylindrical Ni-MH battery is charged at rates of 1 C, 3 C and 5 C to an SOC of 110% in order to provide data for model training. Linear regression, together with mean square error and absolute error, is adopted to check the quality of the model training. It is shown that the constructed model is of excellent training quality, which guarantees prediction accuracy. The surface temperature of the battery during charging is predicted by the model under ambient temperatures of 50 °C, 60 °C and 70 °C, and the results are validated against experimental data with good agreement. The battery surface temperature is calculated to exceed 90 °C at an ambient temperature of 60 °C if the battery is overcharged at 5 C, which might cause battery safety issues.
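A minimal sketch of such a surface-temperature network is given below; scikit-learn's `MLPRegressor` with the L-BFGS solver stands in for the Levenberg-Marquardt training used in the study, and the charging data are simulated, not the 8 Ah cell measurements.

```python
# Small feed-forward network mapping (ambient temperature, charge rate, SOC)
# to battery surface temperature. All data are synthetic stand-ins.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
X = np.column_stack([
    rng.uniform(10, 40, 300),                # ambient temperature (deg C)
    rng.choice([1.0, 3.0, 5.0], 300),        # charge rate (C)
    rng.uniform(0, 110, 300),                # state of charge (%)
])
# Hypothetical surface temperature rising with charge rate and SOC.
y = X[:, 0] + 8.0 * X[:, 1] * (X[:, 2] / 100.0) ** 2 + rng.normal(0, 0.5, 300)

net = make_pipeline(
    StandardScaler(),                        # scale inputs before training
    MLPRegressor(hidden_layer_sizes=(8,), solver="lbfgs",
                 max_iter=5000, random_state=0),
)
net.fit(X, y)
print("training R^2:", round(net.score(X, y), 3))
```

As in the abstract, the training quality would then be checked by regressing predictions against measured temperatures and inspecting the error statistics.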
Managing fish habitat for flow and temperature extremes ...
Summer low flows and stream temperature maxima are key drivers affecting the sustainability of fish populations. Thus, it is critical to understand the natural templates of spatiotemporal variability, how these are shifting due to the anthropogenic influences of development and climate change, and how these impacts can be moderated by natural and constructed green infrastructure. Low flow statistics of New England streams have been characterized using a combination of regression equations to describe long-term averages as a function of indicators of hydrologic regime (rain- versus snow-dominated), precipitation, evapotranspiration or temperature, surface water storage, baseflow recession rates, and impervious cover. Difference equations have been constructed to describe interannual variation in low flow as a function of changing air temperature, precipitation, and ocean-atmospheric teleconnection indices. Spatial statistical network models have been applied to explore fine-scale variability of thermal regimes along stream networks in New England as a function of variables describing natural and altered energy inputs, groundwater contributions, and retention time. Low flows exacerbate temperature impacts by reducing the thermal inertia of streams to energy inputs. Based on these models, we can construct scenarios of fish habitat suitability using current and projected future climate and the potential for preservation and restoration of historic habitat regimes th
Predicting Intention to Perform Breast Self-Examination: Application of the Theory of Reasoned Action
Dewi, Triana Kesuma; Zein, Rizqy Amelia
2017-01-01
Objective: The present study aimed to examine the applicability of the theory of reasoned action to explain intention to perform breast self-examination (BSE). Methods: A questionnaire was constructed to collect data. The hypothesis was tested in two steps. First, to assess the strength of the correlation among the constructs of the theory of reasoned action (TRA), Pearson’s product moment correlations were applied. Second, multivariate relationships among the constructs were examined by performing hierarchical multiple linear regression analysis. Result: The findings supported the TRA model, explaining 45.8% of the variance in the students’ BSE intention, which was significantly correlated with attitude (r = 0.609, p = 0.000) and subjective norms (r = 0.420, p = 0.000). Conclusion: TRA could be a suitable model to predict BSE intentions. Participants who believed that doing BSE regularly is beneficial for early diagnosis of breast cancer, and who also believed that their significant referents think that doing BSE would detect breast cancer earlier, were more likely to intend to perform BSE regularly. Therefore, the research findings support the conclusion that promoting the importance of BSE at the community/social level would encourage individuals to perform BSE routinely. PMID:29172263
The Use of Remote Sensing Data for Modeling Air Quality in the Cities
NASA Astrophysics Data System (ADS)
Putrenko, V. V.; Pashynska, N. M.
2017-12-01
Monitoring of environmental pollution in cities by methods of remote sensing of the Earth is a topical area of research for sustainable development. Ukraine has a poorly developed network of monitoring stations for air quality, whose technical condition has deteriorated in recent years. Therefore, the possibility of obtaining data about the condition of the air by remote sensing methods is of great importance. The paper considers the possibility of using atmospheric data from the AERONET project to assess air quality in Ukraine. The main pollution indicators used were data on fine particulate matter (PM2.5) and nitrogen dioxide (NO2) content in the atmosphere. The main indicator of air quality in Ukraine is the air pollution index (API). We built regression models relating NO2 indicators measured by remote sensing methods to ground-based measurements. Regression models were also built relating the ground-adjusted NO2 data to the API. To model the relationship between the API and PM2.5, a geographically weighted regression model was used, which takes into account the territorial differentiation between these indicators. As a result, maps showing the distribution of the main types of pollution over the territory of Ukraine were constructed. Modeling PM2.5 is complicated using existing indicators, which calls for a dedicated observation network for PM2.5 content in the atmosphere to support sustainable development in the cities of Ukraine.
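The geographically weighted regression step can be sketched directly: each location gets its own kernel-weighted least-squares fit, so the API-PM2.5 relationship may vary over space. The coordinates, PM2.5 values, and drifting slope below are all synthetic, not the Ukrainian monitoring data.

```python
# Geographically weighted regression (GWR) sketch: local weighted OLS
# with Gaussian kernel weights centered on each target location.
import numpy as np

rng = np.random.default_rng(5)
coords = rng.uniform(0, 100, size=(60, 2))      # station locations
pm25 = rng.uniform(5, 50, 60)
# Hypothetical API whose sensitivity to PM2.5 drifts from west to east.
slope_true = 1.0 + coords[:, 0] / 100.0
api = 20 + slope_true * pm25 + rng.normal(0, 1.0, 60)

def gwr_coefficients(target_xy, bandwidth=30.0):
    """Local intercept and slope at one location via kernel-weighted OLS."""
    d = np.linalg.norm(coords - target_xy, axis=1)
    w = np.exp(-(d / bandwidth) ** 2)           # Gaussian distance decay
    X = np.column_stack([np.ones_like(pm25), pm25])
    XtW = X.T * w                               # X^T W with diagonal W
    return np.linalg.solve(XtW @ X, XtW @ api)

west = gwr_coefficients(np.array([5.0, 50.0]))
east = gwr_coefficients(np.array([95.0, 50.0]))
print("western slope:", west[1], "eastern slope:", east[1])
```

The bandwidth controls how local each fit is; a global OLS model is the limit of an infinitely wide kernel.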
Luna-Lario, P; Pena, J; Ojeda, N
2017-04-16
To perform an in-depth examination of the construct validity and the ecological validity of the Wechsler Memory Scale-III (WMS-III) and the Spain-Complutense Verbal Learning Test (TAVEC). The sample consists of 106 adults with acquired brain injury who were treated in the Area of Neuropsychology and Neuropsychiatry of the Complejo Hospitalario de Navarra and displayed memory deficit as the main sequela, measured by means of specific memory tests. The construct validity is determined by examining the tasks required in each test over the basic theoretical models, comparing the performance according to the parameters offered by the tests, contrasting the severity indices of each test and analysing their convergence. The external validity is explored through the correlation between the tests and by using regression models. According to the results obtained, both the WMS-III and the TAVEC have construct validity. The TAVEC is more sensitive and captures not only the deficits in mnemonic consolidation, but also in the executive functions involved in memory. The working memory index of the WMS-III is useful for predicting the return to work at two years after the acquired brain injury, but none of the instruments anticipates the disability and dependence at least six months after the injury. We reflect upon the construct validity of the tests and their insufficient capacity to predict functionality when the sequelae become chronic.
Influence of fatigue on construction workers’ physical and cognitive function
Zhang, M.; Murphy, L. A.; Fang, D.
2015-01-01
Background Despite scientific evidence linking workers’ fatigue to occupational safety (due to impaired physical or cognitive function), little is known about this relationship in construction workers. Aims To assess the association between construction workers’ reported fatigue and their perceived difficulties with physical and cognitive functions. Methods Using data from a convenience sample of US construction workers participating in the 2010–11 National Health Interview Survey, two multivariate weighted logistic regression models were built to predict difficulty with physical and with cognitive functions associated with workers’ reported fatigue, while controlling for age, smoking status, alcohol consumption status, sleep hygiene, psychological distress and arthritis status. Results Of 606 construction workers surveyed, 49% reported being ‘tired some days’ in the past 3 months and 10% reported being ‘tired most days or every day’. Compared with those feeling ‘never tired’, workers who felt ‘tired some days’ were significantly more likely to report difficulty with physical function (adjusted odds ratio [AOR] = 2.03; 95% confidence interval [CI] 1.17–3.51) and cognitive function (AOR = 2.27; 95% CI 1.06–4.88) after controlling for potential confounders. Conclusions Our results suggest an association between reported fatigue and experiencing difficulties with physical and cognitive functions in construction workers. PMID:25701835
Redick, Thomas S; Shipstead, Zach; Meier, Matthew E; Montroy, Janelle J; Hicks, Kenny L; Unsworth, Nash; Kane, Michael J; Hambrick, D Zachary; Engle, Randall W
2016-11-01
Previous research has identified several cognitive abilities that are important for multitasking, but few studies have attempted to measure a general multitasking ability using a diverse set of multitasks. In the final dataset, 534 young adult subjects completed measures of working memory (WM), attention control, fluid intelligence, and multitasking. Correlations, hierarchical regression analyses, confirmatory factor analyses, structural equation models, and relative weight analyses revealed several key findings. First, although the complex tasks used to assess multitasking differed greatly in their task characteristics and demands, a coherent construct specific to multitasking ability was identified. Second, the cognitive ability predictors accounted for substantial variance in the general multitasking construct, with WM and fluid intelligence accounting for the most multitasking variance compared to attention control. Third, the magnitude of the relationships among the cognitive abilities and multitasking varied as a function of the complexity and structure of the various multitasks assessed. Finally, structural equation models based on a multifaceted model of WM indicated that attention control and capacity fully mediated the WM and multitasking relationship. (PsycINFO Database Record (c) 2016 APA, all rights reserved).
Foster, Guy M.
2014-01-01
The Neosho River and its primary tributary, the Cottonwood River, are the primary sources of inflow to the John Redmond Reservoir in east-central Kansas. Sedimentation rate in the John Redmond Reservoir was estimated as 743 acre-feet per year for 1964–2006. This estimated sedimentation rate is more than 80 percent larger than the projected design sedimentation rate of 404 acre-feet per year, and resulted in a loss of 40 percent of the conservation pool since its construction in 1964. To reduce sediment input into the reservoir, the Kansas Water Office implemented stream bank stabilization techniques along an 8.3 mile reach of the Neosho River during 2010 through 2011. The U.S. Geological Survey, in cooperation with the Kansas Water Office and funded in part through the Kansas State Water Plan Fund, operated continuous real-time water-quality monitors upstream and downstream from stream bank stabilization efforts before, during, and after construction. Continuously measured water-quality properties include streamflow, specific conductance, water temperature, and turbidity. Discrete sediment samples were collected from June 2009 through September 2012 and analyzed for suspended-sediment concentration (SSC), percentage of sediments less than 63 micrometers (sand-fine break), and loss of material on ignition (analogous to amount of organic matter). Regression models were developed to establish relations between discretely measured SSC samples and turbidity or streamflow, in order to estimate SSC continuously. Continuous water-quality monitors represented between 96 and 99 percent of the cross-sectional variability for turbidity and had slopes between 0.91 and 0.98. Because consistent bias was not observed, values from continuous water-quality monitors were considered representative of stream conditions. On average, turbidity-based SSC models explained 96 percent of the variance in SSC. Streamflow-based regressions explained 53 to 60 percent of the variance.
Mean squared prediction error for turbidity-based regression relations ranged from -32 to 48 percent, whereas mean squared prediction error for streamflow-based regressions ranged from -69 to 218 percent. These models are useful for evaluating the variability of SSC during rapidly changing conditions, computing loads and yields to assess SSC transport through the watershed, and providing more accurate load estimates than the streamflow-only estimation methods used in the past. These models can be used to evaluate the efficacy of streambank stabilization efforts.
Personalized Modeling for Prediction with Decision-Path Models
Visweswaran, Shyam; Ferreira, Antonio; Ribeiro, Guilherme A.; Oliveira, Alexandre C.; Cooper, Gregory F.
2015-01-01
Deriving predictive models in medicine typically relies on a population approach where a single model is developed from a dataset of individuals. In this paper we describe and evaluate a personalized approach in which we construct a new type of decision tree model called decision-path model that takes advantage of the particular features of a given person of interest. We introduce three personalized methods that derive personalized decision-path models. We compared the performance of these methods to that of Classification And Regression Tree (CART) that is a population decision tree to predict seven different outcomes in five medical datasets. Two of the three personalized methods performed statistically significantly better on area under the ROC curve (AUC) and Brier skill score compared to CART. The personalized approach of learning decision path models is a new approach for predictive modeling that can perform better than a population approach. PMID:26098570
Modeling of Micro Deval abrasion loss based on some rock properties
NASA Astrophysics Data System (ADS)
Capik, Mehmet; Yilmaz, Ali Osman
2017-10-01
Aggregate is one of the most widely used construction materials. The quality of aggregate is determined using several testing methods; among these, the Micro Deval Abrasion Loss (MDAL) test is commonly used to determine the quality and abrasion resistance of aggregate. The main objective of this study is to develop models for the prediction of MDAL from rock properties, including uniaxial compressive strength, Brazilian tensile strength, point load index, Schmidt rebound hardness, apparent porosity, void ratio, Cerchar abrasivity index and Bohme abrasion test. Additionally, the MDAL is modeled using simple regression analysis and multiple linear regression analysis based on the rock properties. The study shows that the MDAL decreases with increasing uniaxial compressive strength, Brazilian tensile strength, point load index, Schmidt rebound hardness and Cerchar abrasivity index. It is also concluded that the MDAL increases with increasing apparent porosity, void ratio and Bohme abrasion loss. The modeling results show that the models based on the Bohme abrasion test and the L-type Schmidt rebound hardness give better forecasting performance for the MDAL. Further models, including the uniaxial compressive strength, the apparent porosity and the Cerchar abrasivity index, are developed for rapid estimation of the MDAL of the rocks. The developed models were verified by statistical tests. Additionally, it can be stated that the proposed models can be used for forecasting aggregate quality.
Hammer, Leslie B.; Kossek, Ellen Ernst; Yragui, Nanette L.; Bodner, Todd E.; Hanson, Ginger C.
2011-01-01
Due to growing work-family demands, supervisors need to effectively exhibit family supportive supervisor behaviors (FSSB). Drawing on social support theory and using data from two samples of lower wage workers, the authors develop and validate a measure of FSSB, defined as behaviors exhibited by supervisors that are supportive of families. FSSB is conceptualized as a multidimensional superordinate construct with four subordinate dimensions: emotional support, instrumental support, role modeling behaviors, and creative work-family management. Results from multilevel confirmatory factor analyses and multilevel regression analyses provide evidence of construct, criterion-related, and incremental validity. The authors found FSSB to be significantly related to work-family conflict, work-family positive spillover, job satisfaction, and turnover intentions over and above measures of general supervisor support. PMID:21660254
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of the Durbin-Watson statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then, by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 of China’s regions. The results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271
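The contrast drawn above, an order-dependent Durbin-Watson statistic versus a weight-matrix-based autocorrelation index, can be sketched as follows; the residuals and coordinates are simulated, and the inverse-distance weight matrix is one plausible choice, not necessarily the paper's.

```python
# Durbin-Watson depends on observation order; a Moran-style index built
# from a normalized spatial weight matrix does not.
import numpy as np

rng = np.random.default_rng(6)
n = 29                                        # e.g. a sample of 29 regions
coords = rng.uniform(0, 10, size=(n, 2))
e = rng.standard_normal(n)                    # regression residuals (stand-in)

# Durbin-Watson statistic over the (arbitrary) ordering of observations.
dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Moran-style index: inverse-distance weights, zero diagonal, total weight 1.
d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
W = 1.0 / (d + np.eye(n))                     # eye avoids division by zero
np.fill_diagonal(W, 0.0)
W /= W.sum()
z = (e - e.mean()) / e.std()                  # standardized residual vector
moran = z @ W @ z
print("Durbin-Watson:", round(dw, 3), " Moran-type index:", round(moran, 3))
```

Shuffling the rows of `e` changes `dw` but, as long as `W` is permuted consistently with the residuals, leaves the Moran-type index unchanged, which is the deficiency of Durbin-Watson that the proposed statistics address.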
Gagnon, Marie Pierre; Orruño, Estibalitz; Asua, José; Abdeljelil, Anis Ben; Emparanza, José
2012-01-01
To examine the factors that could influence the decision of healthcare professionals to use a telemonitoring system. A questionnaire, based on the Technology Acceptance Model (TAM), was developed. A panel of experts in technology assessment evaluated the face and content validity of the instrument. Two hundred and thirty-four questionnaires were distributed among nurses and doctors of the cardiology, pulmonology, and internal medicine departments of a tertiary hospital. Cronbach alpha was calculated to measure the internal consistency of the questionnaire items. Construct validity was evaluated using interitem correlation analysis. Logistic regression analysis was performed to test the theoretical model. Adjusted odds ratios (ORs) and their 95% confidence intervals (CIs) were computed. A response rate of 39.7% was achieved. With the exception of one theoretical construct (Habit) that corresponds to behaviors that become automatized, Cronbach alpha values were acceptably high for the remaining constructs. Theoretical variables were well correlated with each other and with the dependent variable. The original TAM was good at predicting telemonitoring usage intention, Perceived Usefulness being the only significant predictor (OR: 5.28, 95% CI: 2.12-13.11). The model was still significant and more powerful when the other theoretical variables were added. However, the only significant predictor in the modified model was Facilitators (OR: 4.96, 95% CI: 1.59-15.55). The TAM is a good predictive model of healthcare professionals' intention to use telemonitoring. However, the perception of facilitators is the most important variable to consider for increasing doctors' and nurses' intention to use the new technology.
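A hedged sketch of the regression step described above: usage intention regressed on TAM-style predictors, with coefficients exponentiated to odds ratios. The Likert scores and the "usefulness"/"ease" predictors are simulated for illustration; this is not the survey's data or its full set of constructs.

```python
# Logistic regression of a binary usage intention on TAM-style predictors;
# exponentiated coefficients are interpreted as odds ratios (ORs).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
usefulness = rng.uniform(1, 7, 234)           # 7-point Likert scores
ease = rng.uniform(1, 7, 234)
logit = -4 + 0.9 * usefulness + 0.1 * ease    # hypothetical true model
intention = (rng.uniform(size=234) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([usefulness, ease])
fit = LogisticRegression(C=1e6, max_iter=1000).fit(X, intention)  # ~unpenalized
odds_ratios = np.exp(fit.coef_[0])
print("OR(usefulness):", round(odds_ratios[0], 2),
      "OR(ease):", round(odds_ratios[1], 2))
```

An OR above 1 for Perceived Usefulness mirrors the abstract's finding that it was the dominant predictor of telemonitoring usage intention; confidence intervals would additionally require standard errors, e.g. from a statistics package.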
Socioeconomic Status Index to Interpret Inequalities in Child Development
AHMADI DOULABI, Mahbobeh; SAJEDI, Firoozeh; VAMEGHI, Roshanak; MAZAHERI, Mohammad Ali; AKBARZADEH BAGHBAN, Alireza
2017-01-01
Objective There have been contradictory findings on the relationship between Socioeconomic Status (SES) and child development although SES is associated with child development outcomes. The present study intended to define the relationship between SES and child development in Tehran kindergartens, Iran. Materials & Methods This cross-sectional survey studied 1036 children aged 36-60 months in different kindergartens in Tehran City, Iran, in 2014-2015. The principal factor analysis (PFA) model was employed to construct SES indices. The constructed SES variable was employed as an independent variable in a logistic regression model to evaluate its role in developmental delay as a dependent variable. Results The relationship between SES and developmental delay was significant at P=0.003. SES proved to have a significant (P<0.05) impact on developmental delay, both as an independent variable and after controlling for risk factors. Conclusion There should be more emphasis on developmental monitoring and appropriate intervention programs for children to give them a higher chance of having a more productive life. PMID:28698723
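The two-stage design, a factor-analytic SES index used as the predictor in a logistic regression on delay, can be sketched with simulated data. The indicators, effect sizes, and outcome below are all invented for illustration (and a first principal component stands in for the paper's principal factor analysis):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical socioeconomic indicators for 300 households
# (e.g. income, parental education, assets), driven by a latent SES.
n = 300
latent = rng.normal(size=n)
Xraw = np.column_stack([latent + rng.normal(scale=0.5, size=n)
                        for _ in range(3)])

# SES index = first principal component of the standardized indicators.
Z = (Xraw - Xraw.mean(0)) / Xraw.std(0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
ses = Z @ Vt[0]                                   # index scores

# Simulated outcome: developmental delay more likely at low SES.
p = 1 / (1 + np.exp(2.0 + 0.8 * ses))
delay = (rng.random(n) < p).astype(float)

# Logistic regression of delay on the SES index (plain gradient ascent
# on the log-likelihood; a stats library would normally do this).
X = np.column_stack([np.ones(n), ses])
b = np.zeros(2)
for _ in range(5000):
    mu = 1 / (1 + np.exp(-X @ b))
    b += 0.01 * X.T @ (delay - mu) / n
print("odds ratio per unit of SES index:", round(float(np.exp(b[1])), 3))
```

The sign of a principal component is arbitrary, so the fitted odds ratio may land on either side of 1; its distance from 1 is what carries the association.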
Exploring the Theory of Planned Behavior to Explain Sugar-Sweetened Beverage Consumption
Estabrooks, Paul; Davy, Brenda; Chen, Yvonnes; You, Wendy
2011-01-01
Objective To describe sugar-sweetened beverage (SSB) consumption, establish psychometric properties and utility of a Theory of Planned Behavior (TPB) instrument for SSB consumption. Methods This cross-sectional survey included 119 southwest Virginia participants. Respondents were majority female (66%), white (89%), ≤ high school education (79%), and averaged 41.4 (±13.5) years. A validated beverage questionnaire was used to measure SSB. Eleven TPB constructs were assessed with a 56-item instrument. Analyses included descriptive statistics, one-way ANOVAs, Cronbach alphas, and multiple regressions. Results Sugar-sweetened beverage intake averaged 457 (±430) kilocalories/day. The TPB model provided a moderate explanation of SSB intake (R2=0.38; F=13.10, P<0.01). Behavioral intentions had the strongest relationships with SSB consumption, followed by attitudes, perceived behavioral control, and subjective norms. The six belief constructs did not predict significant variance in the models. Conclusions and Implications Future efforts to comprehensively develop and implement interventions guided by the TPB hold promise for reducing SSB intake. PMID:22154130
Rein, Thomas R; Harvati, Katerina; Harrison, Terry
2015-01-01
Uncovering links between skeletal morphology and locomotor behavior is an essential component of paleobiology because it allows researchers to infer the locomotor repertoire of extinct species based on preserved fossils. In this study, we explored ulnar shape in anthropoid primates using 3D geometric morphometrics to discover novel aspects of shape variation that correspond to observed differences in the relative amount of forelimb suspensory locomotion performed by species. The ultimate goal of this research was to construct an accurate predictive model that can be applied to infer the significance of these behaviors. We studied ulnar shape variation in extant species using principal component analysis. Species mainly clustered into phylogenetic groups along the first two principal components. Upon closer examination, the results showed that the position of species within each major clade corresponded closely with the proportion of forelimb suspensory locomotion that they have been observed to perform in nature. We used principal component regression to construct a predictive model for the proportion of these behaviors that would be expected to occur in the locomotor repertoire of anthropoid primates. We then applied this regression analysis to Pliopithecus vindobonensis, a stem catarrhine from the Miocene of central Europe, and found strong evidence that this species was adapted to perform a proportion of forelimb suspensory locomotion similar to that observed in the extant woolly monkey, Lagothrix lagothricha. Copyright © 2014 Elsevier Ltd. All rights reserved.
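Principal component regression of the kind used here, PCA on the shape variables followed by least squares on the leading component scores, can be sketched as follows. The "landmark" data, latent shape axes, and behaviour values are simulated stand-ins, not the study's measurements:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical shape data: 40 specimens x 12 shape variables,
# with the behaviour of interest driven by two latent shape axes.
n, p = 40, 12
T = rng.normal(size=(n, 2))                       # latent shape axes
X = T @ rng.normal(size=(2, p)) + 0.1 * rng.normal(size=(n, p))
behaviour = 0.6 * T[:, 0] - 0.3 * T[:, 1] + 0.05 * rng.normal(size=n)

# Principal component regression: project onto the first k PCs,
# then ordinary least squares on the scores.
k = 2
Xc = X - X.mean(0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:k].T
B = np.column_stack([np.ones(n), scores])
coef, *_ = np.linalg.lstsq(B, behaviour, rcond=None)

pred = B @ coef
r2 = 1 - np.sum((behaviour - pred) ** 2) / np.sum((behaviour - behaviour.mean()) ** 2)
print("PCR R^2 with", k, "components:", round(float(r2), 3))
```

A fossil would be scored on the same PCs (center with the extant mean, project onto `Vt[:k]`) before applying `coef`, which mirrors how the model is applied to Pliopithecus.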
NASA Astrophysics Data System (ADS)
Rodrigues, João Fabrício Mota; Coelho, Marco Túlio Pacheco; Ribeiro, Bruno R.
2018-04-01
Species distribution models (SDM) have been broadly used in ecology to address theoretical and practical problems. Currently, there are two main approaches to generate SDMs: (i) correlative models, which are based on species occurrences and environmental predictor layers, and (ii) process-based models, which are constructed from species' functional traits and physiological tolerances. The distributions estimated by each approach are based on different components of the species' niche. Predictions of correlative models approximate species' realized niches, while predictions of process-based models are more akin to species' fundamental niches. Here, we integrated the predictions of fundamental and realized distributions of the freshwater turtle Trachemys dorbigni. The fundamental distribution was estimated using data on T. dorbigni's egg incubation temperature, and the realized distribution was estimated using species occurrence records. Both types of distributions were estimated using the same regression approaches (logistic regression and support vector machines), each considering macroclimatic and microclimatic temperatures. The realized distribution of T. dorbigni was generally nested in its fundamental distribution, reinforcing the theoretical assumption that a species' realized niche is a subset of its fundamental niche. Both modelling algorithms produced similar results, but microtemperature generated better results than macrotemperature for the incubation model. Finally, our results reinforce the conclusion that species' realized distributions are constrained by factors other than just thermal tolerances.
Prediction of cadmium enrichment in reclaimed coastal soils by classification and regression tree
NASA Astrophysics Data System (ADS)
Ru, Feng; Yin, Aijing; Jin, Jiaxin; Zhang, Xiuying; Yang, Xiaohui; Zhang, Ming; Gao, Chao
2016-08-01
Reclamation of coastal land is one of the most common ways to obtain land resources in China. However, it has long been acknowledged that the artificial interference with coastal land has disadvantageous effects, such as heavy metal contamination. This study aimed to develop a prediction model for cadmium enrichment levels and assess the importance of affecting factors in typical reclaimed land in Eastern China (DFCL: Dafeng Coastal Land). Two hundred and twenty-seven surficial soil/sediment samples were collected and analyzed to identify the enrichment levels of cadmium and the possible affecting factors in soils and sediments. The classification and regression tree (CART) model was applied in this study to predict cadmium enrichment levels. The prediction results showed that cadmium enrichment levels assessed by the CART model had an accuracy of 78.0%. The CART model could extract more information on factors affecting the environmental behavior of cadmium than correlation analysis. The integration of correlation analysis and the CART model showed that fertilizer application and organic carbon accumulation were the most important factors affecting soil/sediment cadmium enrichment levels, followed by particle size effects (Al2O3, TFe2O3 and SiO2), contents of Cl and S, surrounding construction areas and reclamation history.
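At the heart of CART is an exhaustive search for the split that most reduces node impurity; the tree then recurses on each child. A minimal sketch of that splitting rule follows, on simulated samples in which "enrichment" depends on one feature (a stand-in for something like organic-carbon accumulation); none of it is the study's data:

```python
import numpy as np

rng = np.random.default_rng(3)

def gini(y):
    """Gini impurity of a 0/1 label vector."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2 * p * (1 - p)

def best_split(X, y):
    """Exhaustive search for the (feature, threshold) pair that
    minimizes the weighted Gini impurity of the two child nodes."""
    best = (None, None, gini(y))
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best

# Hypothetical soil samples: enrichment driven entirely by feature 0.
n = 200
X = rng.normal(size=(n, 3))
y = (X[:, 0] > 0.2).astype(float)

j, t, _ = best_split(X, y)
pred = (X[:, j] > t).astype(float)
print("split feature:", j, "accuracy:", round(float((pred == y).mean()), 3))
```

A full CART would call `best_split` recursively on each child until a stopping rule (depth, node size, impurity gain) is met, and the relative impurity reduction per feature is what yields the importance ranking the abstract describes.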
NASA Astrophysics Data System (ADS)
Rogers, Jeffrey N.; Parrish, Christopher E.; Ward, Larry G.; Burdick, David M.
2018-03-01
Salt marsh vegetation tends to increase vertical uncertainty in light detection and ranging (lidar) derived elevation data, often causing the data to become ineffective for analysis of topographic features governing tidal inundation or vegetation zonation. Previous attempts at improving lidar data collected in salt marsh environments range from simply computing and subtracting the global elevation bias to more complex methods such as computing vegetation-specific, constant correction factors. The vegetation specific corrections can be used along with an existing habitat map to apply separate corrections to different areas within a study site. It is hypothesized here that correcting salt marsh lidar data by applying location-specific, point-by-point corrections, which are computed from lidar waveform-derived features, tidal-datum based elevation, distance from shoreline and other lidar digital elevation model based variables, using nonparametric regression will produce better results. The methods were developed and tested using full-waveform lidar and ground truth for three marshes in Cape Cod, Massachusetts, U.S.A. Five different model algorithms for nonparametric regression were evaluated, with TreeNet's stochastic gradient boosting algorithm consistently producing better regression and classification results. Additionally, models were constructed to predict the vegetative zone (high marsh and low marsh). The predictive modeling methods used in this study estimated ground elevation with a mean bias of 0.00 m and a standard deviation of 0.07 m (0.07 m root mean square error). These methods appear very promising for correction of salt marsh lidar data and, importantly, do not require an existing habitat map, biomass measurements, or image based remote sensing data such as multi/hyperspectral imagery.
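The stochastic gradient boosting idea behind TreeNet, fitting many small trees to the current residuals on random subsamples and shrinking each tree's contribution, can be sketched with stumps. The "waveform feature" and elevation-error values below are simulated for illustration only:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_stump(x, r):
    """One-feature regression stump: threshold plus two leaf means."""
    best = None
    for t in np.quantile(x, np.linspace(0.1, 0.9, 9)):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda z: np.where(z <= t, lo, hi)

# Hypothetical marsh transect: lidar elevation error varies smoothly
# with one waveform-derived feature.
n = 400
feat = rng.uniform(0, 1, n)
error = 0.15 * np.sin(3 * feat) + 0.02 * rng.normal(size=n)

# Stochastic gradient boosting: fit stumps to residuals on a random
# half-sample each round, with shrinkage (learning rate) 0.1.
pred = np.zeros(n)
for _ in range(200):
    idx = rng.choice(n, size=n // 2, replace=False)   # stochastic subsample
    stump = fit_stump(feat[idx], (error - pred)[idx])
    pred += 0.1 * stump(feat)                          # shrunken update

rmse = np.sqrt(np.mean((error - pred) ** 2))
print("boosted-correction RMSE (m):", round(float(rmse), 4))
```

The study's models use many such predictors at once (waveform features, tidal-datum elevation, distance from shoreline), but the residual-fitting loop is the same.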
Sills, Deborah L; Gossett, James M
2012-04-01
Fourier transform infrared, attenuated total reflectance (FTIR-ATR) spectroscopy, combined with partial least squares (PLS) regression, accurately predicted solubilization of plant cell wall constituents and NaOH consumption through pretreatment, and overall sugar production from combined pretreatment and enzymatic hydrolysis. PLS regression models were constructed by correlating FTIR spectra of six raw biomasses (two switchgrass cultivars, big bluestem grass, a low-impact, high-diversity mixture of prairie biomasses, mixed hardwood, and corn stover), plus alkali loading in pretreatment, to nine dependent variables: glucose, xylose, lignin, and total solids solubilized in pretreatment; NaOH consumed in pretreatment; and overall glucose and xylose conversions and yields from combined pretreatment and enzymatic hydrolysis. PLS models predicted the dependent variables with the following values of coefficient of determination for cross-validation (Q²): 0.86 for glucose, 0.90 for xylose, 0.79 for lignin, and 0.85 for total solids solubilized in pretreatment; 0.83 for alkali consumption; 0.93 for glucose conversion, 0.94 for xylose conversion, and 0.88 for glucose and xylose yields. The sugar yield models are noteworthy for their ability to predict overall saccharification through combined pretreatment and enzymatic hydrolysis per mass dry untreated solids without a priori knowledge of the composition of solids. All wavenumbers with significant variable importance in projection (VIP) scores have been attributed to chemical features of lignocellulose, demonstrating that the models were based on real chemical information. These models suggest that PLS regression can be applied to FTIR-ATR spectra of raw biomasses to rapidly predict effects of pretreatment on solids and on subsequent enzymatic hydrolysis. Copyright © 2011 Wiley Periodicals, Inc.
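PLS regression extracts components that maximize covariance between the spectra and the response, one at a time with deflation (the NIPALS scheme for a single response, PLS1). A bare-bones sketch follows; the "spectra", compositions, and noise levels are invented, and the printed statistic is a training R², whereas the paper's Q² values come from cross-validation:

```python
import numpy as np

rng = np.random.default_rng(5)

def pls_fit(X, y, ncomp):
    """Minimal PLS1 (NIPALS): returns a prediction function."""
    xm, ym = X.mean(0), y.mean()
    Xc, yc = X - xm, y - ym
    Ws, Ps, qs = [], [], []
    for _ in range(ncomp):
        w = Xc.T @ yc                 # covariance-maximizing direction
        w /= np.linalg.norm(w)
        t = Xc @ w                    # component scores
        p = Xc.T @ t / (t @ t)        # X loadings
        q = (yc @ t) / (t @ t)        # y loading
        Xc = Xc - np.outer(t, p)      # deflate X
        yc = yc - q * t               # deflate y
        Ws.append(w); Ps.append(p); qs.append(q)
    def predict(Xnew):
        Xd = Xnew - xm
        yhat = np.full(len(Xnew), ym)
        for w, p, q in zip(Ws, Ps, qs):
            t = Xd @ w
            yhat += q * t
            Xd = Xd - np.outer(t, p)
        return yhat
    return predict

# Hypothetical "spectra": 30 samples x 100 wavenumbers with two
# latent chemical components controlling the response.
n, m = 30, 100
C = rng.normal(size=(n, 2))
S = rng.normal(size=(2, m))                 # pure-component "spectra"
X = C @ S + 0.05 * rng.normal(size=(n, m))
y = 1.5 * C[:, 0] - 0.7 * C[:, 1] + 0.05 * rng.normal(size=n)

predict = pls_fit(X, y, ncomp=2)
resid = y - predict(X)
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
print("PLS R^2 (2 components):", round(float(r2), 3))
```

With far more wavenumbers than samples (here 100 vs. 30), ordinary least squares is ill-posed, which is exactly the regime where PLS is the standard tool for spectra.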
Austin, P C; Shah, B R; Newman, A; Anderson, G M
2012-09-01
There are limited validated methods to ascertain comorbidities for risk adjustment in ambulatory populations of patients with diabetes using administrative health-care databases. The objective was to examine the ability of the Johns Hopkins' Aggregated Diagnosis Groups to predict mortality in population-based ambulatory samples of both incident and prevalent subjects with diabetes. Retrospective cohorts were constructed using population-based administrative data. The incident cohort consisted of all 346,297 subjects diagnosed with diabetes between 1 April 2004 and 31 March 2008. The prevalent cohort consisted of all 879,849 subjects with pre-existing diabetes on 1 January 2007. The outcome was death within 1 year of the subject's index date. A logistic regression model consisting of age, sex and indicator variables for 22 of the 32 Johns Hopkins' Aggregated Diagnosis Group categories had excellent discrimination for predicting mortality in incident diabetes patients: the c-statistic was 0.87 in an independent validation sample. A similar model had excellent discrimination for predicting mortality in prevalent diabetes patients: the c-statistic was 0.84 in an independent validation sample. Both models demonstrated very good calibration, denoting good agreement between observed and predicted mortality across the range of predicted mortality in which the large majority of subjects lay. For comparative purposes, regression models incorporating the Charlson comorbidity index, age and sex, age and sex, and age alone had poorer discrimination than the model that incorporated the Johns Hopkins' Aggregated Diagnosis Groups. Logistic regression models using age, sex and the Johns Hopkins' Aggregated Diagnosis Groups were able to accurately predict 1-year mortality in population-based samples of patients with diabetes. © 2011 The Authors. Diabetic Medicine © 2011 Diabetes UK.
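The c-statistic reported here is the probability that a randomly chosen decedent received a higher predicted risk than a randomly chosen survivor (ties counted as half). A small simulated illustration, with an invented cohort and coefficients standing in for ADG-style predictors:

```python
import numpy as np

rng = np.random.default_rng(6)

def c_statistic(risk, died):
    """Concordance: P(risk of a random death > risk of a random
    survivor), counting ties as 1/2."""
    r1, r0 = risk[died == 1], risk[died == 0]
    diff = r1[:, None] - r0[None, :]
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)

# Simulated cohort: 1-year mortality risk rises with age and with a
# morbidity score (both purely illustrative).
n = 2000
age = rng.uniform(40, 90, n)
morb = rng.poisson(2.0, n).astype(float)
lp = -9 + 0.08 * age + 0.35 * morb                 # linear predictor
died = (rng.random(n) < 1 / (1 + np.exp(-lp))).astype(int)

risk = 1 / (1 + np.exp(-lp))                       # model risk score
print("c-statistic:", round(c_statistic(risk, died), 3))
```

Calibration, the other property the abstract checks, would be assessed separately, e.g. by comparing observed and mean predicted mortality within deciles of `risk`.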
Guo, Jin-Cheng; Wu, Yang; Chen, Yang; Pan, Feng; Wu, Zhi-Yong; Zhang, Jia-Sheng; Wu, Jian-Yi; Xu, Xiu-E; Zhao, Jian-Mei; Li, En-Min; Zhao, Yi; Xu, Li-Yan
2018-04-09
Esophageal squamous cell carcinoma (ESCC) is the predominant subtype of esophageal carcinoma in China. The aim of this study was to develop a staging model to predict outcomes of patients with ESCC. Using Cox regression analysis, principal component analysis (PCA), partitioning clustering, Kaplan-Meier analysis, receiver operating characteristic (ROC) curve analysis, and classification and regression tree (CART) analysis, we mined the Gene Expression Omnibus database to determine the expression profiles of genes in 179 patients with ESCC from the GSE63624 and GSE63622 datasets. Univariate Cox regression analysis of the GSE63624 dataset revealed that 2404 protein-coding genes (PCGs) and 635 long non-coding RNAs (lncRNAs) were associated with the survival of patients with ESCC. PCA categorized these PCGs and lncRNAs into three principal components (PCs), which were used to cluster the patients into three groups. ROC analysis demonstrated that the predictive ability of the PCG-lncRNA PCs when applied to new patients was better than that of tumor-node-metastasis staging (area under ROC curve [AUC]: 0.69 vs. 0.65, P < 0.05). Accordingly, we constructed a molecular disaggregated model comprising one lncRNA and two PCGs, which we designated the LSB staging model, using CART analysis in the GSE63624 dataset. This LSB staging model classified the GSE63622 dataset of patients into three different groups, and its effectiveness was validated by analysis of another cohort of 105 patients. The LSB staging model has clinical significance for the prognosis prediction of patients with ESCC and may serve as a three-gene staging microarray.
A structured analysis of uncertainty surrounding modeled impacts of groundwater-extraction rules
NASA Astrophysics Data System (ADS)
Guillaume, Joseph H. A.; Qureshi, M. Ejaz; Jakeman, Anthony J.
2012-08-01
Integrating economic and groundwater models for groundwater-management can help improve understanding of trade-offs involved between conflicting socioeconomic and biophysical objectives. However, there is significant uncertainty in most strategic decision-making situations, including in the models constructed to represent them. If not addressed, this uncertainty may be used to challenge the legitimacy of the models and decisions made using them. In this context, a preliminary uncertainty analysis was conducted of a dynamic coupled economic-groundwater model aimed at assessing groundwater extraction rules. The analysis demonstrates how a variety of uncertainties in such a model can be addressed. A number of methods are used including propagation of scenarios and bounds on parameters, multiple models, block bootstrap time-series sampling and robust linear regression for model calibration. These methods are described within the context of a theoretical uncertainty management framework, using a set of fundamental uncertainty management tasks and an uncertainty typology.
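Of the methods listed, block bootstrap time-series sampling is the easiest to sketch: resample contiguous blocks rather than single points, so that short-range serial dependence is preserved in each replicate. The AR(1) "recharge" series and block length below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

def block_bootstrap(series, block_len, rng):
    """Resample a time series by concatenating randomly chosen
    contiguous blocks, preserving within-block serial dependence."""
    n = len(series)
    n_blocks = int(np.ceil(n / block_len))
    starts = rng.integers(0, n - block_len + 1, size=n_blocks)
    out = np.concatenate([series[s:s + block_len] for s in starts])
    return out[:n]

# Hypothetical annual series with AR(1)-style persistence.
n = 200
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.6 * x[t - 1] + eps[t]

# Bootstrap distribution of the mean, respecting autocorrelation.
means = [block_bootstrap(x, 10, rng).mean() for _ in range(500)]
lo, hi = np.percentile(means, [2.5, 97.5])
print("95% interval for the mean:", round(float(lo), 3), round(float(hi), 3))
```

An ordinary (single-point) bootstrap on this series would understate the variance of the mean, because it destroys the positive autocorrelation; in an uncertainty analysis that optimism propagates into the model bounds.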
NASA Technical Reports Server (NTRS)
Smith, Timothy D.; Steffen, Christopher J., Jr.; Yungster, Shaye; Keller, Dennis J.
1998-01-01
The all rocket mode of operation is shown to be a critical factor in the overall performance of a rocket based combined cycle (RBCC) vehicle. An axisymmetric RBCC engine was used to determine specific impulse efficiency values based upon both full flow and gas generator configurations. Design of experiments methodology was used to construct a test matrix and multiple linear regression analysis was used to build parametric models. The main parameters investigated in this study were: rocket chamber pressure, rocket exit area ratio, injected secondary flow, mixer-ejector inlet area, mixer-ejector area ratio, and mixer-ejector length-to-inlet diameter ratio. A perfect gas computational fluid dynamics analysis, using both the Spalart-Allmaras and k-omega turbulence models, was performed with the NPARC code to obtain values of vacuum specific impulse. Results from the multiple linear regression analysis showed that for both the full flow and gas generator configurations increasing mixer-ejector area ratio and rocket area ratio increase performance, while increasing mixer-ejector inlet area ratio and mixer-ejector length-to-diameter ratio decrease performance. Increasing injected secondary flow increased performance for the gas generator analysis, but was not statistically significant for the full flow analysis. Chamber pressure was found to be not statistically significant.
Ensemble of ground subsidence hazard maps using fuzzy logic
NASA Astrophysics Data System (ADS)
Park, Inhye; Lee, Jiyeong; Saro, Lee
2014-06-01
Hazard maps of ground subsidence around abandoned underground coal mines (AUCMs) in Samcheok, Korea, were constructed using fuzzy ensemble techniques and a geographical information system (GIS). To evaluate the factors related to ground subsidence, a spatial database was constructed from topographic, geologic, mine tunnel, land use, groundwater, and ground subsidence maps. Spatial data, topography, geology, and various ground-engineering data for the subsidence area were collected and compiled in a database for mapping ground-subsidence hazard (GSH). The subsidence area was randomly split 70/30 for training and validation of the models. The relationships between the detected ground-subsidence area and the factors were identified and quantified by frequency ratio (FR), logistic regression (LR) and artificial neural network (ANN) models. The relationships were used as factor ratings in the overlay analysis to create ground-subsidence hazard indexes and maps. The three GSH maps were then used as new input factors and integrated using fuzzy-ensemble methods to make better hazard maps. All of the hazard maps were validated by comparison with known subsidence areas that were not used directly in the analysis. As a result, the ensemble model was found to be more effective in terms of prediction accuracy than the individual models.
ECOPASS - a multivariate model used as an index of growth performance of poplar clones
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ceulemans, R.; Impens, I.
The model (ECOlogical PASSport) reported was constructed by principal component analysis from a combination of biochemical, anatomical/morphological and ecophysiological gas exchange parameters measured on 5 fast growing poplar clones. Productivity data were from 10 selected trees in 3 plantations in Belgium and are given as m.a.i.(b.a.). The model is shown to be able to reflect not only genetic origin and the relative effects of the different parameters of the clones, but also their production potential. Multiple regression analysis of the 4 principal components showed a high cumulative correlation (96%) between the 3 components related to ecophysiological, biochemical and morphological parameters, and productivity; the ecophysiological component alone correlated 85% with productivity.
What variables can influence clinical reasoning?
Ashoorion, Vahid; Liaghatdar, Mohammad Javad; Adibi, Peyman
2012-12-01
Clinical reasoning is one of the most important competencies that a physician should achieve. Many medical schools and licensing bodies try to predict it based on some general measures such as critical thinking, personality, and emotional intelligence. This study aimed at providing a model of the relationship between these constructs. Sixty-nine medical students participated in this study. A battery of tests was devised consisting of four parts: clinical reasoning measures, the NEO personality inventory, the Bar-On EQ inventory, and the California critical thinking questionnaire. All participants completed the tests. Correlation and multiple regression analyses were used for data analysis. There were low to moderate correlations between clinical reasoning and the other variables. Emotional intelligence was the only variable that contributed to the clinical reasoning construct (r = 0.17-0.34; R² change = 0.46, P value = 0.000). Although clinical reasoning can be considered a kind of thinking, no significant correlation was detected between it and the other constructs. Emotional intelligence (and its subscales) was the only variable that could be used for clinical reasoning prediction.
Gu, Ja K.; Charles, Luenda E.; Fekedulegn, Desta; Ma, Claudia C.; Andrew, Michael E.; Burchfiel, Cecil M.
2016-01-01
Objectives The aim of this study was to estimate prevalence of injury by occupation and industry and obesity’s role. Methods Self-reported injuries were collected annually for US workers during 2004 to 2013. Prevalence ratios (PRs) and 95% confidence intervals (CIs) were obtained from fitted logistic regression models. Results Overall weighted injury prevalence during the previous three months was 77 per 10,000 workers. Age-adjusted injury prevalence was greatest for Construction and Extraction workers (169.7/10,000) followed by Production (160.6) among occupations, while workers in the Construction industry sector (147.9) had the highest injury prevalence followed by the Agriculture/Forestry/Fishing/Mining/Utilities sector (122.1). Overweight and obese workers were 26% to 45% more likely to experience injuries than normal-weight workers. Conclusion The prevalence of injury, highest for Construction workers, gradually increased as body mass index levels increased in most occupational and industry groups. PMID:27058472
Validating MEDIQUAL Constructs
NASA Astrophysics Data System (ADS)
Lee, Sang-Gun; Min, Jae H.
In this paper, we validate MEDIQUAL constructs through the different media users in help desk service. In previous research, only two end-users' constructs were used: assurance and responsiveness. In this paper, we extend the MEDIQUAL constructs to include reliability, empathy, assurance, tangibles, and responsiveness, which are based on the SERVQUAL theory. The results suggest that: 1) the five MEDIQUAL constructs are validated through factor analysis; that is, the constructs have relatively high correlations between measures of the same construct using different methods and low correlations between measures of constructs that are expected to differ; and 2) the five MEDIQUAL constructs are statistically significant for media users' satisfaction in help desk service by regression analysis.
NASA Astrophysics Data System (ADS)
Graboski, A. J.
2016-12-01
The Department of Defense (DoD) is planning over $600M in military construction on Eielson Air Force Base (AFB) within the next three fiscal years. Although many studies have been conducted on permafrost and climate change, the future of our climate, as well as any impacts on arctic infrastructure, remains unclear. This research focused on future climate predictions to determine likely scenarios for the United States Air Force's Strategic Planners to consider. This research also looked at various construction methods being used by industry to glean best practices to incorporate into future construction in order to determine cost factors to consider when permafrost soils may be encountered. The most recent 2013 Intergovernmental Panel on Climate Change (IPCC) report predicts a 2.2°C to 7.8°C temperature rise in Arctic regions by the end of the 21st Century in the Representative Concentration Pathway (RCP4.5) emissions scenario. A regression model was created using archived surface observations from 1944 to 2016. Initial analysis using regression/forecast techniques shows a 1.17°C temperature increase in the Arctic by the end of the 21st Century. Historical DoD construction data was then used to determine an appropriate cost factor. Applying statistical tests to the adjusted climate predictions supports continued usage of current DoD cost factors of 2.13 at Eielson and 2.97 at Thule AFBs, as they should be sufficient when planning future construction projects in permafrost-rich areas. These cost factors should allow planners the necessary funds to plan foundation mitigation techniques and prevent further degradation of permafrost soils around airbase infrastructure. This current research focused on Central Alaska; further research is recommended on the Alaskan North Slope and Greenland to determine climate change impacts on critical DoD infrastructure.
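The regression/forecast step described, fitting a linear trend to archived annual observations and extrapolating it to 2100, can be sketched as follows. The simulated series only mimics a 1944-2016 record; the trend and noise values are invented, so the output will not reproduce the study's 1.17°C figure:

```python
import numpy as np

rng = np.random.default_rng(8)

# Simulated annual mean temperatures, 1944-2016: a modest warming
# trend plus interannual noise (illustrative values only).
years = np.arange(1944, 2017)
temp = -2.0 + 0.016 * (years - 1944) + rng.normal(scale=1.2, size=len(years))

# Ordinary least-squares trend and its extrapolation to 2100.
A = np.column_stack([np.ones_like(years, dtype=float), years - 1944])
(b0, slope), *_ = np.linalg.lstsq(A, temp, rcond=None)
rise_by_2100 = slope * (2100 - 2016)
print("fitted trend:", round(float(slope), 4), "degC/yr")
print("projected additional rise by 2100:", round(float(rise_by_2100), 2), "degC")
```

With interannual noise this large relative to the trend, the slope's standard error is substantial, which is one reason a station-level regression forecast can sit well below the IPCC scenario range quoted above.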
Satisfaction of active duty soldiers with family dental care.
Chisick, M C
1997-02-01
In the fall of 1992, a random, worldwide sample of 6,442 married and single parent soldiers completed a self-administered survey on satisfaction with 22 attributes of family dental care. Simple descriptive statistics for each attribute were derived, as was a composite overall satisfaction score using factor analysis. Composite scores were regressed on demographics, annual dental utilization, and access barriers to identify those factors having an impact on a soldier's overall satisfaction with family dental care. Separate regression models were constructed for single parents, childless couples, and couples with children. Results show below-average satisfaction with nearly all attributes of family dental care, with access attributes having the lowest average satisfaction scores. Factors influencing satisfaction with family dental care varied by family type with one exception: dependent dental utilization within the past year contributed positively to satisfaction across all family types.
Ahmadi, Mehdi; Shahlaei, Mohsen
2015-01-01
P2X7 antagonist activity for a set of 49 molecules of the P2X7 receptor antagonists, derivatives of purine, was modeled with the aid of chemometric and artificial intelligence techniques. The activity of these compounds was estimated by means of a combination of principal component analysis (PCA), as a well-known data reduction method, genetic algorithm (GA), as a variable selection technique, and artificial neural network (ANN), as a non-linear modeling method. First, a linear regression combined with PCA (principal component regression) was used to model the structure-activity relationships, and afterwards a combination of PCA and an ANN algorithm was employed to accurately predict the biological activity of the P2X7 antagonists. PCA preserves as much of the information contained in the original data set as possible. The seven PCs most important to the studied activity were selected as inputs to the ANN by an efficient variable selection method, GA. The best computational neural network model was a fully-connected, feed-forward model with 7-7-1 architecture. The developed ANN model was fully evaluated by different validation techniques, including internal and external validation, and chemical applicability domain. All validations showed that the constructed quantitative structure-activity relationship model is robust and satisfactory. PMID:26600858
Developing Data-driven models for quantifying Cochlodinium polykrikoides in Coastal Waters
NASA Astrophysics Data System (ADS)
Kwon, Yongsung; Jang, Eunna; Im, Jungho; Baek, Seungho; Park, Yongeun; Cho, Kyunghwa
2017-04-01
Harmful algal blooms are a worldwide problem because they lead to serious dangers to human health and aquatic ecosystems. In particular, fish-killing red tide blooms caused by the dinoflagellate Cochlodinium polykrikoides (C. polykrikoides) have caused critical damage to mariculture in Korean coastal waters. In this work, multiple linear regression (MLR), regression tree (RT), and random forest (RF) models were constructed and applied to estimate C. polykrikoides blooms in coastal waters. Five different types of input dataset were used to test the performance of the three models. To train and validate the three models, observed numbers of C. polykrikoides cells from the National Institute of Fisheries Science (NIFS) and remote sensing reflectance data from Geostationary Ocean Color Imager (GOCI) images for 3 years from 2013 to 2015 were used. The RT model showed the best prediction performance when 4 bands and 3 band ratios were used as input data simultaneously. Results obtained from iterative model development with randomly chosen input data indicated that the recognition of patterns in training data caused a variation in prediction performance. This work provides useful tools for reliably estimating the number of C. polykrikoides cells from reasonable input reflectance datasets in coastal waters. It is expected that the RT model can be easily accessed and manipulated by administrators and decision-makers working with coastal waters.
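Why a regression tree can beat multiple linear regression here is easy to illustrate: bloom responses to reflectance are often threshold-like, which a single split captures and a straight line cannot. The band ratio, cell numbers, and the 0.55 threshold below are all invented:

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical reflectance band ratio vs. log cell abundance:
# a threshold-like bloom response.
n = 300
band_ratio = rng.uniform(0, 1, n)
cells = np.where(band_ratio > 0.55, 5.0, 1.0) + 0.3 * rng.normal(size=n)

def rmse(pred):
    return float(np.sqrt(np.mean((cells - pred) ** 2)))

# Multiple linear regression (here with a single predictor).
A = np.column_stack([np.ones(n), band_ratio])
coef, *_ = np.linalg.lstsq(A, cells, rcond=None)
mlr_rmse = rmse(A @ coef)

# One-split regression tree: best threshold by sum of squared errors.
best = None
for t in np.quantile(band_ratio, np.linspace(0.05, 0.95, 19)):
    left, right = cells[band_ratio <= t], cells[band_ratio > t]
    sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
    if best is None or sse < best[0]:
        best = (sse, t, left.mean(), right.mean())
_, t, lo_mean, hi_mean = best
tree_rmse = rmse(np.where(band_ratio <= t, lo_mean, hi_mean))

print("MLR RMSE:", round(mlr_rmse, 3), " tree RMSE:", round(tree_rmse, 3))
```

A random forest would average many such trees grown on bootstrap samples, trading a little bias for lower variance; on this toy data a single well-placed split already suffices.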
Calibration and Data Analysis of the MC-130 Air Balance
NASA Technical Reports Server (NTRS)
Booth, Dennis; Ulbrich, N.
2012-01-01
Design, calibration, calibration analysis, and intended use of the MC-130 air balance are discussed. The MC-130 balance is an 8.0 inch diameter force balance that has two separate internal air flow systems and one external bellows system. The manual calibration of the balance consisted of a total of 1854 data points with both unpressurized and pressurized air flowing through the balance. A subset of 1160 data points was chosen for the calibration data analysis. The regression analysis of the subset was performed using two fundamentally different analysis approaches. First, the data analysis was performed using a recently developed extension of the Iterative Method. This approach fits gage outputs as a function of both applied balance loads and bellows pressures while still allowing the application of the iteration scheme that is used with the Iterative Method. Then, for comparison, the axial force was also analyzed using the Non-Iterative Method. This alternate approach directly fits loads as a function of measured gage outputs and bellows pressures and does not require a load iteration. The regression models used by both the extended Iterative and Non-Iterative Method were constructed such that they met a set of widely accepted statistical quality requirements. These requirements lead to reliable regression models and prevent overfitting of data because they ensure that no hidden near-linear dependencies between regression model terms exist and that only statistically significant terms are included. Finally, a comparison of the axial force residuals was performed. Overall, axial force estimates obtained from both methods show excellent agreement as the differences of the standard deviation of the axial force residuals are on the order of 0.001 % of the axial force capacity.
Rank score and permutation testing alternatives for regression quantile estimates
Cade, B.S.; Richards, J.D.; Mielke, P.W.
2006-01-01
Performance of quantile rank score tests used for hypothesis testing and constructing confidence intervals for linear quantile regression estimates (0 ≤ τ ≤ 1) was evaluated by simulation for models with p = 2 and 6 predictors, moderate collinearity among predictors, homogeneous and heterogeneous errors, small to moderate samples (n = 20–300), and central to upper quantiles (0.50–0.99). Test statistics evaluated were the conventional quantile rank score T statistic, distributed as a χ² random variable with q degrees of freedom (where q parameters are constrained by H₀), and an F statistic with its sampling distribution approximated by permutation. The permutation F-test maintained better Type I errors than the T-test for homogeneous error models with smaller n and more extreme quantiles τ. An F distributional approximation of the F statistic provided some improvement in Type I errors over the T-test for models with > 2 parameters, smaller n, and more extreme quantiles, but not as much improvement as the permutation approximation. Both rank score tests required weighting to maintain correct Type I errors when heterogeneity under the alternative model increased to 5 standard deviations across the domain of X. A double permutation procedure was developed to provide valid Type I errors for the permutation F-test when null models were forced through the origin. Power was similar for conditions where both T- and F-tests maintained correct Type I errors, but the F-test provided some power at smaller n and extreme quantiles when the T-test had no power because of excessively conservative Type I errors. When the double permutation scheme was required for the permutation F-test to maintain valid Type I errors, power was less than for the T-test with decreasing sample size and increasing quantiles.
Confidence intervals on parameters and tolerance intervals for future predictions were constructed based on test inversion for an example application relating trout densities to stream channel width:depth.
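The permutation idea behind the F-test can be illustrated with a minimal numpy sketch. For brevity this permutes an ordinary least-squares F statistic for a single slope rather than the quantile rank score F statistic the paper actually studies; the sample size, true slope, and 999-permutation count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)   # true slope 0.5

def f_stat(x, y):
    """F statistic for H0: slope = 0 in simple linear regression."""
    b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()
    rss1 = np.sum((y - a - b * x) ** 2)    # residual SS, full model
    rss0 = np.sum((y - y.mean()) ** 2)     # residual SS, null model
    return (rss0 - rss1) / (rss1 / (len(y) - 2))

obs = f_stat(x, y)
# permutation approximation of the null distribution of F:
# shuffling y breaks any x-y association while keeping both marginals
perm = np.array([f_stat(x, rng.permutation(y)) for _ in range(999)])
p_value = (1 + np.sum(perm >= obs)) / (1 + perm.size)
```

The same resampling skeleton applies once `f_stat` is replaced by a quantile rank score statistic; the paper's double permutation scheme for through-the-origin nulls adds a second resampling layer not shown here.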
Potential Predictability and Prediction Skill for Southern Peru Summertime Rainfall
NASA Astrophysics Data System (ADS)
WU, S.; Notaro, M.; Vavrus, S. J.; Mortensen, E.; Block, P. J.; Montgomery, R. J.; De Pierola, J. N.; Sanchez, C.
2016-12-01
The central Andes receive over 50% of annual climatological rainfall during the short period of January-March. This summertime rainfall exhibits strong interannual and decadal variability, including severe drought events that incur devastating societal impacts and cause agricultural communities and mining facilities to compete for limited water resources. An improved seasonal prediction skill of summertime rainfall would aid in water resource planning and allocation across the water-limited southern Peru. While various underlying mechanisms have been proposed by past studies for the drivers of interannual variability in summertime rainfall across southern Peru, such as the El Niño-Southern Oscillation (ENSO), Madden Julian Oscillation (MJO), and extratropical forcings, operational forecasts continue to be largely based on rudimentary ENSO-based indices, such as NINO3.4, justifying further exploration of predictive skill. In order to bridge this gap between the understanding of driving mechanisms and the operational forecast, we performed systematic studies on the predictability and prediction skill of southern Peru summertime rainfall by constructing statistical forecast models using best available weather station and reanalysis datasets. At first, by assuming the first two empirical orthogonal functions (EOFs) of summertime rainfall are predictable, the potential predictability skill was evaluated for southern Peru. Then, we constructed a simple regression model, based on the time series of tropical Pacific sea-surface temperatures (SSTs), and a more advanced Linear Inverse Model (LIM), based on the EOFs of tropical ocean SSTs and large-scale atmosphere variables from reanalysis. Our results show that the LIM model consistently outperforms the more rudimentary regression models on the forecast skill of domain averaged precipitation index and individual station indices. 
The improvement in forecast correlation skill ranges from 10% to over 200% across stations. Further analysis shows that this advantage of the LIM likely arises from its representation of local zonal winds and of the position of the Intertropical Convergence Zone (ITCZ).
Data Mining for Efficient and Accurate Large Scale Retrieval of Geophysical Parameters
NASA Astrophysics Data System (ADS)
Obradovic, Z.; Vucetic, S.; Peng, K.; Han, B.
2004-12-01
Our effort is devoted to developing data mining technology for improving the efficiency and accuracy of geophysical parameter retrievals by learning a mapping from observation attributes to the corresponding parameters within the framework of classification and regression. We will describe a method for efficient learning of neural network-based classification and regression models from high-volume data streams. The proposed procedure automatically learns a series of neural networks of different complexities on smaller data stream chunks and then properly combines them into an ensemble predictor through averaging. Based on the idea of progressive sampling, the proposed approach starts with a very simple network trained on a very small chunk and then gradually increases the model complexity and the chunk size until the learning performance no longer improves. Our empirical study on aerosol retrievals from data obtained with the MISR instrument mounted on the Terra satellite suggests that the proposed method is successful in learning complex concepts from large data streams with near-optimal computational effort. We will also report on a method that complements deterministic retrievals by constructing accurate predictive algorithms and applying them to appropriately selected subsets of observed data. The method is based on developing more accurate predictors aimed at capturing global and local properties synthesized in a region. The procedure starts by learning the global properties of data sampled over the entire space, and continues by constructing specialized models on selected localized regions. The global and local models are integrated through an automated procedure that determines the optimal trade-off between the two components with the objective of minimizing the overall mean square error over a specific region. Our experimental results on MISR data showed that the combined model can increase the retrieval accuracy significantly.
The preliminary results on various large heterogeneous spatial-temporal datasets provide evidence that the benefits of the proposed methodology for efficient and accurate learning exist beyond the area of retrieval of geophysical parameters.
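The progressive-sampling loop described in this abstract (grow both the chunk size and the model complexity until the averaged ensemble stops improving) can be sketched as follows. Polynomial fits stand in for the neural networks, and the sine target, chunk sizes, and degree schedule are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
# a synthetic data stream; polynomial fits stand in for neural networks
x_stream = rng.uniform(-3, 3, 5000)
y_stream = np.sin(x_stream) + rng.normal(0, 0.1, 5000)
x_val = np.linspace(-3, 3, 500)   # held-out points for monitoring
y_val = np.sin(x_val)

models = []
pos, chunk, degree = 0, 50, 1
best = np.inf
while pos + chunk <= x_stream.size:
    xc, yc = x_stream[pos:pos + chunk], y_stream[pos:pos + chunk]
    models.append(np.polyfit(xc, yc, degree))        # train one member
    # ensemble prediction: average the member predictions
    pred = np.mean([np.polyval(c, x_val) for c in models], axis=0)
    mse = np.mean((pred - y_val) ** 2)
    if mse >= best:          # learning no longer improves: stop
        break
    best = mse
    pos += chunk
    chunk *= 2                        # progressively larger chunk ...
    degree = min(degree + 2, 9)       # ... and a more complex model
```

Early simple members are cheap to train on small chunks; later, more complex members see more data, and the running average dampens the error of the weak early members.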
NASA Astrophysics Data System (ADS)
Pianalto, Frederick S.
Coccidioidomycosis (Valley Fever) is an environmentally-mediated respiratory disease caused by the inhalation of airborne spores from the fungi Coccidioides spp. The fungi reside in arid and semi-arid soils of the Americas. The disease has increased epidemically in Arizona and other areas within the last two decades. Despite this increase, the ecology of the fungi remains obscure, and environmental antecedents of the disease are largely unstudied. Two sources of soil disturbance, hypothesized to affect soil ecology and initiate spore dissemination, are investigated. Nocturnal desert rodents interact substantially with the soil substrate. Rodents are hypothesized to act as a reservoir of coccidioidomycosis, a mediator of soil properties, and a disseminator of fungal spores. Rodent distributions are poorly mapped for the study area. We build automated multi-linear regression models and decision tree models for ten rodent species using rodent trapping data from the Organ Pipe Cactus National Monument (ORPI) in southwest Arizona with a combination of surface temperature, a vegetation index and its texture, and a suite of topographic rasters. Surface temperature, derived from Landsat TM thermal images, is the most widely selected predictive variable in both automated methods. Construction-related soil disturbance (e.g. road construction, trenching, land stripping, and earthmoving) is a significant source of fugitive dust, which decreases air quality and may carry soil pathogens. Annual differencing of Landsat Thematic Mapper (TM) mid-infrared images is used to create change images, and thresholded change areas are associated with coordinates of local dust inspections. The output metric identifies source areas of soil disturbance, and it estimates the annual amount of dust-producing surface area for eastern Pima County spanning 1994 through 2009. 
Spatially explicit construction-related soil disturbance and rodent abundance data are compared with coccidioidomycosis incidence data using rank order correlation and regression methods. Construction-related soil disturbance correlates strongly with annual county-wide incidence. It also correlates with Tucson periphery incidence aggregated to zip codes. Abundance values for the desert pocket mouse (Chaetodipus penicillatus), derived from a soil-adjusted vegetation index, aspect (northing) and thermal radiance, correlate with total study period incidence aggregated to zip code.
Jacob, Benjamin G; Novak, Robert J; Toe, Laurent; Sanfo, Moussa S; Afriyie, Abena N; Ibrahim, Mohammed A; Griffith, Daniel A; Unnasch, Thomas R
2012-01-01
The standard methods for regression analyses of clustered riverine larval habitat data of Simulium damnosum s.l., a major black-fly vector of onchocerciasis, postulate models relating observational ecologically sampled parameter estimators to prolific habitats without accounting for residual intra-cluster error correlation effects. Generally, this correlation comes from two sources: (1) the design of the random effects and their assumed covariance from the multiple levels within the regression model; and (2) the correlation structure of the residuals. Unfortunately, inconspicuous errors in residual intra-cluster correlation estimates can overstate precision in forecasted S. damnosum s.l. riverine larval habitat explanatory attributes regardless of how they are treated (e.g., independent, autoregressive, Toeplitz, etc.). In this research, the geographical locations of multiple riverine-based S. damnosum s.l. larval ecosystem habitats sampled from 2 pre-established epidemiological sites in Togo were identified and recorded from July 2009 to June 2010. Initially, the data were aggregated in PROC GENMOD. An agglomerative hierarchical residual cluster-based analysis was then performed. The sampled clustered study site data were then analyzed for statistical correlations using Monthly Biting Rates (MBR). Euclidean distance measurements and terrain-related geomorphological statistics were then generated in ArcGIS. A digital overlay was then performed, also in ArcGIS, using the georeferenced ground coordinates of high- and low-density clusters stratified by Annual Biting Rates (ABR). These data were overlain onto multitemporal sub-meter pixel resolution satellite data (i.e., QuickBird 0.61 m wavebands). Orthogonal spatial filter eigenvectors were then generated in SAS/GIS.
Univariate and non-linear regression-based models (i.e., logistic, Poisson and negative binomial) were also employed to determine probability distributions and to identify statistically significant parameter estimators from the sampled data. Thereafter, Durbin-Watson test statistics were used in PROC AUTOREG to test the null hypothesis that the regression residuals were not autocorrelated against the alternative that the residuals followed an autoregressive process. Bayesian uncertainty matrices were also constructed in PROC MCMC, employing normal priors for each of the sampled estimators. The residuals revealed both spatially structured and unstructured error effects in the high- and low-ABR-stratified clusters. The analyses also revealed that the estimators level of turbidity and presence of rocks were statistically significant for the high-ABR-stratified clusters, while the estimators distance between habitats and floating vegetation were important for the low-ABR-stratified cluster. Varying- and constant-coefficient regression models, ABR-stratified GIS-generated clusters, sub-meter resolution satellite imagery, a robust residual intra-cluster diagnostic test, MBR-based histograms, eigendecomposition spatial filter algorithms and Bayesian matrices can enable accurate autoregressive estimation of latent uncertainty effects and other residual error probabilities (i.e., heteroskedasticity) for testing correlations between georeferenced S. damnosum s.l. riverine larval habitat estimators. The asymptotic distribution of the resulting residual-adjusted intra-cluster predictor error autocovariate coefficients can thereafter be established, while estimates of the asymptotic variance can lead to the construction of approximate confidence intervals for accurately targeting productive S. damnosum s.l. habitats based on spatiotemporal field-sampled count data.
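One ingredient of this workflow, a count regression followed by a Durbin-Watson check of residual autocorrelation, can be sketched in plain numpy. A Poisson GLM fitted by Newton-Raphson stands in for the SAS PROC GENMOD/AUTOREG steps; the covariates, coefficients, and sample size below are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
# hypothetical habitat covariates: intercept, turbidity, rock presence
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, n)])
beta_true = np.array([1.0, 0.5, 0.8])
y = rng.poisson(np.exp(X @ beta_true))          # simulated larval counts

# Poisson regression (log link) by Newton-Raphson / IRLS
beta = np.linalg.lstsq(X, np.log(y + 1.0), rcond=None)[0]  # rough start
for _ in range(25):
    mu = np.exp(X @ beta)
    z = X @ beta + (y - mu) / mu                # GLM working response
    beta = np.linalg.solve(X.T @ (mu[:, None] * X), X.T @ (mu * z))

resid = y - np.exp(X @ beta)
# Durbin-Watson statistic; values near 2 suggest no lag-1 autocorrelation
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
```

In the simulated data the observations are independent, so `dw` lands near 2; spatially clustered residuals of the kind the paper diagnoses would pull it away from 2.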
Predictive modelling of flow in a two-dimensional intermediate-scale, heterogeneous porous media
Barth, Gilbert R.; Hill, M.C.; Illangasekare, T.H.; Rajaram, H.
2000-01-01
To better understand the role of sedimentary structures in flow through porous media, and to determine how small-scale laboratory-measured values of hydraulic conductivity relate to in situ values, this work deterministically examines flow through simple, artificial structures constructed for a series of intermediate-scale (10 m long), two-dimensional, heterogeneous laboratory experiments. Nonlinear regression was used to determine optimal values of in situ hydraulic conductivity, which were compared to laboratory-measured values. Despite explicit numerical representation of the heterogeneity, the optimized values were generally greater than the laboratory-measured values. Discrepancies between measured and optimal values varied depending on the sand sieve size, but their contribution to error in the predicted flow was fairly consistent for all sands. Results indicate that, even under these controlled circumstances, laboratory-measured values of hydraulic conductivity need to be applied to models cautiously.
Cozzi-Lepri, Alessandro; Prosperi, Mattia C F; Kjær, Jesper; Dunn, David; Paredes, Roger; Sabin, Caroline A; Lundgren, Jens D; Phillips, Andrew N; Pillay, Deenan
2011-01-01
It remains largely unanswered whether a score for a specific antiretroviral (lopinavir/r in this analysis) that improves on the prediction of viral load response given by existing expert-based interpretation systems (IS) can be derived by statistically analyzing the correlation between genotypic data and virological response. We used data from patients in the UK Collaborative HIV Cohort (UK CHIC) Study whose genotypic data were stored in the UK HIV Drug Resistance Database (UK HDRD) to construct a training/validation dataset of treatment change episodes (TCEs). We used the average square error (ASE) on a 10-fold cross-validation and on a test dataset (the EuroSIDA TCE database) to compare the performance of a newly derived lopinavir/r score with that of the 3 most widely used expert-based interpretation rules (ANRS, HIVDB and Rega). Our analysis identified mutations V82A, I54V, K20I and I62V, which were associated with reduced viral response, and mutations I15V and V91S, which determined lopinavir/r hypersensitivity. All models performed equally well (ASE on the test set ranging between 1.1 and 1.3, p = 0.34). We fully explored the potential of linear regression to construct a simple predictive model for lopinavir/r-based TCEs. Although the performance of our proposed score was similar to that of the existing IS, previously unrecognized lopinavir/r-associated mutations were identified. The analysis illustrates an approach to validating expert-based IS that could be used in the future for other antiretrovirals and in settings outside HIV research.
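The 10-fold cross-validated average square error (ASE) used to compare scores can be sketched as follows. The mutation matrix, effect sizes, and response are simulated stand-ins for the UK HDRD genotype and viral load data, and plain OLS stands in for the paper's score derivation:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 400, 6   # TCEs x candidate mutations (hypothetical sizes)
X = rng.integers(0, 2, size=(n, p)).astype(float)
# hypothetical effects: negative = reduced response, positive = hypersensitivity
effects = np.array([-0.6, -0.4, -0.3, -0.3, 0.2, 0.3])
y = 1.5 + X @ effects + rng.normal(0, 0.4, n)   # simulated VL response

idx = rng.permutation(n)
folds = np.array_split(idx, 10)
sq_err = []
for k in range(10):
    test = folds[k]
    train = np.concatenate([folds[j] for j in range(10) if j != k])
    A = np.column_stack([np.ones(train.size), X[train]])
    beta = np.linalg.lstsq(A, y[train], rcond=None)[0]
    pred = np.column_stack([np.ones(test.size), X[test]]) @ beta
    sq_err.extend((y[test] - pred) ** 2)
ase = np.mean(sq_err)   # average squared error, as in the abstract
```

Comparing this `ase` across candidate scoring rules on the same folds is the model-selection criterion the study applies to its lopinavir/r score and the expert-based IS.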
NASA Astrophysics Data System (ADS)
Willis, Kyle V.; Srogi, LeeAnn; Lutz, Tim; Monson, Frederick C.; Pollock, Meagen
2017-12-01
Textures and compositions are critical information for interpreting rock formation. Existing methods to integrate both types of information favor high-resolution images of mineral compositions over small areas or low-resolution images of larger areas for phase identification. The method in this paper produces images of individual phases in which textural and compositional details are resolved over three orders of magnitude, from tens of micrometers to tens of millimeters. To construct these images, called Phase Composition Maps (PCMs), we make use of the resolution in backscattered electron (BSE) images and calibrate the gray scale values with mineral analyses by energy-dispersive X-ray spectrometry (EDS). The resulting images show the area of a standard thin section (roughly 40 mm × 20 mm) with spatial resolution as good as 3.5 μm/pixel, or more than 81 000 pixels/mm2, comparable to the resolution of X-ray element maps produced by wavelength-dispersive spectrometry (WDS). Procedures to create PCMs for mafic igneous rocks with multivariate linear regression models for minerals with solid solution (olivine, plagioclase feldspar, and pyroxenes) are presented and are applicable to other rock types. PCMs are processed using threshold functions based on the regression models to image specific composition ranges of minerals. PCMs are constructed using widely-available instrumentation: a scanning-electron microscope (SEM) with BSE and EDS X-ray detectors and standard image processing software such as ImageJ and Adobe Photoshop. Three brief applications illustrate the use of PCMs as petrologic tools: to reveal mineral composition patterns at multiple scales; to generate crystal size distributions for intracrystalline compositional zones and compare growth over time; and to image spatial distributions of minerals at different stages of magma crystallization by integrating textures and compositions with thermodynamic modeling.
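The core calibration step behind the Phase Composition Maps, regressing EDS-measured compositions on BSE gray values and thresholding a composition range, can be illustrated with a toy example. The gray values, forsterite contents, and image below are invented, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(7)
# hypothetical calibration points: BSE gray value vs olivine Fo (mol%);
# Fe-rich (low-Fo) olivine backscatters more, so gray value rises as Fo falls
gray = np.array([95.0, 105.0, 118.0, 126.0, 140.0])
fo = np.array([88.0, 82.0, 74.0, 69.0, 60.0])
slope, intercept = np.polyfit(gray, fo, 1)   # linear calibration model

# apply the calibration to a (simulated) BSE image and threshold a range
bse = rng.integers(90, 145, size=(64, 64)).astype(float)
fo_map = slope * bse + intercept             # per-pixel composition estimate
mask = (fo_map >= 70) & (fo_map <= 80)       # pixels in the Fo70-Fo80 range
```

The paper's multivariate regressions use several EDS-analyzed elements per phase rather than a single composition axis, but the gray-value-to-composition mapping and threshold-function logic are the same.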
Ha, Dongmun; Song, Inmyung; Lee, Eui-Kyung; Shin, Ju-Young
2018-05-03
Predicting pharmacy service fees is crucial to sustaining the health insurance budget and maintaining pharmacy management. However, there is no evidence on how to predict pharmacy service fees at the population level. This study compares the status of pharmacy services and constructs a regression model to project annual pharmacy service fees in Korea. We conducted a time-series analysis using sample data from the national health insurance database between 2006 and 2012. To reflect the latest trend, we categorized pharmacies into general hospital, special hospital, and clinic outpatient pharmacies based on the major source of service fees, using a 1% sample of the 2012 data. We estimated the daily number of prescriptions, pharmacy service fees, and drug costs according to these three types of pharmacy services. To forecast pharmacy service fees, a regression model was constructed to estimate annual fees in the following year (2013). The dependent variable was pharmacy service fees, and the independent variables were the number of prescriptions and service fees per pharmacy, the ratio of patients (≥ 65 years), the conversion factor, policy changes, and the type of pharmacy service. Among the 21,283 pharmacies identified, 5.0% (1064), 4.6% (974), and 77.5% (16,340) were general hospital, special hospital, and clinic outpatient pharmacies, respectively, in 2012. General hospital pharmacies showed a higher daily number of prescriptions (111.9), higher pharmacy service fees ($25,546,342), and higher annual drug costs ($215,728,000) per pharmacy than any other type of pharmacy (p < 0.05). The regression model found the ratio of patients aged 65 years and older and the conversion factor to be associated with an increase in pharmacy service fees. It also estimated the future rate of increase in pharmacy service fees to be between 3.1% and 7.8%. General hospital outpatient pharmacies spent more on annual pharmacy service fees than any other type of pharmacy.
The forecast of annual pharmacy service fees in Korea was similar to that of Australia, but not that of the United Kingdom.
An, Ya-chen; Chen, Yun-xia; Wang, Yu-xun; Zhao, Xiao-jing; Wang, Yan; Zhang, Jiang; Li, Chun-ling; Peng, Yan-bo; Gao, Su-ling; Chang, Li-sha; Zhang, Li; Xue, Xin-hong; Chen, Rui-ying; Wang, Da-li
2011-08-01
To investigate the risk factors for recurrence of ischemic stroke and to establish a Cox regression model of recurrence. We retrospectively reviewed consecutive patients with ischemic stroke admitted to the Neurology Department of the Hebei United University Affiliated Hospital between January 1, 2008 and December 31, 2009. Cases were followed from the onset of ischemic stroke; the follow-up program finished on June 30, 2010. Kaplan-Meier methods were used to describe the recurrence rate. Univariate and multivariate Cox proportional hazard regression models were used to analyze the risk factors associated with episodes of recurrence, and a recurrence model was then set up. During the follow-up period, 79 cases relapsed, with recurrence rates of 12.75% at one year and 18.87% at two years. Univariate and multivariate Cox proportional hazard regression showed that the independent risk factors associated with recurrence were age (X₁) (RR = 1.025, 95%CI: 1.003 - 1.048), history of hypertension (X₂) (RR = 1.976, 95%CI: 1.014 - 3.851), family history of stroke (X₃) (RR = 2.647, 95%CI: 1.175 - 5.961), total cholesterol amount (X₄) (RR = 1.485, 95%CI: 1.214 - 1.817), ESRS total score (X₅) (RR = 1.327, 95%CI: 1.057 - 1.666) and progression of the disease (X₆) (RR = 1.889, 95%CI: 1.123 - 3.178). The personal prognosis index (PI) of the recurrence model was: PI = 0.025X₁ + 0.681X₂ + 0.973X₃ + 0.395X₄ + 0.283X₅ + 0.636X₆. The smaller the personal prognosis index, the lower the recurrence risk; the larger the index, the higher the risk. Age, history of hypertension, total cholesterol amount, total ESRS score, and progression of the disease were the independent risk factors associated with recurrence of ischemic stroke. Both the recurrence model and the personal prognosis index equation were successfully constructed.
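The reported prognosis index translates directly into a small calculator. The coefficients below are exactly those of the PI equation in this abstract; the example patient values are hypothetical, and the variable coding (units for cholesterol, 0/1 coding of the binary factors) is an assumption:

```python
import math

# Cox regression coefficients as reported in the abstract;
# note exp(coef) reproduces the reported RRs, e.g. exp(0.025) ~ 1.025 for age
coef = {"age": 0.025, "hypertension": 0.681, "family_history": 0.973,
        "total_cholesterol": 0.395, "esrs_score": 0.283, "progression": 0.636}

def prognosis_index(age, hypertension, family_history,
                    total_cholesterol, esrs_score, progression):
    """Personal prognosis index PI; a larger PI means higher recurrence risk."""
    return (coef["age"] * age
            + coef["hypertension"] * hypertension
            + coef["family_history"] * family_history
            + coef["total_cholesterol"] * total_cholesterol
            + coef["esrs_score"] * esrs_score
            + coef["progression"] * progression)

# hypothetical 70-year-old with hypertension, cholesterol 6.0, ESRS score 3
pi = prognosis_index(70, 1, 0, 6.0, 3, 0)
```

Because PI is the Cox linear predictor, the ratio of hazards between two patients is `exp(pi_a - pi_b)`.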
An example of complex modelling in dentistry using Markov chain Monte Carlo (MCMC) simulation.
Helfenstein, Ulrich; Menghini, Giorgio; Steiner, Marcel; Murati, Francesca
2002-09-01
In the usual regression setting one regression line is computed for a whole data set. In a more complex situation, each person may be observed for example at several points in time and thus a regression line might be calculated for each person. Additional complexities, such as various forms of errors in covariables may make a straightforward statistical evaluation difficult or even impossible. During recent years methods have been developed allowing convenient analysis of problems where the data and the corresponding models show these and many other forms of complexity. The methodology makes use of a Bayesian approach and Markov chain Monte Carlo (MCMC) simulations. The methods allow the construction of increasingly elaborate models by building them up from local sub-models. The essential structure of the models can be represented visually by directed acyclic graphs (DAG). This attractive property allows communication and discussion of the essential structure and the substantial meaning of a complex model without needing algebra. After presentation of the statistical methods an example from dentistry is presented in order to demonstrate their application and use. The dataset of the example had a complex structure; each of a set of children was followed up over several years. The number of new fillings in permanent teeth had been recorded at several ages. The dependent variables were markedly different from the normal distribution and could not be transformed to normality. In addition, explanatory variables were assumed to be measured with different forms of error. Illustration of how the corresponding models can be estimated conveniently via MCMC simulation, in particular, 'Gibbs sampling', using the freely available software BUGS is presented. In addition, how the measurement error may influence the estimates of the corresponding coefficients is explored. 
It is demonstrated that the effect of the independent variable on the dependent variable may be markedly underestimated if the measurement error is not taken into account ('regression dilution bias'). Markov chain Monte Carlo methods may be of great value to dentists in allowing analysis of data sets which exhibit a wide range of different forms of complexity.
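The 'regression dilution bias' mentioned above is easy to demonstrate numerically: adding measurement error to the covariate attenuates the estimated slope by the factor var(x)/(var(x)+var(error)). All numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10000
x_true = rng.normal(0, 1, n)
y = 2.0 * x_true + rng.normal(0, 1, n)     # true slope = 2
x_obs = x_true + rng.normal(0, 1, n)       # covariate measured with error

slope_true = np.cov(x_true, y)[0, 1] / np.var(x_true, ddof=1)
slope_obs = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
# attenuation factor = var(x) / (var(x) + var(error)) = 1 / (1 + 1) = 0.5,
# so the naive slope estimate is pulled from 2 toward 1
```

The Bayesian measurement-error models estimated via MCMC in the paper recover the undiluted coefficient by modeling the error process explicitly rather than regressing on the noisy covariate.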
NASA Astrophysics Data System (ADS)
Buermeyer, Jonas; Gundlach, Matthias; Grund, Anna-Lisa; Grimm, Volker; Spizyn, Alexander; Breckow, Joachim
2016-09-01
This work is part of an analysis of the effects of constructional energy-saving measures on radon concentration levels in dwellings, performed on behalf of the German Federal Office for Radiation Protection. In parallel to radon measurements in five buildings, both meteorological data outside the buildings and indoor climate factors were recorded. In order to assess the effects of inhabited buildings, the amount of carbon dioxide (CO2) was measured. For a statistical linear regression model, the data of one object was chosen as an example. Three dummy variables were extracted from the course of the CO2 concentration to provide information on the usage and ventilation of the room. The analysis revealed a highly autoregressive model for the radon concentration, with additional influence from natural environmental factors. The autoregression implies a strong dependency on a radon source, since it reflects a backward dependency in time. At this point of the investigation, it cannot be determined whether the influence of outside factors affects the source of radon or the inhabitants' ventilation behavior, resulting in variation of the occurring concentration levels. In any case, the regression analysis might provide further information that would help to distinguish these effects. In the next step, the influence factors will be weighted according to their impact on the concentration levels. This might lead to a model that enables the prediction of radon concentration levels based on the measurement of CO2 in combination with environmental parameters, as well as the development of advice for ventilation.
[Exploring novel hyperspectral band and key index for leaf nitrogen accumulation in wheat].
Yao, Xia; Zhu, Yan; Feng, Wei; Tian, Yong-Chao; Cao, Wei-Xing
2009-08-01
The objectives of the present study were to explore new sensitive spectral bands and ratio spectral indices based on precise analysis of ground-based hyperspectral information, and then to develop a regression model for estimating leaf N accumulation per unit soil area (LNA) in winter wheat (Triticum aestivum L.). Three field experiments were conducted with different N rates and cultivar types in three consecutive growing seasons, and time-course measurements of canopy hyperspectral reflectance and LNA were taken under the various treatments. By adopting the method of reduced precise sampling, detailed ratio spectral indices (RSI) within the range of 350-2500 nm were constructed, and the quantitative relationships between LNA (g N m(-2)) and RSI(i, j) were analyzed. It was found that several key spectral bands and spectral indices were suitable for estimating LNA in wheat, and the spectral parameter RSI(990, 720) was the most reliable indicator of LNA in wheat. The regression model based on the best RSI was formulated as y = 5.095x - 6.040, with R2 of 0.814. In testing of the derived equations with independent experiment data, the model on RSI(990, 720) had R2 of 0.847 and RRMSE of 24.7%. Thus, it is concluded that the hyperspectral parameter RSI(990, 720) and the derived regression model can be reliably used for estimating LNA in winter wheat. These results provide feasible key bands and a technical basis for developing a portable instrument for monitoring wheat nitrogen status and for extracting useful spectral information from remote sensing images.
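The reported estimator is simple enough to state as code: compute the ratio spectral index RSI(990, 720) from canopy reflectance and apply the fitted line y = 5.095x - 6.040. The reflectance values in the example are hypothetical:

```python
def rsi(reflectance, wavelengths, band_i=990, band_j=720):
    """Ratio spectral index RSI(i, j) = R_i / R_j."""
    r = dict(zip(wavelengths, reflectance))
    return r[band_i] / r[band_j]

def lna_from_rsi(x):
    """LNA (g N m^-2) from RSI(990, 720), per the reported model."""
    return 5.095 * x - 6.040

# hypothetical canopy reflectance sampled at the two key bands
lna = lna_from_rsi(rsi([0.48, 0.30], [990, 720]))
```

With the reported test-set R2 of 0.847, such an estimate carries an RRMSE of about 24.7%, so single-spectrum predictions should be read with that uncertainty in mind.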
Cost estimators for construction of forest roads in the central Appalachians
Deborah, A. Layton; Chris O. LeDoux; Curt C. Hassler; Curt C. Hassler
1992-01-01
Regression equations were developed for estimating the total cost of road construction in the central Appalachian region. Estimators include methods for predicting total costs for roads constructed using hourly rental methods and roads built on a total-job bid basis. Results show that total-job bid roads cost up to five times as much as roads built when equipment...
Francisco, Fabiane Lacerda; Saviano, Alessandro Morais; Almeida, Túlia de Souza Botelho; Lourenço, Felipe Rebello
2016-05-01
Microbiological assays are widely used to estimate the relative potencies of antibiotics in order to guarantee the efficacy, safety, and quality of drug products. Despite the advantages of turbidimetric bioassays compared to other methods, they have limitations concerning the linearity and range of the dose-response curve determination. Here, we proposed using partial least squares (PLS) regression to overcome these limitations and to improve the prediction of relative potencies of antibiotics. Kinetic-reading microplate turbidimetric bioassays for apramycin and vancomycin were performed using Escherichia coli (ATCC 8739) and Bacillus subtilis (ATCC 6633), respectively. Microbial growth was measured as absorbance up to 180 and 300 min for the apramycin and vancomycin turbidimetric bioassays, respectively. Conventional dose-response curves (absorbance or area under the microbial growth curve vs. log of antibiotic concentration) showed significant regression; however, there were significant deviations from linearity. Thus, they could not be used for relative potency estimations. PLS regression allowed us to construct a predictive model for estimating the relative potencies of apramycin and vancomycin without over-fitting, and it improved the linear range of the turbidimetric bioassay. In addition, PLS regression provided predictions of relative potencies equivalent to those obtained from official agar diffusion methods. Therefore, we conclude that PLS regression may be used to estimate the relative potencies of antibiotics, with significant advantages over conventional dose-response curve determination. Copyright © 2016 Elsevier B.V. All rights reserved.
Distributed Lag Models: Examining Associations between the Built Environment and Health
Baek, Jonggyu; Sánchez, Brisa N.; Berrocal, Veronica J.; Sanchez-Vaznaugh, Emma V.
2016-01-01
Built environment factors constrain individual-level behaviors and choices, and thus are receiving increasing attention to assess their influence on health. Traditional regression methods have been widely used to examine associations between built environment measures and health outcomes, where a fixed, pre-specified spatial scale (e.g., 1 mile buffer) is used to construct environment measures. However, the spatial scale for these associations remains largely unknown, and misspecifying it introduces bias. We propose the use of distributed lag models (DLMs) to describe the association between built environment features and health as a function of distance from the locations of interest, circumventing a priori selection of a spatial scale. Based on simulation studies, we demonstrate that traditional regression models produce associations biased away from the null when there is spatial correlation among the built environment features. Inference based on DLMs is robust under a range of scenarios of the built environment. We use this innovative application of DLMs to examine the association between the availability of convenience stores near California public schools, which may affect children's dietary choices both through direct access to junk food and through exposure to advertisement, and children's body mass index z-scores (BMIz). PMID:26414942
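A distributed-lag regression of the kind described can be sketched by counting a built-environment feature in concentric distance rings around each location and constraining the lag (distance) coefficients to a smooth curve, here a cubic polynomial. This is an illustrative numpy sketch on assumed synthetic data, not the authors' model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, L = 200, 10                                   # locations, distance rings
C = rng.poisson(3.0, size=(n, L)).astype(float)  # feature counts per ring
true_lag = np.exp(-np.arange(L) / 3.0)           # effect decays with distance
y = C @ true_lag + rng.normal(0, 0.1, n)         # simulated health outcome

# Constrain the distance-lag coefficients to a cubic polynomial in ring index
d = np.arange(L)
B = np.vander(d, 4, increasing=True)             # L x 4 polynomial basis
Z = C @ B                                        # n x 4 reduced design matrix
theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
lag_hat = B @ theta                              # estimated distance-lag curve
```

With all L basis columns this would reduce to an unconstrained per-ring regression; the low-order polynomial constraint is what borrows strength across adjacent distances instead of fixing a single buffer size.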
A Novel Multiobjective Evolutionary Algorithm Based on Regression Analysis
Song, Zhiming; Wang, Maocai; Dai, Guangming; Vasile, Massimiliano
2015-01-01
As is known, the Pareto set of a continuous multiobjective optimization problem with m objective functions is a piecewise continuous (m − 1)-dimensional manifold in the decision space under some mild conditions. How to exploit this regularity in the design of multiobjective optimization algorithms has therefore become a research focus. In this paper, based on this regularity, a model-based multiobjective evolutionary algorithm with regression analysis (MMEA-RA) is put forward to solve continuous multiobjective optimization problems with variable linkages. In the algorithm, the optimization problem is modelled as a promising area in the decision space by a probability distribution, the centroid of which is an (m − 1)-dimensional piecewise continuous manifold. The least squares method is used to construct such a model. A selection strategy based on nondominated sorting is used to choose the individuals for the next generation. The new algorithm is tested and compared with NSGA-II and RM-MEDA. The results show that MMEA-RA outperforms RM-MEDA and NSGA-II on the test instances with variable linkages. At the same time, MMEA-RA has higher efficiency than the other two algorithms. A few shortcomings of MMEA-RA are also identified and discussed in this paper. PMID:25874246
Shaw, Souradet Y; Lorway, Robert; Bhattacharjee, Parinita; Reza-Paul, Sushena; du Plessis, Elsabé; McKinnon, Lyle; Thompson, Laura H; Isac, Shajy; Ramesh, Banadakoppa M; Washington, Reynold; Moses, Stephen; Blanchard, James F
2016-08-01
Men and transgender women who have sex with men (MTWSM) continue to be an at-risk population for human immunodeficiency virus (HIV) infection in India. Identification of risk factors and determinants of HIV infection is urgently needed to inform prevention and intervention programming. Data were collected from cross-sectional biological and behavioral surveys from four districts in Karnataka, India. Multivariable logistic regression models were constructed to examine factors related to HIV infection. Sociodemographic, sexual history, sex work history, condom practices, and substance use covariates were included in regression models. A total of 456 participants were included; HIV prevalence was 12.4%, with the highest prevalence (26%) among MTWSM from Bellary District. In bivariate analyses, district (P = 0.002), lack of a current regular female partner (P = 0.022), and reported consumption of an alcoholic drink in the last month (P = 0.004) were associated with HIV infection. In multivariable models, only alcohol use remained statistically significant (adjusted odds ratio: 2.6, 95% confidence interval: 1.2-5.8; P = 0.02). The prevalence of HIV continues to be high among MTWSM, with the highest prevalence found in Bellary district.
Marsillas, Sara; De Donder, Liesbeth; Kardol, Tinie; van Regenmortel, Sofie; Dury, Sarah; Brosens, Dorien; Smetcoren, An-Sofie; Braña, Teresa; Varela, Jesús
2017-09-01
Several debates have emerged across the literature about the conceptualisation of active ageing. The aim of this study is to develop a model of the construct that is focused on the individual, including different elements of people's lives that have the potential to be modified by intervention programs. Moreover, the paper examines the contributions of active ageing to life satisfaction, as well as the possible predictive role of coping styles on active ageing. For this purpose, a representative sample of 404 Galician (Spain) community-dwelling older adults (aged ≥60 years) were interviewed using a structured survey. The results demonstrate that the proposed model composed of two broad categories is valid. The model comprises status variables (related to physical, psychological, and social health) as well as different types of activities, called processual variables. This model is tested using partial least squares (PLS) regression. The findings show that active ageing is a fourth-order, formative construct. In addition, PLS analyses indicate that active ageing has a moderate and positive path on life satisfaction and that coping styles may predict active ageing. The discussion highlights the potential of active ageing as a relevant concept for people's lives, drawing out policy implications and suggestions for further research.
Assessing product image quality for online shopping
NASA Astrophysics Data System (ADS)
Goswami, Anjan; Chung, Sung H.; Chittar, Naren; Islam, Atiq
2012-01-01
Assessing product-image quality is important in the context of online shopping. A high-quality image that conveys more information about a product can boost the buyer's confidence and attract more attention. However, the notion of image quality for product-images is not the same as that in other domains. The perception of quality of product-images depends not only on various photographic quality features but also on various high-level features such as clarity of the foreground or goodness of the background. In this paper, we define a notion of product-image quality based on various such features. We conduct a crowd-sourced experiment to collect user judgments on thousands of eBay's images. We formulate a multi-class classification problem for modeling image quality by classifying images into good, fair, and poor quality based on the guided perceptual notions from the judges. We also conduct experiments with regression using average crowd-sourced human judgments as the target. We compute a pseudo-regression score as the expected average of predicted classes and also compute a score from the regression technique. We design many experiments with various sampling and voting schemes with crowd-sourced data and construct various experimental image quality models. Most of our models have reasonable accuracies (greater than or equal to 70%) on the test data set. We observe that our computed image quality score has a high (0.66) rank correlation with average votes from the crowd-sourced human judgments.
Hsieh, Chung-Ho; Lu, Ruey-Hwa; Lee, Nai-Hsin; Chiu, Wen-Ta; Hsu, Min-Huei; Li, Yu-Chuan Jack
2011-01-01
Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16-85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making. Copyright © 2011 Mosby, Inc. All rights reserved.
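The model comparison above rests on the AUC. As an illustrative sketch (not the authors' code), the AUC can be computed directly from its pairwise-ranking (Mann-Whitney) definition, the probability that a randomly chosen positive case is scored above a randomly chosen negative case:

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC = P(score_pos > score_neg), with ties counted half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]            # scores of true positives
    neg = scores[y_true == 0]            # scores of true negatives
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)
```

The pairwise form is O(n_pos × n_neg) but is perfectly adequate for test sets of the size reported here (45 patients).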
Are Stage of Change constructs relevant for subjective oral health in a vulnerable population?
Jamieson, L M; Parker, E J; Broughton, J; Lawrence, H P; Armfield, J M
2015-06-01
Stage of Change constructs may be proxy markers of psychosocial health which, in turn, are related to oral health. To determine if Stage of Change constructs were associated with subjective oral health in a population at heightened risk of dental disease. Stage of Change constructs were developed from a validated 18-item scale and categorised into 'Pre-contemplative', 'Contemplative' and 'Active'. A convenience sample of 446 Australian non-Aboriginal women pregnant by an Aboriginal male (age range 14-43 years) provided data to evaluate the outcome variables (self-rated oral health and oral health impairment), the Stage of Change constructs and socio-demographic, behavioural and access-related factors. Factors significant at the p < 0.05 level in bivariate analysis were entered into prevalence regression models. Approximately 54% of participants had fair/poor self-rated oral health and 34% had oral health impairment. Around 12% were 'Pre-contemplative', 46% 'Contemplative' and 42% 'Active'. Being either 'pre-contemplative' or 'contemplative' was associated with poor self-rated oral health after adjusting for socio-demographic factors. 'Pre-contemplative' ceased being significant after adjusting for dentate status and dental behaviour. 'Pre-contemplative' remained significant when adjusting for dental cost, but not 'Contemplative'. The Stages of Change constructs ceased being associated with self-rated oral health after adjusting for all confounders. Only 'Contemplative' (reference: 'Active') was a risk indicator in the null model for oral health impairment which persisted after adding dentate status, dental behaviour and dental cost variables, but not socio-demographics. When adjusting for all confounders, 'Contemplative' was not a risk indicator for oral health impairment. 
Both the 'Pre-contemplative' and 'Contemplative' Stage of Change constructs were associated with poor self-rated oral health and oral health impairment after adjusting for some, but not all, covariates. When considered as a proxy marker of psychosocial health, Stage of Change constructs may have some relevance for subjective oral health.
A Method for Calculating the Probability of Successfully Completing a Rocket Propulsion Ground Test
NASA Technical Reports Server (NTRS)
Messer, Bradley P.
2004-01-01
Propulsion ground test facilities face the daily challenges of scheduling multiple customers into limited facility space and successfully completing their propulsion test projects. Due to budgetary and schedule constraints, NASA and industry customers are pushing to test more components, for less money, in a shorter period of time. As these new rocket engine component test programs are undertaken, the lack of technology maturity in the test articles, combined with pushing the test facilities' capabilities to their limits, tends to lead to an increase in facility breakdowns and unsuccessful tests. Over the last five years Stennis Space Center's propulsion test facilities have performed hundreds of tests, collected thousands of seconds of test data, and broken numerous test facility and test article parts. While various initiatives have been implemented to provide better propulsion test techniques and improve the quality, reliability, and maintainability of goods and parts used in the propulsion test facilities, unexpected failures during testing still occur quite regularly due to the harsh environment in which the propulsion test facilities operate. Previous attempts at modeling the lifecycle of a propulsion component test project have met with little success. Each of the attempts suffered from incomplete or inconsistent data on which to base the models. By focusing on the actual test phase of the test project rather than the formulation, design, or construction phases, the quality and quantity of available data increases dramatically. A logistic regression model has been developed from the data collected over the last five years, allowing the probability of successfully completing a rocket propulsion component test to be calculated.
A logistic regression model is a mathematical modeling approach that can be used to describe the relationship of several independent predictor variables X(sub 1), X(sub 2),...,X(sub k) to a binary or dichotomous dependent variable Y, where Y can only be one of two possible outcomes, in this case Success or Failure. Logistic regression has primarily been used in the fields of epidemiology and biomedical research, but lends itself to many other applications. As indicated, the use of logistic regression is not new; however, modeling propulsion ground test facilities using logistic regression is both a new and unique application of the statistical technique. Results from the models provide project managers with insight into, and confidence in, the effectiveness of rocket engine component ground test projects. The initial success in modeling rocket propulsion ground test projects clears the way for more complex models to be developed in this area.
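A minimal sketch of fitting such a binary Success/Failure model by Newton's method (iteratively reweighted least squares) is shown below; the function name and the synthetic data are illustrative assumptions, not the report's actual facility data:

```python
import numpy as np

def fit_logistic(X, y, iters=25, ridge=1e-8):
    """Newton/IRLS fit of P(Y=1|x) = 1 / (1 + exp(-(b0 + x.b)))."""
    Xd = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))     # current success probabilities
        W = p * (1.0 - p)                        # IRLS weights
        H = (Xd.T * W) @ Xd + ridge * np.eye(Xd.shape[1])  # Hessian (+ jitter)
        beta = beta + np.linalg.solve(H, Xd.T @ (y - p))   # Newton step
    return beta
```

The fitted coefficients convert any predictor vector into a success probability via the logistic transform, which is exactly the quantity a project manager would consult before a test campaign.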
GWAS-based machine learning approach to predict duloxetine response in major depressive disorder.
Maciukiewicz, Malgorzata; Marshe, Victoria S; Hauschild, Anne-Christin; Foster, Jane A; Rotzinger, Susan; Kennedy, James L; Kennedy, Sidney H; Müller, Daniel J; Geraci, Joseph
2018-04-01
Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant drugs. However, large variability is observed in terms of response to antidepressants. Machine learning (ML) models may be useful to predict treatment outcomes. A sample of 186 MDD patients who received treatment with duloxetine for up to 8 weeks was categorized as "responders" based on a MADRS change >50% from baseline, or "remitters" based on a MADRS score ≤10 at end point. The initial dataset (N = 186) was randomly divided into training and test sets in a nested 5-fold cross-validation, where 80% was used as a training set and 20% made up five independent test sets. We performed genome-wide logistic regression to identify potentially significant variants related to duloxetine response/remission and extracted the most promising predictors using LASSO regression. Subsequently, classification-regression trees (CRT) and support vector machines (SVM) were applied to construct models, using ten-fold cross-validation. With regard to response, none of the pairs performed significantly better than chance (accuracy p > .1). For remission, SVM achieved moderate performance with an accuracy = 0.52, a sensitivity = 0.58, and a specificity = 0.46, and 0.51 for all coefficients for CRT. The best-performing SVM fold was characterized by an accuracy = 0.66 (p = .071), a sensitivity = 0.70, and a specificity = 0.61. In this study, the potential of using GWAS data to predict duloxetine outcomes was examined using ML models. The models were characterized by a promising sensitivity, but specificity remained moderate at best. The inclusion of additional non-genetic variables to create integrated models may improve prediction. Copyright © 2017. Published by Elsevier Ltd.
van Mil, Anke C C M; Greyling, Arno; Zock, Peter L; Geleijnse, Johanna M; Hopman, Maria T; Mensink, Ronald P; Reesink, Koen D; Green, Daniel J; Ghiadoni, Lorenzo; Thijssen, Dick H
2016-09-01
Brachial artery flow-mediated dilation (FMD) is a popular technique to examine endothelial function in humans. Identifying volunteer and methodological factors related to variation in FMD is important to improve measurement accuracy and applicability. Volunteer-related and methodology-related parameters were collected in 672 volunteers from eight affiliated centres worldwide who underwent repeated measures of FMD. All centres adopted contemporary expert-consensus guidelines for FMD assessment. After calculating the coefficient of variation (%) of the FMD for each individual, we constructed quartiles (n = 168 per quartile). Based on two regression models (volunteer-related factors and methodology-related factors), statistically significant components of these two models were added to a final regression model (calculated as β-coefficient and R). This allowed us to identify factors that independently contributed to the variation in FMD%. The median coefficient of variation was 17.5%, with healthy volunteers demonstrating a coefficient of variation of 9.3%. Regression models revealed age (β = 0.248, P < 0.001), hypertension (β = 0.104, P < 0.001), dyslipidemia (β = 0.331, P < 0.001), time between measurements (β = 0.318, P < 0.001), lab experience (β = -0.133, P < 0.001) and baseline FMD% (β = 0.082, P < 0.05) as contributors to the coefficient of variation. After including all significant factors in the final model, we found that time between measurements, hypertension, baseline FMD% and lab experience with FMD independently predicted brachial artery variability (total R = 0.202). Although FMD% showed good reproducibility, larger variation was observed in conditions with longer time between measurements, hypertension, less experience and lower baseline FMD%. Accounting for these factors may improve FMD% variability.
NASA Astrophysics Data System (ADS)
Nahib, Irmadi; Suryanta, Jaka; Niedyawati; Kardono, Priyadi; Turmudi; Lestari, Sri; Windiastuti, Rizka
2018-05-01
The Ministry of Agriculture has targeted production of 1.718 million tons of dry grain harvest during the period 2016-2021 to achieve food self-sufficiency, through optimization of special commodities including paddy, soybean and corn. This research was conducted to develop a sustainable paddy field zone delineation model using logistic regression and multicriteria land evaluation in Indramayu Regency. A model was built on the characteristics of local function conversion by considering the concept of sustainable development. A spatial data overlay was constructed using available data, and the model was then built upon the occurrence of paddy field conversion between 1998 and 2015. The equation for the model of paddy field changes obtained was: logit (paddy field conversion) = - 2.3048 + 0.0032*X1 – 0.0027*X2 + 0.0081*X3 + 0.0025*X4 + 0.0026*X5 + 0.0128*X6 – 0.0093*X7 + 0.0032*X8 + 0.0071*X9 – 0.0046*X10, where X1 to X10 were variables that determine the occurrence of changes in paddy fields, with a Relative Operating Characteristic (ROC) value of 0.8262. The weakest variable in influencing the change of paddy field function was X7 (paddy field price), while the most influential factor was X1 (distance from river). The result of the logistic regression was used as a weight for multicriteria land evaluation, which recommended three scenarios of paddy field protection policy: standard, protective, and permissive. As a result of this modelling, the priority paddy fields for the protected scenario were obtained, as well as buffer zones for the surrounding paddy fields.
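Given the reported coefficients, the fitted logit can be turned into a conversion probability with the standard logistic transform. The sketch below hard-codes the published coefficients in the order X1..X10; the input values passed to it are illustrative only, not data from the study:

```python
import numpy as np

# Reported coefficients of logit(paddy field conversion); X1..X10 follow the
# ordering in the published equation (X1 = distance from river, X7 = paddy
# field price, etc.).
INTERCEPT = -2.3048
COEFS = np.array([0.0032, -0.0027, 0.0081, 0.0025, 0.0026,
                  0.0128, -0.0093, 0.0032, 0.0071, -0.0046])

def conversion_probability(x):
    """Probability of paddy-field conversion from the fitted logit score."""
    z = INTERCEPT + COEFS @ np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))
```

At x = 0 for all predictors the intercept alone gives a baseline conversion probability of roughly 0.09; predictors with positive coefficients (e.g., X3) push the probability up, those with negative coefficients (e.g., X7) push it down.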
Survival Regression Modeling Strategies in CVD Prediction.
Barkhordari, Mahnaz; Padyab, Mojgan; Sardarinia, Mahsa; Hadaegh, Farzad; Azizi, Fereidoun; Bozorgmanesh, Mohammadreza
2016-04-01
A fundamental part of prevention is prediction. Potential predictors are the sine qua non of prediction models. However, whether incorporating novel predictors to prediction models could be directly translated to added predictive value remains an area of dispute. The difference between the predictive power of a predictive model with (enhanced model) and without (baseline model) a certain predictor is generally regarded as an indicator of the predictive value added by that predictor. Indices such as discrimination and calibration have long been used in this regard. Recently, the use of added predictive value has been suggested while comparing the predictive performances of the predictive models with and without novel biomarkers. User-friendly statistical software capable of implementing novel statistical procedures is conspicuously lacking. This shortcoming has restricted implementation of such novel model assessment methods. We aimed to construct Stata commands to help researchers obtain the aforementioned statistical indices. We have written Stata commands that are intended to help researchers obtain the following. 1, Nam-D'Agostino χ² goodness-of-fit test; 2, Cut point-free and cut point-based net reclassification improvement index (NRI), relative and absolute integrated discriminatory improvement index (IDI), and survival-based regression analyses. We applied the commands to real data on women participating in the Tehran lipid and glucose study (TLGS) to examine if information relating to a family history of premature cardiovascular disease (CVD), waist circumference, and fasting plasma glucose can improve predictive performance of Framingham's general CVD risk algorithm. The command is adpredsurv for survival models. Herein we have described the Stata package "adpredsurv" for calculation of the Nam-D'Agostino χ² goodness-of-fit test as well as cut point-free and cut point-based NRI, relative and absolute IDI, and survival-based regression analyses. 
We hope this work encourages the use of novel methods in examining predictive capacity of the emerging plethora of novel biomarkers.
Kashuba, Roxolana; Cha, YoonKyung; Alameddine, Ibrahim; Lee, Boknam; Cuffney, Thomas F.
2010-01-01
Multilevel hierarchical modeling methodology has been developed for use in ecological data analysis. The effect of urbanization on stream macroinvertebrate communities was measured across a gradient of basins in each of nine metropolitan regions across the conterminous United States. The hierarchical nature of this dataset was harnessed in a multi-tiered model structure, predicting both invertebrate response at the basin scale and differences in invertebrate response at the region scale. Ordination site scores, total taxa richness, Ephemeroptera, Plecoptera, Trichoptera (EPT) taxa richness, and richness-weighted mean tolerance of organisms at a site were used to describe invertebrate responses. Percentage of urban land cover was used as a basin-level predictor variable. Regional mean precipitation, air temperature, and antecedent agriculture were used as region-level predictor variables. Multilevel hierarchical models were fit to both levels of data simultaneously, borrowing statistical strength from the complete dataset to reduce uncertainty in regional coefficient estimates. Additionally, whereas non-hierarchical regressions were only able to show differing relations between invertebrate responses and urban intensity separately for each region, the multilevel hierarchical regressions were able to explain and quantify those differences within a single model. In this way, this modeling approach directly establishes the importance of antecedent agricultural conditions in masking the response of invertebrates to urbanization in metropolitan regions such as Milwaukee-Green Bay, Wisconsin; Denver, Colorado; and Dallas-Fort Worth, Texas. Also, these models show that regions with high precipitation, such as Atlanta, Georgia; Birmingham, Alabama; and Portland, Oregon, start out with better regional background conditions of invertebrates prior to urbanization but experience faster negative rates of change with urbanization. 
Ultimately, this urbanization-invertebrate response example is used to detail the multilevel hierarchical construction methodology, showing how the result is a set of models that are both statistically more rigorous and ecologically more interpretable than simple linear regression models.
Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-12-01
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
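As an illustrative sketch of the Bayesian route to error bars (not the authors' implementation), a Gaussian Process regressor yields a predictive standard deviation alongside each prediction; far from the training data the error bars widen toward the prior, signalling that a query lies outside the domain of applicability. A minimal 1-D numpy version with an RBF kernel:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel for 1-D inputs."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_regress(x_train, y_train, x_test, length_scale=1.0, noise=1e-4):
    """GP posterior mean and per-point predictive std (the error bars)."""
    K = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(x_train.size)
    Ks = rbf_kernel(x_test, x_train, length_scale)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = rbf_kernel(x_test, x_test, length_scale) - Ks @ np.linalg.solve(K, Ks.T)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std
```

Near the training points the predictive std is small; at a query far outside the data it approaches the prior standard deviation of 1, which is precisely the behaviour one wants from a domain-of-applicability estimate.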
Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-09-01
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
Learning Instance-Specific Predictive Models
Visweswaran, Shyam; Cooper, Gregory F.
2013-01-01
This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures, and its performance was compared to that of several commonly used predictive algorithms, including naïve Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average than all the comparison algorithms on all performance measures. PMID:25045325
Linear models: permutation methods
Cade, B.S.; Everitt, B.S.; Howell, D.C.
2005-01-01
Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well known that estimates of the mean in a linear model are extremely sensitive to even a single outlying value of the dependent variable compared to estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution of responses, providing a more complete statistical picture that has relevance to many biological questions [6]...
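A permutation test for a linear-model slope of the kind discussed can be sketched as follows; this is illustrative numpy code assuming a simple one-predictor model, not the chapter's own software:

```python
import numpy as np

def permutation_pvalue_slope(x, y, n_perm=2000, seed=0):
    """Permutation p-value for H0: no linear association (slope = 0).

    The response is permuted relative to the predictor, breaking any
    association while preserving both marginal distributions.
    """
    rng = np.random.default_rng(seed)
    xc = x - x.mean()
    def slope(yv):
        return xc @ (yv - yv.mean()) / (xc @ xc)
    observed = abs(slope(y))
    exceed = sum(abs(slope(rng.permutation(y))) >= observed
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)   # add-one so p is never exactly 0
```

Because the reference distribution is built from the data themselves, the test's validity does not rest on normality of the error term, which is the motivation given above.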
Li, Lin; Xu, Shuo; An, Xin; Zhang, Lu-Da
2011-10-01
In near infrared spectral quantitative analysis, the precision of measured samples' chemical values is the theoretical limit of that of quantitative analysis with mathematical models. However, the number of samples whose chemical values can be measured accurately is small. Many models exclude samples without chemical values, and consider only samples with chemical values when modeling sample compositions' contents. To address this problem, a semi-supervised LS-SVR (S2 LS-SVR) model is proposed on the basis of LS-SVR, which can utilize samples without chemical values as well as those with chemical values. As with the LS-SVR, training this model is equivalent to solving a linear system. Finally, samples of flue-cured tobacco were taken as experimental material, and corresponding quantitative analysis models were constructed for four sample compositions' contents (total sugar, reducing sugar, total nitrogen and nicotine) with PLS regression, LS-SVR and S2 LS-SVR. For the S2 LS-SVR model, the average relative errors between actual values and predicted ones for the four sample compositions' contents are 6.62%, 7.56%, 6.11% and 8.20%, respectively, and the correlation coefficients are 0.9741, 0.9733, 0.9230 and 0.9486, respectively. Experimental results show the S2 LS-SVR model outperforms the other two, which verifies the feasibility and efficiency of the S2 LS-SVR model.
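The two figures of merit reported above are generic and easy to reproduce; a small sketch (function names are illustrative) of the average relative error in percent and the Pearson correlation between measured and predicted contents:

```python
import numpy as np

def average_relative_error(y_true, y_pred):
    """Mean absolute relative error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs(y_pred - y_true) / np.abs(y_true))

def correlation(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    return np.corrcoef(np.asarray(y_true, float), np.asarray(y_pred, float))[0, 1]
```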
Boosted structured additive regression for Escherichia coli fed-batch fermentation modeling.
Melcher, Michael; Scharl, Theresa; Luchner, Markus; Striedner, Gerald; Leisch, Friedrich
2017-02-01
The quality of biopharmaceuticals and patients' safety are of the highest priority, and there are tremendous efforts to replace empirical production process designs by knowledge-based approaches. The main challenge in this context is that real-time access to process variables related to product quality and quantity is severely limited. To date, comprehensive on- and offline monitoring platforms are used to generate process data sets that allow for the development of mechanistic and/or data-driven models for real-time prediction of these important quantities. The ultimate goal is to implement model-based feedback control loops that facilitate online control of product quality. In this contribution, we explore structured additive regression (STAR) models in combination with boosting as a variable selection tool for modeling the cell dry mass, product concentration, and optical density on the basis of online available process variables and two-dimensional fluorescence spectroscopic data. STAR models are powerful extensions of linear models allowing for the inclusion of smooth effects or interactions between predictors. Boosting constructs the final model in a stepwise manner and provides a variable importance measure via predictor selection frequencies. Our results show that the cell dry mass can be modeled with a relative error of about ±3%, the optical density with ±6%, the soluble protein with ±16%, and the insoluble product with an accuracy of ±12%. Biotechnol. Bioeng. 2017;114: 321-334. © 2016 Wiley Periodicals, Inc.
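The idea that boosting builds the model stepwise and yields variable importance via selection frequencies can be illustrated with componentwise L2-boosting, the simplest member of the family the abstract draws on (the STAR smooth-effect base learners are replaced here by plain linear base learners, and the data are simulated):

```python
import numpy as np

def componentwise_l2_boost(X, y, steps=100, nu=0.1):
    # Componentwise L2-boosting: each iteration fits every single
    # predictor to the current residuals, updates only the best one
    # by a small step nu, and records which predictor was selected.
    # The selection counts serve as a variable-importance measure.
    n, p = X.shape
    intercept = y.mean()
    resid = y - intercept
    coef = np.zeros(p)
    counts = np.zeros(p, dtype=int)
    for _ in range(steps):
        best_j, best_sse, best_b = 0, np.inf, 0.0
        for j in range(p):
            xj = X[:, j]
            b = xj @ resid / (xj @ xj)          # univariate OLS fit
            sse = ((resid - b * xj) ** 2).sum()  # residual error if chosen
            if sse < best_sse:
                best_j, best_sse, best_b = j, sse, b
        coef[best_j] += nu * best_b
        resid -= nu * best_b * X[:, best_j]
        counts[best_j] += 1
    return intercept, coef, counts
```

On data where only the first predictor matters, the selection counts concentrate on that predictor, mirroring the selection-frequency importance measure described above.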
Hemmateenejad, Bahram; Yazdani, Mahdieh
2009-02-16
Steroids are widely distributed in nature and are found abundantly in plants, animals, and fungi. A data set consisting of a diverse set of steroids has been used to develop quantitative structure-electrochemistry relationship (QSER) models for their half-wave reduction potential. Modeling was established by means of multiple linear regression (MLR) and principal component regression (PCR) analyses. In MLR analysis, the QSPR models were constructed by first grouping descriptors and then stepwise selecting variables from each group (MLR1), or by stepwise selecting predictor variables from the pool of all calculated descriptors (MLR2). A similar procedure was used in PCR analysis, so that the principal components (or features) were extracted from the different groups of descriptors (PCR1) or from the entire set of descriptors (PCR2). The resulting models were evaluated by cross-validation, chance correlation, prediction of the reduction potential of test samples, and assessment of the applicability domain. Both MLR approaches gave accurate results; however, the QSPR model found by MLR1 was statistically more significant. The PCR1 approach produced a model as accurate as the MLR approaches, whereas less accurate results were obtained with PCR2. Overall, the correlation coefficients of cross-validation and prediction of the QSPR models resulting from the MLR1, MLR2 and PCR1 approaches were higher than 90%, which shows the high ability of the models to predict the reduction potential of the studied steroids.
NASA Astrophysics Data System (ADS)
Denli, H. H.; Koc, Z.
2015-12-01
Standards-based estimation of real property values is difficult to apply consistently across time and location. Regression analysis constructs mathematical models that describe or explain relationships that may exist between variables. The problem of identifying price differences between properties to obtain a price index can be converted into a regression problem, and standard techniques of regression analysis can be used to estimate the index. Applied to real estate valuation, with properties presented in the current market with their characteristics and quantifiers, the method helps to identify the factors or variables that are effective in the formation of value. In this study, prices of housing for sale in Zeytinburnu, a district in Istanbul, are associated with their characteristics to derive a price index, based on information received from a real estate web page. The variables used for the analysis are age, size in m2, number of floors in the building, floor on which the unit is located, and number of rooms. The price of the estate is the dependent variable, whereas the rest are independent variables. Prices from 60 real estates have been used for the analysis. Locations with equal prices were identified and plotted on the map, and equivalence curves were drawn to delineate zones of equal value.
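A hedonic regression of this kind, price regressed on listing characteristics, reduces to ordinary least squares. The sketch below uses a handful of invented listings (the study's 60 Zeytinburnu records are not public here), with the same five predictors the abstract names:

```python
import numpy as np

# Hypothetical listings: age (years), size (m2), floors in building,
# floor of the unit, number of rooms -- with asking price (thousands).
features = np.array([
    [5, 110, 8, 3, 3],
    [20, 85, 5, 2, 2],
    [1, 140, 12, 9, 4],
    [35, 70, 4, 1, 2],
    [10, 95, 6, 4, 3],
    [8, 120, 10, 7, 3],
    [25, 60, 5, 5, 1],
    [3, 130, 9, 2, 4],
], dtype=float)
prices = np.array([520, 300, 710, 220, 390, 560, 210, 640], dtype=float)

def hedonic_fit(X_raw, y):
    # Least-squares hedonic model:
    # price = b0 + b1*age + b2*size + b3*floors + b4*floor + b5*rooms
    X = np.column_stack([np.ones(len(y)), X_raw])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict(beta, x_raw):
    return beta[0] + beta[1:] @ np.asarray(x_raw, dtype=float)
```

The fitted coefficients quantify each characteristic's marginal contribution to price; evaluating the model over a map grid is what produces the equal-value (equivalence) curves mentioned above.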
Anodic microbial community diversity as a predictor of the power output of microbial fuel cells.
Stratford, James P; Beecroft, Nelli J; Slade, Robert C T; Grüning, André; Avignone-Rossa, Claudio
2014-03-01
The relationship between the diversity of mixed-species microbial consortia and their electrogenic potential in the anodes of microbial fuel cells was examined using different diversity measures as predictors. Identical microbial fuel cells were sampled at multiple time-points. Biofilm and suspension communities were analysed by denaturing gradient gel electrophoresis to calculate the number and relative abundance of species. Shannon and Simpson indices and richness were examined for association with power using bivariate and multiple linear regression, with biofilm DNA as an additional variable. In simple bivariate regressions, the correlation of Shannon diversity of the biofilm and power is stronger (r=0.65, p=0.001) than between power and richness (r=0.39, p=0.076), or between power and the Simpson index (r=0.5, p=0.018). Using Shannon diversity and biofilm DNA as predictors of power, a regression model can be constructed (r=0.73, p<0.001). Ecological parameters such as the Shannon index are predictive of the electrogenic potential of microbial communities. Copyright © 2014 Elsevier Ltd. All rights reserved.
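The diversity measures used as predictors here are short formulas. The following sketch (standard-library Python, with made-up abundance vectors rather than the DGGE band data of the study) computes the Shannon and Simpson indices and the bivariate Pearson correlation underlying the reported r values:

```python
import math

def shannon(abundances):
    # Shannon diversity H' = -sum(p_i * ln p_i) over species shares.
    total = sum(abundances)
    ps = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in ps)

def simpson(abundances):
    # Gini-Simpson index 1 - sum(p_i^2): probability two random
    # individuals belong to different species.
    total = sum(abundances)
    return 1.0 - sum((a / total) ** 2 for a in abundances)

def pearson_r(x, y):
    # Bivariate Pearson correlation coefficient.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)
```

For a perfectly even community of k species, H' equals ln k and the Simpson index equals 1 - 1/k, the respective maxima for that richness.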
NASA Astrophysics Data System (ADS)
Valizadeh, Maryam; Sohrabi, Mahmoud Reza
2018-03-01
In the present study, artificial neural networks (ANNs) and support vector regression (SVR) were applied as intelligent methods coupled with UV spectroscopy for the simultaneous quantitative determination of Dorzolamide (DOR) and Timolol (TIM) in eye drops. Several synthetic mixtures were analyzed to validate the proposed methods. At first, a neural network time series, one type of artificial neural network, was employed and its efficiency evaluated. Afterwards, a radial basis network was applied as another neural network; results showed that the performance of this method is suitable for prediction. Finally, support vector regression was proposed to construct the prediction model, and the root mean square error (RMSE) and mean recovery (%) were calculated for the SVR method. Moreover, the proposed methods were compared to high-performance liquid chromatography (HPLC) as a reference method. A one-way analysis of variance (ANOVA) test at the 95% confidence level applied to the comparison of the suggested and reference methods showed that there were no significant differences between them. The effect of interferences was also investigated in spiked solutions.
Nath, Dilip C.; Mwchahary, Dimacha Dwibrang
2013-01-01
A favorable climatic condition for transmission of malaria prevails in Kokrajhar district throughout the year. A sizeable part of the district is covered by forest due to which dissimilar dynamics of malaria transmission emerge in forest and non-forest areas. Observed malaria incidence rates of forest area, non-forest area and the whole district over the period 2001-2010 were considered for analyzing temporal correlation between malaria incidence and climatic variables. Associations between the two were examined by Pearson correlation analysis. Cross-correlation tests were performed between pre-whitened series of climatic variable and malaria series. Linear regressions were used to obtain linear relationships between climatic factors and malaria incidence, while weighted least squares regression was used to construct models for explaining and estimating malaria incidence rates. Annual concentration of malaria incidence was analyzed by Markham technique by obtaining seasonal index. Forest area and non-forest area have distinguishable malaria seasons. Relative humidity was positively correlated with forest malaria incidence, while temperature series were negatively correlated with non-forest malaria incidence. There was higher seasonality of concentration of malaria in the forest area than non-forest area. Significant correlation between annual changes in malaria cases in forest area and temperature was observed (coeff=0.689, p=0.040). Separate reliable models constructed for forecasting malaria incidence rates based on the combined influence of climatic variables on malaria incidence in different areas of the district were able to explain substantial percentage of observed variability in the incidence rates (R2adj=45.4%, 50.6%, 47.2%; p< .001 for all). There is an intricate association between climatic variables and malaria incidence of the district. Climatic variables influence malaria incidence in forest area and non-forest area in different ways.
Rainfall plays a primary role in characterizing malaria incidences in the district. Malaria parasites in the district had adapted to a relative humidity condition higher than the normal range for transmission in India. Instead of individual influence of the climatic variables, their combined influence was utilizable for construction of models. PMID:23283041
Nath, Dilip C; Mwchahary, Dimacha Dwibrang
2012-11-11
A favorable climatic condition for transmission of malaria prevails in Kokrajhar district throughout the year. A sizeable part of the district is covered by forest due to which dissimilar dynamics of malaria transmission emerge in forest and non-forest areas. Observed malaria incidence rates of forest area, non-forest area and the whole district over the period 2001-2010 were considered for analyzing temporal correlation between malaria incidence and climatic variables. Associations between the two were examined by Pearson correlation analysis. Cross-correlation tests were performed between pre-whitened series of climatic variable and malaria series. Linear regressions were used to obtain linear relationships between climatic factors and malaria incidence, while weighted least squares regression was used to construct models for explaining and estimating malaria incidence rates. Annual concentration of malaria incidence was analyzed by Markham technique by obtaining seasonal index. Forest area and non-forest area have distinguishable malaria seasons. Relative humidity was positively correlated with forest malaria incidence, while temperature series were negatively correlated with non-forest malaria incidence. There was higher seasonality of concentration of malaria in the forest area than non-forest area. Significant correlation between annual changes in malaria cases in forest area and temperature was observed (coeff=0.689, p=0.040). Separate reliable models constructed for forecasting malaria incidence rates based on the combined influence of climatic variables on malaria incidence in different areas of the district were able to explain substantial percentage of observed variability in the incidence rates (R2adj=45.4%, 50.6%, 47.2%; p< .001 for all). There is an intricate association between climatic variables and malaria incidence of the district. Climatic variables influence malaria incidence in forest area and non-forest area in different ways. 
Rainfall plays a primary role in characterizing malaria incidences in the district. Malaria parasites in the district had adapted to a relative humidity condition higher than the normal range for transmission in India. Instead of individual influence of the climatic variables, their combined influence was utilizable for construction of models.
Predictors of non- hookah smoking among high-school students based on prototype/willingness model.
Abedini, Sedigheh; MorowatiSharifabad, MohammadAli; Chaleshgar Kordasiabi, Mosharafeh; Ghanbarnejad, Amin
2014-01-01
The aim of the study was to determine predictors of refraining from hookah smoking among high-school students in Bandar Abbas, southern Iran, based on the Prototype/Willingness model. This cross-sectional study with an analytic approach was performed on 240 high-school students selected by cluster random sampling. Data on demographics and Prototype/Willingness model constructs were acquired via a self-administered questionnaire. Data were analyzed with mean, frequency, correlation, and linear and logistic regression statistical tests. Statistically significant determinants of the intention to refrain from hookah smoking were subjective norms, willingness, and attitude. The regression model indicated that the three items together explained 46.9% of the variance in non-smoking intention; attitude and subjective norms alone predicted 36.0% of it. There was a significant relationship between the participants' negative prototype of hookah smokers and the willingness to refrain from hookah smoking (P=0.002). Willingness also predicted non-smoking better than intention did (P<0.001). Designing interventions to strengthen negative prototypes of hookah smokers and to reduce the situations and conditions that facilitate hookah smoking, such as easy access to tobacco products in cafés and on beaches, can be useful for hookah smoking prevention among adolescents.
Tani, Yuji; Ogasawara, Katsuhiko
2012-01-01
This study aimed to contribute to the management of a healthcare organization by providing management information derived from time-series analysis of business data accumulated in the hospital information system, which had not been utilized thus far. We examined the performance of a prediction method based on the auto-regressive integrated moving-average (ARIMA) model, using business data obtained at the Radiology Department. We built the model from the number of radiological examinations over the past 9 years, predicted the number of radiological examinations for the following year, and then compared the actual values with the forecast values. We established that the prediction method was simple and cost-effective, using free software, and that a simple model could be built after pre-processing to remove trend components from the data. The difference between predicted and actual values was about 10%; however, understanding the chronological change was more important than the individual time-series values. Furthermore, our method was highly versatile and adaptable to general time-series data. Therefore, other healthcare organizations can use our method for the analysis and forecasting of their business data.
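The AR part of an ARIMA model is simply a regression of each value on its predecessors, which is easy to see in code. As a minimal illustration (not the authors' software; a full ARIMA fit would typically use a package such as statsmodels), here is a least-squares AR(1) fit and forecast on a toy series:

```python
def fit_ar1(series):
    # Least-squares fit of y_t = c + phi * y_{t-1} + e_t:
    # regress the series on its own one-step-lagged copy.
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    phi = (sum((a - mx) * (b - my) for a, b in zip(x, y))
           / sum((a - mx) ** 2 for a in x))
    c = my - phi * mx
    return c, phi

def forecast(series, c, phi, horizon):
    # Iterate the fitted recursion forward from the last observation.
    out = []
    last = series[-1]
    for _ in range(horizon):
        last = c + phi * last
        out.append(last)
    return out
```

The "I" in ARIMA corresponds to differencing the series first, which is the trend-removal pre-processing step the abstract mentions.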
Tesoriero, A.J.; Voss, F.D.
1997-01-01
The occurrence and distribution of elevated nitrate concentrations (≥ 3 mg/l) in ground water in the Puget Sound Basin, Washington, were determined by examining existing data from more than 3000 wells. Models that estimate the probability that a well has an elevated nitrate concentration were constructed by relating the occurrence of elevated nitrate concentrations to both natural and anthropogenic variables using logistic regression. The variables that best explain the occurrence of elevated nitrate concentrations were well depth, surficial geology, and the percentage of urban and agricultural land within a radius of 3.2 kilometers of the well. From these relations, logistic regression models were developed to assess aquifer susceptibility (the relative ease with which contaminants will reach the aquifer) and ground-water vulnerability (the relative ease with which contaminants will reach the aquifer under a given set of land-use practices). Both models performed well at predicting the probability of elevated nitrate concentrations in an independent data set. This approach to assessing aquifer susceptibility and ground-water vulnerability has the advantage that both model variables and coefficient values are determined on the basis of existing water-quality information, and it does not depend on the assignment of variables and weighting factors based on qualitative criteria.
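Logistic regression models of this kind are fitted by maximum likelihood, usually via Newton-Raphson (iteratively reweighted least squares). A self-contained sketch, with a single made-up predictor standing in for variables such as well depth or land-use percentage:

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    # Newton-Raphson maximum likelihood for logistic regression.
    # X must include an intercept column; a tiny ridge term keeps
    # the Hessian invertible.
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # current probabilities
        W = p * (1.0 - p)                        # IRLS weights
        H = X.T @ (X * W[:, None]) + 1e-8 * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(H, X.T @ (y - p))
    return beta

def prob(beta, x):
    # Predicted probability of an elevated concentration for one well.
    return 1.0 / (1.0 + np.exp(-(np.asarray(x) @ beta)))
```

Once fitted, the model returns, for any combination of predictor values, the probability that a well exceeds the threshold, which is exactly what the susceptibility and vulnerability maps are built from.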
Shan, Ming; Le, Yun; Yiu, Kenneth T W; Chan, Albert P C; Hu, Yi
2017-12-01
Over recent years, the issue of corruption in the public construction sector has attracted increasing attention from both practitioners and researchers worldwide. However, limited efforts are available for investigating the underlying factors of corruption in this sector. Thus, this study attempted to bridge this knowledge gap by exploring the underlying factors of corruption in the public construction sector of China. To achieve this goal, a total of 14 structured interviews were first carried out, and a questionnaire survey was then administered to 188 professionals in China. Two iterations of multivariate analysis approaches, namely, stepwise multiple regression analysis and partial least squares structural equation modeling were successively utilized to analyze the collected data. In addition, a case study was also conducted to triangulate the findings obtained from the statistical analysis. The results generated from these three research methods achieve the same conclusion: the most influential underlying factor leading to corruption was immorality, followed by opacity, unfairness, procedural violation, and contractual violation. This study has contributed to the body of knowledge by exploring the properties of corruption in the public construction sector. The findings from this study are also valuable to the construction authorities as they can assist in developing more effective anti-corruption strategies.
IPMP Global Fit - A one-step direct data analysis tool for predictive microbiology.
Huang, Lihan
2017-12-04
The objective of this work is to develop and validate a unified optimization algorithm for performing one-step global regression analysis of isothermal growth and survival curves for the determination of kinetic parameters in predictive microbiology. The algorithm is incorporated with user-friendly graphical user interfaces (GUIs) into a data analysis tool, the USDA IPMP-Global Fit. The GUIs are designed to guide users through the data analysis process and the proper selection of initial parameters for different combinations of mathematical models. The software performs one-step kinetic analysis to directly construct tertiary models by minimizing the global error between the experimental observations and the mathematical models. The current version is specifically designed for constructing tertiary models with time and temperature as the independent variables. The software was tested with a total of 9 different combinations of primary and secondary models for growth and survival of various microorganisms. The results of data analysis show that this software provides accurate estimates of kinetic parameters. In addition, it can be used to improve the experimental design and data collection for more accurate estimation of kinetic parameters. IPMP-Global Fit can be used in combination with the regular USDA-IPMP for solving inverse problems and developing tertiary models in predictive microbiology. Published by Elsevier B.V.
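The essence of one-step global regression is that the primary model (curve shape over time) and secondary model (parameter dependence on temperature) are fitted jointly to all curves at once. A toy version, with a log-linear survival primary model and an exponential secondary model chosen for illustration (these are not necessarily the model combinations shipped in IPMP-Global Fit):

```python
import numpy as np

def global_fit(t, T, logN, T_ref=60.0, b_grid=np.linspace(0.0, 0.5, 201)):
    # One-step ("global") regression.  Primary model:
    #   log10 N(t; T) = log10 N0 - k(T) * t
    # Secondary model:
    #   k(T) = k_ref * exp(b * (T - T_ref))
    # For a fixed b the combined model is linear in (log10 N0, k_ref),
    # so those are solved by least squares while b is profiled on a
    # grid; the global SSE over ALL curves is minimized jointly.
    best = None
    for b in b_grid:
        A = np.column_stack([np.ones_like(t), -np.exp(b * (T - T_ref)) * t])
        theta, *_ = np.linalg.lstsq(A, logN, rcond=None)
        sse = ((A @ theta - logN) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, theta[0], theta[1], b)
    return best  # (sse, log10 N0, k_ref, b)
```

Because every observation from every temperature enters one objective, the secondary-model parameters are estimated directly from the raw data rather than from intermediate per-curve fits, which is the advantage of the one-step approach.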
Palazón-Bru, Antonio; Rizo-Baeza, María M; Martínez-Segura, Asier; Folgado-de la Rosa, David M; Gil-Guillén, Vicente F; Cortés-Castell, Ernesto
2018-03-01
Although 2 screening tests exist for identifying a high risk of muscle dysmorphia (MD) symptoms, they both require a long time to apply. Accordingly, we proposed the construction, validation, and implementation of such a test in a mobile application using easy-to-measure factors associated with MD. Cross-sectional observational study. Gyms in Alicante (Spain) during 2013 to 2014. One hundred forty-one men who engaged in weight training. The variables are as follows: age, educational level, income, buys own food, physical activity per week, daily meals, importance of nutrition, special nutrition, guilt about dietary nonadherence, supplements, and body mass index (BMI). A points system was constructed through a binary logistic regression model to predict a high risk of MD symptoms by testing all possible combinations of the secondary variables (5035). The system was validated using bootstrapping and implemented in a mobile application. The outcome was a high risk of having MD symptoms (Muscle Appearance Satisfaction Scale). Of the 141 participants, 45 had a high risk of MD symptoms [31.9%; 95% confidence interval (CI), 24.2%-39.6%]. The logistic regression model providing the largest area under the receiver operating characteristic curve (0.76) included the following: age [odds ratio (OR) = 0.90; 95% CI, 0.84-0.97; P = 0.007], guilt about dietary nonadherence (OR = 2.46; 95% CI, 1.06-5.73; P = 0.037), energy supplements (OR = 3.60; 95% CI, 1.54-8.44; P = 0.003), and BMI (OR = 1.33; 95% CI, 1.12-1.57; P < 0.001). The points system was validated through 1000 bootstrap samples. A quick, easy-to-use, 4-factor test that could serve as a screening tool for a high risk of MD symptoms has been constructed, validated, and implemented in a mobile application.
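A points system is derived from a logistic model by scaling each log-odds-ratio against a reference coefficient and rounding to integers (the Framingham-style approach). The sketch below uses the odds ratios reported in the abstract; the per-decade rescaling of age and the choice of BMI as the reference are my own illustrative choices, not necessarily those of the authors:

```python
import math

# Odds ratios reported in the abstract; age and BMI are per-unit
# effects, so age is rescaled here to a per-decade effect for scoring.
odds_ratios = {"guilt": 2.46, "supplements": 3.60, "bmi": 1.33}
betas = {k: math.log(v) for k, v in odds_ratios.items()}
betas["age_per_decade"] = 10 * math.log(0.90)

def to_points(betas, base="bmi"):
    # Framingham-style scoring: divide every log-odds-ratio by the
    # reference coefficient and round to the nearest integer.
    ref = abs(betas[base])
    return {k: round(b / ref) for k, b in betas.items()}

def total_points(points, present):
    # Sum the points for the factor levels present in one respondent.
    return sum(points[k] * v for k, v in present.items())
```

Integer points trade a little precision for a score that can be summed mentally or in a simple mobile app, with a cutoff on the total flagging high-risk respondents.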
Alternative approaches to predicting methane emissions from dairy cows.
Mills, J A N; Kebreab, E; Yates, C M; Crompton, L A; Cammell, S B; Dhanoa, M S; Agnew, R E; France, J
2003-12-01
Previous attempts to apply statistical models, which correlate nutrient intake with methane production, have been of limited value where predictions are obtained for nutrient intakes and diet types outside those used in model construction. Dynamic mechanistic models have proved more suitable for extrapolation, but they remain computationally expensive and are not applied easily in practical situations. The first objective of this research focused on employing conventional techniques to generate statistical models of methane production appropriate to United Kingdom dairy systems. The second objective was to evaluate these models and a model published previously using both United Kingdom and North American data sets. Thirdly, nonlinear models were considered as alternatives to the conventional linear regressions. The United Kingdom calorimetry data used to construct the linear models also were used to develop the three nonlinear alternatives that were all of modified Mitscherlich (monomolecular) form. Of the linear models tested, an equation from the literature proved most reliable across the full range of evaluation data (root mean square prediction error = 21.3%). However, the Mitscherlich models demonstrated the greatest degree of adaptability across diet types and intake level. The most successful model for simulating the independent data was a modified Mitscherlich equation with the steepness parameter set to represent dietary starch-to-ADF ratio (root mean square prediction error = 20.6%). However, when such data were unavailable, simpler Mitscherlich forms relating dry matter or metabolizable energy intake to methane production remained better alternatives relative to their linear counterparts.
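Fitting a Mitscherlich (monomolecular) curve of the kind compared here can be sketched compactly: for a fixed steepness parameter the model is linear in the asymptote, so the asymptote has a closed-form solution and the steepness can be profiled on a grid. The data below are synthetic; the paper's actual models tie the steepness to diet composition (e.g., the starch-to-ADF ratio), which is not reproduced:

```python
import numpy as np

def fit_monomolecular(x, y, c_grid=np.linspace(0.01, 1.0, 100)):
    # Monomolecular (Mitscherlich) form: y = a * (1 - exp(-c * x)),
    # e.g., methane output y rising toward an asymptote a with
    # intake x.  For fixed steepness c the model is linear in a,
    # so a is solved in closed form while c is profiled on a grid.
    best = None
    for c in c_grid:
        f = 1.0 - np.exp(-c * x)
        a = (f @ y) / (f @ f)          # least-squares asymptote
        sse = ((a * f - y) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, a, c)
    return best  # (sse, asymptote a, steepness c)
```

The diminishing-returns shape is what lets this form extrapolate across intake levels better than a straight line, consistent with the evaluation reported above.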
Okubo, Hidenori; Ohori, Makoto; Ohno, Yoshio; Nakashima, Jun; Inoue, Rie; Nagao, Toshitaka; Tachibana, Masaaki
2014-05-01
To develop a nomogram based on postoperative factors and prostate-specific antigen levels to predict the non-biochemical recurrence rate after radical prostatectomy in a Japanese cohort. A total of 606 Japanese patients with T1-3N0M0 prostate cancer who underwent radical prostatectomy and pelvic lymph node dissection at Tokyo Medical University hospital from 2000 to 2010 were studied. A nomogram was constructed based on Cox hazard regression analysis evaluating the prognostic significance of serum prostate-specific antigen and pathological factors in the radical prostatectomy specimens. The discriminating ability of the nomogram was assessed by the concordance index (C-index), and the predicted and actual outcomes were compared with a bootstrapped calibration plot. With a mean follow-up of 60.0 months, a total of 187 patients (30.9%) experienced biochemical recurrence, with a 5-year non-biochemical recurrence rate of 72.3%. Based on a Cox hazard regression model, a nomogram was constructed to predict non-biochemical recurrence using the serum prostate-specific antigen level and pathological features in radical prostatectomy specimens. The concordance index was 0.77, and the calibration plots appeared to be accurate. The postoperative nomogram described here can provide valuable information regarding the need for adjuvant/salvage radiation or hormonal therapy in patients after radical prostatectomy.
Influence of fatigue on construction workers' physical and cognitive function.
Zhang, M; Murphy, L A; Fang, D; Caban-Martinez, A J
2015-04-01
Despite scientific evidence linking workers' fatigue to occupational safety (due to impaired physical or cognitive function), little is known about this relationship in construction workers. To assess the association between construction workers' reported fatigue and their perceived difficulties with physical and cognitive functions. Using data from a convenience sample of US construction workers participating in the 2010-11 National Health Interview Survey, two multivariate weighted logistic regression models were built to predict difficulty with physical and with cognitive functions associated with workers' reported fatigue, while controlling for age, smoking status, alcohol consumption status, sleep hygiene, psychological distress and arthritis status. Of 606 construction workers surveyed, 49% reported being 'tired some days' in the past 3 months and 10% reported being 'tired most days or every day'. Compared with those feeling 'never tired', workers who felt 'tired some days' were significantly more likely to report difficulty with physical function (adjusted odds ratio [AOR] = 2.03; 95% confidence interval [CI] 1.17-3.51) and cognitive function (AOR = 2.27; 95% CI 1.06-4.88) after controlling for potential confounders. Our results suggest an association between reported fatigue and experiencing difficulties with physical and cognitive functions in construction workers. © The Author 2015. Published by Oxford University Press on behalf of the Society of Occupational Medicine. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Statistical approach to the analysis of olive long-term pollen season trends in southern Spain.
García-Mozo, H; Yaezel, L; Oteros, J; Galán, C
2014-03-01
Analysis of long-term airborne pollen counts makes it possible not only to chart pollen-season trends but also to track changing patterns in flowering phenology. Changes in higher-plant response over a long interval are considered among the most valuable bioindicators of climate change impact. Phenological-trend models can also provide information regarding crop production and pollen-allergen emission. The value of this information makes the choice of statistical analysis essential in time-series studies. We analysed trends and variations in the olive flowering season over a 30-year period (1982-2011) in southern Europe (Córdoba, Spain), focussing on: annual Pollen Index (PI), Pollen Season Start (PSS), Peak Date (PD), Pollen Season End (PSE) and Pollen Season Duration (PSD). Apart from the traditional linear regression analysis, a Seasonal-Trend Decomposition procedure based on Loess (STL) and an ARIMA model were performed. Linear regression results indicated a trend toward delayed PSE and earlier PSS and PD, probably influenced by the rise in temperature. These changes are producing longer flowering periods in the study area. The use of the STL technique provided a clearer picture of phenological behaviour. Decomposition of the pollination dynamics enabled the trend toward an alternate bearing cycle to be distinguished from the influence of other stochastic fluctuations. Results pointed to a rising trend in pollen production. With a view toward forecasting future phenological trends, ARIMA models were constructed to predict PSD, PSS and PI until 2016. Projections displayed a better goodness of fit than those derived from linear regression. Findings suggest that the olive reproductive cycle has changed considerably over the last 30 years due to climate change.
Further conclusions are that STL improves the effectiveness of traditional linear regression in trend analysis, and ARIMA models can provide reliable trend projections for future years taking into account the internal fluctuations in time series. Copyright © 2013 Elsevier B.V. All rights reserved.
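What seasonal-trend decomposition buys over a single regression line can be shown with a stripped-down additive decomposition (a rough stand-in for STL, which uses loess smoothing instead of the moving averages below; the series here is synthetic):

```python
def centered_ma(x, period):
    # Centred moving average for the trend; for an even period the
    # two end points of the window get half weight (the 2xM average).
    half = period // 2
    trend = []
    for i in range(half, len(x) - half):
        if period % 2 == 0:
            window = x[i - half:i + half + 1]
            weights = [0.5] + [1.0] * (period - 1) + [0.5]
            trend.append(sum(w * v for w, v in zip(weights, window)) / period)
        else:
            trend.append(sum(x[i - half:i + half + 1]) / period)
    return trend

def decompose(x, period):
    # Additive decomposition: trend by centred moving average,
    # seasonal component as per-phase means of the detrended series.
    half = period // 2
    trend = centered_ma(x, period)
    detrended = [x[i + half] - tv for i, tv in enumerate(trend)]
    seasonal = []
    for phase in range(period):
        vals = [d for i, d in enumerate(detrended)
                if (i + half) % period == phase]
        seasonal.append(sum(vals) / len(vals))
    mean_s = sum(seasonal) / period
    seasonal = [s - mean_s for s in seasonal]
    return trend, seasonal
```

Separating the smooth trend from the repeating cycle is what lets a long-term drift (e.g., rising pollen production) be distinguished from periodic fluctuations such as an alternate bearing cycle.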
Meloun, Milan; Hill, Martin; Vceláková-Havlíková, Helena
2009-01-01
Pregnenolone sulfate (PregS) is known as a steroid conjugate that positively modulates N-methyl-D-aspartate receptors on neuronal membranes. These receptors are responsible for the permeability of calcium channels and the activation of neuronal function. A neuroactivating effect of PregS is also exerted via non-competitive negative modulation of GABA(A) receptors regulating the chloride influx. Recently, penetrability of the blood-brain barrier for PregS was found in rat, but some experiments in agreement with this finding were reported even earlier. It is known that circulating levels of PregS in humans are relatively high, depending primarily on age and adrenal activity. Concerning the neuromodulating effect of PregS, we recently evaluated age relationships of PregS in both sexes using polynomial regression models, which are known to bring about problems of multicollinearity, i.e., strong correlations among independent variables. Several criteria for the selection of a suitable bias are demonstrated. Biased estimators based on the generalized principal component regression (GPCR) method, which avoids multicollinearity problems, are described. Significant differences were found between men and women in the course of the age dependence of PregS. In women, a significant maximum was found around the 30th year followed by a rapid decline, while the maximum in men was reached almost 10 years earlier and changes were minor up to the 60th year. The investigation of gender differences and age dependencies in PregS could be of interest given its well-known neurostimulating effect, relatively high serum concentration, and the probable partial permeability of the blood-brain barrier for the steroid conjugate. GPCR in combination with the MEP (mean quadratic error of prediction) criterion is extremely useful and appealing for constructing biased models.
It can also be used for achieving such estimates with regard to keeping the model course corresponding to the data trend, especially in polynomial type regression models.
Akimoto, Yuki; Yugi, Katsuyuki; Uda, Shinsuke; Kudo, Takamasa; Komori, Yasunori; Kubota, Hiroyuki; Kuroda, Shinya
2013-01-01
Cells use common signaling molecules for the selective control of downstream gene expression and cell-fate decisions. The relationship between signaling molecules and downstream gene expression and cellular phenotypes is a multiple-input and multiple-output (MIMO) system and is difficult to understand due to its complexity. For example, it has been reported that, in PC12 cells, different types of growth factors activate MAP kinases (MAPKs) including ERK, JNK, and p38, and CREB, for selective protein expression of immediate early genes (IEGs) such as c-FOS, c-JUN, EGR1, JUNB, and FOSB, leading to cell differentiation, proliferation and cell death; however, how multiple-inputs such as MAPKs and CREB regulate multiple-outputs such as expression of the IEGs and cellular phenotypes remains unclear. To address this issue, we employed a statistical method called partial least squares (PLS) regression, which involves a reduction of the dimensionality of the inputs and outputs into latent variables and a linear regression between these latent variables. We measured 1,200 data points for MAPKs and CREB as the inputs and 1,900 data points for IEGs and cellular phenotypes as the outputs, and we constructed the PLS model from these data. The PLS model highlighted the complexity of the MIMO system and growth factor-specific input-output relationships of cell-fate decisions in PC12 cells. Furthermore, to reduce the complexity, we applied a backward elimination method to the PLS regression, in which 60 input variables were reduced to 5 variables, including the phosphorylation of ERK at 10 min, CREB at 5 min and 60 min, AKT at 5 min and JNK at 30 min. The simple PLS model with only 5 input variables demonstrated a predictive ability comparable to that of the full PLS model. The 5 input variables effectively extracted the growth factor-specific simple relationships within the MIMO system in cell-fate decisions in PC12 cells.
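The core of PLS regression, compressing correlated inputs into a few latent scores with maximal covariance with the output, fits in a short NIPALS-style routine. This is a generic single-output (PLS1) sketch on simulated data, not the multi-output model of the study:

```python
import numpy as np

def pls1_fit(X, y, n_components=2):
    # NIPALS-style PLS1: extract latent scores t = X w with maximal
    # covariance with y, regress y on t, deflate X and y, repeat.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean
    yk = y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                    # covariance direction
        w = w / np.linalg.norm(w)
        t = Xk @ w                       # latent scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        c = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)         # deflation
        yk = yk - c * t
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    B = W @ np.linalg.solve(P.T @ W, q)  # coefficients in X-space
    return B, x_mean, y_mean

def pls1_predict(model, X_new):
    B, x_mean, y_mean = model
    return (X_new - x_mean) @ B + y_mean
```

With as many components as predictors PLS coincides with ordinary least squares; using fewer components is what performs the dimensionality reduction, and dropping predictors whose weights are negligible parallels the backward elimination step described above.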
Analyzing thresholds and efficiency with hierarchical Bayesian logistic regression.
Houpt, Joseph W; Bittner, Jennifer L
2018-07-01
Ideal observer analysis is a fundamental tool used widely in vision science for analyzing the efficiency with which a cognitive or perceptual system uses available information. The performance of an ideal observer provides a formal measure of the amount of information in a given experiment. The ratio of human to ideal performance is then used to compute efficiency, a construct that can be directly compared across experimental conditions while controlling for the differences due to the stimuli and/or task specific demands. In previous research using ideal observer analysis, the effects of varying experimental conditions on efficiency have been tested using ANOVAs and pairwise comparisons. In this work, we present a model that combines Bayesian estimates of psychometric functions with hierarchical logistic regression for inference about both unadjusted human performance metrics and efficiencies. Our approach improves upon the existing methods by constraining the statistical analysis using a standard model connecting stimulus intensity to human observer accuracy and by accounting for variability in the estimates of human and ideal observer performance scores. This allows for both individual and group level inferences. Copyright © 2018 Elsevier Ltd. All rights reserved.
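The non-hierarchical core of the approach, fitting a psychometric function and forming an efficiency ratio against an ideal observer, can be sketched as below. This is a least-squares toy example on invented data, not the paper's hierarchical Bayesian model; the ideal-observer threshold is an assumed constant.

```python
# Fit a 2AFC-style logistic psychometric function to accuracy data, then
# compute efficiency as the ratio of an (assumed) ideal threshold to the
# fitted human threshold. Data points are hypothetical.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(x, alpha, beta):
    # accuracy rising from 0.5 (chance) to 1.0; alpha is the 75% threshold
    return 0.5 + 0.5 / (1.0 + np.exp(-(x - alpha) / beta))

intensity = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
accuracy = np.array([0.52, 0.60, 0.74, 0.86, 0.94, 0.98])  # hypothetical
(alpha_h, beta_h), _ = curve_fit(psychometric, intensity, accuracy,
                                 p0=[1.5, 0.5])

ideal_threshold = 0.8                    # assumed ideal-observer threshold
efficiency = ideal_threshold / alpha_h   # lower human threshold -> higher efficiency
```

The hierarchical version in the paper additionally shares information across observers and propagates threshold uncertainty into the efficiency estimate.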
Risk factors for antepartum fetal death.
Oron, T; Sheiner, E; Shoham-Vardi, I; Mazor, M; Katz, M; Hallak, M
2001-09-01
To determine the demographic, maternal, pregnancy-related and fetal risk factors for antepartum fetal death (APFD). From our perinatal database between the years 1990 and 1997, 68,870 singleton birth files were analyzed. Fetuses weighing < 1,000 g at birth and those with structural malformations and/or known chromosomal anomalies were excluded from the study. In order to determine independent factors contributing to APFD, a multiple logistic regression model was constructed. During the study period there were 246 cases of APFD (3.6 per 1,000 births). The following obstetric factors significantly correlated with APFD in a multiple logistic regression model: preterm deliveries, small size for gestational age (SGA), multiparity (> 5 deliveries), oligohydramnios, placental abruption, umbilical cord complications (cord around the neck and true knot of cord), pathologic presentations (nonvertex) and meconium-stained amniotic fluid. APFD was not significantly associated with advanced maternal age. APFD was significantly associated with several risk factors. Placental and umbilical cord pathologies might be the direct cause of death. Grand multiparity, oligohydramnios, meconium-stained amniotic fluid, pathologic presentations and suspected SGA should be carefully evaluated during pregnancy in order to decrease the incidence of APFD.
Sharma, Praveen; Singh, Lakhvinder; Dilbaghi, Neeraj
2009-01-30
The aim of our research was to study the effects of temperature, pH and initial dye concentration on the decolorization of the diazo dye Acid Red 151 (AR 151) from a simulated dye solution using a fungal isolate, Aspergillus fumigatus fresenius. A central composite design matrix and response surface methodology (RSM) were applied to design the experiments and to evaluate the interactive effects of the three most important operating variables: temperature (25-35 degrees C), pH (4.0-7.0), and initial dye concentration (100-200 mg/L) on the biodegradation of AR 151. A total of 20 experiments were conducted in the present study towards the construction of a quadratic model. A very high regression coefficient between the variables and the response (R(2)=0.9934) indicated an excellent fit of the experimental data by the second-order polynomial regression model. The RSM indicated that an initial dye concentration of 150 mg/L, pH 5.5 and a temperature of 30 degrees C were optimal for maximum % decolorization of AR 151 in simulated dye solution, and 84.8% decolorization of AR 151 was observed at optimum growth conditions.
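The RSM step, fitting a full second-order polynomial in the three operating variables, can be sketched as below. The response surface and its coefficients are invented for illustration (with an optimum placed near the reported 30 degrees C, pH 5.5, 150 mg/L); only the model form follows the abstract.

```python
# Fit a quadratic (second-order polynomial) response surface for
# % decolorization as a function of temperature, pH and dye concentration.
# The "true" surface below is a synthetic stand-in, not the paper's data.
import numpy as np

rng = np.random.default_rng(1)
n = 20                                   # matches a 20-run design
T = rng.uniform(25, 35, n)               # temperature, deg C
pH = rng.uniform(4.0, 7.0, n)
C = rng.uniform(100, 200, n)             # initial dye concentration, mg/L

# assumed surface with an optimum near (30 deg C, pH 5.5, 150 mg/L)
y = (84.8 - 0.3 * (T - 30) ** 2 - 4.0 * (pH - 5.5) ** 2
     - 0.002 * (C - 150) ** 2 + rng.normal(0, 0.5, n))

# design matrix: intercept, linear, square and interaction terms
X = np.column_stack([np.ones(n), T, pH, C,
                     T ** 2, pH ** 2, C ** 2, T * pH, T * C, pH * C])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ coef
r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Maximizing the fitted polynomial over the design region then yields the predicted optimal operating conditions.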
Radio Propagation Prediction Software for Complex Mixed Path Physical Channels
2006-08-14
4.4.6. Applied Linear Regression Analysis in the Frequency Range 1-50 MHz: In order to construct a comprehensive numerical algorithm capable of
Antecedents of eating disorders and muscle dysmorphia in a non-clinical sample.
Lamanna, J; Grieve, F G; Derryberry, W Pitt; Hakman, M; McClure, A
2010-01-01
Muscle Dysmorphia (MD) has recently been conceptualized as the male form of Eating Disorders (ED), although it is not currently classified as an ED. The current study compares etiological models of MD symptomatology and ED symptomatology. It was hypothesized that sociocultural influences on appearance (SIA) would predict body dissatisfaction (BD), and that this relationship would be mediated by self-esteem (SE) and perfectionism (P); that BD would predict negative affect (NA); and that NA would predict MD and ED symptomatology. Two-hundred-forty-seven female and 101 male college students at a midsouth university completed the study. All participants completed measures assessing each of the constructs, and multiple regression analyses were conducted to test each model's fit. In both models, most predictor paths were significant. These results suggest similarity in symptomatology and etiological models between ED and MD.
Li, Xuehua; Zhao, Wenxing; Li, Jing; Jiang, Jingqiu; Chen, Jianji; Chen, Jingwen
2013-08-01
To assess the persistence and fate of volatile organic compounds in the troposphere, the rate constants for the reaction with ozone (kO3) are needed. As kO3 values are only available for hundreds of compounds, and experimental determination of kO3 is costly and time-consuming, it is of importance to develop predictive models on kO3. In this study, a total of 379 logkO3 values at different temperatures were used to develop and validate a model for the prediction of kO3, based on quantum chemical descriptors, Dragon descriptors and structural fragments. Molecular descriptors were screened by stepwise multiple linear regression, and the model was constructed by partial least-squares regression. The cross validation coefficient QCUM(2) of the model is 0.836, and the external validation coefficient Qext(2) is 0.811, indicating that the model has high robustness and good predictive performance. The most significant descriptor explaining logkO3 is the BELm2 descriptor with connectivity information weighted atomic masses. kO3 increases with increasing BELm2, and decreases with increasing ionization potential. The applicability domain of the proposed model was visualized by the Williams plot. The developed model can be used to predict kO3 at different temperatures for a wide range of organic chemicals, including alkenes, cycloalkenes, haloalkenes, alkynes, oxygen-containing compounds, nitrogen-containing compounds (except primary amines) and aromatic compounds. Copyright © 2013 Elsevier Ltd. All rights reserved.
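The external validation coefficient Qext(2) reported above measures how well a model trained on one set of compounds predicts held-out compounds. A minimal numeric sketch, on synthetic descriptor data rather than the study's 379 logkO3 values, is:

```python
# Compute an external validation coefficient Q2_ext for a linear model of
# log kO3 from molecular descriptors. Descriptors, weights and the 80/20
# split are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))            # descriptors (e.g., BELm2, IP, ...)
w = np.array([1.5, -0.8, 0.3, 0.0])
y = X @ w + 0.1 * rng.normal(size=100)   # synthetic log kO3 values

X_tr, y_tr = X[:80], y[:80]              # training set
X_ex, y_ex = X[80:], y[80:]              # external validation set

coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(80), X_tr]), y_tr,
                           rcond=None)
y_pred = np.column_stack([np.ones(20), X_ex]) @ coef
q2_ext = 1 - ((y_ex - y_pred) ** 2).sum() / ((y_ex - y_tr.mean()) ** 2).sum()
```

Values of Q2_ext near 1 (the paper reports 0.811) indicate that the model generalizes beyond its training compounds.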
Jang, Jin-Young; Park, Taesung; Lee, Selyeong; Kim, Yongkang; Lee, Seung Yeoun; Kim, Sun-Whe; Kim, Song-Cheol; Song, Ki-Byung; Yamamoto, Masakazu; Hatori, Takashi; Hirono, Seiko; Satoi, Sohei; Fujii, Tsutomu; Hirano, Satoshi; Hashimoto, Yasushi; Shimizu, Yashuhiro; Choi, Dong Wook; Choi, Seong Ho; Heo, Jin Seok; Motoi, Fuyuhiko; Matsumoto, Ippei; Lee, Woo Jung; Kang, Chang Moo; Han, Ho-Seong; Yoon, Yoo-Seok; Sho, Masayuki; Nagano, Hiroaki; Honda, Goro; Kim, Sang Geol; Yu, Hee Chul; Chung, Jun Chul; Nagakawa, Yuichi; Seo, Hyung Il; Yamaue, Hiroki
2017-12-01
This study evaluated individual risks of malignancy and proposed a nomogram for predicting malignancy of branch duct type intraductal papillary mucinous neoplasms (BD-IPMNs) using the large database for IPMN. Although consensus guidelines list several malignancy predicting factors in patients with BD-IPMN, those variables have different predictability and individual quantitative prediction of malignancy risk is limited. Clinicopathological factors predictive of malignancy were retrospectively analyzed in 2525 patients with biopsy proven BD-IPMN at 22 tertiary hospitals in Korea and Japan. The patients with main duct dilatation >10 mm and inaccurate information were excluded. The study cohort consisted of 2258 patients. Malignant IPMNs were defined as those with high grade dysplasia and associated invasive carcinoma. Of 2258 patients, 986 (43.7%) had low, 443 (19.6%) had intermediate, 398 (17.6%) had high grade dysplasia, and 431 (19.1%) had invasive carcinoma. To construct and validate the nomogram, patients were randomly allocated into training and validation sets, with fixed ratios of benign and malignant lesions. Multiple logistic regression analysis resulted in five variables (cyst size, duct dilatation, mural nodule, serum CA19-9, and CEA) being selected to construct the nomogram. In the validation set, this nomogram showed excellent discrimination power through a 1000 times bootstrapped calibration test. A nomogram predicting malignancy in patients with BD-IPMN was constructed using a logistic regression model. This nomogram may be useful in identifying patients at risk of malignancy and for selecting optimal treatment methods. The nomogram is freely available at http://statgen.snu.ac.kr/software/nomogramIPMN.
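A nomogram of this kind is a graphical rendering of a fitted logistic regression; the underlying modeling step can be sketched as follows. The data, effect sizes and intercept below are simulated stand-ins for the five predictors named in the abstract, not the study cohort.

```python
# Multiple logistic regression on five malignancy predictors (cyst size,
# duct dilatation, mural nodule, CA19-9, CEA), yielding a per-patient
# predicted probability. All values are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 500
X = rng.normal(size=(n, 5))                    # standardized predictor values
w_true = np.array([0.8, 0.6, 1.2, 0.5, 0.4])   # invented effect sizes
p = 1 / (1 + np.exp(-(X @ w_true - 0.5)))
y = rng.binomial(1, p)                         # 1 = malignant (simulated label)

model = LogisticRegression().fit(X, y)
risk = model.predict_proba(X)[:, 1]            # malignancy probability per patient
```

Bootstrapped calibration, as in the paper's 1000-resample test, would refit the model on resampled data and compare predicted and observed event rates across risk strata.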
Socioeconomic, hygienic, and sanitation factors in reducing diarrhea in the Amazon
Imada, Katiuscia Shirota; de Araújo, Thiago Santos; Muniz, Pascoal Torres; de Pádua, Valter Lúcio
2016-01-01
OBJECTIVE: To analyze the contributions of the socioeconomic, hygienic, and sanitation improvements in reducing the prevalence of diarrhea in a city of the Amazon. METHODS: In this population-based cross-sectional study, we analyzed data from surveys conducted in the city of Jordão, Acre. In 2005 and 2012, these surveys evaluated, respectively, 466 and 826 children under five years old. Questionnaires were applied on the socioeconomic conditions, construction of houses, food and hygienic habits, and environmental sanitation. We applied Pearson’s Chi-squared test and Poisson regression to verify the relationship between origin of water, construction of homes, age of introduction of cow’s milk in the diet, place of birth and the prevalence of diarrhea. RESULTS: The prevalence of diarrhea was reduced from 45.1% to 35.4%. We identified a higher probability of diarrhea in children who did not use water from the public network, in those receiving cow’s milk in the first month after birth, and in those living in houses made of paxiúba. Children born at home presented lower risk of diarrhea when compared to those who were born in hospital, with this difference reversing for the 2012 survey. CONCLUSIONS: Sanitation conditions improved with the increase of bathrooms with toilets, implementation of the Programa de Saúde da Família (PSF – Family Health Program), and water treatment in the city. The multivariate regression model identified a statistically significant association between use of water from the public network, construction of houses, late introduction of cow’s milk, and access to health service with occurrence of diarrhea. PMID:28099660
Predicting use of effective vegetable parenting practices with the Model of Goal Directed Behavior.
Diep, Cassandra S; Beltran, Alicia; Chen, Tzu-An; Thompson, Debbe; O'Connor, Teresia; Hughes, Sheryl; Baranowski, Janice; Baranowski, Tom
2015-06-01
To model effective vegetable parenting practices using the Model of Goal Directed Vegetable Parenting Practices construct scales. An Internet survey was conducted with parents of pre-school children to assess their agreement with effective vegetable parenting practices and Model of Goal Directed Vegetable Parenting Practices items. Block regression modelling was conducted using the composite score of effective vegetable parenting practices scales as the outcome variable and the Model of Goal Directed Vegetable Parenting Practices constructs as predictors in separate and sequential blocks: demographics, intention, desire (intrinsic motivation), perceived barriers, autonomy, relatedness, self-efficacy, habit, anticipated emotions, perceived behavioural control, attitudes and lastly norms. Backward deletion was employed at the end for any variable not significant at P<0·05. Houston, TX, USA. Three hundred and seven parents (mostly mothers) of pre-school children. Significant predictors in the final model in order of relationship strength included habit of active child involvement in vegetable selection, habit of positive vegetable communications, respondent not liking vegetables, habit of keeping a positive vegetable environment and perceived behavioural control of having a positive influence on child's vegetable consumption. The final model's adjusted R² was 0·486. This was the first study to test scales from a behavioural model to predict effective vegetable parenting practices. Further research needs to assess these Model of Goal Directed Vegetable Parenting Practices scales for their (i) predictiveness of child consumption of vegetables in longitudinal samples and (ii) utility in guiding design of vegetable parenting practices interventions.
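Block (sequential) regression of the kind described enters predictor blocks one at a time and tracks the change in explained variance. A compact sketch on synthetic data, with invented block names standing in for the study's constructs, is:

```python
# Sequential block regression: fit OLS with block 1 only, then blocks 1+2,
# and report the R-squared increment attributable to block 2.
# Blocks, weights and data are illustrative, not the survey's.
import numpy as np

def ols_r2(X, y):
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()

rng = np.random.default_rng(4)
n = 300
demo = rng.normal(size=(n, 2))           # block 1: demographics
habit = rng.normal(size=(n, 3))          # block 2: habit scales
y = habit @ np.array([0.5, 0.4, 0.3]) + 0.2 * demo[:, 0] + rng.normal(size=n)

r2_block1 = ols_r2(demo, y)
r2_block2 = ols_r2(np.column_stack([demo, habit]), y)
delta_r2 = r2_block2 - r2_block1         # contribution of the habit block
```

Backward deletion would then iteratively drop predictors whose coefficients fail the chosen significance threshold.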
Trust from the past: Bayesian Personalized Ranking based Link Prediction in Knowledge Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Baichuan; Choudhury, Sutanay; Al-Hasan, Mohammad
2016-02-01
Estimating the confidence for a link is a critical task for Knowledge Graph construction. Link prediction, or predicting the likelihood of a link in a knowledge graph based on prior state, is a key research direction within this area. We propose a Latent Feature Embedding based link recommendation model for the prediction task and utilize a Bayesian Personalized Ranking based optimization technique for learning models for each predicate. Experimental results on large-scale knowledge bases such as YAGO2 show that our approach achieves substantially higher performance than several state-of-the-art approaches. Furthermore, we also study the performance of the link prediction algorithm in terms of topological properties of the Knowledge Graph and present a linear regression model to reason about its expected level of accuracy.
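The BPR optimization idea, ranking an observed link above an unobserved one by stochastic gradient ascent on latent embeddings, can be sketched as below. Embedding sizes, the dot-product scorer and all hyperparameters are assumptions for illustration, not the paper's per-predicate model.

```python
# One BPR-style SGD update: push the score of an observed link (h, t_pos)
# above that of a sampled negative link (h, t_neg) by ascending the
# log-sigmoid of their score margin. Dimensions are illustrative.
import numpy as np

rng = np.random.default_rng(5)
n_entities, k = 20, 8
E = rng.normal(scale=0.1, size=(n_entities, k))  # latent entity embeddings

def score(h, t):
    return E[h] @ E[t]                           # dot-product link score

def bpr_step(h, t_pos, t_neg, lr=0.05, reg=0.01):
    x = score(h, t_pos) - score(h, t_neg)        # score margin
    g = 1.0 / (1.0 + np.exp(x))                  # sigmoid(-x): ascent weight
    E[h] += lr * (g * (E[t_pos] - E[t_neg]) - reg * E[h])
    E[t_pos] += lr * (g * E[h] - reg * E[t_pos])
    E[t_neg] += lr * (-g * E[h] - reg * E[t_neg])

before = score(0, 1) - score(0, 2)
for _ in range(100):
    bpr_step(0, 1, 2)                            # treat (0,1) as the observed link
after = score(0, 1) - score(0, 2)
```

After training, the (possibly sigmoid-squashed) score of a candidate link serves as its confidence estimate.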
Predicting the Trends of Social Events on Chinese Social Media.
Zhou, Yang; Zhang, Lei; Liu, Xiaoqian; Zhang, Zhen; Bai, Shuotian; Zhu, Tingshao
2017-09-01
Growing interest in social events on social media came along with the rapid development of the Internet. Social events that occur in the "real" world can spread on social media (e.g., Sina Weibo) rapidly, which may trigger severe consequences and thus require the government's timely attention and responses. This article proposes to predict the trends of social events on Sina Weibo, which is currently the most popular social media platform in China. Based on theories from social psychology and communication sciences, we extract an unprecedented amount of comprehensive and effective features that relate to the trends of social events on Chinese social media, and we construct trend-prediction models using three classical regression algorithms. We found that lasso regression performed best, with a precision of 0.78 and a recall of 0.88. The results of our experiments demonstrated the effectiveness of our proposed approach.
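The lasso step, sparse linear regression over many candidate trend features followed by a precision/recall evaluation, can be sketched on synthetic data as follows. Feature meanings, the regularization strength and the "rising trend" threshold are all assumptions.

```python
# Lasso regression over a wide feature set: the L1 penalty zeroes out most
# weights, and thresholded predictions give precision/recall. Features and
# targets are synthetic placeholders for event-trend data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 30))                 # candidate trend features
w = np.zeros(30)
w[:3] = [1.0, -0.8, 0.6]                       # only a few features matter
y = X @ w + 0.1 * rng.normal(size=200)

model = Lasso(alpha=0.1).fit(X, y)
n_selected = int(np.sum(model.coef_ != 0))     # lasso drives most weights to zero

pred_up = model.predict(X) > 0                 # "rising trend" if prediction > 0
true_up = y > 0
precision = (pred_up & true_up).sum() / pred_up.sum()
recall = (pred_up & true_up).sum() / true_up.sum()
```

The sparsity also makes the fitted model interpretable: the surviving nonzero coefficients identify which features drive the predicted trend.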
Sibling dilution hypothesis: a regression surface analysis.
Marjoribanks, K
2001-08-01
This study examined relationships between sibship size (the number of children in a family), birth order, and measures of academic performance, academic self-concept, and educational aspirations at different levels of family educational resources. As part of a national longitudinal study of Australian secondary school students, data were collected from 2,530 boys and 2,450 girls in Years 9 and 10. Regression surfaces were constructed from models that included terms to account for linear, interaction, and curvilinear associations among the variables. Analysis suggests the general propositions (a) family educational resources have significant associations with children's school-related outcomes at different levels of sibling variables, the relationships for girls being curvilinear, and (b) sibling variables continue to have small significant associations with affective and cognitive outcomes, after taking into account variations in family educational resources. That is, the investigation provides only partial support for the sibling dilution hypothesis.
Health and safety implications of recruitment payments in migrant construction workers
Hassan, H. A.
2014-01-01
Background The Middle East construction sector is heavily reliant on a migrant workforce that predominantly originates from South Asia. It is common practice for migrant construction workers to pay a local labour recruiter the equivalent of one or more years’ prospective overseas salary to secure employment, work and travel permits and transportation. The occupational health and safety implications of these financial arrangements remain unexplored. Aims To examine associations between payment to a labour recruiter, perceived general health and worksite accidents among migrant construction workers in the Middle East. Methods A questionnaire was completed by a convenience sample of predominantly Indian migrant construction workers drawn from a large construction project. The relationship between payment and risk of poor health and workplace accidents was assessed using multivariate logistic regression models (crude and adjusted for socio-demographic and occupational factors). Results There were 651 participants. The majority (58%) of migrant construction workers had paid a labour recruiter and ~40% had experienced a worksite accident. Between 3% (labourers) and 9% (foremen) perceived their health to be poor. Labourers and skilled workers who had paid a labour recruiter were significantly more likely to have experienced a worksite accident in the previous 12 months. Skilled workers, but not labourers and foremen, who had paid a labour recruiter were at increased risk of poor health. Conclusions The mechanisms linking labour recruiter payments to adverse safety and health outcomes warrant investigation with a view to developing interventions to erode these links. PMID:24668316
Spectral Estimation Model Construction of Heavy Metals in Mining Reclamation Areas
Dong, Jihong; Dai, Wenting; Xu, Jiren; Li, Songnian
2016-01-01
The study reported here examined, as the research subject, surface soils in the Liuxin mining area of Xuzhou, and explored the heavy metal content and spectral data by establishing quantitative models with Multivariable Linear Regression (MLR), Generalized Regression Neural Network (GRNN) and Sequential Minimal Optimization for Support Vector Machine (SMO-SVM) methods. The study results are as follows: (1) the estimations of the spectral inversion models established based on MLR, GRNN and SMO-SVM are satisfactory, and the MLR model provides the worst estimation, with R² of more than 0.46. This result suggests that the stress sensitive bands of heavy metal pollution contain enough effective spectral information; (2) the GRNN model can simulate the data from small samples more effectively than the MLR model, and the R² between the contents of the five heavy metals estimated by the GRNN model and the measured values are approximately 0.7; (3) the stability and accuracy of the spectral estimation using the SMO-SVM model are obviously better than that of the GRNN and MLR models. Among all five types of heavy metals, the estimation for cadmium (Cd) is the best when using the SMO-SVM model, and its R² value reaches 0.8628; (4) using the optimal model to invert the Cd content in wheat that are planted on mine reclamation soil, the R² and RMSE between the measured and the estimated values are 0.6683 and 0.0489, respectively. This result suggests that the method using the SMO-SVM model to estimate the contents of heavy metals in wheat samples is feasible. PMID:27367708
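The SVM-regression step, relating spectral features to a heavy-metal concentration and judging the fit by R² and RMSE, can be sketched as below. This uses a generic epsilon-SVR with a linear kernel on synthetic data; it is not the paper's SMO-SVM configuration, and all dimensions are assumptions.

```python
# Epsilon-SVR relating spectral features at stress-sensitive bands to a
# heavy-metal (e.g., Cd) concentration. Data are synthetic placeholders.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 10))            # reflectance features at selected bands
w = rng.normal(size=10)
y = X @ w + 0.1 * rng.normal(size=60)    # synthetic Cd content

model = SVR(kernel="linear", C=10.0, epsilon=0.05).fit(X, y)
y_hat = model.predict(X)
rmse = float(np.sqrt(np.mean((y - y_hat) ** 2)))
r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

In practice the model would be evaluated on held-out samples, as in the paper's wheat inversion test, rather than on the training spectra.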
A multimodel approach to interannual and seasonal prediction of Danube discharge anomalies
NASA Astrophysics Data System (ADS)
Rimbu, Norel; Ionita, Monica; Patrut, Simona; Dima, Mihai
2010-05-01
Interannual and seasonal predictability of Danube river discharge is investigated using three model types: 1) time series models, 2) linear regression models of discharge with large-scale climate mode indices, and 3) models based on stable teleconnections. All models are calibrated using discharge and climatic data for the period 1901-1977 and validated for the period 1978-2008. Various time series models, like autoregressive (AR), moving average (MA), autoregressive moving average (ARMA), and singular spectrum analysis combined with ARMA (SSA+ARMA) models, have been calibrated and their skills evaluated. The best results were obtained using SSA+ARMA models. SSA+ARMA models also proved to have the highest forecast skill for other European rivers (Gamiz-Fortis et al. 2008). Multiple linear regression models using large-scale climatic mode indices as predictors have a higher forecast skill than the time series models. The best predictors for Danube discharge are the North Atlantic Oscillation (NAO) and the East Atlantic/Western Russia patterns during winter and spring. Other patterns, like Polar/Eurasian or Tropical Northern Hemisphere (TNH), are good predictors for summer and autumn discharge. Based on the stable teleconnection approach (Ionita et al. 2008), we construct prediction models through a combination of sea surface temperature (SST), temperature (T) and precipitation (PP) from the regions where discharge and SST, T and PP variations are stably correlated. Forecast skills of these models are higher than forecast skills of the time series and multiple regression models. The models calibrated and validated in our study can be used for operational prediction of interannual and seasonal Danube discharge anomalies. References: Gamiz-Fortis, S., D. Pozo-Vazquez, R.M. Trigo, and Y. Castro-Diez, Quantifying the predictability of winter river flow in Iberia. Part I: interannual predictability. J. Climate, 2484-2501, 2008. Gamiz-Fortis, S., D. Pozo-Vazquez, R.M. Trigo, and Y. Castro-Diez, Quantifying the predictability of winter river flow in Iberia. Part II: seasonal predictability. J. Climate, 2503-2518, 2008. Ionita, M., G. Lohmann, and N. Rimbu, Prediction of spring Elbe river discharge based on stable teleconnections with global temperature and precipitation. J. Climate, 6215-6226, 2008.
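The simplest member of the time-series model class mentioned above, an AR model fit by least squares, can be sketched as follows. The series, the AR(1) coefficient and the forecast step are synthetic illustrations, not Danube data.

```python
# Fit an AR(1) model to a synthetic "discharge anomaly" series by regressing
# x[t] on x[t-1], then issue a one-step-ahead forecast.
import numpy as np

rng = np.random.default_rng(8)
n = 200
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()  # assumed true AR(1) coefficient 0.7

# least-squares estimate of the AR(1) coefficient
phi, *_ = np.linalg.lstsq(x[:-1, None], x[1:], rcond=None)
phi_hat = float(phi[0])
forecast = phi_hat * x[-1]                # one-step-ahead prediction
```

The SSA+ARMA approach in the paper first decomposes the series into smooth components via singular spectrum analysis and then fits ARMA models to those components.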
Andrews, Arthur R.; Bridges, Ana J.; Gomez, Debbie
2014-01-01
Purpose: The aims of the study were to evaluate the orthogonality of acculturation for Latinos. Design: Regression analyses were used to examine acculturation in two Latino samples (N = 77; N = 40). In a third study (N = 673), confirmatory factor analyses compared unidimensional and bidimensional models. Method: Acculturation was assessed with the ARSMA-II (Studies 1 and 2), and language proficiency items from the Children of Immigrants Longitudinal Study (Study 3). Results: In Studies 1 and 2, the bidimensional model accounted for slightly more variance (R² = .11 in Study 1; R² = .21 in Study 2) than the unidimensional model (R² = .10 in Study 1; R² = .19 in Study 2). In Study 3, the bidimensional model evidenced better fit (Akaike information criterion = 167.36) than the unidimensional model (Akaike information criterion = 1204.92). Discussion/Conclusions: Acculturation is multidimensional. Implications for Practice: Care providers should examine acculturation as a bidimensional construct. PMID:23361579
Using the theory of planned behavior to predict infant restraint use in Saudi Arabia.
Nelson, Anna; Modeste, Naomi N; Marshak, Helen H; Hopp, Joyce W
2014-09-01
To determine whether the theory of planned behavior (TPB) predicted intent of child restraint system (CRS) use among pregnant women in the Kingdom of Saudi Arabia (KSA). In this cross-sectional study conducted in Dallah Hospital, Riyadh, KSA during June-July 2013, 196 pregnant women completed surveys assessing their beliefs regarding CRS. Simultaneous observations were conducted among a different sample of 150 women to determine CRS usage at hospital discharge following maternity stay. A logistic regression model with TPB constructs and covariates as predictors of CRS usage intent was significant (χ2=64.986, p<0.0001) and explained 38% of the variance in intent. There was an increase in odds of intent for attitudes (31.5%, p<0.05), subjective norm (55.3%, p<0.001), and perceived behavioral control (76.9%, p<0.001). The 3 logistic regression models testing the association of the relevant set of composite belief scores were also significant for attitudes (χ2=16.803, p<0.05), subjective norm (χ2=29.681, p<0.0001), and perceived behavioral control (χ2=20.516, p<0.05). The behavioral observation showed that none of the 150 women observed used CRS for their newborn at discharge. The TPB constructs were significantly and independently associated with higher intent for CRS usage. While TPB appears to be a useful tool to identify beliefs related to CRS usage intentions in KSA, the results of the separate behavioral observation indicate that intentions may not be related to the actual usage of CRS in the Kingdom. Further studies are recommended to examine this association.
Prediction of postoperative pulmonary complications in a population-based surgical cohort.
Canet, Jaume; Gallart, Lluís; Gomar, Carmen; Paluzie, Guillem; Vallès, Jordi; Castillo, Jordi; Sabaté, Sergi; Mazo, Valentín; Briones, Zahara; Sanchis, Joaquín
2010-12-01
Current knowledge of the risk for postoperative pulmonary complications (PPCs) rests on studies that narrowly selected patients and procedures. Hypothesizing that PPC occurrence could be predicted from a reduced set of perioperative variables, we aimed to develop a predictive index for a broad surgical population. Patients undergoing surgical procedures given general, neuraxial, or regional anesthesia in 59 hospitals were randomly selected for this prospective, multicenter study. The main outcome was the development of at least one of the following: respiratory infection, respiratory failure, bronchospasm, atelectasis, pleural effusion, pneumothorax, or aspiration pneumonitis. The cohort was randomly divided into a development subsample to construct a logistic regression model and a validation subsample. A PPC predictive index was constructed. Of 2,464 patients studied, 252 events were observed in 123 (5%). Thirty-day mortality was higher in patients with a PPC (19.5%; 95% CI, 12.5-26.5%) than in those without a PPC (0.5%; 95% CI, 0.2-0.8%). Regression modeling identified seven independent risk factors: low preoperative arterial oxygen saturation, acute respiratory infection during the previous month, age, preoperative anemia, upper abdominal or intrathoracic surgery, surgical duration of at least 2 h, and emergency surgery. The area under the receiver operating characteristic curve was 90% (95% CI, 85-94%) for the development subsample and 88% (95% CI, 84-93%) for the validation subsample. The risk index based on seven objective, easily assessed factors has excellent discriminative ability. The index can be used to assess individual risk of PPC and focus further research on measures to improve patient care.
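The development step, fitting a logistic regression on a handful of risk factors and summarizing discrimination by the area under the ROC curve, can be sketched as below. The seven simulated predictors, their weights and the event rate are invented for illustration, not the study cohort.

```python
# Logistic regression risk model over seven perioperative factors, with
# discrimination measured by ROC AUC. All data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
n = 2000
X = rng.normal(size=(n, 7))                 # seven risk factors, standardized
w = np.array([1.0, 0.8, 0.6, 0.5, 0.5, 0.4, 0.9])   # invented effects
p = 1 / (1 + np.exp(-(X @ w - 3.0)))        # intercept chosen for a low event rate
y = rng.binomial(1, p)                      # 1 = at least one PPC (simulated)

model = LogisticRegression().fit(X, y)
auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
```

A point-score index of the paper's kind would then be derived by rounding the fitted coefficients to integer weights and summing them per patient.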