ℓ p-Norm Multikernel Learning Approach for Stock Market Price Forecasting
Shao, Xigao; Wu, Kun; Liao, Bifeng
2012-01-01
Linear multiple kernel learning model has been used for predicting financial time series. However, ℓ 1-norm multiple support vector regression is rarely observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures that generalize well, we adopt ℓ p-norm multiple kernel support vector regression (1 ≤ p < ∞) as a stock price prediction model. The optimization problem is decomposed into smaller subproblems, and the interleaved optimization strategy is employed to solve the regression model. The model is evaluated on forecasting the daily stock closing prices of Shanghai Stock Index in China. Experimental results show that our proposed model performs better than ℓ 1-norm multiple support vector regression model. PMID:23365561
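As a rough illustration of the ℓ p-norm constraint on kernel mixtures described above, the following sketch rescales non-negative kernel weights to unit ℓ p norm and mixes base kernel matrices with them. The function names, the toy matrices, and the starting weights are assumptions made for illustration only; the interleaved optimization strategy used by the authors is not reproduced here.

```python
def lp_normalize(weights, p):
    """Rescale non-negative kernel weights so that their l_p norm equals 1."""
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    norm = sum(w ** p for w in weights) ** (1.0 / p)
    if norm == 0:
        raise ValueError("at least one weight must be positive")
    return [w / norm for w in weights]


def combine_kernels(kernels, weights):
    """Return the weighted sum of equally sized square kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Two toy 2x2 base kernel matrices (hypothetical values, not from the paper).
k_linear = [[1.0, 0.5], [0.5, 1.0]]
k_rbf = [[1.0, 0.2], [0.2, 1.0]]

theta = lp_normalize([3.0, 4.0], p=2)          # unit l_2 norm weights
mixed = combine_kernels([k_linear, k_rbf], theta)
```

The mixed matrix could then be passed to any support vector regression solver that accepts a precomputed kernel. With p = 1 the constraint tends to produce sparse mixtures, whereas larger p spreads weight across more kernels.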
NASA Astrophysics Data System (ADS)
Li, Tao
2018-06-01
The complexity of the aluminum electrolysis process makes the temperature of aluminum reduction cells hard to measure directly, yet temperature is central to the control of aluminum production. To solve this problem, this paper combines operating data from an aluminum plant with Improved Twin Support Vector Regression (ITSVR) to build a soft-sensing model of temperature for the aluminum electrolysis process. ITSVR addresses the slow learning speed of Support Vector Regression (SVR) and the overfitting risk of Twin Support Vector Regression (TSVR) by introducing a regularization term into the objective function of TSVR, which enforces the structural risk minimization principle while keeping computational complexity low. The model then predicts the temperature with ITSVR, using other process parameters as auxiliary variables. The simulation results show that the ITSVR-based soft-sensing model has a short running time and better generalization.
SEMIPARAMETRIC QUANTILE REGRESSION WITH HIGH-DIMENSIONAL COVARIATES
Zhu, Liping; Huang, Mian; Li, Runze
2012-01-01
This paper is concerned with quantile regression for a semiparametric regression model, in which both the conditional mean and conditional variance function of the response given the covariates admit a single-index structure. This semiparametric regression model enables us to reduce the dimension of the covariates and simultaneously retains the flexibility of nonparametric regression. Under mild conditions, we show that the simple linear quantile regression offers a consistent estimate of the index parameter vector. This is a surprising and interesting result because the single-index model is possibly misspecified under the linear quantile regression. With a root-n consistent estimate of the index vector, one may employ a local polynomial regression technique to estimate the conditional quantile function. This procedure is computationally efficient, which is very appealing in high-dimensional data analysis. We show that the resulting estimator of the quantile function performs asymptotically as efficiently as if the true value of the index vector were known. The methodologies are demonstrated through comprehensive simulation studies and an application to a real dataset. PMID:24501536
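Linear quantile regression, as referred to above, is fitted by minimizing the check (pinball) loss. The sketch below shows that loss only; the function names are chosen for illustration, and the single-index estimation and local polynomial steps of the paper are not reproduced.

```python
def check_loss(residual, tau):
    """Check loss rho_tau(u) = u * (tau - 1[u < 0]) for quantile level tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def mean_check_loss(y_true, y_pred, tau):
    """Average check loss of predictions y_pred at quantile level tau."""
    losses = [check_loss(y - f, tau) for y, f in zip(y_true, y_pred)]
    return sum(losses) / len(losses)
```

For tau below 0.5 the loss penalizes over-prediction (negative residuals) more heavily than under-prediction, which is what pulls the fitted line toward a lower conditional quantile.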
Experimental and computational prediction of glass transition temperature of drugs.
Alzghoul, Ahmad; Alhalaweh, Amjad; Mahlin, Denny; Bergström, Christel A S
2014-12-22
Glass transition temperature (Tg) is an important inherent property of an amorphous solid material which is usually determined experimentally. In this study, the relation between Tg and melting temperature (Tm) was evaluated using a data set of 71 structurally diverse druglike compounds. Further, in silico models for prediction of Tg were developed based on calculated molecular descriptors and linear (multilinear regression, partial least-squares, principal component regression) and nonlinear (neural network, support vector regression) modeling techniques. The models based on Tm predicted Tg with an RMSE of 19.5 K for the test set. Among the five computational models developed herein the support vector regression gave the best result with RMSE of 18.7 K for the test set using only four chemical descriptors. Hence, two different models that predict Tg of drug-like molecules with high accuracy were developed. If Tm is available, a simple linear regression can be used to predict Tg. However, the results also suggest that support vector regression and calculated molecular descriptors can predict Tg with equal accuracy, already before compound synthesis.
Prediction of hourly PM2.5 using a space-time support vector regression model
NASA Astrophysics Data System (ADS)
Yang, Wentao; Deng, Min; Xu, Feng; Wang, Hang
2018-05-01
Real-time air quality prediction has been an active field of research in atmospheric environmental science. The existing methods of machine learning are widely used to predict pollutant concentrations because of their enhanced ability to handle complex non-linear relationships. However, because pollutant concentration data, as typical geospatial data, also exhibit spatial heterogeneity and spatial dependence, they may violate the assumptions of independent and identically distributed random variables in most of the machine learning methods. As a result, a space-time support vector regression model is proposed to predict hourly PM2.5 concentrations. First, to address spatial heterogeneity, spatial clustering is executed to divide the study area into several homogeneous or quasi-homogeneous subareas. To handle spatial dependence, a Gauss vector weight function is then developed to determine spatial autocorrelation variables as part of the input features. Finally, a local support vector regression model with spatial autocorrelation variables is established for each subarea. Experimental data on PM2.5 concentrations in Beijing are used to verify whether the results of the proposed model are superior to those of other methods.
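The abstract does not give the form of its Gauss vector weight function, so the following is only a generic sketch of Gaussian distance-decay weighting, one common way to build a spatial autocorrelation feature from neighboring stations. The function names, the bandwidth parameter, and the example readings are assumptions.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian distance-decay weight; equals 1 at distance 0."""
    return math.exp(-(distance ** 2) / (2.0 * bandwidth ** 2))


def spatial_lag(neighbor_values, neighbor_distances, bandwidth):
    """Weighted mean of neighboring readings, usable as an input feature."""
    weights = [gaussian_weight(d, bandwidth) for d in neighbor_distances]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, neighbor_values)) / total


# Hypothetical PM2.5 readings at three neighboring stations and their
# distances (in km) from the target station.
feature = spatial_lag([80.0, 60.0, 40.0], [1.0, 5.0, 20.0], bandwidth=5.0)
```

Nearby stations dominate the resulting feature, which is then appended to the other inputs of the local regression model for the subarea.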
Held, Elizabeth; Cape, Joshua; Tintle, Nathan
2016-01-01
Machine learning methods continue to show promise in the analysis of data from genetic association studies because of the high number of variables relative to the number of observations. However, few best practices exist for the application of these methods. We extend a recently proposed supervised machine learning approach for predicting disease risk by genotypes to be able to incorporate gene expression data and rare variants. We then apply 2 different versions of the approach (radial and linear support vector machines) to simulated data from Genetic Analysis Workshop 19 and compare performance to logistic regression. Method performance was not radically different across the 3 methods, although the linear support vector machine tended to show small gains in predictive ability relative to a radial support vector machine and logistic regression. Importantly, as the number of genes in the models was increased, even when those genes contained causal rare variants, model predictive ability showed a statistically significant decrease in performance for both the radial support vector machine and logistic regression. The linear support vector machine showed more robust performance to the inclusion of additional genes. Further work is needed to evaluate machine learning approaches on larger samples and to evaluate the relative improvement in model prediction from the incorporation of gene expression data.
ERIC Educational Resources Information Center
Waller, Niels; Jones, Jeff
2011-01-01
We describe methods for assessing all possible criteria (i.e., dependent variables) and subsets of criteria for regression models with a fixed set of predictors, x (where x is an n x 1 vector of independent variables). Our methods build upon the geometry of regression coefficients (hereafter called regression weights) in n-dimensional space. For a…
Lorenz, Alyson; Dhingra, Radhika; Chang, Howard H; Bisanzio, Donal; Liu, Yang; Remais, Justin V
2014-01-01
Extrapolating landscape regression models for use in assessing vector-borne disease risk and other applications requires thoughtful evaluation of fundamental model choice issues. To examine implications of such choices, an analysis was conducted to explore the extent to which disparate landscape models agree in their epidemiological and entomological risk predictions when extrapolated to new regions. Agreement between six literature-drawn landscape models was examined by comparing predicted county-level distributions of either Lyme disease or Ixodes scapularis vector using Spearman ranked correlation. AUC analyses and multinomial logistic regression were used to assess the ability of these extrapolated landscape models to predict observed national data. Three models based on measures of vegetation, habitat patch characteristics, and herbaceous landcover emerged as effective predictors of observed disease and vector distribution. An ensemble model containing these three models improved precision and predictive ability over individual models. A priori assessment of qualitative model characteristics effectively identified models that subsequently emerged as better predictors in quantitative analysis. Both a methodology for quantitative model comparison and a checklist for qualitative assessment of candidate models for extrapolation are provided; both tools aim to improve collaboration between those producing models and those interested in applying them to new areas and research questions.
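Agreement between two models' county-level predictions can be measured with Spearman rank correlation, as mentioned above. The sketch below is a plain implementation that assigns average ranks to ties and then computes the Pearson correlation of the ranks; the function names are illustrative and no data from the study are used.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, two models that order counties identically score 1 even if their predicted risk values are on different scales.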
Chen, Suduan; Goo, Yeong-Jia James; Shen, Zone-De
2014-01-01
As fraudulent financial statements become an increasingly serious problem, establishing a valid model for forecasting an enterprise's fraudulent financial statements has become an important question for academic research and financial practice. After screening the important variables using stepwise regression, the study applies logistic regression, support vector machine, and decision tree methods to construct classification models and compares them. The study adopts both financial and nonfinancial variables to assist in establishing the forecasting model. The research objects are companies with fraudulent and nonfraudulent financial statements between 1998 and 2012. The findings are that financial and nonfinancial information can be used effectively to distinguish fraudulent financial statements, and that the decision tree C5.0 has the best classification performance, 85.71%. PMID:25302338
Zlotnik, Alexander; Gallardo-Antolín, Ascensión; Cuchí Alfaro, Miguel; Pérez Pérez, María Carmen; Montero Martínez, Juan Manuel
2015-08-01
Although emergency department visit forecasting can be of use for nurse staff planning, previous research has focused on models that lacked sufficient resolution and realistic error metrics for these predictions to be applied in practice. Using data from a 1100-bed specialized care hospital with 553,000 patients assigned to its healthcare area, forecasts with different prediction horizons, from 2 to 24 weeks ahead, with an 8-hour granularity, were generated with an open-source software package using support vector regression, M5P, and stratified average time-series models. As overstaffing and understaffing errors have different implications, error metrics and potential personnel monetary savings were calculated with a custom validation scheme, which simulated subsequent generation of predictions during a 4-year period. Results were then compared with a generalized estimating equation regression. Support vector regression and M5P models were found to be superior to the stratified average model with a 95% confidence interval. Our findings suggest that medium and severe understaffing situations could be reduced by more than an order of magnitude and average yearly savings of up to €683,500 could be achieved if dynamic nursing staff allocation were performed with support vector regression instead of the static staffing levels currently in use.
Ebtehaj, Isa; Bonakdari, Hossein
2016-01-01
Sediment transport without deposition is an essential consideration in the optimum design of sewer pipes. In this study, a novel method based on a combination of support vector regression (SVR) and the firefly algorithm (FFA) is proposed to predict the minimum velocity required to avoid sediment settling in pipe channels, which is expressed as the densimetric Froude number (Fr). The efficiency of support vector machine (SVM) models depends on the suitable selection of SVM parameters. In this study, FFA is used to determine these SVM parameters. The parameters that actually affect the calculation of Fr are identified by dimensional analysis. The different dimensionless variables are introduced along with the models. The best performance is attributed to the model that employs the sediment volumetric concentration (C_V), the ratio of relative median diameter of particles to hydraulic radius (d/R), the dimensionless particle number (D_gr), and the overall sediment friction factor (λ_s) to estimate Fr. The performance of the SVR-FFA model is compared with genetic programming, artificial neural network, and existing regression-based equations. The results indicate the superior performance of SVR-FFA (mean absolute percentage error = 2.123%; root mean square error = 0.116) compared with other methods.
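The two error measures quoted above, mean absolute percentage error and root mean square error, can be computed as in the sketch below. The function names and sample values are illustrative and are not taken from the study.

```python
import math


def mape(observed, predicted):
    """Mean absolute percentage error, in percent; observed must be nonzero."""
    ratios = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(observed, predicted):
    """Root mean square error, in the units of the observed values."""
    squares = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squares) / len(squares))


# Hypothetical observed and predicted densimetric Froude numbers.
fr_observed = [2.0, 4.0]
fr_predicted = [1.0, 5.0]
```

MAPE is scale-free and so is comparable across data sets, while RMSE stays in the units of Fr and weights large errors more heavily.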
García Nieto, P J; Alonso Fernández, J R; de Cos Juez, F J; Sánchez Lasheras, F; Díaz Muñiz, C
2013-04-01
Cyanotoxins, poisonous substances produced by cyanobacteria, are responsible for health risks in drinking and recreational waters. As a result, anticipating their presence is important for preventing risks. The aim of this study is to use a hybrid approach based on support vector regression (SVR) in combination with genetic algorithms (GAs), known as a genetic algorithm support vector regression (GA-SVR) model, to forecast the presence of cyanotoxins in the Trasona reservoir (Northern Spain). The GA-SVR approach is aimed at highly nonlinear biological problems with sharp peaks, and the tests carried out proved its high performance. Some physical-chemical parameters have been considered along with the biological ones. The results obtained are twofold. First, the significance of each biological and physical-chemical variable for the presence of cyanotoxins in the reservoir is determined successfully. Second, a predictive model able to forecast the possible presence of cyanotoxins in the short term was obtained. Copyright © 2013 Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Fei, Cheng-Wei; Bai, Guang-Chen
2014-12-01
To improve the computational precision and efficiency of probabilistic design for mechanical dynamic assembly, such as the blade-tip radial running clearance (BTRRC) of a gas turbine, a distribution collaborative probabilistic design method based on a support vector machine of regression (SR), called DCSRM, is proposed by integrating the distribution collaborative response surface method with the support vector machine regression model. The mathematical model of DCSRM is established and the probabilistic design idea of DCSRM is introduced. The dynamic assembly probabilistic design of an aeroengine high-pressure turbine (HPT) BTRRC is carried out to verify the proposed DCSRM. The analysis yields the optimal static blade-tip clearance of the HPT, which is useful for designing the BTRRC and for improving the performance and reliability of the aeroengine. The comparison of methods shows that the DCSRM has high computational accuracy and high computational efficiency in BTRRC probabilistic analysis. The present research offers an effective way to carry out the reliability design of mechanical dynamic assembly and enriches mechanical reliability theory and method.
Combining Relevance Vector Machines and exponential regression for bearing residual life estimation
NASA Astrophysics Data System (ADS)
Di Maio, Francesco; Tsui, Kwok Leung; Zio, Enrico
2012-08-01
In this paper we present a new procedure for estimating the bearing Residual Useful Life (RUL) by combining data-driven and model-based techniques. Respectively, we resort to (i) Relevance Vector Machines (RVMs) for selecting a low number of significant basis functions, called Relevant Vectors (RVs), and (ii) exponential regression to compute and continuously update residual life estimations. The combination of these techniques is developed with reference to partially degraded thrust ball bearings and tested on real world vibration-based degradation data. On the case study considered, the proposed procedure outperforms other model-based methods, with the added value of an adequate representation of the uncertainty associated to the estimates of the quantification of the credibility of the results by the Prognostic Horizon (PH) metric.
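The exponential-regression step described above can be sketched as a log-linear least-squares fit of a degradation signal, extrapolated to a failure threshold. This is a simplified stand-in: the Relevance Vector Machine stage and the continuous updating used by the authors are omitted, and the function names, the threshold, and the synthetic data are assumptions.

```python
import math


def fit_exponential(times, values):
    """Fit values ~ a * exp(b * t) by least squares on log(values)."""
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    sxx = sum((t - mean_t) ** 2 for t in times)
    b = sxy / sxx
    a = math.exp(mean_l - b * mean_t)
    return a, b


def residual_life(times, values, threshold):
    """Time from the last observation until the fitted curve hits threshold."""
    a, b = fit_exponential(times, values)
    if b <= 0:
        return math.inf  # no growth trend, so the threshold is never reached
    t_fail = math.log(threshold / a) / b
    return max(0.0, t_fail - times[-1])


# Synthetic, noise-free degradation signal (hypothetical): 2 * exp(0.5 * t).
t_obs = [0.0, 1.0, 2.0, 3.0]
y_obs = [2.0 * math.exp(0.5 * t) for t in t_obs]
rul = residual_life(t_obs, y_obs, threshold=2.0 * math.exp(5.0))
```

Refitting each time a new observation arrives gives a running estimate of the residual life; the fit requires strictly positive signal values because it works on their logarithms.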
NASA Astrophysics Data System (ADS)
Febrian Umbara, Rian; Tarwidi, Dede; Budi Setiawan, Erwin
2018-03-01
The paper discusses the prediction of the Jakarta Composite Index (JCI) on the Indonesia Stock Exchange. The study uses 1286 days of JCI historical data to predict the value of the JCI one day ahead. The paper proposes making predictions in two stages. The first stage uses Fuzzy Time Series (FTS) to predict the values of ten technical indicators, and the second stage uses Support Vector Regression (SVR) to predict the value of the JCI one day ahead, resulting in a hybrid prediction model, FTS-SVR. The performance of this combined prediction model is compared with the performance of the single-stage prediction model using SVR only. Ten technical indicators are used as input for each model.
ATLS Hypovolemic Shock Classification by Prediction of Blood Loss in Rats Using Regression Models.
Choi, Soo Beom; Choi, Joon Yul; Park, Jee Soo; Kim, Deok Won
2016-07-01
In our previous study, our input data set consisted of 78 rats, the blood loss in percent as a dependent variable, and 11 independent variables (heart rate, systolic blood pressure, diastolic blood pressure, mean arterial pressure, pulse pressure, respiration rate, temperature, perfusion index, lactate concentration, shock index, and new index (lactate concentration/perfusion)). In that study, machine learning methods for multicategory classification were applied to a rat model of acute hemorrhage to predict the four Advanced Trauma Life Support (ATLS) hypovolemic shock classes for triage. However, multicategory classification is much more difficult and complicated than binary classification. We introduce a simple approach for classifying the ATLS hypovolemic shock class by predicting blood loss in percent using support vector regression and multivariate linear regression (MLR). We also compared the performance of the classification models using absolute and relative vital signs. The accuracies of the support vector regression and MLR models with relative values, obtained by predicting blood loss in percent, were 88.5% and 84.6%, respectively. These were better than the best accuracy of 80.8% of the direct multicategory classification using the support vector machine one-versus-one model in our previous study for the same validation data set. Moreover, the simple MLR models with both absolute and relative values could serve as a basis for a future clinical decision support system for ATLS classification. The perfusion index and new index were more appropriate with relative changes than absolute values.
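The idea of classifying by regression, described above, amounts to predicting blood loss in percent and then mapping the prediction to a class. The sketch below assumes the commonly cited ATLS blood-loss bands (under 15%, 15 to 30%, 30 to 40%, over 40%); the abstract does not state the cut-offs used in the rat model, and the treatment of boundary values and the function names are assumptions as well.

```python
def atls_class(blood_loss_percent):
    """Map predicted blood loss (percent) to an ATLS class 1-4 (assumed bands)."""
    if not 0.0 <= blood_loss_percent <= 100.0:
        raise ValueError("blood loss must be between 0 and 100 percent")
    if blood_loss_percent < 15.0:
        return 1
    if blood_loss_percent < 30.0:
        return 2
    if blood_loss_percent < 40.0:
        return 3
    return 4


def classification_accuracy(true_losses, predicted_losses):
    """Share of cases whose predicted class matches the true class."""
    hits = sum(
        atls_class(t) == atls_class(p)
        for t, p in zip(true_losses, predicted_losses)
    )
    return hits / len(true_losses)
```

A regression error matters for triage only when it pushes the prediction across a band boundary, which is one reason this route can score better than direct four-class classification.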
A Novel Degradation Identification Method for Wind Turbine Pitch System
NASA Astrophysics Data System (ADS)
Guo, Hui-Dong
2018-04-01
It is difficult for the traditional threshold-value method to identify the degradation of operating equipment accurately. A novel degradation evaluation method suitable for implementing a condition maintenance strategy for wind turbines is proposed in this paper. Based on an analysis of the typical variable-speed pitch-to-feather control principle and the monitoring parameters of the pitch system, a multi-input multi-output (MIMO) regression model was applied to the pitch system, with wind speed and power generation as input parameters and wheel rotation speed, pitch angle, and motor driving current for the three blades as output parameters. Then, the difference between the on-line measurement and the value calculated by the MIMO regression model, built with the least squares support vector machine (LSSVM) method, was defined as the Observed Vector of the system. The Gaussian mixture model (GMM) was applied to fit the distribution of the multidimensional Observed Vectors. Using the established model, the Degradation Index was calculated from the SCADA data of a wind turbine with a damaged pitch bearing retainer and rolling body, which illustrated the feasibility of the proposed method.
Li, Yankun; Shao, Xueguang; Cai, Wensheng
2007-04-15
Consensus modeling, which combines the results of multiple independent models to produce a single prediction, avoids the instability of a single model. Based on the principle of consensus modeling, a consensus least squares support vector regression (LS-SVR) method for calibrating near-infrared (NIR) spectra was proposed. In the proposed approach, NIR spectra of plant samples were first preprocessed using the discrete wavelet transform (DWT) to filter out the spectral background and noise; then the consensus LS-SVR technique was used to build the calibration model. With optimization of the parameters involved in the modeling, a satisfactory model was achieved for predicting the content of reducing sugar in plant samples. The predicted results show that the consensus LS-SVR model is more robust and reliable than the conventional partial least squares (PLS) and LS-SVR methods.
Sparse kernel methods for high-dimensional survival data.
Evers, Ludger; Messow, Claudia-Martina
2008-07-15
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be 'kernelized'. The kernelized proportional hazards model, however, yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, depending only on a small fraction of the training data. We propose two methods. One is based on a geometric idea, where, akin to support vector classification, the margin between the failed observation and the observations currently at risk is maximised. The other approach is based on obtaining a sparse model by adding observations one after another, akin to the Import Vector Machine (IVM). Data examples studied suggest that both methods can outperform competing approaches. Software is available under the GNU Public License as an R package and can be obtained from the first author's website http://www.maths.bris.ac.uk/~maxle/software.html.
A novel strategy for forensic age prediction by DNA methylation and support vector regression model
Xu, Cheng; Qu, Hongzhu; Wang, Guangyu; Xie, Bingbing; Shi, Yi; Yang, Yaran; Zhao, Zhao; Hu, Lan; Fang, Xiangdong; Yan, Jiangwei; Feng, Lei
2015-01-01
High deviations resulting from differences in prediction model, gender, and population have limited the application of DNA methylation markers to age estimation. Here we identified 2,957 novel age-associated DNA methylation sites (P < 0.01 and R² > 0.5) in the blood of eight pairs of Chinese Han female monozygotic twins. Among them, nine novel sites (false discovery rate < 0.01), along with three other reported sites, were further validated in 49 unrelated female volunteers aged 20–80 years by Sequenom Massarray. A total of 95 CpGs were covered in the PCR products, and 11 of them were used to build the age prediction models. After comparing four different models, including multivariate linear regression, multivariate nonlinear regression, back propagation neural network, and support vector regression, SVR was identified as the most robust model, with the least mean absolute deviation from real chronological age (2.8 years) and an average accuracy of 4.7 years predicted by only six of the 11 loci, as well as a lower cross-validated error compared with the linear regression model. Our novel strategy provides an accurate measurement that is highly useful in estimating individual age in forensic practice as well as in tracking the aging process in other related applications. PMID:26635134
Igne, Benoît; Drennen, James K; Anderson, Carl A
2014-01-01
Changes in raw materials and process wear and tear can have significant effects on the prediction error of near-infrared calibration models. When the variability that is present during routine manufacturing is not included in the calibration, test, and validation sets, the long-term performance and robustness of the model will be limited. Nonlinearity is a major source of interference. In near-infrared spectroscopy, nonlinearity can arise from light path-length differences that can come from differences in particle size or density. The usefulness of support vector machine (SVM) regression to handle nonlinearity and improve the robustness of calibration models in scenarios where the calibration set did not include all the variability present in test was evaluated. Compared to partial least squares (PLS) regression, SVM regression was less affected by physical (particle size) and chemical (moisture) differences. The linearity of the SVM predicted values was also improved. Nevertheless, although visualization and interpretation tools have been developed to enhance the usability of SVM-based methods, work is yet to be done to provide chemometricians in the pharmaceutical industry with a regression method that can supplement PLS-based methods.
Li, Zhenghua; Cheng, Fansheng; Xia, Zhining
2011-01-01
The chemical structures of 114 polycyclic aromatic sulfur heterocycles (PASHs) have been studied using the molecular electronegativity-distance vector (MEDV). The linear relationships between the gas chromatographic retention index and the MEDV have been established by a multiple linear regression (MLR) model. The results of variable selection by stepwise multiple regression (SMR) and the predictive ability of the optimized model, appraised by leave-one-out cross-validation, showed that the optimized model, with a correlation coefficient (R) of 0.9947 and a cross-validated correlation coefficient (Rcv) of 0.9940, possessed the best statistical quality. Furthermore, when the 114 PASH compounds were divided into calibration and test sets in the ratio of 2:1, the statistical analysis showed that the models possess almost equal statistical quality, very similar regression coefficients, and good robustness. The quantitative structure-retention relationship (QSRR) model established may provide a convenient and powerful method for predicting the gas chromatographic retention of PASHs.
Javed, Faizan; Chan, Gregory S H; Savkin, Andrey V; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-01-01
This paper uses non-linear support vector regression (SVR) to model the blood volume and heart rate (HR) responses in 9 hemodynamically stable kidney failure patients during hemodialysis. Using radial basis function (RBF) kernels, non-parametric models of relative blood volume (RBV) change with time, as well as of percentage change in HR with respect to RBV, were obtained. The ε-insensitive loss function was used for SVR modeling. The design parameters, which include the capacity (C), the insensitivity region (ε), and the RBF kernel parameter (sigma), were selected using a grid search, and the selected models were cross-validated using the average mean square error (AMSE) calculated from testing data based on a k-fold cross-validation technique. Linear regression was also applied to fit the curves, and the AMSE was calculated for comparison with SVR. For the model based on RBV with time, SVR gave a lower AMSE for both training (AMSE=1.5) and testing data (AMSE=1.4) compared to linear regression (AMSE=1.8 and 1.5). SVR also provided a better fit for HR with RBV for both training and testing data (AMSE=15.8 and 16.4) compared to linear regression (AMSE=25.2 and 20.1).
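The ingredients named above (an RBF kernel, the ε-insensitive loss, and a grid over C, ε, and sigma) can be sketched as follows. The kernel parametrization exp(-||x - z||² / (2 sigma²)) is one common convention and is assumed here, as are the function names and the candidate grid values; the SVR solver and the k-fold loop are not shown.

```python
import math
from itertools import product


def rbf_kernel(x, z, sigma):
    """RBF kernel exp(-||x - z||^2 / (2 sigma^2)) between two vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def epsilon_insensitive(y_true, y_pred, epsilon):
    """Loss that ignores errors smaller than epsilon and is linear beyond it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def parameter_grid(c_values, epsilon_values, sigma_values):
    """All (C, epsilon, sigma) combinations to evaluate by cross-validation."""
    return list(product(c_values, epsilon_values, sigma_values))


# Hypothetical candidate values for the three design parameters.
grid = parameter_grid([1.0, 10.0, 100.0], [0.01, 0.1], [0.5, 1.0, 2.0])
```

Each grid entry would be scored by its average mean square error over the k folds, and the combination with the lowest error retained.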
NASA Astrophysics Data System (ADS)
Dougherty, Andrew W.
Metal oxides are a staple of the sensor industry. The combination of their sensitivity to a number of gases and the electrical nature of their sensing mechanism makes them particularly attractive in solid state devices. The high temperature stability of the ceramic material also makes them ideal for detecting combustion byproducts where exhaust temperatures can be high. However, problems do exist with metal oxide sensors. They are not very selective, as they all tend to be sensitive to a number of reduction and oxidation reactions on the oxide's surface. This makes arrays with large numbers of sensors interesting to study as a method for introducing orthogonality to the system. Also, the sensors tend to suffer from long term drift for a number of reasons. In this thesis I will develop a system for intelligently modeling metal oxide sensors and determining their suitability for use in large arrays designed to analyze exhaust gas streams. It will introduce prior knowledge of the metal oxide sensors' response mechanisms in order to produce a response function for each sensor from sparse training data. The system will use the same technique to model and remove any long term drift from the sensor response. It will also provide an efficient means for determining the orthogonality of the sensors, to establish whether they are useful in gas sensing arrays. The system is based on least squares support vector regression using the reciprocal kernel. The reciprocal kernel is introduced along with a method of optimizing the free parameters of the reciprocal kernel support vector machine. The reciprocal kernel is shown to be simpler and to perform better than an earlier kernel, the modified reciprocal kernel. Least squares support vector regression is chosen because it uses all of the training points, and an emphasis was placed throughout this research on extracting the maximum information from very sparse data.
The reciprocal kernel is shown to be effective in modeling the sensor responses in the time, gas and temperature domains, and the dual representation of the support vector regression solution is shown to provide insight into the sensor's sensitivity and potential orthogonality. Finally, the dual weights of the support vector regression solution to the sensor's response are suggested as a fitness function for a genetic algorithm, or some other method for efficiently searching large parameter spaces.
Fritscher, Karl; Schuler, Benedikt; Link, Thomas; Eckstein, Felix; Suhm, Norbert; Hänni, Markus; Hengg, Clemens; Schubert, Rainer
2008-01-01
Fractures of the proximal femur are one of the principal causes of mortality among elderly persons. Traditional methods for the determination of femoral fracture risk use methods for measuring bone mineral density. However, BMD alone is not sufficient to predict bone failure load for an individual patient and additional parameters have to be determined for this purpose. In this work an approach that uses statistical models of appearance to identify relevant regions and parameters for the prediction of biomechanical properties of the proximal femur will be presented. By using Support Vector Regression the proposed model based approach is capable of predicting two different biomechanical parameters accurately and fully automatically in two different testing scenarios.
Chen, Chau-Kuang; Bruce, Michelle; Tyler, Lauren; Brown, Claudine; Garrett, Angelica; Goggins, Susan; Lewis-Polite, Brandy; Weriwoh, Mirabel L; Juarez, Paul D.; Hood, Darryl B.; Skelton, Tyler
2014-01-01
The goal of this study was to analyze a 54-item instrument for assessment of perception of exposure to environmental contaminants within the context of the built environment, or exposome. This exposome was defined in five domains to include 1) home and hobby, 2) school, 3) community, 4) occupation, and 5) exposure history. Interviews were conducted with child-bearing-age minority women at Metro Nashville General Hospital at Meharry Medical College. Data were analyzed utilizing DTReg software for Support Vector Machine (SVM) modeling followed by an SPSS package for a logistic regression model. The target (outcome) variable of interest was respondent's residence by ZIP code. The results demonstrate that the rank order of important variables with respect to SVM modeling versus traditional logistic regression models is almost identical. This is the first study documenting that SVM analysis has discriminate power for determination of higher-ordered spatial relationships on an environmental exposure history questionnaire. PMID:23395953
MANCOVA for one way classification with homogeneity of regression coefficient vectors
NASA Astrophysics Data System (ADS)
Mokesh Rayalu, G.; Ravisankar, J.; Mythili, G. Y.
2017-11-01
MANOVA and MANCOVA are the extensions of the univariate ANOVA and ANCOVA techniques to multidimensional or vector-valued observations. In the statistical models of these techniques, the assumption of a Gaussian distribution is replaced with that of a multivariate Gaussian distribution for the data vectors and residual terms. The objective of MANCOVA is to determine whether statistically reliable mean differences can be demonstrated between groups after adjusting the newly created variable. When random assignment of samples or subjects to groups is not possible, multivariate analysis of covariance (MANCOVA) provides statistical matching of groups by adjusting dependent variables as if all subjects scored the same on the covariates. In this research article, the MANCOVA technique is extended to a larger number of covariates, and the homogeneity of regression coefficient vectors is also tested.
NASA Astrophysics Data System (ADS)
Thomas, Stephanie Margarete; Beierkuhnlein, Carl
2013-05-01
The occurrence of ectotherm disease vectors outside of their previous distribution area and the emergence of vector-borne diseases can be increasingly observed at a global scale and are accompanied by a growing number of studies which investigate the vast range of determining factors and their causal links. Consequently, a broad span of scientific disciplines is involved in tackling these complex phenomena. First, we evaluate the citation behaviour of relevant scientific literature in order to clarify the question "do scientists consider results of other disciplines to extend their expertise?" We then highlight emerging tools and concepts useful for risk assessment. Correlative models (regression-based, machine-learning and profile techniques), mechanistic models (basic reproduction number R0) and methods of spatial regression, interaction and interpolation are described. We discuss further steps towards multidisciplinary approaches regarding new tools and emerging concepts to combine existing approaches such as Bayesian geostatistical modelling, mechanistic models which avoid the need for parameter fitting, joined correlative and mechanistic models, multi-criteria decision analysis and geographic profiling. We take the quality of both occurrence data for vector, host and disease cases, and data of the predictor variables into consideration, as both determine the accuracy of risk area identification. Finally, we underline the importance of multidisciplinary research approaches. Even if the establishment of communication networks between scientific disciplines and the sharing of specific methods is time consuming, it promises new insights for the surveillance and control of vector-borne diseases worldwide.
Hsieh, Chung-Ho; Lu, Ruey-Hwa; Lee, Nai-Hsin; Chiu, Wen-Ta; Hsu, Min-Huei; Li, Yu-Chuan Jack
2011-01-01
Diagnosing acute appendicitis clinically is still difficult. We developed random forests, support vector machines, and artificial neural network models to diagnose acute appendicitis. Between January 2006 and December 2008, patients who had a consultation session with surgeons for suspected acute appendicitis were enrolled. Seventy-five percent of the data set was used to construct models including random forest, support vector machines, artificial neural networks, and logistic regression. Twenty-five percent of the data set was withheld to evaluate model performance. The area under the receiver operating characteristic curve (AUC) was used to evaluate performance, which was compared with that of the Alvarado score. Data from a total of 180 patients were collected, 135 used for training and 45 for testing. The mean age of patients was 39.4 years (range, 16-85). Final diagnosis revealed 115 patients with and 65 without appendicitis. The AUC of random forest, support vector machines, artificial neural networks, logistic regression, and Alvarado was 0.98, 0.96, 0.91, 0.87, and 0.77, respectively. The sensitivity, specificity, positive, and negative predictive values of random forest were 94%, 100%, 100%, and 87%, respectively. Random forest performed better than artificial neural networks, logistic regression, and Alvarado. We demonstrated that random forest can predict acute appendicitis with good accuracy and, deployed appropriately, can be an effective tool in clinical decision making. Copyright © 2011 Mosby, Inc. All rights reserved.
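The four diagnostic measures reported for random forest follow directly from a 2x2 confusion matrix, which the short sketch below illustrates. The counts are hypothetical: they were chosen only so that a 45-patient test set reproduces the rounded percentages quoted above, and they are not taken from the paper.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts (assumption, not source data): 45 test patients in total.
metrics = diagnostic_metrics(tp=30, fp=0, tn=13, fn=2)
for name, value in metrics.items():
    print(f"{name}: {round(100 * value)}%")
```

With no false positives, specificity and positive predictive value are both 100%, while the two missed cases lower sensitivity and negative predictive value.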
Van Belle, Vanya; Pelckmans, Kristiaan; Van Huffel, Sabine; Suykens, Johan A K
2011-10-01
To compare and evaluate ranking, regression and combined machine learning approaches for the analysis of survival data. The literature describes two approaches based on support vector machines to deal with censored observations. In the first approach the key idea is to rephrase the task as a ranking problem via the concordance index, a problem which can be solved efficiently in a context of structural risk minimization and convex optimization techniques. In a second approach, one uses a regression approach, dealing with censoring by means of inequality constraints. The goal of this paper is then twofold: (i) introducing a new model combining the ranking and regression strategy, which retains the link with existing survival models such as the proportional hazards model via transformation models; and (ii) comparison of the three techniques on 6 clinical and 3 high-dimensional datasets and discussing the relevance of these techniques over classical approaches for survival data. We compare SVM-based survival models based on ranking constraints, based on regression constraints and models based on both ranking and regression constraints. The performance of the models is compared by means of three different measures: (i) the concordance index, measuring the model's discriminating ability; (ii) the logrank test statistic, indicating whether patients with a prognostic index lower than the median prognostic index have a significantly different survival than patients with a prognostic index higher than the median; and (iii) the hazard ratio after normalization to restrict the prognostic index between 0 and 1. Our results indicate a significantly better performance for models including regression constraints above models only based on ranking constraints. This work gives empirical evidence that SVM-based models using regression constraints perform significantly better than SVM-based models based on ranking constraints.
Our experiments show a comparable performance for methods including only regression or both regression and ranking constraints on clinical data. On high dimensional data, the former model performs better. However, this approach does not have a theoretical link with standard statistical models for survival data. This link can be made by means of transformation models when ranking constraints are included. Copyright © 2011 Elsevier B.V. All rights reserved.
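The concordance index used above as a performance measure can be sketched in a few lines. This is a plain pairwise implementation under stated assumptions: a higher prognostic score is taken to mean longer predicted survival, tied scores count as half-concordant, and a pair is comparable only when the shorter observed time is an actual event. It is an illustration, not the authors' code.

```python
def concordance_index(times, events, scores):
    """Fraction of comparable pairs ordered correctly by the prognostic score.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the observation is censored
    scores : prognostic index, assumed higher for longer survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Subject i must have an observed event strictly before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] < scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): the second subject is censored.
print(concordance_index([2, 4, 6, 8], [1, 0, 1, 1], [1, 3, 5, 4]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering; in the toy data one of the four comparable pairs is ordered incorrectly.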
Marek K. Jakubowksi; Qinghua Guo; Brandon Collins; Scott Stephens; Maggi Kelly
2013-01-01
We compared the ability of several classification and regression algorithms to predict forest stand structure metrics and standard surface fuel models. Our study area spans a dense, topographically complex Sierra Nevada mixed-conifer forest. We used clustering, regression trees, and support vector machine algorithms to analyze high density (average 9 pulses/m
Electricity Load Forecasting Using Support Vector Regression with Memetic Algorithms
Hu, Zhongyi; Bao, Yukun; Xiong, Tao
2013-01-01
Electricity load forecasting is an important issue that is widely explored and examined in the power systems operation literature and in the literature on commercial transactions in electricity markets. Among the existing forecasting models, support vector regression (SVR) has gained much attention. Because the performance of SVR depends heavily on its parameters, this study proposes a firefly algorithm (FA) based memetic algorithm (FA-MA) to appropriately determine the parameters of the SVR forecasting model. In the proposed FA-MA algorithm, the FA is applied to explore the solution space, and pattern search is used to conduct individual learning and thus enhance the exploitation of FA. Experimental results confirm that the proposed FA-MA based SVR model not only yields more accurate forecasting results than four other evolutionary-algorithm-based SVR models and three well-known forecasting models but also outperforms the hybrid algorithms in the related existing literature. PMID:24459425
Using Time Series Analysis to Predict Cardiac Arrest in a PICU.
Kennedy, Curtis E; Aoki, Noriaki; Mariscalco, Michele; Turley, James P
2015-11-01
To build and test cardiac arrest prediction models in a PICU, using time series analysis as input, and to measure changes in prediction accuracy attributable to different classes of time series data. Retrospective cohort study. Thirty-one bed academic PICU that provides care for medical and general surgical (not congenital heart surgery) patients. Patients experiencing a cardiac arrest in the PICU and requiring external cardiac massage for at least 2 minutes. None. One hundred three cases of cardiac arrest and 109 control cases were used to prepare a baseline dataset that consisted of 1,025 variables in four data classes: multivariate, raw time series, clinical calculations, and time series trend analysis. We trained 20 arrest prediction models using a matrix of five feature sets (combinations of data classes) with four modeling algorithms: linear regression, decision tree, neural network, and support vector machine. The reference model (multivariate data with regression algorithm) had an accuracy of 78% and 87% area under the receiver operating characteristic curve. The best model (multivariate + trend analysis data with support vector machine algorithm) had an accuracy of 94% and 98% area under the receiver operating characteristic curve. Cardiac arrest predictions based on a traditional model built with multivariate data and a regression algorithm misclassified cases 3.7 times more frequently than predictions that included time series trend analysis and built with a support vector machine algorithm. Although the final model lacks the specificity necessary for clinical application, we have demonstrated how information from time series data can be used to increase the accuracy of clinical prediction models.
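The "3.7 times" figure appears consistent with the ratio of the two models' error rates, which the following short check works through. This reading is an inference from the reported accuracies, not a calculation stated in the abstract.

```python
def error_ratio(acc_reference, acc_best):
    """Ratio of misclassification rates of two models given their accuracies."""
    return (1.0 - acc_reference) / (1.0 - acc_best)


# Reference model: 78% accuracy; best model: 94% accuracy.
ratio = error_ratio(0.78, 0.94)
print(round(ratio, 1))  # 3.7
```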
NASA Astrophysics Data System (ADS)
Delbari, Masoomeh; Sharifazari, Salman; Mohammadi, Ehsan
2018-02-01
Knowledge of soil temperature at different depths is important for the agricultural industry and for understanding climate change. The aim of this study is to evaluate the performance of a support vector regression (SVR)-based model in estimating daily soil temperature at 10, 30 and 100 cm depth under different climate conditions over Iran. The results were compared to those obtained from a more classical multiple linear regression (MLR) model. The correlation sensitivity for the input combinations and the periodicity effect were also investigated. Climatic data used as inputs to the models were minimum and maximum air temperature, solar radiation, relative humidity, dew point, and atmospheric pressure (reduced to sea level), collected from five synoptic stations, Kerman, Ahvaz, Tabriz, Saghez, and Rasht, located respectively in hyper-arid, arid, semi-arid, Mediterranean, and hyper-humid climate conditions. According to the results, both the MLR and SVR models performed quite well at the surface layer, i.e., 10-cm depth. However, SVR performed better than MLR in estimating soil temperature at deeper layers, especially at 100 cm depth. Moreover, both models performed better in humid climate conditions than in arid and hyper-arid areas. Further, adding a periodicity component to the modeling process considerably improved the models' performance, especially in the case of SVR.
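One common way to add a periodicity component to a regression model is to append sine and cosine terms of the day of the year to the input vector. The sketch below shows that encoding; it is an assumption for illustration, since the abstract does not say how the periodicity component was constructed, and the function name and the 365.25-day period are hypothetical choices.

```python
import math


def periodic_features(day_of_year, period=365.25):
    """Encode the position within the annual cycle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)


# Appending the pair to the climatic inputs lets a model see the season
# without a discontinuity between the last and first day of the year.
climatic_inputs = [12.5, 24.0, 18.3]  # hypothetical values for one day
features = climatic_inputs + list(periodic_features(200))
print(len(features))  # 5
```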
TWSVR: Regression via Twin Support Vector Machine.
Khemchandani, Reshma; Goyal, Keshav; Chandra, Suresh
2016-02-01
Taking motivation from Twin Support Vector Machine (TWSVM) formulation, Peng (2010) attempted to propose Twin Support Vector Regression (TSVR) where the regressor is obtained via solving a pair of quadratic programming problems (QPPs). In this paper we argue that TSVR formulation is not in the true spirit of TWSVM. Further, taking motivation from Bi and Bennett (2003), we propose an alternative approach to find a formulation for Twin Support Vector Regression (TWSVR) which is in the true spirit of TWSVM. We show that our proposed TWSVR can be derived from TWSVM for an appropriately constructed classification problem. To check the efficacy of our proposed TWSVR we compare its performance with TSVR and classical Support Vector Regression (SVR) on various regression datasets. Copyright © 2015 Elsevier Ltd. All rights reserved.
Rainfall-induced Landslide Susceptibility assessment at the Longnan county
NASA Astrophysics Data System (ADS)
Hong, Haoyuan; Zhang, Ying
2017-04-01
Landslides are a serious hazard in Longnan county, China, so landslide susceptibility assessment is a useful tool for government and decision making. The main objective of this study is to investigate and compare the frequency ratio, support vector machine, and logistic regression methods. Longnan county (Jiangxi province, China) was selected as the case study. First, a landslide inventory map with 354 landslide locations was constructed, and the landslide locations were randomly divided in a 70/30 ratio for training and validating the models. Second, fourteen landslide conditioning factors were prepared: slope, aspect, altitude, topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), plan curvature, lithology, distance to faults, distance to rivers, distance to roads, land use, normalized difference vegetation index (NDVI), and rainfall. Using the frequency ratio, support vector machines, and logistic regression, a total of three landslide susceptibility models were constructed. Finally, the overall performance of the resulting models was assessed and compared using the receiver operating characteristic (ROC) curve technique. The results showed that the support vector machine model is the best model in the study area, with a success rate of 88.39 % and a prediction rate of 84.06 %.
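The frequency ratio method mentioned above is usually computed per class of a conditioning factor as the share of landslides falling in that class divided by the share of the study area in that class. A minimal sketch follows; the slope classes and counts are hypothetical and serve only to show the arithmetic.

```python
def frequency_ratio(landslides_per_class, cells_per_class):
    """Frequency ratio per class: landslide share divided by area share."""
    total_landslides = sum(landslides_per_class.values())
    total_cells = sum(cells_per_class.values())
    return {
        cls: (landslides_per_class[cls] / total_landslides)
        / (cells_per_class[cls] / total_cells)
        for cls in cells_per_class
    }


# Hypothetical slope classes (degrees) with landslide and grid-cell counts.
landslides = {"0-15": 20, "15-30": 50, ">30": 30}
cells = {"0-15": 5000, "15-30": 3000, ">30": 2000}
for cls, fr in frequency_ratio(landslides, cells).items():
    print(cls, round(fr, 2))
```

A ratio above 1 indicates that landslides are over-represented in that class relative to its area, and a ratio below 1 indicates the opposite.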
NASA Astrophysics Data System (ADS)
Heddam, Salim; Kisi, Ozgur
2018-04-01
In the present study, three types of artificial intelligence techniques, least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5T), are applied for modeling daily dissolved oxygen (DO) concentration using several water quality variables as inputs. The DO concentration and water quality data from three stations operated by the United States Geological Survey (USGS) were used for developing the three models. The selected water quality data, consisting of daily measurements of water temperature (TE, °C), pH (std. unit), specific conductance (SC, μS/cm) and discharge (DI, cfs), are used as inputs to the LSSVM, MARS and M5T models. The three models were applied for each station separately and compared to each other. According to the results obtained, it was found that: (i) the DO concentration could be successfully estimated using the three models and (ii) the best model among all others differs from one station to another.
Estimating top-of-atmosphere thermal infrared radiance using MERRA-2 atmospheric data
NASA Astrophysics Data System (ADS)
Kleynhans, Tania; Montanaro, Matthew; Gerace, Aaron; Kanan, Christopher
2017-05-01
Thermal infrared satellite images have been widely used in environmental studies. However, satellites have limited temporal resolution, e.g., 16 day Landsat or 1 to 2 day Terra MODIS. This paper investigates the use of the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) reanalysis data product, produced by NASA's Global Modeling and Assimilation Office (GMAO) to predict global top-of-atmosphere (TOA) thermal infrared radiance. The high temporal resolution of the MERRA-2 data product presents opportunities for novel research and applications. Various methods were applied to estimate TOA radiance from MERRA-2 variables, namely (1) a parameterized physics based method, (2) linear regression models and (3) non-linear Support Vector Regression. Model prediction accuracy was evaluated using temporally and spatially coincident Moderate Resolution Imaging Spectroradiometer (MODIS) thermal infrared data as reference data. This research found that Support Vector Regression with a radial basis function kernel produced the lowest error rates. Sources of errors are discussed and defined. Further research is currently being conducted to train deep learning models to predict TOA thermal radiance.
NASA Astrophysics Data System (ADS)
Dokuchaev, P. M.; Meshalkina, J. L.; Yaroslavtsev, A. M.
2018-01-01
A comparative analysis of geospatial soil modeling using multinomial logistic regression, decision trees, random forest, regression trees and support vector machine algorithms was conducted. Visual interpretation of the resulting digital maps and their comparison with the existing map, together with a quantitative assessment of the overall accuracy of detecting individual soil groups and of the models' kappa, showed that multiple logistic regression, support vector and random forest models, applied to spatial prediction of the distribution of the conditional soil groups, can be reliably used for mapping the study area. The most accurate detection was obtained for lightly eroded and moderately eroded sod-podzolic soils (Phaeozems Albic). In second place, by mean overall prediction accuracy, were non-eroded and warp sod-podzolic soils, as well as sod-gley soils (Umbrisols Gleyic) and alluvial soils (Fluvisols Dystric, Umbric). Heavily eroded sod-podzolic soils and gray forest soils (Phaeozems Albic) were detected worst of all by the automatic classification methods.
The dynamic correlation between policy uncertainty and stock market returns in China
NASA Astrophysics Data System (ADS)
Yang, Miao; Jiang, Zhi-Qiang
2016-11-01
The dynamic correlation is examined between government's policy uncertainty and Chinese stock market returns in the period from January 1995 to December 2014. We find that the stock market is significantly correlated to policy uncertainty based on the results of the Vector Auto Regression (VAR) and Structural Vector Auto Regression (SVAR) models. In contrast, the results of the Dynamic Conditional Correlation Generalized Multivariate Autoregressive Conditional Heteroscedasticity (DCC-MGARCH) model surprisingly show a low dynamic correlation coefficient between policy uncertainty and market returns, suggesting that the fluctuations of each variable are greatly influenced by their values in the preceding period. Our analysis highlights the understanding of the dynamical relationship between stock market and fiscal and monetary policy.
Alwee, Razana; Hj Shamsuddin, Siti Mariyam; Sallehuddin, Roselina
2013-01-01
Crime forecasting is an important area in the field of criminology. Linear models, such as regression and econometric models, are commonly applied in crime forecasting. However, real crime data commonly consist of both linear and nonlinear components, and a single model may not be sufficient to identify all the characteristics of the data. The purpose of this study is to introduce a hybrid model that combines support vector regression (SVR) and autoregressive integrated moving average (ARIMA) for crime rate forecasting. SVR is very robust with small training data and high-dimensional problems. Meanwhile, ARIMA has the ability to model several types of time series. However, the accuracy of the SVR model depends on the values of its parameters, while ARIMA is not robust when applied to small data sets. Therefore, to overcome this problem, particle swarm optimization is used to estimate the parameters of the SVR and ARIMA models. The proposed hybrid model is used to forecast the property crime rates of the United States based on economic indicators. The experimental results show that the proposed hybrid model is able to produce more accurate forecasting results than the individual models. PMID:23766729
Soft computing techniques toward modeling the water supplies of Cyprus.
Iliadis, L; Maris, F; Tachos, S
2011-10-01
This research effort aims at the application of soft computing techniques to water resources management. More specifically, the target is the development of reliable soft computing models capable of estimating the water supply for the case of the "Germasogeia" mountainous watersheds in Cyprus. Initially, ε-Regression Support Vector Machines (ε-RSVM) and fuzzy weighted ε-RSVMR models have been developed that accept five input parameters. At the same time, reliable artificial neural networks have been developed to perform the same job. The 5-fold cross validation approach has been employed in order to eliminate bad local behaviors and to produce a more representative training data set. Thus, the fuzzy weighted Support Vector Regression (SVR) combined with the fuzzy partition has been employed in an effort to enhance the quality of the results. Several rational and reliable models have been produced that can enhance the efficiency of water policy designers. Copyright © 2011 Elsevier Ltd. All rights reserved.
2015-01-07
vector that helps to manage, predict, and mitigate the risk in the original variable. Residual risk can be exemplified as a quantification of the improved... the random variable of interest is viewed in concert with a related random vector that helps to manage, predict, and mitigate the risk in the original... measures of risk. They view a random variable of interest in concert with an auxiliary random vector that helps to manage, predict and mitigate the risk
Forecasting Daily Patient Outflow From a Ward Having No Real-Time Clinical Data
Tran, Truyen; Luo, Wei; Phung, Dinh; Venkatesh, Svetha
2016-01-01
Background: Modeling patient flow is crucial in understanding resource demand and prioritization. We study patient outflow from an open ward in an Australian hospital, where currently bed allocation is carried out by a manager relying on past experiences and looking at demand. Automatic methods that provide a reasonable estimate of total next-day discharges can aid in efficient bed management. The challenges in building such methods lie in dealing with large amounts of discharge noise introduced by the nonlinear nature of hospital procedures, and the nonavailability of real-time clinical information in wards. Objective: Our study investigates different models to forecast the total number of next-day discharges from an open ward having no real-time clinical data. Methods: We compared 5 popular regression algorithms to model total next-day discharges: (1) autoregressive integrated moving average (ARIMA), (2) the autoregressive moving average with exogenous variables (ARMAX), (3) k-nearest neighbor regression, (4) random forest regression, and (5) support vector regression. Although the autoregressive integrated moving average model relied on past 3-month discharges, nearest neighbor forecasting used median of similar discharges in the past in estimating next-day discharge. In addition, the ARMAX model used the day of the week and number of patients currently in ward as exogenous variables. For the random forest and support vector regression models, we designed a predictor set of 20 patient features and 88 ward-level features. Results: Our data consisted of 12,141 patient visits over 1826 days. Forecasting quality was measured using mean forecast error, mean absolute error, symmetric mean absolute percentage error, and root mean square error. When compared with a moving average prediction model, all 5 models demonstrated superior performance with the random forests achieving 22.7% improvement in mean absolute error, for all days in the year 2014.
Conclusions: In the absence of clinical information, our study recommends using patient-level and ward-level data in predicting next-day discharges. Random forest and support vector regression models are able to use all available features from such data, resulting in superior performance over traditional autoregressive methods. An intelligent estimate of available beds in wards plays a crucial role in relieving access block in emergency departments. PMID:27444059
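The four forecast quality measures named in the Results can be sketched as follows. Two conventions here are assumptions, since definitions vary between authors and the abstract does not give formulas: the forecast error is taken as actual minus forecast, and the symmetric MAPE uses the mean of the absolute actual and forecast values in the denominator. The discharge counts are hypothetical.

```python
import math


def forecast_metrics(actual, forecast):
    """Mean forecast error, MAE, RMSE and symmetric MAPE (in percent)."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    smape_terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2.0
        # A day with zero actual and zero forecast contributes no error.
        smape_terms.append(abs(a - f) / denom if denom > 0 else 0.0)
    return {
        "mfe": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "smape": 100.0 * sum(smape_terms) / n,
    }


def relative_improvement(baseline_error, model_error):
    """Percentage reduction of an error measure relative to a baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical daily discharge counts and forecasts.
print(forecast_metrics([10, 12, 8, 10], [12, 12, 6, 10]))
print(relative_improvement(2.0, 1.5))
```

The mean forecast error reveals systematic over- or under-forecasting because positive and negative errors cancel, whereas the other three measures summarize error magnitude.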
Robust support vector regression networks for function approximation with outliers.
Chuang, Chen-Chia; Su, Shun-Feng; Jeng, Jin-Tsong; Hsiao, Chih-Ching
2002-01-01
Support vector regression (SVR) employs the support vector machine (SVM) to tackle problems of function approximation and regression estimation. SVR has been shown to have good robustness against noise. However, when the parameters used in SVR are improperly selected, overfitting may still occur, and the selection of the various parameters is not straightforward. Besides, in SVR, outliers may be taken as support vectors, and such an inclusion of outliers among the support vectors may lead to serious overfitting. In this paper, a novel regression approach, termed the robust support vector regression (RSVR) network, is proposed to enhance the robustness of SVR. In the approach, traditional robust learning approaches are employed to improve the learning performance for any selected parameters. In the simulation results, RSVR improved the performance of the learned systems in all cases. Moreover, even when training lasted for a long period, the testing errors did not go up. In other words, the overfitting phenomenon is indeed suppressed.
[Mapping environmental vulnerability from ETM + data in the Yellow River Mouth Area].
Wang, Rui-Yan; Yu, Zhen-Wen; Xia, Yan-Ling; Wang, Xiang-Feng; Zhao, Geng-Xing; Jiang, Shu-Qian
2013-10-01
The environmental vulnerability retrieval is important to support continuing data. The spatial distribution of regional environmental vulnerability was obtained through remote sensing retrieval. In view of soil and vegetation, the environmental vulnerability evaluation index system was built, and the environmental vulnerability of sampling points was calculated by the AHP-fuzzy method. The correlation between the environmental vulnerability of the sampling points and the ETM+ spectral reflectance ratio, including several kinds of converted data, was then analyzed to determine the sensitive spectral parameters. Based on that, models of correlation analysis, traditional regression, BP neural network and support vector regression were used to explain the quantitative relationship between the spectral reflectance and the environmental vulnerability. With this model, the environmental vulnerability distribution was retrieved in the Yellow River Mouth Area. The results showed that the correlation between the environmental vulnerability and the spring NDVI, the September NDVI and the spring brightness was better than the others, so they were selected as the sensitive spectral parameters. The model precision result showed that in addition to the support vector model, the other models reached the significant level. All the multi-variable regressions were better than all the one-variable regressions, and the model accuracy of the BP neural network was the best. This study will serve as a reliable theoretical reference for the large spatial scale environmental vulnerability estimation based on remote sensing data.
Quantile regression via vector generalized additive models.
Yee, Thomas W
2004-07-30
One of the most popular methods for quantile regression is the LMS method of Cole and Green. The method naturally falls within a penalized likelihood framework, and consequently allows for considerable flexibility because all three parameters may be modelled by cubic smoothing splines. The model is also very understandable: for a given value of the covariate, the LMS method applies a Box-Cox transformation to the response in order to transform it to standard normality; to obtain the quantiles, an inverse Box-Cox transformation is applied to the quantiles of the standard normal distribution. The purposes of this article are three-fold. Firstly, LMS quantile regression is presented within the framework of the class of vector generalized additive models. This confers a number of advantages such as a unifying theory and estimation process. Secondly, a new LMS method based on the Yeo-Johnson transformation is proposed, which has the advantage that the response is not restricted to be positive. Lastly, this paper describes a software implementation of three LMS quantile regression methods in the S language. This includes the LMS-Yeo-Johnson method, which is estimated efficiently by a new numerical integration scheme. The LMS-Yeo-Johnson method is illustrated by way of a large cross-sectional data set from a New Zealand working population. Copyright 2004 John Wiley & Sons, Ltd.
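The quantile step described in this abstract (an inverse Box-Cox transformation applied to standard normal quantiles) can be sketched as below, using the usual Cole-Green parametrisation with λ (power), μ (median) and σ (coefficient of variation). The function names and the fixed parameter values in the usage note are illustrative assumptions; in the actual method all three parameters are smooth functions of the covariate, and this is not the paper's S implementation.

```python
import math
from statistics import NormalDist


def lms_quantile(alpha, lam, mu, sigma):
    """alpha-quantile of the response under the LMS (Box-Cox normal) model."""
    z = NormalDist().inv_cdf(alpha)        # standard normal quantile
    if abs(lam) < 1e-12:                   # limiting case lambda -> 0 (log-normal)
        return mu * math.exp(sigma * z)
    base = 1 + lam * sigma * z
    if base <= 0:
        raise ValueError("quantile undefined for these parameter values")
    return mu * base ** (1 / lam)          # inverse Box-Cox transformation


def lms_z_score(y, lam, mu, sigma):
    """Forward step: Box-Cox transform a positive response to a z-score."""
    if abs(lam) < 1e-12:
        return math.log(y / mu) / sigma
    return ((y / mu) ** lam - 1) / (lam * sigma)
```

For example, `lms_quantile(0.5, 0.3, 20.0, 0.1)` returns the median `mu`, and `lms_z_score` inverts `lms_quantile` for any quantile level.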
Improved animal models for testing gene therapy for atherosclerosis.
Du, Liang; Zhang, Jingwan; De Meyer, Guido R Y; Flynn, Rowan; Dichek, David A
2014-04-01
Gene therapy delivered to the blood vessel wall could augment current therapies for atherosclerosis, including systemic drug therapy and stenting. However, identification of clinically useful vectors and effective therapeutic transgenes remains at the preclinical stage. Identification of effective vectors and transgenes would be accelerated by availability of animal models that allow practical and expeditious testing of vessel-wall-directed gene therapy. Such models would include humanlike lesions that develop rapidly in vessels that are amenable to efficient gene delivery. Moreover, because human atherosclerosis develops in normal vessels, gene therapy that prevents atherosclerosis is most logically tested in relatively normal arteries. Similarly, gene therapy that causes atherosclerosis regression requires gene delivery to an existing lesion. Here we report development of three new rabbit models for testing vessel-wall-directed gene therapy that either prevents or reverses atherosclerosis. Carotid artery intimal lesions in these new models develop within 2-7 months after initiation of a high-fat diet and are 20-80 times larger than lesions in a model we described previously. Individual models allow generation of lesions that are relatively rich in either macrophages or smooth muscle cells, permitting testing of gene therapy strategies targeted at either cell type. Two of the models include gene delivery to essentially normal arteries and will be useful for identifying strategies that prevent lesion development. The third model generates lesions rapidly in vector-naïve animals and can be used for testing gene therapy that promotes lesion regression. These models are optimized for testing helper-dependent adenovirus (HDAd)-mediated gene therapy; however, they could be easily adapted for testing of other vectors or of different types of molecular therapies, delivered directly to the blood vessel wall. 
Our data also supports the promise of HDAd to deliver long-term therapy from vascular endothelium without accelerating atherosclerotic disease.
NASA Astrophysics Data System (ADS)
Zounemat-Kermani, Mohammad
2012-08-01
In this study, the ability of two models, multiple linear regression (MLR) and a Levenberg-Marquardt (LM) feed-forward neural network, was examined to estimate the hourly dew point temperature. Dew point temperature is the temperature at which water vapor in the air condenses into liquid. This temperature can be useful in estimating meteorological variables such as fog, rain, snow, dew, and evapotranspiration and in investigating agronomical issues such as stomatal closure in plants. The availability of hourly records of climatic data (air temperature, relative humidity and pressure) which could be used to predict dew point temperature initiated the practice of modeling. Additionally, the wind vector (wind speed magnitude and direction) and conceptual input of weather condition were employed as other input variables. Three standard quantitative statistical performance evaluation measures, i.e. the root mean squared error, mean absolute error, and absolute logarithmic Nash-Sutcliffe efficiency coefficient (|Log(NS)|), were employed to evaluate the performances of the developed models. The results showed that applying wind vector and weather condition as input vectors along with meteorological variables could slightly increase the ANN and MLR predictive accuracy. The results also revealed that LM-NN was superior to the MLR model and that the best performance was obtained by considering all potential input variables in terms of different evaluation criteria.
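The Nash-Sutcliffe efficiency behind the |Log(NS)| measure can be sketched as follows. The efficiency itself is the standard definition; the base of the logarithm is not stated in the abstract, so base 10 is an assumption here, and the function names are hypothetical.

```python
import math


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    if den == 0:
        raise ValueError("observed series has zero variance")
    return 1 - num / den


def abs_log_ns(observed, simulated):
    """|log10(NS)|; base 10 is assumed. Defined only for NS > 0."""
    ns = nash_sutcliffe(observed, simulated)
    if ns <= 0:
        raise ValueError("logarithm undefined for NS <= 0")
    return abs(math.log10(ns))
```

Under this reading, values of |Log(NS)| closer to zero indicate a better model, since a perfect fit gives NS = 1.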
Applications of Support Vector Machines In Chemo And Bioinformatics
NASA Astrophysics Data System (ADS)
Jayaraman, V. K.; Sundararajan, V.
2010-10-01
Conventional linear & nonlinear tools for classification, regression & data driven modeling are being replaced on a rapid scale by newer techniques & tools based on artificial intelligence and machine learning. While the linear techniques are not applicable for inherently nonlinear problems, newer methods serve as attractive alternatives for solving real life problems. Support Vector Machine (SVM) classifiers are a set of universal feed-forward network based classification algorithms that have been formulated from statistical learning theory and structural risk minimization principle. SVM regression closely follows the classification methodology. In this work recent applications of SVM in Chemo & Bioinformatics will be described with suitable illustrative examples.
Zhou, Pei-pei; Shan, Jin-feng; Jiang, Jian-lan
2015-12-01
The aim was to optimize the microwave-assisted extraction of curcuminoids from Curcuma longa. On the basis of single-factor experiments, the ethanol concentration, the liquid-to-solid ratio and the microwave time were selected for further optimization. Support Vector Regression (SVR) and Central Composite Design-Response Surface Methodology (CCD) were used to design and establish models respectively, while Particle Swarm Optimization (PSO) was introduced to optimize the parameters of the SVR models and to search for the optimal points of the models. The evaluation indicator was the sum of curcumin, demethoxycurcumin and bisdemethoxycurcumin determined by HPLC. The optimal parameters of microwave-assisted extraction were as follows: ethanol concentration of 69%, liquid-to-solid ratio of 21 : 1, microwave time of 55 s. Under those conditions, the sum of the three curcuminoids was 28.97 mg/g (per gram of rhizome powder). Both the CCD model and the SVR model were credible, as they predicted similar process conditions and the deviations of yield were less than 1.2%.
Blood glucose level prediction based on support vector regression using mobile platforms.
Reymann, Maximilian P; Dorschky, Eva; Groh, Benjamin H; Martindale, Christine; Blank, Peter; Eskofier, Bjoern M
2016-08-01
The correct treatment of diabetes is vital to a patient's health: Staying within defined blood glucose levels prevents dangerous short- and long-term effects on the body. Mobile devices informing patients about their future blood glucose levels could enable them to take counter-measures to prevent hypo or hyper periods. Previous work addressed this challenge by predicting the blood glucose levels using regression models. However, these approaches required a physiological model, representing the human body's response to insulin and glucose intake, or are not directly applicable to mobile platforms (smart phones, tablets). In this paper, we propose an algorithm for mobile platforms to predict blood glucose levels without the need for a physiological model. Using an online software simulator program, we trained a Support Vector Regression (SVR) model and exported the parameter settings to our mobile platform. The prediction accuracy of our mobile platform was evaluated with pre-recorded data of a type 1 diabetes patient. The blood glucose level was predicted with an error of 19 % compared to the true value. Considering the permitted error of commercially used devices of 15 %, our algorithm is the basis for further development of mobile prediction algorithms.
Spacebased Estimation of Moisture Transport in Marine Atmosphere Using Support Vector Regression
NASA Technical Reports Server (NTRS)
Xie, Xiaosu; Liu, W. Timothy; Tang, Benyang
2007-01-01
An improved algorithm is developed based on support vector regression (SVR) to estimate horizontal water vapor transport integrated through the depth of the atmosphere (Θ) over the global ocean from observations of surface wind-stress vector by QuikSCAT, cloud drift wind vector derived from the Multi-angle Imaging SpectroRadiometer (MISR) and geostationary satellites, and precipitable water from the Special Sensor Microwave/Imager (SSM/I). The statistical relation is established between the input parameters (the surface wind stress, the 850 mb wind, the precipitable water, time and location) and the target data (Θ calculated from rawinsondes and reanalysis of numerical weather prediction model). The results are validated with independent daily rawinsonde observations, monthly mean reanalysis data, and through regional water balance. This study clearly demonstrates the improvement of Θ derived from satellite data using SVR over previous data sets based on linear regression and neural network. The SVR methodology reduces both mean bias and standard deviation compared with rawinsonde observations. It agrees better with observations from synoptic to seasonal time scales, and compares more favorably with the reanalysis data on seasonal variations. Only the SVR result can achieve the water balance over South America. The rationale for the advantage of the SVR method and the impact of adding the upper level wind will also be discussed.
NASA Astrophysics Data System (ADS)
Valizadeh, Maryam; Sohrabi, Mahmoud Reza
2018-03-01
In the present study, artificial neural networks (ANNs) and support vector regression (SVR) were used as intelligent methods coupled with UV spectroscopy for the simultaneous quantitative determination of Dorzolamide (DOR) and Timolol (TIM) in eye drops. Several synthetic mixtures were analyzed to validate the proposed methods. First, a neural network time series model, one type of artificial neural network, was employed and its efficiency was evaluated. Afterwards, the radial basis network was applied as another neural network. Results showed that the performance of this method is suitable for prediction. Finally, support vector regression was proposed to construct the Zilomole prediction model. The root mean square error (RMSE) and mean recovery (%) were calculated for the SVR method. Moreover, the proposed methods were compared to high-performance liquid chromatography (HPLC) as a reference method. A one-way analysis of variance (ANOVA) test at the 95% confidence level, applied to compare the results of the suggested and reference methods, showed no significant differences between them. The effect of interferences was also investigated in spiked solutions.
Bias and uncertainty in regression-calibrated models of groundwater flow in heterogeneous media
Cooley, R.L.; Christensen, S.
2006-01-01
Groundwater models need to account for detailed but generally unknown spatial variability (heterogeneity) of the hydrogeologic model inputs. To address this problem we replace the large, m-dimensional stochastic vector β that reflects both small and large scales of heterogeneity in the inputs by a lumped or smoothed m-dimensional approximation γθ*, where γ is an interpolation matrix and θ* is a stochastic vector of parameters. Vector θ* has small enough dimension to allow its estimation with the available data. The consequence of the replacement is that model function f(γθ*) written in terms of the approximate inputs is in error with respect to the same model function written in terms of β, f(β), which is assumed to be nearly exact. The difference f(β) - f(γθ*), termed model error, is spatially correlated, generates prediction biases, and causes standard confidence and prediction intervals to be too small. Model error is accounted for in the weighted nonlinear regression methodology developed to estimate θ* and assess model uncertainties by incorporating the second-moment matrix of the model errors into the weight matrix. Techniques developed by statisticians to analyze classical nonlinear regression methods are extended to analyze the revised method. The analysis develops analytical expressions for bias terms reflecting the interaction of model nonlinearity and model error, for correction factors needed to adjust the sizes of confidence and prediction intervals for this interaction, and for correction factors needed to adjust the sizes of confidence and prediction intervals for possible use of a diagonal weight matrix in place of the correct one. If terms expressing the degree of intrinsic nonlinearity for f(β) and f(γθ*) are small, then most of the biases are small and the correction factors are reduced in magnitude.
Biases, correction factors, and confidence and prediction intervals were obtained for a test problem for which model error is large to test robustness of the methodology. Numerical results conform with the theoretical analysis. © 2005 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Bilal, Maria; Bilal, Muhammad; Saleem, Muhammad; Khurram, Muhammad; Khan, Saranjam; Ullah, Rahat; Ali, Hina; Ahmed, Mushtaq; Shahzada, Shaista; Ullah Khan, Ehsan
2017-04-01
A Raman spectroscopy based investigation of the molecular changes associated with an early stage of dengue virus (DENV) infection using a partial least squares (PLS) regression model is presented. This study is based on non-structural protein 1 (NS1), which appears after three days of DENV infection. In total, 39 blood sera samples were collected and divided into two groups. The control group contained samples which were negative for NS1 and antibodies, and the positive group contained those samples in which NS1 was positive and antibodies were negative. Out of 39 samples, 29 Raman spectra were used for the model development while the remaining 10 were kept hidden for blind testing of the model. PLS regression yielded a vector of regression coefficients as a function of Raman shift, which were analyzed. Cytokines in the region 775-875 cm-1, lectins at 1003, 1238, 1340, 1449 and 1672 cm-1, DNA in the region 1040-1140 cm-1 and alpha and beta structures of proteins in the region 933-967 cm-1 have been identified in the regression vector for their role in an early stage of DENV infection. Validity of the model was established by its R-square value of 0.891. Sensitivity, specificity and accuracy were 100% each and the area under the receiver operator characteristic curve was found to be 1.
Hanrahan, Kirsten; McCarthy, Ann Marie; Kleiber, Charmaine; Ataman, Kaan; Street, W Nick; Zimmerman, M Bridget; Ersig, Anne L
2012-10-01
This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children's responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model outputs to identify overall risk for distress. A decision tree was then applied to evidence-based instructions for tailoring distraction to characteristics and preferences of the parent and child. The resulting decision support computer application, titled Children, Parents and Distraction, is being used in research. Future use will support practitioners in deciding the level and type of distraction intervention needed by a child undergoing a healthcare procedure.
NASA Astrophysics Data System (ADS)
Chen, Jing; Qiu, Xiaojie; Yin, Cunyi; Jiang, Hao
2018-02-01
An efficient method to design the broadband gain-flattened Raman fiber amplifier with multiple pumps is proposed based on least squares support vector regression (LS-SVR). A multi-input multi-output LS-SVR model is introduced to replace the complicated solving process of the nonlinear coupled Raman amplification equation. The proposed approach contains two stages: offline training stage and online optimization stage. During the offline stage, the LS-SVR model is trained. Owing to the good generalization capability of LS-SVR, the net gain spectrum can be directly and accurately obtained when inputting any combination of the pump wavelength and power to the well-trained model. During the online stage, we incorporate the LS-SVR model into the particle swarm optimization algorithm to find the optimal pump configuration. The design results demonstrate that the proposed method greatly shortens the computation time and enhances the efficiency of the pump parameter optimization for Raman fiber amplifier design.
NASA Astrophysics Data System (ADS)
Kisi, Ozgur; Parmar, Kulwinder Singh
2016-03-01
This study investigates the accuracy of least square support vector machine (LSSVM), multivariate adaptive regression splines (MARS) and M5 model tree (M5Tree) in modeling river water pollution. Various combinations of water quality parameters, Free Ammonia (AMM), Total Kjeldahl Nitrogen (TKN), Water Temperature (WT), Total Coliform (TC), Fecal Coliform (FC) and Potential of Hydrogen (pH) monitored at Nizamuddin, Delhi Yamuna River in India were used as inputs to the applied models. Results indicated that the LSSVM and MARS models had almost same accuracy and they performed better than the M5Tree model in modeling monthly chemical oxygen demand (COD). The average root mean square error (RMSE) of the LSSVM and M5Tree models was decreased by 1.47% and 19.1% using MARS model, respectively. Adding TC input to the models did not increase their accuracy in modeling COD while adding FC and pH inputs to the models generally decreased the accuracy. The overall results indicated that the MARS and LSSVM models could be successfully used in estimating monthly river water pollution level by using AMM, TKN and WT parameters as inputs.
Balabin, Roman M; Lomakina, Ekaterina I
2011-04-21
In this study, we make a general comparison of the accuracy and robustness of five multivariate calibration models: partial least squares (PLS) regression or projection to latent structures, polynomial partial least squares (Poly-PLS) regression, artificial neural networks (ANNs), and two novel techniques based on support vector machines (SVMs) for multivariate data analysis: support vector regression (SVR) and least-squares support vector machines (LS-SVMs). The comparison is based on fourteen (14) different datasets: seven sets of gasoline data (density, benzene content, and fractional composition/boiling points), two sets of ethanol gasoline fuel data (density and ethanol content), one set of diesel fuel data (total sulfur content), three sets of petroleum (crude oil) macromolecules data (weight percentages of asphaltenes, resins, and paraffins), and one set of petroleum resins data (resins content). Vibrational (near-infrared, NIR) spectroscopic data are used to predict the properties and quality coefficients of gasoline, biofuel/biodiesel, diesel fuel, and other samples of interest. The four systems presented here range greatly in composition, properties, strength of intermolecular interactions (e.g., van der Waals forces, H-bonds), colloid structure, and phase behavior. Due to the high diversity of chemical systems studied, general conclusions about SVM regression methods can be made. We try to answer the following question: to what extent can SVM-based techniques replace ANN-based approaches in real-world (industrial/scientific) applications? The results show that both SVR and LS-SVM methods are comparable to ANNs in accuracy. Due to the much higher robustness of the former, the SVM-based approaches are recommended for practical (industrial) application. This has been shown to be especially true for complicated, highly nonlinear objects.
Modeling animal movements using stochastic differential equations
Haiganoush K. Preisler; Alan A. Ager; Bruce K. Johnson; John G. Kie
2004-01-01
We describe the use of bivariate stochastic differential equations (SDE) for modeling movements of 216 radiocollared female Rocky Mountain elk at the Starkey Experimental Forest and Range in northeastern Oregon. Spatially and temporally explicit vector fields were estimated using approximating difference equations and nonparametric regression techniques. Estimated...
Lu, Zhao; Sun, Jing; Butts, Kenneth
2014-05-01
Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yahya, Noorazrul, E-mail: noorazrul.yahya@research.uwa.edu.au; Ebert, Martin A.; Bulsara, Max
Purpose: Given the paucity of available data concerning radiotherapy-induced urinary toxicity, it is important to ensure derivation of the most robust models with superior predictive performance. This work explores multiple statistical-learning strategies for prediction of urinary symptoms following external beam radiotherapy of the prostate. Methods: The performance of logistic regression, elastic-net, support-vector machine, random forest, neural network, and multivariate adaptive regression splines (MARS) to predict urinary symptoms was analyzed using data from 754 participants accrued by TROG03.04-RADAR. Predictive features included dose-surface data, comorbidities, and medication-intake. Four symptoms were analyzed: dysuria, haematuria, incontinence, and frequency, each with three definitions (grade ≥ 1, grade ≥ 2 and longitudinal) with event rate between 2.3% and 76.1%. Repeated cross-validations producing matched models were implemented. A synthetic minority oversampling technique was utilized in endpoints with rare events. Parameter optimization was performed on the training data. Area under the receiver operating characteristic curve (AUROC) was used to compare performance using sample size to detect differences of ≥0.05 at the 95% confidence level. Results: Logistic regression, elastic-net, random forest, MARS, and support-vector machine were the highest-performing statistical-learning strategies in 3, 3, 3, 2, and 1 endpoints, respectively. Logistic regression, MARS, elastic-net, random forest, neural network, and support-vector machine were the best, or were not significantly worse than the best, in 7, 7, 5, 5, 3, and 1 endpoints. The best-performing statistical model was for dysuria grade ≥ 1 with AUROC ± standard deviation of 0.649 ± 0.074 using MARS. For longitudinal frequency and dysuria grade ≥ 1, all strategies produced AUROC>0.6 while all haematuria endpoints and longitudinal incontinence models produced AUROC<0.6.
Conclusions: Logistic regression and MARS were most likely to be the best-performing strategy for the prediction of urinary symptoms with elastic-net and random forest producing competitive results. The predictive power of the models was modest and endpoint-dependent. New features, including spatial dose maps, may be necessary to achieve better models.
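The AUROC used above to compare strategies equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal pairwise sketch follows; the function name is hypothetical and the quadratic loop is chosen for clarity rather than speed.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

On this scale 0.5 corresponds to chance, which is why values around 0.6 to 0.65, as reported here, indicate only modest discrimination.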
Predicting Error Bars for QSAR Models
NASA Astrophysics Data System (ADS)
Schroeter, Timon; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-09-01
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the last months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and ensemble and distance based techniques for the other modelling approaches.
Application of Support Vector Machine to Forex Monitoring
NASA Astrophysics Data System (ADS)
Kamruzzaman, Joarder; Sarker, Ruhul A.
Previous studies have demonstrated superior performance of artificial neural network (ANN) based forex forecasting models over traditional regression models. This paper applies support vector machines to build a forecasting model from the historical data using six simple technical indicators and presents a comparison with an ANN based model trained by scaled conjugate gradient (SCG) learning algorithm. The models are evaluated and compared on the basis of five commonly used performance metrics that measure closeness of prediction as well as correctness in directional change. Forecasting results of six different currencies against Australian dollar reveal superior performance of SVM model using simple linear kernel over ANN-SCG model in terms of all the evaluation metrics. The effect of SVM parameter selection on prediction performance is also investigated and analyzed.
Teutsch, T; Mesch, M; Giessen, H; Tarin, C
2015-01-01
In this contribution, a method to select discrete wavelengths that allow an accurate estimation of the glucose concentration in a biosensing system based on metamaterials is presented. The sensing concept is adapted to the particular application of ophthalmic glucose sensing by covering the metamaterial with a glucose-sensitive hydrogel and the sensor readout is performed optically. Due to the fact that in a mobile context a spectrometer is not suitable, few discrete wavelengths must be selected to estimate the glucose concentration. The developed selection methods are based on nonlinear support vector regression (SVR) models. Two selection methods are compared and it is shown that wavelengths selected by a sequential forward feature selection algorithm achieves an estimation improvement. The presented method can be easily applied to different metamaterial layouts and hydrogel configurations.
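Sequential forward feature selection, as named in this abstract, greedily adds the candidate that most reduces an error estimate given the features already chosen. The sketch below is generic: `error_of` stands in for whatever validation error the SVR model would report for a wavelength subset and is a hypothetical callable, not the authors' procedure.

```python
def forward_select(candidates, error_of, k):
    """Greedy sequential forward selection of up to k items.

    candidates: iterable of candidate features (e.g. wavelengths).
    error_of:   hypothetical callable mapping a list of selected features
                to an error estimate (lower is better).
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        # Try each remaining candidate together with the current selection.
        best = min(remaining, key=lambda c: error_of(selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The greedy search evaluates far fewer subsets than an exhaustive search, at the cost of not guaranteeing the globally best subset.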
Aircraft Engine Thrust Estimator Design Based on GSA-LSSVM
NASA Astrophysics Data System (ADS)
Sheng, Hanlin; Zhang, Tianhong
2017-08-01
A highly precise and reliable thrust estimator is necessary to achieve direct thrust control of an aircraft engine. Based on support vector regression (SVR), the least squares support vector machine (LSSVM), and a new optimization algorithm, the gravitational search algorithm (GSA), a GSA-LSSVM-based thrust estimator design is proposed that integrates modelling and parameter optimization. The results show that, compared to the particle swarm optimization (PSO) algorithm, GSA finds the unknown optimization parameters better and gives the resulting model better prediction and generalization ability. The model can better predict aircraft engine thrust and thus fulfills the need of direct thrust control of aircraft engines.
González Costa, J J; Reigosa, M J; Matías, J M; Covelo, E F
2017-09-01
The aim of this study was to model the sorption and retention of Cd, Cu, Ni, Pb and Zn in soils. To that extent, the sorption and retention of these metals were studied and the soil characterization was performed separately. Multiple stepwise regression was used to produce multivariate models with linear techniques and with support vector machines, all of which included 15 explanatory variables characterizing soils. When the R-squared values are represented, two different groups are noticed. Cr, Cu and Pb sorption and retention show a higher R-squared; the most explanatory variables being humified organic matter, Al oxides and, in some cases, cation-exchange capacity (CEC). The other group of metals (Cd, Ni and Zn) shows a lower R-squared, and clays are the most explanatory variables, including a percentage of vermiculite and slime. In some cases, quartz, plagioclase or hematite percentages also show some explanatory capacity. Support Vector Machine (SVM) regression shows that the different models are not as regular as in multiple regression in terms of number of variables, the regression for nickel adsorption being the one with the highest number of variables in its optimal model. On the other hand, there are cases where the most explanatory variables are the same for two metals, as it happens with Cd and Cr adsorption. A similar adsorption mechanism is thus postulated. These patterns of the introduction of variables in the model allow us to create explainability sequences. Those which are the most similar to the selectivity sequences obtained by Covelo (2005) are Mn oxides in multiple regression and change capacity in SVM. Among all the variables, the only one that is explanatory for all the metals after applying the maximum parsimony principle is the percentage of sand in the retention process. 
In the competitive model arising from the aforementioned sequences, the most intense competitiveness for the adsorption and retention of different metals appears between Cr and Cd, Cu and Zn in multiple regression; and between Cr and Cd in SVM regression. Copyright © 2017 Elsevier B.V. All rights reserved.
Rapid Detection of Volatile Oil in Mentha haplocalyx by Near-Infrared Spectroscopy and Chemometrics.
Yan, Hui; Guo, Cheng; Shao, Yang; Ouyang, Zhen
2017-01-01
Near-infrared spectroscopy combined with partial least squares regression (PLSR) and support vector machine (SVM) was applied for the rapid determination of volatile oil content in Mentha haplocalyx. The effects of data pre-processing methods on the accuracy of the PLSR calibration models were investigated. The performance of the final model was evaluated according to the correlation coefficient (R) and root mean square error of prediction (RMSEP). For the PLSR model, the best preprocessing combination was first-order derivative, standard normal variate transformation (SNV), and mean centering, which gave correlation coefficients of 0.8805 and 0.8719, RMSEC of 0.091, and RMSEP of 0.097. Analysis of the loading weights and variable importance in projection (VIP) scores showed that the wavenumber variables linked to volatile oil lie between 5500 and 4000 cm-1. For the SVM model, six LVs (fewer than the seven LVs in the PLSR model) were adopted, and the result was better than that of the PLSR model: the correlation coefficients were 0.9232 and 0.9202, with RMSEC and RMSEP of 0.084 and 0.082, respectively, which indicated that the predicted values were accurate and reliable. This work demonstrated that near infrared reflectance spectroscopy with chemometrics could be used to rapidly detect the volatile oil content in M. haplocalyx. The quality of a medicine is directly linked to its clinical efficacy; thus, it is important to control the quality of Mentha haplocalyx.
Abbreviations used: 1st der: First-order derivative; 2nd der: Second-order derivative; LOO: Leave-one-out; LVs: Latent variables; MC: Mean centering; NIR: Near-infrared; NIRS: Near infrared spectroscopy; PCR: Principal component regression; PLSR: Partial least squares regression; RBF: Radial basis function; RMSECV: Root mean square error of cross validation; RMSEC: Root mean square error of calibration; RMSEP: Root mean square error of prediction; SNV: Standard normal variate transformation; SVM: Support vector machine; VIP: Variable importance in projection.
The PX-EM algorithm for fast stable fitting of Henderson's mixed model
Foulley, Jean-Louis; Van Dyk, David A
2000-01-01
This paper presents procedures for implementing the PX-EM algorithm of Liu, Rubin and Wu to compute REML estimates of variance covariance components in Henderson's linear mixed models. The class of models considered encompasses several correlated random factors having the same vector length e.g., as in random regression models for longitudinal data analysis and in sire-maternal grandsire models for genetic evaluation. Numerical examples are presented to illustrate the procedures. Much better results in terms of convergence characteristics (number of iterations and time required for convergence) are obtained for PX-EM relative to the basic EM algorithm in the random regression. PMID:14736399
Risueño, José; Muñoz, Clara; Pérez-Cutillas, Pedro; Goyena, Elena; Gonzálvez, Moisés; Ortuño, María; Bernal, Luis Jesús; Ortiz, Juana; Alten, Bulent; Berriatua, Eduardo
2017-04-19
Leishmaniosis is associated with Phlebotomus sand fly vector density, but our knowledge of the environmental framework that regulates highly overdispersed vector abundance distributions is limited. We used a standardized sampling procedure in the bioclimatically diverse Murcia Region in Spain and multilevel regression models for count data to estimate P. perniciosus abundance in relation to environmental and anthropic factors. Twenty-five dog and sheep premises were sampled for sand flies using adhesive and light-attraction traps, from late May to early October 2015. Temperature, relative humidity and other animal- and premise-related data recorded on site and other environmental data were extracted from digital databases using a geographical information system. The relationship between sand fly abundance and explanatory variables was analysed using binomial regression models. The total number of sand flies captured, mostly with light-attraction traps, was 3,644 specimens, including 80% P. perniciosus, the main L. infantum vector in Spain. Abundance varied between and within zones and was positively associated with increasing altitude from 0 to 900 m above sea level, except from 500 to 700 m where it was low. Populations peaked in July and especially during a 3-day heat wave when relative humidity and wind speed plummeted. Regression models indicated that climate and not land use or soil characteristics have the greatest impact on this species density on a large geographical scale. In contrast, micro-environmental factors such as animal building characteristics and husbandry practices affect sand fly population size on a smaller scale. A standardised sampling procedure and statistical analysis for highly overdispersed distributions allow reliable estimation of P. perniciosus abundance and identification of environmental drivers. 
While climatic variables have the greatest impact at macro-environmental scale, anthropic factors may be determinant at a micro-geographical scale. These findings may be used to elaborate predictive distribution maps useful for vector and pathogen control programs.
Levine, Matthew E; Albers, David J; Hripcsak, George
2016-01-01
Time series analysis methods have been shown to reveal clinical and biological associations in data collected in the electronic health record. We wish to develop reliable high-throughput methods for identifying adverse drug effects that are easy to implement and produce readily interpretable results. To move toward this goal, we used univariate and multivariate lagged regression models to investigate associations between twenty pairs of drug orders and laboratory measurements. Multivariate lagged regression models exhibited higher sensitivity and specificity than univariate lagged regression in the 20 examples, and incorporating autoregressive terms for labs and drugs produced more robust signals in cases of known associations among the 20 example pairings. Moreover, including inpatient admission terms in the model attenuated the signals for some cases of unlikely associations, demonstrating how multivariate lagged regression models' explicit handling of context-based variables can provide a simple way to probe for health-care processes that confound analyses of EHR data.
Support vector regression methodology for estimating global solar radiation in Algeria
NASA Astrophysics Data System (ADS)
Guermoui, Mawloud; Rabehi, Abdelaziz; Gairaa, Kacem; Benkaciali, Said
2018-01-01
Accurate estimation of Daily Global Solar Radiation (DGSR) has been a major goal for solar energy applications. In this paper we show the possibility of developing a simple model based on Support Vector Regression (SVM-R), which could be used to estimate DGSR on the horizontal surface in Algeria using only the sunshine ratio as input. The SVM model was developed and tested using a data set recorded over three years (2005-2007). The data were collected at the Applied Research Unit for Renewable Energies (URAER) in Ghardaïa city. The data collected in 2005-2006 are used to train the model while the 2007 data are used to test the performance of the selected model. The measured and estimated values of DGSR were compared statistically during the testing phase using the Root Mean Square Error (RMSE), relative Root Mean Square Error (rRMSE), and correlation coefficient (r2), which amount to 1.59 MJ/m2, 8.46 and 97.4%, respectively. The obtained results show that SVM-R is well suited to DGSR estimation using only the sunshine ratio.
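The test statistics quoted in this abstract can be computed as in the following minimal sketch. It assumes that rRMSE is the RMSE divided by the mean of the measured values, expressed in percent (a common definition that the abstract does not spell out). The sample values are made up for illustration and are not taken from the URAER data.

```python
import math

def rmse(measured, estimated):
    """Root mean square error between measured and estimated values."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)

def rrmse_percent(measured, estimated):
    """RMSE relative to the mean measured value, in percent (assumed definition)."""
    mean_measured = sum(measured) / len(measured)
    return 100.0 * rmse(measured, estimated) / mean_measured

def pearson_r(measured, estimated):
    """Pearson correlation coefficient between the two series."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n
    cov = sum((m - mean_m) * (e - mean_e) for m, e in zip(measured, estimated))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_m * var_e)

# Hypothetical daily values in MJ/m2, for illustration only.
measured = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 30.0]
print(rmse(measured, estimated))
print(rrmse_percent(measured, estimated))
print(pearson_r(measured, estimated))
```

Squaring the value returned by `pearson_r` gives r2, the form in which the abstract reports the correlation.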
McCarthy, Ann Marie; Kleiber, Charmaine; Ataman, Kaan; Street, W. Nick; Zimmerman, M. Bridget; Ersig, Anne L.
2012-01-01
This secondary data analysis used data mining methods to develop predictive models of child risk for distress during a healthcare procedure. Data used came from a study that predicted factors associated with children’s responses to an intravenous catheter insertion while parents provided distraction coaching. From the 255 items used in the primary study, 44 predictive items were identified through automatic feature selection and used to build support vector machine regression models. Models were validated using multiple cross-validation tests and by comparing variables identified as explanatory in the traditional versus support vector machine regression. Rule-based approaches were applied to the model outputs to identify overall risk for distress. A decision tree was then applied to evidence-based instructions for tailoring distraction to characteristics and preferences of the parent and child. The resulting decision support computer application, the Children, Parents and Distraction (CPaD), is being used in research. Future use will support practitioners in deciding the level and type of distraction intervention needed by a child undergoing a healthcare procedure. PMID:22805121
Predicting Error Bars for QSAR Models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Schroeter, Timon; Technische Universitaet Berlin, Department of Computer Science, Franklinstrasse 28/29, 10587 Berlin; Schwaighofer, Anton
2007-09-18
Unfavorable physicochemical properties often cause drug failures. It is therefore important to take lipophilicity and water solubility into account early on in lead discovery. This study presents log D7 models built using Gaussian Process regression, Support Vector Machines, decision trees and ridge regression algorithms based on 14556 drug discovery compounds of Bayer Schering Pharma. A blind test was conducted using 7013 new measurements from the last months. We also present independent evaluations using public data. Apart from accuracy, we discuss the quality of error bars that can be computed by Gaussian Process models, and ensemble and distance based techniques for the other modelling approaches.
Vector autoregressive models: A Gini approach
NASA Astrophysics Data System (ADS)
Mussard, Stéphane; Ndiaye, Oumar Hamady
2018-02-01
In this paper, it is proven that the usual VAR models may be performed in the Gini sense, that is, on an ℓ1 metric space. The Gini regression is robust to outliers. As a consequence, when data are contaminated by extreme values, we show that semi-parametric VAR-Gini regressions may be used to obtain robust estimators. The inference about the estimators is made with the ℓ1 norm. Also, impulse response functions and Gini decompositions for prediction errors are introduced. Finally, Granger's causality tests are properly derived based on U-statistics.
Rank-Optimized Logistic Matrix Regression toward Improved Matrix Data Classification.
Zhang, Jianguang; Jiang, Jianmin
2018-02-01
While existing logistic regression suffers from overfitting and often fails to consider structural information, we propose a novel matrix-based logistic regression to overcome these weaknesses. In the proposed method, 2D matrices are directly used to learn two groups of parameter vectors along each dimension without vectorization, which allows the proposed method to fully exploit the underlying structural information embedded inside the 2D matrices. Further, we add a joint [Formula: see text]-norm on two parameter matrices, which are organized by aligning each group of parameter vectors in columns. This added co-regularization term has two roles: enhancing the effect of regularization and optimizing the rank during the learning process. With our proposed fast iterative solution, we carried out extensive experiments. The results show that in comparison to both the traditional tensor-based methods and the vector-based regression methods, our proposed solution achieves better performance for matrix data classifications.
Du, Hongying; Wang, Jie; Yao, Xiaojun; Hu, Zhide
2009-01-01
The heuristic method (HM) and support vector machine (SVM) were used to construct quantitative structure-retention relationship models from a series of compounds to predict the gradient retention times of reversed-phase high-performance liquid chromatography (HPLC) in three different columns. The aims of this investigation were to predict the retention times of multifarious compounds, to find the main properties of the three columns, and to indicate the theory of separation procedures. In our method, we correlated the retention times of many structurally diverse analytes in three columns (Symmetry C18, Chromolith, and SG-MIX) with their representative molecular descriptors, calculated from the molecular structures alone. HM was used to select the most important molecular descriptors and build linear regression models. Furthermore, non-linear regression models were built using the SVM method; the performance of the SVM models was better than that of the HM models, and the prediction results were in good agreement with the experimental values. This paper could give some insights into the factors that were likely to govern the gradient retention process of the three investigated HPLC columns, which could theoretically supervise the practical experiment.
Parra-Henao, Gabriel; Quirós-Gómez, Oscar; Jaramillo-O, Nicolas; Cardona, Ángela Segura
2016-04-01
Triatoma dimidiata (Hemiptera: Reduviidae) is a secondary vector of Trypanosoma cruzi in Colombia and represents an important epidemiological risk mainly in the central and oriental regions of the country where it occupies sylvatic, peridomestic, and intradomestic ecotopes, and because of this complex distribution, its distribution and abundance could be conditioned by environmental factors. In this work, we explored the relationship between T. dimidiata distribution and environmental factors in the northwest, northeast, and central zones of Colombia and developed predictive models of infestation in the country. The associations between the presence of T. dimidiata and environmental variables were studied using logistic regression models and ecological niche modeling for a sample of villages in Colombia. The analysis was based on the information collected in field about the presence of T. dimidiata and the environmental data for each village extracted from remote sensing images. The presence of Triatoma dimidiata (Latreille, 1811) was found to be significantly associated with the maximum vegetation index, minimum land surface temperature (LST), and the digital elevation for the statistical model. Temperature seasonality, annual precipitation, and vegetation index were the variables that most influenced the ecological niche model of T. dimidiata distribution. The logistic regression model showed a good fit and predicted suitable habitats in the Andean and Caribbean regions, which agrees with the known distribution of the species, but predicted suitable habitats in the Pacific and Orinoco regions proposing new areas of research. Improved models to predict suitable habitats for T. dimidiata hold promise for spatial targeting of integrated vector management. © The American Society of Tropical Medicine and Hygiene.
Parra-Henao, Gabriel; Quirós-Gómez, Oscar; Jaramillo-O, Nicolas; Cardona, Ángela Segura
2016-01-01
Triatoma dimidiata (Hemiptera: Reduviidae) is a secondary vector of Trypanosoma cruzi in Colombia and represents an important epidemiological risk mainly in the central and oriental regions of the country where it occupies sylvatic, peridomestic, and intradomestic ecotopes, and because of this complex distribution, its distribution and abundance could be conditioned by environmental factors. In this work, we explored the relationship between T. dimidiata distribution and environmental factors in the northwest, northeast, and central zones of Colombia and developed predictive models of infestation in the country. The associations between the presence of T. dimidiata and environmental variables were studied using logistic regression models and ecological niche modeling for a sample of villages in Colombia. The analysis was based on the information collected in field about the presence of T. dimidiata and the environmental data for each village extracted from remote sensing images. The presence of Triatoma dimidiata (Latreille, 1811) was found to be significantly associated with the maximum vegetation index, minimum land surface temperature (LST), and the digital elevation for the statistical model. Temperature seasonality, annual precipitation, and vegetation index were the variables that most influenced the ecological niche model of T. dimidiata distribution. The logistic regression model showed a good fit and predicted suitable habitats in the Andean and Caribbean regions, which agrees with the known distribution of the species, but predicted suitable habitats in the Pacific and Orinoco regions proposing new areas of research. Improved models to predict suitable habitats for T. dimidiata hold promise for spatial targeting of integrated vector management. PMID:26856910
Marabel, Miguel; Alvarez-Taboada, Flor
2013-01-01
Aboveground biomass (AGB) is one of the strategic biophysical variables of interest in vegetation studies. The main objective of this study was to evaluate the Support Vector Machine (SVM) and Partial Least Squares Regression (PLSR) for estimating the AGB of grasslands from field spectrometer data and to find out which data pre-processing approach was the most suitable. The most accurate model to predict the total AGB involved PLSR and the Maximum Band Depth index derived from the continuum removed reflectance in the absorption features between 916–1,120 nm and 1,079–1,297 nm (R2 = 0.939, RMSE = 7.120 g/m2). Regarding the green fraction of the AGB, the Area Over the Minimum index derived from the continuum removed spectra provided the most accurate model overall (R2 = 0.939, RMSE = 3.172 g/m2). Identifying the appropriate absorption features was proved to be crucial to improve the performance of PLSR to estimate the total and green aboveground biomass, by using the indices derived from those spectral regions. Ordinary Least Square Regression could be used as a surrogate for the PLSR approach with the Area Over the Minimum index as the independent variable, although the resulting model would not be as accurate. PMID:23925082
Hoffman, Haydn; Lee, Sunghoon I; Garst, Jordan H; Lu, Derek S; Li, Charles H; Nagasawa, Daniel T; Ghalehsari, Nima; Jahanforouz, Nima; Razaghy, Mehrdad; Espinal, Marie; Ghavamrezaii, Amir; Paak, Brian H; Wu, Irene; Sarrafzadeh, Majid; Lu, Daniel C
2015-09-01
This study introduces the use of multivariate linear regression (MLR) and support vector regression (SVR) models to predict postoperative outcomes in a cohort of patients who underwent surgery for cervical spondylotic myelopathy (CSM). Currently, predicting outcomes after surgery for CSM remains a challenge. We recruited patients who had a diagnosis of CSM and required decompressive surgery with or without fusion. Fine motor function was tested preoperatively and postoperatively with a handgrip-based tracking device that has been previously validated, yielding mean absolute accuracy (MAA) results for two tracking tasks (sinusoidal and step). All patients completed Oswestry disability index (ODI) and modified Japanese Orthopaedic Association questionnaires preoperatively and postoperatively. Preoperative data was utilized in MLR and SVR models to predict postoperative ODI. Predictions were compared to the actual ODI scores with the coefficient of determination (R(2)) and mean absolute difference (MAD). From this, 20 patients met the inclusion criteria and completed follow-up at least 3 months after surgery. With the MLR model, a combination of the preoperative ODI score, preoperative MAA (step function), and symptom duration yielded the best prediction of postoperative ODI (R(2)=0.452; MAD=0.0887; p=1.17 × 10(-3)). With the SVR model, a combination of preoperative ODI score, preoperative MAA (sinusoidal function), and symptom duration yielded the best prediction of postoperative ODI (R(2)=0.932; MAD=0.0283; p=5.73 × 10(-12)). The SVR model was more accurate than the MLR model. The SVR can be used preoperatively in risk/benefit analysis and the decision to operate. Copyright © 2015 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Ebrahimi, Hadi; Rajaee, Taher
2017-01-01
Simulation of groundwater level (GWL) fluctuations is an important task in management of groundwater resources. In this study, the effect of wavelet analysis on the training of the artificial neural network (ANN), multi linear regression (MLR) and support vector regression (SVR) approaches was investigated, and the ANN, MLR and SVR along with the wavelet-ANN (WNN), wavelet-MLR (WLR) and wavelet-SVR (WSVR) models were compared in simulating GWL one month ahead. The only variable used to develop the models was the monthly GWL data recorded over a period of 11 years from two wells in the Qom plain, Iran. The results showed that decomposing the GWL time series into several sub-time series greatly improved the training of the models. For both wells 1 and 2, the Meyer and Db5 wavelets produced better results than the other wavelets, which indicated that wavelet types had similar behavior in similar case studies. The optimal number of delays was 6 months, which seems to be due to natural phenomena. The best WNN model, using the Meyer mother wavelet with two decomposition levels, simulated GWL one month ahead with RMSE values of 0.069 m and 0.154 m for wells 1 and 2, respectively. The RMSE values for the WLR model were 0.058 m and 0.111 m, and for the WSVR model were 0.136 m and 0.060 m for wells 1 and 2, respectively.
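The one-month-ahead setup with 6 monthly delays described in this abstract can be illustrated by the following minimal sketch, which turns a monthly series into input/target pairs for any regressor (ANN, MLR or SVR). The function name `make_lagged_samples` and the toy series are hypothetical, and the wavelet decomposition step is deliberately left out.

```python
def make_lagged_samples(series, n_lags=6):
    """Build (inputs, targets) where each input holds the previous
    n_lags values and the target is the value one step ahead."""
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])  # the n_lags most recent months
        targets.append(series[t])            # the month to be simulated
    return inputs, targets

# Hypothetical monthly groundwater levels in metres, for illustration only.
gwl = [12.0, 12.1, 12.3, 12.2, 12.0, 11.9, 11.8, 11.9, 12.1, 12.2]
X, y = make_lagged_samples(gwl, n_lags=6)
print(len(X))      # number of training pairs
print(X[0], y[0])  # first six months and the seventh as target
```

In a wavelet-based variant, the same windowing would be applied to each sub-series produced by the decomposition, though the abstract does not give enough detail to reproduce that step.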
The Geometry of Enhancement in Multiple Regression
ERIC Educational Resources Information Center
Waller, Niels G.
2011-01-01
In linear multiple regression, "enhancement" is said to occur when R[superscript 2] = b[prime]r greater than r[prime]r, where b is a p x 1 vector of standardized regression coefficients and r is a p x 1 vector of correlations between a criterion y and a set of standardized regressors, x. When p = 1 then b [is congruent to] r and…
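A small numerical case may make the enhancement condition concrete. The sketch below treats two standardized regressors, for which the standardized coefficients follow in closed form from the correlations; the correlation values are chosen for illustration and are not taken from the article.

```python
def standardized_coefficients(r1, r2, r12):
    """Standardized regression coefficients for two standardized regressors,
    given criterion correlations r1, r2 and regressor intercorrelation r12."""
    det = 1.0 - r12 ** 2
    b1 = (r1 - r12 * r2) / det
    b2 = (r2 - r12 * r1) / det
    return b1, b2

# Illustrative correlations: x2 is uncorrelated with y but correlated with x1.
r1, r2, r12 = 0.5, 0.0, 0.5
b1, b2 = standardized_coefficients(r1, r2, r12)

r_squared = b1 * r1 + b2 * r2     # R^2 = b'r
sum_sq_r = r1 ** 2 + r2 ** 2      # r'r
print(r_squared, sum_sq_r, r_squared > sum_sq_r)
```

Here R^2 is about 0.333 while r'r is 0.25, so enhancement occurs even though the second regressor has zero correlation with the criterion.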
Stable Local Volatility Calibration Using Kernel Splines
NASA Astrophysics Data System (ADS)
Coleman, Thomas F.; Li, Yuying; Wang, Cheng
2010-09-01
We propose an optimization formulation using L1 norm to ensure accuracy and stability in calibrating a local volatility function for option pricing. Using a regularization parameter, the proposed objective function balances the calibration accuracy with the model complexity. Motivated by the support vector machine learning, the unknown local volatility function is represented by a kernel function generating splines and the model complexity is controlled by minimizing the 1-norm of the kernel coefficient vector. In the context of the support vector regression for function estimation based on a finite set of observations, this corresponds to minimizing the number of support vectors for predictability. We illustrate the ability of the proposed approach to reconstruct the local volatility function in a synthetic market. In addition, based on S&P 500 market index option data, we demonstrate that the calibrated local volatility surface is simple and resembles the observed implied volatility surface in shape. Stability is illustrated by calibrating local volatility functions using market option data from different dates.
Conditional Density Estimation with HMM Based Support Vector Machines
NASA Astrophysics Data System (ADS)
Hu, Fasheng; Liu, Zhenqiu; Jia, Chunxin; Chen, Dechang
Conditional density estimation is very important in financial engineering, risk management, and other engineering computing problems. However, most regression models have a latent assumption that the probability density is a Gaussian distribution, which is not necessarily true in many real life applications. In this paper, we give a framework to estimate or predict the conditional density mixture dynamically. By combining the Input-Output HMM with SVM regression and building an SVM model in each state of the HMM, we can estimate a conditional density mixture instead of a single Gaussian. With an SVM in each node, this model can be applied not only to regression but to classification as well. We applied this model to denoise ECG data. The proposed method has the potential to be applied to other time series, such as stock market return predictions.
A Bayesian model averaging method for the derivation of reservoir operating rules
NASA Astrophysics Data System (ADS)
Zhang, Jingwen; Liu, Pan; Wang, Hao; Lei, Xiaohui; Zhou, Yanlai
2015-09-01
Because the intrinsic dynamics among optimal decision making, inflow processes and reservoir characteristics are complex, functional forms of reservoir operating rules are always determined subjectively. As a result, the uncertainty of selecting form and/or model involved in reservoir operating rules must be analyzed and evaluated. In this study, we analyze the uncertainty of reservoir operating rules using the Bayesian model averaging (BMA) model. Three popular operating rules, namely piecewise linear regression, surface fitting and a least-squares support vector machine, are established based on the optimal deterministic reservoir operation. These individual models provide three-member decisions for the BMA combination, enabling the 90% release interval to be estimated by the Markov Chain Monte Carlo simulation. A case study of the Baise reservoir in China shows that: (1) the optimal deterministic reservoir operation, superior to any reservoir operating rules, is used as the samples to derive the rules; (2) the least-squares support vector machine model is more effective than both piecewise linear regression and surface fitting; (3) BMA outperforms any individual model of operating rules based on the optimal trajectories. It is revealed that the proposed model can reduce the uncertainty of operating rules, which is of great potential benefit in evaluating the confidence interval of decisions.
Mocellin, Simone; Thompson, John F; Pasquali, Sandro; Montesco, Maria C; Pilati, Pierluigi; Nitti, Donato; Saw, Robyn P; Scolyer, Richard A; Stretch, Jonathan R; Rossi, Carlo R
2009-12-01
To improve selection for sentinel node (SN) biopsy (SNB) in patients with cutaneous melanoma using statistical models predicting SN status. About 80% of patients currently undergoing SNB are node negative. In the absence of conclusive evidence of a SNB-associated survival benefit, these patients may be over-treated. Here, we tested the efficiency of 4 different models in predicting SN status. The clinicopathologic data (age, gender, tumor thickness, Clark level, regression, ulceration, histologic subtype, and mitotic index) of 1132 melanoma patients who had undergone SNB at institutions in Italy and Australia were analyzed. Logistic regression, classification tree, random forest, and support vector machine models were fitted to the data. The predictive models were built with the aim of maximizing the negative predictive value (NPV) and reducing the rate of SNB procedures through minimizing the error rate. After cross-validation, the logistic regression, classification tree, random forest, and support vector machine predictive models obtained clinically relevant NPV (93.6%, 94.0%, 97.1%, and 93.0%, respectively), SNB reduction (27.5%, 29.8%, 18.2%, and 30.1%, respectively), and error rates (1.8%, 1.8%, 0.5%, and 2.1%, respectively). Using commonly available clinicopathologic variables, predictive models can preoperatively identify a proportion of patients (approximately 25%) who might be spared SNB, with an acceptable (1%-2%) error. If validated in large prospective series, these models might be implemented in the clinical setting for improved patient selection, which ultimately would lead to better quality of life for patients and optimization of resource allocation for the health care system.
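The three reported quantities are linked by simple arithmetic, which the following sketch checks. It assumes, since the abstract does not state the definitions, that SNB reduction is the share of all patients predicted node negative and that the error rate is the share of all patients who are predicted negative but are actually node positive. Under those assumptions the error rate equals the SNB reduction multiplied by (1 - NPV).

```python
# Reported values from the abstract, in percent: (NPV, SNB reduction, error rate).
reported = {
    "logistic regression": (93.6, 27.5, 1.8),
    "classification tree": (94.0, 29.8, 1.8),
    "random forest": (97.1, 18.2, 0.5),
    "support vector machine": (93.0, 30.1, 2.1),
}

def implied_error_rate(npv_pct, reduction_pct):
    """Share of all patients who are false negatives, in percent,
    under the assumed definitions stated above."""
    return reduction_pct * (100.0 - npv_pct) / 100.0

for model, (npv, reduction, error) in reported.items():
    implied = implied_error_rate(npv, reduction)
    print(f"{model}: implied {implied:.2f}%, reported {error}%")
```

For all four models the implied value rounds to the reported error rate, which suggests the assumed definitions are consistent with the figures in the abstract.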
Nogales-Bueno, Julio; Ayala, Fernando; Hernández-Hierro, José Miguel; Rodríguez-Pulido, Francisco José; Echávarri, José Federico; Heredia, Francisco José
2015-05-06
Characteristic vector analysis has been applied to near-infrared spectra to extract the main spectral information from hyperspectral images. For this purpose, 3, 6, 9, and 12 characteristic vectors have been used to reconstruct the spectra, and root-mean-square errors (RMSEs) have been calculated to measure the differences between characteristic vector reconstructed spectra (CVRS) and hyperspectral imaging spectra (HIS). RMSE values obtained were 0.0049, 0.0018, 0.0012, and 0.0012 [log(1/R) units] for spectra allocated into the validation set, for 3, 6, 9, and 12 characteristic vectors, respectively. After that, calibration models have been developed and validated using the different groups of CVRS to predict skin total phenolic concentration, sugar concentration, titratable acidity, and pH by modified partial least-squares (MPLS) regression. The obtained results have been compared to those previously obtained from HIS. The models developed from the CVRS reconstructed from 12 characteristic vectors present coefficients of determination (RSQ) and standard errors of prediction (SEP) similar to those of the models developed from HIS. RSQ and SEP were 0.84 and 1.13 mg g(-1) of skin grape (expressed as gallic acid equivalents), 0.93 and 2.26 °Brix, 0.97 and 3.87 g L(-1) (expressed as tartaric acid equivalents), and 0.91 and 0.14 for skin total phenolic concentration, sugar concentration, titratable acidity, and pH, respectively, for the models developed from the CVRS reconstructed from 12 characteristic vectors.
Mixed kernel function support vector regression for global sensitivity analysis
NASA Astrophysics Data System (ADS)
Cheng, Kai; Lu, Zhenzhou; Wei, Yuhao; Shi, Yan; Zhou, Yicheng
2017-11-01
Global sensitivity analysis (GSA) plays an important role in exploring the respective effects of input variables on an assigned output response. Amongst the wide sensitivity analyses in literature, the Sobol indices have attracted much attention since they can provide accurate information for most models. In this paper, a mixed kernel function (MKF) based support vector regression (SVR) model is employed to evaluate the Sobol indices at low computational cost. By the proposed derivation, the estimation of the Sobol indices can be obtained by post-processing the coefficients of the SVR meta-model. The MKF is constituted by the orthogonal polynomials kernel function and Gaussian radial basis kernel function, thus the MKF possesses both the global characteristic advantage of the polynomials kernel function and the local characteristic advantage of the Gaussian radial basis kernel function. The proposed approach is suitable for high-dimensional and non-linear problems. Performance of the proposed approach is validated by various analytical functions and compared with the popular polynomial chaos expansion (PCE). Results demonstrate that the proposed approach is an efficient method for global sensitivity analysis.
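The idea of mixing a global polynomial kernel with a local Gaussian kernel can be sketched as follows. This is an illustration under stated assumptions, not the authors' formulation: it substitutes an ordinary polynomial kernel for the orthogonal polynomials kernel named in the abstract, and the mixing weight `weight`, the degree and `gamma` are hypothetical parameters.

```python
import math

def polynomial_kernel(x, y, degree=2):
    """Ordinary polynomial kernel (x.y + 1)^degree (a stand-in for the
    orthogonal polynomials kernel, which the abstract does not define)."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree

def gaussian_rbf_kernel(x, y, gamma=0.5):
    """Gaussian radial basis kernel exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def mixed_kernel(x, y, weight=0.5, degree=2, gamma=0.5):
    """Convex combination of the two kernels; weight must lie in [0, 1]."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (weight * polynomial_kernel(x, y, degree)
            + (1.0 - weight) * gaussian_rbf_kernel(x, y, gamma))

print(mixed_kernel([1.0, 0.0], [0.0, 1.0]))
print(mixed_kernel([1.0, 0.0], [1.0, 0.0]))
```

A convex combination of two valid kernels is itself a valid kernel, so a function of this form can be passed as the kernel of an SVR implementation that accepts custom kernels.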
Initial Flight Test Evaluation of the F-15 ACTIVE Axisymmetric Vectoring Nozzle Performance
NASA Technical Reports Server (NTRS)
Orme, John S.; Hathaway, Ross; Ferguson, Michael D.
1998-01-01
A full envelope database of a thrust-vectoring axisymmetric nozzle performance for the Pratt & Whitney Pitch/Yaw Balance Beam Nozzle (P/YBBN) is being developed using the F-15 Advanced Control Technology for Integrated Vehicles (ACTIVE) aircraft. At this time, flight research has been completed for steady-state pitch vector angles up to 20° at an altitude of 30,000 ft from low power settings to maximum afterburner power. The nozzle performance database includes vector forces, internal nozzle pressures, and temperatures, all of which can be used for regression analysis modeling. The database was used to substantiate a set of nozzle performance data from wind tunnel testing and computational fluid dynamic analyses. Findings from initial flight research at Mach 0.9 and 1.2 are presented in this paper. The results show that vector efficiency is strongly influenced by power setting. A significant discrepancy in nozzle performance has been discovered between predicted and measured results during vectoring.
NASA Astrophysics Data System (ADS)
Stas, Michiel; Dong, Qinghan; Heremans, Stien; Zhang, Beier; Van Orshoven, Jos
2016-08-01
This paper compares two machine learning techniques to predict regional winter wheat yields. The models, based on Boosted Regression Trees (BRT) and Support Vector Machines (SVM), are constructed of Normalized Difference Vegetation Indices (NDVI) derived from low resolution SPOT VEGETATION satellite imagery. Three types of NDVI-related predictors were used: Single NDVI, Incremental NDVI and Targeted NDVI. BRT and SVM were first used to select features with high relevance for predicting the yield. Although the exact selections differed between the prefectures, certain periods with high influence scores for multiple prefectures could be identified. The same period of high influence stretching from March to June was detected by both machine learning methods. After feature selection, BRT and SVM models were applied to the subset of selected features for actual yield forecasting. Whereas both machine learning methods returned very low prediction errors, BRT seems to slightly but consistently outperform SVM.
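For readers unfamiliar with the predictor, the standard NDVI definition is (NIR - Red) / (NIR + Red). The sketch below shows only this base index; the Incremental and Targeted NDVI variants are not defined in the abstract and are not reproduced here. The function name and the zero-signal convention are assumptions.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index from two reflectances."""
    total = nir + red
    if total == 0:
        # Convention chosen for this sketch: no signal maps to 0.0.
        return 0.0
    return (nir - red) / total

# Hypothetical reflectances: dense vegetation reflects strongly in NIR.
print(ndvi(0.5, 0.1))
```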
Predicting pork loin intramuscular fat using computer vision system.
Liu, J-H; Sun, X; Young, J M; Bachmeier, L A; Newman, D J
2018-09-01
The objective of this study was to investigate the ability of a computer vision system to predict pork intramuscular fat percentage (IMF%). Center-cut loin samples (n = 85) were trimmed of subcutaneous fat and connective tissue. Images were acquired and pixels were segregated to estimate image IMF% and 18 image color features for each image. Subjective IMF% was determined by a trained grader. Ether extract IMF% was calculated using the ether extract method. Image color features and image IMF% were used as predictors for stepwise regression and support vector machine models. Results showed that subjective IMF% had a correlation of 0.81 with ether extract IMF%, while the image IMF% had a 0.66 correlation with ether extract IMF%. Accuracy rates for regression models were 0.63 for stepwise and 0.75 for support vector machine. Although subjective IMF% was the better predictor, results from the computer vision system demonstrate its potential as a tool for predicting pork IMF% in the future. Copyright © 2018 Elsevier Ltd. All rights reserved.
Lu, Chi-Jie; Chang, Chi-Chang
2014-01-01
Sales forecasting plays an important role in operating a business since it can be used to determine the required inventory level to meet consumer demand and avoid the problem of under/overstocking. Improving the accuracy of sales forecasting has become an important issue of operating a business. This study proposes a hybrid sales forecasting scheme by combining independent component analysis (ICA) with K-means clustering and support vector regression (SVR). The proposed scheme first uses the ICA to extract hidden information from the observed sales data. The extracted features are then applied to K-means algorithm for clustering the sales data into several disjoined clusters. Finally, the SVR forecasting models are applied to each group to generate final forecasting results. Experimental results from information technology (IT) product agent sales data reveal that the proposed sales forecasting scheme outperforms the three comparison models and hence provides an efficient alternative for sales forecasting. PMID:25045738
Liu, Xiaoyan; Li, Feng; Ding, Yongsheng; Zou, Ting; Wang, Lu; Hao, Kuangrong
2015-01-01
A hierarchical support vector regression (SVR) model (HSVRM) was employed to correlate the compositions and mechanical properties of bicomponent stents composed of poly(lactic-co-glycolic acid) (PGLA) film and poly(glycolic acid) (PGA) fibers for urethral repair for the first time. PGLA film and PGA fibers could provide ureteral stents with good compressive and tensile properties, respectively. In bicomponent stents, high film content led to high stiffness, while high fiber content resulted in poor compressional properties. To simplify the procedures to optimize the ratio of PGLA film and PGA fiber in the stents, a hierarchical support vector regression model (HSVRM) and particle swarm optimization (PSO) algorithm were used to construct relationships between the film-to-fiber weight ratio and the measured compressional/tensile properties of the stents. The experimental data and simulated data fit well, proving that the HSVRM could closely reflect the relationship between the component ratio and performance properties of the ureteral stents. PMID:28793658
ERIC Educational Resources Information Center
Hollingsworth, Holly H.
This study shows that the test statistic for Analysis of Covariance (ANCOVA) has a noncentral F-distribution with noncentrality parameter equal to zero if and only if the regression planes are homogeneous and/or the vector of overall covariate means is the null vector. The effect of heterogeneous regression slope parameters is to either increase…
Huang, Mengmeng; Wei, Yan; Wang, Jun; Zhang, Yu
2016-01-01
We used the support vector regression (SVR) approach to predict and unravel the reduction/promotion effects of characteristic flavonoids on acrylamide formation in a low-moisture Maillard reaction system. Results demonstrated the reduction/promotion effects of flavonoids at addition levels of 1–10000 μmol/L. The maximal inhibition rates (51.7%, 68.8% and 26.1%) and promotion rates (57.7%, 178.8% and 27.5%) caused by flavones, flavonols and isoflavones were observed at addition levels of 100 μmol/L and 10000 μmol/L, respectively. The reduction/promotion effects were closely related to the change of trolox equivalent antioxidant capacity (ΔTEAC) and well predicted by triple ΔTEAC measurements via SVR models (R: 0.633–0.900). Flavonols exhibit stronger effects on acrylamide formation than flavones and isoflavones as well as their O-glycoside derivatives, which may be attributed to the number and position of phenolic and 3-enolic hydroxyls. The reduction/promotion effects were well predicted by using optimized quantitative structure-activity relationship (QSAR) descriptors and SVR models (R: 0.926–0.994). Compared to artificial neural network and multi-linear regression models, SVR models exhibited better fitting performance for both TEAC-dependent and QSAR descriptor-dependent predicting work. These observations demonstrated that the SVR models are competent for predicting our understanding on the future use of natural antioxidants for decreasing the acrylamide formation. PMID:27586851
NASA Astrophysics Data System (ADS)
Huang, Mengmeng; Wei, Yan; Wang, Jun; Zhang, Yu
2016-09-01
We used the support vector regression (SVR) approach to predict and unravel the reduction/promotion effects of characteristic flavonoids on acrylamide formation in a low-moisture Maillard reaction system. Results demonstrated the reduction/promotion effects of flavonoids at addition levels of 1-10000 μmol/L. The maximal inhibition rates (51.7%, 68.8% and 26.1%) and promotion rates (57.7%, 178.8% and 27.5%) caused by flavones, flavonols and isoflavones were observed at addition levels of 100 μmol/L and 10000 μmol/L, respectively. The reduction/promotion effects were closely related to the change of trolox equivalent antioxidant capacity (ΔTEAC) and well predicted by triple ΔTEAC measurements via SVR models (R: 0.633-0.900). Flavonols exhibit stronger effects on acrylamide formation than flavones and isoflavones as well as their O-glycoside derivatives, which may be attributed to the number and position of phenolic and 3-enolic hydroxyls. The reduction/promotion effects were well predicted by using optimized quantitative structure-activity relationship (QSAR) descriptors and SVR models (R: 0.926-0.994). Compared to artificial neural network and multi-linear regression models, SVR models exhibited better fitting performance for both TEAC-dependent and QSAR descriptor-dependent predicting work. These observations demonstrated that the SVR models are competent for predicting our understanding on the future use of natural antioxidants for decreasing the acrylamide formation.
NASA Astrophysics Data System (ADS)
Wang, Hongjin; Hsieh, Sheng-Jen; Peng, Bo; Zhou, Xunfei
2016-07-01
A method that requires no knowledge of the thermal properties of coatings or substrates would be of interest for industrial applications. Supervised machine learning regression may provide a possible solution to the problem. This paper compares the performance of two regression models (artificial neural networks (ANN) and support vector machines for regression (SVM)) with respect to coating thickness estimations made based on surface temperature increments collected via time-resolved thermography. We describe the role of SVM in coating thickness prediction. Non-dimensional analyses are conducted to illustrate the effects of coating thickness and various other factors on surface temperature increments. It is theoretically possible to correlate coating thickness with surface temperature increment. Based on the analyses, the laser power is selected such that, during heating, the temperature increment is high enough to determine the coating thickness variance but low enough to avoid surface melting. Sixty-one paint-coated samples with coating thicknesses varying from 63.5 μm to 571 μm are used to train the models. Hyper-parameters of the models are optimized by 10-fold cross-validation. Another 28 sets of data are then collected to test the performance of the three methods. The study shows that SVM can provide reliable predictions of unknown data, due to its deterministic characteristics, and it works well when used for a small input data group. The SVM model generates more accurate coating thickness estimates than the ANN model.
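The fold construction behind a 10-fold cross-validation over 61 training samples can be sketched as below. This is an illustrative partition only: the function name `k_fold_indices` is an assumption, no shuffling is applied (in practice the samples would usually be shuffled first), and the model fitting step is omitted.

```python
def k_fold_indices(n_samples, n_folds=10):
    """Split range(n_samples) into (train, validation) index lists."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    folds = []
    start = 0
    for i in range(n_folds):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if i < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = [j for j in range(n_samples) if j < start or j >= stop]
        folds.append((train, validation))
        start = stop
    return folds

folds = k_fold_indices(61, n_folds=10)
print([len(validation) for _, validation in folds])
```

Each candidate hyper-parameter setting would be scored on every validation fold and the averages compared; with 61 samples, one fold holds seven samples and the other nine hold six.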
High dimensional linear regression models under long memory dependence and measurement error
NASA Astrophysics Data System (ADS)
Kaul, Abhishek
This dissertation consists of three chapters. The first chapter introduces the models under consideration and motivates problems of interest. A brief literature review is also provided in this chapter. The second chapter investigates the properties of Lasso under long range dependent model errors. Lasso is a computationally efficient approach to model selection and estimation, and its properties are well studied when the regression errors are independent and identically distributed. We study the case where the regression errors form a long memory moving average process. We establish a finite sample oracle inequality for the Lasso solution. We then show the asymptotic sign consistency in this setup. These results are established in the high dimensional setup (p > n) where p can be increasing exponentially with n. Finally, we show the consistency, n^(1/2-d)-consistency of Lasso, along with the oracle property of adaptive Lasso, in the case where p is fixed. Here d is the memory parameter of the stationary error sequence. The performance of Lasso is also analysed in the present setup with a simulation study. The third chapter proposes and investigates the properties of a penalized quantile based estimator for measurement error models. Standard formulations of prediction problems in high dimension regression models assume the availability of fully observed covariates and sub-Gaussian and homogeneous model errors. This makes these methods inapplicable to measurement errors models where covariates are unobservable and observations are possibly non sub-Gaussian and heterogeneous. We propose weighted penalized corrected quantile estimators for the regression parameter vector in linear regression models with additive measurement errors, where unobservable covariates are nonrandom. The proposed estimators forgo the need for the above mentioned model assumptions. 
We study these estimators in both the fixed dimension and high dimensional sparse setups, in the latter setup, the dimensionality can grow exponentially with the sample size. In the fixed dimensional setting we provide the oracle properties associated with the proposed estimators. In the high dimensional setting, we provide bounds for the statistical error associated with the estimation, that hold with asymptotic probability 1, thereby providing the ℓ1-consistency of the proposed estimator. We also establish the model selection consistency in terms of the correctly estimated zero components of the parameter vector. A simulation study that investigates the finite sample accuracy of the proposed estimator is also included in this chapter.
NASA Astrophysics Data System (ADS)
Tang, Jie; Liu, Rong; Zhang, Yue-Li; Liu, Mou-Ze; Hu, Yong-Fang; Shao, Ming-Jie; Zhu, Li-Jun; Xin, Hua-Wen; Feng, Gui-Wen; Shang, Wen-Jun; Meng, Xiang-Guang; Zhang, Li-Rong; Ming, Ying-Zi; Zhang, Wei
2017-02-01
Tacrolimus has a narrow therapeutic window and considerable variability in clinical use. Our goal was to compare the performance of multiple linear regression (MLR) and eight machine learning techniques in pharmacogenetic algorithm-based prediction of tacrolimus stable dose (TSD) in a large Chinese cohort. A total of 1,045 renal transplant patients were recruited, 80% of whom were randomly selected as the “derivation cohort” to develop the dose-prediction algorithm, while the remaining 20% constituted the “validation cohort” to test the final selected algorithm. MLR, artificial neural network (ANN), regression tree (RT), multivariate adaptive regression splines (MARS), boosted regression tree (BRT), support vector regression (SVR), random forest regression (RFR), lasso regression (LAR) and Bayesian additive regression trees (BART) were applied and their performances were compared in this work. Among all the machine learning models, RT performed best in both derivation [0.71 (0.67-0.76)] and validation cohorts [0.73 (0.63-0.82)]. In addition, the ideal rate of RT was 4% higher than that of MLR. To our knowledge, this is the first study to use machine learning models to predict TSD, which will further facilitate personalized medicine in tacrolimus administration in the future.
Random forest models to predict aqueous solubility.
Palmer, David S; O'Boyle, Noel M; Glen, Robert C; Mitchell, John B O
2007-01-01
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets are compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
Nmor, Jephtha C; Sunahara, Toshihiko; Goto, Kensuke; Futami, Kyoko; Sonye, George; Akweywa, Peter; Dida, Gabriel; Minakawa, Noboru
2013-01-16
Identification of malaria vector breeding sites can enhance control activities. Although associations between malaria vector breeding sites and topography are well recognized, practical models that predict breeding sites from topographic information are lacking. We used topographic variables derived from remotely sensed Digital Elevation Models (DEMs) to model the breeding sites of malaria vectors. We further compared the predictive strength of two different DEMs and evaluated the predictability of various habitat types inhabited by Anopheles larvae. Using GIS techniques, topographic variables were extracted from two DEMs: 1) Shuttle Radar Topography Mission 3 (SRTM3, 90-m resolution) and 2) the Advanced Spaceborne Thermal Emission Reflection Radiometer Global DEM (ASTER, 30-m resolution). We used data on breeding sites from an extensive field survey conducted on an island in western Kenya in 2006. Topographic variables were extracted for 826 breeding sites and for 4520 negative points that were randomly assigned. Logistic regression modelling was applied to characterize topographic features of the malaria vector breeding sites and predict their locations. Model accuracy was evaluated using the area under the receiver operating characteristics curve (AUC). All topographic variables derived from both DEMs were significantly correlated with breeding habitats except for the aspect of SRTM. The magnitude and direction of correlation for each variable were similar in the two DEMs. Multivariate models for SRTM and ASTER showed similar levels of fit as indicated by the Akaike information criterion (3959.3 and 3972.7, respectively), though the former was slightly better than the latter. The accuracy of prediction indicated by AUC was also similar for SRTM (0.758) and ASTER (0.755) in the training site. Both SRTM and ASTER models showed higher AUC in the testing site than in the training site (0.829 and 0.799, respectively). 
The predictability of habitat types varied. Drains, foot-prints, puddles and swamp habitat types were most predictable. Both SRTM and ASTER models had similar predictive potentials, which were sufficiently accurate to predict vector habitats. The free availability of these DEMs suggests that topographic predictive models could be widely used by vector control managers in Africa to complement malaria control strategies.
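The AUC used to score the breeding-site models can be computed directly from its rank interpretation: the probability that a randomly chosen positive point receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a plain illustration under that definition; the function name and the toy scores are assumptions, not data from the survey.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for a positive (e.g. breeding site), 0 for a negative point.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical predicted probabilities for two sites and two non-sites.
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported values of roughly 0.76 to 0.83 in context.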
Major vectors and vector-borne diseases in small ruminants in Ethiopia: A systematic review.
Asmare, Kassahun; Abayneh, Takele; Sibhat, Berhanu; Shiferaw, Dessie; Szonyi, Barbara; Krontveit, Randi I; Skjerve, Eystein; Wieland, Barbara
2017-06-01
Vector-borne diseases are among the major health constraints of small ruminants in Ethiopia. While various studies on single vector-borne diseases or presence of vectors have been conducted, no summarized evidence is available on the occurrence of these diseases and the related vectors. This systematic literature review provides a comprehensive summary on major vectors and vector-borne diseases in small ruminants in Ethiopia. Search for published and unpublished literature was conducted between 8th of January and 25th of June 2015. The search was both manual and electronic. The databases used in electronic search were PubMed, Web of Science, CAB Direct and AJOL. For most of the vector-borne diseases, the summary was limited to narrative synthesis due to lack of sufficient data. Meta-analysis was computed for trypanosomosis and dermatophilosis, while meta-regression and sensitivity analysis were done only for trypanosomosis due to lack of sufficient reports on dermatophilosis. Given their role as vectors, ticks and flies were summarized narratively at genera/species level. In line with inclusion criteria, out of 106 initially identified research reports 43 peer-reviewed articles passed the quality assessment. Data on 7 vector-borne diseases were extracted at species and region level from each source. Accordingly, the pooled prevalence estimate of trypanosomosis was 3.7% (95% confidence interval (CI): 2.8, 4.9), while that of dermatophilosis was 3.1% (95% CI: 1.6, 6.0). The in-between study variance noted for trypanosomosis was statistically significant (p<0.05). Among the three covariates considered for meta-regression, only one (species) fitted the final model significantly (p<0.05) and explained 65.44% of the between studies variance (R²). The prevalence in sheep (5.5%) increased nearly by 34% compared to goats (2.9%). The parasitic presence in blood was documented for babesiosis (3.7% in goats); and anaplasmosis (3.9% in sheep). 
Serological evidence was retrieved for bluetongue ranging from 34.1% to 46.67% in sheep, and coxiellosis was 10.4% in goats. There was also molecular evidence on the presence of theileriosis in sheep (93%, n=160) and goats (1.9%, n=265). Regarding vectors of veterinary importance, 14 species of ticks in five genera, four species of Glossina and 4 genera of biting flies were reported. Despite the evidence on presence of various vectors including ticks, flies, mosquitoes and midges, studies on vector-borne diseases in Ethiopia are surprisingly rare, especially considering risks related to climate change, which is likely to affect distribution of vectors. Thus better evidence on the current situation is urgently needed in order to prevent spread and to model future distribution scenarios. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Baydaroğlu, Özlem; Koçak, Kasım; Duran, Kemal
2018-06-01
Prediction of water amount that will enter the reservoirs in the following month is of vital importance especially for semi-arid countries like Turkey. Climate projections emphasize that water scarcity will be one of the serious problems in the future. This study presents a methodology for predicting river flow for the subsequent month based on the time series of observed monthly river flow with hybrid models of support vector regression (SVR). Monthly river flow over the period 1940-2012 observed for the Kızılırmak River in Turkey has been used for training the method, which then has been applied for predictions over a period of 3 years. SVR is a specific implementation of support vector machines (SVMs), which transforms the observed input data time series into a high-dimensional feature space (input matrix) by way of a kernel function and performs a linear regression in this space. SVR requires a special input matrix. The input matrix was produced by wavelet transforms (WT), singular spectrum analysis (SSA), and a chaotic approach (CA) applied to the input time series. WT convolutes the original time series into a series of wavelets, and SSA decomposes the time series into a trend, an oscillatory and a noise component by singular value decomposition. CA uses a phase space formed by trajectories, which represent the dynamics producing the time series. These three methods for producing the input matrix for the SVR proved successful, while the SVR-WT combination resulted in the highest coefficient of determination and the lowest mean absolute error.
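Building an input matrix for one-month-ahead prediction from a single flow series can be sketched with a time-delay embedding, the construction underlying the phase-space (chaotic) approach. This is a minimal sketch under assumptions: the function name, the embedding dimension, and the delay are illustrative choices, and the wavelet and SSA variants as well as the SVR fit itself are not shown.

```python
def delay_embed(series, dimension=3, delay=1):
    """Return (rows, targets) for one-step-ahead prediction.

    Each row holds `dimension` past values spaced `delay` steps apart,
    ordered oldest to newest; the target is the value one step later.
    """
    if dimension < 1 or delay < 1:
        raise ValueError("dimension and delay must be positive")
    span = (dimension - 1) * delay
    if len(series) < span + 2:
        raise ValueError("series too short for this embedding")
    rows, targets = [], []
    for t in range(span, len(series) - 1):
        row = [series[t - k * delay] for k in range(dimension - 1, -1, -1)]
        rows.append(row)
        targets.append(series[t + 1])
    return rows, targets

# Hypothetical monthly flows (illustrative values only).
flows = [1, 2, 3, 4, 5, 6]
rows, targets = delay_embed(flows, dimension=3, delay=1)
print(rows, targets)
```

The rows would serve as the regression inputs and the targets as the following month's flow; any regression model, SVR included, can then be trained on these pairs.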
SNPs selection using support vector regression and genetic algorithms in GWAS
2014-01-01
Introduction This paper proposes a new methodology to simultaneously select the most relevant SNPs markers for the characterization of any measurable phenotype described by a continuous variable using Support Vector Regression with Pearson Universal kernel as fitness function of a binary genetic algorithm. The proposed methodology is multi-attribute towards considering several markers simultaneously to explain the phenotype and is based jointly on statistical tools, machine learning and computational intelligence. Results The suggested method has shown potential in the simulated database 1, with additive effects only, and real database. In this simulated database, with a total of 1,000 markers, and 7 with major effect on the phenotype and the other 993 SNPs representing the noise, the method identified 21 markers. Of this total, 5 are relevant SNPs between the 7 but 16 are false positives. In real database, initially with 50,752 SNPs, we have reduced to 3,073 markers, increasing the accuracy of the model. In the simulated database 2, with additive effects and interactions (epistasis), the proposed method matched to the methodology most commonly used in GWAS. Conclusions The method suggested in this paper demonstrates the effectiveness in explaining the real phenotype (PTA for milk), because with the application of the wrapper based on genetic algorithm and Support Vector Regression with Pearson Universal, many redundant markers were eliminated, increasing the prediction and accuracy of the model on the real database without quality control filters. The PUK demonstrated that it can replicate the performance of linear and RBF kernels. PMID:25573332
Kwon, Deukwoo; Hoffman, F Owen; Moroz, Brian E; Simon, Steven L
2016-02-10
Most conventional risk analysis methods rely on a single best estimate of exposure per person, which does not allow for adjustment for exposure-related uncertainty. Here, we propose a Bayesian model averaging method to properly quantify the relationship between radiation dose and disease outcomes by accounting for shared and unshared uncertainty in estimated dose. Our Bayesian risk analysis method utilizes multiple realizations of sets (vectors) of doses generated by a two-dimensional Monte Carlo simulation method that properly separates shared and unshared errors in dose estimation. The exposure model used in this work is taken from a study of the risk of thyroid nodules among a cohort of 2376 subjects who were exposed to fallout from nuclear testing in Kazakhstan. We assessed the performance of our method through an extensive series of simulations and comparisons against conventional regression risk analysis methods. When the estimated doses contain relatively small amounts of uncertainty, the Bayesian method using multiple a priori plausible draws of dose vectors gave similar results to the conventional regression-based methods of dose-response analysis. However, when large and complex mixtures of shared and unshared uncertainties are present, the Bayesian method using multiple dose vectors had significantly lower relative bias than conventional regression-based risk analysis methods and better coverage, that is, a markedly increased capability to include the true risk coefficient within the 95% credible interval of the Bayesian-based risk estimate. An evaluation of the dose-response using our method is presented for an epidemiological study of thyroid disease following radiation exposure. Copyright © 2015 John Wiley & Sons, Ltd.
Changes in Black-legged Tick Population in New England with Future Climate Change
NASA Astrophysics Data System (ADS)
Krishnan, S.; Huber, M.
2015-12-01
Lyme disease is one of the most frequently reported vector-borne diseases in the United States. In the Northeastern United States, vector transmission is maintained in a horizontal transmission cycle between the vector, the black-legged tick, and the vertebrate reservoir hosts, which include white-tailed deer, rodents and other medium to large sized mammals. Predicting how vector populations change with future climate change is critical to understanding disease spread in the future, and for developing suitable regional adaptation strategies. For the United States, these predictions have mostly been made using regressions based on field and lab studies, or using spatial suitability studies. However, the relation between tick populations at various life-cycle stages and climate variables is complex, necessitating a mechanistic approach. In this study, we present a framework for driving a mechanistic tick population model with high-resolution regional climate modeling projections. The goal is to estimate changes in black-legged tick populations in New England for the 21st century. The tick population model used is based on the mechanistic approach of Ogden et al. (2005) developed for Canada. Dynamically downscaled climate projections at a 3-km resolution using the Weather Research and Forecasting Model (WRF) are used to drive the tick population model.
Regression-assisted deconvolution.
McIntyre, Julie; Stefanski, Leonard A
2011-06-30
We present a semi-parametric deconvolution estimator for the density function of a random variable X that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean-variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length. Copyright © 2011 John Wiley & Sons, Ltd.
A Fast Vector Radiative Transfer Model for Atmospheric and Oceanic Remote Sensing
NASA Astrophysics Data System (ADS)
Ding, J.; Yang, P.; King, M. D.; Platnick, S. E.; Meyer, K.
2017-12-01
A fast vector radiative transfer model is developed in support of atmospheric and oceanic remote sensing. This model is capable of simulating the Stokes vector observed at the top of the atmosphere (TOA) and the terrestrial surface by considering absorption, scattering, and emission. The gas absorption is parameterized in terms of atmospheric gas concentrations, temperature, and pressure. The parameterization scheme combines a regression method and the correlated-K distribution method, and can easily integrate with multiple scattering computations. The approach is more than four orders of magnitude faster than a line-by-line radiative transfer model with errors less than 0.5% in terms of transmissivity. A two-component approach is utilized to solve the vector radiative transfer equation (VRTE). The VRTE solver separates the phase matrices of aerosol and cloud into forward and diffuse parts and thus the solution is also separated. The forward solution can be expressed by a semi-analytical equation based on the small-angle approximation, and serves as the source of the diffuse part. The diffuse part is solved by the adding-doubling method. The adding-doubling implementation is computationally efficient because the diffuse component needs far fewer spherical function expansion terms. The simulated Stokes vectors at both the TOA and the surface have accuracy comparable to that of their counterparts based on numerically rigorous methods.
van der Ploeg, Tjeerd; Nieboer, Daan; Steyerberg, Ewout W
2016-10-01
Prediction of medical outcomes may potentially benefit from using modern statistical modeling techniques. We aimed to externally validate modeling strategies for prediction of 6-month mortality of patients suffering from traumatic brain injury (TBI) with predictor sets of increasing complexity. We analyzed individual patient data from 15 different studies including 11,026 TBI patients. We consecutively considered a core set of predictors (age, motor score, and pupillary reactivity), an extended set with computed tomography scan characteristics, and a further extension with two laboratory measurements (glucose and hemoglobin). With each of these sets, we predicted 6-month mortality using default settings with five statistical modeling techniques: logistic regression (LR), classification and regression trees, random forests (RFs), support vector machines (SVM) and neural nets. For external validation, a model developed on one of the 15 data sets was applied to each of the 14 remaining sets. This process was repeated 15 times for a total of 630 validations. The area under the receiver operating characteristic curve (AUC) was used to assess the discriminative ability of the models. For the most complex predictor set, the LR models performed best (median validated AUC value, 0.757), followed by RF and support vector machine models (median validated AUC value, 0.735 and 0.732, respectively). With each predictor set, the classification and regression trees models showed poor performance (median validated AUC value, <0.7). The variability in performance across the studies was smallest for the RF- and LR-based models (inter quartile range for validated AUC values from 0.07 to 0.10). In the area of predicting mortality from TBI, nonlinear and nonadditive effects are not pronounced enough to make modern prediction methods beneficial. Copyright © 2016 Elsevier Inc. All rights reserved.
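The cross-study validation scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the study labels and predictor-set names are hypothetical, and reading the total of 630 as 15 × 14 development/validation pairs times three predictor sets is an assumption consistent with the numbers in the abstract.

```python
from itertools import permutations

def external_validation_pairs(study_ids):
    """Return every ordered (development, validation) pair of distinct studies."""
    return list(permutations(study_ids, 2))

# Hypothetical labels: the abstract only states that there were 15 studies.
studies = [f"study_{i:02d}" for i in range(1, 16)]
# Assumed names for the three predictor sets of increasing complexity.
predictor_sets = ["core", "extended", "laboratory"]

pairs = external_validation_pairs(studies)
n_per_set = len(pairs)                     # 15 development sets x 14 validation sets
n_total = n_per_set * len(predictor_sets)  # repeated for each predictor set
print(n_per_set, n_total)
```

Each pair would correspond to fitting a model on the first study and computing the AUC on the second.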
Srimath-Tirumula-Peddinti, Ravi Chandra Pavan Kumar; Neelapu, Nageswara Rao Reddy; Sidagam, Naresh
2015-01-01
Background: Malarial incidence, severity, dynamics and distribution of malaria are strongly determined by climatic factors, i.e., temperature, precipitation, and relative humidity. The objectives of the current study were to analyse and model the relationships among climate, vector and malaria disease in district of Visakhapatnam, India to understand malaria transmission mechanism (MTM). Methodology: Epidemiological, vector and climate data were analysed for the years 2005 to 2011 in Visakhapatnam to understand the magnitude, trends and seasonal patterns of the malarial disease. Statistical software MINITAB ver. 14 was used for performing correlation, linear and multiple regression analysis. Results/Findings: Perennial malaria disease incidence and mosquito population was observed in the district of Visakhapatnam with peaks in seasons. All the climatic variables have a significant influence on disease incidence as well as on mosquito populations. Correlation coefficient analysis, seasonal index and seasonal analysis demonstrated significant relationships among climatic factors, mosquito population and malaria disease incidence in the district of Visakhapatnam, India. Multiple regression and ARIMA (I) models are best suited models for modeling and prediction of disease incidences and mosquito population. Predicted values of average temperature, mosquito population and malarial cases increased along with the year. Developed MTM algorithm observed a major MTM cycle following the June to August rains and occurring between June to September and minor MTM cycles following March to April rains and occurring between March to April in the district of Visakhapatnam. Fluctuations in climatic factors favored an increase in mosquito populations and thereby increasing the number of malarial cases. Rainfall, temperatures (20°C to 33°C) and humidity (66% to 81%) maintained a warmer, wetter climate for mosquito growth, parasite development and malaria transmission.
Conclusions/Significance: Changes in climatic factors influence malaria directly by modifying the behaviour and geographical distribution of vectors and by changing the length of the life cycle of the parasite. PMID:26110279
Baba, Hiromi; Takahara, Jun-ichi; Yamashita, Fumiyoshi; Hashida, Mitsuru
2015-11-01
The solvent effect on skin permeability is important for assessing the effectiveness and toxicological risk of new dermatological formulations in pharmaceuticals and cosmetics development. The solvent effect occurs by diverse mechanisms, which could be elucidated by efficient and reliable prediction models. However, such prediction models have been hampered by the small variety of permeants and mixture components archived in databases and by low predictive performance. Here, we propose a solution to both problems. We first compiled a novel large database of 412 samples from 261 structurally diverse permeants and 31 solvents reported in the literature. The data were carefully screened to ensure their collection under consistent experimental conditions. To construct a high-performance predictive model, we then applied support vector regression (SVR) and random forest (RF) with greedy stepwise descriptor selection to our database. The models were internally and externally validated. The SVR achieved higher performance statistics than RF. The (externally validated) determination coefficient, root mean square error, and mean absolute error of SVR were 0.899, 0.351, and 0.268, respectively. Moreover, because all descriptors are fully computational, our method can predict as-yet unsynthesized compounds. Our high-performance prediction model offers an attractive alternative to permeability experiments for pharmaceutical and cosmetic candidate screening and optimizing skin-permeable topical formulations.
Optimization of fixture layouts of glass laser optics using multiple kernel regression.
Su, Jianhua; Cao, Enhua; Qiao, Hong
2014-05-10
We aim to build an integrated fixturing model to describe the structural properties and thermal properties of the support frame of glass laser optics. Therefore, (a) a near global optimal set of clamps can be computed to minimize the surface shape error of the glass laser optic based on the proposed model, and (b) a desired surface shape error can be obtained by adjusting the clamping forces under various environmental temperatures based on the model. To construct the model, we develop a new multiple kernel learning method and call it multiple kernel support vector functional regression. The proposed method uses two layer regressions to group and order the data sources by the weights of the kernels and the factors of the layers. Because of that, the influences of the clamps and the temperature can be evaluated by grouping them into different layers.
NASA Astrophysics Data System (ADS)
Zhai, Mengting; Chen, Yan; Li, Jing; Zhou, Jun
2017-12-01
The molecular electronegativity distance vector (MEDV-13) was used to describe the molecular structure of benzyl ether diamidine derivatives. Based on MEDV-13, a three-parameter (M_3, M_15, M_47) QSAR model of insecticidal activity (pIC50) for 60 benzyl ether diamidine derivatives was constructed by leaps-and-bounds regression (LBR). The traditional correlation coefficient (R) and the cross-validation correlation coefficient (R_CV) were 0.975 and 0.971, respectively. The robustness of the regression model was validated by the jackknife method, with correlation coefficients R between 0.971 and 0.983. The independent variables in the model were also tested and showed no autocorrelation. The regression results indicate that the model has good robustness and predictive capability. The research may provide theoretical guidance for the development of a new generation of anti-African-trypanosomiasis drugs with high efficiency and low toxicity.
Wang, Lai; Tian, Fang; Arias, Ana; Yang, Mingjie; Sharifi, Behrooz G; Shah, Prediman K
2016-05-01
Apolipoprotein A-I (Apo A-I) Milano, a naturally occurring Arg173 to Cys mutant of Apo A-I, has been shown to reduce atherosclerosis in animal models and in a small phase 2 human trial. We have shown the superior atheroprotective effects of Apo A-I Milano (Apo A-IM) gene compared to wild-type Apo A-I gene using transplantation of retrovirally transduced bone marrow in Apo A-I/Apo E null mice. In this study, we compared the effect of dietary lipid lowering versus lipid lowering plus Apo A-IM gene transfer using recombinant adeno-associated virus (rAAV) 8 as vectors on atherosclerosis regression in Apo A-I/Apo E null mice. All mice were fed a high-cholesterol diet from age of 6 weeks until week 20, and at 20 weeks, 10 mice were euthanized to determine the extent of atherosclerosis. After 20 weeks, an additional 20 mice were placed on either a low-cholesterol diet plus empty rAAV (n = 10) to serve as controls or low-cholesterol diet plus 1 single intravenous injection of 1.2 × 10^12 vector genomes of adeno-associated virus (AAV) 8 vectors expressing Apo A-IM (n = 10). At the 40 week time point, intravenous AAV8 Apo A-IM recipients showed a significant regression of atherosclerosis in the whole aorta (P < .01), aortic sinuses (P < .05), and brachiocephalic arteries (P < .05) compared to 20-week-old mice, whereas low-cholesterol diet plus empty vector control group showed no significant regression in lesion size. Immunostaining showed that compared to the 20-week-old mice, there was a significantly reduced macrophage content in the brachiocephalic (P < .05) and aortic sinus plaques (P < .05) of AAV8 Apo A-IM recipients. These data show that although dietary-mediated cholesterol lowering halts progression of atherosclerosis, it does not induce regression, whereas combination of low-cholesterol diet and AAV8 mediated Apo A-I Milano gene therapy induces rapid and significant regression of atherosclerosis in mice. 
These data provide support for the potential feasibility of this approach for atherosclerosis regression. © The Author(s) 2015.
Sun, Lili; Zhou, Liping; Yu, Yu; Lan, Yukun; Li, Zhiliang
2007-01-01
Polychlorinated diphenyl ethers (PCDEs) have received more and more concerns as a group of ubiquitous potential persistent organic pollutants (POPs). By using molecular electronegativity distance vector (MEDV-4), multiple linear regression (MLR) models are developed for sub-cooled liquid vapor pressures (P(L)), n-octanol/water partition coefficients (K(OW)) and sub-cooled liquid water solubilities (S(W,L)) of 209 PCDEs and diphenyl ether. The correlation coefficients (R) and the leave-one-out cross-validation (LOO) correlation coefficients (R(CV)) of all the 6-descriptor models for logP(L), logK(OW) and logS(W,L) are more than 0.98. By using stepwise multiple regression (SMR), the descriptors are selected and the resulting models are 5-descriptor model for logP(L), 4-descriptor model for logK(OW), and 6-descriptor model for logS(W,L), respectively. All these models exhibit excellent estimate capabilities for internal sample set and good predictive capabilities for external samples set. The consistency between observed and estimated/predicted values for logP(L) is the best (R=0.996, R(CV)=0.996), followed by logK(OW) (R=0.992, R(CV)=0.992) and logS(W,L) (R=0.983, R(CV)=0.980). By using MEDV-4 descriptors, the QSPR models can be used for prediction and the model predictions can hence extend the current database of experimental values.
NASA Astrophysics Data System (ADS)
Naguib, Ibrahim A.; Darwish, Hany W.
2012-02-01
A comparison between support vector regression (SVR) and Artificial Neural Networks (ANNs) multivariate regression methods is established showing the underlying algorithm for each and making a comparison between them to indicate the inherent advantages and limitations. In this paper we compare SVR to ANN with and without variable selection procedure (genetic algorithm (GA)). To project the comparison in a sensible way, the methods are used for the stability indicating quantitative analysis of mixtures of mebeverine hydrochloride and sulpiride in binary mixtures as a case study in presence of their reported impurities and degradation products (summing up to 6 components) in raw materials and pharmaceutical dosage form via handling the UV spectral data. For proper analysis, a 6 factor 5 level experimental design was established resulting in a training set of 25 mixtures containing different ratios of the interfering species. An independent test set consisting of 5 mixtures was used to validate the prediction ability of the suggested models. The proposed methods (linear SVR (without GA) and linear GA-ANN) were successfully applied to the analysis of pharmaceutical tablets containing mebeverine hydrochloride and sulpiride mixtures. The results manifest the problem of nonlinearity and how models like the SVR and ANN can handle it. The methods indicate the ability of the mentioned multivariate calibration models to deconvolute the highly overlapped UV spectra of the 6 components' mixtures, yet using cheap and easy to handle instruments like the UV spectrophotometer.
Haptic exploration of fingertip-sized geometric features using a multimodal tactile sensor
NASA Astrophysics Data System (ADS)
Ponce Wong, Ruben D.; Hellman, Randall B.; Santos, Veronica J.
2014-06-01
Haptic perception remains a grand challenge for artificial hands. Dexterous manipulators could be enhanced by "haptic intelligence" that enables identification of objects and their features via touch alone. Haptic perception of local shape would be useful when vision is obstructed or when proprioceptive feedback is inadequate, as observed in this study. In this work, a robot hand outfitted with a deformable, bladder-type, multimodal tactile sensor was used to replay four human-inspired haptic "exploratory procedures" on fingertip-sized geometric features. The geometric features varied by type (bump, pit), curvature (planar, conical, spherical), and footprint dimension (1.25 - 20 mm). Tactile signals generated by active fingertip motions were used to extract key parameters for use as inputs to supervised learning models. A support vector classifier estimated order of curvature while support vector regression models estimated footprint dimension once curvature had been estimated. A distal-proximal stroke (along the long axis of the finger) enabled estimation of order of curvature with an accuracy of 97%. Best-performing, curvature-specific, support vector regression models yielded R2 values of at least 0.95. While a radial-ulnar stroke (along the short axis of the finger) was most helpful for estimating feature type and size for planar features, a rolling motion was most helpful for conical and spherical features. The ability to haptically perceive local shape could be used to advance robot autonomy and provide haptic feedback to human teleoperators of devices ranging from bomb defusal robots to neuroprostheses.
Liu, Bing-Chun; Binaykia, Arihant; Chang, Pei-Chann; Tiwari, Manoj Kumar; Tsao, Cheng-Chin
2017-01-01
Today, China faces a very serious air pollution problem because of its severe impact on human health and the environment. Chinese cities are the most affected owing to their rapid industrial and economic growth. It is therefore extremely important to develop new, better and more reliable forecasting models to accurately predict air quality. This paper selects three cities from the Jingjinji Region, Beijing, Tianjin and Shijiazhuang, to develop a new collaborative forecasting model using Support Vector Regression (SVR) for urban Air Quality Index (AQI) prediction in China. The study aims to improve forecasting results by reducing the prediction error of present machine learning algorithms, taking multiple-city, multi-dimensional air quality information and weather conditions as input. The results show a decrease in MAPE for multiple-city multi-dimensional regression when there is strong interaction and correlation of the air quality characteristic attributes with AQI. Geographical location is also found to play a significant role in Beijing, Tianjin and Shijiazhuang AQI prediction. PMID:28708836
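The error measure named in this abstract, the mean absolute percentage error (MAPE), can be computed as in the sketch below. The function name and the sample values are illustrative assumptions; no AQI figures from the study are reproduced here.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

# Made-up AQI values purely for illustration.
observed_aqi = [100.0, 200.0]
forecast_aqi = [110.0, 180.0]
example_mape = mape(observed_aqi, forecast_aqi)
```

A lower MAPE indicates forecasts that are closer to the observed values in relative terms.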
Can Emotional and Behavioral Dysregulation in Youth Be Decoded from Functional Neuroimaging?
Portugal, Liana C L; Rosa, Maria João; Rao, Anil; Bebko, Genna; Bertocci, Michele A; Hinze, Amanda K; Bonar, Lisa; Almeida, Jorge R C; Perlman, Susan B; Versace, Amelia; Schirda, Claudiu; Travis, Michael; Gill, Mary Kay; Demeter, Christine; Diwadkar, Vaibhav A; Ciuffetelli, Gary; Rodriguez, Eric; Forbes, Erika E; Sunshine, Jeffrey L; Holland, Scott K; Kowatch, Robert A; Birmaher, Boris; Axelson, David; Horwitz, Sarah M; Arnold, Eugene L; Fristad, Mary A; Youngstrom, Eric A; Findling, Robert L; Pereira, Mirtes; Oliveira, Leticia; Phillips, Mary L; Mourao-Miranda, Janaina
2016-01-01
High comorbidity among pediatric disorders characterized by behavioral and emotional dysregulation poses problems for diagnosis and treatment, and suggests that these disorders may be better conceptualized as dimensions of abnormal behaviors. Furthermore, identifying neuroimaging biomarkers related to dimensional measures of behavior may provide targets to guide individualized treatment. We aimed to use functional neuroimaging and pattern regression techniques to determine whether patterns of brain activity could accurately decode individual-level severity on a dimensional scale measuring behavioural and emotional dysregulation at two different time points. A sample of fifty-seven youth (mean age: 14.5 years; 32 males) was selected from a multi-site study of youth with parent-reported behavioral and emotional dysregulation. Participants performed a block-design reward paradigm during functional Magnetic Resonance Imaging (fMRI). Pattern regression analyses consisted of Relevance Vector Regression (RVR) and two cross-validation strategies implemented in the Pattern Recognition for Neuroimaging toolbox (PRoNTo). Medication was treated as a binary confounding variable. Decoded and actual clinical scores were compared using Pearson's correlation coefficient (r) and mean squared error (MSE) to evaluate the models. Permutation test was applied to estimate significance levels. Relevance Vector Regression identified patterns of neural activity associated with symptoms of behavioral and emotional dysregulation at the initial study screen and close to the fMRI scanning session. The correlation and the mean squared error between actual and decoded symptoms were significant at the initial study screen and close to the fMRI scanning session. However, after controlling for potential medication effects, results remained significant only for decoding symptoms at the initial study screen. 
Neural regions with the highest contribution to the pattern regression model included cerebellum, sensory-motor and fronto-limbic areas. The combination of pattern regression models and neuroimaging can help to determine the severity of behavioral and emotional dysregulation in youth at different time points.
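The evaluation step described above (Pearson's r, MSE, and a permutation test) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the PRoNTo implementation: the function names are hypothetical, and the permutation test here only shuffles the actual scores against fixed decoded scores, whereas a full analysis would refit the regression model on each permuted target.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def permutation_p_value(actual, decoded, n_perm=1000, seed=0):
    """One-sided p-value: share of shuffles with correlation >= the observed one."""
    rng = random.Random(seed)
    observed = pearson_r(actual, decoded)
    shuffled = list(actual)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(shuffled, decoded) >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

The same shuffling loop could be run with `mse` as the statistic, counting permutations whose error is at most the observed error.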
Carvalho, Bruno M; Rangel, Elizabeth F; Ready, Paul D; Vale, Mariana M
2015-01-01
Vector borne diseases are susceptible to climate change because distributions and densities of many vectors are climate driven. The Amazon region is endemic for cutaneous leishmaniasis and is predicted to be severely impacted by climate change. Recent records suggest that the distributions of Lutzomyia (Nyssomyia) flaviscutellata and the parasite it transmits, Leishmania (Leishmania) amazonensis, are expanding southward, possibly due to climate change, and sometimes associated with new human infection cases. We define the vector's climatic niche and explore future projections under climate change scenarios. Vector occurrence records were compiled from the literature, museum collections and Brazilian Health Departments. Six bioclimatic variables were used as predictors in six ecological niche model algorithms (BIOCLIM, DOMAIN, MaxEnt, GARP, logistic regression and Random Forest). Projections for 2050 used 17 general circulation models in two greenhouse gas representative concentration pathways: "stabilization" and "high increase". Ensemble models and consensus maps were produced by overlapping binary predictions. Final model outputs showed good performance and significance. The use of species absence data substantially improved model performance. Currently, L. flaviscutellata is widely distributed in the Amazon region, with records in the Atlantic Forest and savannah regions of Central Brazil. Future projections indicate expansion of the climatically suitable area for the vector in both scenarios, towards higher latitudes and elevations. L. flaviscutellata is likely to find increasingly suitable conditions for its expansion into areas where human population size and density are much larger than they are in its current locations. 
If environmental conditions change as predicted, the range of the vector is likely to expand to southeastern and central-southern Brazil, eastern Paraguay and further into the Amazonian areas of Bolivia, Peru, Ecuador, Colombia and Venezuela. These areas will only become endemic for L. amazonensis, however, if they have competent reservoir hosts and transmission dynamics matching those in the Amazon region.
Notes on power of normality tests of error terms in regression models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Střelec, Luboš
2015-03-10
Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results from the usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed to allow exact inferences. Normally distributed stochastic errors are therefore necessary to avoid misleading inferences, which explains the necessity and importance of robust tests of normality. The aim of this contribution is to discuss normality testing of error terms in regression models. We introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.
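As a point of reference for the classical tests mentioned above, the sketch below computes the Jarque-Bera statistic on a set of regression residuals. This is a standard moment-based normality statistic chosen here for illustration; it is not the RT class introduced in the contribution, and the function name is an assumption.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).

    S is the sample skewness and K the sample kurtosis, both computed
    from central moments with divisor n. Larger values indicate a
    stronger departure from normality.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
```

Because the statistic depends on third and fourth moments, a single outlying residual can change it sharply, which is the sensitivity that robust tests of normality aim to control.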
Fatigue design of a cellular phone folder using regression model-based multi-objective optimization
NASA Astrophysics Data System (ADS)
Kim, Young Gyun; Lee, Jongsoo
2016-08-01
In a folding cellular phone, the folding device is repeatedly opened and closed by the user, which eventually results in fatigue damage, particularly to the front of the folder. Hence, it is important to improve the safety and endurance of the folder while also reducing its weight. This article presents an optimal design for the folder front that maximizes its fatigue endurance while minimizing its thickness. Design data for analysis and optimization were obtained experimentally using a test jig. Multi-objective optimization was carried out using a nonlinear regression model. Three regression methods were employed: back-propagation neural networks, logistic regression and support vector machines. The AdaBoost ensemble technique was also used to improve the approximation. Two-objective Pareto-optimal solutions were identified using the non-dominated sorting genetic algorithm (NSGA-II). Finally, a numerically optimized solution was validated against experimental product data, in terms of both fatigue endurance and thickness index.
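The notion of Pareto optimality for the two objectives above can be illustrated with a simple non-dominated filter. This sketch is not NSGA-II itself, which adds ranking, crowding distance and genetic operators on top of this dominance check; the candidate designs are made-up numbers, not data from the article.

```python
def dominates(a, b):
    """True if design a is no worse than b on both objectives and better on one.

    Each design is a (fatigue_endurance, thickness) tuple:
    endurance is maximized, thickness is minimized.
    """
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(o, d) for o in designs)]

# Hypothetical (endurance, thickness) candidates for illustration only.
candidates = [(100, 1.0), (150, 1.2), (90, 1.1), (150, 1.5), (80, 0.9)]
front = pareto_front(candidates)
```

Here the design (90, 1.1) is dropped because (100, 1.0) is both more durable and thinner, and (150, 1.5) is dropped because (150, 1.2) matches its endurance at lower thickness.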
NASA Astrophysics Data System (ADS)
Lee, Jung-Hyun; Sameen, Maher Ibrahim; Pradhan, Biswajeet; Park, Hyuck-Jin
2018-02-01
This study evaluated the generalizability of five models to select a suitable approach for landslide susceptibility modeling in data-scarce environments. In total, 418 landslide inventories and 18 landslide conditioning factors were analyzed. Multicollinearity and factor optimization were investigated before data modeling, and two experiments were then conducted. In each experiment, five susceptibility maps were produced based on support vector machine (SVM), random forest (RF), weight-of-evidence (WoE), ridge regression (Rid_R), and robust regression (RR) models. The highest accuracy (AUC = 0.85) was achieved with the SVM model when either the full or limited landslide inventories were used. Furthermore, the RF and WoE models were severely affected when fewer landslide samples were used for training. The other models were affected only slightly when the training samples were limited.
Goodarzi, Mohammad; Jensen, Richard; Vander Heyden, Yvan
2012-12-01
A Quantitative Structure-Retention Relationship (QSRR) is proposed to estimate the chromatographic retention of 83 diverse drugs on a Unisphere poly butadiene (PBD) column, using isocratic elutions at pH 11.7. Previous work has generated QSRR models for them using Classification And Regression Trees (CART). In this work, Ant Colony Optimization is used as a feature selection method to find the best molecular descriptors from a large pool. In addition, several other selection methods have been applied, such as Genetic Algorithms, Stepwise Regression and the Relief method, not only to evaluate Ant Colony Optimization as a feature selection method but also to investigate its ability to find the important descriptors in QSRR. Multiple Linear Regression (MLR) and Support Vector Machines (SVMs) were applied as linear and nonlinear regression methods, respectively, giving excellent correlation between the experimental, i.e. extrapolated to a mobile phase consisting of pure water, and predicted logarithms of the retention factors of the drugs (logk(w)). The overall best model was the SVM one built using descriptors selected by ACO. Copyright © 2012 Elsevier B.V. All rights reserved.
Face Hallucination with Linear Regression Model in Semi-Orthogonal Multilinear PCA Method
NASA Astrophysics Data System (ADS)
Asavaskulkiet, Krissada
2018-04-01
In this paper, we propose a new face hallucination technique that reconstructs face images in HSV color space with a semi-orthogonal multilinear principal component analysis (SO-MPCA) method. This technique operates directly on tensors via tensor-to-vector projection by imposing the orthogonality constraint in only one mode. In our experiments, we use facial images from the FERET database to test our hallucination approach, which yields high-quality hallucinated color faces. The experimental results clearly demonstrate that we can generate photorealistic color face images by using the SO-MPCA subspace with a linear regression model.
NASA Astrophysics Data System (ADS)
Salawu, Emmanuel Oluwatobi; Hesse, Evelyn; Stopford, Chris; Davey, Neil; Sun, Yi
2017-11-01
Better understanding and characterization of cloud particles, whose properties and distributions affect climate and weather, are essential for the understanding of present climate and climate change. Since imaging cloud probes have limitations of optical resolution, especially for small particles (with diameter < 25 μm), instruments like the Small Ice Detector (SID) probes, which capture high-resolution spatial light scattering patterns from individual particles down to 1 μm in size, have been developed. In this work, we have proposed a method using Machine Learning techniques to estimate simulated particles' orientation-averaged projected sizes (PAD) and aspect ratio from their 2D scattering patterns. The two-dimensional light scattering patterns (2DLSP) of hexagonal prisms are computed using the Ray Tracing with Diffraction on Facets (RTDF) model. The 2DLSP cover the same angular range as the SID probes. We generated 2DLSP for 162 hexagonal prisms at 133 orientations for each. In a first step, the 2DLSP were transformed into rotation-invariant Zernike moments (ZMs), which are particularly suitable for analyses of pattern symmetry. Then we used ZMs, summed intensities, and root mean square contrast as inputs to the advanced Machine Learning methods. We created one random forests classifier for predicting prism orientation, 133 orientation-specific (OS) support vector classification models for predicting the prism aspect-ratios, 133 OS support vector regression models for estimating prism sizes, and another 133 OS Support Vector Regression (SVR) models for estimating the size PADs. We have achieved a high accuracy of 0.99 in predicting prism aspect ratios, and a low value of normalized mean square error of 0.004 for estimating the particle's size and size PADs.
Supplier Short Term Load Forecasting Using Support Vector Regression and Exogenous Input
NASA Astrophysics Data System (ADS)
Matijaš, Marin; Vukićević, Milan; Krajcar, Slavko
2011-09-01
In power systems, the task of load forecasting is important for keeping equilibrium between production and consumption. With the liberalization of electricity markets, the task of load forecasting changed because each market participant has to forecast their own load. Consumption of end-consumers is stochastic in nature. Due to competition, suppliers are not in a position to transfer their costs to end-consumers; therefore it is essential to keep forecasting error as low as possible. Numerous papers investigate load forecasting from the perspective of the grid or production planning. We research forecasting models from the perspective of a supplier. In this paper, we investigate different combinations of exogenous input on the simulated supplier loads and show that using points of delivery as a feature for Support Vector Regression leads to lower forecasting error, while adding customer number in different datasets does the opposite.
Nonparametric methods for drought severity estimation at ungauged sites
NASA Astrophysics Data System (ADS)
Sadri, S.; Burn, D. H.
2012-12-01
The objective in frequency analysis is, given extreme events such as drought severity or duration, to estimate the relationship between that event and the associated return periods at a catchment. Neural networks and other artificial intelligence approaches in function estimation and regression analysis are relatively new techniques in engineering, providing an attractive alternative to traditional statistical models. There are, however, few applications of neural networks and support vector machines in the area of severity quantile estimation for drought frequency analysis. In this paper, we compare three methods for this task: multiple linear regression, radial basis function neural networks, and least squares support vector regression (LS-SVR). The area selected for this study includes 32 catchments in the Canadian Prairies. From each catchment drought severities are extracted and fitted to a Pearson type III distribution, which act as observed values. For each method-duration pair, we use a jackknife algorithm to produce estimated values at each site. The results from these three approaches are compared and analyzed, and it is found that LS-SVR provides the best quantile estimates and extrapolating capacity.
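The jackknife step described above can be sketched as a leave-one-site-out loop: each catchment is withheld in turn, the model is fitted on the remaining sites, and the withheld site is then estimated. This is a minimal sketch, not the authors' implementation; a one-predictor least-squares line stands in for the three regression methods compared in the paper, and the names `fit_line` and `jackknife_estimates` and the single catchment descriptor are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknife_estimates(descriptors, observed):
    """Estimate each site using a model fitted on all the other sites.

    descriptors: one (hypothetical) catchment descriptor per site.
    observed:    the observed quantile per site.
    """
    estimates = []
    for i in range(len(descriptors)):
        # Withhold site i so that it is treated as ungauged.
        train_x = descriptors[:i] + descriptors[i + 1:]
        train_y = observed[:i] + observed[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        estimates.append(slope * descriptors[i] + intercept)
    return estimates
```

Comparing the returned estimates with the observed values site by site gives an error measure that does not reuse a site's own data for its estimate, which is what makes the procedure suitable for judging performance at ungauged sites.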
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-01-01
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%–19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides. PMID:27187430
Yu, Xianyu; Wang, Yi; Niu, Ruiqing; Hu, Youjian
2016-05-11
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale in study areas. To provide better predictions, a geographically weighted regression (GWR) technique is firstly used in our method to segment study areas into a series of prediction regions with appropriate sizes. Meanwhile, a support vector machine (SVM) classifier is exploited in each prediction region for landslide susceptibility mapping. To further improve the prediction performance, the particle swarm optimization (PSO) algorithm is used in the prediction regions to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are utilized for comparison on a study area of the Wanzhou district in the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and visual qualitative evaluation, indicate that our model can achieve better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model can achieve an overall prediction accuracy of 91.10%, which is 7.8%-19.1% higher than the traditional SVM-based models. In addition, the obtained landslide susceptibility map by our model can demonstrate an intensive correlation between the classified very high-susceptibility zone and the previously investigated landslides.
An LPV Adaptive Observer for Updating a Map Applied to an MAF Sensor in a Diesel Engine.
Liu, Zhiyuan; Wang, Changhui
2015-10-23
In this paper, a new method for mass air flow (MAF) sensor error compensation and an online updating error map (or lookup table) due to installation and aging in a diesel engine is developed. Since the MAF sensor error is dependent on the engine operating point, the error model is represented as a two-dimensional (2D) map with two inputs, fuel mass injection quantity and engine speed. Meanwhile, the 2D map representing the MAF sensor error is described as a piecewise bilinear interpolation model, which can be written as a dot product between the regression vector and parameter vector using a membership function. With the combination of the 2D map regression model and the diesel engine air path system, an LPV adaptive observer with low computational load is designed to estimate states and parameters jointly. The convergence of the proposed algorithm is proven under the conditions of persistent excitation and given inequalities. The observer is validated against the simulation data from engine software enDYNA provided by Tesis. The results demonstrate that the operating point-dependent error of the MAF sensor can be approximated acceptably by the 2D map from the proposed method.
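The statement that a piecewise bilinear map can be written as a dot product between a regression vector and a parameter vector can be illustrated as follows. This is a minimal sketch under stated assumptions: the grid axes, the row-major layout of the parameter vector, and the function names are illustrative and not taken from the paper. The parameter vector holds the map values at the grid nodes, and the regression vector holds the bilinear membership weights of the current operating point.

```python
from bisect import bisect_right


def regression_vector(x, y, x_grid, y_grid):
    """Bilinear membership weights, flattened row-major over the grid.

    Both grids must be strictly increasing with at least two nodes.
    At most four entries are non-zero and the weights sum to 1.
    """
    def cell_and_fraction(v, grid):
        # Clamp to the grid range, then locate the cell [grid[i], grid[i + 1]].
        v = min(max(v, grid[0]), grid[-1])
        i = min(bisect_right(grid, v) - 1, len(grid) - 2)
        t = (v - grid[i]) / (grid[i + 1] - grid[i])
        return i, t

    i, tx = cell_and_fraction(x, x_grid)
    j, ty = cell_and_fraction(y, y_grid)
    ny = len(y_grid)
    phi = [0.0] * (len(x_grid) * ny)
    phi[i * ny + j] = (1 - tx) * (1 - ty)
    phi[i * ny + j + 1] = (1 - tx) * ty
    phi[(i + 1) * ny + j] = tx * (1 - ty)
    phi[(i + 1) * ny + j + 1] = tx * ty
    return phi


def map_value(phi, theta):
    """Map output as the dot product of regression and parameter vectors."""
    return sum(p * t for p, t in zip(phi, theta))
```

Because the map output is linear in the node values, updating the map online reduces to estimating the parameter vector, which is the property that makes a joint state and parameter observer applicable.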
Application of XGBoost algorithm in hourly PM2.5 concentration prediction
NASA Astrophysics Data System (ADS)
Pan, Bingyue
2018-02-01
In view of prediction techniques of hourly PM2.5 concentration in China, this paper applied the XGBoost(Extreme Gradient Boosting) algorithm to predict hourly PM2.5 concentration. The monitoring data of air quality in Tianjin city was analyzed by using XGBoost algorithm. The prediction performance of the XGBoost method is evaluated by comparing observed and predicted PM2.5 concentration using three measures of forecast accuracy. The XGBoost method is also compared with the random forest algorithm, multiple linear regression, decision tree regression and support vector machines for regression models using computational results. The results demonstrate that the XGBoost algorithm outperforms other data mining methods.
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
NASA Astrophysics Data System (ADS)
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-03-01
Estimating human affective states from direct observations and facial, vocal, gestural, physiological, and central nervous signals through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural networks has been proposed in the past decade. Among these models, linear models generally lack precision because they ignore the intrinsic nonlinearities of complex psychophysiological processes, while nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify the model, we introduce a new computational modeling method, higher-order multivariable polynomial regression, to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method obtains correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide indirect evidence that valence and arousal originate in the brain’s motivational circuits. Thus, the proposed method can serve as a novel approach for efficiently estimating human affective states.
Higher-order Multivariable Polynomial Regression to Estimate Human Affective States
Wei, Jie; Chen, Tong; Liu, Guangyuan; Yang, Jiemin
2016-01-01
Estimating human affective states from direct observations and facial, vocal, gestural, physiological, and central nervous signals through computational models such as multivariate linear-regression analysis, support vector regression, and artificial neural networks has been proposed in the past decade. Among these models, linear models generally lack precision because they ignore the intrinsic nonlinearities of complex psychophysiological processes, while nonlinear models commonly adopt complicated algorithms. To improve accuracy and simplify the model, we introduce a new computational modeling method, higher-order multivariable polynomial regression, to estimate human affective states. The study employs standardized pictures in the International Affective Picture System to induce thirty subjects’ affective states, and obtains pure affective patterns of skin conductance as input variables to the higher-order multivariable polynomial model for predicting affective valence and arousal. Experimental results show that our method obtains correlation coefficients of 0.98 and 0.96 for estimation of affective valence and arousal, respectively. Moreover, the method may provide indirect evidence that valence and arousal originate in the brain’s motivational circuits. Thus, the proposed method can serve as a novel approach for efficiently estimating human affective states. PMID:26996254
NASA Astrophysics Data System (ADS)
Jeyaram, A.; Kesari, S.; Bajpai, A.; Bhunia, G. S.; Krishna Murthy, Y. V. N.
2012-07-01
Visceral Leishmaniasis (VL), commonly known as Kala-azar, is one of the most neglected tropical diseases, with approximately 200 million of the poorest people at risk in 109 districts of three endemic countries, namely Bangladesh, India and Nepal, at different levels. This tropical disease is caused by the protozoan parasite Leishmania donovani and transmitted by female Phlebotomus argentipes sand flies. Analysis of the disease dynamics indicates periodicity at seasonal and inter-annual temporal scales, which forms the basis for development of an advanced early warning system. The highly endemic Vaishali district, Bihar, India, was taken as the study area for model development. A systematic study of geo-environmental parameters derived from satellite data in conjunction with ground intelligence enabled modelling of the infectious disease and risk villages. High-resolution Indian satellite data from IRS LISS IV (multi-spectral) and Cartosat-1 (Pan) were used for studying environmental risk parameters, viz. peri-domestic vegetation, dwelling condition, wetland ecosystem, cropping pattern, Normalised Difference Vegetation Index (NDVI), detailed land use, etc., towards risk assessment. Univariate analysis of the relationship between vector density and various land cover categories and climatic variables suggested that all the variables are significantly correlated. Using the variables significantly correlated with vector density, a seasonal multivariate regression model was developed incorporating geo-environmental parameters, climate variables and seasonal time series disease parameters. Linear and non-linear models were applied at periodic and inter-annual temporal scales to predict man-hour density (MHD), and an 'out-of-fit' data set was used for validating the model, with reasonable accuracy.
To improve the MHD predictive approach, fuzzy model has also been incorporated in GIS environment combining spatial geo-environmental and climatic variables using fuzzy membership logic. Based on the perceived importance of the geoenvironmental parameters assigned by epidemiology expert, combined fuzzy membership has been calculated. The combined fuzzy membership indicate the predictive measure of vector density in each village. A γ factor has been introduced to have increasing effect in the higher side and decreasing effect in the lower side which facilitated for prioritisation of the villages. This approach is not only to predict vector density but also to prioritise the villages for effective control measures. A software package for modelling the risk villages integrating multivariate regression and fuzzy membership analysis models have been developed to estimate MHD (vector density) as part of the early warning system.
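The abstract does not give the formula behind the combined fuzzy membership or the γ factor. One common choice in GIS overlay analysis that matches the description is the fuzzy gamma operator, which blends the fuzzy algebraic sum (an increasing effect) with the fuzzy algebraic product (a decreasing effect). The sketch below assumes that operator; the function name and the example membership values are hypothetical.

```python
from math import prod


def fuzzy_gamma(memberships, gamma):
    """Combine per-layer fuzzy memberships with the fuzzy gamma operator.

    gamma = 0 gives the fuzzy algebraic product (never above the smallest input),
    gamma = 1 gives the fuzzy algebraic sum (never below the largest input).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if any(not 0.0 <= m <= 1.0 for m in memberships):
        raise ValueError("memberships must lie in [0, 1]")
    fuzzy_product = prod(memberships)
    fuzzy_sum = 1.0 - prod(1.0 - m for m in memberships)
    return (fuzzy_sum ** gamma) * (fuzzy_product ** (1.0 - gamma))
```

For example, a village with memberships 0.5 for peri-domestic vegetation and 0.5 for dwelling condition scores 0.25 at γ = 0 and 0.75 at γ = 1; intermediate values of γ trade off the two behaviours, which is what would allow villages to be ranked for control measures.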
Modeling and forecasting US presidential election using learning algorithms
NASA Astrophysics Data System (ADS)
Zolghadr, Mohammad; Niaki, Seyed Armin Akhavan; Niaki, S. T. A.
2017-09-01
The primary objective of this research is to obtain an accurate forecasting model for the US presidential election. To identify a reliable model, artificial neural networks (ANN) and support vector regression (SVR) models are compared based on some specified performance measures. Moreover, six independent variables such as GDP, unemployment rate, the president's approval rate, and others are considered in a stepwise regression to identify significant variables. The president's approval rate is identified as the most significant variable, based on which eight other variables are identified and considered in the model development. Preprocessing methods are applied to prepare the data for the learning algorithms. The proposed procedure significantly increases the accuracy of the model by 50%. The learning algorithms (ANN and SVR) proved to be superior to linear regression based on each method's calculated performance measures. The SVR model is identified as the most accurate model among the other models as this model successfully predicted the outcome of the election in the last three elections (2004, 2008, and 2012). The proposed approach significantly increases the accuracy of the forecast.
MacMillan, Katherine; Monaghan, Andrew J.; Apangu, Titus; Griffith, Kevin S.; Mead, Paul S.; Acayo, Sarah; Acidri, Rogers; Moore, Sean M.; Mpanga, Joseph Tendo; Enscore, Russel E.; Gage, Kenneth L.; Eisen, Rebecca J.
2012-01-01
East Africa has been identified as a region where vector-borne and zoonotic diseases are most likely to emerge or re-emerge and where morbidity and mortality from these diseases is significant. Understanding when and where humans are most likely to be exposed to vector-borne and zoonotic disease agents in this region can aid in targeting limited prevention and control resources. Often, spatial and temporal distributions of vectors and vector-borne disease agents are predictable based on climatic variables. However, because of coarse meteorological observation networks, appropriately scaled and accurate climate data are often lacking for Africa. Here, we use a recently developed 10-year gridded meteorological dataset from the Advanced Weather Research and Forecasting Model to identify climatic variables predictive of the spatial distribution of human plague cases in the West Nile region of Uganda. Our logistic regression model revealed that within high elevation sites (above 1,300 m), plague risk was positively associated with rainfall during the months of February, October, and November and negatively associated with rainfall during the month of June. These findings suggest that areas that receive increased but not continuous rainfall provide ecologically conducive conditions for Yersinia pestis transmission in this region. This study serves as a foundation for similar modeling efforts of other vector-borne and zoonotic disease in regions with sparse observational meteorologic networks. PMID:22403328
Predicting complications of percutaneous coronary intervention using a novel support vector method.
Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan
2013-01-01
To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer-Lemeshow χ(2) value (seven cases) and the mean cross-entropy error (eight cases). The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains.
Predicting complications of percutaneous coronary intervention using a novel support vector method
Lee, Gyemin; Gurm, Hitinder S; Syed, Zeeshan
2013-01-01
Objective To explore the feasibility of a novel approach using an augmented one-class learning algorithm to model in-laboratory complications of percutaneous coronary intervention (PCI). Materials and methods Data from the Blue Cross Blue Shield of Michigan Cardiovascular Consortium (BMC2) multicenter registry for the years 2007 and 2008 (n=41 016) were used to train models to predict 13 different in-laboratory PCI complications using a novel one-plus-class support vector machine (OP-SVM) algorithm. The performance of these models in terms of discrimination and calibration was compared to the performance of models trained using the following classification algorithms on BMC2 data from 2009 (n=20 289): logistic regression (LR), one-class support vector machine classification (OC-SVM), and two-class support vector machine classification (TC-SVM). For the OP-SVM and TC-SVM approaches, variants of the algorithms with cost-sensitive weighting were also considered. Results The OP-SVM algorithm and its cost-sensitive variant achieved the highest area under the receiver operating characteristic curve for the majority of the PCI complications studied (eight cases). Similar improvements were observed for the Hosmer–Lemeshow χ2 value (seven cases) and the mean cross-entropy error (eight cases). Conclusions The OP-SVM algorithm based on an augmented one-class learning problem improved discrimination and calibration across different PCI complications relative to LR and traditional support vector machine classification. Such an approach may have value in a broader range of clinical domains. PMID:23599229
Muñoz-Barús, José I; Rodríguez-Calvo, María Sol; Suárez-Peñaranda, José M; Vieira, Duarte N; Cadarso-Suárez, Carmen; Febrero-Bande, Manuel
2010-01-30
In legal medicine the correct determination of the time of death is of utmost importance. Recent advances in estimating post-mortem interval (PMI) have made use of vitreous humour chemistry in conjunction with Linear Regression, but the results are questionable. In this paper we present PMICALC, an R code-based freeware package which estimates PMI in cadavers of recent death by measuring the concentrations of potassium ([K+]), hypoxanthine ([Hx]) and urea ([U]) in the vitreous humor using two different regression models: Additive Models (AM) and Support Vector Machine (SVM), which offer more flexibility than the previously used Linear Regression. The results from both models are better than those published to date and can give numerical expression of PMI with confidence intervals and graphic support within 20 min. The program also takes into account the cause of death. 2009 Elsevier Ireland Ltd. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang
This work proposes an approach for distribution system load forecasting, which aims to provide highly accurate short-term load forecasting with high resolution utilizing a support vector regression (SVR) based forecaster and a two-step hybrid parameters optimization method. Specifically, because the load profiles in distribution systems contain abrupt deviations, a data normalization is designed as the pretreatment for the collected historical load data. Then an SVR model is trained by the load data to forecast the future load. For better performance of SVR, a two-step hybrid optimization algorithm is proposed to determine the best parameters. In the first step of the hybrid optimization algorithm, a designed grid traverse algorithm (GTA) is used to narrow the parameters searching area from a global to local space. In the second step, based on the result of the GTA, particle swarm optimization (PSO) is used to determine the best parameters in the local parameter space. After the best parameters are determined, the SVR model is used to forecast the short-term load deviation in the distribution system.
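The two-step parameter search can be sketched as a coarse grid pass followed by a particle swarm confined to the neighbourhood of the best grid point. This is a rough illustration only: the objective would in practice be a validation error of the SVR model as a function of two hyperparameters, whereas here a toy function stands in for it. All names, the swarm constants (`w`, `c1`, `c2`) and the search box are assumptions, not values from the report.

```python
import random


def grid_traverse(objective, bounds, steps):
    """Step 1: evaluate a regular grid; return the best point and the grid spacing."""
    (lo_x, hi_x), (lo_y, hi_y) = bounds
    dx = (hi_x - lo_x) / (steps - 1)
    dy = (hi_y - lo_y) / (steps - 1)
    points = [(lo_x + i * dx, lo_y + j * dy)
              for i in range(steps) for j in range(steps)]
    return min(points, key=objective), (dx, dy)


def pso_refine(objective, center, half_widths, n_particles=12, n_iter=40, seed=0):
    """Step 2: particle swarm search inside the box center +/- half_widths."""
    rng = random.Random(seed)
    lo = [c - h for c, h in zip(center, half_widths)]
    hi = [c + h for c, h in zip(center, half_widths)]
    # The first particle starts at the grid optimum, so the result is never worse.
    pos = [list(center)] + [[rng.uniform(l, h) for l, h in zip(lo, hi)]
                            for _ in range(n_particles - 1)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(tuple(p)) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and attraction weights (assumed values)
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Clamp each coordinate so particles stay in the local box.
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo[d]), hi[d])
            val = objective(tuple(pos[k]))
            if val < pbest_val[k]:
                pbest[k], pbest_val[k] = pos[k][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[k][:], val
    return tuple(gbest), gbest_val


def toy_validation_error(params):
    """Stand-in for a validation error over two log-scaled hyperparameters."""
    log_c, log_gamma = params
    return (log_c - 1.3) ** 2 + (log_gamma + 2.1) ** 2 + 0.05
```

A typical call would run `grid_traverse(toy_validation_error, [(-3.0, 3.0), (-5.0, 1.0)], 7)` and pass the returned point and spacing to `pso_refine` as `center` and `half_widths`. Using the grid spacing as the half-width of the local box is one simple way to move from the global to the local search space.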
Jeffrey T. Walton
2008-01-01
Three machine learning subpixel estimation methods (Cubist, Random Forests, and support vector regression) were applied to estimate urban cover. Urban forest canopy cover and impervious surface cover were estimated from Landsat-7 ETM+ imagery using a higher resolution cover map resampled to 30 m as training and reference data. Three different band combinations (...
A surrogate model for thermal characteristics of stratospheric airship
NASA Astrophysics Data System (ADS)
Zhao, Da; Liu, Dongxu; Zhu, Ming
2018-06-01
A simple and accurate surrogate model is extremely needed to reduce the analysis complexity of thermal characteristics for a stratospheric airship. In this paper, a surrogate model based on the Least Squares Support Vector Regression (LSSVR) is proposed. The Gravitational Search Algorithm (GSA) is used to optimize hyper parameters. A novel framework consisting of a preprocessing classifier and two regression models is designed to train the surrogate model. Various temperature datasets of the airship envelope and the internal gas are obtained by a three-dimensional transient model for thermal characteristics. Using these thermal datasets, two-factor and multi-factor surrogate models are trained and several comparison simulations are conducted. Results illustrate that the surrogate models based on LSSVR-GSA have good fitting and generalization abilities. The pre-treated classification strategy proposed in this paper plays a significant role in improving the accuracy of the surrogate model.
Support vector regression to predict porosity and permeability: Effect of sample size
NASA Astrophysics Data System (ADS)
Al-Anazi, A. F.; Gates, I. D.
2012-02-01
Porosity and permeability are key petrophysical parameters obtained from laboratory core analysis. Cores, obtained from drilled wells, are often few in number for most oil and gas fields. Porosity and permeability correlations based on conventional techniques such as linear regression or neural networks trained with core and geophysical logs suffer from poor generalization to wells with only geophysical logs. The generalization problem of correlation models often becomes pronounced when the training sample size is small. This is attributed to the underlying assumption that conventional techniques employing the empirical risk minimization (ERM) inductive principle converge asymptotically to the true risk values as the number of samples increases. In small sample size estimation problems, the available training samples must span the complexity of the parameter space so that the model is able both to match the available training samples reasonably well and to generalize to new data. This is achieved using the structural risk minimization (SRM) inductive principle by matching the capability of the model to the available training data. One method that uses SRM is the support vector regression (SVR) network. In this research, the capability of SVR to predict porosity and permeability in a heterogeneous sandstone reservoir under the effect of small sample size is evaluated. In particular, the impact of Vapnik's ɛ-insensitivity loss function and the least-modulus loss function on generalization performance was empirically investigated. The results are compared to the multilayer perceptron (MLP) neural network, a widely used regression method, which operates under the ERM principle. The mean square error and correlation coefficients were used to measure the quality of predictions. The results demonstrate that SVR yields consistently better predictions of the porosity and permeability with small sample size than the MLP method.
Also, the performance of SVR depends on both kernel function type and loss functions used.
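The two loss functions named above differ only in whether small residuals are ignored. A minimal sketch, with function names chosen here for illustration: the ɛ-insensitive loss is zero inside a tube of half-width ɛ around the prediction and grows linearly outside it, while the least-modulus loss is the plain absolute error, i.e. the ɛ = 0 special case.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Vapnik's loss: residuals smaller than epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def least_modulus_loss(y_true, y_pred):
    """Absolute-error loss, equal to the epsilon-insensitive loss at epsilon = 0."""
    return abs(y_true - y_pred)
```

With a wider tube, fewer training points contribute to the loss, so fewer become support vectors; this is one reason the choice of loss can affect generalization on small samples.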
NASA Astrophysics Data System (ADS)
Boucher, Thomas F.; Ozanne, Marie V.; Carmosino, Marco L.; Dyar, M. Darby; Mahadevan, Sridhar; Breves, Elly A.; Lepore, Kate H.; Clegg, Samuel M.
2015-05-01
The ChemCam instrument on the Mars Curiosity rover is generating thousands of LIBS spectra and bringing interest in this technique to public attention. The key to interpreting Mars or any other types of LIBS data are calibrations that relate laboratory standards to unknowns examined in other settings and enable predictions of chemical composition. Here, LIBS spectral data are analyzed using linear regression methods including partial least squares (PLS-1 and PLS-2), principal component regression (PCR), least absolute shrinkage and selection operator (lasso), elastic net, and linear support vector regression (SVR-Lin). These were compared against results from nonlinear regression methods including kernel principal component regression (K-PCR), polynomial kernel support vector regression (SVR-Py) and k-nearest neighbor (kNN) regression to discern the most effective models for interpreting chemical abundances from LIBS spectra of geological samples. The results were evaluated for 100 samples analyzed with 50 laser pulses at each of five locations averaged together. Wilcoxon signed-rank tests were employed to evaluate the statistical significance of differences among the nine models using their predicted residual sum of squares (PRESS) to make comparisons. For MgO, SiO2, Fe2O3, CaO, and MnO, the sparse models outperform all the others except for linear SVR, while for Na2O, K2O, TiO2, and P2O5, the sparse methods produce inferior results, likely because their emission lines in this energy range have lower transition probabilities. The strong performance of the sparse methods in this study suggests that use of dimensionality-reduction techniques as a preprocessing step may improve the performance of the linear models. Nonlinear methods tend to overfit the data and predict less accurately, while the linear methods proved to be more generalizable with better predictive performance. 
These results are attributed to the high dimensionality of the data (6144 channels) relative to the small number of samples studied. The best-performing models were SVR-Lin for SiO2, MgO, Fe2O3, and Na2O, lasso for Al2O3, elastic net for MnO, and PLS-1 for CaO, TiO2, and K2O. Although these differences in model performance between methods were identified, most of the models produce comparable results when p ≤ 0.05 and all techniques except kNN produced statistically-indistinguishable results. It is likely that a combination of models could be used together to yield a lower total error of prediction, depending on the requirements of the user.
Development of precursors recognition methods in vector signals
NASA Astrophysics Data System (ADS)
Kapralov, V. G.; Elagin, V. V.; Kaveeva, E. G.; Stankevich, L. A.; Dremin, M. M.; Krylov, S. V.; Borovov, A. E.; Harfush, H. A.; Sedov, K. S.
2017-10-01
Precursor recognition methods in vector signals of plasma diagnostics are presented. Their requirements and possible options for their development are considered. In particular, the variants of using symbolic regression for building a plasma disruption prediction system are discussed. The initial data preparation using correlation analysis and symbolic regression is discussed. Special attention is paid to the possibility of using algorithms in real time.
Calvete, C; Estrada, R; Miranda, M A; Borrás, D; Calvo, J H; Lucientes, J
2008-06-01
Data obtained by a Spanish national surveillance programme in 2005 were used to develop climatic models for predictions of the distribution of the bluetongue virus (BTV) vectors Culicoides imicola Kieffer (Diptera: Ceratopogonidae) and the Culicoides obsoletus group Meigen throughout the Iberian peninsula. Models were generated using logistic regression to predict the probability of species occurrence at an 8-km spatial resolution. Predictor variables included the annual mean values and seasonalities of a remotely sensed normalized difference vegetation index (NDVI), a sun index, interpolated precipitation and temperature. Using an information-theoretic paradigm based on Akaike's criterion, a set of best models accounting for 95% of model selection certainty were selected and used to generate an average predictive model for each vector. The predictive performances (i.e. the discrimination capacity and calibration) of the average models were evaluated by both internal and external validation. External validation was achieved by comparing average model predictions with surveillance programme data obtained in 2004 and 2006. The discriminatory capacity of both models was found to be reasonably high. The estimated areas under the receiver operating characteristic (ROC) curve (AUC) were 0.78 and 0.70 for the C. imicola and C. obsoletus group models, respectively, in external validation, and 0.81 and 0.75, respectively, in internal validation. The predictions of both models were in close agreement with the observed distribution patterns of both vectors. Both models, however, showed a systematic bias in their predicted probability of occurrence: observed occurrence was systematically overestimated for C. imicola and underestimated for the C. obsoletus group. Average models were used to determine the areas of spatial coincidence of the two vectors. 
Although their spatial distributions were highly complementary, areas of spatial coincidence were identified, mainly in Portugal and in the southwest of peninsular Spain. In a hypothetical scenario in which both Culicoides members had similar vectorial capacity for a BTV strain, these areas should be considered of special epidemiological concern because any epizootic event could be intensified by consecutive vector activity developed for both species during the year; consequently, the probability of BTV spreading to remaining areas occupied by both vectors might also be higher.
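The model-averaging step in this abstract (a set of best models accounting for 95% of model selection certainty, combined into an average model) is commonly carried out with Akaike weights. The sketch below assumes that standard procedure rather than reproducing the authors' code; the function names and the example AIC values are hypothetical.

```python
from math import exp


def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to 1 (lower AIC, higher weight)."""
    best = min(aic_values)
    relative = [exp(-0.5 * (a - best)) for a in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def confidence_set(aic_values, level=0.95):
    """Indices of the best models whose cumulative weight first reaches `level`."""
    weights = akaike_weights(aic_values)
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += weights[i]
        if cumulative >= level:
            break
    return chosen, weights


def averaged_prediction(predictions, weights, chosen):
    """Weighted mean of the chosen models' predicted occurrence probabilities."""
    total = sum(weights[i] for i in chosen)
    return sum(weights[i] * predictions[i] for i in chosen) / total
```

With AIC values of 100, 102 and 110, the first two models carry about 99.5% of the weight and form the 95% set, so the third model is excluded from the averaged prediction.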
Modeling Dengue vector population using remotely sensed data and machine learning.
Scavuzzo, Juan M; Trucco, Francisco; Espinosa, Manuel; Tauro, Carolina B; Abril, Marcelo; Scavuzzo, Carlos M; Frery, Alejandro C
2018-05-16
Mosquitoes are vectors of many human diseases. In particular, Aedes ægypti (Linnaeus) is the main vector for Chikungunya, Dengue, and Zika viruses in Latin America and it represents a global threat. Public health policies that aim at combating this vector require dependable and timely information, which is usually expensive to obtain with field campaigns. For this reason, several efforts have been made to use remote sensing due to its reduced cost. The present work includes the temporal modeling of the oviposition activity (measured weekly on 50 ovitraps in a north Argentinean city) of Aedes ægypti (Linnaeus), based on time series of data extracted from operational earth observation satellite images. We use NDVI, NDWI, LST night, LST day and TRMM-GPM rain from 2012 to 2016 as predictive variables. In contrast to previous works which use linear models, we employ Machine Learning techniques using completely accessible open source toolkits. These models have the advantages of being non-parametric and capable of describing nonlinear relationships between variables. Specifically, in addition to two linear approaches, we assess a support vector machine, an artificial neural network, a K-nearest neighbors regressor and a decision tree regressor. Considerations are made on parameter tuning and the validation and training approach. The results are compared to linear models used in previous works with similar data sets for generating temporal predictive models. These new tools perform better than linear approaches; in particular, K-nearest neighbors regression (KNNR) performs best. These results provide better alternatives to be implemented operationally on the Argentine geospatial risk system that has been running since 2012. Copyright © 2018 Elsevier B.V. All rights reserved.
Cheong, Yoon Ling; Leitão, Pedro J; Lakes, Tobia
2014-07-01
The transmission of dengue disease is influenced by complex interactions among vector, host and virus. Land use such as water bodies or certain agricultural practices have been identified as likely risk factors for dengue because of the provision of suitable habitats for the vector. Many studies have focused on the land use factors of dengue vector abundance in small areas but have not yet studied the relationship between land use factors and dengue cases for large regions. This study aims to clarify if land use factors other than human settlements, e.g. different types of agricultural land use, water bodies and forest are associated with reported dengue cases from 2008 to 2010 in the state of Selangor, Malaysia. From the correlative relationship, we aim to generate a prediction risk map. We used Boosted Regression Trees (BRT) to account for nonlinearities and interactions between the factors with high predictive accuracies. Our model with a cross-validated performance score (Area Under the Receiver Operator Characteristic Curve, ROC AUC) of 0.81 showed that the most important land use factors are human settlements (model importance of 39.2%), followed by water bodies (16.1%), mixed horticulture (8.7%), open land (7.5%) and neglected grassland (6.7%). A risk map after 100 model runs with a cross-validated ROC AUC mean of 0.81 (±0.001 s.d.) is presented. Our findings may be an important asset for improving surveillance and control interventions for dengue. Copyright © 2014 The Authors. Published by Elsevier Ltd.. All rights reserved.
Support vector machine firefly algorithm based optimization of lens system.
Shamshirband, Shahaboddin; Petković, Dalibor; Pavlović, Nenad T; Ch, Sudheer; Altameem, Torki A; Gani, Abdullah
2015-01-01
Lens system design is an important factor in image quality. The main aspect of the lens system design methodology is the optimization procedure. Since optimization is a complex, nonlinear task, soft computing optimization algorithms can be used. There are many tools that can be employed to measure optical performance, but the spot diagram is the most useful. The spot diagram gives an indication of the image of a point object. In this paper, the spot size radius is considered an optimization criterion. Intelligent soft computing scheme support vector machines (SVMs) coupled with the firefly algorithm (FFA) are implemented. The performance of the proposed estimators is confirmed with the simulation results. The result of the proposed SVM-FFA model has been compared with support vector regression (SVR), artificial neural networks, and genetic programming methods. The results show that the SVM-FFA model performs more accurately than the other methodologies. Therefore, SVM-FFA can be used as an efficient soft computing technique in the optimization of lens system designs.
Zhou, Yang; Fu, Xiaping; Ying, Yibin; Fang, Zhenhuan
2015-06-23
A fiber-optic probe system was developed to estimate the optical properties of turbid media based on spatially resolved diffuse reflectance. Because of the limitations in numerical calculation of the radiative transfer equation (RTE), diffusion approximation (DA) and Monte Carlo simulations (MC), support vector regression (SVR) was introduced to model the relationship between diffuse reflectance values and optical properties. The SVR models of four collection fibers were trained on phantoms in a calibration set with a wide range of optical properties, representing products of different applications; the optical properties of phantoms in the prediction set were then predicted after an optimal search on the SVR models. The results indicated that the SVR model was capable of describing the relationship with little deviation in forward validation. The correlation coefficients (R) of the reduced scattering coefficient μ′_s and the absorption coefficient μ_a in the prediction set were 0.9907 and 0.9980, respectively. The root mean square errors of prediction (RMSEP) of μ′_s and μ_a in inverse validation were 0.411 cm⁻¹ and 0.338 cm⁻¹, respectively. The results indicated that the integrated fiber-optic probe system combined with the SVR model was suitable for fast and accurate estimation of optical properties of turbid media based on spatially resolved diffuse reflectance. Copyright © 2015 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Hill, D.; Bell, K. R. W.; McMillan, D.; Infield, D.
2014-05-01
The growth of wind power production in the electricity portfolio is striving to meet ambitious targets set, for example by the EU, to reduce greenhouse gas emissions by 20% by 2020. Huge investments are now being made in new offshore wind farms around UK coastal waters that will have a major impact on the GB electrical supply. Representations of the UK wind field in syntheses which capture the inherent structure and correlations between different locations including offshore sites are required. Here, Vector Auto-Regressive (VAR) models are presented and extended in a novel way to incorporate offshore time series from a pan-European meteorological model called COSMO, with onshore wind speeds from the MIDAS dataset provided by the British Atmospheric Data Centre. Forecasting ability onshore is shown to be improved with the inclusion of the offshore sites with improvements of up to 25% in RMS error at 6 h ahead. In addition, the VAR model is used to synthesise time series of wind at each offshore site, which are then used to estimate wind farm capacity factors at the sites in question. These are then compared with estimates of capacity factors derived from the work of Hawkins et al. (2011). A good degree of agreement is established indicating that this synthesis tool should be useful in power system impact studies.
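To make the vector auto-regressive idea in the abstract above concrete, here is a minimal sketch of multi-step forecasting with a first-order VAR. It is an assumption-laden toy, not the authors' model: the coefficient matrix `A`, the function name `var1_forecast`, and the two-site setup (one onshore series, one offshore series, both expressed as anomalies about their means) are invented for illustration.

```python
def var1_forecast(A, x_last, steps):
    """Iterate x_{t+1} = A x_t for a zero-mean VAR(1), with noise set to its mean of zero."""
    x = list(x_last)
    path = []
    for _ in range(steps):
        # Matrix-vector product: each site's next value mixes all sites' current values.
        x = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
        path.append(x)
    return path

# Hypothetical coefficients: row 0 is the onshore site, row 1 the offshore site.
# The off-diagonal 0.25 lets the offshore series inform the onshore forecast.
A = [[0.5, 0.25],
     [0.0, 0.5]]

path = var1_forecast(A, [2.0, 4.0], steps=2)
print(path)
```

Setting the off-diagonal entry to zero would reduce the onshore forecast to a univariate autoregression, which is the kind of baseline against which the abstract reports its improvement.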
Carvalho, Bruno M.; Ready, Paul D.
2015-01-01
Vector borne diseases are susceptible to climate change because distributions and densities of many vectors are climate driven. The Amazon region is endemic for cutaneous leishmaniasis and is predicted to be severely impacted by climate change. Recent records suggest that the distributions of Lutzomyia (Nyssomyia) flaviscutellata and the parasite it transmits, Leishmania (Leishmania) amazonensis, are expanding southward, possibly due to climate change, and sometimes associated with new human infection cases. We define the vector’s climatic niche and explore future projections under climate change scenarios. Vector occurrence records were compiled from the literature, museum collections and Brazilian Health Departments. Six bioclimatic variables were used as predictors in six ecological niche model algorithms (BIOCLIM, DOMAIN, MaxEnt, GARP, logistic regression and Random Forest). Projections for 2050 used 17 general circulation models in two greenhouse gas representative concentration pathways: “stabilization” and “high increase”. Ensemble models and consensus maps were produced by overlapping binary predictions. Final model outputs showed good performance and significance. The use of species absence data substantially improved model performance. Currently, L. flaviscutellata is widely distributed in the Amazon region, with records in the Atlantic Forest and savannah regions of Central Brazil. Future projections indicate expansion of the climatically suitable area for the vector in both scenarios, towards higher latitudes and elevations. L. flaviscutellata is likely to find increasingly suitable conditions for its expansion into areas where human population size and density are much larger than they are in its current locations. 
If environmental conditions change as predicted, the range of the vector is likely to expand to southeastern and central-southern Brazil, eastern Paraguay and further into the Amazonian areas of Bolivia, Peru, Ecuador, Colombia and Venezuela. These areas will only become endemic for L. amazonensis, however, if they have competent reservoir hosts and transmission dynamics matching those in the Amazon region. PMID:26619186
Habitat suitability and ecological niche profile of major malaria vectors in Cameroon
2009-01-01
Background Suitability of environmental conditions determines a species distribution in space and time. Understanding and modelling the ecological niche of mosquito disease vectors can, therefore, be a powerful predictor of the risk of exposure to the pathogens they transmit. In Africa, five anophelines are responsible for over 95% of total malaria transmission. However, detailed knowledge of the geographic distribution and ecological requirements of these species is to date still inadequate. Methods Indoor-resting mosquitoes were sampled from 386 villages covering the full range of ecological settings available in Cameroon, Central Africa. Using a predictive species distribution modeling approach based only on presence records, habitat suitability maps were constructed for the five major malaria vectors Anopheles gambiae, Anopheles funestus, Anopheles arabiensis, Anopheles nili and Anopheles moucheti. The influence of 17 climatic, topographic, and land use variables on mosquito geographic distribution was assessed by multivariate regression and ordination techniques. Results Twenty-four anopheline species were collected, of which 17 are known to transmit malaria in Africa. Ecological Niche Factor Analysis, Habitat Suitability modeling and Canonical Correspondence Analysis revealed marked differences among the five major malaria vector species, both in terms of ecological requirements and niche breadth. Eco-geographical variables (EGVs) related to human activity had the highest impact on habitat suitability for the five major malaria vectors, with areas of low population density being of marginal or unsuitable habitat quality. Sunlight exposure, rainfall, evapo-transpiration, relative humidity, and wind speed were among the most discriminative EGVs separating "forest" from "savanna" species. 
Conclusions The distribution of major malaria vectors in Cameroon is strongly affected by the impact of humans on the environment, with variables related to proximity to human settings being among the best predictors of habitat suitability. The ecologically more tolerant species An. gambiae and An. funestus were recorded in a wide range of eco-climatic settings. The other three major vectors, An. arabiensis, An. moucheti, and An. nili, were more specialized. Ecological niche and species distribution modelling should help improve malaria vector control interventions by targeting places and times where the impact on vector populations and disease transmission can be optimized. PMID:20028559
Habitat suitability and ecological niche profile of major malaria vectors in Cameroon.
Ayala, Diego; Costantini, Carlo; Ose, Kenji; Kamdem, Guy C; Antonio-Nkondjio, Christophe; Agbor, Jean-Pierre; Awono-Ambene, Parfait; Fontenille, Didier; Simard, Frédéric
2009-12-23
Suitability of environmental conditions determines a species distribution in space and time. Understanding and modelling the ecological niche of mosquito disease vectors can, therefore, be a powerful predictor of the risk of exposure to the pathogens they transmit. In Africa, five anophelines are responsible for over 95% of total malaria transmission. However, detailed knowledge of the geographic distribution and ecological requirements of these species is to date still inadequate. Indoor-resting mosquitoes were sampled from 386 villages covering the full range of ecological settings available in Cameroon, Central Africa. Using a predictive species distribution modeling approach based only on presence records, habitat suitability maps were constructed for the five major malaria vectors Anopheles gambiae, Anopheles funestus, Anopheles arabiensis, Anopheles nili and Anopheles moucheti. The influence of 17 climatic, topographic, and land use variables on mosquito geographic distribution was assessed by multivariate regression and ordination techniques. Twenty-four anopheline species were collected, of which 17 are known to transmit malaria in Africa. Ecological Niche Factor Analysis, Habitat Suitability modeling and Canonical Correspondence Analysis revealed marked differences among the five major malaria vector species, both in terms of ecological requirements and niche breadth. Eco-geographical variables (EGVs) related to human activity had the highest impact on habitat suitability for the five major malaria vectors, with areas of low population density being of marginal or unsuitable habitat quality. Sunlight exposure, rainfall, evapo-transpiration, relative humidity, and wind speed were among the most discriminative EGVs separating "forest" from "savanna" species. 
The distribution of major malaria vectors in Cameroon is strongly affected by the impact of humans on the environment, with variables related to proximity to human settings being among the best predictors of habitat suitability. The ecologically more tolerant species An. gambiae and An. funestus were recorded in a wide range of eco-climatic settings. The other three major vectors, An. arabiensis, An. moucheti, and An. nili, were more specialized. Ecological niche and species distribution modelling should help improve malaria vector control interventions by targeting places and times where the impact on vector populations and disease transmission can be optimized.
Han, Xiao; Ge, Miao; Dong, Jie; Xue, Ranying; Wang, Zixuan; He, Jinwei
2014-09-01
The aim of this paper is to analyze the geographical distribution of the reference value of left ventricular end systolic diameter (LVDs) in aging people, and to provide a scientific basis for clinical examination. The study focuses on the relationship between the reference value of LVDs in aging people and 14 geographical factors, using 2495 LVDs samples from aging people in 71 units of China, comprising 1620 men and 875 women. Moran's I index was used to establish the relationship between the reference values and spatial geographical factors. Five geographical factors significantly correlated with LVDs were extracted to build a support vector regression model, and a paired-sample t test was used to check the consistency between predicted and measured values. Finally, a distribution map was produced by disjunctive kriging interpolation and the three-dimensional trend of the normal reference value was fitted. The correlation between the extracted geographical factors and the reference value of LVDs was found to be significant. The five factors are latitude, annual mean air temperature, annual mean relative humidity, annual precipitation amount, and annual range of air temperature. The predicted and observed values are in good agreement, with no significant difference at the 95% confidence level. The predicted values increase overall from west to east, and first increase and then decrease from north to south. If geographical values are obtained for a region, the reference value of LVDs of aging people in that region can be obtained by using the support vector regression model. It could be more scientific to formulate the different distributions on the basis of both physiological and geographical factors.
-Use Moran's index to analyze the spatial correlation. -Choose a support vector machine to build a model that overcomes the complexity of variables. -Test the normal distribution of predicted data to guarantee the interpolation results. -Use trend analysis to explain the changes of the reference value clearly. Copyright © 2014 Elsevier Inc. All rights reserved.
Raman spectroscopy-based screening of hepatitis C and associated molecular changes
NASA Astrophysics Data System (ADS)
Bilal, Maria; Bilal, M.; Saleem, M.; Khan, Saranjam; Ullah, Rahat; Fatima, Kiran; Ahmed, M.; Hayat, Abbas; Shahzada, Shaista; Ullah Khan, Ehsan
2017-09-01
This study presents the optical screening of hepatitis C and its associated molecular changes in human blood sera using a partial least-squares regression model based on their Raman spectra. In total, 152 samples were tested through enzyme-linked immunosorbent assay for confirmation. This model utilizes minor spectral variations in the Raman spectra of the positive and control groups. Regression coefficients of this model were analyzed with reference to the variations in concentration of associated molecules in these two groups. It was found that trehalose, chitin, ammonia, and cytokines are positively correlated while lipids, beta structures of proteins, and carbohydrate-binding proteins are negatively correlated with hepatitis C. The regression vector yielded by this model is utilized to predict hepatitis C in unknown samples. This model has been evaluated by a cross-validation method, which yielded a correlation coefficient of 0.91. Moreover, 30 unknown samples were screened for hepatitis C infection using this model to test its performance. Sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve from these predictions were found to be 93.3%, 100%, 96.7%, and 1, respectively.
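The screening figures reported in the abstract above can be reproduced from confusion-matrix counts. The sketch below is only a consistency check under an assumption that the abstract does not state: that the 30 unknown samples split into 15 positives and 15 negatives, with one positive sample missed. The function name `screening_metrics` is likewise illustrative.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)               # fraction of true positives detected
    specificity = tn / (tn + fp)               # fraction of true negatives detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Assumed split (not given in the abstract): 15 positive, 15 negative, one false negative.
sens, spec, acc = screening_metrics(tp=14, fn=1, tn=15, fp=0)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # 93.3% 100.0% 96.7%
```

Other splits of the 30 samples may also be compatible with the rounded percentages, so this is one plausible reading rather than a reconstruction of the study data.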
Non-Gaussian spatiotemporal simulation of multisite daily precipitation: downscaling framework
NASA Astrophysics Data System (ADS)
Ben Alaya, M. A.; Ouarda, T. B. M. J.; Chebana, F.
2018-01-01
Probabilistic regression approaches for downscaling daily precipitation are very useful. They provide the whole conditional distribution at each forecast step to better represent the temporal variability. The question addressed in this paper is: how to simulate spatiotemporal characteristics of multisite daily precipitation from probabilistic regression models? Recent publications point out the complexity of multisite properties of daily precipitation and highlight the need for using a non-Gaussian flexible tool. This work proposes a reasonable compromise between simplicity and flexibility avoiding model misspecification. A suitable nonparametric bootstrapping (NB) technique is adopted. A downscaling model which merges a vector generalized linear model (VGLM as a probabilistic regression tool) and the proposed bootstrapping technique is introduced to simulate realistic multisite precipitation series. The model is applied to data sets from the southern part of the province of Quebec, Canada. It is shown that the model is capable of reproducing both at-site properties and the spatial structure of daily precipitations. Results indicate the superiority of the proposed NB technique, over a multivariate autoregressive Gaussian framework (i.e. Gaussian copula).
Van Looy, Stijn; Verplancke, Thierry; Benoit, Dominique; Hoste, Eric; Van Maele, Georges; De Turck, Filip; Decruyenaere, Johan
2007-01-01
Tacrolimus is an important immunosuppressive drug for organ transplantation patients. It has a narrow therapeutic range, toxic side effects, and a blood concentration with wide intra- and interindividual variability. Hence, it is of the utmost importance to monitor tacrolimus blood concentration, thereby ensuring clinical effect and avoiding toxic side effects. Prediction models for tacrolimus blood concentration can improve clinical care by optimizing monitoring of these concentrations, especially in the initial phase after transplantation during intensive care unit (ICU) stay. This is the first study in the ICU in which support vector machines, as a new data modeling technique, are investigated and tested in their prediction capabilities of tacrolimus blood concentration. Linear support vector regression (SVR) and nonlinear radial basis function (RBF) SVR are compared with multiple linear regression (MLR). Tacrolimus blood concentrations, together with 35 other relevant variables from 50 liver transplantation patients, were extracted from our ICU database. This resulted in a dataset of 457 blood samples, on average between 9 and 10 samples per patient, finally resulting in a database of more than 16,000 data values. Nonlinear RBF SVR, linear SVR, and MLR were performed after selection of clinically relevant input variables and model parameters. Differences between observed and predicted tacrolimus blood concentrations were calculated. Prediction accuracy of the three methods was compared after fivefold cross-validation (Friedman test and Wilcoxon signed rank analysis). Linear SVR and nonlinear RBF SVR had mean absolute differences between observed and predicted tacrolimus blood concentrations of 2.31 ng/ml (standard deviation [SD] 2.47) and 2.38 ng/ml (SD 2.49), respectively. MLR had a mean absolute difference of 2.73 ng/ml (SD 3.79). The difference between linear SVR and MLR was statistically significant (p < 0.001). 
RBF SVR had the advantage of requiring only 2 input variables to perform this prediction in comparison to 15 and 16 variables needed by linear SVR and MLR, respectively. This is an indication of the superior prediction capability of nonlinear SVR. Prediction of tacrolimus blood concentration with linear and nonlinear SVR was excellent, and accuracy was superior in comparison with an MLR model.
Qin, Zijian; Wang, Maolin; Yan, Aixia
2017-07-01
In this study, quantitative structure-activity relationship (QSAR) models using various descriptor sets and training/test set selection methods were explored to predict the bioactivity of hepatitis C virus (HCV) NS3/4A protease inhibitors by using a multiple linear regression (MLR) and a support vector machine (SVM) method. 512 HCV NS3/4A protease inhibitors and their IC50 values, which were determined by the same FRET assay, were collected from the reported literature to build a dataset. All the inhibitors were represented with nine selected global and 12 2D property-weighted autocorrelation descriptors calculated from the program CORINA Symphony. The dataset was divided into a training set and a test set by a random and a Kohonen's self-organizing map (SOM) method. The correlation coefficients (r²) of training sets and test sets were 0.75 and 0.72 for the best MLR model, and 0.87 and 0.85 for the best SVM model, respectively. In addition, a series of sub-dataset models were also developed. The performances of all the best sub-dataset models were better than those of the whole dataset models. We believe that the combination of the best sub- and whole dataset SVM models can be used as reliable lead designing tools for new NS3/4A protease inhibitors scaffolds in a drug discovery pipeline. Copyright © 2017 Elsevier Ltd. All rights reserved.
Savjani, Ricky R; Taylor, Brian A; Acion, Laura; Wilde, Elisabeth A; Jorge, Ricardo E
2017-11-15
Finding objective and quantifiable imaging markers of mild traumatic brain injury (TBI) has proven challenging, especially in the military population. Changes in cortical thickness after injury have been reported in animals and in humans, but it is unclear how these alterations manifest in the chronic phase, and it is difficult to characterize accurately with imaging. We used cortical thickness measures derived from Advanced Normalization Tools (ANTs) to predict a continuous demographic variable: age. We trained four different regression models (linear regression, support vector regression, Gaussian process regression, and random forests) to predict age from healthy control brains from publicly available datasets (n = 762). We then used these models to predict brain age in military Service Members with TBI (n = 92) and military Service Members without TBI (n = 34). Our results show that all four models overpredicted age in Service Members with TBI, and the predicted age difference was significantly greater compared with military controls. These data extend previous civilian findings and show that cortical thickness measures may reveal an association of accelerated changes over time with military TBI.
A comparative study of machine learning models for ethnicity classification
NASA Astrophysics Data System (ADS)
Trivedi, Advait; Bessie Amali, D. Geraldine
2017-11-01
This paper adopts a machine learning approach to the problem of ethnicity recognition. Ethnicity identification is an important vision problem with use cases extending to various domains. Despite the complexity involved, ethnicity identification comes naturally to humans. This meta information can be leveraged to make several decisions, be it in target marketing or security. With the recent development of intelligent systems, a submodule that efficiently captures ethnicity would be useful in several use cases. Several attempts to identify an ideal learning model to represent a multi-ethnic dataset have been recorded. A comparative study of classifiers such as support vector machines and logistic regression is documented. Experimental results indicate that the logistic classifier provides a much more accurate classification than the support vector machine.
Software tool for data mining and its applications
NASA Astrophysics Data System (ADS)
Yang, Jie; Ye, Chenzhou; Chen, Nianyi
2002-03-01
A software tool for data mining is introduced, which integrates pattern recognition (PCA, Fisher, clustering, hyperenvelop, regression), artificial intelligence (knowledge representation, decision trees), statistical learning (rough set, support vector machine), and computational intelligence (neural network, genetic algorithm, fuzzy systems). It consists of nine function models: pattern recognition, decision trees, association rule, fuzzy rule, neural network, genetic algorithm, Hyper Envelop, support vector machine, and visualization. The principle and knowledge representation of some function models of data mining are described. The software tool is implemented in Visual C++ under Windows 2000. Nonmonotony in data mining is dealt with by concept hierarchy and layered mining. The software tool has been satisfactorily applied to the prediction of regularities in the formation of ternary intermetallic compounds in alloy systems, and to the diagnosis of brain glioma.
Height and Weight Estimation From Anthropometric Measurements Using Machine Learning Regressions
Fernandes, Bruno J. T.; Roque, Alexandre
2018-01-01
Height and weight are measurements used to track nutritional diseases, energy expenditure, clinical conditions, drug dosages, and infusion rates. Many patients are not ambulant or may be unable to communicate, and a combination of these factors may prevent accurate estimation or measurement; in those cases, height and weight can be estimated approximately by anthropometric means. Different groups have proposed different linear or non-linear equations whose coefficients are obtained by using single or multiple linear regressions. In this paper, we present a complete study of the application of different learning models to estimate height and weight from anthropometric measurements: support vector regression, Gaussian process, and artificial neural networks. The predicted values are significantly more accurate than those obtained with conventional linear regressions. In all cases, the predictions are insensitive to ethnicity, and to gender if more than two anthropometric parameters are analyzed. The learning model analysis creates new opportunities for anthropometric applications in industry, textile technology, security, and health care. PMID:29651366
An LPV Adaptive Observer for Updating a Map Applied to an MAF Sensor in a Diesel Engine
Liu, Zhiyuan; Wang, Changhui
2015-01-01
In this paper, a new method for mass air flow (MAF) sensor error compensation and an online updating error map (or lookup table) due to installation and aging in a diesel engine is developed. Since the MAF sensor error is dependent on the engine operating point, the error model is represented as a two-dimensional (2D) map with two inputs, fuel mass injection quantity and engine speed. Meanwhile, the 2D map representing the MAF sensor error is described as a piecewise bilinear interpolation model, which can be written as a dot product between the regression vector and parameter vector using a membership function. With the combination of the 2D map regression model and the diesel engine air path system, an LPV adaptive observer with low computational load is designed to estimate states and parameters jointly. The convergence of the proposed algorithm is proven under the conditions of persistent excitation and given inequalities. The observer is validated against the simulation data from engine software enDYNA provided by Tesis. The results demonstrate that the operating point-dependent error of the MAF sensor can be approximated acceptably by the 2D map from the proposed method. PMID:26512675
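The abstract above describes a 2D map as a piecewise bilinear interpolation model written as a dot product between a regression vector and a parameter vector. The following minimal sketch shows that construction only; it does not reproduce the LPV adaptive observer. The grid values, the function name `regression_vector`, and the row-major layout of the parameter vector `theta` are assumptions made for illustration.

```python
from bisect import bisect_right

def regression_vector(x_grid, y_grid, u, v):
    """Return phi such that map(u, v) = phi . theta, with theta stored row-major over the grid."""
    def cell_and_weight(grid, q):
        # Index of the left grid node, clamped so the query uses a valid cell.
        i = min(max(bisect_right(grid, q) - 1, 0), len(grid) - 2)
        w = (q - grid[i]) / (grid[i + 1] - grid[i])
        return i, w

    i, wx = cell_and_weight(x_grid, u)
    j, wy = cell_and_weight(y_grid, v)
    ny = len(y_grid)
    phi = [0.0] * (len(x_grid) * ny)
    # Only the four corners of the active cell get nonzero membership weights.
    phi[i * ny + j] = (1 - wx) * (1 - wy)
    phi[(i + 1) * ny + j] = wx * (1 - wy)
    phi[i * ny + j + 1] = (1 - wx) * wy
    phi[(i + 1) * ny + j + 1] = wx * wy
    return phi

# Hypothetical grid: fuel injection quantity (x) and engine speed (y).
x_grid = [10.0, 20.0, 30.0]
y_grid = [1000.0, 2000.0]
theta = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # map values at the grid nodes, row-major

phi = regression_vector(x_grid, y_grid, 15.0, 1500.0)
value = sum(p * t for p, t in zip(phi, theta))
print(value)
```

Because the map output is linear in `theta` for any fixed operating point, the map entries can be treated as unknown parameters of a linear regression, which is what makes joint state and parameter estimation tractable.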
Confounder Detection in High-Dimensional Linear Models Using First Moments of Spectral Measures.
Liu, Furui; Chan, Laiwan
2018-06-12
In this letter, we study the confounder detection problem in the linear model, where the target variable [Formula: see text] is predicted using its [Formula: see text] potential causes [Formula: see text]. Based on an assumption of a rotation-invariant generating process of the model, recent study shows that the spectral measure induced by the regression coefficient vector with respect to the covariance matrix of [Formula: see text] is close to a uniform measure in purely causal cases, but it differs from a uniform measure characteristically in the presence of a scalar confounder. Analyzing spectral measure patterns could help to detect confounding. In this letter, we propose to use the first moment of the spectral measure for confounder detection. We calculate the first moment of the regression vector-induced spectral measure and compare it with the first moment of a uniform spectral measure, both defined with respect to the covariance matrix of [Formula: see text]. The two moments coincide in nonconfounding cases and differ from each other in the presence of confounding. This statistical causal-confounding asymmetry can be used for confounder detection. Without the need to analyze the spectral measure pattern, our method avoids the difficulty of metric choice and multiple parameter optimization. Experiments on synthetic and real data show the performance of this method.
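The comparison of first moments described in the abstract above can be written out as follows. This is a sketch under assumed notation, since the abstract's own symbols are elided: $a$ denotes the regression coefficient vector, $\Sigma$ the $d \times d$ covariance matrix of the potential causes with eigenvalues $\lambda_j$ and orthonormal eigenvectors $\phi_j$, and the vector-induced measure is assumed to be normalized by $\|a\|^2$.

```latex
\mu_{\Sigma,a} = \frac{1}{\|a\|^{2}} \sum_{j=1}^{d} \langle a, \phi_j \rangle^{2}\, \delta_{\lambda_j},
\qquad
\mu^{\tau}_{\Sigma} = \frac{1}{d} \sum_{j=1}^{d} \delta_{\lambda_j}

M_1(\mu_{\Sigma,a}) = \frac{1}{\|a\|^{2}} \sum_{j=1}^{d} \lambda_j \langle a, \phi_j \rangle^{2}
                    = \frac{a^{\top} \Sigma\, a}{\|a\|^{2}},
\qquad
M_1(\mu^{\tau}_{\Sigma}) = \frac{1}{d} \sum_{j=1}^{d} \lambda_j = \frac{\operatorname{tr}(\Sigma)}{d}
```

Under these definitions both moments are scalars computed directly from $a$ and $\Sigma$, so the comparison needs no eigendecomposition and no distance between measures.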
Stylianou, Neophytos; Akbarov, Artur; Kontopantelis, Evangelos; Buchan, Iain; Dunn, Ken W
2015-08-01
Predicting mortality from burn injury has traditionally employed logistic regression models. Alternative machine learning methods have been introduced in some areas of clinical prediction as the necessary software and computational facilities have become accessible. Here we compare logistic regression and machine learning predictions of mortality from burn. An established logistic mortality model was compared to machine learning methods (artificial neural network, support vector machine, random forests and naïve Bayes) using a population-based (England & Wales) case-cohort registry. Predictive evaluation used: area under the receiver operating characteristic curve; sensitivity; specificity; positive predictive value and Youden's index. All methods had comparable discriminatory abilities, similar sensitivities, specificities and positive predictive values. Although some machine learning methods performed marginally better than logistic regression the differences were seldom statistically significant and clinically insubstantial. Random forests were marginally better for high positive predictive value and reasonable sensitivity. Neural networks yielded slightly better prediction overall. Logistic regression gives an optimal mix of performance and interpretability. The established logistic regression model of burn mortality performs well against more complex alternatives. Clinical prediction with a small set of strong, stable, independent predictors is unlikely to gain much from machine learning outside specialist research contexts. Copyright © 2015 Elsevier Ltd and ISBI. All rights reserved.
Oliveira, Ana R S; Cohnstaedt, Lee W; Strathe, Erin; Hernández, Luciana Etcheverry; McVey, D Scott; Piaggio, José; Cernicchiaro, Natalia
2017-09-07
Japanese encephalitis (JE) is a zoonosis in Southeast Asia vectored by mosquitoes infected with the Japanese encephalitis virus (JEV). Japanese encephalitis is considered an emerging exotic infectious disease with potential for introduction in currently JEV-free countries. Pigs and ardeid birds are reservoir hosts and play a major role in the transmission dynamics of the disease. The objective of the study was to quantitatively summarize the proportion of JEV infection in vectors and vertebrate hosts from data pertaining to observational studies obtained in a systematic review of the literature on vector and host competence for JEV, using meta-analyses. Data gathered in this study pertained to three outcomes: proportion of JEV infection in vectors, proportion of JEV infection in vertebrate hosts, and minimum infection rate (MIR) in vectors. Random-effects subgroup meta-analysis models were fitted by species (mosquito or vertebrate host species) to estimate pooled summary measures, as well as to compute the variance between studies. Meta-regression models were fitted to assess the association between different predictors and the outcomes of interest and to identify sources of heterogeneity among studies. Predictors included in all models were mosquito/vertebrate host species, diagnostic methods, mosquito capture methods, season, country/region, age category, and number of mosquitoes per pool. Mosquito species, diagnostic method, country, and capture method represented important sources of heterogeneity associated with the proportion of JEV infection; host species and region were considered sources of heterogeneity associated with the proportion of JEV infection in hosts; and diagnostic and mosquito capture methods were deemed important contributors of heterogeneity for the MIR outcome. Our findings provide reference pooled summary estimates of vector competence for JEV for some mosquito species, as well as of sources of variability for these outcomes.
Moreover, this work provides useful guidelines when interpreting vector and host infection proportions or prevalence from observational studies, and contributes to further our understanding of vector and vertebrate host competence for JEV, elucidating information on the relative importance of vectors and hosts on JEV introduction and transmission.
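The minimum infection rate mentioned above is not defined in the abstract. The sketch below assumes the conventional definition (positive pools per 1,000 mosquitoes tested, on the assumption that each positive pool contains at least one infected mosquito); the pool data are invented.

```python
def minimum_infection_rate(pools):
    """MIR per 1,000 mosquitoes: positive pools / total mosquitoes tested * 1000 (assumed definition).

    `pools` is a list of (pool_size, is_positive) tuples.
    """
    total_mosquitoes = sum(size for size, _ in pools)
    positive_pools = sum(1 for _, positive in pools if positive)
    return 1000.0 * positive_pools / total_mosquitoes


# Hypothetical survey: four pools of 50 mosquitoes, one of which tested positive.
example_pools = [(50, False), (50, True), (50, False), (50, False)]
mir = minimum_infection_rate(example_pools)
```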
Improving precision of glomerular filtration rate estimating model by ensemble learning.
Liu, Xun; Li, Ningshan; Lv, Linsheng; Fu, Yongmei; Cheng, Cailian; Wang, Caixia; Ye, Yuqiu; Li, Shaomin; Lou, Tanqi
2017-11-09
Accurate assessment of kidney function is clinically important, but estimates of glomerular filtration rate (GFR) by regression are imprecise. We hypothesized that ensemble learning could improve precision. A total of 1419 participants were enrolled, with 1002 in the development dataset and 417 in the external validation dataset. GFR was independently estimated from age, sex and serum creatinine using an artificial neural network (ANN), support vector machine (SVM), regression, and ensemble learning. GFR was measured by 99mTc-DTPA renal dynamic imaging calibrated with dual plasma sample 99mTc-DTPA GFR. Mean measured GFRs were 70.0 ml/min/1.73 m² in the developmental and 53.4 ml/min/1.73 m² in the external validation cohorts. In the external validation cohort, precision was better in the ensemble model of the ANN, SVM and regression equation (IQR = 13.5 ml/min/1.73 m²) than in the new regression model (IQR = 14.0 ml/min/1.73 m², P < 0.001). The precision of ensemble learning was the best of the three models, but the models had similar bias and accuracy. The median difference ranged from 2.3 to 3.7 ml/min/1.73 m², 30% accuracy ranged from 73.1 to 76.0%, and P was > 0.05 for all comparisons of the new regression equation and the other new models. An ensemble learning model including three variables, the average ANN, SVM, and regression equation values, was more precise than the new regression model. A more complex ensemble learning strategy may further improve GFR estimates.
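The averaging ensemble and the three performance measures reported here (bias as median difference, precision as interquartile range of the difference, and 30% accuracy) can be sketched as below. The sign convention (estimated minus measured), the quartile method, and all numbers are assumptions made for illustration; the abstract does not specify them.

```python
from statistics import mean, median, quantiles


def ensemble_estimate(ann, svm, reg):
    """Average the three per-patient GFR estimates (simple unweighted ensemble)."""
    return [mean(triple) for triple in zip(ann, svm, reg)]


def performance(estimated, measured):
    """Bias (median difference), precision (IQR of difference) and P30 (% within 30% of measured)."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    q1, _, q3 = quantiles(diffs, n=4)
    within_30 = sum(abs(e - m) <= 0.3 * m for e, m in zip(estimated, measured))
    return {
        "bias": median(diffs),
        "iqr": q3 - q1,
        "p30": 100.0 * within_30 / len(measured),
    }


# Hypothetical measured GFR and model outputs (ml/min/1.73 m²) for four patients.
measured = [60.0, 80.0, 40.0, 100.0]
ann = [66.0, 70.0, 55.0, 90.0]
svm = [60.0, 82.0, 50.0, 95.0]
reg = [54.0, 88.0, 60.0, 100.0]

combined = ensemble_estimate(ann, svm, reg)
report = performance(combined, measured)
```

A smaller IQR at similar bias and P30 is the pattern the abstract describes for the ensemble relative to the regression equation alone.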
Unresolved Galaxy Classifier for ESA/Gaia mission: Support Vector Machines approach
NASA Astrophysics Data System (ADS)
Bellas-Velidis, Ioannis; Kontizas, Mary; Dapergolas, Anastasios; Livanou, Evdokia; Kontizas, Evangelos; Karampelas, Antonios
A software package Unresolved Galaxy Classifier (UGC) is being developed for the ground-based pipeline of ESA's Gaia mission. It aims to provide an automated taxonomic classification and specific parameters estimation analyzing Gaia BP/RP instrument low-dispersion spectra of unresolved galaxies. The UGC algorithm is based on a supervised learning technique, the Support Vector Machines (SVM). The software is implemented in Java as two separate modules. An offline learning module provides functions for SVM-models training. Once trained, the set of models can be repeatedly applied to unknown galaxy spectra by the pipeline's application module. A library of galaxy models synthetic spectra, simulated for the BP/RP instrument, is used to train and test the modules. Science tests show a very good classification performance of UGC and relatively good regression performance, except for some of the parameters. Possible approaches to improve the performance are discussed.
Saunders, Christina T; Blume, Jeffrey D
2017-10-26
Mediation analysis explores the degree to which an exposure's effect on an outcome is diverted through a mediating variable. We describe a classical regression framework for conducting mediation analyses in which estimates of causal mediation effects and their variance are obtained from the fit of a single regression model. The vector of changes in exposure pathway coefficients, which we named the essential mediation components (EMCs), is used to estimate standard causal mediation effects. Because these effects are often simple functions of the EMCs, an analytical expression for their model-based variance follows directly. Given this formula, it is instructive to revisit the performance of routinely used variance approximations (e.g., delta method and resampling methods). Requiring the fit of only one model reduces the computation time required for complex mediation analyses and permits the use of a rich suite of regression tools that are not easily implemented on a system of three equations, as would be required in the Baron-Kenny framework. Using data from the BRAIN-ICU study, we provide examples to illustrate the advantages of this framework and compare it with the existing approaches. © The Author 2017. Published by Oxford University Press.
Prediction of monthly rainfall in Victoria, Australia: Clusterwise linear regression approach
NASA Astrophysics Data System (ADS)
Bagirov, Adil M.; Mahmood, Arshad; Barton, Andrew
2017-05-01
This paper develops the Clusterwise Linear Regression (CLR) technique for prediction of monthly rainfall. The CLR is a combination of clustering and regression techniques. It is formulated as an optimization problem and an incremental algorithm is designed to solve it. The algorithm is applied to predict monthly rainfall in Victoria, Australia using rainfall data with five input meteorological variables over the period of 1889-2014 from eight geographically diverse weather stations. The prediction performance of the CLR method is evaluated by comparing observed and predicted rainfall values using four measures of forecast accuracy. The proposed method is also compared with the CLR using the maximum likelihood framework by the expectation-maximization algorithm, multiple linear regression, artificial neural networks and the support vector machines for regression models using computational results. The results demonstrate that the proposed algorithm outperforms other methods in most locations.
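To make the idea of combining clustering and regression concrete, the sketch below uses the classical alternating heuristic (assign each point to its best-fitting line, then refit each line). It is not the incremental optimization algorithm developed in the paper, it handles a single input variable only, and the data and starting lines are invented.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def clusterwise_regression(points, lines, max_iter=20):
    """Alternate between assigning points to the closest line and refitting each line."""
    for _ in range(max_iter):
        groups = [[] for _ in lines]
        for x, y in points:
            errors = [(y - (a * x + b)) ** 2 for a, b in lines]
            groups[errors.index(min(errors))].append((x, y))
        # Keep the old line when a cluster receives no points.
        new_lines = [fit_line(g) if g else old for g, old in zip(groups, lines)]
        if new_lines == lines:
            break
        lines = new_lines
    return lines


# Toy data drawn from two regimes: y = x and y = -x + 20.
data = [(float(x), float(x)) for x in range(6)]
data += [(float(x), 20.0 - x) for x in range(6)]
fitted = clusterwise_regression(data, lines=[(0.0, 2.0), (0.0, 18.0)])
```

The heuristic depends on the starting lines and can stall in a poor local solution, which is one motivation for more careful optimization schemes such as the one the paper proposes.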
Chung, Moo K.; Qiu, Anqi; Seo, Seongho; Vorperian, Houri K.
2014-01-01
We present a novel kernel regression framework for smoothing scalar surface data using the Laplace-Beltrami eigenfunctions. Starting with the heat kernel constructed from the eigenfunctions, we formulate a new bivariate kernel regression framework as a weighted eigenfunction expansion with the heat kernel as the weights. The new kernel regression is mathematically equivalent to isotropic heat diffusion, kernel smoothing and recently popular diffusion wavelets. Unlike many previous partial differential equation based approaches involving diffusion, our approach represents the solution of diffusion analytically, reducing numerical inaccuracy and slow convergence. The numerical implementation is validated on a unit sphere using spherical harmonics. As an illustration, we have applied the method in characterizing the localized growth pattern of mandible surfaces obtained in CT images from subjects between ages 0 and 20 years by regressing the length of displacement vectors with respect to the template surface. PMID:25791435
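The weighted eigenfunction expansion described in this abstract can be written out as follows. The symbols (eigenpairs λ_j, ψ_j, bandwidth σ, truncation degree k, coefficients β_j) are notation assumed here for illustration; only the general form, a heat kernel built from Laplace-Beltrami eigenfunctions and used as expansion weights, is taken from the abstract.

```latex
% Laplace-Beltrami eigenfunctions on the surface M
\Delta \psi_j = -\lambda_j \psi_j, \qquad 0 = \lambda_0 \le \lambda_1 \le \lambda_2 \le \cdots

% Heat kernel with bandwidth \sigma
K_\sigma(p, q) = \sum_{j=0}^{\infty} e^{-\lambda_j \sigma}\, \psi_j(p)\, \psi_j(q)

% Heat kernel smoothing of data f, truncated at degree k
(K_\sigma * f)(p) \approx \sum_{j=0}^{k} e^{-\lambda_j \sigma}\, \beta_j\, \psi_j(p),
\qquad \beta_j = \langle f, \psi_j \rangle
```

Each coefficient is damped by e^{-λ_j σ}, so high-frequency components are suppressed more strongly as σ grows, which is why the expansion behaves like diffusion run for time σ.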
An Intelligent Decision System for Intraoperative Somatosensory Evoked Potential Monitoring.
Fan, Bi; Li, Han-Xiong; Hu, Yong
2016-02-01
Somatosensory evoked potential (SEP) is a useful, noninvasive technique widely used for spinal cord monitoring during surgery. One of the main indicators of a spinal cord injury is the drop in amplitude of the SEP signal in comparison to the nominal baseline that is assumed to be constant during the surgery. However, in practice, the real-time baseline is not constant and may vary during the operation due to nonsurgical factors, such as blood pressure, anaesthesia, etc. Thus, a false warning is often generated if the nominal baseline is used for SEP monitoring. In current practice, human experts must be used to prevent this false warning. However, these well-trained human experts are expensive and may not be reliable and consistent due to various reasons like fatigue and emotion. In this paper, an intelligent decision system is proposed to improve SEP monitoring. First, the least squares support vector regression and multi-support vector regression models are trained to construct the dynamic baseline from historical data. Then a control chart is applied to detect abnormalities during surgery. The effectiveness of the intelligent decision system is evaluated by comparing its performance against the nominal baseline model by using the real experimental datasets derived from clinical conditions.
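The second stage described here, a control chart applied to deviations from the predicted baseline, might look like the sketch below. The abstract does not say which chart is used; a Shewhart-style chart with limits at the mean plus or minus three standard deviations of reference residuals is assumed, and the residual values are invented.

```python
from statistics import mean, stdev


def control_limits(reference_residuals, k=3.0):
    """Lower and upper control limits: mean +/- k standard deviations of in-control residuals."""
    center = mean(reference_residuals)
    spread = stdev(reference_residuals)
    return center - k * spread, center + k * spread


def flag_abnormal(residuals, limits):
    """Indices of residuals (measured amplitude minus predicted baseline) outside the limits."""
    lower, upper = limits
    return [i for i, r in enumerate(residuals) if r < lower or r > upper]


# Hypothetical residuals: a reference period, then a monitoring period with one large drop.
reference = [0.1, -0.1, 0.0, 0.2, -0.2]
monitoring = [0.05, -0.3, -1.2, 0.1]
alarms = flag_abnormal(monitoring, control_limits(reference))
```

Because the residual is taken against a predicted baseline rather than a fixed nominal one, slow drifts caused by nonsurgical factors would not by themselves trigger an alarm.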
Structural features that predict real-value fluctuations of globular proteins
Jamroz, Michal; Kolinski, Andrzej; Kihara, Daisuke
2012-01-01
It is crucial to consider dynamics for understanding the biological function of proteins. We used a large number of molecular dynamics trajectories of non-homologous proteins as references and examined static structural features of proteins that are most relevant to fluctuations. We examined correlation of individual structural features with fluctuations and further investigated effective combinations of features for predicting the real-value of residue fluctuations using the support vector regression. It was found that some structural features have higher correlation than crystallographic B-factors with fluctuations observed in molecular dynamics trajectories. Moreover, support vector regression that uses combinations of static structural features showed accurate prediction of fluctuations with an average Pearson’s correlation coefficient of 0.669 and a root mean square error of 1.04 Å. This correlation coefficient is higher than the one observed for the prediction by the Gaussian network model. An advantage of the developed method over the Gaussian network models is that the former predicts the real-value of fluctuation. The results help improve our understanding of relationships between protein structure and fluctuation. Furthermore, the developed method provides a convenient, practical way to predict fluctuations of proteins using easily computed static structural features of proteins. PMID:22328193
Dai, Wensheng; Wu, Jui-Yu; Lu, Chi-Jie
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting.
Dai, Wensheng
2014-01-01
Sales forecasting is one of the most important issues in managing information technology (IT) chain store sales since an IT chain store has many branches. Integrating feature extraction method and prediction tool, such as support vector regression (SVR), is a useful method for constructing an effective sales forecasting scheme. Independent component analysis (ICA) is a novel feature extraction technique and has been widely applied to deal with various forecasting problems. But, up to now, only the basic ICA method (i.e., temporal ICA model) was applied to sale forecasting problem. In this paper, we utilize three different ICA methods including spatial ICA (sICA), temporal ICA (tICA), and spatiotemporal ICA (stICA) to extract features from the sales data and compare their performance in sales forecasting of IT chain store. Experimental results from a real sales data show that the sales forecasting scheme by integrating stICA and SVR outperforms the comparison models in terms of forecasting error. The stICA is a promising tool for extracting effective features from branch sales data and the extracted features can improve the prediction performance of SVR for sales forecasting. PMID:25165740
Multivariate Models for Prediction of Human Skin Sensitization ...
One of the Interagency Coordinating Committee on the Validation of Alternative Methods' (ICCVAM) top priorities is the development and evaluation of non-animal approaches to identify potential skin sensitizers. The complexity of biological events necessary to produce skin sensitization suggests that no single alternative method will replace the currently accepted animal tests. ICCVAM is evaluating an integrated approach to testing and assessment based on the adverse outcome pathway for skin sensitization that uses machine learning approaches to predict human skin sensitization hazard. We combined data from three in chemico or in vitro assays - the direct peptide reactivity assay (DPRA), human cell line activation test (h-CLAT) and KeratinoSens™ assay - six physicochemical properties and an in silico read-across prediction of skin sensitization hazard into 12 variable groups. The variable groups were evaluated using two machine learning approaches, logistic regression and support vector machine, to predict human skin sensitization hazard. Models were trained on 72 substances and tested on an external set of 24 substances. The six models (three logistic regression and three support vector machine) with the highest accuracy (92%) used: (1) DPRA, h-CLAT and read-across; (2) DPRA, h-CLAT, read-across and KeratinoSens; or (3) DPRA, h-CLAT, read-across, KeratinoSens and log P. The models performed better at predicting human skin sensitization hazard than the murine
NASA Astrophysics Data System (ADS)
Validi, AbdoulAhad
2014-03-01
This study introduces a non-intrusive approach in the context of low-rank separated representation to construct a surrogate of high-dimensional stochastic functions, e.g., PDEs/ODEs, in order to decrease the computational cost of Markov Chain Monte Carlo simulations in Bayesian inference. The surrogate model is constructed via a regularized alternative least-square regression with Tikhonov regularization using a roughening matrix computing the gradient of the solution, in conjunction with a perturbation-based error indicator to detect optimal model complexities. The model approximates a vector of a continuous solution at discrete values of a physical variable. The required number of random realizations to achieve a successful approximation linearly depends on the function dimensionality. The computational cost of the model construction is quadratic in the number of random inputs, which potentially tackles the curse of dimensionality in high-dimensional stochastic functions. Furthermore, this vector-valued separated representation-based model, in comparison to the available scalar-valued case, leads to a significant reduction in the cost of approximation by an order of magnitude equal to the vector size. The performance of the method is studied through its application to three numerical examples including a 41-dimensional elliptic PDE and a 21-dimensional cavity flow.
New analysis methods to push the boundaries of diagnostic techniques in the environmental sciences
NASA Astrophysics Data System (ADS)
Lungaroni, M.; Murari, A.; Peluso, E.; Gelfusa, M.; Malizia, A.; Vega, J.; Talebzadeh, S.; Gaudio, P.
2016-04-01
In recent years, new and more sophisticated measurements have been at the basis of the major progress in various disciplines related to the environment, such as remote sensing and thermonuclear fusion. To maximize the effectiveness of the measurements, new data analysis techniques are required. First data processing tasks, such as filtering and fitting, are of primary importance, since they can have a strong influence on the rest of the analysis. Although Support Vector Regression was devised and refined at the end of the 1990s, a systematic comparison with more traditional non-parametric regression methods has never been reported. In this paper, a series of systematic tests is described, which indicates that SVR is a very competitive method of non-parametric regression that can usefully complement and often outperform more consolidated approaches. The performance of Support Vector Regression as a method of filtering is investigated first, comparing it with the most popular alternative techniques. Then Support Vector Regression is applied to the problem of non-parametric regression to analyse Lidar surveys for the environmental measurement of particulate matter due to wildfires. The proposed approach has given very positive results and provides new perspectives to the interpretation of the data.
Lockaby, Graeme; Noori, Navideh; Morse, Wayde; Zipperer, Wayne; Kalin, Latif; Governo, Robin; Sawant, Rajesh; Ricker, Matthew
2016-12-01
The integrated effects of the many risk factors associated with West Nile virus (WNV) incidence are complex and not well understood. We studied an array of risk factors in and around Atlanta, GA, that have been shown to be linked with WNV in other locations. This array was comprehensive and included climate and meteorological metrics, vegetation characteristics, land use / land cover analyses, and socioeconomic factors. Data on mosquito abundance and WNV mosquito infection rates were obtained for 58 sites and covered 2009-2011, a period following the combined storm water - sewer overflow remediation in that city. Risk factors were compared to mosquito abundance and the WNV vector index (VI) using regression analyses individually and in combination. Lagged climate variables, including soil moisture and temperature, were significantly correlated (positively) with vector index as were forest patch size and percent pine composition of patches (both negatively). Socioeconomic factors that were most highly correlated (positively) with the VI included the proportion of low income households and homes built before 1960 and housing density. The model selected through stepwise regression that related risk factors to the VI included (in the order of decreasing influence) proportion of houses built before 1960, percent of pine in patches, and proportion of low income households. © 2016 The Society for Vector Ecology.
Chung, Moo K; Qiu, Anqi; Seo, Seongho; Vorperian, Houri K
2015-05-01
We present a novel kernel regression framework for smoothing scalar surface data using the Laplace-Beltrami eigenfunctions. Starting with the heat kernel constructed from the eigenfunctions, we formulate a new bivariate kernel regression framework as a weighted eigenfunction expansion with the heat kernel as the weights. The new kernel method is mathematically equivalent to isotropic heat diffusion, kernel smoothing and recently popular diffusion wavelets. The numerical implementation is validated on a unit sphere using spherical harmonics. As an illustration, the method is applied to characterize the localized growth pattern of mandible surfaces obtained in CT images between ages 0 and 20 by regressing the length of displacement vectors with respect to a surface template. Copyright © 2015 Elsevier B.V. All rights reserved.
Chen, Xi; Lu, Fang; Jiang, Lu-di; Cai, Yi-Lian; Li, Gong-Yu; Zhang, Yan-Ling
2016-07-01
Inhibition of cytochrome P450 (CYP450) enzymes is the most common reason for drug interactions, so the study of early prediction of CYP inhibitors can help to decrease the incidence of adverse reactions caused by drug interactions. CYP450 2E1 (CYP2E1), which plays a key role in drug metabolism, has a broad spectrum of substrates. In this study, 32 CYP2E1 inhibitors were collected for the construction of a support vector regression (SVR) model. The test set data were used to verify the CYP2E1 quantitative models and obtain the optimal prediction model of CYP2E1 inhibitors. Meanwhile, one molecular docking program, CDOCKER, was utilized to analyze the interaction pattern between positive compounds and the active pocket to establish the optimal screening model of CYP2E1 inhibitors. The SVR model and the molecular docking prediction model were combined to screen the traditional Chinese medicine database (TCMD), which could improve the calculation efficiency and prediction accuracy. 6,376 traditional Chinese medicine (TCM) compounds predicted by the SVR model were obtained, and in further verification using the molecular docking model, 247 TCM compounds with potential inhibitory activities against CYP2E1 were finally retained. Some of them have been verified by experiments. The results demonstrated that this study could provide guidance for the virtual screening of CYP450 inhibitors and the prediction of CYP-mediated DDIs, and also provide references for clinical rational drug use. Copyright © by the Chinese Pharmaceutical Association.
Sallam, Mohamed F; Al Ahmed, Azzam M; Abdel-Dayem, Mahmoud S; Abdullah, Mohamed A R
2013-01-01
The mosquito, Culex tritaeniorhynchus Giles is a prevalent and confirmed Rift Valley Fever virus (RVFV) vector. This vector, in association with Aedimorphus arabiensis (Patton), was responsible for causing the outbreak of 2000 in Jazan Province, Saudi Arabia. Larval occurrence records and a total of 19 bioclimatic and three topographic layers imported from Worldclim Database were used to predict the larval suitable breeding habitats for this vector in Jazan Province using ArcGIS ver.10 and MaxEnt modeling program. Also, a supervised land cover classification from SPOT5 imagery was developed to assess the land cover distribution within the suitable predicted habitats. Eleven bioclimatic and slope attributes were found to be the significant predictors for this larval suitable breeding habitat. Precipitation and temperature were strong predictors of mosquito distribution. Among six land cover classes, the linear regression model (LM) indicated wet muddy substrate is significantly associated with high-very high suitable predicted habitats (R² = 73.7%, P < 0.05). Also, LM indicated that total dissolved salts (TDS) was a significant contributor (R² = 23.9%, P < 0.01) in determining mosquito larval abundance. This model is a first step in understanding the spatial distribution of Cx. tritaeniorhynchus and consequently the risk of RVFV in Saudi Arabia and to assist in planning effective mosquito surveillance and control programs by public health personnel and researchers.
A New Approach for Mobile Advertising Click-Through Rate Estimation Based on Deep Belief Nets.
Chen, Jie-Hao; Zhao, Zi-Qian; Shi, Ji-Yun; Zhao, Chong
2017-01-01
In recent years, with the rapid development of mobile Internet and its business applications, mobile advertising Click-Through Rate (CTR) estimation has become a hot research direction in the field of computational advertising, which is used to achieve accurate advertisement delivery for the best benefits in the three-side game between media, advertisers, and audiences. Current research on the estimation of CTR mainly uses the methods and models of machine learning, such as linear model or recommendation algorithms. However, most of these methods are insufficient to extract the data features and cannot reflect the nonlinear relationship between different features. In order to solve these problems, we propose a new model based on Deep Belief Nets to predict the CTR of mobile advertising, which combines together the powerful data representation and feature extraction capability of Deep Belief Nets, with the advantage of simplicity of traditional Logistic Regression models. Based on the training dataset with the information of over 40 million mobile advertisements during a period of 10 days, our experiments show that our new model has better estimation accuracy than the classic Logistic Regression (LR) model by 5.57% and Support Vector Regression (SVR) model by 5.80%.
A New Approach for Mobile Advertising Click-Through Rate Estimation Based on Deep Belief Nets
Zhao, Zi-Qian; Shi, Ji-Yun; Zhao, Chong
2017-01-01
In recent years, with the rapid development of mobile Internet and its business applications, mobile advertising Click-Through Rate (CTR) estimation has become a hot research direction in the field of computational advertising, which is used to achieve accurate advertisement delivery for the best benefits in the three-side game between media, advertisers, and audiences. Current research on the estimation of CTR mainly uses the methods and models of machine learning, such as linear model or recommendation algorithms. However, most of these methods are insufficient to extract the data features and cannot reflect the nonlinear relationship between different features. In order to solve these problems, we propose a new model based on Deep Belief Nets to predict the CTR of mobile advertising, which combines together the powerful data representation and feature extraction capability of Deep Belief Nets, with the advantage of simplicity of traditional Logistic Regression models. Based on the training dataset with the information of over 40 million mobile advertisements during a period of 10 days, our experiments show that our new model has better estimation accuracy than the classic Logistic Regression (LR) model by 5.57% and Support Vector Regression (SVR) model by 5.80%. PMID:29209363
Neonatal MRI is associated with future cognition and academic achievement in preterm children
Spencer-Smith, Megan; Thompson, Deanne K.; Doyle, Lex W.; Inder, Terrie E.; Anderson, Peter J.; Klingberg, Torkel
2015-01-01
School-age children born preterm are particularly at risk for low mathematical achievement, associated with reduced working memory and number skills. Early identification of preterm children at risk for future impairments using brain markers might assist in referral for early intervention. This study aimed to examine the use of neonatal magnetic resonance imaging measures derived from automated methods (Jacobian maps from deformation-based morphometry; fractional anisotropy maps from diffusion tensor images) to predict skills important for mathematical achievement (working memory, early mathematical skills) at 5 and 7 years in a cohort of preterm children using both univariable (general linear model) and multivariable models (support vector regression). Participants were preterm children born <30 weeks’ gestational age and healthy control children born ≥37 weeks’ gestational age at the Royal Women’s Hospital in Melbourne, Australia between July 2001 and December 2003 and recruited into a prospective longitudinal cohort study. At term-equivalent age ( ±2 weeks) 224 preterm and 46 control infants were recruited for magnetic resonance imaging. Working memory and early mathematics skills were assessed at 5 years (n = 195 preterm; n = 40 controls) and 7 years (n = 197 preterm; n = 43 controls). In the preterm group, results identified localized regions around the insula and putamen in the neonatal Jacobian map that were positively associated with early mathematics at 5 and 7 years (both P < 0.05), even after covarying for important perinatal clinical factors using general linear model but not support vector regression. The neonatal Jacobian map showed the same trend for association with working memory at 7 years (models ranging from P = 0.07 to P = 0.05). 
Neonatal fractional anisotropy was positively associated with working memory and early mathematics at 5 years (both P < 0.001) even after covarying for clinical factors using support vector regression but not general linear model. These significant relationships were not observed in the control group. In summary, we identified, in the preterm brain, regions around the insula and putamen using neonatal deformation-based morphometry, and brain microstructural organization using neonatal diffusion tensor imaging, associated with skills important for childhood mathematical achievement. Results contribute to the growing evidence for the clinical utility of neonatal magnetic resonance imaging for early identification of preterm infants at risk for childhood cognitive and academic impairment. PMID:26329284
Estimation of Electrically-Evoked Knee Torque from Mechanomyography Using Support Vector Regression.
Ibitoye, Morufu Olusola; Hamzaid, Nur Azah; Abdul Wahab, Ahmad Khairi; Hasnan, Nazirah; Olatunji, Sunday Olusanya; Davis, Glen M
2016-07-19
The difficulty of real-time muscle force or joint torque estimation during neuromuscular electrical stimulation (NMES) in physical therapy and exercise science has motivated recent research interest in torque estimation from other muscle characteristics. This study investigated the accuracy of a computational intelligence technique for estimating NMES-evoked knee extension torque based on the Mechanomyographic signals (MMG) of contracting muscles that were recorded from eight healthy males. Simulation of the knee torque was modelled via Support Vector Regression (SVR) due to its good generalization ability in related fields. Inputs to the proposed model were MMG amplitude characteristics, the level of electrical stimulation or contraction intensity, and knee angle. Gaussian kernel function, as well as its optimal parameters were identified with the best performance measure and were applied as the SVR kernel function to build an effective knee torque estimation model. To train and test the model, the data were partitioned into training (70%) and testing (30%) subsets, respectively. The SVR estimation accuracy, based on the coefficient of determination (R²) between the actual and the estimated torque values was up to 94% and 89% during the training and testing cases, with root mean square errors (RMSE) of 9.48 and 12.95, respectively. The knee torque estimations obtained using SVR modelling agreed well with the experimental data from an isokinetic dynamometer. These findings support the realization of a closed-loop NMES system for functional tasks using MMG as the feedback signal source and an SVR algorithm for joint torque estimation.
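The evaluation protocol in this abstract (a 70/30 partition, then R² and RMSE between actual and estimated torque) can be sketched without any modelling library. The SVR itself is deliberately left out; the helper names, the seeded shuffle, and the example values are assumptions for illustration.

```python
import math
import random


def split_70_30(samples, seed=0):
    """Shuffle with a fixed seed and return (training 70%, testing 30%) partitions."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def rmse(actual, estimated):
    """Root mean square error between actual and estimated values."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)


def r_squared(actual, estimated):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - e) ** 2 for a, e in zip(actual, estimated))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


train, test = split_70_30(list(range(10)))
```

Any regressor, including an SVR with a Gaussian kernel as used in the study, could be fitted on `train` and scored on `test` with these two functions.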
NASA Astrophysics Data System (ADS)
Tang, J. L.; Cai, C. Z.; Xiao, T. T.; Huang, S. J.
2012-07-01
The electrical conductivity of the solid oxide fuel cell (SOFC) cathode is one of the most important indices affecting SOFC efficiency. To improve the performance of a fuel cell system, it is advantageous to have an accurate model with which one can predict the electrical conductivity. In this paper, a model using support vector regression (SVR) combined with a particle swarm optimization (PSO) algorithm for parameter optimization was established to model and predict the electrical conductivity of the Ba0.5Sr0.5Co0.8Fe0.2O3-δ-xSm0.5Sr0.5CoO3-δ (BSCF-xSSC) composite cathode under two influencing factors: operating temperature (T) and SSC content (x) in the BSCF-xSSC composite cathode. The leave-one-out cross validation (LOOCV) test result strongly supports that the generalization ability of the SVR model is high. The absolute percentage error (APE) of 27 samples does not exceed 0.05%. The mean absolute percentage error (MAPE) of all 30 samples is only 0.09%, and the correlation coefficient (R²) is as high as 0.999. This investigation suggests that the hybrid PSO-SVR approach may be not only a promising and practical methodology for simulating the properties of a fuel cell system, but also a powerful tool for optimally designing or controlling the operating process of a SOFC system.
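The leave-one-out procedure and the APE/MAPE figures quoted above can be illustrated with a short sketch. The function names and the `fit` callback interface are assumptions made for this example; any regressor (such as an SVR) could be wrapped to match it.

```python
def loocv_predictions(xs, ys, fit):
    """Leave-one-out CV: refit on all samples but one, then predict the held-out one.

    `fit(train_x, train_y)` is a hypothetical callback that returns a
    function mapping a single input to a prediction.
    """
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(model(xs[i]))
    return predictions


def absolute_percentage_errors(actual, predicted):
    """Per-sample APE in percent; assumes no actual value is zero."""
    return [abs((a - p) / a) * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean of the per-sample absolute percentage errors."""
    apes = absolute_percentage_errors(actual, predicted)
    return sum(apes) / len(apes)


def fit_mean(train_x, train_y):
    """Trivial baseline 'model': always predicts the training mean."""
    mean_y = sum(train_y) / len(train_y)
    return lambda x: mean_y
```

For example, `loocv_predictions([1, 2, 3], [1.0, 2.0, 3.0], fit_mean)` predicts each held-out value from the mean of the other two.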
Kim, Dong Wook; Kim, Hwiyoung; Nam, Woong; Kim, Hyung Jun; Cha, In-Ho
2018-04-23
The aim of this study was to build and validate five types of machine learning models that can predict the occurrence of bisphosphonate-related osteonecrosis of the jaw (BRONJ) associated with dental extraction in patients taking bisphosphonates for the management of osteoporosis. A retrospective review of the medical records was conducted to obtain cases and controls for the study. A total of 125 patients, consisting of 41 cases and 84 controls, were selected for the study. Five machine learning prediction algorithms, including a multivariable logistic regression model, decision tree, support vector machine, artificial neural network, and random forest, were implemented. The outputs of these models were compared with each other and also with conventional methods, such as serum CTX level. Area under the receiver operating characteristic (ROC) curve (AUC) was used to compare the results. The performance of machine learning models was significantly superior to conventional statistical methods and single predictors. The random forest model yielded the best performance (AUC = 0.973), followed by artificial neural network (AUC = 0.915), support vector machine (AUC = 0.882), logistic regression (AUC = 0.844), decision tree (AUC = 0.821), drug holiday alone (AUC = 0.810), and CTX level alone (AUC = 0.630). Machine learning methods showed superior performance in predicting BRONJ associated with dental extraction compared to conventional statistical methods using drug holiday and serum CTX level. Machine learning can thus be applied in a wide range of clinical studies. Copyright © 2017. Published by Elsevier Inc.
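Since the models above are ranked by AUC, a small sketch of how that quantity can be computed from labels and scores may help. It uses the pairwise (Mann-Whitney) formulation: the fraction of case/control pairs in which the case receives the higher score, with ties counted as one half. The function name `roc_auc` is chosen for this example.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    `labels` holds 1 for cases and 0 for controls; ties count as half a win.
    Quadratic in sample size, which is fine for cohorts of a few hundred.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which puts the reported range from 0.630 (CTX level alone) to 0.973 (random forest) in context.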
NASA Astrophysics Data System (ADS)
Reinhardt, Katja; Samimi, Cyrus
2018-01-01
While climatological data of high spatial resolution are largely available in most developed countries, the network of climatological stations in many other regions of the world still has large gaps. Especially for those regions, interpolation methods are important tools to fill these gaps and to improve the database indispensable for climatological research. In recent years, new hybrid methods of machine learning and geostatistics have been developed which provide innovative prospects in spatial predictive modelling. This study focuses on evaluating the performance of 12 different interpolation methods for the wind components $\vec{u}$ and $\vec{v}$ in a mountainous region of Central Asia, with a special focus on applying new hybrid methods to the spatial interpolation of wind data. This study is the first to evaluate and compare the performance of several of these hybrid methods. The overall aim is to determine whether an optimal interpolation method exists that can be applied equally for all pressure levels, or whether different interpolation methods have to be used for the different pressure levels. Deterministic (inverse distance weighting) and geostatistical (ordinary kriging) interpolation methods were explored, which take into account only the initial values of $\vec{u}$ and $\vec{v}$. In addition, more complex methods (generalized additive model, support vector machine and neural networks as single methods and as hybrid methods, as well as regression-kriging) that consider additional variables were applied. The analysis of the error indices revealed that regression-kriging provided the most accurate interpolation results for both wind components and all pressure heights. At 200 and 500 hPa, regression-kriging is followed by the different kinds of neural networks and support vector machines, and for 850 hPa it is followed by the different types of support vector machine and ordinary kriging. Overall, explanatory variables improve the interpolation results.
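Of the methods compared above, inverse distance weighting is the simplest deterministic baseline, and a brief sketch shows what it computes. The station tuple layout `(x, y, value)` and the default power of 2 are assumptions for illustration; the abstract does not state the settings used in the study.

```python
import math


def idw_interpolate(stations, target, power=2.0):
    """Inverse distance weighting: a weighted mean of station values.

    `stations` is a list of (x, y, value) tuples and `target` an (x, y) pair.
    Each weight is 1 / distance**power, so nearby stations dominate.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

Each wind component would be interpolated separately in this scheme. Unlike regression-kriging, the estimate uses only the observed values and their locations, with no explanatory variables.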
TI-59 Programs for Multiple Regression.
1980-05-01
... general linear hypothesis model of full rank [Graybill, 1961] can be written as $Y = X\beta + \varepsilon$, $\varepsilon \sim N(0, \sigma^2 I)$, with dimensions $n \times 1$, $n \times k$, $k \times 1$, and $n \times 1$ for $Y$, $X$, $\beta$, and $\varepsilon$ respectively, where $Y$ is the vector of n ... a "reduced model" solution, and confidence intervals for linear functions of the coefficients can be obtained using $(X'X)$ and $\sigma^2$, based on the t ... PROGRAM DESCRIPTION: For the general linear hypothesis model $Y = X\beta + \varepsilon$, calculates ...
Comparative decision models for anticipating shortage of food grain production in India
NASA Astrophysics Data System (ADS)
Chattopadhyay, Manojit; Mitra, Subrata Kumar
2018-01-01
This paper attempts to predict food shortages in advance from the analysis of rainfall during the monsoon months along with other inputs used for crop production, such as land used for cereal production, percentage of area covered under irrigation and fertiliser use. We used six binary classification data mining models, viz., logistic regression, Multilayer Perceptron, kernel lab-Support Vector Machines, linear discriminant analysis, quadratic discriminant analysis and k-Nearest Neighbors Network, and found that linear discriminant analysis and kernel lab-Support Vector Machines are equally suitable for predicting per capita food shortage, with 89.69% accuracy in overall prediction and 92.06% accuracy in predicting food shortage (true negative rate). Advance information about food shortage can help policy makers take remedial measures to prevent the devastating consequences of food non-availability.
Ji, Xiaoliang; Shang, Xu; Dahlgren, Randy A; Zhang, Minghua
2017-07-01
Accurate quantification of dissolved oxygen (DO) is critically important for managing water resources and controlling pollution. Artificial intelligence (AI) models have been successfully applied for modeling DO content in aquatic ecosystems with limited data. However, the efficacy of these AI models in predicting DO levels in hypoxic river systems with multiple pollution sources and complicated pollutant behaviors is unclear. Given this dilemma, we developed a promising AI model, known as support vector machine (SVM), to predict the DO concentration in a hypoxic river in southeastern China. Four different calibration models, specifically, multiple linear regression, back propagation neural network, general regression neural network, and SVM, were established, and their prediction accuracy was systematically investigated and compared. A total of 11 hydro-chemical variables were used as model inputs. These variables were measured bimonthly at eight sampling sites along the rural-suburban-urban portion of Wen-Rui Tang River from 2004 to 2008. The performances of the established models were assessed through the mean square error (MSE), determination coefficient (R²), and Nash-Sutcliffe (NS) model efficiency. The results indicated that the SVM model was superior to the other models in predicting DO concentration in Wen-Rui Tang River. For SVM, the MSE, R², and NS values for the testing subset were 0.9416 mg/L, 0.8646, and 0.8763, respectively. Sensitivity analysis showed that ammonium-nitrogen was the most significant input variable of the proposed SVM model. Overall, these results demonstrated that the proposed SVM model can efficiently predict water quality, especially for highly impaired and hypoxic river systems.
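The Nash-Sutcliffe efficiency used above compares the model's squared errors with those of a constant predictor equal to the observed mean. A minimal sketch, with function names chosen for this example:

```python
def mean_square_error(observed, simulated):
    """Average squared difference between observations and simulations."""
    n = len(observed)
    return sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n


def nash_sutcliffe(observed, simulated):
    """NS efficiency: 1 - sum((o - s)^2) / sum((o - mean(o))^2).

    1.0 is a perfect match, 0.0 is no better than predicting the observed
    mean, and negative values are worse than that baseline.
    """
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    baseline = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / baseline
```

Because the NS efficiency penalizes systematic bias while a correlation-based R² does not, the two values can differ for the same predictions.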
Developing a dengue forecast model using machine learning: A case study in China.
Guo, Pi; Liu, Tao; Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-10-01
In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011-2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. The findings can help the government and community respond early to dengue epidemics.
NASA Astrophysics Data System (ADS)
Liu, Ronghua; Sun, Qiaofeng; Hu, Tian; Li, Lian; Nie, Lei; Wang, Jiayue; Zhou, Wanhui; Zang, Hengchang
2018-03-01
As a powerful process analytical technology (PAT) tool, near infrared (NIR) spectroscopy has been widely used in real-time monitoring. In this study, NIR spectroscopy was applied to monitor multiple parameters of the traditional Chinese medicine (TCM) Shenzhiling oral liquid during the concentration process to guarantee product quality. Five lab-scale batches were employed to construct quantitative models to determine five chemical ingredients and one physical change (sample density) during the concentration process. Paeoniflorin, albiflorin, liquiritin and sample density were modeled by partial least squares regression (PLSR), while the contents of glycyrrhizic acid and cinnamic acid were modeled by support vector machine regression (SVMR). Standard normal variate (SNV) and/or Savitzky-Golay (SG) smoothing with derivative methods were adopted for spectra pretreatment. Variable selection methods including correlation coefficient (CC), competitive adaptive reweighted sampling (CARS) and interval partial least squares regression (iPLS) were performed to optimize the models. The results indicated that NIR spectroscopy was an effective tool for monitoring the concentration process of Shenzhiling oral liquid.
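Of the pretreatments listed above, standard normal variate (SNV) is simple enough to show in a few lines: each spectrum is centered on its own mean and scaled by its own standard deviation. The sketch below assumes a spectrum is a plain list of absorbance values and uses the sample (n - 1) standard deviation, which is the usual convention but is not stated in the abstract.

```python
import math


def snv(spectrum):
    """Standard normal variate: per-spectrum centering and scaling.

    Removes additive offsets and multiplicative scatter effects so that
    spectra from different samples are comparable before regression.
    """
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]
```

Because the transform uses only the spectrum itself, it can be applied to new samples at prediction time without reference to the calibration set.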
[New method of mixed gas infrared spectrum analysis based on SVM].
Bai, Peng; Xie, Wen-Jun; Liu, Jun-Hua
2007-07-01
A new method of infrared spectrum analysis based on support vector machine (SVM) for gas mixtures was proposed. The kernel function in SVM was used to map the seriously overlapping absorption spectrum into a high-dimensional space; after this transformation, the high-dimensional data could be processed in the original space, so a regression calibration model was established. The regression calibration model with SVM was then applied to analyze the concentration of the component gases. It was also shown that the regression calibration model with SVM could be used for component recognition in gas mixtures. The method was applied to the analysis of different data samples. Factors that affect the model, such as scan interval, wavelength range, kernel function and penalty coefficient C, were discussed. Experimental results show that the maximal mean AE of component concentration is 0.132%, and the component recognition accuracy is higher than 94%. The problems of overlapping absorption spectra, of using the same method for qualitative and quantitative analysis, and of a limited number of training samples were solved. The method could be used in other infrared spectrum analyses of gas mixtures, and has both theoretical and practical value.
Solving large test-day models by iteration on data and preconditioned conjugate gradient.
Lidauer, M; Strandén, I; Mäntysaari, E A; Pösö, J; Kettunen, A
1999-12-01
A preconditioned conjugate gradient method was implemented into an iteration on data program for the estimation of breeding values, and its convergence characteristics were studied. An algorithm was used as a reference in which one fixed effect was solved by the Gauss-Seidel method, and other effects were solved by a second-order Jacobi method. Implementation of the preconditioned conjugate gradient required storing four vectors (size equal to the number of unknowns in the mixed model equations) in random access memory and reading the data at each round of iteration. The preconditioner comprised diagonal blocks of the coefficient matrix. Comparison of algorithms was based on solutions of mixed model equations obtained by a single-trait animal model and a single-trait, random regression test-day model. Data sets for both models used milk yield records of primiparous Finnish dairy cows. Animal model data comprised 665,629 lactation milk yields, and random regression test-day model data comprised 6,732,765 test-day milk yields. Both models included pedigree information of 1,099,622 animals. The animal model [random regression test-day model] required 122 [305] rounds of iteration to converge with the reference algorithm, but only 88 [149] were required with the preconditioned conjugate gradient. Solving the random regression test-day model with the preconditioned conjugate gradient required 237 megabytes of random access memory and took 14% of the computation time needed by the reference algorithm.
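The iteration described above can be sketched for a small dense system. This is a simplified illustration under stated assumptions, not the authors' program: it stores the coefficient matrix explicitly instead of iterating on data, and it uses a plain diagonal (Jacobi) preconditioner where the abstract reports diagonal blocks. The function name `pcg_solve` is hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _matvec(A, x):
    return [_dot(row, x) for row in A]


def pcg_solve(A, b, tol=1e-10, max_rounds=1000):
    """Solve A x = b for symmetric positive definite A (nonzero b).

    Working vectors kept between rounds: x (solution), r (residual),
    z (preconditioned residual), p (search direction).
    Returns the solution and the number of rounds used.
    """
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # simplified preconditioner
    x = [0.0] * n
    r = list(b)  # residual b - A x with x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]
    p = list(z)
    rz = _dot(r, z)
    for rounds in range(1, max_rounds + 1):
        Ap = _matvec(A, p)
        alpha = rz / _dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(_dot(r, r)) < tol:
            return x, rounds
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = _dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_rounds
```

In an iteration-on-data setting, the product `A p` would be accumulated while reading the records in each round, so the coefficient matrix itself never needs to be held in memory.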
Javed, Faizan; Savkin, Andrey V; Chan, Gregory S H; Middleton, Paul M; Malouf, Philip; Steel, Elizabeth; Mackie, James; Lovell, Nigel H
2009-11-01
This study aims to assess the blood volume and heart rate (HR) responses during haemodialysis in fluid overloaded patients by a nonparametric nonlinear regression approach based on a support vector machine (SVM). Relative blood volume (RBV) and electrocardiogram (ECG) were recorded from 23 haemodynamically stable renal failure patients during regular haemodialysis. Modelling was performed on 18 fluid overloaded patients (fluid removal of >2 L). SVM-based regression was used to obtain the models of RBV change with time as well as the percentage change in HR with respect to RBV. Mean squared error (MSE) and goodness of fit (R²) were used for comparison among different kernel functions. The design parameters were estimated using a grid search approach and the selected models were validated by a k-fold cross-validation technique. For the model of HR versus RBV change, a radial basis function (RBF) kernel (MSE = 17.37 and R² = 0.932) gave the lowest MSE compared to linear (MSE = 25.97 and R² = 0.898) and polynomial (MSE = 18.18 and R² = 0.929) kernels. The MSE was significantly lower for the training data set when using the RBF kernel compared to the other kernels (p < 0.01). The RBF kernel also provided a slightly better fit of RBV change with time (MSE = 1.12 and R² = 0.91) compared to a linear kernel (MSE = 1.46 and R² = 0.88). The modelled HR response was characterized by an initial drop and a subsequent rise during progressive reduction in RBV, which may be interpreted as the reflex response to a transition from central hypervolaemia to hypovolaemia. These modelled curves can be used as references for a controller designed to regulate the haemodynamic variables to ensure the stability of patients undergoing haemodialysis.
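The three kernel families compared above differ only in how they measure similarity between two input vectors. The standard textbook forms are sketched below; the parameter names `gamma`, `degree`, and `coef0` follow common library conventions and are assumptions here, since the abstract does not give the values selected by the grid search.

```python
import math


def linear_kernel(x, y):
    """Plain inner product: yields a linear regression function."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """(x . y + coef0) ** degree: polynomial interactions up to `degree`."""
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    """exp(-gamma * ||x - y||^2): similarity decays with squared distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The RBF kernel equals 1 for identical inputs and approaches 0 for distant ones, so `gamma` controls how local the fitted curve is, which is why it is typically tuned together with the other SVM design parameters.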
NASA Astrophysics Data System (ADS)
Wang, Lunche; Kisi, Ozgur; Zounemat-Kermani, Mohammad; Li, Hui
2017-01-01
Pan evaporation (Ep) plays an important role in agricultural water resources management. One of the basic challenges is modeling Ep using limited climatic parameters, because a number of factors affect the evaporation rate. This study investigated the abilities of six different soft computing methods, multi-layer perceptron (MLP), generalized regression neural network (GRNN), fuzzy genetic (FG), least square support vector machine (LSSVM), multivariate adaptive regression spline (MARS), adaptive neuro-fuzzy inference systems with grid partition (ANFIS-GP), and two regression methods, multiple linear regression (MLR) and the Stephens and Stewart model (SS), in predicting monthly Ep. Long-term climatic data at various sites covering a wide range of climates during 1961-2000 are used for model development and validation. The results showed that the models have different accuracies in different climates, and that the MLP model outperformed the other models in predicting monthly Ep at most stations using local input combinations (for example, the mean absolute error (MAE), root mean square error (RMSE), and determination coefficient (R²) are 0.314 mm/day, 0.405 mm/day and 0.988, respectively, for HEB station), while the GRNN model performed better on the Tibetan Plateau (MAE, RMSE and R² are 0.459 mm/day, 0.592 mm/day and 0.932, respectively). The accuracies of the above models ranked as: MLP, GRNN, LSSVM, FG, ANFIS-GP, MARS and MLR. The overall results indicated that the soft computing techniques generally performed better than the regression methods, but the MLR and SS models can be preferable to complex nonlinear models in some climatic zones, for example, at the BJ (Beijing), CQ (Chongqing) and HK (Haikou) stations. Therefore, it can be concluded that Ep could be successfully predicted using the above models in hydrological modeling studies.
Application of near-infrared spectroscopy in the detection of fat-soluble vitamins in premix feed
NASA Astrophysics Data System (ADS)
Jia, Lian Ping; Tian, Shu Li; Zheng, Xue Cong; Jiao, Peng; Jiang, Xun Peng
2018-02-01
Vitamins are organic compounds necessary for animal physiological maintenance. Rapid determination of the content of different vitamins in premix feed can help to achieve accurate diets and efficient feeding. Compared with high-performance liquid chromatography and other wet chemical methods, near-infrared spectroscopy is a fast, non-destructive, non-polluting method. A total of 168 samples of premix feed were collected, and the contents of vitamin A, vitamin E and vitamin D3 were determined by the standard method. The near-infrared spectra of the samples, ranging from 10 000 to 4 000 cm⁻¹, were obtained. Partial least squares regression (PLSR) and support vector machine regression (SVMR) were used to construct the quantitative models. The results showed that the RMSEP of the PLSR model for vitamin A, vitamin E and vitamin D3 was 0.43×10⁷ IU/kg, 0.09×10⁵ IU/kg and 0.17×10⁷ IU/kg, respectively. The RMSEP of the SVMR model was 0.45×10⁷ IU/kg, 0.11×10⁵ IU/kg and 0.18×10⁷ IU/kg. Compared with the nonlinear regression method (SVMR), the linear regression method (PLSR) is more suitable for the quantitative analysis of vitamins in premix feed.
Field applications of stand-off sensing using visible/NIR multivariate optical computing
NASA Astrophysics Data System (ADS)
Eastwood, DeLyle; Soyemi, Olusola O.; Karunamuni, Jeevanandra; Zhang, Lixia; Li, Hongli; Myrick, Michael L.
2001-02-01
A novel multivariate visible/NIR optical computing approach applicable to standoff sensing will be demonstrated with porphyrin mixtures as examples. The ultimate goal is to develop environmental or counter-terrorism sensors for chemicals such as organophosphorus (OP) pesticides or chemical warfare simulants in the near infrared spectral region. The mathematical operation that characterizes prediction of properties via regression from optical spectra is a calculation of inner products between the spectrum and the pre-determined regression vector. The result is scaled appropriately and offset to correspond to the basis from which the regression vector is derived. The process involves collecting spectroscopic data and synthesizing a multivariate vector using a pattern recognition method. Then, an interference coating is designed that reproduces the pattern of the multivariate vector in its transmission or reflection spectrum, and appropriate interference filters are fabricated. High and low refractive index materials such as Nb2O5 and SiO2 are excellent choices for the visible and near infrared regions. The proof of concept has now been established for this system in the visible and will later be extended to chemicals such as OP compounds in the near and mid-infrared.
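The prediction step described above, an inner product between the spectrum and a regression vector followed by scaling and an offset, reduces to a one-line computation. The sketch below is illustrative only: the function name and the `scale` and `offset` parameters are hypothetical stand-ins for the calibration constants the abstract mentions.

```python
def predict_property(spectrum, regression_vector, scale=1.0, offset=0.0):
    """Predicted property = scale * <spectrum, regression_vector> + offset.

    In multivariate optical computing the inner product is performed
    optically by a filter whose transmission or reflection spectrum
    encodes the regression vector; here it is done numerically.
    """
    if len(spectrum) != len(regression_vector):
        raise ValueError("spectrum and regression vector must have equal length")
    inner = sum(s * r for s, r in zip(spectrum, regression_vector))
    return scale * inner + offset
```

For instance, `predict_property([1.0, 2.0, 3.0], [0.5, 0.0, -0.5], scale=2.0, offset=1.0)` evaluates to -1.0, since the inner product is -1.0.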
Method for enhanced accuracy in predicting peptides using liquid separations or chromatography
Kangas, Lars J.; Auberry, Kenneth J.; Anderson, Gordon A.; Smith, Richard D.
2006-11-14
A method for predicting the elution time of a peptide in chromatographic and electrophoretic separations by first providing a data set of known elution times of known peptides, then creating a plurality of vectors, each vector having a plurality of dimensions, and each dimension representing the elution time of amino acids present in each of these known peptides from the data set. The elution time of any protein is then predicted by first creating a vector by assigning dimensional values for the elution time of amino acids of at least one hypothetical peptide and then calculating a predicted elution time for the vector by performing a multivariate regression of the dimensional values of the hypothetical peptide using the dimensional values of the known peptides. Preferably, the multivariate regression is accomplished by the use of an artificial neural network and the elution times are first normalized using a transfer function.
Wang, Xibin; Luo, Fengji; Qian, Ying; Ranzi, Gianluca
2016-01-01
With the rapid development of ICT and Web technologies, a large amount of information is becoming available, and this is producing, in some instances, a condition of information overload. Under these conditions, it is difficult for a person to locate and access useful information for making decisions. To address this problem, there are information filtering systems, such as the personalized recommendation system (PRS) considered in this paper, that assist a person in identifying possible products or services of interest based on his/her preferences. Among available approaches, collaborative filtering (CF) is one of the most widely used recommendation techniques. However, CF has some limitations, e.g., the relatively simple similarity calculation, the cold start problem, etc. In this context, this paper presents a new regression model based on support vector machine (SVM) classification and an improved PSO (IPSO) for the development of an electronic movie PRS. In its implementation, an SVM classification model is first established to obtain a preliminary movie recommendation list, based on which an SVM regression model is applied to predict movies’ ratings. The proposed PRS not only considers the movie’s content information but also integrates the users’ demographic and behavioral information to better capture the users’ interests and preferences. The efficiency of the proposed method is verified by a series of experiments based on the MovieLens benchmark data set. PMID:27898691
Temperature-based estimation of global solar radiation using soft computing methodologies
NASA Astrophysics Data System (ADS)
Mohammadi, Kasra; Shamshirband, Shahaboddin; Danesh, Amir Seyed; Abdullah, Mohd Shahidan; Zamani, Mazdak
2016-07-01
Precise knowledge of solar radiation is essential in different technological and scientific applications of solar energy. Temperature-based estimation of global solar radiation is appealing owing to the broad availability of measured air temperatures. In this study, the potential of soft computing techniques is evaluated for estimating daily horizontal global solar radiation (DHGSR) from measured maximum, minimum, and average air temperatures (Tmax, Tmin, and Tavg) in an Iranian city. For this purpose, a comparative evaluation of three methodologies, adaptive neuro-fuzzy inference system (ANFIS), radial basis function support vector regression (SVR-rbf), and polynomial basis function support vector regression (SVR-poly), is performed. Five combinations of Tmax, Tmin, and Tavg serve as inputs to develop the ANFIS, SVR-rbf, and SVR-poly models. The results show that all ANFIS, SVR-rbf, and SVR-poly models provide favorable accuracy. For all techniques, the highest accuracies are achieved by models (5), which use Tmax - Tmin and Tmax as inputs. According to the statistical results, SVR-rbf outperforms SVR-poly and ANFIS. For SVR-rbf (5), the mean absolute bias error, root mean square error, and correlation coefficient are 1.1931 MJ/m², 2.0716 MJ/m², and 0.9380, respectively. The results confirm that SVR-rbf can be used efficiently to estimate DHGSR from air temperatures.
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-25
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and the score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and plain partial least squares regression. Both R² and root-mean-square error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model constructed on the dataset including outliers. Meanwhile, the RMSECV values of the models constructed by the other methods were larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS models constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.
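The data quality check mentioned above rests on the Grubbs statistic: the largest absolute deviation from the sample mean, divided by the sample standard deviation. The sketch below computes only that statistic and the index of the most extreme sample. Comparing it against a critical value requires a t-distribution quantile, which is left out here, and the function name is chosen for this example.

```python
import math


def grubbs_statistic(values):
    """Return (G, index) for the sample furthest from the mean.

    G = max |x_i - mean| / s, with s the sample (n - 1) standard deviation.
    A sample is flagged as an outlier only if G exceeds the critical value
    for the chosen significance level and sample size (not computed here).
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    index = max(range(n), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / std, index
```

Note that G is bounded above by (n - 1) / sqrt(n), so with very small samples even a gross outlier yields a modest statistic.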
NASA Astrophysics Data System (ADS)
Zhan, Liwei; Li, Chengwei
2017-02-01
A hybrid PSO-SVM-based model is proposed to predict the friction coefficient between aircraft tire and coating. The hybrid model combines a support vector machine (SVM) with the particle swarm optimization (PSO) technique. SVM has been adopted successfully to solve regression problems. Its regression accuracy depends strongly on optimizing parameters such as the regularization constant C, the RBF kernel parameter γ, and the epsilon parameter ε in the SVM training procedure. However, SVM-based prediction of the friction coefficient between aircraft tire and coating has yet to be explored. The experiment reveals that drop height and tire rotational speed are the factors affecting the friction coefficient. With these factors in mind, the friction coefficient can be predicted using the hybrid PSO-SVM-based model from the measured friction coefficient between aircraft tire and coating. To compare regression accuracy, a grid search (GS) method and a genetic algorithm (GA) are also used to optimize the relevant parameters (C, γ, and ε). The regression accuracy is reflected by the coefficient of determination (R²). The result shows that the hybrid PSO-RBF-SVM-based model has better accuracy than the GS-RBF-SVM- and GA-RBF-SVM-based models. The agreement of this model (PSO-RBF-SVM) with experimental data confirms its good performance.
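The parameter search described above can be illustrated with a basic global-best particle swarm. This is a generic sketch, not the authors' implementation: the inertia and acceleration constants (`w`, `c1`, `c2`), the swarm size, and the toy objective are all assumptions. In a PSO-SVM setting the objective would instead be a cross-validated regression error as a function of (C, γ, ε).

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a global-best swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    best_i = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[best_i][:], pbest_val[best_i]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia plus pulls toward the personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            value = objective(pos[i])
            if value < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], value
                if value < gbest_val:
                    gbest, gbest_val = pos[i][:], value
    return gbest, gbest_val


def toy_error(params):
    """Stand-in for a cross-validated error with a known minimum (hypothetical)."""
    target = (10.0, 0.5, 0.1)  # pretend-optimal (C, gamma, epsilon)
    return sum((p - t) ** 2 for p, t in zip(params, target))
```

A call such as `pso_minimize(toy_error, [(0.1, 100.0), (0.01, 1.0), (0.01, 1.0)])` searches the three parameters jointly, in contrast to a grid search, which evaluates a fixed lattice of candidate values.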
Hettige, Nuwan C; Nguyen, Thai Binh; Yuan, Chen; Rajakulendran, Thanara; Baddour, Jermeen; Bhagwat, Nikhil; Bani-Fatemi, Ali; Voineskos, Aristotle N; Mallar Chakravarty, M; De Luca, Vincenzo
2017-07-01
Suicide is a major concern for those afflicted by schizophrenia. Identifying patients at the highest risk for future suicide attempts remains a complex problem for psychiatric interventions. Machine learning models allow for the integration of many risk factors in order to build an algorithm that predicts which patients are likely to attempt suicide. Currently it is unclear how to integrate previously identified risk factors into a clinically relevant predictive tool to estimate the probability that a patient with schizophrenia will attempt suicide. We conducted a cross-sectional assessment on a sample of 345 participants diagnosed with schizophrenia spectrum disorders. Suicide attempters and non-attempters were clearly identified using the Columbia Suicide Severity Rating Scale (C-SSRS) and the Beck Suicide Ideation Scale (BSS). We developed four classification algorithms using regularized regression, random forest, elastic net and support vector machine models, with sociocultural and clinical variables as features to train the models. All classification models performed similarly in identifying suicide attempters and non-attempters. Our regularized logistic regression model demonstrated an accuracy of 67% and an area under the curve (AUC) of 0.71, while the random forest model demonstrated 66% accuracy and an AUC of 0.67. The support vector classifier (SVC) model demonstrated an accuracy of 67% and an AUC of 0.70, and the elastic net model demonstrated an accuracy of 65% and an AUC of 0.71. Machine learning algorithms offer a relatively successful method for incorporating many clinical features to predict individuals at risk for future suicide attempts. Increased performance of these models using clinically relevant variables offers the potential to facilitate early treatment and intervention to prevent future suicide attempts. Copyright © 2017 Elsevier Inc. All rights reserved.
Deep learning architecture for air quality predictions.
Li, Xiang; Peng, Ling; Hu, Yuan; Shao, Jing; Chi, Tianhe
2016-11-01
With the rapid development of urbanization and industrialization, many developing countries are suffering from heavy air pollution. Governments and citizens have expressed increasing concern regarding air pollution because it affects human health and sustainable development worldwide. Current air quality prediction methods mainly use shallow models; however, these methods produce unsatisfactory results, which inspired us to investigate methods of predicting air quality based on deep architecture models. In this paper, a novel spatiotemporal deep learning (STDL)-based air quality prediction method that inherently considers spatial and temporal correlations is proposed. A stacked autoencoder (SAE) model is used to extract inherent air quality features, and it is trained in a greedy layer-wise manner. Compared with traditional time series prediction models, our model can predict the air quality of all stations simultaneously and shows the temporal stability in all seasons. Moreover, a comparison with the spatiotemporal artificial neural network (STANN), auto regression moving average (ARMA), and support vector regression (SVR) models demonstrates that the proposed method of performing air quality predictions has a superior performance.
NASA Astrophysics Data System (ADS)
Li, Lin
2008-12-01
Partial least squares (PLS) regressions were applied to lunar highland and mare soil data characterized by the Lunar Soil Characterization Consortium (LSCC) for spectral estimation of the abundance of lunar soil chemical constituents FeO and Al2O3. The LSCC data set was split into a number of subsets including the total highland, Apollo 16, Apollo 14, and total mare soils, and then PLS was applied to each to investigate the effect of nonlinearity on the performance of the PLS method. The weight-loading vectors resulting from PLS were analyzed to identify mineral species responsible for spectral estimation of the soil chemicals. The results from PLS modeling indicate that the PLS performance depends on the correlation of constituents of interest to their major mineral carriers, and the Apollo 16 soils are responsible for the large errors of FeO and Al2O3 estimates when the soils were modeled along with other types of soils. These large errors are attributed primarily to the degraded correlation of FeO to pyroxene for the relatively mature Apollo 16 soils as a result of space weathering, and secondarily to the interference of olivine. PLS consistently yields very accurate fits to the two soil chemicals when applied to mare soils. Although Al2O3 has no spectrally diagnostic characteristics, this chemical can be predicted for all subset data by PLS modeling at high accuracies because of its correlation to FeO. This correlation is reflected in the symmetry of the PLS weight-loading vectors for FeO and Al2O3, which prove to be very useful for qualitative interpretation of the PLS results. However, this qualitative interpretation of PLS modeling cannot be achieved using principal component regression loading vectors.
Least Square Regression Method for Estimating Gas Concentration in an Electronic Nose System
Khalaf, Walaa; Pace, Calogero; Gaudioso, Manlio
2009-01-01
We describe an Electronic Nose (ENose) system which is able to identify the type of analyte and to estimate its concentration. The system consists of seven sensors, five of them being gas sensors (supplied with different heater voltage values), the remainder being a temperature and a humidity sensor, respectively. To identify a new analyte sample and then to estimate its concentration, we use both some machine learning techniques and the least square regression principle. In fact, we apply two different training models; the first one is based on the Support Vector Machine (SVM) approach and is aimed at teaching the system how to discriminate among different gases, while the second one uses the least squares regression approach to predict the concentration of each type of analyte. PMID:22573980
Retrieval and Mapping of Heavy Metal Concentration in Soil Using Time Series Landsat 8 Imagery
NASA Astrophysics Data System (ADS)
Fang, Y.; Xu, L.; Peng, J.; Wang, H.; Wong, A.; Clausi, D. A.
2018-04-01
Heavy metal pollution is a critical and long-standing global environmental problem. The traditional approach to obtaining heavy metal concentrations, which relies on field sampling and lab testing, is expensive and time consuming. Although many related studies use spectrometer data to build a relational model between heavy metal concentration and spectral information, and then use the model to perform prediction on hyperspectral imagery, this approach can hardly map the soil metal concentration of an area quickly and accurately because of the discrepancies between spectrometer data and remote sensing imagery. Taking advantage of the easy accessibility of Landsat 8 data, this study uses Landsat 8 imagery to retrieve soil Cu concentration and map its distribution in the study area. To enlarge the spectral information for more accurate retrieval and mapping, 11 single-date Landsat 8 images from 2013-2017 are selected to form a time series. Three regression methods, partial least squares regression (PLSR), artificial neural network (ANN) and support vector regression (SVR), are used for model construction. After an unbiased comparison of these models, the best model is selected to map the Cu concentration distribution. The produced distribution map shows good spatial autocorrelation and consistency with the mining area locations.
Body Fat Percentage Prediction Using Intelligent Hybrid Approaches
Shao, Yuehjen E.
2014-01-01
Excess of body fat often leads to obesity. Obesity is typically associated with serious medical diseases, such as cancer, heart disease, and diabetes. Accordingly, knowing the body fat is an extremely important issue since it affects everyone's health. Although there are several ways to measure the body fat percentage (BFP), the accurate methods are often associated with hassle and/or high costs. Traditional single-stage approaches may use certain body measurements or explanatory variables to predict the BFP. Diverging from existing approaches, this study proposes new intelligent hybrid approaches to obtain fewer explanatory variables, and the proposed forecasting models are able to effectively predict the BFP. The proposed hybrid models consist of multiple regression (MR), artificial neural network (ANN), multivariate adaptive regression splines (MARS), and support vector regression (SVR) techniques. The first stage of the modeling includes the use of MR and MARS to obtain fewer but more important sets of explanatory variables. In the second stage, the remaining important variables are served as inputs for the other forecasting methods. A real dataset was used to demonstrate the development of the proposed hybrid models. The prediction results revealed that the proposed hybrid schemes outperformed the typical, single-stage forecasting models. PMID:24723804
Prediction of Baseflow Index of Catchments using Machine Learning Algorithms
NASA Astrophysics Data System (ADS)
Yadav, B.; Hatfield, K.
2017-12-01
We present the results of eight machine learning techniques for predicting the baseflow index (BFI) of ungauged basins using a surrogate of catchment scale climate and physiographic data. The tested algorithms include ordinary least squares, ridge regression, least absolute shrinkage and selection operator (lasso), elasticnet, support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Our work seeks to identify the dominant controls of BFI that can be readily obtained from ancillary geospatial databases and remote sensing measurements, such that the developed techniques can be extended to ungauged catchments. More than 800 gauged catchments spanning the continental United States were selected to develop the general methodology. The BFI calculation was based on the baseflow separated from daily streamflow hydrograph using HYSEP filter. The surrogate catchment attributes were compiled from multiple sources including digital elevation model, soil, landuse, climate data, other publicly available ancillary and geospatial data. 80% catchments were used to train the ML algorithms, and the remaining 20% of the catchments were used as an independent test set to measure the generalization performance of fitted models. A k-fold cross-validation using exhaustive grid search was used to fit the hyperparameters of each model. Initial model development was based on 19 independent variables, but after variable selection and feature ranking, we generated revised sparse models of BFI prediction that are based on only six catchment attributes. These key predictive variables selected after the careful evaluation of bias-variance tradeoff include average catchment elevation, slope, fraction of sand, permeability, temperature, and precipitation. 
The most promising algorithms exceeding an accuracy score (r-square) of 0.7 on test data include support vector machine, gradient boosted regression trees, random forests, and extremely randomized trees. Considering both the accuracy and the computational complexity of these algorithms, we identify the extremely randomized trees as the best performing algorithm for BFI prediction in ungauged basins.
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach of Random Subspace (RSS) and Classification And Regression Trees (CART) is proposed to develop a model named RSSCART for spatial prediction of landslides. This model combines the RSS method, which is known as an efficient ensemble technique, with CART, which is a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for the model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi Square test. Results were compared with other benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, including slope angle, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. The performance of the RSSCART model (AUC = 0.841) is the best compared with the other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
NASA Astrophysics Data System (ADS)
Liu, Fei; He, Yong
2008-02-01
Visible and near infrared (Vis/NIR) transmission spectroscopy and chemometric methods were utilized to predict the pH values of cola beverages. Five varieties of cola were prepared; 225 samples (45 samples for each variety) were selected for the calibration set and 75 samples (15 samples for each variety) for the validation set. Savitzky-Golay smoothing and standard normal variate (SNV) followed by first-derivative were used as the pre-processing methods. Partial least squares (PLS) analysis was employed to extract the principal components (PCs), which were used as the inputs of the least squares-support vector machine (LS-SVM) model according to their accumulative reliabilities. Then LS-SVM with a radial basis function (RBF) kernel and a two-step grid search technique were applied to build the regression model, which was compared with PLS regression. The correlation coefficient (r), root mean square error of prediction (RMSEP) and bias were 0.961, 0.040 and 0.012 for PLS, and 0.975, 0.031 and 4.697 × 10⁻³ for LS-SVM, respectively. Both methods obtained a satisfying precision. The results indicated that Vis/NIR spectroscopy combined with chemometric methods could be applied as an alternative way for the prediction of pH of cola beverages.
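Two of the pre-processing steps named in this abstract, standard normal variate (SNV) followed by a first derivative, can be sketched as below. This is a minimal illustration under stated assumptions: the spectrum values are hypothetical, the derivative is a plain forward difference (the abstract does not say which derivative scheme was used), and Savitzky-Golay smoothing is not implemented here.

```python
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it by its own
    sample standard deviation, so each spectrum has mean 0 and std 1."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]


def first_derivative(spectrum):
    """Forward finite difference between neighboring wavelengths
    (an assumed scheme, chosen for simplicity)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


# Hypothetical absorbance values at three wavelengths, for illustration only.
raw = [1.0, 2.0, 3.0]
scaled = snv(raw)
deriv = first_derivative(scaled)
print(scaled, deriv)
```

SNV is applied per spectrum rather than per wavelength, which removes multiplicative scatter differences between samples before the derivative emphasizes local slope changes.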
NASA Astrophysics Data System (ADS)
Dong, Hancheng; Jin, Xiaoning; Lou, Yangbing; Wang, Changhong
2014-12-01
Lithium-ion batteries are used as the main power source in many electronic and electrical devices. In particular, with the growth in battery-powered electric vehicle development, the lithium-ion battery plays a critical role in the reliability of vehicle systems. In order to provide timely maintenance and replacement of battery systems, it is necessary to develop a reliable and accurate battery health diagnostic that takes a prognostic approach. Therefore, this paper focuses on two main methods to determine a battery's health: (1) Battery State-of-Health (SOH) monitoring and (2) Remaining Useful Life (RUL) prediction. Both of these are calculated by using a filter algorithm known as the Support Vector Regression-Particle Filter (SVR-PF). Models for battery SOH monitoring based on SVR-PF are developed with novel capacity degradation parameters introduced to determine battery health in real time. Moreover, the RUL prediction model is proposed, which is able to provide the RUL value and update the RUL probability distribution to the End-of-Life cycle. Results for both methods are presented, showing that the proposed SOH monitoring and RUL prediction methods have good performance and that the SVR-PF has better monitoring and prediction capability than the standard particle filter (PF).
NASA Astrophysics Data System (ADS)
Duan, Libin; Xiao, Ning-cong; Li, Guangyao; Cheng, Aiguo; Chen, Tao
2017-07-01
Tailor-rolled blank thin-walled (TRB-TH) structures have become important vehicle components owing to their advantages of light weight and crashworthiness. The purpose of this article is to provide an efficient lightweight design for improving the energy-absorbing capability of TRB-TH structures under dynamic loading. A finite element (FE) model for TRB-TH structures is established and validated by performing a dynamic axial crash test. Different material properties for individual parts with different thicknesses are considered in the FE model. Then, a multi-objective crashworthiness design of the TRB-TH structure is constructed based on the ɛ-support vector regression (ɛ-SVR) technique and non-dominated sorting genetic algorithm-II. The key parameters (C, ɛ and σ) are optimized to further improve the predictive accuracy of ɛ-SVR under limited sample points. Finally, the technique for order preference by similarity to the ideal solution method is used to rank the solutions in Pareto-optimal frontiers and find the best compromise optima. The results demonstrate that the light weight and crashworthiness performance of the optimized TRB-TH structures are superior to their uniform thickness counterparts. The proposed approach provides useful guidance for designing TRB-TH energy absorbers for vehicle bodies.
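The final ranking step mentioned in this abstract, the technique for order preference by similarity to the ideal solution (TOPSIS), can be sketched as follows. This is the textbook form of TOPSIS, not necessarily the exact variant used by the authors; the candidate designs, the two criteria (mass as a cost, absorbed energy as a benefit) and the equal weights are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per alternative (higher is better).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if smaller is
    Note: undefined (division by zero) if all alternatives are identical.
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    cols = list(zip(*v))
    best = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = math.dist(row, best)    # distance to the ideal solution
        d_worst = math.dist(row, worst)  # distance to the anti-ideal solution
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical Pareto points: (mass in kg, absorbed energy in kJ).
pareto = [[2.0, 10.0], [3.0, 14.0], [4.0, 15.0]]
scores = topsis(pareto, weights=[0.5, 0.5], benefit=[False, True])
ranking = sorted(range(len(pareto)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

The highest-scoring point is the compromise that sits closest to the ideal and farthest from the anti-ideal; changing the weights shifts that compromise along the Pareto front.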
Jiang, Huaiguang; Zhang, Yingchen; Muljadi, Eduard; ...
2016-01-01
This paper proposes an approach for distribution system load forecasting, which aims to provide highly accurate short-term load forecasting with high resolution utilizing a support vector regression (SVR) based forecaster and a two-step hybrid parameters optimization method. Specifically, because the load profiles in distribution systems contain abrupt deviations, a data normalization is designed as the pretreatment for the collected historical load data. Then an SVR model is trained by the load data to forecast the future load. For better performance of SVR, a two-step hybrid optimization algorithm is proposed to determine the best parameters. In the first step of the hybrid optimization algorithm, a designed grid traverse algorithm (GTA) is used to narrow the parameters searching area from a global to local space. In the second step, based on the result of the GTA, particle swarm optimization (PSO) is used to determine the best parameters in the local parameter space. After the best parameters are determined, the SVR model is used to forecast the short-term load deviation in the distribution system. The performance of the proposed approach is compared to some classic methods in later sections of the paper.
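The two-step idea described in this abstract, a coarse grid pass that narrows the search region followed by particle swarm optimization (PSO) inside that region, can be sketched as below. This is only an illustration under stated assumptions: `toy_error` is a hypothetical stand-in for the SVR validation error, the two parameters are unnamed, and the inertia and acceleration constants are common textbook defaults rather than values from the paper.

```python
import random


def coarse_grid_search(objective, bounds, steps):
    """Step 1: evaluate a coarse grid; return the best point and the spacing."""
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    dx = (x_hi - x_lo) / (steps - 1)
    dy = (y_hi - y_lo) / (steps - 1)
    grid = [(x_lo + i * dx, y_lo + j * dy)
            for i in range(steps) for j in range(steps)]
    return min(grid, key=objective), (dx, dy)


def pso(objective, bounds, n_particles=20, n_iter=100, seed=0):
    """Step 2: basic global-best PSO restricted to the given local bounds."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(tuple(p)) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration constants
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(tuple(pos[i]))
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return tuple(gbest), gbest_val


def toy_error(params):
    """Hypothetical validation-error surface with its minimum at (1.3, -0.4)."""
    x, y = params
    return (x - 1.3) ** 2 + (y + 0.4) ** 2


center, (dx, dy) = coarse_grid_search(toy_error, [(-5.0, 5.0), (-5.0, 5.0)], steps=11)
local_bounds = [(center[0] - dx, center[0] + dx), (center[1] - dy, center[1] + dy)]
best_params, best_val = pso(toy_error, local_bounds)
print(center, best_params, best_val)
```

The grid pass costs a fixed number of evaluations and protects against PSO starting in a poor region; PSO then refines between grid nodes, where a finer grid would be expensive.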
Michaels, Sarah R.; Riegel, Claudia; Pereira, Roberto M.; Zipperer, Wayne; Lockaby, B. Graeme; Koehler, Philip G.
2017-01-01
The consistent sporadic transmission of West Nile Virus (WNV) in the city of New Orleans justifies the need for distribution risk maps highlighting human risk of mosquito bites. We modeled the influence of biophysical and socioeconomic metrics on the spatio-temporal distributions of presence/vector-host contact (VHC) ratios of WNV vector, Culex quinquefasciatus, within their flight range. Biophysical and socioeconomic data were extracted within 5-km buffer radii around sampling localities of gravid female Culex quinquefasciatus. The spatio-temporal correlations between VHC data and 33 variables, including climate, land use-land cover (LULC), socioeconomic, and land surface terrain were analyzed using stepwise linear regression models (RM). Using MaxEnt, we developed a distribution model using the correlated predicting variables. Only 12 factors showed significant correlations with spatial distribution of VHC ratios (R2 = 81.62, p < 0.01). Non-forested wetland (NFWL), tree density (TD) and residential-urban (RU) settings demonstrated the strongest relationship. The VHC ratios showed monthly environmental resilience in terms of number and type of influential factors. The highest prediction power of RU and other urban and built up land (OUBL), was demonstrated during May–August. This association was positively correlated with the onset of the mosquito WNV infection rate during June. These findings were confirmed by the Jackknife analysis in MaxEnt and independently collected field validation points. The spatial and temporal correlations of VHC ratios and their response to the predicting variables are discussed. PMID:28786934
Sallam, Mohamed F; Michaels, Sarah R; Riegel, Claudia; Pereira, Roberto M; Zipperer, Wayne; Lockaby, B Graeme; Koehler, Philip G
2017-08-08
The consistent sporadic transmission of West Nile Virus (WNV) in the city of New Orleans justifies the need for distribution risk maps highlighting human risk of mosquito bites. We modeled the influence of biophysical and socioeconomic metrics on the spatio-temporal distributions of presence/vector-host contact (VHC) ratios of WNV vector, Culex quinquefasciatus, within their flight range. Biophysical and socioeconomic data were extracted within 5-km buffer radii around sampling localities of gravid female Culex quinquefasciatus. The spatio-temporal correlations between VHC data and 33 variables, including climate, land use-land cover (LULC), socioeconomic, and land surface terrain were analyzed using stepwise linear regression models (RM). Using MaxEnt, we developed a distribution model using the correlated predicting variables. Only 12 factors showed significant correlations with spatial distribution of VHC ratios (R² = 81.62, p < 0.01). Non-forested wetland (NFWL), tree density (TD) and residential-urban (RU) settings demonstrated the strongest relationship. The VHC ratios showed monthly environmental resilience in terms of number and type of influential factors. The highest prediction power of RU and other urban and built up land (OUBL), was demonstrated during May-August. This association was positively correlated with the onset of the mosquito WNV infection rate during June. These findings were confirmed by the Jackknife analysis in MaxEnt and independently collected field validation points. The spatial and temporal correlations of VHC ratios and their response to the predicting variables are discussed.
NASA Astrophysics Data System (ADS)
Yang, Liansheng; Zhu, Yingming; Wang, Yudong; Wang, Yiqi
2016-11-01
Based on the daily price data of spot prices of West Texas Intermediate (WTI) crude oil and ten CSI300 sector indices in China, we apply multifractal detrended cross-correlation analysis (MF-DCCA) method to investigate the cross-correlations between crude oil and Chinese sector stock markets. We find that the strength of multifractality between WTI crude oil and energy sector stock market is the highest, followed by the strength of multifractality between WTI crude oil and financial sector market, which reflects a close connection between energy and financial market. Then we do vector autoregression (VAR) analysis to capture the interdependencies among the multiple time series. By comparing the strength of multifractality for original data and residual errors of VAR model, we get a conclusion that vector auto-regression (VAR) model could not be used to describe the dynamics of the cross-correlations between WTI crude oil and the ten sector stock markets.
Uncertainty Management for Diagnostics and Prognostics of Batteries using Bayesian Techniques
NASA Technical Reports Server (NTRS)
Saha, Bhaskar; Goebel, kai
2007-01-01
Uncertainty management has always been the key hurdle faced by diagnostics and prognostics algorithms. A Bayesian treatment of this problem provides an elegant and theoretically sound approach to the modern Condition- Based Maintenance (CBM)/Prognostic Health Management (PHM) paradigm. The application of the Bayesian techniques to regression and classification in the form of Relevance Vector Machine (RVM), and to state estimation as in Particle Filters (PF), provides a powerful tool to integrate the diagnosis and prognosis of battery health. The RVM, which is a Bayesian treatment of the Support Vector Machine (SVM), is used for model identification, while the PF framework uses the learnt model, statistical estimates of noise and anticipated operational conditions to provide estimates of remaining useful life (RUL) in the form of a probability density function (PDF). This type of prognostics generates a significant value addition to the management of any operation involving electrical systems.
Evaluating neighborhood structures for modeling intercity diffusion of large-scale dengue epidemics.
Wen, Tzai-Hung; Hsu, Ching-Shun; Hu, Ming-Che
2018-05-03
Dengue fever is a vector-borne infectious disease that is transmitted by contact between vector mosquitoes and susceptible hosts. The literature has addressed the issue on quantifying the effect of individual mobility on dengue transmission. However, there are methodological concerns in the spatial regression model configuration for examining the effect of intercity-scale human mobility on dengue diffusion. The purposes of the study are to investigate the influence of neighborhood structures on intercity epidemic progression from pre-epidemic to epidemic periods and to compare definitions of different neighborhood structures for interpreting the spread of dengue epidemics. We proposed a framework for assessing the effect of model configurations on dengue incidence in 2014 and 2015, which were the most severe outbreaks in 70 years in Taiwan. Compared with the conventional model configuration in spatial regression analysis, our proposed model used a radiation model, which reflects population flow between townships, as a spatial weight to capture the structure of human mobility. The results of our model demonstrate better model fitting performance, indicating that the structure of human mobility has better explanatory power in dengue diffusion than the geometric structure of administration boundaries and geographic distance between centroids of cities. We also identified spatial-temporal hierarchy of dengue diffusion: dengue incidence would be influenced by its immediate neighboring townships during pre-epidemic and epidemic periods, and also with more distant neighbors (based on mobility) in pre-epidemic periods. Our findings suggest that the structure of population mobility could more reasonably capture urban-to-urban interactions, which implies that the hub cities could be a "bridge" for large-scale transmission and make townships that immediately connect to hub cities more vulnerable to dengue epidemics.
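The idea of using a radiation model as a spatial weight can be sketched as follows. The abstract does not give the authors' exact formulation, so this uses the commonly cited form of the radiation model, in which the flow from origin i to destination j is T_i · m_i · n_j / ((m_i + s_ij)(m_i + n_j + s_ij)), where m_i and n_j are the two populations and s_ij is the population within distance r_ij of i, excluding i and j. The township populations, coordinates and outflow totals are hypothetical, and the row standardization is an assumed (though conventional) way to turn flows into a spatial weight matrix.

```python
import math


def radiation_flows(populations, coords, outflow):
    """Flow matrix under the commonly cited radiation model (assumed form)."""
    n = len(populations)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m_i = populations[i]
        for j in range(n):
            if i == j:
                continue
            n_j = populations[j]
            r_ij = math.dist(coords[i], coords[j])
            # Population inside the circle of radius r_ij around i,
            # excluding the origin and the destination themselves.
            s_ij = sum(populations[k] for k in range(n)
                       if k not in (i, j)
                       and math.dist(coords[i], coords[k]) <= r_ij)
            flows[i][j] = (outflow[i] * m_i * n_j
                           / ((m_i + s_ij) * (m_i + n_j + s_ij)))
    return flows


def row_standardize(matrix):
    """Scale each row to sum to 1, as is conventional for spatial weights."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


# Three hypothetical townships on a line, for illustration only.
populations = [100, 50, 50]
coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
outflow = [10.0, 5.0, 5.0]  # assumed total commuters leaving each township

flows = radiation_flows(populations, coords, outflow)
weights = row_standardize(flows)
print(flows[0], weights[0])
```

Unlike a contiguity or inverse-distance weight, this weight falls off with the population lying between two places, so a nearby township can receive less weight than a distant hub when the hub is large and there are few intervening opportunities.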
Highly predictive and interpretable models for PAMPA permeability.
Sun, Hongmao; Nguyen, Kimloan; Kerns, Edward; Yan, Zhengyin; Yu, Kyeong Ri; Shah, Pranav; Jadhav, Ajit; Xu, Xin
2017-02-01
Cell membrane permeability is an important determinant for oral absorption and bioavailability of a drug molecule. An in silico model predicting drug permeability is described, which is built based on a large permeability dataset of 7488 compound entries or 5435 structurally unique molecules measured by the same lab using parallel artificial membrane permeability assay (PAMPA). On the basis of customized molecular descriptors, the support vector regression (SVR) model trained with 4071 compounds with quantitative data is able to predict the remaining 1364 compounds with the qualitative data with an area under the curve of receiver operating characteristic (AUC-ROC) of 0.90. The support vector classification (SVC) model trained with half of the whole dataset comprised of both the quantitative and the qualitative data produced accurate predictions to the remaining data with the AUC-ROC of 0.88. The results suggest that the developed SVR model is highly predictive and provides medicinal chemists a useful in silico tool to facilitate design and synthesis of novel compounds with optimal drug-like properties, and thus accelerate the lead optimization in drug discovery. Copyright © 2016 Elsevier Ltd. All rights reserved.
Maggi, Federico; Bosco, Domenico; Galetto, Luciana; Palmano, Sabrina; Marzachì, Cristina
2017-01-01
Analyses of space-time statistical features of a flavescence dorée (FD) epidemic in Vitis vinifera plants are presented. FD spread was surveyed from 2011 to 2015 in a vineyard of 17,500 m2 surface area in the Piemonte region, Italy; count and position of symptomatic plants were used to test the hypothesis of epidemic Complete Spatial Randomness and isotropicity in the space-time static (year-by-year) point pattern measure. Space-time dynamic (year-to-year) point pattern analyses were applied to newly infected and recovered plants to highlight statistics of FD progression and regression over time. Results highlighted point patterns ranging from disperse (at small scales) to aggregated (at large scales) over the years, suggesting that the FD epidemic is characterized by multiscale properties that may depend on infection incidence, vector population, and flight behavior. Dynamic analyses showed moderate preferential progression and regression along rows. Nearly uniform distributions of direction and negative exponential distributions of distance of newly symptomatic and recovered plants relative to existing symptomatic plants highlighted features of vector mobility similar to Brownian motion. These evidences indicate that space-time epidemics modeling should include environmental setting (e.g., vineyard geometry and topography) to capture anisotropicity as well as statistical features of vector flight behavior, plant recovery and susceptibility, and plant mortality. PMID:28111581
Filgueiras, Paulo R; Terra, Luciana A; Castro, Eustáquio V R; Oliveira, Lize M S L; Dias, Júlio C M; Poppi, Ronei J
2015-09-01
This paper aims to estimate the temperature equivalent to 10% (T10%), 50% (T50%) and 90% (T90%) of distilled volume in crude oils using (1)H NMR and support vector regression (SVR). Confidence intervals for the predicted values were calculated using a boosting-type ensemble method in a procedure called ensemble support vector regression (eSVR). The estimated confidence intervals obtained by eSVR were compared with previously accepted calculations from partial least squares (PLS) models and a boosting-type ensemble applied in the PLS method (ePLS). By using the proposed boosting strategy, it was possible to identify outliers in the T10% property dataset. The eSVR procedure improved the accuracy of the distillation temperature predictions in relation to standard PLS, ePLS and SVR. For T10%, a root mean square error of prediction (RMSEP) of 11.6°C was obtained in comparison with 15.6°C for PLS, 15.1°C for ePLS and 28.4°C for SVR. The RMSEPs for T50% were 24.2°C, 23.4°C, 22.8°C and 14.4°C for PLS, ePLS, SVR and eSVR, respectively. For T90%, the values of RMSEP were 39.0°C, 39.9°C and 39.9°C for PLS, ePLS, SVR and eSVR, respectively. The confidence intervals calculated by the proposed boosting methodology presented acceptable values for the three properties analyzed; however, they were lower than those calculated by the standard methodology for PLS. Copyright © 2015 Elsevier B.V. All rights reserved.
Using Data Mining for Wine Quality Assessment
NASA Astrophysics Data System (ADS)
Cortez, Paulo; Teixeira, Juliana; Cerdeira, António; Almeida, Fernando; Matos, Telmo; Reis, José
Certification and quality assessment are crucial issues within the wine industry. Currently, wine quality is mostly assessed by physicochemical (e.g. alcohol levels) and sensory (e.g. human expert evaluation) tests. In this paper, we propose a data mining approach to predict wine preferences that is based on easily available analytical tests at the certification step. A large dataset is considered with white vinho verde samples from the Minho region of Portugal. Wine quality is modeled under a regression approach, which preserves the order of the grades. Explanatory knowledge is given in terms of a sensitivity analysis, which measures the response changes when a given input variable is varied through its domain. Three regression techniques were applied, under a computationally efficient procedure that performs simultaneous variable and model selection and that is guided by the sensitivity analysis. The support vector machine achieved promising results, outperforming the multiple regression and neural network methods. Such a model is useful for understanding how physicochemical tests affect the sensory preferences. Moreover, it can support the wine expert evaluations and ultimately improve the production.
Image segmentation using hidden Markov Gauss mixture models.
Pyun, Kyungsuk; Lim, Johan; Won, Chee Sun; Gray, Robert M
2007-07-01
Image segmentation is an important tool in image processing and can serve as an efficient front end to sophisticated algorithms and thereby simplify subsequent processing. We develop a multiclass image segmentation method using hidden Markov Gauss mixture models (HMGMMs) and provide examples of segmentation of aerial images and textures. HMGMMs incorporate supervised learning, fitting the observation probability distribution given each class by a Gauss mixture estimated using vector quantization with a minimum discrimination information (MDI) distortion. We formulate the image segmentation problem using a maximum a posteriori criterion and find the hidden states that maximize the posterior density given the observation. We estimate both the hidden Markov parameter and hidden states using a stochastic expectation-maximization algorithm. Our results demonstrate that HMGMM provides better classification in terms of Bayes risk and spatial homogeneity of the classified objects than do several popular methods, including classification and regression trees, learning vector quantization, causal hidden Markov models (HMMs), and multiresolution HMMs. The computational load of HMGMM is similar to that of the causal HMM.
NASA Astrophysics Data System (ADS)
Imani, Moslem; You, Rey-Jer; Kuo, Chung-Yen
2014-10-01
Sea level forecasting at various time intervals is of great importance in water supply management. Evolutionary artificial intelligence (AI) approaches have been accepted as an appropriate tool for modeling complex nonlinear phenomena in water bodies. In the study, we investigated the ability of two AI techniques: support vector machine (SVM), which is mathematically well-founded and provides new insights into function approximation, and gene expression programming (GEP), which is used to forecast Caspian Sea level anomalies using satellite altimetry observations from June 1992 to December 2013. SVM demonstrates the best performance in predicting Caspian Sea level anomalies, given the minimum root mean square error (RMSE = 0.035) and maximum coefficient of determination (R2 = 0.96) during the prediction periods. A comparison between the proposed AI approaches and the cascade correlation neural network (CCNN) model also shows the superiority of the GEP and SVM models over the CCNN.
A Functional Varying-Coefficient Single-Index Model for Functional Response Data
Li, Jialiang; Huang, Chao; Zhu, Hongtu
2016-01-01
Motivated by the analysis of imaging data, we propose a novel functional varying-coefficient single index model (FVCSIM) to carry out the regression analysis of functional response data on a set of covariates of interest. FVCSIM represents a new extension of varying-coefficient single index models for scalar responses collected from cross-sectional and longitudinal studies. An efficient estimation procedure is developed to iteratively estimate varying coefficient functions, link functions, index parameter vectors, and the covariance function of individual functions. We systematically examine the asymptotic properties of all estimators including the weak convergence of the estimated varying coefficient functions, the asymptotic distribution of the estimated index parameter vectors, and the uniform convergence rate of the estimated covariance function and their spectrum. Simulation studies are carried out to assess the finite-sample performance of the proposed procedure. We apply FVCSIM to investigating the development of white matter diffusivities along the corpus callosum skeleton obtained from Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. PMID:29200540
Hsu, Pi-Shan; Chen, Chaur-Dong; Lian, Ie-Bin; Chao, Day-Yu
2015-01-01
Background: Despite dengue dynamics being driven by complex interactions between human hosts, mosquito vectors and viruses that are influenced by climate factors, an operational model that will enable health authorities to anticipate the outbreak risk in a dengue non-endemic area has not been developed. The objectives of this study were to evaluate the temporal relationship between meteorological variables, entomological surveillance indices and confirmed dengue cases; and to establish the threshold for entomological surveillance indices including three mosquito larval indices [Breteau (BI), Container (CI) and House indices (HI)] and one adult index (AI) as an early warning tool for dengue epidemic. Methodology/Principal Findings: Epidemiological, entomological and meteorological data were analyzed from 2005 to 2012 in Kaohsiung City, Taiwan. Successive waves of dengue outbreaks with different magnitudes were recorded in Kaohsiung City, and involved a dominant serotype during each epidemic. The annual indigenous dengue cases usually started from May to June and reached a peak in October to November. Vector data from 2005–2012 showed that the peak of the adult mosquito population was followed by a peak in the corresponding dengue activity with a lag period of 1–2 months. Therefore, we focused the analysis on the data from May to December and the high risk district, where the inspection of the immature and mature mosquitoes was carried out on a weekly basis and about 97.9% of dengue cases occurred. A two-stage model was utilized here to estimate the risk and time-lag effect of annual dengue outbreaks in Taiwan. First, Poisson regression was used to select the optimal subset of variables and time-lags for predicting the number of dengue cases, and the final results of the multivariate analysis were selected based on the smallest AIC value.
Next, each vector index model with the selected variables was subjected to multiple logistic regression to examine the accuracy of predicting the occurrence of dengue cases. The results suggested that Model-AI, BI, CI and HI predicted the occurrence of dengue cases with 83.8, 87.8, 88.3 and 88.4% accuracy, respectively. The prediction thresholds based on the individual Model-AI, BI, CI and HI were 0.97, 1.16, 1.79 and 0.997, respectively. Conclusion/Significance: There was little evidence of quantifiable association among vector indices, meteorological factors and dengue transmission that could reliably be used for outbreak prediction. Our study provides a proof of concept of how to search for the optimal model and determine the threshold for dengue epidemics. Since the factors used for prediction varied, depending on the ecology and herd immunity level under different geological areas, different thresholds may be developed for different countries using a similar structure of the two-stage model.
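The first stage described above picks the Poisson model with the smallest AIC. A minimal sketch of that comparison follows; it assumes the fitted means of each candidate model are already available (the fitting itself is not shown), and the candidate names, fitted values and function names are hypothetical, not taken from the study.

```python
import math


def poisson_log_likelihood(counts, fitted_means):
    """Log-likelihood of observed counts under Poisson means (all means > 0)."""
    return sum(
        y * math.log(mu) - mu - math.lgamma(y + 1)
        for y, mu in zip(counts, fitted_means)
    )


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood


def select_by_aic(counts, candidates):
    """Return the candidate name with the smallest AIC, plus all scores.

    candidates maps a (hypothetical) model name to a pair
    (fitted_means, n_parameters).
    """
    scores = {
        name: aic(poisson_log_likelihood(counts, means), k)
        for name, (means, k) in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores
```

Because AIC adds a penalty of 2 per parameter, a richer model is preferred only when its gain in log-likelihood outweighs the extra parameters.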
Saavedra, Laura M; Romanelli, Gustavo P; Rozo, Ciro E; Duchowicz, Pablo R
2018-01-01
The insecticidal activity of a series of 62 plant derived molecules against the chikungunya, dengue and zika vector, the Aedes aegypti (Diptera:Culicidae) mosquito, is subjected to a Quantitative Structure-Activity Relationships (QSAR) analysis. The Replacement Method (RM) variable subset selection technique based on Multivariable Linear Regression (MLR) proves to be successful for exploring 4885 molecular descriptors calculated with Dragon 6. The predictive capability of the obtained models is confirmed through an external test set of compounds, Leave-One-Out (LOO) cross-validation and Y-Randomization. The present study constitutes a first necessary computational step for designing less toxic insecticides. Copyright © 2017 Elsevier B.V. All rights reserved.
RRegrs: an R package for computer-aided model selection with multiple regression models.
Tsiliki, Georgia; Munteanu, Cristian R; Seoane, Jose A; Fernandez-Lozano, Carlos; Sarimveis, Haralambos; Willighagen, Egon L
2015-01-01
Predictive regression models can be created with many different modelling approaches. Choices need to be made for data set splitting, cross-validation methods, specific regression parameters and best model criteria, as they all affect the accuracy and efficiency of the produced predictive models, thereby raising model reproducibility and comparison issues. Cheminformatics and bioinformatics use predictive modelling extensively and exhibit a need for standardization of these methodologies in order to assist model selection and speed up the process of predictive model development. A tool accessible to all users, irrespective of their statistical knowledge, would be valuable if it tests several simple and complex regression models and validation schemes, produces unified reports, and offers the option to be integrated into more extensive studies. Additionally, such methodology should be implemented as a free programming package, in order to be continuously adapted and redistributed by others. We propose an integrated framework for creating multiple regression models, called RRegrs. The tool offers the option of ten simple and complex regression methods combined with repeated 10-fold and leave-one-out cross-validation. Methods include Multiple Linear regression, Generalized Linear Model with Stepwise Feature Selection, Partial Least Squares regression, Lasso regression, and Support Vector Machines Recursive Feature Elimination. The new framework is an automated fully validated procedure which produces standardized reports to quickly oversee the impact of choices in modelling algorithms and assess the model and cross-validation results. The methodology was implemented as an open source R package, available at https://www.github.com/enanomapper/RRegrs, by reusing and extending the caret package. The universality of the new methodology is demonstrated using five standard data sets from different scientific fields.
Its efficiency in cheminformatics and QSAR modelling is shown with three use cases: proteomics data for surface-modified gold nanoparticles, nano-metal oxides descriptor data, and molecular descriptors for acute aquatic toxicity data. The results show that for all data sets RRegrs reports models with equal or better performance for both training and test sets than those reported in the original publications. Its good performance as well as its adaptability in terms of parameter optimization could make RRegrs a popular framework to assist the initial exploration of predictive models, and with that, the design of more comprehensive in silico screening applications. Graphical abstract: RRegrs is a computer-aided model selection framework for R multiple regression models; this is a fully validated procedure with application to QSAR modelling.
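The repeated 10-fold and leave-one-out schemes mentioned above can be illustrated by how the sample indices are split. The sketch below is a generic illustration in Python, not the RRegrs or caret implementation (which is in R); the function name and its parameters are assumptions made for this example.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=3, seed=0):
    """Yield (train_indices, test_indices) for repeated k-fold cross-validation.

    Each repeat reshuffles the samples and then deals them into n_folds
    disjoint test sets. Setting n_folds == n_samples gives leave-one-out.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        for fold in range(n_folds):
            test_idx = indices[fold::n_folds]   # every n_folds-th shuffled sample
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            yield train_idx, test_idx
```

Within one repeat every sample is held out exactly once, so averaging the per-fold scores uses each observation for testing once per repeat.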
NASA Astrophysics Data System (ADS)
Ibrahim, Elsy; Kim, Wonkook; Crawford, Melba; Monbaliu, Jaak
2017-02-01
Remote sensing has been successfully utilized to distinguish and quantify sediment properties in the intertidal environment. Classification approaches of imagery are popular and powerful yet can lead to site- and case-specific results. Such specificity creates challenges for temporal studies. Thus, this paper investigates the use of regression models to quantify sediment properties instead of classifying them. Two regression approaches, namely multiple regression (MR) and support vector regression (SVR), are used in this study for the retrieval of bio-physical variables of intertidal surface sediment of the IJzermonding, a Belgian nature reserve. In the regression analysis, mud content, chlorophyll a concentration, organic matter content, and soil moisture are estimated using radiometric variables of two airborne sensors, namely airborne hyperspectral sensor (AHS) and airborne prism experiment (APEX), and using field hyperspectral acquisitions by an analytical spectral device (ASD). The performance of the two regression approaches is best for the estimation of moisture content. SVR attains the highest accuracy without feature reduction while MR achieves good results when feature reduction is carried out. Sediment property maps are successfully obtained using the models and hyperspectral imagery where SVR used with all bands achieves the best performance. The study also involves the extraction of weights identifying the contribution of each band of the images in the quantification of each sediment property when MR and principal component analysis are used.
NASA Astrophysics Data System (ADS)
Marco, F. J.; Martínez, M. J.; López, J. A.
2015-04-01
The high quality of Hipparcos data in position, proper motion, and parallax has allowed for studies about stellar kinematics with the aim of achieving a better physical understanding of our galaxy, based on accurate calculus of the Ogorodnikov-Milne model (OMM) parameters. The use of discrete least squares is the most common adjustment method, but it may lead to errors mainly because of the inhomogeneous spatial distribution of the data. We present an example of the instability of this method using the case of a function given by a linear combination of Legendre polynomials. These polynomials are basic in the use of vector spherical harmonics, which have been used to compute the OMM parameters by several authors, such as Makarov & Murphy, Mignard & Klioner, and Vityazev & Tsvetkov. To overcome the former problem, we propose the use of a mixed method (see Marco et al.) that includes the extension of the functions of residuals to any point on the celestial sphere. The goal is to be able to work with continuous variables in the calculation of the coefficients of the vector spherical harmonic developments with stability and efficiency. We apply this mixed procedure to the study of the kinematics of the stars in our Galaxy, employing the Hipparcos velocity field data to obtain the OMM parameters. Previously, we tested the method by perturbing the Vectorial Spherical Harmonics model as well as the velocity vector field.
Tsetse Fly (G.f. fuscipes) Distribution in the Lake Victoria Basin of Uganda
Albert, Mugenyi; Wardrop, Nicola A; Atkinson, Peter M; Torr, Steve J; Welburn, Susan C
2015-01-01
Tsetse flies transmit trypanosomes, the causative agent of human and animal African trypanosomiasis. The tsetse vector is extensively distributed across sub-Saharan Africa. Trypanosomiasis maintenance is determined by the interrelationship of three elements: vertebrate host, parasite and the vector responsible for transmission. Mapping the distribution and abundance of tsetse flies assists in predicting trypanosomiasis distributions and developing rational strategies for disease and vector control. Given scarce resources to carry out regular full scale field tsetse surveys to up-date existing tsetse maps, there is a need to devise inexpensive means for regularly obtaining dependable area-wide tsetse data to guide control activities. In this study we used spatial epidemiological modelling techniques (logistic regression) involving 5000 field-based tsetse-data (G. f. fuscipes) points over an area of 40,000 km2, with satellite-derived environmental surrogates composed of precipitation, temperature, land cover, normalised difference vegetation index (NDVI) and elevation at the sub-national level. We used these extensive tsetse data to analyse the relationships between presence of tsetse (G. f. fuscipes) and environmental variables. The strength of the results was enhanced through the application of a spatial autologistic regression model (SARM). Using the SARM we showed that the probability of tsetse presence increased with proportion of forest cover and riverine vegetation. The key outputs are a predictive tsetse distribution map for the Lake Victoria basin of Uganda and an improved understanding of the association between tsetse presence and environmental variables. The predicted spatial distribution of tsetse in the Lake Victoria basin of Uganda will provide significant new information to assist with the spatial targeting of tsetse and trypanosomiasis control. PMID:25875201
Alamaniotis, Miltiadis; Bargiotas, Dimitrios; Tsoukalas, Lefteri H
2016-01-01
Integration of energy systems with information technologies has facilitated the realization of smart energy systems that utilize information to optimize system operation. To that end, crucial in optimizing energy system operation is the accurate, ahead-of-time forecasting of load demand. In particular, load forecasting allows planning of system expansion, and decision making for enhancing system safety and reliability. In this paper, the application of two types of kernel machines for medium term load forecasting (MTLF) is presented and their performance is recorded based on a set of historical electricity load demand data. The two kernel machine models and more specifically Gaussian process regression (GPR) and relevance vector regression (RVR) are utilized for making predictions over future load demand. Both models, i.e., GPR and RVR, are equipped with a Gaussian kernel and are tested on daily predictions for a 30-day-ahead horizon taken from the New England Area. Furthermore, their performance is compared to the ARMA(2,2) model with respect to mean average percentage error and squared correlation coefficient. Results demonstrate the superiority of RVR over the other forecasting models in performing MTLF.
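The two comparison metrics named above can be sketched as follows. This assumes that "mean average percentage error" refers to the usual mean absolute percentage error (MAPE) and that the squared correlation coefficient is the square of Pearson's r between actual and forecast load; both readings, and the function names, are assumptions rather than details given in the abstract.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def squared_correlation(actual, forecast):
    """Square of the Pearson correlation between actual and forecast values."""
    if len(actual) != len(forecast) or len(actual) < 2:
        raise ValueError("need two equal-length inputs with at least two points")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_f = sum(forecast) / n
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(actual, forecast))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    if var_a == 0 or var_f == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov * cov / (var_a * var_f)
```

MAPE measures the size of the errors relative to the load, while the squared correlation only measures how well the forecast tracks the shape of the series, so the two can rank models differently.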
PREDICTION OF SOLAR FLARE SIZE AND TIME-TO-FLARE USING SUPPORT VECTOR MACHINE REGRESSION
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boucheron, Laura E.; Al-Ghraibah, Amani; McAteer, R. T. James
We study the prediction of solar flare size and time-to-flare using 38 features describing magnetic complexity of the photospheric magnetic field. This work uses support vector regression to formulate a mapping from the 38-dimensional feature space to a continuous-valued label vector representing flare size or time-to-flare. When we consider flaring regions only, we find an average error in estimating flare size of approximately half a geostationary operational environmental satellite (GOES) class. When we additionally consider non-flaring regions, we find an increased average error of approximately three-fourths a GOES class. We also consider thresholding the regressed flare size for the experiment containing both flaring and non-flaring regions and find a true positive rate of 0.69 and a true negative rate of 0.86 for flare prediction. The results for both of these size regression experiments are consistent across a wide range of predictive time windows, indicating that the magnetic complexity features may be persistent in appearance long before flare activity. This is supported by our larger error rates of some 40 hr in the time-to-flare regression problem. The 38 magnetic complexity features considered here appear to have discriminative potential for flare size, but their persistence in time makes them less discriminative for the time-to-flare problem.
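Thresholding a regressed value to obtain a flare / no-flare decision, as described above, can be sketched as below. The threshold value, the "at or above" convention and the function name are assumptions for illustration; the abstract does not give the threshold the authors used.

```python
def rates_from_threshold(regressed_sizes, is_flaring, threshold):
    """Turn regressed values into binary predictions and score them.

    A region is predicted to flare when its regressed size is at or above
    the threshold. Returns (true_positive_rate, true_negative_rate).
    Both classes must be present in is_flaring.
    """
    tp = fn = tn = fp = 0
    for value, truth in zip(regressed_sizes, is_flaring):
        predicted = value >= threshold
        if truth and predicted:
            tp += 1
        elif truth:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one flaring and one non-flaring region")
    return tp / (tp + fn), tn / (tn + fp)
```

Raising the threshold trades true positives for true negatives, so a reported pair of rates corresponds to one particular operating point.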
Westreich, Daniel; Lessler, Justin; Funk, Michele Jonsson
2010-08-01
Propensity scores for the analysis of observational data are typically estimated using logistic regression. Our objective in this review was to assess machine learning alternatives to logistic regression, which may accomplish the same goals but with fewer assumptions or greater accuracy. We identified alternative methods for propensity score estimation and/or classification from the public health, biostatistics, discrete mathematics, and computer science literature, and evaluated these algorithms for applicability to the problem of propensity score estimation, potential advantages over logistic regression, and ease of use. We identified four techniques as alternatives to logistic regression: neural networks, support vector machines, decision trees (classification and regression trees [CART]), and meta-classifiers (in particular, boosting). Although the assumptions of logistic regression are well understood, those assumptions are frequently ignored. All four alternatives have advantages and disadvantages compared with logistic regression. Boosting (meta-classifiers) and, to a lesser extent, decision trees (particularly CART), appear to be most promising for use in the context of propensity score analysis, but extensive simulation studies are needed to establish their utility in practice. Copyright (c) 2010 Elsevier Inc. All rights reserved.
Sturm, Marc; Quinten, Sascha; Huber, Christian G.; Kohlbacher, Oliver
2007-01-01
We propose a new model for predicting the retention time of oligonucleotides. The model is based on ν support vector regression using features derived from base sequence and predicted secondary structure of oligonucleotides. Because of the secondary structure information, the model is applicable even at relatively low temperatures where the secondary structure is not suppressed by thermal denaturing. This makes the prediction of oligonucleotide retention time for arbitrary temperatures possible, provided that the target temperature lies within the temperature range of the training data. We describe different possibilities of feature calculation from base sequence and secondary structure, present the results and compare our model to existing models. PMID:17567619
Carbon Nanotube Growth Rate Regression using Support Vector Machines and Artificial Neural Networks
2014-03-27
intensity D peak. Reprinted with permission from [38]. The SVM classifier is trained using custom written Java code leveraging the Sequential Minimal...Society Encog is a machine learning framework for Java , C++ and .Net applications that supports Bayesian Networks, Hidden Markov Models, SVMs and ANNs [13...SVM classifiers are trained using Weka libraries and leveraging custom written Java code. The data set is created as an Attribute Relationship File
Linear Regression Modeling of Selected Analytes from the Balad Air Sampling Program
2012-04-05
groundwater, air and soil contamination with unwanted chemicals as well as attract vectors (Insects, rodents, etc.) for diseases. In deployed...via in-flight jettisoning of fuel and from 31 accidental spills or leaks to soil during use, storage, and transportation. VOC components of JP-8...can be introduced to the atmosphere from the soil through volatilization.46 In addition, the reaction between JP-8 and atmospheric chemicals may
Improving Non-Destructive Concrete Strength Tests Using Support Vector Machines
Shih, Yi-Fan; Wang, Yu-Ren; Lin, Kuo-Liang; Chen, Chin-Wen
2015-01-01
Non-destructive testing (NDT) methods are important alternatives when destructive tests are not feasible to examine the in situ concrete properties without damaging the structure. The rebound hammer test and the ultrasonic pulse velocity test are two popular NDT methods to examine the properties of concrete. The rebound of the hammer depends on the hardness of the test specimen and ultrasonic pulse travelling speed is related to density, uniformity, and homogeneity of the specimen. Both of these two methods have been adopted to estimate the concrete compressive strength. Statistical analysis has been implemented to establish the relationship between hammer rebound values/ultrasonic pulse velocities and concrete compressive strength. However, the estimated results can be unreliable. As a result, this research proposes an Artificial Intelligence model using support vector machines (SVMs) for the estimation. Data from 95 cylinder concrete samples are collected to develop and validate the model. The results show that combined NDT methods (also known as SonReb method) yield better estimations than single NDT methods. The results also show that the SVMs model is more accurate than the statistical regression model. PMID:28793627
Riemannian multi-manifold modeling and clustering in brain networks
NASA Astrophysics Data System (ADS)
Slavakis, Konstantinos; Salsabilian, Shiva; Wack, David S.; Muldoon, Sarah F.; Baidoo-Williams, Henry E.; Vettel, Jean M.; Cieslak, Matthew; Grafton, Scott T.
2017-08-01
This paper introduces Riemannian multi-manifold modeling in the context of brain-network analytics: Brain-network time-series yield features which are modeled as points lying in or close to a union of a finite number of submanifolds within a known Riemannian manifold. Distinguishing disparate time series amounts thus to clustering multiple Riemannian submanifolds. To this end, two feature-generation schemes for brain-network time series are put forth. The first one is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points into the Grassmann manifold. The second one utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Based on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their Riemannian-geometry properties. Numerical tests on time series, synthetically generated from real brain-network structural connectivity matrices, reveal that the proposed scheme outperforms classical and state-of-the-art techniques in clustering brain-network states/structures.
NASA Astrophysics Data System (ADS)
Wang, Li-yong; Li, Le; Zhang, Zhi-hua
2016-09-01
Hot compression tests of Ti-6Al-4V alloy in a wide temperature range of 1023-1323 K and strain rate range of 0.01-10 s⁻¹ were conducted by a servo-hydraulic and computer-controlled Gleeble-3500 machine. In order to accurately and effectively characterize the highly nonlinear flow behaviors, support vector regression (SVR), a machine learning method, was combined with a genetic algorithm (GA) to form the GA-SVR. A prominent characteristic of GA-SVR is that, with identical training parameters, it keeps training accuracy and prediction accuracy at a stable level across repeated attempts on a given dataset. The learning abilities, generalization abilities, and modeling efficiencies of the mathematical regression model, ANN, and GA-SVR for Ti-6Al-4V alloy were compared in detail. Comparison results show that the learning ability of the GA-SVR is stronger than that of the mathematical regression model. The generalization abilities and modeling efficiencies of these models were shown as follows in ascending order: the mathematical regression model < ANN < GA-SVR. The stress-strain data outside experimental conditions were predicted by the well-trained GA-SVR, which improved simulation accuracy of the load-stroke curve and can further improve the related research fields where stress-strain data play important roles, such as speculating work hardening and dynamic recovery, characterizing dynamic recrystallization evolution, and improving processing maps.
Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin
2016-01-01
To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using the robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and the plain partial least squares regression. Both R2 and root-mean-squares error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916, with bitterness values ranging from 0.63 to 4.78, was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV, which was calculated using the models constructed by other methods, was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS model constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data. PMID:26821026
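As a rough illustration of the outlier screening mentioned above, the sketch below computes the two-sided Grubbs statistic G = max|x_i - mean| / s for a single variable. It is a simplified assumption of how such a check might look: the comparison of G against the critical value from the t distribution is omitted, and the study's own procedure (including the score-distance criterion) is not reproduced here.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index) for the most extreme value in a sample.

    G is the largest absolute deviation from the sample mean divided by
    the sample standard deviation (n - 1 denominator). Deciding whether
    the point is an outlier additionally requires a critical value, which
    this sketch does not compute.
    """
    if len(values) < 3:
        raise ValueError("Grubbs' test needs at least three observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("all values are identical; G is undefined")
    index = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / sd, index
```

The returned index identifies the candidate sample, so the same loop can be repeated after removing it if an iterative screening is wanted.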
Fang, Xingang; Bagui, Sikha; Bagui, Subhash
2017-08-01
The readily available high throughput screening (HTS) data from the PubChem database provides an opportunity for mining of small molecules in a variety of biological systems using machine learning techniques. From the thousands of available molecular descriptors developed to encode useful chemical information representing the characteristics of molecules, descriptor selection is an essential step in building an optimal quantitative structural-activity relationship (QSAR) model. For the development of a systematic descriptor selection strategy, we need the understanding of the relationship between: (i) the descriptor selection; (ii) the choice of the machine learning model; and (iii) the characteristics of the target bio-molecule. In this work, we employed the Signature descriptor to generate a dataset on the Human kallikrein 5 (hK 5) inhibition confirmatory assay data and compared multiple classification models including logistic regression, support vector machine, random forest and k-nearest neighbor. Under optimal conditions, the logistic regression model provided extremely high overall accuracy (98%) and precision (90%), with good sensitivity (65%) in the cross validation test. In testing the primary HTS screening data with more than 200K molecular structures, the logistic regression model exhibited the capability of eliminating more than 99.9% of the inactive structures. As part of our exploration of the descriptor-model-target relationship, the excellent predictive performance of the combination of the Signature descriptor and the logistic regression model on the assay data of the Human kallikrein 5 (hK 5) target suggested a feasible descriptor/model selection strategy on similar targets. Copyright © 2017 Elsevier Ltd. All rights reserved.
Zeilhofer, Peter; Santos, Emerson Soares dos; Ribeiro, Ana LM; Miyazaki, Rosina D; Santos, Marina Atanaka dos
2007-01-01
Background: Hydropower plants provide more than 78% of Brazil's electricity generation, but the country's reservoirs are potential new habitats for main vectors of malaria. In a case study in the surroundings of the Manso hydropower plant in Mato Grosso state, Central Brazil, habitat suitability of Anopheles darlingi was studied. Habitat profile was characterized by collecting environmental data. Remote sensing and GIS techniques were applied to extract additional spatial layers of land use, distance maps, and relief characteristics for spatial model building. Results: Logistic regression analysis and ROC curves indicate significant relationships between the environment and presence of An. darlingi. Probabilities of presence strongly vary as a function of land cover and distance from the lake shoreline. Vector presence was associated with spatial proximity to reservoir and semi-deciduous forests followed by Cerrado woodland. Vector absence was associated with open vegetation formations such as grasslands and agricultural areas. We suppose that non-significant differences of vector incidences between rainy and dry seasons are associated with the availability of anthropogenic breeding habitat of the reservoir throughout the year. Conclusion: Satellite image classification and multitemporal shoreline simulations through DEM-based GIS-analyses constitute a valuable tool for spatial modeling of An. darlingi habitats in the studied hydropower reservoir area. Vector presence is significantly increased in forested areas near reservoirs in bays protected from wind and wave action. Construction of new reservoirs under the tropical, sub-humid climatic conditions should therefore be accompanied by entomologic studies to predict the risk of malaria epidemics. PMID:17343728
Developing a dengue forecast model using machine learning: A case study in China
Zhang, Qin; Wang, Li; Xiao, Jianpeng; Zhang, Qingying; Luo, Ganfeng; Li, Zhihao; He, Jianfeng; Zhang, Yonghui; Ma, Wenjun
2017-01-01
Background In China, dengue remains an important public health issue with expanded areas and increased incidence recently. Accurate and timely forecasts of dengue incidence in China are still lacking. We aimed to use the state-of-the-art machine learning algorithms to develop an accurate predictive model of dengue. Methodology/Principal findings Weekly dengue cases, Baidu search queries and climate factors (mean temperature, relative humidity and rainfall) during 2011–2014 in Guangdong were gathered. A dengue search index was constructed for developing the predictive models in combination with climate factors. The observed year and week were also included in the models to control for the long-term trend and seasonality. Several machine learning algorithms, including the support vector regression (SVR) algorithm, step-down linear regression model, gradient boosted regression tree algorithm (GBM), negative binomial regression model (NBM), least absolute shrinkage and selection operator (LASSO) linear regression model and generalized additive model (GAM), were used as candidate models to predict dengue incidence. Performance and goodness of fit of the models were assessed using the root-mean-square error (RMSE) and R-squared measures. The residuals of the models were examined using the autocorrelation and partial autocorrelation function analyses to check the validity of the models. The models were further validated using dengue surveillance data from five other provinces. The epidemics during the last 12 weeks and the peak of the 2014 large outbreak were accurately forecasted by the SVR model selected by a cross-validation technique. Moreover, the SVR model had the consistently smallest prediction error rates for tracking the dynamics of dengue and forecasting the outbreaks in other areas in China. Conclusion and significance The proposed SVR model achieved a superior performance in comparison with other forecasting techniques assessed in this study. 
The findings can help the government and community respond early to dengue epidemics. PMID:29036169
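The two goodness-of-fit measures named in the abstract above, RMSE and R-squared, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code; the function names and the toy numbers in the usage note are assumptions made for the example.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

For instance, rmse([0, 2], [1, 1]) is 1.0 and r_squared([0, 2], [1, 1]) is 0.0, because predicting the mean everywhere explains none of the variance.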
Model selection with multiple regression on distance matrices leads to incorrect inferences.
Franckowiak, Ryan P; Panasci, Michael; Jarvis, Karl J; Acuña-Rodriguez, Ian S; Landguth, Erin L; Fortin, Marie-Josée; Wagner, Helene H
2017-01-01
In landscape genetics, model selection procedures based on Information Theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike's information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results showed a serious problem: all three criteria exhibit a systematic bias toward selecting unnecessarily complex models containing spurious random variables and erroneously suggest a high level of support for the incorrectly ranked best model. These problems effectively increased with increasing sample size. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and different sum-of-squares partitioned by MRM, and the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
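To make the role of sample size concrete, the sketch below computes AIC, AICc and BIC in their common least-squares form (Gaussian errors assumed) and counts the pairwise distances produced by a set of sampled individuals. It is an illustration only: the function names are hypothetical, and conventions for counting the number of parameters k vary between texts.

```python
import math


def information_criteria(rss, n, k):
    """Return (AIC, AICc, BIC) for a least-squares fit.

    rss: residual sum of squares, n: number of observations,
    k: number of estimated parameters (counting convention is an assumption).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    fit_term = n * math.log(rss / n)
    aic = fit_term + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
    bic = fit_term + k * math.log(n)
    return aic, aicc, bic


def pairwise_count(individuals):
    """Number of pairwise distances among the sampled individuals."""
    return individuals * (individuals - 1) // 2
```

With 10 individuals, pairwise_count(10) gives 45 distances. If those non-independent pairs are passed as n, the fit term is weighted as though 45 independent observations were available, which is one way to see the inflated sample size that the abstract identifies.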
Dicko, Ahmadou H; Lancelot, Renaud; Seck, Momar T; Guerrini, Laure; Sall, Baba; Lo, Mbargou; Vreysen, Marc J B; Lefrançois, Thierry; Fonta, William M; Peck, Steven L; Bouyer, Jérémy
2014-07-15
Tsetse flies are vectors of human and animal trypanosomoses in sub-Saharan Africa and are the target of the Pan African Tsetse and Trypanosomiasis Eradication Campaign (PATTEC). Glossina palpalis gambiensis (Diptera: Glossinidae) is a riverine species that is still present as an isolated metapopulation in the Niayes area of Senegal. It is targeted by a national eradication campaign combining a population reduction phase based on insecticide-treated targets (ITTs) and cattle and an eradication phase based on the sterile insect technique. In this study, we used species distribution models to optimize control operations. We compared the probability of the presence of G. p. gambiensis and habitat suitability using a regularized logistic regression and Maxent, respectively. Both models performed well, with an area under the curve of 0.89 and 0.92, respectively. Only the Maxent model predicted an expert-based classification of landscapes correctly. Maxent predictions were therefore used throughout the eradication campaign in the Niayes to make control operations more efficient in terms of deployment of ITTs, release density of sterile males, and location of monitoring traps used to assess program progress. We discuss how the models' results informed about the particular ecology of tsetse in the target area. Maxent predictions allowed optimizing efficiency and cost within our project, and might be useful for other tsetse control campaigns in the framework of the PATTEC and, more generally, other vector or insect pest control programs.
Cottrell, Gilles; Kouwaye, Bienvenue; Pierrat, Charlotte; le Port, Agnès; Bouraïma, Aziz; Fonton, Noël; Hounkonnou, Mahouton Norbert; Massougbodji, Achille; Corbel, Vincent; Garcia, André
2012-01-01
Malaria remains endemic in tropical areas, especially in Africa. For the evaluation of new tools and to further our understanding of host-parasite interactions, knowing the environmental risk of transmission--even at a very local scale--is essential. The aim of this study was to assess how malaria transmission is influenced and can be predicted by local climatic and environmental factors. As the entomological part of a cohort study of 650 newborn babies in nine villages in the Tori Bossito district of Southern Benin between June 2007 and February 2010, human landing catches were performed to assess the density of malaria vectors and transmission intensity. Climatic factors as well as household characteristics were recorded throughout the study. Statistical correlations between Anopheles density and environmental and climatic factors were tested using a three-level Poisson mixed regression model. The results showed both temporal variations in vector density (related to season and rainfall), and spatial variations at the level of both village and house. These spatial variations could be largely explained by factors associated with the house's immediate surroundings, namely soil type, vegetation index and the proximity of a watercourse. Based on these results, a predictive regression model was developed using a leave-one-out method, to predict the spatiotemporal variability of malaria transmission in the nine villages. This study points up the importance of local environmental factors in malaria transmission and describes a model to predict the transmission risk of individual children, based on environmental and behavioral characteristics.
Pierrat, Charlotte; le Port, Agnès; Bouraïma, Aziz; Fonton, Noël; Hounkonnou, Mahouton Norbert; Massougbodji, Achille; Corbel, Vincent; Garcia, André
2012-01-01
Malaria remains endemic in tropical areas, especially in Africa. For the evaluation of new tools and to further our understanding of host-parasite interactions, knowing the environmental risk of transmission—even at a very local scale—is essential. The aim of this study was to assess how malaria transmission is influenced and can be predicted by local climatic and environmental factors. As the entomological part of a cohort study of 650 newborn babies in nine villages in the Tori Bossito district of Southern Benin between June 2007 and February 2010, human landing catches were performed to assess the density of malaria vectors and transmission intensity. Climatic factors as well as household characteristics were recorded throughout the study. Statistical correlations between Anopheles density and environmental and climatic factors were tested using a three-level Poisson mixed regression model. The results showed both temporal variations in vector density (related to season and rainfall), and spatial variations at the level of both village and house. These spatial variations could be largely explained by factors associated with the house's immediate surroundings, namely soil type, vegetation index and the proximity of a watercourse. Based on these results, a predictive regression model was developed using a leave-one-out method, to predict the spatiotemporal variability of malaria transmission in the nine villages. This study points up the importance of local environmental factors in malaria transmission and describes a model to predict the transmission risk of individual children, based on environmental and behavioral characteristics. PMID:22238582
Clifford support vector machines for classification, regression, and recurrence.
Bayro-Corrochano, Eduardo Jose; Arana-Daniel, Nancy
2010-11-01
This paper introduces the Clifford support vector machines (CSVM) as a generalization of the real and complex-valued support vector machines using the Clifford geometric algebra. In this framework, we handle the design of kernels involving the Clifford or geometric product. In this approach, one redefines the optimization variables as multivectors. This allows us to have a multivector as output. Therefore, we can represent multiple classes according to the dimension of the geometric algebra in which we work. We show that one can apply CSVM for classification and regression and also to build a recurrent CSVM. The CSVM is an attractive approach for the multiple input multiple output processing of high-dimensional geometric entities. We carried out comparisons between CSVM and the current approaches to solve multiclass classification and regression. We also study the performance of the recurrent CSVM with experiments involving time series. The authors believe that this paper can be of great use for researchers and practitioners interested in multiclass hypercomplex computing, particularly for applications in complex and quaternion signal and image processing, satellite control, neurocomputation, pattern recognition, computer vision, augmented virtual reality, robotics, and humanoids.
Guo, Canyong; Luo, Xuefang; Zhou, Xiaohua; Shi, Beijia; Wang, Juanjuan; Zhao, Jinqi; Zhang, Xiaoxia
2017-06-05
Vibrational spectroscopic techniques such as infrared, near-infrared and Raman spectroscopy have become popular in detecting and quantifying polymorphism of pharmaceutics since they are fast and non-destructive. This study assessed the ability of three vibrational spectroscopic techniques combined with multivariate analysis to quantify a low-content undesired polymorph within a binary polymorphic mixture. Partial least squares (PLS) regression and support vector machine (SVM) regression were employed to build quantitative models. Fusidic acid, a steroidal antibiotic, was used as the model compound. It was found that PLS regression performed slightly better than SVM regression in all three spectroscopic techniques. Root mean square errors of prediction (RMSEP) ranged from 0.48% to 1.17% for diffuse reflectance FTIR spectroscopy, from 1.60% to 1.93% for diffuse reflectance FT-NIR spectroscopy, and from 1.62% to 2.31% for Raman spectroscopy. The results indicate that diffuse reflectance FTIR spectroscopy offers significant advantages in providing accurate measurement of polymorphic content in the fusidic acid binary mixtures, while Raman spectroscopy is the least accurate technique for quantitative analysis of polymorphs. Copyright © 2017 Elsevier B.V. All rights reserved.
Chiogna, Gabriele; Marcolini, Giorgia; Liu, Wanying; Pérez Ciria, Teresa; Tuo, Ye
2018-08-15
Water management in the alpine region has an important impact on streamflow. In particular, hydropower production is known to cause hydropeaking i.e., sudden fluctuations in river stage caused by the release or storage of water in artificial reservoirs. Modeling hydropeaking with hydrological models, such as the Soil Water Assessment Tool (SWAT), requires knowledge of reservoir management rules. These data are often not available since they are sensitive information belonging to hydropower production companies. In this short communication, we propose to couple the results of a calibrated hydrological model with a machine learning method to reproduce hydropeaking without requiring the knowledge of the actual reservoir management operation. We trained a support vector machine (SVM) with SWAT model outputs, the day of the week and the energy price. We tested the model for the Upper Adige river basin in North-East Italy. A wavelet analysis showed that energy price has a significant influence on river discharge, and a wavelet coherence analysis demonstrated the improved performance of the SVM model in comparison to the SWAT model alone. The SVM model was also able to capture the fluctuations in streamflow caused by hydropeaking when both energy price and river discharge displayed a complex temporal dynamic. Copyright © 2018 Elsevier B.V. All rights reserved.
2014-01-01
Background Support vector regression (SVR) and Gaussian process regression (GPR) were used for the analysis of electroanalytical experimental data to estimate diffusion coefficients. Results For simulated cyclic voltammograms based on the EC, Eqr, and EqrC mechanisms these regression algorithms in combination with nonlinear kernel/covariance functions yielded diffusion coefficients with higher accuracy as compared to the standard approach of calculating diffusion coefficients relying on the Nicholson-Shain equation. The level of accuracy achieved by SVR and GPR is virtually independent of the rate constants governing the respective reaction steps. Further, the reduction of high-dimensional voltammetric signals by manual selection of typical voltammetric peak features decreased the performance of both regression algorithms compared to a reduction by downsampling or principal component analysis. After training on simulated data sets, diffusion coefficients were estimated by the regression algorithms for experimental data comprising voltammetric signals for three organometallic complexes. Conclusions Estimated diffusion coefficients closely matched the values determined by the parameter fitting method, but reduced the required computational time considerably for one of the reaction mechanisms. The automated processing of voltammograms according to the regression algorithms yields better results than the conventional analysis of peak-related data. PMID:24987463
Allan, Bruce D; Hassan, Hala; Ieong, Alvin
2015-05-01
To describe and evaluate a new multiple regression-derived nomogram for myopic wavefront laser in situ keratomileusis (LASIK). Moorfields Eye Hospital, London, United Kingdom. Prospective comparative case series. Multiple regression modeling was used to derive a simplified formula for adjusting attempted spherical correction in myopic LASIK. An adaptation of Thibos' power vector method was then applied to derive adjustments to attempted cylindrical correction in eyes with 1.0 diopter (D) or more of preoperative cylinder. These elements were combined in a new nomogram (nomogram II). The 3-month refractive results for myopic wavefront LASIK (spherical equivalent ≤11.0 D; cylinder ≤4.5 D) were compared between 299 consecutive eyes treated using the earlier nomogram (nomogram I) in 2009 and 2010 and 414 eyes treated using nomogram II in 2011 and 2012. There was no significant difference in treatment accuracy (variance in the postoperative manifest refraction spherical equivalent error) between nomogram I and nomogram II (P = .73, Bartlett test). Fewer patients treated with nomogram II had more than 0.5 D of residual postoperative astigmatism (P = .0001, Fisher exact test). There was no significant coupling between adjustments to the attempted cylinder and the achieved sphere (P = .18, t test). Discarding marginal influences from a multiple regression-derived nomogram for myopic wavefront LASIK had no clinically significant effect on treatment accuracy. Thibos' power vector method can be used to guide adjustments to the treatment cylinder alongside nomograms designed to optimize postoperative spherical equivalent results in myopic LASIK. Copyright © 2015 ASCRS and ESCRS. Published by Elsevier Inc. All rights reserved.
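For readers unfamiliar with power vectors, the standard Thibos conversion from a sphere/cylinder/axis refraction to the components (M, J0, J45) can be sketched as below. This shows only the textbook conversion, not the nomogram adaptation described in the abstract, and the function names are hypothetical.

```python
import math


def power_vector(sphere, cylinder, axis_deg):
    """Convert sphere (D), cylinder (D) and axis (degrees) to (M, J0, J45)."""
    axis = math.radians(axis_deg)
    m = sphere + cylinder / 2.0                    # spherical equivalent
    j0 = -(cylinder / 2.0) * math.cos(2.0 * axis)  # 0/90-degree astigmatic component
    j45 = -(cylinder / 2.0) * math.sin(2.0 * axis) # 45/135-degree astigmatic component
    return m, j0, j45


def blur_strength(m, j0, j45):
    """Length of the power vector, a single-number summary of refractive error."""
    return math.sqrt(m * m + j0 * j0 + j45 * j45)
```

As a usage example, power_vector(-3.0, -1.0, 180) yields M = -3.5 and J0 = 0.5, with J45 equal to zero up to floating-point rounding. Because the three components are mutually orthogonal, they can be averaged or regressed independently, which is what makes the representation convenient for nomogram work.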
Mocellin, Simone; Ambrosi, Alessandro; Montesco, Maria Cristina; Foletto, Mirto; Zavagno, Giorgio; Nitti, Donato; Lise, Mario; Rossi, Carlo Riccardo
2006-08-01
Currently, approximately 80% of melanoma patients undergoing sentinel node biopsy (SNB) have negative sentinel lymph nodes (SLNs), and no prediction system is reliable enough to be implemented in the clinical setting to reduce the number of SNB procedures. In this study, the predictive power of support vector machine (SVM)-based statistical analysis was tested. The clinical records of 246 patients who underwent SNB at our institution were used for this analysis. The following clinicopathologic variables were considered: the patient's age and sex and the tumor's histological subtype, Breslow thickness, Clark level, ulceration, mitotic index, lymphocyte infiltration, regression, angiolymphatic invasion, microsatellitosis, and growth phase. The results of SVM-based prediction of SLN status were compared with those achieved with logistic regression. The SLN positivity rate was 22% (52 of 234). When the accuracy was ≥ 80%, the negative predictive value, positive predictive value, specificity, and sensitivity were 98%, 54%, 94%, and 77% and 82%, 41%, 69%, and 93% by using SVM and logistic regression, respectively. Moreover, SVM and logistic regression were associated with a diagnostic error and an SNB percentage reduction of (1) 1% and 60% and (2) 15% and 73%, respectively. The results from this pilot study suggest that SVM-based prediction of SLN status might be evaluated as a prognostic method to avoid the SNB procedure in 60% of patients currently eligible, with a very low error rate. If validated in larger series, this strategy would lead to obvious advantages in terms of both patient quality of life and costs for the health care system.
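The four diagnostic measures reported in the abstract above all derive from a 2x2 confusion matrix. The sketch below shows the definitions; the function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute standard diagnostic measures from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # positives correctly identified
        "specificity": tn / (tn + fp),  # negatives correctly identified
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

With illustrative counts tp=40, fp=50, tn=150, fn=10, sensitivity is 0.8, specificity 0.75 and NPV 0.9375. A high NPV is the property that matters when the goal is to spare patients predicted to be negative from an invasive procedure.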
Knowledge, Attitude, and Practices Regarding Vector-borne Diseases in Western Jamaica.
Alobuia, Wilson M; Missikpode, Celestin; Aung, Maung; Jolly, Pauline E
2015-01-01
Outbreaks of vector-borne diseases (VBDs) such as dengue and malaria can overwhelm health systems in resource-poor countries. Environmental management strategies that reduce or eliminate vector breeding sites combined with improved personal prevention strategies can help to significantly reduce transmission of these infections. The aim of this study was to assess the knowledge, attitudes, and practices (KAPs) of residents in western Jamaica regarding control of mosquito vectors and protection from mosquito bites. A cross-sectional study was conducted between May and August 2010 among patients or family members of patients waiting to be seen at hospitals in western Jamaica. Participants completed an interviewer-administered questionnaire on sociodemographic factors and KAPs regarding VBDs. KAP scores were calculated and categorized as high or low based on the number of correct or positive responses. Logistic regression analyses were conducted to identify predictors of KAP and linear regression analysis conducted to determine if knowledge and attitude scores predicted practice scores. In all, 361 (85 men and 276 women) people participated in the study. Most participants (87%) scored low on knowledge and practice items (78%). Conversely, 78% scored high on attitude items. By multivariate logistic regression, housewives were 82% less likely than laborers to have high attitude scores; homeowners were 65% less likely than renters to have high attitude scores. Participants from households with 1 to 2 children were 3.4 times more likely to have high attitude scores compared with those from households with no children. Participants from households with at least 5 people were 65% less likely than those from households with fewer than 5 people to have high practice scores. By multivariable linear regression knowledge and attitude scores were significant predictors of practice score. The study revealed poor knowledge of VBDs and poor prevention practices among participants. 
It identified specific groups that can be targeted with vector control and personal protection interventions to decrease transmission of the infections. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Domain-Invariant Partial-Least-Squares Regression.
Nikzad-Langerodi, Ramin; Zellinger, Werner; Lughofer, Edwin; Saminger-Platz, Susanne
2018-05-11
Multivariate calibration models often fail to extrapolate beyond the calibration samples because of changes associated with the instrumental response, environmental condition, or sample matrix. Most of the current methods used to adapt a source calibration model to a target domain exclusively apply to calibration transfer between similar analytical devices, while generic methods for calibration-model adaptation are largely missing. To fill this gap, we here introduce domain-invariant partial-least-squares (di-PLS) regression, which extends ordinary PLS by a domain regularizer in order to align the source and target distributions in the latent-variable space. We show that a domain-invariant weight vector can be derived in closed form, which allows the integration of (partially) labeled data from the source and target domains as well as entirely unlabeled data from the latter. We test our approach on a simulated data set where the aim is to desensitize a source calibration model to an unknown interfering agent in the target domain (i.e., unsupervised model adaptation). In addition, we demonstrate unsupervised, semisupervised, and supervised model adaptation by di-PLS on two real-world near-infrared (NIR) spectroscopic data sets.
Support vector machines classifiers of physical activities in preschoolers
USDA-ARS's Scientific Manuscript database
The goal of this study is to develop, test, and compare multinomial logistic regression (MLR) and support vector machines (SVM) in classifying preschool-aged children physical activity data acquired from an accelerometer. In this study, 69 children aged 3-5 years old were asked to participate in a s...
2015-04-15
manage, predict, and mitigate the risk in the original variable. Residual risk can be exemplified as a quantification of the improved... the random variable of interest is viewed in concert with a related random vector that helps to manage, predict, and mitigate the risk in the original... manage, predict and mitigate the risk in the original variable. Residual risk can be exemplified as a quantification of the improved situation faced
Alamaniotis, Miltiadis; Agarwal, Vivek
2014-04-01
Anticipatory control systems are a class of systems whose decisions are based on predictions for the future state of the system under monitoring. Anticipation denotes intelligence and is an inherent property of humans, who make decisions by projecting into the future. Likewise, artificially intelligent systems equipped with predictive functions may be utilized for anticipating future states of complex systems, and therefore facilitate automated control decisions. Anticipatory control of complex energy systems is paramount to their normal and safe operation. In this paper a new intelligent methodology integrating fuzzy inference with support vector regression is introduced. Our proposed methodology implements an anticipatory system aiming at controlling energy systems in a robust way. Initially a set of support vector regressors is adopted for making predictions over critical system parameters. Furthermore, the predicted values are fed into a two stage fuzzy inference system that makes decisions regarding the state of the energy system. The inference system integrates the individual predictions into a single one at its first stage, and outputs a decision together with a certainty factor computed at its second stage. The certainty factor is an index of the significance of the decision. The proposed anticipatory control system is tested on a real world set of data obtained from a complex energy system, describing the degradation of a turbine. Results exhibit the robustness of the proposed system in controlling complex energy systems.
Fuzzy classifier based support vector regression framework for Poisson ratio determination
NASA Astrophysics Data System (ADS)
Asoodeh, Mojtaba; Bagheripour, Parisa
2013-09-01
Poisson ratio is considered as one of the most important rock mechanical properties of hydrocarbon reservoirs. Determination of this parameter through laboratory measurement is time, cost, and labor intensive. Furthermore, laboratory measurements do not provide continuous data along the reservoir intervals. Hence, a fast, accurate, and inexpensive way of determining Poisson ratio which produces continuous data over the whole reservoir interval is desirable. For this purpose, support vector regression (SVR) method based on statistical learning theory (SLT) was employed as a supervised learning algorithm to estimate Poisson ratio from conventional well log data. SVR is capable of accurately extracting the implicit knowledge contained in conventional well logs and converting the gained knowledge into Poisson ratio data. Structural risk minimization (SRM) principle which is embedded in the SVR structure in addition to empirical risk minimization (ERM) principle provides a robust model for finding quantitative formulation between conventional well log data and Poisson ratio. Although satisfying results were obtained from an individual SVR model, it had flaws of overestimation in low Poisson ratios and underestimation in high Poisson ratios. These errors were eliminated through implementation of fuzzy classifier based SVR (FCBSVR). The FCBSVR significantly improved accuracy of the final prediction. This strategy was successfully applied to data from carbonate reservoir rocks of an Iranian Oil Field. Results indicated that SVR predicted Poisson ratio values are in good agreement with measured values.
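The interplay of structural and empirical risk mentioned above can be illustrated with the standard linear SVR objective: a weight-norm penalty (the structural part) plus C times the summed epsilon-insensitive loss (the empirical part). The sketch below is a generic textbook form, not the model fitted in the study; all names and default values are assumptions.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_objective(weights, y_true, y_pred, c=1.0, epsilon=0.1):
    """Regularized risk: 0.5 * ||w||^2 + C * sum of epsilon-insensitive losses."""
    structural = 0.5 * sum(w * w for w in weights)
    empirical = sum(
        epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred)
    )
    return structural + c * empirical
```

A prediction of 0.25 against a measured value of 0.30 incurs no loss when epsilon is 0.1, whereas a prediction of 0.10 incurs a loss of about 0.1. The constant C sets how strongly errors outside the tube are penalized relative to model flatness.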
[Hyperspectral Estimation of Apple Tree Canopy LAI Based on SVM and RF Regression].
Han, Zhao-ying; Zhu, Xi-cun; Fang, Xian-yi; Wang, Zhuo-yuan; Wang, Ling; Zhao, Geng-Xing; Jiang, Yuan-mao
2016-03-01
Leaf area index (LAI) is a dynamic index of crop population size. Hyperspectral technology can be used to estimate apple canopy LAI rapidly and nondestructively, and it can provide a reference for monitoring tree growth and estimating yield. Red Fuji apple trees in the full fruit-bearing period were the research objects. Canopy spectral reflectance and LAI values of ninety apple trees were measured with an ASD FieldSpec 3 spectrometer and an LAI-2200 in thirty orchards over two consecutive years in the Qixia research area of Shandong Province. The optimal vegetation indices were selected by correlation analysis of the original spectral reflectance and vegetation indices. Models for predicting LAI were built with the multivariate regression methods support vector machine (SVM) and random forest (RF). The new vegetation indices GNDVI527, NDVI676, RVI682, FD-NVI656 and GRVI517 and the two established vegetation indices NDVI670 and NDVI705 were consistent with LAI. In the RF regression model, the calibration-set coefficient of determination (C-R2) of 0.920 and the validation-set coefficient of determination (V-R2) of 0.889 were higher than those of the SVM regression model by 0.045 and 0.033, respectively. The calibration-set root mean square error (C-RMSE) of 0.249 and the validation-set root mean square error (V-RMSE) of 0.236 were lower than those of the SVM regression model by 0.054 and 0.058, respectively. The relative analysis errors of the calibration set (C-RPD) and the validation set (V-RPD) reached 3.363 and 2.520, which were higher than those of the SVM regression model by 0.598 and 0.262, respectively. The slopes of the trend lines in the scatterplots of measured versus predicted values for the calibration set and validation set (C-S and V-S) were close to 1. The estimation result of the RF regression model is better than that of the SVM model. The RF regression model can be used to estimate the LAI of Red Fuji apple trees in the full fruit period.
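The RPD statistic quoted above is commonly computed as the standard deviation of the measured values divided by the RMSE of prediction. The sketch below assumes that definition and uses the sample standard deviation; the abstract does not state which variant the authors used, so treat both choices as assumptions.

```python
import math
import statistics


def rpd(observed, predicted):
    """Ratio of the sample SD of observed values to the RMSE of prediction."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared) / len(squared))
    return statistics.stdev(observed) / rmse
```

For observed values 1 to 5 and predictions that are each 0.5 too high, the RMSE is 0.5 and the RPD is about 3.16. Larger values indicate that the prediction error is small relative to the natural spread of the data.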
Research on On-Line Modeling of Fed-Batch Fermentation Process Based on v-SVR
NASA Astrophysics Data System (ADS)
Ma, Yongjun
The fermentation process is highly complex and non-linear, and many of its parameters are difficult to measure directly on line, so soft sensor modeling is a good solution. This paper introduces v-support vector regression (v-SVR) for soft sensor modeling of a fed-batch fermentation process. v-SVR is a novel type of learning machine that can control the fitting accuracy and the prediction error by adjusting the parameter v. An on-line training algorithm is discussed in detail to reduce the training complexity of v-SVR. The experimental results show that v-SVR has a low error rate and better generalization with an appropriate v.
Data-driven mapping of the potential mountain permafrost distribution.
Deluigi, Nicola; Lambiel, Christophe; Kanevski, Mikhail
2017-07-15
Existing mountain permafrost distribution models generally offer a good overview of the potential extent of this phenomenon at a regional scale. They are however not always able to reproduce the high spatial discontinuity of permafrost at the micro-scale (scale of a specific landform; ten to several hundreds of meters). To overcome this limitation, we tested an alternative modelling approach using three classification algorithms belonging to statistics and machine learning: Logistic regression, Support Vector Machines and Random forests. These supervised learning techniques infer a classification function from labelled training data (pixels of permafrost absence and presence) with the aim of predicting the permafrost occurrence where it is unknown. The research was carried out in a 588 km² area of the Western Swiss Alps. Permafrost evidence was mapped from ortho-image interpretation (rock glacier inventorying) and field data (mainly geoelectrical and thermal data). The relationship between selected permafrost evidence and permafrost controlling factors was computed with the mentioned techniques. Classification performances, assessed with AUROC, are 0.81 for Logistic regression, 0.85 for Support Vector Machines and 0.88 for Random forests. The adopted machine learning algorithms proved efficient for permafrost distribution modelling, giving results consistent with the field reality. The high resolution of the input dataset (10 m) allows elaborating maps at the micro-scale with a modelled permafrost spatial distribution less optimistic than classic spatial models. Moreover, the probability output of the adopted algorithms offers a more precise overview of the potential distribution of mountain permafrost than simple indexes of permafrost favorability. These encouraging results also open the way to new possibilities of permafrost data analysis and mapping. Copyright © 2017 Elsevier B.V. All rights reserved.
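AUROC, the performance measure used above, equals the probability that a randomly chosen presence pixel receives a higher score than a randomly chosen absence pixel. A minimal pairwise sketch of that definition follows; the function name and the toy scores are illustrative, and the quadratic loop is suitable only for small samples.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

For labels [1, 1, 0, 0] with scores [0.9, 0.4, 0.5, 0.1], three of the four presence/absence pairs are ordered correctly, so the AUROC is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.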
NASA Astrophysics Data System (ADS)
Rocha, Alby D.; Groen, Thomas A.; Skidmore, Andrew K.; Darvishzadeh, Roshanak; Willemen, Louise
2017-11-01
The growing number of narrow spectral bands in hyperspectral remote sensing improves the capacity to describe and predict biological processes in ecosystems. But it also poses a challenge to fit empirical models based on such high dimensional data, which often contain correlated and noisy predictors. As sample sizes, to train and validate empirical models, seem not to be increasing at the same rate, overfitting has become a serious concern. Overly complex models lead to overfitting by capturing more than the underlying relationship, and also through fitting random noise in the data. Many regression techniques claim to overcome these problems by using different strategies to constrain complexity, such as limiting the number of terms in the model, by creating latent variables or by shrinking parameter coefficients. This paper is proposing a new method, named Naïve Overfitting Index Selection (NOIS), which makes use of artificially generated spectra, to quantify the relative model overfitting and to select an optimal model complexity supported by the data. The robustness of this new method is assessed by comparing it to a traditional model selection based on cross-validation. The optimal model complexity is determined for seven different regression techniques, such as partial least squares regression, support vector machine, artificial neural network and tree-based regressions using five hyperspectral datasets. The NOIS method selects less complex models, which present accuracies similar to the cross-validation method. The NOIS method reduces the chance of overfitting, thereby avoiding models that present accurate predictions that are only valid for the data used, and too complex to make inferences about the underlying process.
NASA Astrophysics Data System (ADS)
Shastri, Niket; Pathak, Kamlesh
2018-05-01
The water vapor content of the atmosphere plays a very important role in climate. This paper discusses the application of GPS signals in meteorology, a useful technique for estimating the precipitable water vapor of the atmosphere. Various algorithms, namely artificial neural networks, support vector machines and multiple linear regression, are used to predict precipitable water vapor. Comparative studies in terms of root mean square error and mean absolute error are also carried out for all the algorithms.
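The two error measures used for the comparison are simple to state in code. This is a minimal sketch with invented numbers; the function names and the sample values are assumptions and do not come from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical water vapor values (mm), for illustration only
observed = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 12.0, 13.0, 24.0]
print(rmse(observed, predicted), mae(observed, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, which is why the two are often reported together.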
Wang, Hui; Qin, Feng; Ruan, Liu; Wang, Rui; Liu, Qi; Ma, Zhanhong; Li, Xiaolong; Cheng, Pei; Wang, Haiguang
2016-01-01
It is important to implement detection and assessment of plant diseases based on remotely sensed data for disease monitoring and control. Hyperspectral data of healthy leaves, leaves in incubation period and leaves in diseased period of wheat stripe rust and wheat leaf rust were collected under in-field conditions using a black-paper-based measuring method developed in this study. After data preprocessing, the models to identify the diseases were built using distinguished partial least squares (DPLS) and support vector machine (SVM), and the disease severity inversion models of stripe rust and the disease severity inversion models of leaf rust were built using quantitative partial least squares (QPLS) and support vector regression (SVR). All the models were validated by using leave-one-out cross validation and external validation. The diseases could be discriminated using both distinguished partial least squares and support vector machine with the accuracies of more than 99%. For each wheat rust, disease severity levels were accurately retrieved using both the optimal QPLS models and the optimal SVR models with the coefficients of determination (R2) of more than 0.90 and the root mean square errors (RMSE) of less than 0.15. The results demonstrated that identification and severity evaluation of stripe rust and leaf rust at the leaf level could be implemented based on the hyperspectral data acquired using the developed method. A scientific basis was provided for implementing disease monitoring by using aerial and space remote sensing technologies.
PMID:27128464
Marchese Robinson, Richard L; Palczewska, Anna; Palczewski, Jan; Kidley, Nathan
2017-08-28
The ability to interpret the predictions made by quantitative structure-activity relationships (QSARs) offers a number of advantages. While QSARs built using nonlinear modeling approaches, such as the popular Random Forest algorithm, might sometimes be more predictive than those built using linear modeling approaches, their predictions have been perceived as difficult to interpret. However, a growing number of approaches have been proposed for interpreting nonlinear QSAR models in general and Random Forest in particular. In the current work, we compare the performance of Random Forest to those of two widely used linear modeling approaches: linear Support Vector Machines (SVMs) (or Support Vector Regression (SVR)) and partial least-squares (PLS). We compare their performance in terms of their predictivity as well as the chemical interpretability of the predictions using novel scoring schemes for assessing heat map images of substructural contributions. We critically assess different approaches for interpreting Random Forest models as well as for obtaining predictions from the forest. We assess the models on a large number of widely employed public-domain benchmark data sets corresponding to regression and binary classification problems of relevance to hit identification and toxicology. We conclude that Random Forest typically yields comparable or possibly better predictive performance than the linear modeling approaches and that its predictions may also be interpreted in a chemically and biologically meaningful way. In contrast to earlier work looking at interpretation of nonlinear QSAR models, we directly compare two methodologically distinct approaches for interpreting Random Forest models. The approaches for interpreting Random Forest assessed in our article were implemented using open-source programs that we have made available to the community. 
These programs are the rfFC package ( https://r-forge.r-project.org/R/?group_id=1725 ) for the R statistical programming language and the Python program HeatMapWrapper [ https://doi.org/10.5281/zenodo.495163 ] for heat map generation.
Regression-based adaptive sparse polynomial dimensional decomposition for sensitivity analysis
NASA Astrophysics Data System (ADS)
Tang, Kunkun; Congedo, Pietro; Abgrall, Remi
2014-11-01
Polynomial dimensional decomposition (PDD) is employed in this work for global sensitivity analysis and uncertainty quantification of stochastic systems subject to a large number of random input variables. Owing to the close structural connection between PDD and Analysis-of-Variance, PDD provides a simpler and more direct evaluation of the Sobol' sensitivity indices than polynomial chaos (PC). Unfortunately, the number of PDD terms grows exponentially with the size of the input random vector, which makes the computational cost of the standard method unaffordable for real engineering applications. To address this curse of dimensionality, this work proposes a variance-based adaptive strategy that aims to build a cheap meta-model by sparse-PDD, with the PDD coefficients computed by regression. During this adaptive procedure, the PDD model representation contains only a few terms, so that the cost of repeatedly solving the linear system of the least-squares regression problem is negligible. The final sparse-PDD representation is much smaller than the full PDD, since only significant terms are eventually retained. Consequently, far fewer calls to the deterministic model are required to compute the final PDD coefficients.
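The "direct evaluation" of Sobol' indices can be sketched as follows, assuming an expansion in orthonormal basis functions. The symbols $y_0$, $C_{u,j}$ and $\psi_{u,j}$ are notation introduced here for illustration and are not taken from the paper.

```latex
% Expansion over non-empty subsets u of the input variables,
% with basis functions orthonormal with respect to the input distribution
y(\mathbf{X}) \approx y_0 + \sum_{u} \sum_{j} C_{u,j}\, \psi_{u,j}(\mathbf{X}_u)

% Orthonormality gives the variance as a sum of squared coefficients
\operatorname{Var}[y] \approx \sum_{u} \sum_{j} C_{u,j}^{2}

% Sobol' index of the variable subset u
S_u \approx \frac{\sum_{j} C_{u,j}^{2}}{\sum_{v} \sum_{k} C_{v,k}^{2}}
```

Under this assumption, terms whose squared coefficients contribute negligibly to the variance are natural candidates for removal in a sparse representation.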
Climate, Deer, Rodents, and Acorns as Determinants of Variation in Lyme-Disease Risk
Canham, Charles D; Oggenfuss, Kelly; Winchcombe, Raymond J; Keesing, Felicia
2006-01-01
Risk of human exposure to vector-borne zoonotic pathogens is a function of the abundance and infection prevalence of vectors. We assessed the determinants of Lyme-disease risk (density and Borrelia burgdorferi-infection prevalence of nymphal Ixodes scapularis ticks) over 13 y on several field plots within eastern deciduous forests in the epicenter of US Lyme disease (Dutchess County, New York). We used a model comparison approach to simultaneously test the importance of ambient growing-season temperature, precipitation, two indices of deer (Odocoileus virginianus) abundance, and densities of white-footed mice (Peromyscus leucopus), eastern chipmunks (Tamias striatus), and acorns ( Quercus spp.), in both simple and multiple regression models, in predicting entomological risk. Indices of deer abundance had no predictive power, and precipitation in the current year and temperature in the prior year had only weak effects on entomological risk. The strongest predictors of a current year's risk were the prior year's abundance of mice and chipmunks and abundance of acorns 2 y previously. In no case did inclusion of deer or climate variables improve the predictive power of models based on rodents, acorns, or both. We conclude that interannual variation in entomological risk of exposure to Lyme disease is correlated positively with prior abundance of key hosts for the immature stages of the tick vector and with critical food resources for those hosts. PMID:16669698
[Exploration of influencing factors of price of herbal based on VAR model].
Wang, Nuo; Liu, Shu-Zhen; Yang, Guang
2014-10-01
Based on a vector auto-regression (VAR) model, this paper uses the Granger causality test, variance decomposition and impulse response analysis to carry out a comprehensive study of the factors influencing the price of Chinese herbal medicines, including herbal cultivation costs, acreage, natural disasters, residents' needs and inflation. The study found Granger causality between inflation and herbal prices, and between cultivation costs and herbal prices. In the variance decomposition of the Chinese herbal medicine price index, the largest contribution comes from its own fluctuations, followed by cultivation costs and inflation.
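For readers unfamiliar with the model class, a VAR of order $p$ has the following general textbook form. The symbols are generic notation, not the specification used in the paper.

```latex
% y_t stacks the variables (e.g. price index, cultivation cost, inflation) at time t
\mathbf{y}_t = \mathbf{c} + \sum_{i=1}^{p} A_i\, \mathbf{y}_{t-i} + \boldsymbol{\varepsilon}_t

% Variable k does not Granger-cause variable m if all lagged coefficients
% of k in the equation for m vanish:
H_0:\; (A_1)_{mk} = (A_2)_{mk} = \dots = (A_p)_{mk} = 0
```

Rejecting $H_0$ is what is meant by a Granger causality relationship from variable $k$ to variable $m$.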
Balabin, Roman M; Lomakina, Ekaterina I
2011-06-28
A multilayer feed-forward artificial neural network (MLP-ANN) with a single, hidden layer that contains a finite number of neurons can be regarded as a universal non-linear approximator. Today, the ANN method and linear regression (MLR) model are widely used for quantum chemistry (QC) data analysis (e.g., thermochemistry) to improve their accuracy (e.g., Gaussian G2-G4, B3LYP/B3-LYP, X1, or W1 theoretical methods). In this study, an alternative approach based on support vector machines (SVMs) is used, the least squares support vector machine (LS-SVM) regression. It has been applied to ab initio (first principle) and density functional theory (DFT) quantum chemistry data. So, QC + SVM methodology is an alternative to QC + ANN one. The task of the study was to estimate the Møller-Plesset (MPn) or DFT (B3LYP, BLYP, BMK) energies calculated with large basis sets (e.g., 6-311G(3df,3pd)) using smaller ones (6-311G, 6-311G*, 6-311G**) plus molecular descriptors. A molecular set (BRM-208) containing a total of 208 organic molecules was constructed and used for the LS-SVM training, cross-validation, and testing. MP2, MP3, MP4(DQ), MP4(SDQ), and MP4/MP4(SDTQ) ab initio methods were tested. Hartree-Fock (HF/SCF) results were also reported for comparison. Furthermore, constitutional (CD: total number of atoms and mole fractions of different atoms) and quantum-chemical (QD: HOMO-LUMO gap, dipole moment, average polarizability, and quadrupole moment) molecular descriptors were used for the building of the LS-SVM calibration model. Prediction accuracies (MADs) of 1.62 ± 0.51 and 0.85 ± 0.24 kcal mol(-1) (1 kcal mol(-1) = 4.184 kJ mol(-1)) were reached for SVM-based approximations of ab initio and DFT energies, respectively. The LS-SVM model was more accurate than the MLR model. A comparison with the artificial neural network approach shows that the accuracy of the LS-SVM method is similar to the accuracy of ANN. 
The extrapolation and interpolation results show that LS-SVM is superior by almost an order of magnitude over the ANN method in terms of the stability, generality, and robustness of the final model. The LS-SVM model needs a much smaller sample set to make accurate predictions. Potential energy surface (PES) approximations for molecular dynamics (MD) studies are discussed as a promising application for the LS-SVM calibration approach.
NASA Astrophysics Data System (ADS)
Muller, Sybrand Jacobus; van Niekerk, Adriaan
2016-07-01
Soil salinity often leads to reduced crop yield and quality and can render soils barren. Irrigated areas are particularly at risk due to intensive cultivation and secondary salinization caused by waterlogging. Regular monitoring of salt accumulation in irrigation schemes is needed to keep its negative effects under control. The dynamic spatial and temporal characteristics of remote sensing can provide a cost-effective solution for monitoring salt accumulation at irrigation scheme level. This study evaluated a range of pan-fused SPOT-5 derived features (spectral bands, vegetation indices, image textures and image transformations) for classifying salt-affected areas in two distinctly different irrigation schemes in South Africa, namely Vaalharts and Breede River. The relationship between the input features and electrical conductivity measurements was investigated using regression modelling (stepwise linear regression, partial least squares regression, curve fit regression modelling) and supervised classification (maximum likelihood, nearest neighbour, decision tree analysis, support vector machine and random forests). Classification and regression trees and random forest were used to select the most important features for differentiating salt-affected and unaffected areas. The results showed that the regression analyses produced weak models (R squared < 0.4). Better results were achieved using the supervised classifiers, but the algorithms tended to over-estimate salt-affected areas. A key finding was that none of the feature sets or classification algorithms stood out as being superior for monitoring salt accumulation at irrigation scheme level. This was attributed to the large variations in the spectral responses of different crop types at different growing stages, coupled with their individual tolerances to saline conditions.
NASA Astrophysics Data System (ADS)
Wu, Peilin; Zhang, Qunying; Fei, Chunjiao; Fang, Guangyou
2017-04-01
Aeromagnetic gradients are typically measured by optically pumped magnetometers mounted on an aircraft. Any aircraft, particularly a helicopter, produces significant levels of magnetic interference. Therefore, aeromagnetic compensation is essential, and least squares (LS) is the conventional method used for reducing interference levels. However, the LS approach to solving the aeromagnetic interference model has a few difficulties, one of which is handling multicollinearity. Therefore, we propose an aeromagnetic gradient compensation method, specifically targeted for helicopter use but applicable on any airborne platform, which is based on the ɛ-support vector regression algorithm. The structural risk minimization criterion intrinsic to the method avoids multicollinearity altogether. Local aeromagnetic anomalies can be retained, and platform-generated fields are suppressed simultaneously, by constructing an appropriate loss function and kernel function. The method was tested using an unmanned helicopter and obtained improvement ratios of 12.7 and 3.5 in the vertical and horizontal gradient data, respectively. Both of these values are probably better than those that would have been obtained from the conventional method applied to the same data, had it been possible to do so in a suitable comparative context. The validity of the proposed method is demonstrated by the experimental result.
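The loss function that gives ɛ-support vector regression its name ignores residuals smaller than ɛ and grows linearly beyond that. A minimal sketch of the standard ɛ-insensitive loss follows; the function name, the default ɛ and the sample values are assumptions for illustration, and the paper's own loss construction may differ.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample epsilon-insensitive loss: max(0, |residual| - epsilon)."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Hypothetical residuals: the first falls inside the epsilon tube, the second outside
losses = epsilon_insensitive_loss([1.0, 2.0], [1.05, 2.5], epsilon=0.1)
print(losses)
```

Residuals inside the tube contribute nothing to the objective, which is one reason the fitted model depends only on a subset of the training points (the support vectors).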
Predicting Active Users' Personality Based on Micro-Blogging Behaviors
Hao, Bibo; Guan, Zengda; Zhu, Tingshao
2014-01-01
Because of its richness and availability, micro-blogging has become an ideal platform for conducting psychological research. In this paper, we proposed to predict active users' personality traits through micro-blogging behaviors. 547 Chinese active users of micro-blogging participated in this study. Their personality traits were measured by the Big Five Inventory, and digital records of micro-blogging behaviors were collected via web crawlers. After extracting 845 micro-blogging behavioral features, we first trained classification models utilizing Support Vector Machine (SVM), differentiating participants with high and low scores on each dimension of the Big Five Inventory. The classification accuracy ranged from 84% to 92%. We also built regression models utilizing PaceRegression methods, predicting participants' scores on each dimension of the Big Five Inventory. The Pearson correlation coefficients between predicted scores and actual scores ranged from 0.48 to 0.54. Results indicated that active users' personality traits could be predicted by micro-blogging behaviors. PMID:24465462
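The regression models above are scored by the Pearson correlation between predicted and actual trait scores. The following is a minimal sketch of that statistic; the function name and the toy scores are assumptions, not data from the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Invented example scores
actual = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.9, 4.2, 4.6]
print(pearson_r(actual, predicted))
```

A coefficient of about 0.5, as reported, indicates a moderate linear association rather than accurate individual-level prediction.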
A Language-Independent Approach to Automatic Text Difficulty Assessment for Second-Language Learners
2013-08-01
best-suited for regression. Our baseline uses z-normalized shallow length features and TF-LOG weighted vectors on bag-of-words for Arabic, Dari...length features and TF-LOG weighted vectors on bag-of-words for Arabic, Dari, English and Pashto. We compare Support Vector Machines and the Margin...football, whereas they are much less common in documents about opera). We used TF-LOG weighted word frequencies on bag-of-words for each document
Kesari, Shreekant; Bhunia, Gouri Sankar; Kumar, Vijay; Jeyaram, Algarswamy; Ranjan, Alok; Das, Pradeep
2011-08-01
In visceral leishmaniasis, phlebotomine vectors are targets for control measures. Understanding the ecosystem of the vectors is a prerequisite for creating these control measures. This study endeavours to delineate the suitable locations of Phlebotomus argentipes in relation to environmental characteristics in endemic and non-endemic districts in India. A cross-sectional survey was conducted on 25 villages in each district. Environmental data were obtained through remote sensing images and vector density was measured using a CDC light trap. Simple linear regression analysis was used to measure the association between climatic parameters and vector density. Using factor analysis, the relationship between land cover classes and P. argentipes density among the villages in both districts was investigated. The results of the regression analysis indicated that indoor temperature and relative humidity are the best predictors of P. argentipes distribution. Factor analysis confirmed breeding preferences for P. argentipes by landscape element. Minimum Normalised Difference Vegetation Index, marshy land and orchard/settlement produced high loading in an endemic region, whereas water bodies and dense forest were preferred in non-endemic sites. Soil properties in the two districts were studied and indicated that soil pH and moisture content are higher in endemic sites than in non-endemic sites. The present study should be utilised to make critical decisions for vector surveillance and for controlling Kala-azar disease vectors.
Liu, Shu-Shen; Liu, Yan; Yin, Da-Qian; Wang, Xiao-Dong; Wang, Lian-Sheng
2006-02-01
Using the molecular electronegativity distance vector (MEDV) descriptors derived directly from the molecular topological structures, the gas chromatographic relative retention times (RRTs) of 209 polychlorinated biphenyls (PCBs) on the SE-54 stationary phase were predicted. A five-variable regression equation with a correlation coefficient of 0.9964 and a root mean square error of 0.0152 was developed. The descriptors included in the equation represent degree of chlorination (nCl), nonortho index (Ino), and interactions between three pairs of atom types, i.e., atom groups -C= and -C=, -C= and >C=, -C= and -Cl. It has been shown that the retention times of all 209 PCB congeners can be accurately predicted as long as there are more than 50 calibration compounds. In the same way, the MEDV descriptors are also used to develop five- or six-variable models of the RRTs of PCBs on 18 other stationary phases; the correlation coefficients in both the modeling stage and the LOO cross-validation step are not lower than 0.99, except for two models.
Dicko, Ahmadou H.; Lancelot, Renaud; Seck, Momar T.; Guerrini, Laure; Sall, Baba; Lo, Mbargou; Vreysen, Marc J. B.; Lefrançois, Thierry; Fonta, William M.; Peck, Steven L.; Bouyer, Jérémy
2014-01-01
Tsetse flies are vectors of human and animal trypanosomoses in sub-Saharan Africa and are the target of the Pan African Tsetse and Trypanosomiasis Eradication Campaign (PATTEC). Glossina palpalis gambiensis (Diptera: Glossinidae) is a riverine species that is still present as an isolated metapopulation in the Niayes area of Senegal. It is targeted by a national eradication campaign combining a population reduction phase based on insecticide-treated targets (ITTs) and cattle and an eradication phase based on the sterile insect technique. In this study, we used species distribution models to optimize control operations. We compared the probability of the presence of G. p. gambiensis and habitat suitability using a regularized logistic regression and Maxent, respectively. Both models performed well, with an area under the curve of 0.89 and 0.92, respectively. Only the Maxent model predicted an expert-based classification of landscapes correctly. Maxent predictions were therefore used throughout the eradication campaign in the Niayes to make control operations more efficient in terms of deployment of ITTs, release density of sterile males, and location of monitoring traps used to assess program progress. We discuss how the models’ results informed about the particular ecology of tsetse in the target area. Maxent predictions allowed optimizing efficiency and cost within our project, and might be useful for other tsetse control campaigns in the framework of the PATTEC and, more generally, other vector or insect pest control programs. PMID:24982143
A Field Trial of Alternative Targeted Screening Strategies for Chagas Disease in Arequipa, Peru
Hunter, Gabrielle C.; Borrini-Mayorí, Katty; Ancca Juárez, Jenny; Castillo Neyra, Ricardo; Verastegui, Manuela R.; Malaga Chavez, Fernando S.; Cornejo del Carpio, Juan Geny; Córdova Benzaquen, Eleazar; Náquira, César; Gilman, Robert H.; Bern, Caryn; Levy, Michael Z.
2012-01-01
Background Chagas disease is endemic in the rural areas of southern Peru and a growing urban problem in the regional capital of Arequipa, population ∼860,000. It is unclear how to implement cost-effective screening programs across a large urban and periurban environment. Methods We compared four alternative screening strategies in 18 periurban communities, testing individuals in houses with 1) infected vectors; 2) high vector densities; 3) low vector densities; and 4) no vectors. Vector data were obtained from routine Ministry of Health insecticide application campaigns. We performed ring case detection (radius of 15 m) around seropositive individuals, and collected data on costs of implementation for each strategy. Results Infection was detected in 21 of 923 (2.28%) participants. Cases had lived more time on average in rural places than non-cases (7.20 years versus 3.31 years, respectively). Significant risk factors on univariate logistic regression for infection were age (OR 1.02; p = 0.041), time lived in a rural location (OR 1.04; p = 0.022), and time lived in an infested area (OR 1.04; p = 0.008). No multivariate model with these variables fit the data better than a simple model including only the time lived in an area with triatomine bugs. There was no significant difference in prevalence across the screening strategies; however a self-assessment of disease risk may have biased participation, inflating prevalence among residents of houses where no infestation was detected. Testing houses with infected-vectors was least expensive. Ring case detection yielded four secondary cases in only one community, possibly due to vector-borne transmission in this community, apparently absent in the others. Conclusions Targeted screening for urban Chagas disease is promising in areas with ongoing vector-borne transmission; however, these pockets of epidemic transmission remain difficult to detect a priori. 
The flexibility to adapt to the epidemiology that emerges during screening is key to an efficient case detection intervention. In heterogeneous urban environments, self-assessments of risk and simple residence history questionnaires may be useful to identify those at highest risk for Chagas disease to guide diagnostic efforts. PMID:22253939
Zhu, K; Lou, Z; Zhou, J; Ballester, N; Kong, N; Parikh, P
2015-01-01
This article is part of the Focus Theme of Methods of Information in Medicine on "Big Data and Analytics in Healthcare". Hospital readmissions raise healthcare costs and cause significant distress to providers and patients. It is, therefore, of great interest to healthcare organizations to predict which patients are at risk of being readmitted to their hospitals. However, current logistic regression based risk prediction models have limited predictive power when applied to hospital administrative data. Meanwhile, although decision trees and random forests have been applied, they tend to be too complex for hospital practitioners to understand. We explore the use of conditional logistic regression to increase the prediction accuracy. We analyzed an HCUP statewide inpatient discharge record dataset, which includes patient demographics, clinical and care utilization data from California. We extracted records of heart failure Medicare beneficiaries who had inpatient experience during an 11-month period. We corrected the data imbalance issue with under-sampling. In our study, we first applied standard logistic regression and a decision tree to obtain influential variables and derive practically meaningful decision rules. We then stratified the original data set accordingly and applied logistic regression to each data stratum. We further explored the effect of interacting variables in the logistic regression modeling. We conducted cross validation to assess the overall prediction performance of conditional logistic regression (CLR) and compared it with standard classification models. The developed CLR models outperformed several standard classification models (e.g., straightforward logistic regression, stepwise logistic regression, random forest, support vector machine). For example, the best CLR model improved the classification accuracy by nearly 20% over the straightforward logistic regression model. 
Furthermore, the developed CLR models tend to achieve a sensitivity more than 10% higher than that of the standard classification models, which translates to the correct labeling of an additional 400 - 500 readmissions for heart failure patients in the state of California over a year. Lastly, several key predictors identified from the HCUP data include the disposition location from discharge, the number of chronic conditions, and the number of acute procedures. It would be beneficial to apply simple decision rules obtained from the decision tree in an ad-hoc manner to guide the cohort stratification. It could be potentially beneficial to explore the effect of pairwise interactions between influential predictors when building the logistic regression models for different data strata. Judicious use of the ad-hoc CLR models developed offers insights into future development of prediction models for hospital readmissions, which can lead to better intuition in identifying high-risk patients and developing effective post-discharge care strategies. Lastly, this paper is expected to raise awareness of the need to collect data on additional markers and to develop the necessary database infrastructure for larger-scale exploratory studies on readmission risk prediction.
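The abstract states that class imbalance was corrected with under-sampling but gives no details. One common variant, random under-sampling of every class down to the size of the smallest one, can be sketched as below. The function name, the `label_of` accessor, the seed and the toy records are all assumptions; this is not the authors' procedure.

```python
import random


def undersample(records, label_of, seed=0):
    """Randomly reduce each class to the size of the smallest class."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    smallest = min(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rng.sample(rows, smallest))
    rng.shuffle(balanced)              # avoid class-ordered output
    return balanced


# Hypothetical discharge records as (record_id, readmitted) pairs
records = [(i, 1) for i in range(3)] + [(i, 0) for i in range(3, 13)]
balanced = undersample(records, label_of=lambda r: r[1])
print(len(balanced))
```

Under-sampling discards majority-class records, so it is usually applied to the training folds only, leaving validation data at its natural class ratio.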
Multi-fidelity Gaussian process regression for prediction of random fields
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parussini, L.; Venturi, D., E-mail: venturi@ucsc.edu; Perdikaris, P.
We propose a new multi-fidelity Gaussian process regression (GPR) approach for prediction of random fields based on observations of surrogate models or hierarchies of surrogate models. Our method builds upon recent work on recursive Bayesian techniques, in particular recursive co-kriging, and extends it to vector-valued fields and various types of covariances, including separable and non-separable ones. The framework we propose is general and can be used to perform uncertainty propagation and quantification in model-based simulations, multi-fidelity data fusion, and surrogate-based optimization. We demonstrate the effectiveness of the proposed recursive GPR techniques through various examples. Specifically, we study the stochastic Burgers equation and the stochastic Oberbeck–Boussinesq equations describing natural convection within a square enclosure. In both cases we find that the standard deviation of the Gaussian predictors as well as the absolute errors relative to benchmark stochastic solutions are very small, suggesting that the proposed multi-fidelity GPR approaches can yield highly accurate results.
The prediction of food additives in the fruit juice based on electronic nose with chemometrics.
Qiu, Shanshan; Wang, Jun
2017-09-01
Food additives are added to products to enhance their taste and to preserve flavor or appearance. While their use should be restricted to achieving a technological benefit, the contents of food additives should also be strictly controlled. In this study, E-nose was applied as an alternative to traditional monitoring technologies for determining two food additives, namely benzoic acid and chitosan. For quantitative monitoring, support vector machine (SVM), random forest (RF), extreme learning machine (ELM) and partial least squares regression (PLSR) were applied to establish regression models between E-nose signals and the amount of food additives in fruit juices. The monitoring models based on ELM and RF reached higher correlation coefficients (R² values) and lower root mean square errors (RMSEs) than models based on PLSR and SVM. This work indicates that E-nose combined with RF or ELM can be a cost-effective, easy-to-build and rapid detection system for food additive monitoring. Copyright © 2017 Elsevier Ltd. All rights reserved.
Prediction of Spirometric Forced Expiratory Volume (FEV1) Data Using Support Vector Regression
NASA Astrophysics Data System (ADS)
Kavitha, A.; Sujatha, C. M.; Ramakrishnan, S.
2010-01-01
In this work, prediction of forced expiratory volume in 1 second (FEV1) in pulmonary function test is carried out using the spirometer and support vector regression analysis. Pulmonary function data are measured with flow volume spirometer from volunteers (N=175) using a standard data acquisition protocol. The acquired data are then used to predict FEV1. Support vector machines with polynomial kernel function with four different orders were employed to predict the values of FEV1. The performance is evaluated by computing the average prediction accuracy for normal and abnormal cases. Results show that support vector machines are capable of predicting FEV1 in both normal and abnormal cases and the average prediction accuracy for normal subjects was higher than that of abnormal subjects. Accuracy in prediction was found to be high for a regularization constant of C=10. Since FEV1 is the most significant parameter in the analysis of spirometric data, it appears that this method of assessment is useful in diagnosing the pulmonary abnormalities with incomplete data and data with poor recording.
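As a rough illustration of the kernel family mentioned in the abstract above, the following sketch evaluates a polynomial kernel at four orders. The abstract does not state which orders, offset, or scaling were used, so the orders 1 to 4, `coef0`, `gamma`, and the two input vectors are all assumptions chosen for illustration only.

```python
def polynomial_kernel(x, y, degree, coef0=1.0, gamma=1.0):
    """Polynomial kernel K(x, y) = (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical feature vectors (not spirometric data from the study).
x = [1.0, 2.0]
y = [0.5, -1.0]

# Assumed orders 1..4; the abstract only says "four different orders".
kernel_values = {d: polynomial_kernel(x, y, d) for d in (1, 2, 3, 4)}
```

Higher orders let the regression capture stronger nonlinearity at the cost of a greater risk of over-fitting, which is one reason the order is usually tuned together with the regularization constant C.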
Zhu, Hongyan; Chu, Bingquan; Fan, Yangyang; Tao, Xiaoya; Yin, Wenxin; He, Yong
2017-08-10
We investigated the feasibility and potentiality of determining firmness, soluble solids content (SSC), and pH in kiwifruits using hyperspectral imaging, combined with variable selection methods and calibration models. The images were acquired by a push-broom hyperspectral reflectance imaging system covering two spectral ranges. Weighted regression coefficients (BW), successive projections algorithm (SPA) and genetic algorithm-partial least square (GAPLS) were compared and evaluated for the selection of effective wavelengths. Moreover, multiple linear regression (MLR), partial least squares regression and least squares support vector machine (LS-SVM) were developed to predict quality attributes quantitatively using effective wavelengths. The established models, particularly SPA-MLR, SPA-LS-SVM and GAPLS-LS-SVM, performed well. The SPA-MLR models for firmness (R pre = 0.9812, RPD = 5.17) and SSC (R pre = 0.9523, RPD = 3.26) at 380-1023 nm showed excellent performance, whereas GAPLS-LS-SVM was the optimal model at 874-1734 nm for predicting pH (R pre = 0.9070, RPD = 2.60). Image processing algorithms were developed to transfer the predictive model in every pixel to generate prediction maps that visualize the spatial distribution of firmness and SSC. Hence, the results clearly demonstrated that hyperspectral imaging has the potential as a fast and non-invasive method to predict the quality attributes of kiwifruits.
NASA Astrophysics Data System (ADS)
Das, Bappa; Sahoo, Rabi N.; Pargal, Sourabh; Krishna, Gopal; Verma, Rakesh; Chinnusamy, Viswanathan; Sehgal, Vinay K.; Gupta, Vinod K.; Dash, Sushanta K.; Swain, Padmini
2018-03-01
In the present investigation, the changes in sucrose, reducing and total sugar content due to water-deficit stress in rice leaves were modeled using visible, near infrared (VNIR) and shortwave infrared (SWIR) spectroscopy. The objectives of the study were to identify the best vegetation indices and suitable multivariate technique based on precise analysis of hyperspectral data (350 to 2500 nm) and sucrose, reducing sugar and total sugar content measured at different stress levels from 16 different rice genotypes. Spectral data analysis was done to identify suitable spectral indices and models for sucrose estimation. Novel spectral indices in near infrared (NIR) range viz. ratio spectral index (RSI) and normalised difference spectral indices (NDSI) sensitive to sucrose, reducing sugar and total sugar content were identified which were subsequently calibrated and validated. The RSI and NDSI models had R2 values of 0.65, 0.71 and 0.67; RPD values of 1.68, 1.95 and 1.66 for sucrose, reducing sugar and total sugar, respectively for validation dataset. Different multivariate spectral models such as artificial neural network (ANN), multivariate adaptive regression splines (MARS), multiple linear regression (MLR), partial least square regression (PLSR), random forest regression (RFR) and support vector machine regression (SVMR) were also evaluated. The best performing multivariate models for sucrose, reducing sugars and total sugars were found to be, MARS, ANN and MARS, respectively with respect to RPD values of 2.08, 2.44, and 1.93. Results indicated that VNIR and SWIR spectroscopy combined with multivariate calibration can be used as a reliable alternative to conventional methods for measurement of sucrose, reducing sugars and total sugars of rice under water-deficit stress as this technique is fast, economic, and noninvasive.
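The index types and the RPD statistic named in the abstract above can be sketched with their commonly used definitions. This is a minimal sketch, assuming RSI is a simple two-band reflectance ratio, NDSI is the normalised difference of the same two bands, and RPD is the sample standard deviation of the reference values divided by the RMSE of prediction; the abstract does not give the band positions or the exact RPD convention, and all numbers below are hypothetical.

```python
import math
import statistics


def rsi(r_i, r_j):
    """Ratio spectral index of two band reflectances."""
    return r_i / r_j


def ndsi(r_i, r_j):
    """Normalised difference spectral index of two band reflectances."""
    return (r_i - r_j) / (r_i + r_j)


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD(observed) / RMSE(prediction)."""
    rmse = math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )
    return statistics.stdev(observed) / rmse


# Hypothetical reflectances at two NIR wavelengths.
example_rsi = rsi(0.45, 0.30)
example_ndsi = ndsi(0.45, 0.30)

# Hypothetical reference and predicted sugar contents.
example_rpd = rpd([1.0, 2.0, 3.0, 4.0, 5.0], [1.5, 2.5, 2.5, 4.5, 4.5])
```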
Stratton, Margaret D.; Ehrlich, Hanna Y.; Mor, Siobhan M.; Naumova, Elena N.
2017-01-01
Ross River virus (RRV), Barmah Forest virus (BFV), and dengue are three common mosquito-borne diseases in Australia that display notable seasonal patterns. Although all three diseases have been modeled on localized scales, no previous study has used harmonic models to compare seasonality of mosquito-borne diseases on a continent-wide scale. We fit Poisson harmonic regression models to surveillance data on RRV, BFV, and dengue (from 1993, 1995 and 1991, respectively, through 2015) incorporating seasonal, trend, and climate (temperature and rainfall) parameters. The models captured an average of 50–65% variability of the data. Disease incidence for all three diseases generally peaked in January or February, but peak timing was most variable for dengue. The most significant predictor parameters were trend and inter-annual periodicity for BFV, intra-annual periodicity for RRV, and trend for dengue. We found that a Temperature Suitability Index (TSI), designed to reclassify climate data relative to optimal conditions for vector establishment, could be applied to this context. Finally, we extrapolated our models to estimate the impact of a false-positive BFV epidemic in 2013. Creating these models and comparing variations in periodicities may provide insight into historical outbreaks as well as future patterns of mosquito-borne diseases. PMID:28071683
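To make the link between a harmonic regression and peak timing concrete, the sketch below assumes a single annual harmonic in the log-linear predictor, log(mu_t) = b0 + b_sin*sin(2*pi*t/T) + b_cos*cos(2*pi*t/T), with t in months from the start of the year. This form, the coefficient names, and the time origin are assumptions; the abstract does not give the exact model specification.

```python
import math


def peak_timing(beta_sin, beta_cos, period=12.0):
    """Return (amplitude, time of peak) for one harmonic term.

    Uses b_sin*sin(wt) + b_cos*cos(wt) = A*cos(wt - phi), where
    A = hypot(b_sin, b_cos) and phi = atan2(b_sin, b_cos).
    """
    amplitude = math.hypot(beta_sin, beta_cos)
    phase = math.atan2(beta_sin, beta_cos)
    peak = (phase * period / (2 * math.pi)) % period
    return amplitude, peak


# Hypothetical coefficients: a pure sine term peaks a quarter period in.
amp_a, peak_a = peak_timing(1.0, 0.0)
# A pure cosine term peaks at the time origin.
amp_b, peak_b = peak_timing(0.0, 1.0)
# A negative sine term peaks three quarters of the way through the period.
amp_c, peak_c = peak_timing(-1.0, 0.0)
```

Comparing the estimated peak across diseases or regions is one simple way to summarise differences in seasonality once the coefficients have been fitted.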
An Enhanced MEMS Error Modeling Approach Based on Nu-Support Vector Regression
Bhatt, Deepak; Aggarwal, Priyanka; Bhattacharya, Prabir; Devabhaktuni, Vijay
2012-01-01
Micro Electro Mechanical System (MEMS)-based inertial sensors have made possible the development of a civilian land vehicle navigation system by offering a low-cost solution. However, the accurate modeling of the MEMS sensor errors is one of the most challenging tasks in the design of low-cost navigation systems. These sensors exhibit significant errors such as biases, drift, and noise, which are negligible for higher grade units. Conventional techniques based on the Gauss Markov (GM) model and neural network methods have previously been used to model the errors. However, the Gauss Markov model works unsatisfactorily in the case of MEMS units due to the presence of high inherent sensor errors. On the other hand, modeling the random drift utilizing a Neural Network (NN) is time consuming, thereby affecting its real-time implementation. We overcome these existing drawbacks by developing an enhanced Support Vector Machine (SVM) based error model. Unlike NNs, SVMs do not suffer from local minimisation or over-fitting problems and deliver a reliable global solution. Experimental results proved that the proposed SVM approach reduced the noise standard deviation by 10–35% for gyroscopes and 61–76% for accelerometers. Further, positional error drifts under static conditions improved by 41% and 80% in comparison to NN and GM approaches. PMID:23012552
Zhang, Yong-Hong; Xia, Zhi-Ning; Qin, Li-Tang; Liu, Shu-Shen
2010-09-01
The objective of this paper is to build a reliable model based on the molecular electronegativity distance vector (MEDV) descriptors for predicting the blood-brain barrier (BBB) permeability and to reveal the effects of the molecular structural segments on the BBB permeability. Using 70 structurally diverse compounds, the partial least squares regression (PLSR) models between the BBB permeability and the MEDV descriptors were developed and validated by the variable selection and modeling based on prediction (VSMP) technique. The estimation ability, stability, and predictive power of a model are evaluated by the estimated correlation coefficient (r), leave-one-out (LOO) cross-validation correlation coefficient (q), and predictive correlation coefficient (R(p)). It has been found that PLSR model has good quality, r=0.9202, q=0.7956, and R(p)=0.6649 for M1 model based on the training set of 57 samples. To search the most important structural factors affecting the BBB permeability of compounds, we performed the values of the variable importance in projection (VIP) analysis for MEDV descriptors. It was found that some structural fragments in compounds, such as -CH(3), -CH(2)-, =CH-, =C, triple bond C-, -CH<, =C<, =N-, -NH-, =O, and -OH, are the most important factors affecting the BBB permeability. (c) 2010. Published by Elsevier Inc.
NASA Technical Reports Server (NTRS)
Bean, W. C.
1971-01-01
Comparison of two-impulse and three-impulse orbital transfer, using data from a 63-case numerical study. For each case investigated for which coplanarity of the regressing assembly parking ellipse was attained with the target asymptotic velocity vector, a two-impulse maneuver (or a one-impulse equivalent) was found for which the velocity expenditure was within 1% of a reference absolute minimum lower bound. Therefore, for the coplanar cases, use of a minimum delta-V three-impulse maneuver afforded scant improvement in velocity penalty. However, as the noncoplanarity of the parking ellipse and the target asymptotic velocity vector increased, there was a significant increase in the superiority of minimum delta-V three-impulse maneuvers for slowing the growth of velocity expenditure. It is concluded that a multiple-impulse maneuver should be contemplated if nonnominal launch conditions could occur.
Speech Signal and Facial Image Processing for Obstructive Sleep Apnea Assessment
Espinoza-Cuadros, Fernando; Fernández-Pozo, Rubén; Toledano, Doroteo T.; Alcázar-Ramírez, José D.; López-Gonzalo, Eduardo; Hernández-Gómez, Luis A.
2015-01-01
Obstructive sleep apnea (OSA) is a common sleep disorder characterized by recurring breathing pauses during sleep caused by a blockage of the upper airway (UA). OSA is generally diagnosed through a costly procedure requiring an overnight stay of the patient at the hospital. This has led to proposing less costly procedures based on the analysis of patients' facial images and voice recordings to help in OSA detection and severity assessment. In this paper we investigate the use of both image and speech processing to estimate the apnea-hypopnea index, AHI (which describes the severity of the condition), over a population of 285 male Spanish subjects suspected to suffer from OSA and referred to a Sleep Disorders Unit. Photographs and voice recordings were collected in a supervised but not highly controlled way trying to test a scenario close to an OSA assessment application running on a mobile device (i.e., smartphones or tablets). Spectral information in speech utterances is modeled by a state-of-the-art low-dimensional acoustic representation, called i-vector. A set of local craniofacial features related to OSA are extracted from images after detecting facial landmarks using Active Appearance Models (AAMs). Support vector regression (SVR) is applied on facial features and i-vectors to estimate the AHI. PMID:26664493
Dai, C; Cai, X H; Cai, Y P; Guo, H C; Sun, W; Tan, Q; Huang, G H
2014-06-01
This research developed a simulation-aided nonlinear programming model (SNPM). This model incorporated the consideration of pollutant dispersion modeling, and the management of coal blending and the related human health risks within a general modeling framework. In SNPM, the simulation effort (i.e., California puff [CALPUFF]) was used to forecast the fate of air pollutants for quantifying the health risk under various conditions, while the optimization studies were to identify the optimal coal blending strategies from a number of alternatives. To solve the model, a surrogate-based indirect search approach was proposed, where the support vector regression (SVR) was used to create a set of easy-to-use and rapid-response surrogates for identifying the functional relationships between coal-blending operating conditions and health risks. Through replacing the CALPUFF and the corresponding hazard quotient equation with the surrogates, the computation efficiency could be improved. The developed SNPM was applied to minimize the human health risk associated with air pollutants discharged from Gaojing and Shijingshan power plants in the west of Beijing. Solution results indicated that it could be used for reducing the health risk of the public in the vicinity of the two power plants, identifying desired coal blending strategies for decision makers, and considering a proper balance between coal purchase cost and human health risk. A simulation-aided nonlinear programming model (SNPM) is developed. It integrates the advantages of CALPUFF and nonlinear programming model. To solve the model, a surrogate-based indirect search approach based on the combination of support vector regression and genetic algorithm is proposed. SNPM is applied to reduce the health risk caused by air pollutants discharged from Gaojing and Shijingshan power plants in the west of Beijing. Solution results indicate that it is useful for generating coal blending schemes, reducing the health risk of the public, and reflecting the trade-off between coal purchase cost and health risk.
Santos, Frédéric; Guyomarc'h, Pierre; Bruzek, Jaroslav
2014-12-01
Accuracy of identification tools in forensic anthropology primarily relies upon the variations inherent in the data upon which they are built. Sex determination methods based on craniometrics are widely used and known to be specific to several factors (e.g. sample distribution, population, age, secular trends, measurement technique, etc.). The goal of this study is to discuss the potential variations linked to the statistical treatment of the data. Traditional craniometrics of four samples extracted from documented osteological collections (from Portugal, France, the U.S.A., and Thailand) were used to test three different classification methods: linear discriminant analysis (LDA), logistic regression (LR), and support vector machines (SVM). The Portuguese sample was set as a training model on which the other samples were applied in order to assess the validity and reliability of the different models. The tests were performed using different parameters: some included the selection of the best predictors; some included a strict decision threshold (sex assessed only if the related posterior probability was high, including the notion of indeterminate result); and some used an unbalanced sex-ratio. Results indicated that LR tends to perform slightly better than the other techniques and offers a better selection of predictors. Also, the use of a decision threshold (i.e. p>0.95) is essential to ensure an acceptable reliability of sex determination methods based on craniometrics. Although the Portuguese, French, and American samples share a similar sexual dimorphism, application of Western models on the Thai sample (that displayed a lower degree of dimorphism) was unsuccessful. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
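The strict decision threshold described in the abstract above can be sketched as a three-way rule. This is a minimal sketch, assuming a two-class model that outputs a posterior probability for one sex; the function name, the label strings, and the example probabilities are hypothetical, and only the 0.95 threshold comes from the abstract.

```python
def assess_sex(p_male, threshold=0.95):
    """Assign a sex only when the posterior probability exceeds the threshold.

    p_male is the model's posterior probability that the individual is male;
    the posterior for female is taken as 1 - p_male (two-class assumption).
    """
    if p_male > threshold:
        return "male"
    if 1.0 - p_male > threshold:
        return "female"
    return "indeterminate"


# Hypothetical posterior probabilities.
results = [assess_sex(p) for p in (0.97, 0.02, 0.80)]
```

The trade-off is that a stricter threshold raises the reliability of the assessments that are made while increasing the share of individuals left indeterminate.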
Reiter, M.E.; Lapointe, D.A.
2007-01-01
Mosquito-borne avian diseases, principally avian malaria (Plasmodium relictum Grassi and Feletti) and avian pox (Avipoxvirus sp.) have been implicated as the key limiting factor associated with recent declines of endemic avifauna in the Hawaiian Island archipelago. We present data on the relative abundance, infection status, and spatial distribution of the primary mosquito vector Culex quinquefasciatus Say (Diptera: Culicidae) across a mixed, residential-agricultural community adjacent to Hawai'i Volcanoes National Park on Hawai'i Island. We modeled the effect of agriculture and forest fragmentation in determining relative abundance of adult Cx. quinquefasciatus in Volcano Village, and we implement our statistical model in a geographic information system to generate a probability of mosquito capture prediction surface for the study area. Our model was based on biweekly captures of adult mosquitoes from 20 locations within Volcano Village from October 2001 to April 2003. We used mixed effects logistic regression to model the probability of capturing a mosquito, and we developed a set of 17 competing models a priori to specifically evaluate the effect of agriculture and fragmentation (i.e., residential landscapes) at two spatial scales. In total, 2,126 mosquitoes were captured in CO2-baited traps with an average probability of 0.27 (SE = 0.10) of capturing one or more mosquitoes per trap night. Twelve percent of mosquitoes captured were infected with P. relictum. Our data indicate that agricultural lands and forest fragmentation significantly increase the probability of mosquito capture. The prediction surface identified areas along the Hawai'i Volcanoes National Park boundary that may have high relative abundance of the vector. Our data document the potential of avian malaria transmission in residential-agricultural landscapes and support the need for vector management that extends beyond reserve boundaries and considers a reserve's spatial position in a highly heterogeneous landscape.
DOA Finding with Support Vector Regression Based Forward-Backward Linear Prediction.
Pan, Jingjing; Wang, Yide; Le Bastard, Cédric; Wang, Tianzhen
2017-05-27
Direction-of-arrival (DOA) estimation has drawn considerable attention in array signal processing, particularly with coherent signals and a limited number of snapshots. Forward-backward linear prediction (FBLP) is able to directly deal with coherent signals. Support vector regression (SVR) is robust with small samples. This paper proposes the combination of the advantages of FBLP and SVR in the estimation of DOAs of coherent incoming signals with low snapshots. The performance of the proposed method is validated with numerical simulations in coherent scenarios, in terms of different angle separations, numbers of snapshots, and signal-to-noise ratios (SNRs). Simulation results show the effectiveness of the proposed method.
Song, Kai; Wang, Qi; Liu, Qi; Zhang, Hongquan; Cheng, Yingguo
2011-01-01
This paper describes the design and implementation of a wireless electronic nose (WEN) system which can detect the combustible gases methane and hydrogen (CH4/H2) online and estimate their concentrations, either singly or in mixtures. The system is composed of two wireless sensor nodes: a slave node and a master node. The former comprises a Fe2O3 gas sensing array for the combustible gas detection, a digital signal processor (DSP) system for real-time sampling and processing of the sensor array data, and a wireless transceiver unit (WTU) by which the detection results can be transmitted to the master node connected with a computer. A type of Fe2O3 gas sensor insensitive to humidity is developed for resistance to environmental influences. A threshold-based least square support vector regression (LS-SVR) estimator is implemented on a DSP for classification and concentration measurements. Experimental results confirm that LS-SVR produces higher accuracy compared with artificial neural networks (ANNs) and a faster convergence rate than the standard support vector regression (SVR). The designed WEN system effectively achieves gas mixture analysis in a real-time process. PMID:22346587
NASA Technical Reports Server (NTRS)
Beck, Louisa R.; Rodriquez, Mario H.; Dister, Sheri W.; Rodriquez, Americo D.; Rejmankova, Eliska; Ulloa, Armando; Meza, Rosa A.; Roberts, Donald R.; Paris, Jack F.; Spanner, Michael A.;
1994-01-01
A landscape approach using remote sensing and Geographic Information System (GIS) technologies was developed to discriminate between villages at high and low risk for malaria transmission, as defined by adult Anopheles albimanus abundance. Satellite data for an area in southern Chiapas, Mexico were digitally processed to generate a map of landscape elements. The GIS processes were used to determine the proportion of mapped landscape elements surrounding 40 villages where An. albimanus data had been collected. The relationships between vector abundance and landscape element proportions were investigated using stepwise discriminant analysis and stepwise linear regression. Both analyses indicated that the most important landscape elements in terms of explaining vector abundance were transitional swamp and unmanaged pasture. Discriminant functions generated for these two elements were able to correctly distinguish between villages with high and low vector abundance, with an overall accuracy of 90%. Regression results found both transitional swamp and unmanaged pasture proportions to be predictive of vector abundance during the mid-to-late wet season. This approach, which integrates remotely sensed data and GIS capabilities to identify villages with high vector-human contact risk, provides a promising tool for malaria surveillance programs that depend on labor-intensive field techniques. This is particularly relevant in areas where the lack of accurate surveillance capabilities may result in no malaria control action when, in fact, directed action is necessary. In general, this landscape approach could be applied to other vector-borne diseases in areas where: 1. the landscape elements critical to vector survival are known and 2. these elements can be detected at remote sensing scales.
Novel SHM method to locate damages in substructures based on VARX models
NASA Astrophysics Data System (ADS)
Ugalde, U.; Anduaga, J.; Martínez, F.; Iturrospe, A.
2015-07-01
A novel damage localization method is proposed, which is based on a substructuring approach and makes use of Vector Auto-Regressive with eXogenous input (VARX) models. The substructuring approach aims to divide the monitored structure into several multi-DOF isolated substructures. Each individual substructure is then modelled as a VARX model, and the health of each substructure is determined by analyzing the variation of the VARX model. The method makes it possible to detect whether the isolated substructure is damaged, and also to locate and quantify the damage within the substructure. It is not necessary to have a theoretical model of the structure, and only the measured displacement data are required to estimate the isolated substructure's VARX model. The proposed method is validated by simulations of a two-dimensional lattice structure.
Pre-operative prediction of surgical morbidity in children: comparison of five statistical models.
Cooper, Jennifer N; Wei, Lai; Fernandez, Soledad A; Minneci, Peter C; Deans, Katherine J
2015-02-01
The accurate prediction of surgical risk is important to patients and physicians. Logistic regression (LR) models are typically used to estimate these risks. However, in the fields of data mining and machine learning, many alternative classification and prediction algorithms have been developed. This study aimed to compare the performance of LR to several data mining algorithms for predicting 30-day surgical morbidity in children. We used the 2012 National Surgical Quality Improvement Program-Pediatric dataset to compare the performance of (1) an LR model that assumed linearity and additivity (simple LR model), (2) an LR model incorporating restricted cubic splines and interactions (flexible LR model), (3) a support vector machine, (4) a random forest, and (5) boosted classification trees for predicting surgical morbidity. The ensemble-based methods showed significantly higher accuracy, sensitivity, specificity, PPV, and NPV than the simple LR model. However, none of the models performed better than the flexible LR model in terms of the aforementioned measures or in model calibration or discrimination. Support vector machines, random forests, and boosted classification trees do not show better performance than LR for predicting pediatric surgical morbidity. After further validation, the flexible LR model derived in this study could be used to assist with clinical decision-making based on patient-specific surgical risks. Copyright © 2014 Elsevier Ltd. All rights reserved.
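The five performance measures named in the abstract above all follow from a 2x2 confusion matrix. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for a rare outcome (50 events among 1000 patients).
metrics = classification_metrics(tp=30, fp=20, tn=930, fn=20)
```

With a rare outcome such as morbidity, accuracy can look high even when sensitivity is modest, which is one reason several measures are reported side by side.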
Stochastic Parametrization for the Impact of Neglected Variability Patterns
NASA Astrophysics Data System (ADS)
Kaiser, Olga; Hien, Steffen; Achatz, Ulrich; Horenko, Illia
2017-04-01
An efficient description of the gravity wave variability and the related spontaneous emission processes requires an empirical stochastic closure for the impact of neglected variability patterns (subgridscales or SGS). In particular, we focus on the analysis of the IGW emission within a tangent linear model which requires a stochastic SGS parameterization for taking the self interaction of the ageostrophic flow components into account. For this purpose, we identify the best SGS model in terms of exactness and simplicity by deploying a wide range of different data-driven model classes, including standard stationary regression models, autoregression and artificial neural network models, as well as the family of nonstationary models such as the FEM-BV-VARX model class (Finite Element based vector autoregressive time series analysis with bounded variation of the model parameters). The models are used to investigate the main characteristics of the underlying dynamics and to explore the significant spatial and temporal neighbourhood dependencies. The best SGS model in terms of exactness and simplicity is obtained for the nonstationary FEM-BV-VARX setting, which identifies only the direct spatial and temporal neighbourhood as significant and drastically reduces the amount of information required for the optimal SGS. Additionally, the models are characterized by sets of vector- and matrix-valued parameters that must be inferred from big data sets provided by simulations, making it a task that cannot be solved without deploying high-performance computing (HPC) facilities.
REJEKI, Dwi Sarwani Sri; NURHAYATI, Nunung; AJI, Budi; MURHANDARWATI, E. Elsa Herdiana; KUSNANTO, Hari
2018-01-01
Background: Climatic and weather factors are important determinants of the transmission of vector-borne diseases such as malaria. This study aimed to examine the relationships between weather factors and malaria cases in endemic areas of Purworejo during 2005–2014, taking human migration and previous case findings into account. Methods: This study employed ecological time series analysis using monthly data. The independent variables were the maximum temperature, minimum temperature, maximum humidity, minimum humidity, precipitation, human migration, and previous malaria cases, while the dependent variable was positive malaria cases. Three count data regression models, i.e., the Poisson, quasi-Poisson, and negative binomial models, were applied to measure the relationship. The lowest Akaike Information Criterion (AIC) value was used to find the best model. Negative binomial regression was considered the best model. Results: The model showed that humidity (lag 2), precipitation (lag 3), precipitation (lag 12), migration (lag 1), and previous malaria cases (lag 12) had a significant relationship with malaria cases. Conclusion: Weather, migration, and previous malaria cases need to be considered as prominent indicators when projecting increases in malaria cases. PMID:29900134
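The AIC-based comparison mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis: it fits only an intercept-only Poisson model (whose maximum-likelihood rate is the sample mean), computes AIC as 2k − 2 ln L, and reports the variance-to-mean ratio, an informal indicator of the overdispersion that motivates quasi-Poisson or negative binomial models. The monthly counts and all function names are hypothetical.

```python
import math

def poisson_loglik(counts, lam):
    """Log-likelihood of independent Poisson(lam) observations."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)

def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * loglik

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; values far above 1 suggest overdispersion."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((k - mean) ** 2 for k in counts) / (n - 1)
    return variance / mean

# Hypothetical monthly case counts, not data from the study.
monthly_cases = [3, 0, 12, 1, 25, 2, 0, 7, 4, 18, 1, 0]
rate = sum(monthly_cases) / len(monthly_cases)  # Poisson MLE, intercept-only model
print("AIC (Poisson, 1 parameter):", round(aic(poisson_loglik(monthly_cases, rate), 1), 2))
print("variance/mean ratio:", round(dispersion_ratio(monthly_cases), 2))
```

In a full analysis, the same AIC function would be applied to the maximized log-likelihood of each candidate model with its own parameter count, and the model with the lowest value retained.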
Shahlaei, M.; Saghaie, L.
2014-01-01
A quantitative structure–activity relationship (QSAR) study is proposed for the prediction of the biological activity (pIC50) of 3,4-dihydropyrido[3,2-d]pyrimidone derivatives as p38 inhibitors. The biological activities of the compounds of interest were modeled as a function of molecular structure by means of principal component analysis (PCA) and least squares support vector machine (LS-SVM) methods. The results showed that the pIC50 values calculated by LS-SVM are in good agreement with the experimental data, and the performance of the LS-SVM regression model is superior to that of the PCA-based model. The developed LS-SVM model was applied to predict the biological activities of pyrimidone derivatives that were not used in the modeling procedure. The resulting model showed high prediction ability, with a root mean square error of prediction of 0.460 for LS-SVM. The study provided a novel and effective approach for predicting the biological activities of 3,4-dihydropyrido[3,2-d]pyrimidone derivatives as p38 inhibitors and showed that LS-SVM can be used as a powerful chemometrics tool for QSAR studies. PMID:26339262
Li, Hang; Wang, Maolin; Gong, Ya-Nan; Yan, Aixia
2016-01-01
β-secretase (BACE1) is an aspartyl protease that is considered a novel and vital target in Alzheimer's disease therapy. We collected a data set of 294 BACE1 inhibitors and built six classification models to discriminate active and weakly active inhibitors using Kohonen's Self-Organizing Map (SOM) method and the Support Vector Machine (SVM) method. Each molecular descriptor was calculated using the program ADRIANA.Code. We adopted two different methods, random selection and the Self-Organizing Map method, for the training/test set split. The descriptors were selected by F-score and stepwise linear regression analysis. The best SVM model, Model 2C, showed good prediction performance on the test set, with a prediction accuracy, sensitivity (SE), specificity (SP), and Matthews correlation coefficient (MCC) of 89.02%, 90%, 88%, and 0.78, respectively. Model 1A is the best SOM model, whose accuracy and MCC on the test set were 94.57% and 0.98, respectively. Descriptors related to lone pair electronegativity and polarizability contributed importantly to the bioactivity of BACE1 inhibitors. The Extended-Connectivity Fingerprints (ECFP_4) analysis found some key substructural features, which could be helpful for further drug design research. The SOM and SVM models built in this study can be obtained from the authors by email or other contacts.
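For readers unfamiliar with the reported measures, the following minimal sketch computes accuracy, sensitivity, specificity, and the Matthews correlation coefficient from a binary confusion matrix using their standard definitions. The counts are hypothetical and are not taken from the study.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification measures from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate (SE)
    specificity = tn / (tn + fp)  # true negative rate (SP)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "SE": sensitivity, "SP": specificity, "MCC": mcc}

# Hypothetical test-set counts for an active vs. weakly active classifier.
print(classification_metrics(tp=9, fp=2, tn=8, fn=1))
```

MCC ranges from −1 to 1 and uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance than accuracy alone.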
Brugger, Katharina; Rubel, Franz
2013-01-01
Bluetongue is an arboviral disease of ruminants causing significant economic losses. Our risk assessment is based on the epidemiological key parameter, the basic reproduction number. It is defined as the number of secondary cases caused by one primary case in a fully susceptible host population, in which values greater than one indicate the possibility, i.e., the risk, for a major disease outbreak. In the course of the Bluetongue virus serotype 8 (BTV-8) outbreak in Europe in 2006 we developed such a risk assessment for the University of Veterinary Medicine Vienna, Austria. Basic reproduction numbers were calculated using a well-known formula for vector-borne diseases considering the population densities of hosts (cattle and small ruminants) and vectors (biting midges of the Culicoides obsoletus spp.) as well as temperature dependent rates. The latter comprise the biting and mortality rate of midges as well as the reciprocal of the extrinsic incubation period. Most important, but generally unknown, is the spatio-temporal distribution of the vector density. Therefore, we established a continuously operating daily monitoring to quantify the seasonal cycle of the vector population by a statistical model. We used cross-correlation maps and Poisson regression to describe vector densities by environmental temperature and precipitation. Our results comprise time series of observed and simulated Culicoides obsoletus spp. counts as well as basic reproduction numbers for the period 2009–2011. For a spatio-temporal risk assessment we projected our results from the location of Vienna to the entire region of Austria. We compiled both daily maps of vector densities and the basic reproduction numbers, respectively. Basic reproduction numbers above one were generally found between June and August except in the mountainous regions of the Alps. The highest values coincide with the locations of confirmed BTV cases. PMID:23560090
Mwangangi, Joseph M; Mbogo, Charles M; Orindi, Benedict O; Muturi, Ephantus J; Midega, Janet T; Nzovu, Joseph; Gatakaa, Hellen; Githure, John; Borgemeister, Christian; Keating, Joseph; Beier, John C
2013-01-08
Over the past 20 years, numerous studies have investigated the ecology and behaviour of malaria vectors and Plasmodium falciparum malaria transmission on the coast of Kenya. Substantial progress has been made to control vector populations and reduce high malaria prevalence and severe disease. The goal of this paper was to examine trends over the past 20 years in Anopheles species composition, density, blood-feeding behaviour, and P. falciparum sporozoite transmission along the coast of Kenya. Using data collected from 1990 to 2010, vector density, species composition, blood-feeding patterns, and malaria transmission intensity were examined along the Kenyan coast. Mosquitoes were identified to species, based on morphological characteristics and DNA extracted from Anopheles gambiae for amplification. Using negative binomial generalized estimating equations, mosquito abundance over the period was modelled while adjusting for season. A multiple logistic regression model was used to analyse the sporozoite rates. Results show that in some areas along the Kenyan coast, Anopheles arabiensis and Anopheles merus have replaced An. gambiae sensu stricto (s.s.) and Anopheles funestus as the major mosquito species. Further, there has been a shift from human to animal feeding for both An. gambiae sensu lato (s.l.) (99% to 16%) and An. funestus (100% to 3%), and P. falciparum sporozoite rates have significantly declined over the last 20 years, with the lowest sporozoite rates being observed in 2007 (0.19%) and 2008 (0.34%). There has been, on average, a significant reduction in the abundance of An. gambiae s.l. over the years (IRR = 0.94, 95% CI 0.90-0.98), with the density standing at a low average of 0.006 mosquitoes/house in the year 2010. Reductions in the densities of the major malaria vectors and a shift from human to animal feeding have contributed to the decreased burden of malaria along the Kenyan coast.
Vector species composition remains heterogeneous but in many areas An. arabiensis has replaced An. gambiae as the major malaria vector. This has important implications for malaria epidemiology and control given that this vector predominately rests and feeds on humans outdoors. Strategies for vector control need to continue focusing on tools for protecting residents inside houses but additionally employ outdoor control tools because these are essential for further reducing the levels of malaria transmission.
Face aging effect simulation model based on multilayer representation and shearlet transform
NASA Astrophysics Data System (ADS)
Li, Yuancheng; Li, Yan
2017-09-01
In order to extract detailed facial features, we build a face aging effect simulation model based on multilayer representation and the shearlet transform. The face is divided into three layers, the global layer, the local feature layer, and the texture layer, and an aging model is established separately for each. First, the training samples are classified by age group, and we use an active appearance model (AAM) at the global level to obtain facial features. The regression equations of shape and texture with age are obtained by fitting support vector machine regression with a radial basis function kernel. We use AAM to simulate the aging of facial organs. Then, for the texture detail layer, we acquire the significant high-frequency characteristic components of the face by using the multiscale shearlet transform. Finally, we obtain the final simulated aging images of the human face with a fusion algorithm. Experiments are carried out on the FG-NET dataset, and the experimental results show that the simulated face images differ little from the original images and achieve a good face aging simulation effect.
NASA Astrophysics Data System (ADS)
Mohan, Dhanya; Kumar, C. Santhosh
2016-03-01
Predicting the physiological condition (normal/abnormal) of a patient is highly desirable to enhance the quality of health care. Multi-parameter patient monitors (MPMs) using heart rate, arterial blood pressure, respiration rate, and oxygen saturation (SpO2) as input parameters were developed to monitor the condition of patients with minimum human resource utilization. The support vector machine (SVM), an advanced machine learning approach popularly used for classification and regression, is used for the realization of MPMs. To make MPMs cost-effective, we experiment with a hardware implementation of the MPM using a support vector machine classifier. The system is trained in the MATLAB environment, and the detection of the alarm/no-alarm condition is implemented in hardware. We used different kernels for SVM classification and note that the best performance was obtained using the intersection kernel SVM (IKSVM). The intersection kernel SVM classifier MPM outperformed the best known MPM using a radial basis function kernel by an absolute improvement of 2.74% in accuracy, 1.86% in sensitivity, and 3.01% in specificity. The hardware model was developed from the better-performing system using the Verilog hardware description language and was implemented on an Altera Cyclone II development board.
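The two kernels compared in this abstract can be written down compactly. The sketch below uses the standard definitions: the histogram intersection kernel as a sum of element-wise minima (for non-negative features) and the radial basis function kernel as exp(−γ‖x − y‖²). The feature vectors, their scaling to [0, 1], and the helper names are assumptions for illustration and do not reproduce the authors' system.

```python
import math

def intersection_kernel(x, y):
    """Histogram intersection kernel: sum of element-wise minima (non-negative features)."""
    return sum(min(a, b) for a, b in zip(x, y))

def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def gram_matrix(samples, kernel):
    """Pairwise kernel values, as consumed by an SVM trained on a precomputed kernel."""
    return [[kernel(a, b) for b in samples] for a in samples]

# Hypothetical inputs scaled to [0, 1]: heart rate, blood pressure, respiration rate, SpO2.
samples = [[0.55, 0.60, 0.30, 0.97], [0.90, 0.40, 0.70, 0.85]]
for row in gram_matrix(samples, intersection_kernel):
    print(row)
```

The intersection kernel needs only comparisons and additions, with no exponential, which is one plausible reason it suits a hardware implementation; the abstract itself does not state this motivation.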
Zhao, Yangbing; Moon, Edmund; Carpenito, Carmine; Paulos, Chrystal M; Liu, Xiaojun; Brennan, Andrea L; Chew, Anne; Carroll, Richard G; Scholler, John; Levine, Bruce L; Albelda, Steven M; June, Carl H
2010-11-15
Redirecting T lymphocyte antigen specificity by gene transfer can provide large numbers of tumor-reactive T lymphocytes for adoptive immunotherapy. However, safety concerns associated with viral vector production have limited clinical application of T cells expressing chimeric antigen receptors (CAR). T lymphocytes can be gene modified by RNA electroporation without integration-associated safety concerns. To establish a safe platform for adoptive immunotherapy, we first optimized the vector backbone for RNA in vitro transcription to achieve high-level transgene expression. CAR expression and function of RNA-electroporated T cells could be detected up to a week after electroporation. Multiple injections of RNA CAR-electroporated T cells mediated regression of large vascularized flank mesothelioma tumors in NOD/scid/γc(-/-) mice. Dramatic tumor reduction also occurred when the preexisting intraperitoneal human-derived tumors, which had been growing in vivo for >50 days, were treated by multiple injections of autologous human T cells electroporated with anti-mesothelin CAR mRNA. This is the first report using matched patient tumor and lymphocytes showing that autologous T cells from cancer patients can be engineered to provide an effective therapy for a disseminated tumor in a robust preclinical model. Multiple injections of RNA-engineered T cells are a novel approach for adoptive cell transfer, providing a flexible platform for the treatment of cancer that may complement the use of retroviral and lentiviral engineered T cells. This approach may increase the therapeutic index of T cells engineered to express powerful activation domains without the associated safety concerns of integrating viral vectors. PMID:20926399
Ramilo, David W; Nunes, Telmo; Madeira, Sara; Boinas, Fernando; da Fonseca, Isabel Pereira
2017-01-01
Vector-borne diseases are not only responsible for a burden on human health-care systems but are also known to cause economic constraints on livestock and animal production. Animals are affected directly by the transmitted pathogens and indirectly when animal movement is restricted. Distribution of such diseases depends on climatic and social factors, namely, environmental changes, globalization, trade and unplanned urbanization. Culicoides biting midges are responsible for the transmission of several pathogenic agents with relevant economic impact. Due to fragmentary knowledge of their ecology, their occurrence is difficult to predict, which limits the control of these arthropod vectors. In order to understand the distribution of Culicoides species in mainland Portugal, data collected during the National Entomologic Surveillance Program for Bluetongue disease (2005–2013) were used for statistical evaluation. Logistic regression analysis was performed and prediction maps (per season) were obtained for vector and potential vector species. The variables used in the present study were selected from the WorldClim (two climatic variables) and CORINE (twenty-two land cover variables) databases. This work points to an opposite distribution of C. imicola and species from the Obsoletus group within mainland Portugal. Such findings are evident in autumn, with the former appearing in Central and Southern regions. Although appearing further north in summer and autumn, C. newsteadi reveals a similar distribution to C. imicola. The species C. punctatus appears throughout Portuguese territory all year round. In contrast, C. pulicaris is poorly caught in all areas of mainland Portugal, being paradoxically present near coastal areas and in higher-altitude regions. PMID:28683145
NASA Astrophysics Data System (ADS)
Attia, Khalid A. M.; Nassar, Mohammed W. I.; El-Zeiny, Mohamed B.; Serag, Ahmed
2017-01-01
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural networks, and support vector regression, applied to UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The discussion revealed the superiority of the firefly algorithm over the genetic algorithm. Moreover, different statistical tests were performed, and no significant differences were found among the models regarding their predictive abilities. This indicates that simpler and faster models were obtained without any deterioration in the quality of the calibration.
NASA Astrophysics Data System (ADS)
Mofavvaz, Shirin; Sohrabi, Mahmoud Reza; Nezamzadeh-Ejhieh, Alireza
2017-07-01
In the present study, artificial neural networks (ANNs) and least squares support vector machines (LS-SVM), as intelligent methods based on absorption spectra in the range of 230-300 nm, have been used for the determination of antihistamine decongestant contents. In the first step, one type of artificial neural network (feed-forward back-propagation) was employed with two different training algorithms, Levenberg-Marquardt (LM) and gradient descent with momentum and adaptive learning rate back-propagation (GDX), and their performance was evaluated. The LM algorithm performed better than the GDX algorithm. In the second step, the radial basis network was utilized and its results were compared with those of the previous network. In the last step, the other intelligent method, the least squares support vector machine, was used to construct the antihistamine decongestant prediction model, and the results were compared with those of the two aforementioned networks. The statistical parameters mean square error (MSE), regression coefficient (R2), correlation coefficient (r), mean recovery (%), and relative standard deviation (RSD) were used to select the best model among these methods. Moreover, the proposed methods were compared with high-performance liquid chromatography (HPLC) as a reference method. A one-way analysis of variance (ANOVA) test at the 95% confidence level, applied to compare the results of the suggested and reference methods, showed no significant differences between them.
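Three of the figures of merit named in this abstract (MSE, mean recovery, and RSD) follow from simple formulas. The sketch below uses common textbook definitions, with recovery as 100 × predicted / true and RSD as 100 × sample standard deviation / mean of the recoveries. The abstract does not spell out its exact definitions, so these are assumptions, and the concentrations are invented.

```python
import math

def mean_squared_error(true_values, predicted):
    """Average squared difference between predicted and true concentrations."""
    return sum((t - p) ** 2 for t, p in zip(true_values, predicted)) / len(true_values)

def recoveries(true_values, predicted):
    """Percent recovery for each sample: 100 * predicted / true."""
    return [100 * p / t for t, p in zip(true_values, predicted)]

def mean_and_rsd(values):
    """Mean and relative standard deviation (%), using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, 100 * sd / mean

# Hypothetical true and predicted concentrations (arbitrary units).
true_conc = [10.0, 20.0, 30.0, 40.0]
pred_conc = [9.8, 20.5, 29.4, 40.6]
print("MSE:", mean_squared_error(true_conc, pred_conc))
print("mean recovery (%), RSD (%):", mean_and_rsd(recoveries(true_conc, pred_conc)))
```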
On Relevance of Codon Usage to Expression of Synthetic and Natural Genes in Escherichia coli
Supek, Fran; Šmuc, Tomislav
2010-01-01
A recent investigation concluded that codon bias did not affect expression of green fluorescent protein (GFP) variants in Escherichia coli, while stability of an mRNA secondary structure near the 5′ end played a dominant role. We demonstrate that combining the two variables using regression trees or support vector regression yields a biologically plausible model with better support in the GFP data set and in other experimental data: codon usage is relevant for protein levels if the 5′ mRNA structures are not strong. Natural E. coli genes had weaker 5′ mRNA structures than the examined set of GFP variants and did not exhibit a correlation between the folding free energy of 5′ mRNA structures and protein expression. PMID:20421604
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression.
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of the Durbin-Watson statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran's index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then, by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 regions of China. The results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test.
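The order dependence of the Durbin-Watson statistic, and the order-free alternative built from a standardized residual vector and a normalized weight matrix, can be seen in a small sketch. It implements the classical Durbin-Watson statistic and Moran's index in the matrix form I = zᵀWz (z standardized with the population standard deviation, W scaled so its entries sum to 1), which equals the classical Moran's I. It does not implement the paper's two new indices, whose formulas the abstract does not give. The residuals and the chain-shaped neighbourhood are hypothetical.

```python
import math

def durbin_watson(residuals):
    """Classical Durbin-Watson statistic; depends on the ordering of the residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    return numerator / sum(e * e for e in residuals)

def standardize(values):
    """Z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

def morans_index(residuals, weights):
    """Moran's I as z^T W z, with W rescaled so that all entries sum to 1."""
    z = standardize(residuals)
    total = sum(sum(row) for row in weights)
    n = len(z)
    return sum(weights[i][j] / total * z[i] * z[j] for i in range(n) for j in range(n))

# Hypothetical residuals at four sites arranged in a chain (site i neighbours site i + 1).
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(durbin_watson([1, -1, 1, -1]), durbin_watson([1, 1, -1, -1]))
print(morans_index([1, -1, 1, -1], chain))
```

The same four residual values give different Durbin-Watson values when listed in a different order, whereas Moran's index depends only on which sites are neighbours, not on how the sites are numbered.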
Torres-Valencia, Cristian A; Álvarez, Mauricio A; Orozco-Gutiérrez, Alvaro A
2014-01-01
Human emotion recognition (HER) allows the assessment of an affective state of a subject. Until recently, such emotional states were described in terms of discrete emotions, like happiness or contempt. In order to cover a wide range of emotions, researchers in the field have introduced different dimensional spaces for emotion description that allow the characterization of affective states in terms of several variables or dimensions that measure distinct aspects of the emotion. One of the most common of such dimensional spaces is the bidimensional Arousal/Valence space. To the best of our knowledge, all HER systems so far have modelled the dimensions in these dimensional spaces independently. In this paper, we study the effect of modelling the output dimensions simultaneously and show experimentally the advantages of modelling them in this way. We consider a multimodal approach by including features from the electroencephalogram and a few physiological signals. For modelling the multiple outputs, we employ a multiple output regressor based on support vector machines. We also include a stage of feature selection developed within an embedded approach known as Recursive Feature Elimination (RFE), proposed initially for SVM. The results show that several features can be eliminated using the multiple output support vector regressor with RFE without affecting the performance of the regressor. From the analysis of the features selected in smaller subsets via RFE, it can be observed that the signals that are most informative for discrimination in the arousal and valence space are the EEG, Electrooculogram/Electromyogram (EOG/EMG) and the Galvanic Skin Response (GSR).
Zhang, Daqing; Xiao, Jianfeng; Zhou, Nannan; Luo, Xiaomin; Jiang, Hualiang; Chen, Kaixian
2015-01-01
The blood-brain barrier (BBB) is a highly complex physical barrier determining what substances are allowed to enter the brain. The support vector machine (SVM) is a kernel-based machine learning method that is widely used in QSAR studies. For a successful SVM model, the kernel parameters and feature subset selection are the most important factors affecting prediction accuracy. In most studies, they are treated as two independent problems, but it has been shown that they can affect each other. We designed and implemented a genetic algorithm (GA) to optimize kernel parameters and feature subset selection for SVM regression and applied it to BBB penetration prediction. The results show that our GA/SVM model is more accurate than other currently available log BB models. Therefore, optimizing both SVM parameters and the feature subset simultaneously with a genetic algorithm is a better approach than methods that treat the two problems separately. Analysis of our log BB model suggests that the carboxylic acid group, polar surface area (PSA)/hydrogen-bonding ability, lipophilicity, and molecular charge play important roles in BBB penetration. Among those properties relevant to BBB penetration, lipophilicity could enhance BBB penetration while all the others are negatively correlated with it. PMID:26504797
Bao, Jie; Hou, Zhangshuan; Huang, Maoyi; ...
2015-12-04
Here, effective sensitivity analysis approaches are needed to identify important parameters or factors and their uncertainties in complex Earth system models composed of multi-phase multi-component phenomena and multiple biogeophysical-biogeochemical processes. In this study, the impacts of 10 hydrologic parameters in the Community Land Model on simulations of runoff and latent heat flux are evaluated using data from a watershed. Different metrics, including residual statistics, the Nash-Sutcliffe coefficient, and log mean square error, are used as alternative measures of the deviations between the simulated and field observed values. Four sensitivity analysis (SA) approaches, including analysis of variance based on the generalized linear model, generalized cross validation based on the multivariate adaptive regression splines model, standardized regression coefficients based on a linear regression model, and analysis of variance based on support vector machine, are investigated. Results suggest that these approaches show consistent measurement of the impacts of major hydrologic parameters on response variables, but with differences in the relative contributions, particularly for the secondary parameters. The convergence behaviors of the SA with respect to the number of sampling points are also examined with different combinations of input parameter sets and output response variables and their alternative metrics. This study helps identify the optimal SA approach, provides guidance for the calibration of the Community Land Model parameters to improve the model simulations of land surface fluxes, and approximates the magnitudes to be adjusted in the parameter values during parametric model optimization.
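Of the deviation metrics listed in this abstract, the Nash-Sutcliffe coefficient has a simple closed form: one minus the ratio of the squared simulation error to the variance of the observations about their mean. The sketch below implements that standard definition. The runoff values are hypothetical, and the other metrics (residual statistics, log mean square error) are omitted because the abstract does not define them.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_observed = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance_sum = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - squared_error / variance_sum

# Hypothetical observed and simulated runoff values (arbitrary units).
observed_runoff = [1.2, 3.4, 2.8, 0.9, 4.1]
simulated_runoff = [1.0, 3.0, 3.1, 1.2, 3.8]
print("NSE:", round(nash_sutcliffe(observed_runoff, simulated_runoff), 3))
```

Values below zero indicate that the simulation predicts the observations worse than their own mean would.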
A new computational strategy for predicting essential genes.
Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng
2013-12-21
Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
Laporta, Gabriel Zorello; Ramos, Daniel Garkauskas; Ribeiro, Milton Cezar; Sallum, Maria Anice Mureb
2011-08-01
Every year, autochthonous cases of Plasmodium vivax malaria occur in low-endemicity areas of Vale do Ribeira in the south-eastern part of the Atlantic Forest, state of São Paulo, where Anopheles cruzii and Anopheles bellator are considered the primary vectors. However, other species in the subgenus Nyssorhynchus of Anopheles (e.g., Anopheles marajoara) are abundant and may participate in the dynamics of malarial transmission in that region. The objectives of the present study were to assess the spatial distribution of An. cruzii, An. bellator and An. marajoara and to associate the presence of these species with malaria cases in the municipalities of the Vale do Ribeira. Potential habitat suitability modelling was applied to determine both the spatial distribution of An. cruzii, An. bellator and An. marajoara and to establish the density of each species. Poisson regression was utilized to associate malaria cases with estimated vector densities. As a result, An. cruzii was correlated with the forested slopes of the Serra do Mar, An. bellator with the coastal plain and An. marajoara with the deforested areas. Moreover, both An. marajoara and An. cruzii were positively associated with malaria cases. Considering that An. marajoara was demonstrated to be a primary vector of human Plasmodium in the rural areas of the state of Amapá, more attention should be given to the species in the deforested areas of the Atlantic Forest, where it might be a secondary vector.
NASA Astrophysics Data System (ADS)
Jiang, Weiping; Ma, Jun; Li, Zhao; Zhou, Xiaohui; Zhou, Boye
2018-05-01
The analysis of the correlations between the noise in different components of GPS stations is valuable for obtaining more accurate velocity uncertainties for station motion. Previous research into noise in GPS position time series focused mainly on single component evaluation, which affects the acquisition of precise station positions, the velocity field, and its uncertainty. In this study, before and after removing the common-mode error (CME), we performed one-dimensional linear regression analysis of the noise amplitude vectors in different components of 126 GPS stations in Southern California, using a combination of white noise, flicker noise, and random walk noise. The results show that, on the one hand, there are above-moderate degrees of correlation between the white noise amplitude vectors in all components of the stations before and after removal of the CME, while the correlations between flicker noise amplitude vectors in horizontal and vertical components are enhanced from uncorrelated to moderately correlated by removing the CME. On the other hand, the significance tests show that all of the obtained linear regression equations, which represent a unique function of the noise amplitude in any two components, are of practical value after removing the CME. According to the noise amplitude estimates in two components and the linear regression equations, more accurate noise amplitudes can be acquired in the two components.
Development of Ensemble Model Based Water Demand Forecasting Model
NASA Astrophysics Data System (ADS)
Kwon, Hyun-Han; So, Byung-Jin; Kim, Seong-Hyeon; Kim, Byung-Seop
2014-05-01
The Smart Water Grid (SWG) concept has emerged globally over the last decade and has gained significant recognition in South Korea. In particular, there has been growing interest in water demand forecasting and optimal pump operation, which has led to various studies on energy saving and improved water supply reliability. Existing water demand forecasting models fall into two groups according to how they model and predict time series behavior. One group considers embedded patterns such as seasonality, periodicity and trends; the other consists of autoregressive models that use short-memory Markovian processes (Emmanuel et al., 2012). The main disadvantage of these models is that their ability to predict water demand at sub-daily scales is limited because the system is nonlinear. In this regard, this study aims to develop a nonlinear ensemble model for hourly water demand forecasting that allows us to estimate uncertainties across different model classes. The proposed model consists of two parts. One is a multi-model scheme based on a combination of independent prediction models. The other is a cross-validation scheme, the bagging approach introduced by Breiman (1996), used to derive weighting factors for the individual models. The individual forecasting models used in this study are linear regression, polynomial regression, multivariate adaptive regression splines (MARS), and support vector machine (SVM). The concepts are demonstrated through application to data observed at water plants at several locations in South Korea. Keywords: water demand, non-linear model, ensemble forecasting model, uncertainty. Acknowledgements: This subject is supported by the Korea Ministry of Environment as "Projects for Developing Eco-Innovation Technologies (GT-11-G-02-001-6)".
Deep Restricted Kernel Machines Using Conjugate Feature Duality.
Suykens, Johan A K
2017-08-01
The aim of this letter is to propose a theory of deep restricted kernel machines offering new foundations for deep learning with kernel machines. From the viewpoint of deep learning, it is partially related to restricted Boltzmann machines, which are characterized by visible and hidden units in a bipartite graph without hidden-to-hidden connections and deep learning extensions as deep belief networks and deep Boltzmann machines. From the viewpoint of kernel machines, it includes least squares support vector machines for classification and regression, kernel principal component analysis (PCA), matrix singular value decomposition, and Parzen-type models. A key element is to first characterize these kernel machines in terms of so-called conjugate feature duality, yielding a representation with visible and hidden units. It is shown how this is related to the energy form in restricted Boltzmann machines, with continuous variables in a nonprobabilistic setting. In this new framework of so-called restricted kernel machine (RKM) representations, the dual variables correspond to hidden features. Deep RKM are obtained by coupling the RKMs. The method is illustrated for deep RKM, consisting of three levels with a least squares support vector machine regression level and two kernel PCA levels. In its primal form also deep feedforward neural networks can be trained within this framework.
2018-01-01
Background Many studies have tried to develop predictors for return-to-work (RTW). However, since complex factors have been demonstrated to predict RTW, it is difficult to use them practically. This study investigated whether factors used in previous studies could predict whether an individual had returned to his/her original work by four years after termination of the worker's recovery period. Methods An initial logistic regression analysis of 1,567 participants of the fourth Panel Study of Worker's Compensation Insurance yielded odds ratios. The participants were divided into two subsets, a training dataset and a test dataset. Using the training dataset, logistic regression, decision tree, random forest, and support vector machine models were established, and important variables of each model were identified. The predictive abilities of the different models were compared. Results The analysis showed that only earned income and company-related factors significantly affected return-to-original-work (RTOW). The random forest model showed the best accuracy among the tested machine learning models; however, the difference was not prominent. Conclusion It is possible to predict a worker's probability of RTOW using machine learning techniques with moderate accuracy. PMID:29736160
NASA Astrophysics Data System (ADS)
Chakraborty, Joheen; Banerji, Sugata
2018-03-01
Driven by a desire to control climate change and reduce the dependence on fossil fuels, governments around the world are increasing the adoption of renewable energy sources. However, among the US states, we observe a wide disparity in renewable penetration. In this study, we have identified and cleaned over a dozen datasets representing solar energy penetration in each US state, and the potentially relevant socioeconomic and other factors that may be driving the growth in solar. We have applied a number of predictive modeling approaches - including machine learning and regression - on these datasets over a 17-year period and evaluated the relative performance of the models. Our goals were: (1) identify the most important factors that are driving the growth in solar, (2) choose the most effective predictive modeling technique for solar growth, and (3) develop a model for predicting next year’s solar growth using this year’s data. We obtained very promising results with random forests (about 90% efficacy) and varying degrees of success with support vector machines and regression techniques (linear, polynomial, ridge). We also identified states with solar growth slower than expected and representing a potential for stronger growth in future.
Keithley, Richard B; Wightman, R Mark
2011-06-07
Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586
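To make the outlier-screening idea concrete, here is a minimal sketch of Cook's distance for an ordinary simple linear regression. It is an illustration under simplifying assumptions (one predictor, hypothetical calibration data), not the principal component regression workflow used in the study; the function name `cooks_distance` is chosen for this example.

```python
def cooks_distance(x, y):
    """Cook's distance of each point in a simple linear regression y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * s2)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]

# Hypothetical training set: the last response is deliberately inconsistent
concentration = [1, 2, 3, 4, 5, 6]
response = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]
distances = cooks_distance(concentration, response)
suspect = max(range(len(distances)), key=distances.__getitem__)
print("most influential point index:", suspect)
```

A point combining a large residual with high leverage receives a large distance; a common heuristic treats values above 1 as worth inspecting before the point is kept in a training set.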
Fisher, Aaron J; Reeves, Jonathan W; Chi, Cyrus
2016-07-01
Expanding on recently published methods, the current study presents an approach to estimating the dynamic, regulatory effect of the parasympathetic nervous system on heart period on a moment-to-moment basis. We estimated second-to-second variation in respiratory sinus arrhythmia (RSA) in order to estimate the contemporaneous and time-lagged relationships among RSA, interbeat interval (IBI), and respiration rate via vector autoregression. Moreover, we modeled these relationships at lags of 1 s to 10 s, in order to evaluate the optimal latency for estimating dynamic RSA effects. The IBI (t) on RSA (t-n) regression parameter was extracted from individual models as an operationalization of the regulatory effect of RSA on IBI, referred to as dynamic RSA (dRSA). Dynamic RSA positively correlated with standard averages of heart rate and negatively correlated with standard averages of RSA. We propose that dRSA reflects the active downregulation of heart period by the parasympathetic nervous system and thus represents a novel metric that provides incremental validity in the measurement of autonomic cardiac control; specifically, it offers a method by which parasympathetic regulatory effects can be measured in process. © 2016 Society for Psychophysiological Research.
Fan, X-J; Wan, X-B; Huang, Y; Cai, H-M; Fu, X-H; Yang, Z-L; Chen, D-K; Song, S-X; Wu, P-H; Liu, Q; Wang, L; Wang, J-P
2012-01-01
Background: Current imaging modalities are inadequate in preoperatively predicting regional lymph node metastasis (RLNM) status in rectal cancer (RC). Here, we designed a support vector machine (SVM) model to address this issue by integrating epithelial–mesenchymal-transition (EMT)-related biomarkers along with clinicopathological variables. Methods: Using tissue microarrays and immunohistochemistry, the expression of EMT-related biomarkers was measured in 193 RC patients. Of these, 74 patients were assigned to the training set to select robust variables for designing the SVM model. The predictive value of the SVM model was validated in the testing set (119 patients). Results: In the training set, eight variables, including six EMT-related biomarkers and two clinicopathological variables, were selected to devise the SVM model. In the testing set, we identified 63 patients at high risk of RLNM and 56 patients at low risk. The sensitivity, specificity and overall accuracy of the SVM in predicting RLNM were 68.3%, 81.1% and 72.3%, respectively. Importantly, multivariate logistic regression analysis showed that the SVM model was indeed an independent predictor of RLNM status (odds ratio, 11.536; 95% confidence interval, 4.113–32.361; P<0.0001). Conclusion: Our SVM-based model displayed moderately strong predictive power in defining the RLNM status in RC patients, providing an important approach to select the RLNM high-risk subgroup for neoadjuvant chemoradiotherapy. PMID:22538975
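The reported sensitivity, specificity and accuracy can be related to confusion-matrix counts as sketched below. The four counts are an assumption: they are not given in the abstract, but were chosen so that they agree with the stated totals (119 testing patients, 63 classified high risk, 56 low risk) and reproduce the reported percentages.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true RLNM cases flagged high risk
    specificity = tn / (tn + fp)   # share of RLNM-free cases flagged low risk
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Hypothetical counts consistent with the abstract's totals (not reported there)
tp, fn, tn, fp = 56, 26, 30, 7
sens, spec, acc = classification_metrics(tp, fn, tn, fp)
print(f"{sens:.1%} {spec:.1%} {acc:.1%}")  # prints "68.3% 81.1% 72.3%"
```

Here `tp + fp` equals 63 (predicted high risk) and `fn + tn` equals 56 (predicted low risk), matching the group sizes in the testing set.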
NASA Astrophysics Data System (ADS)
Li, Hui; Hong, Lu-Yao; Zhou, Qing; Yu, Hai-Jie
2015-08-01
The business failure of numerous companies results in financial crises. The high social costs associated with such crises have led people to search for effective tools for business risk prediction, among which the support vector machine is very effective. Several modelling approaches, including single-technique modelling, hybrid modelling, and ensemble modelling, have been suggested for forecasting business risk with support vector machines. However, the existing literature seldom focuses on a general modelling frame for business risk prediction, and seldom investigates performance differences among the modelling approaches. We reviewed research on forecasting business risk with support vector machines, proposed the general assisted prediction modelling frame with hybridisation and ensemble (APMF-WHAE), and finally investigated the use of principal components analysis, support vector machine, random sampling, and group decision under the general frame in forecasting business risk. Under the APMF-WHAE frame with the support vector machine as the base predictive model, four specific predictive models were produced, namely, a pure support vector machine, a hybrid support vector machine combined with principal components analysis, a support vector machine ensemble combined with random sampling and group decision, and an ensemble of hybrid support vector machines that uses group decision to integrate various hybrid support vector machines built on variables produced by principal components analysis and samples from random sampling. The experimental results indicate that the hybrid support vector machine and the ensemble of hybrid support vector machines outperformed the pure support vector machine and the support vector machine ensemble.
Lopes, Marta B; Calado, Cecília R C; Figueiredo, Mário A T; Bioucas-Dias, José M
2017-06-01
The monitoring of biopharmaceutical products using Fourier transform infrared (FT-IR) spectroscopy relies on calibration techniques involving the acquisition of spectra of bioprocess samples along the process. The most commonly used method for that purpose is partial least squares (PLS) regression, under the assumption that a linear model is valid. Despite being successful in the presence of small nonlinearities, linear methods may fail in the presence of strong nonlinearities. This paper studies the potential usefulness of nonlinear regression methods for predicting, from in situ near-infrared (NIR) and mid-infrared (MIR) spectra acquired in high-throughput mode, biomass and plasmid concentrations in Escherichia coli DH5-α cultures producing the plasmid model pVAX-LacZ. The linear methods PLS and ridge regression (RR) are compared with their kernel (nonlinear) versions, kPLS and kRR, as well as with the (also nonlinear) relevance vector machine (RVM) and Gaussian process regression (GPR). For the systems studied, RR provided better predictive performances compared to the remaining methods. Moreover, the results point to further investigation based on larger data sets whenever differences in predictive accuracy between a linear method and its kernelized version could not be found. The use of nonlinear methods, however, shall be judged regarding the additional computational cost required to tune their additional parameters, especially when the less computationally demanding linear methods herein studied are able to successfully monitor the variables under study.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tang, Kunkun, E-mail: ktg@illinois.edu; Inria Bordeaux – Sud-Ouest, Team Cardamom, 200 avenue de la Vieille Tour, 33405 Talence; Congedo, Pietro M.
The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation continues to contain few terms, so that the cost of repeatedly solving the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than that of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.
Sinka, Marianne E; Bangs, Michael J; Manguin, Sylvie; Chareonviriyaphap, Theeraphap; Patil, Anand P; Temperley, William H; Gething, Peter W; Elyazar, Iqbal R F; Kabaria, Caroline W; Harbach, Ralph E; Hay, Simon I
2011-05-25
Background The final article in a series of three publications examining the global distribution of 41 dominant vector species (DVS) of malaria is presented here. The first publication examined the DVS from the Americas, with the second covering those species present in Africa, Europe and the Middle East. Here we discuss the 19 DVS of the Asian-Pacific region. This region experiences a high diversity of vector species, many occurring sympatrically, which, combined with the occurrence of a high number of species complexes and suspected species complexes, and behavioural plasticity of many of these major vectors, adds a level of entomological complexity not comparable elsewhere globally. To try and untangle the intricacy of the vectors of this region and to increase the effectiveness of vector control interventions, an understanding of the contemporary distribution of each species, combined with a synthesis of the current knowledge of their behaviour and ecology is needed. Results Expert opinion (EO) range maps, created with the most up-to-date expert knowledge of each DVS distribution, were combined with a contemporary database of occurrence data and a suite of open access, environmental and climatic variables. Using the Boosted Regression Tree (BRT) modelling method, distribution maps of each DVS were produced. The occurrence data were abstracted from the formal, published literature, plus other relevant sources, resulting in the collation of DVS occurrence at 10116 locations across 31 countries, of which 8853 were successfully geo-referenced and 7430 were resolved to spatial areas that could be included in the BRT model. A detailed summary of the information on the bionomics of each species and species complex is also presented. Conclusions This article concludes a project aimed to establish the contemporary global distribution of the DVS of malaria. The three articles produced are intended as a detailed reference for scientists continuing research into the aspects of taxonomy, biology and ecology relevant to species-specific vector control. This research is particularly relevant to help unravel the complicated taxonomic status, ecology and epidemiology of the vectors of the Asia-Pacific region. All the occurrence data, predictive maps and EO-shape files generated during the production of these publications will be made available in the public domain. We hope that this will encourage data sharing to improve future iterations of the distribution maps. PMID:21612587
Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.
Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo
2015-08-01
Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.
Cohen, Justin M; Wilson, Mark L; Cruz-Celis, Adriana; Ordoñez, Rosalinda; Ramsey, Janine M
2006-11-01
Long-term control of Chagas disease requires not only interruption of the human transmission cycle of Trypanosoma cruzi Schyzotrypanum, Chagas, 1909 by controlling its domestic triatomine vectors but also surveillance to prevent reinfestation of residences from sylvatic or persistent peridomestic populations. Although a number of potential risk factors for infestation have been implicated in previous studies, the explanatory power of resulting models has been low. Two years after cessation of triatomine vector control efforts in the town of Chalcatzingo, Morelos, 78 environmental, socioecological, and spatial variables were analyzed for association with infestation by Triatoma pallidipennis Stal 1872 (Hemiptera: Reduviidae: Triatominae), the principal vector of T. cruzi. We studied 712 residences in this rural community to identify specific intradomestic and peridomestic risk factors that predicted infestation with T. pallidipennis. From numerous characteristics that were identified as correlated with infestation, we derived multivariate logistic regression models to predict residences that were more or less likely to be infested with T. pallidipennis. The most important risk factors for infestation included measurements of house age, upkeep, and spatial location in the town. The effects of certain risk factors on infestation were found to be modified by spatial characteristics of residences. The results of this study provide new information regarding risk factors for infestation by T. pallidipennis that may aid in designing sustainable disease control programs in rural Mexico.
NASA Astrophysics Data System (ADS)
Mehdizadeh, Saeid; Behmanesh, Javad; Khalili, Keivan
2017-07-01
Soil temperature (Ts) and its thermal regime are the most important factors in plant growth, biological activities, and water movement in soil. Due to scarcity of Ts data, estimation of soil temperature is an important issue in different fields of science. The main objective of the present study is to investigate the accuracy of multivariate adaptive regression splines (MARS) and support vector machine (SVM) methods for estimating Ts. For this aim, the monthly mean data of Ts (at depths of 5, 10, 50, and 100 cm) and meteorological parameters of 30 synoptic stations in Iran were utilized. To develop the MARS and SVM models, various combinations of minimum, maximum, and mean air temperatures (Tmin, Tmax, T); actual and maximum possible sunshine duration and sunshine duration ratio (n, N, n/N); actual, net, and extraterrestrial solar radiation data (Rs, Rn, Ra); precipitation (P); relative humidity (RH); wind speed at 2 m height (u2); and water vapor pressure (Vp) were used as input variables. Three error statistics including root-mean-square error (RMSE), mean absolute error (MAE), and determination coefficient (R²) were used to check the performance of the MARS and SVM models. The results indicated that MARS was superior to SVM at different depths. In the test and validation phases, the most accurate estimations for MARS were obtained at the depth of 10 cm for Tmax, Tmin, T inputs (RMSE = 0.71 °C, MAE = 0.54 °C, and R² = 0.995) and for RH, Vp, P, and u2 inputs (RMSE = 0.80 °C, MAE = 0.61 °C, and R² = 0.996), respectively.
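The three error statistics named above can be computed as in the following minimal sketch. The observed and estimated temperatures are hypothetical, and the determination coefficient is computed here as 1 - SS_res / SS_tot, which is an assumption: some studies instead report the squared Pearson correlation.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def r_squared(observed, predicted):
    """Determination coefficient as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# Hypothetical monthly soil temperatures (°C): observed vs. model estimates
obs = [10.0, 12.0, 15.0, 19.0]
est = [11.0, 12.0, 14.0, 19.0]
print(rmse(obs, est), mae(obs, est), r_squared(obs, est))
```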
Linear regression models for solvent accessibility prediction in proteins.
Wagner, Michael; Adamczak, Rafał; Porollo, Aleksey; Meller, Jarosław
2005-04-01
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
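The "error insensitivity" metaparameter mentioned above refers to the epsilon-insensitive loss used in support vector regression. The sketch below contrasts it with the squared loss of least squares regression; the residuals and the value of epsilon are hypothetical and chosen only for illustration.

```python
def epsilon_insensitive_loss(residual, epsilon):
    """SVR loss: errors smaller than epsilon in magnitude cost nothing."""
    return max(0.0, abs(residual) - epsilon)

def squared_loss(residual):
    """Least squares loss: every nonzero error is penalized."""
    return residual ** 2

# Hypothetical residuals (predicted minus observed RSA) and tolerance
residuals = [-0.30, -0.05, 0.0, 0.08, 0.50]
epsilon = 0.1
svr_losses = [epsilon_insensitive_loss(r, epsilon) for r in residuals]
ls_losses = [squared_loss(r) for r in residuals]
for r, s, q in zip(residuals, svr_losses, ls_losses):
    print(f"residual={r:+.2f}  svr={s:.2f}  ls={q:.4f}")
```

In an SVR objective, the summed epsilon-insensitive losses are multiplied by the error penalization term and traded off against a norm penalty on the weights, so widening epsilon ignores more small errors while a larger penalty weights the remaining ones more heavily.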
How to predict the sugariness and hardness of melons: A near-infrared hyperspectral imaging method.
Sun, Meijun; Zhang, Dong; Liu, Li; Wang, Zheng
2017-03-01
Hyperspectral imaging (HSI) in the near-infrared (NIR) region (900-1700 nm) was used for non-intrusive quality measurements (of sweetness and texture) in melons. First, HSI data from melon samples were acquired to extract the spectral signatures. The corresponding sample sweetness and hardness values were recorded using traditional intrusive methods. Partial least squares regression (PLSR), principal component analysis (PCA), support vector machine (SVM), and artificial neural network (ANN) models were created to predict melon sweetness and hardness values from the hyperspectral data. Experimental results for the three types of melons show that PLSR produces the most accurate results. To reduce the high dimensionality of the hyperspectral data, the weighted regression coefficients of the resulting PLSR models were used to identify the most important wavelengths. On the basis of these wavelengths, each image pixel was used to visualize the sweetness and hardness in all the portions of each sample. Copyright © 2016 Elsevier Ltd. All rights reserved.
A canonical correlation neural network for multicollinearity and functional data.
Gou, Zhenkun; Fyfe, Colin
2004-03-01
We review a recent neural implementation of Canonical Correlation Analysis and show, using ideas suggested by Ridge Regression, how to make the algorithm robust. The network is shown to operate on data sets which exhibit multicollinearity. We develop a second model which performs as well not only on multicollinear data but also on general data sets. This model allows us to vary a single parameter so that the network is capable of performing Partial Least Squares regression (at one extreme), Canonical Correlation Analysis (at the other), and every intermediate operation between the two. On multicollinear data, the parameter setting is shown to be important, but on more general data no particular parameter setting is required. Finally, we develop a second penalty term which acts on such data as a smoother, in that the resulting weight vectors are much smoother and more interpretable than the weights without the robustification term. We illustrate our algorithms on both artificial and real data.
Ahmadi, Hamed; Rodehutscord, Markus
2017-01-01
In the nutrition literature, there are several reports on the use of artificial neural network (ANN) and multiple linear regression (MLR) approaches for predicting feed composition and nutritive value, while the use of the support vector machine (SVM) method as a new alternative to MLR and ANN models has not yet been fully investigated. MLR, ANN, and SVM models were developed to predict the metabolizable energy (ME) content of compound feeds for pigs based on the German energy evaluation system from analyzed contents of crude protein (CP), ether extract (EE), crude fiber (CF), and starch. A total of 290 datasets from standardized digestibility studies with compound feeds were provided by several institutions and published papers, and ME was calculated thereon. The accuracy and precision of the developed models were evaluated from their predicted values. The results revealed that the developed ANN [R² = 0.95; root mean square error (RMSE) = 0.19 MJ/kg of dry matter] and SVM (R² = 0.95; RMSE = 0.21 MJ/kg of dry matter) models produced better prediction values in estimating ME in compound feed than those produced by conventional MLR (R² = 0.89; RMSE = 0.27 MJ/kg of dry matter); however, there were no obvious differences between the performance of the ANN and SVM models. Thus, the SVM model may also be considered a promising tool for modeling the relationship between chemical composition and ME of compound feeds for pigs. To provide readers and nutritionists with an easy and rapid tool, an Excel® calculator, namely SVM_ME_pig, was created to predict the metabolizable energy values in compound feeds for pigs using the developed support vector machine model.
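The two accuracy figures quoted for each model can be computed as in the sketch below. It assumes R² is the coefficient of determination (1 minus residual over total sum of squares); the abstract does not say whether it instead reports a squared correlation coefficient, and the ME values shown are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the target (e.g. MJ/kg DM)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted ME values (MJ/kg of dry matter).
me_measured = [14.2, 15.1, 13.8, 16.0, 14.9]
me_predicted = [14.0, 15.3, 13.9, 15.7, 15.0]

print(round(r_squared(me_measured, me_predicted), 3))
print(round(rmse(me_measured, me_predicted), 3))
```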
Logistic regression of family data from retrospective study designs.
Whittemore, Alice S; Halpern, Jerry
2003-11-01
We wish to study the effects of genetic and environmental factors on disease risk, using data from families ascertained because they contain multiple cases of the disease. To do so, we must account for the way participants were ascertained, and for within-family correlations in both disease occurrences and covariates. We model the joint probability distribution of the covariates of ascertained family members, given family disease occurrence and pedigree structure. We describe two such covariate models: the random effects model and the marginal model. Both models assume a logistic form for the distribution of one person's covariates that involves a vector beta of regression parameters. The components of beta in the two models have different interpretations, and they differ in magnitude when the covariates are correlated within families. We describe ascertainment assumptions needed to estimate consistently the parameters beta(RE) in the random effects model and the parameters beta(M) in the marginal model. Under the ascertainment assumptions for the random effects model, we show that conditional logistic regression (CLR) of matched family data gives a consistent estimate beta(RE) for beta(RE) and a consistent estimate for the covariance matrix of beta(RE). Under the ascertainment assumptions for the marginal model, we show that unconditional logistic regression (ULR) gives a consistent estimate for beta(M), and we give a consistent estimator for its covariance matrix. The random effects/CLR approach is simple to use and to interpret, but it can use data only from families containing both affected and unaffected members. The marginal/ULR approach uses data from all individuals, but its variance estimates require special computations. A C program to compute these variance estimates is available at http://www.stanford.edu/dept/HRP/epidemiology. 
We illustrate these pros and cons by application to data on the effects of parity on ovarian cancer risk in mother/daughter pairs, and use simulations to study the performance of the estimates. Copyright 2003 Wiley-Liss, Inc.
Gidwani, Kamlesh; Picado, Albert; Rijal, Suman; Singh, Shri Prakash; Roy, Lalita; Volfova, Vera; Andersen, Elisabeth Wreford; Uranw, Surendra; Ostyn, Bart; Sudarshan, Medhavi; Chakravarty, Jaya; Volf, Petr; Sundar, Shyam; Boelaert, Marleen; Rogers, Matthew Edward
2011-01-01
Background: Visceral leishmaniasis is the world's second-largest vector-borne parasitic killer and a neglected tropical disease, prevalent in poor communities. Long-lasting insecticidal nets (LNs) are a low-cost, proven vector intervention method for malaria control; however, their effectiveness against visceral leishmaniasis (VL) is unknown. This study quantified the effect of LNs on exposure to the sand fly vector of VL in India and Nepal during a two-year community intervention trial. Methods: As part of a paired-cluster randomized controlled clinical trial in VL-endemic regions of India and Nepal, we tested the effect of LNs on sand fly biting by measuring the antibody response of subjects to the saliva of the Leishmania donovani vector Phlebotomus argentipes and the sympatric (non-vector) Phlebotomus papatasi. Fifteen to 20 individuals above 15 years of age from 26 VL-endemic clusters were asked to provide a blood sample at baseline and at 12 and 24 months post-intervention. Results: A total of 305 individuals were included in the study; 68 participants provided two blood samples and 237 gave three samples. A random effect linear regression model showed that cluster-wide distribution of LNs reduced exposure to P. argentipes by 12% at 12 months (effect 0.88; 95% CI 0.83–0.94) and 9% at 24 months (effect 0.91; 95% CI 0.80–1.02) in the intervention group compared to control, adjusting for baseline values and pair. Similar results were obtained for P. papatasi. Conclusions: This trial provides evidence that LNs have a limited effect on sand fly exposure in VL-endemic communities in India and Nepal and supports the use of sand fly saliva antibodies as a marker to evaluate vector control interventions. PMID:21931871
Barhoumi, Walid; Qualls, Whitney A.; Archer, Reginald; Fuller, Douglas O.; Chelbi, Ifhem; Cherni, Saifedine; Derbali, Mohamed; Arheart, Kristopher L.; Zhioua, Elyes; Beier, John C.
2015-01-01
The expanding distribution of important human visceral leishmaniasis (HVL) and sporadic cutaneous leishmaniasis (SCL) vector species, Phlebotomus perfiliewi and P. perniciosus, throughout central Tunisia is a major public health concern. This study was designed to investigate whether the expansion of irrigation influences the abundance of sand fly species potentially involved in the transmission of HVL and SCL in arid bioclimatic regions. Geographic and remote sensing approaches were used to predict the density of visceral leishmaniasis vectors in Tunisia. Entomological investigations were performed in the governorate of Sidi Bouzid, located in the arid bioclimatic region of Tunisia. In 2012, sand flies were collected by CDC light traps located at nine irrigated and nine non-irrigated sites to determine species abundance. Eight species in two genera were collected. Among sand flies of the subgenus Larroussius, P. perfiliewi was the only species collected significantly more in irrigated areas. Trap data were then used to develop Poisson regression models to map the apparent density of important sand fly species as a function of different environmental covariates including climate and vegetation density. The density of P. perfiliewi is predicted to be moderately high in the arid regions. These results highlight that the abundance of P. perfiliewi is associated with the development of irrigated areas and suggest that the expansion of this species will continue to more arid areas of the country as irrigation sites continue to be developed in the region. The continued increase in irrigated areas in the Middle East and North Africa region deserves attention, as it is associated with the spread of the L. infantum vector P. perfiliewi. Integrated vector management strategies targeting irrigation structures to reduce sand fly vector populations should be evaluated in light of these findings. PMID:25447265
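A Poisson regression of trap counts on an environmental covariate can be sketched as follows. This is an illustrative single-covariate fit by Newton's method with a log link, not the models used in the study; the function name `fit_poisson`, the covariate, and the trap counts are all hypothetical.

```python
import math


def fit_poisson(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x to count data y by Newton's method.

    Assumes the counts have a positive mean and x is not constant.
    """
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical vegetation index per site and sand flies caught per trap.
vegetation = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8]
flies = [2, 3, 3, 6, 9, 14]

intercept, slope = fit_poisson(vegetation, flies)
predicted_density = math.exp(intercept + slope * 0.7)  # expected count at index 0.7
print(intercept, slope, predicted_density)
```

Evaluating the fitted model over a grid of covariate values taken from climate or remote sensing layers is what turns such a regression into a density map.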
NASA Astrophysics Data System (ADS)
Tang, Kunkun; Congedo, Pietro M.; Abgrall, Rémi
2016-06-01
The Polynomial Dimensional Decomposition (PDD) is employed in this work for the global sensitivity analysis and uncertainty quantification (UQ) of stochastic systems subject to a moderate to large number of input random variables. Due to the intimate connection between the PDD and the Analysis of Variance (ANOVA) approaches, PDD is able to provide a simpler and more direct evaluation of the Sobol' sensitivity indices, when compared to the Polynomial Chaos expansion (PC). Unfortunately, the number of PDD terms grows exponentially with respect to the size of the input random vector, which makes the computational cost of standard methods unaffordable for real engineering applications. In order to address the problem of the curse of dimensionality, this work proposes essentially variance-based adaptive strategies aiming to build a cheap meta-model (i.e. surrogate model) by employing the sparse PDD approach with its coefficients computed by regression. Three levels of adaptivity are carried out in this paper: 1) the truncated dimensionality for ANOVA component functions, 2) the active dimension technique especially for second- and higher-order parameter interactions, and 3) the stepwise regression approach designed to retain only the most influential polynomials in the PDD expansion. During this adaptive procedure featuring stepwise regressions, the surrogate model representation keeps containing few terms, so that the cost to resolve repeatedly the linear systems of the least-squares regression problem is negligible. The size of the finally obtained sparse PDD representation is much smaller than the one of the full expansion, since only significant terms are eventually retained. Consequently, a much smaller number of calls to the deterministic model is required to compute the final PDD coefficients.
NASA Technical Reports Server (NTRS)
Wentz, F. J.
1977-01-01
The general problem of bistatic scattering from a two scale surface was evaluated. The treatment was entirely two-dimensional and in a vector formulation independent of any particular coordinate system. The two scale scattering model was then applied to backscattering from the sea surface. In particular, the model was used in conjunction with the JONSWAP 1975 aircraft scatterometer measurements to determine the sea surface's two scale roughness distributions, namely the probability density of the large scale surface slope and the capillary wavenumber spectrum. Best fits yield, on the average, a 0.7 dB rms difference between the model computations and the vertical polarization measurements of the normalized radar cross section. Correlations between the distribution parameters and the wind speed were established from linear, least squares regressions.
Attia, Khalid A M; Nassar, Mohammed W I; El-Zeiny, Mohamed B; Serag, Ahmed
2017-01-05
For the first time, a new variable selection method based on swarm intelligence, namely the firefly algorithm, is coupled with three different multivariate calibration models, namely concentration residual augmented classical least squares, artificial neural network, and support vector regression, applied to UV spectral data. A comparative study between the firefly algorithm and the well-known genetic algorithm was carried out. The discussion revealed the superiority of this new algorithm over the genetic algorithm. Moreover, different statistical tests were performed and no significant differences were found among the models regarding their predictive abilities. This ensures that simpler and faster models were obtained without any deterioration of the quality of the calibration. Copyright © 2016 Elsevier B.V. All rights reserved.
Predicting Culex pipiens/restuans population dynamics by interval lagged weather data
2013-01-01
Background: Culex pipiens/restuans mosquitoes are important vectors for a variety of arthropod-borne viral infections. In this study, the associations between 20 years of mosquito capture data and the time-lagged environmental quantities daytime length, temperature, precipitation, relative humidity and wind speed were used to generate a predictive model for the population dynamics of this vector species. Methods: The mosquito population in the study area was represented by averaged time series of mosquito counts captured at 6 sites in Cook County (Illinois, USA). Cross-correlation maps (CCMs) were compiled to investigate the association between mosquito abundances and environmental quantities. The results obtained from the CCMs were incorporated into a Poisson regression to generate a predictive model. To optimize the predictive model, the time lags obtained from the CCMs were adjusted using a genetic algorithm. Results: CCMs for weekly data showed a highly positive correlation of mosquito abundances with daytime length 4 to 5 weeks prior to capture (quantified by a Spearman rank order correlation of rS = 0.898) and with temperature during the 2 weeks prior to capture (rS = 0.870). Maximal negative correlations were found for wind speed averaged over the 3 weeks prior to capture (rS = −0.621). Cx. pipiens/restuans population dynamics were predicted by integrating the CCM results in Poisson regression models. They were used to simulate the average seasonal cycle of the mosquito abundance. Verification with observations resulted in a correlation of rS = 0.899 for daily and rS = 0.917 for weekly data. Applying the optimized models to the entire 20-year time series also resulted in a suitable fit, with rS = 0.876 for daily and rS = 0.899 for weekly data. Conclusions: The study demonstrates the application of interval-lagged weather data to predict mosquito abundances with a feasible accuracy, especially when related to weekly Cx. pipiens/restuans populations. PMID:23634763
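One cell of a cross-correlation map is a Spearman rank correlation between mosquito counts and a weather series shifted back by a given lag. The sketch below shows that single-lag calculation under simplifying assumptions: it uses a point lag rather than the interval averages described in the abstract, and the helper names and weekly series are hypothetical.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def lagged_spearman(weather, counts, lag):
    """Correlate counts at week t with weather at week t - lag."""
    if lag == 0:
        return spearman(weather, counts)
    return spearman(weather[:-lag], counts[lag:])


# Hypothetical weekly mean temperature (deg C) and mosquito counts.
temperature = [12, 14, 17, 21, 24, 26, 25, 22, 18, 15]
mosquitoes = [3, 4, 6, 9, 20, 35, 48, 44, 30, 16]

for lag in range(4):
    print(lag, round(lagged_spearman(temperature, mosquitoes, lag), 3))
```

Scanning this over a range of lags, and over averaging windows, yields the map from which the most informative lag for each weather variable can be read off.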
Analysis of near infrared spectra for age-grading of wild populations of Anopheles gambiae.
Krajacich, Benjamin J; Meyers, Jacob I; Alout, Haoues; Dabiré, Roch K; Dowell, Floyd E; Foy, Brian D
2017-11-07
Understanding the age-structure of mosquito populations, especially malaria vectors such as Anopheles gambiae, is important for assessing the risk of infectious mosquitoes, and how vector control interventions may impact this risk. The use of near-infrared spectroscopy (NIRS) for age-grading has been demonstrated previously on laboratory and semi-field mosquitoes, but to date has not been utilized on wild-caught mosquitoes whose age is externally validated via parity status or parasite infection stage. In this study, we developed regression and classification models using NIRS on datasets of wild An. gambiae (s.l.) reared from larvae collected from the field in Burkina Faso, and two laboratory strains. We compared the accuracy of these models for predicting the ages of wild-caught mosquitoes that had been scored for their parity status as well as for positivity for Plasmodium sporozoites. Regression models utilizing variable selection increased predictive accuracy over the more common full-spectrum partial least squares (PLS) approach for cross-validation of the datasets, validation, and independent test sets. Models produced from datasets that included the greatest range of mosquito samples (i.e. different sampling locations and times) had the highest predictive accuracy on independent testing sets, though overall accuracy on these samples was low. For classification, we found that intramodel accuracy ranged between 73.5-97.0% for grouping of mosquitoes into "early" and "late" age classes, with the highest prediction accuracy found in laboratory colonized mosquitoes. However, this accuracy was decreased on test sets, with the highest classification of an independent set of wild-caught larvae reared to set ages being 69.6%. Variation in NIRS data, likely from dietary, genetic, and other factors limits the accuracy of this technique with wild-caught mosquitoes. 
Alternative algorithms may help improve prediction accuracy, but care should be taken to either maximize variety in models or minimize confounders.
Cui, Zaixu; Gong, Gaolang
2018-06-02
Individualized behavioral/cognitive prediction using machine learning (ML) regression approaches is becoming increasingly applied. The specific ML regression algorithm and sample size are two key factors that non-trivially influence prediction accuracies. However, the effects of the ML regression algorithm and sample size on individualized behavioral/cognitive prediction performance have not been comprehensively assessed. To address this issue, the present study included six commonly used ML regression algorithms: ordinary least squares (OLS) regression, least absolute shrinkage and selection operator (LASSO) regression, ridge regression, elastic-net regression, linear support vector regression (LSVR), and relevance vector regression (RVR), to perform specific behavioral/cognitive predictions based on different sample sizes. Specifically, the publicly available resting-state functional MRI (rs-fMRI) dataset from the Human Connectome Project (HCP) was used, and whole-brain resting-state functional connectivity (rsFC) or rsFC strength (rsFCS) were extracted as prediction features. Twenty-five sample sizes (ranged from 20 to 700) were studied by sub-sampling from the entire HCP cohort. The analyses showed that rsFC-based LASSO regression performed remarkably worse than the other algorithms, and rsFCS-based OLS regression performed markedly worse than the other algorithms. Regardless of the algorithm and feature type, both the prediction accuracy and its stability exponentially increased with increasing sample size. The specific patterns of the observed algorithm and sample size effects were well replicated in the prediction using re-testing fMRI data, data processed by different imaging preprocessing schemes, and different behavioral/cognitive scores, thus indicating excellent robustness/generalization of the effects. 
The current findings provide critical insight into how the selected ML regression algorithm and sample size influence individualized predictions of behavior/cognition and offer important guidance for choosing the ML regression algorithm or sample size in relevant investigations. Copyright © 2018 Elsevier Inc. All rights reserved.
Liu, Shu-Shen; Qin, Li-Tang; Liu, Hai-Ling; Yin, Da-Qiang
2008-02-01
Molecular electronegativity distance vector (MEDV) derived directly from the molecular topological structures was used to describe the structures of 122 nonionic organic compounds (NOCs), and a quantitative relationship between the MEDV descriptors and the bioconcentration factors (BCF) of NOCs in fish was developed using the variable selection and modeling based on prediction (VSMP). It was found that some main structural factors influencing the BCFs of NOCs are the substructures expressed by four atomic types of nos. 2, 3, 5, and 13, i.e., atom groups -CH2- or =CH-, -CH< or =C<, -NH2, and -Cl or -Br where the former two groups exist in the molecular skeleton of NOC and the latter three groups are related closely to the substituting groups on a benzene ring. The best 5-variable model, with a correlation coefficient (r²) of 0.9500 and a leave-one-out cross-validation correlation coefficient (q²) of 0.9428, was built by multiple linear regression and shows good estimation ability and stability. Predictive power for external samples was tested by the model from the training set of 80 NOCs, and the predictive correlation coefficient (u²) for the 42 external samples in the test set was 0.9028.
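The leave-one-out cross-validation coefficient reported above is commonly defined as q² = 1 − PRESS / SS_total, where PRESS sums the squared errors of predictions made for each compound by a model fitted without it. The sketch below uses that common definition with a single descriptor for brevity (the paper's model has five); the exact definition used by the authors and all data values here are assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def loo_q2(x, y):
    """Leave-one-out q^2 = 1 - PRESS / SS_total for a one-descriptor model."""
    press = 0.0
    for i in range(len(x)):
        # Refit without sample i, then predict the held-out sample.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (slope * x[i] + intercept)) ** 2
    mean_y = sum(y) / len(y)
    ss_total = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - press / ss_total


# Hypothetical descriptor values and log BCF values for six compounds.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
log_bcf = [1.2, 1.9, 3.1, 3.8, 5.2, 5.9]

print(round(loo_q2(descriptor, log_bcf), 4))
```

Because every prediction in PRESS comes from a model that never saw the compound, q² is at most the fitted r² and falls well below it when a model overfits.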
Wan, Jian; Chen, Yi-Chieh; Morris, A Julian; Thennadil, Suresh N
2017-07-01
Near-infrared (NIR) spectroscopy is widely used in fields ranging from pharmaceutics to the food industry for analyzing chemical and physical properties of the substances concerned. Its advantages over other analytical techniques include available physical interpretation of spectral data, nondestructive nature and high speed of measurements, and little or no need for sample preparation. The successful application of NIR spectroscopy relies on three main aspects: pre-processing of spectral data to eliminate nonlinear variations due to temperature, light scattering effects and many others; selection of those wavelengths that contribute useful information; and identification of suitable calibration models using linear/nonlinear regression. Several methods have been developed for each of these three aspects, and many comparative studies of different methods exist for an individual aspect or some combinations. However, there is still a lack of comparative studies of the interactions among these three aspects, which can shed light on what role each aspect plays in the calibration and how to combine various methods of each aspect to obtain the best calibration model. This paper aims to provide such a comparative study based on four benchmark data sets using three typical pre-processing methods, namely, orthogonal signal correction (OSC), extended multiplicative signal correction (EMSC) and optical path-length estimation and correction (OPLEC); two existing wavelength selection methods, namely, stepwise forward selection (SFS) and genetic algorithm optimization combined with partial least squares regression for spectral data (GAPLSSP); and four popular regression methods, namely, partial least squares (PLS), least absolute shrinkage and selection operator (LASSO), least squares support vector machine (LS-SVM), and Gaussian process regression (GPR).
The comparative study indicates that, in general, pre-processing of spectral data can play a significant role in the calibration while wavelength selection plays a marginal role and the combination of certain pre-processing, wavelength selection, and nonlinear regression methods can achieve superior performance over traditional linear regression-based calibration.
Vectorized Jiles-Atherton hysteresis model
NASA Astrophysics Data System (ADS)
Szymański, Grzegorz; Waszak, Michał
2004-01-01
This paper deals with vector hysteresis modeling. A vector model consisting of individual Jiles-Atherton components placed along principal axes is proposed. The cross-axis coupling ensures general vector model properties. Minor loops are obtained using scaling method. The model is intended for efficient finite element method computations defined in terms of magnetic vector potential. Numerical efficiency is ensured by differential susceptibility approach.
Quantitative prediction of ionization effect on human skin permeability.
Baba, Hiromi; Ueno, Yusuke; Hashida, Mitsuru; Yamashita, Fumiyoshi
2017-04-30
Although skin permeability of an active ingredient can be severely affected by its ionization in a dose solution, most of the existing prediction models cannot predict such impacts. To provide reliable predictors, we curated a novel large dataset of in vitro human skin permeability coefficients for 322 entries comprising chemically diverse permeants whose ionization fractions can be calculated. Subsequently, we generated thousands of computational descriptors, including LogD (octanol-water distribution coefficient at a specific pH), and analyzed the dataset using nonlinear support vector regression (SVR) and Gaussian process regression (GPR) combined with greedy descriptor selection. The SVR model was slightly superior to the GPR model, with externally validated squared correlation coefficient, root mean square error, and mean absolute error values of 0.94, 0.29, and 0.21, respectively. These models indicate that Log D is effective for a comprehensive prediction of ionization effects on skin permeability. In addition, the proposed models satisfied the statistical criteria endorsed in recent model validation studies. These models can evaluate virtually generated compounds at any pH; therefore, they can be used for high-throughput evaluations of numerous active ingredients and optimization of their skin permeability with respect to permeant ionization. Copyright © 2017 Elsevier B.V. All rights reserved.
Prediction of brain maturity in infants using machine-learning algorithms
Smyser, Christopher D.; Dosenbach, Nico U.F.; Smyser, Tara A.; Snyder, Abraham Z.; Rogers, Cynthia E.; Inder, Terrie E.; Schlaggar, Bradley L.; Neil, Jeffrey J.
2016-01-01
Recent resting-state functional MRI investigations have demonstrated that much of the large-scale functional network architecture supporting motor, sensory and cognitive functions in older pediatric and adult populations is present in term- and prematurely-born infants. Application of new analytical approaches can help translate the improved understanding of early functional connectivity provided through these studies into predictive models of neurodevelopmental outcome. One approach to achieving this goal is multivariate pattern analysis, a machine-learning, pattern classification approach well-suited for high-dimensional neuroimaging data. It has previously been adapted to predict brain maturity in children and adolescents using structural and resting state-functional MRI data. In this study, we evaluated resting state-functional MRI data from 50 preterm-born infants (born at 23–29 weeks of gestation and without moderate–severe brain injury) scanned at term equivalent postmenstrual age compared with data from 50 term-born control infants studied within the first week of life. Using 214 regions of interest, binary support vector machines distinguished term from preterm infants with 84% accuracy (p < 0.0001). Inter- and intra-hemispheric connections throughout the brain were important for group categorization, indicating that widespread changes in the brain's functional network architecture associated with preterm birth are detectable by term equivalent age. Support vector regression enabled quantitative estimation of birth gestational age in single subjects using only term equivalent resting state-functional MRI data, indicating that the present approach is sensitive to the degree of disruption of brain development associated with preterm birth (using gestational age as a surrogate for the extent of disruption). This suggests that support vector regression may provide a means for predicting neurodevelopmental outcome in individual infants. PMID:27179605
Schmitz, M; Graf, C; Gut, T; Sirena, D; Peter, I; Dummer, R; Greber, U F; Hemmi, S
2006-06-01
Replicating adenovirus (Ad) vectors with tumour tissue specificity hold great promise for treatment of cancer. We have recently constructed a conditionally replicating Ad5 AdDeltaEP-TETP inducing tumour regression in a xenograft mouse model. For further improvement of this vector, we introduced four genetic modifications and analysed the viral cytotoxicity in a large panel of melanoma cell lines and patient-derived melanoma cells. (1) The antiapoptotic gene E1B-19 kDa (Delta19 mutant) was deleted increasing the cytolytic activity in 18 of 21 melanoma cells. (2) Introduction of the E1A 122-129 deletion (Delta24 mutant), suggested to attenuate viral replication in cell cycle-arrested cells, did not abrogate this activity and increased the cytolytic activity in two of 21 melanoma cells. (3) We inserted an RGD sequence into the fiber to extend viral tropism to alphav integrin-expressing cells, and (4) swapped the fiber with the Ad35 fiber (F35) enhancing the tropism to malignant melanoma cells expressing CD46. The RGD-fiber modification strongly increased cytolysis in all of the 11 CAR-low melanoma cells. The F35 fiber-chimeric vector boosted the cytotoxicity in nine of 11 cells. Our results show that rational engineering additively enhances the cytolytic potential of Ad vectors, a prerequisite for the development of patient-customized viral therapies.
Rhabdoviruses as vaccine platforms for infectious disease and cancer.
Zemp, Franz; Rajwani, Jahanara; Mahoney, Douglas J
2018-05-21
The family Rhabdoviridae (RV) comprises a large, genetically diverse collection of single-stranded, negative sense RNA viruses from the order Mononegavirales. Several RV members are being developed as live-attenuated vaccine vectors for the prevention or treatment of infectious disease and cancer. These include the prototype recombinant Vesicular Stomatitis Virus (rVSV) and the more recently developed recombinant Maraba Virus, both species within the genus Vesiculoviridae. A relatively strong safety profile in humans, robust immunogenicity and genetic malleability are key features that make the RV family attractive vaccine platforms. Currently, the rVSV vector is in preclinical development for vaccination against numerous high-priority infectious diseases, with clinical evaluation underway for HIV/AIDS and Ebola virus disease. Indeed, the success of the rVSV-ZEBOV vaccine during the 2014-15 Ebola virus outbreak in West Africa highlights the therapeutic potential of rVSV as a vaccine vector for acute, life-threatening viral illnesses. The rVSV and rMaraba platforms are also being tested as 'oncolytic' cancer vaccines in a series of phase 1-2 clinical trials, after being proven effective at eliciting immune-mediated tumour regression in preclinical mouse models. In this review, we discuss the biological and genetic features that make RVs attractive vaccine platforms and the development and ongoing testing of rVSV and rMaraba strains as vaccine vectors for infectious disease and cancer.
NASA Astrophysics Data System (ADS)
Nieto, Paulino José García; García-Gonzalo, Esperanza; Vilán, José Antonio Vilán; Robleda, Abraham Segade
2015-12-01
The main aim of this research work is to build a new practical hybrid regression model to predict the milling tool wear in a regular cut as well as entry cut and exit cut of a milling tool. The model was based on Particle Swarm Optimization (PSO) in combination with support vector machines (SVMs). This optimization mechanism involved kernel parameter setting in the SVM training procedure, which significantly influences the regression accuracy. Bearing this in mind, a PSO-SVM-based model, which is based on statistical learning theory, was successfully used here to predict the milling tool flank wear (output variable) as a function of the following input variables: the time duration of experiment, depth of cut, feed, type of material, etc. To accomplish the objective of this study, the experimental dataset comprises runs on a milling machine under various operating conditions. In this way, data sampled by three different types of sensors (acoustic emission sensor, vibration sensor and current sensor) were acquired at several positions. A second aim is to determine the factors with the greatest bearing on the milling tool flank wear with a view to proposing improvements to the milling machine. Firstly, this hybrid PSO-SVM-based regression model captures the main idea of statistical learning theory in order to obtain a good prediction of the dependence between the flank wear (output variable) and the input variables (time, depth of cut, feed, etc.). Indeed, regression with optimal hyperparameters was performed and a determination coefficient of 0.95 was obtained. The agreement of this model with experimental data confirmed its good performance. Secondly, the main advantages of this PSO-SVM-based model are its capacity to produce a simple, easy-to-interpret model, its ability to estimate the contributions of the input variables, and its computational efficiency. Finally, the main conclusions of this study are presented.
Feature selection using probabilistic prediction of support vector regression.
Yang, Jian-Bo; Ong, Chong-Jin
2011-06-01
This paper presents a new wrapper-based feature selection method for support vector regression (SVR) using its probabilistic predictions. The method computes the importance of a feature by aggregating the difference, over the feature space, of the conditional density functions of the SVR prediction with and without the feature. As the exact computation of this importance measure is expensive, two approximations are proposed. The effectiveness of the measure using these approximations, in comparison to several other existing feature selection methods for SVR, is evaluated on both artificial and real-world problems. The results of the experiments show that the proposed method generally performs better than, or at least as well as, the existing methods, with a notable advantage when the dataset is sparse.
NASA Astrophysics Data System (ADS)
Xian, Guangming
2018-03-01
In this paper, the vibration flow field parameters of polymer melts in a visual slit die are optimized using an intelligent algorithm. Experimental small angle light scattering (SALS) patterns are shown to characterize the processing. In order to capture the scattered light, a polarizer and an analyzer are placed before and after the polymer melts. The results reported in this study are obtained using high-density polyethylene (HDPE) with a rotation speed of 28 rpm. In addition, the support vector regression (SVR) analytical method is introduced for optimizing the parameters of the vibration flow field. This work establishes the general applicability of SVR for predicting the optimal parameters of the vibration flow field.
Predicting Market Impact Costs Using Nonparametric Machine Learning Models
Park, Saerom; Lee, Jaewook; Son, Youngdoo
2016-01-01
Market impact cost is the most significant portion of implicit transaction costs, and reducing it can lower the overall transaction cost, although it cannot be measured directly. In this paper, we employed state-of-the-art nonparametric machine learning models (neural networks, Bayesian neural networks, Gaussian processes, and support vector regression) to predict market impact cost accurately and to provide a predictive model that is versatile in the number of variables. We collected a large amount of real single transaction data of the US stock market from Bloomberg Terminal and generated three independent input variables. As a result, most nonparametric machine learning models outperformed a state-of-the-art benchmark parametric model, the I-star model, in four error measures. Although these models encounter certain difficulties in separating the permanent and temporary cost directly, nonparametric machine learning models can be good alternatives for reducing transaction costs by considerably improving prediction performance. PMID:26926235
Multiple linear regression analysis
NASA Technical Reports Server (NTRS)
Edwards, T. R.
1980-01-01
Program rapidly selects best-suited set of coefficients. User supplies only vectors of independent and dependent data and specifies confidence level required. Program uses stepwise statistical procedure for relating minimal set of variables to set of observations; final regression contains only most statistically significant coefficients. Program is written in FORTRAN IV for batch execution and has been implemented on NOVA 1200.
NASA Astrophysics Data System (ADS)
Zhang, Changjiang; Dai, Lijie; Ma, Leiming; Qian, Jinfang; Yang, Bo
2017-10-01
An objective technique is presented for estimating tropical cyclone (TC) inner-core two-dimensional (2-D) surface wind field structure using infrared satellite imagery and machine learning. For a TC with an eye, the eye contour is first segmented by a geodesic active contour model, based on which the eye circumference is obtained as the TC eye size. A mathematical model is then established between the eye size and the radius of maximum wind obtained from the past official TC report to derive the 2-D surface wind field within the TC eye. Meanwhile, the composite information about the latitude of the TC center, surface maximum wind speed, TC age, and critical wind radii of 34- and 50-kt winds can be combined to build another mathematical model for deriving the inner-core wind structure. After that, least squares support vector machine (LSSVM), radial basis function neural network (RBFNN), and linear regression are introduced, respectively, in the two mathematical models, which are then tested with sensitivity experiments on real TC cases. Verification shows that the inner-core 2-D surface wind field structure estimated by LSSVM is better than that of RBFNN and linear regression.
Multidirectional Scanning Model, MUSCLE, to Vectorize Raster Images with Straight Lines
Karas, Ismail Rakip; Bayram, Bulent; Batuk, Fatmagul; Akay, Abdullah Emin; Baz, Ibrahim
2008-01-01
This paper presents a new model, MUSCLE (Multidirectional Scanning for Line Extraction), for automatic vectorization of raster images with straight lines. The algorithm of the model implements the line thinning and the simple neighborhood methods to perform vectorization. The model allows users to define specified criteria which are crucial for the vectorization process. In this model, various raster images can be vectorized such as township plans, maps, architectural drawings, and machine plans. The algorithm of the model was implemented as a computer program and tested on a basic application. Results, verified by using two well-known vectorization programs (WinTopo and Scan2CAD), indicated that the model can successfully vectorize the specified raster data quickly and accurately. PMID:27879843
Multidirectional Scanning Model, MUSCLE, to Vectorize Raster Images with Straight Lines.
Karas, Ismail Rakip; Bayram, Bulent; Batuk, Fatmagul; Akay, Abdullah Emin; Baz, Ibrahim
2008-04-15
This paper presents a new model, MUSCLE (Multidirectional Scanning for Line Extraction), for automatic vectorization of raster images with straight lines. The algorithm of the model implements the line thinning and the simple neighborhood methods to perform vectorization. The model allows users to define specified criteria which are crucial for the vectorization process. In this model, various raster images can be vectorized such as township plans, maps, architectural drawings, and machine plans. The algorithm of the model was implemented as a computer program and tested on a basic application. Results, verified by using two well-known vectorization programs (WinTopo and Scan2CAD), indicated that the model can successfully vectorize the specified raster data quickly and accurately.
Hadisoemarto, Panji Fortuna; Castro, Marcia C.
2013-01-01
Background: All four serotypes of dengue virus are endemic in Indonesia, where the population at risk for infection exceeds 200 million people. Despite continuous control efforts that were initiated more than four decades ago, Indonesia still suffers from multi-annual cycles of dengue outbreak and dengue remains a major public health problem. Dengue vaccines have been viewed as a promising solution for controlling dengue in Indonesia, but thus far their potential acceptability has not been assessed. Methodology/Principal Findings: We conducted a household survey in the city of Bandung, Indonesia by administering a questionnaire to examine (i) acceptance of a hypothetical pediatric dengue vaccine; (ii) participants' willingness-to-pay (WTP) for the vaccine, had it not been provided for free; and (iii) whether people think vector control would be unnecessary if the vaccine was available. A proportional odds model and an interval regression model were employed to identify determinants of acceptance and WTP, respectively. We demonstrated that out of 500 heads of household interviewed, 94.2% would agree to vaccinate their children with the vaccine. Of all participants, 94.6% were willing to pay for the vaccine with a median WTP of US$1.94. In addition, 7.2% stated that vector control would not be necessary had there been a dengue vaccination program. Conclusions/Significance: Our results suggest that future dengue vaccines can have a very high uptake even when delivered through the private market. This, however, can be influenced by vaccine characteristics and price. In addition, reduction in community vector control efforts may be observed following vaccine introduction but its potential impact on the transmission of dengue and other vector-borne diseases requires further study. PMID:24069482
Tsafa, Effrosyni; Al-Bahrani, Mariam; Bentayebi, Kaoutar; Przystal, Justyna; Suwan, Keittisak; Hajitou, Amin
2016-08-09
Gene therapy has long been regarded as a promising treatment for cancer. However, cancer gene therapy is still facing the challenge of targeting gene delivery vectors specifically to tumors when administered via clinically acceptable non-invasive systemic routes (i.e. intravenous). The bacterial virus, bacteriophage (phage), represents a new generation of promising vectors in systemic gene delivery since their targeting can be achieved through phage capsid display ligands, which enable them to home to specific tumor receptors without the need to ablate any native eukaryotic tropism. We have previously reported a tumor-specific bacteriophage vector named adeno-associated virus/phage, or AAVP, in which gene expression is under a recombinant human rAAV2 virus genome targeted to tumors via a ligand-directed phage capsid. However, cancer gene therapy with this tumor-targeted vector achieved variable outcomes ranging from tumor regression to no effect in both experimental and natural preclinical models. Herein, we hypothesized that combining AAVP with the natural dietary compound genistein, which has proven anticancer activity, would improve safe bacteriophage-based anticancer therapy. We show that combination treatment with genistein and AAVP increased targeted cancer cell killing by AAVP carrying the gene for Herpes simplex virus thymidine kinase (HSVtk) in 2D tissue cultures and 3D tumor spheroids. We found this increased tumor cell killing was associated with enhanced AAVP-mediated gene expression. Next, we established that genistein protects AAVP against proteasome degradation and enhances vector genome accumulation in the nucleus. Combination of genistein and phage-guided virotherapy is a safe and promising strategy that should be considered in anticancer therapy with AAVP.
Deeb, Omar; Shaik, Basheerulla; Agrawal, Vijay K
2014-10-01
Quantitative Structure-Activity Relationship (QSAR) models for binding affinity constants (log Ki) of 78 flavonoid ligands towards the benzodiazepine site of the GABA(A) receptor complex were calculated using the machine learning methods artificial neural network (ANN) and support vector machine (SVM). The models obtained were compared with those obtained using multiple linear regression (MLR) analysis. The descriptor selection and model building were performed with 10-fold cross-validation using the training data set. The SVM and MLR coefficient of determination values are 0.944 and 0.879, respectively, for the training set and are higher than those of ANN models. Though the SVM model shows improved training set fitting, the ANN model was superior to SVM and MLR in predicting the test set. A randomization test is employed to check the suitability of the models.
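The two evaluation ingredients named in the abstract above, 10-fold cross-validation and the coefficient of determination, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `k_fold_indices` and `r_squared` are hypothetical, the folds are contiguous rather than shuffled, and the sample count of 78 is the total number of ligands (the abstract does not state the training-set size).

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train, validation) index lists for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative only: 78 is the total ligand count quoted in the abstract.
folds = list(k_fold_indices(78, k=10))
print([len(validation) for _, validation in folds])
```

A model that fits the training folds well can still score a lower R² on held-out data, which is the pattern the abstract reports for SVM versus ANN.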
An ultra low power feature extraction and classification system for wearable seizure detection.
Page, Adam; Pramod Tim Oates, Siddharth; Mohsenin, Tinoosh
2015-01-01
In this paper we explore the use of a variety of machine learning algorithms for designing a reliable and low-power, multi-channel EEG feature extractor and classifier for predicting seizures from electroencephalographic data (scalp EEG). Different machine learning classifiers including k-nearest neighbor, support vector machines, naïve Bayes, logistic regression, and neural networks are explored with the goal of maximizing detection accuracy while minimizing power, area, and latency. The input to each machine learning classifier is a 198-feature vector containing 9 features for each of the 22 EEG channels obtained over 1-second windows. All classifiers were able to obtain F1 scores over 80% and onset sensitivity of 100% when tested on 10 patients. Among the five classifiers that were explored, logistic regression (LR) proved to have minimum hardware complexity while providing an average F1 score of 91%. Both ASIC and FPGA implementations of logistic regression are presented and show the smallest area, the lowest power consumption, and the lowest latency when compared to the previous work.
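The input layout and the F1 metric mentioned above can be illustrated with a short sketch. It assumes only what the abstract states (22 channels, 9 features per channel, one vector per 1-second window). The function names and the all-zero placeholder features are hypothetical, and the abstract does not say which 9 features are used.

```python
def build_feature_vector(channel_features):
    """Flatten per-channel feature lists into one classifier input vector."""
    vector = []
    for features in channel_features:
        vector.extend(features)
    return vector


def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder window: 22 channels x 9 features (values are made up).
window = [[0.0] * 9 for _ in range(22)]
x = build_feature_vector(window)
print(len(x))  # 22 * 9 = 198
```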
Investigation on the effect of diaphragm on the combustion characteristics of solid-fuel ramjet
NASA Astrophysics Data System (ADS)
Gong, Lunkun; Chen, Xiong; Yang, Haitao; Li, Weixuan; Zhou, Changsheng
2017-10-01
The flow field characteristics and the regression rate distribution of a solid-fuel ramjet with a three-hole diaphragm were investigated by numerical and experimental methods. The experimental data were obtained by burning high-density polyethylene using a connected-pipe facility to validate the numerical model and analyze the combustion efficiency of the solid-fuel ramjet. The three-dimensional code developed in the present study adopted third-order MUSCL and central difference schemes, the AUSMPW+ flux vector splitting method, and a second-order moment turbulence-chemistry model, together with the k-ω shear stress transport (SST) turbulence model. The solid fuel surface temperature was calculated with a fluid-solid heat coupling method. The numerical results show that strong circumferential flow exists in the region upstream of the diaphragm. The diaphragm can significantly enhance the regression rate of the solid fuel in the region downstream of the diaphragm, which mainly results from the increase of turbulent viscosity. As the diaphragm port area decreases, the regression rate of the solid fuel downstream of the diaphragm increases. The diaphragm can result in more sufficient mixing between the incoming air and fuel pyrolysis gases, while inevitably producing some pressure loss. The experimental results indicate that the effect of the diaphragm on the combustion efficiency of hydrocarbon fuels is slightly negative. It is conjectured that the diaphragm may have some positive effects on the combustion efficiency of the solid fuel with metal particles.
The measurement of linear frequency drift in oscillators
NASA Astrophysics Data System (ADS)
Barnes, J. A.
1985-04-01
A linear drift in frequency is an important element in most stochastic models of oscillator performance. Quartz crystal oscillators often have drifts in excess of a part in ten to the tenth power per day. Even commercial cesium beam devices often show drifts of a few parts in ten to the thirteenth per year. There are many ways to estimate the drift rates from data samples (e.g., regress the phase on a quadratic; regress the frequency on a line; compute the simple mean of the first difference of frequency; use Kalman filters with a drift term as one element in the state vector; and others). Although most of these estimators are unbiased, they vary in efficiency (i.e., confidence intervals). Further, the estimation of confidence intervals using the standard analysis of variance (typically associated with the specific estimating technique) can give amazingly optimistic results. The source of these problems is not an error in, say, the regression techniques; rather, the problems arise from correlations within the residuals. That is, the oscillator model is often not consistent with constraints on the analysis technique or, in other words, some specific analysis techniques are often inappropriate for the task at hand. The appropriateness of a specific analysis technique is critically dependent on the oscillator model and can often be checked with a simple whiteness test on the residuals.
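Two of the estimators listed above, and a crude residual check, can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's procedure: the synthetic data are a linear drift plus white noise, the function names are hypothetical, and the lag-1 autocorrelation is only a stand-in for the "simple whiteness test" the abstract mentions without specifying.

```python
import random


def drift_by_regression(freq, tau=1.0):
    """Least-squares slope of frequency versus time; also returns residuals."""
    n = len(freq)
    times = [i * tau for i in range(n)]
    t_mean = sum(times) / n
    f_mean = sum(freq) / n
    sxy = sum((t - t_mean) * (f - f_mean) for t, f in zip(times, freq))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    intercept = f_mean - slope * t_mean
    residuals = [f - (intercept + slope * t) for t, f in zip(times, freq)]
    return slope, residuals


def drift_by_first_difference(freq, tau=1.0):
    """Simple mean of the first differences of frequency."""
    diffs = [(b - a) / tau for a, b in zip(freq, freq[1:])]
    return sum(diffs) / len(diffs)


def lag1_autocorrelation(residuals):
    """Lag-1 autocorrelation; near zero when residuals look white."""
    mean = sum(residuals) / len(residuals)
    num = sum((a - mean) * (b - mean)
              for a, b in zip(residuals, residuals[1:]))
    den = sum((r - mean) ** 2 for r in residuals)
    return num / den


# Synthetic example (arbitrary units): drift of 0.5 per sample plus white noise.
random.seed(0)
freq = [10.0 + 0.5 * i + random.gauss(0.0, 1.0) for i in range(200)]
slope, residuals = drift_by_regression(freq)
print("regression drift:", slope)
print("first-difference drift:", drift_by_first_difference(freq))
print("lag-1 autocorrelation of residuals:", lag1_autocorrelation(residuals))
```

The mean of first differences reduces to (last sample minus first sample) divided by the elapsed time, so it uses only the endpoints. That is one way to see why estimators that are all unbiased can still differ widely in efficiency.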
NASA Astrophysics Data System (ADS)
Salmon, B. P.; Kleynhans, W.; Olivier, J. C.; van den Bergh, F.; Wessels, K. J.
2018-05-01
Humans are transforming land cover at an ever-increasing rate. Accurate geographical maps of land cover, especially rural and urban settlements, are essential to planning sustainable development. Time series extracted from MODerate resolution Imaging Spectroradiometer (MODIS) land surface reflectance products have been used to differentiate land cover classes by analyzing the seasonal patterns in reflectance values. The proper fitting of a parametric model to these time series usually requires several adjustments to the regression method. To reduce the workload, the parameters of the regression method are typically set globally for a geographical area. In this work we have modified a meta-optimization approach to configure the regression method so that the parameters are extracted on a per-time-series basis. The standard deviation of the model parameters and the magnitude of the residuals are used as the scoring function. We successfully fitted a triply modulated model to the seasonal patterns of our study area using a non-linear extended Kalman filter (EKF). The approach uses temporal information, which significantly reduces the processing time and storage requirements to process each time series. It also derives reliability metrics for each time series individually. The features extracted using the proposed method are classified with a support vector machine and the performance of the method is compared to the original approach on our ground truth data.
Fast metabolite identification with Input Output Kernel Regression.
Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho
2016-06-15
An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting of mapping back the predicted output feature vectors to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Contact: celine.brouard@aalto.fi. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Fast metabolite identification with Input Output Kernel Regression
Brouard, Céline; Shen, Huibin; Dührkop, Kai; d'Alché-Buc, Florence; Böcker, Sebastian; Rousu, Juho
2016-01-01
Motivation: An important problem in metabolomics is to identify metabolites using tandem mass spectrometry data. Machine learning methods have been proposed recently to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to vector output space and can handle structured output space such as the molecule space. Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. This method approximates the spectra-molecule mapping in two phases. The first phase corresponds to a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, consisting of mapping back the predicted output feature vectors to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method has the advantage of decreasing the running times for the training step and the test step by several orders of magnitude over the preceding methods. Availability and implementation: Contact: celine.brouard@aalto.fi Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27307628
Spatial Autocorrelation Approaches to Testing Residuals from Least Squares Regression
Chen, Yanguang
2016-01-01
In geo-statistics, the Durbin-Watson test is frequently employed to detect the presence of residual serial correlation from least squares regression analyses. However, the Durbin-Watson statistic is only suitable for ordered time or spatial series. If the variables comprise cross-sectional data coming from spatial random sampling, the test will be ineffectual because the value of the Durbin-Watson statistic depends on the sequence of data points. This paper develops two new statistics for testing serial correlation of residuals from least squares regression based on spatial samples. By analogy with the new form of Moran’s index, an autocorrelation coefficient is defined with a standardized residual vector and a normalized spatial weight matrix. Then by analogy with the Durbin-Watson statistic, two types of new serial correlation indices are constructed. As a case study, the two newly presented statistics are applied to a spatial sample of 29 regions of China. These results show that the new spatial autocorrelation models can be used to test the serial correlation of residuals from regression analysis. In practice, the new statistics can make up for the deficiencies of the Durbin-Watson test. PMID:26800271
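The order dependence described above is easy to see numerically. The sketch below computes the standard Durbin-Watson statistic and, as an assumption, a Moran-style coefficient of the form zᵀWz built from a standardized residual vector z and a spatial weight matrix W normalized to sum to one. The paper's two new serial correlation indices are not reproduced here, and the function names and toy chain-adjacency weights are hypothetical.

```python
import math


def durbin_watson(residuals):
    """Sum of squared successive differences over sum of squared residuals."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(r ** 2 for r in residuals)
    return num / den


def standardize(values):
    """Z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def normalize_weights(weights):
    """Scale a spatial weight matrix so that all entries sum to one."""
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def residual_moran_index(residuals, weights):
    """Assumed Moran-style coefficient: z^T W z."""
    z = standardize(residuals)
    w = normalize_weights(weights)
    n = len(z)
    return sum(z[i] * w[i][j] * z[j] for i in range(n) for j in range(n))


# Same residual values, two different orderings of the data points.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
print(durbin_watson([1.0, 1.0, -1.0, -1.0]))  # 1.0

# Toy weights: four locations in a chain, neighbours share a border.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(residual_moran_index([1.0, -1.0, 1.0, -1.0], chain))
```

Because the weight matrix encodes which locations are neighbours, relabelling the points (permuting the residuals and the rows and columns of W together) leaves the Moran-style value unchanged, whereas the Durbin-Watson value changes with the listing order.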
Wang, Ying; Goh, Joshua O; Resnick, Susan M; Davatzikos, Christos
2013-01-01
In this study, we used high-dimensional pattern regression methods based on structural (gray and white matter; GM and WM) and functional (positron emission tomography of regional cerebral blood flow; PET) brain data to identify cross-sectional imaging biomarkers of cognitive performance in cognitively normal older adults from the Baltimore Longitudinal Study of Aging (BLSA). We focused on specific components of executive and memory domains known to decline with aging, including manipulation, semantic retrieval, long-term memory (LTM), and short-term memory (STM). For each imaging modality, brain regions associated with each cognitive domain were generated by adaptive regional clustering. A relevance vector machine was adopted to model the nonlinear continuous relationship between brain regions and cognitive performance, with cross-validation to select the most informative brain regions (using recursive feature elimination) as imaging biomarkers and optimize model parameters. Predicted cognitive scores using our regression algorithm based on the resulting brain regions correlated well with actual performance. Also, regression models obtained using combined GM, WM, and PET imaging modalities outperformed models based on single modalities. Imaging biomarkers related to memory performance included the orbito-frontal and medial temporal cortical regions with LTM showing stronger correlation with the temporal lobe than STM. Brain regions predicting executive performance included orbito-frontal, and occipito-temporal areas. The PET modality had higher contribution to most cognitive domains except manipulation, which had higher WM contribution from the superior longitudinal fasciculus and the genu of the corpus callosum. These findings based on machine-learning methods demonstrate the importance of combining structural and functional imaging data in understanding complex cognitive mechanisms and also their potential usage as biomarkers that predict cognitive status.
Potta, Thrimoorthy; Zhen, Zhuo; Grandhi, Taraka Sai Pavan; Christensen, Matthew D.; Ramos, James; Breneman, Curt M.; Rege, Kaushal
2014-01-01
We describe the combinatorial synthesis and cheminformatics modeling of aminoglycoside antibiotics-derived polymers for transgene delivery and expression. Fifty-six polymers were synthesized by polymerizing aminoglycosides with diglycidyl ether cross-linkers. Parallel screening identified several lead polymers that produced high transgene expression levels in cells. The role of polymer physicochemical properties in determining efficacy of transgene expression was investigated using Quantitative Structure-Activity Relationship (QSAR) cheminformatics models based on Support Vector Regression (SVR) and ‘building block’ polymer structures. The QSAR model exhibited high predictive ability, and investigation of descriptors in the model, using molecular visualization and correlation plots, indicated that physicochemical attributes related to both aminoglycosides and diglycidyl ethers facilitated transgene expression. This work synergistically combines combinatorial synthesis and parallel screening with cheminformatics-based QSAR models for discovery and physicochemical elucidation of effective antibiotics-derived polymers for transgene delivery in medicine and biotechnology. PMID:24331709
Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-12-01
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
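The Mahalanobis-distance idea mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses two hypothetical descriptors per compound and toy training data, and the function name `mahalanobis_2d` is invented here; it is not the authors' implementation or their descriptor set.

```python
import math

def mahalanobis_2d(point, training):
    """Mahalanobis distance of a 2-D point from the mean of 2-D training data.

    Hypothetical helper for illustration: a large distance suggests the query
    lies outside the region covered by the training set.
    """
    n = len(training)
    mx = sum(p[0] for p in training) / n
    my = sum(p[1] for p in training) / n
    # Sample covariance matrix entries (n - 1 denominator).
    sxx = sum((p[0] - mx) ** 2 for p in training) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in training) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in training) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form d^T S^{-1} d using the closed-form 2x2 inverse.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Toy training descriptors (assumed values, not data from the study).
training = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 1.0)]
near = mahalanobis_2d((1.0, 1.0), training)
far = mahalanobis_2d((4.0, 1.0), training)
print(near, far)
```

A query compound with a large distance would be flagged as lying further from the training data, which is one way such a distance can serve as a proxy for prediction uncertainty.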
Schroeter, Timon Sebastian; Schwaighofer, Anton; Mika, Sebastian; Ter Laak, Antonius; Suelzle, Detlev; Ganzer, Ursula; Heinrich, Nikolaus; Müller, Klaus-Robert
2007-09-01
We investigate the use of different Machine Learning methods to construct models for aqueous solubility. Models are based on about 4000 compounds, including an in-house set of 632 drug discovery molecules of Bayer Schering Pharma. For each method, we also consider an appropriate method to obtain error bars, in order to estimate the domain of applicability (DOA) for each model. Here, we investigate error bars from a Bayesian model (Gaussian Process (GP)), an ensemble based approach (Random Forest), and approaches based on the Mahalanobis distance to training data (for Support Vector Machine and Ridge Regression models). We evaluate all approaches in terms of their prediction accuracy (in cross-validation, and on an external validation set of 536 molecules) and in how far the individual error bars can faithfully represent the actual prediction error.
Directional Characteristics of Inner Shelf Internal Tides
2007-06-01
Figure 18. YD 202-206 Current vector plot of significant events. Significant events include internal tidal bores, solibores, and solitons. The upper...Events (Bores, Solibores, and Solitons): Upper column leading-edge cross-shore current velocity and cross-shore wind regression. The small ellipse...Significant Events (Bores, Solibores, and Solitons): Upper column leading-edge along-shore current velocity and along-shore wind regression. The small
Multivariate models for prediction of human skin sensitization hazard.
Strickland, Judy; Zang, Qingda; Paris, Michael; Lehmann, David M; Allen, David; Choksi, Neepa; Matheson, Joanna; Jacobs, Abigail; Casey, Warren; Kleinstreuer, Nicole
2017-03-01
One of the top priorities of the Interagency Coordinating Committee on the Validation of Alternative Methods (ICCVAM) is the development and evaluation of non-animal approaches to identify potential skin sensitizers. The complexity of biological events necessary to produce skin sensitization suggests that no single alternative method will replace the currently accepted animal tests. ICCVAM is evaluating an integrated approach to testing and assessment based on the adverse outcome pathway for skin sensitization that uses machine learning approaches to predict human skin sensitization hazard. We combined data from three in chemico or in vitro assays - the direct peptide reactivity assay (DPRA), human cell line activation test (h-CLAT) and KeratinoSens™ assay - six physicochemical properties and an in silico read-across prediction of skin sensitization hazard into 12 variable groups. The variable groups were evaluated using two machine learning approaches, logistic regression and support vector machine, to predict human skin sensitization hazard. Models were trained on 72 substances and tested on an external set of 24 substances. The six models (three logistic regression and three support vector machine) with the highest accuracy (92%) used: (1) DPRA, h-CLAT and read-across; (2) DPRA, h-CLAT, read-across and KeratinoSens; or (3) DPRA, h-CLAT, read-across, KeratinoSens and log P. The models performed better at predicting human skin sensitization hazard than the murine local lymph node assay (accuracy 88%), any of the alternative methods alone (accuracy 63-79%) or test batteries combining data from the individual methods (accuracy 75%). These results suggest that computational methods are promising tools to effectively identify potential human skin sensitizers without animal testing. Published 2016. This article has been contributed to by US Government employees and their work is in the public domain in the USA.
Detection of fraudulent financial statements using the hybrid data mining approach.
Chen, Suduan
2016-01-01
The purpose of this study is to construct a valid and rigorous fraudulent financial statement detection model. The research objects are companies which experienced both fraudulent and non-fraudulent financial statements between the years 2002 and 2013. In the first stage, two decision tree algorithms, including the classification and regression trees (CART) and the Chi squared automatic interaction detector (CHAID) are applied in the selection of major variables. The second stage combines CART, CHAID, Bayesian belief network, support vector machine and artificial neural network in order to construct fraudulent financial statement detection models. According to the results, the detection performance of the CHAID-CART model is the most effective, with an overall accuracy of 87.97 % (the FFS detection accuracy is 92.69 %).
Ließ, Mareike; Schmidt, Johannes; Glaser, Bruno
2016-01-01
Tropical forests are significant carbon sinks and their soils' carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is highly important for improving the poor model results expected when predictor-response correlations are low. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms, including the model tuning and predictor selection, were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m⁻², displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models' predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward compared to backward selection procedures. Choosing predictors according to their individual performance was outperformed by the two procedures which accounted for predictor interaction.
1991-09-01
matrix, the Regression Sum of Squares (SSR) and Error Sum of Squares (SSE) are also displayed as a percentage of the Total Sum of Squares (SSTO)...vector when the student compares the SSR to the SSE. In addition to the plot, the actual values of SSR, SSE, and SSTO are also provided. Figure 3 gives the...[Figure residue: projection of Y onto the estimation space and onto the error space, with SSR, SSE, and SSTO.]
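The display of SSR and SSE as percentages of SSTO can be illustrated with a small sketch. It assumes a simple one-predictor least-squares fit with an intercept and uses made-up data; the function name `sums_of_squares` is hypothetical and not taken from the report.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return (SSR, SSE, SSTO)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained by the regression
    sse = sum((yi - f) ** 2 for yi, f in zip(y, fitted))   # left in the residuals
    ssto = sum((yi - y_bar) ** 2 for yi in y)              # total variation about the mean
    return ssr, sse, ssto

# Illustrative data only (assumed values).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
ssr, sse, ssto = sums_of_squares(x, y)
print(f"SSR = {100 * ssr / ssto:.1f}% of SSTO, SSE = {100 * sse / ssto:.1f}% of SSTO")
```

For a least-squares fit with an intercept, the fitted vector and the residual vector are orthogonal, so SSR and SSE add up to SSTO; this is the geometric picture of projecting Y onto the estimation space and the error space.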
Compound analysis via graph kernels incorporating chirality.
Brown, J B; Urata, Takashi; Tamura, Takeyuki; Arai, Midori A; Kawabata, Takeo; Akutsu, Tatsuya
2010-12-01
High accuracy is paramount when predicting biochemical characteristics using Quantitative Structural-Property Relationships (QSPRs). Although existing graph-theoretic kernel methods combined with machine learning techniques are efficient for QSPR model construction, they cannot distinguish topologically identical chiral compounds which often exhibit different biological characteristics. In this paper, we propose a new method that extends the recently developed tree pattern graph kernel to accommodate stereoisomers. We show that Support Vector Regression (SVR) with a chiral graph kernel is useful for target property prediction by demonstrating its application to a set of human vitamin D receptor ligands currently under consideration for their potential anti-cancer effects.
2003-04-01
any of the P interfering sources, and Hkt i (1) (P)] T is defined below. The P-variate vector = t kt , • t J consists of complex waveforms radiated by...line. More precisely, the (i, j)th element of the matrix Hke is a complex 4-4 coefficient which is practically constant over the kth PRI, and is a...multivariate auto-regressive (AR) model of order n: $Y_{kt} + \sum_{j=1}^{n} B_j Y_{k-j,t} = \xi_{kt}$ (25). In the above equation, $B_j$ are the M-variate matrices which are the
Multi-pose facial correction based on Gaussian process with combined kernel function
NASA Astrophysics Data System (ADS)
Shi, Shuyan; Ji, Ruirui; Zhang, Fan
2018-04-01
In order to improve the recognition rate across various poses, this paper proposes a facial correction method based on a Gaussian process, which builds a nonlinear regression model between the front and side faces with a combined kernel function. Face images with a horizontal angle from -45° to +45° can be properly corrected to front faces. Finally, a Support Vector Machine is employed for face recognition. Experiments on the CAS PEAL R1 face database show that the Gaussian process can weaken the influence of pose changes and improve the accuracy of face recognition to a certain extent.
De Carli, Margherita M; Baccarelli, Andrea A; Trevisi, Letizia; Pantic, Ivan; Brennan, Kasey JM; Hacker, Michele R; Loudon, Holly; Brunst, Kelly J; Wright, Robert O; Wright, Rosalind J; Just, Allan C
2017-01-01
Aim: We compared predictive modeling approaches to estimate placental methylation using cord blood methylation. Materials & methods: We performed locus-specific methylation prediction using both linear regression and support vector machine models with 174 matched pairs of 450k arrays. Results: At most CpG sites, both approaches gave poor predictions in spite of a misleading improvement in array-wide correlation. CpG islands and gene promoters, but not enhancers, were the genomic contexts where the correlation between measured and predicted placental methylation levels achieved higher values. We provide a list of 714 sites where both models achieved an R² ≥ 0.75. Conclusion: The present study indicates the need for caution in interpreting cross-tissue predictions. Few methylation sites can be predicted between cord blood and placenta. PMID:28234020
Estimation of mechanical properties of nanomaterials using artificial intelligence methods
NASA Astrophysics Data System (ADS)
Vijayaraghavan, V.; Garg, A.; Wong, C. H.; Tai, K.
2014-09-01
Computational modeling tools such as molecular dynamics (MD), ab initio, finite element modeling or continuum mechanics models have been extensively applied to study the properties of carbon nanotubes (CNTs) based on given input variables such as temperature, geometry and defects. Artificial intelligence techniques can be used to further complement the application of numerical methods in characterizing the properties of CNTs. In this paper, we have introduced the application of multi-gene genetic programming (MGGP) and support vector regression to formulate the mathematical relationship between the compressive strength of CNTs and input variables such as temperature and diameter. The predictions of compressive strength of CNTs made by these models are compared to those generated using MD simulations. The results indicate that MGGP method can be deployed as a powerful method for predicting the compressive strength of the carbon nanotubes.
Rees, Robert C; McArdle, Stephanie; Mian, Shahid; Li, Geng; Ahmad, Murrium; Parkinson, Richard; Ali, Selman A
2002-02-01
Disabled infectious single cycle-herpes simplex viruses (DISC-HSV) have been shown to be safe for use in humans and may be considered efficacious as vectors for immunogene therapy in cancer. Preclinical studies show that DISC-HSV is an efficient delivery system for cytokine genes and antigens. DISC-HSV infects a high proportion of cells, resulting in rapid gene expression for at least 72 h. The DISC-HSV-mGM-CSF vector, when inoculated into tumors, induces tumor regression in a high percentage of animals, concomitant with establishing a cytotoxic T-cell response, which is MHC class I restricted and directed against peptides of known tumor antigens. The inherent properties of DISC-HSV make it a suitable vector for consideration in human immunogene therapy trials.
Likhvantseva, V G; Sokolov, V A; Levanova, O N; Kovelenova, I V
2018-01-01
Prediction of the clinical course of primary open-angle glaucoma (POAG) is one of the main directions in solving the problem of vision loss prevention and stabilization of the pathological process. Simple statistical methods of correlation analysis show the extent of each risk factor's impact, but do not indicate the total impact of these factors in personalized combinations. The relationships between the risk factors are subject to correlation and regression analysis. The regression equation represents the dependence of the mathematical expectation of the resulting sign on the combination of factor signs. The aim of the study was to develop a technique for predicting the probability of development and progression of primary open-angle glaucoma based on a personalized combination of risk factors by linear multivariate regression analysis. The study included 66 patients (23 female and 43 male; 132 eyes) with newly diagnosed primary open-angle glaucoma. The control group consisted of 14 patients (8 male and 6 female). Standard ophthalmic examination was supplemented with a biochemical study of lacrimal fluid. The concentration of matrix metalloproteinases MMP-2 and MMP-9 in the tear fluid of both eyes was determined using the 'sandwich' enzyme-linked immunosorbent assay (ELISA) method. The study resulted in the development of regression equations and step-by-step multivariate logistic models that can help calculate the risk of development and progression of POAG. Those models are based on expert evaluation of clinical and instrumental indicators of hydrodynamic disturbances (coefficient of outflow ease - C, volume of intraocular fluid secretion - F, fluctuation of intraocular pressure), as well as personalized morphometric parameters of the retina (central retinal thickness in the macular area) and concentration of MMP-2 and MMP-9 in the tear film. 
The newly developed regression equations are highly informative and can be a reliable tool for studying the influence vector and assessing the pathogenic potential of the independent risk factors in specific personalized combinations.
NASA Astrophysics Data System (ADS)
Rodrigues, João Fabrício Mota; Coelho, Marco Túlio Pacheco; Ribeiro, Bruno R.
2018-04-01
Species distribution models (SDMs) have been broadly used in ecology to address theoretical and practical problems. Currently, there are two main approaches to generate SDMs: (i) correlative models, which are based on species occurrences and environmental predictor layers, and (ii) process-based models, which are constructed based on species' functional traits and physiological tolerances. The distributions estimated by each approach are based on different components of the species niche. Predictions of correlative models approximate species' realized niches, while predictions of process-based models are more akin to species' fundamental niches. Here, we integrated the predictions of fundamental and realized distributions of the freshwater turtle Trachemys dorbigni. Fundamental distribution was estimated using data on T. dorbigni's egg incubation temperature, and realized distribution was estimated using species occurrence records. Both types of distributions were estimated using the same regression approaches (logistic regression and support vector machines), both considering macroclimatic and microclimatic temperatures. The realized distribution of T. dorbigni was generally nested in its fundamental distribution, reinforcing theoretical assumptions that the species' realized niche is a subset of its fundamental niche. Both modelling algorithms produced similar results, but microtemperature generated better results than macrotemperature for the incubation model. Finally, our results reinforce the conclusion that species' realized distributions are constrained by factors other than just thermal tolerances.
Nematollahi, M; Akbari, R; Nikeghbalian, S; Salehnasab, C
2017-01-01
Kidney transplantation is the treatment of choice for patients with end-stage renal disease (ESRD). Prediction of the transplant survival is of paramount importance. The objective of this study was to develop a model for predicting survival in kidney transplant recipients. In a cross-sectional study, 717 patients with ESRD admitted to Nemazee Hospital during 2008-2012 for renal transplantation were studied and the transplant survival was predicted for 5 years. The multilayer perceptron of artificial neural networks (MLP-ANN), logistic regression (LR), Support Vector Machine (SVM), and evaluation tools were used to verify the determinant models of the predictions and determine the independent predictors. The accuracy, area under curve (AUC), sensitivity, and specificity of SVM, MLP-ANN, and LR models were 90.4%, 86.5%, 98.2%, and 49.6%; 85.9%, 76.9%, 97.3%, and 26.1%; and 84.7%, 77.4%, 97.5%, and 17.4%, respectively. Meanwhile, the independent predictors were discharge time creatinine level, recipient age, donor age, donor blood group, cause of ESRD, recipient hypertension after transplantation, and duration of dialysis before transplantation. SVM and MLP-ANN models could efficiently be used for determining survival prediction in kidney transplant recipients.
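The accuracy, sensitivity, and specificity figures reported above follow from the counts in a confusion matrix. The sketch below shows those definitions with invented counts (they are not the study's data), chosen to illustrate how high sensitivity can coexist with low specificity when one class dominates.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive cases predicted correctly/incorrectly;
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }

# Hypothetical, imbalanced counts chosen only for illustration.
metrics = classification_metrics(tp=95, fn=5, tn=5, fp=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced outcomes, accuracy is driven mostly by the majority class, which is why specificity is worth reading alongside it.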
The Mantel-Haenszel Procedure Revisited: Models and Generalizations
Fidler, Vaclav; Nagelkerke, Nico
2013-01-01
Several statistical methods have been developed for adjusting the Odds Ratio of the relation between two dichotomous variables X and Y for some confounders Z. With the exception of the Mantel-Haenszel method, commonly used methods, notably binary logistic regression, are not symmetrical in X and Y. The classical Mantel-Haenszel method however only works for confounders with a limited number of discrete strata, which limits its utility, and appears to have no basis in statistical models. Here we revisit the Mantel-Haenszel method and propose an extension to continuous and vector valued Z. The idea is to replace the observed cell entries in strata of the Mantel-Haenszel procedure by subject specific classification probabilities for the four possible values of (X,Y) predicted by a suitable statistical model. For situations where X and Y can be treated symmetrically we propose and explore the multinomial logistic model. Under the homogeneity hypothesis, which states that the odds ratio does not depend on Z, the logarithm of the odds ratio estimator can be expressed as a simple linear combination of three parameters of this model. Methods for testing the homogeneity hypothesis are proposed. The relationship between this method and binary logistic regression is explored. A numerical example using survey data is presented. PMID:23516463
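For orientation, the classical Mantel-Haenszel estimator that the abstract starts from can be written in a few lines. The sketch below computes only that classical, discrete-strata estimator with invented counts; it does not implement the authors' proposed extension, which replaces observed cell entries by model-predicted classification probabilities. The function name and the tuple layout are assumptions made for this example.

```python
def mantel_haenszel_or(strata):
    """Classical Mantel-Haenszel pooled odds ratio.

    Each stratum is a tuple (a, b, c, d) of 2x2 cell counts:
    a: X=1,Y=1   b: X=1,Y=0   c: X=0,Y=1   d: X=0,Y=0.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

# Invented counts for two strata of a confounder Z.
strata = [(10, 20, 5, 40), (15, 10, 10, 20)]
or_single = mantel_haenszel_or(strata[:1])   # reduces to a*d / (b*c) for one stratum
or_pooled = mantel_haenszel_or(strata)
print(or_single, or_pooled)
```

With a single stratum the estimator reduces to the ordinary cross-product ratio, and with several strata it gives a weighted compromise between the stratum-specific odds ratios.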
Ecological Modeling of Aedes aegypti (L.) Pupal Production in Rural Kamphaeng Phet, Thailand
Aldstadt, Jared; Koenraadt, Constantianus J. M.; Fansiri, Thanyalak; Kijchalao, Udom; Richardson, Jason; Jones, James W.; Scott, Thomas W.
2011-01-01
Background: Aedes aegypti (L.) is the primary vector of dengue, the most important arboviral infection globally. Until an effective vaccine is licensed and rigorously administered, Ae. aegypti control remains the principal tool in preventing and curtailing dengue transmission. Accurate predictions of vector populations are required to assess control methods and develop effective population reduction strategies. Ae. aegypti develops primarily in artificial water holding containers. Release recapture studies indicate that most adult Ae. aegypti do not disperse over long distances. We expect, therefore, that containers in an area of high development site density are more likely to be oviposition sites and to be more frequently used as oviposition sites than containers that are relatively isolated from other development sites. After accounting for individual container characteristics, containers more frequently used as oviposition sites are likely to produce adult mosquitoes consistently and at a higher rate. To this point, most studies of Ae. aegypti populations ignore the spatial density of larval development sites. Methodology: Pupal surveys were carried out from 2004 to 2007 in rural Kamphaeng Phet, Thailand. In total, 84,840 samples of water holding containers were used to estimate model parameters. Regression modeling was used to assess the effect of larval development site density, access to piped water, and seasonal variation on container productivity. A varying-coefficients model was employed to account for the large differences in productivity between container types. A two-part modeling structure, called a hurdle model, accounts for the large number of zeroes and overdispersion present in pupal population counts. Findings: The number of suitable larval development sites and their density in the environment were the primary determinants of the distribution and abundance of Ae. aegypti pupae. The productivity of most container types increased significantly as habitat density increased. An ecological approach, accounting for development site density, is appropriate for predicting Ae. aegypti population levels and developing efficient vector control programs. PMID:21267055
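The two-part hurdle structure mentioned in the methodology can be sketched as a probability mass function: one part decides whether a count is zero, and a zero-truncated count distribution handles the positive values. The example below assumes a Poisson count part for simplicity; the abstract does not state which count distribution the authors used, and the parameter values and function names are hypothetical.

```python
import math

def hurdle_poisson_pmf(k, p_zero, lam):
    """P(Y = k) under a hurdle model with a zero-truncated Poisson count part.

    p_zero: probability that a container holds no pupae (the 'hurdle');
    lam:    rate of the Poisson distribution before truncation at zero.
    """
    if k == 0:
        return p_zero
    poisson_k = math.exp(-lam) * lam ** k / math.factorial(k)
    # Rescale the positive counts so that they share the mass 1 - p_zero.
    return (1.0 - p_zero) * poisson_k / (1.0 - math.exp(-lam))

def hurdle_poisson_mean(p_zero, lam):
    """Expected count implied by the two parts together."""
    return (1.0 - p_zero) * lam / (1.0 - math.exp(-lam))

# Assumed parameters: most containers empty, a few pupae in the rest.
p_zero, lam = 0.8, 3.0
total_mass = sum(hurdle_poisson_pmf(k, p_zero, lam) for k in range(60))
empirical_mean = sum(k * hurdle_poisson_pmf(k, p_zero, lam) for k in range(60))
print(total_mass, empirical_mean, hurdle_poisson_mean(p_zero, lam))
```

Because the zero probability is a free parameter, the model can represent many more zeroes than a plain Poisson distribution with the same positive-count behaviour.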
Predicting the dissolution kinetics of silicate glasses using machine learning
NASA Astrophysics Data System (ADS)
Anoop Krishnan, N. M.; Mangalathu, Sujith; Smedskjaer, Morten M.; Tandia, Adama; Burton, Henry; Bauchy, Mathieu
2018-05-01
Predicting the dissolution rates of silicate glasses in aqueous conditions is a complex task as the underlying mechanism(s) remain poorly understood and the dissolution kinetics can depend on a large number of intrinsic and extrinsic factors. Here, we assess the potential of data-driven models based on machine learning to predict the dissolution rates of various aluminosilicate glasses exposed to a wide range of solution pH values, from acidic to caustic conditions. Four classes of machine learning methods are investigated, namely, linear regression, support vector machine regression, random forest, and artificial neural network. We observe that, although linear methods all fail to describe the dissolution kinetics, the artificial neural network approach offers excellent predictions, thanks to its inherent ability to handle non-linear data. Overall, we suggest that a more extensive use of machine learning approaches could significantly accelerate the design of novel glasses with tailored properties.
NASA Astrophysics Data System (ADS)
Xu, Chao; Zhou, Dongxiang; Zhai, Yongping; Liu, Yunhui
2015-12-01
This paper presents the automatic segmentation and classification of Mycobacterium tuberculosis with conventional light microscopy. First, the candidate bacillus objects are segmented by the marker-based watershed transform. The markers are obtained by an adaptive threshold segmentation based on the adaptive scale Gaussian filter. The scale of the Gaussian filter is determined according to the color model of the bacillus objects. Then the candidate objects are extracted in full after region merging and contamination elimination. Second, the shape features of the bacillus objects are characterized by the Hu moments, compactness, eccentricity, and roughness, which are used to classify the single, touching and non-bacillus objects. We evaluated logistic regression, random forest, and intersection kernel support vector machine classifiers for classifying the bacillus objects. Experimental results demonstrate that the proposed method yields high robustness and accuracy. The logistic regression classifier performs best with an accuracy of 91.68%.
Modeling habitat and environmental factors affecting mosquito abundance in Chesapeake, Virginia
NASA Astrophysics Data System (ADS)
Bellows, Alan Scott
The models I present in this dissertation were designed to enable mosquito control agencies in the mid-Atlantic region that oversee large jurisdictions to rapidly track the spatial and temporal distributions of mosquito species, especially those species known to be vectors of eastern equine encephalitis and West Nile virus. I was able to keep these models streamlined, user-friendly, and not cost-prohibitive using empirically based digital data to analyze mosquito-abundance patterns in real landscapes. This research is presented in three major chapters: (II) a series of semi-static habitat suitability indices (HSI) grounded on well-documented associations between mosquito abundance and environmental variables, (III) a dynamic model for predicting both spatial and temporal mosquito abundance based on a topographic soil moisture index and recent weather patterns, and (IV) a set of protocols laid out to aid mosquito control agencies in the use of these models. The HSIs (Chapter II) were based on relationships of mosquitoes to digital surrogates of soil moisture and vegetation characteristics. These models grouped mosquito species according to similarities in habitat requirements, life-cycle type, and vector competence. Relationships were quantified using multiple linear regression models. As in Chapter II, relationships between mosquito abundance and environmental factors in Chapter III were quantified using regression models. However, because this model was, in part, a function of changes in weather patterns, it enables the prediction of both 'where' and 'when' mosquito outbreaks are likely to occur. This model is distinctive among similar studies in the literature because of my use of NOAA's NEXRAD Doppler radar (3-hr precipitation accumulation data) to quantify the spatial and temporal distributions in precipitation accumulation. 
Chapter IV is unique among the chapters in this dissertation because in lieu of presenting new research, it summarizes the preprocessing steps and analyses used in the HSIs and the dynamic, weather-based, model generated in Chapters II and III. The purpose of this chapter is to provide the reader and potential users with the necessary protocols for modeling the spatial and temporal abundances and distributions of mosquitoes, with emphasis on Culiseta melanura, in a real-world landscape of the mid-Atlantic region. This chapter also provides enhancements that could easily be incorporated into an environmentally sensitive integrated pest management program.
A biometeorological model of an encephalitis vector
NASA Astrophysics Data System (ADS)
Raddatz, R. L.
1986-01-01
Multiple linear regression techniques and seven years of data were used to build a biometeorological model of Winnipeg's mean daily levels of Culex tarsalis Coquillett. An eighth year of data was used to test the model. Hydrologic accounting of precipitation, evapotranspiration and runoff provided estimates of wetness, while the warmness of the season was gauged in terms of the average temperature difference from normal and a threshold antecedent temperature regime. These factors were found to be highly correlated with the time series of Cx. tarsalis counts. The impact of mosquito adulticiding measures was included in the model via a control effectiveness parameter. An activity-level adjustment, based on mean daily temperatures, was also made to the counts. By monitoring the weather, this model can provide forecasts of Cx. tarsalis populations for Winnipeg with a lead time of three weeks, thereby contributing to an early warning of an impending Western Equine Encephalitis outbreak.
Predicting perceptual quality of images in realistic scenario using deep filter banks
NASA Astrophysics Data System (ADS)
Zhang, Weixia; Yan, Jia; Hu, Shiyong; Ma, Yang; Deng, Dexiang
2018-03-01
Classical image perceptual quality assessment models usually resort to natural scene statistic methods, which are based on an assumption that certain reliable statistical regularities hold on undistorted images and will be corrupted by introduced distortions. However, these models usually fail to accurately predict degradation severity of images in realistic scenarios since complex, multiple, and interactive authentic distortions usually appear on them. We propose a quality prediction model based on convolutional neural network. Quality-aware features extracted from filter banks of multiple convolutional layers are aggregated into the image representation. Furthermore, an easy-to-implement and effective feature selection strategy is used to further refine the image representation and finally a linear support vector regression model is trained to map image representation into images' subjective perceptual quality scores. The experimental results on benchmark databases present the effectiveness and generalizability of the proposed model.
Schmidt, Johannes; Glaser, Bruno
2016-01-01
Tropical forests are significant carbon sinks and their soils’ carbon storage potential is immense. However, little is known about the soil organic carbon (SOC) stocks of tropical mountain areas whose complex soil-landscape and difficult accessibility pose a challenge to spatial analysis. The choice of methodology for spatial prediction is of high importance to improve the expected poor model results in case of low predictor-response correlations. Four aspects were considered to improve model performance in predicting SOC stocks of the organic layer of a tropical mountain forest landscape: different spatial predictor settings, predictor selection strategies, various machine learning algorithms and model tuning. Five machine learning algorithms (random forests, artificial neural networks, multivariate adaptive regression splines, boosted regression trees and support vector machines) were trained and tuned to predict SOC stocks from predictors derived from a digital elevation model and satellite image. Topographical predictors were calculated with a GIS search radius of 45 to 615 m. Finally, three predictor selection strategies were applied to the total set of 236 predictors. All machine learning algorithms, including the model tuning and predictor selection, were compared via five repetitions of a tenfold cross-validation. The boosted regression tree algorithm resulted in the overall best model. SOC stocks ranged from 0.2 to 17.7 kg m⁻², displaying a huge variability with diffuse insolation and curvatures of different scale guiding the spatial pattern. Predictor selection and model tuning improved the models’ predictive performance in all five machine learning algorithms. The rather low number of selected predictors favours forward compared to backward selection procedures. Choosing predictors based on their individual performance was outperformed by the two procedures which accounted for predictor interaction. PMID:27128736
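The evaluation scheme named in the abstract above (five repetitions of a tenfold cross-validation) can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the function name `repeated_kfold_indices`, the seed and the sample count are hypothetical and are not taken from the study.

```python
import random

def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # a new random partition for every repetition
        # Deal samples round-robin so fold sizes differ by at most one.
        folds = [idx[k::n_folds] for k in range(n_folds)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            yield rep, k, train_idx, test_idx

# Hypothetical data set of 103 samples: 5 repetitions x 10 folds = 50 splits.
splits = list(repeated_kfold_indices(n_samples=103))
```

Each model would be fitted on `train_idx` and scored on `test_idx` for every split, and the scores averaged, so that every sample is used for testing exactly once per repetition.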
NASA Astrophysics Data System (ADS)
Bellugi, D. G.; Tennant, C.; Larsen, L.
2016-12-01
Catchment and climate heterogeneity complicate prediction of runoff across time and space, and resulting parameter uncertainty can lead to large accumulated errors in hydrologic models, particularly in ungauged basins. Recently, data-driven modeling approaches have been shown to avoid the accumulated uncertainty associated with many physically-based models, providing an appealing alternative for hydrologic prediction. However, the effectiveness of different methods in hydrologically and geomorphically distinct catchments, and the robustness of these methods to changing climate and changing hydrologic processes remain to be tested. Here, we evaluate the use of machine learning techniques to predict daily runoff across time and space using only essential climatic forcing (e.g. precipitation, temperature, and potential evapotranspiration) time series as model input. Model training and testing was done using a high quality dataset of daily runoff and climate forcing data for 25+ years for 600+ minimally-disturbed catchments (drainage area range 5-25,000 km2, median size 336 km2) that cover a wide range of climatic and physical characteristics. Preliminary results using Support Vector Regression (SVR) suggest that in some catchments this nonlinear-based regression technique can accurately predict daily runoff, while the same approach fails in other catchments, indicating that the representation of climate inputs and/or catchment filter characteristics in the model structure need further refinement to increase performance. We bolster this analysis by using Sparse Identification of Nonlinear Dynamics (a sparse symbolic regression technique) to uncover the governing equations that describe runoff processes in catchments where SVR performed well and for ones where it performed poorly, thereby enabling inference about governing processes. 
This provides a robust means of examining how catchment complexity influences runoff prediction skill, and represents a contribution towards the integration of data-driven inference and physically-based models.
Zhang, Ni; Liu, Xu; Jin, Xiaoduo; Li, Chen; Wu, Xuan; Yang, Shuqin; Ning, Jifeng; Yanne, Paul
2017-12-15
Phenolics contents in wine grapes are key indicators for assessing ripeness. Near-infrared hyperspectral images during ripening have been explored to achieve an effective method for predicting phenolics contents. Principal component regression (PCR), partial least squares regression (PLSR) and support vector regression (SVR) models were built, respectively. The results show that SVR behaves globally better than PLSR and PCR, except in predicting tannins content of seeds. For the best prediction results, the squared correlation coefficient and root mean square error reached 0.8960 and 0.1069g/L (+)-catechin equivalents (CE), respectively, for tannins in skins, 0.9065 and 0.1776 (g/L CE) for total iron-reactive phenolics (TIRP) in skins, 0.8789 and 0.1442 (g/L M3G) for anthocyanins in skins, 0.9243 and 0.2401 (g/L CE) for tannins in seeds, and 0.8790 and 0.5190 (g/L CE) for TIRP in seeds. Our results indicated that NIR hyperspectral imaging has good prospects for evaluation of phenolics in wine grapes. Copyright © 2017 Elsevier Ltd. All rights reserved.
Williams, C.J.; Heglund, P.J.
2009-01-01
Habitat association models are commonly developed for individual animal species using generalized linear modeling methods such as logistic regression. We considered the issue of grouping species based on their habitat use so that management decisions can be based on sets of species rather than individual species. This research was motivated by a study of western landbirds in northern Idaho forests. The method we examined was to separately fit models to each species and to use a generalized Mahalanobis distance between coefficient vectors to create a distance matrix among species. Clustering methods were used to group species from the distance matrix, and multidimensional scaling methods were used to visualize the relations among species groups. Methods were also discussed for evaluating the sensitivity of the conclusions because of outliers or influential data points. We illustrate these methods with data from the landbird study conducted in northern Idaho. Simulation results are presented to compare the success of this method to alternative methods using Euclidean distance between coefficient vectors and to methods that do not use habitat association models. These simulations demonstrate that our Mahalanobis-distance-based method was nearly always better than Euclidean-distance-based methods or methods not based on habitat association models. The methods used to develop candidate species groups are easily explained to other scientists and resource managers since they mainly rely on classical multivariate statistical methods. © 2008 Springer Science+Business Media, LLC.
Binary dislocation junction formation and strength in hexagonal close-packed crystals
Wu, Chi -Chin; Aubry, Sylvie; Arsenlis, Athanasios; ...
2015-12-17
This work examines binary dislocation interactions, junction formation and junction strengths in hexagonal close-packed (hcp) crystals. Through a line-tension model and dislocation dynamics (DD) simulations, the interaction and dissociation of different sets of binary junctions are investigated involving one dislocation on the (011̄0) prismatic plane and a second dislocation on one of the following planes: (0001) basal, (11̄00) prismatic, (11̄01) primary pyramidal, or (2̄112) secondary pyramidal. Varying pairs of Burgers vectors are chosen from among the common types: the basal type <a> 1/3<112̄0>, prismatic type <c> <0001>, and pyramidal type
Age- and bite-structured models for vector-borne diseases.
Rock, K S; Wood, D A; Keeling, M J
2015-09-01
The biology and behaviour of biting insects is a vitally important aspect in the spread of vector-borne diseases. This paper aims to determine, through the use of mathematical models, what effect incorporating vector senescence and realistic feeding patterns has on disease. A novel model is developed to enable the effects of age- and bite-structure to be examined in detail. This original PDE framework extends previous age-structured models into a further dimension to give a new insight into the role of vector biting and its interaction with vector mortality and spread of disease. Through the PDE model, the roles of the vector death and bite rates are examined in a way which is impossible under the traditional ODE formulation. It is demonstrated that incorporating more realistic functions for vector biting and mortality in a model may give rise to different dynamics than those seen under a more simple ODE formulation. The numerical results indicate that the efficacy of control methods that increase vector mortality may not be as great as predicted under a standard host-vector model, whereas other controls including treatment of humans may be more effective than previously thought. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Prediction of Nursing Workload in Hospital.
Fiebig, Madlen; Hunstein, Dirk; Bartholomeyczik, Sabine
2018-01-01
A dissertation project at the Witten/Herdecke University [1] is investigating which (nursing sensitive) patient characteristics are suitable for predicting a higher or lower degree of nursing workload. For this research project four predictive modelling methods were selected. In a first step, SUPPORT VECTOR MACHINE, RANDOM FOREST, and GRADIENT BOOSTING were used to identify potential predictors from the nursing sensitive patient characteristics. The results were compared via FEATURE IMPORTANCE. To predict nursing workload the predictors identified in step 1 were modelled using MULTINOMIAL LOGISTIC REGRESSION. First results from the data mining process will be presented. A prognostic determination of nursing workload can be used not only as a basis for human resource planning in hospital, but also to respond to health policy issues.
Predict the fatigue life of crack based on extended finite element method and SVR
NASA Astrophysics Data System (ADS)
Song, Weizhen; Jiang, Zhansi; Jiang, Hui
2018-05-01
The extended finite element method (XFEM) and support vector regression (SVR) are used to predict the fatigue life of a cracked plate. First, XFEM is employed to calculate the stress intensity factors (SIFs) for given crack sizes. A prediction model is then built from the functional relationship between the SIFs and the fatigue life or crack length. Finally, the prediction model is used to predict the SIFs at different crack sizes or different cycle counts. Because the accuracy of the forward Euler method is ensured only by a small step size, a new prediction method is presented to resolve this issue. Numerical examples demonstrate that the proposed method allows a larger step size and achieves high accuracy.
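The step-size sensitivity of forward Euler integration mentioned in the abstract above can be illustrated with a small sketch. Everything specific here is an assumption made for illustration only: the abstract does not state a crack growth law, so a Paris-type law da/dN = C·(ΔK)^m is assumed, the closed-form `delta_k` stands in for the XFEM- or SVR-derived SIF, and the constants are hypothetical.

```python
import math

def delta_k(a, delta_sigma=100.0):
    """Hypothetical SIF range (MPa*sqrt(m)) for a crack of half-length a (m).

    Stand-in for the XFEM/SVR-derived SIF, which the abstract does not give.
    """
    return delta_sigma * math.sqrt(math.pi * a)

def cycles_to_length(a0, a_final, dn, c=1e-11, m=3.0):
    """Forward-Euler integration of the assumed law da/dN = C * dK**m."""
    a, n = a0, 0
    while a < a_final:
        a += dn * c * delta_k(a) ** m  # explicit step from the current length
        n += dn
    return n

# Same problem integrated with a small and a large cycle increment.
fine = cycles_to_length(0.005, 0.02, dn=100)
coarse = cycles_to_length(0.005, 0.02, dn=20000)
```

Because the growth rate increases with crack length, the explicit step underestimates the growth within each increment, so the coarse run predicts a longer life than the fine run. This is the kind of error that motivates a method tolerant of larger steps.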
Large Animal Models for Foamy Virus Vector Gene Therapy
Trobridge, Grant D.; Horn, Peter A.; Beard, Brian C.; Kiem, Hans-Peter
2012-01-01
Foamy virus (FV) vectors have shown great promise for hematopoietic stem cell (HSC) gene therapy. Their ability to efficiently deliver transgenes to multi-lineage long-term repopulating cells in large animal models suggests they will be effective for several human hematopoietic diseases. Here, we review FV vector studies in large animal models, including the use of FV vectors with the mutant O6-methylguanine-DNA methyltransferase, MGMTP140K to increase the number of genetically modified cells after transplantation. In these studies, FV vectors have mediated efficient gene transfer to polyclonal repopulating cells using short ex vivo transduction protocols designed to minimize the negative effects of ex vivo culture on stem cell engraftment. In this regard, FV vectors appear superior to gammaretroviral vectors, which require longer ex vivo culture to effect efficient transduction. FV vectors have also compared favorably with lentiviral vectors when directly compared in the dog model. FV vectors have corrected leukocyte adhesion deficiency and pyruvate kinase deficiency in the dog large animal model. FV vectors also appear safer than gammaretroviral vectors based on a reduced frequency of integrants near promoters and also near proto-oncogenes in canine repopulating cells. Together, these studies suggest that FV vectors should be highly effective for several human hematopoietic diseases, including those that will require relatively high percentages of gene-modified cells to achieve clinical benefit. PMID:23223198
Assessing LULC changes over Chilika Lake watershed in Eastern India using Driving Force Analysis
NASA Astrophysics Data System (ADS)
Jadav, S.; Syed, T. H.
2017-12-01
Rapid population growth and industrial development have brought about significant changes in the Land Use Land Cover (LULC) of many developing countries. This study investigates LULC changes in the Chilika Lake watershed of Eastern India for the period 1988 to 2016. The methodology involves pre-processing and classification of Landsat satellite images using a support vector machine (SVM) supervised classification algorithm. Results reveal that 'Cropland', 'Emergent Vegetation' and 'Settlement' expanded over the study period by 284.61 km², 106.83 km² and 98.83 km², respectively. Over the same period, 'Lake Area', 'Vegetation' and 'Scrub Land' decreased by 121.62 km², 96.05 km² and 80.29 km², respectively. This study also analyzes five major socio-economic and climatological driving force variables triggering LULC changes through a bivariate logistic regression model. The model yields a credible relative operating characteristic (ROC) value of 0.76, which indicates a good fit of the logistic regression model. In addition, the independent variables distance to drainage network and average annual rainfall have negative regression coefficients, which represent a decreased rate of the dependent variable (changed LULC), whereas the independent variables population density, distance to road and distance to railway have positive regression coefficients, which indicate an increased rate of changed LULC. Results from this study will be crucial for planning and restoration of this vital lake, which has major implications for society and the environment at large.
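The ROC value reported in the abstract above is an area under the ROC curve. A minimal sketch of how such a value can be computed from predicted change probabilities is given below; the function name `roc_auc` and the toy labels and scores are hypothetical and do not come from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a changed cell (label 1) receives a
    higher score than an unchanged cell (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical cells: 1 = LULC changed, 0 = unchanged, with model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of changed from unchanged cells.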
Data mining: Potential applications in research on nutrition and health.
Batterham, Marijka; Neale, Elizabeth; Martin, Allison; Tapsell, Linda
2017-02-01
Data mining enables further insights from nutrition-related research, but caution is required. The aim of this analysis was to demonstrate and compare the utility of data mining methods in classifying a categorical outcome derived from a nutrition-related intervention. Baseline data (23 variables, 8 categorical) on participants (n = 295) in an intervention trial were used to classify participants in terms of meeting the criteria of achieving 10 000 steps per day. Results from classification and regression trees (CARTs), random forests, adaptive boosting, logistic regression, support vector machines and neural networks were compared using area under the curve (AUC) and error assessments. The CART produced the best model when considering the AUC (0.703), overall error (18%) and within class error (28%). Logistic regression also performed reasonably well compared to the other models (AUC 0.675, overall error 23%, within class error 36%). All the methods gave different rankings of variables' importance. CART found that body fat, quality of life using the SF-12 Physical Component Summary (PCS) and the cholesterol: HDL ratio were the most important predictors of meeting the 10 000 steps criteria, while logistic regression showed the SF-12PCS, glucose levels and level of education to be the most significant predictors (P ≤ 0.01). Differing outcomes suggest caution is required with a single data mining method, particularly in a dataset with nonlinear relationships and outliers and when exploring relationships that were not the primary outcomes of the research. © 2017 Dietitians Association of Australia.
A diagram for evaluating multiple aspects of model performance in simulating vector fields
NASA Astrophysics Data System (ADS)
Xu, Zhongfeng; Hou, Zhaolu; Han, Ying; Guo, Weidong
2016-12-01
Vector quantities, e.g., vector winds, play an extremely important role in climate systems. The energy and water exchanges between different regions are strongly dominated by wind, which in turn shapes the regional climate. Thus, how well climate models can simulate vector fields directly affects model performance in reproducing the nature of a regional climate. This paper devises a new diagram, termed the vector field evaluation (VFE) diagram, which is a generalized Taylor diagram and able to provide a concise evaluation of model performance in simulating vector fields. The diagram can measure how well two vector fields match each other in terms of three statistical variables, i.e., the vector similarity coefficient, root mean square length (RMSL), and root mean square vector difference (RMSVD). Similar to the Taylor diagram, the VFE diagram is especially useful for evaluating climate models. The pattern similarity of two vector fields is measured by a vector similarity coefficient (VSC) that is defined by the arithmetic mean of the inner product of normalized vector pairs. Examples are provided, showing that VSC can identify how close one vector field resembles another. Note that VSC can only describe the pattern similarity, and it does not reflect the systematic difference in the mean vector length between two vector fields. To measure the vector length, RMSL is included in the diagram. The third variable, RMSVD, is used to identify the magnitude of the overall difference between two vector fields. Examples show that the VFE diagram can clearly illustrate the extent to which the overall RMSVD is attributed to the systematic difference in RMSL and how much is due to the poor pattern similarity.
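The three statistics described in the abstract above can be sketched in plain Python. This is a minimal illustration under one stated assumption: the abstract says only that the VSC uses "normalized vector pairs", and here each field is normalized by its own RMSL. The function names and the sample fields are hypothetical.

```python
import math

def rmsl(field):
    """Root mean square length of a list of 2-D vectors (u, v)."""
    return math.sqrt(sum(u * u + v * v for u, v in field) / len(field))

def vsc(a, b):
    """Vector similarity coefficient: mean inner product of vector pairs,
    each field normalized by its own RMSL (assumed normalization)."""
    dot = sum(ua * ub + va * vb for (ua, va), (ub, vb) in zip(a, b))
    return dot / (len(a) * rmsl(a) * rmsl(b))

def rmsvd(a, b):
    """Root mean square vector difference between two fields."""
    sq = sum((ua - ub) ** 2 + (va - vb) ** 2 for (ua, va), (ub, vb) in zip(a, b))
    return math.sqrt(sq / len(a))

# Hypothetical modelled and observed wind vectors at three grid points.
model = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
obs = [(2.0, 1.0), (0.0, 1.0), (-1.0, 0.0)]
```

With this normalization the three quantities satisfy RMSVD² = L_A² + L_B² − 2·L_A·L_B·VSC, where L_A and L_B are the two RMSL values. This law-of-cosines relation is what allows all three to be shown on a single Taylor-style diagram.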
Tsafa, Effrosyni; Al-Bahrani, Mariam; Bentayebi, Kaoutar; Przystal, Justyna; Suwan, Keittisak; Hajitou, Amin
2016-01-01
Gene therapy has long been regarded as a promising treatment for cancer. However, cancer gene therapy is still facing the challenge of targeting gene delivery vectors specifically to tumors when administered via clinically acceptable non-invasive systemic routes (i.e. intravenous). The bacteria virus, bacteriophage (phage), represents a new generation of promising vectors in systemic gene delivery since their targeting can be achieved through phage capsid display ligands, which enable them to home to specific tumor receptors without the need to ablate any native eukaryotic tropism. We have previously reported a tumor specific bacteriophage vector named adeno-associated virus/phage, or AAVP, in which gene expression is under a recombinant human rAAV2 virus genome targeted to tumors via a ligand-directed phage capsid. However, cancer gene therapy with this tumor-targeted vector achieved variable outcomes ranging from tumor regression to no effect in both experimental and natural preclinical models. Herein, we hypothesized that combining the natural dietary genistein, with proven anticancer activity, would improve bacteriophage anticancer safe therapy. We show that combination treatment with genistein and AAVP increased targeted cancer cell killing by AAVP carrying the gene for Herpes simplex virus thymidine kinase (HSVtk) in 2D tissue cultures and 3D tumor spheroids. We found this increased tumor cell killing was associated with enhanced AAVP-mediated gene expression. Next, we established that genistein protects AAVP against proteasome degradation and enhances vector genome accumulation in the nucleus. Combination of genistein and phage-guided virotherapy is a safe and promising strategy that should be considered in anticancer therapy with AAVP. PMID:27437775
Monthly evaporation forecasting using artificial neural networks and support vector machines
NASA Astrophysics Data System (ADS)
Tezel, Gulay; Buyukyildiz, Meral
2016-04-01
Evaporation is one of the most important components of the hydrological cycle, but is relatively difficult to estimate, due to its complexity, as it can be influenced by numerous factors. Estimation of evaporation is important for the design of reservoirs, especially in arid and semi-arid areas. Artificial neural network methods and support vector machines (SVM) are frequently utilized to estimate evaporation and other hydrological variables. In this study, usability of artificial neural networks (ANNs) (multilayer perceptron (MLP) and radial basis function network (RBFN)) and ɛ-support vector regression (SVR) artificial intelligence methods was investigated to estimate monthly pan evaporation. For this aim, temperature, relative humidity, wind speed, and precipitation data for the period 1972 to 2005 from Beysehir meteorology station were used as input variables while pan evaporation values were used as output. The Romanenko and Meyer methods were also considered for the comparison. The results were compared with observed class A pan evaporation data. In the MLP method, four different training algorithms, gradient descent with momentum and adaptive learning rule backpropagation (GDX), Levenberg-Marquardt (LVM), scaled conjugate gradient (SCG), and resilient backpropagation (RBP), were used. The ɛ-SVR model was used as the SVR model. The models were designed via 10-fold cross-validation (CV); algorithm performance was assessed via mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²). According to the performance criteria, the ANN algorithms and ɛ-SVR had similar results. The ANNs and ɛ-SVR methods were found to perform better than the Romanenko and Meyer methods. Consequently, the best performance using the test data was obtained using SCG(4,2,2,1) with R² = 0.905.
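The three performance criteria named in the abstract above can be computed as in the following sketch. The observed and predicted values are hypothetical, and R² is taken here as 1 − SS_res/SS_tot; the study may instead have used the squared correlation coefficient.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination, assumed to be 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical monthly pan evaporation values (observed vs. predicted).
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]
scores = (mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

In a 10-fold cross-validation these functions would be applied to the held-out fold of each split and the results averaged.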
Kim, Jongin; Park, Hyeong-jun
2016-01-01
The purpose of this study is to classify EEG data on imagined speech in a single trial. We recorded EEG data while five subjects imagined different vowels, /a/, /e/, /i/, /o/, and /u/. We divided each single trial dataset into thirty segments and extracted features (mean, variance, standard deviation, and skewness) from all segments. To reduce the dimension of the feature vector, we applied a feature selection algorithm based on the sparse regression model. These features were classified using a support vector machine with a radial basis function kernel, an extreme learning machine, and two variants of an extreme learning machine with different kernels. Because each single trial consisted of thirty segments, our algorithm decided the label of the single trial by selecting the most frequent output among the outputs of the thirty segments. As a result, we observed that the extreme learning machine and its variants achieved better classification rates than the support vector machine with a radial basis function kernel and linear discriminant analysis. Thus, our results suggested that EEG responses to imagined speech could be successfully classified in a single trial using an extreme learning machine with a radial basis function and linear kernel. This study on the classification of imagined speech might contribute to the development of silent speech BCI systems. PMID:28097128
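The segment-wise feature extraction and the majority vote described in the abstract above can be sketched as follows. This is a minimal illustration: the function names are hypothetical, population (not sample) moments are assumed, and the classifier itself is left out, so the segment labels are supplied directly.

```python
import math
import statistics
from collections import Counter

def split_segments(trial, n_segments=30):
    """Split one single-trial signal into equal-length segments
    (any remainder at the end is dropped)."""
    size = len(trial) // n_segments
    return [trial[i * size:(i + 1) * size] for i in range(n_segments)]

def segment_features(segment):
    """Return (mean, variance, standard deviation, skewness) of a segment."""
    mean = statistics.fmean(segment)
    var = statistics.pvariance(segment, mu=mean)  # population variance (assumed)
    std = math.sqrt(var)
    third = sum((x - mean) ** 3 for x in segment) / len(segment)
    skew = third / std ** 3 if std > 0 else 0.0
    return mean, var, std, skew

def majority_vote(segment_labels):
    """Trial label = most frequent label among the segment outputs."""
    return Counter(segment_labels).most_common(1)[0][0]

# Hypothetical classifier outputs for the thirty segments of one trial.
segment_outputs = ["a"] * 12 + ["e"] * 10 + ["i"] * 8
trial_label = majority_vote(segment_outputs)
```

Voting over thirty segment-level decisions makes the trial label less sensitive to a few misclassified segments than a single decision on the whole trial would be.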
Analyzing big data with the hybrid interval regression methods.
Huang, Chia-Hui; Yang, Keng-Chieh; Kao, Han-Ying
2014-01-01
Big data is a current trend with a significant impact on information technologies. In big data applications, one of the main concerns is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently has become a major challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has been shown to be more efficient than the traditional SVM in processing large-scale data. In addition, a soft margin method is proposed to modify the excursion of the separation margin and to be effective in the gray zone, where the distribution of data and the separation margin between classes become hard to describe.
Analyzing Big Data with the Hybrid Interval Regression Methods
Kao, Han-Ying
2014-01-01
Big data is a current trend with a significant impact on information technologies. In big data applications, one of the main concerns is dealing with large-scale data sets that often require computation resources provided by public cloud services. How to analyze big data efficiently has become a major challenge. In this paper, we combine interval regression with the smooth support vector machine (SSVM) to analyze big data. The SSVM was recently proposed as an alternative to the standard SVM and has been shown to be more efficient than the traditional SVM in processing large-scale data. In addition, a soft margin method is proposed to modify the excursion of the separation margin and to be effective in the gray zone, where the distribution of data and the separation margin between classes become hard to describe. PMID:25143968
Combination Gene Therapy for Liver Metastasis of Colon Carcinoma in vivo
NASA Astrophysics Data System (ADS)
Chen, Shu-Hsai; Chen, X. H. Li; Wang, Yibin; Kosai, Ken-Ichiro; Finegold, Milton J.; Rich, Susan S.
1995-03-01
The efficacy of combination therapy with a "suicide gene" and a cytokine gene to treat metastatic colon carcinoma in the liver was investigated. Tumor in the liver was generated by intrahepatic injection of a colon carcinoma cell line (MCA-26) in syngeneic BALB/c mice. Recombinant adenoviral vectors containing various control and therapeutic genes were injected directly into the solid tumors, followed by treatment with ganciclovir. While the tumors continued to grow in all animals treated with a control vector or a mouse interleukin 2 vector, those treated with a herpes simplex virus thymidine kinase vector, with or without the coadministration of the mouse interleukin 2 vector, exhibited dramatic necrosis and regression. However, only animals treated with both vectors developed an effective systemic antitumoral immunity against challenges of tumorigenic doses of parental tumor cells inoculated at distant sites. The antitumoral immunity was associated with the presence of MCA-26 tumor-specific cytolytic CD8^+ T lymphocytes. The results suggest that combination suicide and cytokine gene therapy in vivo can be a powerful approach for treatment of metastatic colon carcinoma in the liver.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nasehi Tehrani, J; Wang, J; McEwan, A
Purpose: In this study, we developed and evaluated a method for predicting lung surface deformation vector fields (SDVFs) based on surrogate signals such as chest and abdomen motion at selected locations and spirometry measurements. Methods: A patient-specific 3D triangular surface mesh of the lung region at the end-expiration (EE) phase was obtained by a threshold-based segmentation method. For each patient, a spirometer recorded the flow volume changes of the lungs, and 192 selected points at a regular spacing of 2 cm × 2 cm over a total area of 34 cm × 24 cm on the surface of the chest and abdomen were used to detect chest wall motions. Preprocessing techniques such as QR factorization with column pivoting (QRCP) were employed to remove redundant observations of the chest and abdominal area. To create a statistical model between the lung surface and the corresponding surrogate signals, we developed a predictive model based on canonical ridge regression (CRR). Two unique weighting vectors were selected for each vertex on the surface of the lung, and they were optimized during the training process using all other phases of 4D-CT except the end-inspiration (EI) phase. These parameters were employed to predict the vertex locations of a testing data set, which was the EI phase of 4D-CT. Results: For ten lung cancer patients, the deformation vector field of each vertex of the lung surface mesh was estimated from the external motion at selected positions on the chest wall surface plus spirometry measurements. The average estimation of the 98th percentile of error was less than 1 mm (AP = 0.85, RL = 0.61, and SI = 0.82). Conclusion: The developed predictive model provides a non-invasive approach to derive lung boundary conditions. Together with personalized biomechanical respiration modelling, the proposed model can be used to derive the lung tumor motion during radiation therapy accurately from non-invasive measurements.
Predicting summer monsoon of Bhutan based on SST and teleconnection indices
NASA Astrophysics Data System (ADS)
Dorji, Singay; Herath, Srikantha; Mishra, Binaya Kumar; Chophel, Ugyen
2018-02-01
The paper uses a statistical method to predict the summer monsoon over Bhutan from the ocean-atmospheric circulation variables of sea surface temperature (SST), mean sea-level pressure (MSLP), and selected teleconnection indices. The predictors are selected based on correlation. They are the SST and MSLP of the Bay of Bengal and the Arabian Sea, the MSLP of Bangladesh and northeast India, and the Northern Hemisphere teleconnections of the East Atlantic Pattern (EA), West Pacific Pattern (WP), Pacific/North American Pattern, and East Atlantic/West Russia Pattern (EA/WR). The rainfall station data are grouped into two regions with principal components analysis and Ward's hierarchical clustering algorithm. A support vector machine for regression model is proposed to predict the monsoon. The model shows improved skill over traditional linear regression. The model was able to predict the summer monsoon for the test data from 2011 to 2015 with a total monthly root mean squared error of 112 mm for region A and 33 mm for region B. The model could also forecast the 2016 monsoon of the South Asia Monsoon Outlook of the World Meteorological Organization (WMO) for Bhutan. The reliance on agriculture and a hydropower economy makes the prediction of the summer monsoon highly valuable information for farmers and various other sectors. The proposed method can predict the summer monsoon for operational forecasting.
2018-01-01
This paper measures the adhesion/cohesion force among asphalt molecules at nanoscale level using an Atomic Force Microscopy (AFM) and models the moisture damage by applying state-of-the-art Computational Intelligence (CI) techniques (e.g., artificial neural network (ANN), support vector regression (SVR), and an Adaptive Neuro Fuzzy Inference System (ANFIS)). Various combinations of lime and chemicals as well as dry and wet environments are used to produce different asphalt samples. The parameters that were varied to generate different asphalt samples and measure the corresponding adhesion/cohesion forces are percentage of antistripping agents (e.g., Lime and Unichem), AFM tips K values, and AFM tip types. The CI methods are trained to model the adhesion/cohesion forces given the variation in values of the above parameters. To achieve enhanced performance, the statistical methods such as average, weighted average, and regression of the outputs generated by the CI techniques are used. The experimental results show that, of the three individual CI methods, ANN can model moisture damage to lime- and chemically modified asphalt better than the other two CI techniques for both wet and dry conditions. Moreover, the ensemble of CI along with statistical measurement provides better accuracy than any of the individual CI techniques. PMID:29849551
Xu, Jian-Wu; Suzuki, Kenji
2011-01-01
Purpose: A massive-training artificial neural network (MTANN) has been developed for the reduction of false positives (FPs) in computer-aided detection (CADe) of polyps in CT colonography (CTC). A major limitation of the MTANN is the long training time. To address this issue, the authors investigated the feasibility of two state-of-the-art regression models, namely, support vector regression (SVR) and Gaussian process regression (GPR) models, in the massive-training framework and developed massive-training SVR (MTSVR) and massive-training GPR (MTGPR) for the reduction of FPs in CADe of polyps. Methods: The authors applied SVR and GPR as volume-processing techniques in the distinction of polyps from FP detections in a CTC CADe scheme. Unlike artificial neural networks (ANNs), both SVR and GPR are memory-based methods that store a part of or the entire training data for testing. Therefore, their training is generally fast and they are able to improve the efficiency of the massive-training methodology. Rooted in a maximum margin property, SVR offers excellent generalization ability and robustness to outliers. On the other hand, GPR approaches nonlinear regression from a Bayesian perspective, which produces both the optimal estimated function and the covariance associated with the estimation. Therefore, both SVR and GPR, as the state-of-the-art nonlinear regression models, are able to offer a performance comparable or potentially superior to that of ANN, with highly efficient training. Both MTSVR and MTGPR were trained directly with voxel values from CTC images. A 3D scoring method based on a 3D Gaussian weighting function was applied to the outputs of MTSVR and MTGPR for distinction between polyps and nonpolyps. To test the performance of the proposed models, the authors compared them to the original MTANN in the distinction between actual polyps and various types of FPs in terms of training time reduction and FP reduction performance. 
The authors’ CTC database consisted of 240 CTC data sets obtained from 120 patients in the supine and prone positions. The training set consisted of 27 patients, 10 of which had 10 polyps. The authors selected 10 nonpolyps (i.e., FP sources) from the training set. These ten polyps and ten nonpolyps were used for training the proposed models. The testing set consisted of 93 patients, including 19 polyps in 7 patients and 86 negative patients with 474 FPs produced by an original CADe scheme. Results: With the MTSVR, the training time was reduced by a factor of 190, while a FP reduction performance [by-polyp sensitivity of 94.7% (18∕19) with 2.5 (230∕93) FPs∕patient] comparable to that of the original MTANN [the same sensitivity with 2.6 (244∕93) FPs∕patient] was achieved. The classification performance in terms of the area under the receiver-operating-characteristic curve value of the MTGPR (0.82) was statistically significantly higher than that of the original MTANN (0.77), with a two-sided p-value of 0.03. The MTGPR yielded a 94.7% (18∕19) by-polyp sensitivity at a FP rate of 2.5 (235∕93) per patient and reduced the training time by a factor of 1.3. Conclusions: Both MTSVR and MTGPR improve the efficiency of the training in the massive-training framework while maintaining a comparable performance. PMID:21626922
Comparative Analysis of River Flow Modelling by Using Supervised Learning Technique
NASA Astrophysics Data System (ADS)
Ismail, Shuhaida; Mohamad Pandiahi, Siraj; Shabri, Ani; Mustapha, Aida
2018-04-01
The goal of this research is to investigate the efficiency of three supervised learning algorithms for forecasting monthly river flow of the Indus River in Pakistan, spread over 550 square miles or 1800 square kilometres. The algorithms include the Least Square Support Vector Machine (LSSVM), Artificial Neural Network (ANN) and Wavelet Regression (WR). Each of the three models was used individually to forecast the monthly river flow, and the accuracy of the models was then compared and statistically analysed. The results of this comparison showed that the LSSVM model is more precise in monthly river flow forecasting: LSSVM has a higher r, with a value of 0.934, than the other models. This indicates that LSSVM is more accurate and efficient than the ANN and WR models.
Classification of sodium MRI data of cartilage using machine learning.
Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R
2015-11-01
To assess the possible utility of machine learning for classifying subjects with and subjects without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interests in the knee for both acquisitions. Mean (MEAN) and standard deviation (STD) of these concentrations were measured in each regions of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interests for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid suppressed data, were the best predictors with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
The logistic model for predicting the non-gonoactive Aedes aegypti females.
Reyes-Villanueva, Filiberto; Rodríguez-Pérez, Mario A
2004-01-01
To estimate, using logistic regression, the likelihood of occurrence of a non-gonoactive Aedes aegypti female, previously fed human blood, with relation to body size and collection method. This study was conducted in Monterrey, Mexico, between 1994 and 1996. Ten samplings of 60 mosquitoes of Ae. aegypti females were carried out in three dengue endemic areas: six of biting females, two of emerging mosquitoes, and two of indoor resting females. Gravid females, as well as those with blood in the gut were removed. Mosquitoes were taken to the laboratory and engorged on human blood. After 48 hours, ovaries were dissected to register whether they were gonoactive or non-gonoactive. Wing-length in mm was an indicator for body size. The logistic regression model was used to assess the likelihood of non-gonoactivity, as a binary variable, in relation to wing-length and collection method. Of the 600 females, 164 (27%) remained non-gonoactive, with a wing-length range of 1.9-3.2 mm, almost equal to that of all females (1.8-3.3 mm). The logistic regression model showed a significant likelihood of a female remaining non-gonoactive (Y=1). The collection method did not influence the binary response, but there was an inverse relationship between non-gonoactivity and wing-length. Dengue vector populations from Monterrey, Mexico display a wide-range body size. Logistic regression was a useful tool to estimate the likelihood for an engorged female to remain non-gonoactive. The necessity for a second blood meal is present in any female, but small mosquitoes are more likely to bite again within a 2-day interval, in order to attain egg maturation. The English version of this paper is available too at: http://www.insp.mx/salud/index.html.
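The inverse relationship between wing length and non-gonoactivity reported above can be illustrated with a minimal sketch of a one-predictor logistic model. The abstract does not report the fitted coefficients, so `INTERCEPT` and `SLOPE` below are hypothetical values chosen only to give a negative slope; the function name is also an assumption.

```python
import math


def prob_non_gonoactive(wing_length_mm, intercept, slope):
    """Logistic model: P(Y = 1 | wing length), where Y = 1 means non-gonoactive."""
    z = intercept + slope * wing_length_mm
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients (NOT from the study); a negative slope encodes
# the reported inverse relationship between wing length and non-gonoactivity.
INTERCEPT = 4.0
SLOPE = -2.0

# Evaluate across the wing-length range reported for non-gonoactive females.
for wing in (1.9, 2.5, 3.2):
    p = prob_non_gonoactive(wing, INTERCEPT, SLOPE)
    print(f"wing length {wing:.1f} mm -> P(non-gonoactive) = {p:.3f}")
```

With any negative slope, the predicted probability falls as wing length increases, which is the qualitative pattern the abstract describes.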
Mathematical modelling of vector-borne diseases and insecticide resistance evolution.
Gabriel Kuniyoshi, Maria Laura; Pio Dos Santos, Fernando Luiz
2017-01-01
Vector-borne diseases are important public health issues and, consequently, in silico models that simulate them can be useful. The susceptible-infected-recovered (SIR) model simulates the population dynamics of an epidemic and can be easily adapted to vector-borne diseases, whereas the Hardy-Weinberg model simulates allele frequencies and can be used to study insecticide resistance evolution. The aim of the present study is to develop a coupled system that unifies both models, therefore enabling the analysis of the effects of vector population genetics on the population dynamics of an epidemic. Our model consists of an ordinary differential equation system. We considered the populations of susceptible, infected and recovered humans, as well as susceptible and infected vectors. Concerning these vectors, we considered a pair of alleles, with complete dominance interaction that determined the rate of mortality induced by insecticides. Thus, we were able to separate the vectors according to the genotype. We performed three numerical simulations of the model. In simulation one, both alleles conferred the same mortality rate values, therefore there was no resistant strain. In simulations two and three, the recessive and dominant alleles, respectively, conferred a lower mortality. Our numerical results show that the genetic composition of the vector population affects the dynamics of human diseases. We found that the absolute number of vectors and the proportion of infected vectors are smaller when there is no resistant strain, whilst the ratio of infected people is larger in the presence of insecticide-resistant vectors. The dynamics observed for infected humans in all simulations has a very similar shape to real epidemiological data. The population genetics of vectors can affect epidemiological dynamics, and the presence of insecticide-resistant strains can increase the number of infected people. 
Based on the present results, the model is a basis for development of other models and for investigating population dynamics.
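A heavily simplified sketch of a coupled human SIR and vector system may help make the setup concrete. It is not the authors' model: genotypes are collapsed into a single insecticide-induced vector mortality rate `mu_vector`, the equations are integrated with a plain Euler scheme, and every parameter value is hypothetical.

```python
def simulate(mu_vector, days=200, dt=0.1):
    """Euler integration of human S, I, R coupled to susceptible/infected vectors.

    mu_vector is the vector mortality rate; a lower value stands in for an
    insecticide-resistant population. All numbers here are illustrative.
    """
    n_h = 1000.0      # total human population (constant)
    beta_h = 0.3      # vector-to-human transmission rate (hypothetical)
    beta_v = 0.3      # human-to-vector transmission rate (hypothetical)
    gamma = 0.1       # human recovery rate (hypothetical)
    recruit = 200.0   # new susceptible vectors per day (hypothetical)

    s_h, i_h, r_h = n_h - 1.0, 1.0, 0.0
    s_v, i_v = 1000.0, 0.0

    for _ in range(int(days / dt)):
        new_h = beta_h * s_h * i_v / n_h   # new human infections per day
        new_v = beta_v * s_v * i_h / n_h   # new vector infections per day
        ds_h = -new_h
        di_h = new_h - gamma * i_h
        dr_h = gamma * i_h
        ds_v = recruit - new_v - mu_vector * s_v
        di_v = new_v - mu_vector * i_v
        s_h += dt * ds_h
        i_h += dt * di_h
        r_h += dt * dr_h
        s_v += dt * ds_v
        i_v += dt * di_v
    return s_h, i_h, r_h, s_v, i_v


susceptible_strain = simulate(mu_vector=0.2)  # insecticide fully effective
resistant_strain = simulate(mu_vector=0.1)    # lower insecticide-induced mortality

for label, state in (("no resistance", susceptible_strain),
                     ("resistance", resistant_strain)):
    ever_infected = state[1] + state[2]
    print(f"{label}: humans ever infected = {ever_infected:.1f}")
```

In this toy setting, lowering vector mortality raises the number of humans ever infected, which is in line with the abstract's qualitative conclusion; the magnitudes carry no meaning.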
Eckhoff, Philip A; Bever, Caitlin A; Gerardin, Jaline; Wenger, Edward A; Smith, David L
2015-08-01
Since the original Ross-Macdonald formulations of vector-borne disease transmission, there has been a broad proliferation of mathematical models of vector-borne disease, but many of these models retain most to all of the simplifying assumptions of the original formulations. Recently, there has been a new expansion of mathematical frameworks that contain explicit representations of the vector life cycle including aquatic stages, multiple vector species, host heterogeneity in biting rate, realistic vector feeding behavior, and spatial heterogeneity. In particular, there are now multiple frameworks for spatially explicit dynamics with movements of vector, host, or both. These frameworks are flexible and powerful, but require additional data to take advantage of these features. For a given question posed, utilizing a range of models with varying complexity and assumptions can provide a deeper understanding of the answers derived from models. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Dobson, Andrew D M; Auld, Stuart K J R
2016-04-01
Models used to investigate the relationship between biodiversity change and vector-borne disease risk often do not explicitly include the vector; they instead rely on a frequency-dependent transmission function to represent vector dynamics. However, differences between classes of vector (e.g., ticks and insects) can cause discrepancies in epidemiological responses to environmental change. Using a pair of disease models (mosquito- and tick-borne), we simulated substitutive and additive biodiversity change (where noncompetent hosts replaced or were added to competent hosts, respectively), while considering different relationships between vector and host densities. We found important differences between classes of vector, including an increased likelihood of amplified disease risk under additive biodiversity change in mosquito models, driven by higher vector biting rates. We also draw attention to more general phenomena, such as a negative relationship between initial infection prevalence in vectors and likelihood of dilution, and the potential for a rise in density of infected vectors to occur simultaneously with a decline in proportion of infected hosts. This has important implications; the density of infected vectors is the most valid metric for primarily zoonotic infections, while the proportion of infected hosts is more relevant for infections where humans are a primary host.
Solving large mixed linear models using preconditioned conjugate gradient iteration.
Strandén, I; Lidauer, M
1999-12-01
Continuous evaluation of dairy cattle with a random regression test-day model requires a fast solving method and algorithm. A new computing technique feasible in Jacobi and conjugate gradient based iterative methods using iteration on data is presented. In the new computing technique, the calculations in multiplication of a vector by a matrix were recorded to three steps instead of the commonly used two steps. The three-step method was implemented in a general mixed linear model program that used preconditioned conjugate gradient iteration. Performance of this program in comparison to other general solving programs was assessed via estimation of breeding values using univariate, multivariate, and random regression test-day models. Central processing unit time per iteration with the new three-step technique was, at best, one-third that needed with the old technique. Performance was best with the test-day model, which was the largest and most complex model used. The new program did well in comparison to other general software. Programs keeping the mixed model equations in random access memory required at least 20 and 435% more time to solve the univariate and multivariate animal models, respectively. Computations of the second best iteration on data took approximately three and five times longer for the animal and test-day models, respectively, than did the new program. Good performance was due to fast computing time per iteration and quick convergence to the final solutions. Use of preconditioned conjugate gradient based methods in solving large breeding value problems is supported by our findings.
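For readers unfamiliar with the solver named above, the following is a minimal sketch of preconditioned conjugate gradient iteration with a Jacobi (diagonal) preconditioner on a small dense symmetric positive definite system. It does not reproduce the paper's three-step matrix-vector product or iteration on data; the function names and the tiny test system are illustrative assumptions.

```python
import math


def matvec(a, v):
    """Dense matrix-vector product."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))


def pcg(a, b, tol=1e-10, max_iter=1000):
    """Solve a x = b for symmetric positive definite a, Jacobi-preconditioned."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                                  # residual b - a x for x = 0
    m_inv = [1.0 / a[i][i] for i in range(n)]    # inverse of diag(a)
    z = [m_inv[i] * r[i] for i in range(n)]      # preconditioned residual
    p = list(z)                                  # initial search direction
    rz = dot(r, z)
    for _ in range(max_iter):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)                  # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        if math.sqrt(dot(r, r)) < tol:
            break
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                       # conjugacy correction
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


# Tiny illustrative system (not from the paper).
A = [[4.0, 1.0], [1.0, 3.0]]
B = [1.0, 2.0]
solution = pcg(A, B)
print(solution)
```

In large mixed-model applications the matrix is never formed explicitly; only the matrix-vector product is computed from the data, which is where the paper's three-step technique applies.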
Sabel, Michael S; Rice, John D; Griffith, Kent A; Lowe, Lori; Wong, Sandra L; Chang, Alfred E; Johnson, Timothy M; Taylor, Jeremy M G
2012-01-01
To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid sentinel lymph node biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests, and support vector machines. We sought to validate recently published models meant to predict sentinel node status. We queried our comprehensive, prospectively collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon four published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false-negative rate (FNR). Logistic regression performed comparably with our data when considering NPV (89.4 versus 93.6%); however, the model's specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsy rates that were lower (87.7 versus 94.1 and 29.8 versus 14.3, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility.
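The three metrics named above can be computed from a confusion matrix as sketched below. The definitions are the conventional ones and are an assumption here, since the abstract does not spell them out: NPV = TN / (TN + FN), RNP = (TN + FN) / N, and FNR = FN / (FN + TP). The counts are hypothetical and do not come from the study.

```python
def sln_metrics(tp, fp, tn, fn):
    """Return NPV, rate of negative predictions (RNP), and false-negative rate (FNR).

    tp/fp/tn/fn are counts of true/false positive/negative predictions of
    sentinel node status. Definitions are conventional, assumed for this sketch.
    """
    total = tp + fp + tn + fn
    return {
        "npv": tn / (tn + fn),    # predicted-negative patients who are truly negative
        "rnp": (tn + fn) / total, # fraction of patients predicted negative
        "fnr": fn / (fn + tp),    # node-positive patients the model misses
    }


# Hypothetical counts for illustration only.
metrics = sln_metrics(tp=150, fp=600, tn=230, fn=20)
for name, value in metrics.items():
    print(f"{name.upper()}: {value:.3f}")
```

A model can show a high NPV and still have little clinical effect if its RNP is small, because few patients would actually be spared a biopsy.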
Li, Lin; Xu, Shuo; An, Xin; Zhang, Lu-Da
2011-10-01
In near infrared spectral quantitative analysis, the precision of the measured chemical values of the samples is the theoretical limit on the precision of quantitative analysis with mathematical models. However, only a few samples have accurately measured chemical values. Many models discard the samples without chemical values and consider only the samples with chemical values when modeling the contents of sample components. To address this problem, a semi-supervised LS-SVR (S2 LS-SVR) model is proposed on the basis of LS-SVR, which can utilize samples without chemical values as well as those with chemical values. As with LS-SVR, training this model is equivalent to solving a linear system. Finally, samples of flue-cured tobacco were taken as experimental material, and corresponding quantitative analysis models were constructed for the contents of four components (total sugar, reducing sugar, total nitrogen and nicotine) with PLS regression, LS-SVR and S2 LS-SVR. For the S2 LS-SVR model, the average relative errors between actual and predicted values for the four components are 6.62%, 7.56%, 6.11% and 8.20%, respectively, and the correlation coefficients are 0.9741, 0.9733, 0.9230 and 0.9486, respectively. Experimental results show that the S2 LS-SVR model outperforms the other two, which verifies the feasibility and efficiency of the S2 LS-SVR model.
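For context, the linear system behind standard (fully supervised) LS-SVR training can be written as follows. This is the usual textbook formulation, given here as background; the semi-supervised S2 LS-SVR system is not stated in the abstract and is not reconstructed. The symbols $K$ (kernel matrix with entries $k(x_i, x_j)$), $\gamma$ (regularization parameter), $\boldsymbol{\alpha}$ (dual coefficients) and $b$ (bias) are notation assumed for this sketch.

```latex
\begin{bmatrix}
0 & \mathbf{1}^{\top} \\
\mathbf{1} & K + \gamma^{-1} I
\end{bmatrix}
\begin{bmatrix}
b \\ \boldsymbol{\alpha}
\end{bmatrix}
=
\begin{bmatrix}
0 \\ \mathbf{y}
\end{bmatrix},
\qquad
f(x) = \sum_{i=1}^{n} \alpha_i \, k(x, x_i) + b .
```

Because the inequality constraints of standard SVR are replaced by equality constraints with a squared loss, training reduces to this single $(n+1) \times (n+1)$ solve rather than a quadratic program.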
A Novel Gradient Vector Flow Snake Model Based on Convex Function for Infrared Image Segmentation
Zhang, Rui; Zhu, Shiping; Zhou, Qin
2016-01-01
Infrared image segmentation is a challenging topic because infrared images are characterized by high noise, low contrast, and weak edges. Active contour models, especially gradient vector flow, have several advantages in terms of infrared image segmentation. However, the GVF (Gradient Vector Flow) model also has some drawbacks including a dilemma between noise smoothing and weak edge protection, which decrease the effect of infrared image segmentation significantly. In order to solve this problem, we propose a novel generalized gradient vector flow snakes model combining GGVF (Generic Gradient Vector Flow) and NBGVF (Normally Biased Gradient Vector Flow) models. We also adopt a new type of coefficients setting in the form of convex function to improve the ability of protecting weak edges while smoothing noises. Experimental results and comparisons against other methods indicate that our proposed snakes model owns better ability in terms of infrared image segmentation than other snakes models. PMID:27775660
T-wave end detection using neural networks and Support Vector Machines.
Suárez-León, Alexander Alexeis; Varon, Carolina; Willems, Rik; Van Huffel, Sabine; Vázquez-Seisdedos, Carlos Román
2018-05-01
In this paper we propose a new approach for detecting the end of the T-wave in the electrocardiogram (ECG) using Neural Networks and Support Vector Machines. Both, Multilayer Perceptron (MLP) neural networks and Fixed-Size Least-Squares Support Vector Machines (FS-LSSVM) were used as regression algorithms to determine the end of the T-wave. Different strategies for selecting the training set such as random selection, k-means, robust clustering and maximum quadratic (Rényi) entropy were evaluated. Individual parameters were tuned for each method during training and the results are given for the evaluation set. A comparison between MLP and FS-LSSVM approaches was performed. Finally, a fair comparison of the FS-LSSVM method with other state-of-the-art algorithms for detecting the end of the T-wave was included. The experimental results show that FS-LSSVM approaches are more suitable as regression algorithms than MLP neural networks. Despite the small training sets used, the FS-LSSVM methods outperformed the state-of-the-art techniques. FS-LSSVM can be successfully used as a T-wave end detection algorithm in ECG even with small training set sizes. Copyright © 2018 Elsevier Ltd. All rights reserved.
Role of Aedes aegypti (Linnaeus) and Aedes albopictus (Skuse) in local dengue epidemics in Taiwan.
Tsai, Pui-Jen; Teng, Hwa-Jen
2016-11-09
Aedes mosquitoes in Taiwan mainly comprise Aedes albopictus and Ae. aegypti. However, the species contributing to autochthonous dengue spread and the extent at which it occurs remain unclear. Thus, in this study, we spatially analyzed real data to determine spatial features related to local dengue incidence and mosquito density, particularly that of Ae. albopictus and Ae. aegypti. We used bivariate Moran's I statistic and geographically weighted regression (GWR) spatial methods to analyze the globally spatial dependence and locally regressed relationship between (1) imported dengue incidences and Breteau indices (BIs) of Ae. albopictus, (2) imported dengue incidences and BI of Ae. aegypti, (3) autochthonous dengue incidences and BI of Ae. albopictus, (4) autochthonous dengue incidences and BI of Ae. aegypti, (5) all dengue incidences and BI of Ae. albopictus, (6) all dengue incidences and BI of Ae. aegypti, (7) BI of Ae. albopictus and human population density, and (8) BI of Ae. aegypti and human population density in 348 townships in Taiwan. In the GWR models, regression coefficients of spatially regressed relationships between the incidence of autochthonous dengue and vector density of Ae. aegypti were significant and positive in most townships in Taiwan. However, Ae. albopictus had significant but negative regression coefficients in clusters of dengue epidemics. In the global bivariate Moran's index, spatial dependence between the incidence of autochthonous dengue and vector density of Ae. aegypti was significant and exhibited positive correlation in Taiwan (bivariate Moran's index = 0.51). However, Ae. albopictus exhibited positively significant but low correlation (bivariate Moran's index = 0.06). Similar results were observed in the two spatial methods between all dengue incidences and Aedes mosquitoes (Ae. aegypti and Ae. albopictus). The regression coefficients of spatially regressed relationships between imported dengue cases and Aedes mosquitoes (Ae. aegypti and Ae. albopictus) were significant in 348 townships in Taiwan. The results indicated that local Aedes mosquitoes do not contribute to the dengue incidence of imported cases. The density of Ae. aegypti positively correlated with the density of human population. By contrast, the density of Ae. albopictus negatively correlated with the density of human population in the areas of southern Taiwan. The results indicated that Ae. aegypti has more opportunities for human-mosquito contact in dengue endemic areas in southern Taiwan. Ae. aegypti, but not Ae. albopictus, and human population density in southern Taiwan are closely associated with an increased risk of autochthonous dengue incidence.
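As an illustration of the global statistic used above, here is a minimal sketch of one common form of the bivariate Moran's I: both variables are standardized, and the statistic is the weighted cross-product of one variable with the spatially neighbouring values of the other, divided by the sum of the weights. The four-cell layout, the binary adjacency weights, and the data values are hypothetical; the study's weight matrix and software are not described in the abstract.

```python
import math


def standardize(values):
    """Z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def bivariate_morans_i(x, y, weights):
    """I_xy = sum_ij w_ij * zx_i * zy_j / sum_ij w_ij (one common form)."""
    zx, zy = standardize(x), standardize(y)
    n = len(x)
    s0 = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * zx[i] * zy[j]
                for i in range(n) for j in range(n))
    return cross / s0


# Four hypothetical areas in a row; each area neighbours the adjacent ones.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
incidence = [1.0, 2.0, 3.0, 4.0]       # hypothetical dengue incidence
vector_index = [2.0, 3.0, 5.0, 9.0]    # hypothetical Breteau index

print(bivariate_morans_i(incidence, vector_index, W))
```

When the same variable is passed as both `x` and `y`, this form reduces to the ordinary univariate Moran's I.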
NASA Astrophysics Data System (ADS)
Wu, W.; Chen, G. Y.; Kang, R.; Xia, J. C.; Huang, Y. P.; Chen, K. J.
2017-07-01
During slaughtering and further processing, chicken carcasses are inevitably contaminated by microbial pathogen contaminants. Due to food safety concerns, many countries implement a zero-tolerance policy that forbids the placement of visibly contaminated carcasses in ice-water chiller tanks during processing. Manual detection of contaminants is labor-intensive and imprecise. Here, a successive projections algorithm (SPA)-multivariable linear regression (MLR) classifier based on an optimal performance threshold was developed for automatic detection of contaminants on chicken carcasses. Hyperspectral images were obtained using a hyperspectral imaging system. A regression model of the classifier was established by MLR based on twelve characteristic wavelengths (505, 537, 561, 562, 564, 575, 604, 627, 656, 665, 670, and 689 nm) selected by SPA, and the optimal threshold T = 1 was obtained from the receiver operating characteristic (ROC) analysis. The SPA-MLR classifier provided the best detection results when compared with the SPA-partial least squares (PLS) regression classifier and the SPA-least squares support vector machine (LS-SVM) classifier. The true positive rate (TPR) of 100% and the false positive rate (FPR) of 0.392% indicate that the SPA-MLR classifier can utilize spatial and spectral information to effectively detect contaminants on chicken carcasses.
Candolfi, Marianela; Curtin, James F; Yagiz, Kader; Assi, Hikmat; Wibowo, Mia K; Alzadeh, Gabrielle E; Foulad, David; Muhammad, A K M G; Salehi, Sofia; Keech, Naomi; Puntel, Mariana; Liu, Chunyan; Sanderson, Nicholas R; Kroeger, Kurt M; Dunn, Robert; Martins, Gislaine; Lowenstein, Pedro R; Castro, Maria G
2011-10-01
We have demonstrated that modifying the tumor microenvironment through intratumoral administration of adenoviral vectors (Ad) encoding the conditional cytotoxic molecule, i.e., HSV1-TK and the immune-stimulatory cytokine, i.e., fms-like tyrosine kinase 3 ligand (Flt3L) leads to T-cell-dependent tumor regression in rodent models of glioblastoma. We investigated the role of B cells during immune-mediated glioblastoma multiforme regression. Although treatment with Ad-TK+Ad-Flt3L induced tumor regression in 60% of wild-type (WT) mice, it completely failed in B-cell-deficient Igh6(-/-) mice. Tumor-specific T-cell precursors were detected in Ad-TK+Ad-Flt3L-treated WT mice but not in Igh6(-/-) mice. The treatment also failed in WT mice depleted of total B cells or marginal zone B cells. Because we could not detect circulating antibodies against tumor cells and the treatment was equally efficient in WT mice and in mice with B-cell-specific deletion of Prdm 1 (encoding Blimp-1), in which B cells are present but unable to fully differentiate into antibody-secreting plasma cells, tumor regression in this model is not dependent on B cells' production of tumor antigen-specific immunoglobulins. Instead, B cells seem to play a role as antigen-presenting cells (APCs). Treatment with Ad-TK+Ad-Flt3L led to an increase in the number of B cells in the cervical lymph nodes, which stimulated the proliferation of syngeneic T cells and induced clonal expansion of antitumor T cells. Our data show that B cells act as APCs, playing a critical role in clonal expansion of tumor antigen-specific T cells and brain tumor regression.
NASA Astrophysics Data System (ADS)
Verrelst, Jochem; Rivera, J. P.; Alonso, L.; Guanter, L.; Moreno, J.
2012-04-01
ESA’s upcoming satellites Sentinel-2 (S2) and Sentinel-3 (S3) aim to ensure continuity for Landsat 5/7, SPOT-5, SPOT-Vegetation and Envisat MERIS observations by providing superspectral images of high spatial and temporal resolution. S2 and S3 will deliver near real-time operational products with a high accuracy for land monitoring. This unprecedented data availability leads to an urgent need for developing robust and accurate retrieval methods. Machine learning regression algorithms could be powerful candidates for the estimation of biophysical parameters from satellite reflectance measurements because of their ability to perform adaptive, nonlinear data fitting. By using data from the ESA-led field campaign SPARC (Barrax, Spain), it was recently found [1] that Gaussian processes regression (GPR) outperformed competitive machine learning algorithms such as neural networks, support vector regression and kernel ridge regression both in terms of accuracy and computational speed. For various Sentinel configurations (S2-10m, S2-20m, S2-60m and S3-300m) three important biophysical parameters were estimated: leaf chlorophyll content (Chl), leaf area index (LAI) and fractional vegetation cover (FVC). GPR was the only method that reached the 10% precision required by end users in the estimation of Chl. In view of implementing the regressor into operational monitoring applications, here the portability of locally trained GPR models to other images was evaluated. The associated confidence maps proved to be a good indicator for evaluating the robustness of the trained models. Consistent retrievals were obtained across the different images, particularly over agricultural sites. To make the method suitable for operational use, however, the poorer confidences over bare soil areas suggest that the training dataset should be expanded with inputs from various land cover types.
NASA Astrophysics Data System (ADS)
Tautz-Weinert, J.; Watson, S. J.
2016-09-01
Effective condition monitoring techniques for wind turbines are needed to improve maintenance processes and reduce operational costs. Normal behaviour modelling of temperatures with information from other sensors can help to detect wear processes in drive trains. In a case study, modelling of bearing and generator temperatures is investigated with operational data from the SCADA systems of more than 100 turbines. The focus is here on automated training and testing on a farm level to enable an on-line system, which will detect failures without human interpretation. Modelling based on linear combinations, artificial neural networks, adaptive neuro-fuzzy inference systems, support vector machines and Gaussian process regression is compared. The selection of suitable modelling inputs is discussed with cross-correlation analyses and a sensitivity study, which reveals that the investigated modelling techniques react in different ways to an increased number of inputs. The case study highlights advantages of modelling with linear combinations and artificial neural networks in a feedforward configuration.
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach.
Duarte, Belmiro P M; Wong, Weng Kee
2015-08-01
This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted.
Finding Bayesian Optimal Designs for Nonlinear Models: A Semidefinite Programming-Based Approach
Duarte, Belmiro P. M.; Wong, Weng Kee
2014-01-01
Summary This paper uses semidefinite programming (SDP) to construct Bayesian optimal design for nonlinear regression models. The setup here extends the formulation of the optimal designs problem as an SDP problem from linear to nonlinear models. Gaussian quadrature formulas (GQF) are used to compute the expectation in the Bayesian design criterion, such as D-, A- or E-optimality. As an illustrative example, we demonstrate the approach using the power-logistic model and compare results in the literature. Additionally, we investigate how the optimal design is impacted by different discretising schemes for the design space, different amounts of uncertainty in the parameter values, different choices of GQF and different prior distributions for the vector of model parameters, including normal priors with and without correlated components. Further applications to find Bayesian D-optimal designs with two regressors for a logistic model and a two-variable generalised linear model with a gamma distributed response are discussed, and some limitations of our approach are noted. PMID:26512159
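The role of Gaussian quadrature in a Bayesian design criterion can be illustrated with a minimal sketch. It does not reproduce the SDP formulation or the power-logistic model of the paper. It assumes a hypothetical one-parameter logistic model p(x) = 1/(1 + exp(-(x - θ))) with a normal prior on θ, and uses the standard 3-point Gauss-Hermite rule to approximate the prior expectation of the log Fisher information (the Bayesian D-criterion for a scalar parameter). All function names are illustrative.

```python
import math

# 3-point Gauss-Hermite rule expressed for a N(mu, sigma^2) prior:
# nodes mu and mu +/- sqrt(3)*sigma, with weights 2/3 and 1/6 each.
GH_NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
GH_WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def prior_expectation(f, mu, sigma):
    """Approximate E[f(theta)] for theta ~ N(mu, sigma^2) by quadrature."""
    return sum(w * f(mu + sigma * z) for z, w in zip(GH_NODES, GH_WEIGHTS))


def fisher_information(theta, design):
    """Scalar Fisher information of the assumed one-parameter logistic model.

    `design` is a list of (support point x, design weight) pairs.
    """
    total = 0.0
    for x, weight in design:
        p = 1.0 / (1.0 + math.exp(-(x - theta)))
        total += weight * p * (1.0 - p)
    return total


def bayesian_d_criterion(design, mu, sigma):
    """Prior expectation of log information (larger is better)."""
    return prior_expectation(
        lambda theta: math.log(fisher_information(theta, design)), mu, sigma
    )


# Compare a design centred on the prior mean with one placed far away.
central = [(0.0, 1.0)]
remote = [(5.0, 1.0)]
print(bayesian_d_criterion(central, mu=0.0, sigma=1.0))
print(bayesian_d_criterion(remote, mu=0.0, sigma=1.0))
```

Under these assumptions the design centred on the prior mean scores higher, which is the kind of comparison an optimiser over the design weights would perform repeatedly.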
The dynamic relationship between Bursa Malaysia composite index and macroeconomic variables
NASA Astrophysics Data System (ADS)
Ismail, Mohd Tahir; Rose, Farid Zamani Che; Rahman, Rosmanjawati Abd.
2017-08-01
This study investigates the long-run and short-run relationships between the Bursa Malaysia Composite Index (KLCI) and nine macroeconomic variables in a VAR/VECM framework. After regression analysis, seven of the nine macroeconomic variables are chosen for further analysis. The Johansen-Juselius cointegration test and the Vector Error Correction Model (VECM) technique indicate that there are long-run relationships between the seven macroeconomic variables and the KLCI. Meanwhile, the Granger causality test shows a bidirectional relationship between the KLCI and the oil price. Furthermore, after 12 months the shocks to the KLCI are explained by innovations in the seven macroeconomic variables. This indicates a close relationship between the macroeconomic variables and the KLCI.
Helminths as vectors of pathogens in vertebrate hosts: a theoretical approach.
Perkins, Sarah E; Fenton, Andy
2006-07-01
Pathogens frequently use vectors to facilitate transmission between hosts and, for vertebrate hosts, the vectors are typically ectoparasitic arthropods. However, other parasites that are intimately associated with their hosts may also be ideal candidate vectors; namely the parasitic helminths. Here, we present empirical evidence that helminth vectoring of pathogens occurs in a range of vertebrate systems by a variety of helminth taxa. Using a novel theoretical framework we explore the dynamics of helminth vectoring and determine which host-helminth-pathogen characteristics may favour the evolution of helminth vectoring. We use two theoretical models: the first is a population dynamic model amalgamated from standard macro- and microparasite models, which serves as a framework for investigation of within-host interactions between co-infecting pathogens and helminths. The second is an evolutionary model, which we use to predict the ecological conditions under which we would expect helminth vectoring to evolve. We show that, like arthropod vectors, helminth vectors increase pathogen fitness. However, unlike arthropod vectors, helminth vectoring increases the pathogenic impact on the host and may allow the evolution of high pathogen virulence. We show that concomitant infection of a host with a helminth and pathogen are not necessarily independent of one another, due to helminth vectoring of microparasites, with profound consequences for pathogen persistence and the impact of disease on the host population.
Estimation of Surface Seawater Fugacity of Carbon Dioxide Using Satellite Data and Machine Learning
NASA Astrophysics Data System (ADS)
Jang, E.; Im, J.; Park, G.; Park, Y.
2016-12-01
The ocean controls the climate of Earth by absorbing and releasing CO2 through the carbon cycle. The amount of CO2 in the ocean has increased since the industrial revolution. High CO2 concentration in the ocean has a negative influence on marine organisms and reduces the ocean's ability to absorb CO2. This study estimated surface seawater fugacity of CO2 (fCO2) in the East Sea of Korea using Geostationary Ocean Color Imager (GOCI) and Moderate Resolution Imaging Spectroradiometer (MODIS) satellite data, and Hybrid Coordinate Ocean Model (HYCOM) reanalysis data. GOCI is the world's first geostationary ocean color observation satellite sensor, and it provides 8 hourly images per day with 8 bands from 9 am to 4 pm at 500 m resolution. Two machine learning approaches (i.e., random forest and support vector regression) were used to model fCO2 in this study. While most of the existing studies used multiple linear regression to estimate the pressure of CO2 in the ocean, machine learning may handle more complex relationships between surface seawater fCO2 and ocean parameters in a dynamic spatiotemporal environment. Five ocean-related parameters, colored dissolved organic matter (CDOM), chlorophyll-a (chla), sea surface temperature (SST), sea surface salinity (SSS), and mixed layer depth (MLD), were used as input variables. This study examined two schemes, one with GOCI-derived products and the other with MODIS-derived ones. Results show that random forest performed better than support vector regression regardless of the satellite data used. The accuracy of the GOCI-based estimation was higher than that of the MODIS-based one, possibly thanks to the better spatiotemporal resolution of GOCI data. MLD was identified as the most contributing parameter in estimating surface seawater fCO2 among the five ocean-related parameters, which might be related to active deep convection in the East Sea.
The surface seawater fCO2 in summer was higher in general with some spatial variation than the other seasons because of higher SST.
Wang, Xiu-Feng; Zhang, Lei; Wu, Qing-Hua; Min, Jian-Xin; Ma, Na; Luo, Lai-Cheng
2015-01-01
Psychological stress has become a common and important cause of premature ovarian failure (POF). Therefore, it is very important to explore the mechanisms of POF resulting from psychological stress. Sixty SD rats were randomly divided into control and model groups. Biomolecules associated with POF (β-EP, IL-1, NOS, NO, GnRH, CRH, FSH, LH, E2, P, ACTH, and CORT) were measured in the control and psychologically stressed rats. The regulation relationships of the biomolecules were explored in the psychologically stressed state using support vector regression (SVR). The values of β-EP, IL-1, NOS, and GnRH in the hypothalamus decreased significantly, and the value of NO changed slightly, when the values of 3 biomolecules in the hypothalamic-pituitary-adrenal axis decreased. The values of E2 and P in the hypothalamic-pituitary-ovarian axis decreased significantly, while the values of FSH and LH changed slightly, when the values of the biomolecules in the hypothalamus decreased. The values of FSH and LH in the pituitary layer of the hypothalamic-pituitary-ovarian axis changed slightly when the values of E2 and P in the target gland layer of the hypothalamic-pituitary-ovarian axis decreased. An imbalance in the neuroendocrine-immune bimolecular network, particularly the failure of the feedback action of the target gland layer on the pituitary layer in the pituitary-ovarian axis, is possibly one of the pathogenic mechanisms of POF. PMID:26885082
Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-01-01
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets, including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with a radial basis kernel provided the best classifier, with a correct classification rate (CCR) of 82% and a Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification with CCRs of 75% and 70%, respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and the fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during cultivation by evaluating the diet’s effects on fish skin. PMID:29596375
Saberioon, Mohammadmehdi; Císař, Petr; Labbé, Laurent; Souček, Pavel; Pelissier, Pablo; Kerneis, Thierry
2018-03-29
The main aim of this study was to develop a new objective method for evaluating the impacts of different diets on the live fish skin using image-based features. In total, one hundred and sixty rainbow trout (Oncorhynchus mykiss) were fed either a fish-meal based diet (80 fish) or a 100% plant-based diet (80 fish) and photographed using a consumer-grade digital camera. Twenty-three colour features and four texture features were extracted. Four different classification methods were used to evaluate fish diets, including Random forest (RF), Support vector machine (SVM), Logistic regression (LR) and k-Nearest neighbours (k-NN). The SVM with a radial basis kernel provided the best classifier, with a correct classification rate (CCR) of 82% and a Kappa coefficient of 0.65. Although both the LR and RF methods were less accurate than SVM, they achieved good classification with CCRs of 75% and 70%, respectively. The k-NN was the least accurate (40%) classification model. Overall, it can be concluded that consumer-grade digital cameras could be employed as a fast, accurate and non-invasive sensor for classifying rainbow trout based on their diets. Furthermore, there was a close association between image-based features and the fish diet received during cultivation. These procedures can be used as non-invasive, accurate and precise approaches for monitoring fish status during cultivation by evaluating the diet's effects on fish skin.
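The two reported metrics, correct classification rate (CCR) and Cohen's Kappa, can both be read off a confusion matrix. The sketch below shows the standard definitions. The example matrix is made up for illustration and is not the study's data, and the function name is hypothetical.

```python
def ccr_and_kappa(confusion):
    """Return (CCR, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal (the CCR).
    observed = sum(confusion[i][i] for i in range(k)) / n
    # Agreement expected by chance from the row and column totals.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Illustrative two-class example with 100 samples per class split 80/20.
example = [[40, 10], [10, 40]]
print(ccr_and_kappa(example))
```

Kappa discounts the agreement a classifier would reach by chance, which is why a balanced two-class problem with 80% accuracy gives a Kappa well below 0.8.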
Sabel, Michael S.; Rice, John D.; Griffith, Kent A.; Lowe, Lori; Wong, Sandra L.; Chang, Alfred E.; Johnson, Timothy M.; Taylor, Jeremy M.G.
2013-01-01
Introduction: To identify melanoma patients at sufficiently low risk of nodal metastases who could avoid SLN biopsy (SLNB), several statistical models have been proposed based upon patient/tumor characteristics, including logistic regression, classification trees, random forests and support vector machines. We sought to validate recently published models meant to predict sentinel node status. Methods: We queried our comprehensive, prospectively-collected melanoma database for consecutive melanoma patients undergoing SLNB. Prediction values were estimated based upon 4 published models, calculating the same reported metrics: negative predictive value (NPV), rate of negative predictions (RNP), and false negative rate (FNR). Results: Logistic regression performed comparably with our data when considering NPV (89.4% vs. 93.6%); however, the model’s specificity was not high enough to significantly reduce the rate of biopsies (SLN reduction rate of 2.9%). When applied to our data, the classification tree produced NPV and reduction in biopsies rates that were lower (87.7% vs. 94.1% and 29.8% vs. 14.3%, respectively). Two published models could not be applied to our data due to model complexity and the use of proprietary software. Conclusions: Published models meant to reduce the SLNB rate among patients with melanoma either underperformed when applied to our larger dataset, or could not be validated. Differences in selection criteria and histopathologic interpretation likely resulted in underperformance. Statistical predictive models must be developed in a clinically applicable manner to allow for both validation and ultimately clinical utility. PMID:21822550
Computational model of a vector-mediated epidemic
NASA Astrophysics Data System (ADS)
Dickman, Adriana Gomes; Dickman, Ronald
2015-05-01
We discuss a lattice model of vector-mediated transmission of a disease to illustrate how simulations can be applied in epidemiology. The population consists of two species, human hosts and vectors, which contract the disease from one another. Hosts are sedentary, while vectors (mosquitoes) diffuse in space. Examples of such diseases are malaria, dengue fever, and Pierce's disease in vineyards. The model exhibits a phase transition between an absorbing (infection free) phase and an active one as parameters such as infection rates and vector density are varied.
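A toy simulation in the spirit of this description can be sketched as follows. The update rules, parameter names and default values are assumptions made for illustration only (a one-dimensional ring, sedentary hosts, randomly walking vectors, fixed per-step infection and recovery probabilities). They are not taken from the paper.

```python
import random


def simulate(sites=50, n_vectors=100, p_host=0.5, p_vector=0.5,
             r_host=0.1, r_vector=0.1, steps=200,
             initial_infected_hosts=1, seed=0):
    """Return the number of infected individuals (hosts + vectors) per step."""
    rng = random.Random(seed)
    # One sedentary host per lattice site; the first few start infected.
    host_infected = [i < initial_infected_hosts for i in range(sites)]
    vec_pos = [rng.randrange(sites) for _ in range(n_vectors)]
    vec_infected = [False] * n_vectors
    history = []
    for _ in range(steps):
        # Vectors diffuse: each hops to a random neighbour on the ring.
        for k in range(n_vectors):
            vec_pos[k] = (vec_pos[k] + rng.choice((-1, 1))) % sites
        # Transmission between a vector and the host at its site
        # (vectors are processed sequentially within a step).
        for k in range(n_vectors):
            site = vec_pos[k]
            if vec_infected[k] and not host_infected[site]:
                if rng.random() < p_host:
                    host_infected[site] = True
            elif host_infected[site] and not vec_infected[k]:
                if rng.random() < p_vector:
                    vec_infected[k] = True
        # Recovery of hosts and vectors.
        for i in range(sites):
            if host_infected[i] and rng.random() < r_host:
                host_infected[i] = False
        for k in range(n_vectors):
            if vec_infected[k] and rng.random() < r_vector:
                vec_infected[k] = False
        history.append(sum(host_infected) + sum(vec_infected))
    return history


print(simulate()[-1])
```

Sweeping `p_host`, `p_vector` or `n_vectors` and recording whether the infected count reaches zero is one way to explore the absorbing/active transition the abstract mentions. Once no individual is infected, the state can never change again under these rules.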
Kinematic sensitivity of robot manipulators
NASA Technical Reports Server (NTRS)
Vuskovic, Marko I.
1989-01-01
Kinematic sensitivity vectors and matrices for open-loop, n degrees-of-freedom manipulators are derived. First-order sensitivity vectors are defined as partial derivatives of the manipulator's position and orientation with respect to its geometrical parameters. The four-parameter kinematic model is considered, as well as the five-parameter model in case of nominally parallel joint axes. Sensitivity vectors are expressed in terms of coordinate axes of manipulator frames. Second-order sensitivity vectors, the partial derivatives of first-order sensitivity vectors, are also considered. It is shown that second-order sensitivity vectors can be expressed as vector products of the first-order sensitivity vectors.
2014-01-01
Background: Dengue vector control programmes are facing operational challenges due to resistance against commonly used insecticides throughout the endemic countries. Recently, there has been an appreciable increase in dengue cases in India; however, no recent data are available on the susceptibility status of dengue vectors. We have studied the susceptibility level of St. albopicta to commonly used insecticides in India. Adult mosquitoes were tested for the presence of dengue virus. Methods: St. albopicta larval bioassays were carried out to determine the lethal concentrations (LC10, LC50 and LC99) and the resistance ratios (RR10, RR50 and RR99) for temephos. Susceptibility to 4% DDT, 0.05% deltamethrin and 5% malathion was assessed following standard procedure. Knock-down times (KDT10, KDT50 and KDT99) were estimated and knock-down resistance ratios (KRR10, KRR50 and KRR99) were calculated. The VectorTest™ dengue antigen assay was used to detect the dengue virus in the field-collected mosquitoes. Results: In larval bioassays, the RR ranged from 1.4 (for RR99) to 1.7 (for RR50), which suggested that the tested St. albopicta were susceptible to temephos. There was no deviation among the lethal concentration data from linearity (r2 = 0.61). Adult St. albopicta mosquitoes were resistant to DDT, while fully susceptible to deltamethrin and malathion. The knock-down values (KDT10, KDT50 and KDT99) obtained for DDT displayed a straight line in log-dose-probit analysis and followed a linear regression model. The KRR99 for DDT was 4.9, which indicated a 4.9-fold increase in knock-down resistance to DDT. However, for malathion and deltamethrin, the KRR99 values were 1.6 and 1.5 respectively, suggesting that mosquitoes were knock-down sensitive. None of the mosquito pools was dengue virus positive. Conclusion: St. albopicta showed resistance to DDT and reduced sensitivity to deltamethrin and malathion.
This data on insecticide resistance could help public health authorities in India to design more effective vector control measures. More dengue vector specimens need to be scanned to identify the potential dengue vector. PMID:24981885
GWAS-based machine learning approach to predict duloxetine response in major depressive disorder.
Maciukiewicz, Malgorzata; Marshe, Victoria S; Hauschild, Anne-Christin; Foster, Jane A; Rotzinger, Susan; Kennedy, James L; Kennedy, Sidney H; Müller, Daniel J; Geraci, Joseph
2018-04-01
Major depressive disorder (MDD) is one of the most prevalent psychiatric disorders and is commonly treated with antidepressant drugs. However, large variability is observed in terms of response to antidepressants. Machine learning (ML) models may be useful to predict treatment outcomes. A sample of 186 MDD patients who received treatment with duloxetine for up to 8 weeks were categorized as "responders" based on a MADRS change >50% from baseline, or "remitters" based on a MADRS score ≤10 at end point. The initial dataset (N = 186) was randomly divided into training and test sets in a nested 5-fold cross-validation, where 80% was used as a training set and 20% made up five independent test sets. We performed genome-wide logistic regression to identify potentially significant variants related to duloxetine response/remission and extracted the most promising predictors using LASSO regression. Subsequently, classification-regression trees (CRT) and support vector machines (SVM) were applied to construct models, using ten-fold cross-validation. With regard to response, none of the pairs performed significantly better than chance (accuracy p > .1). For remission, SVM achieved moderate performance with an accuracy = 0.52, a sensitivity = 0.58, and a specificity = 0.46, and 0.51 for all coefficients for CRT. The best performing SVM fold was characterized by an accuracy = 0.66 (p = .071), a sensitivity = 0.70 and a specificity = 0.61. In this study, the potential of using GWAS data to predict duloxetine outcomes was examined using ML models. The models were characterized by a promising sensitivity, but specificity remained moderate at best. The inclusion of additional non-genetic variables to create integrated models may improve prediction. Copyright © 2017. Published by Elsevier Ltd.
NASA Astrophysics Data System (ADS)
Ferwerda, Carolin
2009-12-01
Since its introduction to North America in 1987, the Asian tiger mosquito (Aedes albopictus) has spread rapidly. Due to its unique ecology and preference for container breeding sites, Ae. albopictus commonly inhabits urban/suburban areas and is often in close contact with humans. An aggressive pest, this mosquito species is a vector of multiple arboviruses. In order for mosquito control efforts to remain effective, control of this important vector must be guided by spatially explicit habitat models that aid in predicting mosquito outbreaks. Using linear regression, I determined the relationship between adult Ae. albopictus abundance and climate, census, and land use factors in nine urban/suburban study sites in central New Jersey. Systematically collected adult counts (females and males) from July to October 2008, served as estimates of abundance. Fine-scale land use/land cover data were obtained from object-oriented classifications of 2007 CIR orthophotos in Definiens eCognition. Mosquito abundance data were tested for spatial autocorrelation via Moran's I, semivariograms, and hotspot analysis in order to reveal consistent patterns in abundance. Spatial pattern analysis produced little evidence of consistent spatial autocorrelation, though several sites exhibited recurring hotspots, especially in areas near residential housing and vegetation. Stepwise multiple regression was able to explain 20-25 percent of variation in Ae. albopictus abundance at the 'backyard' or cell level and 72-78 percent of variation in abundance at the 'neighborhood' or study site level. Meteorological variables (temperature on the trap date and precipitation), census variables (vacant housing units and population density), and more detailed land use/land cover classes (deciduous woody vegetation, rights-of-way and vacant lots) were frequently selected in all eight models, though many other independent variables were included in the individual models. 
The results of the spatial statistics suggest that clustering may occur at a broader extent, while the superior predictive ability of the site level models over the finer grain cell level models supports this conclusion. Future work should focus on validating these models with 2009 field data and testing whether finer grain weather and census data enhance the models' predictive ability. Given the major differences between individual county models, future studies should further explore variations in Ae. albopictus habitat preferences in different geographic locations.
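The global Moran's I statistic used above to test for spatial autocorrelation can be computed directly from its definition. The sketch below assumes a hypothetical one-dimensional chain of trap locations in which only immediate neighbours have weight 1. The study's own neighbourhood definition and data are not reproduced here.

```python
def morans_i(values, weights):
    """Global Moran's I for `values` under a spatial weight matrix `weights`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations over all ordered pairs.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


def chain_weights(n):
    """Binary weights for n sites on a line: adjacent sites are neighbours."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


w = chain_weights(6)
print(morans_i([1, 1, 1, 0, 0, 0], w))  # clustered counts: positive I
print(morans_i([1, 0, 1, 0, 1, 0], w))  # alternating counts: negative I
```

Values near zero indicate no spatial pattern, which corresponds to the "little evidence of consistent spatial autocorrelation" reported at the cell level.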
Zou, Meng; Liu, Zhaoqi; Zhang, Xiang-Sun; Wang, Yong
2015-10-15
In prognosis and survival studies, an important goal is to identify multi-biomarker panels with predictive power using molecular characteristics or clinical observations. Such analysis is often challenged by censored, small-sample-size, but high-dimensional genomic profiles or clinical data. Therefore, sophisticated models and algorithms are in pressing need. In this study, we propose a novel Area Under Curve (AUC) optimization method for multi-biomarker panel identification named Nearest Centroid Classifier for AUC optimization (NCC-AUC). Our method is motivated by the connection between the AUC score for classification accuracy evaluation and Harrell's concordance index in survival analysis. This connection allows us to convert the survival time regression problem to a binary classification problem. Then an optimization model is formulated to directly maximize AUC and meanwhile minimize the number of selected features to construct a predictor in the nearest centroid classifier framework. NCC-AUC performs well in validation on both genomic data of breast cancer and clinical data of stage IB Non-Small-Cell Lung Cancer (NSCLC). For the genomic data, NCC-AUC outperforms Support Vector Machine (SVM) and Support Vector Machine-based Recursive Feature Elimination (SVM-RFE) in classification accuracy. It tends to select a multi-biomarker panel with low average redundancy and enriched biological meanings. Also, NCC-AUC separates low and high risk cohorts more significantly than the widely used Cox model (Cox proportional-hazards regression model) and L1-Cox model (L1 penalized in Cox model). These performance gains of NCC-AUC are quite robust across 5 subtypes of breast cancer. Further, in an independent clinical dataset, NCC-AUC outperforms SVM and SVM-RFE in predictive accuracy and is consistently better than the Cox model and L1-Cox model in grouping patients into high and low risk categories.
In summary, NCC-AUC provides a rigorous optimization framework to systematically reveal multi-biomarker panels from genomic and clinical data. It can serve as a useful tool to identify prognostic biomarkers for survival analysis. NCC-AUC is available at http://doc.aporc.org/wiki/NCC-AUC. Contact: ywang@amss.ac.cn. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press.
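The connection between AUC and Harrell's concordance index mentioned in this abstract can be checked on a small example. The sketch below is not the NCC-AUC method. It only implements the two textbook definitions and shows that they coincide when survival outcomes collapse to two groups (here, hypothetically, early observed events versus later censored cases). All names and numbers are illustrative.

```python
def concordance_index(times, events, scores):
    """Harrell's C: fraction of comparable pairs ranked correctly by score.

    A pair (i, j) is comparable when i has an observed event and a
    shorter time than j; a higher score should mean higher risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.4, 0.5, 0.1]
labels = [1, 1, 0, 0]   # 1 = high risk
times = [1, 1, 2, 2]    # high-risk patients fail first
events = [1, 1, 0, 0]   # low-risk patients are censored
print(auc(labels, scores), concordance_index(times, events, scores))
```

In this two-group setting the comparable pairs of the concordance index are exactly the (positive, negative) pairs counted by AUC, so the two statistics agree.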
NASA Astrophysics Data System (ADS)
Becciolini, Diego; Franzosi, Diogo Buarque; Foadi, Roshan; Frandsen, Mads T.; Hapola, Tuomas; Sannino, Francesco
2015-07-01
We analyze the Large Hadron Collider (LHC) phenomenology of heavy vector resonances with a SU(2)_L × SU(2)_R spectral global symmetry. This symmetry partially protects the electroweak S parameter from large contributions of the vector resonances. The resulting custodial vector model spectrum and interactions with the standard model fields lead to distinct signatures at the LHC in the diboson, dilepton, and associated Higgs channels.
Wang, Jing-Jing; Wu, Hai-Feng; Sun, Tao; Li, Xia; Wang, Wei; Tao, Li-Xin; Huo, Da; Lv, Ping-Xin; He, Wen; Guo, Xiu-Hua
2013-01-01
Lung cancer, one of the leading causes of cancer-related deaths, usually appears as solitary pulmonary nodules (SPNs) which are hard to diagnose using the naked eye. In this paper, curvelet-based textural features and clinical parameters are used with three prediction models [a multilevel model, a least absolute shrinkage and selection operator (LASSO) regression method, and a support vector machine (SVM)] to improve the diagnosis of benign and malignant SPNs. Dimensionality reduction of the original curvelet-based textural features was achieved using principal component analysis. In addition, non-conditional logistical regression was used to find clinical predictors among demographic parameters and morphological features. The results showed that, combined with 11 clinical predictors, the accuracy rates using 12 principal components were higher than those using the original curvelet-based textural features. To evaluate the models, 10-fold cross validation and back substitution were applied. The results obtained, respectively, were 0.8549 and 0.9221 for the LASSO method, 0.9443 and 0.9831 for SVM, and 0.8722 and 0.9722 for the multilevel model. All in all, it was found that using curvelet-based textural features after dimensionality reduction and using clinical predictors, the highest accuracy rate was achieved with SVM. The method may be used as an auxiliary tool to differentiate between benign and malignant SPNs in CT images.
Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction.
Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi
2017-08-08
Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. A just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content in this work. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with the soft sensor development and outlier detection simultaneously. Moreover, with a continuously updated database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. Better prediction performance of JCSVR is validated on the online silicon content prediction, compared with traditional soft sensors.
Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction
Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi
2017-01-01
Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. A just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content in this work. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with the soft sensor development and outlier detection simultaneously. Moreover, with a continuously updated database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. Better prediction performance of JCSVR is validated on the online silicon content prediction, compared with traditional soft sensors. PMID:28786957
A FORTRAN program for multivariate survival analysis on the personal computer.
Mulder, P G
1988-01-01
In this paper a FORTRAN program is presented for multivariate survival or life table regression analysis in a competing risks' situation. The relevant failure rate (for example, a particular disease or mortality rate) is modelled as a log-linear function of a vector of (possibly time-dependent) explanatory variables. The explanatory variables may also include the variable time itself, which is useful for parameterizing piecewise exponential time-to-failure distributions in a Gompertz-like or Weibull-like way as a more efficient alternative to Cox's proportional hazards model. Maximum likelihood estimates of the coefficients of the log-linear relationship are obtained from the iterative Newton-Raphson method. The program runs on a personal computer under DOS; running time is quite acceptable, even for large samples.
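The estimation step described here, maximum likelihood for a log-linear failure rate via Newton-Raphson, can be sketched in a few lines. This is not a port of the FORTRAN program. It assumes grouped data (event counts and person-time exposure), a single covariate with an intercept, and the usual Poisson form of the likelihood for piecewise-constant rates. All names and data are hypothetical.

```python
import math


def fit_loglinear_rate(events, exposure, x, tol=1e-10, max_iter=50):
    """Fit rate_i = exp(b0 + b1 * x_i) by Newton-Raphson.

    Log-likelihood (up to a constant): sum_i d_i*(b0 + b1*x_i) - t_i*rate_i.
    """
    b0 = math.log(sum(events) / sum(exposure))  # start at the overall rate
    b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for d, t, xi in zip(events, exposure, x):
            mu = t * math.exp(b0 + b1 * xi)  # expected number of events
            g0 += d - mu                     # score for b0
            g1 += (d - mu) * xi              # score for b1
            h00 += mu                        # information matrix entries
            h01 += mu * xi
            h11 += mu * xi * xi
        # Newton step: solve the 2x2 system (information) * step = score.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Two hypothetical groups: 10 events in 100 person-years (x = 0)
# and 30 events in 100 person-years (x = 1).
b0, b1 = fit_loglinear_rate([10, 30], [100.0, 100.0], [0.0, 1.0])
print(math.exp(b0), math.exp(b1))
```

With one binary covariate the model is saturated, so the fitted baseline rate and rate ratio reproduce the observed group rates. Adding time itself as a covariate, as the abstract suggests, would only require more columns in the same scheme.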
Hybrid approach of selecting hyperparameters of support vector machine for regression.
Jeng, Jin-Tsong
2006-06-01
To select the hyperparameters of the support vector machine for regression (SVR), a hybrid approach is proposed to determine the kernel parameter of the Gaussian kernel function and the epsilon value of Vapnik's epsilon-insensitive loss function. The proposed hybrid approach includes a competitive agglomeration (CA) clustering algorithm and a repeated SVR (RSVR) approach. Since the CA clustering algorithm is used to find the nearly "optimal" number of clusters and the centers of clusters in the clustering process, the CA clustering algorithm is applied to select the Gaussian kernel parameter. Additionally, an RSVR approach that relies on the standard deviation of a training error is proposed to obtain an epsilon in the loss function. Finally, two functions, one real data set (i.e., a time series of quarterly unemployment rate for West Germany) and an identification of nonlinear plant are used to verify the usefulness of the hybrid approach.
Passos, Cláudia P; Cardoso, Susana M; Barros, António S; Silva, Carlos M; Coimbra, Manuel A
2010-02-28
Fourier transform infrared (FTIR) spectroscopy has been emphasised as a widespread technique for the quick assessment of food components. In this work, procyanidins were extracted with methanol and acetone/water from the seeds of white and red grape varieties. Fractionation by graded methanol/chloroform precipitations yielded 26 samples that were characterised using thiolysis as pre-treatment followed by HPLC-UV and MS detection. The average degree of polymerisation (DPn) of the procyanidins in the samples ranged from 2 to 11 flavan-3-ol residues. FTIR spectroscopy within the wavenumber region of 1800-700 cm(-1) allowed a partial least squares (PLS1) regression model with 8 latent variables (LVs) to be built for the estimation of the DPn, giving a RMSECV of 11.7%, with a R(2) of 0.91 and a RMSEP of 2.58. The application of orthogonal projection to latent structures (O-PLS1) clarifies the interpretation of the regression model vectors. Moreover, the O-PLS procedure removed 88% of the variation not correlated with the DPn, making it possible to relate the increase of the absorbance peaks at 1203 and 1099 cm(-1) to the increase of the DPn due to the higher proportion of substitutions in the aromatic ring of the polymerised procyanidin molecules. Copyright 2009 Elsevier B.V. All rights reserved.
Linard, Catherine; Lamarque, Pénélope; Heyman, Paul; Ducoffre, Geneviève; Luyasu, Victor; Tersago, Katrien; Vanwambeke, Sophie O; Lambin, Eric F
2007-05-02
Vector-borne and zoonotic diseases generally display clear spatial patterns due to different space-dependent factors. Land cover and land use influence disease transmission by controlling both the spatial distribution of vectors or hosts, and the probability of contact with susceptible human populations. The objective of this study was to combine environmental and socio-economic factors to explain the spatial distribution of two emerging human diseases in Belgium, Puumala virus (PUUV) and Lyme borreliosis. Municipalities were taken as units of analysis. Negative binomial regressions including a correction for spatial endogeneity show that the spatial distribution of PUUV and Lyme borreliosis infections is associated with a combination of factors linked to the vector and host populations, to human behaviours, and to landscape attributes. Both diseases are associated with the presence of forests, which are the preferred habitat for vector or host populations. The PUUV infection risk is higher in remote forest areas, where the level of urbanisation is low, and among low-income populations. The Lyme borreliosis transmission risk is higher in mixed landscapes with forests and spatially dispersed houses, mostly in wealthy peri-urban areas. The spatial dependence resulting from a combination of endogenous and exogenous processes could be accounted for in the model on PUUV but not for Lyme borreliosis. A large part of the spatial variation in disease risk can be explained by environmental and socio-economic factors. The two diseases not only are most prevalent in different regions but also affect different groups of people. Combining these two criteria may increase the efficiency of information campaigns through appropriate targeting.
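Negative binomial regression is commonly used for case counts whose variance exceeds the mean. The sketch below shows the count distribution in the mean/dispersion parameterization often used for such models; the symbols `mu` (expected count) and `k` (dispersion) are assumptions for illustration, and the correction for spatial endogeneity described in the abstract is not shown.

```python
import math


def nb_log_pmf(y, mu, k):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion (size) parameter k."""
    return (
        math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
        + k * math.log(k / (k + mu))
        + y * math.log(mu / (k + mu))
    )


def nb_variance(mu, k):
    """Variance exceeds the mean by mu^2 / k (overdispersion)."""
    return mu + mu ** 2 / k
```

In a regression setting, `mu` for each municipality would typically be the exponential of a linear combination of covariates, and the log-probabilities would be summed to form the log-likelihood.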
Brito, Raíssa N; Gorla, David E; Diotaiuti, Liléia; Gomes, Anália C F; Souza, Rita C M; Abad-Franch, Fernando
2017-11-01
Insecticide spraying efficiently controls house infestation by triatomine bugs, the vectors of Trypanosoma cruzi. The strategy, however, is ineffective against sylvatic triatomines, which can transmit Chagas disease by invading (without colonizing) man-made structures. Despite growing awareness of the relevance of these transmission dynamics, the drivers of house invasion by sylvatic triatomines remain poorly understood. About 12,000 sylvatic triatomines were caught during routine surveillance in houses of Tocantins state, Brazil, in 2005-2013. Using negative binomial regression, information-theoretic model evaluation/averaging, and external model validation, we investigated the effects of regional (Amazon/Cerrado), landscape (preservation/disturbance), and climate covariates (temperature, rainfall) on the municipality-aggregated numbers of house-invading Rhodnius pictipes, R. robustus, R. neglectus, and Panstrongylus geniculatus. House invasion by R. pictipes and R. robustus was overall more frequent in the Amazon biome, tended to increase in municipalities with more well-preserved land, and decreased in rainier municipalities. Across species, invasion decreased with higher landscape-disturbance levels and in hotter-day municipalities. Invasion by R. neglectus and P. geniculatus increased somewhat with more land at intermediate disturbance and peaked in average-rainfall municipalities. Temperature effects were more pronounced on P. geniculatus than on Rhodnius spp. We report widespread, frequent house invasion by sylvatic triatomines in the Amazon-Cerrado transition. Our analyses indicate that readily available environmental metrics may help predict the risk of contact between sylvatic triatomines and humans at coarse geographic scales, and hint at specific hypotheses about climate and deforestation effects on those vectors, with some taxon-specific responses and some seemingly general trends.
Thus, our focal species appear to be quite sensitive to higher temperatures, and might be less common in more heavily-disturbed than in better-preserved environments. This study illustrates, in sum, how entomological routine-surveillance data can be efficiently used for Chagas disease risk prediction and stratification when house-colonizing vectors are absent.
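The information-theoretic model averaging mentioned in this abstract can be illustrated with Akaike weights. This is a minimal sketch that assumes AIC-based weights; the abstract does not say which criterion was used, and the function names are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert a list of AIC scores into normalized model weights."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def model_averaged(estimates, weights):
    """Weighted average of one coefficient across candidate models."""
    return sum(e * w for e, w in zip(estimates, weights))
```

Models with lower AIC receive larger weights, so an averaged coefficient is dominated by the best-supported models without discarding the others.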
Orthogonal vector algorithm to obtain the solar vector using the single-scattering Rayleigh model.
Wang, Yinlong; Chu, Jinkui; Zhang, Ran; Shi, Chao
2018-02-01
Information obtained from a polarization pattern in the sky provides many animals, such as insects and birds, with vital long-distance navigation cues. The solar vector can be derived from the polarization pattern using the single-scattering Rayleigh model. In this paper, an orthogonal vector algorithm, which utilizes the redundancy of the single-scattering Rayleigh model, is proposed. We use the intersection angles between the polarization vectors as the main criterion in our algorithm. The assumption that all polarization vectors can be considered coplanar is used to simplify the three-dimensional (3D) problem with respect to the polarization vectors in our simulation. The surface-normal vector of the plane, which is determined by the polarization vectors after translation, represents the solar vector. Unfortunately, the two-directionality of the polarization vectors makes the resulting solar vector ambiguous. One important result of this study is, however, that this apparent disadvantage has no effect on the complexity of the algorithm. Furthermore, two other universal least-squares algorithms were investigated and compared. A device was then constructed, which consists of five polarized-light sensors as well as a 3D attitude sensor. Both the simulation and experimental data indicate that the orthogonal vector algorithms, if used with a suitable threshold, perform as well as or better than the other two algorithms. Our experimental data reveal that if the intersection angles between the polarization vectors are close to 90°, the solar-vector angle deviations are small. The data also support the assumption of coplanarity. During the 51 min experiment, the mean of the measured solar-vector angle deviations was about 0.242°, as predicted by our theoretical model.
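The geometric core of this idea can be sketched for a single pair of polarization vectors: the normal to the plane they span is a candidate solar vector, and pairs whose intersection angle is too small are rejected. The function names and the 30° default threshold are hypothetical; the abstract only states that a suitable threshold is needed and that angles near 90° give small deviations.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def norm(v):
    """Euclidean length of a 3D vector."""
    return math.sqrt(sum(x * x for x in v))


def solar_vector_from_pair(e1, e2, min_angle_deg=30.0):
    """Unit normal to the plane spanned by e1 and e2, or None when the
    intersection angle between the two lines is below the threshold."""
    c = cross(e1, e2)
    sin_angle = norm(c) / (norm(e1) * norm(e2))
    if sin_angle < math.sin(math.radians(min_angle_deg)):
        return None
    length = norm(c)
    return tuple(x / length for x in c)
```

Reversing either input flips the sign of the result, which is the ambiguity the abstract attributes to the two-directionality of polarization vectors. With more than two sensors, one would presumably combine the normals from all accepted pairs.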
Spectral Estimation Model Construction of Heavy Metals in Mining Reclamation Areas
Dong, Jihong; Dai, Wenting; Xu, Jiren; Li, Songnian
2016-01-01
This study examined surface soils in the Liuxin mining area of Xuzhou and related heavy metal content to spectral data by establishing quantitative models with Multivariable Linear Regression (MLR), Generalized Regression Neural Network (GRNN) and Sequential Minimal Optimization for Support Vector Machine (SMO-SVM) methods. The results are as follows: (1) the estimations of the spectral inversion models based on MLR, GRNN and SMO-SVM are satisfactory; even the MLR model, which provides the worst estimation, reaches an R² of more than 0.46. This result suggests that the stress-sensitive bands of heavy metal pollution contain enough effective spectral information; (2) the GRNN model can simulate the data from small samples more effectively than the MLR model, and the R² values between the contents of the five heavy metals estimated by the GRNN model and the measured values are approximately 0.7; (3) the stability and accuracy of the spectral estimation using the SMO-SVM model are clearly better than those of the GRNN and MLR models. Among the five heavy metals, the estimation for cadmium (Cd) is the best when using the SMO-SVM model, with an R² of 0.8628; (4) when the optimal model is used to invert the Cd content of wheat planted on mine reclamation soil, the R² and RMSE between the measured and the estimated values are 0.6683 and 0.0489, respectively. This result suggests that using the SMO-SVM model to estimate the contents of heavy metals in wheat samples is feasible. PMID:27367708
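The two figures of merit quoted for each model can be computed as follows. This sketch assumes that R² denotes the coefficient of determination between measured and estimated values; the abstract does not say whether a squared correlation coefficient was used instead, and the function names are illustrative.

```python
import math


def rmse(measured, estimated):
    """Root-mean-square error between measured and estimated values."""
    n = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)


def r_squared(measured, estimated):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, estimated))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean of the measured values scores an R² of 0 under this definition, which gives a baseline for reading values such as 0.46 or 0.86.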
Schleier, Jerome J.; Peterson, Robert K.D.; Irvine, Kathryn M.; Marshall, Lucy M.; Weaver, David K.; Preftakes, Collin J.
2012-01-01
One of the more effective ways of managing high densities of adult mosquitoes that vector human and animal pathogens is ultra-low-volume (ULV) aerosol applications of insecticides. The U.S. Environmental Protection Agency uses models that are not validated for ULV insecticide applications and exposure assumptions to perform their human and ecological risk assessments. Currently, there is no validated model that can accurately predict deposition of insecticides applied using ULV technology for adult mosquito management. In addition, little is known about the deposition and drift of small droplets like those used under conditions encountered during ULV applications. The objective of this study was to perform field studies to measure environmental concentrations of insecticides and to develop a validated model to predict the deposition of ULV insecticides. The final regression model was selected by minimizing the Bayesian Information Criterion and its prediction performance was evaluated using k-fold cross validation. Density of the formulation and the density and CMD interaction coefficients were the largest in the model. The results showed that as density of the formulation decreases, deposition increases. The interaction of density and CMD showed that higher density formulations and larger droplets resulted in greater deposition. These results are supported by the aerosol physics literature. A k-fold cross validation demonstrated that the mean square error of the selected regression model is not biased, and the mean square error and mean square prediction error indicated good predictive ability.
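The model-selection and validation steps described here can be sketched in a few lines. The BIC expression assumes a least-squares model with Gaussian errors, and the contiguous fold layout is one simple choice; neither detail is given in the abstract, and the function names are hypothetical.

```python
import math


def bic_gaussian(rss, n, k):
    """BIC for a least-squares fit with residual sum of squares rss,
    n observations and k estimated parameters (constant terms dropped)."""
    return n * math.log(rss / n) + k * math.log(n)


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size
```

Candidate regressions would be compared by `bic_gaussian`, and the selected model refit on each training fold so that its mean square prediction error can be measured on the held-out fold.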
Lee, Byeong-Ju; Kim, Hye-Youn; Lim, Sa Rang; Huang, Linfang; Choi, Hyung-Kyoon
2017-01-01
Panax ginseng C.A. Meyer is a herb used for medicinal purposes, and its discrimination according to cultivation age has been an important and practical issue. This study employed Fourier-transform infrared (FT-IR) spectroscopy with multivariate statistical analysis to obtain a prediction model for discriminating cultivation ages (5 and 6 years) and three different parts (rhizome, tap root, and lateral root) of P. ginseng. The optimal partial-least-squares regression (PLSR) models for discriminating ginseng samples were determined by selecting normalization methods, number of partial-least-squares (PLS) components, and variable influence on projection (VIP) cutoff values. The best prediction model for discriminating 5- and 6-year-old ginseng was developed using tap root, vector normalization applied after the second differentiation, one PLS component, and a VIP cutoff of 1.0 (based on the lowest root-mean-square error of prediction value). In addition, for discriminating among the three parts of P. ginseng, optimized PLSR models were established using data sets obtained from vector normalization, two PLS components, and VIP cutoff values of 1.5 (for 5-year-old ginseng) and 1.3 (for 6-year-old ginseng). To our knowledge, this is the first study to provide a novel strategy for rapidly discriminating the cultivation ages and parts of P. ginseng using FT-IR by selected normalization methods, number of PLS components, and VIP cutoff values.
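The preprocessing named for the best model, vector normalization applied after second differentiation, can be sketched as below. It assumes that vector normalization means mean-centering a spectrum and scaling it to unit Euclidean length, a common convention in FT-IR software, and it uses a plain second difference rather than a smoothed derivative. Both choices are assumptions, not details from the abstract.

```python
import math


def second_difference(spectrum):
    """Discrete second derivative of an evenly sampled spectrum."""
    return [
        spectrum[i - 1] - 2 * spectrum[i] + spectrum[i + 1]
        for i in range(1, len(spectrum) - 1)
    ]


def vector_normalize(spectrum):
    """Mean-center the spectrum, then scale it to unit Euclidean length."""
    mean = sum(spectrum) / len(spectrum)
    centered = [v - mean for v in spectrum]
    length = math.sqrt(sum(v * v for v in centered))
    if length == 0.0:
        raise ValueError("cannot normalize a constant spectrum")
    return [v / length for v in centered]
```

Applying `vector_normalize(second_difference(absorbances))` to each sample would give the kind of input on which a PLS model is then fitted.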
Lu, Wei-Zhen; Wang, Wen-Jian
2005-04-01
Monitoring and forecasting of air quality parameters are popular and important topics in atmospheric and environmental research today because of the health impact of exposure to pollutants in urban air. Accurate models for air pollutant prediction are needed because they would allow forecasting and diagnosing potential compliance or non-compliance in both the short and long term. Artificial neural networks (ANNs) are regarded as a reliable and cost-effective method for such tasks and have produced some promising results to date. Although ANNs have attracted growing attention from environmental researchers, their inherent drawbacks, e.g., local minima, over-fitting, poor generalization performance, and the difficulty of determining an appropriate network architecture, impede their practical application. The support vector machine (SVM), a novel type of learning machine based on statistical learning theory, can be used for regression and time series prediction and has been reported to perform well. The work presented in this paper examines the feasibility of applying SVM to predict air pollutant levels ahead in time based on the monitored air pollutant database for the Hong Kong downtown area. The functional characteristics of SVM are also investigated. Experimental comparisons between the SVM model and the classical radial basis function (RBF) network demonstrate that the SVM is superior to the conventional RBF network in predicting air quality parameters over different time series and generalizes better than the RBF model.
Nonparametric Stochastic Model for Uncertainty Quantification of Short-term Wind Speed Forecasts
NASA Astrophysics Data System (ADS)
AL-Shehhi, A. M.; Chaouch, M.; Ouarda, T.
2014-12-01
Wind energy is increasing in importance as a renewable energy source due to its potential role in reducing carbon emissions. It is a safe, clean, and inexhaustible source of energy. The amount of wind energy generated by wind turbines is closely related to the wind speed. Wind speed forecasting plays a vital role in the wind energy sector in terms of optimal wind turbine operation, wind energy dispatch and scheduling, efficient energy harvesting, etc. It is also considered during the planning, design, and assessment of any proposed wind project. Therefore, accurate prediction of wind speed is particularly important to the wind industry. Many methods have been proposed in the literature for short-term wind speed forecasting. These methods are usually based on modeling historical wind speed data at fixed time intervals and using the fitted model for future prediction. They mainly include statistical models such as ARMA and ARIMA, physical models such as numerical weather prediction, and artificial intelligence techniques such as support vector machines and neural networks. In this paper, we are interested in estimating hourly wind speed in the United Arab Emirates (UAE). More precisely, we predict hourly wind speed using nonparametric kernel estimation of the regression and volatility functions of a nonlinear autoregressive model with an ARCH component, a model with unknown nonlinear regression and volatility functions that has already been discussed in the literature. The unknown nonlinear regression function describes the dependence between the value of the wind speed at time t and its historical values at times t - 1, t - 2, …, t - d. This function plays a key role in predicting the hourly wind speed process. The volatility function, i.e., the conditional variance given the past, measures the risk associated with this prediction.
Since the regression and volatility functions are assumed to be unknown, they are estimated using nonparametric kernel methods. In addition to the pointwise hourly wind speed forecasts, a confidence interval is provided, which quantifies the uncertainty around the forecasts.
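The forecasting scheme can be sketched for the simplest case of one lag (d = 1). The example assumes a Nadaraya-Watson estimator with a Gaussian kernel for both the regression and the volatility function, and a normal-approximation interval with multiplier `z`. The estimator choice, the bandwidth `h`, and the interval construction are illustrative assumptions rather than the authors' exact procedure.

```python
import math


def nw_estimate(x0, xs, ys, h):
    """Nadaraya-Watson kernel regression estimate at x0 (Gaussian kernel)."""
    weights = [math.exp(-0.5 * ((x0 - x) / h) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def forecast_with_interval(series, h, z=1.96):
    """One-step-ahead forecast with an approximate uncertainty interval."""
    xs, ys = series[:-1], series[1:]  # pairs (speed at t-1, speed at t)
    x0 = series[-1]                   # latest observation
    mean = nw_estimate(x0, xs, ys, h)
    # Volatility function: kernel-smoothed squared residuals.
    resid_sq = [(y - nw_estimate(x, xs, ys, h)) ** 2 for x, y in zip(xs, ys)]
    sd = math.sqrt(nw_estimate(x0, xs, resid_sq, h))
    return mean, mean - z * sd, mean + z * sd
```

For example, `forecast_with_interval(hourly_speeds, h=0.5)` would return the point forecast for the next hour together with lower and upper bounds; in practice the bandwidth would be chosen by a data-driven rule such as cross-validation.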
Sheela, A M; Sarun, S; Justus, J; Vineetha, P; Sheeja, R V
2015-04-01
Vector-borne diseases are a threat to human health, yet little attention has been paid to their prevention. We attempted to identify the significant wetland characteristics associated with the spread of chikungunya, dengue fever and malaria in Kerala, a tropical region of South West India, using multivariate analyses (hierarchical cluster analysis, factor analysis and multiple regression). High- and medium-turbidity coastal lagoons and inland water-logged wetlands with aquatic vegetation have a significant effect on the incidence of chikungunya, while dengue is influenced by high-turbidity coastal beaches and malaria by medium-turbidity coastal beaches. The high turbidity of the water is due to urban waste discharge, namely sewage, sullage and garbage, from densely populated cities and towns. The large extent of wetland in low-lying areas favours the occurrence of vector-borne diseases. Hence, pollution control measures at source, including soil erosion control measures, are vital. Identifying the vulnerable zones that favour vector-borne diseases will help the authorities to control pollution, especially from urban areas, and to prevent these diseases. Future research should cover land use and land cover changes, climatic factors, seasonal variations in weather and pollution factors favouring the occurrence of vector-borne diseases.